Math2069 Lecture Notes
Anthony Henderson
© School of Mathematics and Statistics,
The University of Sydney, 2008–2020
Part I
These lecture notes were written in 2008 (and revised in 2009) for the units
MATH2069 Discrete Mathematics and Graph Theory and MATH2969 Discrete
Mathematics and Graph Theory (Advanced), given at the University
of Sydney. I am extremely grateful to David Easdown for allowing me to
make use of his lecture notes on these topics, and for his careful correction
of, and comments on, these notes. Any errors which remain are entirely my
own fault.[1]

Anthony Henderson

[1] A few minor changes were made by Alexander Molev.
Contents

0 Introduction
1 Counting Problems
  1.4 Inclusion/Exclusion
3 Generating Functions
Chapter 0
Introduction
We let $n$ be the number of discs, and write $h_n$ for the smallest number of
moves required. If there are no discs, then there is nothing to do, so $h_0 = 0$.
If there is one disc, then clearly one move suffices, so $h_1 = 1$. If there are two
discs, then the rule that the larger disc cannot sit on top of the smaller one
forces you to put the smaller one temporarily on peg 3, so $h_2 = 3$.
This illustrates a general feature: the largest disc has to move from peg 1 to
peg 2 at some stage, and while that happens all the other discs have to be
on peg 3. Clearly it takes at least $h_{n-1}$ moves to get all the other discs to
peg 3, and then after moving the largest disc, it takes at least $h_{n-1}$ moves to
get them back to peg 2. This suggests a recursive procedure for solving the
puzzle, which uses the smallest possible number of moves:
(1) Move all but the largest disc from peg 1 to peg 3 in the smallest possible
number of moves.
(2) Move the largest disc from peg 1 to peg 2.
(3) Move all the other discs from peg 3 to peg 2 in the smallest possible
number of moves.
This procedure is recursive because in steps (1) and (3) you need to go
through the whole three-step procedure, but for n − 1 discs rather than n
(the change of peg numbers is immaterial); and when you then examine each
of those, their own steps (1) and (3) call for the whole three-step procedure
for n − 2 discs, and so on. Returning to the number of moves required, we
have derived a recurrence relation
$$h_n = 2h_{n-1} + 1,$$
which determines the answer for $n$ in terms of the answer for $n-1$. This
recurrence relation, together with the initial condition $h_0 = 0$, determines $h_n$
for all $n$. For example, continuing from where we left off,
$$h_3 = 2h_2 + 1 = 2 \times 3 + 1 = 7,$$
$$h_4 = 2h_3 + 1 = 2 \times 7 + 1 = 15,$$
$$h_5 = 2h_4 + 1 = 2 \times 15 + 1 = 31.$$
You may notice that these values of $h_n$ are all one less than a power of 2. In
fact, the evidence we have so far supports the conjecture that $h_n = 2^n - 1$
always. To prove such a formula for a sequence defined recursively, the most
natural method is mathematical induction: we know that $h_0 = 2^0 - 1$ is true,
and if we assume that $h_{n-1} = 2^{n-1} - 1$ is true, then we can deduce
$$h_n = 2h_{n-1} + 1 = 2(2^{n-1} - 1) + 1 = 2^n - 2 + 1 = 2^n - 1.$$
By induction, $h_n = 2^n - 1$ for all nonnegative integers $n$. So the answer to the
original question is that the smallest number of moves needed to complete
the $n$-disc tower of Hanoi puzzle is $2^n - 1$.
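To make the recursive procedure concrete, here is a minimal Python sketch (an aside of mine, not part of the original notes; the function name is my own) of the three steps described above, together with a check against the closed formula:

```python
def hanoi(n, src=1, dst=2, spare=3, moves=None):
    """Recursively move n discs from src to dst, returning the list of moves."""
    if moves is None:
        moves = []
    if n > 0:
        hanoi(n - 1, src, spare, dst, moves)   # step (1): clear the way to the spare peg
        moves.append((src, dst))               # step (2): move the largest disc
        hanoi(n - 1, spare, dst, src, moves)   # step (3): bring the other discs back
    return moves

for n in range(8):
    assert len(hanoi(n)) == 2**n - 1   # agrees with h_n = 2^n - 1
```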
$$\mathbb{N} = \{0, 1, 2, 3, 4, \dots\}.$$
$$a_0, a_1, a_2, a_3, \dots.$$
In the tower of Hanoi example, the terms of the sequence were nonnegative
integers. This feature is not essential for a problem to be considered discrete.
For maximum flexibility, we will usually allow the terms of our sequences, i.e.
the values of our functions with domain N, to be complex numbers (though
examples with non-real values will be rare).
The usefulness of discrete mathematics comes from the fact that, even though
physics sometimes gives the impression that we live in a continuous universe,
there are countless phenomena in nature and society which are best under-
stood in terms of sequences (functions of a nonnegative integer variable)
rather than functions of a real variable.
There are some natural phenomena which could be modelled either discretely
or continuously. For example, if you wanted to predict the temperature in
Sydney tomorrow, you could think of temperature as a real-valued function
of three real variables (longitude, latitude, and time), and try to understand
what differential equations it satisfied. Or you could break up NSW into a
finite number of square kilometre blocks, and time into a discrete sequence
of days, and try to understand what recurrence relations hold: i.e. express
the temperature on a given day in a given block in terms of the temperatures
on the previous day in adjacent blocks. The discrete approach might well be
easier, in the sense of being better suited to the computer technology you
would need to use.
But it is certainly not the case that discrete mathematics is always easier
than continuous mathematics; we will encounter many challenging problems
which have no continuous analogues. For a first example, imagine a new,
improved tower of Hanoi puzzle which has four pegs rather than three. The
rules are the same as before; what is the smallest number of moves required
to move the discs from peg 1 to peg 2, if there are $n$ discs? Nobody knows
the answer (at least, there is no known closed formula like $2^n - 1$).
In these notes, the main results are all called “Theorem”, irrespective of their
difficulty or significance. Usually, the statement of the Theorem is followed
(perhaps after an intervening Example or Remark) by a rigorous Proof; as is
traditional, the end of each proof is marked by an open box.
The text in the Examples and Remarks is slanted, to make them stand out
on the page. Many students will understand the Theorems more easily by
studying the Examples which illustrate them than by studying their proofs.
Some of the Examples contain blank boxes for you to fill in. The Remarks
are often side-comments, and are usually less important to understand.
The more difficult Theorems, Proofs, Examples, and Remarks are marked at
the beginning with either * or **. Those marked * are at the level which
MATH2069 students will have to understand in order to be sure of getting
a Credit, or to have a chance of a Distinction or High Distinction. Those
marked ** are really intended only for the students enrolled in the Advanced
unit MATH2969, and can safely be ignored by those enrolled in MATH2069.
Chapter 1
Counting Problems
So the foundation of the whole subject is the idea of counting the number
of elements in finite sets. This can be much harder than it sounds, and in
its higher manifestations goes by the name of enumerative combinatorics.
In this chapter we will review and extend some of the basic principles of
counting.
If X is any finite set, we will write |X| for its size (also known as its
cardinality), i.e. the number of elements of X. For instance, if X = {0, 2, 4},
then |X| = 3. The most basic principle of all is:
The next most basic principle, again not requiring proof (since it is bound
up with such elementary matters as the definition of addition), is:
Theorem 1.5 (Sum Principle). If a finite set $X$ is the disjoint union of two
subsets $A$ and $B$, i.e.
$$X = A \cup B \quad\text{and}\quad A \cap B = \emptyset,$$
which means that every element of $X$ is either in $A$ or in $B$ but not both,
then $|X| = |A| + |B|$. More generally, if $X$ is the disjoint union of subsets
$A_1, A_2, \dots, A_n$, i.e.
$$X = A_1 \cup A_2 \cup \cdots \cup A_n \quad\text{and}\quad A_i \cap A_j = \emptyset \text{ for all } i \neq j,$$
then $|X| = |A_1| + |A_2| + \cdots + |A_n|$.
Example 1.6. If every person in the room is either left-handed or right-
handed, and no-one claims to be both, then the total number of people
equals the number of left-handed people plus the number of right-handed
people. Of course this relies on the disjointness of the two subsets: if anyone
was ambidextrous and was counted in both the left-handed total and the
right-handed total, it would throw out the calculation (see the section on the
Inclusion/Exclusion Principle later in this chapter).
Example 1.7. How many numbers from 1 up to 999 are palindromic, in
the sense that they read the same backwards as forwards? As soon as you
start to think about this question, you have to break it up into the cases of
one-digit numbers, two-digit numbers, and three-digit numbers. So, whether
you choose to make it explicit or not, you are using the Sum Principle: in
order to count the set
$$X = \{n \in \mathbb{N} \mid 1 \le n \le 999,\ n \text{ is palindromic}\},$$
you are using the fact that it is the disjoint union of subsets $A_1, A_2, A_3$, where
$$A_1 = \{n \in \mathbb{N} \mid 1 \le n \le 9,\ n \text{ is palindromic}\},$$
$$A_2 = \{n \in \mathbb{N} \mid 10 \le n \le 99,\ n \text{ is palindromic}\},$$
$$A_3 = \{n \in \mathbb{N} \mid 100 \le n \le 999,\ n \text{ is palindromic}\},$$
and then calculating $|X|$ as the sum $|A_1| + |A_2| + |A_3|$. The calculation is
left for you to complete:

$|A_1| = $ ______ because ______

$|A_2| = $ ______ because ______

$|A_3| = $ ______ because ______

$|X| = $ ______
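Once you have filled in the boxes, a brute-force count in Python (my own aside, not in the notes) can confirm your totals:

```python
def is_palindromic(n):
    s = str(n)
    return s == s[::-1]

# |A1|, |A2|, |A3| for the three digit-length cases, then |X| by the Sum Principle
counts = [sum(is_palindromic(n) for n in range(lo, hi + 1))
          for lo, hi in [(1, 9), (10, 99), (100, 999)]]
print(counts, sum(counts))
```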
Theorem 1.8 (Difference Principle). If $A$ is a subset of a finite set $X$, then
$$|X \setminus A| = |X| - |A|.$$
Proof. This is just the Sum Principle rearranged, because X is the disjoint
union of A and X \ A.
Example 1.9. How many three-digit numbers are divisible by 3? One way
to answer this is to see that the set of three-digit numbers divisible by 3 can
be written as $X \setminus A$, where $X$ is the set of multiples of 3 from 1 to 999 and
$A$ is the set of multiples of 3 from 1 to 99.
As seen in Example 1.3, $|X| = 333$, and similarly $|A| = 33$. By the Difference
Principle, the answer to the question is $|X \setminus A| = |X| - |A| = 300$.
Note that the condition for $f$ to be injective is that $|f^{-1}(y)| \le 1$ for all $y \in Y$,
whereas the condition for $f$ to be surjective is that $|f^{-1}(y)| \ge 1$ for all $y \in Y$.
Remark 1.10. If $f$ is bijective, and therefore has an actual inverse function
$f^{-1} : Y \to X$, we have a slight notational clash. If you interpret $f^{-1}(y)$
as the image of $y \in Y$ under $f^{-1}$, then it is the unique $x \in X$ such that
$f(x) = y$. If you interpret $f^{-1}(y)$ as the preimage, then it is the set whose
single element is this $x$. In practice, the context always determines which of
the two is meant.
It is clear that $X$ is the disjoint union of the subsets $f^{-1}(y)$ as $y$ runs over
$Y$, because for every $x \in X$ there is a unique $y \in Y$ such that $f(x) = y$ (that
is what it means to have a well-defined function). So by the Sum Principle,
$$|X| = \sum_{y \in Y} |f^{-1}(y)|. \tag{1.1}$$
Because we are dealing mainly with functions which only make sense for
integer variables, we will quite often have to carry out such roundings. We
will use without comment the obvious inequalities
which contradicts (1.1). Therefore our assumption was wrong, which means
that there is some $y \in Y$ for which $|f^{-1}(y)| \ge \left\lceil \frac{|X|}{|Y|} \right\rceil$. To deduce the "In
particular" sentence, note that $|X| > |Y|$ means that $\frac{|X|}{|Y|} > 1$, so $\left\lceil \frac{|X|}{|Y|} \right\rceil \ge 2$.
So we have shown that there is some $y \in Y$ for which there are at least two
elements of $X$ which are taken to $y$ by $f$; this gives the stated result.
Remark 1.15. Note that the Pigeonhole Principle asserts that something
is true for some y ∈ Y , but not for all y ∈ Y , and not for a particular y
specified in advance. For instance, in the example of students’ birthdays,
we cannot pick a particular date like June 16 and be sure that at least 126
students have that birthday. Indeed, it is possible that no students have that
birthday (though admittedly that’s unlikely).
The Pigeonhole Principle is surprisingly useful even in its basic form of prov-
ing non-injectivity.
Example 1.16. If there are 30 students in a tutorial, there must be two
whose surnames begin with the same letter, because there are only 26 possible
letters. This is an example of the Pigeonhole Principle: we are saying that
the function from the set of students to the set of letters given by taking
the first letter of the surname cannot be injective. Of course, we can’t pick
a particular letter like A and be sure that there are two students whose
surname begins with A: there might be none.
Example 1.17*. If $m$ is any positive integer, the decimal expansion of $1/m$
eventually gets into a repeating cycle. We can prove this using the Pigeonhole
Principle, as follows. Consider the set $X = \{10^0, 10^1, \dots, 10^m\}$. For each
element of $X$, we can divide it by $m$ and find the remainder, which must be
in the set $Y = \{0, 1, \dots, m-1\}$. This defines a function from $X$ to $Y$. Since
$|X| = m+1$ and $|Y| = m$, this function can't be injective, so there are some
nonnegative integers $k_1 < k_2$ such that $10^{k_1}$ and $10^{k_2}$ have the same remainder
after division by $m$. This means that $10^{k_1}/m$ and $10^{k_2}/m$ are the same after
the decimal point (i.e. their difference is an integer). But the decimal expansion
of $10^{k_1}/m$ is obtained from that of $1/m$ by shifting every digit $k_1$ places to the
left, and similarly for $k_2$. So for every $k > k_1$, the $k$th digit after the decimal
point in $1/m$ equals the $(k - k_1)$th digit after the decimal point in $10^{k_1}/m$,
which equals the $(k - k_1)$th digit after the decimal point in $10^{k_2}/m$, which
equals the $(k + k_2 - k_1)$th digit after the decimal point in $1/m$. So once you
are sufficiently far to the right of the decimal point, the digits in $1/m$ repeat
themselves every $k_2 - k_1$ digits (maybe more often than that, but at least that
often).
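The remainder argument above translates directly into a short Python sketch (my illustration, not from the notes; the function name is hypothetical) which finds the digits of $1/m$ and its repeating block:

```python
def decimal_cycle(m):
    """Digits of 1/m after the point, up to the first repeated remainder."""
    seen = {}                      # remainder -> position where it first occurred
    digits = []
    r = 1
    while r != 0 and r not in seen:
        seen[r] = len(digits)
        r *= 10
        digits.append(r // m)      # next decimal digit
        r %= m                     # next remainder
    cycle = digits[seen[r]:] if r != 0 else []   # empty if the expansion terminates
    return digits, cycle

digits, cycle = decimal_cycle(7)
print(digits, cycle)   # 1/7 = 0.142857..., repeating block [1, 4, 2, 8, 5, 7]
```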
Theorem 1.18 (Reverse Pigeonhole Principle). If $|X| < |Y|$, then any
function $f : X \to Y$ is not surjective. That is, there must always be some
$y \in Y$ such that $f^{-1}(y)$ is empty.
Proof. The range of f can clearly have at most |X| elements, so it cannot
be the whole of Y .
There is an old joke that a combinatorialist is someone who, to find out how
many sheep there are in a flock, counts their legs and divides by 4. That
may not be a practical method in the case of sheep, but it is a surprisingly
useful trick in other sorts of counting problems.
Proof. By (1.1), we have $|X| = \sum_{y \in Y} |f^{-1}(y)| = m|Y|$.
Example 1.20. In the spurious sheep example, X is the set of legs, Y is the
set of sheep, m = 4, and f is the function which associates to each leg the
sheep it belongs to. The assumption (which we certainly do need, to apply
the method!) amounts to saying that every sheep has four legs.
Obviously, the cases where it is useful to apply this principle are those where
the set X is for some reason easier to count than the set Y (despite being
bigger). It is vital that all the preimages have the same size, so that when
we count X, we are overcounting Y by a known factor.
Example 1.21. How many edges does a cube have? If you don’t have pen
and paper handy, you can just observe that a cube has 6 faces, and every
face has 4 edges. Is the answer 6 × 4 = 24? No: this counts every edge twice,
since every edge is on the border of two faces. So the correct answer is 12,
and this argument is a case of the Overcounting Principle. To interpret it
in the above terms, Y is the set of edges, X is the set of pairs (face, edge)
where the edge belongs to the face, and f is the function which takes such a
pair and forgets the face. Here is a similar problem: the outside of a soccer
ball is made from 32 pieces of cloth, 12 of which are pentagonal (having five
edges) and 20 of which are hexagonal (having six edges). These are stitched
together along their edges, and at every vertex where an edge ends, three
faces meet. Using the Overcounting Principle, you can deduce that:
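Here is a hedged sketch of that deduction in Python (my own aside, assuming only the face counts stated above); it applies the same face-edge and face-corner double counting:

```python
incidences = 12 * 5 + 20 * 6   # (face, edge) pairs: 12 pentagons and 20 hexagons
edges = incidences // 2        # each edge borders exactly 2 faces
vertices = incidences // 3     # each face corner is an incidence; 3 faces meet per vertex
print(edges, vertices)         # the overcounted totals divided back down
```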
Our final fundamental principle relates to the most important way of con-
structing new sets from old.
$$X \times Y := \{(x, y) \mid x \in X,\ y \in Y\}.$$
$$X_1 \times X_2 \times \cdots \times X_n := \{(x_1, x_2, \dots, x_n) \mid x_1 \in X_1,\ x_2 \in X_2,\ \dots,\ x_n \in X_n\}.$$
Remark 1.24. Before we extract the last part of this argument as a general
principle, a pedantic comment. According to Definition 1.22, $X^3 \times Y^3$ ought
to be a set of pairs where each entry of the pair is a triple: its elements look
like $((L, B, M), (5, 2, 3))$. So this is not really the same set as $X \times X \times X \times
Y \times Y \times Y$; however, the bijection between them is so obvious (just a matter
of deleting or inserting some brackets) that it does no harm to identify them.
A similar comment applies to the identification of 6-tuples with 'strings'.
The following is the reason why Cartesian products are called products.
In particular, $|X^n| = |X|^n$.
with the set of 5-tuples where the first entry is an element of {1, · · · , 9}
and all the other entries are elements of {0, · · · , 9}. The second approach
is necessary for some variants of the question. For instance, the number of
five-digit numbers which are palindromes (i.e. the digits have the form abcba)
is 9 × 10 × 10 = 900, because the choice of the first three digits determines
the last two. (The Bijection Principle is also involved here, because we are
effectively using the fact that the five-digit palindromes are in bijection with
the strings formed by deleting their last two digits.)
For some situations, the above statement of the Product Principle is too
restrictive. Instead of counting the ways of making independent selections
from pre-determined sets X1 , · · · , Xn , you might have a situation where the
set X2 depends on what element you selected from X1 , and then the set X3
depends on what elements you selected from X1 and X2 , and so on. All you
really need to be able to apply the Product Principle is that the sizes |Xi |
don’t depend on the choices you’ve already made.
Example 1.28. How many two-digit numbers have two different digits?
Since there are 90 two-digit numbers in total, and the digits are the same
in 9 of them (11, 22, · · · , 99), the answer is 81 (by the Difference Principle).
Another approach is that there are 9 choices for the first digit (anything from
1 to 9), and then having chosen the first digit, there are 9 choices for the
second digit (anything except the digit you’ve already used), so the answer is
9×9 = 81. This is an application of the looser form of the Product Principle,
since the set of possibilities for the second digit varies depending on what
first digit you chose; what matters is that it always has 9 elements. What
about trying to choose the digits in the other order? Since the second digit
is allowed to be zero, there are 10 choices for it, and then there should surely
be 8 choices for the first digit (anything except zero and the chosen second
digit). But the answer isn’t 80; what’s gone wrong is that if you do choose
zero as the second digit, the number of choices for the first digit is not 8 but
9. Although this example is pretty trivial, it does illustrate the need for care
in applying the Product Principle.
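A brute-force check of this count (my own illustration, not in the notes):

```python
# Two-digit numbers whose two digits differ: should match 9 * 9 = 81.
count = sum(1 for n in range(10, 100) if len(set(str(n))) == 2)
assert count == 81
```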
Remark 1.31*. To make Theorem 1.30 true when $X$ is empty, we need the
number of functions $f : \emptyset \to Y$ to be $|Y|^0 = 1$. Indeed, the convention is
that there is a unique function from the empty set to every other set.
Example 1.32. If there are 5 different presents under a Christmas tree, and
3 greedy children grabbing at them until they are all claimed, and no present
can be shared between more than one child, how many possible results are
there? The answer is given by Theorem 1.30; the main mental difficulty is
deciding which way round the functions go, from the set of children to the set
of presents or vice versa. Here we have a giveaway phrase: “no present can
be shared between more than one child”. That is, every present is associated
with a single child, so the “possible results” are really the functions from the
set of presents to the set of children, and the answer is $3^5 = 243$. To phrase a
problem which amounted to counting the functions from the set of children
to the set of presents, you would have to specify that every child gets a single
present and remove the ban on sharing presents and the requirement that
every present is claimed. Then the answer would be $5^3 = 125$.
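Both counts are small enough to verify by direct enumeration (a quick aside of mine):

```python
from itertools import product

# Results = functions from the 5 presents to the 3 children.
assert len(list(product(range(3), repeat=5))) == 3**5 == 243
# The other reading: functions from the 3 children to the 5 presents.
assert len(list(product(range(5), repeat=3))) == 5**3 == 125
```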
Proof. The idea of the proof is that to specify a subset is the same as
specifying, for every element of the set, whether it is in or out. We can
formalize this idea as saying that there is a bijection between the set of
subsets of X and the set of functions f : X → {0, 1}. To define this bijection,
start with any subset $A$ of $X$, and associate to this the function
$$f_A : X \to \{0, 1\} \quad\text{defined by}\quad f_A(x) = \begin{cases} 1, & \text{if } x \in A, \\ 0, & \text{if } x \notin A. \end{cases}$$
Theorem 1.36. If X and Y are sets with |X| = k and |Y | = n, then the
number of injective functions f : X → Y is n(n − 1)(n − 2) · · · (n − k + 1).
Proof. First suppose that n < k, i.e. Y is a smaller set than X; then there
cannot be any injective function from X to Y (by the Pigeonhole Principle).
The formula does indeed give the answer 0 in this case, because one of the
factors in the product is n − n.
Notice that there are k factors in the product in Theorem 1.36, which is
called a falling factorial (“falling” because the factors decrease by 1 at each
step). The usual notation for these is:
$$n^{(k)} := n(n-1)(n-2)\cdots(n-k+1). \tag{1.3}$$
This makes sense when n is any complex number, which will be useful later.
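A small Python helper (my own sketch; the function name is hypothetical) makes the "$k$ factors" description explicit:

```python
def falling_factorial(n, k):
    """n^(k) = n(n-1)...(n-k+1): k factors, the number of injective maps k-set -> n-set."""
    result = 1
    for i in range(k):
        result *= n - i
    return result

assert falling_factorial(10, 3) == 720    # 10 * 9 * 8
assert falling_factorial(3, 4) == 0       # the factor n - n appears when k > n
```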
• the four-digit numbers whose digits are 3, 5, 7, and 9 (in some order);
line. Any such ordering gives rise to a way of pairing them off: simply pair
the first with the second, the third with the fourth, and so on. So we have a
function from the set of orderings to the set of pairings. If we start with a
given pairing, the number of orderings which give rise to it is $2^n\, n!$, because
we can order the $n$ pairs in $n!$ ways, and then decide which person comes
first in each pair in $2^n$ ways. So every one of the preimages for our function
has size $2^n\, n!$, and by the Overcounting Principle, there are $\frac{(2n)!}{2^n\, n!}$ pairings.
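For small $n$ this overcounting formula can be checked against a direct enumeration (an aside of mine; the function names are my own):

```python
from itertools import permutations
from math import factorial

def pairings_by_overcounting(n):
    return factorial(2 * n) // (2**n * factorial(n))

def pairings_brute_force(n):
    seen = set()
    for order in permutations(range(2 * n)):
        # pair off consecutive people in the ordering, as in the text
        pairing = frozenset(frozenset(order[i:i + 2]) for i in range(0, 2 * n, 2))
        seen.add(pairing)
    return len(seen)

for n in range(1, 4):
    assert pairings_by_overcounting(n) == pairings_brute_force(n)
```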
$$\{1, 2\},\ \{1, 3\},\ \{1, 4\},\ \{2, 3\},\ \{2, 4\},\ \{3, 4\}.$$
This agrees with the stated formula: $\frac{4 \times 3}{2 \times 1} = 6$.
A nice consequence of Theorem 1.42 is that it proves that $k!$ always divides $n^{(k)}$ (i.e. their quotient
is an integer), which is not particularly easy to see otherwise. For instance,
$n(n-1)(n-2)(n-3)$ is a multiple of 24 for all $n \in \mathbb{N}$.
read "$n$ choose $k$". This definition makes sense when $n$ is any complex
number, which will be useful in Chapter 3. Note that
$$\binom{n}{0} = \binom{n}{n} = 1, \qquad \binom{n}{1} = \binom{n}{n-1} = n. \tag{1.8}$$
If $n$ is an integer and $n \ge k$, then the alternative formula (1.5) for $n^{(k)}$ gives
$$\binom{n}{k} = \frac{n!}{k!(n-k)!}. \tag{1.9}$$
Remark 1.45. Equation (1.10) can also be proved using the Bijection Prin-
ciple: there is a bijection between the k-element subsets of an n-element set
X and the (n − k)-element subsets, which associates to every k-element sub-
set A its complement X \A. This is a bijection because every (n−k)-element
subset B of X arises in this way for exactly one k-element subset, namely
X \ B.
Example 1.46. There are 52 playing cards in a standard deck, 13 of each
suit (spades, hearts, diamonds, and clubs). If you are dealt 5 of them at
random, how many possible hands can you end up with? Since the order
in which you are dealt the cards doesn’t matter, this is just the number of
5-element subsets of the set of cards, which is
$$\binom{52}{5} = \frac{52 \times 51 \times 50 \times 49 \times 48}{5 \times 4 \times 3 \times 2 \times 1} = 2598960.$$
In card games such as poker, you need to know the probability of getting
various kinds of hands, which means you need to count how many of the
5-element subsets satisfy certain properties. For instance, how many hands:
contain no spades?
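Counts like these are easy to sanity-check with Python's math.comb (an aside of mine):

```python
from math import comb

assert comb(52, 5) == 2598960   # all 5-card hands
print(comb(39, 5))              # hands with no spades: choose all 5 from the 39 non-spades
```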
Example 1.47. The number of ways of forming a string of seven a's and
five b's is $\binom{12}{5} = \binom{12}{7} = \frac{12!}{7!\,5!} = 792$. This is because the string must be 12
letters long, and to specify it is the same as choosing which seven of the 12
positions are occupied by an a (or equivalently, which five of the 12 positions
are occupied by a b).
The numbers $\binom{n}{k}$ are known as binomial coefficients, from their role in:

to choose $b$ from, and then the other factors must all be ones you choose $a$
from (see Example 1.47). So when you collect terms together using the fact
that $ab = ba$, the coefficient of $a^{n-k} b^k$ is $\binom{n}{k}$.
Remark 1.49*. As the proof shows, the a and b in the Binomial Theorem
don’t have to be numbers. They could be elements of any ring (a sort of
generalized number system where addition and multiplication are defined
and satisfy the distributive law), provided that ab = ba. So, for instance, the
result is still true if applied to two square matrices A and B which commute
(but would be false if A and B did not commute).
Some special cases of the Binomial Theorem are worth pointing out. When
$a = b = 1$, it says that
$$2^n = \sum_{k=0}^{n} \binom{n}{k} = \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{n-1} + \binom{n}{n}. \tag{1.11}$$
This also follows from the Sum Principle, because the left-hand side counts
all subsets of a set of size $n$, and the terms of the right-hand side count the
number of subsets of a given size. When $a = 1$ and $b = -1$, we get
$$\sum_{k=0}^{n} \binom{n}{k} (-1)^k = 0^n = \begin{cases} 1, & \text{if } n = 0, \\ 0, & \text{otherwise.} \end{cases} \tag{1.12}$$
If $n$ is odd this can be seen directly, because $\binom{n}{k}$ and $\binom{n}{n-k}$ occur on the
left-hand side with opposite signs and hence cancel each other out.
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
…
Proof. One can prove this from the Binomial Theorem, by expressing $(a+b)^n$
as $(a + b)(a + b)^{n-1}$. But here is a more satisfying argument using the Sum
Principle. The left-hand side counts the number of $k$-element subsets $A$ of
$\{1, 2, \dots, n\}$. Now a subset $A$ either contains the element $n$ or it doesn't.
(More formally, the set of $k$-element subsets is the disjoint union of the set
of $k$-element subsets which contain $n$ and the set of $k$-element subsets which
don't contain $n$.) Choosing a $k$-element subset $A$ which contains $n$ just
amounts to choosing a $(k-1)$-element subset of $\{1, 2, \dots, n-1\}$, so there
are $\binom{n-1}{k-1}$ subsets which contain $n$ (notice the implicit appeal to the Bijection
Principle in the phrase "just amounts to"). Similarly, there are $\binom{n-1}{k}$ subsets
which don't contain $n$. The recurrence relation follows.
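The recurrence gives an immediate way to build the triangle row by row; here is a Python sketch of mine (not from the notes):

```python
def pascal_triangle(rows):
    """Rows of Pascal's triangle via C(n,k) = C(n-1,k-1) + C(n-1,k)."""
    triangle = [[1]]
    for _ in range(rows - 1):
        prev = triangle[-1]
        row = [1] + [prev[k - 1] + prev[k] for k in range(1, len(prev))] + [1]
        triangle.append(row)
    return triangle

for row in pascal_triangle(8):
    print(row)   # reproduces the triangle printed above
```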
Then
$$\binom{n}{n_1, n_2, \dots, n_m} = \frac{n!}{n_1!\, n_2! \cdots n_m!}.$$
Proof. We can choose $A_1$ in $\binom{n}{n_1}$ ways. Then to maintain disjointness, $A_2$
must be a subset of $X \setminus A_1$, so it can be chosen in $\binom{n - n_1}{n_2}$ ways. Then $A_3$
must be a subset of $X \setminus (A_1 \cup A_2)$, so it can be chosen in $\binom{n - n_1 - n_2}{n_3}$ ways, and
so forth. So we have
$$\binom{n}{n_1, n_2, \dots, n_m} = \binom{n}{n_1} \binom{n - n_1}{n_2} \cdots \binom{n - n_1 - n_2 - \cdots - n_{m-1}}{n_m}$$
$$= \frac{n!}{n_1!(n - n_1)!} \cdot \frac{(n - n_1)!}{n_2!(n - n_1 - n_2)!} \cdots \frac{(n - n_1 - n_2 - \cdots - n_{m-1})!}{n_m!(n - n_1 - n_2 - \cdots - n_m)!}$$
$$= \frac{n!}{n_1!\, n_2! \cdots n_m!},$$
because everything else cancels out. A slightly more elegant proof uses
the Overcounting Principle: instead of choosing unordered subsets we could
choose the elements of $A_1$ in order, then the elements of $A_2$ in order, and
so on. The number of ways of doing this would be $n!$, since it amounts
to just ordering all the elements of $X$; and this overcounts by a factor of
$n_1!\, n_2! \cdots n_m!$, because we have ordered each of the subsets.
The numbers $\binom{n}{n_1, n_2, \dots, n_m}$ are called multinomial coefficients because of their
occurrence in the generalization of the Binomial Theorem:
Theorem 1.52 (Multinomial Theorem). If $a_1, a_2, \dots, a_m$ are any numbers
and $n \in \mathbb{N}$, then
$$(a_1 + a_2 + \cdots + a_m)^n = \sum_{\substack{n_1, n_2, \dots, n_m \in \mathbb{N} \\ n_1 + n_2 + \cdots + n_m = n}} \binom{n}{n_1, n_2, \dots, n_m} a_1^{n_1} a_2^{n_2} \cdots a_m^{n_m}.$$
Proof. This is similar to the proof of the Binomial Theorem. When you
expand $(a_1 + a_2 + \cdots + a_m)^n$, you get a sum of terms. A term corresponds to
a set of choices of which of $a_1, a_2, \dots, a_m$ to select from each of the $n$ factors.
To get the coefficient of $a_1^{n_1} a_2^{n_2} \cdots a_m^{n_m}$, you need to count how many ways
there are to select $a_1$ from $n_1$ factors, $a_2$ from $n_2$ factors, and so on. This
amounts to choosing disjoint subsets of the factors, one of size $n_1$, one of size
$n_2$, etc., which is what the multinomial coefficient counts.
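A direct Python implementation of the multinomial coefficient (my own sketch; the function name is hypothetical):

```python
from math import factorial

def multinomial(*parts):
    """(n1 + ... + nm)! / (n1! ... nm!) by the formula of Theorem 1.51."""
    result = factorial(sum(parts))
    for p in parts:
        result //= factorial(p)
    return result

# Coefficient of a^2 b c in (a + b + c)^4:
assert multinomial(2, 1, 1) == 12
```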
Remark 1.53. A nice consequence of Theorem 1.51 is that $n_1!\, n_2! \cdots n_m!$
always divides $(n_1 + n_2 + \cdots + n_m)!$. Beware of a slight notational conflict:
the binomial coefficient $\binom{n}{k}$ equals the multinomial coefficient $\binom{n}{k,\, n-k}$.
One can think of $\binom{n}{k}$ as the number of ways to make an unordered selection
of $k$ things from a total of $n$ with repetition not allowed. It is then natural
to consider what happens if you allow repetition.
Example 1.58. If you have 5 different marbles in a bag – say red, blue, white,
yellow and green – and stick your hand in to pull out 3, there are
$\binom{5}{3} = 10$ possible outcomes. (Remember that if you draw out the marbles
one by one, and you consider that the order in which you draw them out
matters, there are $5^{(3)} = 60$ possible outcomes.) To allow repetition, suppose
that, instead of just one marble of each colour, you have an unlimited supply
of each colour, indistinguishable amongst themselves. Then you may get
more than one marble of a particular colour when you stick your hand in
the bag and pull out 3. To specify an outcome, you just need to specify the
number of red marbles, the number of blue marbles, and so on; together,
these numbers constitute a 5-tuple $(k_1, k_2, k_3, k_4, k_5)$ of nonnegative integers
such that $k_1 + k_2 + k_3 + k_4 + k_5 = 3$. One way to count the possible outcomes is
by dividing them into three disjoint subsets according to the following cases:

(1) all three marbles have the same colour (i.e. $k_s = 3$ for some $s$, and all
the other $k_i$'s are zero); or

(2) two marbles have the same colour, and the third is a different colour
(i.e. $k_s = 2$ and $k_t = 1$ for some $s \ne t$, and the other $k_i$'s are zero); or

(3) all three marbles have different colours (i.e. three of the $k_i$'s are 1, and
the other two $k_i$'s are zero).
Proof*. We will first give the proof in the case of Example 1.58, where
we are making an unordered selection of 3 marbles from 5 possible kinds.
Having made a selection, we can lay the marbles out in a row, with the red
marbles (if any) first, followed by the blue, the white, the yellow and the
green. Imagine putting down dividing lines to mark off the different colours:
one line between red and blue, one line between blue and white, one line
between white and yellow, and one line between yellow and green. Then you
can forget the colours, and what remains is an arrangement of 3 marbles and
4 dividing lines. For instance, the arrangement
| ◦ ◦ | ◦ ||
means that we have no red marbles, two blue marbles, one white marble,
and no yellow or green marbles (the 5-tuple is (0, 2, 1, 0, 0)). What we have
shown is that the possible outcomes are in bijection with the different strings
of three ◦'s and four |'s. By the Bijection Principle, the number of outcomes
equals the number of such strings, which by Example 1.47 is $\binom{7}{3} = 35$. The
general case is no harder: we can imagine that we are selecting $k$ marbles
from $n$ different kinds, which amounts to considering strings of $k$ ◦'s and
$n-1$ |'s. The number of such strings is $\binom{k+n-1}{k} = \binom{k+n-1}{n-1}$, as claimed.
If talk of “marbles” and “dividing lines” strikes you as too informal, the
real effect of this argument is to construct a bijection between n-tuples
(k1 , k2 , · · · , kn ) of nonnegative integers which satisfy k1 + k2 + · · · + kn = k
(these are what specify the possible outcomes of the selection, with ki record-
ing how many times the ith possibility was selected) and (n − 1)-element
subsets of the set {1, 2, · · · , n + k − 1} (these specify the positions of the
dividing lines in the string). Starting from an n-tuple (k1 , · · · , kn ), the sub-
set we associate to it is {k1 + 1, k1 + k2 + 2, · · · , k1 + · · · + kn−1 + n − 1};
it is easy to see that the elements here are listed in increasing order, so
they are indeed n − 1 different elements of {1, 2, · · · , n + k − 1}. For the
inverse function, we start with a subset {i1 , i2 , · · · , in−1 }, which we can
assume is listed in increasing order, and then associate to it the n-tuple
(i1 − 1, i2 − i1 − 1, · · · , in−1 − in−2 − 1, n + k − 1 − in−1 ), whose elements are
easily seen to be nonnegative with sum k. The verification that these maps
are inverse to each other is straightforward.
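To make the bijection concrete, here is a Python sketch of mine (the function name is hypothetical) that enumerates the $n$-tuples from the bar positions exactly as in the inverse map above:

```python
from itertools import combinations
from math import comb

def selections_with_repetition(n, k):
    """All n-tuples of nonnegative integers summing to k, one per choice of bars."""
    for bars in combinations(range(1, n + k), n - 1):
        ext = (0,) + bars + (n + k,)
        # gaps between consecutive bars give (i1 - 1, i2 - i1 - 1, ..., n + k - 1 - i_{n-1})
        yield tuple(ext[j + 1] - ext[j] - 1 for j in range(n))

tuples = list(selections_with_repetition(5, 3))
assert len(tuples) == comb(3 + 5 - 1, 3) == 35
assert all(sum(t) == 3 for t in tuples)
```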
Example 1.62. If there are 5 identical presents under a Christmas tree, and
3 greedy children grabbing at them until they are all claimed, and no present
can be shared between more than one child, how many possible results are
there? Notice that the only difference between this question and Example
1.32 is that the presents are now indistinguishable. Is this a question about
ordered or unordered selection? Is repetition allowed or not allowed? Once
again, we should think that we are selecting, for each present, the child it
ends up with: thus it is an unordered selection of 5 things (children) from
3 possibilities, with repetition allowed (because more than one present is
allowed to end up with the same child), and the answer is $\binom{3+5-1}{5} = 21$.
If you find yourself confused about what is being selected, it may help to
think in terms of n-tuples: in the present case, all that matters is how many
presents child A gets, how many presents child B gets, and how many presents
child C gets (because the presents are all identical). So this is the problem of
counting 3-tuples of nonnegative integers which add up to 5, not the problem
(3, 1, 1), (1, 3, 1), (1, 1, 3), (2, 2, 1), (2, 1, 2), (1, 2, 2).
1.4 Inclusion/Exclusion
Definition 1.64. Recall that if A and B are subsets of a set X, then their
union, their intersection, and the difference of A from B are defined by:
$$A \cup B := \{x \in X \mid x \in A \text{ or } x \in B\},$$
$$A \cap B := \{x \in X \mid x \in A \text{ and } x \in B\},$$
$$A \setminus B := \{x \in X \mid x \in A \text{ and } x \notin B\} = A \cap (X \setminus B).$$
(1) A ∪ A = A ∩ A = A.
(2) A ∪ ∅ = A, A ∩ ∅ = ∅.
(3) A ∪ B = B ∪ A, A ∩ B = B ∩ A.
(6) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
(7) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
(8) A \ (B ∪ C) = (A \ B) ∩ (A \ C).
(9) A \ (B ∩ C) = (A \ B) ∪ (A \ C).
(10) If B ⊆ A, then A \ (A \ B) = B.
Proof. These are all equivalent to logical statements which are more or less
part of the axioms of mathematics. For example, we ‘prove’ (8) as follows.
To say that two subsets A1 and A2 are equal is to say that the condition for
x to lie in A1 is equivalent to the condition for x to lie in A2 (that is, for
every choice of $x$ in the universal set $X$, they are either both true or both
false).
Now (assuming we are dealing with finite sets) how do these operations relate
to counting? As noted in our discussion of the Sum Principle, |A ∪ B| =
|A| + |B| is true only if A and B are disjoint, i.e. A ∩ B = ∅. If there are
elements in the intersection A ∩ B, they are counted twice when we take the
sum |A| + |B|. To get the right formula for |A ∪ B|, we need to correct this:
$|A \cup B| = $ ______

$|X \setminus (A \cup B)| = $ ______
Similarly, if we want a formula for |A ∪ B ∪ C|, and start with |A| + |B| + |C|
as our first attempt, then we have to correct for the overcounting of the
intersections. If |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| is our second
attempt, we find that it is still not quite right, because elements in the triple
intersection A ∩ B ∩ C have been added three times in |A| + |B| + |C| and
Example 1.67. How many numbers between 1 and 1000 are divisible by at
least one of 2, 3, or 5? If we let X be {1, 2, 3, · · · , 1000}, A the subset of
numbers divisible by 2 (i.e. even), B the subset of numbers divisible by 3, and
C the subset of numbers divisible by 5, then we want to find |A ∪ B ∪ C|. We
can use (1.15), because the sizes of these subsets and their intersections can
be easily determined. The key is the observation that in the set $\{1, 2, \dots, n\}$,
the number of multiples of $m$ is exactly $\lfloor n/m \rfloor$ (every $m$th number is divisible
by $m$ – we round down because all the numbers after the last multiple of $m$
in the set are wasted). Moreover, a number is divisible by both 3 and 5 (say)
if and only if it is divisible by 15 (this relies on the fact that 3 and 5 have no
common divisors, so their least common multiple is just their product). So
$$|A| = \lfloor 1000/2 \rfloor = 500, \qquad |B| = \lfloor 1000/3 \rfloor = 333, \qquad |C| = \lfloor 1000/5 \rfloor = 200,$$
$$|A \cap B| = \lfloor 1000/6 \rfloor = 166, \qquad |A \cap C| = \lfloor 1000/10 \rfloor = 100, \qquad |B \cap C| = \lfloor 1000/15 \rfloor = 66,$$
$$|A \cap B \cap C| = \lfloor 1000/30 \rfloor = 33.$$
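Putting these numbers into (1.15) gives $500 + 333 + 200 - 166 - 100 - 66 + 33 = 734$, and a brute-force check in Python (my own aside) agrees:

```python
# Count n in {1, ..., 1000} divisible by at least one of 2, 3, 5.
count = sum(1 for n in range(1, 1001) if n % 2 == 0 or n % 3 == 0 or n % 5 == 0)
assert count == 500 + 333 + 200 - 166 - 100 - 66 + 33 == 734
```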
Proof*. We will prove the same property we saw in the special cases: namely,
that every element $x$ of $A_1 \cup A_2 \cup \cdots \cup A_n$, no matter which of the $A_i$'s it's
in and which it's not in, ends up being counted exactly once after all the
additions and subtractions in the right-hand side. Let $I$ be the (nonempty)
subset of $\{1, 2, 3, \dots, n\}$ consisting of those $i$ such that $x \in A_i$. Then $A_{i_1} \cap
A_{i_2} \cap \cdots \cap A_{i_k}$ contains $x$ if and only if $i_1, i_2, \dots, i_k$ all belong to $I$. So for
a fixed $k$, $x$ is counted in $\binom{|I|}{k}$ terms in the inner sum on the right-hand
side. Thus the total number of times $x$ is counted on the right-hand side is
$\sum_{k=1}^{|I|} (-1)^{k-1} \binom{|I|}{k}$. But by (1.12), $\sum_{k=0}^{|I|} \binom{|I|}{k} (-1)^k = 0$, so
$$\sum_{k=1}^{|I|} (-1)^{k-1} \binom{|I|}{k} = -\binom{|I|}{0} (-1)^{0-1} = 1.$$
Proof*. We may as well assume that the set is $\{1, 2, \dots, n\}$. As in the
preceding example, let $X$ be the set of all permutations of $\{1, 2, \dots, n\}$, and
let $A_i$ be the subset consisting of those permutations which leave $i$ unchanged.
Then for any $1 \le i_1 < i_2 < \cdots < i_k \le n$,
$$|A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}| = (n-k)!,$$
because once you fix $k$ elements you are just dealing with a permutation of
the remaining $n - k$. Moreover, there are $\binom{n}{k}$ such sequences $1 \le i_1 < i_2 <
\cdots < i_k \le n$. So the Inclusion/Exclusion Principle says that
$$|A_1 \cup A_2 \cup \cdots \cup A_n| = \sum_{k=1}^{n} (-1)^{k-1} \binom{n}{k} (n-k)! = \sum_{k=1}^{n} (-1)^{k-1} \frac{n!}{k!},$$
as claimed.
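For small $n$ this alternating sum can be checked against a direct count of the permutations with at least one fixed point (an aside of mine, with my own function names):

```python
from itertools import permutations
from math import factorial

def at_least_one_fixed_point_brute(n):
    return sum(1 for p in permutations(range(n)) if any(p[i] == i for i in range(n)))

def at_least_one_fixed_point_formula(n):
    return sum((-1)**(k - 1) * factorial(n) // factorial(k) for k in range(1, n + 1))

for n in range(1, 7):
    assert at_least_one_fixed_point_brute(n) == at_least_one_fixed_point_formula(n)
```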
Note that the $j = k$ term of the sum, $(-1)^k \binom{k}{k} 0^m$, vanishes unless $m = 0$.
Remark 1.74*. This is a much less straightforward formula than the count
of injective functions given by Theorem 1.36. For small values of k (and
assuming m ≥ 1, so the j = k term vanishes), the formula is:
$$k = 1: \quad 1$$
$$k = 2: \quad 2^m - 2$$
$$k = 3: \quad 3^m - 3 \times 2^m + 3$$
$$k = 4: \quad 4^m - 4 \times 3^m + 6 \times 2^m - 4$$
Note the pattern of alternating signs, decreasing bases to the power m, and
binomial coefficients. There are two unsatisfactory aspects. Firstly, the
number of terms is k, so if we have a problem where k is growing, the formula
gets less and less ‘closed’. Secondly, the fact that there are minus signs in the
sum means that the terms can potentially grow much bigger than the final
answer. For example, when m < k we know that the final answer must be zero
by the Reverse Pigeonhole Principle, so all the terms just cancel away! To
be sure of not making bigger computations than necessary, combinatorialists
prefer ‘positive’ formulas, which do not involve minus signs.
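The inclusion/exclusion formula for the number of surjections is easy to implement and experiment with (my own sketch, not part of the notes):

```python
from math import comb

def surjections(m, k):
    """Surjective functions from an m-set onto a k-set, by inclusion/exclusion."""
    return sum((-1)**j * comb(k, j) * (k - j)**m for j in range(k + 1))

assert surjections(4, 2) == 14   # matches 2! * S(4, 2) with S(4, 2) = 7
assert surjections(3, 4) == 0    # m < k: the terms all cancel, as the remark says
```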
If you experiment with the formula given in Theorem 1.73, you will notice
that it evaluates to numbers which are more divisible than you would have
a right to expect. We will prove in this section that, just as the number of
injective functions $f : X \to Y$ (i.e. $|Y|^{(|X|)}$) is always divisible by $|X|!$, the
number of surjective functions f : X → Y is always divisible by |Y |!.
So $S(4, 2) = $ ______.

So $S(4, 3) = $ ______.
Example 1.78. It is obvious that you can’t partition an n-element set into
more than n blocks, so S(n, k) = 0 when k > n.
Example 1.79. You can’t partition a nonempty set into 0 blocks, so we
have S(n, 0) = 0 when n > 0. When n = 0, the convention is to declare
that there is a single partition of the empty set into 0 blocks, so S(0, 0) = 1.
(This is the right value for subsequent formulas.)
Example 1.80. If n > 0, then there is only one way to partition an n-
element set into 1 block (everything has to be lumped in together), and only
one way to partition it into n blocks (everything has to be alone in its own
block). So S(n, 1) = S(n, n) = 1.
This is all a bit reminiscent of the binomial coefficient facts: $\binom{n}{0} = \binom{n}{n} = 1$,
and $\binom{n}{k} = 0$ if $k > n$. So it seems like a good idea to display the Stirling
numbers $S(n, k)$ in a triangle like Pascal's triangle, with rows corresponding
to $n = 1$, $n = 2$, and so on, and columns corresponding to $k = 1$, $k = 2$, and
Proof. The proof is similar to the bijective proof we gave of Theorem 1.50.
By definition, S(n, k) counts the number of partitions of the set {1, 2, · · · , n}
into k blocks. Now the element n is either in a block by itself or it’s not.
Choosing a partition in which n is in a block by itself amounts to choosing a
partition of {1, 2, · · · , n − 1} into k − 1 blocks, so there are S(n − 1, k − 1)
such partitions. Choosing a partition in which n is not on its own amounts
to choosing a partition of {1, 2, · · · , n − 1} into k blocks and then choosing
which of these k blocks to put n in. So there are k S(n − 1, k) such partitions,
and the desired recurrence relation follows.
The coefficient k is the only change from Theorem 1.50 to this, but that is
enough to make a big difference. In terms of the triangle, Theorem 1.81 tells
us that each entry is the sum of the one above and to the left and the product
of the one directly above by the column number. So we can build more of
the Stirling triangle:
1
1 1
1 3 1
1 7 6 1
1 15 25 10 1
1 31 90 65 15 1
1 63 301 350 140 21 1
…
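The Stirling triangle can be built by the same kind of row-by-row computation as Pascal's triangle; here is a Python sketch of mine using the recurrence of Theorem 1.81:

```python
def stirling_triangle(rows):
    """Rows of S(n, k), 1 <= k <= n, via S(n,k) = S(n-1, k-1) + k * S(n-1, k)."""
    triangle = [[1]]
    for n in range(2, rows + 1):
        prev = triangle[-1]
        def S(k):   # S(n-1, k), treated as 0 outside the triangle
            return prev[k - 1] if 1 <= k <= n - 1 else 0
        triangle.append([S(k - 1) + k * S(k) for k in range(1, n + 1)])
    return triangle

for row in stirling_triangle(7):
    print(row)   # last row: [1, 63, 301, 350, 140, 21, 1], as in the triangle above
```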
With more data, we can spot some patterns in the Stirling numbers.
Example 1.82. The second column consists of the tower of Hanoi numbers,
i.e. those which are one less than a power of 2. The reason for this is that ev-
ery partition of {1, 2, · · · , n} into two blocks gives you two nonempty proper
subsets of {1, 2, · · · , n}, which are complements of each other. (“Proper”
means “not equal to the whole set”.) Moreover, every such subset A features
in exactly one partition with two blocks, namely the partition into A and
$\{1, 2, \dots, n\} \setminus A$. So by the Overcounting Principle, $S(n, 2)$ is half the number
of nonempty proper subsets of $\{1, 2, \dots, n\}$. Since the total number of
subsets of $\{1, 2, \dots, n\}$ is $2^n$, there are $2^n - 2$ nonempty proper subsets, so
$S(n, 2) = \frac{2^n - 2}{2} = 2^{n-1} - 1$ for all $n \ge 1$.
Example 1.83. The diagonal just inside the right-hand edge coincides with
one of the diagonals of Pascal’s triangle. The reason for this is that in any
partition of {1, 2, · · · , n} into n − 1 blocks, there must be two elements in
one block, and every other element is in its own block. The two elements can
be chosen in $\binom{n}{2}$ ways, so $S(n, n-1) = \binom{n}{2}$ for all $n \ge 1$.
Although we do not have a closed formula for the Stirling numbers, Theorem
1.81 gives us a good handle on them, which we will be able to use more
effectively in later chapters. To the extent that we understand the Stirling
numbers, the following result is a better formula for the number of surjective
functions than Theorem 1.73:
Theorem 1.84. If X and Y are sets with |X| = m and |Y | = k, then the
number of surjective functions f : X → Y is k! S(m, k).
Example 1.85. Recall from Example 1.77 the partitions of {1, 2, 3, 4} into
three blocks. Each of these gives rise to 3! = 6 surjective functions f :
{1, 2, 3, 4} → {1, 2, 3}: for instance, the partition 23|1|4 corresponds to the
six functions for which f (2) = f (3), but f (1) and f (4) are different from this
and from each other. There are six of these because this is the number of
ways of deciding which of 1, 2, 3 is to be f (2) = f (3), which is to be f (1),
and which is to be f (4).
The simplest type is when each term depends only on the previous one, i.e. an
(for n ≥ 1) is a function of an−1 , and only a0 needs to be given as an initial
condition; this was the case in the tower of Hanoi problem, for example.
But there are also important cases where earlier terms of the sequence are
involved.
Example 2.1. One of the most famous of all recursive sequences was men-
tioned first in 1202 by the Italian mathematician Fibonacci. He imagined a
rabbit farm which on Day 1 contains only a single newborn breeding pair of
rabbits. On Day 2 these rabbits mature, so that on Day 3 the female gives
birth to a new pair of rabbits. On Day 4 the new pair is still maturing, but
the older female gives birth to a third pair. On Day 5 the newest pair is
still maturing, but the other two females give birth to new pairs, so there are
now 5 pairs, and so on. If we let $F_n$ denote the number of pairs after $n$ days,
Fibonacci's assumptions imply that $F_n - F_{n-1}$ (the number of new pairs born
on Day $n$) equals $F_{n-2}$ (the number of pairs existing on Day $n-2$). Formally,
we define the Fibonacci sequence by
$$F_0 = 0, \qquad F_1 = 1, \qquad F_n = F_{n-1} + F_{n-2} \text{ for } n \ge 2.$$
(Starting the sequence with a 0 makes various formulas nicer.) Notice that
because the recurrence relation doesn't kick in until we come to $F_2$, we have
to define $F_0$ and $F_1$ as initial conditions. The Fibonacci sequence begins:
$$0, 1, 1, 2, 3, 5, 8, 13, 21, 34, \dots$$
The reason that the recurrence relation holds, if you define the Catalan
numbers using balanced strings of brackets, is that every such string must
have the form
$$(\,w\,)\,v,$$
i.e. there is a unique right bracket which corresponds to the initial left bracket,
and it must occur in the $(2m+2)$th position for some $m$ between 0 and $n-1$;
the substring $w$ enclosed by these brackets, and the substring $v$ which follows
them, must both themselves be balanced. For any fixed $m$, the number of
ways of choosing the first balanced substring is $c_m$, and the number of ways
of choosing the second balanced substring is $c_{n-1-m}$; the recurrence relation
follows. The Catalan sequence begins $1, 1, 2, 5, 14, 42, 132, 429, \dots$.
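The recurrence computes the Catalan numbers directly; a short Python sketch of mine:

```python
def catalan(n_max):
    """Catalan numbers via c_n = sum_{m=0}^{n-1} c_m * c_{n-1-m}, with c_0 = 1."""
    c = [1]
    for n in range(1, n_max + 1):
        c.append(sum(c[m] * c[n - 1 - m] for m in range(n)))
    return c

assert catalan(7) == [1, 1, 2, 5, 14, 42, 132, 429]
```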
$$2^n = 2 \times 2 \times 2 \times \cdots \times 2 \quad (n \text{ twos}).$$
In other words, $2^n$ is the result of starting with 1 and doubling $n$ times. Since,
in this procedure, you necessarily reach $2^{n-1}$ in the step before reaching $2^n$,
the definition may as well be expressed using a recurrence relation:
$$2^0 = 1, \qquad 2^n = 2 \times 2^{n-1} \text{ for } n \ge 1.$$
So you could argue that our solution of the tower of Hanoi problem was not
all it was cracked up to be: we just expressed one recursive sequence, $h_n$, in
terms of another, $2^n$. But, of course, the latter is something we understand
so well that expressing $h_n$ in terms of it counts as a solution.
Example 2.4. Similarly, the "$\cdots$" in the definition $n! = n(n-1) \cdots 2 \times 1$
is a sign that we can define factorials recursively:
$$0! = 1, \qquad n! = n \times (n-1)! \text{ for } n \ge 1. \tag{2.4}$$
Although the sequence of factorials is slightly more complicated than the
sequence of powers of 2, we still regard $n!$ as something known, and we are
quite happy if we can express some other sequence in terms of factorials. For
instance, in Example 3.45 we will prove a famous formula for the Catalan
numbers: $c_n = \frac{(2n)!}{(n+1)!\, n!}$.
Example 2.5. In many problems we have some function $f$ of a nonnegative
integer variable, and we then have to consider the function of $n$ defined by
$$a_n := f(0) + f(1) + f(2) + \cdots + f(n) = \sum_{m=0}^{n} f(m).$$
Such an unravelling procedure is often possible: the idea is that you substitute
in the recursive formula for $a_n$ the recursive formula for $a_{n-1}$, and
keep substituting until a pattern is clear, whereupon you jump to the point
at which the terms of the sequence disappear, leaving a large formula with
an inevitable “ · · · ”. This is not as helpful as you might think, because it is
often just rephrasing the recurrence relation in a more cumbersome way.
Example 2.6. Consider the sequence defined by
$$a_0 = 1, \qquad a_n = \sqrt{1 + a_{n-1}} \text{ for } n \ge 1.$$
If you unravel the recurrence relation, you get
$$a_n = \sqrt{1 + a_{n-1}} = \sqrt{1 + \sqrt{1 + a_{n-2}}} = \sqrt{1 + \sqrt{1 + \sqrt{1 + a_{n-3}}}} = \cdots = \sqrt{1 + \sqrt{1 + \sqrt{1 + \cdots + \sqrt{1 + 1}}}} \quad (\text{with } n \text{ } \sqrt{\phantom{x}} \text{ signs}).$$
However, this way of writing $a_n$ tells us nothing more than the recurrence
relation did; it certainly doesn't count as a solution of the recurrence relation.
However, there are cases where the unravelling procedure helps to reveal that
the problem is of a kind you already know how to solve.
Example 2.7. The triangular number $t_n$ is the number of dots in a triangular
array with $n$ dots on each side. Since removing one of the sides produces
a smaller triangular array, this has a recursive definition of sum type:
$$t_0 = 0, \qquad t_n = t_{n-1} + n \text{ for } n \ge 1.$$
Unravelling this recurrence relation gives the non-closed formula

$t_n = $ ______

Summing the arithmetic progression gives the solution of the recurrence:

$t_n = $ ______

This sequence begins $0, 1, 3, 6, 10, 15, 21, 28, \dots$.
Example 2.8. Consider the sequence defined by
$$a_0 = 1, \qquad a_n = 3a_{n-1} + 1 \text{ for } n \ge 1.$$
Unravelling the recurrence relation eventually gives
$$a_n = \cdots = 3^n + 3^{n-1} + 3^{n-2} + \cdots + 3 + 1.$$
This would be no real help, except that we recognize this as the sum of a
geometric progression, and hence obtain the closed formula
$$a_n = \frac{3^{n+1} - 1}{3 - 1} = \frac{3^{n+1} - 1}{2},$$
which counts as a solution of the recurrence.
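A two-line Python check of this closed formula (my own aside):

```python
def a(n):
    return 1 if n == 0 else 3 * a(n - 1) + 1

assert all(a(n) == (3**(n + 1) - 1) // 2 for n in range(10))
```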
Remark 2.9*. It is wise to be cautious about arguments, such as that in
Example 2.8, in which a chain of equalities goes on for a variable number
of steps. After all, if we are supposed to be writing a logical proof in which
each line is deducible from the previous one, are we really allowed to say to
the reader, in effect, “the next lot of lines all follow the same pattern, but
I can’t actually tell you how many there are”? Just as horizontal dots are
a sign that some definition is really recursive, vertical dots are a sign that
some proof is really a proof by induction (see Example 2.13).
Example 2.10. In Chapter 1 we encountered two important collections of
numbers which were more naturally displayed as a triangle than as a sequence,
because they depended on not one but two nonnegative integers: the
binomial coefficients $\binom{n}{k}$ and the Stirling numbers $S(n, k)$. We can naturally
extract sequences from these numbers by fixing $k$, which corresponds
to looking at a single column of the Pascal or Stirling triangle (for instance,
the $k = 2$ column of Pascal's triangle gives us the sequence of triangular
numbers). In both cases we have a recurrence relation (Theorems 1.50 and
1.81), which expresses an element of the $k$th column in terms of the previous
element of the $k$th column and also an element of the $(k-1)$th column. So
we can't really consider the $k$th column on its own as a recursive sequence,
until we have a closed formula for the elements of the $(k-1)$th column; this
suggests that we should analyse the columns from left to right. This idea is
rather pointless in the case of Pascal's triangle, because we already know the
closed formula $\binom{n}{k} = \frac{n!}{k!(n-k)!}$. But it will contribute to our understanding of
the Stirling numbers in the next chapter.
The reason this works is that S(0) implies S(1), and then S(0) and S(1) to-
gether imply S(2), and then S(0), S(1), and S(2) together imply S(3), and so
on. So as long as the base case can be proved (which is often a simple check),
we can assume any or all of S(0), S(1), S(2), · · · , S(n − 1) in our attempts to
prove S(n); the assumption we make is called the induction hypothesis. This
is especially useful when S(n) involves the terms of a recursive sequence.
Example 2.11. Consider the sequence defined by
$$a_0 = 2, \qquad a_n = a_{n-1}^2 + 1 \text{ for } n \ge 1.$$
The sequence begins $2, 5, 26, 677, \dots$. To give some idea of how fast it grows,
we want to prove that $a_n \ge 2^{2^n}$ for all $n \in \mathbb{N}$. We do this by induction, making
$S(n)$ the statement $a_n \ge 2^{2^n}$. The base case $S(0)$ asserts that $2 \ge 2^1$, which
Perhaps because assuming smaller versions of what you are trying to prove
feels a bit like cheating, there is an aesthetic (rather than logical) feeling
that the induction hypothesis should be restricted as much as possible. As
the preceding examples suggest, if you are proving a statement about a
recursively-defined sequence an where the recurrence relation expresses an
in terms of an−1 , an−2 , · · · , an−k for some k, you probably only need to ap-
peal to S(n − 1), S(n − 2), · · · , S(n − k) in proving S(n). In particular, if an
is a function of an−1 only, then assuming S(n − 1) is usually enough to prove
S(n), as it was in Example 2.11; this form of the argument is sometimes dis-
tinguished by the special term “weak induction”. However, since the proof
of S(n − 1) relies on S(n − 2), whose proof relies on S(n − 3), and so on, all
the earlier statements are still implicitly involved.
$$a_n = \left( \sum_{m=0}^{n-1} f(m) \right) + f(n) = \sum_{m=0}^{n} f(m),$$
completing the inductive step and finishing the proof. The ease of such
induction proofs means that we usually don’t worry about these matters.
write n as the product of two smaller numbers which divide it; if either of
those is prime we are finished; otherwise, we write them in turn as products
of smaller numbers; and keep going in this way until we break everything
into a product of primes. There is nothing wrong with this, except that the
phrase “keep going in this way until” falls foul of the same sort of objection as
in Remark 2.9. Really it requires an induction argument to make it rigorous.
Notice that what the proof boils down to is the fact that the sequence of
squares satisfies the same initial condition and recurrence relation as $a_n$, and
therefore must be the same sequence.
The last comment in Example 2.16 is a general principle, almost trivial but
worth making explicit: if two sequences satisfy the same initial conditions
and the same recurrence relation, they must be the same. Formally:
$$a_0 = I_0, \quad a_1 = I_1, \quad a_2 = I_2, \quad \dots, \quad a_{k-1} = I_{k-1},$$
where $k \ge 1$ is fixed but the function $R(n)$ possibly varies with $n$. If the
second sequence also satisfies the same initial conditions:
$$b_0 = I_0, \quad b_1 = I_1, \quad b_2 = I_2, \quad \dots, \quad b_{k-1} = I_{k-1},$$
So if we have guessed a formula for an , say f (n), and we want to prove that
it really is true that an = f (n) for all n ∈ N, all we need to do is to check
that f (n) satisfies the same initial conditions and recurrence relation that
an does. Although induction need not be explicitly mentioned, this will boil
down to an induction proof, since Theorem 2.17 is proved by induction.
Example 2.18. Consider the sequence defined by
$$a_0 = 0, \qquad a_n = a_{n-1} + \frac{1}{n(n+1)} \text{ for } n \ge 1.$$
Working out the first few terms of the sequence, we find that
$$a_1 = \frac{1}{2}, \quad a_2 = \frac{2}{3}, \quad a_3 = \frac{3}{4}, \quad a_4 = \frac{4}{5},$$
from which we naturally guess that $a_n = \frac{n}{n+1}$ for all $n \in \mathbb{N}$. To prove this, we
just need to show that the candidate formula $\frac{n}{n+1}$ satisfies the initial condition
and the recurrence relation. The initial condition is obvious: $\frac{0}{1} = 0$. For the
recurrence relation we need to show that $\frac{n}{n+1} = \frac{n-1}{n} + \frac{1}{n(n+1)}$, which is true
by a trivial calculation. So the claim is proved. In fact, (2.6) is an example
of a "telescoping sum": since $\frac{1}{k(k+1)} = \frac{1}{k} - \frac{1}{k+1}$, the right-hand side becomes
$$\frac{1}{1} - \frac{1}{2} + \frac{1}{2} - \frac{1}{3} + \cdots + \frac{1}{n-1} - \frac{1}{n} + \frac{1}{n} - \frac{1}{n+1},$$
and every term cancels out except for the first and last, giving us $a_n =
1 - \frac{1}{n+1} = \frac{n}{n+1}$. Although such telescoping arguments are aesthetically more
pleasing, they really boil down to the same thing as the induction method
described above: the cancellation of an unspecified number of terms needs
induction to justify it properly.
Example 2.19. The formulas for the sum of an arithmetic progression and
the sum of a geometric progression can both be proved by this induction
method. In the geometric case, we have to prove that for all $r \ne 1$,
$$1 + r + r^2 + r^3 + \cdots + r^{n-1} + r^n = \frac{r^{n+1} - 1}{r - 1}, \quad \text{for } n \in \mathbb{N}. \tag{2.7}$$
The left-hand side can be viewed as the $a_n$ term of a sequence defined by
a recurrence relation of 'sum type', namely $a_n = a_{n-1} + r^n$, with initial
condition $a_0 = 1$. The right-hand side is our candidate formula. That it
satisfies the initial condition is easy to see, and then all that remains is to
prove that, for $n \ge 1$,
$$\frac{r^{n+1} - 1}{r - 1} = \frac{r^n - 1}{r - 1} + r^n.$$
$a_1 = $ ______

$a_2 = $ ______

$a_3 = $ ______

Can you guess a closed formula that fits all this data?

$a_n = $ ______

To prove your formula is correct, all that remains is to check that it satisfies
the recurrence relation. Write this out to make sure:

______ for $n \ge 1$.
Example 2.21. One formula which is surprisingly easy to guess is for the
sum of the first $n$ positive integer cubes:
$$a_n = 1^3 + 2^3 + 3^3 + \cdots + n^3,$$
which satisfies
$$a_0 = 0, \qquad a_n = a_{n-1} + n^3 \text{ for } n \ge 1.$$
Working out the first few terms, we notice that all the values are squares.
Moreover, the square roots seem to be the triangular numbers, so we conjecture
that
$$a_n = \left( \frac{n(n+1)}{2} \right)^2 = \frac{n^2(n+1)^2}{4}.$$
This formula obviously satisfies the initial condition, so all that remains to
prove is that
$$\frac{n^2(n+1)^2}{4} = \frac{(n-1)^2 n^2}{4} + n^3,$$
which follows from some straightforward manipulation. To sum up, in this
example we have shown that
$$\sum_{m=0}^{n} m^3 = \frac{n^2(n+1)^2}{4}, \quad \text{for all } n \in \mathbb{N}. \tag{2.8}$$
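A one-line Python check of (2.8) (my aside; the division is exact because $n(n+1)$ is always even):

```python
assert all(sum(m**3 for m in range(n + 1)) == n**2 * (n + 1)**2 // 4
           for n in range(100))
```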
Example 2.24*. Here is a formula for the Fibonacci numbers that seems
at first sight impossible to guess:
$$F_n = \frac{1}{\sqrt{5}} \left[ \left( \frac{1 + \sqrt{5}}{2} \right)^n - \left( \frac{1 - \sqrt{5}}{2} \right)^n \right]. \tag{2.9}$$
With all these fractions and $\sqrt{5}$'s, it's even surprising that the right-hand
side works out to be an integer. Nevertheless, following the principle of
Theorem 2.17, we can prove this is a correct formula just by checking that
it satisfies the initial conditions and recurrence relation which define the
Fibonacci sequence. For the initial conditions:
$$n = 0: \quad \frac{1}{\sqrt{5}} \left[ \left( \frac{1 + \sqrt{5}}{2} \right)^0 - \left( \frac{1 - \sqrt{5}}{2} \right)^0 \right] = \frac{1}{\sqrt{5}}\,[1 - 1] = 0,$$
$$n = 1: \quad \frac{1}{\sqrt{5}} \left[ \left( \frac{1 + \sqrt{5}}{2} \right)^1 - \left( \frac{1 - \sqrt{5}}{2} \right)^1 \right] = \frac{\sqrt{5}}{\sqrt{5}} = 1.$$
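Numerically, (2.9) can be sanity-checked against the recurrence in floating point (my aside; rounding absorbs the tiny float error for small $n$):

```python
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

sqrt5 = 5 ** 0.5
phi, psi = (1 + sqrt5) / 2, (1 - sqrt5) / 2
for n in range(20):
    assert round((phi**n - psi**n) / sqrt5) == fib(n)
```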
Remember that such a recurrence relation on its own is not enough to specify
the sequence: one also needs to give the first $k$ terms $a_0, a_1, \dots, a_{k-1}$ as initial
conditions. But the crucial idea is to consider all the sequences satisfying
the same recurrence relation together.
Example 2.26. A first-order homogeneous linear recurrence relation is of the form a_n = r a_{n−1} for some nonzero r. To specify the sequence, one needs to give the value of a_0 as an initial condition. If a_0 = C, it is clear that a_n = C r^n for all n ∈ N. (Strictly speaking, we need an application of Theorem 2.17: the sequence C r^n satisfies the right initial condition and recurrence relation, therefore it equals a_n.) So we have solved the first-order homogeneous case.
Example 2.27. A second-order homogeneous linear recurrence relation is of the form a_n = r a_{n−1} + s a_{n−2} for some numbers r, s with s ≠ 0 (because having s = 0 would make it actually just first-order). For instance, the Fibonacci sequence satisfies such a recurrence relation, with r = s = 1. The philosophy now is that we should consider the Fibonacci sequence together with all other solutions of a_n = a_{n−1} + a_{n−2} for which the initial conditions are different. An example of another such is the Lucas sequence defined by

L_0 = 2, L_1 = 1, L_n = L_{n−1} + L_{n−2} for n ≥ 2. (2.10)
The main reason that linear recurrence relations are nicer is that if you have
two solutions, a linear combination of them is also a solution:
Theorem 2.29. If a_0, a_1, a_2, ··· and b_0, b_1, b_2, ··· are two sequences satisfying the same kth-order homogeneous linear recurrence relation:

a_n = r_1 a_{n−1} + r_2 a_{n−2} + ··· + r_k a_{n−k}, for all n ≥ k, and
b_n = r_1 b_{n−1} + r_2 b_{n−2} + ··· + r_k b_{n−k}, for all n ≥ k,

then for any constants C1 and C2, the sequence

C1 a_0 + C2 b_0, C1 a_1 + C2 b_1, C1 a_2 + C2 b_2, ···

also satisfies this recurrence relation.

Proof. This is trivial: taking C1 times the first equation plus C2 times the second equation immediately gives us

C1 a_n + C2 b_n = r_1 (C1 a_{n−1} + C2 b_{n−1}) + r_2 (C1 a_{n−2} + C2 b_{n−2}) + ··· + r_k (C1 a_{n−k} + C2 b_{n−k}), for all n ≥ k,

which is the desired recurrence relation.
Motivated by the first-order case, we look for special solutions of the form a_n = λ^n and slight modifications thereof, and the next result tells us which λ to take.

Definition 2.31. The characteristic polynomial of the recurrence relation a_n = r_1 a_{n−1} + r_2 a_{n−2} + ··· + r_k a_{n−k} is

x^k − r_1 x^{k−1} − r_2 x^{k−2} − ··· − r_{k−1} x − r_k.

This is a polynomial of degree k in the indeterminate x.
Theorem 2.32. Let λ be a (possibly complex) root of the characteristic polynomial. Then: (1) the sequence a_n = λ^n satisfies the recurrence relation; and (2) more generally, if λ is a root of multiplicity m, then a_n = n^s λ^n satisfies the recurrence relation for each s with 0 ≤ s ≤ m − 1.
Example 2.33. We can use Theorems 2.29 and 2.32 to solve the second-order recurrence relation a_n = 7a_{n−1} − 12a_{n−2} with initial conditions a_0 = 2, a_1 = 5. The characteristic polynomial is x^2 − 7x + 12 = (x − 3)(x − 4), with roots 3 and 4, so every sequence of the form C1·3^n + C2·4^n satisfies the recurrence relation. To match the initial conditions we need

C1·3^0 + C2·4^0 = 2, i.e. C1 + C2 = 2, and
C1·3^1 + C2·4^1 = 5, i.e. 3C1 + 4C2 = 5,

which has the unique solution C1 = 3, C2 = −1. Hence

a_n = 3 × 3^n + (−1) × 4^n = 3^{n+1} − 4^n.
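A quick machine check of this answer, using the recurrence as reconstructed above:

    # a_0 = 2, a_1 = 5, a_n = 7a_{n-1} - 12a_{n-2}; closed form 3^(n+1) - 4^n.
    a = [2, 5]
    for n in range(2, 30):
        a.append(7 * a[-1] - 12 * a[-2])
    assert all(a[n] == 3**(n + 1) - 4**n for n in range(30))
    print("a_n = 3^(n+1) - 4^n confirmed for n = 0, ..., 29")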
Theorem 2.36. Every solution a_n of a kth-order homogeneous linear recurrence relation is a linear combination of the k solutions arising from the roots of the characteristic polynomial via parts (1) and (2) of Theorem 2.32. In particular, the general solution of the second-order recurrence relation a_n = r a_{n−1} + s a_{n−2} is as follows: if the characteristic polynomial x^2 − rx − s has distinct roots λ1 and λ2, then a_n = C1 λ1^n + C2 λ2^n; if it has a repeated root λ, then a_n = C1 λ^n + C2 n λ^n.
Proof**. First note that we do indeed always get k solutions to the recurrence relation out of parts (1) and (2) of Theorem 2.32: this is because every root λ gives rise to the m solutions n^s λ^n, 0 ≤ s ≤ m − 1, where m is the multiplicity (which is 1 if λ is not a repeated root), and the sum of the multiplicities of all the roots equals the degree of the polynomial, which is k. (This relies on the fact that we are allowing the roots to be complex numbers, so the polynomial factorizes completely.)
We now check, in the second-order case, that the constants can be chosen to fit any initial conditions a_0 = I_0, a_1 = I_1. If the characteristic polynomial has distinct roots λ1 and λ2, the equations to solve are

C1 + C2 = I_0,
λ1 C1 + λ2 C2 = I_1.

Because the determinant of the matrix [[1, 1], [λ1, λ2]] is λ2 − λ1, which we are assuming is nonzero, there is a unique solution. In fact, that solution is:

C1 = (λ2 I_0 − I_1)/(λ2 − λ1),  C2 = (I_1 − λ1 I_0)/(λ2 − λ1). (2.15)
(It’s probably not worth memorizing these formulas, given how easy it is to solve any particular 2×2 linear system.) Now suppose that the characteristic polynomial has a repeated root, i.e. x^2 − rx − s = (x − λ)^2. Notice that we have r = 2λ, s = λ^2. Our linear combination is C1 λ^n + C2 n λ^n, and the equations we need to solve are

C1 = I_0,
λ C1 + λ C2 = I_1.

Because the determinant of the matrix [[1, 0], [λ, λ]] is λ, and λ must be nonzero because s ≠ 0 (or else we wouldn’t be dealing with a second-order recurrence relation at all), there is a unique solution, namely:

C1 = I_0,  C2 = (I_1 − λ I_0)/λ. (2.16)

So in either case the sequence a_n must be expressible in the right form.
Remark 2.37. Notice that the first-order case explained in Example 2.26 is
also covered by Theorem 2.36: in this case the characteristic polynomial is
the degree-1 polynomial x − r, whose sole root is r.
Example 2.38. We can now explain the strange formula for the Fibonacci numbers that we proved in Example 2.24. The characteristic polynomial for the Fibonacci recurrence is x^2 − x − 1, which has distinct roots (1 + √5)/2 and (1 − √5)/2. By Theorem 2.36, this is enough to prove that

F_n = C1 ((1 + √5)/2)^n + C2 ((1 − √5)/2)^n, for all n ∈ N,

for some constants C1 and C2; imposing the initial conditions F_0 = 0 and F_1 = 1 forces C1 = 1/√5 and C2 = −1/√5, recovering (2.9).
Example 2.39. Recall from (2.10) that the Lucas numbers satisfy the same recurrence relation as the Fibonacci numbers, but different initial conditions. So they must also be of the same form:

L_n = C1 ((1 + √5)/2)^n + C2 ((1 − √5)/2)^n, for all n ∈ N,

though this time the initial conditions L_0 = 2, L_1 = 1 force C1 = C2 = 1.
As another illustration of Theorem 2.36, consider a sequence which simply alternates between two values:

a_0 = I_0, a_1 = I_1, a_n = a_{n−2} for n ≥ 2,
where I_0 and I_1 are some initial values. Obviously the solution is that a_n is either I_0 or I_1 depending on whether n is even or odd; how does this fit the framework of Theorem 2.36? The characteristic polynomial is x^2 − 1 = (x − 1)(x + 1), so the general solution is a_n = C1·1^n + C2·(−1)^n = C1 + C2(−1)^n. To fit the initial conditions we must have C1 = (I_0 + I_1)/2 and C2 = (I_0 − I_1)/2, so

a_n = (I_0 + I_1)/2 + ((I_0 − I_1)/2)(−1)^n = I_0 if n is even, and I_1 if n is odd,

as expected.
What makes this “non-homogeneous” is the extra term f(n), which is not of the same kind as all the others.
Example 2.42. Recall the recurrence relation from the tower of Hanoi problem in the Introduction: a_n = 2a_{n−1} + 1. Here the extra term is the constant f(n) = 1.

Suppose now that p_n is one particular solution of a non-homogeneous linear recurrence relation. We claim that every solution a_n can be written as a_n = b_n + p_n, where b_n is a solution of the corresponding homogeneous recurrence relation. Indeed, subtracting the recurrence relation satisfied by p_n from the one satisfied by a_n leaves the homogeneous recurrence relation for a_n − p_n, because the f(n) term cancels out. This shows that a_n − p_n is a solution of the homogeneous recurrence relation; letting it be b_n, we have a_n = b_n + p_n as claimed. Conversely, given any solution b_n of the homogeneous recurrence relation, the sequence a_n defined by a_n = b_n + p_n does satisfy the non-homogeneous recurrence relation, because p_n does and the sum of the two equations holds.
Remark 2.46*. This theorem implies that the set of all solutions of the
non-homogeneous recurrence relation forms an affine subspace of the vector
space of all sequences (a vector subspace translated so that it doesn’t pass
through the origin).
The problem, of course, is that we still have to find one particular solution p_n. Indeed, we may seem not to have made any progress: in trying to find the solution a_n of the non-homogeneous recurrence relation, we are reduced to finding a solution p_n of the non-homogeneous recurrence relation. But the point is that p_n doesn’t have to satisfy the initial conditions that a_n does, which gives us more scope. Indeed, p_n can often be found by something approaching guesswork. If the extra term f(n) in the recurrence is a polynomial in n of degree d, it is a good idea to try for a solution p_n which is also a polynomial in n of degree d; let the coefficients be unknowns, and see what equations they need to satisfy.
Example 2.47. Return to the sequence from Example 2.43:

a_0 = 3, a_1 = 5, a_n = 2a_{n−1} + 3a_{n−2} + 8n − 4 for n ≥ 2.

To find a closed formula for a_n, we first have to find the general solution of the recurrence relation a_n = 2a_{n−1} + 3a_{n−2} + 8n − 4. So we forget about the initial conditions a_0 = 3 and a_1 = 5 for now.
Putting all this together, we have found the general solution of the non-homogeneous recurrence relation a_n = 2a_{n−1} + 3a_{n−2} + 8n − 4:

a_n = C1·3^n + C2·(−1)^n − 2n − 3, for some constants C1, C2. (2.18)

Only now, at the end, do we remember the initial conditions we want, a_0 = 3 and a_1 = 5. These are equivalent to the following equations for C1 and C2:

C1 + C2 − 0 − 3 = 3, i.e. C1 + C2 = 6, and
3C1 − C2 − 2 − 3 = 5, i.e. 3C1 − C2 = 10.

Again this is a system of linear equations with a unique solution, namely C1 = 4 and C2 = 2. (In general, the left-hand sides are the same as if we were solving the homogeneous recurrence relation, so there will always be a unique solution, for the same reason as in Theorem 2.36.) We have finally found our closed formula for the original sequence:

a_n = 4 × 3^n + 2 × (−1)^n − 2n − 3. (2.19)
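As a final sanity check, one can run the original recurrence and compare it with (2.19); a minimal Python sketch:

    # a_0 = 3, a_1 = 5, a_n = 2a_{n-1} + 3a_{n-2} + 8n - 4.
    a = [3, 5]
    for n in range(2, 30):
        a.append(2 * a[-1] + 3 * a[-2] + 8 * n - 4)
    assert all(a[n] == 4 * 3**n + 2 * (-1)**n - 2 * n - 3 for n in range(30))
    print("(2.19) confirmed for n = 0, ..., 29")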
Example 2.48. We can now find the general solution of the tower of Hanoi recurrence relation a_n = 2a_{n−1} + 1. The corresponding homogeneous recurrence relation is a_n = 2a_{n−1}.
For this to hold for all n ≥ 2, it has to be true that C1 = (3/2)C1 + 1/4 and C2 = 3C2 − 1. Hence C1 = −1/2 and C2 = 1/2, and our particular solution is (−2^n + 1)/2, which gives us the general solution:

a_n = C·3^n + (−2^n + 1)/2, for some constant C. (2.20)
We have the following general rule to find particular solutions in the case where the sequence f(n) is quasi-polynomial; that is, f(n) = q(n)µ^n for some polynomial q(n) and some constant µ. This constant is called the exponent of the quasi-polynomial. Note that if µ = 1 then f(n) is a genuine polynomial. Consider the kth-order non-homogeneous linear recurrence relation

a_n = r_1 a_{n−1} + r_2 a_{n−2} + ··· + r_k a_{n−k} + q(n)µ^n.
By the same argument as for the proof of Theorem 2.45, one verifies that if f(n) is a sum of quasi-polynomials, then a particular solution of the non-homogeneous linear recurrence relation can be found as the corresponding sum of particular solutions provided by Theorem 2.51.
Chapter 3
Generating Functions
The first question you should ask about the definition of generating func-
tion is: what is a “formal power series”? This is a very useful idea which
generalizes the concept of a polynomial.
You are familiar with the terminology that something like 2 + z + 4z^2 is called a polynomial in the indeterminate (or variable) z. This means that it is a sum of terms, each of which consists of a different power of z multiplied by a coefficient which is just a number. (The constant term is the term in which the power of z is z^0, and we usually don’t write the z^0, just the coefficient.) It is an important part of the concept that there are only finitely many terms – although you could imagine, if you like, that all the other powers of z are present but not visible because their coefficient is 0. As you know, there are natural ways to define addition, subtraction, multiplication, and even long division of polynomials. The other important thing you can do with a polynomial in the indeterminate z is to substitute a number for z: e.g. if you substitute z = 3 in 2 + z + 4z^2, you get the result 2 + 3 + 4 × 3^2 = 41. This means that you can think of polynomials as just a special kind of function.
A formal power series is a ‘polynomial with infinitely many terms’. That is, it is something like 1 + 2z + 4z^2 + 8z^3 + ···, in which there is a term for every power of z. It is allowed for some of the coefficients to be 0, however; indeed, a polynomial like 2 + z + 4z^2 can be viewed as a formal power series in which the coefficients of z^3, z^4, ··· happen to be 0, and even a number like 2 can be viewed as a formal power series in which the coefficients of all the positive powers of z happen to be 0. But having, in general, infinitely many terms makes the crucial difference that you can’t just substitute any number for z any more. For example, if you tried to substitute z = 3 in 1 + 2z + 4z^2 + 8z^3 + ···, you would get the series (infinite sum) 1 + 2 × 3 + 4 × 3^2 + 8 × 3^3 + ···, which makes no sense.
Remark 3.3. Of course, this statement is too strong. Power series such as this arise in many parts of mathematics – you would have seen Taylor series in calculus, for example – and one of the main aims of analysis is to make sense of them. But even in the context of analysis, 1 + 2 × 3 + 4 × 3^2 + 8 × 3^3 + ··· is a divergent series, which cannot be given a numerical value. (You could
The fact that we can’t (or at least won’t) substitute a number for z means that we can’t think of a formal power series as a function, as we could a polynomial (despite the term “generating function”). This is what is meant by the adjective “formal”: a_0 + a_1 z + a_2 z^2 + a_3 z^3 + ··· is not a function of z, it’s just an object in its own right, in which the letter z and its ‘powers’ play the role of keeping the terms of the sequence in their correct order. In fact, you could think of the generating function a_0 + a_1 z + a_2 z^2 + a_3 z^3 + ··· as just a different notation for the sequence a_0, a_1, a_2, a_3, ···, using + signs and powers of z instead of commas. To get back to the sequence, you just have to extract the coefficients of the various powers of z.
The justification for this multiplication rule is that to get a z^n term in the product, you need to multiply the a_m z^m term of A(z) with the b_{n−m} z^{n−m} term of B(z), for some m. So the coefficient of z^n in A(z)B(z) is Σ_{m=0}^{n} a_m b_{n−m}.
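This multiplication rule is exactly the ‘convolution’ of the two coefficient sequences, and it is easy to express in code; a minimal Python sketch:

    def multiply(a, b):
        """Cauchy product of two power series given by coefficient lists."""
        n_max = min(len(a), len(b))
        return [sum(a[m] * b[n - m] for m in range(n + 1)) for n in range(n_max)]

    # the square of 1 + z + z^2 + ... begins 1, 2, 3, 4, ... (compare (3.4) below)
    ones = [1] * 8
    print(multiply(ones, ones))   # [1, 2, 3, 4, 5, 6, 7, 8]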
coefficient of z^0 in F(z)C(z) =
coefficient of z^1 in F(z)C(z) =
coefficient of z^2 in F(z)C(z) =
coefficient of z^3 in F(z)C(z) =
coefficient of z^4 in F(z)C(z) =
so F(z)C(z) =
A special case of the multiplication rule says that multiplying a_0 + a_1 z + a_2 z^2 + ··· by a number b_0 just multiplies every coefficient by that number. So if A(z) is the generating function of the sequence a_0, a_1, a_2, ···, then 3A(z), for instance, is the generating function of the sequence 3a_0, 3a_1, 3a_2, ···.
so (1 − z)G(z) = 1, which is commonly rewritten as G(z) = 1/(1 − z). In summary:

1 + z + z^2 + z^3 + ··· = 1/(1 − z). (3.3)
As you may know, if you substitute for z a complex number α inside the unit circle, the left-hand side of (3.3) becomes a series which converges to the number 1/(1 − α). On the other hand, if you substitute for z a complex number α outside the unit circle, the left-hand side becomes a divergent series which has nothing to do with 1/(1 − α). The equation (3.3) is not meant to assert anything about what happens when you substitute an actual number for z. It means solely that 1 + z + z^2 + z^3 + ··· is a formal power series which has the property that when you multiply it by 1 − z, you get 1.
Proof*. We will prove (1) in the ‘contrapositive’ form, which says that if both A(z) and B(z) are nonzero, then so is A(z)B(z). Since they are nonzero formal power series, they each have a ‘starting term’; in other words,

A(z) = a_p z^p + a_{p+1} z^{p+1} + ··· and B(z) = b_q z^q + b_{q+1} z^{q+1} + ···,

where a_p is nonzero and all the earlier coefficients a_{p′} for p′ < p are zero, and similarly b_q is nonzero and all the earlier coefficients b_{q′} for q′ < q are zero. From the multiplication rule we see that the coefficient of z^{p+q} in A(z)B(z) is a_p b_q, because every other term you would expect to be in the coefficient involves either some a_{p′} for p′ < p or some b_{q′} for q′ < q. Since a_p b_q ≠ 0, this shows that A(z)B(z) has at least one nonzero coefficient, so it is not the zero power series. Part (2) follows by applying part (1) to the equation A(z)(F(z) − G(z)) = 0.
Remark 3.9. Part (2) of Theorem 3.8 implies that there cannot be two different power series F(z) and G(z) which satisfy (1 − z)F(z) = (1 − z)G(z) = 1, so our use of the notation 1/(1 − z) for the geometric series is unambiguous. Similarly, from now on, whenever we state that F(z) = B(z)/A(z) (a quotient of formal power series), it just means that A(z)F(z) = B(z); this notation is possible because there can be at most one formal power series F(z) which satisfies this equation. These fractions can be manipulated in the usual way.
Remark 3.10*. It is not always the case that such a quotient formal power series exists: there is no formal power series F(z) which satisfies zF(z) = 1 + z, for instance. But if A(z) has nonzero constant term, then one can find an inverse 1/A(z), i.e. a formal power series F(z) such that A(z)F(z) = 1; the coefficients of the powers z^n in F(z) can be determined recursively. So in this case, B(z)/A(z) = B(z) · (1/A(z)) always makes sense.
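The recursive determination of the coefficients of 1/A(z) mentioned here is short enough to write down; a minimal Python sketch with exact arithmetic:

    from fractions import Fraction

    def inverse(a, n_max):
        # First n_max coefficients of F(z) = 1/A(z), assuming a[0] != 0.
        # The coefficient of z^n in A(z)F(z) must vanish for n >= 1:
        #   sum_{m=0}^{n} a_m f_{n-m} = 0, which we solve for f_n.
        f = [Fraction(1) / a[0]]
        for n in range(1, n_max):
            s = sum(a[m] * f[n - m] for m in range(1, min(n, len(a) - 1) + 1))
            f.append(-s / a[0])
        return f

    print(inverse([Fraction(1), Fraction(-1)], 6))   # 1/(1 - z): all coefficients 1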
(1 + z + z^2 + z^3 + ···)^2 = 1 + 2z + 3z^2 + 4z^3 + ···, (3.4)

So we do indeed have

Σ_{n=0}^{∞} (n + 1) z^n = 1/(1 − z)^2. (3.5)
Proof. If A(z) = Σ_{n=0}^{∞} a_n z^n and B(z) = Σ_{n=0}^{∞} b_n z^n, then the equation B′(z) = A(z) is equivalent to the statement that a_n = (n + 1)b_{n+1} for all n ≥ 0. So b_n must be a_{n−1}/n for all n ≥ 1, and the value of b_0 is irrelevant. The result follows.
as required. Here the second-last step used the common trick of adding an extra term, which works out to be zero, at the top or bottom of a sum. We prove the power rule by induction on k. The k = 1 case is obvious, so assume that k ≥ 2 and that the result is known for k − 1. Then using the product rule, we have

(A(z)^k)′ = (A(z)^{k−1} A(z))′
= (A(z)^{k−1})′ A(z) + A(z)^{k−1} A′(z)
= (k − 1)A(z)^{k−2} A′(z) A(z) + A(z)^{k−1} A′(z)
= k A(z)^{k−1} A′(z),

completing the induction step. To prove the quotient rule, we differentiate both sides of B(z)F(z) = A(z) and use the product rule on the left-hand side (the same way the quotient rule is proved in calculus, incidentally). This gives

B′(z)F(z) + B(z)F′(z) = A′(z),

and multiplying both sides by B(z), using B(z)F(z) = A(z), and rearranging gives the desired equation B(z)^2 F′(z) = A′(z)B(z) − A(z)B′(z).
Example 3.16. Our initial proof of (3.5) can now be justified: we obtain it from (3.3) by differentiating both sides, and using the quotient rule on the right-hand side. If we differentiate both sides of (3.5) again, we obtain:

Σ_{n=0}^{∞} (n + 1)(n + 2) z^n = −2(1 − z)(−1)/(1 − z)^4 = 2/(1 − z)^3.
Taking the derivative of both sides (and using the quotient and power rules on the right-hand side), we get

Σ_{n=0}^{∞} (n + 1) (n + k choose k − 1) z^n = −(0 − k(1 − z)^{k−1}(−1))/(1 − z)^{2k} = k/(1 − z)^{k+1}.

Since

((n + 1)/k) (n + k choose k − 1) = (n + 1)(n + k)(n + k − 1) ··· (n + 2)/(k · (k − 1)!) = (n + k choose k),

dividing both sides by k, we obtain the result.
Since

n^2 = (n choose 1) + 2 (n choose 2),

we have, using (3.9),

Σ_{n=0}^{∞} n^2 z^n = Σ_{n=0}^{∞} (n choose 1) z^n + 2 Σ_{n=0}^{∞} (n choose 2) z^n = z/(1 − z)^2 + 2z^2/(1 − z)^3.
(1) If S(z) = A(z) + B(z), then S(F(z)) = A(F(z)) + B(F(z)).
(2) If P(z) = A(z)B(z), then P(F(z)) = A(F(z))B(F(z)).
(3) If Q(z) = A(z)/B(z), then Q(F(z)) = A(F(z))/B(F(z)).
(4) (A(F(z)))′ = F′(z) A′(F(z)) (the chain rule).

Proof*. We will only give the proof in the simple case that F(z) = cz, since this is enough for many of our later purposes. As seen above, substituting cz for z in a formal power series just multiplies the coefficient of z^n by c^n, for all n ≥ 0. Part (1) is therefore easy: the coefficient of z^n in S(F(z)) is c^n(a_n + b_n), whereas that in the right-hand side is c^n a_n + c^n b_n. Similarly, for part (2), the coefficient of z^n in P(F(z)) is c^n Σ_{m=0}^{n} a_m b_{n−m}, whereas that in the right-hand side is Σ_{m=0}^{n} (c^m a_m)(c^{n−m} b_{n−m}), which is clearly the same. Part (3) follows immediately from part (2), if you rearrange the equations so that they don’t involve fractions. Part (4) in the special case F(z) = cz says that A(cz)′ = cA′(cz); the coefficient of z^n in A(cz)′ is (n + 1)c^{n+1} a_{n+1}, and that in A′(cz) is c^n(n + 1)a_{n+1}, so this is true.
Example 3.24. Applying part (3) in the case F(z) = cz to (3.3), we find

Σ_{n=0}^{∞} c^n z^n = 1 + cz + c^2 z^2 + c^3 z^3 + ··· = 1/(1 − cz). (3.10)

For instance, the generating function of the sequence 1, 2, 4, 8, ··· is 1/(1 − 2z). Similarly, (3.5) implies that

Σ_{n=0}^{∞} (n + 1) c^n z^n = 1/(1 − cz)^2. (3.11)
At this point we know enough to extract the coefficient of z^n from any formal power series of the form F(z)/G(z) where F(z) and G(z) are polynomials. We may as well assume that the degree of F(z) is less than that of G(z), since long division of polynomials reduces the general case to this. Then the method is to factorize G(z) into powers (1 − cz)^m, and rewrite F(z)/G(z) in partial-fractions form, i.e. as a sum of terms of the form b/(1 − cz)^m where b and c are constants (i.e. just numbers, not involving z). The coefficient of z^n in each term can then be read off from (3.10), (3.11), or in general (3.12).
Example 3.25. To find the coefficient of z^n in (2 − 9z)/(1 − 7z + 12z^2), we need to write it in partial-fractions form. Equating coefficients, we get two linear equations for the two unknowns; solving them gives

(2 − 9z)/(1 − 7z + 12z^2) =

We can use (3.10) to read off that

the coefficient of z^n in (2 − 9z)/(1 − 7z + 12z^2) is
Example 3.26. Let us find the coefficient of z^n in (5 − 9z)/(1 − 6z + 9z^2). The denominator factorizes as (1 − 3z)^2; when we have a repeated factor like this, the partial-fractions form should include a term for every power of the factor up to the multiplicity, which in this case means

(5 − 9z)/(1 − 3z)^2 = C1/(1 − 3z) + C2/(1 − 3z)^2.

Clearing the denominators, this becomes

5 − 9z = C1(1 − 3z) + C2 = (C1 + C2) − 3C1 z,

which has the unique solution C1 = 3, C2 = 2. So

(5 − 9z)/(1 − 6z + 9z^2) = 3/(1 − 3z) + 2/(1 − 3z)^2.

By (3.10) and (3.11), the coefficient of z^n is 3 × 3^n + 2(n + 1)3^n = (2n + 5)3^n.
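Clearing the denominator also gives a mechanical way to check this: the coefficients of (5 − 9z)/(1 − 6z + 9z^2) satisfy c_0 = 5, c_1 = 6c_0 − 9 = 21 and c_n = 6c_{n−1} − 9c_{n−2} for n ≥ 2, which a short Python sketch can compare against (2n + 5)3^n:

    c = [5, 21]
    for n in range(2, 25):
        c.append(6 * c[-1] - 9 * c[-2])
    assert all(c[n] == (2 * n + 5) * 3**n for n in range(25))
    print("coefficient of z^n is (2n + 5)3^n for n = 0, ..., 24")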
Example 3.27*. Let us find the coefficient of z^n in 2z/((1 − z)(1 − 2z)^2). The partial-fractions form is

2z/((1 − z)(1 − 2z)^2) = C1/(1 − z) + C2/(1 − 2z) + C3/(1 − 2z)^2.

Clearing the denominators, this becomes

2z = C1(1 − 2z)^2 + C2(1 − z)(1 − 2z) + C3(1 − z)
   = (C1 + C2 + C3) − (4C1 + 3C2 + C3)z + (4C1 + 2C2)z^2.

So we must solve the following system of three linear equations:

C1 + C2 + C3 = 0,
4C1 + 3C2 + C3 = −2,
4C1 + 2C2 = 0.

The first equation minus the second plus the third gives C1 = 2, and then one deduces C2 = −4 and C3 = 2. So

2z/((1 − z)(1 − 2z)^2) = 2/(1 − z) − 4/(1 − 2z) + 2/(1 − 2z)^2,

and the coefficient of z^n is 2 − 4 × 2^n + 2(n + 1)2^n = (n − 1)2^{n+1} + 2.
It is sometimes possible to find square roots, cube roots, etc. of formal power series. By the Binomial Theorem, for any nonnegative integer a we have the following equation of polynomials in z:

(1 + z)^a = Σ_{n=0}^{a} (a choose n) z^n. (3.13)

Definition 3.28. For any complex number α, we define the formal power series (1 + z)^α by:

(1 + z)^α := Σ_{n=0}^{∞} (α choose n) z^n.

(Recall that (α choose n) = α(α − 1) ··· (α − n + 1)/n! is defined for any α ∈ C.)
Proof**. We just need to show that the coefficient of z^n is the same on both sides, for all n ≥ 0. That is, we must prove that

Σ_{m=0}^{n} [α(α − 1) ··· (α − m + 1)/m!] · [β(β − 1) ··· (β − (n − m) + 1)/(n − m)!] = (α + β)(α + β − 1) ··· (α + β − n + 1)/n!. (3.14)

One thing we certainly know is that (1 + z)^a (1 + z)^b = (1 + z)^{a+b} for all nonnegative integers a and b, so (3.14) must hold whenever α, β ∈ N. There
is a clever way to deduce the general case from this, using the fact that a
nonzero polynomial can have only finitely many roots (this is because every
root λ of a polynomial corresponds to a factor x − λ in its factorization).
First fix α to be some nonnegative integer a. Then both sides of (3.14)
are polynomial functions of β, so their difference is a polynomial function
of β, which has infinitely many roots because it vanishes whenever β is a
nonnegative integer. The only way this is possible is if their difference is the
zero polynomial. So with α fixed to equal a, (3.14) holds for all values of β.
Now we can use the same argument turned around: for any fixed value of β,
the two sides of (3.14) are polynomial functions of α which agree whenever
α is a nonnegative integer, so their difference must be zero.
Example 3.30. Directly from Theorem 3.29 we see that (1 + z)^{1/2} is a square root of 1 + z, in the sense that (1 + z)^{1/2}(1 + z)^{1/2} = (1 + z)^1 = 1 + z. Writing out the definition in a slightly nicer way:

(1 + z)^{1/2} = Σ_{n=0}^{∞} (1/2 choose n) z^n = Σ_{n=0}^{∞} [(1/2)(1/2 − 1)(1/2 − 2) ··· (1/2 − n + 1)/n!] z^n
= 1 + Σ_{n=1}^{∞} [1·(−1)·(−3) ··· (3 − 2n)/(2^n n!)] z^n
= 1 + Σ_{n=1}^{∞} [(−1)^{n−1}(2n − 3)!!/(2^n n!)] z^n.

Here (2n − 3)!! is defined as in Example 1.40 (including the convention that (−1)!! = 1). Similarly, (1 + z)^{1/3} is a cube root of 1 + z, and so forth.
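It is reassuring to check by machine that the double-factorial expression really is the binomial coefficient (1/2 choose n); a minimal Python sketch in exact arithmetic:

    from fractions import Fraction
    from math import factorial

    def gen_binom(alpha, n):
        # alpha(alpha - 1)...(alpha - n + 1)/n! for arbitrary rational alpha
        num = Fraction(1)
        for m in range(n):
            num *= alpha - m
        return num / factorial(n)

    def double_fact(k):   # with the convention (-1)!! = 1
        result = 1
        while k > 1:
            result, k = result * k, k - 2
        return result

    for n in range(1, 15):
        lhs = gen_binom(Fraction(1, 2), n)
        rhs = Fraction((-1)**(n - 1) * double_fact(2 * n - 3), 2**n * factorial(n))
        assert lhs == rhs
    print("coefficients of (1 + z)^(1/2) agree for n = 1, ..., 14")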
Example 3.31*. More generally, if F(z) is any formal power series with zero constant term, then we can find a square root of 1 + F(z) by substituting F(z) into (1 + z)^{1/2}; however, it is not often that you can get any nice formula for the coefficients of the result. We will only use the case of the simple substitution of cz for z:

(1 + cz)^{1/2} = 1 + Σ_{n=1}^{∞} [(−1)^{n−1}(2n − 3)!! c^n/(2^n n!)] z^n. (3.15)

You may like to ponder which formal power series have a square root: it is pretty clear that z does not, for instance. All we will need is the easy observation that a square root, if it does exist, is unique up to sign:

Theorem 3.32. If F(z) and G(z) are both square roots of A(z), then either F(z) = G(z) or F(z) = −G(z).

Proof. We have (F(z) − G(z))(F(z) + G(z)) = F(z)^2 − G(z)^2 = A(z) − A(z) = 0, and part (1) of Theorem 3.8 forces one of the two factors to be zero.
Example 3.33*. Notice that if α = −a is a negative integer, then (1 + z)^{−a} must be 1/(1 + z)^a, since when multiplied by (1 + z)^a it gives (1 + z)^0 = 1. So

1/(1 + z)^a = Σ_{n=0}^{∞} (−a choose n) z^n
            = Σ_{n=0}^{∞} [(−a)(−a − 1)(−a − 2) ··· (−a − n + 1)/n!] z^n
            = Σ_{n=0}^{∞} [(−1)^n a(a + 1)(a + 2) ··· (a + n − 1)/n!] z^n.
Finally, we can even solve some ‘differential equations’ involving formal power series. In the usual theory of (linear ordinary) differential equations, the exponential function plays a crucial role, being the unique function e^x which satisfies the differential equation (d/dx)e^x = e^x subject to e^0 = 1. So let us try to find a formal power series exp(z) = Σ_{n=0}^{∞} e_n z^n such that exp′(z) = exp(z) and e_0 = 1. The differential equation says that (n + 1)e_{n+1} = e_n for all n ≥ 0, which together with e_0 = 1 forces e_n = 1/n!. That is,

exp(z) = Σ_{n=0}^{∞} (1/n!) z^n = 1 + z + (1/2)z^2 + (1/6)z^3 + (1/24)z^4 + ··· (3.17)
Proof**. The “if” direction, which just says that A(z) = a_0 exp(G(z)) is indeed a solution of the equation A′(z) = F(z)A(z), can be proved using the chain rule.
Now that we are familiar with the basic operations involving formal power
series, we explore how generating functions can be used to say something
about various kinds of recursive sequences a0 , a1 , a2 , · · · . The main idea is
that a recurrence relation expressing an in terms of earlier terms in the se-
quence should give rise to an equation expressing A(z) in terms of itself (or its
derivatives, powers, etc.). If this equation can be solved, we have a formula
for the generating function A(z). Ideally, we would then be able to extract
the coefficient of each z n , giving a closed formula for an (and thus solving
the recurrence relation). Even when this is not possible, a formula for the
generating function is a concise way of summarizing the sequence, and there
are standard methods for deducing qualitative information from it.
Assuming sufficient facility with the shift operation (3.2), one can write such
calculations more concisely using sigma notation rather than spelling out the
series with a “ · · · ”.
Example 3.38. Recall from Example 2.34 the sequence defined by

a_0 = 1, a_1 = 4, a_n = 4a_{n−1} − 4a_{n−2} for n ≥ 2.

Let A(z) = Σ_{n=0}^{∞} a_n z^n be the generating function. Then

A(z) = 1 + 4z + Σ_{n=2}^{∞} (4a_{n−1} − 4a_{n−2}) z^n
     = 1 + 4z + 4 Σ_{n=2}^{∞} a_{n−1} z^n − 4 Σ_{n=2}^{∞} a_{n−2} z^n
     = 1 + 4z + 4z(A(z) − 1) − 4z^2 A(z),

which after rearrangement becomes A(z) = 1/(1 − 4z + 4z^2) = 1/(1 − 2z)^2. Using (3.11), we can read off that a_n = (n + 1)2^n.
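A quick check of this answer against the recurrence itself; a minimal Python sketch:

    # a_0 = 1, a_1 = 4, a_n = 4a_{n-1} - 4a_{n-2}; claim: a_n = (n + 1)2^n.
    a = [1, 4]
    for n in range(2, 30):
        a.append(4 * a[-1] - 4 * a[-2])
    assert all(a[n] == (n + 1) * 2**n for n in range(30))
    print("a_n = (n + 1)2^n confirmed for n = 0, ..., 29")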
H(z) =
(1 − 2z − 3z^2)A(z) = (3 − 7z + 17z^2 − 5z^3)/(1 − z)^2,

so

A(z) = (3 − 7z + 17z^2 − 5z^3)/((1 − z)^2(1 − 3z)(1 + z)). (3.18)
We now want to rewrite the right-hand side using partial fractions:

(3 − 7z + 17z^2 − 5z^3)/((1 − z)^2(1 − 3z)(1 + z)) = C1/(1 − 3z) + C2/(1 + z) + C3/(1 − z) + C4/(1 − z)^2,

where C1, C2, C3, C4 are some constants. Clearing the denominator, we get

3 − 7z + 17z^2 − 5z^3 = C1(1 − z)^2(1 + z) + C2(1 − z)^2(1 − 3z) + C3(1 − z)(1 − 3z)(1 + z) + C4(1 − 3z)(1 + z)
= C1(1 − z − z^2 + z^3) + C2(1 − 5z + 7z^2 − 3z^3) + C3(1 − 3z − z^2 + 3z^3) + C4(1 − 2z − 3z^2)
= (C1 + C2 + C3 + C4) − (C1 + 5C2 + 3C3 + 2C4)z + (−C1 + 7C2 − C3 − 3C4)z^2 − (−C1 + 3C2 − 3C3)z^3,

and equating coefficients gives us a system of four linear equations for the four unknowns:

C1 + C2 + C3 + C4 = 3
C1 + 5C2 + 3C3 + 2C4 = 7
−C1 + 7C2 − C3 − 3C4 = 17
−C1 + 3C2 − 3C3 = 5

Using linear algebra methods, we can solve this to obtain

C1 = 4, C2 = 2, C3 = −1, C4 = −2,

and thus we get the partial-fractions formula for the generating function:

A(z) = 4/(1 − 3z) + 2/(1 + z) − 1/(1 − z) − 2/(1 − z)^2. (3.19)

From this we can read off the answer:

a_n = 4 × 3^n + 2 × (−1)^n − 1 − 2(n + 1),
which is of course the same as in Example 2.47. For this example it is
questionable whether the generating-function approach is any better: it is
more motivated and direct, but the system of linear equations involved is
larger than anything we had to deal with in the other method.
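For what it’s worth, the ‘linear algebra methods’ are also easy to mechanize; a minimal Python sketch that solves the 4 × 4 system above exactly by Gauss-Jordan elimination:

    from fractions import Fraction

    A = [[1, 1, 1, 1], [1, 5, 3, 2], [-1, 7, -1, -3], [-1, 3, -3, 0]]
    b = [3, 7, 17, 5]
    M = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(A, b)]
    for col in range(4):
        pivot = next(r for r in range(col, 4) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(4):
            if r != col:
                M[r] = [x - M[r][col] * y for x, y in zip(M[r], M[col])]
    print([row[-1] for row in M])   # expect 4, 2, -1, -2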
Along the same lines, we can find a formula for the generating function of the kth column of the Stirling triangle (compare this with Theorem 3.17):

Theorem 3.43*. For any k ≥ 1,

Σ_{n=0}^{∞} S(n + k, k) z^n = 1/((1 − z)(1 − 2z) ··· (1 − kz)).

By (3.10), the right-hand side of this formula is the product

(1 + z + z^2 + ···)(1 + 2z + 2^2 z^2 + ···) ··· (1 + kz + k^2 z^2 + ···).

To get a z^n term from the expansion of the product, one must select the z^{n1} term from the first factor, the z^{n2} term from the second factor, etc., up to the z^{nk} term from the kth factor, where n1 + n2 + ··· + nk = n. The coefficient obtained from this selection is 1^{n1} 2^{n2} ··· k^{nk}, so we find

S(n + k, k) = Σ_{n1,···,nk ∈ N, n1+···+nk = n} 1^{n1} 2^{n2} ··· k^{nk}. (3.22)

This is a positive formula, but with the major drawback that the sum has (n + k − 1 choose k − 1) terms. So finding a formula for the generating function has not, in this case, led to a simple closed formula for the coefficients.
Generating functions can be very helpful in finding closed formulas for sums f(0) + f(1) + ··· + f(n). The reason is that if A(z) is the generating function of a_0, a_1, a_2, ···, then by (3.3) and the multiplication rule,

A(z)/(1 − z) = Σ_{n=0}^{∞} (a_0 + a_1 + ··· + a_n) z^n, (3.25)

so A(z)/(1 − z) is the generating function of the sequence of partial sums of a_0, a_1, ···.
Example 3.46. To find a closed formula for 0×2^0 + 1×2^1 + ··· + n×2^n, we first need a formula for the generating function of the sequence 0 × 2^0, 1 × 2^1, ···: since n·2^n = (n + 1)2^n − 2^n, by (3.11) and (3.10) this is

Σ_{n=0}^{∞} n 2^n z^n = 1/(1 − 2z)^2 − 1/(1 − 2z).

(It is convenient not to combine the two terms.)

In the same way, using the generating function of the squares found earlier, (3.25) tells us that

Σ_{n=0}^{∞} (0^2 + 1^2 + ··· + n^2) z^n = z/(1 − z)^3 + 2z^2/(1 − z)^4
= Σ_{n=0}^{∞} (n + 1 choose 2) z^n + 2 Σ_{n=0}^{∞} (n + 1 choose 3) z^n.

(As in (3.9), we can let the summations start from n = 0, because the early terms are zero anyway.) From this we read off the formula

1^2 + 2^2 + ··· + n^2 = (n + 1 choose 2) + 2(n + 1 choose 3) = (n + 1)n(2n + 1)/6. (3.26)
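This classical formula is again trivially machine-checkable; a minimal Python sketch:

    for n in range(100):
        assert 6 * sum(m * m for m in range(n + 1)) == (n + 1) * n * (2 * n + 1)
    print("(3.26) holds for n = 0, ..., 99")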
Remark 3.49*. Expressing one sum as another sum may not seem like
progress, but the sum on the right-hand side has only a terms, and we are
imagining a as being fixed while n varies.
Proof*. Using Theorem 1.86 and (3.9), we get a formula for the generating function of the sequence of ath powers:

Σ_{n=0}^{∞} n^a z^n = Σ_{n=0}^{∞} Σ_{k=1}^{a} k! S(a, k) (n choose k) z^n = Σ_{k=1}^{a} k! S(a, k) z^k/(1 − z)^{k+1}.
However, this is not a closed formula, since the number of terms in the sum
grows as n grows. With sequences such as this, we often have to be satisfied
with giving a formula for the generating function.
Example 3.51**. Recall from Example 2.23 the sequence defined by

a_0 = 1, a_n = (1/n)(a_{n−1} + 2a_{n−2} + 3a_{n−3} + ··· + n a_0) for n ≥ 1.

Letting A(z) = Σ_{n=0}^{∞} a_n z^n be the generating function, we have

A′(z) = Σ_{n=0}^{∞} (n + 1)a_{n+1} z^n
      = Σ_{n=0}^{∞} (a_n + 2a_{n−1} + 3a_{n−2} + ··· + (n + 1)a_0) z^n
      = (1 + 2z + 3z^2 + 4z^3 + ···)A(z)
      = A(z)/(1 − z)^2.

Again, this is a differential equation of the type we know how to solve: the required ‘integral’ of 1/(1 − z)^2 is z/(1 − z) = z + z^2 + z^3 + ···, since we need it to have zero constant term. So A(z) = exp(z/(1 − z)). As in the previous example, this does not lead to a closed formula for a_n.
Part II
Acknowledgements
These lecture notes were written in 2009 for the units MATH2069 Discrete
Mathematics and Graph Theory and MATH2969 Discrete Mathematics and
Graph Theory (Advanced), given at the University of Sydney. I am extremely
grateful to the previous lecturer Bill Palmer for allowing me to make use of
his lecture notes on graph theory.
Anthony Henderson
Contents
0 Introduction 1
3 Trees 47
4 Colourings of Graphs 71
Chapter 0
Introduction
Graph theory is one of the most important and useful areas of discrete math-
ematics. To get a preliminary idea of what it involves, let us consider some
of the real-world problems which can be translated in graph-theoretic terms.
If either of the middle two computers crashes, the others would be split into
two groups which could no longer communicate. To fix this problem, suppose an extra connection is added to the network.
Then the loss of any single computer would not be fatal: the remaining
computers would still be linked to each other. However, if you examine the
new network, you will be able to find two computers such that if they both
crashed simultaneously, the remaining four computers would be split into
two groups; if this represented an unacceptable risk, you would have to add
another connection. This sort of thinking about the nodes of a network and
their connections is quintessentially graph-theoretic.
[picture of a graph on the five courses A, B, C, D, E]
Then clearly A, B, and C need to be allocated three different time-slots, but
D could be scheduled concurrently with B and E with C. Again the problem
amounts to considering abstract properties of the picture, and which courses
are connected to which. A similar problem confronts map-makers when they
have to choose a colour for each region of the map so that no two adjacent
regions have the same colour; again, the only information which is relevant is
the pattern of adjacencies between regions, which could be displayed in the
same sort of picture as above.
One of the most famous problems in graph theory is the Travelling Salesman
Problem. The original context was that a salesman wanted to visit a certain
number of towns, returning to his starting point, and having as short a total
distance to travel as possible. The information necessary to solve the problem
is not only which towns are connected to which, but how far apart they are
(considering only the shortest way of travelling between each pair of towns).
For example, there might be four towns A, B, C, D, where A is the salesman’s
home town, with distances in kilometres given in the following picture.
[picture: four towns A, B, C, D with distances A–B = 9, A–C = 8, A–D = 9, B–C = 7, B–D = 5, C–D = 6]
One way to solve the problem would be to list all the possible routes and
calculate their distances: for instance, the route A–B–C–D–A has distance
9+7+6+9 = 31, whereas the route A–B–D–C–A has distance 9+5+6+8 =
28. The disadvantage of this approach is that if there are n towns, there are
(n−1)! possible routes, and as n increases, the growth rate of (n−1)! is faster
than exponential. There are many smarter algorithms which cut down the
amount of calculation involved, but it is a major open question whether there
is any algorithm for solving the Travelling Salesman Problem whose running
time has polynomial growth rate. The ramifications of such an algorithm
would be huge, because there are many problems in all sorts of areas which
can be rephrased in terms of finding the shortest paths through a diagram.
In these notes, the main results are all called “Theorem”, irrespective of their
difficulty or significance. Usually, the statement of the Theorem is followed
(perhaps after an intervening Example or Remark) by a rigorous Proof; as is
traditional, the end of each proof is marked by an open box.
The text in the Examples and Remarks is slanted, to make them stand out
on the page. Many students will understand the Theorems more easily by
studying the Examples which illustrate them than by studying their proofs.
Some of the Examples contain blank boxes for you to fill in. The Remarks
are often side-comments, and are usually less important to understand.
The more difficult Theorems, Proofs, Examples, and Remarks are marked at
the beginning with either * or **. Those marked * are at the level which
MATH2069 students will have to understand in order to be sure of getting
a Credit, or to have a chance of a Distinction or High Distinction. Those
marked ** are really intended only for the students enrolled in the Advanced
unit MATH2969, and can safely be ignored by those enrolled in MATH2069.
Chapter 1
First Properties of Graphs
What the problems in the introduction have in common is that they are all
concerned with a collection of objects (computers, courses, towns) some of
which are linked together (in a literal physical sense, or by some property
such as two courses having a student in common). The abstract definition
of a graph is meant to capture this basic idea. (Note that the word “graph”
is used here in a different sense from the graphs of functions in calculus.)
In applications, the vertices of the graphs may be computers and the edges
may be connections between them, or the vertices may be towns and the
edges may be roads, and so forth. But for the purpose of developing the
theory, we will usually use positive integers or letters of the alphabet as the
vertices. Wherever possible, n will denote the number of vertices.
[picture of a graph with vertex set {1, 2, 3, 4, 5, 6}]
The edges in this graph are as follows:
{1, 2}, {1, 4}, {2, 3}, {2, 4}, {3, 4}, {3, 5}, {3, 6}, {5, 6}.
(Graphs satisfying our definition are sometimes called simple graphs.) On the rare occasions where we need to discuss graphs with
multiple edges or loops, we will call them multigraphs. There is also an im-
portant concept of directed graph, where the edges are ordered pairs rather
than unordered pairs, and can hence be thought of as having a definite di-
rection from one end to the other. We will ignore all these variants for now.
Theorem 1.4. If |V| = n, then the number of graphs with vertex set V and k edges is ((n choose 2) choose k), and the total number of graphs with vertex set V is 2^(n choose 2).

Proof. If you fix the vertex set V, then the graph is determined by specifying the edge set E. By definition, E is a subset of the set X of two-element subsets of V. We know that if |V| = n, then |X| = (n choose 2). So the size of a subset of X can be anything from 0 to (n choose 2); the number of subsets of X of size k is ((n choose 2) choose k); and the total number of subsets of X is 2^(n choose 2). (Recall the reason for this last part: specifying a subset of X is the same as deciding, for each element of X, whether it is in or out. In our situation, we have (n choose 2) pairs of vertices which are ‘potential edges’, and we have to decide for each pair of vertices whether to join them or not.)
Example 1.5. Since (3 choose 2) = 3, there are 2^3 = 8 graphs with vertex set {1, 2, 3}:

[pictures of the eight graphs with vertex set {1, 2, 3}]

Of these graphs, (3 choose 0) = 1 has no edges, (3 choose 1) = 3 have one edge, (3 choose 2) = 3 have two edges, and (3 choose 3) = 1 has three edges.
There is a natural sense in which there are ‘really’ only four graphs involved
in Example 1.5: all the graphs with one edge have ‘the same form’, if you
forget about the labels of the vertices, and so have all the graphs with two
edges. This looser kind of sameness is called isomorphism (from the Greek
for “same form”). It is formalized in the following definition.
Definition 1.6. A graph G = (V, E) is said to be isomorphic to a graph G′ = (V′, E′) if there is a bijection between their vertex sets under which their edge sets correspond; that is, a bijective function f : V → V′ such that

E′ = {{f(v), f(w)} | {v, w} ∈ E},

i.e. f(v), f(w) are adjacent in G′ if and only if v, w are adjacent in G.
Example 1.7. Consider the two graphs represented by these pictures.
[pictures of the two graphs]
The first graph is (V, E) where V = {a, b, c, d, e, f } and
E = {{a, b}, {a, c}, {a, d}, {b, c}, {c, d}, {d, e}, {d, f }, {e, f }},
and the second graph is (V ′ , E ′ ) where V ′ = {u, v, w, x, y, z} and
E ′ = {{u, w}, {u, x}, {v, y}, {v, z}, {w, x}, {w, z}, {x, z}, {y, z}}.
Although their pictures look superficially different, (V, E) is isomorphic to
(V ′ , E ′ ) via the following bijection:
a ↔ w, b ↔ u, c ↔ x, d ↔ z, e ↔ v, f ↔ y.
You can check from the above listings that the edge sets correspond; a more
visual way to see the isomorphism is to re-draw the second graph so that it
obviously has the same form as the first graph.
[the second graph re-drawn to match the first]
Note that the bijection given above is not the only one we could have used:
it would be just as good to let e correspond to y and f to v, for instance.
Remark 1.8. To avoid confusion, the two graphs in Example 1.7 had differ-
ent vertex sets. But it is also possible for two graphs with the same vertex set
to be isomorphic. In the special case that V ′ = V , Definition 1.6 says that
(V, E) is isomorphic to (V, E ′ ) if and only if there is some permutation of the
vertices which makes the edge sets correspond; in other words, a picture of
(V, E ′ ) can be obtained from a picture of (V, E) by permuting the labels of
the vertices while leaving the edges where they are.
This means that it makes sense to classify graphs into isomorphism classes,
where two graphs are in the same isomorphism class if and only if they are iso-
morphic to each other; in visual terms, if and only if they can be represented
by the same picture with (possibly) different labellings of the vertices. We
represent isomorphism classes of graphs by pictures with unlabelled vertices.
I should confess straight away that we will never actually complete any gen-
eral classification of graphs; in fact, our focus will shift away from this prob-
lem in later chapters. But it is useful to examine some small examples of
such classification, to get some familiarity with basic concepts.
Example 1.11. There is only one graph with 0 vertices (the vertex set and
the edge set both have to be the empty set). By contrast, there are arguably
infinitely many graphs with 1 vertex, because that vertex could be anything
you like (the edge set has to be empty). However, what really matters is that
there is only one isomorphism class of graphs with 1 vertex: they all have
the ‘same form’, since they just consist of a single vertex and no edges.
Example 1.12. If a graph has two vertices, it can either have 0 edges or
1 edge. Clearly all the graphs with two vertices and 0 edges are isomorphic
to each other, as are all the graphs with two vertices and 1 edge. Hence
there are two isomorphism classes of graphs with two vertices, which can be
represented by the following pictures.
Remember that there is no point labelling the vertices, because the names
of the vertices are irrelevant to the isomorphism class of a graph.
Example 1.13. As foreshadowed after Example 1.5, there are four isomorphism classes of graphs with three vertices, one for each of the possible numbers of edges (0, 1, 2, or 3).
For instance, we are claiming here that every graph with three vertices and
two edges must have a vertex where the two edges meet; this is clear, because
if the two edges had no ends in common there would have to be at least
2 × 2 = 4 vertices.
Example 1.14. There is only one isomorphism class of graphs with four
vertices and no edges, and similarly for four vertices and one edge. If a graph
has four vertices and two edges, the only essential information remaining is
whether the edges meet at a vertex or not. This gives rise to two isomorphism
classes:
Classifying graphs with four vertices and three edges is a little more interesting. One way to approach it is to fix the specific vertex set {1, 2, 3, 4}, and draw pictures of all the graphs with this vertex set and three edges; by Theorem 1.4, there are ((4 choose 2) choose 3) = (6 choose 3) = 20 of these. Examining the pictures, we can easily classify them into the following isomorphism classes:
We will develop more systematic ways of doing this classification later. If you
start considering the possible configurations of four edges, you will quickly
realize that it is easier to classify these graphs according to the two ‘non-
edges’, which must either meet at a vertex or not; this gives two isomorphism
classes:
By the same reasoning, there is only one isomorphism class of graphs with
four vertices and five edges (and one non-edge), and clearly there is only one
isomorphism class of graphs with four vertices and six edges (where every
possible edge is included).
Note that if G has n vertices and k edges, then the complement of G has n vertices and (n choose 2) − k edges. So classifying graphs with n vertices and (n choose 2) − k edges is essentially the same as classifying graphs with n vertices and k edges.
One glaring difference between some of the graphs we classified in the previ-
ous section was that sometimes there was a way to get from every vertex to
every other vertex along the edges, and sometimes there wasn’t: if the edges
represent lines of communication, then sometimes all the vertices could com-
municate with each other and sometimes they split into more than one group.
Definition 1.18. Let G = (V, E) be a graph. If v, w ∈ V , a walk from v to
w in the graph G is a sequence of vertices
v_0, v_1, v_2, ···, v_ℓ ∈ V, with v_0 = v and v_ℓ = w,

such that v_i and v_{i+1} are adjacent for all i = 0, 1, ···, ℓ − 1, or in other words the following are all in E:

{v_0, v_1}, {v_1, v_2}, ···, {v_{ℓ−1}, v_ℓ}.
The length of such a walk is the number of steps, i.e. ℓ. We say that v is
linked to w in G if there exists a walk from v to w in G.
Theorem 1.19. In any graph G = (V, E), the relation of being linked is an equivalence relation on the vertex set V. In other words: (1) every vertex v is linked to itself; (2) if v is linked to w, then w is linked to v; and (3) if v is linked to w and w is linked to x, then v is linked to x.
Proof. For (1), we can use the walk of length 0 which consists solely of v.
For (2), if v0 , v1 , · · · , vℓ is a walk from v to w, then its reversal vℓ , · · · , v0
is a walk from w to v. The proof of part (3) uses the fact that we can
concatenate walks: if v0 , v1 , · · · , vℓ is a walk from v to w and w0 , w1 , · · · , wk
is a walk from w to x, then v0 , v1 , · · · , vℓ , w1 , · · · , wk is a walk from v to x,
since vℓ = w = w0 .
Note that N_n has 0 edges whereas K_n has (n choose 2), and N_n has n connected components (every vertex is a component on its own) whereas K_n is connected.

Example 1.23. Here are the pictures of K_n for n ≤ 4:

[pictures of K1, K2, K3, K4]
The following result means that the classification of graphs up to isomorphism
reduces to the case of connected graphs.
Theorem 1.24*. Let G and G′ be graphs.
Note that the converse is not required to hold. A subgraph with the same
vertex set as the whole graph is called a spanning subgraph.
[pictures of the cycle graphs C3, C4, ···]
(1) v and w are linked in G if and only if there is a path in G with end-
vertices v and w.
Proof*. For part (1), recall that v and w are said to be linked in G if there
is a walk in G from v to w. So the “if” direction is clear: if there is a path
with end-vertices v and w, then there is a walk along that path from v to
w. For the “only if” direction, we assume that there is a walk from v to
w in G, say v = v0 , v1 , v2 , · · · , vℓ = w. We need to produce a path with
end-vertices v and w; the problem is that there may be repeated vertices in
the walk. But if vi = vj for some i < j, then the part of the walk between
vi and vj is a needless detour; we can cut it out to obtain the shorter walk
v0 , · · · , vi , vj+1 , · · · , vℓ . Continuing in this way, we must eventually obtain a
walk from v to w with no repeated vertices, which (together with the edges
between consecutive vertices) constitutes a path as required.
In part (2), if there is a cycle containing e, we may as well number its vertices
v = v1 , v2 , · · · , vn = w, and then the walk v1 , v2 , · · · , vn shows that v and
w are linked without using the edge e. Conversely, suppose that v and w
are linked in G − e. By part (1), there is a path in G − e with vertices
v = v1 , v2 , · · · , vn = w, and we can add the edge e to this path to form a
cycle. (We cannot have n = 2, because then the path would contain e.)
The reason for the name is that such an edge e is the only way to get from
v to w, as if they were separated by water and e was the only bridge.
[picture of a graph]

In this graph, the bridges are {a, d}, {b, c}, {c, d}, and {d, z}.
(1) If G is connected, then n − 1 ≤ k ≤ (n choose 2).

(2) If G has s connected components, then n − s ≤ k ≤ (n − s + 1 choose 2).
Proof*. In part (1), we already know the upper bound k ≤ (n choose 2), so the only new thing to prove is that connectedness requires at least n − 1 edges. We can prove this by induction on n; the n = 1 base case is obvious. Assume that n ≥ 2 and that we know the result for graphs with fewer than n vertices.
Suppose that G contains a bridge e = {v, w}. Since G is connected, every
vertex in G − e must be linked to either v or w; that is, G − e has two
connected components, the component G1 containing v and the component
G2 containing w. Suppose that Gi has ni vertices and ki edges, for i = 1, 2.
Then n1 + n2 = n, so n1 , n2 < n; thus G1 and G2 are graphs to which
the induction hypothesis applies, and we conclude that k1 ≥ n1 − 1 and
k2 ≥ n2 − 1. Hence
k = k1 + k2 + 1 ≥ (n1 − 1) + (n2 − 1) + 1 = n1 + n2 − 1 = n − 1.
On the other hand, if G does not contain a bridge, then if we delete any
edge it remains connected; if we continue to delete edges, we must eventually
reach a spanning subgraph which does contain a bridge. Since the number
of edges in this subgraph is at least n − 1, the number of edges in G is even
more. So in either case, k ≥ n − 1 and the inductive step is complete.
If δ(G) = ∆(G), i.e. every vertex has the same degree d, we say that G is
regular of degree d.
[picture of a graph on five vertices]

then the degrees of the vertices are
Example 1.42. There is a famous graph called the Petersen graph which
has 10 vertices and is regular of degree 3:
In the latter graph there are only two 5-cycles, the inner ring and the outer
ring; but in the Petersen graph, there are many ways other than the most
visible way to separate the vertices into two 5-cycles.
It is obvious that isomorphic graphs have the same degree sequence, so con-
sidering the degree sequence can be a useful way to distinguish between
non-isomorphic graphs. (Unfortunately, the preceding example shows that
two non-isomorphic graphs can have the same degree sequence, so it is not a
panacea.) The degree sequence contains some of the pieces of information we
have previously used for classification: the number of vertices is the number
of terms in the degree sequence, and the number of edges can be derived
from the sum of the terms, by the following result.
Theorem 1.43 (Hand-shaking Lemma). For any graph, the number of edges
is half the sum of the degrees of all the vertices.
[picture of a graph realizing the degree sequence (1, 2, 2, 3, 3, 3)]
So the answer to the original question is that (1, 2, 2, 3, 3, 3) is graphic.
with the same degree, we can arrange that W = {v_{n−1}, v_{n−2}, ···, v_{n−d_n}} as required. On the other hand, if the inequality in (1.2) is strict, there must be some vertex w in W whose degree is strictly less than that of some vertex x ∈ {v_1, ···, v_{n−1}} \ W. The inequality deg(w) < deg(x) implies that there is some vertex y ≠ w which is adjacent to x but not to w. Thus v_n, w, x, y are four distinct vertices of G such that {w, v_n} and {x, y} are edges of G, but {x, v_n} and {w, y} are not. Now we modify G by deleting the edges {w, v_n} and {x, y}, and adding the edges {x, v_n} and {w, y}. Since each of v_n, w, x, y is involved in one of the deleted edges and one of the added edges, this modification has not changed any of the degrees; however, since w has been replaced by x in the set W, the quantity d has strictly increased. After repeating this step a finite number of times, we must reach a graph for which equality holds in (1.2), and then the result follows as seen before.
We can deduce the following recursive rule for determining whether a se-
quence is graphic.
Theorem 1.50. Let (d1 , d2 , · · · , dn ) be a weakly increasing sequence of non-
negative integers. We have three cases.
Proof*. Case (1) is obvious, because the null graph Nn has degree sequence
(0, 0, · · · , 0). We have already seen the reason for case (2): a vertex of
In case (3), Theorem 1.50 does not immediately decide whether (d1 , · · · , dn )
is graphic or not, but it reduces the question to the same question for the
shorter sequence (e1 , · · · , en−1 ); we can then apply Theorem 1.50 again to
this shorter sequence and so on, and since the sequence can’t go on getting
shorter indefinitely, we must eventually land in either case (1) or case (2).
(Of course, we can stop before that point if the sequence becomes obviously
graphic or not graphic by some other reasoning.)
Example 1.51. The sequence (1, 1, 1, 1, 4, 4) falls into case (3). To obtain
the shorter sequence, we apply the three steps prescribed in Theorem 1.50:
The sequence (0, 0, 0, 1, 3) falls into case (2), so it is not graphic; hence the
original sequence (1, 1, 1, 1, 4, 4) is not graphic.
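The recursive rule of Theorem 1.50 is, in effect, an algorithm (often called the Havel-Hakimi algorithm), and it is short to implement; a minimal Python sketch:

    def is_graphic(degrees):
        # Havel-Hakimi test, following the idea of Theorem 1.50.
        d = sorted(degrees)
        while d:
            largest = d.pop()        # remove a vertex of largest degree
            if largest == 0:         # every remaining degree is 0: graphic
                return True
            if largest > len(d):     # more neighbours needed than vertices exist
                return False
            for i in range(len(d) - largest, len(d)):
                d[i] -= 1            # join it to the next-highest-degree vertices
            if any(x < 0 for x in d):
                return False         # some needed neighbour had degree 0
            d.sort()
        return True

    print(is_graphic([1, 2, 2, 3, 3, 3]))   # True, as found earlier
    print(is_graphic([1, 1, 1, 1, 4, 4]))   # False, as in Example 1.51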
Remark 1.52. Since Havel and Hakimi provided the solution to the basic
question of whether a sequence is the degree sequence of a graph, there
has been much research on more refined questions: for instance, replacing
“graph” by “connected graph”. Unfortunately we will have to omit these
later developments.
Chapter 2
Special Walks in Graphs
The origin of graph theory was a question posed to the great 18th-century
mathematician Euler by the citizens of Königsberg. Their city was built
around two islands in the river Pregel, and included seven bridges joining
the islands to the banks and to each other; they wanted to know whether
it was possible to walk through the city crossing every bridge exactly once.
To decide this, Euler was led to introduce the concept of a graph. Many
subsequent applications of graph theory also involve special kinds of walks.
(1) uses every edge exactly once, i.e. for every {v, w} in E there is exactly one i such that {v_i, v_{i+1}} = {v, w};
It is clear that if an Eulerian circuit does exist, you can make it start (and
therefore also finish) at any vertex you want.
Example 2.2. Consider the following two graphs.
[pictures of two graphs on the vertices a, b, c, d, e]
The first is clearly Eulerian: an example of an Eulerian circuit is the walk
a, b, c, a, d, e, a. But if you try to find an Eulerian circuit in the second graph,
you will get stuck: the extra edge {c, d} can’t be fitted into the walk with-
out repeating some other edge. The best you can do is a walk such as
d, e, a, d, c, a, b, c, which uses every edge exactly once but doesn’t return to
its starting point. So the second graph is not Eulerian.
Example 2.3. The cycle graph Cn is obviously Eulerian: a walk around the
cycle is an Eulerian circuit.
Remark 2.4. We restricted to connected graphs in Definition 2.1, because
if a graph has more than one connected component which contains an edge,
there clearly cannot be a walk which uses all the edges of the graph.
Remark 2.5*. We may seem to be avoiding some interesting examples of
this problem by not allowing multiple edges in our graphs. Indeed, the
bridges of Königsberg formed a multigraph and not a graph, because some
of the landmasses were joined by more than one bridge. But given any multi-
graph, the question of whether it has an Eulerian circuit can be rephrased in
terms of a graph, namely the one where you sub-divide every multiple edge
into two edges by putting a new vertex in the middle of it.
The reason that the first graph in Example 2.2 is so obviously Eulerian is that
its edges can be split up into two 3-cycles. Euler realized that this property
is crucial to the existence of an Eulerian circuit, and that it depends on the
evenness of the vertex degrees; what is wrong with the vertices c and d in
the second graph in Example 2.2 is that their degrees are odd.
Proof. We first prove the “only if” direction, that in an Eulerian graph all
the degrees are even. Let v0 , v1 , · · · , vℓ be an Eulerian circuit. A vertex v
may appear more than once in this circuit, but every time it does appear,
two of the edges ending at v are used: namely, {vi−1 , vi } and {vi , vi+1 } if
v = vi , or {v0 , v1 } and {vℓ−1 , vℓ } if v = v0 = vℓ . Since every edge is used
exactly once, the total number of edges ending at v must be even.
To prove the “if” direction, we use induction on the number of edges. The
base case of this induction is the case where there are no edges at all, i.e. the
graph consists of a single vertex; this graph is trivially Eulerian (the Eulerian
circuit has length 0). So we assume that G does have some edges, and that
all the vertex degrees are even. Our induction hypothesis says that every
connected graph with fewer edges than G and all degrees even is Eulerian,
and we want to prove that G is also. We start by showing that G contains
a cycle. Pick any vertex v0 ; since G is connected, there must be some other
vertex v1 which is adjacent to v0 ; since deg(v1 ) can’t be 1, there must be some
vertex v2 6= v0 which is adjacent to v1 ; since deg(v2 ) can’t be 1, there must
be some vertex v3 6= v1 which is adjacent to v2 ; and we can continue in this
way indefinitely. Because there are only finitely many vertices, there must
be some k < ℓ such that vk = vℓ , and we can assume that vk , vk+1 , · · · , vℓ−1
are all distinct. The subgraph C which consists of these distinct vertices and
the edges between them (including the edge {vℓ−1 , vk }) is a cycle.
Now let H be the subgraph of G obtained by deleting all the edges in C (but
keeping all the vertices). The vertex degrees in H are the same as those in
G, except that 2 is subtracted from the degree of every vertex in C; so the
vertex degrees in H are all even. H need not be connected, but if we let
H1 , H2 , · · · , Hs be the connected components of H, then each Hi is Eulerian
by the induction hypothesis. Since G is connected, every Hi must contain
at least one of the vertices vk , · · · , vℓ−1 of C; let ki be such that vki lies in
Hi , and renumber if necessary so that k1 < k2 < · · · < ks . Then we can
construct an Eulerian circuit in G as follows:
vk, · · · , vk1, [E1], vk1, · · · , vk2, [E2], vk2, · · · , vks, [Es], vks, · · · , vℓ−1, vk,
where · · · indicates a walk around the appropriate part of the cycle C, and [Ei] indicates an Eulerian circuit in the appropriate Hi starting and finishing at vki. This completes the induction step.
Theorem 2.6 not only gives us a simple criterion for the existence of an
Eulerian circuit; its proof supplies a recursive way of finding one.
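Here is a minimal Python sketch of this recursive construction (ours, not part of the notes; the function name and the adjacency-set representation of a graph are our own choices). It extracts a closed walk from the starting vertex and splices in Eulerian circuits of what remains, exactly as in the proof.

    def eulerian_circuit(adj, start):
        # adj maps each vertex to the set of its neighbours; we assume the
        # graph is connected and every degree is even (Theorem 2.6)
        if not adj[start]:
            return [start]               # no edges left at start: length-0 circuit
        walk, v = [start], start
        while True:                      # with all degrees even, a walk can
            w = adj[v].pop()             # only get stuck back at its start
            adj[w].discard(v)            # delete the edge {v, w} as it is used
            walk.append(w)
            v = w
            if v == start:
                break
        circuit = []                     # splice an Eulerian circuit of the
        for u in walk:                   # remaining edges in at each vertex
            circuit.extend(eulerian_circuit(adj, u))
        return circuit

    # the first graph of Example 2.2; one possible output is a, b, c, a, d, e, a
    adj = {'a': {'b', 'c', 'd', 'e'}, 'b': {'a', 'c'}, 'c': {'a', 'b'},
           'd': {'a', 'e'}, 'e': {'a', 'd'}}
    print(eulerian_circuit(adj, 'a'))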
Example 2.7. Let G be the following graph, in which all degrees are even.
[Diagram: the graph G with edges {b, w}, {w, v}, {v, e}, {e, b}, {a, b}, {b, z}, {z, c}, {c, a}, {c, d}, {d, e}, {e, c}, {z, v}, {v, y}, {y, z}, {z, w}, {w, x}, {x, z}; every vertex has even degree.]
Theorem 2.6 guarantees that an Eulerian circuit exists. To use the construc-
tion in the proof, we need to choose a cycle C, say the 4-cycle with vertices
b, w, v, e. After removing the edges of C, we obtain the graph H:
[Diagram: the graph H, obtained from G by deleting the four edges of the cycle b, w, v, e.]
We now need to find an Eulerian circuit in H; again, the first step is to
remove a cycle, say the 4-cycle with vertices a, b, z, c.
[Diagram: H with the four edges of the cycle a, b, z, c also deleted; what remains is the isolated vertices a and b, the triangle on c, d, e, and the subgraph on v, w, x, y, z.]
This leaves four connected components, and Eulerian circuits in these are
obvious: the length-0 walks a and b in those components, the walk c, d, e, c,
and the walk z, v, y, z, w, x, z. We now patch these into the appropriate places
in the cycle a, b, z, c to form an Eulerian circuit in H:
a, b, z, v, y, z, w, x, z, c, d, e, c, a.
Finally, we make this circuit start at b instead, and patch it into the original
cycle b, w, v, e to form an Eulerian circuit in G:
b, z, v, y, z, w, x, z, c, d, e, c, a, b, w, v, e, b.
Of course, we made many choices along the way, so the result is not unique.
There is an easy variant of Theorem 2.6 for the case of an Eulerian trail,
which is a walk that uses every edge exactly once but finishes at a different
vertex from where it started. (We saw such a walk in Example 2.2.)
Theorem 2.8. A connected graph G has an Eulerian trail if and only if it has exactly two vertices of odd degree; in that case, the trail must start at one of these two vertices and finish at the other.
Proof. The “only if” direction, that a graph with an Eulerian trail must
have exactly two vertices of odd degree, works in the same way as in the
proof of Theorem 2.6. The difference is that the first vertex and last vertex
of the trail are no longer the same, so their appearances at the beginning and
end of the trail use up only 1 of their edges, and their total number of edges
ends up odd. Notice that this proves the second sentence in the statement:
the first and last vertex of the trail must be the two odd-degree vertices.
For the “if” direction, suppose that G is connected and has exactly two
vertices of odd degree, say v and w. If v and w are not adjacent, then we
can add the edge {v, w} to form the graph G + {v, w}; since this increases
the degrees of v and w by 1, all vertices in G + {v, w} have even degree. By
Theorem 2.6, there is an Eulerian circuit in G + {v, w}. Since this circuit
includes the edge {v, w} at some point, we may as well suppose that it starts
v, w, · · · , v; we then obtain an Eulerian trail w, · · · , v in G by deleting {v, w}
from the circuit. If v and w are adjacent in G, the argument is similar, but
instead we add a new vertex x as well as the edges {v, x} and {w, x}. (This
is to avoid having multiple edges.) Since all the degrees in the new graph are
even, it has an Eulerian circuit. Since this circuit visits x only once, we may
as well suppose that it starts v, x, w, · · · , v; again, deleting the superfluous
edges gives an Eulerian trail in G.
Example 2.9. What is the condition on n for the graph Kn to have:
an Eulerian circuit?
an Eulerian trail?
neither?
Definition 2.10. A graph G with more than 2 vertices is Hamiltonian if there is a walk in G which starts and finishes at the same vertex and visits every other vertex exactly once.
Since such a walk never repeats a vertex except for returning to where it
began, it can’t possibly repeat an edge (here we need the assumption that
there are more than 2 vertices). So the vertices and edges involved in the
walk form a cycle. Thus we can rephrase the definition more simply: G is
Hamiltonian if and only if it contains a spanning cycle (recall that “spanning”
just means “using all the vertices of G”). In other words, G is Hamiltonian
if and only if you can delete some of the edges of G (without deleting any of
the vertices) and obtain a graph isomorphic to Cn , where n is the number of
vertices of G. More precisely, “some of the edges” could read “all but n of
the edges”, since Cn has n edges. Despite the fact that this condition is as
easy to state as the existence of an Eulerian circuit, it is significantly more
difficult to give a criterion for when it holds.
One obvious principle is that if you add more edges to a Hamiltonian graph,
it remains Hamiltonian. Heuristically speaking, the more edges G has, the
more likely it is that n of them form a cycle.
Example 2.11. For any n ≥ 3, the complete graph Kn is certainly Hamiltonian. In fact, since all n(n − 1)/2 possible edges are present, you can write down the vertices in any order you like, and walk from the first to the second to the third and so on to the last and back to the first again. So there are n! walks which satisfy the Hamiltonian condition. (The number of spanning cycles is n!/2n = (n − 1)!/2, because every spanning cycle gives rise to 2n different Hamiltonian walks, as explained in Remark 1.32.)
Example 2.12. The following graph is not Hamiltonian. [Diagram: two 3-cycles, on the vertices a, b, c and on the vertices c, d, e, joined at the shared vertex c.]
Indeed, the only two cycles in this graph are the visible 3-cycles, so there
is no cycle containing all the vertices. (There is an obvious Eulerian circuit
a, b, c, d, e, c, a, but that is not a cycle because it repeats the vertex c.)
Example 2.13. The graph of vertices and edges of a cube is Hamiltonian: [Diagram: a spanning cycle in the cube.]
The same holds for the other regular polyhedra: tetrahedron, octahedron,
dodecahedron, and icosahedron. The origin of the term “Hamiltonian” was
a puzzle invented by the Irish mathematician Hamilton, in which you had to
find a spanning cycle in the vertices and edges of a dodecahedron.
Example 2.14*. Generalizing the ordinary 3-dimensional cube, there is an
m-dimensional cube for any positive integer m, which sits in R^m. Its vertices
are the m-tuples (x1, x2, · · · , xm) where every xi is either 0 or 1 (hence there
are 2^m vertices). Two such m-tuples are adjacent if and only if they differ in
exactly one coordinate. Forgetting about the m-dimensional object itself, the
vertices and edges form a graph called the cube graph Qm . These cube graphs
have a natural recursive structure: the vertices where the last coordinate is
0 form a subgraph isomorphic to Qm−1 , as do the vertices where the last
coordinate is 1. So we can construct Qm by taking two copies of Qm−1 and
joining corresponding vertices. When m = 3, this is the standard way to draw
a perspective picture of a cube: draw two squares and join the corresponding
vertices. Here is a picture of Q4 constructed in the same way: [Diagram: two copies of Q3 with corresponding vertices joined.]
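This adjacency rule is easy to turn into code; the following minimal Python sketch (ours, not from the notes; the function name is our own choice) builds Qm directly.

    from itertools import product

    def cube_graph(m):
        # vertices are m-tuples of 0s and 1s; two vertices are adjacent
        # exactly when they differ in one coordinate
        vertices = list(product((0, 1), repeat=m))
        edges = {frozenset({v, w}) for v in vertices for w in vertices
                 if sum(a != b for a, b in zip(v, w)) == 1}
        return vertices, edges

    V, E = cube_graph(4)
    print(len(V), len(E))    # Q4 has 2^4 = 16 vertices and 4·2^3 = 32 edges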
Example 2.15. The complete bipartite graph Kp,q , for positive integers p
and q, is the graph with vertex set {1, 2, · · · , p + q} where any vertex m ≤ p
is adjacent to any vertex m′ > p, but no two vertices ≤ p are adjacent, nor
are any two vertices > p. Thus the vertices fall into two parts (hence the
name “bipartite”), {1, 2, · · · , p} and {p + 1, p + 2, · · · , p + q}, where there are
no edges between vertices in the same part but every possible edge between
vertices in different parts. For instance, here is a picture of K5,3 .
[Diagram: K5,3, with the part {1, 2, 3, 4, 5} drawn above the part {6, 7, 8} and every vertex of one part joined to every vertex of the other.]
In every cycle in Kp,q , the vertices must alternate between one part and the
other. So it is impossible for there to be a spanning cycle unless there are the
same number of vertices in both parts, i.e. p = q. This shows that when p ≠ q,
Kp,q and all its spanning subgraphs are not Hamiltonian. Up to isomorphism,
a spanning subgraph of Kp,q is any graph where the vertices can be divided
into two parts of sizes p and q in such a way that no two vertices in the same
part are adjacent; such graphs are called bipartite. So any bipartite graph
where the two parts have different numbers of vertices is not Hamiltonian.
(This doesn’t mean, by the way, that a bipartite graph where the two parts
have the same number of vertices is Hamiltonian; for p ≥ 2, the complete
bipartite graph Kp,p is obviously Hamiltonian, but subgraphs of it need not
be.)
Here is a simple necessary condition for a graph to be Hamiltonian.
Theorem 2.16. Suppose that the graph G is Hamiltonian. Then:
(1) For any vertex v of G, the graph G − v obtained by deleting v and its edges is connected.
(2) More generally, for any nonempty subset S of the set of vertices of G, the graph G − S obtained by deleting all the vertices in S and their edges has at most |S| connected components.
To prove part (1), let C be a spanning cycle of G: then C − v is a path containing all the vertices of G − v, so G − v has a connected spanning subgraph and is therefore connected.
The proof of part (2) is similar: instead of the fact that C − v is connected,
we use the fact that C − S has at most |S| connected components, which is
clear from a picture of a cycle; this implies that G − S must have at most
|S| connected components also.
A nonempty set S of vertices such that G − S has more than |S| connected components therefore certifies that G is not Hamiltonian. However, the converse fails: no such set S exists in the Petersen graph, which consists of two
linked 5-cycles joined by five edges: if you delete only vertices in the outer 5-cycle or only ver-
tices in the inner 5-cycle the graph stays connected; and if you delete m1 ≥ 1
outer vertices and m2 ≥ 1 inner vertices, then the outer cycle breaks into
at most m1 components and the inner cycle breaks into at most m2 compo-
nents, so the graph as a whole has at most m1 + m2 components. But the
Petersen graph is not Hamiltonian, as may be seen as follows. Suppose there
is a cycle C in the Petersen graph using all 10 vertices, and hence having
10 edges. At each vertex, C must use two of the three edges: in particular,
at each outer vertex, C must use at least one of the two edges in the outer
cycle. So C must contain either 4 of the edges of the outer cycle, or 3 of the
edges (two meeting at a vertex and the other disjoint from these). The choice
of which edges in the outer cycle belong to C determines which outer-inner
edges belong to C, and hence which edges in the inner cycle belong to C; it
is easy to check that both cases result in contradictions.
In the other direction, here is a sufficient, but not necessary, condition for a
graph to be Hamiltonian.
Theorem 2.19 (Ore’s Theorem). Let G be a connected graph with n vertices
where n ≥ 3. If every non-adjacent pair of vertices {v, w} satisfies deg(v) +
deg(w) ≥ n, then G is Hamiltonian.
Proof*. Suppose for a contradiction that the theorem fails. Note that adding an edge to G cannot destroy the hypothesis, and that if we keep adding edges we must eventually reach a Hamiltonian graph (at worst, the complete graph Kn). So we may replace G by a graph with as many edges as possible subject to not being Hamiltonian.
Our situation now is that G is not Hamiltonian, but for some non-adjacent
pair of vertices {v, w}, G + {v, w} is Hamiltonian. Let C be a spanning
cycle of G + {v, w}; since C is not contained in G, it must use the edge
{v, w}. So C − {v, w} is a spanning path in G with end-vertices v and w.
Write the vertices of this path in order as v1, v2, · · · , vn, where v1 = v and vn = w. Let A = {i : 2 ≤ i ≤ n − 2 and v1 is adjacent to vi+1}, and let B = {i : 2 ≤ i ≤ n − 2 and vn is adjacent to vi}.
Since all the vertices adjacent to v1 except v2 are of the form vi+1 for some
i ∈ A, |A| = deg(v1 ) − 1. Similarly all the vertices adjacent to vn except
vn−1 are of the form vi for some i ∈ B, so |B| = deg(vn ) − 1. Moreover,
A ∪ B ⊆ {2, 3, · · · , n − 2}, so |A ∪ B| ≤ n − 3. Hence
|A ∩ B| = deg(v1 ) + deg(vn ) − 2 − |A ∪ B| ≥ n − 2 − (n − 3) = 1.
So there is some i in A ∩ B: then v1 is adjacent to vi+1 and vn is adjacent to vi, and v1, v2, · · · , vi, vn, vn−1, · · · , vi+1, v1 is a spanning cycle in G. This contradicts the assumption that G is not Hamiltonian, completing the proof.
So far we have been treating all the edges of a graph on the same footing, but in many applications you need to weight the edges to take account of their differing costs or lengths.
Definition 2.22. A weighted graph is a graph G together with a function f which assigns to each edge e a positive real number f(e), called its weight. The weight of a walk in G is the sum of the weights of the edges it uses, and the distance d(v, w) between two vertices v and w is the smallest weight of any walk with first vertex v and last vertex w.
Note that d(v, v) = 0 (the length-0 walk from v to itself has weight 0). It is
clear that any minimal walk from v to w must be a walk along a path, since
visiting a vertex twice would increase the weight needlessly. So d(v, w) could
also be defined as the minimal weight of a path with end-vertices v and w.
Remark 2.23. An ordinary un-weighted graph can be viewed as a weighted
graph by saying that all edges have weight 1. Then the weight of a walk is
just its length, and the distance d(v, w) is just the smallest number of edges
you have to traverse to get from v to w. The term “distance” should arguably
be restricted to this un-weighted context; its use in the weighted context is
influenced by those applications where the weights of the edges are literally
distances (between cities, and so forth).
We will represent weighted graphs by labelling each edge with its weight
(with no attempt to make the visual length of the edge proportional to it).
Example 2.24. Consider the following weighted graph.
[Diagram: the weighted graph with edges {A, B} of weight 1, {B, C} of weight 2, {C, Z} of weight 5, {A, D} of weight 8, {D, Z} of weight 4, {B, D} of weight 4, and {C, D} of weight 1.]
To find d(A, D), we need to consider walks from A to D (and we need not
consider walks which repeat a vertex). The walk along the edge {A, D} itself has weight 8; the walk A, B, D has weight 1 + 4 = 5; and the walk A, B, C, D has weight 1 + 2 + 1 = 4. Checking the remaining possibilities shows that nothing smaller is attainable, so d(A, D) = 4.
Remark 2.25. Notice that in Example 2.24, since the edge {A, D} does
not give a minimal walk from A to D, it can never occur in any minimal
walk: it is always preferable to get from A to D via B and C. So for the
purposes of minimal walks, the edge {A, D} may as well be deleted. From
the opposite point of view, you could imagine that we are always dealing
with complete graphs, where some of the edges have such large weights that
they are guaranteed never to be used.
Dijkstra's algorithm computes the distances d(A, v) from a chosen starting vertex A, by giving each vertex a temporary and eventually a permanent label:
(1) Give the starting vertex A the permanent label 0.
(2) If every vertex has a permanent label, stop.
(3) Let v be the vertex which has most recently acquired a permanent label, and let ℓ be that label.
(4) For each vertex w which is adjacent to v: (a) if w already has a permanent label, leave it alone; (b) if w has no label yet, give it the temporary label ℓ + f({v, w}); (c) if w has a temporary label k, replace it by ℓ + f({v, w}) if that is smaller than k.
(5) Of all the vertices which have temporary labels, choose one whose label is smallest, and make that the permanent label of that vertex. Return to Step (2).
At the end of this algorithm, the permanent label of every vertex v is guar-
anteed to be d(A, v). We will not give a rigorous proof, but here is the idea.
At any point in the algorithm, the label of a vertex w is the minimum weight
of all walks from A to w which you have ‘considered so far’ (only implic-
itly considered, because the point of having the algorithm is that you don’t
actually need to consider all these walks). The label becomes permanent
at a point where you are certain that there are no other walks with smaller
weights. In Step (4), you are ‘considering’ walking from A to w by first doing
a minimal walk from A to v and then using the edge {v, w}. This walk has
weight ℓ + f ({v, w}), so you have to decide whether that is smaller than the
previously known minimum k; if it is, it becomes the new minimum.
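Here is a minimal Python sketch of the algorithm just described (ours, not part of the notes; the dictionary-of-dictionaries representation of a weighted graph is our own choice, and the graph is assumed connected):

    def dijkstra(graph, A):
        # graph maps each vertex to a dict of the form {neighbour: weight}
        permanent = {A: 0}                         # Step (1)
        temporary = {}
        v, ell = A, 0                              # most recent permanent label
        while len(permanent) < len(graph):         # Step (2)
            for w, weight in graph[v].items():     # Step (4)
                if w in permanent:
                    continue                       # case (a): leave it alone
                k = ell + weight
                if w not in temporary or k < temporary[w]:
                    temporary[w] = k               # cases (b) and (c)
            v = min(temporary, key=temporary.get)  # Step (5): smallest temporary
            ell = temporary.pop(v)
            permanent[v] = ell
        return permanent

    # the weighted graph of Example 2.24
    graph = {'A': {'B': 1, 'D': 8}, 'B': {'A': 1, 'C': 2, 'D': 4},
             'C': {'B': 2, 'D': 1, 'Z': 5}, 'D': {'A': 8, 'B': 4, 'C': 1, 'Z': 4},
             'Z': {'C': 5, 'D': 4}}
    print(dijkstra(graph, 'A'))    # {'A': 0, 'B': 1, 'C': 3, 'D': 4, 'Z': 8}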
Example 2.26. Let us apply the algorithm to the graph in Example 2.24.
(Of course, there is no need for the algorithm in such a small example, but
it will illustrate the general procedure.) We will put temporary labels in
parentheses and permanent labels in square brackets. The first time we
come to Step (4), v is A itself which has the label [0]; neither of the adjacent
vertices has a label yet, so they both get temporary labels under case (b).
[Diagram: the graph of Example 2.24 with A labelled [0], B labelled (1), and D labelled (8).]
Step (5) then makes the label of B permanent, since it is the smallest tempo-
rary label. Thus the next time we come to Step (4), v is B. Of the vertices
adjacent to B, vertex A is left alone under case (a); vertex C now gets a
temporary label under case (b); and vertex D gets a new temporary label
under case (c), because 1 + 4 is smaller than 8.
[Diagram: A labelled [0], B labelled [1], C labelled (3), and D labelled (5).]
Step (5) then makes the label of C permanent, so it is the new v; the next
pass through Step (4) gives Z a label, and changes the label of D again.
[Diagram: A labelled [0], B labelled [1], C labelled [3], D labelled (4), and Z labelled (8).]
Step (5) then makes the label of D permanent; the next pass through Step (4)
leaves the label of Z unchanged, and the final Step (5) makes that permanent
also. The labels now display the distances from A to each vertex.
A good feature of Dijkstra’s algorithm is that it not only tells you the min-
imum weight of a walk from A to Z, it allows you to find all walks which
attain this minimum. The reason is that every minimal walk from A to Z
must be of the form A, · · · , Y, Z where A, · · · , Y is a minimal walk from A to
Y . Moreover, the only possible vertices Y which can occur in this way are the
vertices which are adjacent to Z and satisfy d(A, Y ) + f ({Y, Z}) = d(A, Z).
We can then use the same principle to find the minimal walks from A to Y ;
there is no chance of getting into a loop in this recursion, because d(A, Y ) is
less than d(A, Z).
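This backwards recovery is equally short to code. The following sketch (ours) reuses the dijkstra function and the graph from the earlier sketch; the recursion terminates because d(A, Y) is always less than d(A, Z):

    def minimal_walks(graph, d, A, Z):
        # d is the dict of distances returned by dijkstra(graph, A)
        if Z == A:
            return [[A]]
        walks = []
        for Y, weight in graph[Z].items():
            if d[Y] + weight == d[Z]:          # Y can be the second-last vertex
                walks += [w + [Z] for w in minimal_walks(graph, d, A, Y)]
        return walks

    d = dijkstra(graph, 'A')
    print(minimal_walks(graph, d, 'A', 'Z'))   # all minimal walks from A to Z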
Example 2.27. Consider the following weighted graph, to which Dijkstra’s
algorithm has already been applied.
[Diagram: the weighted graph with edges {A, B} of weight 1, {B, C} of weight 5, {C, Z} of weight 3, {A, D} of weight 3, {B, D} of weight 3, {C, D} of weight 1, and {D, Z} of weight 4; the permanent labels are A [0], B [1], C [4], D [3], Z [7].]
To find all the minimal walks from A to Z, we note that the second-last vertex
Y could be either C or D, because both of these satisfy d(A, Y )+f ({Y, Z}) =
d(A, Z). In the minimal walks from A to C, the second-last vertex can only
be D, because
d(A, B) + f({B, C}) = 1 + 5 = 6 > 4, d(A, Z) + f({Z, C}) = 7 + 3 = 10 > 4.
Similarly, in the minimal walks from A to D the second-last vertex can only
be A itself. Thus
the unique minimal walk from A to D is A, D;
the unique minimal walk from A to C is A, D, C;
so the two minimal walks from A to Z are A, D, Z and A, D, C, Z.
Note that this method finds the walks ‘backwards’. It is tempting to try to
find minimal walks ‘forwards’ by starting at A and following the smallest-
weight edges; but Example 2.27 shows that this wouldn’t always work. (To
make it workable we would need to know the distances d(B, Z) for all vertices
B, i.e. we would need to have run Dijkstra’s algorithm for Z instead.)
Remark 2.28. We will not go into the details of running times for these
algorithms, but it is easy enough to see that if n is the number of vertices,
both Dijkstra’s algorithm and the above method of finding minimal walks
have running times which grow no faster than n2 . The reason is that they
both essentially consist of a loop which is executed no more than n times,
and in each pass through the loop, any vertex is considered at most once.
The Chinese Postman Problem asks for a walk of the smallest possible weight which starts and finishes at the same vertex and uses every edge of the graph at least once; the Travelling Salesman Problem asks for a walk of the smallest possible weight which starts and finishes at the same vertex and visits every vertex at least once.
The names come from the imagined situations of a postman doing the rounds
of various streets, and a salesman visiting various potential clients. (The
postman is Chinese in honour of the graph theorist Mei-Ko Kuan.) As with
the Eulerian and Hamiltonian problems, the starting point can be arbitrary.
Notice that in the Chinese Postman Problem, the walk has to use every edge
of the graph, but is allowed to use an edge more than once. It is obvious that
the weight of such a walk is at least the sum of the weights of all the edges.
So if the graph is Eulerian, the solutions of the Chinese Postman Problem
are exactly the Eulerian circuits (which we already have a way of finding). If
the graph is not Eulerian, i.e. it has some vertices of odd degree, then there
will inevitably be some back-tracking in the walk. The idea of the general
solution is to find the distances between the odd-degree vertices by Dijkstra’s
algorithm, and use that information to decide what is the most economical
back-tracking to do. Here is a precise result in the case of graphs which have
an Eulerian trail (see Theorem 2.8).
Theorem 2.29. Suppose that the weighted graph G is connected and has exactly two vertices v and w of odd degree. Then a solution of the Chinese Postman Problem for G is obtained by following an Eulerian trail from v to w and then returning from w to v along a minimal walk.
Proof**. Let E be the set of edges of G; then the proposed walk has weight
∑_{e∈E} f(e) + d(v, w). We need to prove that any walk in G which returns
to its starting point and uses every edge has weight at least as large as this.
The proof is easier if we allow ourselves to talk about multigraphs. Imagine a
duplicate copy of the set of vertices in G, on which we construct a multigraph
G′ by following the walk in G and drawing an edge in G′ as the corresponding
edge in G is used – thus edges in G which are used more than once correspond
to multiple edges in G′ . We also give each edge in G′ the same weight as the
corresponding edge in G. Our walk in G then corresponds to an Eulerian
circuit in the multigraph G′ , and its weight is the weight of G′ . For the
same reason as for ordinary graphs, the degrees of an Eulerian multigraph
are all even, so every degree in G′ is even (we continue to define the degree
of u as the number of edges ending at u). Now we form a new (possibly
not connected) multigraph G′′ by deleting from G′ one copy of every edge of G. The weight of G′′ expresses the amount by which the weight of our walk exceeds ∑_{e∈E} f(e), so we need to prove that the weight of G′′ is at least d(v, w). It clearly suffices to prove that there is a walk from v to w in
G′′ . Since degG′′ (u) = degG′ (u) − degG (u) for every vertex u, we know that
degG′′ (v) and degG′′ (w) are odd while degG′′ (u) is even for every other vertex
u. So we construct our new walk in G′′ as follows. Starting from v, we use
any edge from v to an adjacent vertex v1 in G′′ (there must be such an edge
since degG′′ (v) is odd); if the vertex v1 is not w, then we can imagine deleting
the edge we used, which makes degG′′ (v) even and degG′′ (v1 ) odd, and then
we can continue the walk with any edge from v1 to an adjacent vertex v2 ,
and so on; we must eventually reach w, because we can’t go on deleting edges
indefinitely.
Example 2.30. Return to the graph from Example 2.27, in which B and C
are the odd-degree vertices.
The proposed walk follows an Eulerian trail from B to C and then returns along the minimal walk C, D, B. Its weight is ∑_{e∈E} f(e) + d(B, C), where d(B, C) = 4.
In the complete graph K there is no reason to visit a vertex more than once,
so the Travelling Salesman Problem in K amounts to finding a minimum-
weight spanning cycle. The brute-force approach would be to consider every
spanning cycle. Unfortunately, as seen in Example 2.11, the number of span-
ning cycles in Kn is (n − 1)!/2, which grows super-exponentially as the number of
vertices n increases. We are thus a long way from the situation of the Chinese
Postman Problem, where there is a solution with cubic growth (see Remark
2.31). In fact, it is widely believed that there is no algorithm for solving the
Travelling Salesman Problem whose running time has fixed-power growth: if
one was discovered, it would revolutionize mathematics and computing. We
will say a little more about this problem in the next chapter.
Chapter 3
Trees
One of the earliest uses of graph theory outside mathematics was determining
the possible molecular structures of various hydrocarbons; one of the most
recent has been determining the possible evolutionary relationships between
species based on their genetic closeness. In both cases, graphs without cycles
are particularly important.
A graph with no cycles is called a forest, and a connected graph with no cycles is called a tree. A vertex of degree 1 in a forest is called a leaf. [Diagram: a forest on the vertices 1, 2, 3, 4, 5, 6.]
Example 3.3. Of the graphs we have seen previously, the path graph Pn is
a tree for all n. The complete bipartite graph K1,q is also a tree for all q:
[Diagram: the stars K1,1 = P2, K1,2, K1,3, · · · , each with the centre 1 joined to the vertices 2, · · · , q + 1.]
The cycle Cn and the complete graph Kn for n ≥ 3 are obviously not trees.
Example 3.4. The list in Example 1.14 shows that any tree with 4 vertices
either is a path (hence is isomorphic to P4 ) or has a vertex which is adjacent
to all the others (hence is isomorphic to K1,3 ).
One reason that trees are so nice to work with is that if you remove a leaf
from a tree (and its single edge), what you have left is still a tree; this means
that trees are ideally suited for proofs by induction on the number of vertices.
Before we can use this, however, we need to ensure that leaves always exist.
Theorem 3.5. Every tree T with ≥ 2 vertices has at least two leaves.
Proof. Among all the paths in the graph T , there must be one which is of
maximal length; so if the vertices of the path are labelled v1 , v2 , · · · , vm in
order, with v1 and vm being the end-vertices, there is no path with more than
m vertices. Since T is not just a single vertex, it is clear that the maximal
m is at least 2. Now we claim that v1 is a leaf. Since we know that v1 is
adjacent to v2 , this amounts to saying that v1 is not adjacent to any other
vertex of T . But v1 can’t be adjacent to any vk with k ≥ 3, because then
we would have a cycle with vertices v1 , v2 , · · · , vk , and T contains no cycles.
Also v1 can’t be adjacent to any vertex v which is not one of the vi ’s, because
then we would have a path with vertices v, v1 , v2 , · · · , vm , contradicting the
maximality of m. So v1 is a leaf, as claimed; the same argument shows that
vm is a leaf, so T has at least two leaves.
We can now prove that trees and forests are exactly the graphs which attain
the minimum numbers of edges found in Theorem 1.37.
Theorem 3.6. A graph G with n vertices, k edges, and s connected components is a forest if and only if k = n − s. In particular, G is a tree if and only if it is connected and has n − 1 edges.
Proof. First suppose that G is a tree with n vertices; we show by induction on n that G has n − 1 edges. This is clear if n = 1. If n ≥ 2, then by Theorem 3.5, G has a leaf ℓ; removing ℓ and its single edge leaves a tree with n − 1 vertices, which by the induction hypothesis has n − 2 edges, so G has n − 1 edges.
We can deduce the “only if” direction of the general result. If G is a forest,
then its connected components G1 , G2 , · · · , Gs are all trees. So if Gi has
ni vertices, it must have ni − 1 edges. Thus the number of edges in G is
(n1 − 1) + · · · + (ns − 1) = (n1 + · · · + ns ) − s = n − s.
Now we prove the “if” direction. Suppose that k = n − s, and let {v, w} be
any edge of G (if G has no edges, it is clearly a forest). If we delete this edge,
the graph G − {v, w} has n − s − 1 edges. According to Theorem 1.37, this is
less than the minimum number of edges a graph with s connected components
can have, so G − {v, w} must have more connected components than G. The
only way this is possible is if v and w are not linked in G − {v, w}, i.e. {v, w}
is a bridge. So we have shown that every edge of G is a bridge. We saw in
Theorem 1.34 that this is equivalent to saying that no edge of G is contained
in a cycle, so G contains no cycles and is a forest.
Theorem 3.6 means that classifying trees with n vertices is the same as
classifying connected graphs with n vertices and n − 1 edges.
Example 3.7. One of the major problems in 19th century chemistry was to
find the structure of various molecules, knowing only the molecular formula,
i.e. how many atoms of various elements the molecule contained. Molecules
were thought of as connected graphs, where the atoms were the vertices and
the bonds between atoms were the edges. Each element had a known valency,
which in graph-theoretic terms meant that the degree of all atoms of that
element was the same. For instance, in hydrocarbons every vertex is either
a carbon atom (C) of degree 4 or a hydrogen atom (H) of degree 1. Suppose
we know that a molecule has formula Ck H2k+2 , for some positive integer k;
this means that the graph has k vertices of degree 4 and 2k + 2 vertices of
degree 1. We can then work out the number of edges by the Hand-shaking
Lemma: it is (4k + (2k + 2))/2 = 3k + 1. Since this is exactly 1 less than the number
of vertices, Theorem 3.6 implies that the graph is a tree. (This rules out
the possibility of cycles of carbon atoms, which give rise to special chemical
properties.) The hydrogen atoms are the leaves of the tree, so if you delete
them all and consider just the carbon atoms you still have a tree. In fact,
if you know the tree formed by the carbon atoms, you can reconstruct the
whole molecule by joining hydrogen atoms where necessary to make the C
degrees equal 4. For instance, there are only two possible structures for a
molecule with formula C4 H10 :
[Diagram: the two structural formulas: a chain of four carbon atoms, and three carbon atoms attached to a central one, in each case with hydrogen atoms added to bring every C degree up to 4.]
Both structures do exist: the molecules are called butane and isobutane.
To any tree T whose vertex set V consists of n ≥ 2 positive integers, we associate a sequence (p1, · · · , pn−2), its Prüfer sequence, by the recursive algorithm PRÜFER SEQUENCE:
(1) If n = 2, output the empty sequence.
(2) (To arrive here, we must have n ≥ 3.) Let ℓ be the smallest (in numerical order) of all the leaves of T. Define the first term p1 to be the unique vertex of T which is adjacent to ℓ.
(3) Recursively call PRÜFER SEQUENCE with the smaller tree T − ℓ; let (p2, · · · , pn−2) be the resulting sequence.
(4) Output the sequence (p1, p2, · · · , pn−2).
[Diagram: an example tree on the vertices 1, · · · , 6, together with the computation of its Prüfer sequence.]
The crucial point is that we can reconstruct a tree uniquely from its Prüfer
sequence. The easiest way to see this is to write down a recursive algorithm
going in the opposite direction to the above algorithm.
(1) If n = 2, output the tree with vertex set V and a single edge joining
the two vertices.
(2) (To arrive here, we must have n ≥ 3.) Let ℓ be the smallest element of
V which does not occur in the sequence.
(3) Recursively call REVERSE PRÜFER with the smaller set V \ {ℓ} and
the shorter sequence (p2 , · · · , pn−2 ); let T ′ be the resulting tree.
(4) Form a tree T with vertex set V by adding to T ′ the new vertex ℓ and
a single new edge joining ℓ to p1 .
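Both algorithms translate directly into code. Here is a minimal Python sketch (ours, not from the notes), representing a tree by its vertex set together with a set of frozenset edges:

    def prufer_sequence(V, edges):
        # return the Prüfer sequence of the tree (V, edges)
        V, edges = set(V), set(edges)
        seq = []
        while len(V) > 2:
            # the smallest leaf, i.e. the smallest vertex lying in exactly one edge
            leaf = min(v for v in V if sum(v in e for e in edges) == 1)
            e = next(e for e in edges if leaf in e)
            (p,) = e - {leaf}              # the unique neighbour of the leaf
            seq.append(p)
            V.remove(leaf); edges.remove(e)
        return seq

    def reverse_prufer(V, seq):
        # return the edge set of the tree with vertex set V and Prüfer sequence seq
        V = sorted(V)
        if not seq:
            return {frozenset(V)}          # base case: a single edge
        leaf = min(v for v in V if v not in seq)
        rest = reverse_prufer([v for v in V if v != leaf], seq[1:])
        return rest | {frozenset({leaf, seq[0]})}

    # Example 3.12: V = {1, 2, 3, 4, 5} and the sequence (4, 4, 2)
    print(sorted(map(sorted, reverse_prufer({1, 2, 3, 4, 5}, [4, 4, 2]))))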
Example 3.12. Suppose that V = {1, 2, 3, 4, 5} and the sequence is (4, 4, 2).
The algorithm tells us that our tree T is obtained by attaching the leaf 1 to
the vertex 4 of the tree T ′ , where T ′ is the output of the algorithm applied
to the set {2, 3, 4, 5} and the sequence (4, 2). In turn, T ′ is obtained by
attaching the leaf 3 to the vertex 4 of the tree T ′′ , where T ′′ is the output of
the algorithm applied to the set {2, 4, 5} and the sequence (2). Finally, T ′′
is obtained by attaching the leaf 4 to the vertex 2 of the unique tree with
vertex set {2, 5}. So T is:
[Diagram: the tree with edges {2, 5}, {2, 4}, {4, 3}, {4, 1}.]
This bijection proves Cayley's Formula: the number of trees with vertex set {1, 2, · · · , n} is n^{n−2}.
Proof. Let X be the set of all trees with vertex set {1, 2, · · · , n}, and let
Y be the set of all sequences (p1 , p2 , · · · , pn−2 ) where each pi belongs to
{1, 2, · · · , n}. The PRÜFER SEQUENCE algorithm gives us a function f :
X → Y , taking each tree to its Prüfer sequence. The REVERSE PRÜFER
algorithm gives us a function g : Y → X, taking a sequence (and the set
{1, 2, · · · , n}) and constructing a tree. From the definitions of the algorithms,
it is clear that these functions are inverse to each other, i.e. if you apply
REVERSE PRÜFER to the output of PRÜFER SEQUENCE you do recover
the original tree, and if you apply PRÜFER SEQUENCE to the output of
REVERSE PRÜFER you do recover the original sequence. (You could prove
this formally by induction.) So X and Y are in bijection, and |X| = |Y |. But
clearly |Y| = n^{n−2}, because there are n choices for each of the n − 2 terms.
Cayley’s Formula follows.
As we have seen, a graph with the smallest possible number of edges subject
to being connected is a tree. So for a general connected graph G, we consider
the spanning trees of G, i.e. the subgraphs which use all the vertices of G,
are themselves connected, and contain no cycles.
Theorem 3.13. Every connected graph G contains a spanning tree.
Proof. We use induction on the number of edges. If no edge of G is contained in a cycle, then G has no cycles at all, so G itself is a spanning tree. Otherwise, deleting an edge which is contained in a cycle leaves a connected graph (see Theorem 1.34) with fewer edges, which by the induction hypothesis contains a spanning tree; this is also a spanning tree of G.
Note that you can’t find a spanning tree by simply deleting all edges of G
which belong to cycles in G: after all, it is perfectly possible that every edge
belongs to a cycle. What the above proof does is different: it deletes an edge
of a cycle, and then re-considers the cycles in that smaller graph (some of
the cycles in G will have been removed by the deletion), and so on. At any
stage, the choices for which edge you can delete depend on which edges you
have deleted so far, so there can be many different spanning trees.
Example 3.14. Suppose that G is the graph
[Diagram: a connected graph on the vertices a, b, c, d, e, v, w, x, y, z.]
Two of the many spanning trees of G are:
[Diagram: two spanning trees of G.]
Most of the important algorithms for finding spanning trees take the reverse
approach: they start with a single vertex and add edges. As long as every
added edge joins a vertex already in the tree with a new vertex, it is obvious
that no cycles are created.
BFS (Breadth-First Search). Given a connected graph G:
(1) Choose any vertex of G and call it v1; let T be the tree with the single vertex v1.
(2) If T contains every vertex of G, stop and output T.
(3) Let i be smallest such that vi is adjacent to some vertices not in T; let m be the largest number for which vm has been defined, and call these new vertices vm+1, vm+2, · · · (in some chosen order).
(4) Add to T the new vertices vm+1, vm+2, · · · , together with the edges joining each of them to vi. Return to Step (2).
The reason for the name “search” is the idea that this algorithm searches for
the vertices of G: first finding all the vertices adjacent to v1 , then finding all
the unfound vertices adjacent to those, and so on until all vertices have been
found. Each time Step (4) is executed, the new vertices are added to T along
with the edges which record ‘through which earlier vertex they were found’.
Note that because of the choice in ordering the new vertices, the resulting
spanning tree is not unique.
Example 3.15. Let us apply this algorithm to the graph G in Example
3.14, with v1 = a. The first pass through Step (3) defines v2 = c and v3 = b,
say. The order of these two vertices is arbitrarily chosen, but it affects what
happens next, because on the next pass through Step (3) it is the vertices
adjacent to v2 which get added, say v4 = d, v5 = e, v6 = z. The earliest vi
which is adjacent to an unused vertex is now v5 , so v7 = v, v8 = w. Then v6
is responsible for naming the last two vertices v9 = x and v10 = y. Thus the
order of the vertices is a, c, b, d, e, z, v, w, x, y, and the spanning tree is:
[Diagram: the BFS spanning tree, with edges {a, c}, {a, b}, {c, d}, {c, e}, {c, z}, {e, v}, {e, w}, {z, x}, {z, y}.]
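Here is a minimal Python sketch of the BFS algorithm (ours, not from the notes); the sorted() call fixes one particular choice of order for the newly found vertices:

    def bfs_spanning_tree(adj, v1):
        # adj maps each vertex of a connected graph to the set of its neighbours
        order, tree, i = [v1], [], 0
        while i < len(order):
            for w in sorted(adj[order[i]]):    # Step (3): vertices found through
                if w not in order:             # the earliest possible vertex
                    order.append(w)
                    tree.append((order[i], w)) # Step (4): record the edge used
            i += 1
        return order, tree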
Theorem 3.16. For any two vertices v and w of a tree T, there is a unique path in T with end-vertices v and w.
Proof. We already know by Theorem 1.34 that there is a path between v and
w, so the content of this result is the uniqueness. The idea is simple: if there
were two different paths, you could patch appropriate pieces of them together
to form a cycle, contradicting the fact that the graph is a tree. However, the
notation involved in writing this out is a bit messy, so we will disguise the
idea by giving an induction proof. Let n be the number of vertices; if n = 1
then v and w are the same and there is nothing to prove. If n ≥ 2 and the
result is known for smaller trees, then let ℓ be any leaf of T . If v = w = ℓ
there is obviously a unique path between v and w, namely the path with no
edges and single vertex ℓ. If neither v nor w equals ℓ, then we know by the
induction hypothesis that there is a unique path in T − ℓ with end-vertices v
and w. But clearly no path with end-vertices v and w can include ℓ as one of
its other vertices, because deg(ℓ) = 1; so there is a unique path in T between
v and w. The remaining cases are v = ℓ, w ≠ ℓ and v ≠ ℓ, w = ℓ; these
are symmetric, so we can assume v = ℓ, w ≠ ℓ. If u is the unique vertex
adjacent to v, then any path whose first vertex is v must have u as its second
vertex. By the induction hypothesis, there is a unique path in T − ℓ with
end-vertices u and w, so there is a unique path in T between v and w. The
induction step is complete.
The BFS algorithm has the special property that it finds shortest paths: if T is a spanning tree of the connected graph G produced by the BFS algorithm starting at v1, then for every vertex v, the path in T between v1 and v is a path of the smallest possible length in G between v1 and v.
Proof*. Let dT (v1 , v) denote the length (i.e. number of edges) of the unique
path between v1 and v in T , and let dG (v1 , v) denote the length of the shortest
possible path between v1 and v in G. It is obvious that dG (v1 , v) ≤ dT (v1 , v)
for all v. We suppose for a contradiction that dG (v1 , v) < dT (v1 , v) for some
v; we can choose v so that dG (v1 , v) is minimal, subject to this inequality
holding. Since dT (v1 , v) ≥ 1, v is not v1 . Let w be the vertex ‘through which’
v was added to the tree T ; then w is adjacent to v on the path in T between
v1 and v, so dT (v1 , w) = dT (v1 , v) − 1. Let u be the vertex adjacent to v on
some shortest-possible path in G between v1 and v. Then the part of the path
from v1 to u must also be as short as possible, so dG (v1 , u) = dG (v1 , v) − 1.
By our minimality assumption, we have dT(v1, u) = dG(v1, u) = dG(v1, v) − 1 < dT(v1, v) − 1 = dT(v1, w).
This implies that in the progress of the BFS algorithm, u was added to the
tree T before w; but then it is impossible for the vertex v to have been
added through w, because it would have already been added through u if not
through some other vertex.
The DFS (Depth-First Search) algorithm is a variant in which new vertices are added one at a time, branching as deep into the graph as possible:
(1) Choose any vertex of G and call it v1; let T be the tree with the single vertex v1.
(2) If T contains every vertex of G, stop and output T.
(3) Let m be the largest number for which vm has been defined. Let ℓ ≤ m
be the largest number such that there are vertices adjacent to vℓ which
are not in T . Choose one of these vertices and call it vm+1 .
(4) Add to T the vertex vm+1, and the edge joining this vertex to vℓ. Return to Step (2).
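A matching Python sketch of the DFS algorithm (ours, in the same representation as the BFS sketch above):

    def dfs_spanning_tree(adj, v1):
        order, tree = [v1], []
        while len(order) < len(adj):
            # Step (3): the latest-named vertex with a neighbour not yet in T
            l = max(i for i in range(len(order))
                    if any(w not in order for w in adj[order[i]]))
            w = min(x for x in adj[order[l]] if x not in order)
            order.append(w)
            tree.append((order[l], w))         # Step (4): add the new edge
        return order, tree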
One useful feature of a DFS spanning tree is that it contains some information
about all the edges of the graph.
Theorem 3.19. Let T be a spanning tree of the graph G obtained by ap-
plying the DFS algorithm, and let v1 , v2 , · · · , vn be the resulting ordering of
the vertices. If vi is adjacent to vj in G, with i < j, then vi lies on the path
between v1 and vj in T .
Proof*. Let vi1 , vi2 , · · · , vis be the successive vertices of the path from v1 to
vi , with i1 = 1 and is = i; from the construction it is clear that i1 < i2 <
· · · < is . Similarly, let vj1 , vj2 , · · · , vjt be the successive vertices of the path
from v1 to vj , with j1 = 1, jt = j, and j1 < j2 < · · · < jt . Let k be maximal
so that ik = jk ; the vertex vik is the ‘latest common ancestor’ of vi and vj ,
if you think of T as a family tree of descendants of v1 . We want to prove
that, given the adjacency of vi and vj in G, this latest common ancestor is vi
itself, i.e. k = s. Suppose for a contradiction that k < s. It is impossible for
k to be t, because that would imply that j < i, so the two paths diverge and
continue through the different vertices vik+1 and vjk+1 . From the fact that
the descendant vi of vik+1 was added to the tree before the descendant vj of
vjk+1 , we deduce that ik+1 ≤ i < jk+1 . But then consider the pass through
Step (3) of the algorithm which gave vjk+1 its name. At this point of the
algorithm, the value of m must have been jk+1 − 1, so vi was already in T ;
and the value of ℓ must have been ik , because vik was the vertex to which the
new vertex vjk+1 became attached in T . But that contradicts the maximality
in the choice of ℓ, because i > ik and vi was also adjacent to a vertex not yet
in T , namely vj .
In the context of weighted graphs, one would prefer a minimal spanning tree,
i.e. one whose weight (the sum of the weights of the edges) is as small as
possible. The naive approach is to build the tree by starting with a single
vertex and adding one new edge at a time, where the edge has the smallest
weight of all those you can possibly add to the tree.
Example 3.20. Consider the weighted graph
[Diagram: the weighted graph with edges {A, B} of weight 1, {B, C} of weight 5, {C, Z} of weight 5, {A, D} of weight 3, {B, D} of weight 4, {C, D} of weight 6, and {D, Z} of weight 4.]
If we start with vertex A and want to add another vertex to our tree in a way
which minimizes the total weight of the edges, we should obviously choose
B rather than D. Having done that, the possible edges we can add in the
next step are {A, D}, {B, C}, and {B, D}; since {A, D} has the smallest
weight, we choose that. We cannot now add the edge {B, D}, because that
would form a cycle: to ensure that we still have a tree, we have to add a new
vertex along with each new edge. The possible edges are {B, C}, {C, D},
and {D, Z}; the last of these has the smallest weight, so that is the one we
add. Finally, we choose between the three edges ending at C: since {B, C}
and {C, Z} have the same weight, we arbitrarily choose {B, C} to reach a
spanning tree of weight 13.
[Diagram: the spanning tree with edges {A, B} of weight 1, {B, C} of weight 5, {A, D} of weight 3, and {D, Z} of weight 4.]
It is easy to see that there is no spanning tree of weight less than 13, so we
have found a minimal spanning tree for our weighted graph.
Somewhat remarkably, this naive method always works, and is worth describ-
ing formally.
Prim's Algorithm. Given a connected weighted graph G:
(1) Choose any vertex of G and call it v1; let T be the tree with the single vertex v1.
(2) If T contains every vertex of G, stop and output T.
(3) Let m be the largest number for which vm has been defined. Consider
all the edges {vi , w} in G where i ≤ m and w is not in T ; choose one
of minimal weight, and let vm+1 be w.
(4) Add to T the vertex vm+1, and the edge joining this vertex to vi. Return to Step (2).
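Here is a minimal Python sketch of Prim's Algorithm (ours, not from the notes), using the dictionary-of-dictionaries representation of a weighted graph:

    def prim(graph, v1):
        in_tree = {v1}
        tree = []
        while len(in_tree) < len(graph):
            # Steps (3)-(4): the lightest edge from the tree to a new vertex
            v, w = min(((v, w) for v in in_tree for w in graph[v]
                        if w not in in_tree),
                       key=lambda e: graph[e[0]][e[1]])
            in_tree.add(w)
            tree.append((v, w))
        return tree

    # the weighted graph of Example 3.20
    g = {'A': {'B': 1, 'D': 3}, 'B': {'A': 1, 'C': 5, 'D': 4},
         'C': {'B': 5, 'D': 6, 'Z': 5}, 'D': {'A': 3, 'B': 4, 'C': 6, 'Z': 4},
         'Z': {'C': 5, 'D': 4}}
    tree = prim(g, 'A')
    print(tree, sum(g[v][w] for v, w in tree))  # a minimal spanning tree, weight 13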
With Prim’s Algorithm to help us find minimal spanning trees, we can say a
little more about the Travelling Salesman Problem. Remember that we had
reduced the problem to finding minimal spanning cycles in a Hamiltonian
weighted graph (in fact, we saw that we could assume the graph is complete,
but it doesn’t help much).
Theorem 3.22. Let H be a weighted graph which is Hamiltonian, and let v be any vertex of H. Then the weight of any spanning cycle of H is at least the weight of a minimal spanning tree of H − v plus the sum of the two smallest weights of edges of H ending at v.
Proof. Let C be any spanning cycle in H. Then the weight of C equals the
weight of C − v plus the sum of the weights of the two edges of C at v. Since
C − v is a path and contains all the vertices of H − v, it is a spanning tree
of H − v, so its weight is at least equal to the weight of a minimal spanning
tree of H − v. Trivially, the sum of the weights of the two edges of C at v
is at least equal to the sum of the two smallest weights of edges of H at v.
The result follows.
Theorem 3.22 gives us a lower bound for the weight of a minimal spanning
cycle, or rather one lower bound for each vertex, which can be computed
efficiently. However, there is no guarantee that equality holds.
Example 3.23. If K is the complete weighted graph of Example 2.32, then
the weight of a minimal spanning tree in K − A is 7, and the two smallest
weights of edges at A are 1 and 4, so the lower bound provided by Theorem
3.22 is 12. Similarly, the lower bounds obtained by considering the vertices
B, C, D, and Z are 11, 11, 12, and 13 respectively; all well short of the
actual weight of a minimal spanning cycle of K, which we saw is 16.
We now consider the problem of counting the spanning trees of a connected graph.
Example 3.24. Let G be the graph consisting of two 3-cycles, on the vertices 1, 2, 3 and on the vertices 3, 4, 5, joined at the shared vertex 3.
Since G has five vertices, any spanning tree must have four edges, so it must
be obtained from G by deleting two edges. Since a spanning tree cannot
contain cycles, we need to delete one edge from each of the two 3-cycles.
Clearly we can choose any edge from the left-hand 3-cycle and any edge
from the right-hand 3-cycle, so G has 3 × 3 = 9 spanning trees. (These fall
into three isomorphism classes, but we are not concerned with isomorphism
classes at the moment.)
Example 3.25. Let G be the graph
[Diagram: the graph on vertices 1, 2, 3, 4 with edges {1, 2}, {1, 3}, {1, 4}, {2, 4}, {3, 4}, that is, K4 with the edge {2, 3} removed.]
To find a spanning tree, we need to delete two edges; of the (5 choose 2) = 10 possible
pairs of edges, the only cases which would leave a cycle remaining are if
we deleted {1, 2} and {2, 4}, or {1, 3} and {3, 4}. So there are 10 − 2 = 8
spanning trees.
Example 3.26. If G is a tree, then of course the unique spanning tree of G
is G itself.
Example 3.27. If G = Cn , then we obtain a spanning tree (specifically, a
path) by deleting any edge, so there are n spanning trees.
Example 3.28. Another general class for which we have already answered
the question is that of the complete graphs Kn , for n ≥ 2. A spanning tree
of Kn is just the same thing as a tree with vertex set {1, 2, · · · , n}, and by
Cayley’s Formula there are n^{n−2} of these.
The physicist Kirchhoff (who was interested in graphs through his study of
electrical circuits) found a general formula for the number of spanning trees,
which reveals an unexpected connection with the determinants of matrices.
Definition 3.29. Let G be a graph with vertex set {1, 2, · · · , n}. The
Laplacian matrix of G is the n × n matrix M whose (i, j)-entry is defined by
    mij = deg(i)   if i = j,
    mij = −1       if i ≠ j and {i, j} is an edge of G,
    mij = 0        if i ≠ j and {i, j} is not an edge of G.
That is, the diagonal entries give the degrees of the vertices, and the off-
diagonal entries are −1 in positions corresponding to an edge and 0 elsewhere.
Example 3.30. The Laplacian matrix of the graph in Example 3.24 is
    [  2  −1  −1   0   0 ]
    [ −1   2  −1   0   0 ]
    [ −1  −1   4  −1  −1 ]
    [  0   0  −1   2  −1 ]
    [  0   0  −1  −1   2 ]
Theorem 3.31. Let M be the Laplacian matrix of a graph G with vertex set {1, 2, · · · , n}.
(1) M is symmetric (its (j, i)-entry equals its (i, j)-entry for all i, j).
(2) The entries in any single row or column of M add up to 0.
(3) det(M) = 0.
(4) For any k, ℓ ∈ {1, 2, · · · , n}, let M k̂,ℓ̂ denote the (n − 1) × (n − 1) matrix obtained from M by deleting the kth row and the ℓth column. Then (−1)^{k+ℓ} det(M k̂,ℓ̂ ) is the same for all values of k, ℓ.
Proof*. Part (1) is obvious from the definition. Because M is symmetric, its
rows are the same as its columns, so we only need to prove part (2) for a row,
say the ith row. By definition, the nonzero entries in the ith row are a single
entry of deg(i), and an entry of −1 for every vertex adjacent to i; since there
are deg(i) adjacent vertices, these entries cancel out and give zero, proving
part (2). Now suppose we multiply the matrix M by the n × 1 column vector
v whose entries are all 1. By the definition of matrix multiplication, M v
is another column vector whose ith entry is the sum of the entries in the
ith row of M ; we have just seen that this sum is always zero, so M v is the
zero vector. In other words, v belongs to the null space of the matrix M . A
standard linear algebra result says that a square matrix has a nontrivial null
space if and only if its determinant is zero, so part (3) follows.
The quantity (−1)^{k+ℓ} det(M k̂,ℓ̂ ) is called the (k, ℓ)-cofactor of the matrix M.
Remark 3.32. These cofactors are what you multiply the matrix entries by
when calculating the determinant by expanding along a row or column. For
instance, the rule for calculating det(M ) by expanding along the first row is
    det(M) = m11 det(M 1̂,1̂ ) − m12 det(M 1̂,2̂ ) + · · · + (−1)^{1+n} m1n det(M 1̂,n̂ ).
In our case there is no need to use this rule, since part (3) of Theorem 3.31
tells us that det(M ) is zero.
Theorem 3.33 (Matrix–Tree Theorem). Let G be a graph with vertex set {1, 2, · · · , n}, and let M be its Laplacian matrix. Then the number of spanning trees of G equals the (k, ℓ)-cofactor (−1)^{k+ℓ} det(M k̂,ℓ̂ ), for any choice of k and ℓ.
Before we give the proof, here are some examples of how to use the Matrix–
Tree Theorem to count spanning trees.
Example 3.34. Let us apply the Matrix–Tree Theorem to find the number
of spanning trees in the graph G of Example 3.24, whose Laplacian matrix
M was found in Example 3.30. Since all the cofactors of M are guaranteed
to be the same, we may as well choose to delete the row and column with
the most nonzero entries; the (3, 3)-cofactor is

    (−1)^{3+3} det(M 3̂,3̂ ) = det [  2  −1   0   0 ]
                                  [ −1   2   0   0 ]
                                  [  0   0   2  −1 ]
                                  [  0   0  −1   2 ].
When a matrix has block-diagonal form like this, its determinant is the product of the determinants of the blocks. Since

    det [  2  −1 ] = 2 × 2 − (−1) × (−1) = 3,
        [ −1   2 ]

the (3, 3)-cofactor is 3 × 3 = 9, so G has 9 spanning trees, in agreement with Example 3.24.
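The Matrix–Tree Theorem is easy to check by machine. Here is a small Python sketch (ours, not from the notes) which builds the Laplacian matrix of the graph of Example 3.24 and evaluates the (3, 3)-cofactor with numpy:

    import numpy as np

    edges = [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]   # Example 3.24
    n = 5
    M = np.zeros((n, n))
    for i, j in edges:
        M[i-1, i-1] += 1
        M[j-1, j-1] += 1                  # degrees on the diagonal
        M[i-1, j-1] = M[j-1, i-1] = -1    # -1 in the positions of the edges

    # delete the 3rd row and 3rd column (index 2) and take the determinant
    cofactor = np.linalg.det(np.delete(np.delete(M, 2, axis=0), 2, axis=1))
    print(round(cofactor))                # 9 spanning trees, as found above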
The proof of the Matrix–Tree Theorem uses a fact about determinants which
you may not have seen. Recall that determinants are only defined for square
matrices, and that they are multiplicative in the sense that

    det(AB) = det(A) det(B)                                    (3.3)

whenever A and B are square matrices of the same size. There is a general-
ization of this multiplicativity property for possibly non-square matrices:
Theorem 3.37 (Cauchy–Binet Formula)*. Let A be an n × m matrix and
B an m × n matrix. Then
    det(AB) = ∑_{J ⊆ {1,2,··· ,m}, |J| = n} det(A−,J ) det(B J,− ),
where A−,J is the n × n matrix obtained from A by deleting all the columns
except those whose number lies in J, and B J,− is the n × n matrix obtained
from B by deleting all the rows except those whose number lies in J. (The
− in the superscripts is to indicate that there is no deletion of any rows in
forming A−,J or columns in forming B J,− .)
We will omit the proof. Note that if m = n, the only choice for J is all of {1, · · · , n}, and for this J we have A−,J = A and B J,− = B, which brings the formula in line with (3.3). In general, there are (m choose n) terms on the right-hand side of the Cauchy–Binet Formula (in particular, if n > m the sum is empty and the statement is that det(AB) = 0).
We apply this to the incidence matrix of G, defined as follows. Suppose that G has vertex set {1, 2, · · · , n} and edges e1, · · · , ek, and choose arbitrarily a direction for each edge. The incidence matrix F is the n × k matrix whose (i, j)-entry fij is 1 if i is the head of ej, −1 if i is the tail of ej, and 0 if i is not an end of ej.
Theorem 3.40. Let F be the incidence matrix of the graph G, as above.
(1) The entries in any single column of F add up to 0.
(2) Let I be a subset of {1, 2, · · · , n} such that |I| = k. Let F I,− be the
k × k matrix obtained from F by deleting all the rows except those
whose number lies in I. If I contains all the vertices of some connected
component of G, then det(F I,− ) = 0.
(3) If G is a tree (so that k = n − 1), then det(F î,− ) = ±1 for every i, where F î,− denotes F with the ith row deleted.
Proof**. Part (1) is obvious, because the entries of each column of F are
a 1, a −1, and n − 2 zeroes. For part (2), let I ′ be a subset of I consisting
of all the vertices of some connected component of G. For convenience, we
continue to number the rows of the matrix F I,− by their row numbers from
F (that is, the elements of I) rather than changing their numbers to 1, · · · , k.
Let v be a row vector whose entries are indexed by the elements of I, in
which the ith entry is 1 if i ∈ I ′ and 0 otherwise. Then vF I,− is a 1 × k
row vector whose jth entry is the sum, as i runs over I ′ , of the ith entry
of the jth column of F I,− . If the edge ej does not belong to the connected
component with vertex set I ′ , all these entries are zero; if the edge ej does
belong to this connected component, the entries consist of a 1, a −1, and the
rest zeroes. In either case the sum is zero, so vF I,− is the zero row vector.
Thus the left null space of F I,− is nontrivial, proving that det(F I,− ) = 0 as
required.
For part (3), we use induction on the number of edges of the tree G. If G has a single edge, then F is a 2 × 1 matrix with entries 1 and −1, and deleting either row leaves a matrix with determinant ±1; so suppose that the result is known for trees with fewer edges. Now by the same argument as in the
proof of Theorem 3.31, the fact that every column of F sums to zero implies
that all the determinants det(F î,− ) are the same up to sign, so it is enough to
show that just one of them equals ±1. We can therefore assume that vertex i
is a leaf of the tree G. Let i′ be the unique vertex which is adjacent to i, and
suppose that the edge {i, i′ } is ej . Then the jth column of F î,− has a single
nonzero entry, namely a ±1 in row i′ . By expanding along this column, we
conclude that
    det(F î,− ) = ± det(F î,î′,ĵ ),
where the meaning of the superscripts on the right-hand side is hopefully
clear. But F î,ĵ is the incidence matrix of the tree G − i, so by the induction
hypothesis, the right-hand side is ±1 as required.
Theorem 3.41. If F is the incidence matrix of the graph G and M is its Laplacian matrix, then F F t = M, where F t denotes the transpose of F.
Proof*. By definition of matrix multiplication and transpose, the (i, i′)-entry of F F t is ∑_{j=1}^{k} fij fi′j. Since fij is zero unless i is an end of ej, the only
nonzero terms in this sum are those where i and i′ are ends of ej. If i ≠ i′ and {i, i′} is not an edge, there are no such nonzero terms, so the entry is zero. If i ≠ i′ and {i, i′} is an edge, then the unique nonzero term is 1 × (−1) = −1,
so the entry is −1. If i = i′ , there is a nonzero term of fij2 = 1 for every j
such that i is an end of ej , so the entry is deg(i). Thus in every case the
(i, i′ )-entry of F F t agrees with the (i, i′ )-entry of M .
Now we can prove the Matrix–Tree Theorem. It suffices to compute the cofactor det(M n̂,n̂ ). Deleting the nth row of F gives the matrix F n̂,− , and M n̂,n̂ = (F n̂,− )(F n̂,− )t by Theorem 3.41. Applying the Cauchy–Binet Formula, we obtain
    det(M n̂,n̂ ) = det((F n̂,− )(F n̂,− )t )
               = ∑_{J ⊆ {1,2,··· ,k}, |J| = n−1} det(F n̂,J ) det((F n̂,J )t )
               = ∑_{J ⊆ {1,2,··· ,k}, |J| = n−1} det(F n̂,J )²,
where the meaning of F n̂,J should by now be clear, and the last step uses the
fact that the transpose of a square matrix has the same determinant. Now
F −,J is the incidence matrix of a spanning subgraph GJ of G, namely the
one with edges ej for j ∈ J. Since GJ has n vertices and n − 1 edges, it is
either a tree or has more than one connected component. If GJ has more
than one connected component, then there is a connected component which
does not include the vertex n, and part (2) of Theorem 3.40 implies that
det(F n̂,J ) = 0. If GJ is a tree, then part (3) of Theorem 3.40 implies that
det(F n̂,J ) = ±1. So the nonzero terms in the above sum are all 1, and the
number of them is the number of spanning trees of G, as required.
Chapter 4
Colourings of Graphs
A vertex colouring of a graph G is an assignment of a colour to each vertex in such a way that adjacent vertices always receive different colours, and the chromatic number χ(G) is the smallest number of colours with which this can be done. [Diagram: a graph whose vertices are coloured red (R), white (W), and blue (B).]
We have shown that the chromatic number χ(G) is 3. Since the colours
red, white, and blue could be shuffled arbitrarily, there are actually 3! = 6
different vertex colourings with these colours.
Theorem 4.4. Let G be a graph with n vertices.
(1) χ(G) = 1 if and only if G has no edges.
(2) χ(G) ≤ n, with equality if and only if G is the complete graph Kn.
(3) If H is a subgraph of G, then χ(H) ≤ χ(G).
(4) χ(G) is the maximum of the chromatic numbers of the connected components of G.
Proof. If G has a vertex colouring with 1 colour, then no two vertices can
be adjacent; conversely, if no two vertices are adjacent we can obviously
colour all vertices the same colour, which proves part (1). Now it is clear
that if we have n colours we can give every vertex a different colour. (In
scheduling terms, this corresponds to the easy but potentially uneconomic
option of having no simultaneous processes.) If G is a complete graph, i.e.
any two vertices are adjacent, then we are in fact forced to give every vertex
a different colour, so we cannot have a vertex colouring with fewer than n
colours. If G is not a complete graph, there must be two vertices v and w
which are not adjacent, so we can construct a vertex colouring with n − 1
colours by giving v and w the same colour and using the other n − 2 colours
for the other n − 2 vertices, one each. This proves part (2).
For part (3), any vertex colouring of G clearly restricts to give a vertex
colouring of the subgraph H, which implies the statement. The reason that
we have an inequality χ(H) ≤ χ(G) is that there could be vertex colourings
of H which cannot be extended to a vertex colouring of G: for instance,
because vertices which are not adjacent in H may be adjacent in the larger
graph G. However, in the special case where H is a connected component of
G, H and the rest of the graph are completely independent of each other; if
you have a vertex colouring of every connected component of G, then taken
together they form a vertex colouring of G. Part (4) follows.
[Diagram: cycles with their vertices alternately coloured black (B) and white (W).]
Thus χ(Cn ) = 2 if n is even, and χ(Cn ) = 3 if n is odd.
The Welsh–Powell Algorithm produces a vertex colouring of G as follows. List the vertices v1, v2, · · · , vn in decreasing order of degree. Give the first colour to v1, and then run through the later vertices in order, giving the first colour to each vertex which is not adjacent to any vertex already given that colour. Then give the second colour to the earliest uncoloured vertex and repeat the process, continuing with new colours until every vertex is coloured.
Note that in this algorithm you colour all the red vertices (say) at once and
then never use the colour red again: to determine which vertices to colour
red, you choose the first uncoloured vertex and then whatever later vertices
you can allowably give the same colour to. The ordering by decreasing degree
is not necessary, but tends to reduce the number of colours which are needed.
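Here is a minimal Python sketch of the Welsh–Powell Algorithm (ours, not from the notes), colouring with the integers 1, 2, · · · :

    def welsh_powell(adj):
        # adj maps each vertex to the set of its neighbours
        order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
        colour, c = {}, 0
        while len(colour) < len(adj):
            c += 1                            # start using a new colour
            for v in order:
                if v not in colour and all(colour.get(w) != c for w in adj[v]):
                    colour[v] = c             # v may safely receive colour c
        return colour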
Example 4.8. The vertices in the following graph have been ordered by de-
creasing degree. Use the Welsh–Powell Algorithm to give a vertex colouring.
[Diagram: a graph with vertices v1, · · · , v10, listed in decreasing order of degree.]
There is an upper bound for the chromatic number of G in terms of ∆(G), the
maximum of the vertex degrees. (We assume that the vertex set is nonempty,
so that ∆(G) makes sense.) We start with a weak form of the result.
Theorem. The Welsh–Powell Algorithm never uses more than ∆(G) + 1 colours: when a vertex v comes to be coloured, its deg(v) ≤ ∆(G) neighbours rule out at most ∆(G) colours.
Another way to state this result is that χ(G) ≤ ∆(G) + 1. Note that equality
holds in the cases where G = Kn (since χ(Kn ) = n and ∆(Kn ) = n − 1) or
G = Cn for n ≥ 3 odd (since χ(Cn ) = 3 and ∆(Cn ) = 2). It turns out that
these are the only connected graphs for which equality holds.
Theorem (Brooks' Theorem). If the connected graph G is neither a complete graph nor a cycle of odd length, then χ(G) ≤ ∆(G).
The proof is by induction on the number of vertices; here is the heart of the induction step. Write d = ∆(G), let v be a vertex of G of minimal degree δ(G), and suppose that we have a vertex colouring of G − v whose colour
set is {1, 2, · · · , d}. We want to extend this vertex colouring to the whole of
G (possibly after some modification). If G is not regular of degree d, then
deg(v) = δ(G) < d, so there is a colour not used among the vertices adjacent
to v, and we can colour v with this colour to obtain a vertex colouring of G
as required. So henceforth we may assume that G is regular of degree d. If
d = 2 this forces G to be a cycle (an even cycle, by assumption), in which
case we know the result. So we may also assume that d ≥ 3.
We now have various cases for the vertex colouring of G − v. To save words,
we will omit to say in the description of each case that the previous cases do
not hold, but that should always be understood.
In the remaining cases, every colour is used exactly once among the vertices
adjacent to v; we let vi denote the unique vertex adjacent to v with colour i.
In the remaining cases, we let vij denote the unique vertex in G − v which is
adjacent to vi and has colour j, for all i ≠ j.
In the remaining cases, we let Hij denote the unique connected component
of the subgraph of G − v formed by vertices coloured i and j which contains
the particular vertices vi and vj .
Case 5: for some distinct i, j, k, Hij and Hik have a vertex in common
other than vi . Let this vertex be u; its colour must clearly be i. Since u
has two neighbours coloured j on the path Hij and two neighbours coloured
k on the path Hik , we must have d ≥ 4; moreover, there must be a fourth
colour not used among the vertices adjacent to u. If we change u to this
fourth colour we disconnect Hij and Hik , putting us back in Case 3.
In the only case remaining, we know that all the vi ’s are adjacent to each
other. So the d neighbours of vi in G are exactly v and the other vj ’s; this
shows that G consists of just the vertices v, v1 , · · · , vd , and is a complete
graph. This contradicts our assumption, so the last case vanishes and the
induction step (remember that this was all inside the induction step!) is
finished.
If you think the proof of Brooks’ Theorem was long, you’ll be relieved that
we are only going to mention the most famous result about vertex colourings,
whose proof runs to hundreds of pages.
Theorem (Four Colour Theorem). If G is a planar graph, that is, one which can be drawn in the plane with no two edges crossing, then χ(G) ≤ 4.
Rather than just asking for the chromatic number, we can count colourings: for a positive integer t, let PG(t) denote the number of vertex colourings of G which use colours from a fixed set of t colours.
Theorem. (1) If G1, G2, · · · , Gs are the connected components of G, then PG(t) = PG1(t) PG2(t) · · · PGs(t). (2) If G = Kn, then PG(t) = t(t − 1)(t − 2) · · · (t − n + 1) = t(n).
Proof. Part (1) follows from the fact that choosing a vertex colouring of G
is the same as independently choosing vertex colourings of all the Gi ’s. If
G is complete, then any vertex colouring must use different colours for all
the vertices. So the number of vertex colourings is the number of ordered
selections of n colours from t possibilities with repetition not allowed, which
is given by t(n) as stated in part (2).
Theorem 4.18. Let v be a vertex of G all of whose neighbours are adjacent to each other. Then PG(t) = (t − deg(v)) PG−v(t), because in any vertex colouring of G − v the neighbours of v use deg(v) different colours, leaving exactly t − deg(v) allowable colours for v.
However, there need not be any vertex which satisfies the condition in The-
orem 4.18. In such cases we may need the Sum Principle as well.
Example 4.19. Let G = C4, with vertices 1, 2, 3, 4 in order around the cycle.
If we try to use the same sort of Product Principle arguments to compute
PC4 (t), we run into a slight problem. If we first choose the colours of vertices
1, 2, and 3 (in t, t − 1, and t − 1 ways respectively), then when we come to
choose the colour of vertex 4 there could be either t − 1 or t − 2 possibilities,
depending on whether the colours we have chosen for vertices 1 and 3 are
the same or not. We need to consider each case separately: there are t(t − 1) colourings of vertices 1, 2, 3 in which 1 and 3 get the same colour, and each of these leaves t − 1 choices for vertex 4; there are t(t − 1)(t − 2) colourings in which 1 and 3 get different colours, and each of these leaves t − 2 choices for vertex 4. So
    PC4(t) = t(t − 1)(t − 1) + t(t − 1)(t − 2)(t − 2) = t(t − 1)(t² − 3t + 3).
The logic of Example 4.19 can be applied more generally using the following
definition.
Definition 4.20. Let v and w be non-adjacent vertices of a graph G. Then G + {v, w} denotes the graph obtained from G by adding the edge {v, w}, and G[v, w] denotes the graph obtained from G by identifying v and w: the two vertices are replaced by a single vertex v∗w, which is adjacent to every vertex that v or w was adjacent to. (The same definition of G[v, w] makes sense when v and w are adjacent, ignoring the edge {v, w}.)
Theorem 4.22. Let v and w be distinct vertices of the graph G.
(1) If v and w are not adjacent, then PG(t) = PG+{v,w}(t) + PG[v,w](t), the two terms counting the colourings in which v and w get different colours and the same colour respectively.
(2) If v and w are adjacent, then PG(t) = PG−{v,w}(t) − PG[v,w](t), where G − {v, w} denotes G with the edge {v, w} deleted.
For example, let us compute PC5(t), where C5 has vertices 1, 2, 3, 4, 5 in order around the cycle.
Applying part (1) of Theorem 4.22 with the vertices 2 and 5, we see that
PC5 (t) is the sum of the chromatic polynomials of the following graphs:
[Diagram: C5 with the edge {2, 5} added, and the graph obtained from C5 by identifying 2 and 5 to a single vertex 2∗5.]
In both of these graphs, Theorem 4.18 applies to vertex 1. Hence their
chromatic polynomials are t(t − 1)(t² − 3t + 3)(t − 2) and t(t − 1)(t − 2)(t − 1)
respectively, and we thus find that
    PC5(t) = t(t − 1)(t − 2)[(t² − 3t + 3) + (t − 1)] = t(t − 1)(t − 2)(t² − 2t + 2).
Alternatively, we can apply part (2) of Theorem 4.22 with the adjacent vertices 1 and 2, so that PC5(t) is the difference of the chromatic polynomials of the following two graphs. [Diagram: the path obtained from C5 by deleting the edge {1, 2}, and the 4-cycle obtained from C5 by identifying 1 and 2 to a single vertex 1∗2.]
The first of these being a tree, and the second isomorphic to C4, their chromatic polynomials are t(t − 1)^4 and t(t − 1)(t² − 3t + 3) respectively, so
    PC5(t) = t(t − 1)[(t³ − 3t² + 3t − 1) − (t² − 3t + 3)] = t(t − 1)(t³ − 4t² + 6t − 4),
which agrees with the earlier answer, since t³ − 4t² + 6t − 4 = (t − 2)(t² − 2t + 2).
In both computations the answer was a polynomial in t whose coefficients alternate in sign, and induction on the number of edges shows that this always happens. Indeed, given an edge {v, w} of a graph G with n vertices, we may suppose that
    PG−{v,w}(t) = t^n − b1 t^{n−1} + b2 t^{n−2} − · · · ,    PG[v,w](t) = t^{n−1} − c1 t^{n−2} + · · · ,
for some nonnegative integers b1, · · · , bn−1 and c1, · · · , cn−2 (note that G[v, w] has only n − 1 vertices). By part (2) of Theorem 4.22, we get
    PG(t) = t^n − (b1 + 1) t^{n−1} + (b2 + c1) t^{n−2} − · · · ,
in which the coefficients again alternate in sign.
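The recursion of Theorem 4.22 also gives a mechanical way to evaluate PG(t). Here is a small Python sketch (ours, not from the notes), representing a graph by a vertex set and a set of frozenset edges, and applying part (2) repeatedly:

    def P(vertices, edges, t):
        # number of vertex colourings using colours from a set of t colours
        if not edges:
            return t ** len(vertices)        # each vertex coloured independently
        e = next(iter(edges))
        v, w = tuple(e)
        deleted = edges - {e}                # the edges of G - {v, w}
        contracted = {frozenset(v if x == w else x for x in f)
                      for f in deleted}      # G[v, w]: w is merged into v
        return P(vertices, deleted, t) - P(vertices - {w}, contracted, t)

    C5 = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]}
    print(P(frozenset(range(1, 6)), C5, 3))  # PC5(3) = 3·2·1·(9 − 6 + 2) = 30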
We now colour edges instead of vertices: an edge colouring of G assigns a colour to each edge so that edges which share an end-vertex always get different colours, and the edge chromatic number χ′(G) is the smallest number of colours with which this can be done. For example, suppose that several students each need to meet some of the lecturers a, b, c, with the meetings to be scheduled in timeslots so that nobody is double-booked; the edges of the following graph are the required meetings, and the timeslots are the colours. [Diagram: a bipartite graph joining each student to the lecturers among a, b, c whom they need to meet.]
Since there are three students who need to meet lecturer b, there will have
to be at least three meeting timeslots. Can all the meetings be scheduled in
only three timeslots? Try to find an edge colouring of the graph with three
colours (say red, white, and blue). The conclusion is that the edge chromatic number of this graph is 3.
Theorem 4.28. Let G be a graph with a nonempty vertex set.
(1) χ′(G) ≥ ∆(G).
(2) If H is a subgraph of G, then χ′(H) ≤ χ′(G).
(3) χ′(G) is the maximum of the edge chromatic numbers of the connected components of G.
Proof. By definition of ∆(G), there is some vertex where ∆(G) edges all
end; these edges must have different colours in any edge colouring, which
implies part (1). Parts (2) and (3) follow from the same reasoning as in the
case of vertex colourings (see parts (3) and (4) of Theorem 4.4).
A graph G has edge chromatic number 0 if and only if it has no edges, i.e.
∆(G) = 0. If ∆(G) = 1, i.e. the graph does have edges but no two edges ever
end at the same vertex, then obviously the edges can all be given the same
colour, so the edge chromatic number is 1. In these cases, the inequality in
part (1) of Theorem 4.28 is actually equality. But there are examples where
equality fails, i.e. graphs G which are not ∆(G)-edge-colourable.
Example 4.29. Consider the cycle graph Cn for n ≥ 3. An edge colouring
of Cn requires at least ∆(Cn ) = 2 colours. In any edge colouring with 2
colours, the colours must alternate as you go around the cycle; if n is odd,
this results in a contradiction, exactly as with vertex colourings of Cn . So
the edge chromatic number is actually the same as the (vertex) chromatic
number of Cn , namely 2 if n is even and 3 if n is odd.
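The colouring described in Example 4.29 can be written down explicitly. A
minimal Python sketch, assuming the vertices of Cn are 0, ..., n − 1 and
colours are numbered from 0:

    def cycle_edge_colouring(n):
        # Optimal edge colouring of C_n: alternate colours 0 and 1 along
        # the path 0-1-...-(n-1), then close the cycle legally.
        colouring = {(i, i + 1): i % 2 for i in range(n - 1)}
        # The closing edge {n-1, 0} sees colour 0 at vertex 0 and colour
        # (n-2) % 2 at vertex n-1: colour 1 works when n is even, and a
        # third colour 2 is forced when n is odd.
        colouring[(n - 1, 0)] = 1 if n % 2 == 0 else 2
        return colouring

    assert max(cycle_edge_colouring(6).values()) == 1   # chi'(C_6) = 2
    assert max(cycle_edge_colouring(5).values()) == 2   # chi'(C_5) = 3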
Now consider the case of even n. It is clear that χ′ (K2 ) = 1, so we can assume
that n ≥ 4. Since ∆(Kn ) = n − 1, we just need to prove that Kn is (n − 1)-
edge-colourable. But if we delete the vertex n, the result is the complete
graph Kn−1 , and we have just seen how to construct an edge colouring of
Kn−1 with colours 1, · · · , n − 1. It is clear from our construction that none
of the edges coloured with colour i ends at the vertex i. So when we add the
vertex n, we can colour the edge {i, n} with colour i for i = 1, · · · , n − 1, and
the result is an edge colouring of Kn with colours 1, · · · , n − 1. The proof is
finished.
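The key property used above, that no edge of colour i ends at the vertex i,
can be realised by a standard construction: on the odd complete graph, give
the edge {a, b} the colour c with 2c ≡ a + b modulo the number of vertices.
The Python sketch below uses this construction with our own labelling of
vertices as 0, ..., n − 1 and colours as 0, ..., n − 2; the construction given
earlier in the notes may differ in its details.

    from itertools import combinations

    def complete_edge_colouring(n):
        # Edge colouring of K_n with n - 1 colours, for even n >= 2.
        # On the odd complete graph with vertices 0, ..., n-2, the edge
        # {a, b} gets the colour c with 2c = a + b (mod n-1); then colour
        # i is never used at vertex i, so the edge {i, n-1} can get colour i.
        m = n - 1                      # odd number of "old" vertices
        half = (m + 1) // 2            # the inverse of 2 modulo m
        colouring = {}
        for a in range(m):
            for b in range(a + 1, m):
                colouring[(a, b)] = ((a + b) * half) % m
            colouring[(a, m)] = a      # the edge {a, n-1} gets colour a
        return colouring

    # Check properness for K_6: edges of equal colour never share a vertex.
    col = complete_edge_colouring(6)
    for e, f in combinations(col, 2):
        assert col[e] != col[f] or not set(e) & set(f)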
By Vizing's Theorem, χ′(G) always equals either ∆(G) or ∆(G) + 1. Thus
the whole universe of graphs is divided into two types, those for which
χ′(G) = ∆(G) and those for which χ′(G) = ∆(G) + 1. For instance, Exam-
ple 4.29 says that even cycles are of the first type whereas odd cycles are of
the second type; Theorem 4.30 says that complete graphs with an even number
of vertices are of the first type, whereas complete graphs with an odd number
of vertices (at least 3) are of the second type.
We will not give the general proof of Vizing’s Theorem, but some idea of the
argument is provided by the following result, which shows that all bipartite
graphs (including all trees and forests) are of the first type.
Theorem 4.33. If G is a bipartite graph, then χ′(G) = ∆(G).
Proof. By part (1) of Theorem 4.28 we have χ′(G) ≥ ∆(G), so it suffices to
find an edge colouring of G with ∆(G) colours. We argue by induction on
the number of edges of G. If G has no edges there is nothing to prove, so
suppose that G has an edge and that the result holds for bipartite graphs
with fewer edges than G. Let {v, w} be
an edge of G. By the induction hypothesis, the graph G − {v, w} (which is
clearly still bipartite) has an edge colouring with ∆(G − {v, w}) colours;
since ∆(G − {v, w}) ≤ ∆(G), it also has an edge colouring with ∆(G)
colours. We choose such an edge colouring
of G − {v, w} where the colour set is {1, 2, · · · , ∆(G)}; we want to extend
this edge colouring to the whole of G (possibly after some modification).
The degree of v in G − {v, w} is less than ∆(G), so some colour j in
{1, 2, · · · , ∆(G)} is not used by any edge ending at v; similarly, some colour
i is not used by any edge ending at w. If i is not used at v either, we can
simply colour {v, w} with i, so assume that some edge ending at v has
colour i. Now starting from the vertex v = v0, we can build a path in G − {v, w} where
the edges alternate between the colours i and j: the first edge {v0 , v1 } is the
unique edge ending at v of colour i, the next edge {v1 , v2 } is the unique edge
ending at v1 of colour j (if such an edge exists), the next edge {v2 , v3 } is the
unique edge ending at v2 of colour i (if such an edge exists), and so on until
the next required edge does not exist. This is indeed a path, because if any
vertex was ever repeated, we would have three edges ending at a vertex with
only two colours between them, contradicting the edge colouring property
(or if v0 was the repeated vertex, we would have two edges ending there with
the colour i, because we assumed the colour j is not used by any edge ending
there). If w belonged to this path, it would have to be the other end-vertex,
coming after an edge of colour j (there being no edge of colour i with which
to continue the path). But this would mean that w = vm for some even
integer m, and then we could add the edge {v, w} to form a cycle in G with
an odd number of vertices (namely, m + 1); since G is bipartite, it contains
no odd cycles, so this cannot be the case. So w does not belong to the path.
We can now exchange the colours i and j throughout the path; this clearly
still gives an edge colouring of G − {v, w}, in which the colour i is not used
among the edges ending at v or the edges ending at w. We can then colour
{v, w} with colour i, and we are finished.
Note that this proof gives a concrete recursive procedure for finding an edge
colouring of G with ∆(G) colours.
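That procedure can be written out as follows. This is a minimal Python
sketch, assuming the graph is given as a list of edges and really is bipartite
(which is what guarantees that the alternating path never reaches w), with
colours numbered from 0 rather than 1:

    from collections import defaultdict

    def bipartite_edge_colouring(edges):
        # Colour the edges of a bipartite graph with Delta(G) colours,
        # following the inductive proof: add the edges one at a time, and
        # when the colour free at one end is in use at the other, exchange
        # two colours along an alternating path.
        degree = defaultdict(int)
        for v, w in edges:
            degree[v] += 1
            degree[w] += 1
        delta = max(degree.values())

        colour_at = defaultdict(dict)  # colour_at[x][c] = neighbour of x on its c-coloured edge
        colouring = {}                 # colouring[frozenset({x, y})] = colour of edge {x, y}

        def missing(x):
            # a colour unused at x (one exists: fewer than delta edges at x are coloured)
            return next(c for c in range(delta) if c not in colour_at[x])

        for v, w in edges:
            j, i = missing(v), missing(w)
            if i in colour_at[v]:
                # build the maximal path from v whose edges alternate colours i, j, i, ...
                path, k = [v], 0
                while (i, j)[k % 2] in colour_at[path[-1]]:
                    path.append(colour_at[path[-1]][(i, j)[k % 2]])
                    k += 1
                # exchange the colours i and j along the path
                pairs = list(zip(path, path[1:]))
                for x, y in pairs:
                    old = colouring[frozenset((x, y))]
                    del colour_at[x][old], colour_at[y][old]
                for x, y in pairs:
                    e = frozenset((x, y))
                    colouring[e] = i if colouring[e] == j else j
                    colour_at[x][colouring[e]] = y
                    colour_at[y][colouring[e]] = x
            # colour i is now free at both v and w
            colouring[frozenset((v, w))] = i
            colour_at[v][i] = w
            colour_at[w][i] = v
        return colouring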
Since it is traditional for books to end with a wedding, we will use Theorem
4.33 to prove a special case of what is called the Marriage Theorem.
Example 4.35. Suppose there are n men and n women, and every woman
has a list of which of the men she would be happy to marry. Every woman’s
list has d men on it, and every man occurs on d lists, where d ≥ 1. Is there
a way to marry the men off to the women so that every woman gets one of
the men on her list? If we construct a graph G where the vertices are the
people and there is an edge between a man and a woman if the man is on
the woman’s list, then G is bipartite and regular of degree d. By Theorem
4.33, there is an edge colouring of G with d colours. Let i be any colour;
since no two edges coloured i can share a man or a woman, there
are at most n of them. But the total number of edges is nd and there are
only d colours, so in fact there must be exactly n edges of colour i for every
i. Thus for every i, the edges of colour i match the n men with the n women,
and so provide a way to marry them
off. So the answer is that there is not just one way; you can actually find d
different ways which have no weddings in common.
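In terms of the sketch above, the d colour classes of the edge colouring are
exactly the d disjoint sets of weddings. A small illustration with made-up
names (n = 3, d = 2), reusing bipartite_edge_colouring and defaultdict
from the previous sketch:

    # Each woman's list has d = 2 men; each man is on d = 2 lists.
    lists = [('Alice', 'Xavier'), ('Alice', 'Yusuf'),
             ('Beth', 'Yusuf'), ('Beth', 'Zane'),
             ('Cara', 'Zane'), ('Cara', 'Xavier')]
    matchings = defaultdict(list)
    for edge, c in bipartite_edge_colouring(lists).items():
        matchings[c].append(tuple(sorted(edge)))
    # matchings[0] and matchings[1] are two ways to marry everyone off,
    # with no wedding in common.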