
Discrete Mathematics: Swearword

The document discusses the author's reflections on the Discrete Mathematics curriculum, emphasizing its importance and expressing disappointment over the perceived inadequacies in the teaching materials. It critiques a specific textbook, referred to as the 'Monster,' for its overwhelming length and lack of clarity, while proposing a concise review of essential topics in Discrete Mathematics. The author aims to provide a more accessible and meaningful exposition of the subject, complete with examples and exercises.


DISCRETE MATHEMATICS

ALEXANDER BORISOVICH

Swearword

Bastard Fred, who had fun with his pistol


Shooting one by one city’s old spinsters,
Was, with public condemn,
Given an AKM,
And all spinsters were gone in an instant.1

I have always considered the standard college course of Discrete
Mathematics to be the only meaningful part of the lower-division math
curriculum. After all, logic – sets – induction – recursion – modular
arithmetic – combinatorics – probability – graph theory, however
elementary, form a sapid chunk of real mathematics. So, when my son,
aspiring to an EECS career, was allowed to skip this course in anticipation
of a more advanced replacement, my parental pride was spiced with a
grain of disappointment. Respectively, when (for reasons not worth ex-
plaining) he was nonetheless, on short notice, required to demonstrate
his facility with these basics on the final exam of an ongoing course, I
viewed it as a potentially useful challenge, especially since some help
could come from me. Thus, I immediately located the detailed syllabus
for the class online, and googled up the required textbook. To my sur-
prise, in a few clicks, the free PDF file was downloaded on my laptop
and voila — the entire text opened to my stupefied view: I was looking
at a colorful Monster of one thousand letter-size pages long!
In its own way, it was a sweeping masterpiece. Indeed, what else is
left to do in the business of ruining one’s hope to understand mathe-
matics — a subject which proudly strives for ultimate clarity, natural
elegance, intellectual depth, economy of thought, and conciseness of
representation — when the traditional math circle content that fits
well in a middle-schooler’s sandbox is watered down by a flood of
biblical proportion to muddy the brain of a college student? Is there any
more cunning way to humiliate the Queen of Sciences, and compromise
Her in the eyes of a sensible human being, than by ritualizing the
“formal mathematical proof” and then practicing it for nauseatingly
meticulous derivation of an infinitude of self-obvious trivialities? And
then, when the greater half of the Monster is already behind, by stunning
the reader with a new Principle, whose very name — pigeonhole
— speaks of its accessibility to a 5th-grader, and “proving” it in a
page-long argument sporting several displayed formulas?

1 Reverse-engineered from a notable Russian version of an unremarkable English
limerick. In our age of prose poetry, the accents should help.

As if to celebrate the success of this devilish enterprise, the records
of the Monster’s authors at the Math Genealogy Project show zero
number of descendants, while their professional websites, shining with
scores of educationists’ publications, testify to the authors’ integrity by
showing zero math productivity of any kind.
To be fair, the course’s professor contributed their bit by not follow-
ing the Monster religiously, but rather omitting from the syllabus a few
sections which had an unusually high concentration of genuine math-
ematical content (such as, e.g., the relation between path-counting in
graphs and matrix multiplication), thereby saving time for, say, (ap-
parently dear to their heart) fossilized tree-searching algorithms.
What the reader finds below is a result of my instinctive defensive
reaction. With numerous swearwords and the aforementioned limerick
(with no obvious connection at all) constantly reverberating in my
mind, I dedicated a week of my sabbatical time to sifting the waters
of the Monster for anything worthwhile, adding what had to be there
but wasn’t, learning what I never knew (e.g. the fossilized algorithms),
correcting some crap found here and there, and ended up with what I
propose as a sample of a new genre.
Doesn’t a 1000-page book merit a 40-page book review? Here it
is: A comprehensive monster text review — a concise, but compre-
hensive exposition of all the meaningful material to be included in a
semester-long college course of Discrete Mathematics. It is equipped
with a minimal but sufficient supply of examples and exercises, and
with complete proofs, though occasionally presented as problem sets.
Read, solve, learn, enjoy — and shoot down all the monsters!

A COMPREHENSIVE MONSTER TEXT REVIEW

1. Logic and digital circuits


1. Operations with propositions. A proposition is a statement
which is either TRUE (T) or FALSE (F), but not both. E.g. “x=7” and
“The weather is good” are not propositions, but “2+2=5” is. We will
denote propositions by the letters $p, q, \dots$ (“Boolean variables”). Here
are some operations with propositions, resulting in new propositions:
• negation $\neg p$, TRUE whenever $p$ is FALSE
• conjunction $p \wedge q$, TRUE whenever both $p$ AND $q$ are TRUE
• disjunction $p \vee q$, TRUE whenever one of $p$, $q$, or both are TRUE
(“non-exclusive OR”)
• exclusive “or” $p \oplus q$, TRUE whenever only one of $p$, $q$ is (it is
denoted so because if $T = 1$ and $F = 0$, then $p \oplus q$ is addition
modulo 2)
• conditional (or implication) $p \to q$, FALSE only when the “hypothesis”
$p$ is TRUE, but the “conclusion” $q$ is FALSE
• bi-conditional $p \leftrightarrow q$, which is TRUE whenever $p$ and $q$ have
the same truth values
By the way, $p \to q \equiv \neg p \vee q$ (“either the hypothesis is FALSE, or the
conclusion is TRUE”), where² the equivalence $\equiv$ of propositions means
that both expressions are simultaneously TRUE and simultaneously
FALSE for each possible combination of T/F values of the participating
variables. Such identities between logical expressions in, say, $n$ Boolean
variables can be checked directly by “truth tables”, i.e. by comparing
the T/F values of the expressions for each of the $2^n$ combinations of
the T/F values of the variables.
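A truth-table check is mechanical enough to sketch in code. The following Python fragment is only an illustration: the helper name `equivalent` and the lambda names are our own, not notation from the text. It compares two expressions on all $2^n$ inputs, using the identity $p \to q \equiv \neg p \vee q$ and the (non-equivalent) converse as test cases.

```python
from itertools import product

def equivalent(f, g, n):
    """Return True if f and g agree on all 2**n combinations of T/F inputs."""
    return all(f(*vals) == g(*vals) for vals in product([False, True], repeat=n))

# p -> q, encoded by its defining rule: FALSE only when p is TRUE and q is FALSE.
implies = lambda p, q: not (p and not q)

# "Either the hypothesis is FALSE, or the conclusion is TRUE."
not_p_or_q = lambda p, q: (not p) or q

# The converse q -> p, for comparison.
converse = lambda p, q: implies(q, p)

print(equivalent(implies, not_p_or_q, 2))  # True
print(equivalent(implies, converse, 2))    # False
```

The same helper can be pointed at any pair of expressions in $n$ variables, at the cost of $2^n$ evaluations.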
Exercise: Check the following equivalences by truth tables, or by
using the previously checked ones (and correct one, which is wrong):
$\neg(\neg p) \equiv p$, $\neg(p \wedge q) \equiv \neg p \vee \neg q$, $\neg(p \vee q) \equiv \neg p \wedge \neg q$,
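$(p \oplus q) \equiv p \oplus q$, $(p \leftrightarrow q) \equiv p \oplus q$, $p \equiv p \oplus T$ [the negation signs of this row did not survive extraction]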
$p \wedge (q \vee r) \equiv (p \wedge q) \vee (p \wedge r)$, $p \vee (q \wedge r) \equiv (p \vee q) \wedge (p \vee r)$
$(p \to q) \not\equiv (q \to p)$, $(p \to q) \equiv (\neg q \to \neg p)$, $(q \to p) \equiv (\neg p \to \neg q)$
The middle two in the top line are called De Morgan rules, while the
third line contains distributive laws. Of course, $\vee$ and $\wedge$ are also
associative and commutative.

2 Note the convention on the “precedence order of operations”: $\neg p \vee q \equiv r$ means
$((\neg p) \vee q) \equiv r$ since $\equiv$, along with $\leftrightarrow$, are considered operations of the lowest order,
$\to$, $\vee$, $\wedge$, and $\oplus$ of the next one, while $\neg$ of the highest.

The propositions in the bottom line illustrate “Alice in Wonderland”:
the direct proposition $p \to q$ is not equivalent to the converse proposition
$q \to p$, but is equivalent to the contrapositive proposition $\neg q \to \neg p$.
The converse $q \to p$ is likewise equivalent to the inverse proposition
$\neg p \to \neg q$. To prove a theorem $p \to q$ “by contradiction” means to
prove $(p \wedge \neg q) \to F$ (assuming that the hypothesis is TRUE but the
conclusion FALSE, arrive at a contradiction), while proving “by
contraposition” means to prove the equivalent theorem $\neg q \to \neg p$ instead
(assume that the conclusion is FALSE, and derive that the hypothesis
must be FALSE too).
2. Boolean functions. There are $2^{2^n}$ Boolean functions $s(p_1, \dots, p_n)$
in $n$ Boolean variables $p_1, \dots, p_n$ and taking Boolean values $T/F$: Each
can be specified by the choice of a value $T/F$ for each of the $2^n$
combinations of the values of the variables. For CS purposes, it is convenient
to replace $T$ and $F$ with 1 and 0 respectively.
Every Boolean function can be expressed by using the operations
$\neg$, $\vee$, $\wedge$. Indeed, for each string of input values $(p_1, \dots, p_n)$ (take for
example $(1, 1, 0, 1, 0, 0, 1)$, where $n = 7$) write the unique conjunction
monomial containing either $p_i$ or $\neg p_i$ for each $i$ which takes the value
1 on this particular input. In our example, it is
$p_1 \wedge p_2 \wedge (\neg p_3) \wedge p_4 \wedge (\neg p_5) \wedge (\neg p_6) \wedge p_7$.
By taking the disjunction ($\vee$) of all such monomials over those inputs
for which our Boolean function takes the value $s = 1$ (and not including
those for which it is 0) we obtain a Boolean expression representing
our function. E.g. $s(0, 0) = 0$ and $s(0, 1) = s(1, 0) = s(1, 1) = 1$ is
represented this way by $(\neg p_1 \wedge p_2) \vee (p_1 \wedge \neg p_2) \vee (p_1 \wedge p_2)$. Of course,
this is not economical, because the function is $p_1 \vee p_2$.
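The construction just described (one conjunction monomial per input where the function equals 1) can be sketched as follows. This is a minimal Python illustration; the function name `dnf` and the textual symbols `~`, `&`, `|` for NOT, AND, OR are our own choices, not the text's notation.

```python
from itertools import product

def dnf(s, n):
    """Build the (non-minimal) disjunctive normal form of s as a string.

    One conjunction monomial is written for each input on which s equals 1.
    """
    monomials = []
    for bits in product([0, 1], repeat=n):
        if s(*bits):
            # Use p_i where the input bit is 1 and its negation where it is 0.
            literals = [f"p{i}" if b else f"~p{i}" for i, b in enumerate(bits, start=1)]
            monomials.append("(" + " & ".join(literals) + ")")
    return " | ".join(monomials) if monomials else "0"

# The example from the text: s(0,0) = 0, s(0,1) = s(1,0) = s(1,1) = 1.
s = lambda p1, p2: int(p1 or p2)
print(dnf(s, 2))  # (~p1 & p2) | (p1 & ~p2) | (p1 & p2)
```

The output has as many monomials as there are inputs with value 1, which is exactly why the expression is usually far from economical.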
There are various ways of producing minimal (in some sense, which
can also vary) Boolean expressions for Boolean functions. As an example,
consider minimal disjunctive normal form, which we won’t define
here formally. Rather we will give an example illustrating both what it
is and how to construct one. One can think of the domain of a Boolean
function as the set of vertices of the unit cube in $n$-dimensional space
(because each vertex is specified by the string of $n$ coordinates equal
0 or 1). For example, consider the Boolean function $s(p, q, r)$ equal
to 1 at the vertices marked black in Figure 1, and equal to 0 at the
“white” vertices. To produce a minimal disjunctive normal form, one
needs to cover the set of all black vertices by as few “faces” of the cube
(the “faces” can be of any dimension from 0 to $n$, and may overlap) as
possible in such a way that they don’t cover any white vertex. Thus,
the 5 black vertices in the picture can be covered by two blue faces:
one of dimension 2 (on top), the other of dimension 1 (the vertical blue
line). The Boolean function, equal to 1 at the vertices of the top blue
face only, is simply $q$. The Boolean function, equal to 1 at the vertices
of the vertical blue edge, is $(\neg p) \wedge r$. Therefore $q \vee (\neg p \wedge r)$ (in general,
the disjunction of the conjunction monomials representing the faces) is
an economical Boolean expression for our function.

Figure 1 (unit cube with coordinate axes p, q, r)

Note that the white vertices (defining the function $\neg s(p, q, r)$) can
be covered by two green edges, leading to the expression

$\neg s \equiv (\neg q \wedge \neg r) \vee (p \wedge \neg q)$

and consequently

$s \equiv \neg((\neg q \wedge \neg r) \vee (p \wedge \neg q))$.

This representation of $s$ is also “economical,” but not disjunctive (because
it is not a disjunction of conjunction monomials).

3. Digital logic circuits. In the 1930s, C. Shannon noticed the analogy
between electrical switchboards and Boolean functions. Namely, a
Boolean expression involving $\wedge$ and $\vee$, but no negation, can be realized
as such an electric circuit and vice versa. For example, the circuit
shown in Figure 2, where each of the switches $P, Q, R, S$ can be in either
the closed position (TRUE) or the open position (FALSE), will
light the bulb if and only if the Boolean expression $P \vee ((Q \vee R) \wedge S)$
assumes the value TRUE.

Figure 2 (circuit with a battery, a bulb, and switches P, Q, R, S)

Modern logical circuits use a variety of technologies. Regardless of
the technology, one can think of a circuit as a “black box” with $n$ binary
inputs $p_1, \dots, p_n$ and one binary output $s$, and encode the circuit’s
ability to transform the inputs into the output by the corresponding
Boolean expression. Our problem of constructing economical expressions
for Boolean functions translates into the task of realizing given
“black boxes” in equivalent circuits using fewer NOT, AND, and OR
gates (i.e. elementary circuits implementing these operations).
Thus, in our example in Figure 1, the two representations (“blue”,
and the negation of “green”) of the function $s$ correspond to the circuits
shown in Figure 3.

Figure 3 (two circuits of NOT, AND, and OR gates with inputs p, q, r and output s)

Exercise. Show that any logical circuit can be constructed using
only two gates (Boolean operations): NAND, denoted by the Sheffer stroke
$|$, and NOR, denoted by the Peirce arrow $\downarrow$, and defined by $p|q \equiv \neg(p \wedge q)$
and $p \downarrow q \equiv \neg(p \vee q)$ respectively. Hint: Combine $\neg p \equiv p|p$ with the
De Morgan rules.

For example, the function $q \vee (\neg p \wedge r)$ from Figure 1 can be represented
by the NAND/NOR circuit shown in Figure 4, i.e. as
$((p|p) \downarrow q) \downarrow (q \downarrow r)$ (check this!)
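The "check this!" can be done by a brute-force truth table. Below is a small Python sketch under our own naming (`nand`, `nor`, `target`, `circuit` are illustrative names, not from the text); it compares the NAND/NOR expression with $q \vee (\neg p \wedge r)$ on all 8 inputs.

```python
from itertools import product

def nand(p, q):
    """Sheffer stroke: NOT (p AND q)."""
    return not (p and q)

def nor(p, q):
    """Peirce arrow: NOT (p OR q)."""
    return not (p or q)

def target(p, q, r):
    """The function q OR ((NOT p) AND r) from Figure 1."""
    return q or ((not p) and r)

def circuit(p, q, r):
    """((p|p) NOR q) NOR (q NOR r), as read off Figure 4."""
    return nor(nor(nand(p, p), q), nor(q, r))

agree = all(target(p, q, r) == circuit(p, q, r)
            for p, q, r in product([False, True], repeat=3))
print(agree)  # True
```

Here `nand(p, p)` plays the role of the negation $\neg p$, as in the hint.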

Figure 4 (circuit of one NAND and three NOR gates with inputs p, q, r and output s)

4. Circuits for binary addition and subtraction. Addition of
two binary digits, $p$ and $q$, can be expressed by two Boolean functions:
the sum digit $p \oplus q$, i.e. $(p \vee q) \wedge \neg(p \wedge q)$, and the carry digit $p \wedge q$. The
circuit in Figure 5 which realizes these functions is called a half-adder.

Figure 5. Half-adder (inputs p, q; OR, AND, and NOT gates; outputs Sum and Carry)

The circuit shown in Figure 6 is called a full-adder. It adds three
binary digits, $p$, $q$, and $r$. In the binary addition of multi-digit numbers,
the third summand may be a carry-over from the previous binary place.
Check that it works! The key is that in the two additions needed for
adding three binary digits, at most one carry-over can occur.

Figure 6. Full-adder (two half-adders and an OR gate; inputs p, q, r; outputs S and C)

Now the addition of multi-digit binary numbers can be realized as
shown in Figure 7, where $\dots p_2 p_1 p_0 + \dots q_2 q_1 q_0 = \dots s_3 s_2 s_1 s_0$.

Figure 7. Multi-digit binary addition (a half-adder for p0, q0 followed by a chain of full-adders)
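The half-adder, full-adder, and their chaining can be simulated in a few lines. The Python sketch below is an illustration under our own assumptions: bits are the integers 0 and 1, numbers are lists of bits with the least significant bit first, and the first position uses a full-adder with carry-in 0 (which behaves like the half-adder of Figure 7). The function names are ours.

```python
def half_adder(p, q):
    """Return (sum, carry) for two bits, using only OR, AND, NOT as in the text."""
    s = (p | q) & (1 - (p & q))   # (p OR q) AND NOT (p AND q)
    c = p & q
    return s, c

def full_adder(p, q, r):
    """Add three bits with two half-adders; the two carries are OR-ed."""
    s1, c1 = half_adder(p, q)
    s2, c2 = half_adder(s1, r)
    return s2, c1 | c2

def add_binary(p_bits, q_bits):
    """Ripple-carry addition; bit lists are least-significant first, equal length."""
    carry = 0
    out = []
    for p, q in zip(p_bits, q_bits):
        s, carry = full_adder(p, q, carry)
        out.append(s)
    out.append(carry)             # the extra leading digit
    return out

print(add_binary([1, 1, 0], [1, 0, 1]))  # 3 + 5 = 8, i.e. [0, 0, 0, 1]
```

OR-ing the two carries in `full_adder` is safe precisely because, as noted above, at most one of them can be 1.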

5. Computer addition with negative numbers. One can use
the $n$-digit binary format in order to represent integers $a$ from the
interval $-2^{n-1} \le a < 2^{n-1}$. Namely, replace the integer $a$ by its remainder
$\hat{a}$ modulo $2^n$, which satisfies $0 \le \hat{a} < 2^n$. In practice this means that
a non-negative $a < 2^{n-1}$ will be represented by 0 in the leftmost, $n$th
place, followed by the $n - 1$ digits of $a$, while negative $a \ge -2^{n-1}$ will
be represented by $2^n - |a|$. It will have 1 in the $n$th place to signal that
$a$ was negative. Note that the binary code for $2^n - |a|$ is obtained by
adding 1 to that of $2^n - 1 - |a|$, which is obtained from the code for $|a|$
by inverting all digits (think why!).
In the arithmetic modulo $2^n$, we have $\widehat{a + b} = \hat{a} + \hat{b} \mod 2^n$. That
is, assuming that the sum $a + b$ fits the range between $-2^{n-1}$ and
$2^{n-1} - 1$, the correct representation of $a + b$ is obtained by adding the
representations $\hat{a}$ and $\hat{b}$ and, if the sum $\hat{a} + \hat{b}$ reaches $2^n$, dropping the
$(n + 1)$st digit (i.e. reducing the sum modulo $2^n$).³
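A short numerical experiment may make the modular point of view concrete. The Python sketch below is illustrative only; the names `encode`, `decode`, and `add` are ours, and $n = 8$ is an arbitrary choice.

```python
def encode(a, n):
    """Represent a in n binary digits as its remainder modulo 2**n."""
    assert -2**(n - 1) <= a < 2**(n - 1)
    return a % 2**n          # Python's % gives a non-negative remainder here

def decode(code, n):
    """Recover the signed integer: a leading 1 signals a negative number."""
    return code - 2**n if code >= 2**(n - 1) else code

def add(a, b, n):
    """Add the representations and drop the (n+1)st digit."""
    return (encode(a, n) + encode(b, n)) % 2**n

n = 8
print(format(encode(-5, n), "08b"))   # 11111011
print(decode(add(-5, 3, n), n))       # -2

# "Invert all digits and add 1" gives the code of a negative number.
inverted = (2**n - 1) ^ 5             # flips all n digits of |a| = 5
print(inverted + 1 == encode(-5, n))  # True
```

No case analysis on the signs of the summands appears anywhere: reduction modulo $2^n$ does all the work.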
6. Predicate logic. Suppose $n$ denotes an integer, $n \in \mathbb{Z}$. Is $n > 5$
a proposition? No, because it is true for some $n$ and is false for other
$n$. If it is not a proposition, then what is it? The answer is that it is
a function from $\mathbb{Z}$ to the set $P$ of propositions: It is a rule, which to
any given value of $n$ (say, $n = 11$) associates a proposition ($11 > 5$ in
this example), which is either true or false (true in this case). Such
functions $p : X \to P$ from sets to $P$ are called predicates.

3 Thus, mathematics relieves us from the need to consider the cases of various
signs and sizes of a and b, as it is done in the Monster.

To each predicate $p : X \to P$ one can associate the truth set $T(p)$
which is a subset in $X$. By definition, it consists of those $x \in X$ for
which the proposition $p(x) \in P$ is TRUE. For example the truth set of
the predicate $n > 5$ is the subset in $\mathbb{Z}$ of all integers greater than 5.
The universal ($\forall$) and existential ($\exists$) quantifiers are two operations,
associating propositions to predicates. Namely, given a predicate
$p : X \to P$, we obtain propositions $\forall x\, p(x)$ and $\exists x\, p(x)$: the former is
TRUE only when $T(p) = X$, i.e. when $p(x)$ is TRUE for all $x \in X$,
and the latter is TRUE whenever $T(p) \ne \emptyset$, i.e. when there exists
$x \in X$ for which $p(x)$ is TRUE. Note that $T(p) = X$ is false whenever
$T(\neg p) \ne \emptyset$, and $T(p) \ne \emptyset$ is false whenever $T(\neg p) = X$. Therefore
$\neg(\forall x\, p(x)) \equiv \exists x\, (\neg p(x))$, and $\neg(\exists x\, p(x)) \equiv \forall x\, (\neg p(x))$.
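On a finite domain the quantifiers reduce to Python's built-in `all` and `any`, which gives a quick way to experiment with the two negation rules. The sketch below assumes a finite stand-in for the domain (the range from -10 to 10 is our arbitrary choice, since $\mathbb{Z}$ itself cannot be enumerated).

```python
X = range(-10, 11)              # a finite stand-in for the domain
p = lambda n: n > 5             # the predicate from the text

truth_set = {x for x in X if p(x)}

forall_p = all(p(x) for x in X)   # TRUE only when the truth set is all of X
exists_p = any(p(x) for x in X)   # TRUE when the truth set is non-empty

# Negation swaps the quantifiers.
assert (not forall_p) == any(not p(x) for x in X)
assert (not exists_p) == all(not p(x) for x in X)
print(sorted(truth_set))  # [6, 7, 8, 9, 10]
```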
Warning: We often casually omit the logical quantifiers. For example,
by writing the identity $p : a^2 - b^2 = (a + b)(a - b)$ we mean
$\forall a, b \in \mathbb{R},\ p(a, b)$. To accommodate this habit, let’s use the notation
$p(x) \Rightarrow q(x)$ to mean $\forall x\, (p(x) \to q(x))$.
Exercises. (a) Disproving by counter-example: Show that the theorem
$p(x) \Rightarrow q(x)$ is FALSE whenever $\exists x\, (p(x) \wedge \neg q(x))$ is TRUE.
(b) Which of the implications $p(x) \Rightarrow q(x)$ and $q(x) \Rightarrow p(x)$ expresses
that $p$ is a necessary condition for $q$, and which that $p$ is a sufficient
condition for $q$?
(c) Show that $p(x) \Rightarrow q(x)$ is TRUE whenever $T(p) \subset T(q)$.
(d) Express contrapositive, converse, and inverse propositions, i.e.
$\neg q(x) \Rightarrow \neg p(x)$, $q(x) \Rightarrow p(x)$, and $\neg p(x) \Rightarrow \neg q(x)$, as relationships
between the truth sets $T(p)$ and $T(q)$.
(e) A real-valued function $f : \mathbb{R} \to \mathbb{R}$ of one real variable is said to
be uniformly continuous, if
$\forall \epsilon > 0\ \exists \delta > 0$, such that $|x - y| < \delta \Rightarrow |f(x) - f(y)| < \epsilon$.
Write down the negation of this proposition, and check whether the
functions $f = x$ and $f = x^2$ are uniformly continuous.
(f) Change the order of the quantifiers in the above definition, i.e.
write
$\exists \delta > 0$ such that $\forall \epsilon > 0$, we have $|x - y| < \delta \Rightarrow |f(x) - f(y)| < \epsilon$,
and describe all functions $f : \mathbb{R} \to \mathbb{R}$ which satisfy this condition.

2. Set theory
1. Operations with sets. By a set one means a collection of
objects of any nature. The notion is so basic that it cannot be defined
formally, simply because there are no more basic terms to rely on. The
best one can do is to say that a set $S$ is well-defined if for any object
$x$ in the Universe it is known whether $x$ is an element of $S$ ($x \in S$) or
not ($x \notin S$). One says that $T$ is a subset of $S$, $T \subset S$, if $x \in T \Rightarrow x \in S$.
Two sets are called equal, $T = S$, if $T \subset S$ and $S \subset T$, i.e. if they
consist of the same elements.
Warnings. (a) A set can be described by listing its elements, e.g.
$S = \{2, 3, 5, 7\}$ is the set of all one-digit primes. However, lists with
repeated objects, strictly speaking, don’t describe sets, since elements
of a set cannot be “repeated”: they are either there or not.
(b) One can consider the set $\{3, \{3\}\}$, consisting of two elements:
the integer $3 \in \mathbb{Z}$, and the subset $\{3\} \subset \mathbb{Z}$ consisting of one integer,
3. Yet, the statement $\{3\} \in \mathbb{Z}$ is false, since the elements of
$\mathbb{Z} := \{0, \pm 1, \pm 2, \dots\}$ are numbers, not sets.
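One defines the complement of a set, and the intersection, union,
difference, and symmetric difference of two sets $A$ and $B$ by
$(x \in A^c) \Leftrightarrow (x \notin A)$
$x \in (A \cap B) \Leftrightarrow (x \in A) \wedge (x \in B)$
$x \in (A \cup B) \Leftrightarrow (x \in A) \vee (x \in B)$
$x \in (B - A) \Leftrightarrow (x \in B) \wedge \neg(x \in A)$
$x \in (A \triangle B) \Leftrightarrow (x \in A) \oplus (x \in B)$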
Exercise. Prove the following properties of the operations:
$(A^c)^c = A$, $A \triangle B = (A - B) \cup (B - A)$
$(A \cap B)^c = A^c \cup B^c$, $(A \cup B)^c = A^c \cap B^c$
$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$
$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$
Example: Venn diagrams. Let us prove a ridiculous identity about
sets, e.g. that for any three sets $A, B, C$ we have:
$(A \cup B \cup C) - (A \triangle B) = (A \cap B) \cup (C - (A \cup B))$.
We can use the definitions of the operations and translate this into a
Boolean identity with three propositions, $a : x \in A$, $b : x \in B$, and
$c : x \in C$:
$(a \vee b \vee c) \wedge \neg(a \oplus b) \equiv (a \wedge b) \vee (c \wedge \neg(a \vee b))$,
where the left (or right) side of $\equiv$ expresses the fact that $x$ is an element
of the set written on the left (resp. right) side of $=$. Now we can try
to check the equivalence using truth tables. Consider, however, the
example shown on Figure 8, where $A$, $B$, and $C$ are three discs on the
plane.

Figure 8. Venn diagrams (two pictures of three discs A, B, C)

The area shaded green represents $A \triangle B$, and the complement of it
inside $A \cup B \cup C$ (i.e. the relative complement) is shaded in cyan. In
the right-hand picture, the relative complement of $A \cup B$ inside $C$ is
shaded in magenta, while the yellow region represents $A \cap B$. Since
the total region shaded on the right coincides with the region shaded
in cyan on the left, we conclude that at least for these three sets, our
identity is true. So what? Have we proved the identity? Will it hold
for any three sets? We claim that the answer is yes. The reason is that
the three circles divide the plane into 8 regions (can you find all eight?)
in which the predicates $x \in A$, $x \in B$, $x \in C$ assume all 8 possible
combinations of truth values. Consequently, checking our identity for sets
in this example is equivalent to verifying the Boolean identity by truth tables.
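Both routes to the identity can be tried out in code. The Python sketch below first runs the 8-row truth table for the Boolean form, then checks the set form on one concrete triple of sets; the function names and the sample sets are our own illustrative choices.

```python
from itertools import product

def lhs(a, b, c):
    """x lies in (A u B u C) - (A symmetric-difference B)."""
    return (a or b or c) and not (a != b)      # a != b is XOR on booleans

def rhs(a, b, c):
    """x lies in (A n B) u (C - (A u B))."""
    return (a and b) or (c and not (a or b))

print(all(lhs(*v) == rhs(*v) for v in product([False, True], repeat=3)))  # True

# The same identity on concrete sets, using Python's set operators
# (| union, & intersection, - difference, ^ symmetric difference).
A, B, C = {1, 2, 3, 4}, {3, 4, 5, 6}, {1, 4, 6, 7, 8}
print(((A | B | C) - (A ^ B)) == ((A & B) | (C - (A | B))))  # True
```

Only the first check is a proof; the second, like a single Venn picture with too few regions, would merely be an example.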
2. Boolean algebras. The parallelism between propositions and
sets can be captured by the axiomatic algebraic structure called Boolean
algebra. By definition, a Boolean algebra is a set $B$ equipped with two
operations, denoted $+$ and $\cdot$, which are required to satisfy the following
properties (axioms):
• $+$ and $\cdot$ are commutative and associative, and obey two distributive
laws: $a \cdot (b + c) = (a \cdot b) + (a \cdot c)$ and $a + (b \cdot c) = (a + b) \cdot (a + c)$
for all $a, b, c \in B$;
• there exist elements of $B$ (denoted 0 and 1) such that $a + 0 = a$
and $a \cdot 1 = a$ for all $a \in B$;
• for every $a \in B$ there exists an element in $B$ (denoted $\bar{a}$) such
that $a + \bar{a} = 1$ and $a \cdot \bar{a} = 0$.
Taking $B$ to be $P$, the set of propositions, with $\vee$, $\wedge$, $F$, $T$, and $\neg$ in
the roles of $+$, $\cdot$, 0, 1, and $\bar{\ }$ respectively, we obtain the Boolean algebra
of propositions.
Taking $B$ to be the set of all subsets of a given set $U$, with $\cup$, $\cap$, $\emptyset$,
$U$ and $U - A$ in the roles of the respective operations, we obtain other
examples of Boolean algebras.
From the axioms, many properties common to all Boolean algebras
can be derived.
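Example: Complements are unique. If $a + x = 1$ and $a \cdot x = 0$, then
$x = x \cdot 1 = x \cdot (a + \bar{a}) = x \cdot a + x \cdot \bar{a} = 0 + x \cdot \bar{a}$
$= a \cdot \bar{a} + x \cdot \bar{a} = (a + x) \cdot \bar{a} = 1 \cdot \bar{a} = \bar{a}$.
Exercise. Derive the double-complement rule $\bar{\bar{a}} = a$, and the De
Morgan rules: $\overline{a + b} = \bar{a} \cdot \bar{b}$ and $\overline{a \cdot b} = \bar{a} + \bar{b}$, to be true for all $a, b \in B$.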
3. Russell’s paradox and Turing’s halting theorem. It turns
out that a naive approach to set theory leads to various contradictions.
One of them, formulated by Bertrand Russell, shows that it is unsafe
to use self-referential sentences to define sets. In a playful form it is
usually formulated this way: “The barber in a town shaves everyone
who doesn’t shave himself; does the barber shave himself?” That is,
does the barber belong to the set of all those he shaves, if the set is
defined by his rule?
There is no answer: if he shaves himself, then he doesn’t, and if he
doesn’t, then he does; i.e. the rule does not determine a set since it
remains uncertain whether the barber himself is an element of this set
or not.
An abstract version of this paradox can be illustrated by an attempt
to define the set S of all those sets which do not contain themselves as
their elements. Again, is S an element of S? If it is, then it isn’t, and if
it isn’t, then it is. This contradiction shows that the phrase purporting
to define S does not define any set.
While this (and some other) difficulties of set theory were resolved by
a much more careful construction of its logical foundations, the para-
doxes led to some fundamental “incompleteness” results in the work of
Kurt Gödel and other logicians. Namely, mimicking the self-referential
phrase This statement is unprovable (which, reasoning naively, is un-
provable, because if it were provable, it would have been false, and false
statements can’t be proved, and so it is a true statement — yet un-
provable!), Gödel managed to rigorously establish that not every true
mathematical statement can be derived from the axioms of a given
theory. Here is another similar development: The halting theorem by Alan
Turing.
Theorem. There is no algorithm, which for any algorithm X and
any input data D would determine whether X with the input D will
loop forever, or it will halt after finitely many steps.
Proof. We will show that any algorithm, say CheckHalts(X, D),
which for every given algorithm X and every data D outputs either
halts or loops forever, will fail to function the required way on X = Test
and D = Test, where Test is the following algorithm (considered as
both an algorithm X and text input data D):
Test(X);
a : IF CheckHalts(X, X) = halts
    THEN GO TO a
    ELSE GO TO b;
b : STOP
Indeed, when CheckHalts(Test, Test) = halts, Test(Test) loops forever,
and when CheckHalts(Test, Test) = loops forever, Test(Test)
actually halts.
4. Functions. A function $f$ from a set $X$ (called the domain of
$f$) to a set $Y$ (called the codomain of $f$) is a rule which to each element
of $X$ associates a unique element of $Y$. We write: $f : X \to Y$ for the
function as a whole, but $X \ni x \mapsto f(x) \in Y$ to specify the element
$y = f(x)$. We write $f(X)$ for the subset in $Y$ called the range of $f$, and
defined as $f(X) := \{y \in Y \mid \exists x \in X \text{ with } y = f(x)\}$.
A function is respectively called injective, surjective, or bijective if it
is one-to-one (i.e. $(x_1 \ne x_2) \Rightarrow f(x_1) \ne f(x_2)$), onto (i.e. $f(X) = Y$),
or both one-to-one and onto. One also says that a bijective function
establishes a one-to-one correspondence between the sets $X$ and $Y$: for
every $y \in Y$ there exists (surjectivity) a unique (injectivity) $x \in X$
such that $y = f(x)$. This defines the function $f^{-1} : Y \to X$ called the
inverse of $f$. As a mapping between sets, $f^{-1}$ “undoes” what $f$ does.
Example: Hash functions. How to efficiently store records of, say,
up to 10,000 students in a college, when the records are to be identified
by 9-digit social security numbers (SSN)? A function Hash from the
set of $10^9$ possible SSNs to the set of $10^4$ labels ranging from 0000
to 9999 can be defined by a simple rule, e.g. by Hash(SSN) :=
the last 4 digits of the SSN. One problem is that this function might
not be injective. When such a collision happens, a collision resolution
algorithm is applied, e.g. one can assign the smallest allowed Hash
value as yet unused by the database records. The resulting function
Hash may look very irregular, but it will be injective.
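The hashing rule with this particular collision resolution can be sketched as follows. This is a toy Python illustration, not a recommended implementation: the function name `build_table`, the sample SSNs, and the assumption that the SSNs are distinct and number at most `size` are all ours.

```python
def build_table(ssns, size=10**4):
    """Assign each SSN a distinct label in range(size).

    Collision rule (one of many possible): take the smallest unused label.
    Assumes the SSNs are distinct and there are at most `size` of them.
    """
    table = {}                       # label -> SSN
    label_of = {}                    # SSN -> label
    for ssn in ssns:
        label = ssn % size           # the last 4 digits of the SSN
        if label in table:
            label = next(k for k in range(size) if k not in table)
        table[label] = ssn
        label_of[ssn] = label
    return label_of

labels = build_table([123456789, 987656789, 111110000])
print(labels)  # {123456789: 6789, 987656789: 0, 111110000: 1}
```

The second SSN collides with the first on 6789 and is moved to label 0, which in turn displaces the third SSN (ending in 0000) to label 1: the resulting map is injective on the stored records, but no longer given by a simple formula.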
Given two functions $g : X \to Y$, and $f : Y \to Z$, their composition
$h := f \circ g : X \to Z$ is defined by $h(x) := f(g(x))$. The operation of
composition of functions is associative: Given three functions,
$h : X \to Y$, $g : Y \to Z$, and $f : Z \to W$, the composition of $f$ with the result
of composing $g$ and $h$ coincides with the result of composing $f$ and $g$,
composed with $h$: $f \circ (g \circ h) = (f \circ g) \circ h : X \to W$. Indeed, both rules
applied to $x \in X$ give the same value $f(g(h(x)))$ (which is the only
way of getting from $X$ to $W$ using $f$, $g$, and $h$). In fact, whenever there
is an associative operation in mathematics, it can be defined in terms
of the operation of composing functions. For example, the addition of
plane vectors $u + v$ can be interpreted as the composition of translations
on the plane: one by the vector $u$, the other by the vector $v$. Thus,
the tautological associativity of function composition is the reason why
this property is shared by many algebraic operations.
The graph of a function $f : X \to Y$ is defined as the subset $F \subset X \times Y$
in the Cartesian product⁴ of $X$ and $Y$ consisting of all pairs $(x, f(x))$,
that is $F := \{(x, y) \in X \times Y \mid y = f(x)\}$. High-school students often
fail to distinguish functions (which are rules, e.g. $x \mapsto x^2$) from their
graphs, which are sets (parabola $y = x^2$ on the plane $\mathbb{R}^2 := \mathbb{R} \times \mathbb{R}$).
In fact logicians do the same. Indeed, in order to reduce the number
of undefinable notions, they define functions in terms of sets. Namely,
a function from $X$ to $Y$ in their sense is a subset $F \subset X \times Y$ such
that for every $x \in X$ there exists a unique $y \in Y$ such that $(x, y) \in F$.
Of course, to arrive at this formal definition, one needs to have the
informal idea of functions as rules beforehand.
Exercises. (a) Prove that $f : X \to Y$ is bijective (injective, surjective)
if and only if $f$ has a two-sided inverse (resp. left inverse, right
inverse) with respect to the composition operation, i.e. if there exists
$g : Y \to X$ such that $g \circ f = \mathrm{id}_X$ (a left inverse) and $f \circ g = \mathrm{id}_Y$ (a
right inverse), where $\mathrm{id}_X$ and $\mathrm{id}_Y$ denote the identity maps on $X$ and
$Y$.
(b) Show that if $F \subset X \times Y$ is the graph of an invertible function,
then the graph of the inverse function coincides with $F$ considered as
a subset of $Y \times X$ (which can be identified with $X \times Y$ by reordering
the pairs).

4 By definition, $X \times Y := \{(x, y) \mid x \in X, y \in Y\}$, i.e. the set of all ordered pairs.

5. Equivalence relations. Dealing with numbers, one often writes
$x < y$ instead of simply saying: the ordered pair $(x, y)$ is in the binary
relation “less than.” This example illustrates the informal idea of a
binary relation. Speaking formally, the notion generalizes that of a
function $X \to Y$ (identified with its graph). Namely, a binary relation
between $X$ and $Y$ is any subset of their Cartesian product: $R \subset X \times Y$.
One says that $x \in X$ and $y \in Y$ are in the relation $R$ if $(x, y) \in R$. The
statement that $x$ and $y$ are in this relation $R$ can be expressed as $xRy$
(like $x < y$), or by $R(x, y)$ (as in the Monster).
In mathematics, one often needs to partition a given set into
non-overlapping subsets (“classes of equivalence” in a certain given sense),
and introduce a new set, whose elements are the equivalence classes.
For example, in arithmetic modulo 2 one deals with the set $\mathbb{Z}/2\mathbb{Z}$ of
two equivalence classes into which the set of integers $\mathbb{Z}$ is partitioned
according to their parity (even/odd). The idea of classification is
formalized in the notion of equivalence relations. By definition, an
equivalence relation on a set $X$ is a binary relation $R \subset X \times X$ which is
reflexive, symmetric, and transitive, i.e. respectively
$\forall x \in X,\ R(x, x)$
$\forall x, y \in X,\ R(x, y) \to R(y, x)$
$\forall x, y, z \in X,\ R(x, y) \wedge R(y, z) \to R(x, z)$.
Proposition - exercise. For each $x \in X$, define the equivalence
class $\hat{x} := \{y \in X \mid R(x, y)\}$, and show that if $R$ is an equivalence
relation, then the equivalence classes form a partition of $X$, i.e. their union
is the whole of $X$, and any two classes which have a common element
coincide. Conversely, show that if the subsets $\hat{x}$ form a partition of $X$,
then $R$ is an equivalence relation.
Exercises. (a) Which of the relations $x < y$, $x \le y$, $x = y$ on
$\mathbb{R}$ (meaning $X = Y = \mathbb{R}$, the set of reals) are reflexive? symmetric?
transitive? Draw the corresponding subsets on the plane $\mathbb{R} \times \mathbb{R}$.
(b) Fix $n \in \mathbb{Z}$, and write $x \equiv y \mod n$ if⁵ $n \mid (x - y)$. Check that the
binary relation “$\equiv \mod n$” is reflexive, symmetric, and transitive, and
show that the corresponding partition of $\mathbb{Z}$ is the partition according
to the remainders of integers modulo $n$. What happens when $n = 0$?
Given an equivalence relation, one usually writes $x \sim y$ instead of
$R(x, y)$. The set of equivalence classes is generally denoted by $X/\!\sim$, or
$X/R$, though in the example from the previous exercise the standard
notation is $\mathbb{Z}/n\mathbb{Z}$, or even shorter: $\mathbb{Z}_n$.
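For a finite set, the passage from an equivalence relation to its classes is easy to carry out by machine. The Python sketch below is an illustration under our own naming (`equivalence_classes`, `congruent`), applied to congruence modulo 3 on a finite window of integers; it relies on the relation really being an equivalence, since each element is compared with a single representative per class.

```python
def equivalence_classes(X, R):
    """Partition X into classes of the equivalence relation R(x, y)."""
    classes = []
    for x in X:
        for cls in classes:
            if R(x, cls[0]):         # compare with one representative
                cls.append(x)
                break
        else:
            classes.append([x])      # x starts a new class
    return classes

n = 3
congruent = lambda x, y: (x - y) % n == 0
print(equivalence_classes(range(-4, 5), congruent))
# [[-4, -1, 2], [-3, 0, 3], [-2, 1, 4]]
```

The three lists are the traces of the three elements of $\mathbb{Z}/3\mathbb{Z}$ on the chosen window.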

5 The vertical bar | here means “divides”.

6. Cardinality. How does one find out whether two finite sets have the
same number of elements? One way is to count their elements. In fact,
counting is the process of establishing a one-to-one correspondence with
some reference set. E.g. there are 5 business days in a week because I
can establish a one-to-one correspondence between Mo, Tu, We, Th, Fr
and the fingers of my right hand. Abstractly speaking, a counting
number is the class of all those finite sets which can be put into
one-to-one correspondence with each other. By the number of elements in
a finite set we simply mean the class to which this set belongs.
The notion of cardinality extends this idea beyond finite sets. Two
sets are said to have the same cardinality (written as $|X| = |Y|$, or
$X \sim Y$), if there exists a one-to-one correspondence between them. One
should check at this point that this is an equivalence relation.
Therefore all sets are partitioned into equivalence classes according to
their cardinalities.
Cardinalities of finite sets are usually denoted by non-negative
integers $0, 1, 2, \dots, n, \dots$, where 0 stands for the class of the empty set $\emptyset$.
These cardinalities (i.e. the equivalence classes of finite sets) form a
set denoted in the book by $\mathbb{N}$. It is itself infinite. Its subset $\{1, 2, 3, \dots\}$
(denoted in the book by $\mathbb{Z}^+$) is also infinite, and seems smaller. However
$|\mathbb{N}| = |\mathbb{Z}^+|$, since the function $n \mapsto n + 1$ from $\mathbb{N}$ to $\mathbb{Z}^+$ establishes a
one-to-one correspondence.$^6$ Sets with the same cardinality as $\mathbb{Z}^+$ (or
$\mathbb{N}$) are called countable.
Exercises. (a) Prove that the set $\mathbb{Z}$ of all integers (positive and
negative) is countable.
(b) Prove that the union of countably many countable sets is
countable. (Hint: Draw the 2-dimensional table whose rows contain the lists
of elements of the given sets, and number all the items in the table by
scanning it diagonally and skipping repeated items.)
(c) Prove that the set $\mathbb{Q}$ of all rational numbers is countable. (Hint:
Write elements of $\mathbb{Q}$ as fractions $m/n$.)
(d) Prove that the real number line $\mathbb{R}$ has the same cardinality as
any segment $(a, b) := \{ x \in \mathbb{R} \mid a < x < b \}$ with $a < b$. (Hint: Use
stereographic projection.)
(e) The same for $(a, b)$ and $(a, b]$ or $[a, b]$, obtained from $(a, b)$ by
adding one of the endpoints or both. (Hint: Find a common countable
subset in $(a, b)$ and $(a, b]$.)
(f) Prove that the unit interval $[0, 1]$ and the unit square
$[0, 1] \times [0, 1]$ have the same cardinality. (Hint: Encode a pair $x, y$ of decimal
expansions $x = 0.x_1 x_2 \dots$ and $y = 0.y_1 y_2 \dots$, where $x_i$ and $y_i$ are
decimal numerals, by a single expansion $z = 0.x_1 y_1 x_2 y_2 \dots$.)

$^6$ By the way, the official definition of an infinite set is that its
cardinality should not change when a new element is adjoined.
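The diagonal scan from the hints to (b) and (c) can be sketched as follows for the positive rationals. This is an illustration rather than a proof: the generator name `positive_rationals` is made up here, and the table is assumed to have the fraction $m/n$ in row $m$ and column $n$.

```python
from fractions import Fraction
from itertools import count, islice


def positive_rationals():
    """Yield every positive rational exactly once, scanning the table of m/n diagonally."""
    seen = set()
    for total in count(2):                  # the diagonal m + n = total
        for m in range(1, total):
            q = Fraction(m, total - m)      # Fraction reduces to lowest terms
            if q not in seen:               # skip repeated items such as 2/2 = 1/1
                seen.add(q)
                yield q


first = list(islice(positive_rationals(), 10))
print(first)
```

The position of a rational in this list is the natural number assigned to it by the one-to-one correspondence.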
Let us write $|X| \le |Y|$ if $X$ has the same cardinality as some subset
of $Y$. The following celebrated Cantor-Bernstein theorem shows that
such ordering of cardinalities is well-defined.
Theorem. If $|X| \le |Y|$ and $|Y| \le |X|$, then $|X| = |Y|$.
Proof. In math circles, the problem is playfully phrased in terms
of two languages, Mumbo ($M$) and Jumbo ($J$), and two injective, but
not surjective dictionaries: $f : M \to J$ and $g : J \to M$ (allowing
each of the tribes to consider their language superior). The task is to
construct a third dictionary, $h : M \to J$, which would be bijective. It
has an elegant solution, which we recommend that you find on your own,
but if you are lazy, here it is.
Since the functions are injective, inverse images are unique when
they exist. So, starting with $m \in M$, we try to construct a sequence
$m_0, j_0, m_1, j_1, \dots$, where $m_0 = m$, $j_0 = g^{-1}(m_0)$ if it exists,
$m_1 = f^{-1}(j_0)$ if it exists, and so on. Here are the possibilities: (i) the
resulting sequence never terminates; (ii) it terminates in $M$: $m_0, \dots, m_N$;
(iii) it terminates in $J$: $m_0, \dots, j_N$. We define $h(m)$ as $f(m)$ in the first
two cases, and as $j_0$ in the third. The inverse function $h^{-1} : J \to M$
can be described this way: given $j \in J$, try to construct the
sequence $j, m_0, j_0, \dots$ using inverse images as before. If it is infinite,
then $m = m_0$ falls into type (i), and so $h^{-1}(j) = m_0$. If it is
finite and terminates in $M$, then $m = m_0$ falls into type (ii), and so
$h^{-1}(j) = m_0$ too. If it terminates in $J$, then $m = g(j)$ falls into type
(iii), and so $h^{-1}(j) = g(j)$. (Indeed, the sequence built from $m$ has
the form $m, j, \dots, j_N$ (where possibly $j$ is the last term), in which case
$h(m) = j = g^{-1}(m)$.) Since both $h$ and $h^{-1}$ are well-defined everywhere
on $M$ and $J$ respectively, they establish a one-to-one correspondence
between $M$ and $J$. Q.E.D.
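The construction of $h$ can be tried out on a toy pair of dictionaries. In the sketch below everything concrete is an assumption made for illustration: $M = J = \{0, 1, 2, \dots\}$, $f(m) = m + 1$, $g(j) = j + 1$, the partial inverses return `None` when no inverse image exists, and a backward chain longer than `max_steps` is simply treated as infinite.

```python
def cantor_bernstein(f, f_inv, g_inv, max_steps=10_000):
    """Build h: M -> J from an injective f: M -> J and the partial inverses of f and g.

    f_inv(j) and g_inv(m) return the unique inverse image, or None if there is none.
    """
    def h(m):
        current, in_M = m, True
        for _ in range(max_steps):
            prev = g_inv(current) if in_M else f_inv(current)
            if prev is None:
                # The backward chain stops here: in M (type ii) or in J (type iii).
                return f(m) if in_M else g_inv(m)
            current, in_M = prev, not in_M
        # Assumption of this sketch: such a long chain is treated as infinite (type i).
        return f(m)
    return h


# Hypothetical example: M = J = {0, 1, 2, ...}, f(m) = m + 1, g(j) = j + 1.
f = lambda m: m + 1
f_inv = lambda j: j - 1 if j >= 1 else None
g_inv = lambda m: m - 1 if m >= 1 else None

h = cantor_bernstein(f, f_inv, g_inv)
values = [h(m) for m in range(10)]
print(values)
```

In this example every chain terminates, and the resulting $h$ swaps $2k$ with $2k + 1$, which is indeed bijective although neither $f$ nor $g$ is.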
7. Uncountability and non-computability. With the Cantor-Bernstein
theorem at hand, it becomes obvious that the real line
$\mathbb{R} = (-\infty, \infty)$, the half-line $(0, \infty)$, the segment $[0, 1]$, etc. have the same
cardinality. It is called the cardinality of the continuum.
Exercise. Prove that the set, denoted $2^{\mathbb{N}}$, of all subsets of $\mathbb{N}$ (or of
any other countable set) has the cardinality of the continuum. (Hint:
Encode a subset by a string of binary digits which extends infinitely
to the right, and fight the difficulties caused by the duplicate binary
representation of some reals in $[0, 1]$, using the Cantor-Bernstein
theorem.)
The following celebrated theorem by Georg Cantor is based on the
diagonal argument, somewhat reminiscent of the self-referential
algorithm in Turing’s theorem.
Theorem. The continuum is uncountable.
Proof. By the previous exercise, the set of infinite binary sequences
has the cardinality of the continuum. We will show that no countable
list of binary sequences can contain all such sequences. Indeed, if
$s_1 = (s_1^1, s_1^2, s_1^3, \dots)$ is the first sequence in the list,
$s_2 = (s_2^1, s_2^2, s_2^3, \dots)$ is the second, etc., where each $s_i^j = 0$ or $1$,
then the diagonal sequence $s = (s_1^1, s_2^2, s_3^3, \dots)$ shares
with the sequence $s_1$ its first digit, with $s_2$ the 2nd, with $s_3$ the 3rd,
and so on. Therefore the reverse sequence $\bar{s} = (\bar{s}_1^1, \bar{s}_2^2, \dots)$ (obtained
from $s$ by replacing 0s with 1s and 1s with 0s) is not in our list, since
it differs from $s_1$ at its 1st place, from $s_2$ at the 2nd, from $s_3$ at the
3rd, and so on.
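The diagonal argument is short enough to be run. In the sketch below a binary sequence is modelled as a function from places to digits, places are numbered from 0 rather than from 1, and the particular `listing` (the $i$-th sequence is the binary expansion of $i$, lowest digit first) is a hypothetical example chosen for illustration.

```python
def diagonal_complement(listing):
    """Given listing(i) = the i-th binary sequence (a function n -> 0 or 1),
    return a sequence that differs from the i-th one at place i."""
    return lambda n: 1 - listing(n)(n)


# Hypothetical listing: the i-th sequence is the binary expansion of i, lowest digit first.
listing = lambda i: (lambda n: (i >> n) & 1)

s_bar = diagonal_complement(listing)
for i in range(1000):
    assert s_bar(i) != listing(i)(i)    # s_bar cannot be the i-th sequence of the list
print([s_bar(n) for n in range(8)])
```

For this listing the reversed diagonal consists of 1s only, and indeed no integer has such a binary expansion.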
Exercises. (a) A real (or complex) number is called algebraic, if it
is a root of a polynomial in one variable with integer coefficients. Prove
the existence of non-algebraic numbers.
(b) Generalizing Cantor’s theorem, prove for any set $S$ that $|2^S| > |S|$,
where $2^S$ denotes the set of all subsets of $S$ (also denoted $\mathcal{P}(S)$ and
called the power set of $S$). In other words, show that the cardinality of
the power set is strictly greater than that of $S$, i.e. no injective function
$f : S \to \mathcal{P}(S)$ is surjective. (Hint: Show that the subset $\bar{F} \subset S$ defined by
$x \in \bar{F} \Leftrightarrow x \notin f(x)$ is not in the range of $f$.)
Corollary (existence of non-computable functions). For every
programming language, there exist functions $f : \mathbb{N} \to \{0, 1\}$ (or to any
other non-trivial set) for which no program in this language computes
$f$, i.e. for each input $n \in \mathbb{N}$ outputs $f(n)$.
Proof. A program in a programming language is a text, i.e. a finite
sequence of symbols of a finite alphabet. All such sequences form a
countable set. Indeed, there are finitely many such sequences of every
fixed length, and so we can make a list of all sequences by first listing
all one-letter sequences, then all two-letter sequences, and so on. Thus,
there are countably many programs computing functions $\mathbb{N} \to \{0, 1\}$
(they form a subset in our countable list), but there are uncountably
many such functions. Q.E.D.
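The listing of all texts used in the proof can be written down explicitly. This is a sketch with a made-up generator name `all_texts` and a two-letter alphabet chosen only to keep the output short.

```python
from itertools import count, islice, product


def all_texts(alphabet):
    """List every finite non-empty string over a finite alphabet: length 1 first, then 2, ..."""
    for length in count(1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


first = list(islice(all_texts("ab"), 8))
print(first)
```

Every text over the alphabet appears at some finite position of this list, which is what countability means.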
3. Iterative processes
1. Induction. Consider an infinite sequence of propositions $p(n)$,
$n = 0, 1, 2, \dots$, and suppose that the following two theorems are proved:
$1^\circ$  $p(0)$ (the base of induction)
$2^\circ$  $\forall n \in \mathbb{N},\ p(n) \to p(n + 1)$ (the step of induction).

The principle of mathematical induction guarantees that then all $p(n)$
are true: $1^\circ \wedge 2^\circ \Rightarrow \forall n,\ p(n)$. Indeed, to prove $p(n)$, start from $p(0) = T$,
and conclude that $p(1) = T$ (since $p(0) \to p(1)$ is true), then conclude
that $p(2) = T$ (since $p(1) \to p(2)$ is true), and so on, and arrive after
$n$ steps at $p(n) = T$.
A practical conclusion from this argument is that mathematical in-
duction is a powerful logical tool (as it is indeed), because without it, in
order to reach pp1000q, one would have to make a very long argument,
repeating similar steps 1000 times. The drawback is that very often
induction leads to a conclusive verdict, but the intuitive comprehension
of why all ppnq are true gets lost in the process. The theoretical aspect
of this argument leads, however, to the notion of a well-ordered set.
A (partial) ordering on a set $S$ is a binary relation $R$ on it, which
is reflexive ($\forall x \in S,\ xRx$), transitive ($(xRy) \wedge (yRz) \Rightarrow (xRz)$), and
anti-symmetric: $(xRy) \wedge (yRx) \Rightarrow (y = x)$. Examples: (a) Divisibility
$a \mid b$, on $\mathbb{Z}^+$. (b) Inclusion $\subset$, on the set $\mathcal{P}(U)$ of subsets of a given set $U$.
(c) Ordering of cardinalities $|X| \le |Y|$ (thanks to Cantor–Bernstein).
A partial ordering is total if $\forall x, y \in S,\ (xRy) \vee (yRx)$. Examples: (a)
$\le$ on $\mathbb{R}$. (b) The ordering of words in a dictionary. (c) Lexicographical
ordering, e.g. of points $(x, y) \le (x', y')$ on the plane: first by $x \le x'$,
and if $x = x'$, then by $y \le y'$, as in the dictionary.
Suppose that every non-empty subset $X \subset S$ in a given totally
ordered set has its minimal element with respect to $R$, i.e. there exists
$x_0 \in X$ such that for all $x \in X$ we have $x_0 R x$. Then one says that $S$ is
well-ordered by $R$.
The truth is that $\mathbb{R}$ is not well-ordered by $\le$ (because, e.g. the set
of positive reals has no minimal element), but $\mathbb{N}$ is well-ordered by $\le$:
every non-empty subset of $\mathbb{N}$ has its minimal element.
The principle of mathematical induction can be proved on the basis
of this property of $\mathbb{N}$. Namely, consider the set $X \subset \mathbb{N}$ of those $n$ for
which $p(n)$ is false. Assume $X \ne \emptyset$. Then $X$ has the minimal element
$n_0$. It cannot be 0, since $p(0) = T$ (the base). Therefore $n_0 - 1$ is
defined, and $p(n_0 - 1) = T$. But since $p(n_0 - 1) \to p(n_0)$ is true (the
step of induction), then $p(n_0) = T$, i.e. $n_0 \notin X$. This contradiction
shows that $X = \emptyset$.
Here is an example illustrating why it is dangerous to discard the
base of induction as “uninteresting”.
Problem: Find the number $R(n)$ of the regions into which $n$ generic
straight lines divide the plane. (“Generic” here means that no two are
parallel, and no three are concurrent.)
Solution. Given $n - 1$ lines on the plane, a generic $n$th line intersects
each of the previous $n - 1$ lines once. The $n - 1$ intersection points
divide the $n$th line into $n$ parts. Therefore the $n$th line cuts through
exactly $n$ previous regions, dividing each of them in two, and creating
therefore $n$ new ones. Thus, $R(n) = R(n - 1) + n$. Thus,
$R(n) = 1 + 2 + \dots + n = n(n + 1)/2$ is the $n$th triangular number. For example,
$R(3) = 3(3 + 1)/2 = 6$. Oops! $R(3) = 7$. The error comes from the
fact that $R(0) = 1$, not 0, as for the triangular numbers, and so the
right answer is $R(n) = 1 + (1 + 2 + \dots + n) = 1 + n(n + 1)/2$.
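The role of the base is easy to see if the recursion is iterated literally. A minimal sketch (the function name `regions` is made up here):

```python
def regions(n):
    """Number of regions cut out by n generic lines, via R(n) = R(n-1) + n with R(0) = 1."""
    r = 1                       # base of induction: no lines leave one region
    for k in range(1, n + 1):
        r += k                  # the k-th line creates k new regions
    return r


print([regions(n) for n in range(6)])
```

Replacing the initial value 1 by 0 reproduces the triangular numbers, i.e. exactly the wrong answer discussed above.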
The following elegant induction problem is probably included in
every DM text. Removing one cell from a $2 \times 2$ chess-board, one obtains
a shape called a tromino.
Problem: Show that any $2^n \times 2^n$ chess-board with one cell removed
can be tiled by trominoes without overlaps.
Solution. Divide the board into four $2^{n-1} \times 2^{n-1}$ parts. The
removed cell falls into one of them. Place one tromino at the center so
that it covers one cell from each of the remaining three parts. Now
it remains to tile the four parts, each with one cell removed. By the
induction hypothesis (i.e. $p(n - 1)$) this can be done, and $p(n)$ follows.
Note that the base of induction ($p(1)$ in this case) is checked in the
very formulation of the problem.
Exercises. (a) Use the solution of the tiling problem as an
algorithm to tile the $8 \times 8$ chess-board with the cell d7 removed.
(b) Find another inductive solution, based on the partitioning of the
board into $2^{n-1} \times 2^{n-1}$ cells of size $2 \times 2$ each.
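The inductive solution translates directly into a recursive procedure. The sketch below is one possible rendering, not the only one: the function name `tile`, the numbering of trominoes by positive labels, and the convention that row 0 is rank 8 and column 0 is file a (so that d7 is the cell (1, 3)) are all assumptions of this illustration.

```python
def tile(size, hole):
    """Tile a size x size board (size a power of 2) minus `hole` with trominoes.

    Returns a grid in which 0 marks the removed cell and equal positive
    labels mark the three cells of one tromino.
    """
    board = [[0] * size for _ in range(size)]
    label = 0

    def solve(top, left, size, hole_r, hole_c):
        nonlocal label
        if size == 1:
            return
        label += 1
        mine, half = label, size // 2
        for dr in (0, 1):
            for dc in (0, 1):
                r0, c0 = top + dr * half, left + dc * half
                if r0 <= hole_r < r0 + half and c0 <= hole_c < c0 + half:
                    hr, hc = hole_r, hole_c      # this part contains the removed cell
                else:
                    # Cover the cell of this part nearest to the centre by the central tromino.
                    hr, hc = top + half - 1 + dr, left + half - 1 + dc
                    board[hr][hc] = mine
                solve(r0, c0, half, hr, hc)      # induction hypothesis p(n - 1)

    solve(0, 0, size, *hole)
    return board


board = tile(8, (1, 3))     # d7, assuming row 0 is rank 8 and column 0 is file a
for row in board:
    print(" ".join(f"{v:2d}" for v in row))
```

Each recursive call treats the cell covered by the central tromino as the "removed" cell of its part, exactly as in the induction step.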
Remark (on well-ordering). A countable set can be well-ordered
(why?) Can this be generalized to sets of other cardinalities? It turns
out (and this is a deep result of mathematical logic) that generally
speaking, it is not possible to construct such a well-ordering, nor prove
its existence in any other way, although assuming that it is possible
would not lead to any contradiction. Thus, the well-ordering principle,
saying that every set admits a well-ordering, can be accepted as an
axiom.
2. Recursion. A sequence of elements of a set $X$ is defined as a
function $s : \mathbb{N} \to X$ (or $\mathbb{Z}^+ \to X$): $n \mapsto s_n$.
A recursively defined sequence is constructed by choosing $s_0 \in X$,
and using the rule $s_n = f(s_{n-1})$ for each $n > 0$, where $f : X \to X$ is a
given function.
Example: 2nd order linear recursion relations. Such a relation has
the form $a_{n+1} = \alpha a_n + \beta a_{n-1}$, where $\alpha$ and $\beta$ are fixed numbers. Given
$a_0$ and $a_1$, the rule determines $a_n$ for all $n > 1$. In order to formally
interpret the rule as a recursively defined sequence in the above
sense, one needs to take $X = \mathbb{R}^2$, $s_0 = (a_0, a_1)$, and define $f$ as the linear
map $f(x, y) := (y, \alpha y + \beta x)$. One can apply linear algebra in order to
analyze the behavior of the dynamical system (with discrete time) in
$\mathbb{R}^2$ defined by iterating the linear map $f$. Alternatively, note that if $a'_n$
and $a''_n$ are two sequences satisfying the given recursion relation, and $\gamma'$
and $\gamma''$ are arbitrary numbers, then $a_n := \gamma' a'_n + \gamma'' a''_n$ also satisfies the
recursion relation. (Check this linearity property!) This means that all
solutions form a vector space. It has dimension 2, since a solution is
determined by two constants: $a_0$ and $a_1$. So, let us look for particular
solutions in the form $a_n = \lambda^n$. From the recursion relation, we find:
$\lambda^{n+1} = \alpha \lambda^n + \beta \lambda^{n-1}$, or, if $\lambda \ne 0$: $\lambda^2 - \alpha \lambda - \beta = 0$.
Let $\lambda_\pm$ be two distinct roots of this quadratic equation (called the
characteristic equation). The two solutions $\lambda_+^n$ and $\lambda_-^n$ are then linearly
independent, and since the space of solutions is 2-dimensional, the
general solution to this recursion relation has the form
$a_n = \gamma_+ \lambda_+^n + \gamma_- \lambda_-^n$, where $\gamma_\pm$ are arbitrary constants.
In the notorious example of the Fibonacci sequence, we have:
$a_{n+1} = a_n + a_{n-1}$, so that the characteristic equation is $\lambda^2 - \lambda - 1 = 0$,
and has roots $\lambda_\pm = (1 \pm \sqrt{5})/2$. Thus, the general solution is
$a_n = \gamma_+ \left( \frac{1 + \sqrt{5}}{2} \right)^n + \gamma_- \left( \frac{1 - \sqrt{5}}{2} \right)^n$.
Exercise. (a) Find the coefficients $\gamma_\pm$ corresponding to the initial
conditions $a_0 = 0$, $a_1 = 1$.
(b) Suppose the characteristic equation has the form $(\lambda - \lambda_0)^2 = 0$,
i.e. has a double root $\lambda_0$. Show that the general solution to the
corresponding recursion relation $a_{n+1} = 2\lambda_0 a_n - \lambda_0^2 a_{n-1}$ has the form
$a_n = \gamma_0 \lambda_0^n + \gamma_1 n \lambda_0^n$, where $\gamma_0, \gamma_1$ are arbitrary constants.
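Both viewpoints on a 2nd order recursion, iterating the linear map $f$ and using the characteristic roots, can be compared numerically. This is a sketch in floating-point arithmetic; the function names are made up, and `closed_form` assumes that the characteristic roots are real and distinct. The constants $\gamma_\pm$ are obtained by solving the two linear equations given by $a_0$ and $a_1$.

```python
import math


def iterate(alpha, beta, a0, a1, n):
    """Return a_n by iterating the linear map f(x, y) = (y, alpha*y + beta*x)."""
    x, y = a0, a1
    for _ in range(n):
        x, y = y, alpha * y + beta * x
    return x


def closed_form(alpha, beta, a0, a1, n):
    """Return a_n from the characteristic roots, assumed real and distinct."""
    disc = math.sqrt(alpha * alpha + 4 * beta)
    lam_plus, lam_minus = (alpha + disc) / 2, (alpha - disc) / 2
    # Solve gamma_plus + gamma_minus = a0, gamma_plus*lam_plus + gamma_minus*lam_minus = a1.
    gamma_plus = (a1 - a0 * lam_minus) / (lam_plus - lam_minus)
    gamma_minus = a0 - gamma_plus
    return gamma_plus * lam_plus ** n + gamma_minus * lam_minus ** n


fib = [iterate(1, 1, 0, 1, n) for n in range(10)]
print(fib)
print([round(closed_form(1, 1, 0, 1, n)) for n in range(10)])
```

The two printed lists coincide; for large $n$ the floating-point closed form eventually loses precision, while the iteration stays exact in integers.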
Remark. The idea of recursion goes beyond the notion of recursively
defined sequences. For example, one can start with Boolean variables
$p_1, \dots, p_n$ and define Boolean expressions as obtained from them by
recursively applying the operations $\vee$, $\wedge$, $\neg$. Or, one can recursively
define vector spaces as linear subspaces in spaces of all vector-valued
functions $S \to V$, where $S$ is any set, and $V$ is any previously defined
vector space, and equip the space (and its subspaces) with the
operations of pointwise addition of functions and pointwise multiplication
of functions by scalars. The proof of Turing’s halting theorem was
also based on the recursive use of algorithms CheckHalts and Test.
Iterative and self-referring processes are common in programming.
3. Integer division and correctness of algorithms. We know
from first grade that division with remainders is repeated subtraction.
Let us use a computer realization of the division algorithm to illustrate
verification of algorithms. The division is realized by the code in the
left column, with commentaries in the right column:
    $(0 \le a \in \mathbb{Z}) \wedge (0 < d \in \mathbb{Z})$        precondition $Pre(x)$
    r := a; q := 0                          initial values $y := y_0$
    while (r ≥ d)                           loop guard $G(x, y)$
        r := r − d; q := q + 1              loop body $y := F(x, y)$
    end while
    % $(a = qd + r) \wedge (0 \le r < d)$         postcondition $Post(x, y)$
Thus, a general while loop receiving an input $x$ and producing an output
$y$ consists of: the precondition, which is a predicate, $Pre(x)$, that has
to be TRUE for the loop to start; the initial condition $y_0$ of the loop’s
output; the loop guard, which is a predicate, $G(x, y)$, that has to be
FALSE for the loop to terminate; and the body of the loop, which
iterates a function $y := F(x, y)$.
One says that the loop functions correctly, if it does terminate
after finitely many iterations whenever $Pre(x)$ = TRUE, and produces
the final output $y$ such that the postcondition, which is a predicate
$Post(x, y)$ (expressing the purpose of the loop), becomes TRUE.
Proposition (C.A.R. Hoare). Let $I(x, y)$ be a loop invariant, i.e. a
predicate satisfying:
(i) $Pre(x) \Rightarrow I(x, y_0)$
(ii) $G(x, y) \wedge I(x, y) \Rightarrow I(x, F(x, y))$
(iii) $\neg G(x, y) \wedge I(x, y) \Rightarrow Post(x, y)$.
Then, assuming additionally that the loop guard is known to eventually
become FALSE, the while loop functions correctly.
Indeed, (i) guarantees that the initial value of $I$ is TRUE, (ii)
guarantees that the value stays TRUE as long as the loop is iterated, and (iii)
guarantees that when the loop stops (i.e. the guard has become FALSE),
the output fulfills the purpose.
Exercise. Verify the division algorithm by checking that $I(a, d, r, q)$
defined as $0 \le r = a - qd$ is a loop invariant.
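The three conditions on the invariant can be watched at work by turning them into run-time checks. This sketch does not replace the verification asked for in the exercise, since assertions only test the inputs actually tried; the function name `divide` is made up.

```python
def divide(a, d):
    """Division with remainder by repeated subtraction, checking the loop invariant."""
    assert a >= 0 and d > 0                   # precondition Pre
    r, q = a, 0                               # initial values
    assert 0 <= r == a - q * d                # (i): the invariant holds initially
    while r >= d:                             # loop guard G
        r, q = r - d, q + 1                   # loop body
        assert 0 <= r == a - q * d            # (ii): the body preserves the invariant
    assert a == q * d + r and 0 <= r < d      # (iii): guard FALSE + invariant => Post
    return q, r


print(divide(17, 5))
```

Termination is not checked by the invariant; here it follows from the fact that $r$ decreases by $d > 0$ at every iteration.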
4. Elementary number theory


1. The Euclidean algorithm. The following remarkable theory
can be found (in a rather geometric form) in the oldest math textbook:
Euclid’s “Elements”. As we have just discussed, given two integers, $a$
and $b \ne 0$, one divides $a$ by $b$ to obtain the quotient $q$ and remainder
$r$ such that $a = qb + r$ and $0 \le r < |b|$. A common divisor of $b$
and $r$ also divides $a = qb + r$, and a common divisor of $a$ and $b$ also
divides $r = a - qb$. Therefore the set of common divisors of $a$ and $b$
coincides with the set of common divisors of $b$ and $r$. The Euclidean
algorithm proposes to replace the pair $(a, b)$ with the pair $(b, r)$, and
continue the process: divide $b$ by $r$ with the remainder $r_1$ satisfying
$0 \le r_1 < r$, then divide $r$ by $r_1$ with the remainder $r_2$ satisfying
$0 \le r_2 < r_1$, and so on. Of course, if a remainder turns out to be equal to 0, the
next division becomes impossible, and the process stops. Moreover, it
is guaranteed to stop after $\le |b|$ iterations, because the non-negative
integer remainders decrease: $|b| > r > r_1 > r_2 > \dots \ge 0$. Thus, we
will have a finite succession of pairs
$(a, b), (b, r), (r, r_1), (r_1, r_2), \dots, (d, 0)$,
(where we denoted the last non-zero remainder by $d$) with the property
that they all have the same set of common divisors. Since the common
divisors of $d$ and 0 are the same as the divisors of $d$ alone (which is, by
the way, its own divisor), we arrive at the following conclusion:
Euclid’s theorem. For any integers $a$ and $b \ne 0$ there is a unique
positive integer $d$ such that the common divisors of $a$ and $b$ are exactly
the divisors of $d$.
Such $d$ is called the greatest common divisor of $a$ and $b$, and is denoted
by $\mathrm{GCD}(a, b)$ or simply by $(a, b)$. E.g. $(a, b) = 1$ means that $a$ and $b$
have no non-trivial common divisors (i.e. none except $\pm 1$), in which
case they are called relatively prime.
Moreover, writing out the steps of the Euclidean algorithm
$a = qb + r \Rightarrow r = a - qb$
$b = q_1 r + r_1 \Rightarrow r_1 = b - q_1 r$
$r = q_2 r_1 + r_2 \Rightarrow r_2 = r - q_2 r_1$
$\dots$
$r_{n-1} = q_n r_n + d \Rightarrow d = r_{n-1} - q_n r_n$
$r_n = q_{n+1} d + 0$,
we realize that each of the participating remainders is the sum of integer
multiples of $a$ and $b$. In particular, $d = ka + lb$ for some $k, l \in \mathbb{Z}$. This
fact has important consequences.
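The bookkeeping of the coefficients $k$ and $l$ can be carried along with the remainders. The following is a common way of organizing it (often called the extended Euclidean algorithm), sketched under the assumption that $a, b \ge 0$ are not both zero; the function name is made up.

```python
def extended_gcd(a, b):
    """Return (d, k, l) with d = GCD(a, b) = k*a + l*b, for a, b >= 0 not both zero."""
    # Invariant: r0 = k0*a + l0*b and r1 = k1*a + l1*b.
    r0, k0, l0 = a, 1, 0
    r1, k1, l1 = b, 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        k0, k1 = k1, k0 - q * k1
        l0, l1 = l1, l0 - q * l1
    return r0, k0, l0


d, k, l = extended_gcd(240, 46)
print(d, k, l)
```

Here the output says that $2 = (-9) \cdot 240 + 47 \cdot 46$.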
Let us call an integer $p > 1$ irreducible if it cannot be factored in
a non-trivial way (i.e. does not have divisors other than $\pm 1$ and $\pm p$),
and prime, if $p \mid ab \Rightarrow (p \mid a) \vee (p \mid b)$. Obviously, a prime $p$ is irreducible,
because existence of a factorization $p = ab$ with $|a|, |b| < p$ would
contradict primality. Let us establish the converse: Assume that an
irreducible $p$ divides the product $ab$, but does not divide $a$, and deduce
that $p$ divides $b$. Indeed, since $p$ has no other non-trivial factors than
$\pm p$, and none of them divides $a$, we have $(a, p) = 1$ and hence (by
the above observation about the Euclidean algorithm), $1 = ka + lp$ for
some $k, l \in \mathbb{Z}$. Therefore $b = kab + lpb$, and since $p \mid ab$, we conclude
that $p \mid b$ as promised.
Thus, irreducible positive integers $p = 2, 3, 5, 7, 11, 13, \dots$ are the
same as primes. One can derive by induction that if such $p$ divides the
product of several factors, then it divides at least one of them.
Corollary (the Fundamental Theorem of Arithmetic). Any positive
integer can be written as the product of powers of positive primes,
and such a representation is unique up to reordering of the factors.
(Rephrasing: Every $n > 0$ is uniquely written as $n = 2^a 3^b 5^c 7^d 11^e \cdots$)
Proof. Existence of the factorization is almost obvious. If a given
$n > 1$ is irreducible, then it is its own factorization. If it is reducible,
factor it as $n = ab$, where $0 < a, b < n$, and continue factoring each
of $a$ and $b$ if possible. Since the factors become smaller each time,
the process eventually stops, yielding a factorization $n = p_1 \cdots p_N$ into
primes. To prove uniqueness, assume that $p_1 \cdots p_N = q_1 \cdots q_M$, where
all $p_i$ and $q_j$ are irreducible. Since $p_1$ divides the product $q_1 \cdots q_M$, it
divides one of $q_j$ (because an irreducible $p_i$ is also prime!). Since $q_j$
is irreducible, $q_j = p_1$. Thus, we can cancel $p_1$ with $q_j$, and proceed
the same way with $p_2 \cdots p_N = q_1 \cdots \hat{q_j} \cdots q_M$ (where the “hat” over $q_j$
means that it is omitted). Eventually we will conclude that $N = M$,
and that the list of $q_j$ is obtained from the list of $p_i$ by reordering.
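The existence part of the proof is effectively an algorithm. One simple way to carry it out (trial division, which is an assumption of this sketch rather than something prescribed by the text) is to keep splitting off the smallest divisor greater than 1:

```python
def factorize(n):
    """Return the prime factorization of n >= 1 as a dict {prime: exponent}."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:           # the smallest divisor > 1 of n is irreducible
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:                       # what is left has no divisor up to its square root
        factors[n] = factors.get(n, 0) + 1
    return factors


print(factorize(360))
```

For example, $360 = 2^3 \cdot 3^2 \cdot 5$; by the uniqueness part, any other method of factoring must give the same exponents.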
Exercise. (a) Forget the previous theory, and start anew: Given
$a, b \in \mathbb{Z}$, introduce$^7$ the set $I_{a,b} := \{ ka + lb \,\|\, k, l \in \mathbb{Z} \} \subset \mathbb{Z}$, and prove
that $I_{a,b}$ consists of all multiples of some $d \in \mathbb{Z}$. (Hint: Try the smallest
positive number in $I_{a,b}$ in the role of $d$.)
(b) Show that $I_{a,b} = I_d := \{ rd \,\|\, r \in \mathbb{Z} \}$ implies that $d = (a, b)$, i.e.
that $d$ is a common divisor of $a$ and $b$, and that every other common
divisor divides $d$.
(c) Prove that there are infinitely many primes. (Hint: None of a
finite list of primes $p_1, \dots, p_n$ divides $p_1 \cdots p_n + 1$.)

$^7$ We use $\|$ to avoid the collision with the “a divides b” notation $a \mid b$.

2. Modular arithmetic. We already know that for a fixed $n > 0$,
the relation $a \equiv b \bmod n$, meaning by definition that $n \mid (a - b)$, is an
equivalence relation. It partitions $\mathbb{Z}$ into $n$ equivalence classes (often
called congruence classes of integers modulo $n$), uniquely represented
by the remainders $0, 1, \dots, n - 1$.
Let us use the notation $\mathbb{Z}_n$ for the set of congruence classes modulo
$n$. The operations of addition and multiplication of integers descend to
$\mathbb{Z}_n$. Namely, define the sum and product of congruence classes $\hat{a}, \hat{b} \in \mathbb{Z}_n$
of two integers $a, b \in \mathbb{Z}$ as the congruence classes of their sum and of
their product respectively:
$\hat{a} + \hat{b} := \widehat{a + b}$ and $\hat{a}\hat{b} := \widehat{ab}$.
Exercises. (a) Check that the result does not depend on the choice
of class representatives, i.e. that $a' \equiv a \bmod n$ and $b' \equiv b \bmod n$
imply $(a' + b') \equiv (a + b) \bmod n$ and $a'b' \equiv ab \bmod n$. Thus, the map
$\mathbb{Z} \to \mathbb{Z}_n : x \mapsto \hat{x}$ transforms sums to sums and products to products.
(b) Show that when $n \mid N$, there is a unique map $\mathbb{Z}_N \to \mathbb{Z}_n$ which
transforms sums to sums and products to products. (Hint: $a \equiv b \bmod N$
implies $a \equiv b \bmod n$.)
Theorem (The Chinese Remainder Theorem). If $(m, n) = 1$ then the
map (from the previous exercise) $\mathbb{Z}_{mn} \to \mathbb{Z}_m \times \mathbb{Z}_n$ is bijective.
Proof. The map takes the congruence class of an integer $a$ modulo
$mn$ into the pair of congruence classes of $a$: one modulo $m$, the other
modulo $n$.$^8$ The total number of such pairs is $mn$, the same as the
cardinality of $\mathbb{Z}_{mn}$. So, it suffices to prove that the map is injective, i.e.
to show that if $a \equiv b \bmod m$ and $a \equiv b \bmod n$, then $a \equiv b \bmod mn$.
Indeed, for $c = a - b$, we are given that $m \mid c$ and $n \mid c$. Since $m$ and
$n$ have no common factors, the Fundamental Theorem of Arithmetic
implies that $c$ must be divisible by their product. Q.E.D.
Exercise. Show that the map $\mathbb{Z}_{mn} \to \mathbb{Z}_m \times \mathbb{Z}_n$ is not injective
whenever $\mathrm{GCD}(m, n) > 1$. (Hint: $\mathrm{LCM}(m, n) < mn$.)
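For small $m$ and $n$ the map $\mathbb{Z}_{mn} \to \mathbb{Z}_m \times \mathbb{Z}_n$ can simply be tabulated. A brute-force sketch with made-up helper names, which illustrates both the theorem and the exercise without proving either:

```python
from math import gcd


def crt_map(m, n):
    """The map Z_mn -> Z_m x Z_n, a -> (a mod m, a mod n), as a list of pairs."""
    return [(a % m, a % n) for a in range(m * n)]


def is_bijective(m, n):
    # Source and target both have mn elements, so injectivity is enough.
    return len(set(crt_map(m, n))) == m * n


print(crt_map(2, 3))
print(is_bijective(3, 5), is_bijective(4, 6))
```

In the first printed list every pair of remainders occurs exactly once; for $m = 4$, $n = 6$ some pairs repeat and others are missing.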
In $\mathbb{Z}$, only $\pm 1$ are multiplicatively invertible. Our goal is to study
the congruence classes in $\mathbb{Z}_n$ which are invertible with respect to
multiplication. We claim that $\hat{a} \in \mathbb{Z}_n$ is invertible if and only if $(a, n) = 1$.
Indeed, if they have a common factor, i.e. $a = a'd$, $n = dn'$ for some
$d > 1$, then $\hat{a}\hat{n'} = \widehat{a'dn'} = \widehat{a'n} = \hat{0}$ in $\mathbb{Z}_n$. This implies that $\hat{a}$ is not
invertible, since if it were, we would multiply $\hat{a}\hat{n'}$ by the inverse of $\hat{a}$
and conclude that $\hat{n'} = \hat{0}$, which is not true.
Conversely, when $(a, n) = 1$, the Euclidean algorithm gives us $k$ and
$l$ such that $ka + ln = 1$. Reducing this modulo $n$ we find $\hat{k}\hat{a} = \hat{1}$, i.e. $\hat{k}$
is the multiplicative inverse of $\hat{a}$.

$^8$ Note that it maps sums and products to (componentwise) sums and products,
since according to the previous exercise, each of $\mathbb{Z}_{mn} \to \mathbb{Z}_m$ and $\mathbb{Z}_{mn} \to \mathbb{Z}_n$ does.
Exercises. (a) Show that multiplicatively invertible congruence
classes (called units of $\mathbb{Z}_n$) form a subset (denoted $\mathbb{Z}_n^\times$) which is closed
with respect to multiplication.
(b) Find all units in $\mathbb{Z}_8$ and analyze their multiplication table. Do
the same for $\mathbb{Z}_5$.
(c) Prove that $|\mathbb{Z}_{p^k}^\times| = p^k - p^{k-1}$, where $p$ is prime and $k > 0$.
(Hint: Which of the numbers $1, 2, \dots, p^k$ are not relatively prime to $p^k$?)
(d) Prove that for $m, n$ relatively prime, the number of units modulo
$mn$ is the product of the numbers of units modulo $m$ and $n$ separately.
(Hint: Show that $\mathbb{Z}_{mn} \to \mathbb{Z}_m \times \mathbb{Z}_n$ maps $\mathbb{Z}_{mn}^\times$ bijectively to $\mathbb{Z}_m^\times \times \mathbb{Z}_n^\times$.)

Remark. The Euler function $\varphi$ associates to an integer $n > 1$ the
number of integers between 1 and $n$ which are relatively prime to $n$. In
our notation, $\varphi(n) := |\mathbb{Z}_n^\times|$, the number of units in $\mathbb{Z}_n$. The previous
exercises imply that in terms of the prime factorization $n = p_1^{k_1} \cdots p_r^{k_r}$,
where $p_1, \dots, p_r$ are distinct prime divisors of $n$, we have
$\varphi(n) = p_1^{k_1 - 1} \cdots p_r^{k_r - 1} (p_1 - 1) \cdots (p_r - 1) = n \times \left( 1 - \frac{1}{p_1} \right) \cdots \left( 1 - \frac{1}{p_r} \right)$.
Indeed, the factors $p_i^{k_i}$ corresponding to different primes are pairwise
relatively prime. Thus, applying Exercise (d) inductively, we find that
$\varphi(n) = \varphi(p_1^{k_1}) \cdots \varphi(p_r^{k_r})$, while Exercise (c) determines the value of
each factor: $\varphi(p^k) = p^k - p^{k-1} = p^{k-1}(p - 1)$.
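The product formula can be compared against the definition by direct counting. A sketch with made-up function names; the formula version finds the prime powers by trial division, which is a choice of this illustration.

```python
from math import gcd


def phi_by_counting(n):
    """Count the integers in 1..n that are relatively prime to n."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)


def phi_by_formula(n):
    """Multiply p^(k-1) * (p - 1) over the prime powers p^k in the factorization of n."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            power = 1
            while n % p == 0:
                n //= p
                power *= p
            result *= (power // p) * (p - 1)
        p += 1
    if n > 1:                   # a remaining prime factor with exponent 1
        result *= n - 1
    return result


print([phi_by_formula(n) for n in (8, 55, 111)])
```

The two functions agree on every $n$ one cares to try, as the Remark predicts.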
Fermat’s Little Theorem. For all $a \in \mathbb{Z}$ and a prime $p$, we have
$a^p \equiv a \bmod p$.
Proof. In other words, $\hat{a}^p = \hat{a}$ in $\mathbb{Z}_p$. Since this is obvious for
$\hat{a} = \hat{0}$, and all other congruence classes in $\mathbb{Z}_p$ are invertible (because $p$
is prime), it suffices to prove that $\hat{a}^{p-1} = \hat{1}$ for all $\hat{a} \in \mathbb{Z}_p^\times$.
Take the string $\hat{1}, \hat{2}, \dots, \widehat{p - 1}$ and multiply each term by $\hat{a}$. This
will result in the string $\hat{1}\hat{a}, \hat{2}\hat{a}, \dots, \widehat{(p - 1)}\hat{a}$, which is in fact just a
permutation of the original string, because the operation of multiplication
by $\hat{a}$ is invertible. Multiplying the terms of the string, we conclude
that $\widehat{(p - 1)!}\,\hat{a}^{p-1}$ coincides with $\widehat{(p - 1)!}$. Since the factorial product is
invertible in $\mathbb{Z}_p$ (it lies in $\mathbb{Z}_p^\times$), we conclude that $\hat{a}^{p-1} = \hat{1}$ as required. Q.E.D.
Exercises. (a) Use the same reasoning to prove Euler’s Theorem:
If $(a, n) = 1$, then $a^{\varphi(n)} \equiv 1 \bmod n$.
(b) Compute $2^{2017} \bmod 111$.
3. Public key cryptography. It is an elegant idea of
number-theoretic origin which enables one to receive messages from anyone in
the world, encrypted by a publicly announced cipher, and yet
decipherable only by the recipient, so that no third party can eavesdrop
on the content of the messages. There exist some relatively efficient
algorithms which allow one to determine whether some large numbers
(say, consisting of hundreds of decimal digits) are prime or composite
(without actually factoring them in the latter case), though factoring
the product $pq$ of two such prime numbers $p$ and $q$ would be infeasible
for modern computers in any reasonable time.
So, the recipient, being in possession of two such primes, publicly
announces their product, $R$, and one more number, $e$, which must be
relatively prime to $\varphi(pq) = (p - 1)(q - 1)$. Messages to the recipient
must be expressed in integers and partitioned into “packets” of size
$< R$. To send such a “packet” $M$, the sender transmits the remainder
$C \equiv M^e \bmod R$. A potential eavesdropper can see $C$, which will
have the form $M^e - xR$ for some $x$, and though $R$ and $e$ are publicly
known, the uncertainty in the number $x$ of “overflows” would prevent
the eavesdropper from recovering the message $M$.
Now, with probability close to 1 (compute it!) the number $M$ is
relatively prime to $R$, and hence, according to Euler’s theorem, $M^{\varphi(R)} \equiv 1 \bmod R$.
Since $e$ is relatively prime to $\varphi(R) = \varphi(pq) = (p - 1)(q - 1)$, it
is invertible in $\mathbb{Z}_{\varphi(R)}$. The recipient can pre-calculate $d$ such that
$ed \equiv 1 \bmod \varphi(R)$, i.e. $ed = 1 + k\varphi(R)$ for some integer $k$. Thus, the
deciphering procedure consists in computing the remainder of $C^d$ modulo $R$:
$C^d \equiv M^{ed} = M \cdot (M^{\varphi(R)})^k \equiv M \bmod R$. Since the initial
packet $M$ is smaller than $R$, the remainder coincides with $M$.
In order to determine $d$ from $e$ and $R$, one needs to know $\varphi(R)$. Even
if it is known that $R$ is the product of two primes, $p$ and $q$, determining
$\varphi(pq) = (p - 1)(q - 1) = R + 1 - (p + q)$ is equivalent to finding the
sum $p + q$, which together with the product $R = pq$ would determine
$p$ and $q$. Thus, for anyone who cannot find the factorization of $R$, the
message $C$ would remain undecipherable.
Exercise. Take small numbers for the public key: $R = 55$, $e = 3$.
(a) Encrypt message $M = 8$. Answer: $C = \dots$
(b) Determine the inverse of $e$ modulo $\varphi(55)$. Answer: $d = \dots$
(c) Decipher the encrypted message: $C^d \equiv \dots \bmod 55$.
Example. How to raise $a$ to the power 27 modulo 55? Write
$27_{10} = 11011_2$ (in binary code). Therefore $a^{27} = a^{16} \times a^8 \times a^2 \times a^1$. E.g., to
compute $17^{27} \bmod 55$, we find $17^2 \equiv 14 \bmod 55$, $17^4 \equiv 14^2 \equiv -24 \bmod 55$,
$17^8 \equiv (-24)^2 \equiv 26 \bmod 55$, $17^{16} \equiv 26^2 \equiv 16 \bmod 55$. Hence
$17^{27} = 17^{16}\, 17^8\, 17^2\, 17^1 \equiv 16 \cdot 26 \cdot 14 \cdot 17 = 99008 \equiv 8 \bmod 55$.
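The method of the Example, squaring repeatedly and multiplying together the powers that correspond to the binary digits 1 of the exponent, can be sketched as follows. The function name `power_mod` is made up for this illustration; Python's built-in three-argument `pow` computes the same quantity.

```python
def power_mod(a, exponent, modulus):
    """Compute a**exponent mod modulus by repeated squaring,
    reading the binary digits of the exponent from the lowest one."""
    result, square = 1, a % modulus
    while exponent > 0:
        if exponent & 1:                    # this binary digit is 1: include the current power
            result = result * square % modulus
        square = square * square % modulus  # a, a^2, a^4, a^8, ...
        exponent >>= 1
    return result


print(power_mod(17, 27, 55))
```

Only about $\log_2$ of the exponent squarings are needed, and no intermediate number ever exceeds the square of the modulus.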
5. Combinatorics
1. The Pigeonhole Principle says that if $n + 1$ items are placed
into $n$ compartments, then at least one of the compartments contains
more than one item. Scientifically speaking, this means that no
function $f : X \to Y$ is injective if $|Y| < |X| < \infty$. A proof, if desired at all,
can be done by induction on the cardinality $n = |Y|$. The principle is
clear when $Y = \emptyset$ (for there are no functions from $X \ne \emptyset$ to $\emptyset$), and
for $n > 0$, if $|f^{-1}(y_0)| \le 1$, the problem is reduced to the case $n - 1$ for
the function from $X - f^{-1}(y_0)$ to $Y - \{y_0\}$.
Exercises. (a) Show that in any group of two or more people, there are at least
two who have the same number of friends within the group.
(b) Prove that the decimal (or binary) expansion of any fraction
$m/n$ repeats (starting from some place) and has a period of fewer than
$n$ places. (Hint: The process of long division of $m$ by $n$ (let $n > m$)
retraces itself as soon as the remainder repeats.)
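The hint to (b) can be followed literally: run the long division and watch for the first repeated remainder. A sketch with a made-up function name, for decimal expansions and $0 < m < n$:

```python
def decimal_period(m, n):
    """Return (preperiod length, period length) of the decimal expansion of m/n, 0 < m < n."""
    seen = {}                   # remainder -> step at which it first occurred
    r, step = m, 0
    while r not in seen:
        seen[r] = step
        r = (10 * r) % n        # the next remainder in the long division
        step += 1
    return seen[r], step - seen[r]


print(decimal_period(1, 7), decimal_period(1, 6), decimal_period(1, 4))
```

A terminating expansion such as $1/4 = 0.25$ shows up here with period 1, the repeating digit being 0. Since there are only $n$ possible remainders, the pigeonhole principle forces a repetition within $n$ steps.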
2. Permutations, arrangements, combinations. Invertible
functions from a set to itself are called permutations. When the set
is finite (e.g. $S = \{1, 2, \dots, n\}$), the total number of such permutations
equals $n! := n \cdot (n - 1)! = n \cdot ((n - 1) \cdots 2 \cdot 1)$. Indeed, after making
one of the $n$ choices for $f(1)$, there remain $n - 1$ choices for $f(2)$, after
which there remain $n - 2$ choices for $f(3)$, etc.$^9$
Exercises. (a) Show that the number An,r of ordered r-tuples of el-
ements of t1, . . . , nu (they are called arrangements, or r-permutations)
is equal to n!{pn ´ rq! “ n ¨ pn ´ 1q ¨ ¨ ¨ pn ´ r ` 1q (assuming that
0 ď r ď n, of course). ` ˘
(b) Show that the number, denoted Cn,r or nr (“n choose r”),
of unordered r-tuples of elements from t1, . . . , nu (they are called r-
combinations, but are, simply speaking, r-element subsets) is equal to
n!{pn ´ rq!r!
(c) Prove that for prime p and 0 ă r ă p, the integer p!{r!pp ´ rq! is
divisible by p.
(d) Find how many times a prime p occurs in the prime factorization
of n! (Answer: tn{pu ` tn{p2 u ` ¨ ¨ ¨ where txu is the integer part, or
floor function, assigning to x P R the greatest integer not exceeding x.)
(e) An expression $x_1^{r_1} \cdots x_n^{r_n}$, where each $r_i$ is a non-negative integer,
is called a monomial in $n$ variables $(x_1, \dots, x_n)$ of total degree $r =
r_1 + \cdots + r_n$. Prove that the total number of monomials in $n$ variables
of total degree $r$ is equal to $\binom{n+r-1}{r}$. (Hint: Placing $n-1$ crosses "×"
in the string of $n+r-1$ blank spaces divides the $r$ spaces remaining
blank into $n$, possibly empty, groups.)
(f) An (unordered) collection of possibly repeated letters taken from
a certain alphabet is called a multiset. How many multisets of size r
can be made of n letters? What if the order of the letters also matters?

⁹ One can organize this argument as formal induction by showing that the number
of permutations, $p(n)$, satisfies $p(n) = n \cdot p(n-1)$ for all $n > 0$. Can you explain,
though, why $p(0) = 1$ (the base of induction)?
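The counts in exercises (a), (b) and (e) can be sanity-checked by brute-force enumeration for small $n$ and $r$. The sketch below is only an experiment, not a proof; the helper name `count_by_enumeration` is ours, and it relies on Python's standard `itertools` and `math.comb`.

```python
from itertools import combinations, combinations_with_replacement, permutations
from math import comb, factorial


def count_by_enumeration(n, r):
    """Count arrangements, r-element subsets and degree-r monomials directly."""
    items = range(1, n + 1)
    arrangements = sum(1 for _ in permutations(items, r))
    subsets = sum(1 for _ in combinations(items, r))
    # A monomial of degree r in n variables is an unordered choice of r
    # variables with repetition allowed.
    monomials = sum(1 for _ in combinations_with_replacement(items, r))
    return arrangements, subsets, monomials


if __name__ == "__main__":
    n, r = 5, 3
    print(count_by_enumeration(n, r))
    print(factorial(n) // factorial(n - r), comb(n, r), comb(n + r - 1, r))
```

Both printed triples should agree for every small pair $(n, r)$ with $0 \le r \le n$.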
3. The binomial formula. Multiplying out $n$ copies of the factor
$(x+y)$, we find
\[ (x+y)^n = (x+y)(x+y)\cdots(x+y) = \binom{n}{0} x^n y^0 + \binom{n}{1} x^{n-1} y^1 + \cdots + \binom{n}{n} x^0 y^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^k. \]
Indeed, the number of times a monomial $x^{n-k} y^k$ occurs in the sum
equals the number of ways to choose $k$ letters $y$ from among the $n$
parentheses $(x+y)$, i.e. "$n$ choose $k$" times.
We already know how to express the values of the binomial coefficients
using factorials: $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$. It turns out, however, that the
properties of these numbers are more clearly encoded in the properties
of the expression $(x+y)^n$, sometimes referred to as the generating
function for the binomial coefficients. For example, $\sum_k \binom{n}{k} = (1+1)^n = 2^n$,
and $\sum_k (-1)^k \binom{n}{k} = (1-1)^n = 0$ (for $n > 0$). Furthermore, obviously
$(x+y)^{n+1} = (x+y)(x+y)^n = x(x+y)^n + y(x+y)^n$. This translates into
\[ \sum_k \binom{n+1}{k} x^{n+1-k} y^k = x \sum_k \binom{n}{k} x^{n-k} y^k + y \sum_k \binom{n}{k-1} x^{n+1-k} y^{k-1}. \]
Since the equality of two polynomials in $(x, y)$ means equality of their
coefficients, we obtain Pascal's identity:
\[ \binom{n+1}{k} = \binom{n}{k} + \binom{n}{k-1}. \]
This is the defining property for the rows of Pascal's triangle, where
each term equals the sum of two adjacent terms from the previous row:

               0   1   0
             0   1   1   0
           0   1   2   1   0
         0   1   3   3   1   0
       0   1   4   6   4   1   0
         1   5  10  10   5   1
       1   6  15  20  15   6   1

Let us experiment with each row: $1^2 = 1$, $1^2 + 1^2 = 2$, $1^2 + 2^2 + 1^2 = 6$,
$1^2 + 3^2 + 3^2 + 1^2 = 20$: we get the middle numbers 1, 2, 6, 20 of the
even-numbered rows (marked blue in the original)!
Exercises. (a) Prove the same for all rows: $\sum_{k=0}^{n} \binom{n}{k}^2 = \binom{2n}{n}$.
(Hint: $(x+y)^{2n} = (x+y)^n (x+y)^n$.)
(b) Prove that $(x+y)^p \equiv x^p + y^p \pmod p$ when $p$ is prime. (Remark:
Note that it follows from Fermat's Little Theorem that $(\hat{x}+\hat{y})^p =
\hat{x}+\hat{y} = \hat{x}^p + \hat{y}^p$ for all $\hat{x}, \hat{y} \in \mathbb{Z}_p$. The claim, however, is that the
polynomial expression $(x+y)^p$ in two variables $x, y$, when its integer
coefficients are reduced modulo $p$, turns into the expression $x^p + y^p$.)
(c) Apply (b) to give a new proof of Fermat's Little Theorem: $a^p \equiv a
\pmod p$, using induction on $a = 0, 1, 2, \dots, p-1$.
(d) Define the exponential power series as $e^x := 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots =
\sum_{n \ge 0} \frac{x^n}{n!}$, and rearrange the formal power series $e^x e^y$ in two variables
into $e^{x+y}$. (Hint: $\frac{(x+y)^n}{n!} = \sum_{k+l=n} \frac{x^k}{k!} \frac{y^l}{l!}$.)
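Pascal's identity gives a way to generate the rows of the triangle without any factorials, and the experiments of this section are easy to repeat by machine. The following sketch (the function name `pascal_rows` is ours) pads each row with zeros, just as in the triangle above, and adds adjacent terms.

```python
def pascal_rows(count):
    """Return the first `count` rows of Pascal's triangle (row 0 is [1])."""
    rows = [[1]]
    for _ in range(count - 1):
        padded = [0] + rows[-1] + [0]          # zeros on both sides, as in the picture
        rows.append([padded[k] + padded[k + 1] for k in range(len(padded) - 1)])
    return rows


if __name__ == "__main__":
    rows = pascal_rows(9)
    for n in range(5):
        squares = sum(c * c for c in rows[n])
        alternating = sum((-1) ** k * c for k, c in enumerate(rows[n]))
        print(n, rows[n], sum(rows[n]), alternating, squares, rows[2 * n][n])
```

For each $n$ the last two printed numbers (the sum of squares of row $n$ and the middle entry of row $2n$) can be compared directly.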
4. Counting. In principle, the subject of combinatorics is counting,
i.e. answering the question How many? about elements of various
interesting finite sets. Some such questions are easy to answer.
Examples (a) The product formula. If the set $S = X \times Y \times \cdots \times Z$
is the Cartesian product of several finite sets, then $|S| = |X| \times |Y| \times
\cdots \times |Z|$.
(b) Power sets. The set of functions $f : X \to Y$ is denoted $Y^X$
(variant: $2^X$ when $Y = \{0, 1\}$), because when the sets are finite, the
total number of such functions is $|Y|^{|X|}$ (prove it!)
(c) Another counting technique, based on representation of elements
of certain sets as “leaves” of a tree with regular branching properties,
can be visualized in the context of the following playfully formulated
exercise: An alien has 3 arms with 7 fingers each, and 2 arms with 4
fingers each. How many fingers does the alien have in total?
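Example (b) can be illustrated by listing the functions explicitly: a function $f : X \to Y$ is a choice of one value in $Y$ for each element of $X$, i.e. an element of a Cartesian product of $|X|$ copies of $Y$. A small sketch, with `all_functions` being our own illustrative name:

```python
from itertools import product


def all_functions(X, Y):
    """List every function X -> Y as a dictionary {x: f(x)}."""
    X = list(X)
    Y = list(Y)
    return [dict(zip(X, values)) for values in product(Y, repeat=len(X))]


if __name__ == "__main__":
    subsets_as_functions = all_functions("abc", [0, 1])
    print(len(subsets_as_functions))      # compare with |Y| ** |X|
    print(subsets_as_functions[:3])
```

With $Y = \{0, 1\}$ each listed function is the indicator of a subset of $X$, which is why $2^X$ is also read as the set of all subsets.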
Many sets are trickier to count.
Example: Inclusion-exclusion principle. Let us determine (anew)
the number $\varphi(n)$ of integers in $\{1, 2, \dots, n\}$ relatively prime to $n$, where $n$
has, say, exactly three distinct prime factors $p$, $q$, and $r$. Clearly, we
have to exclude from the set all multiples of $p$, $q$ or $r$. The multiples
of $p$ form a fraction $1/p$ of the total size $n$ of the set (and likewise $1/q$
and $1/r$ for the multiples of $q$ and $r$). Thus, it seems that the answer
for $\varphi(n)$ should be
\[ n \times \left( 1 - \frac{1}{p} - \frac{1}{q} - \frac{1}{r} \right). \]
But some integers are divisible by both, say, $p$ and $q$, and so we have
excluded them twice. Therefore we should include them once (and the
same for the multiples of $qr$ and $rp$). Thus, it seems now that the
answer for $\varphi(n)$ should be
\[ n \times \left( 1 - \frac{1}{p} - \frac{1}{q} - \frac{1}{r} + \frac{1}{pq} + \frac{1}{qr} + \frac{1}{rp} \right). \]
However, some integers are divisible by all the three primes. Each of
them was excluded 3 times and then included 3 times. Excluding them
once again, we obtain
\[ \varphi(n) = n \times \left( 1 - \frac{1}{p} - \frac{1}{q} - \frac{1}{r} + \frac{1}{pq} + \frac{1}{qr} + \frac{1}{rp} - \frac{1}{pqr} \right) = n \times \left( 1 - \frac{1}{p} \right) \left( 1 - \frac{1}{q} \right) \left( 1 - \frac{1}{r} \right) \]
in agreement with the general number-theoretic formula.
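The exclusion and inclusion steps can be replayed numerically. In the sketch below (function names are ours) the prime factors are supplied by the caller, the signed sum runs over all products of distinct primes, and the result is compared with a direct count using `math.gcd`.

```python
from itertools import combinations
from math import gcd, prod


def phi_by_counting(n):
    """Count the integers in {1, ..., n} relatively prime to n directly."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def phi_by_inclusion_exclusion(n, primes):
    """Signed count over products of distinct prime factors of n."""
    total = 0
    for size in range(len(primes) + 1):
        for subset in combinations(primes, size):
            d = prod(subset)                   # empty product is 1
            total += (-1) ** size * (n // d)   # multiples of d among 1..n
    return total


if __name__ == "__main__":
    print(phi_by_counting(60), phi_by_inclusion_exclusion(60, [2, 3, 5]))
```

Here `n // d` is exactly $n/d$ because every such $d$ divides $n$.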
Example: Stirling's partition numbers (also known as "Stirling's
numbers of the second kind", and denoted $S_{n,r}$) count the
partitions of the set $\{1, 2, \dots, n\}$ into $r$ non-empty subsets (in other
words, the number of equivalence relations with $r$ equivalence classes
on a set of $n$ elements). The number of such partitions, for which $\{n\}$ is
one of the subsets, coincides with the number $S_{n-1,r-1}$ of partitions of
$\{1, \dots, n-1\}$ into $r-1$ non-empty subsets. Partitions not containing
$\{n\}$ as one of the subsets are obtained by dividing $\{1, \dots, n-1\}$ into $r$
non-empty subsets, and then adjoining $n$ to one of them. The number
of such partitions is $r S_{n-1,r}$. Thus, Stirling's partition numbers satisfy
the recursion relation somewhat resembling Pascal's:
\[ S_{n,r} = S_{n-1,r-1} + r\, S_{n-1,r}. \]
Yet, no simple closed formula for $S_{n,r}$ is known.
Exercises. (a) Show that $S_{n,1} = S_{n,n} = 1$, for all $n \ge 1$, and
compute a few rows of "Stirling's triangle".
(b) Prove that $S_{n,2} = 2^{n-1} - 1$ for all $n \ge 1$.
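A few rows of "Stirling's triangle" can be produced from the recursion, and the recursion itself can be cross-checked against a direct enumeration of partitions. The sketch below uses our own names; the boundary values $S_{0,0} = 1$ and $S_{n,0} = S_{0,r} = 0$ otherwise are a convention we assume in order to start the recursion.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, r):
    """Stirling partition number computed from the recursion."""
    if n == 0 and r == 0:
        return 1                      # assumed convention: one empty partition
    if n == 0 or r == 0:
        return 0
    return stirling2(n - 1, r - 1) + r * stirling2(n - 1, r)


def count_partitions(n, r):
    """Count partitions of {1..n} into r blocks by placing elements one by one."""
    def extend(position, blocks_used):
        if position == n:
            return 1 if blocks_used == r else 0
        total = 0
        # Put the next element into an existing block or open a new one.
        for block in range(blocks_used + 1):
            total += extend(position + 1, max(blocks_used, block + 1))
        return total
    return extend(0, 0)


if __name__ == "__main__":
    for n in range(1, 7):
        print([stirling2(n, r) for r in range(1, n + 1)])
```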
6. Probability
1. Kolmogorov’s axioms. One often encounters experiments (e.g.
rolling dice) which, even when repeated under seemingly identical con-
ditions, can lead to different outcomes. Applying the idea of probability
to such situations, one assumes that for each outcome there exists a
real number p between 0 and 1 such that the fraction of the total num-
ber of experiments which lead to this outcome tends to p when the
experiment is repeated indefinitely. While applications of probabil-
ity theory to real life situations rely on this (unverifiable) assumption,
the purely mathematical formalism of probability theory can be put on
solid logical foundations. This was done in 1933 by Andrey Nikolaevich
Kolmogorov, who proposed the following axiomatic approach.
Let S be the sample space, i.e. the set of all possible outcomes
of a given experiment. By an event one means a subset of S which
belongs to a certain collection of subsets which form a so-called σ-
algebra. To explain what it is, let us recall that all subsets of S form a
Boolean algebra with respect to the operations of complement, union,
and intersection of sets. A σ-algebra must form a Boolean subalgebra,
i.e. be closed with respect to those operations: together with each
subset $A$, it must contain its complement $A^c = S - A$, and together with
any finite collection of subsets, it must contain their intersection and
their union. However, for the purposes of most applications of probability
theory one needs to require more: that for any countable collection of
events, their union and intersection are events too. Thus, by definition,
a σ-algebra of events must be some collection of subsets in $S$ which
includes $\emptyset$, $S$, and is closed with respect to passing to complements,
as well as taking countable unions and intersections.
Given such a σ-algebra, Kolmogorov's axioms require that there be
defined a function, which to each event $A$ (i.e. a subset of $S$ which
belongs to the given σ-algebra) associates a real number $P(A)$ (interpreted
as the probability of the event $A$) in such a way that the following
properties are satisfied:
(1) $0 \le P(A) \le 1$ for all events $A$
(2) $P(\emptyset) = 0$ and $P(S) = 1$
(3) For any countable collection of pairwise disjoint¹⁰ events $A_1, A_2, \dots$
\[ P\left( \bigcup_{i=1}^{\infty} A_i \right) = \sum_{i=1}^{\infty} P(A_i). \]

¹⁰ i.e. $A_i \cap A_j = \emptyset$ for all $i \ne j$.

2. Discrete probability. In the majority of scientific applications,
the outcomes (e.g. measurements of a physical quantity) form a continuum
(e.g. $\mathbb{R}$, or $\mathbb{R}^n$), making probability theory an applied branch
of mathematical analysis. However, when the set $S = \{x_1, x_2, \dots\}$ of
outcomes is finite, or even countable, the situation simplifies. Namely,
without loss of generality one may assume that the σ-algebra of events
consists of all subsets of $S$. Then the probability function $P$ must
associate to each of the one-element events $\{x_i\}$ their probabilities, $p_i$,
which are real numbers between 0 and 1 such that $\sum_{i=1}^{|S|} p_i = 1$, and
the probability of an event $A \subset S$ is given by
\[ P(A) = \sum_{i :\, x_i \in A} p_i. \]
Exercise. Check that any sequence of non-negative numbers $p_i$
whose sum is 1 determines this way a probability function (also called
a distribution) satisfying Kolmogorov's axioms.
The assumptions about finiteness or countability of the sample space
lead to the theory of discrete probability, which in fact, while remaining
relatively elementary, captures most phenomena important in applications.
When the sample set $S$ is finite, it is often plausible (as, say,
in the case of rolling a "fair" die) that the probability distribution is
uniform, i.e. that all $p_i$ are the same (and hence each equal to $1/|S|$).
Exercises. (a) Suppose that one randomly chooses an integer $x$ and
computes its congruence class modulo $n$. Assuming that the resulting
probability distribution on $S = \mathbb{Z}_n$ is uniform, find the probability
$P(A)$ of the event that $x$: is divisible by some divisor $d$ of $n$; is not
divisible by $d$; is relatively prime to $n$.
(b) Monty Hall hides a prize behind one of three closed doors. You
must choose one. Then he opens one of the other two to show you that
the prize isn’t there. What should you do to maximize your chance of
collecting the prize: stick to your first choice, or change your mind?
3. Expected value. A real-valued function $f : S \to \mathbb{R}$ is called a
random variable. The expected value (also called the mean, or mathematical
expectation) of a random variable is defined as the sum of its
values weighted by the probabilities:
\[ E(f) := \sum_{i :\, x_i \in S} p_i f(x_i). \]
Example. The sample set $S = \{1, 2, 3, 4, 5, 6\}$ of a die is a subset of
the number line. The identity function (assigning to each outcome its
numerical value) is therefore a random variable (let us call it $x$).
Exercises. (a) Assuming that the die is fair, compute: $E(x)$; $E(x^2)$;
$E((x - E(x))^2)$.
(b) If $(x, y)$ is the outcome of a fair pair of dice, compute $E(x + y)$.
(c) Let $f, g : S \to \mathbb{R}$ be two random variables. Prove that $E(f + g) =
E(f) + E(g)$, and $E(kf) = kE(f)$ for any $k \in \mathbb{R}$. (These properties
establish linearity of the mean value.)
(d) Give an example showing that, generally speaking, the median
value does not possess such a linearity property.
(e) Does $E(fg)$ always coincide with $E(f)E(g)$?
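The definition of the expected value translates into a one-line sum over the sample set. The sketch below (the names `expectation` and `fair_die` are ours) uses exact fractions, so that answers to exercises such as (a) can be checked without rounding.

```python
from fractions import Fraction


def expectation(f, distribution):
    """E(f) for a discrete distribution given as {outcome: probability}."""
    return sum(p * f(x) for x, p in distribution.items())


# Uniform distribution on the six faces of a fair die.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

if __name__ == "__main__":
    mean = expectation(lambda x: x, fair_die)
    second_moment = expectation(lambda x: x * x, fair_die)
    spread = expectation(lambda x: (x - mean) ** 2, fair_die)
    print(mean, second_moment, spread)
```

Replacing `fair_die` by any other dictionary of non-negative probabilities summing to 1 gives the expectation for a loaded die.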
Example: Gambler's ruin. A gambler's strategy is successful if the
expected dollar amount of his gain is positive. Here is a gambling game:
The gambler wins \$1 each time a fair coin comes up heads, and loses
\$1 each time it comes up tails. Initially he has \$$n$, and quits when he
loses all, or when he reaches his goal of having \$$m$ (where $m \ge n$).
Let $p_n$ be the probability that starting with \$$n$ the gambler loses (and
respectively let the probability that he reaches his goal be $1 - p_n$). We
want to determine $p_n$ (which should depend on $m$ too).
After the first toss, the gambler has either \$$(n+1)$ or \$$(n-1)$ with
probability $1/2$ each, after which his chances of losing become $p_{n+1}$ and
$p_{n-1}$ respectively. Therefore $p_n = p_{n+1}/2 + p_{n-1}/2$, and we get the 2nd
order linear recursion relation:
\[ p_{n+1} = 2p_n - p_{n-1}. \]
Its characteristic polynomial $\lambda^2 - 2\lambda + 1 = (\lambda - 1)^2$ has a double root
$\lambda = 1$. The general solution in this case has the form $p_n = A + Bn$, where
$A$ and $B$ are arbitrary constants. To find them, we note that $p_0 = 1$
and $p_m = 0$, because in either case the gambler quits right away, thus
having no chance to win in the former case, and to lose in the latter.
From these we find $A = 1$ and $B = -1/m$, so that $p_n = 1 - \frac{n}{m}$. When
$m > 2n$, the gambler is more likely to lose, and when $m < 2n$ he is
more likely to win. The expected value $E$ of the dollar amount at the
end of the game gives a quantitative measure of his average success:
\[ E := p_n \cdot \$0 + (1 - p_n) \cdot \$m = \left( 1 - \frac{n}{m} \right) \$0 + \frac{n}{m}\, \$m = \$n. \]
Thus, as one might expect, on average the gambler is left with just as
much as he invested into the gambling, neither gaining nor losing.
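The formula for $p_n$ can also be obtained without characteristic polynomials, by running the recursion with an unknown value of $p_1$ and fixing it from the boundary condition at $m$. The sketch below is our own illustration of this "shooting" idea in exact arithmetic; the name `ruin_probabilities` is hypothetical.

```python
from fractions import Fraction


def ruin_probabilities(m):
    """Solve p_{k+1} = 2 p_k - p_{k-1} with p_0 = 1, p_m = 0 (m >= 1).

    Each p_k is tracked as a_k + b_k * t, where t = p_1 is unknown.
    """
    a = [Fraction(1), Fraction(0)]    # constant parts of p_0, p_1
    b = [Fraction(0), Fraction(1)]    # coefficients of t in p_0, p_1
    for k in range(1, m):
        a.append(2 * a[k] - a[k - 1])
        b.append(2 * b[k] - b[k - 1])
    t = -a[m] / b[m]                  # choose t so that p_m = 0
    return [a[k] + b[k] * t for k in range(m + 1)]


if __name__ == "__main__":
    m = 10
    p = ruin_probabilities(m)
    for n in (2, 5, 8):
        print(n, p[n], (1 - p[n]) * m)    # ruin probability, expected final amount
```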
4. Conditional probability. We talk of chances when there is
no certainty. But if new information arrives, chances can change. For
example, the probability of the event $A$ that the sum $x + y$ on two
dice is at least 10 equals $1/6$ (check this); but if the first die shows 5
(condition $C$), then the chances for $x + y \ge 10$ improve to $1/3$. Thus,
we have calculated the conditional probability $P(A|C) = 1/3$ of event
$A$ under the condition $C$ (without even rolling any die). Here is the
general definition.
Given some condition $C \subset S$ (i.e. an event which has supposedly
occurred), the conditional probability $P(A|C)$ of event $A \subset S$ under
the condition $C$ is the ratio $P(A \cap C)/P(C)$ of the (unconditional)
probabilities $P(A \cap C)$ and $P(C)$.
In our situation of discrete probability, one can describe this by saying
that when $C$ occurs, all elementary events outside $C$ become impossible,
but the relative weights of all elementary events within $C$ remain
unchanged:
\[ \forall x \notin C,\ P(\{x\}|C) = 0, \quad \text{and} \quad \forall x \in C,\ P(\{x\}|C) = \frac{P(\{x\})}{\sum_{y \in C} P(\{y\})}. \]
The denominator in the formula guarantees that $\sum_{x \in S} P(\{x\}|C) = 1$, i.e.
that the conditional probabilities form a new probability distribution
on $S$, which is actually zero outside $C$. The conditional probabilities $P(A|C)$
of events $A$ in $S$ are simply their probabilities with respect to this new
distribution (based on the premise that $C$ occurs).
The above formula is a special case of the so-called Bayes' theorem,
which says that when the sample space $S$ is partitioned into mutually
exclusive events $X_i$ (cases) of known positive probabilities $P(X_i)$, and
$C$ is some condition whose conditional probability $P(C|X_i)$ in each case is
known, then the distribution of conditional probabilities between the
cases if $C$ occurs can be found from
\[ \forall i, \quad P(X_i|C) = \frac{P(C|X_i) P(X_i)}{\sum_j P(C|X_j) P(X_j)}. \]
Indeed,
\[ P(X_i|C) = \frac{P(C \cap X_i)}{P(C)} = \frac{P(C|X_i) P(X_i)}{P(C)}, \]
where, since $C$ is the disjoint union of all $C \cap X_j$,
\[ P(C) = \sum_j P(C \cap X_j) = \sum_j P(C|X_j) P(X_j). \]

In the simplest situation of just two cases, $X$ and $X^c$, Bayes' theorem
says:
\[ P(X|C) = \frac{P(C|X) P(X)}{P(C|X) P(X) + P(C|X^c) P(X^c)}, \]
where of course $P(X) + P(X^c) = 1$.
Example. Suppose a large group of people is screened for some
disease which occurs in, say, 2% of the population. Let $T$ denote the
event that the test comes back positive, and let $D$ denote the event that
the person actually has the disease. The probability $P(D|T)$ that
a person who tested positive is actually sick can be far less than 1
because of false positive results, which occur with small probability
$P(T|D^c)$ of, say, 3%. Also, there is a chance $P(T^c|D) = 1 - P(T|D)$ of
false negative outcomes (say, on the scale of 1%), when the screening
fails to detect the existing disease. Using Bayes' formula, we find the
portion of sick people among those who test positive:
\[ P(D|T) = \frac{0.99 \cdot 0.02}{0.99 \cdot 0.02 + 0.03 \cdot 0.98} \approx 0.4, \]
and the portion of sick people among those who test negative:
\[ P(D|T^c) = \frac{0.01 \cdot 0.02}{0.01 \cdot 0.02 + 0.97 \cdot 0.98} \approx 0.0002, \]
where we also applied $P(T^c|D^c) = 1 - P(T|D^c) = 0.97$. It may come
as a surprise that in our example only 40% of those who test positive
have the disease, so that more thorough and expensive testing will be
needed, but only for those who tested positive in the screening, which
is about 5% of the population. At the same time, testing negative
improves confidence in health a hundredfold: from a 2% chance of
being sick to 0.02%.
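The two-case form of Bayes' theorem is short enough to compute directly. The sketch below (the function name `posterior` and its argument names are ours) reproduces the numbers of the screening example in exact fractions.

```python
from fractions import Fraction


def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Two-case Bayes' theorem: P(X|C) from P(X), P(C|X) and P(C|X^c)."""
    numerator = likelihood_if_true * prior
    return numerator / (numerator + likelihood_if_false * (1 - prior))


if __name__ == "__main__":
    p_disease = Fraction(2, 100)           # P(D)
    p_pos_given_sick = Fraction(99, 100)   # P(T|D) = 1 - P(T^c|D)
    p_pos_given_well = Fraction(3, 100)    # P(T|D^c), false positives

    sick_if_positive = posterior(p_disease, p_pos_given_sick, p_pos_given_well)
    sick_if_negative = posterior(p_disease, 1 - p_pos_given_sick, 1 - p_pos_given_well)
    print(float(sick_if_positive), float(sick_if_negative))
```

Changing the prevalence `p_disease` shows how strongly the value of a positive test depends on how rare the disease is.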
5. Statistical independence. Returning to the Monty Hall problem,
let $A$, $B$, $C$ be the three doors, and say that you chose $A$. Assuming
that the prize isn't there, you can argue that Monty Hall is forced to
open the only door remaining prize-less: If he opens $B$, the prize must
be behind $C$. The assumption that $A$ is prize-less is true with probability
$2/3$. Therefore this is your chance to win if you change your
choice. If you stick to door $A$, your chance is $1 - 2/3 = 1/3$.
In fact there is a simpler way to arrive at the same conclusion. It is
based on the notion of statistically independent events. Two events $A$
and $B$ are called independent if the probability that both occur equals
the product of their individual probabilities:
\[ P(A \cap B) = P(A) P(B). \]
Assuming that $P(B) > 0$, we find this equivalent to $P(A|B) = P(A)$,
i.e. that the chance of $A$ does not change when $B$ occurs.
This is true about your choice of door $A$ and the event that the
host opens door $B$. Indeed, how can any actions of the host of the
show after the prize is placed change the probability of the prize being
behind $A$? So, when Monty Hall opens door $B$, it is still true that
$P(A|B \text{ opens}) = P(A) = 1/3$, and hence $P(C|B \text{ opens}) = 2/3$.
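The conclusion can be double-checked by listing all scenarios with their weights. The sketch below makes two assumptions that go beyond the text: the prize is placed behind each door with probability $1/3$, and when the host has two prize-less doors to choose from, he opens either one with equal probability.

```python
from fractions import Fraction


def monty_hall(doors="ABC", choice="A"):
    """Return (P(win if you stick), P(win if you switch)) by exact enumeration."""
    stick = switch = Fraction(0)
    for prize in doors:                              # each placement has weight 1/3
        openable = [d for d in doors if d != choice and d != prize]
        for opened in openable:                      # assumed: host chooses uniformly
            weight = Fraction(1, len(doors)) / len(openable)
            remaining = next(d for d in doors if d not in (choice, opened))
            if prize == choice:
                stick += weight
            if prize == remaining:
                switch += weight
    return stick, switch


if __name__ == "__main__":
    print(monty_hall())
```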

More generally, several events are mutually independent if for any
collection of them the probability of their intersection is equal to the
product of their probabilities. Say, $A$, $B$, $C$ are mutually independent
if they are pairwise independent and $P(A \cap B \cap C) = P(A) P(B) P(C)$.

Exercises. (a) Show that if $A$ and $B$ are independent, then $A$ and
$B^c$ are independent.
(b) Let a random integer $x$ belong to every congruence class modulo
$N$ with equal probabilities $1/N$, and let $m$ and $n$ be divisors of $N$. Prove
that the events $m|x$ and $n|x$ are independent if and only if $m$ and $n$
are relatively prime. Generalize this to the case of several divisors.
(c) Let $p_1, \dots, p_r$ be distinct prime divisors of $N$. Prove that the
events $p_i \nmid x$ are mutually independent, and derive from this the
(already familiar) formula for Euler's function:
\[ P(x \text{ is relatively prime to } N) := \frac{\varphi(N)}{N} = \prod_{\text{prime } p | N} \left( 1 - \frac{1}{p} \right). \]
(d) Bernoulli trials. A "loaded" coin turns heads with probability
$p$, and tails with probability $q = 1 - p$ (possibly not equal to $1/2$).
Assuming that 10 tosses of the coin are mutually independent, find the
probability of the outcome $HTTHTTTHTH$.
(e) Show that the probability $P(n, k)$ of the event that $k$ out of $n$
tosses are heads is $\binom{n}{k} p^k q^{n-k}$. (This distribution is called binomial.)
(f) Let $s$ be the random variable equal to $+1$ when the toss of a
loaded coin turns heads, and $-1$ when it turns tails. Compute $E(s)$
and $E(s^2)$.
(g) Independent random variables. Two (or several) discrete random
variables $f, g, \dots, h : S \to \mathbb{R}$ are called (mutually) independent if the
events $f(x) = a$, $g(x) = b$, ..., $h(x) = c$ are mutually independent for
all values of $a, b, \dots, c$. Prove that if $f$ and $g$ are independent, then
$E(fg) = E(f)E(g)$. (Hint: If $a_i$ are the values of $f$, and $b_j$ are the
values of $g$, then the events $(f = a_i) \wedge (g = b_j)$ are mutually disjoint.)
(h) Random walks. Every second, a drunkard makes one step forward
or one backward with equal probabilities and independently of
his previous steps. How far from the starting point is he expected to
be 100 seconds later? More precisely, find the expected value of the
square distance $E((s_1 + \cdots + s_{100})^2)$ (and then take the square root).
(Hint: By (g), $E(s_i s_j) = 0$ for $i \ne j$.)
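The binomial distribution of exercise (e) can be checked experimentally for small $n$ by summing the probabilities of all $2^n$ toss sequences, each computed as a product thanks to independence. This is a sketch under that independence assumption; `heads_distribution` is our own name.

```python
from fractions import Fraction
from itertools import product
from math import comb


def heads_distribution(n, p):
    """Probability of exactly k heads in n independent tosses, k = 0..n."""
    q = 1 - p
    distribution = [Fraction(0)] * (n + 1)
    for tosses in product("HT", repeat=n):       # all 2**n sequences
        k = tosses.count("H")
        distribution[k] += p ** k * q ** (n - k)  # independence: multiply
    return distribution


if __name__ == "__main__":
    p = Fraction(1, 3)
    enumerated = heads_distribution(5, p)
    formula = [comb(5, k) * p ** k * (1 - p) ** (5 - k) for k in range(6)]
    print(enumerated == formula, sum(enumerated))
```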
7. Graphs and trees


1. Königsberg bridges. Leonhard Euler amused himself by trying
to walk around town and cross each of its seven bridges exactly once.
He always failed, and to find out whether it is possible at all, he
transformed the map of Königsberg (Figure 9, left) into the graph (Figure
9, right) showing the islands and river banks as vertices ($A$, $B$, $C$, $D$)
connected by edges (for the bridges $a$, $b$, $c$, $d$, $e$, $f$, $g$), and, voilà,
graph theory was born.

[Figure 9: the map of Königsberg (left) and the corresponding graph with
vertices A, B, C, D and edges a, b, c, d, e, f, g (right)]

By definition, a graph consists of several vertices connected by several
edges, and is called connected if for any two vertices one can find a chain
of edges connecting them.
Exercises. (a) Show that if a connected graph has an Euler circuit
(i.e. a closed chain of edges containing every edge exactly once), then
the numbers of edges attached to each vertex¹¹ are all even.
(b) For each “vertex” of Königsberg, compute the number (it is called
the degree of the vertex) of the edges attached to it, and derive that
this town does not have Euler circuits.
(c) Conversely, prove that if all vertices of a connected graph have
even degrees, then Euler circuits exist. (Hint: Show that whenever
Euler enters a vertex, he can leave it along a yet unwalked edge, until
he comes to his starting point. If more than one of such walks are
needed to include all edges, argue that the union of these walks can be
walked differently as a single circuit.)
(d) A closed circuit in a graph is called Hamiltonian, if it visits
each vertex exactly once. Show that every Hamiltonian graph with n
vertices is obtained from the regular n-gon by adding extra edges.
Remark. We should have said “is isomorphic to the graph, obtained
...” Two graphs are called isomorphic, if there exist a bijection between
the sets of their vertices, and a bijection between the sets of their edges,
such that the corresponding edges connect corresponding vertices.
¹¹ To be accurate, if an edge is a loop, meaning that it connects a vertex to itself
(which is allowed), then the edge should be considered attached twice to this vertex.
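The degree count of exercise (b) is easy to mechanize once the graph is written down as a list of edges. In the sketch below the assignment of bridges to pairs of vertices is our assumption, based on the classical layout of the seven bridges (two bridges from the island $A$ to each bank, one from $A$ to $D$, and one from each bank to $D$); it should be checked against Figure 9.

```python
from collections import defaultdict


def degrees(edges):
    """Degree of every vertex; `edges` maps an edge name to its two endpoints."""
    degree = defaultdict(int)
    for u, v in edges.values():
        degree[u] += 1
        degree[v] += 1        # a loop (u == v) is thereby counted twice
    return dict(degree)


# Assumed endpoints of the seven bridges (hypothetical labeling).
konigsberg = {
    "a": ("A", "B"), "b": ("A", "B"),
    "c": ("A", "C"), "d": ("A", "C"),
    "e": ("A", "D"), "f": ("B", "D"), "g": ("C", "D"),
}

if __name__ == "__main__":
    deg = degrees(konigsberg)
    print(deg)
    print(all(d % 2 == 0 for d in deg.values()))   # necessary for an Euler circuit
```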
2. The adjacency matrix. A directed graph is specified by two
finite sets: vertices ($V$) and edges ($E$), and by two functions $v_\pm :
E \to V$, assigning to each edge $e \in E$ its head $v_+(e) \in V$ and its tail
$v_-(e) \in V$. Pictorially, a directed edge is an arrow pointing from its
tail to its head.
The adjacency matrix of a directed graph is the square matrix $A =
[a_{ij}]$ whose rows and columns are labeled by the vertices of the graph,
and whose entry $a_{ij}$ in the $i$th row and $j$th column is equal to the
number of edges from the vertex $j$ to the vertex $i$.
Theorem. The matrix $P(N)$ whose entry $p_{ij}(N)$ is defined as the
number of directed walks to vertex $i$ from vertex $j$ consisting of exactly
$N$ edges (placed tail-to-head) coincides with the matrix power $A^N$.

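[Figure 10: a directed graph (left); a walk from vertex j to vertex i
composed of a walk counted by p_{kj}(N-1) followed by one of the a_{ik}
edges from k to i (right)]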
Proof: Induction on $N$. Base: $P(0)$ is the identity matrix $A^0$.
Step: For $N > 0$, any $N$-edge-long walk starting at the vertex $j$ and
terminating at the vertex $i$ (Figure 10, right) can come from any of the
vertices $k$, and consist of one of the $a_{ik}$ edges leading from vertex $k$ to
vertex $i$, preceded by any of the $p_{kj}(N-1)$ walks of length $N-1$ leading
to vertex $k$ from vertex $j$. The possibilities of passing through different
vertices $k$ are mutually exclusive, hence $p_{ij}(N) = \sum_k a_{ik} \cdot p_{kj}(N-1)$.
Miraculously, this is the formula for the elements of the matrix product:
$P(N) = A \cdot P(N-1)$. By the induction hypothesis, $P(N-1) = A^{N-1}$,
and therefore $P(N) = A^N$. Q.E.D.
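The theorem can be tested on a small example by comparing the matrix power with a direct enumeration of walks. The sketch below uses plain nested lists; the matrix `A` is a made-up example (not the graph of Figure 10), and the convention is the one of the text: `A[i][j]` is the number of edges from vertex `j` to vertex `i`.

```python
from itertools import product


def mat_mul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(A, N):
    """N-th power of A by repeated multiplication (A**0 is the identity)."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(N):
        result = mat_mul(A, result)
    return result


def walks_by_enumeration(A, i, j, N):
    """Count N-edge walks to i from j by listing all intermediate vertices."""
    if N == 0:
        return int(i == j)
    total = 0
    for middle in product(range(len(A)), repeat=N - 1):
        path = (j,) + middle + (i,)
        ways = 1
        for t in range(N):
            ways *= A[path[t + 1]][path[t]]   # edges from path[t] to path[t+1]
        total += ways
    return total


if __name__ == "__main__":
    A = [[0, 1, 1],
         [1, 0, 0],
         [0, 2, 0]]           # hypothetical example with a double edge
    print(mat_pow(A, 4))
```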
Exercises. (a) Write down the adjacency matrix of the graph on
Figure 10, left, and compute the numbers of walks of length 4 between
its vertices.
(b) Formulate and prove the analogue of the above theorem in the
case of undirected graphs (as in Euler’s problem). (Hint: The matrix
A in this case will be symmetric.)
(c) Use (b) and Linear Algebra to compute the total number of
closed walks of length $N$ in the complete graph with $n$ vertices. (In
this graph, every two distinct vertices are connected by an undirected
edge.) (Answer: $(n-1)^N + (n-1)(-1)^N$.)
3. Spanning trees. An (undirected) connected circuit-free graph
is called a tree. In other words, in a tree, every two vertices can be
connected by a unique trail (i.e. a walk with no repeated edges).
Every connected graph has a spanning tree (blue on Figure 11, left)
i.e. a connected subgraph with the same vertices, which is a tree.
Indeed, removing one edge of a circuit leaves the graph connected, and
does not change the set of its vertices. Thus, removing such edges one
by one as long as circuits remain, we end up with a spanning tree.
Theorem. In a tree, the number of vertices exceeds the number of
edges by one: $|V| - |E| = 1$.
Proof: Induction on $|V|$. Base: If $|V| = 1$, any edge would be a
circuit, so $|E| = 0$. Step: A tree with $|V| > 1$ has vertices of degree
1 (they are called leaves). Indeed, if all degrees were $> 1$, one could find a
circuit (as in Euler's problem). Removing a leaf together with the only
adjacent edge does not change $|V| - |E|$, but results in a tree with fewer
vertices, for which $|V| - |E| = 1$ by the induction hypothesis. Q.E.D.

[Figure 11: a connected graph with a spanning tree (left); a hydrocarbon
molecule made of C and H atoms (right)]

Exercises. (a) Show that the converse is also true: A connected
graph with $|E| = |V| - 1$ is a tree. (Hint: Find its spanning tree.)
(b) A forest is a disconnected graph, each of whose connected
components is a tree. Show that every disconnected graph has a spanning
forest, and that $|V| - |E| = \# - |C|$, where $\#$ is the number of
connected components and $|C|$ is the number of "independent circuits",
i.e. the minimal number of edges that need to be removed to get rid
of all circuits.
(c) A hydrocarbon molecule (Figure 11, right) consists of hydrogen
and carbon atoms (they can form one, and up to four chemical bonds
respectively), and is called saturated if the number $|H|$ of hydrogen
atoms in it is maximal for a given number $|C|$ of carbon atoms. Prove
Cayley's theorem: A saturated hydrocarbon molecule must be a tree
with $|H| = 2|C| + 2$.
4. Rooted trees. The tree in Figure 12 (left) encodes the syntax
of the expression $((a - b) \cdot c) + (d/e)$. It is an example of a full binary
rooted tree of height 3, where rooted means that one vertex has been
designated as the root, height indicates the maximal distance from the
root to the leaves (along a unique trail of edges), binary refers to the
fact that each parent vertex (i.e. not a leaf) has at most two children
(i.e. adjacent vertices one step farther from the root), and full refers
to the fact that each parent has exactly two children (designated as
left and right). In general rooted trees don't have to be binary, and
binary rooted trees don't have to be "full". They are useful in many
situations, e.g. for describing syntax of natural (linguistic) or formal
(CS) languages as in this example, but the terminology emphasizes
resemblance to genealogical trees.

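[Figure 12: (left) the syntax tree of ((a - b) · c) + (d/e): the root "+"
at level 0; "·" and "/" at level 1; "-", c, d, e at level 2; a, b at
level 3; (right) a completion to a full binary tree of the same height]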

Exercises. (a) Prove that if a full binary tree has $k$ internal vertices
(i.e. not leaves), then it has $k + 1$ terminal vertices (i.e. leaves). (Hint:
Show that $|E| = 2 \times k$.)
(b) Prove that the number $t$ of terminal vertices of any binary tree
of height $h$ does not exceed $2^h$. (Hint: Complete such a tree to the full
binary tree of height $h$ with exactly $2^h$ leaves on the bottom level, as
in the example of Figure 12, right.)
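The syntax tree of the expression $((a - b) \cdot c) + (d/e)$ can be represented by nested tuples, which makes the notions of height, leaves and internal vertices concrete. This representation and all function names are our own illustrative choices; multiplication is written `*` in the code.

```python
import operator

# (operation, left child, right child); leaves are variable names.
TREE = ("+", ("*", ("-", "a", "b"), "c"), ("/", "d", "e"))

OPERATIONS = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "/": operator.truediv}


def height(tree):
    """Maximal number of edges from the root to a leaf."""
    if isinstance(tree, str):
        return 0
    return 1 + max(height(tree[1]), height(tree[2]))


def count_leaves(tree):
    if isinstance(tree, str):
        return 1
    return count_leaves(tree[1]) + count_leaves(tree[2])


def count_internal(tree):
    if isinstance(tree, str):
        return 0
    return 1 + count_internal(tree[1]) + count_internal(tree[2])


def evaluate(tree, values):
    """Evaluate the expression, looking up variables in `values`."""
    if isinstance(tree, str):
        return values[tree]
    op, left, right = tree
    return OPERATIONS[op](evaluate(left, values), evaluate(right, values))


if __name__ == "__main__":
    print(height(TREE), count_internal(TREE), count_leaves(TREE))
    print(evaluate(TREE, {"a": 7, "b": 3, "c": 2, "d": 8, "e": 4}))
```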
5. Kruskal's and Prim's algorithms. Suppose the edges of a connected
graph are labeled by positive weights $w(e)$ (e.g. the cost of transportation
of something between cities represented by the vertices), and
one needs to find a spanning tree $T$ of minimal total weight $\sum_{e \in T} w(e)$.
Kruskal's algorithm proposes to build the tree, $T_K$, by adding edges
one at a time in the order of increasing weight, but skipping an edge
if it creates a circuit with the previously added ones. Prim's algorithm
proposes to grow the tree, $T_P$, from a root vertex, by adding one vertex
at a time and always picking the "cheapest" edge among the available
choices. Figure 13, where the weights $a < b < \cdots < k$ are ordered
alphabetically, shows the order in which the edges of the tree are added
in Kruskal's (blue), and in Prim's (red and green) algorithms started
from the root vertex of that color.

[Figure 13: a graph with edge weights a, b, ..., k; the numbers mark the
order in which each algorithm adds the edges of the tree]

Theorem. Kruskal’s and Prim’s algorithms work correctly.

Proof. (i) To have the total weight less than that of $T_K$, a tree $T$
must replace edges $e_i$ of $T_K$ of weights $w(e_1) < \cdots < w(e_m) < \dots$
with some edges $e'_i$ of weights $w(e'_1) < \cdots < w(e'_m) < \dots$ such that
$w(e'_m) < w(e_m)$ for at least one $m$. But all edges with $w(e) < w(e_m)$,
together with their vertices, form a subgraph where the edges of $T_K$
with $w(e) < w(e_m)$ form a spanning forest. Therefore replacing only
$m - 1$ edges $e_1, \dots, e_{m-1}$ of this forest with $m$ edges $e'_1, \dots, e'_m$ will
create a circuit in $T$. This contradiction shows that $T_K$ is minimal.

Remark. Note that $T_K$ depends not on the weights $w(e)$, but only
on the total ordering of the edges by their weights. Thus, if some
weights are equal, to run the algorithm, one can perturb them a little
to remove the coincidence and yet not destroy their ordering relative to
other weights. The tree $T_K$ may depend on the perturbation, but the
total "unperturbed" weight of $T_K$ will be the same. Now we will prove
that $T_K$ coincides with $T_P$ constructed using the same edge ordering.

(ii) Suppose $T_P$ differs from $T_K$, and let $e$ be an edge in $T_P$ not
contained in $T_K$. On the step of Prim's algorithm when $e$ is added to
$T_P$, it connects a vertex $v_-$ in the previous part of $T_P$ (call it $S$) with
a new vertex $v_+$ not in $S$. Adding $e$ to $T_K$ creates a circuit consisting
of $e$ and the unique path in $T_K$ from $v_-$ to $v_+$. Let $e'$ be the first edge
of this path which is not in $S$ (it exists, since $v_-$ is in $S$, but $v_+$ is
not). Then $w(e') > w(e)$, for otherwise $e'$ would have been added to $S$
instead of $e$ on that step of Prim's algorithm. But then, replacing $e'$
in $T_K$ with $e$, we would get a spanning tree "cheaper" than $T_K$. This
contradicts part (i), and implies that $T_P = T_K$. Q.E.D.
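Both algorithms fit in a few lines, and on a graph with distinct weights one can observe that they produce the same tree, as the theorem asserts. The sketch below is one possible implementation, not the only one: circuits in Kruskal's algorithm are detected with a simple union-find structure, and Prim's algorithm keeps candidate edges in a heap. The example graph is made up, with vertices numbered from 0 and edges given as `(weight, u, v)` with `u < v`.

```python
import heapq


def kruskal(n, edges):
    """Minimal spanning tree of a connected graph on vertices 0..n-1."""
    parent = list(range(n))

    def find(x):                      # representative of x's component
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):     # increasing weight
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # otherwise the edge closes a circuit
            parent[root_u] = root_v
            tree.append((w, u, v))
    return tree


def prim(n, edges, root=0):
    """Grow the tree from `root`, always taking the cheapest available edge."""
    adjacent = [[] for _ in range(n)]
    for w, u, v in edges:
        adjacent[u].append((w, u, v))
        adjacent[v].append((w, v, u))
    in_tree = {root}
    heap = list(adjacent[root])
    heapq.heapify(heap)
    tree = []
    while heap and len(in_tree) < n:
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue                  # both ends already in the tree
        in_tree.add(v)
        tree.append((w, min(u, v), max(u, v)))
        for edge in adjacent[v]:
            if edge[2] not in in_tree:
                heapq.heappush(heap, edge)
    return tree


if __name__ == "__main__":
    example = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3),
               (5, 3, 4), (6, 1, 4), (7, 0, 3)]
    print(kruskal(5, example))
    print(sorted(prim(5, example, root=3)))
```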
6. Dijkstra's algorithm. Let us interpret the weights $w(e) > 0$ as
the edges' lengths, and consider the problem of finding shortest paths
from a vertex designated as the root of the graph to all other vertices
$v \in V$. We may assume (slightly perturbing the weights if needed) that
different paths between the same vertices have unequal lengths.
Exercise: Bellman’s principle. Show that the (unique) shortest
path from a given vertex to the root also provides the shortest paths
for all intermediate vertices. Deduce that these paths altogether form
a spanning (rooted) tree.
Dijkstra's algorithm builds this tree starting from the root. On each
step, the algorithm recalculates the function $d : V \to \mathbb{R}_+ \cup \{\infty\}$ of
tentative distances to the root from all vertices $v \in V$ by: (i) selecting
from the set $U \subset V$ of yet unvisited vertices the one ($v_{\mathrm{active}}$) with the
minimal value of $d$, (ii) for each $v \in U$ adjacent to $v_{\mathrm{active}}$, replacing $d(v)$
with the sum $d(v_{\mathrm{active}}) + w(e_{v_{\mathrm{active}}, v})$ provided that this sum is smaller
than the current value of $d(v)$, and (iii) removing $v_{\mathrm{active}}$ from $U$. At
the initialization, $U := V$, and the value of $d$ is set to 0 for the root
vertex, and $\infty$ for all others. The process stops when $U = \emptyset$.
An example in Figure 14, where weights are marked black, $v_{\mathrm{active}}$ red,
and visited vertices green, shows the evolution of function $d$ (blue). By
storing for each vertex $v$ with $d(v) < \infty$ the last edge $e_{v_{\mathrm{active}}, v}$ from (ii)
which decreased $d(v)$ (shown green), we find Bellman's tree at the end.
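Steps (i)-(iii) translate almost literally into code. The sketch below follows the description above, with the set `unvisited` playing the role of $U$; the graph representation (a dictionary of undirected weighted edges) and the small example are our own assumptions, and no attempt is made at efficiency (a priority queue would speed up step (i)).

```python
import math


def dijkstra(vertices, weights, root):
    """Tentative-distance algorithm; `weights` maps (u, v) to the edge length.

    Returns (d, last_edge), where last_edge[v] is the neighbor through which
    d(v) was last decreased, i.e. the parent of v in Bellman's tree.
    """
    neighbours = {v: {} for v in vertices}
    for (u, v), w in weights.items():      # undirected edges
        neighbours[u][v] = w
        neighbours[v][u] = w

    d = {v: math.inf for v in vertices}
    d[root] = 0
    last_edge = {}
    unvisited = set(vertices)
    while unvisited:
        active = min(unvisited, key=d.get)                 # step (i)
        for v, w in neighbours[active].items():            # step (ii)
            if v in unvisited and d[active] + w < d[v]:
                d[v] = d[active] + w
                last_edge[v] = active
        unvisited.remove(active)                           # step (iii)
    return d, last_edge


if __name__ == "__main__":
    graph = {("r", "a"): 1, ("a", "b"): 2, ("r", "b"): 4,
             ("b", "c"): 1, ("a", "c"): 5}
    print(dijkstra("rabc", graph, "r"))
```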
[Figure 14: successive snapshots of Dijkstra's algorithm on a weighted
graph, showing the tentative distances d after each step]
Theorem. Dijkstra’s algorithm works correctly.
Proof. We show by induction that $d(v_{\mathrm{active}})$ is the shortest distance
from $v_{\mathrm{active}}$ to the root. Base: When the root is active, this is true.
Step: On a path from $v_{\mathrm{active}} \in U$ to the root, let $e_{uv}$ be an edge
connecting $u \in U$ with $v \notin U$. By the induction hypothesis, the minimal
distance from (previously active) $v$ to the root equals $d(v)$. Therefore
the length of the path is $\ge d(v) + w(e_{uv}) \ge d(u) \ge d(v_{\mathrm{active}})$, since $v_{\mathrm{active}}$
provides the minimum of $d$ on $U$.
