Notes
James Aspnes
2016-08-03 16:41
Contents

Table of contents
List of tables
Preface

1 Introduction
1.1 So why do I need to learn all this nasty mathematics?
1.2 But isn't math hard?
1.3 Thinking about math with your heart
1.4 What you should know about math
1.4.1 Foundations and logic
1.4.2 Basic mathematics on the real numbers
1.4.3 Fundamental mathematical objects
1.4.4 Modular arithmetic and polynomials
1.4.5 Linear algebra
1.4.6 Graphs
1.4.7 Counting
1.4.8 Probability
1.4.9 Tools

2 Mathematical logic
2.1 The basic picture
2.1.1 Axioms, models, and inference rules
2.1.2 Consistency
2.1.3 What can go wrong
2.1.4 The language of logic
2.1.5 Standard axiom systems and models
2.2 Propositional logic
2.2.1 Operations on propositions
2.2.1.1 Precedence
2.2.2 Truth tables
2.2.3 Tautologies and logical equivalence
2.2.3.1 Inverses, converses, and contrapositives
2.2.3.2 Equivalences involving true and false
Example
2.2.4 Normal forms
2.3 Predicate logic
2.3.1 Variables and predicates
2.3.2 Quantifiers
2.3.2.1 Universal quantifier
2.3.2.2 Existential quantifier
2.3.2.3 Negation and quantifiers
2.3.2.4 Restricting the scope of a quantifier
2.3.2.5 Nested quantifiers
2.3.2.6 Examples
2.3.3 Functions
2.3.4 Equality
2.3.4.1 Uniqueness
2.3.5 Models
2.3.5.1 Examples
2.4 Proofs
2.4.1 Inference Rules
2.4.2 Proofs, implication, and natural deduction
2.4.2.1 The Deduction Theorem
2.5 Natural deduction
2.5.1 Inference rules for equality
2.5.2 Inference rules for quantified statements
2.6 Proof techniques

3 Set theory
3.1 Naive set theory
3.2 Operations on sets
3.3 Proving things about sets
6 Summation notation
6.1 Summations
6.1.1 Formal definition
6.1.2 Scope
6.1.3 Summation identities
6.1.4 Choosing and replacing index variables
6.1.5 Sums over given index sets
6.1.6 Sums without explicit bounds
6.1.7 Infinite sums
6.1.8 Double sums
6.2 Products
6.3 Other big operators
6.4 Closed forms
6.4.1 Some standard sums
6.4.2 Guess but verify
6.4.3 Ansatzes
6.4.4 Strategies for asymptotic estimates
6.4.4.1 Pull out constant factors
6.4.4.2 Bound using a known sum
Geometric series
Constant series
Arithmetic series
Harmonic series
6.4.4.3 Bound part of the sum
6.4.4.4 Integrate
6.4.4.5 Grouping terms
6.4.4.6 Oddities
6.4.4.7 Final notes
9 Relations
9.1 Representing relations
9.1.1 Directed graphs
9.1.2 Matrices
9.2 Operations on relations
9.2.1 Composition
9.2.2 Inverses
9.3 Classifying relations
9.4 Equivalence relations
9.4.1 Why we like equivalence relations
9.5 Partial orders
9.5.1 Drawing partial orders
9.5.2 Comparability
9.5.3 Lattices
9.5.4 Minimal and maximal elements
9.5.5 Total orders
9.5.5.1 Topological sort
9.5.6 Well orders
9.6 Closures
10 Graphs
10.1 Types of graphs
10.1.1 Directed graphs
10.1.2 Undirected graphs
10.1.3 Hypergraphs
10.2 Examples of graphs
10.3 Local structure of graphs
10.4 Some standard graphs
10.5 Subgraphs and minors
10.6 Graph products
10.6.1 Functions
10.7 Paths and connectivity
10.8 Cycles
10.9 Proving things about graphs
10.9.1 Paths and simple paths
10.9.2 The Handshaking Lemma
10.9.3 Characterizations of trees
10.9.4 Spanning trees
10.9.5 Eulerian cycles

11 Counting
11.1 Basic counting techniques
11.1.1 Equality: reducing to a previously-solved case
11.1.2 Inequalities: showing |A| ≤ |B| and |B| ≤ |A|
11.1.3 Addition: the sum rule
11.1.3.1 For infinite sets
11.1.3.2 The Pigeonhole Principle
11.1.4 Subtraction
11.1.4.1 Inclusion-exclusion for infinite sets
11.1.4.2 Combinatorial proof
11.1.5 Multiplication: the product rule
11.1.5.1 Examples
11.1.5.2 For infinite sets
11.1.6 Exponentiation: the exponent rule
11.1.6.1 Counting injections
11.1.7 Division: counting the same thing in two different ways
11.1.8 Applying the rules
11.1.9 An elaborate counting problem

Bibliography
Index

List of Figures

List of Tables

List of Algorithms
Preface
These were originally the notes for the Fall 2013 semester of the Yale course
CPSC 202a, Mathematical Tools for Computer Science. They have been
subsequently updated to incorporate numerous corrections suggested by Dana
Angluin and her students.
Internet resources
PlanetMath http://planetmath.org
Wikipedia http://en.wikipedia.org
Google http://www.google.com
Chapter 1
Introduction
1. If x is in S, then x + 1 is in S.

2. If x is royal, then x's child is royal.
But because the first is about boring numbers and the second is about
fascinating social relationships and rules, most people have a much easier
time deducing that to show somebody is royal we need to start with some
known royal and follow a chain of descendants than they do deducing that
to show that some number is in the set S we need to start with some known
element of S and show that repeatedly adding 1 gets us to the number we
want. And yet to a logician these are the same processes of reasoning.
So why is statement (1) trickier to think about than statement (2)? Part
of the difference is familiarity: we are all taught from an early age what it
means to be somebody's child, to take on a particular social role, etc. For
mathematical concepts, this familiarity comes with exposure and practice,
just as with learning any other language. But part of the difference is that
we humans are wired to understand and appreciate social and legal rules:
we are very good at figuring out the implications of a (hypothetical) rule
that says that any contract to sell a good to a consumer for $100 or more
can be canceled by the consumer within 72 hours of signing it, provided the
good has not yet been delivered, but we are not so good at figuring out the
implications of a rule that says that a number is composite if and only if it
is the product of two integer factors neither of which is 1.[1] It's a lot easier to
imagine having to cancel a contract to buy swampland in Florida that you
signed last night while drunk than having to prove that 82 is composite. But
again: there is nothing more natural about contracts than about numbers,
and if anything the conditions for our contract to be breakable are more
complicated than the conditions for a number to be composite.

[1] For a description of some classic experiments that demonstrate this, see
http://en.wikipedia.org/wiki/Wason_selection_task.
Propositional logic.
Predicate logic.
Proofs.
Set theory.
Functions.
Functions as sets.
Injections, surjections, and bijections.
Cardinality.
Finite vs infinite sets.
Sequences.
Relations.
Other algebras.
Arithmetic in Z_m.
RSA encryption.
Geometric interpretations.
1.4.6 Graphs
Why: Good for modeling interactions. Basic tool for algorithm design.
1.4.7 Counting
Why: Basic tool for knowing how many resources your program is going to
consume.
Basic combinatorial counting: sums, products, exponents, differences,
and quotients.
Combinatorial functions.
Factorials.
Binomial coefficients.
The 12-fold way. (*)
Advanced counting techniques.
Inclusion-exclusion.
Recurrences. (*)
Generating functions. (Limited coverage.)
1.4.8 Probability
Why: Can't understand randomized algorithms or average-case analysis
without it. Handy if you go to Vegas.
Discrete probability spaces.
Events.
Independence.
Random variables.
Expectation and variance.
Probabilistic inequalities.
Markov's inequality.
Chebyshev's inequality. (*)
Chernoff bounds. (*)
Stochastic processes. (*)
Markov chains. (*)
Martingales. (*)
Branching processes. (*)
1.4.9 Tools
Why: Basic computational stuff that comes up, but doesn't fit in any of the
broad categories above. These topics will probably end up being mixed in
with the topics above.
Things you may have forgotten about exponents and logarithms. (*)
∑ and ∏ notation.
Asymptotic notation.
Chapter 2
Mathematical logic
2.1.2 Consistency
A theory is consistent if it can't prove both P and not-P for any P. Consistency is incredibly important, since all the logics people actually use can
prove anything starting from P and not-P.
The natural numbers N. These are defined using the Peano axioms,
and if all you want to do is count, add, and multiply, you don't need
much else. (If you want to subtract, things get messy.)

The integers Z. Like the naturals, only now we can subtract. Division
is still a problem.

The rational numbers Q. Now we can divide. But what about √2?

The real numbers R. Now we have √2. But what about √(−1)?

The complex numbers C. Now we are pretty much done. But what if
we want to talk about more than one complex number at a time?
The universe of sets. These are defined using the axioms of set theory,
and produce a rich collection of sets that include, among other things,
structures equivalent to the natural numbers, the real numbers, collections
of same, sets so big that we can't even begin to imagine what they look
like, and even bigger sets so big that we can't use the usual accepted
system of axioms to prove whether they exist or not. Fortunately, in
computer science we can mostly stop with finite sets, which makes life
less confusing.
In practice, the usual way to do things is to start with sets and then define
everything else in terms of sets: e.g., 0 is the empty set, 1 is a particular
set with 1 element, 2 a set with 2 elements, etc., and from here we work our
way up to the fancier numbers. The idea is that if we trust our axioms for
sets to be consistent, then the things we construct on top of them should
also be consistent, although if we are not careful in our definitions they may
not be exactly the things we think they are.
2 + 2 = 4. (Always true).
2 + 2 = 5. (Always false).
Examples of non-propositions:
Exclusive or If you want to exclude the possibility that both p and q are
true, you can use exclusive or instead. This is written as p ⊕ q, and
is true precisely when exactly one of p or q is true. Exclusive or is
not used in classical logic much, but is important for many computing
applications, since it corresponds to addition modulo 2 (see §8.4).

[1] The symbol ∨ is a stylized V, intended to represent the Latin word vel, meaning
"or." (Thanks to Noel McDermott for remembering this.) Much of this notation is
actually pretty recent (early 20th century): see http://jeff560.tripod.com/set.html
for a summary of earliest uses of each symbol.
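The correspondence between XOR and addition mod 2 is easy to check exhaustively; here is a minimal Python sketch of ours (not part of the original notes):

    # Exclusive or on {0, 1} agrees with addition modulo 2 on every input.
    for p in (0, 1):
        for q in (0, 1):
            assert (p ^ q) == (p + q) % 2   # ^ is Python's bitwise XOR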
NOT p                 ¬p      p̄, ∼p
p AND q               p ∧ q
p XOR q               p ⊕ q
p OR q                p ∨ q
p implies q           p → q   p ⇒ q, p ⊃ q
p if and only if q    p ↔ q   p ⇔ q

Table 2.1: Logical operations.
Table 2.1 shows what all of this looks like when typeset nicely. Note that
in some cases there is more than one way to write a compound expression.
Which you choose is a matter of personal preference, but you should try to
be consistent.
2.2.1.1 Precedence
The short version: for the purposes of this course, we will use the ordering in
Table 2.1, which corresponds roughly to precedence in C-like programming
languages. But see caveats below. Remember always that there is no shame
in putting in a few extra parentheses if it makes a formula more clear.
Examples: (¬p ∨ q ∧ r → s ↔ t) is interpreted as ((((¬p) ∨ (q ∧ r)) →
s) ↔ t). Both OR and AND are associative, so (p ∨ q ∨ r) is the same as
((p ∨ q) ∨ r) and as (p ∨ (q ∨ r)), and similarly (p ∧ q ∧ r) is the same as
((p ∧ q) ∧ r) and as (p ∧ (q ∧ r)).
Note that this convention is not universal: many mathematicians give
AND and OR equal precedence, so that the meaning of p ∧ q ∨ r is ambiguous without parentheses. There are good arguments for either convention.
Making AND have higher precedence than OR is analogous to giving multiplication higher precedence than addition, and makes sense visually when
AND is written multiplicatively (as in pq ∨ qr for (p ∧ q) ∨ (q ∧ r)). Making them have the same precedence emphasizes the symmetry between the
two operations, which we'll see more about later when we talk about De
Morgan's laws in §2.2.3. But as with anything else in mathematics, either
convention can be adopted, as long as you are clear about what you are
doing and it doesn't cause annoyance to the particular community you are
writing for.
There does not seem to be a standard convention for the precedence of
XOR, since logicians don't use it much. There are plausible arguments for
putting XOR in between AND and OR, but it's probably safest just to use
parentheses.
Implication is not associative, although the convention is that it binds
to the right, so that a → b → c is read as a → (b → c); except for
type theorists and Haskell programmers, few people ever remember this,
so it is usually safest to put in the parentheses. I personally have no idea
what p ↔ q ↔ r means, so any expression like this should be written with
parentheses as either (p ↔ q) ↔ r or p ↔ (q ↔ r).
p   ¬p
0   1
1   0

And here is a truth table for the rest of the logical operators:

p  q   p∨q  p⊕q  p∧q  p→q  p↔q
0  0    0    0    0    1    1
0  1    1    1    0    1    0
1  0    1    1    0    0    0
1  1    1    0    1    1    1

We can check that each truth table we construct works by checking
that the truth values in each column (corresponding to some subexpression
of the thing we are trying to prove) follow from the truth values in previous
columns according to the rules established by the truth table defining the
appropriate logical operation.
For predicate logic, model checking becomes more complicated, because
a typical system of axioms is likely to have infinitely many models, many of
which are likely to be infinitely large. There we will need to rely much more
on proofs constructed by applying inference rules.
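This model-checking procedure for propositional logic is easy to automate. The following Python sketch (an illustration of ours, not from the original notes) tests a claimed equivalence by brute force over all 2^n truth assignments:

    from itertools import product

    def equivalent(f, g, n):
        # f and g take n Boolean arguments; compare every row of the truth table
        return all(f(*row) == g(*row) for row in product([False, True], repeat=n))

    # De Morgan's law: not (p and q) is equivalent to (not p) or (not q)
    assert equivalent(lambda p, q: not (p and q),
                      lambda p, q: (not p) or (not q), 2)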
p   ¬p  p∧¬p  0
0   1    0    0
1   0    0    0

and observe that the last two columns are always equal.

p   p∨p
0    0
1    1

p  q   p→q  ¬p∨q
0  0    1    1
0  1    1    1
1  0    0    0
1  1    1    1

p  q   p∨q  ¬(p∨q)  ¬p  ¬q  ¬p∧¬q
0  0    0     1      1   1    1
0  1    1     0      1   0    0
1  0    1     0      0   1    0
1  1    1     0      0   0    0

p  q  r   q∧r  p∨(q∧r)  p∨q  p∨r  (p∨q)∧(p∨r)
0  0  0    0      0      0    0        0
0  0  1    0      0      0    1        0
0  1  0    0      0      1    0        0
0  1  1    1      1      1    1        1
1  0  0    0      1      1    1        1
1  0  1    0      1      1    1        1
1  1  0    0      1      1    1        1
1  1  1    1      1      1    1        1
associativity of ∨):

(p → r) ∨ (q → r) ≡ (¬p ∨ r) ∨ (¬q ∨ r)   [Using p → q ≡ ¬p ∨ q twice]
                  ≡ ¬p ∨ ¬q ∨ r ∨ r       [Associativity and commutativity of ∨]
                  ≡ ¬p ∨ ¬q ∨ r           [p ∨ p ≡ p]
                  ≡ ¬(p ∧ q) ∨ r          [De Morgan's law]
                  ≡ (p ∧ q) → r.          [p → q ≡ ¬p ∨ q]

¬¬p ≡ p                            Double negation
¬(p ∧ q) ≡ ¬p ∨ ¬q                 De Morgan's law
¬(p ∨ q) ≡ ¬p ∧ ¬q                 De Morgan's law
p ∧ q ≡ q ∧ p                      Commutativity of AND
p ∨ q ≡ q ∨ p                      Commutativity of OR
p ∧ (q ∧ r) ≡ (p ∧ q) ∧ r          Associativity of AND
p ∨ (q ∨ r) ≡ (p ∨ q) ∨ r          Associativity of OR
p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)    AND distributes over OR
p ∨ (q ∧ r) ≡ (p ∨ q) ∧ (p ∨ r)    OR distributes over AND
p → q ≡ ¬p ∨ q                     Equivalence of implication and OR
p → q ≡ ¬q → ¬p                    Contraposition
p ↔ q ≡ (p → q) ∧ (q → p)          Expansion of if and only if
¬(p ↔ q) ≡ ¬p ↔ q                  Inverse of if and only if
p ↔ q ≡ q ↔ p                      Commutativity of if and only if

Table 2.2: Common logical equivalences (see also [Fer08, Theorem 1.1])
P ∨ ¬P ≡ 1

P ∧ ¬P ≡ 0.
The law of the excluded middle is what allows us to do case analysis, where
we prove that some proposition Q holds by showing first that P implies Q
and then that ¬P also implies Q.[3]
One strategy for simplifying logical expressions is to try to apply known
equivalences to generate sub-expressions that reduce to true or false via
the law of the excluded middle or the law of non-contradiction. These can
then be absorbed into nearby terms using various absorption laws, shown
in Table 2.3.

P ∧ 0 ≡ 0       P ∨ 0 ≡ P
P ∧ 1 ≡ P       P ∨ 1 ≡ 1
P → 0 ≡ ¬P      P ⊕ 0 ≡ P
P ↔ 1 ≡ P       P ⊕ 1 ≡ ¬P
P ↔ 0 ≡ ¬P      0 → P ≡ 1
P → 1 ≡ 1       1 → P ≡ P

Table 2.3: Absorption laws. The first four are the most important. Note
that ∧, ∨, ⊕, and ↔ are all commutative, so reversed variants also work.

[3] Though we will use the law of the excluded middle, it has always been a little bit
controversial, because it is non-constructive: it tells you that one of P or ¬P is true,
but it doesn't tell you which.
For this reason, some logicians adopt a variant of classical logic called intuitionistic
logic where the law of the excluded middle does not hold. Though this was originally
done for aesthetic reasons, it turns out that there is a deep connection between computer
programs and proofs in intuitionistic logic, known as the Curry-Howard isomorphism.
The idea is that you get intuitionistic logic if you interpret:

P as an object of type P;
P → Q as a function that takes a P as an argument and returns a Q;
P ∧ Q as an object that contains both a P and a Q (like a struct in C);
P ∨ Q as an object that contains either a P or a Q (like a union in C); and
¬P as P → ⊥, a function that given a P produces a special error value ⊥ that
can't otherwise be generated.

With this interpretation, many theorems of classical logic continue to hold. For example,
modus ponens says

(P ∧ (P → Q)) → Q.

Seen through the Curry-Howard isomorphism, this means that there is a function that,
given a P and a function that generates a Q from a P, generates a Q. For example, the
following Scheme function:

(define (modus-ponens p p-implies-q) (p-implies-q p))
In this derivation, we've labeled each step with the equivalence we used.
Most of the time we would not be this verbose.

(P ∨ Q) ∧ (P ∨ ¬R) ∧ (¬P ∨ Q) ∧ (¬Q ∨ R)
⊢ (P ∨ Q) ∧ (P ∨ ¬R) ∧ (¬P ∨ Q) ∧ (¬Q ∨ R) ∧ Q
⊢ (P ∨ Q) ∧ (P ∨ ¬R) ∧ (¬P ∨ Q) ∧ (¬Q ∨ R) ∧ Q ∧ R
⊢ (P ∨ Q) ∧ (P ∨ ¬R) ∧ (¬P ∨ Q) ∧ (¬Q ∨ R) ∧ Q ∧ R ∧ P
⊢ P.
Socrates is a man.
Spocrates is a man.
choice of axioms, we may not know this. What we would like is a general
way to say that humanity implies mortality for everybody, but with just
propositional logic, we can't write this fact down.

x is human.

x is the parent of y.

x + 2 = x^2.

These are not propositions because they have variables in them. Instead,
they are predicates: statements whose truth-value depends on what concrete object takes the place of the variable. Predicates are often abbreviated
by single capital letters followed by a list of arguments, the variables that
appear in the predicate, e.g.:

H(x) = "x is human."

Q(x) = "x + 2 = x^2."

We can also fill in specific values for the variables, e.g. H(Spocrates) =
"Spocrates is human." If we fill in specific values for all the variables, we
have a proposition again, and can talk about that proposition being true
(e.g. Q(2) and Q(−1) are true) or false (Q(0) is false).
In first-order logic, which is what we will be using in this course, vari-
ables always refer to things and never to predicates: any predicate symbol
is effectively a constant. There are higher-order logics that allow variables
to refer to predicates, but most mathematics accomplishes the same thing
by representing predicates with sets (see Chapter 3).
2.3.2 Quantifiers
What we really want is to be able to say when H or P or Q is true for many
different values of their arguments. This means we have to be able to talk
CHAPTER 2. MATHEMATICAL LOGIC 26
¬∀x : P(x) ≡ ∃x : ¬P(x).

¬∃x : P(x) ≡ ∀x : ¬P(x).

These are essentially the quantifier version of De Morgan's laws: the first
says that if you want to show that not all humans are mortal, it's equivalent
to finding some human that is not mortal. The second says that to show
that no human is mortal, you have to show that all humans are not mortal.
∀x : x > 0 → x − 1 ≥ 0

talking about real numbers (two of which happen to be square roots of 79),
we can exclude the numbers we don't want by writing

¬∃x ∈ Z : x^2 = 79

which is interpreted as

¬∃x : (x ∈ Z ∧ x^2 = 79)

or, equivalently

∀x : x ∈ Z → x^2 ≠ 79.

Here Z = {. . . , −2, −1, 0, 1, 2, . . .} is the standard set of integers.
For more uses of ∈, see Chapter 3.
i.e., there does not exist an x that is prime and any y greater than x is not
prime. Or in a shorter (though not strictly equivalent) form:
∀x ∃y : y > x ∧ Prime(y)

∀x ∃y : likes(x, y)

and

∃y ∀x : likes(x, y)
mean very different things. The first says that for every person, there is
somebody that that person likes: we live in a world with no complete
misanthropes. The second says that there is some single person who is so
immensely popular that everybody in the world likes them. The nesting of
the quantifiers is what makes the difference: in ∀x ∃y : likes(x, y), we are
saying that no matter who we pick for x, ∃y : likes(x, y) is a true statement;
while in ∃y ∀x : likes(x, y), we are saying that there is some y that makes
∀x : likes(x, y) true.
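Over a finite model we can check such nested quantifiers directly with all and any. A small Python illustration (the three-person likes relation is a made-up example, not from the notes):

    people = {"alice", "bob", "carol"}
    likes = {("alice", "bob"), ("bob", "carol"), ("carol", "bob")}

    # ∀x ∃y : likes(x, y) -- everybody likes somebody (true here)
    assert all(any((x, y) in likes for y in people) for x in people)

    # ∃y ∀x : likes(x, y) -- somebody is liked by everybody (false here)
    assert not any(all((x, y) in likes for x in people) for y in people)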
Naturally, such games can go on for more than two steps, or allow the
same player more than one move in a row. For example
∀x ∃y ∃z : x^2 + y^2 = z^2
Now that we know how to read nested quantifiers, it's easy to see what
the right-hand side means:

2. We pick N.

lim_{x→∞} 1/x = 0

2. Let N > 1/ε. (Note that we can make our choice depend on previous
choices.)

4. Then x > N > 1/ε > 0, so 1/x < 1/N < ε, giving |1/x − 0| < ε. QED!
2.3.2.6 Examples
Here we give some more examples of translating English into statements in
predicate logic.
∀x : Crow(x) → Black(x)

or

∀x : ¬Black(x) → ¬Crow(x).
2.3.3 Functions
A function symbol looks like a predicate but instead of computing a truth
value it returns an object. So for example the successor function S in the
Peano axioms for the natural numbers returns x + 1 when applied as S(x).
Sometimes when there is only a single argument we omit the parentheses,
e.g., Sx = S(x), SSSx = S(S(S(x))).
2.3.4 Equality
Often we include a special equality predicate =, written x = y. The
interpretation of x = y is that x and y are the same element of the domain.
It

2.3.4.1 Uniqueness

An occasionally useful abbreviation is ∃!x P(x), which stands for "there
exists a unique x such that P(x)." This is short for

∃x : P(x) ∧ (∀y : P(y) → y = x).
2.3.5 Models
In propositional logic, we can build truth tables that describe all possible
settings of the truth-values of the literals. In predicate logic, the analogous
concept to an assignment of truth-values is a structure. A structure
consists of a set of objects or elements (built using set theory, as described
in Chapter 3), together with a description of which elements fill in for the
constant symbols, which predicates hold for which elements, and what the
value of each function symbol is when applied to each possible list of
arguments (note that this depends on knowing what constant, predicate,
and function
2.3.5.1 Examples

Consider the axiom ¬∃x. This axiom has exactly one model (it's
empty).

Now consider the axiom ∃!x. This axiom also has exactly one model
(with one element).

We can enforce exactly k elements with one rather long axiom, e.g. for
k = 3 do ∃x1 ∃x2 ∃x3 ∀y : y = x1 ∨ y = x2 ∨ y = x3. In the absence of
any special symbols, a structure of 3 undifferentiated elements is the
unique model of this axiom.

Suppose we add a predicate P and consider the axiom ∃x P x. Now
we have many models: take any nonempty model you like, and let P
be true of at least one of its elements. If we take a model with two
elements a and b, with P a and ¬P b, we get that ∃x P x is not enough
to prove ∀x P x, since the latter statement isn't true in this model.

Now let's bring in a function symbol S and constant symbol 0. Consider a stripped-down version of the Peano axioms that consists of just
the axiom ∀x ∀y : Sx = Sy → x = y. Both the natural numbers N and
the integers Z are a model for this axiom, as is the set Z_m of integers
mod m for any m (see §8.4). In each case each element has a unique
predecessor, which is what the axiom demands. If we throw in the
first Peano axiom ∀x : Sx ≠ 0, we eliminate Z and Z_m because in
each of these models 0 is a successor of some element. But we don't
eliminate a model that consists of two copies of N sitting next to each
other (only one of which contains the real 0), or even a model that
consists of one copy of N (to make 0 happy) plus any number of copies
of Z and Z_m.
2.4 Proofs
A proof is a way to derive statements from other statements. It starts
with axioms (statements that are assumed in the current context always
to be true), theorems or lemmas (statements that were proved already;
the difference between a theorem and a lemma is whether it is intended as
a final result or an intermediate tool), and premises P (assumptions we
are making for the purpose of seeing what consequences they have), and
uses inference rules to derive Q. The axioms, theorems, and premises
are in a sense the starting position of a game whose rules are given by the
inference rules. The goal of the game is to apply the inference rules until Q
pops out. We refer to anything that isn't proved in the proof itself (i.e., an
axiom, theorem, lemma, or premise) as a hypothesis; the result Q is the
conclusion.
When a proof exists of Q from some premises P1 , P2 , . . . , we say that Q
is deducible or provable from P1 , P2 , . . . , which is written as
P1, P2, . . . ⊢ Q.
If we can prove Q directly from our inference rules without making any
assumptions, we may write

⊢ Q
The turnstile symbol ⊢ has the specific meaning that we can derive the
conclusion Q by applying inference rules to the premises. This is not quite
the same thing as saying P → Q. If our inference rules are particularly
weak, it may be that P → Q is true but we can't prove Q starting with
P. Conversely, if our inference rules are too strong (maybe they can prove
anything, even things that aren't true) we might have P ⊢ Q but P → Q is
false.
For propositions, most of the time we will use inference rules that are
just right, meaning that P ⊢ Q implies P → Q (soundness) and P → Q
implies P ⊢ Q (completeness). Here the distinction between ⊢ and →
is then whether we want to talk about the existence of a proof (the first
case) or about the logical relation between two statements (the second).
Things get a little more complicated with statements involving predicates;
in this case there are incompleteness theorems that say that sufficiently
powerful sets of axioms have consequences that can't be proven unless the
theory is inconsistent.
p ⊢ p ∨ q.                   Addition
p ∧ q ⊢ p.                   Simplification
p, q ⊢ p ∧ q.                Conjunction
p, p → q ⊢ q.                Modus ponens
¬q, p → q ⊢ ¬p.              Modus tollens
p → q, q → r ⊢ p → r.        Hypothetical syllogism
p ∨ q, ¬p ⊢ q.               Disjunctive syllogism
p ∨ q, ¬p ∨ r ⊢ q ∨ r.       Resolution
1. If you give a mouse a cookie, he's going to ask for a glass of milk.
[Axiom]

Will the mouse want a straw? No: Mice can't ask for glasses of milk, so
Axiom 1 is false.

⊢ P → Q.

P ⊢ P → Q

and thus

P ⊢ P, P → Q,

which gives

P ⊢ Q
Γ, P1, P2, . . . , Pn ⊢ Q

to

Γ ⊢ (P1 ∧ P2 ∧ · · · ∧ Pn) → Q.
The statement that we can do this, for a given collection of inference
rules, is the Deduction Theorem:
The actual proof of the theorem depends on the particular set of inference
rules we start with, but the basic idea is that there exists a mechanical
procedure for extracting a proof of the implication from the proof of Q
assuming P1 etc.
x = y, P(x) ⊢ P(y).

⊢ x = x
This says that if we can prove that some property holds for a generic
y, without using any particular properties of y, then in fact the property
holds for all possible x.
In a written proof, this will usually be signaled by starting with something like "Let y be an arbitrary [member of some universe]." For
example: Suppose we want to show that there is no biggest natural
number, i.e. that ∀n ∈ N : ∃n′ ∈ N : n′ > n. Proof: Let n be any
element of N. Let n′ = n + 1. Then n′ > n. (Note: there is also an
instance of existential generalization here.)
Γ ⊢ P
------- (¬I)
Γ ⊢ ¬¬P

Γ ⊢ ¬¬P
------- (¬E)
Γ ⊢ P

Γ ⊢ P    Γ ⊢ Q
-------------- (∧I)
Γ ⊢ P ∧ Q

Γ ⊢ P ∧ Q
--------- (∧E1)
Γ ⊢ P

Γ ⊢ P ∧ Q
--------- (∧E2)
Γ ⊢ Q

Γ ⊢ P
--------- (∨I1)
Γ ⊢ P ∨ Q

Γ ⊢ Q
--------- (∨I2)
Γ ⊢ P ∨ Q

Γ ⊢ P ∨ Q    Γ ⊢ ¬Q
------------------- (∨E1)
Γ ⊢ P

Γ ⊢ P ∨ Q    Γ ⊢ ¬P
------------------- (∨E2)
Γ ⊢ Q

Γ, P ⊢ Q
--------- (→I)
Γ ⊢ P → Q

Γ ⊢ P → Q    Γ ⊢ P
------------------ (→E1)
Γ ⊢ Q

Γ ⊢ P → Q    Γ ⊢ ¬Q
------------------- (→E2)
Γ ⊢ ¬P
The idea is that to show that Q(x) holds for at least one x, we can
point to c as a specific example of an object for which Q holds. The
corresponding style of proof is called a proof by construction or
proof by example.
For example: We are asked to prove that there exists an even prime
number. Look at 2: it's an even prime number. QED.
Not all proofs of existential statements are constructive, in the sense
of identifying a single object that makes the existential statement true.
An example is a well-known non-constructive proof that there are
irrational numbers a and b for which a^b is rational. The non-constructive
proof is to consider √2^√2. If this number is rational, it's an example
of the claim; if not, (√2^√2)^√2 = √2^2 = 2 works.[7] Non-constructive
proofs are generally not as useful as constructive proofs, because the
example used in a constructive proof may have additional useful
properties in other contexts.
Existential instantiation ∃x : Q(x) ⊢ Q(c) for some c, where c is a new
name that hasn't previously been used (this is similar to the requirement for universal generalization, except now the new name is on the
right-hand side).

The idea here is that we are going to give a name to some c that satisfies
Q(c), and we know that we can get away with this because ∃x : Q(x) says
that some such thing exists.[8]
[7] For this particular claim, there is also a constructive proof: √2^(log₂ 9) = 3 [Sch01].

[8] This is actually a fairly painful idea to formalize. One version in pure first-order logic
is the axiom

((∀x : (Q(x) → P)) ∧ (∃y : Q(y))) → P.

Nobody but a logician would worry about this.
Chapter 3

Set theory
Set theory is the dominant foundation for mathematics. The idea is that
everything else in mathematics (numbers, functions, etc.) can be written
in terms of sets, so that if you have a consistent description of how sets
behave, then you have a consistent description of how everything built on
top of them behaves. If predicate logic is the machine code of mathematics,
set theory would be assembly language.
Sometimes the original set that an element has to be drawn from is put
on the left-hand side of the vertical bar:
Using set comprehension, we can see that every set in naive set theory
is equivalent to some predicate. Given a set S, the corresponding predicate
is x ∈ S, and given a predicate P, the corresponding set is {x | P x}. But
watch out for Russell's paradox: what is {S | S ∉ S}?

(Of these, union and intersection are the most important in practice.)
Corresponding to implication is the notion of a subset:

A ⊆ B (A is a subset of B) if and only if ∀x : x ∈ A → x ∈ B.

Sometimes one says A is contained in B if A ⊆ B. This is one of two
senses in which A can be in B: it is also possible that A is in fact an
element of B (A ∈ B). For example, the set A = {12} is an element of the
set B = {Moe, Larry, Curly, {12}}, but A is not a subset of B, because A's
element 12 is not an element of B. Usually we will try to reserve "is in" for
∈ and "is contained in" for ⊆, but it's safest to use the symbols (or "is an
element/subset of") to avoid any possibility of ambiguity.
Finally we have the set-theoretic equivalent of negation:
Ā = {x | x ∉ A}. The set Ā is known as the complement of A.
If we allow complements, we are necessarily working inside some fixed
universe, since the complement U = ∅̄ of the empty set contains all
possible objects. This raises the issue of where the universe comes from. One
approach is to assume that we've already fixed some universe that we
understand (e.g. N), but then we run into trouble if we want to work with
different classes of objects at the same time. The set theory used in most of
mathematics is defined by a collection of axioms that allow us to construct,
essentially from scratch, a universe big enough to hold all of mathematics
without apparent contradictions while avoiding the paradoxes that may arise
in naive set theory. However, one consequence of this construction is that
the universe is (a) much bigger than anything we might ever use, and (b) not
a set, making complements not very useful. The usual solution to this is to
replace complements with explicit set differences: U \ A for some specific
universe U instead of Ā.
Lemma 3.3.1. The following statements hold for all sets S and T, and all
predicates P:

S ∩ T ⊆ S        (3.3.1)
S ∪ T ⊇ S        (3.3.2)
{x ∈ S | P(x)} ⊆ S        (3.3.3)
S = (S ∩ T) ∪ (S \ T)        (3.3.4)
1. If x ∈ T, then x ∈ (S ∩ T).

2. If x ∉ T, then x ∈ (S \ T).

1. If x ∈ (S \ T), then x ∈ S and x ∉ T.

2. If x ∈ (S ∩ T), then x ∈ S and x ∈ T.

In either case, x ∈ S.
Since we've shown that both the left-hand and right-hand sides of
(3.3.4) are subsets of each other, they must be equal.
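These identities are also easy to sanity-check on concrete sets; a quick Python sketch of ours (illustrative only, using Python's built-in set operators):

    S, T = {1, 2, 3, 4}, {3, 4, 5}

    assert S & T <= S                          # (3.3.1): S ∩ T ⊆ S
    assert S | T >= S                          # (3.3.2): S ∪ T ⊇ S
    assert {x for x in S if x % 2 == 0} <= S   # (3.3.3)
    assert S == (S & T) | (S - T)              # (3.3.4)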
Extensionality Any two sets with the same elements are equal.[3]

Union For any set of sets S = {x, y, z, . . .}, the set ⋃S = x ∪ y ∪ z ∪ · · ·
exists.[6]

Power set For any set S, the power set P(S) = {A | A ⊆ S} exists.[7]

Specification For any set S and any predicate P, the set {x ∈ S | P(x)}
exists.[8] This is called restricted comprehension, and is an
axiom schema instead of an axiom, since it generates an infinite list
of axioms, one for each possible P. Limiting ourselves to constructing subsets of existing sets avoids Russell's Paradox, because we can't
construct S = {x | x ∉ x}. Instead, we can try to construct S =
{x ∈ T | x ∉ x}, but we'll find that S isn't an element of T, so it
doesn't contain itself but also doesn't create a contradiction.
[2] Technically this only gives us Z, a weaker set theory than ZFC that omits Replacement
(Fraenkel's contribution) and Choice.
[3] ∀x : ∀y : (x = y) ↔ (∀z : z ∈ x ↔ z ∈ y).
[4] ∃x : ∀y : y ∉ x.
[5] ∀x : ∀y : ∃z : ∀q : q ∈ z ↔ (q = x ∨ q = y).
[6] ∀x : ∃y : ∀z : z ∈ y ↔ (∃q : z ∈ q ∧ q ∈ x).
[7] ∀x : ∃y : ∀z : z ∈ y ↔ z ⊆ x.
[8] ∀x : ∃y : ∀z : z ∈ y ↔ (z ∈ x ∧ P(z)).
Infinity There is a set that has ∅ as a member and also has x ∪ {x} whenever
it has x. This gives an encoding of N.[9] Here ∅ represents 0 and x ∪ {x}
represents x + 1. This effectively defines each number as the set of all
smaller numbers, e.g. 3 = {0, 1, 2} = {∅, {∅}, {∅, {∅}}}. Without this
axiom, we only get finite sets. (Technical note: the set whose existence
is given by the Axiom of Infinity may also contain some extra elements,
but we can strip them out, with some effort, using Specification.)
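This encoding can be played with directly; here is a minimal Python sketch of ours (the name succ and the use of frozensets are our choices, not the notes'):

    def succ(x):
        # x + 1 is encoded as x ∪ {x}
        return x | frozenset([x])

    zero = frozenset()                               # 0 = ∅
    one, two = succ(zero), succ(succ(zero))
    three = succ(two)

    assert len(three) == 3     # each number is the set of all smaller numbers
    assert zero in three and two in three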
There are three other axioms that don't come up much in computer
science:
Choice For any set of nonempty sets S there is a function f that assigns
to each x in S some f(x) ∈ x. This axiom is unpopular in some
circles because it is non-constructive: it tells you that f exists, but
it doesn't give an actual definition of f. But it's too useful to throw
out.
rule (a, b) = {{a}, {a, b}}, which was first proposed by Kuratowski [Kur21,
Definition V].[12]
Given sets A and B, their Cartesian product A × B is the set
{(x, y) | x ∈ A ∧ y ∈ B}, or in other words the set of all ordered pairs
that can be constructed by taking the first element from A and the second
from B. If A has n elements and B has m, then A × B has nm elements.[13]
For example, {1, 2} × {3, 4} = {(1, 3), (1, 4), (2, 3), (2, 4)}.
Because of the ordering, Cartesian product is not commutative in
general. We usually have A × B ≠ B × A (Exercise: when are they equal?).
The existence of the Cartesian product of any two sets can be proved
using the axioms we already have: if (x, y) is defined as {{x}, {x, y}}, then
P(A ∪ B) contains all the necessary sets {x} and {x, y}, and P(P(A ∪ B))
contains all the pairs {{x}, {x, y}}. It also contains a lot of other sets we
don't want, but we can get rid of them using Specification.
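A quick way to convince yourself that the Kuratowski encoding really behaves like an ordered pair is to test its defining property, (a, b) = (c, d) iff a = c and b = d. A Python sketch, ours rather than the notes' (kpair is a hypothetical helper name):

    def kpair(a, b):
        # (a, b) = {{a}, {a, b}}
        return frozenset([frozenset([a]), frozenset([a, b])])

    assert kpair(1, 2) == kpair(1, 2)
    assert kpair(1, 2) != kpair(2, 1)                    # order matters
    assert kpair(1, 1) == frozenset([frozenset([1])])    # {1, 1} collapses to {1}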
A special class of relations are functions. A function from a domain
A to a codomain[14] B is a relation on A and B (i.e., a subset of A × B)
such that every element of A appears on the left-hand side of exactly one
ordered pair. We write f : A → B as a short way of saying that f is a
function from A to B, and for each x ∈ A write f(x) for the unique y ∈ B
with (x, y) ∈ f.[15]
The set of all functions from A to B is written as B^A: note that the order
of A and B is backwards here from A → B. Since this is just the subset
of P(A × B) consisting of functions as opposed to more general relations, it
exists by the Power Set and Specification axioms.
When the domain of a function is finite, we can always write down
a list of all its values. For infinite domains (e.g. N), almost all functions
are impossible to write down, either as an explicit table (which would need
to be infinitely long) or as a formula (there aren't enough formulas). Most
[12] This was not the only possible choice. Kuratowski cites a previous encoding suggested
by Hausdorff [Hau14] of (a, b) as {{a, 1}, {b, 2}}, where 1 and 2 are tags not equal to a or
b. He argues that this definition "seems less convenient to me" than {{a}, {a, b}}, because
it requires tinkering with the definition if a or b do turn out to be equal to 1 or 2. This
is a nice example of how even though mathematical definitions arise through convention,
some definitions are easier to use than others.
[13] In fact, this is the most direct way to define multiplication on N, and pretty much the
only sensible way to define multiplication for infinite cardinalities; see §11.1.5.
[14] The codomain is sometimes called the range, but most mathematicians will use range
for {f(x) | x ∈ A}, which may or may not be equal to the codomain B, depending on
whether f is or is not surjective.
[15] Technically, knowing f alone does not tell you what the codomain is, since some
elements of B may not show up at all. This can be fixed by representing a function as a
pair (f, B), but it's generally not something most people worry about.
3.5.2 Sequences

Functions let us define sequences of arbitrary length: for example, the
infinite sequence x_0, x_1, x_2, . . . of elements of some set A is represented by a
function x : N → A, while a shorter sequence (a_0, a_1, a_2) would be
represented by a function a : {0, 1, 2} → A. In both cases the subscript takes
the place of a function argument: we treat x_n as syntactic sugar for x(n).
Finite sequences are often called tuples, and we think of the result of taking
the Cartesian product of a finite number of sets A × B × C as a set of
tuples (a, b, c), even though the actual structure may be ((a, b), c) or (a, (b, c))
depending on which product operation we do first.
We can think of the Cartesian product of k sets (where k need not be 2)
as a set of sequences indexed by the set {1 . . . k} (or sometimes {0 . . . k − 1}).
or even

∏_{x∈R} A_x.
3.5.5.1 Surjections

A function f : A → B that covers every element of B is called onto,
surjective, or a surjection. This means that for any y in B, there exists
some x in A such that y = f(x). An equivalent way to show that a function is
surjective is to show that its range {f(x) | x ∈ A} is equal to its codomain.
For example, the function f(x) = x^2 from N to N is not surjective,
because its range includes only perfect squares. The function f(x) = x + 1
from N to N is not surjective because its range doesn't include 0. However,
the function f(x) = x + 1 from Z to Z is surjective, because for every y in
Z there is some x in Z such that y = x + 1.
3.5.5.2 Injections

If f : A → B maps distinct elements of A to distinct elements of B (i.e.,
if x ≠ y implies f(x) ≠ f(y)), it is called one-to-one, injective, or an
injection. By contraposition, an equivalent definition is that f(x) = f(y)
implies x = y for all x and y in the domain. For example, the function
f(x) = x^2 from N to N is injective. The function f(x) = x^2 from Z to Z is
not injective (for example, f(−1) = f(1) = 1). The function f(x) = x + 1
from N to N is injective.
3.5.5.3 Bijections

A function that is both surjective and injective is called a one-to-one
correspondence, bijective, or a bijection.[16] Any bijection f has an inverse
f^(−1); this is the function {(y, x) | (x, y) ∈ f}.
Of the functions we have been using as examples, only f(x) = x + 1 from
Z to Z is bijective.
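On small finite domains all three properties can be checked mechanically; a Python sketch of ours (the helper names is_injective and is_surjective are hypothetical):

    def is_injective(f, domain):
        images = [f(x) for x in domain]
        return len(set(images)) == len(images)

    def is_surjective(f, domain, codomain):
        return {f(x) for x in domain} == set(codomain)

    dom = range(-3, 4)
    assert not is_injective(lambda x: x * x, dom)      # f(-1) == f(1)
    assert is_injective(lambda x: x + 1, dom)
    assert is_surjective(lambda x: x + 1, dom, range(-2, 5))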
negative integers are represented as (0, z). It's not hard to define
addition, subtraction, multiplication, etc. using this representation.

Rationals The rational numbers Q are all fractions of the form p/q where
p is an integer, q is a natural number not equal to 0, and p and q have
no common factors. Each such fraction can be represented as a set
using an ordered pair (p, q). Operations on rationals are defined as
you may remember from grade school.
3.14159265358979323846264338327950288419716939937510582 . . . ,
the sequence and rest is all the other elements. For example,

f(0, 1, 2) = 1 + ⟨0, f(1, 2)⟩
           = 1 + ⟨0, 1 + ⟨1, f(2)⟩⟩
           = 1 + ⟨0, 1 + ⟨1, 1 + ⟨2, 0⟩⟩⟩
           = 1 + ⟨0, 1 + ⟨1, 1 + 3⟩⟩ = 1 + ⟨0, 1 + ⟨1, 4⟩⟩
           = 1 + ⟨0, 1 + 19⟩
           = 1 + ⟨0, 20⟩
           = 1 + 230
           = 231.

This assigns a unique element of N to each finite sequence, which is
enough to show |N*| ≤ |N|. With some additional effort one can show
that f is in fact a bijection, giving |N*| = |N|.
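The arithmetic in the worked example is consistent with the Cantor pairing function ⟨x, y⟩ = (x + y)(x + y + 1)/2 + y; that specific formula is our assumption, since the notes define ⟨·, ·⟩ elsewhere. A Python sketch of the sequence encoding:

    def pair(x, y):
        # Cantor pairing: <x, y> = (x + y)(x + y + 1)/2 + y
        return (x + y) * (x + y + 1) // 2 + y

    def encode(seq):
        # f(empty) = 0; f(first, rest) = 1 + <first, f(rest)>
        if not seq:
            return 0
        return 1 + pair(seq[0], encode(seq[1:]))

    assert pair(2, 0) == 3 and pair(0, 20) == 230
    assert encode([0, 1, 2]) == 231    # matches the worked example above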
Since any bijection is also a surjection, this means that there's no
bijection between S and P(S) either, implying, for example, that |N| is strictly
less than |P(N)|.
(On the other hand, it is the case that |N^N| = |2^N|, so things are still
weird up here.)
Sets that are larger than N are called uncountable. A quick way to
show that there is no surjection from A to B is to show that A is countable
but B is uncountable. For example:
Corollary 3.7.2. There are functions f : N → {0, 1} that are not computed
by any computer program.

Proof. Let P be the set of all computer programs that take a natural number
as input and always produce 0 or 1 as output (assume some fixed language),
and for each program p ∈ P, let fp be the function that p computes. We've
already argued that P is countable (each program is a finite sequence drawn
from a countable alphabet), and since the set of all functions f : N →
{0, 1} = 2^N has the same size as P(N), it's uncountable. So some f gets
missed: there is at least one function from N to {0, 1} that is not equal to fp
for any program p.
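The argument is Cantor's diagonal construction in disguise; here is a finite Python illustration of ours (a sketch, not the notes' code):

    # Any listing of 0/1 functions misses the diagonal function g(n) = 1 - f_n(n).
    fs = [lambda n: 0, lambda n: n % 2, lambda n: 1]

    def g(n):
        return 1 - fs[n](n)

    # g differs from each f_i at input i, so g cannot appear anywhere in the list
    assert all(g(i) != fs[i](i) for i in range(len(fs)))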
The fact that there are more functions from N to N than there are
elements of N is one of the reasons why set theory (slogan: everything is
a set) beat out lambda calculus (slogan: everything is a function from
functions to functions) in the battle over the foundations of mathematics.
And this is why we do set theory in CS202 and lambda calculus (disguised
as Scheme) in CS201.
The real numbers R are the subject of high-school algebra and most
practical mathematics. Some important restricted classes of real numbers
are the naturals N = {0, 1, 2, . . .}, the integers Z = {. . . , −2, −1, 0, 1, 2, . . .},
and the rationals Q, which consist of all real numbers that can be written
as ratios of integers p/q, otherwise known as fractions.
The rationals include 1, 3/2, 22/7, 355/113, and so on, but not some
common mathematical constants like e ≈ 2.718281828 . . . or π ≈ 3.141592 . . . .
Real numbers that are not rational are called irrational. There is no
single-letter abbreviation for the irrationals.
The typeface used for N, Z, Q, and R is called blackboard bold and
originates from the practice of emphasizing a letter on a blackboard by
writing it twice. Some writers just use ordinary boldface: N, etc., but this
does not scream out "this is a set of numbers" as loudly as blackboard bold.
You may also see blackboard bold used for more exotic sets of numbers like
the complex numbers C, which are popular in physics and engineering,
and for some more exotic number systems like the quaternions H,[1] which
are sometimes used in graphics, or the octonions O, which exist mostly to
see how far complex numbers can be generalized.
Like any mathematical structure, the real numbers are characterized by
a list of axioms, which are the basic facts from which we derive everything
we know about the reals. There are many equivalent ways of axiomatizing
the real numbers; we will give one here. Many of these properties can also
be found in [Fer08, Appendix B]. These should mostly be familiar to you
from high-school algebra, but we include them here because we need to know
[1] Why H? The rationals already took Q (for "quotient"), so the quaternions are
abbreviated by the initial of their discoverer, William Rowan Hamilton.
a + b = b + a. (4.1.1)
a + (b + c) = (a + b) + c. (4.1.2)
Axiom 4.1.3 (Additive identity). There exists a number 0 such that, for
all numbers a,
a + 0 = 0 + a = a. (4.1.3)
An object that satisfies the condition a+0 = 0+a = a for some operation
is called an identity for that operation. Later we will see that 1 is an identity
for multiplication.
It's not hard to show that identities are unique:

Lemma 4.1.4. Let 0′ + a = a + 0′ = a for all a. Then 0′ = 0.

Proof. Compute 0′ = 0′ + 0 = 0. (The first equality holds by the fact that
a = a + 0 for all a and the second from the assumption that 0′ + a = a for
all a.)
and

Like annihilation, these are not axioms, or at least, we don't have to include
them as axioms if we don't want to. Instead, we can prove them directly
from axioms and theorems we've already got. For example, here is a proof
of (4.1.15):

a · 0 = 0
a · (b + (−b)) = 0
ab + a · (−b) = 0
−(ab) + (ab + a · (−b)) = −(ab)
(−(ab) + ab) + a · (−b) = −(ab)
0 + a · (−b) = −(ab)
a · (−b) = −(ab).
(−1) · a = −a. (4.1.18)
They do not hold for the integers Z (which don't have multiplicative
inverses) or the natural numbers N (which don't have additive inverses
either). This means that Z and N are not fields, although they are examples
of weaker algebraic structures (a ring in the case of Z and a semiring in
the case of N).
Proof. Take a ≤ 0 and add −a to both sides (using Axiom 4.2.4) to get
0 ≤ −a.
Theorem 4.3.2 (Archimedean property). For any two real numbers 0 <
x < y, there exists some n ∈ N such that nx > y.
This process stops with the complex numbers C, which consist of pairs
of the form a + bi where i^2 = −1. The reason is that the complex numbers
are algebraically closed: if you write an equation using only complex
numbers, +, and ·, and it has some solution x in any field bigger than
C, then x is in C as well. The down side in comparison to the reals is
that we lose order: there is no ordering of complex numbers that satisfies
the translation and scaling invariance axioms. As in many other areas of
mathematics and computer science, we are forced to make trade-offs based
on what is important to us at the time.
4.5 Arithmetic

In principle, it is possible to show that the standard grade-school algorithms
for arithmetic all work in R as defined by the axioms in the preceding
sections. This is sometimes trickier than it looks: for example, just showing
that 1 is positive requires a sneaky application of Axiom 4.2.5.[9]
To avoid going nuts, we will adopt the following rule:

Rule 4.5.1. Any grade-school fact about arithmetic that does not involve
any variables will be assumed to be true in R.

So for example, you don't need to write out a proof using the definition of
multiplicative inverses and the distributive law to conclude that 1/2 + 3/5 =
11/10; just remembering how to add fractions (or getting a smart enough
computer to do it for you) is enough.
Caveat: Dumb computers will insist on returning useless decimals like
1.1. As mathematicians, we don't like decimal notation, because it can't
represent exactly even trivial values like 1/3. Similarly, mixed fractions like
1 1/10, while useful for carpenters, are not popular in mathematics.

[9] Suppose 1 ≤ 0. Then 1 · 1 ≥ 0 · 1 (Theorem 4.2.13), which simplifies to 1 ≥ 0. Since
1 ≠ 0, this contradicts our assumption, showing that 1 > 0.
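Python's standard-library fractions module is one example of a "smart enough computer": it does exact rational arithmetic instead of decimals (the module is real; the particular example is ours):

    from fractions import Fraction

    total = Fraction(1, 2) + Fraction(3, 5)
    assert total == Fraction(11, 10)
    print(total)   # prints 11/10, not the useless decimal 1.1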
The integers Z. These are what you get if you throw in additive
inverses: now in addition to 0, 1, 1 + 1, etc., you also get −1, −(1 + 1),
etc. The order axioms are still satisfied. No multiplicative inverses,
though.

The rationals Q. Now we ask for multiplicative inverses, and get them.
Any rational can be written as p/q where p and q are integers. Unless
extra restrictions are put on p and q, these representations are not
unique: 22/7 = 44/14 = 66/21 = (−110)/(−35). You probably first
saw these in grade school as fractions, and one way to describe Q is
as the field of fractions of Z.
The rationals satisfy all the field axioms, and are the smallest subfield
of R. They also satisfy all the ordered field axioms and the Archimedean
property. But they are not complete. Adding completeness gives the real
numbers.
An issue that arises here is that, strictly speaking, the natural
numbers N we defined back in §3.4 are not elements of R as defined in terms
of, say, Dedekind cuts. The former are finite ordinals while the latter are
downward-closed sets of rationals, themselves represented as elements of
N × N. Similarly, the integer elements of Q will be pairs of the form (n, 1)
where n ∈ N rather than elements of N itself. We also have a definition
(G.1) that builds natural numbers out of 0 and a successor operation S.
So what does it mean to say N ⊆ Q ⊆ R?
One way to think about it is that the sets

{∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}, . . .},
{(0, 1), (1, 1), (2, 1), (3, 1), . . .},
{{(p, q) | p < 0}, {(p, q) | p < q}, {(p, q) | p < 2q}, {(p, q) | p < 3q}, . . .},

and

{0, S0, SS0, SSS0, . . .}

are all isomorphic: there are bijections between them that preserve the
behavior of 0, 1, +, and ·. So we think of N as representing some Platonic
ideal of natural-numberness that is only defined up to isomorphism.[10] So in
the context of R, when we write N, we mean the version of N that is a subset
of R, and in other contexts, we might mean a different set that happens to
behave in exactly the same way.
In the other direction, the complex numbers are a super-algebra of the
reals: we can think of any real number x as the complex number x + 0i,
and this complex number will behave exactly the same as the original real
number x when interacting with other real numbers carried over into C in
the same way.
The various features of these algebras are summarized in Table 4.1.
Symbol                  N          Z          Q           R       C
Name                    Naturals   Integers   Rationals   Reals   Complex numbers
Typical element         12         −12        12/7        √12     12 + (22/7)i
+ and · associative     Yes        Yes        Yes         Yes     Yes
0 and 1                 Yes        Yes        Yes         Yes     Yes
Inverses                No         + only     Yes         Yes     Yes
Ordered                 Yes        Yes        Yes         Yes     No
Least upper bounds      Yes        Yes        No          Yes     No
Algebraically closed    No         No         No          No      Yes

Table 4.1: Features of the various standard algebras.
The absolute value function erases the sign of x: |12| = |−12| = 12.
The signum function sgn(x) returns the sign of its argument, encoded
as −1 for negative, 0 for zero, and +1 for positive:

sgn(x) = −1 if x < 0,
          0 if x = 0,
         +1 if x > 0.
Any proof that uses the induction schema will consist of two parts, the
base case showing that P(0) holds, and the induction step showing that
P(x) → P(x + 1). The assumption P(x) used in the induction step is called
the induction hypothesis.
For example, let's suppose we want to show that for all n ∈ N, either
n = 0 or there exists n′ such that n = n′ + 1. Proof: We are trying to show
that P(n) holds for all n, where P(n) says n = 0 ∨ (∃n′ : n = n′ + 1). The
base case is when n = 0, and here the induction hypothesis holds by the
addition rule. For the induction step, we are given that P(x) holds, and
want to show that P(x + 1) holds. In this case, we can do this easily by
observing that P(x + 1) expands to (x + 1) = 0 ∨ (∃x′ : x + 1 = x′ + 1). So
let x′ = x and we are done.[1]
Here's a less trivial example. So far we have not defined exponentiation.
Let's solve this by declaring

x^0 = 1 (5.1.2)
x^(n+1) = x · x^n (5.1.3)
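This recursive definition transcribes directly into code; a minimal Python sketch of ours:

    def power(x, n):
        # x^0 = 1 (5.1.2); x^(n+1) = x * x^n (5.1.3)
        if n == 0:
            return 1
        return x * power(x, n - 1)

    assert power(2, 10) == 1024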
Induction step: Suppose the induction hypothesis holds for n, i.e., that
n > 0 → a^n > 1. We want to show that it also holds for n + 1. Annoyingly,
there are two cases we have to consider:

1. n = 0. Then we can compute a^1 = a · a^0 = a · 1 = a > 1.

2. n > 0. The induction hypothesis now gives a^n > 1 (since in this case
the premise n > 0 holds), so a^(n+1) = a · a^n > a · 1 > 1.
f(0) = x_0
f(n + 1) = g(f(n))
1. 0 ∈ S and

2. x ∈ S implies x + 1 ∈ S,

then S = N.
This is logically equivalent to the fact that the naturals are well-ordered.
This means that any non-empty subset S of N has a smallest element. More
formally: for any S ⊆ N, if S ≠ ∅, then there exists x ∈ S such that for all
y ∈ S, x ≤ y.
It's easy to see that well-ordering implies induction. Let S be a subset of
N, and consider its complement N \ S. Then either N \ S is empty, meaning
S = N, or N \ S has a least element y. But in the second case either y = 0
and 0 ∉ S, or y = x + 1 for some x and x ∈ S but x + 1 ∉ S. So S ≠ N
implies 0 ∉ S or there exists x such that x ∈ S but x + 1 ∉ S. Taking the
contraposition of this statement gives induction.
The converse is a little trickier, since we need to figure out how to use
induction to prove things about subsets of N, but induction only talks about
elements of N. The trick is to consider only the part of S that is smaller than
some variable n, and show that any S that contains an element smaller than
n has a smallest element.
Proof. By induction on n.
The base case is n = 0. Here 0 S and 0 x for any x N, so in
particular 0 x for any x S, making 0 the smallest element in S.
For the induction step, suppose that the claim in the lemma holds for n.
To show that it holds for n + 1, suppose that n + 1 S. Then either (a) S
contains an element less than or equal to n, so S has a smallest element by
the induction hypothesis, or (b) S does not contain an element less than or
equal to n. But in this second case, S must contain n + 1, and since there
are no elements less than n + 1 in S, n + 1 is the smallest element.
5.5.1 Examples
Every n > 1 can be factored into a product of one or more prime
numbers.2 Proof: By induction on n. The base case is n = 2, which
factors as 2 = 2 (one prime factor). For n > 2, either (a) n is prime
itself, in which case n = n is a prime factorization; or (b) n is not
prime, in which case n = ab for some a and b, both greater than 1.
Since a and b are both less than n, by the induction hypothesis we
have a = p1 p2 . . . pk for some sequence of one or more primes and
similarly b = p01 p02 . . . p0k0 . Then n = p1 p2 . . . pk p01 p02 . . . p0k0 is a prime
factorization of n.
players turn to move. In either case each f (y) is well-defined (by the
induction hypothesis) and so f (x) is also well-defined.
The key point is that in each case the definition of an object is recur-
sivethe object itself may appear as part of a larger object. Usually we
assume that this recursion eventually bottoms out: there are some base
cases (e.g. leaves of complete binary trees or variables in Boolean formulas)
that do not lead to further recursion. If a definition doesnt bottom out in
this way, the class of structures it describes might not be well-defined (i.e.,
we cant tell if some structure is an element of the class or not).
The depth of a binary tree For a leaf, 0. For a tree consisting of a root
with two subtrees, 1 + max(d1 , d2 ), where d1 and d2 are the depths of
the two subtrees.
The Fibonacci series Let F (0) = F (1) = 1. For n > 1, let F (n) =
F (n 1) + F (n 2).
Bounding the size of a binary tree with depth d Well show that it
has at most 2d+1 1 nodes. Base case: the tree consists of one leaf,
d = 0, and there are 20+1 1 = 2 1 = 1 nodes. Induction step:
Given a tree of depth d > 1, it consists of a root (1 node), plus two
subtrees of depth at most d 1. The two subtrees each have at most
2d1+1 1 = 2d 1 nodes (induction hypothesis), so the total number
of nodes is at most 2(2d 1) + 1 = 2d+1 + 2 1 = 2d+1 1.
Chapter 6
Summation notation
6.1 Summations
Summations are the discrete versions of integrals; given a sequence xa , xa+1 , . . . , xb ,
its sum xa + xa+1 + + xb is written as bi=a xi .
P
86
CHAPTER 6. SUMMATION NOTATION 87
b
(
X 0 if b < a
f (i) = Pb (6.1.1)
i=a f (a) + i=a+1 f (i) otherwise.
b
(
X 0 if b < a
f (i) = Pb1 (6.1.2)
i=a f (b) + i=a f (i) otherwise.
In English, we can compute a sum recursively by computing either the
sum of the last n 1 values or the first n 1 values, and then adding in
the value we left out. (For infinite sums we need a different definition; see
below.)
6.1.2 Scope
The scope of a summation extends to the first addition or subtraction symbol
that is not enclosed in parentheses or part of some larger term (e.g., in the
numerator of a fraction). So
n n n n
!
X X X X
2 2
i +1= i +1=1+ i2 6= (i2 + 1).
i=1 i=1 i=1 i=1
Here the looming bulk of the second sigma warns the reader that the
first sum is ending; it is much harder to miss than the relatively tiny plus
symbol in the first example.
Products of sums can be turned into double sums of products and vice
versa: ! m0
m X X m m0 XX
xi yj = xi yj .
i=n j=n0 i=n j=n0
These identities can often be used to transform a sum you cant solve
into something simpler.
To prove these identities, use induction and (6.1.2). For example, the
following lemma demonstrates a generalization of (6.1.3) and (6.1.4):
Lemma 6.1.1. m m m
X X X
(axi + byi ) = a xi + b yi .
i=n i=n i=n
Proof. If m < n, then both sides of the equation are zero. This proves
that (6.1.1) holds for small m and gives us a base case for our induction at
m = n 1 that we can use to show it holds for larger m.
For the induction step, we want to show that (6.1.1) holds for m + 1 if
it holds for m. This is a straightforward computation using (6.1.2) first to
unpack the combined sum then to repack the split sums:
m+1
X m
X
(axi + byi ) = (axi + byi ) + (axm + bym )
i=n i=n
m
X m
X
=a xi + b yi + axm + bym
i=n i=n
m m
! !
X X
=a xi + xm + b yi + ym
i=n i=n
m+1
X m+1
X
=a +b yi .
i=n i=n
CHAPTER 6. SUMMATION NOTATION 89
has a well-defined meaning, the version on the right-hand side is a lot less
confusing.
In addition to renaming indices, you can also shift them, provided you
shift the bounds to match. For example, rewriting
n
X
(i 1)
i=1
Or we could sum the inverses of all prime numbers less than 1000:
X
1/p.
p < 1000, p is prime
1i<jn
j
or X
|A|
xAS
where the first sum sums over all pairs of values (i, j) such that 1 i, i j,
and j n, with each pair appearing exactly once; and the second sums over
all sets A that are subsets of S and contain x (assuming x and S are defined
outside the summation). Hopefully, you will not run into too many sums
that look like this, but its worth being able to decode them if you do.
Sums over a given set are guaranteed to be well-defined only if the set is
finite. In this case we can use the fact that there is a bijection between any
finite set S and the ordinal |S| to rewrite the sum as a sum over indices in |S|.
For example, if |S| = n, then there exists a bijection f : {0 . . . n 1} S,
so we can define
X n1
X
xi = xf (i) . (6.1.5)
iS i=0
This allows us to apply (6.1.2) to decompose the sum further:
X 0 if S = ,
x i = P (6.1.6)
iS
iS\z x i + xz if z S.
The idea is that for any particular z S, we can always choose a bijection
that makes z = f (|S| 1).
If S is infinite, computing the sum is trickier. For countable S, where
there is a bijection f : N S, we can sometimes rewrite
X
X
xi = xf (i) .
iS i=0
CHAPTER 6. SUMMATION NOTATION 91
and use the definition of an infinite sum (given below). Note that if the
xi have different signs the result we get may depend on which bijection we
choose. For this reason such infinite sums are probably best avoided unless
you can explicitly use N or a subset of N as the index set.
all possible values in some obvious range, and can be a mark of sloppiness
in formal mathematical writing. Theoretical physicists adopt a still more
P
lazy approach, and leave out the i part entirely in certain special types
of sums: this is known as the Einstein summation convention after the
notoriously lazy physicist who proposed it.
If you think of a sum as a for loop, a double sum is two nested for loops.
The effect is to sum the innermost expression over all pairs of values of the
two indices.
CHAPTER 6. SUMMATION NOTATION 92
Heres a more complicated double sum where the limits on the inner sum
depend on the index of the outer sum:
n X
X i
(i + 1)(j + 1).
i=0 j=0
6.2 Products
What if you want to multiply a series of values instead of add them? The
notation is the same as for a sum, except that you replace the sigma with a
pi, as in this definition of the factorial function for non-negative n:
n
def Y
n! = i = 1 2 n.
i=1
The other difference is that while an empty sum is defined to have the
value 0, an empty product is defined to have the value 1. The reason for
this rule (in both cases) is that an empty sum or product should return the
identity element for the corresponding operationthe value that when
added to or multiplied by some other value x doesnt change x. This allows
writing general rules like:
X X X
f (i) + f (i) = f (i)
iA iB iAB
! !
Y Y Y
f (i) f (i) = f (i)
iA iB iAB
Big AND:
^
P (x) P (x1 ) P (x2 ) . . . x S : P (x).
xS
Big OR:
_
P (x) P (x1 ) P (x2 ) . . . x S : P (x).
xS
Big Intersection:
n
\
Ai = A1 A2 . . . An .
i=1
Big Union:
n
[
Ai = A1 A2 . . . An .
i=1
These all behave pretty much the way one would expect. One issue that
is not obvious from the definition is what happens with an empty index set.
Here the rule as with sums and products is to return the identity element
for the operation. This will be True for AND, False for OR, and the empty
set for union; for intersection, there is no identity element in general, so the
intersection over an empty collection of sets is undefined.
S = 1 + 2 + ... + n
S = n + n 1 + ... + 1
2S = (n + 1) + (n + 1) + . . . + (n + 1) = n(n + 1),
then
X
X
rS = ri+1 = ri
i=0 i=1
and so
S rS = r0 = 1.
CHAPTER 6. SUMMATION NOTATION 95
n S(n)
0 0
1 1
2 1+3=4
3 1+3+5=9
4 1 + 3 + 5 + 7 = 16
5 1 + 3 + 5 + 7 + 9 = 25
At this point we might guess that S(n) = n2 . To verify this, observe
that it holds for n = 0, and for larger n we have S(n) = S(n1)+(2n1) =
(n 1)2 + 2n 1 = n2 2n + 1 2n 1 = n2 . So we can conclude that our
guess was correct.
6.4.3 Ansatzes
A slightly more sophisticated approach to guess but verify involves guessing
the form of the solution, but leaving a few parameters unfixed so that we can
adjust them to match the actual data. This parameterized guess is called
an ansatz, from the German word for starting point, because guesswork
sounds much less half-baked if you can refer to it in German.
To make this work, it helps to have some idea of what the solution to a
sum might look like. One useful rule of thumb is that a sum over a degree-d
polynomial is usually a degree-(d + 1) polynomial.
For example, lets guess that
n
X
i2 = c3 n3 + c2 n2 + c1 n + c0 , (6.4.1)
i=0
when n 0.
Under the assumption that (6.4.1) holds, we can plug in n = 0 to get
P0 2
i=0 i = 0 = c0 . This means that we only need to figure out c3 , c2 , and c1 .
Plugging in some small values for n gives
0 + 1 = 1 = c3 + c2 + c1
0 + 1 + 4 = 5 = 8c3 + 4c2 + 2c1
0 + 1 + 4 + 9 = 14 = 27c3 + 8c2 + 3c1
n+1
n i 1x
and i 1
P P
Geometric series i=0 x = 1x i=0 x = 1x .
The way to recognize a geometric series is that the ratio between adjacent
terms is constant. If you memorize the second formula, you can rederive the
first one. If youre Gauss, you can skip memorizing the second formula.
A useful trick to remember for geometric series is that if x is a constant
that is not exactly 1, the sum is always big-Theta of its largest term. So for
example ni=1 2i = (2n ) (the exact value is 2n+1 2), and ni=1 2i = (1)
P P
general series expands so easily to the simplest series, its usually not worth
memorizing the general formula.
P n
Harmonic series i=1 1/i = Hn = (n log n).
Can be rederived using the integral technique given below or by summing
the last half of the series, so this is mostly useful to remember in case you
run across Hn (the n-th harmonic number).
6.4.4.4 Integrate
Integrate.
Rb
If f (n) is non-decreasing
R b+1
and you know how to integrate it, then
Pb
a1 f (x) dx i=a f (i) a f (x) dx, which is enough to get a big-
Theta bound for almost all functions you are likely to encounter in algorithm
analysis. If you dont know how to integrate it, see F.3.
CHAPTER 6. SUMMATION NOTATION 99
6.4.4.6 Oddities
One oddball sum that shows up occasionally but is hard to solve using
any of the above techniques is ni=1 ai i. If a < 1, this is (1) (the exact
P
P i
formula for i=1 a i when a < 1 is a/(1 a)2 , which gives a constant upper
bound for the sum stopping at n); if a = 1, its just an arithmetic series; if
a > 1, the largest term dominates and the sum is (an n) (there is an exact
formula, but its uglyif you just want to show its O(an n), the simplest
Pn1 ni
approach is to bound the series i=0 a (n i) by the geometric series
Pn1 ni n n/(1 a1 ) = O(an n). I wouldnt bother memorizing this
i=0 a n a
one provided you remember how to find it in these notes.
Asymptotic notation
7.1 Definitions
O(f (n)) A function g(n) is in O(f (n)) (big O of f (n)) if there exist
constants c > 0 and N such that |g(n)| c|f (n)| for all n > N .
o(f (n)) A function g(n) is in o(f (n)) (little o of f (n)) if for every c > 0
there exists an N such that |g(n)| c|f (n)| for all n > N . This is
equivalent to saying that limn g(n)/f (n) = 0.
100
CHAPTER 7. ASYMPTOTIC NOTATION 101
Constant factors vary from one machine to another. The c factor hides
this. If we can show that an algorithm runs in O(n2 ) time, we can be
confident that it will continue to run in O(n2 ) time no matter how fast
(or how slow) our computers get in the future.
Proof. We must find c, N such that for all n > N , |n| cn3 . Since n3
n > N , |n| n3 . It is not the case that |n| n3 for all n (try plotting
Proof. Here we need to negate the definition of O(n), a process that turns
all existential quantifiers into universal quantifiers and vice versa. So what
we need show is that for all c > 0 and N , there exists some n > N for
3to
which n is not less than c|n|. So fix some such c > 0 and N . We must find
an n > N for which n3 > cn. Solving for n in this inequality gives n > c1/2 ;
so setting n > max(N, c1/2 ) finishes the proof.
Proof. Since f1 (n) is in O(g(n)), there exist constants c1 , N1 such that for
all n > N1 , |f1 (n)| < c|g(n)|. Similarly there exist c2 , N2 such that for all
n > N2 , |f2 (n)| < c|g(n)|.
To show f1 (n) + f2 (n) in O(g(n)), we must find constants c and N such
that for all n > N , |f1 (n) + f2 (n)| < c|g(n)|. Lets let c = c1 + c2 . Then
if n is greater than max(N1 , N2 ), it is greater than both N1 and N2 , so we
can add together |f1 | < c1 |g| and |f2 | < c2 |g| to get |f1 + f2 | |f1 | + |f2 | <
(c1 + c2 )|g| = c|g|.
Use big- when you have a lower bound on a function, e.g. every year
the zoo got at least one new gorilla, so there were at least (t) gorillas
at the zoo in year t.
CHAPTER 7. ASYMPTOTIC NOTATION 103
Use big- when you know the function exactly to within a constant-
factor error, e.g. every year the zoo got exactly five new gorillas, so
there were (t) gorillas at the zoo in year t.
For the others, use little-o and when one function becomes vanishingly
small relative to the other, e.g. new gorillas arrived rarely and with declining
frequency, so there were o(t) gorillas at the zoo in year t. These are not used
as much as big-O, big-, and big- in the algorithms literature.
But watch out for exponents and products: O(3n n3.1178 log1/3 n) is
already as simple as it can be.
f (n) f 0 (n)
lim = lim 0
n g(n) n g (n)
when f (n) and g(n) both diverge to infinity or both converge to zero. Here
f 0 and g 0 are the derivatives of f and g with respect to n; see F.2.
1
Note that this is a sufficient but not necessary condition. For example, the function
f (n) that is 1 when n is even and 2 when n is odd is O(1), but limn f (n)
1
doesnt exist.
CHAPTER 7. ASYMPTOTIC NOTATION 104
What we want this to mean is that the left-hand side can be replaced by
the right-hand side without causing trouble. To make this work formally,
we define the statement as meaning that for any f in O(n2 ) and any g in
O(n3 ), there exists an h in O(n3 ) such that f (n) + g(n) + 1 = h(n).
In general, any appearance of O, , or on the left-hand side gets
a universal quantifier (for all) and any appearance of O, , or on the
right-hand side gets an existential quantifier (there exists). So
means that for any g in o(f (n)), there exists an h in (f (n)) such that
f (n) + g(n) = h(n), and
means that for any r in O(f (n)) and s in O(g(n)), there exists t in O(max(f (n), g(n))
such that r(n) + s(n) + 1 = t(n) + 1.
The nice thing about this definition is that as long as you are careful
about the direction the equals sign goes in, you can treat these compli-
cated pseudo-equations like ordinary equations. For example, since O(n2 ) +
O(n3 ) = O(n3 ), we can write
n2 n(n + 1)(n + 2)
+ = O(n2 ) + O(n3 )
2 6
= O(n3 ),
which is much simpler than what it would look like if we had to talk about
particular functions being elements of particular sets of functions.
This is an example of abuse of notation, the practice of redefining
some standard bit of notation (in this case, equations) to make calculation
easier. Its generally a safe practice as long as everybody understands what
is happening. But beware of applying facts about unabused equations to the
abused ones. Just because O(n2 ) = O(n3 ) doesnt mean O(n3 ) = O(n2 )
the big-O equations are not reversible the way ordinary equations are.
More discussion of this can be found in [Fer08, 10.4] and [GKP94, Chap-
ter 9].
Chapter 8
Number theory
If d|m and d|n, then d|(m + n). Proof: Let m = ad and n = bd, then
(m + n) = (a + b)d.
106
CHAPTER 8. NUMBER THEORY 107
If p is prime, then p|ab if and only if p|a or p|b. Proof: follows from
the extended Euclidean algorithm (see below).
Proof. First we show that q and r exist for n 0 and m > 0. This is done
by induction on n. If n < m, then q = 0 and r = n satisfies n = qm + r
and 0 r < m. If n m, then n m 0 and n m < n, so from the
induction hypothesis there exist some q 0 , r such that n m = q 0 m + r and
0 r < m. Then if q = q + 1, we have n = (n m) + m = q 0 m + r + m =
(q 0 + 1)m + r = qm + r.
Next we extend to the cases where n might be negative. If n < 0 and
m > 0, then there exist q 0 , r0 with 0 r < m such that n = q 0 m + r. If
r0 = 0, let q = q 0 and r = 0, giving n = (n) = (q 0 m + r0 ) = qm + r.
If r0 6= 0, let q = q 0 1 and r = m r; now n = (n) = (q 0 m + r0 ) =
((q+1)m+(mr)) = (qmr) = qm+r. So in either case appropriate
q and r exist.
CHAPTER 8. NUMBER THEORY 108
Note that quotients of negative numbers always round down. For exam-
ple, b(3)/17c = 1 even though 3 is much closer to 0 than it is to 17.
This is so that the remainder is always non-negative (14 in this case). This
may or may not be consistent with the behavior of the remainder operator
in your favorite programming language.
nal version was for finding the largest square you could use to tile a given
rectangle, but the idea is the same). Euclids algorithm is based on the
recurrence (
n if m = 0,
gcd(m, n) =
gcd(n mod m, m) if m > 0.
The first case holds because n|0 for all n. The second holds because
if k divides both n and m, then k divides n mod m = n bn/mc m; and
conversely if k divides m and n mod m, then k divides n = (n mod m) +
m bn/mc. So (m, n) and (n mod m, m) have the same set of common factors,
and the greatest of these is the same.
So the algorithm simply takes the remainder of the larger number by the
smaller recursively until it gets a zero, and returns whatever number is left.
m0 m + n0 n = gcd(m, n).
It has the same structure as the Euclidean algorithm, but keeps track of
more information in the recurrence. Specifically:
For m = 0, gcd(m, n) = n with n0 = 1 and m0 = 0.
8.2.2.1 Example
Figure 8.1 gives a computation of the gcd of 176 and 402, together with
the extra coefficients. The code used to generate this figure is given in
Figure 8.2.
8.2.2.2 Applications
If gcd(n, m) = 1, then there is a number n0 such that nn0 + mm0 = 1.
This number n0 is called the multiplicative inverse of n mod m and
acts much like 1/n in modular arithmetic (see 8.4.2).
CHAPTER 8. NUMBER THEORY 110
Finding gcd(176,402)
q = 2 r = 50
Finding gcd(50,176)
q = 3 r = 26
Finding gcd(26,50)
q = 1 r = 24
Finding gcd(24,26)
q = 1 r = 2
Finding gcd(2,24)
q = 12 r = 0
Finding gcd(0,2)
base case
Returning 0*0 + 1*2 = 2
a = b1 - a1*q = 1 - 0*12 = 1
Returning 1*2 + 0*24 = 2
a = b1 - a1*q = 0 - 1*1 = -1
Returning -1*24 + 1*26 = 2
a = b1 - a1*q = 1 - -1*1 = 2
Returning 2*26 + -1*50 = 2
a = b1 - a1*q = -1 - 2*3 = -7
Returning -7*50 + 2*176 = 2
a = b1 - a1*q = 2 - -7*2 = 16
Returning 16*176 + -7*402 = 2
#!/usr/bin/python3
def output(s):
if trace:
print("{}{}".format( * depth, s))
if m == 0:
output("base case")
a, b, g = 0, 1, n
else:
q = n//m
r = n % m
output("q = {} r = {}".format(q, r))
a1, b1, g = euclid(r, m, trace, depth + 1)
a = b1 - a1*q
b = a1
output("a = b1 - a1*q = {} - {}*{} = {}".format(b1, a1, q, a))
if __name__ == __main__:
import sys
If p is prime and p|ab, then either p|a or p|b. Proof: suppose p 6 |a;
since p is prime we have gcd(p, a) = 1. So there exist r and s such
that rp + sa = 1. Multiply both sides by b to get rpb + sab = b. Then
p|rpb and p|sab (the latter because p|ab), so p divides their sum and
thus p|b.
8.3.1 Applications
Some consequences of unique factorization:
CHAPTER 8. NUMBER THEORY 113
Similarly, for every a and b, we can compute the least common multi-
ple lcm(a, b) by taking the maximum of the exponents on each prime
that appears in the factorization of a or b. It can also be found by
computing lcm(a, b) = ab/ gcd(a, b), which is more efficient for large a
and b because we dont have to factor.
Proof. From Lemma 8.4.1, m|(xx0 ) and m|(yy 0 ). So m|((xx0 )+(yy 0 )),
which we can rearrange as m|((x + y) (x0 + y 0 )). Apply Lemma 8.4.1 in
the other direction to get x + y m x0 + y 0 .
+ 0 1 2 0 1 2
0 0 1 2 0 0 0 0
1 1 2 0 1 0 1 2
2 2 0 1 2 0 2 1
8.4.2 Division in Zm
One thing we dont get general in Zm is the ability to divide. This is not
terribly surprising, since we dont get to divide (without remainders) in Z
either. But for some values of x and m we can in fact do division: for these
x and m there exists a multiplicative inverse x1 (mod m) such that
xx1 = 1 (mod m). We can see the winning xs for Z9 by looking for ones
in the multiplication table for Z9 , given in Table 8.2.
Here we see that 11 = 1, as wed expect, but that we also have 21 = 5,
4 = 7, 51 = 2, 71 = 4, and 81 = 8. There are no inverses for 0, 3, or
1
6.
What 1, 2, 4, 5, 7, and 8 have in common is that they are all relatively
prime to 9. This is not an accident: when gcd(x, m) = 1, we can use the
extended Euclidean algorithm (8.2.2) to find x1 (mod m). Observe that
what we want is some x0 such that xx0 m 1, or equivalently such that
x0 x + qm = 1 for some q. But the extended Euclidean algorithm finds such
an x0 (and q) whenever gcd(x, m) = 1.
CHAPTER 8. NUMBER THEORY 116
0 1 2 3 4 5 6 7 8
0 0 0 0 0 0 0 0 0 0
1 0 1 2 3 4 5 6 7 8
2 0 2 4 6 8 1 3 5 7
3 0 3 6 0 3 6 0 3 6
4 0 4 8 3 7 2 6 1 5
5 0 5 1 6 2 7 3 8 4
6 0 6 3 0 6 3 0 6 3
7 0 7 5 3 1 8 6 4 2
8 0 8 7 6 5 4 3 2 1
n mod m1 = n1 ,
n mod m2 = n2 ,
n n 1 n2
0 0 0
1 1 1
2 2 2
3 0 3
4 1 0
5 2 1
6 0 2
7 1 3
8 2 0
9 0 1
10 1 2
11 2 3
Proof. Well show an explicit algorithm for constructing the solution. The
first trick is to observe that if a|b, then (x mod b) mod a = x mod a. The
proof is that x mod b = xqb for some q, so (x mod b) mod a = (x mod a)
(qb mod a) = x mod a since any multiple of b is also a multiple of a, giving
qb mod a = 0.
Since m1 and m2 are relatively prime, the extended Euclidean algorithm
gives m01 and m02 such that m01 m1 = 1 (mod m2 ) and m02 m2 = 1 (mod m1 ).
result is due to the fifth-century Indian mathematician Aryabhata. The name Chinese
Remainder Theorem appears to be much more recent. See http://mathoverflow.net/
questions/11951/what-is-the-history-of-the-name-chinese-remainder-theorem
for a discussion of the history of the name.
4
This means that gcd(m1 , m2 ) = 1.
CHAPTER 8. NUMBER THEORY 118
The general version allows for any number of equations, as long as the
moduli are all relatively prime.
(m1
X Y Y
n= ni j (mod mi ))mj mod mi .
i j6=i i
As in the two-modulus case, the factor (m1 j (mod mi ))mj , where m1j
(mod mi ) is the multiplicative inverse of mj mod mi , acts like 1 mod mi
CHAPTER 8. NUMBER THEORY 119
(m1
X Y Y
n mod mk = ni j (mod mi ))mj mod mi mod mk
i j6=i i
= ni (m1
X Y
j (mod mi ))mj mod mk
i j6=i
X
= nk 1 + (ni 0) mod mk
i6=k
= nk .
unless p|n. There are exactly pk1 numbers less than pk that are divisible
by p (they are 0, p, 2p, . . . (pk 1)p), so (pk ) = pk pk1 = pk1 (p 1).5
For composite numbers m that are not prime powers, finding the value of
(m) is more complicated; but we can show using the Chinese Remainder
Theorem (Theorem 8.4.3) that in general
k k
!
pei piei 1 (pi 1).
Y Y
i =
i=1 i=1
Proof. We will prove this using an argument adapted from the proof of
[Big02, Theorem 13.3.2]. Let z1 , z2 , . . . , z(m) be the elements of Zm . For any
n o
y Zm , define yZm = yz1 , yz2 , . . . , yz(m) . Since y has a multiplicative
5
Note that (p) = (p1 ) = p11 (p 1) = p 1 is actually a special case of this.
CHAPTER 8. NUMBER THEORY 120
as claimed.
111 = 11
112 = 121 = 30
114 = 302 = 900 = 81
115 = 114 111 = 81 11 = 891 = 72.
CHAPTER 8. NUMBER THEORY 121
When the recipient (who knows d) receives the encrypted message 72,
they can recover the original by computing 7229 mod 91:
721 = 72
722 = 5184 = 88
724 = 882 = (3)2 = 9
728 = 92 = 81
7216 = 812 = (10)2 = 100 = 9
7229 = 7216 728 724 721 = 9 81 9 72 = 812 72 = 9 72 = 648 = 11.
Note that we are working in Z91 throughout. This is what saves us from
computing the actual value of 7229 in Z,6 and only at the end taking the
remainder.
For actual security, we need m to be large enough that its hard to
recover p and q using presently conceivable factoring algorithms. Typical
applications choose m in the range of 2048 to 4096 bits (so each of p and
q will be a random prime between roughly 10308 and 10617 . This is too
big to show a hand-worked example, or even to fit into the much smaller
integer data types shipped by default in many programming languages, but
its not too large to be able to do the computations efficiently with good
large integer arithmetic library.
6
If youre curious, its 728857113063526668247098229876984590549890725463457792.
Chapter 9
Relations
122
CHAPTER 9. RELATIONS 123
1 2
Figure 9.2: Relation {(1, 2), (1, 3), (2, 3), (3, 1)} represented as a directed
graph
term(e1 ) = term(e2 ).
If we dont care about the labels of the edges, a simple directed graph
can be described by giving E as a subset of V V ; this gives a one-to-one
correspondence between relations on a set V and (simple) directed graphs.
For relations from A to B, we get a bipartite directed graph, where all edges
go from vertices in A to vertices in B.
Directed graphs are drawn using a dot or circle for each vertex and an
arrow for each edge, as in Figure 9.1.
This also gives a way to draw relations. For example, the relation on
{1, 2, 3} given by {(1, 2), (1, 3), (2, 3), (3, 1)} can be depicted as show in Fig-
ure 9.2.
A directed graph that contains no sequence of edges leading back to
their starting point is called a directed acyclic graph or DAG. DAGs are
important for representing partially-ordered sets (see 9.5).
9.1.2 Matrices
A matrix is a two-dimensional analog of a sequence: in full generality, it is
a function A : S T U , where S and T are the index sets of the matrix
(typically {1 . . . n} and {1 . . . m} for some n and m). As with sequences, we
write Aij for A(i, j). Matrices are typically drawn inside square brackets
CHAPTER 9. RELATIONS 124
like this:
0 1 1 0
A = 2 1 0 0
1 0 0 1
The first index of an entry gives the row it appears in and the second one
the column, so in this example A2,1 = 2 and A3,4 = 1. The dimensions
of a matrix are the numbers of rows and columns; in the example, A is a
3 4 (pronounced 3 by 4) matrix.
Note that rows come before columns in both indexing (Aij : i is row, j
is column) and giving dimensions (n m: n is rows, m is columns). Like
the convention of driving on the right (in many countries), this choice is
arbitrary, but failing to observe it may cause trouble.
Matrices are used heavily in linear algebra (Chapter 13), but for the
moment we will use them to represent relations from {1 . . . n} to {1 . . . m},
by setting Aij = 0 if (i, j) is not in the relation and Aij = 1 if (i, j) is. So
for example, the relation on {1 . . . 3} given by {(i, j) | i < j} would appear
in matrix form as
0 1 1
0 0 1 .
0 0 0
When used to represent the edges in a directed graph, a matrix of this
form is called an adjacency matrix.
iRj the order of the product is reversed from the order of composition.
For relations on a single set, we can iterate composition: Rn is defined
by R0 = (=) and Rn+1 = R Rn . (This also works for functions, bearing in
CHAPTER 9. RELATIONS 125
mind that the equality relation is also the constant function.) In directed
graph terms, xRn y if and only if there is a path of exactly n edges from x
to y (possibly using the same edge more than once).
9.2.2 Inverses
Relations also have inverses: xR1 y yRx. Unlike functions, every rela-
tion has an inverse.
examples are:
There are also some common relations that are not partial orders or
strict partial orders but come close. For example, the element-of relation
() is irreflexive and antisymmetric (this ultimately follows from the Axiom
of Foundation) but not transitive; if x y and y z we do not generally
expect x z. The is at least as rich as relation is reflexive and transitive
but not antisymmetric: if you and I have a net worth of 0, we are each as rich
as the other, and yet we are not the same person. Relations that are reflexive
and transitive (but not necessarily antisymmetric) are called quasiorders or
preorders and can be turned into partial orders by defining an equivalence
relation x y if x y and y x and replacing each equivalence class with
respect to by a single element.
As far as I know, there is no standard term for relations that are irreflex-
ive and antisymmetric but not necessarily transitive.
CHAPTER 9. RELATIONS 130
12 12
4 6 4 6
2 3 2 3
1 1
9.5.2 Comparability
In a partial order, two elements x and y are comparable if x y or y x.
Elements that are not comparable are called incomparable. In a Hasse
2
There is special terminology for this situation: such an x is called a predecessor
or sometimes immediate predecessor of y; y in turn is a successor or sometimes
immediate successor of x.
CHAPTER 9. RELATIONS 131
diagram, comparable elements are connected by a path that only goes up.
For example, in Figure 9.3, 3 and 4 are not comparable because the only
paths between them requiring going both up and down. But 1 and 12 are
both comparable to everything.
9.5.3 Lattices
A lattice is a partial order in which (a) each pair of elements x and y has a
unique greatest lower bound or meet, written x y, with the property
that (x y) x, (x y) y, and z (x y) for any z with z x and z y;
and (b) each pair of elements x and y has a unique least upper bound or
join, written x y, with the property that (x y) x, (x y) y, and
z (x y) for any z with z x and z y.
Examples of lattices are any total order (x y is min(x, y), x y is
max(x, y)), the subsets of a fixed set ordered by inclusion (x y is x y,
x y is x y), and the divisibility relation on the positive integers (x y
is the greatest common divisor, x y is the least common multiplesee
Chapter 8). Products of lattices with the product order are also lattices:
(x1 , x2 )(y1 , y2 ) = (x1 1 y1 , x2 y2 ) and (x1 , x2 )(y1 , y2 ) = (x1 1 y1 , x2 y2 ).
3
b c d i j
a e f g h
Figure 9.4: Maximal and minimal elements. In the first poset, a is minimal
and a minimum, while b and c are both maximal but not maximums. In the
second poset, d is maximal and a maximum, while e and f are both minimal
but not minimums. In the third poset, g and h are both minimal, i and j
are both maximal, but there are no minimums or maximums.
12
12 4
4 6
6
2
2 3
1 3
Figure 9.5: Topological sort. On the right is a total order extending the
partial order on the left.
sort exist. We wont bother with efficiency, and will just use the basic idea
to show that a total extension of any finite partial order exists.
The simplest version of this algorithm is to find a minimal element, put
it first, and then sort the rest of the elements; this is similar to selection
sort, an algorithm for doing ordinary sorting, where we find the smallest
element of a set and put it first, find the next smallest element and put it
second, and so on. In order for the selection-based version of topological
sort to work, we have to know that there is, in fact, a minimal element.4
Lemma 9.5.1. Every nonempty finite partially-ordered set has a minimal
element.
Proof. Let (S, ) be a nonempty finite partially-ordered set. We will prove
that S contains a minimal element by induction on |S|.
If |S| = 1, then S = {x} for some x; x is the minimal element.
Now consider some S with |S| > 2. Pick some element x S, and let
T = S \ {x}. Then by the induction hypothesis, T has a minimal element
y, and since |T | 2, T has at least one other element z 6 y.
4
There may be more than one, but one is enough.
CHAPTER 9. RELATIONS 134
2. a = x. Then x 0S b always.
3. b = x. Then a S x a = x a 0S x.
Next, let us show that 0S is a partial order. This requires verifying that
adding x to T doesnt break reflexivity, antisymmetry, or transitivity. For
reflexivity, x x from the first case of the definition. For antisymmetry, if
y 0S x then y = x, since y 60T x for any y. For transitivity, if x 0S y 0S z
then x 0S z (since x 0S z for all z in S), and if y 0S x 0S z then y = x 0S z
and if y 0S z 0S x then y = z = x.
Finally, lets make sure that we actually get a total order. This means
showing that any y and z in S are comparable. If y 60S z, then y 6= x, and
either z = x or z T and y 60T z implies z 0T y. In either case z 0S y.
The case y 60S z is symmetric.
transitivity, and repeat. Unfortunately this process may take infinitely long,
so we have to argue that it converges in the limit to a genuine total order
using a tool called Zorns lemma, which itself is a theorem about partial
orders.5
Then T is reflexive (because it contains all the pairs (x, x) that are in R) and transitive (by
a tedious case analysis), and antisymmetric (by another tedious case analysis), meaning
that it is a partial order that extends Rand thus an element of Swhile also being a
proper superset of R. But this contradicts the assumption that R is maximal. So R is in
fact the total order we are looking for.
6
Proof: We can prove that any nonempty S N has a minimum in a slightly round-
about way by induction. The induction hypothesis (for x) is that if S contains some
element y less than or equal to x, then S has a minimum element. The base case is when
x = 0; here x is the minimum. Suppose now that the claim holds for x. Suppose also that
S contains some element y x + 1; if not, the induction hypothesis holds vacuously. If
there is some y x, then S has a minimum by the induction hypothesis. The alternative
is that there is no y in S such that y x, but there is a y in S with y x + 1. This y
must be equal to x + 1, and so y is the minimum.
CHAPTER 9. RELATIONS 136
9.6 Closures
In general, the closure of some mathematical object with respect to a
given property is the smallest larger object that has the property. Usu-
ally smaller and larger are taken to mean subset or superset, so we are
really looking at the intersection of all larger objects with the property. Such
a closure always exists if the property is preserved by intersection (formally,
if (i : P (Si )) P (Si )) and every object has at least one larger object
with the property.
This rather abstract definition can be made more explicit for certain
kinds of closures of relations. The reflexive closure of a relation R (whose
domain and codomain are equal) is the smallest super-relation of R that
is reflexive; it is obtained by adding (x, x) to R for all x in Rs domain.
CHAPTER 9. RELATIONS 137
0 1 2
0 1 2
0 1 2
0 1 2
2 4 4
1 6 123
3 5 56
9.6.1 Examples
Let R be the relation on subsets of N given by xRy if there exists some
n 6 x such that y = x {n}. The transitive closure of R is the proper
subset relation , where x y if x y but x 6= y. The reflexive
transitive closure R of R is just the ordinary subset relation . The
reflexive symmetric transitive closure of R is the complete relation;
given any two sets x and y, we can get from x to via (R )1 and
then to y via R . So in this case the reflexive symmetric transitive
closure is not very interesting.
Graphs
140
CHAPTER 10. GRAPHS 141
10.1.3 Hypergraphs
In a hypergraph, the edges (called hyperedges) are arbitrary nonempty
sets of vertices. A k-hypergraph is one in which all such hyperedges con-
nected exactly k vertices; an ordinary graph is thus a 2-hypergraph.
Hypergraphs can be drawn by representing each hyperedge as a closed
curve containing its members, as in the left-hand side of Figure 10.3.
Hypergraphs arent used very much, because it is always possible (though
not always convenient) to represent a hypergraph by a bipartite graph.
In a bipartite graph, the vertex set can be partitioned into two subsets S
and T , such that every edge connects a vertex in S with a vertex in T .
To represent a hypergraph H as a bipartite graph, we simply represent the
vertices of H as vertices in S and the hyperedges of H as vertices in T , and
put in an edge (s, t) whenever s is a member of the hyperedge t in H. The
right-hand side of Figure 10.3 gives an example.
CHAPTER 10. GRAPHS 143
1
1 2
2
3
3 4
4
Such graphs are often labeled with edge lengths, prices, etc. In com-
puter networking, the design of network graphs that permit efficient routing
of data without congestion, roundabout paths, or excessively large routing
CHAPTER 10. GRAPHS 144
K1 K2 K3 K4
K5 K6 K7
K8 K9 K10
C3 C4 C5
C6 C7 C8
C9 C10 C11
Path Pn . This has vertices {0, 1, 2, . . . n} and an edge from i to i+1 for
each i. Note that, despite the usual convention, n counts the number
of edges rather than the number of vertices; we call the number of
edges the length of the path. See Figure 10.6.
P0 P1 P2 P3 P4
K3,4
The cube Qn . This is defined by letting the vertex set consist of all
n-bit strings, and putting an edge between u and u0 if u and u0 differ
in exactly one place. It can also be defined by taking the n-fold square
product of an edge with itself (see 10.6).
Graphs may not always be drawn in a way that makes their structure
obvious. For example, Figure 10.9 shows two different presentations of Q3 ,
neither of which looks much like the other.
0 1
0 1
4 5
4 5
2 3
6 7
6 7
2 3
2 4 2 4
1 6 1
3 5 5
2 2
1 1 45
3 3
Figure 10.10: Examples of subgraphs and minors. Top left is the original
graph. Top right is a subgraph that is not an induced subgraph. Bottom
left is an induced subgraph. Bottom right is a minor.
10.6.1 Functions
A function from a graph G to another graph H typically maps VG to VH ,
with the edges coming along for the ride. Functions between graphs can be
classified based on what they do to the edges:
makes sense to talk about this in terms of reachability, or whether you can
get from one vertex to another along some path.
Connected components
10.8 Cycles
The standard cycle graph Cn has vertices {0, 1, . . . , n 1} with an edge from
i to i + 1 for each i and from n 1 to 0. To avoid degeneracies, n must be
at least 3. A simple cycle of length n in a graph G is an embedding of Cn
in G: this means a sequence of distinct vertices v0 v1 v2 . . . vn1 , where each
pair vi vi+1 is an edge in G, as well as vn1 v0 . If we omit the requirement
that the vertices are distinct, but insist on distinct edges instead, we have a
cycle. If we omit both requirements, we get a closed walk; this includes
very non-cyclic-looking walks like the short excursion uvu. We will mostly
worry about cycles.1 See Figure 10.11
1
Some authors, including Ferland [Fer08], reserve cycle for what we are calling a simple
cycle, and use circuit for cycle.
CHAPTER 10. GRAPHS 154
2 4 2
1 6 1
3 5 3 5
2 4 2 4
1 1 6
3 5 3 5
Figure 10.11: Examples of cycles and closed walks. Top left is a graph. Top
right shows the simple cycle 1253 found in this graph. Bottom left shows
the cycle 124523, which is not simple. Bottom right shows the closed walk
12546523.
The converse of this lemma is trivial: any simple path is also a path.
Essentially the same argument works for cycles:
Lemma 10.9.2. If there is a cycle in G, there is a simple cycle in G.
Proof. As in the previous lemma, we prove that there exists a simple cycle
if there is a cycle of length k for any k, by induction of k. The base case is
k = 3: all 3-cycles are simple. For larger k, if v0 v1 . . . vk1 is a k-cycle that
is not simple, there exist i < j with vi = vj ; patch the edges between them
CHAPTER 10. GRAPHS 156
dG (v) 2 = 2m 2, giving
P P
So vV vV dG (v) = 2m.
Theorem 10.9.4. A graph is a tree if and only if there is exactly one simple
path between any two distinct vertices.
CHAPTER 10. GRAPHS 157
Because a graph with two vertices and fewer than one edges is not con-
nected, Lemma 10.9.5 implies that any graph with fewer than |V | 1 edges
is not connected.
Proof. By induction on n = |V |.
For the base case, if n = 0, then |E| = 0 6< n 1.
For larger n, suppose that n 1 and |E| < n1. From Lemma 10.9.3 we
have v d(v) < 2n 2, from which it follows that there must be at least one
P
In the other direction, combining the lemma with the fact that the unique
graph K3 with three vertices and at least three edges is cyclic tells us that
any graph with at least as many edges as vertices is cyclic.
Proof. By induction on n = |V |.
6 |V 1|, so the claim holds vacuously.2
For n 2, |E| >
For larger n, there are two cases:
1. G is connected.
2. G is acyclic.
3. |E| = |V | 1.
Proof. We will use induction on n for some parts of the proof. The base
case is when n = 1; then all three statements hold always. For larger n, we
show:
(1) and (2) imply (3): Use Corollary 10.9.6 and Corollary 10.9.7.
(1) and (3) imply (2). From Lemma 10.9.3, vV d(v) = 2(n1) < 2n.
P
(2) and (3) imply (1). As in the previous case, G contains a vertex
v with d(v) 1. If d(v) = 1, then G v is a nonempty graph with
n 2 edges and n 1 vertices that is acyclic by Lemma 10.9.5. It is
thus connected by the induction hypothesis, so G is also connected by
Lemma 10.9.5. If d(v) = 0, then G v has n 1 edges and n 1
vertices. From Corollary 10.9.7, G v contains a cycle, contradicting
(2).
Proof. (Only if part). Fix some cycle, and orient the edges by the
direction that the cycle traverses them. Then in the resulting directed
graph we must have d (u) = d+ (u) for all u, since every time we enter
a vertex we have to leave it again. But then d(u) = 2d+ (u) is even.
(If part). Suppose now that d(u) is even for all u. We will construct
an Eulerian cycle on all nodes by induction on |E|. The base case
is when |E| = 2|V | and G = C|V | . For a larger graph, choose some
starting node u1 , and construct a path u1 u2 . . . by choosing an arbi-
trary unused edge leaving each ui ; this is always possible for ui 6= u1
since whenever we reach ui we have always consumed an even num-
ber of edges on previous visits plus one to get to it this time, leaving
at least one remaining edge to leave on. Since there are only finitely
many edges and we can only use each one once, eventually we must
get stuck, and this must occur with uk = u1 for some k. Now delete
CHAPTER 10. GRAPHS 161
Why doesnt this work for Hamiltonian cycles? The problem is that in a
Hamiltonian cycle we have too many choices: out of the d(u) edges incident
to u, we will only use two of them. If we pick the wrong two early on, this
may prevent us from ever fitting u into a Hamiltonian cycle. So we would
need some stronger property of our graph to get Hamiltonicity.
Chapter 11
Counting
162
CHAPTER 11. COUNTING 163
|A B| = |A| + |B|.
2. |A| y < |A| + |B|. In this case 0 y |A| < |B|, putting y |A| in
the codomain of g and giving h(h1 (y)) = g(g 1 (y |A|)) + |A| = y.
One way to think about this proof is that we are constructing a total
order on A B by putting all the A elements before all the B elements. This
gives a straightforward bijection with [|A| + |B|] by the usual preschool trick
of counting things off in order.
Generalizations: If A1 , A2 , A3 . . . Ak are pairwise disjoint (i.e., Ai
Aj = for all i 6= j), then
k k
[ X
Ai = |Ai |.
i=1 i=1
11.1.4 Subtraction
For any sets A and B, A is the disjoint union of A B and A \ B. So
|A| = |A B| + |A \ B| (for finite sets) by the sum rule. Rearranging gives
|A \ B| = |A| |A B|. (11.1.1)
CHAPTER 11. COUNTING 166
Proof. Compute
|A B| = |A B| + |A \ B| + |B \ A|
= |A B| + (|A| |A B|) + (|B| |A B|)
= |A| + |B| |A B|.
follows that |L| = |A| + |B| and |R| = |A B| + |A B|. Now define the
function f : L R by the rule
|A B| = |A| |B|.
where the product on the left is a Cartesian product and the product on the
right is an ordinary integer product.
CHAPTER 11. COUNTING 168
11.1.5.1 Examples
As I was going to Saint Ives, I met a man with seven sacks, and every
sack had seven cats. How many cats total? Answer: Label the sacks
0, 1, 2, . . . , 6, and label the cats in each sack 0, 1, 2, . . . , 6. Then each cat
can be specified uniquely by giving a pair (sack number, cat number),
giving a bijection between the set of cats and the set 7 7. Since
|7 7| = 7 7 = 49, we have 49 cats.
How many different ways can you order n items? Call this quantity
n! (pronounced n factorial). With 0 or 1 items, there is only one
way; so we have 0! = 1! = 1. For n > 1, there are n choices for the
first item, leaving n 1 items to be ordered. From the product rule we
thus have n! = n (n 1)!, which we could also expand out as ni=1 i.
Q
The secret of why its called a binomial coefficient will be revealed when we
talk about generating functions in 11.3.
Example: Heres a generalization of binomial coefficients: let the multi-
nomial coefficient !
n
n1 n2 . . . nk
be the number of different ways to distribute n items among k bins where the
i-th bin gets exactly ni of the items and we dont care what order the items
appear in each bin. (Obviously this only makes sense if n1 +n2 + +nk = n.)
Can we find a simple formula for the multinomial coefficient?
Here are two ways to count the number of permutations of the n-element
set:
1. Pick the first element, then the second, etc. to get n! permuations.
2. Generate a permutation in three steps:
(a) Pick a partition of the n elements into groups of size n1 , n2 , . . . nk .
(b) Order the elements of each group.
(c) Paste the groups together into a single ordered list.
There are !
n
n1 n2 . . . nk
ways to pick the partition and
n1 ! n2 ! nk !
If its OK if some people dont get a car at all, then you can imagine
putting n cars and k 1 dividers in a line, where relative 1 gets all
the cars up to the first divider, relative 2 gets all the cars between the
first and second dividers, and so forth up to relative k who gets all
the cars after the (k 1)-th divider. Assume that each carand each
dividertakes one parking space. Then you have n + k 1 parking
spaces with k 1 dividers in them (and cars in the rest). There are
n+k1
exactly k1 ways to do this.
Alternatively, suppose each relative demands at least 1 car. Then you
can just hand out one car to each relative to start with, leaving n k
cars to divide as in the previous case. There are (nk)+k1 n1
k1 = k1
ways to do this.
As always, whenever some counting problem turns out to have an eas-
ier answer than expected, its worth trying to figure out if there is a
more direct combinatorial proof. In this case we want to encode as-
signments of at least one of n cars to k people, so that this corresponds
to picking k 1 out of n 1 things. One way to do this is to imagine
lining up all n cars, putting each relative in front of one of the cars,
and giving them that car plus any car to the right until we hit the next
relative. In order for this to assign all the cars, we have to put the
leftmost relative in front of the leftmost car. This leaves n 1 places
for the k 1 remaining relatives, giving n1
k1 choices.
(1, 2)
(1, 3)
(1, 4)
(1, 5)
(1, 6)
(2, 1)
(2, 2)
(2, 3)
(2, 4)
(2, 5)
(2, 6)
(3, 1)
(3, 2)
(3, 4)
(3, 5)
(3, 6)
(4, 1)
(4, 2)
(4, 3)
(4, 4)
(4, 5)
(4, 6)
(5, 1)
(5, 2)
(5, 3)
(5, 4)
(5, 6)
(6, 1)
(6, 2)
(6, 3)
(6, 4)
(6, 5)
(6, 6)
6
Without looking at the list, can you say which 3 of the 62 = 36 possible length-2
sequences are missing?
CHAPTER 11. COUNTING 176
Adding the two cases together (using the sum rule), we conclude that
the identity holds.
Using the base case and Pascals identity, we can construct Pascals
triangle, a table of values of binomial coefficients:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
...
entry, we add together the entry directly above it and the entry diagonally
above and to the left.
as advertised.
the whole set if we limit ourselves to choosing exactly k from the last n.
The identity follow by summing over all possible values of k.
So now consider
m+n
!
X m+n r
x = (1 + x)m+n
r=0
r
= (1 + x)n (1 + x)m
n
! ! m !
X n X m
= xi xj
i=0
i j=0
j
m+n r
! !!
X X n m
= xr .
r=0 k=0
k rk
Theorem 11.2.2.
n
[
|S|+1
X \
Ai = (1) Aj . (11.2.3)
i=1 S{1...n},S6= jS
This rather horrible expression means that to count the elements in the
union of n sets A1 through An , we start by adding up all the individual sets
|A1 | + |A2 | + . . . |An |, then subtract off the overcount from elements that
appear in two sets |A1 A2 | |A1 A3 | . . . , then add back the resulting
undercount from elements that appear in three sets, and so on.
Why does this work? Consider a single element x that appears in k of
k k
the sets. Well count it as +1 in 1 individual sets, as 1 in 2 pairs, +1
in k3 triples, and so on, adding up to
k k k
! !! ! !
X
k+1 k X
k k X
k k
(1) = (1) = (1) 1 = (0 1) = 1.
i=1
i i=1
i i=0
i
CHAPTER 11. COUNTING 182
This turns out to actually be correct, since applying the geometric series
formula turns the last line into
1 1 1 1
= = ,
z 1 1/z z1 1z
We can now read off the number of words of each length directly off the
coefficients of this polynomial.
In some cases, the sum has a more compact representation. For example,
we have
1 X
= zi,
1z i=0
This almost gets us the representation for the series iai , but the expo-
nents on the zs are off by one. But thats easily fixed:
d X X
z F (z) = z ai iz i1 = ai iz i .
dz i=0 i=0
d z z 2z 2
z 2
= 2
+ .
dz (1 z) (1 z) (1 z)3
As you can see, some generating functions are prettier than others.
(We can also use integration to divide each term by i, but the details are
messier.)
Another way to get the sequence 0, 1, 2, 3, 4, . . . is to observe that it
satisfies the recurrence:
a0 = 0.
CHAPTER 11. COUNTING 187
an z n + 1/(1 z). The first term on the right-hand side is the generating
P
function for an , which we can call F (z) so we dont have to keep writing it
out. The second term is just the generating function for 1, 1, 1, 1, 1, . . . . But
what about the left-hand side? This is almost the same as F (z), except the
coefficients dont match up with the exponents. We can fix this by dividing
F (z) by z, after carefully subtracting off the a0 term:
!
X
n
(F (z) a0 )/z = an z a0 /z
n=0
!
X
= an z n /z
n=1
X
= an z n1
n=1
X
= an+1 z n .
n=0
pad each weight-k object out to weight n in exactly one way using n k
junk objects, i.e. multiply F (z) by 1/(1 z).
1 X
= zi
1z i=0
z X
= iz i
(1 z)2 i=0
n
! !
n
X n i
X n
(1 + z) = z = zi
i=0
i i=0
i
!
1 X n+i1
= zi
(1 z)n i=0
i
Of these, the first is the most useful to remember (its also handy for
remembering how to sum geometric series). All of these equations can be
proven using the binomial theorem.
11.3.4.3 Repetition
Now let C consists of all finite sequences of objects in A, with the weight
of each sequence equal to the sum of the weights of its elements (0 for an
empty sequence). Let H(z) be the generating function for C. From the
preceding rules we have
1
H = 1 + F + F2 + F3 + = .
1F
This works best when H(0) = 0; otherwise we get infinitely many weight-
0 sequences. Its also worth noting that this is just a special case of substi-
tution (see below), where our outer generating function is 1/(1 z).
Example: (0|11) Let A = {0, 11}, and let C be the set of all sequences
of zeros and ones where ones occur only in even-length runs. Then the
generating function for A is z + z 2 and the generating function for C is
1/(1zz 2 ). We can extract exact coefficients from this generating function
using the techniques below.
This means that there is 1 way to express 0 (the empty sum), and 2n1
ways to express any larger value n (e.g. 241 = 8 ways to express 4).
Once we know what the right answer is, its not terribly hard to come
up with a combinatorial explanation. The quantity 2n1 counts the number
of subsets of an (n 1)-element set. So imagine that we have n 1 places
and we mark some subset of them, plus add an extra mark at the end; this
might give us a pattern like XX-X. Now for each sequence of places ending
with a mark we replace it with the number of places (e.g. XX-X = 1, 1, 2,
X--X-X---X = 1, 3, 2, 4). Then the sum of the numbers we get is equal to n,
because its just counting the total length of the sequence by dividing it up
at the marks and the adding the pieces back together. The value 0 doesnt
fit this pattern (we cant put in the extra mark without getting a sequence
of length 1), so we have 0 as a special case again.
If we are very clever, we might come up with this combinatorial expla-
nation from the beginning. But the generating function approach saves us
from having to be clever.
CHAPTER 11. COUNTING 192
11.3.4.4 Pointing
This operation is a little tricky to describe. Suppose that we can think of
each weight-k object in A as consisting of k items, and that we want to count
not only how many weight-k objects there are, but how many ways we can
produce a weight-k object where one of its k items has a special mark on
it. Since there are k different items to choose for each weight-k object, we
are effectively multiplying the count of weight-k objects by k. In generating
function terms, we have
d
H(z) = z F (z).
dz
Repeating this operation allows us to mark more items (with some items
possibly getting more than one mark). If we want to mark n distinct items
in each object (with distinguishable marks), we can compute
dn
H(z) = z n F (z),
dz n
where the repeated derivative turns each term ai z i into ai i(i1)(i2) . . . (i
n + 1)z in and the z n factor fixes up the exponents. To make the marks
indistinguishable (i.e., we dont care what order the values are marked in),
divide by n! to turn the extra factor into ni .
(If you are not sure how to take a derivative, look at F.2.)
Example: Count the number of finite sequences of zeros and ones where
exactly two digits are underlined. The generating function for {0, 1} is 2z,
so the generating function for sequences of zeros and ones is F = 1/(1 2z)
by the repetition rule. To mark two digits with indistinguishable marks, we
need to compute
1 2 d2 1 1 2 d 2 1 2 8 4z 2
z = z = z = .
2 dz 2 1 2z 2 dz (1 2z)2 2 (1 2z)3 (1 2z)3
11.3.4.5 Substitution
Suppose that the way to make a C-thing is to take a weight-k A-thing and
attach to each its k items a B-thing, where the weight of the new C-thing
is the sum of the weights of the B-things. Then the generating function for
C is the composition F (G(z)).
Why this works: Suppose we just want to compute the number of C-
things of each weight that are made from some single specific weight-k A-
thing. Then the generating function for this quantity is just (G(z))k . If we
expand our horizons to include all ak weight-k A-things, we have to multiply
CHAPTER 11. COUNTING 193
But this is just what we get if we start with F (z) and substitute G(z)
for each occurrence of z, i.e. if we compute F (G(z)).
that have i xs and j ys. (There is also the obvious generalization to more
than two variables). Consider the multivariate generating function for the
set {0, 1}, where x counts zeros and y counts ones: this is just x + y. The
multivariate generating function for sequences of zeros and ones is 1/(1
x y) by the repetition rule. Now suppose that each 0 is left intact but
each 1 is replaced by 11, and we want to count the total number of strings
by length, using z as our series variable. So we substitute z for x and z 2
for y (since each y turns into a string of length 2), giving 1/(1 z z 2 ).
This gives another way to get the generating function for strings built by
repeating 0 and 11.
becomes
G = (zG z) + z 2 G + 1 + z.
(here F = 0).
Solving for G gives
G = 1/(1 z z 2 ).
Unfortunately this is not something we recognize from our table, al-
though it has shown up in a couple of examples. (Exercise: Why does the
recurrence T (n) = T (n 1) + T (n 2) count the number of strings built
from 0 and 11 of length n?) In the next section we show how to recover a
closed-form expression for the coefficients of the resulting series.
2. To find the k-th coefficient of F(z), compute the k-th derivative
d^k/dz^k F(z) and divide by k! to shift a_k to the z^0 term. Then
substitute 0 for z. For example, if F(z) = 1/(1 - z) then a_0 = 1 (no
differentiating), a_1 = 1/(1 - 0)^2 = 1, a_2 = 1/(1 - 0)^3 = 1, etc.
This usually only works if the derivatives have a particularly nice form
or if you only care about the first couple of coefficients (it's
particularly effective if you only want a_0).
3. If the generating function is of the form 1/Q(z), where Q is a
polynomial with Q(0) ≠ 0, then it is generally possible to expand the
generating function out as a sum of terms of the form P_c/(1 - z/c)
where c is a root of Q (i.e. a value such that Q(c) = 0). Each
numerator P_c will be a constant if c is not a repeated root; if c is a
repeated root, then P_c can be a polynomial of degree up to one less
than the multiplicity of c. We like these expanded solutions because
we recognize 1/(1 - z/c) = \sum_i c^{-i} z^i, and so we can read off
the coefficients.
    \frac{1}{1-bz} = A + \frac{B(1-az)}{1-bz}.

Now plug in z = 1/a to get

    \frac{1}{1-b/a} = A + 0.
With a bit of tweaking, we can get rid of the sums on the RHS, leaving

    = z + zF(z) + 2z^2 F(z).

Now solve for F(z) to get

    F(z) = \frac{z}{1-z-2z^2} = \frac{z}{(1+z)(1-2z)}
         = z\left(\frac{A}{1+z} + \frac{B}{1-2z}\right),

where we need to solve for A and B.
We can do this directly, or we can use the cover-up method. The cover-up
method is easier. Setting z = -1 and covering up 1 + z gives
A = 1/(1 - 2(-1)) = 1/3. Setting z = 1/2 and covering up 1 - 2z gives
B = 1/(1 + z) = 1/(1 + 1/2) = 2/3. So we have
    F(z) = \frac{(1/3)z}{1+z} + \frac{(2/3)z}{1-2z}
         = \sum_{n=0}^{\infty} \frac{(-1)^n}{3} z^{n+1}
           + \sum_{n=0}^{\infty} \frac{2 \cdot 2^n}{3} z^{n+1}
         = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{3} z^n
           + \sum_{n=1}^{\infty} \frac{2^n}{3} z^n
         = \sum_{n=1}^{\infty} \frac{2^n - (-1)^n}{3} z^n.

This gives f(0) = 0 and, for n ≥ 1, f(n) = (2^n - (-1)^n)/3. It's not hard
to check that this gives the same answer as the recurrence.
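Checking against the recurrence is easy to automate. In this small sketch
(mine, not the notes'), I read the functional equation F = z + zF + 2z^2 F
as the recurrence f(0) = 0, f(1) = 1, f(n) = f(n-1) + 2f(n-2) for n ≥ 2:

    # compare the closed form f(n) = (2^n - (-1)^n)/3 with the recurrence
    f = [0, 1]
    for n in range(2, 12):
        f.append(f[-1] + 2 * f[-2])
    assert f == [(2**n - (-1)**n) // 3 for n in range(12)]
    print(f)  # [0, 1, 1, 3, 5, 11, 21, 43, 85, 171, 341, 683]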
Since each part can be chosen independently of the other two, the generating
function for all three parts together is just the product:
    \frac{1}{(1-z)(1-2z)(1-3z)}.

Let's use the cover-up method to convert this to a sum of partial
fractions. We have

    \frac{1}{(1-z)(1-2z)(1-3z)}
      = \frac{\frac{1}{(1-2)(1-3)}}{1-z}
        + \frac{\frac{1}{(1-\frac{1}{2})(1-\frac{3}{2})}}{1-2z}
        + \frac{\frac{1}{(1-\frac{1}{3})(1-\frac{2}{3})}}{1-3z}
      = \frac{1/2}{1-z} + \frac{-4}{1-2z} + \frac{9/2}{1-3z}.
So the exact number of length-n sequences is (1/2) - 4·2^n + (9/2)·3^n.
We can check this for small n:

    n   Formula                  Strings
    0   1/2 - 4 + 9/2 = 1        ()
    1   1/2 - 8 + 27/2 = 6       M, O, U, G, H, K
    2   1/2 - 16 + 81/2 = 25     MM, MO, MU, MG, MH, MK, OO, OU, OG, OH,
                                 OK, UO, UU, UG, UH, ...
    3   1/2 - 32 + 243/2 = 90    (exercise)
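A brute-force check is straightforward. My reading of the example (an
assumption on my part, since the setup is abridged here) is that a sequence
is a part over the one-letter alphabet {M}, then a part over {O, U}, then a
part over {G, H, K}, matching the product 1/((1-z)(1-2z)(1-3z)):

    # count three-part sequences of total length n and compare with the
    # formula 1/2 - 4*2^n + (9/2)*3^n
    def count(n):
        total = 0
        for i in range(n + 1):          # length of the {M} part
            for j in range(n - i + 1):  # length of the {O,U} part
                k = n - i - j           # length of the {G,H,K} part
                total += 2**j * 3**k
        return total

    for n in range(6):
        assert count(n) == (1 - 8 * 2**n + 9 * 3**n) // 2
    print([count(n) for n in range(6)])  # [1, 6, 25, 90, 301, 966]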
    = \frac{\frac{1}{4}z}{1+2z} + \frac{\frac{3}{4}z}{1-6z}
      - \frac{1/15}{1-z} + \frac{\frac{1}{6}z^2}{1+2z}
      + \frac{\frac{9}{10}z^2}{1-6z}.

From this we can immediately read off the value of T(n) for n ≥ 2:

    T(n) = \frac{1}{4}(-2)^{n-1} + \frac{3}{4}6^{n-1} - \frac{1}{15}
           + \frac{1}{6}(-2)^{n-2} + \frac{9}{10}6^{n-2}
         = -\frac{1}{8}(-2)^n + \frac{1}{8}6^n - \frac{1}{15}
           + \frac{1}{24}(-2)^n + \frac{1}{40}6^n
         = \frac{3}{20}6^n - \frac{1}{12}(-2)^n - \frac{1}{15}.
Let's check this against the solutions we get from the recurrence itself:

    n   T(n)
    0   0
    1   1
    2   1 + 4·1 + 12·0 = 5
    3   1 + 4·5 + 12·1 = 33
    4   1 + 4·33 + 12·5 = 193
We'll try n = 3, and get T(3) = (3/20)·216 + 8/12 - 1/15 =
(3·3·216 + 40 - 4)/60 = (1944 + 40 - 4)/60 = 1980/60 = 33.
To be extra safe, let's try T(2) = (3/20)·36 - 4/12 - 1/15 =
(3·3·36 - 20 - 4)/60 = (324 - 20 - 4)/60 = 300/60 = 5. This looks good too.
The moral of this exercise? Generating functions can solve ugly-looking
recurrences exactly, but you have to be very very careful in doing the math.
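Being "very very careful in doing the math" is a good place to let a
machine help. Here is a sketch (mine, not the notes'), assuming the table
above reflects the recurrence T(n) = 1 + 4T(n-1) + 12T(n-2) with T(0) = 0
and T(1) = 1:

    # compare T(n) = (3/20)*6^n - (1/12)*(-2)^n - 1/15 with the recurrence
    from fractions import Fraction

    T = [0, 1]
    for n in range(2, 10):
        T.append(1 + 4 * T[-1] + 12 * T[-2])

    for n in range(10):
        closed = (Fraction(3, 20) * 6**n
                  - Fraction(1, 12) * (-2)**n
                  - Fraction(1, 15))
        assert closed == T[n]
    print(T[:5])  # [0, 1, 5, 33, 193]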
    a_0 = a_0
    a_1 = 2a_0 + 1
    a_2 = 4a_0 + 2 + 2 = 4a_0 + 4
    a_3 = 8a_0 + 8 + 3 = 8a_0 + 11
    a_4 = 16a_0 + 22 + 4 = 16a_0 + 26
    \frac{1}{(1-z)^2(1-2z)} = \frac{A_0 + A_1 z}{(1-z)^2} + \frac{B}{1-2z}.
The reason for the large-n caveat is that z^2/(1-z)^2 doesn't generate
precisely the sequence x_n = n - 1, since it takes on the values
0, 0, 1, 2, 3, 4, ... instead of -1, 0, 1, 2, 3, 4, .... Similarly, the
power series for z/(1-2z) does not have the coefficient 2^{n-1} = 1/2 when
n = 0. Miraculously, in this particular example the formula works for
n = 0, even though it shouldn't: 2(n-1) is -2 instead of 0, but 4·2^{n-1}
is 2 instead of 0, and the two errors cancel each other out.
Solving for the PFE using the extended cover-up method. It is also
possible to extend the cover-up method to handle repeated roots. Here we
choose a slightly different form of the partial fraction expansion:

    \frac{1}{(1-z)^2(1-2z)} = \frac{A}{(1-z)^2} + \frac{B}{1-z}
                              + \frac{C}{1-2z}.
Here A, B, and C are all constants. We can get A and C by the cover-up
method, where for A we multiply both sides by (1-z)^2 before setting z = 1;
this gives A = 1/(1-2) = -1 and C = 1/(1-1/2)^2 = 4. For B, if we multiply
both sides by (1-z) we are left with A/(1-z) on the right-hand side and
a (1-z) in the denominator on the left-hand side. Clearly setting z = 1 in
this case will not help us.
The solution is to first multiply by (1-z)^2 as before but then take a
derivative:

    \frac{1}{(1-z)^2(1-2z)} = \frac{A}{(1-z)^2} + \frac{B}{1-z}
                              + \frac{C}{1-2z}
    \frac{1}{1-2z} = A + B(1-z) + \frac{C(1-z)^2}{1-2z}
    \frac{d}{dz}\frac{1}{1-2z}
      = \frac{d}{dz}\left(A + B(1-z) + \frac{C(1-z)^2}{1-2z}\right)
    \frac{2}{(1-2z)^2} = -B - \frac{2C(1-z)}{1-2z}
                         + \frac{2C(1-z)^2}{(1-2z)^2}

Now if we set z = 1, every term on the right-hand side except -B
becomes 0, and we get -B = 2/(1-2)^2, or B = -2.
Plugging A, B, and C into our original formula gives

    \frac{1}{(1-z)^2(1-2z)} = \frac{-1}{(1-z)^2} + \frac{-2}{1-z}
                              + \frac{4}{1-2z},

and thus

    F = \frac{z}{(1-z)^2(1-2z)} + \frac{a_0}{1-2z}
      = z\left(\frac{-1}{(1-z)^2} + \frac{-2}{1-z}
        + \frac{4}{1-2z}\right) + \frac{a_0}{1-2z}.
    a_n = 4 \cdot 2^{n-1} - n - 2 + a_0 2^n = 2^{n+1} + 2^n a_0 - n - 2.
More examples:
11.3.8.1 Example
Let's derive the formula for 1 + 2 + ⋯ + n. We'll start with the
generating function for the series \sum_{i=0}^{n} z^i, which is
(1 - z^{n+1})/(1 - z). Applying the z d/dz method gives us
    \sum_{i=0}^{n} i z^i = z \frac{d}{dz} \frac{1-z^{n+1}}{1-z}
      = z\left(\frac{1}{(1-z)^2} - \frac{(n+1)z^n}{1-z}
        - \frac{z^{n+1}}{(1-z)^2}\right)
      = \frac{z - (n+1)z^{n+1} + n z^{n+2}}{(1-z)^2}.
8. The justification for doing this is that we know that a finite sequence
really has a finite sum, so the singularity appearing at z = 1 in e.g.
(1 - z^{n+1})/(1 - z) is an artifact of the generating-function
representation rather than the original series; it's a removable
singularity that can be replaced by the limit of f(x)/g(x) as x → c.
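Taking that limit is exactly how the closed form for 1 + 2 + ⋯ + n falls
out. A quick sketch (mine, assuming sympy is available), checking small
concrete n:

    # limit of (z - (n+1)z^(n+1) + n z^(n+2))/(1-z)^2 as z -> 1
    # should be n(n+1)/2
    from sympy import symbols, limit

    z = symbols('z')
    for n in range(1, 7):
        expr = (z - (n + 1) * z**(n + 1) + n * z**(n + 2)) / (1 - z)**2
        assert limit(expr, z, 1) == n * (n + 1) // 2
    print("ok")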
    F = 1 + zF^2.

For n ≥ 1, we can expand out the \binom{1/2}{n} terms as

    \binom{1/2}{n} = \frac{(1/2)^{\underline{n}}}{n!}
      = \frac{1}{n!} \prod_{k=0}^{n-1} \left(\frac{1}{2} - k\right)
      = \frac{1}{n!} \prod_{k=0}^{n-1} \frac{1-2k}{2}
      = \frac{(-1)^n}{2^n n!} \prod_{k=0}^{n-1} (2k-1)
      = \frac{(-1)^{n+1}}{2^n n!}
        \cdot \frac{\prod_{k=1}^{2n-2} k}{\prod_{k=1}^{n-1} 2k}
      = \frac{(-1)^{n+1}}{2^n n!} \cdot \frac{(2n-2)!}{2^{n-1}(n-1)!}
      = \frac{(-1)^{n+1} (2n-2)!}{2^{2n-1} n! (n-1)!}
      = \frac{(-1)^{n+1} (2n-1)!}{2^{2n-1} (2n-1) n! (n-1)!}
      = \frac{(-1)^{n+1}}{2^{2n-1}(2n-1)} \binom{2n-1}{n}.
For n = 0, the switch from the big product of odd terms to (2n-2)!
divided by the even terms doesn't work, because (2n-2)! is undefined. So
here we just use the special case \binom{1/2}{0} = 1.
Here we choose minus for the plus-or-minus to get the right answer and
then do a little bit of tidying up of the binomial coefficient.
We can check the first few values of f(n):

    n   f(n)
    0   \binom{0}{0} = 1
    1   (1/2)\binom{2}{1} = 1
    2   (1/3)\binom{4}{2} = 6/3 = 2
    3   (1/4)\binom{6}{3} = 20/4 = 5
and these are consistent with what we get if we draw all the small binary
trees by hand.
The numbers \frac{1}{n+1}\binom{2n}{n} show up in a lot of places in
combinatorics, and are known as the Catalan numbers.
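Drawing all the small binary trees by hand gets old fast; a sketch of mine
that counts them recursively, using the decomposition behind F = 1 + zF^2
(a nonempty tree is a root plus a left and a right subtree):

    # compare direct tree counts with (1/(n+1)) * C(2n, n)
    from math import comb
    from functools import lru_cache

    @lru_cache(None)
    def t(n):
        if n == 0:
            return 1
        return sum(t(k) * t(n - 1 - k) for k in range(n))

    for n in range(10):
        assert t(n) == comb(2 * n, n) // (n + 1)
    print([t(n) for n in range(6)])  # [1, 1, 2, 5, 14, 42]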
11.3.11 Variants
The exponential generating function or egf for a sequence a_0, ... is
given by F(z) = \sum_n a_n z^n / n!. For example, the egf for the sequence
1, 1, 1, ... is \sum_n z^n / n! = e^z.
Chapter 12

Probability theory

Here are two examples of questions we might ask about the likelihood of
some event:
2. Pr[Ω] = 1.
It's unusual for anybody doing probability to actually write out the
details of the probability space like this. Much more often, a writer will
just assert the probabilities of a few basic events (e.g. Pr[{H}] = 1/2),
and claim that any other probability that can be deduced from these initial
probabilities from the axioms also holds (e.g. Pr[{T}] = 1 - Pr[{H}] =
1/2). The main reason Kolmogorov gets his name attached to the axioms
is that he was responsible for Kolmogorov's extension theorem, which
says (speaking very informally) that as long as your initial assertions are
consistent, there exists a probability space that makes them and all their
consequences true.
12.1.2.1 Examples
A random bit has two outcomes, 0 and 1. Each occurs with proba-
bility 1/2.
A die roll has six outcomes, 1 through 6. Each occurs with probability
1/6.
A roll of two dice has 36 outcomes (order of the dice matters). Each
occurs with probability 1/36.
A random n-bit string has 2^n outcomes. Each occurs with probability
2^{-n}. The probability that exactly one bit is a 1 is obtained by counting
all strings with a single 1 and dividing by 2^n. This gives n 2^{-n}.
A poker hand consists of a subset of 5 cards drawn uniformly at
random from a deck of 52 cards. Depending on whether the order of
the 5 cards is considered important (usually it isn't), there are either
\binom{52}{5} or (52)_5 possible hands. The probability of getting a flush
(all five cards in the hand drawn from the same suit of 13 cards) is
4\binom{13}{5}/\binom{52}{5}; there are 4 choices of suits, and
\binom{13}{5} ways to draw 5 cards from each suit.
A random permutation on n items has n! outcomes, one for each
possible permutation. A typical event might be that the first element
of a random permutation of 1...n is 1; this occurs with probability
(n-1)!/n! = 1/n. Another example of a random permutation might be
a uniform shuffling of a 52-card deck (difficult to achieve in practice!).
Here, the probability that we get a particular set of 5 cards as the first
5 in the deck is obtained by counting all the permutations that have
those 5 cards in the first 5 positions (there are 5!·47! of them) divided
by 52!. The result is the same 1/\binom{52}{5} that we get from the uniform
poker hands.
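The flush probability is easy to evaluate numerically. A small sketch of
mine, using Python's math.comb:

    from math import comb, factorial

    flush = 4 * comb(13, 5) / comb(52, 5)
    print(flush)              # about 0.00198, roughly 1 in 505 hands

    # the shuffle-based count agrees with the uniform-hand count:
    assert factorial(5) * factorial(47) * comb(52, 5) == factorial(52)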
12.1.3.1 Examples
What is the probability of getting two heads on independent fair
coin flips? Calculate it directly from the definition of independence:
Pr[H_1 ∩ H_2] = (1/2)(1/2) = 1/4.
Suppose the coin-flips are not independent (maybe the two coins are
glued together). What is the probability of getting two heads? This
can range anywhere from zero (coin 2 always comes up the opposite
of coin 1) to 1/2 (if coin 1 comes up heads, so does coin 2).
What is the probability that both you and I draw a flush (all 5 cards
the same suit) from the same poker deck? Since we are fighting over
the same collection of same-suit subsets, we'd expect Pr[A ∩ B] ≠
Pr[A]·Pr[B]; the event that you get a flush (A) is not independent
of the event that I get a flush (B), and we'd have to calculate the
probability of both by counting all ways to draw two hands that are
both flushes. But if we put your cards back and then shuffle the deck
again, the events in this new case are independent, and we can just
square the Pr[flush] that we calculated before.
Suppose the Red Sox play the Yankees. What is the probability that
the final score is exactly 4–4? Amazingly, it appears that it is equal
to^2

    \Pr[A \cup B] = \Pr[A] + \Pr[B] - \Pr[A \cap B].

    \Pr[A] + \Pr[B] - \Pr[A \cap B]
      = (\Pr[A \cap B] + \Pr[A \cap \bar{B}])
        + (\Pr[A \cap B] + \Pr[\bar{A} \cap B]) - \Pr[A \cap B]
      = \Pr[A \cap B] + \Pr[A \cap \bar{B}] + \Pr[\bar{A} \cap B]
      = \Pr[A \cup B].
12.1.4.1 Examples
What is the probability of getting at least one head out of two
independent coin-flips? Compute Pr[H_1 ∪ H_2] = 1/2 + 1/2 - (1/2)(1/2)
= 3/4.
What is the probability of getting at least one head out of two coin-flips,
when the coin-flips are not independent? Here again we can get
any probability from 0 to 1, because the probability of getting at least
one head is just 1 - Pr[T_1 ∩ T_2].
    \Pr\left[\bigcup_{i=1}^{n} A_i\right]
      = \sum_{S \subseteq \{1 \ldots n\},\, S \neq \emptyset}
        (-1)^{|S|+1} \Pr\left[\bigcap_{j \in S} A_j\right].    (12.1.1)
For discrete probability, the proof is essentially the same as for
Theorem 11.2.2; the difference is that instead of showing that we add 1
for each possible element of \bigcap A_i, we show that we add the
probability of each outcome in \bigcap A_i. The result continues to hold
for more general spaces, but requires a little more work.^3
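For a concrete discrete case, (12.1.1) can be checked by brute force. A
sketch of mine with three events on a small uniform space (the events and
space are made up for illustration):

    # inclusion-exclusion for A_i = multiples of 2, 3, 5 among 0..11
    from itertools import combinations
    from fractions import Fraction

    outcomes = range(12)
    events = [set(x for x in outcomes if x % p == 0) for p in (2, 3, 5)]

    def pr(s):
        return Fraction(len(s), len(outcomes))

    union = pr(set.union(*events))
    formula = sum((-1) ** (len(S) + 1) * pr(set.intersection(*S))
                  for r in range(1, 4)
                  for S in combinations(events, r))
    assert union == formula
    print(union)  # 3/4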
    \Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]}.

One way to think about this is that when we assert that B occurs we
are in effect replacing the entire probability space with just the part that
sits in B. So we have to divide all of our probabilities by Pr[B] in order to
make Pr[B | B] = 1, and we have to replace A with A ∩ B to exclude the
part of A that can't happen any more.
Note also that conditioning on B only makes sense if Pr[B] > 0. If
Pr[B] = 0, Pr[A | B] is undefined.
    \Pr[A \cap B] = \Pr[A \mid B] \cdot \Pr[B].
    \Pr[A_i] = \prod_{j=1}^{i} \frac{k+j-1}{k+j}.
    \Pr[B \mid A] = \frac{\Pr[A \cap B]}{\Pr[A]}
      = \frac{\Pr[A \mid B]\,\Pr[B]}{\Pr[A]}
      = \frac{\Pr[A \mid B]\,\Pr[B]}
             {\Pr[A \mid B]\,\Pr[B] + \Pr[A \mid \bar{B}]\,\Pr[\bar{B}]}.
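A numeric instance may make the formula concrete. The numbers below are
invented for illustration: a test detects a condition (event B) with
Pr[A | B] = 0.95, has false-positive rate Pr[A | not B] = 0.05, and the
base rate is Pr[B] = 0.01:

    p_a_given_b = 0.95
    p_a_given_not_b = 0.05
    p_b = 0.01

    # denominator of Bayes's formula: total probability of a positive test
    p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)
    p_b_given_a = p_a_given_b * p_b / p_a
    print(p_b_given_a)  # about 0.161: a positive test is usually wrong here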
and 0 otherwise). There are many conventions out there for writing
indicator variables. I am partial to 1_A, but you may also see them
written using the Greek letter chi (e.g. χ_A) or by abusing the bracket
notation for events (e.g., [A], [Y^2 > 3], [all six coins come up heads]).
Counts of events: Flip a fair coin n times and let X be the number of
times it comes up heads. Then X is an integer-valued random variable.
Examples
Let X and Y be six-sided dice. Then Pr[X = x ∧ Y = y] = 1/36 for
all values of x and y in {1, 2, 3, 4, 5, 6}. The underlying probability
space consists of all pairs (x, y) in {1, 2, 3, 4, 5, 6} × {1, 2, 3, 4, 5, 6}.
12.2.3.1 Examples
Roll two six-sided dice, and let X and Y be the values of the dice. By
convention we assume that these values are independent. This means
for example that Pr[X ∈ {1, 2, 3} ∧ Y ∈ {1, 2, 3}] = Pr[X ∈ {1, 2, 3}] ·
Pr[Y ∈ {1, 2, 3}] = (1/2)(1/2) = 1/4, which is a slightly easier
computation than counting up the 9 cases (and then arguing that each
occurs with probability (1/6)^2, which requires knowing that X and Y
are independent).
Take the same X and Y, and let Z = X + Y. Now Z and X are not
independent, because Pr[X = 1 ∧ Z = 12] = 0, which is not equal to
Pr[X = 1] · Pr[Z = 12] = (1/6)(1/36) = 1/216.
Place two radioactive sources on opposite sides of the Earth, and let
X and Y be the number of radioactive decay events in each source
during some 10 millisecond interval. Since the sources are 42
milliseconds away from each other at the speed of light, we can assert
that either X and Y are independent, or the world doesn't behave the way
the physicists think it does. This is an example of variables being
independent because they are physically independent.
Roll one six-sided die X, and let Y = ⌈X/2⌉ and Z = X mod 2. Then
Y and Z are independent, even though they are generated using the
same physical process.
Example (discrete variable): Let X be the number rolled with a fair
six-sided die. Then E[X] = (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 3 1/2.
5. Technically, this will work for any values we can add and multiply by
probabilities. So if X is actually a vector in R^3 (for example), we can
talk about the expectation of X, which in some sense will be the average
position of the location given by X.
    = a E[X] + E[Y].

Linearity of expectation makes computing many expectations easy.
Example: Flip a fair coin n times, and let X be the number of heads. What
is E[X]? We can solve this problem by letting X_i be the indicator variable
for the event "coin i came up heads". Then X = \sum_{i=1}^{n} X_i and
E[X] = \sum_{i=1}^{n} E[X_i] = n/2. We could calculate the same value from
the distribution of X (this involves a lot of binomial coefficients), but
linearity of expectation is much easier.
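A simulation (mine, not the notes') shows the indicator-variable answer in
action; the empirical average should be close to n/2:

    # average number of heads in n fair flips, estimated by simulation
    import random

    n, trials = 20, 100_000
    total = sum(sum(random.randrange(2) for _ in range(n))
                for _ in range(trials))
    print(total / trials)  # close to n/2 = 10.0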
For example: Roll two dice and take their product. What value do we
get on average? The product formula gives E[XY] = E[X] E[Y] = (7/2)^2 =
49/4 = 12 1/4. We could also calculate this directly by summing over all 36
cases, but it would take a while.
Alternatively, roll one die and multiply it by itself. Now what value do
we get on average? Here we are no longer dealing with independent random
variables, so we have to do it the hard way:
E[X^2] = (1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2)/6 = 91/6.
This is exactly the same as ordinary expectation except that the proba-
bilities are now all conditioned on A.
Examples
term on the right-hand side can only make it smaller. This gives:

    \Pr[X \geq a \cdot E[X]] \leq 1/a.
12.2.5.1 Example
Suppose that all you know about the high tide height X is that E[X] =
1 meter and X ≥ 0. What can we say about the probability that X >
2 meters? Using Markov's inequality, we get Pr[X > 2 meters] =
Pr[X > 2 E[X]] ≤ 1/2.
This version doesn't get anywhere near as much use as the unconditioned
version, but it may be worth remembering that it exists.
Example: Let X be the value of a fair six-sided die. Then E[X] = 7/2, and

    E[(X - E[X])^2] = \frac{1}{6}\left((1 - 7/2)^2 + (2 - 7/2)^2
      + (3 - 7/2)^2 + \cdots + (6 - 7/2)^2\right) = 35/12.
Computing variance directly from the definition can be tedious. Often
it is easier to compute it from E[X^2] and E[X]:

    Var[X] = E[(X - E[X])^2]
           = E[X^2 - 2X E[X] + (E[X])^2]
           = E[X^2] - 2 E[X] E[X] + (E[X])^2
           = E[X^2] - (E[X])^2.

The second-to-last step uses linearity of expectation and the fact that
E[X] is a constant.
Example: For X being 0 or 1 with equal probability, we have E[X^2] = 1/2
= E[X], so Var[X] = 1/2 - (1/2)^2 = 1/4.
Example: Flip a fair coin n times, and let X be the number of heads. What
is the probability that |X - n/2| > r? Recall that Var[X] = n/4, so
Pr[|X - n/2| > r] < (n/4)/r^2 = n/(4r^2). So, for example, the chances
of deviating from the average by more than 1000 after 1000000 coin-flips
is less than 1/4.
Note that the bound decreases as k grows and (for fixed p) does not
depend on n.
12.2.7.1 Sums
A very useful property of pgfs is that the pgf of a sum of independent
random variables is just the product of the pgfs of the individual random
variables. The reason for this is essentially the same as for ordinary
generating functions: when we multiply together two terms
(Pr[X = n] z^n)(Pr[Y = m] z^m), we get Pr[X = n ∧ Y = m] z^{n+m}, and the
sum over all the different ways of decomposing n + m gives all the
different ways to get this sum.
So, for example, the pgf of a binomial random variable equal to the sum
of n independent Bernoulli random variables is (q + pz)^n (hence the name
binomial).
So

    F'(1) = \sum_{n=0}^{\infty} n \Pr[X = n] = E[X].
    Var[X] = F''(1) + F'(1) - (F'(1))^2.
Example: If X is a Bernoulli random variable with pgf F = (q + pz), then
F' = p and F'' = 0, giving E[X] = F'(1) = p and Var[X] = F''(1) +
F'(1) - (F'(1))^2 = 0 + p - p^2 = p(1 - p) = pq.
Example: If X is a binomial random variable with pgf F = (q + pz)^n,
then F' = n(q + pz)^{n-1} p and F'' = n(n-1)(q + pz)^{n-2} p^2, giving
E[X] = F'(1) = np and Var[X] = F''(1) + F'(1) - (F'(1))^2 =
n(n-1)p^2 + np - n^2 p^2 = np - np^2 = npq. These values would, of course,
be a lot faster to compute using the formulas for sums of independent
random variables, but it's nice to see that they work.
Example: If X is a geometric random variable with pgf p/(1 - qz), then
F' = pq/(1 - qz)^2 and F'' = 2pq^2/(1 - qz)^3. So E[X] = F'(1) =
pq/(1 - q)^2 = pq/p^2 = q/p, and Var[X] = F''(1) + F'(1) - (F'(1))^2 =
2pq^2/(1 - q)^3 + q/p - q^2/p^2 = 2q^2/p^2 + q/p - q^2/p^2 = q^2/p^2 + q/p.
The variance would probably be a pain to calculate by hand.
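Rather than calculate it by hand, we can let a computer algebra system do
the differentiating. A sketch of mine, assuming sympy is available:

    # geometric-distribution moments via pgf derivatives
    from sympy import symbols, diff, simplify

    p, z = symbols('p z', positive=True)
    q = 1 - p
    F = p / (1 - q * z)
    EX = diff(F, z).subs(z, 1)
    Var = diff(F, z, 2).subs(z, 1) + EX - EX**2
    print(simplify(EX))   # (1 - p)/p, i.e., q/p
    print(simplify(Var))  # (1 - p)/p**2, i.e., q^2/p^2 + q/p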
Example: Let X be a Poisson random variable with rate λ. We claimed
earlier that a Poisson random variable is the limit of a sequence of
binomial random variables where p = λ/n and n goes to infinity, so
(cheating quite a bit) we expect that X's pgf is

    F = \lim_{n\to\infty} \left((1 - \lambda/n) + (\lambda/n)z\right)^n
      = \lim_{n\to\infty} \left(1 + \frac{-\lambda + \lambda z}{n}\right)^n
      = e^{-\lambda + \lambda z}
      = e^{-\lambda} \sum_n \frac{\lambda^n z^n}{n!}.
    E[X - Y] = E[X + (-Y)] = E[X] + E[-Y] = E[X] - E[Y],

and

    Var[X - Y] = Var[X + (-Y)]
               = Var[X] + Var[-Y] + 2 Cov[X, -Y]
               = Var[X] + Var[Y] - 2 Cov[X, Y].
12.2.9.1 Densities
If a real-valued random variable is continuous in the sense of having a
distribution function with no jumps (which means that it has probability 0 of
landing on any particular value), we may be able to describe its distribution
by giving a density instead. The density is the derivative of the distribution
function. We can also think of it as a probability at each point defined in the
limit, by taking smaller and smaller regions around the point and dividing
the probability of landing in the region by the size of the region.
For example, the density of a uniform [0, 1] random variable is f(x) = 1
for x in [0, 1], and f(x) = 0 otherwise. For a uniform [0, 2] random
variable, we get a density of 1/2 throughout the [0, 2] interval. The
density always integrates to 1.
Some distributions are easier to describe using densities than using distri-
bution functions. The normal distribution, which is of central importance
in statistics, has density
    \frac{1}{\sqrt{2\pi}} e^{-x^2/2}.
Its distribution function is the integral of this quantity, which has no
closed-form expression.
Joint densities also exist. The joint density of a pair of random
variables with joint distribution function F(x, y) is given by the partial
derivative f(x, y) = \frac{\partial^2}{\partial x\,\partial y} F(x, y).
The intuition here again is that we are approximating the (zero)
probability at a point by taking the probability of a small region around
the point and dividing by the area of the region.
12.2.9.2 Independence
Independence is the same as for discrete random variables: Two random
variables X and Y are independent if any pair of events of the form
X ∈ A, Y ∈ B are independent. For real-valued random variables it is
enough to
show that their joint distribution F (x, y) is equal to the product of their in-
dividual distributions FX (x)FY (y). For real-valued random variables with
densities, showing the densities multiply also works. Both methods gener-
alize in the obvious way to sets of three or more random variables.
12.2.9.3 Expectation
If a continuous random variable has a density f(x), the formula for its
expectation is

    E[X] = \int x f(x)\,dx.

For example, let X be a uniform random variable in the range [a, b].
Then f(x) = \frac{1}{b-a} when a ≤ x ≤ b and 0 otherwise, giving

    E[X] = \int_a^b \frac{x}{b-a}\,dx
         = \left.\frac{x^2}{2(b-a)}\right|_{x=a}^{b}
         = \frac{b^2 - a^2}{2(b-a)}
         = \frac{a+b}{2}.
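The integral is also a one-liner to verify symbolically. A sketch of mine,
assuming sympy is available:

    # expectation of a uniform [a, b] random variable
    from sympy import symbols, integrate, simplify

    x, a, b = symbols('x a b')
    EX = integrate(x / (b - a), (x, a, b))
    print(simplify(EX))  # (a + b)/2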
For continuous random variables without densities, we land in a rather
swampy end of integration theory. We will not talk about this case if we can
help it. But in each case the expectation depends only on the distribution
of X and not on its relationship to other random variables.
Chapter 13
Linear algebra
[Figure: vector addition in the plane, showing x = ⟨3, -1⟩, y = ⟨1, 2⟩,
and x + y = ⟨4, 1⟩ drawn from the origin ⟨0, 0⟩.]
1. Yargh! Start at the olde hollow tree on Dead Man's Isle, if ye dare.

    1.   ⟨0, 0, 0⟩
    2. + ⟨10, 0, 0⟩
    3. + ⟨0, 5, 0⟩
    4. + ⟨-20, 0, 0⟩
    5. + ⟨6, -6, 0⟩
    6. + ⟨0, 0, 8⟩
    7. + ⟨0, 0, -6⟩

which sums to ⟨-4, -1, 2⟩. So we can make our life easier by walking 4
paces south, 1 pace west, and digging only 2 paces down.
13.1.2 Scaling
Vectors may also be scaled by multiplying each of their coordinates by an
element of the base field, called a scalar. For example, if x = ⟨-4, -1, 2⟩
is the number of paces north, east, and down from the olde hollow tree to
the treasure in the previous example, we can scale x by 2 to get the number
of paces for Short-Legged Pete. This gives

    2 · ⟨-4, -1, 2⟩ = ⟨-8, -2, 4⟩.
13.3 Matrices
We've seen that a sequence a_1, a_2, ..., a_n is really just a function
from some index set ({1...n} in this case) to some codomain, where
a_i = a(i) for each i. What if we have two index sets? Then we have a
two-dimensional structure:

    A = \begin{pmatrix} A_{11} & A_{12} \\
                        A_{21} & A_{22} \\
                        A_{31} & A_{32} \end{pmatrix}
where Aij = a(i, j), and the domain of the function is just the cross-product
of the two index sets. Such a structure is called a matrix. The values Aij
are called the elements or entries of the matrix. A sequence of elements
with the same first index is called a row of the matrix; similarly, a sequence
of elements with the same second index is called a column. The dimension
of the matrix specifies the number of rows and the number of columns: the
matrix above has dimension (3, 2), or, less formally, it is a 3 × 2
matrix.^3 A matrix is square if it has the same number of rows and columns.
Note: The convention in matrix indices is to count from 1 rather than
0. In programming language terms, matrices are written in FORTRAN.
3
The convention for both indices and dimension is that rows come before columns.
13.3.1 Interpretation
We can use a matrix any time we want to depict a function of two arguments
(over small finite sets if we want it to fit on one page). A typical example
(that predates the formal notion of a matrix by centuries) is a table of
distances between cities or towns, such as this example from 1807:^4
[Figure: a printed table of distances between towns, from an 1807 book.]
Because distance matrices are symmetric (see below), usually only half
of the matrix is actually printed.
Another example would be a matrix of counts. Suppose we have a set
of destinations D and a set of origins O. For each pair (i, j) D O, let
Cij be the number of different ways to travel from j to i. For example, let
origin 1 be Bass Library, origin 2 be AKW, and let destinations 1, 2, and
3 be Bass, AKW, and SML. Then there is 1 way to travel between Bass
and AKW (walk), 1 way to travel from AKW to SML (walk), and 2 ways
to travel from Bass to SML (walk above-ground or below-ground). If we
assume that we are not allowed to stay put, there are 0 ways to go from
Bass to Bass or AKW to AKW, giving the matrix
    C = \begin{pmatrix} 0 & 1 \\
                        1 & 0 \\
                        2 & 1 \end{pmatrix}
4
The original image is taken from http://www.hertfordshire-genealogy.co.uk/
data/books/books-3/book-0370-cooke-1807.htm. As an exact reproduction of a public
domain document, this image is not subject to copyright in the United States.
If a matrix is equal to its own transpose (i.e., if A_{ij} = A_{ji} for
all i and j), it is said to be symmetric. The transpose of an n × m matrix
is an m × n matrix, so only square matrices can be symmetric.
One special matrix I (for each dimension n × n) has the property that
IA = A and BI = B for all matrices A and B with compatible dimension.
This matrix is known as the identity matrix, and is defined by the rule
I_{ii} = 1 and I_{ij} = 0 for i ≠ j. It is not hard to see that in this
case (IA)_{ij} = \sum_k I_{ik} A_{kj} = I_{ii} A_{ij} = A_{ij}, giving
IA = A; a similar computation shows that BI = B. With a little more effort
(omitted here) we can show that I is the unique matrix with this identity
property.
all the entries above the diagonal. The only way this can fail is if we hit
some Aii = 0, which we can swap with a nonzero Aji if one exists (using
a type (c) operation). If all the rows from i on down have a zero in the i
column, then the original matrix A is not invertible. This entire process is
known as Gauss-Jordan elimination.
This procedure can be used to solve matrix equations: if AX = B, and
we know A and B, we can compute X by first computing A^{-1} and then
multiplying X = A^{-1}AX = A^{-1}B. If we are not interested in A^{-1} for
its own sake, we can simplify things by substituting B for I during the
Gauss-Jordan elimination procedure; at the end, it will be transformed to
X.
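In practice one would hand the elimination to a library. A sketch of mine
(assuming numpy is available; the matrices are made up for illustration):

    # solve AX = B; numpy.linalg.solve does the elimination internally
    import numpy as np

    A = np.array([[2.0, 1.0], [1.0, 3.0]])
    B = np.array([[5.0], [10.0]])
    X = np.linalg.solve(A, B)      # same X as A^{-1} B, but cheaper
    print(X.ravel())               # [1. 3.]
    print(np.allclose(A @ X, B))   # True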
Compute (A(BC))_{ij} = \sum_k A_{ik}(BC)_{kj} =
\sum_k \sum_m A_{ik} B_{km} C_{mj}. Then compute ((AB)C)_{ij} =
\sum_m (AB)_{im} C_{mj} = \sum_m \sum_k A_{ik} B_{km} C_{mj} =
\sum_k \sum_m A_{ik} B_{km} C_{mj} = (A(BC))_{ij}.
So despite the limitations due to non-compatibility and non-commutativity,
we still have:
Transposes: (A + B)^⊤ = A^⊤ + B^⊤ (easy), (AB)^⊤ = B^⊤ A^⊤ (a little
trickier). (A^{-1})^⊤ = (A^⊤)^{-1}, provided A^{-1} exists (proof:
A^⊤ (A^{-1})^⊤ = (A^{-1} A)^⊤ = I^⊤ = I).
(A + B)2 = (A + B)(A + B) = A2 + AB + BA + B 2 .
    S = I + AS
    S - AS = I
    (I - A)S = I

and finally multiplying both sides from the left by (I - A)^{-1} to get

    S = (I - A)^{-1},

assuming I - A is invertible.
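When the powers of A shrink fast enough for the series to converge, this
identity can be checked numerically. A sketch of mine (numpy assumed, A
chosen small so that the truncated series is accurate):

    # S = I + A + A^2 + ... should approximate (I - A)^{-1}
    import numpy as np

    A = np.array([[0.2, 0.1], [0.0, 0.3]])
    S = sum(np.linalg.matrix_power(A, k) for k in range(60))
    print(np.allclose(S, np.linalg.inv(np.eye(2) - A)))  # True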
13.4.1 Length
The length of a vector x, usually written as ‖x‖ or sometimes just |x|,
is defined as \sqrt{\sum_i x_i^2}; the definition follows from the
Pythagorean theorem: ‖x‖^2 = \sum_i x_i^2. Because the coordinates are
squared, all vectors have non-negative length, and only the zero vector
has length 0.
Length interacts with scalar multiplication exactly as you would expect:
‖cx‖ = |c| · ‖x‖. The length of the sum of two vectors depends on how they
are aligned with each other, but the triangle inequality
‖x + y‖ ≤ ‖x‖ + ‖y‖ always holds.
A special class of vectors are the unit vectors, those vectors x for
which ‖x‖ = 1. In geometric terms, these correspond to all the points on
the surface of a radius-1 sphere centered at the origin. Any vector x can
be turned into a unit vector x/‖x‖ by dividing by its length. In two
dimensions, the unit vectors are all of the form [cos θ, sin θ]^⊤, where by
convention θ is the angle from due east measured counterclockwise; this is
why traveling 9 units northwest corresponds to the vector
9 · [cos 135°, sin 135°]^⊤ = [-9/√2, 9/√2]^⊤. In one dimension, the unit
vectors are [±1]. (There are no unit vectors in zero dimensions: the
unique zero-dimensional vector has length 0.)
13.5.1 Bases
If a set of vectors is both (a) linearly independent, and (b) spans the
entire vector space, then we call that set of vectors a basis of the
vector space. An example of a basis is the standard basis consisting of
the vectors [1 0 ... 0 0]^⊤, [0 1 ... 0 0]^⊤, ..., [0 0 ... 1 0]^⊤,
[0 0 ... 0 1]^⊤. This has the additional nice property of being made up
of vectors that are all orthogonal to each other (making it an orthogonal
basis) and of unit length (making it a normal basis).
A basis that is both orthogonal and normal is called orthonormal.
We like orthonormal bases because we can recover the coefficients of some
arbitrary vector v by taking dot-products. If v = \sum a_i x_i, then
v · x_j = \sum_i a_i (x_i · x_j) = a_j, since orthogonality means that
x_i · x_j = 0 when i ≠ j, and normality means that x_j · x_j = 1.
Theorem 13.5.1. If {x_i} is a basis for some vector space V, then every
vector y has a unique representation y = a_1 x_1 + a_2 x_2 + ⋯ + a_n x_n.
Proof. Suppose there is some y with more than one representation, i.e.,
there are sequences of coefficients a_i and b_i such that
y = a_1 x_1 + a_2 x_2 + ⋯ + a_n x_n = b_1 x_1 + b_2 x_2 + ⋯ + b_n x_n.
Then 0 = y - y = a_1 x_1 + a_2 x_2 + ⋯ + a_n x_n -
(b_1 x_1 + b_2 x_2 + ⋯ + b_n x_n) = (a_1 - b_1)x_1 + (a_2 - b_2)x_2 + ⋯ +
(a_n - b_n)x_n. But since the x_i are independent, the only way a linear
combination of the x_i can equal 0 is if all coefficients are 0, i.e., if
a_i = b_i for all i.
Even better, we can do all of our usual vector space arithmetic in terms
of the coefficients a_i. For example, if a = \sum a_i x_i and
b = \sum b_i x_i, then it can easily be verified that
a + b = \sum (a_i + b_i) x_i and ca = \sum (c a_i) x_i.
However, it may be the case that the same vector will have different
representations in different bases. For example, in R^2, we could have a
basis B_1 = {(1, 0), (0, 1)} and a basis B_2 = {(1, 0), (1, 2)}. Because
B_1 is the standard basis, the vector (2, 3) is represented as just (2, 3)
using basis B_1, but it is represented as (1/2, 3/2) in basis B_2.
Both bases above have the same size. This is not an accident; if a vector
space has a finite basis, then all bases have the same size. Well state this
as a theorem, too:
Theorem 13.5.2. Let x1 . . . xn and y1 . . . ym be two finite bases of the same
vector space V . Then n = m.
Proof. Assume without loss of generality that n ≤ m. We will show how
to replace elements of the x_i basis with elements of the y_i basis to
produce a new basis consisting only of y_1 ... y_n. Start by considering
the sequence y_1, x_1 ... x_n. This sequence is not independent since y_1
can be expressed as a linear combination of the x_i (they're a basis). So
from Theorem 13.5.1 there is some x_i that can be expressed as a linear
combination of y_1, x_1 ... x_{i-1}. Swap this x_i out to get a new
sequence y_1, x_1 ... x_{i-1}, x_{i+1}, ... x_n. This new sequence is also
a basis, because (a) any z can be expressed as a linear combination of
these vectors by substituting the expansion of x_i into the expansion of z
in the original basis, and (b) it's independent, because if there is some
nonzero linear combination that produces 0 we can substitute the expansion
of x_i to get a nonzero linear combination of the original basis that
produces 0 as well. Now continue by constructing the sequence
y_2, y_1, x_1 ... x_{i-1}, x_{i+1}, ... x_n, and arguing that some x_{i'}
in this sequence must be expressible as a combination of earlier terms by
Theorem 13.5.1 (it can't be y_1 because then y_2, y_1 is not independent),
and drop this x_{i'}. By repeating this process we can eventually eliminate
all the x_i, leaving the basis y_n, ..., y_1. But then any y_k for k > n
would be a linear combination of this basis, so we must have m = n.
The size of any basis of a vector space is called the dimension of the
space.
Proof. We'll use the following trick for extracting entries of a matrix
by multiplication. Let M be an n × m matrix, and let e^i be a column
vector with e^i_j = 1 if i = j and 0 otherwise.^7 Now observe that
(e^i)^⊤ M e^j = \sum_k e^i_k (M e^j)_k = (M e^j)_i = \sum_k M_{ik} e^j_k
= M_{ij}. So given a particular linear f, we will now define M by the rule
M_{ij} = (e^i)^⊤ f(e^j). It is not hard to see that this gives
f(e^j) = M e^j for each basis vector j, since multiplying by (e^i)^⊤ grabs
the i-th coordinate in each case. To show that Mx = f(x) for all x,
decompose each x as \sum_k c_k e^k. Now compute f(x) = f(\sum_k c_k e^k) =
\sum_k c_k f(e^k) = \sum_k c_k M e^k = M(\sum_k c_k e^k) = Mx.
13.6.1 Composition
What happens if we compose two linear transformations? We multiply the
corresponding matrices:
(g f )(x) = g(f (x)) = g(Mf x) = Mg (Mf x) = (Mg Mf )x.
This gives us another reason why the dimensions have to be compatible
to take a matrix product: If multiplying by an n m matrix A gives a map
g : Rm Rn , and multiplying by a k l matrix B gives a map f : Rl Rk ,
then the composition g f corresponding to AB only works if m = k.
The set {M x} for all x is thus equal to the span of the columns of M ;
it is called the column space of M .
For yM , where y is a row vector, similar properties hold: we can think
of yM either as a row vector of dot-products of y with columns of M or as
a weighted sum of rows of M ; the proof follows immediately from the above
facts about a product of a matrix and a column vector and the fact that
yM = (M > y > )> . The span of the rows of M is called the row space of M ,
and equals the set {yM } of all results of multiplying a row vector by M .
Note that in all of these transformations, the origin stays in the same
place. If you want to move an image, you need to add a vector to
everything. This gives an affine transformation, which is any
transformation that can be written as f(x) = Ax + b for some matrix A and
column vector b. One nifty thing about affine transformations is that,
like linear transformations, they compose to produce new transformations
of the same kind: A(Cx + d) + b = (AC)x + (Ad + b).
Many two-dimensional linear transformations have standard names. The
simplest transformation is scaling, where each axis is scaled by a constant,
but the overall orientation of the image is preserved. In the picture above,
the top right image is scaled by the same constant in both directions and
the second-from-the-bottom image is scaled differently in each direction.
Recall that the product M x corresponds to taking a weighted sum of
the columns of M , with the weights supplied by the coordinates of x. So in
Here the x vector is preserved: (1, 0) maps to the first column (1, 0), but
the y vector is given a new component in the x direction of c, corresponding
to the shear. If we also flipped or scaled the image at the same time that
we sheared it, we could represent this by putting values other than 1 on the
diagonal.
For a rotation, we will need some trigonometric functions to compute the
new coordinates of the axes as a function of the angle we rotate the image
by. The convention is that we rotate counterclockwise: so in the figure
above, the rotated image is rotated counterclockwise approximately 315°
or -45°. If θ is the angle of rotation, the rotation matrix is given by

    \begin{pmatrix} \cos\theta & -\sin\theta \\
                    \sin\theta & \cos\theta \end{pmatrix}.
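A one-line check of the convention (my sketch, numpy assumed): rotating
the unit x vector by 90° counterclockwise should give the unit y vector.

    import numpy as np

    theta = np.pi / 2
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    print(R @ np.array([1.0, 0.0]))  # approximately [0, 1]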
2. Suppose n ≠ m. Pick any basis e^i for R^n, and observe that {f(e^i)}
spans range(f) (since we can always decompose x as \sum a_i e^i to get
f(x) = \sum a_i f(e^i)). So the dimension of range(f) is at most n. If
n < m, then range(f) is a proper subset of R^m (otherwise it would
be m-dimensional). This implies f is not surjective and thus has no
inverse. Alternatively, if m < n, use the same argument to show that
any claimed f^{-1} isn't. By the same argument, if either f or f^{-1} does
not have full rank, it's not surjective.
f, since f(\sum a_i x_i) = \sum a_i f(x_i) = \sum a_i e^i.
13.6.5 Projections
Suppose we are given a low-dimensional subspace of some high-dimensional
space (e.g. a line (dimension 1) passing through a plane (dimension 2)), and
we want to find the closest point in the subspace to a given point in the
full space. The process of doing this is called projection, and essentially
consists of finding some point z such that (x - z) is orthogonal to any vector
in the subspace.
Lets look at the case of projecting onto a line first, then consider the
more general case.
A line consists of all points that are scalar multiples of some fixed vector
b. Given any other vector x, we want to extract all of the parts of x that lie
in the direction of b and throw everything else away. In particular, we want
to find a vector y = cb for some scalar c, such that (x - y) · b = 0. This
is enough information to solve for c.
We have (x - cb) · b = 0, so x · b = c(b · b) or c = (x · b)/(b · b). So
the projection of x onto the subspace {cb | c ∈ R} is given by
y = b(x · b)/(b · b) or y = b(x · b)/‖b‖^2. If b is normal (i.e. if
‖b‖ = 1), then we can leave out the denominator; this is one reason we
like orthonormal bases so much.
Why is this the right choice to minimize distance? Suppose we pick some
other vector db instead. Then the points x, cb, and db form a right
triangle with the right angle at cb, and the distance from x to db is
‖x - db‖ = \sqrt{‖x - cb‖^2 + ‖cb - db‖^2} ≥ ‖x - cb‖.
But now what happens if we want to project onto a larger subspace?
For example, suppose we have a point x in three dimensions and we want to
project it onto some plane of the form {c_1 b_1 + c_2 b_2}, where b_1 and
b_2 span the plane. Here the natural thing to try is to send x to
y = b_1 (x · b_1)/‖b_1‖^2 + b_2 (x · b_2)/‖b_2‖^2. We then want to argue
that the vector (x - y) is orthogonal to any vector of the form
c_1 b_1 + c_2 b_2. As before, if (x - y) is orthogonal to any vector in
the plane, it's orthogonal to the difference between the y we picked and
some other z we didn't pick, so the right-triangle argument again shows it
gives the shortest distance.
Does this work? Let's calculate: (x - y) · (c_1 b_1 + c_2 b_2) =
x · (c_1 b_1 + c_2 b_2) - (b_1 (x · b_1)/‖b_1‖^2 +
b_2 (x · b_2)/‖b_2‖^2) · (c_1 b_1 + c_2 b_2) =
c_1 (x · b_1 - (b_1 · b_1)(x · b_1)/(b_1 · b_1)) +
c_2 (x · b_2 - (b_2 · b_2)(x · b_2)/(b_2 · b_2)) -
c_1 (b_1 · b_2)(x · b_2)/(b_2 · b_2) -
c_2 (b_1 · b_2)(x · b_1)/(b_1 · b_1).
The first two terms cancel out very nicely, just as in the one-dimensional
case, but then we are left with a nasty (b_1 · b_2)(much horrible junk)
term at the end. It didn't work!
So what do we do? We could repeat our method for the one-dimensional
case and solve for c_1 and c_2 directly. This is probably a pain in the
neck. Or we can observe that the horrible extra term includes a
(b_1 · b_2) factor, and if b_1 and b_2 are orthogonal, it disappears. The
moral: We can project onto a subspace spanned by orthogonal vectors by
projecting onto each of them separately and summing the results.
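The orthogonal case is simple enough to check directly. A sketch of mine
(numpy assumed; b_1, b_2, and x are made-up examples with b_1 · b_2 = 0):

    # project x onto the plane spanned by the orthogonal pair b1, b2
    import numpy as np

    b1 = np.array([1.0, 0.0, 0.0])
    b2 = np.array([0.0, 1.0, 1.0])   # orthogonal to b1
    x = np.array([2.0, 3.0, 5.0])

    y = b1 * (x @ b1) / (b1 @ b1) + b2 * (x @ b2) / (b2 @ b2)
    print(y)                                   # [2. 4. 4.]
    # the residual is orthogonal to both spanning vectors:
    print((x - y) @ b1, (x - y) @ b2)          # 0.0 0.0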
Chapter 14

Finite fields
means that:
1. Addition is associative: (x + y) + z = x + (y + z) for all x, y, z in F .
2. There is an additive identity 0 such that 0 + x = x + 0 = x for all
x in F .
3. Every x in F has an additive inverse -x such that x + (-x) =
(-x) + x = 0.
4. Addition is commutative: x + y = y + x for all x, y in F .
5. Multiplication distributes over addition: x · (y + z) = (x · y + x · z)
and (y + z) · x = (y · x + z · x) for all x, y, z in F.
6. Multiplication is associative: (x · y) · z = x · (y · z) for all
x, y, z in F.
7. There is a multiplicative identity 1 such that 1 · x = x · 1 = x for
all x in F.
8. Multiplication is commutative: x · y = y · x for all x, y in F.
9. Every x in F \ {0} has a multiplicative inverse x^{-1} such that
x · x^{-1} = x^{-1} · x = 1.
Some structures fail to satisfy all of these axioms but are still
interesting enough to be given names. A structure that satisfies 1–3 is
called a group; 1–4 is an abelian group or commutative group; 1–7 is a
ring; 1–8 is a commutative ring. In the case of groups and abelian groups
there is only one operation +. There are also more exotic names for
structures satisfying other subsets of the axioms.^3
Some examples of fields: R, Q, C, Zp where p is prime. We will be par-
ticularly interested in Zp , since we are looking for finite fields that can fit
inside a computer.
The integers Z are an example of a commutative ring, as is Zm for
m > 1. Square matrices of fixed dimension greater than 1 are an example
of a non-commutative ring.
3
A set with one operation that does not necessarily satisfy any axioms is a magma.
If the operation is associative, it's a semigroup, and if there is also an
identity (but not necessarily inverses), it's a monoid. For example, the
set of nonempty strings with +
interpreted as concatenation form a semigroup, and throwing in the empty string as well
gives a monoid.
Weaker versions of rings knock out the multiplicative identity (a pseudo-ring or rng)
or negation (a semiring or rig). An example of a semiring that is actually useful is the
(max, +) semiring, which uses max for addition and + (which distributes over max) for
multiplication; this turns out to be handy for representing scheduling problems.
    ·     | 0 | 1   | x   | x+1
    ------+---+-----+-----+-----
    0     | 0 | 0   | 0   | 0
    1     | 0 | 1   | x   | x+1
    x     | 0 | x   | x+1 | 1
    x+1   | 0 | x+1 | 1   | x
We can see that every nonzero element has an inverse by looking for ones
in the table; e.g. 1 · 1 = 1 means 1 is its own inverse and
x(x+1) = x^2 + x = 1 means that x and x + 1 are inverses of each other.
Here's the same thing for Z_2[x]/(x^3 + x + 1):

    ·        | 0 | 1        | x        | x+1      | x^2      | x^2+1    | x^2+x    | x^2+x+1
    ---------+---+----------+----------+----------+----------+----------+----------+---------
    0        | 0 | 0        | 0        | 0        | 0        | 0        | 0        | 0
    1        | 0 | 1        | x        | x+1      | x^2      | x^2+1    | x^2+x    | x^2+x+1
    x        | 0 | x        | x^2      | x^2+x    | x+1      | 1        | x^2+x+1  | x^2+1
    x+1      | 0 | x+1      | x^2+x    | x^2+1    | x^2+x+1  | x^2      | 1        | x
    x^2      | 0 | x^2      | x+1      | x^2+x+1  | x^2+x    | x        | x^2+1    | 1
    x^2+1    | 0 | x^2+1    | 1        | x^2      | x        | x^2+x+1  | x+1      | x^2+x
    x^2+x    | 0 | x^2+x    | x^2+x+1  | 1        | x^2+1    | x+1      | x        | x^2
    x^2+x+1  | 0 | x^2+x+1  | x^2+1    | x        | 1        | x^2+x    | x^2      | x+1
14.5 Applications
So what are these things good for?
On the one hand, given an irreducible polynomial p(x) of degree n over
Z_2, it's easy to implement arithmetic in Z_2[x]/p(x) (and thus GF(2^n))
using standard-issue binary integers. The trick is to represent each
polynomial \sum a_i x^i by the integer value a = \sum a_i 2^i, so that each
coefficient a_i is just the i-th bit of a. Adding two polynomials a + b
represented in this way corresponds to computing the bitwise exclusive or
of a and b: a^b in programming languages that inherit their arithmetic
syntax from C (i.e., almost everything except Scheme). Multiplying
polynomials is more involved, although it's easy for some special cases
like multiplying by x, which becomes a left-shift (a<<1) followed by
XORing with the representation of our modulus if we get a 1 in the n-th
place. (The general case is like this but involves doing XORs of a lot of
left-shifted values, depending on the bits in the polynomial we are
multiplying by.)
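The general case sketched above fits in a few lines. This is my own
illustration of the shift-and-XOR idea, not code from the notes:

    # multiply a and b as polynomials over Z_2, reducing mod m (degree n)
    def gf_mul(a, b, m, n):
        result = 0
        while b:
            if b & 1:            # this bit of b contributes a * x^k
                result ^= a
            b >>= 1
            a <<= 1              # multiply a by x
            if a & (1 << n):     # got a 1 in the n-th place: reduce mod m
                a ^= m
        return result

    # in Z_2[x]/(x^3 + x + 1): (x^2+x+1)(x^2+1) = x^2+x, i.e. 7 * 5 = 6
    print(gf_mul(7, 5, 0b1011, 3))  # 6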
On the other hand, knowing that we can multiply 7 = x^2 + x + 1 by
5 = x^2 + 1 and get 6 = x^2 + x quickly using C bit operations doesn't
help us much if this product doesn't mean anything. For modular arithmetic
(§8.4), we at least have the consolation that 7 · 5 = 6 (mod 29) tells us
something about remainders. In GF(2^3), what this means is much more
mysterious. This makes it useful not in contexts where we want
multiplication to make sense, but in contexts where we don't. These mostly
come up in random number generation and cryptography.
14.5.2 Checksums
Shifting an LFSR corresponds to multiplying by x. If we also add 1 from
time to time, we can build any polynomial we like, and get the remainder
mod m; for example, to compute the remainder of 100101 mod 11001 we do
10010 (shift in 0)
1011 (XOR with 11001)
10111 (shift in 1)
1110 (XOR with 11001)
14.5.3 Cryptography
GF (2n ) can also substitute for Zp in some cryptographic protocols. An
example would be the function f (s) = xs (mod m), which is fairly easy to
compute in Zp and even easier to compute in GF (2n ), but which seems to
be hard to invert in both cases. Here we can take advantage of the fast
remainder operation provided by LFSRs to avoid having to do expensive
division in Z.
Appendix A
Sample assignments
These are sample assignments from the Fall 2013 version of CPSC 202.
1. Your name.
(You will not be graded on the bureaucratic part, but you should do it
anyway.)
A.1.1 Tautologies
Show that each of the following propositions is a tautology using a truth
table, following the examples in 2.2.2. Each of your truth tables should
include columns for all sub-expressions of the proposition.
1. (P → ¬P) → ¬P.
2. P ∨ (Q → (P ∨ Q)).
3. (P ∨ Q) ↔ (Q ∨ (P ↔ (Q → R))).
Solution
For each solution, we give the required truth-table solution first, and then
attempt to give some intuition for why it works. The intuition is merely an
explanation of what is going on and is not required for your solutions.
This is a little less intuitive than the first case. A reasonable story
might be that the proposition is true if P is true, so for it to be false,
P must be false. But then (P ∨ Q) reduces to Q, and Q → Q is
true.
3. (P ∨ Q) ↔ (Q ∨ (P ↔ (Q → R))).

    P Q R | P∨Q | Q→R | P↔(Q→R) | Q∨(P↔(Q→R)) | (P∨Q)↔(Q∨(P↔(Q→R)))
    0 0 0 |  0  |  1  |    0    |      0      |         1
    0 0 1 |  0  |  1  |    0    |      0      |         1
    0 1 0 |  1  |  0  |    1    |      1      |         1
    0 1 1 |  1  |  1  |    0    |      1      |         1
    1 0 0 |  1  |  1  |    1    |      1      |         1
    1 0 1 |  1  |  1  |    1    |      1      |         1
    1 1 0 |  1  |  0  |    0    |      1      |         1
    1 1 1 |  1  |  1  |    1    |      1      |         1
I have no intuition whatsoever for why this is true. In fact, all three
of these tautologies were plucked from long lists of machine-generated
tautologies, and three variables is enough to start getting tautologies
that don't have good stories.
It's possible that one could prove this more succinctly by arguing by
cases that if Q is true, both sides of the biconditional are true, and if
Q is not true, then Q → R is always true so P ↔ (Q → R) becomes
just P, making both sides equal. But sometimes it is more direct (and
possibly less error-prone) just to shut up and calculate.
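"Shut up and calculate" is also easy to automate. A sketch of mine that
brute-forces the truth table for tautology 3 (using Python's == as ↔ on
booleans):

    from itertools import product

    def implies(a, b):
        return (not a) or b

    def prop3(p, q, r):
        # (P v Q) <-> (Q v (P <-> (Q -> R)))
        return (p or q) == (q or (p == implies(q, r)))

    print(all(prop3(p, q, r)
              for p, q, r in product([False, True], repeat=3)))  # True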
1. ¬(P → Q).
2. ¬((P ∧ Q) ∨ (¬P ∧ ¬Q)).
Solution
1.
    ¬(P → Q) ≡ ¬(¬P ∨ Q)
             ≡ P ∧ ¬Q.

2.
    ¬((P ∧ Q) ∨ (¬P ∧ ¬Q)) ≡ ¬(P ∧ Q) ∧ ¬(¬P ∧ ¬Q)
                            ≡ (¬P ∨ ¬Q) ∧ (¬¬P ∨ ¬¬Q)
                            ≡ (¬P ∨ ¬Q) ∧ (P ∨ Q)
                            ≡ (P → ¬Q) ∧ (¬P → Q)
                            ≡ (P → ¬Q) ∧ (¬Q → P)
                            ≡ P ↔ ¬Q.
was successful as a leader), as well as all the usual tools of predicate
logic (∧, ∨, ¬, →, =, and so forth), and can refer to specific leaders by
name.
Express each of the following statements in mathematical form. Note
that these statements are not connected, and no guarantees are made about
whether any of them are actually true.
Solution
1. The easiest way to write this is probably ∀x : ¬taller(x, Lincoln).
There is a possible issue here, since this version says that nobody is
taller than Lincoln, but it may be that somebody is the same height.^1
A stronger claim is ∀x : (x ≠ Lincoln) → taller(Lincoln, x). Both
solutions (and their various logical equivalents) are acceptable.
Solution
1. Disproof: Consider R = T = {1}, S = ∅. Then R is not a subset of S,
but
1. A × (B ∪ C) = (A × B) ∪ (A × C).
2. A × (B ∩ C) = (A × B) ∩ (A × C).
Solution
1. Let (a, x) ∈ A × (B ∪ C). Then a ∈ A and x ∈ B ∪ C. If x ∈ B, then
(a, x) ∈ A × B; alternatively, if x ∈ C, then (a, x) ∈ A × C. In either
case, (a, x) ∈ (A × B) ∪ (A × C).
A.2.3 Exponents
Let A be a set with |A| = n > 0. What is the size of each of the following
sets of functions? Justify your answers.
1. A^∅.
2. ∅^A.
3. ∅^∅.
Solution
1. |A^∅| = 1. Proof: There is exactly one function from ∅ to A (the empty
function).
2. |∅^A| = 0. Proof: There is no function from A to ∅, because A contains
at least one element, and there is nothing in ∅ to send it to. Note that
this doesn't contradict the |A^∅| result, because there is no x that we
fail to send anywhere.
Solution
Disproof: Suppose S ≠ S′ but T = T′; this can occur, for example, if
S = {a, b}, T = {z}, f(a) = f(b) = z, and S′ = {a}. In this case, T′ =
T = {z}, giving T \ T′ = ∅. But S \ S′ = {b} ≠ ∅, and since there are no
functions from a nonempty set to the empty set, there can't be a
surjection g : S \ S′ → T \ T′.
|A ∪ C| ≤ |B ∪ C|.
Clarification added 2013-09-25: It's probably best not to try using the
statement |S| ≤ |T| if and only if there is an injection from S to T in
your proof. While this is one way to define ≤ for arbitrary cardinals, the
odds are that your next step is to assert |A| + |C| ≤ |B| + |C|, and while
we know that this works when A, B, and C are all finite (Axiom 4.2.4),
that it works for arbitrary sets is what we are asking you to prove.
Solution
We'll construct an explicit injection g : A ∪ C → B ∪ C. For each x in
A ∪ C, let

    g(x) = f(x)  if x ∈ A, and
    g(x) = x     if x ∈ C.
Solution
Apply scaling invariance (Axiom 4.2.5) to 0 ≤ a and a ≤ b to get
a·a ≤ a·b. Now apply scaling again to 0 ≤ b and a ≤ b to get a·b ≤ b·b.
Finally, apply transitivity (Axiom 4.2.3) to combine a·a ≤ a·b and
a·b ≤ b·b to get a·a ≤ b·b.
    f(0) = 2,
    f(n + 1) = f(n) · f(n) - 1.
Solution
The proof is by induction on n, but we have to be a little careful for
small values. We'll treat n = 0 and n = 1 as special cases, and start the
induction at 2.
For n = 0, we have f(0) = 2 > 1 = 2^0.
For n = 1, we have f(1) = f(0) · f(0) - 1 = 2 · 2 - 1 = 3 > 2 = 2^1.
For n = 2, we have f(2) = f(1) · f(1) - 1 = 3 · 3 - 1 = 8 > 4 = 2^2.
For the induction step, we want to show that, for all n ≥ 2, if
f(n) > 2^n, then f(n + 1) = f(n) · f(n) - 1 > 2^{n+1}. Compute

    f(n + 1) = f(n) · f(n) - 1
             > 2^n · 2^n - 1
             ≥ 2^n · 4 - 1
             = 2^{n+1} + 2^{n+1} - 1
             > 2^{n+1}.
The principle of induction gives us that f(n) > 2^n for all n ≥ 2, and
we've already covered n = 0 and n = 1 as special cases, so f(n) > 2^n for
all n ∈ N.
    A_0 = {3, 4, 5},
    A_{n+1} = A_n ∪ {\sum_{x∈A_n} x}.

Give a closed-form expression for S_n = \sum_{x∈A_n} x. Justify your
answer.
Solution
Looking at the first couple of values, we see:

    S_0 = 3 + 4 + 5 = 12
    S_1 = 3 + 4 + 5 + 12 = 24
    S_2 = 3 + 4 + 5 + 12 + 24 = 48

It's pretty clear that the sum is doubling at each step. This suggests a
reasonable guess would be S_n = 12 · 2^n, which we've shown works for
n = 0.
For the induction step, we need to show that when constructing A_{n+1} =
A_n ∪ {S_n}, we are in fact doubling the sum. There is a tiny trick here
in that we have to be careful that S_n isn't already an element of A_n.
Proof. First, we'll show by induction that |A_n| > 1 and that every
element of A_n is positive.
For the first part, |A_0| = 3 > 1, and by construction A_{n+1} ⊇ A_n. It
follows that A_n ⊇ A_0 for all n, and so |A_n| ≥ |A_0| > 1 for all n.
For the second part, every element of A_0 is positive, and if every
element of A_n is positive, then so is S_n = \sum_{x∈A_n} x. Since each
element x of A_{n+1} is either an element of A_n or equal to S_n, it must
be positive as well.
Now suppose S_n ∈ A_n. Then S_n = S_n + \sum_{x∈A_n\{S_n}} x, but the sum
is positive, because A_n \ {S_n} is nonempty (|A_n| > 1) and contains only
positive elements; this is a contradiction. So S_n ∉ A_n, and assuming
that S_n = 12 · 2^n,

    S_{n+1} = \sum_{x∈A_{n+1}} x
            = \sum_{x∈A_n} x + S_n
            = S_n + S_n
            = 12 · 2^n + 12 · 2^n
            = 12(2^n + 2^n)
            = 12 · 2^{n+1}.

This completes the induction argument and the proof.
Solution
First let's figure out what n_0 has to be. We have

    (2·0)!! = 1                (0!)^2 = 1·1 = 1
    (2·1)!! = 2                (1!)^2 = 1·1 = 1
    (2·2)!! = 4·2 = 8          (2!)^2 = 2·2 = 4
    (2·3)!! = 6·4·2 = 48       (3!)^2 = 6·6 = 36
    (2·4)!! = 8·6·4·2 = 384    (4!)^2 = 24·24 = 576
Solution
1. Proof: Recall that f(n) is O(n) if there exist constants c > 0 and N,
such that |f(n)| ≤ c·|n| for all n ≥ N. Let c = 1 and N = 1. For any
n ≥ 1, either (a) f(n) = 1 ≤ 1·n, or (b) f(n) = n ≤ 1·n. So the
definition is satisfied and f(n) is O(n).
2. Disproof: To show that f(n) is not Ω(n), we need to show that for any
choice of c > 0 and N, there exists some n ≥ N with |f(n)| < c·|n|.
Fix c and N. Let n be the smallest odd number greater than max(1/c, N)
(such a number exists by the well-ordering principle). Then n ≥ N,
and since n is odd, we have f(n) = 1. But c·n > c·max(1/c, N) ≥
c·(1/c) = 1. So c·n > f(n), concluding the disproof.
Solution
Proof: Write r for the right-hand side. Observe that
Similarly
Since a|r and gcd(b, c)|r, from the definition of lcm we get lcm(a, gcd(b, c))|r.
Solution
1. Proof: Let g = gcd(a, b). Then g|a and g|b, so g|(b - a) as well. So
g is a common divisor of b - a and b. To show that it is the greatest
common divisor, let h|b and h|(b - a). Then h|a since a = b - (b - a).
It follows that h | gcd(a, b), which is g.
2. Disproof: Let a = 2 and b = 5. Then lcm(2, 5) = 10 but lcm(5 - 2, 5) =
lcm(3, 5) = 15 ≠ 10.
then

    ⌊n/2⌋! = 0 (mod n).
Solution
Let n be composite. Then there exist natural numbers a, b ≥ 2 such that
n = ab. Assume without loss of generality that a ≤ b.
For convenience, let k = ⌊n/2⌋. Since b = n/a and a ≥ 2, b ≤ n/2; but b
is an integer, so b ≤ n/2 implies b ≤ ⌊n/2⌋ = k. It follows that both a
and b are at most k.
We now consider two cases:
1. If a ≠ b, then both a and b appear as factors in k!. So
k! = ab · \prod_{1≤i≤k, i∉{a,b}} i, giving ab|k!, which means n|k! and
k! = 0 (mod n).
2. If a = b, then n = a^2. Since n > 9, we have a > 3, which means a ≥ 4
since a is a natural number. It follows that n ≥ 4a and k ≥ 2a. So
a and 2a both appear in the product expansion of k!, giving k! mod
2a^2 = 0. But then k! mod n = k! mod a^2 = (k! mod 2a^2) mod a^2 = 0.
    n mod a = 1 for every a in A, and
    n mod b = 0 for every b in B.
Solution
Proof: Let m_1 = \prod_{a∈A} a and m_2 = \prod_{b∈B} b. Because A and B
are disjoint, m_1 and m_2 have no common prime factors, and
gcd(m_1, m_2) = 1. So by the Chinese Remainder Theorem, there exists some
n with 0 ≤ n < m_1 m_2 such that

    n mod m_1 = 1
    n mod m_2 = 0
Solution
Proof: The direct approach is to show that T is reflexive, symmetric, and
transitive:
1. Reflexive: For any x, xRx and xSx, so xTx.
2. Symmetric: If xTy, then xRy and xSy. Since R and S are symmetric, we
get yRx and ySx, giving yTx.
3. Transitive: Let xTy and yTz. Then xRy and yRz implies xRz, and
similarly xSy and ySz implies xSz. So xRz and xSz, giving xTz.
Alternative proof: It's also possible to show this using one of the
alternative characterizations of an equivalence relation from Theorem
9.4.1. Since R and S are equivalence relations, there exist sets B and C
and functions f : A → B and g : A → C such that xRy if and only if
f(x) = f(y) and xSy if and only if g(x) = g(y). Now consider the function
h : A → B × C defined by h(x) = (f(x), g(x)). Then h(x) = h(y) if and only
if (f(x), g(x)) = (f(y), g(y)), which holds if and only if f(x) = f(y) and
g(x) = g(y). But this last condition holds if and only if xRy and xSy,
the definition of xTy. So we have h(x) = h(y) if and only if xTy, and T is
an equivalence relation.
then
Solution
Let S, T, f be such that f(x ∨ y) = f(x) ∨ f(y) for all x, y ∈ S.
Now suppose that we are given some x, y ∈ S with x ≤ y.
Recall that x ∨ y is the minimum z greater than or equal to both x
and y; so when x ≤ y, we have y ≥ x and y ≥ y, and for any z with z ≥ x
and z ≥ y, z ≥ y; thus y = x ∨ y. From the assumption on f we have
f(y) = f(x ∨ y) = f(x) ∨ f(y).
Now use the fact that f(x) ∨ f(y) is greater than or equal to both f(x)
and f(y) to get f(y) = f(x) ∨ f(y) ≥ f(x).
Solution
Denote the vertices of K_2 by ℓ and r.
If G is bipartite, let L, R be a partition of V such that every edge has
one endpoint in L and one in R, and let f(x) = ℓ if x is in L and f(x) = r
if x is in R.
Then if uv ∈ E, either u ∈ L and v ∈ R or vice versa; in either case,
f(u)f(v) = ℓr ∈ K_2.
Conversely, suppose f : V → {ℓ, r} is a homomorphism. Define L =
f^{-1}(ℓ) and R = f^{-1}(r); then L, R partition V. Furthermore, for any
edge uv ∈ E, because f(u)f(v) must be the unique edge ℓr, either f(u) = ℓ
and f(v) = r or vice versa. In either case, one of u, v is in L and the
other is in R, so G is bipartite.
Solution
The rule is that Sm,k is connected if and only if gcd(m, k) = 1.
To show that this is the case, consider the connected component that
contains 0; in other words, the set of all nodes v for which there is a path
from 0 to v.
[Figure: the graphs S_{5,1}, S_{5,2}, and S_{8,2}.]
Proof. To show that such an a exists when a path exists, we'll do
induction on the length of the path. If the path has length 0, then
v = 0 = 0·k (mod m). If the path has length n > 0, let u be the last
vertex on the path before v. By the induction hypothesis, u = bk (mod m)
for some b. There is an edge from u to v if and only if v = u + k (mod m).
So v = bk + k = (b + 1)k (mod m).
Conversely, if there is some a such that v = ak (mod m), then there is
a path 0, k, 2k, ..., ak from 0 to v in S_{m,k}.
[Figure: all 90 two-path graphs on the vertex set {1, 2, 3, 4, 5}.]
Solution
First let's count how many two-path graphs we get when one path has size
k and the other n - k; to avoid duplication, we'll insist k ≤ n - k.
Having fixed k, we can specify a pair of paths by giving a permutation
v_1 ... v_n of the vertices; the first path consists of v_1 ... v_k, while
the second consists of v_{k+1} ... v_n. This might appear to give us n!
pairs of paths for each fixed k. However, this may overcount the actual
number of paths:
If k > 1, then we count the same path twice: once as v_1 ... v_k, and
once as v_k ... v_1. So we have to divide by 2 to compensate for this.
The same thing happens when n - k > 1; in this case, we also have to
divide by 2.
Finally, if k = n - k, then we count the same pair of paths twice,
since v_1 ... v_k, v_{k+1} ... v_n gives the same graph as
v_{k+1} ... v_n, v_1 ... v_k. So here we must again divide by 2.
For odd n, the last case doesn't come up. So we get n!/2 graphs
when k = 1 and n!/4 graphs for each larger value of k. For even n,
we get n!/2 graphs when k = 1, n!/4 graphs when 1 < k < n/2, and n!/8
graphs when k = n/2. Adding up the cases gives a total of

    n!\left(\frac{1}{2} + \frac{1}{4}\left(\frac{n-1}{2} - 1\right)\right)
      = n! \cdot \frac{n+1}{8}
So we get the same expression in each case. We can simplify this further
to get
$$\frac{(n+1)!}{8} \tag{A.8.1}$$
two-path graphs on n ≥ 3 vertices.
The simplicity of (A.8.1) suggests that there ought to be a combinatorial
proof of this result, where we take a two-path graph and three bits
of additional information and bijectively construct a permutation of n + 1
values.
Here is one such construction, which maps the set of all two-path graphs
with vertices in [n] plus three bits to the set of all permutations on [n + 1].
The basic idea is to paste the two paths together in some order with n
between them, with some special handling of one-element paths to cover
permutations that put n at one end or the other. Miraculously, this special
handling exactly compensates for the fact that one-element paths have no
sense of direction.
1. For any two-path graph, we can order the two components by which
contains 0 and which doesn't. Similarly, we can order each path by
starting with its smaller endpoint.
shows that the construction is also injective. So we have that the number of
permutations on n + 1 values is 2³ = 8 times the number of two-path graphs
on n vertices, giving (n + 1)!/8 two-path graphs as claimed.
(For example, if our components are 0, 1, and 2, 3, 4, and the bits are
101, the resulting permutation is 4, 3, 2, 5, 0, 1. If the components are instead
3 and 2, 0, 4, 1, and the bits are 011, then we get 5, 3, 1, 4, 0, 2. In either
case we can recover the original two-path graph by deleting 5 and splitting
according to the rule.)
Both of these proofs are pretty tricky. The brute-force counting approach
may be less prone to error, and the combinatorial proof probably wouldn't
occur to anybody who hadn't already seen the answer.
Solution
We'll take the hint, and let E(n) be the number of team assignments that
make k even and U(n) be the number that make k odd. Then
we can compute
$$E(n) - U(n) = \sum_{\substack{0 \le k \le n \\ k \text{ even}}} \binom{n}{k} 2^k - \sum_{\substack{0 \le k \le n \\ k \text{ odd}}} \binom{n}{k} 2^k = \sum_{k=0}^{n} (-1)^k \binom{n}{k} 2^k = \sum_{k=0}^{n} \binom{n}{k} (-2)^k = (1 + (-2))^n = (-1)^n.$$
We also have $E(n) + U(n) = \sum_{k=0}^{n} \binom{n}{k} 2^k = (1+2)^n = 3^n$. Solving for
E(n) gives
$$E(n) = \frac{3^n + (-1)^n}{2}. \tag{A.8.2}$$
To make sure that we didn't make any mistakes, it may be helpful to
check a few small cases. For n = 0, we have one even split (nobody on
either team), and (3⁰ + (−1)⁰)/2 = 2/2 = 1. For n = 1, we have the
same even split, and (3¹ + (−1)¹)/2 = (3 − 1)/2 = 1. For n = 2, we get
five even splits ((∅, ∅), ({x}, {y}), ({y}, {x}), ({x, y}, ∅), (∅, {x, y})), and
(3² + (−1)²)/2 = (9 + 1)/2 = 5. This is not a proof that (A.8.2) will keep
working forever, but it does suggest that we didn't screw up in some obvious
way.
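Beyond the hand checks, (A.8.2) is also easy to test by brute force for small n. The sketch below assumes (as the $\binom{n}{k}2^k$ sum suggests) that each of the n players is assigned to team 1, team 2, or neither, and that k counts the players placed on a team:

from itertools import product

def E_brute(n):
    # count assignments of n players to {team 1, team 2, neither}
    # in which the number of players placed on a team is even
    return sum(1 for a in product((1, 2, None), repeat=n)
               if sum(x is not None for x in a) % 2 == 0)

for n in range(8):
    assert E_brute(n) == (3**n + (-1)**n) // 2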
Solution
Let S be the set of triples (a0, a1, a2) in [n]³ with a0 ≤ a1 ≥ a2, and let T
be the set of triples with a0 ≥ a1 ≤ a2. Replacing each ai with (n − 1) − ai
gives a bijection between S and T, so |S| = |T|. Computing |T| is a little
easier, so we'll do that first.
$$|S \cup T| = 2\left(\frac{1}{3}n^3 + \frac{1}{2}n^2 + \frac{1}{6}n\right) - n = \frac{2}{3}n^3 + n^2 - \frac{2}{3}n.$$
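Since the reading of S and T above was reconstructed from the final polynomial, a brute-force check is reassuring. This sketch counts triples in [n]³ that rise-then-fall or fall-then-rise and compares against $\frac{2}{3}n^3 + n^2 - \frac{2}{3}n$:

from itertools import product

def brute(n):
    return sum(1 for a in product(range(n), repeat=3)
               if a[0] <= a[1] >= a[2] or a[0] >= a[1] <= a[2])

for n in range(1, 15):
    # compare 3 * count against the polynomial cleared of fractions
    assert 3 * brute(n) == 2 * n**3 + 3 * n**2 - 2 * n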
Solution
There are $\binom{n}{3} = \frac{n(n-1)(n-2)}{6}$ choices for R, all of which are equally likely. So
we want to count the number of sets R for which median(R) = median(S).
The median of S must itself be in R, together with one of the (n − 1)/2
smaller elements and one of the (n − 1)/2 larger elements, giving (n − 1)²/4
good sets and a probability of
$$\frac{(n-1)^2/4}{n(n-1)(n-2)/6} = \frac{3}{2}\cdot\frac{n-1}{n(n-2)}.$$
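A quick enumeration confirms the formula for small odd n (taking S = {0, . . . , n − 1}, so that median(S) = (n − 1)/2):

from itertools import combinations
from fractions import Fraction

def prob_same_median(n):
    med = (n - 1) // 2
    good = sum(1 for r in combinations(range(n), 3) if sorted(r)[1] == med)
    return Fraction(good, n * (n - 1) * (n - 2) // 6)

for n in (3, 5, 7, 9, 11):
    assert prob_same_median(n) == Fraction(3 * (n - 1), 2 * n * (n - 2))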
1. What is Pr [B | A]?
Solution
Recall that
$$\Pr[B \mid A] = \frac{\Pr[B \cap A]}{\Pr[A]}.$$
Let's start by calculating Pr[A]. For any single suit s, there are $(13)_5$
ways to give you 5 cards from s, out of $(52)_5$ ways to give you 5 cards,
assuming in both cases that we keep track of the order of the cards.⁴ So the
probability of getting five cards from suit s is
$$\frac{(13)_5}{(52)_5}.$$
Since there are four suits, and the events $A_s$ are disjoint for each suit,
we get
$$\Pr[A] = \sum_s \Pr[A_s] = 4 \cdot \frac{(13)_5}{(52)_5}.$$
Footnote 3: This turns out to be pretty hard to do in practice [BD92], but we'll suppose that we can actually do it.
Footnote 4: If we don't keep track of the order, we get $\binom{13}{5}$ choices out of $\binom{52}{5}$ possibilities; these divide out to the same value.
For $\Pr[A \cap B]$, let $C_{st}$ be the event that your cards are all from suit s
and mine are all from suit t. Then
$$\Pr[C_{st}] = \begin{cases} \dfrac{(13)_{10}}{(52)_{10}} & \text{if } s = t\text{, and} \\[1ex] \dfrac{(13)_5 (13)_5}{(52)_{10}} & \text{if } s \ne t. \end{cases}$$
Another way to get (A.9.1) is to argue that once you have five cards of a
particular suit, there are $(47)_5$ equally probable choices for my five cards, of
which $(8)_5$ give me five cards from your suit and $3 \cdot (13)_5$ give me five cards
from one of the three other suits.
Solution
Expand
$$\begin{aligned}
E[S] &= E\left[\sum_{i=1}^{D_0} D_i\right] \\
&= \sum_{j=1}^{n} E\left[\sum_{i=1}^{D_0} D_i \;\middle|\; D_0 = j\right] \Pr[D_0 = j] \\
&= \frac{1}{n} \sum_{j=1}^{n} E\left[\sum_{i=1}^{j} D_i\right] \\
&= \frac{1}{n} \sum_{j=1}^{n} \sum_{i=1}^{j} E[D_i] \\
&= \frac{1}{n} \sum_{j=1}^{n} \sum_{i=1}^{j} \frac{n+1}{2} \\
&= \frac{1}{n} \sum_{j=1}^{n} j \cdot \frac{n+1}{2} \\
&= \frac{n+1}{2n} \sum_{j=1}^{n} j \\
&= \frac{n+1}{2n} \cdot \frac{n(n+1)}{2} \\
&= \frac{(n+1)^2}{4}.
\end{aligned}$$
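A quick simulation agrees with the closed form. The sketch below assumes, as in the derivation, that D0, D1, . . . are independent rolls of a fair n-sided die with faces 1 through n (so Pr[D0 = j] = 1/n and E[Di] = (n + 1)/2); n = 6 is a hypothetical choice for concreteness:

import random

def sample_S(n, rng):
    # roll D0, then sum D0 further rolls
    d0 = rng.randint(1, n)
    return sum(rng.randint(1, n) for _ in range(d0))

rng = random.Random(1)
n = 6
trials = 200_000
est = sum(sample_S(n, rng) for _ in range(trials)) / trials
print(est, (n + 1)**2 / 4)  # both should be close to 12.25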
Appendix B
Sample exams
These are exams from the Fall 2013 version of CPSC 202. Some older exams
can be found in Appendices C and D.
P ∨ (P ↔ Q) ∨ Q
Solution

P  Q  P↔Q  P∨(P↔Q)  P∨(P↔Q)∨Q
0  0   1      1          1
0  1   0      0          1
1  0   0      1          1
1  1   1      1          1
x + y = 0 (mod m)
x − y = 0 (mod m)
Solution
Add the equations together to get
2x = 0 (mod m)
Solution
Using the definition of exponentiation and the geometric series formula, we
can compute
$$\sum_{i=1}^{n} \prod_{j=1}^{i} 2 = \sum_{i=1}^{n} 2^i = \sum_{i=0}^{n-1} 2^{i+1} = 2\sum_{i=0}^{n-1} 2^i = 2 \cdot \frac{2^n - 1}{2 - 1} = 2(2^n - 1) = 2^{n+1} - 2.$$
Solution
Suppose that for all C, A ∩ C ⊆ B ∩ C. In particular, let C = A. Then
A = A ∩ A ⊆ B ∩ A. If x ∈ A, then x ∈ B ∩ A, giving x ∈ B. So A ⊆ B.
(Other choices for C also work.)
An alternative proof proceeds by contraposition: Suppose A ⊄ B. Then
there is some x in A that is not in B. But then A ∩ {x} = {x} and B ∩ {x} = ∅,
so A ∩ {x} ⊄ B ∩ {x}.
Solution
We need to show that any two elements of S are comparable.
If S is empty, the claim holds vacuously.
Otherwise, let x and y be elements of S. Then {x, y} is a nonempty
subset of S, and so it has a minimum element z. If z = x, then x ≤ y; if
z = y, then y ≤ x. In either case, x and y are comparable.
$$\forall x \in \mathbb{Z} : \exists y \in \mathbb{Z} : x < y \tag{B.2.1}$$
$$\exists x \in \mathbb{Z} : \forall y \in \mathbb{Z} : x < y \tag{B.2.2}$$
Solution
First, we'll show that (B.2.1) is true. Given any x ∈ ℤ, choose y = x + 1.
Then x < y.
Next, we'll show that (B.2.2) is not true, by showing that its negation
is true. Negating (B.2.2) gives ∀x ∈ ℤ : ∃y ∈ ℤ : x ≮ y. Given any x ∈ ℤ,
choose y = x. Then x ≮ y.
Solution
We don't really expect this to be true, because the usual expansion (A +
B)² = A² + AB + BA + B² doesn't simplify further, since AB does not equal
BA in general.
To make this concrete, suppose that
$$(A + B)^2 = A^2 + 2AB + B^2.$$
Then
$$A^2 + 2AB + B^2 = A^2 + AB + BA + B^2,$$
and subtracting A² + AB + B² from both sides gives
$$AB = BA.$$
For a concrete counterexample, let $A = \begin{pmatrix}1&1\\1&1\end{pmatrix}$ and $B = \begin{pmatrix}1&-1\\1&1\end{pmatrix}$. Then
$$(A + B)^2 = \begin{pmatrix}2&0\\2&2\end{pmatrix}^2 = \begin{pmatrix}4&0\\8&4\end{pmatrix},$$
but
$$A^2 + 2AB + B^2 = \begin{pmatrix}1&1\\1&1\end{pmatrix}^2 + 2\begin{pmatrix}1&1\\1&1\end{pmatrix}\begin{pmatrix}1&-1\\1&1\end{pmatrix} + \begin{pmatrix}1&-1\\1&1\end{pmatrix}^2 = \begin{pmatrix}2&2\\2&2\end{pmatrix} + 2\begin{pmatrix}2&0\\2&0\end{pmatrix} + \begin{pmatrix}0&-2\\2&0\end{pmatrix} = \begin{pmatrix}6&0\\8&2\end{pmatrix}.$$
Solution
There are three: The empty graph, the graph with one vertex, and the graph
with two vertices connected by an edge. These enumerate all connected
graphs with two vertices or fewer (the other two-vertex graph, with no edge,
is not connected).
To show that these are the only possibilities, suppose that we have a
connected graph G with more than two vertices. Let u be one of these
vertices. Let v be a neighbor of u (if u has no neighbors, then there is no
path from u to any other vertex, and G is not connected). Let w be some
other vertex. Since G is connected, there is a path from u to w. Let w′ be
the first vertex in this path that is not u or v. Then w′ is adjacent to u or
v; in either case, one of u or v has degree at least two.
Appendix C
Midterm exams from previous semesters
Note that topics covered may vary from semester to semester, so the appearance
of a particular topic on one of these sample midterms does not
necessarily mean that it may appear on a current exam.
T(0) = 1.
T(n) = 3T(n − 1) + $2^n$, when n > 0.
Solution
Using generating functions: let $F(z) = \sum_{n=0}^{\infty} T(n) z^n$; then
$$F(z) = 3zF(z) + \frac{1}{1 - 2z}.$$
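The extracted solution breaks off here. Solving the equation gives $F(z) = \frac{1}{(1-3z)(1-2z)} = \frac{3}{1-3z} - \frac{2}{1-2z}$ by partial fractions, i.e. $T(n) = 3^{n+1} - 2^{n+1}$; a quick check of that closed form against the recurrence:

def T(n):
    # direct recurrence: T(0) = 1, T(n) = 3 T(n-1) + 2^n
    return 1 if n == 0 else 3 * T(n - 1) + 2**n

for n in range(15):
    assert T(n) == 3**(n + 1) - 2**(n + 1)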
Solution
Trying small values of n gives $0! = 1 = 2^0$ (bad), $1! = 1 < 2^1$ (bad),
$2! = 2 < 2^2$ (bad), $3! = 6 < 2^3$ (bad), $4! = 24 > 2^4 = 16$ (good). So we'll
guess $n_0 = 4$ and use the n = 4 case as a basis.
For larger n, we have $n! = n(n-1)! > n \cdot 2^{n-1} > 2 \cdot 2^{n-1} = 2^n$.
Solution
There are several ways to do this. The algebraic version is probably cleanest.
Combinatorial version
The LHS counts the ways to choose k of n elements and then specially mark
one of the k. Alternatively, we could choose the marked element first (n
choices) and then choose the remaining k − 1 elements from the remaining
n − 1 elements ($\binom{n-1}{k-1}$ choices); this gives the RHS.
Algebraic version
Compute $k\binom{n}{k} = k \cdot \frac{n!}{k!\,(n-k)!} = \frac{n!}{(k-1)!\,(n-k)!} = n \cdot \frac{(n-1)!}{(k-1)!\,((n-1)-(k-1))!} = n\binom{n-1}{k-1}$.
Solution
For each i ∈ {1 . . . n}, let $A_i$ be the event that the coin comes up heads for
the first time on flip i and continues to come up heads thereafter. Then
the desired event is the disjoint union of the $A_i$. Since each $A_i$ is a single
sequence of coin-flips, each occurs with probability $2^{-n}$. Summing over all
i gives a total probability of $n \cdot 2^{-n}$.
Solution
We'll show the slightly stronger statement 0 ≤ S(n) ≤ T(n) by induction
on n. The base case n = 0 is given.
Now suppose 0 ≤ S(n) ≤ T(n); we will show the same holds for n + 1.
First observe S(n+1) = aS(n) + f(n) ≥ 0, as each variable on the right-hand
side is non-negative. To show T(n + 1) ≥ S(n + 1), observe
$$T(n+1) = bT(n) + g(n) \ge aT(n) + f(n) \ge aS(n) + f(n) = S(n+1).$$
Note that we use the fact that 0 ≤ T(n) (from the induction hypothesis)
in the first step and 0 ≤ a in the second. The claim does not go through
without these assumptions, which is why using S(n) ≤ T(n) by itself as the
induction hypothesis is not enough to make the proof work.
For example, with 3 students A, B, and C and 7 seats, there are exactly
4 ways to seat the students: A-B-C--, A-B--C-, A--B-C-, and -A-B-C-.
Give a formula that gives the number of ways to seat k students in n
seats according to the rules given above.
Solution
The basic idea is that we can think of each student and the adjacent empty
space as a single width-2 unit. Together, these units take up 2k seats, leaving
n − 2k extra empty seats to distribute between the students. There are a
couple of ways to count how to do this.
Combinatorial approach
Treat each of the k student-seat blocks and n − 2k extra seats as filling one
of k + (n − 2k) = n − k slots. There are exactly $\binom{n-k}{k}$ ways to do this.
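A brute-force check of the formula, assuming the rule (per the width-2-unit argument) is that each student must have an empty seat immediately to their right:

from itertools import combinations
from math import comb

def seatings(n, k):
    # place k students among n seats; seat p is ok if seat p+1 exists and is empty
    return sum(1 for pos in combinations(range(n), k)
               if all(p + 1 < n and p + 1 not in pos for p in pos))

assert seatings(7, 3) == 4  # the A, B, C example above
for n in range(1, 10):
    for k in range(n // 2 + 1):
        assert seatings(n, k) == comb(n - k, k)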
Solution
We need to count how many placements of rooks there are that put exactly
one rook per row and exactly one rook per column. Since we know that
there is one rook per row, we can specify where these rooks go by choosing
a unique column for each row. There are n choices for the first row, n − 1
remaining for the second row, and so on, giving n(n − 1) · · · 1 = n! choices
altogether. So the probability of the event is $n!\big/\binom{n^2}{n} = (n!)^2\,(n^2-n)!\,/\,(n^2)!$.
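A check of the probability by enumeration, assuming the n rooks are placed on n distinct squares chosen uniformly at random:

from itertools import combinations
from math import comb, factorial
from fractions import Fraction

def rook_probability(n):
    squares = [(r, c) for r in range(n) for c in range(n)]
    good = sum(1 for placement in combinations(squares, n)
               if len({r for r, c in placement}) == n
               and len({c for r, c in placement}) == n)
    return Fraction(good, comb(n * n, n))

for n in (2, 3):
    assert rook_probability(n) == Fraction(factorial(n), comb(n * n, n))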
Solution
1. Proof: Let f (x) = x. Then f (x) = f (y) implies x = y and f is
injective.
Solution
Proof: By induction on i. For i = 0 we have $A_0 = a_0 \le b_0 = B_0$. Now
suppose $A_i \le B_i$. Then
$$A_{i+1} = \sum_{j=0}^{i+1} a_j = A_i + a_{i+1} \le B_i + b_{i+1} = \sum_{j=0}^{i+1} b_j = B_{i+1}.$$
Solution
There is an easy way to solve this, and a hard way to solve this.
Easy way: For each possible recruit x, we can assign x one of four states:
non-member, member but not inner circle member, inner circle member but
not EGHMPoI, or EGHMPoI. If we know the state of each possible recruit,
that determines the contents of M , C, X and vice versa. It follows that
there is a one-to-one mapping between these two representations, and that
the number of rosters is equal to the number of assignments of states to all
n potential recruits, which is 4n .
Hard way: By repeated application of the binomial theorem. Expressing
the selection process in terms of choosing nested subsets of m, c, and x
members, the number of possible rosters is
$$\sum_{m=0}^{n}\binom{n}{m}\left[\sum_{c=0}^{m}\binom{m}{c}\sum_{x=0}^{c}\binom{c}{x}\right] = \sum_{m=0}^{n}\binom{n}{m}\sum_{c=0}^{m}\binom{m}{c}2^c = \sum_{m=0}^{n}\binom{n}{m}(1+2)^m = \sum_{m=0}^{n}\binom{n}{m}3^m = (1+3)^n = 4^n.$$
Solution
1. Disproof: Let A = {∅}, B = {A} = {{∅}}, and C = B. Then A ∈ B
and B ⊆ C, but A ⊄ C, because ∅ ∈ A but ∅ ∉ C.
Solution
1. Here we apply Markov's inequality: since X ≥ 0, we have $\Pr[X \ge 80] \le \frac{E[X]}{80} = \frac{60}{80} = 3/4$. This maximum is achieved exactly by letting
X = 0 with probability 1/4 and 80 with probability 3/4, giving E[X] =
(1/4) · 0 + (3/4) · 80 = 60.
Solution
If ≤ is a partial order, then by reflexivity we have x ≤ x for any x. But
then there exists z ∈ S such that x + z = x, which can only happen if z =
0. Thus 0 ∈ S.
Now suppose x and y are both in S. Then 0 + x = x implies 0 ≤ x, and
x + y = x + y implies x ≤ x + y. Transitivity of ≤ gives 0 ≤ x + y, which
occurs only if some z such that 0 + z = x + y is in S. The only such z is
x + y, so x + y is in S.
Solution
Write $a^{2p-1} = a^{p-1} \cdot a^{p-1} \cdot a$. If a ≠ 0, Euler's Theorem (or Fermat's Little
Theorem) says $a^{p-1} = 1 \pmod{p}$, so in this case $a^{p-1} \cdot a^{p-1} \cdot a = a \pmod{p}$.
If a = 0, then (since 2p − 1 ≠ 0), $a^{2p-1} = 0 = a \pmod{p}$.
Solution
1. x (T (x) (L(m, x) L(x, x))).
Solution
Here are three ways to do this:
1. Write $\sum_{k=a}^{b} k$ as $\sum_{k=1}^{b} k - \sum_{k=1}^{a-1} k$ and then use the formula $\sum_{k=1}^{n} k = \frac{n(n+1)}{2}$ to get
$$\sum_{k=a}^{b} k = \sum_{k=1}^{b} k - \sum_{k=1}^{a-1} k = \frac{b(b+1)}{2} - \frac{(a-1)a}{2} = \frac{b(b+1) - a(a-1)}{2}.$$
3. Write $\sum_{k=a}^{b} k$ as $\sum_{k=0}^{b-a} (a+k) = (b-a+1)a + \sum_{k=0}^{b-a} k$. Then use the
sum formula as before to turn this into $(b-a+1)a + \frac{(b-a)(b-a+1)}{2}$.
Appendix D
Final exams from previous semesters
Note that topics may vary from semester to semester, so the appearance
of a particular topic on one of these exams does not necessarily indicate
that it will appear on the second exam for the current semester. Note also
that these exams were designed for a longer time slot, and were weighted
higher, than the current semester's exams; the current semester's exams
are likely to be substantially shorter.
1. What is the probability that the players score at the end of the game
is zero?
2. What is the expectation of the players score at the end of the game?
Solution
1. The only way to get a score of zero is to lose on the first roll. There
are 36 equally probable outcomes for the first roll, and of these the
six outcomes (4,6), (5,5), (5,6), (6,4), (6,5), and (6,6) yield a product
greater than 20. So the probability of getting zero is 6/36 = 1/6.
2. To compute the total expected score, let us first compute the expected
score for a single turn. This is
$$\frac{1}{36}\sum_{i=1}^{6}\sum_{j=1}^{6} ij\,[ij \le 20],$$
where $[ij \le 20]$ is the indicator random variable for the event that
$ij \le 20$.
I don't know of a really clean way to evaluate the sum, but we can
expand it as
$$\left(\sum_{i=1}^{3} i\right)\sum_{j=1}^{6} j + 4\sum_{j=1}^{5} j + 5\sum_{j=1}^{4} j + 6\sum_{j=1}^{3} j = 6 \cdot 21 + 4 \cdot 15 + 5 \cdot 10 + 6 \cdot 6 = 126 + 60 + 50 + 36 = 272.$$
So the expected score per turn is 272/36 = 68/9. Letting S be the expected
total score then gives
$$S = \frac{68}{9} + \frac{5}{6}S,$$
which solves to S = 136/3.
Solution
Both the structure of the vector space and the definition of f are irrelevant;
the only fact we need is that $\vec{z}_1 \sim \vec{z}_2$ if and only if $f(\vec{z}_1) = f(\vec{z}_2)$. Thus for
all $\vec{z}$, $\vec{z} \sim \vec{z}$ since $f(\vec{z}) = f(\vec{z})$ (reflexivity); for all $\vec{y}$ and $\vec{z}$, if $\vec{y} \sim \vec{z}$, then
$f(\vec{y}) = f(\vec{z})$ implies $f(\vec{z}) = f(\vec{y})$ implies $\vec{z} \sim \vec{y}$ (symmetry); and for all $\vec{x}$, $\vec{y}$,
and $\vec{z}$, if $\vec{x} \sim \vec{y}$ and $\vec{y} \sim \vec{z}$, then $f(\vec{x}) = f(\vec{y})$ and $f(\vec{y}) = f(\vec{z})$, so $f(\vec{x}) = f(\vec{z})$
and $\vec{x} \sim \vec{z}$ (transitivity).
Solution
Let's save ourselves a lot of writing by letting x = 24036583, so that $p = 2^x - 1$ and the fraction becomes
$$\frac{9^{2^{x-1}} - 9}{p}.$$
To show that this is an integer, we need to show that p divides the
numerator, i.e., that
$$9^{2^{x-1}} - 9 = 0 \pmod{p}.$$
We'd like to attack this with Fermat's Little Theorem, so we need to get
the exponent to look something like $p - 1 = 2^x - 2$. Observe that $9 = 3^2$, so
$$9^{2^{x-1}} = (3^2)^{2^{x-1}} = 3^{2^x} = 3^{2^x - 2} \cdot 3^2 = 3^{p-1} \cdot 3^2.$$
But $3^{p-1} = 1 \pmod{p}$, so we get $9^{2^{x-1}} = 3^2 = 9 \pmod{p}$, and thus $9^{2^{x-1}} - 9 = 0 \pmod{p}$ as desired.
Solution
Let G′ be the connected component of u in G. Then G′ is itself a graph,
and the degree of any vertex is the same in G′ as in G. Since the sum of
all the degrees of vertices in G′ must be even by the Handshaking Lemma,
there cannot be an odd number of odd-degree vertices in G′, and so there is
some v in G′ not equal to u that also has odd degree. Since G′ is connected,
there exists a path from u to v.
Solution
Since the carrier is fixed, we have to count the number of different ways of
defining the binary operation. Let's call the operation f. For each ordered
pair of elements (x, y) ∈ S × S, we can pick any element z ∈ S for the value
of f(x, y). This gives n choices for each of the n² pairs, which gives $n^{n^2}$
magmas on S.
Solution
Let A ∈ P(S); then by the definition of P(S) we have A ⊆ S. But then
A ⊆ S ⊆ T implies A ⊆ T, and so A ∈ P(T). Since A was arbitrary,
A ∈ P(T) holds for all A in P(S), and we have P(S) ⊆ P(T).
Solution
Let H be the set of hieroglyphs, and observe that the map f : H → H corresponding
to pushing the red lever up is invertible and thus a permutation.
Similarly, the maps g and h corresponding to yellow or blue up-pushes are
also permutations, as are the inverses f⁻¹, g⁻¹, and h⁻¹ corresponding to
red, yellow, or blue down-pushes. Repeated pushes of one or more levers
correspond to compositions of permutations, so the set of all permutations
obtained by sequences of zero or more pushes is the subgroup G of the
permutation group $S_{|H|}$ generated by f, g, and h.
Now consider the cyclic subgroup ⟨f⟩ of G generated by f alone. Since
G is finite, there is some index m such that $f^m = e$. Similarly there are
indices n and p such that $g^n = e$ and $h^p = e$. So pushing the red lever up
any multiple of m times restores the initial state, as does pushing the yellow
lever up any multiple of n times or the blue lever up any multiple of p times.
Let k = mnp. Then k is a multiple of m, n, and p, and pushing any single
lever up k times leaves the display in the same state.
There are six problems on this exam, each worth 20 points, for a total
of 120 points. You have approximately three hours to complete this exam.
Assume n > 2.
Solution
Disproof: Consider the permutation (1 2)(3 4 5)(6 7 8 9 10)(11 12 13 14 15
16 17) in $S_{17}$. This has order 2 · 3 · 5 · 7 = 210, but $\binom{17}{2} = \frac{17 \cdot 16}{2} = 136$.
Solution
Proof: Let F be the free group defined above and let S be a subgroup of F.
Suppose S contains $a^k$ for some k ≠ 0. Then S contains $a^{2k}, a^{3k}, \ldots$ because
it is closed under multiplication. Since these elements are all distinct, S is
infinite.
The alternative is that S does not contain $a^k$ for any k ≠ 0; this leaves
only $a^0$ as a possible element of S, and there is only one such subgroup: the
trivial subgroup $\{a^0\}$.
S and T , there are at least two edges that have one endpoint in S and one
in T .
Solution
Proof: Because G is connected and every vertex has even degree, there is
an Euler tour of the graph (a cycle that uses every edge exactly once). Fix
some particular tour and consider a partition of V into two sets S and T.
There must be at least one edge between S and T, or G is not connected;
but if there is only one, then the tour can't return to S or T once it leaves.
It follows that there are at least 2 edges between S and T as claimed.
Solution
Each ranking is a total order on the n teams, and we can describe such a
ranking by giving one of the n! permutations of the teams. These in turn
generate n! distinct outcomes of the experiment that will cause the sabermetrician
to believe the hypothesis. To compute the probability that one
of these outcomes occurs, we must divide by the total number of outcomes,
giving
$$\Pr[\text{strict ranking}] = \frac{n!}{2^{\binom{n}{2}}}.$$
and the weight of its meal (and the eaten piranha is gone); if unsuccessful,
the piranha remains at the same weight.
Prove that after k days, no surviving piranha weighs more than $2^k$ units.
It is not possible for a piranha to eat and be eaten on the same day.
Solution
By induction on k. The base case is k = 0, when all piranha weigh exactly
$2^0 = 1$ unit. Suppose some piranha has weight $x \le 2^k$ after k days. Then
either its weight stays the same, or it successfully eats another piranha of
weight $y \le 2^k$, increasing its weight to $x + y \le 2^k + 2^k = 2^{k+1}$. In either case
the claim follows for k + 1.
over the reals, and consider the subspace S of the vector space of 2-by-2 real
matrices generated by the set $\{A, A^2, A^3, \ldots\}$. What is the dimension of S?
Solution
First let's see what $A^k$ looks like. We have
$$A^2 = \begin{pmatrix}1&1\\0&1\end{pmatrix}\begin{pmatrix}1&1\\0&1\end{pmatrix} = \begin{pmatrix}1&2\\0&1\end{pmatrix},$$
$$A^3 = \begin{pmatrix}1&1\\0&1\end{pmatrix}\begin{pmatrix}1&2\\0&1\end{pmatrix} = \begin{pmatrix}1&3\\0&1\end{pmatrix},$$
and in general we can show by induction that
$$A^k = \begin{pmatrix}1&1\\0&1\end{pmatrix}\begin{pmatrix}1&k-1\\0&1\end{pmatrix} = \begin{pmatrix}1&k\\0&1\end{pmatrix}.$$
Observe now that for any k,
$$A^k = \begin{pmatrix}1&k\\0&1\end{pmatrix} = (k-1)\begin{pmatrix}1&2\\0&1\end{pmatrix} - (k-2)\begin{pmatrix}1&1\\0&1\end{pmatrix} = (k-1)A^2 - (k-2)A.$$
It follows that $\{A, A^2\}$ generates all the $A^k$ and thus generates any linear
combination of the $A^k$ as well. It is easy to see that A and A² are linearly
independent: if $c_1 A + c_2 A^2 = 0$, we must have (a) $c_1 + c_2 = 0$ (to cancel
out the diagonal entries) and (b) $c_1 + 2c_2 = 0$ (to cancel out the nonzero
off-diagonal entry). The only solution to both equations is $c_1 = c_2 = 0$.
Because $\{A, A^2\}$ is a linearly independent set that generates S, it is a
basis, and S has dimension 2.
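Both the closed form for $A^k$ and the identity $A^k = (k-1)A^2 - (k-2)A$ are easy to confirm numerically (numpy assumed available):

import numpy as np

A = np.array([[1, 1], [0, 1]])
Ak = A.copy()
for k in range(1, 10):
    # A^k should be [[1, k], [0, 1]] and equal (k-1)A^2 - (k-2)A
    assert np.array_equal(Ak, [[1, k], [0, 1]])
    assert np.array_equal(Ak, (k - 1) * (A @ A) - (k - 2) * A)
    Ak = Ak @ A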
Solution
Let p be the probability of the event W that the coin comes up heads twice
before coming up tails. Consider the following mutually-exclusive events for
the first one or two coin-flips:
$$p = p_H^2 + p_H p_S p + p_S p,$$
Solution
The group G has exactly one element.
First observe that G has at least one element, because it contains an
identity element e.
Now let x and y be any two elements of G. We can show x ≤ y, because
y = x(x⁻¹y). Similarly, y ≤ x, because x = y(y⁻¹x). But then x = y by antisymmetry.
It follows that all elements of G are equal, i.e., that G has at most one
element.
Solution
Let's look at the effect of multiplying a vector of known weight by just one
near-diagonal matrix. We will show: (a) for any near-diagonal A and any x,
$w(Ax) \le w(x) + 1$; and (b) for any n × 1 column vector x with 0 < w(x) < n,
there exists a near-diagonal matrix A with $w(Ax) \ge w(x) + 1$.
To prove (a), observe that $(Ax)_i = \sum_{j=1}^{n} A_{ij}x_j$. For $(Ax)_i$ to be nonzero,
there must be some index j such that $A_{ij}x_j$ is nonzero. This can occur in
two ways: j = i, and $A_{ii}$ and $x_i$ are both nonzero; or j ≠ i, and $A_{ij}$ and $x_j$
are both nonzero. The first case can occur for at most w(x) different values
of i (because there are only w(x) nonzero entries $x_i$). The second can occur
for at most one value of i (because there is at most one nonzero entry $A_{ij}$
with i ≠ j). It follows that Ax has at most w(x) + 1 nonzero entries, i.e.,
that $w(Ax) \le w(x) + 1$.
To prove (b), choose k and m such that $x_k = 0$ and $x_m \ne 0$, and let A
be the matrix with $A_{ii} = 1$ for all i, $A_{km} = 1$, and all other entries equal to
zero. Now consider $(Ax)_i$. If i ≠ k, then $(Ax)_i = \sum_{j=1}^{n} A_{ij}x_j = A_{ii}x_i = x_i$.
If i = k, then $(Ax)_k = \sum_{j=1}^{n} A_{kj}x_j = A_{kk}x_k + A_{km}x_m = x_m \ne 0$, since we
chose k so that $x_k = 0$ and chose m so that $x_m \ne 0$. So $(Ax)_i$ is nonzero if
either $x_i$ is nonzero or i = k, giving $w(Ax) \ge w(x) + 1$.
Now proceed by induction:
For any k, if $A_1 \ldots A_k$ are near-diagonal matrices, then $w(A_1 \cdots A_k x) \le w(x) + k$. Proof: The base case of k = 0 is trivial. For larger k, $w(A_1 \cdots A_k x) = w(A_1(A_2 \cdots A_k x)) \le w(A_2 \cdots A_k x) + 1 \le w(x) + (k-1) + 1 = w(x) + k$.
Fix x with w(x) = 1. Then for any k < n, there exists a sequence of
near-diagonal matrices $A_1 \ldots A_k$ such that $w(A_1 \cdots A_k x) = k + 1$. Proof:
Again the base case of k = 0 is trivial. For larger k < n, we have from the
induction hypothesis that there exists a sequence of k − 1 near-diagonal
matrices $A_2 \ldots A_k$ such that $w(A_2 \cdots A_k x) = k < n$. From claim (b)
above we then get that there exists a near-diagonal matrix $A_1$ such that
$w(A_1(A_2 \cdots A_k x)) = w(A_2 \cdots A_k x) + 1 = k + 1$.
Applying both these facts, setting k = n − 1 is necessary and sufficient
for $w(A_1 \cdots A_k x) = n$, and so k = n − 1 is the smallest value of k for which
this works.
Solution
Since in all three cases we are considering symmetric antisymmetric relations,
we observe first that if R is such a relation, then xRy implies yRx,
which in turn implies x = y. So any such R can have xRy only if x = y.
1. Prove that given any three consecutive values $x_i, x_{i+1}, x_{i+2}$, it is possible
to compute both a and b, provided $x_i \ne x_{i+1}$.
2. Prove that given only two consecutive values $x_i$ and $x_{i+1}$, it is impossible
to determine a.
Solution
1. We have two equations in two unknowns:
$$ax_i + b = x_{i+1} \pmod{p}$$
$$ax_{i+1} + b = x_{i+2} \pmod{p}.$$
Subtracting the second from the first gives
$$a(x_i - x_{i+1}) = x_{i+1} - x_{i+2} \pmod{p}.$$
If $x_i \ne x_{i+1}$, then we can multiply both sides by $(x_i - x_{i+1})^{-1}$ to get
$$a = (x_{i+1} - x_{i+2})(x_i - x_{i+1})^{-1} \pmod{p}.$$
Now we have a. To find b, plug our value for a into either equation
and solve for b.
2. We will show that for any observed values of $x_i$ and $x_{i+1}$, there are at
least two different values for a that are consistent with our observation;
in fact, we'll show the even stronger fact that for any value of a, $x_i$
and $x_{i+1}$ are consistent with that choice of a. Proof: Fix a, and let
$b = x_{i+1} - ax_i \pmod{p}$. Then $x_{i+1} = ax_i + b \pmod{p}$.
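Here is the recovery from part 1 in action on a small made-up instance (p, a, b, and the seed are all hypothetical values chosen for illustration):

p = 101          # a small prime
a, b = 37, 54    # 'hidden' parameters of x_{i+1} = a x_i + b (mod p)
x = [11]         # arbitrary seed
for _ in range(3):
    x.append((a * x[-1] + b) % p)

# recover a and b from three consecutive values (needs x0 != x1)
a_rec = (x[1] - x[2]) * pow(x[0] - x[1], -1, p) % p
b_rec = (x[1] - a_rec * x[0]) % p
assert (a_rec, b_rec) == (a, b)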
Solution
This is a job for generating functions!
Let $R = \sum 3^n z^n = \frac{1}{1-3z}$ be the generating function for the number of
robots of each weight, and let $B = \sum 2^n z^n = \frac{1}{1-2z}$ be the generating function
for the number of bodies of each weight. Let $H = \sum h_n z^n$ be the generating
function for the number of heads of each weight. Then
$$H = \frac{R}{B} = \frac{1-2z}{1-3z}.$$
So $h_0 = 3^0 = 1$, and for n > 0, we have $h_n = 3^n - 2 \cdot 3^{n-1} = (3-2)3^{n-1} = 3^{n-1}$.
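The division H = R/B can be checked by computing the quotient's coefficients directly (power-series division works term by term here because B has constant term 1):

def series_quotient(r, b, terms):
    # h with sum_{i <= n} h[i]*b[n-i] = r[n]; relies on b[0] == 1
    h = []
    for n in range(terms):
        h.append(r[n] - sum(h[i] * b[n - i] for i in range(n)))
    return h

N = 10
h = series_quotient([3**n for n in range(N)], [2**n for n in range(N)], N)
assert h[0] == 1 and all(h[n] == 3**(n - 1) for n in range(1, N))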
Solution
Proof: Rewrite $x \in A \rightarrow (x \in B \rightarrow x \in C)$ as $x \notin A \vee (x \notin B \vee x \in C)$, or
$(x \notin A \vee x \notin B) \vee x \in C$. Applying De Morgan's law, we can convert the
first OR into an AND to get $\neg(x \in A \wedge x \in B) \vee x \in C$. This can further
be rewritten as $(x \in A \wedge x \in B) \rightarrow x \in C$.
Now suppose that this expression is true for all x, and consider some x
in A ∩ B. Then $x \in A \wedge x \in B$ is true. It follows that $x \in C$ is also true.
Since this holds for every element x of A ∩ B, we have A ∩ B ⊆ C.
Solution
From the extended Euclidean algorithm we have that if gcd(a, m) = 1, then
there exists a multiplicative inverse $a^{-1}$ such that $a^{-1}ax = x \pmod{m}$ for
all x in $\mathbb{Z}_m$. It follows that $f_a$ has an inverse function $f_{a^{-1}}$, and is thus a
bijection.
Alternatively, suppose gcd(a, m) = g ≠ 1. Then $f_a(m/g) = am/g = m(a/g) = 0 = a \cdot 0 = f_a(0) \pmod{m}$, but $m/g \ne 0 \pmod{m}$ since $0 < m/g < m$. It follows that $f_a$ is not injective and thus not a bijection.
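A quick enumeration confirms that $f_a(x) = ax \bmod m$ permutes $\mathbb{Z}_m$ exactly when gcd(a, m) = 1:

from math import gcd

for m in range(2, 30):
    for a in range(m):
        is_bijection = len({a * x % m for x in range(m)}) == m
        assert is_bijection == (gcd(a, m) == 1)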
Solution
It's easier to calculate the probability of the event that we never get two
consecutive heads or tails, since in this case there are only two possible
patterns of coin-flips: HTHT . . . or THTH . . . . Since each of these patterns
contains exactly n heads and n tails, they occur with probability $p^n(1-p)^n$,
giving a total probability of $2p^n(1-p)^n$. The probability that neither
sequence occurs is then $1 - 2p^n(1-p)^n$.
Solution
The graph G has exactly n2 edges. The reason is that under the stated
Solution
Observe first that $(A - B)(A + B) = A^2 + AB - BA - B^2$. The question
then is whether AB = BA. Because A and B are symmetric, we have that
$BA = B^\top A^\top = (AB)^\top$. So if we can show that AB is also symmetric, then we
have $AB = (AB)^\top = BA$. Alternatively, if we can find symmetric matrices
A and B such that AB is not symmetric, then $A^2 - B^2 \ne (A - B)(A + B)$.
Let's try multiplying two generic symmetric 2-by-2 matrices:
$$\begin{pmatrix}a&b\\b&c\end{pmatrix}\begin{pmatrix}d&e\\e&f\end{pmatrix} = \begin{pmatrix}ad+be&ae+bf\\bd+ce&be+cf\end{pmatrix}$$
The product doesn't look very symmetric, and in fact we can assign
variables to make it not so. We need $ae + bf \ne bd + ce$. Let's set b = 0
to make the bf and bd terms drop out, and e = 1 to leave just a and c.
Setting a = 0 and c = 1 gives an asymmetric product. Note that we didn't
determine d or f, so let's just set them to zero as well to make things as
simple as possible. The result is:
$$AB = \begin{pmatrix}0&0\\0&1\end{pmatrix}\begin{pmatrix}0&1\\1&0\end{pmatrix} = \begin{pmatrix}0&0\\1&0\end{pmatrix}$$
Both ∼ and ≈ are equivalence relations. Let $\{0,1\}^n/\!\sim$ and $\{0,1\}^n/\!\approx$
be the corresponding sets of equivalence classes.
1. What is $|\{0,1\}^n/\!\sim|$ as a function of n?
2. What is $|\{0,1\}^n/\!\approx|$ as a function of n?
Solution
1. Given a string x, the equivalent class [x] = {x, r(x)} has either one
element (if x = r(x)) or two elements (if x 6= r(x)). Let m1 be
the number of one-element classes and m2 the number of two-element
classes. Then |{0, 1}n | = 2n = m1 + 2m2 and the number we are
n
looking for is m1 + m2 = 2m1 +2m 2
2
= 2 +m2
1
= 2n1 + m21 . To find
m1 , we must count the number of strings x1 , . . . xn with x1 = xn ,
x2 = xn1 , etc. If n is even, there are exactly 2n/2 such strings, since
we can specify one by giving the first n/2 bits (which determine the
rest uniquely). If n is odd, there are exactly 2(n+1)/2 such strings,
since the middle bit can be set freely. We can write both alternatives
as 2dn/2e , giving |{0, 1}n /| = 2n1 + 2dn/2e .
2. In this case, observe that x y if and only if x and y contain the same
number of 1 bits. There are n + 1 different possible values 0, 1, . . . , n
for this number. So |{0, 1}n /| = n + 1.
The solution to the first part assumes n > 0; otherwise it produces the
nonsensical result 3/2. The problem does not specify whether n = 0 should
be considered; if it is, we get exactly one equivalence class for both parts
(the empty set).
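Both counts are easy to verify by brute force (using (n + 1)//2 for ⌈n/2⌉):

def classes_reversal(n):
    # pick a canonical representative of {x, r(x)} for each string
    return len({min(s, s[::-1]) for s in
                (format(i, '0%db' % n) for i in range(2**n))})

def classes_ones(n):
    return len({bin(i).count('1') for i in range(2**n)})

for n in range(1, 12):
    assert classes_reversal(n) == 2**(n - 1) + 2**((n + 1) // 2 - 1)
    assert classes_ones(n) == n + 1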
$f_1(x) = x_1 - x_2.$
$f_2(x) = x_1 x_2.$
$f_3(x) = x_1 + x_2 + 1.$
$f_4(x) = \dfrac{x_1^2 - x_2^2 + x_1 - x_2}{x_1 + x_2 + 1}.$
Clarification added during the exam: You may assume that $x_1 + x_2 \ne -1$ for $f_4$.
Solution
1. Linear: $f_1(ax) = ax_1 - ax_2 = a(x_1 - x_2) = af_1(x)$ and $f_1(x + y) = (x_1 + y_1) - (x_2 + y_2) = (x_1 - x_2) + (y_1 - y_2) = f_1(x) + f_1(y)$.
2. Not linear: $f_2(2x) = (2x_1)(2x_2) = 4x_1x_2 = 4f_2(x) \ne 2f_2(x)$ when
$f_2(x) \ne 0$.
3. Not linear: $f_3(2x) = 2x_1 + 2x_2 + 1$ but $2f_3(x) = 2x_1 + 2x_2 + 2$. These
are never equal.
4. Linear:
$$f_4(x) = \frac{x_1^2 - x_2^2 + x_1 - x_2}{x_1 + x_2 + 1} = \frac{(x_1 + x_2)(x_1 - x_2) + (x_1 - x_2)}{x_1 + x_2 + 1} = \frac{(x_1 + x_2 + 1)(x_1 - x_2)}{x_1 + x_2 + 1} = x_1 - x_2 = f_1(x).$$
Solution
To compute $E[a^X]$, we need to sum over all possible values of $a^X$ weighted
by their probabilities. The variable X itself takes on each value $k \in \{0 \ldots n\}$
with probability $\binom{n}{k}2^{-n}$, so $a^X$ takes on each corresponding value $a^k$ with
the same probability.
2. What is E[Z]?
Solution
1. There are five cases where Z = 1 with Y = X + 1 (because X can
range from 1 to 5), and five more cases where Z = 1 with X = Y + 1.
So $\Pr[Z = 1] = \frac{10}{36} = \frac{5}{18}$.
Solution
Any two inputs k that are equal mod m give the same pair (3k mod m, 7k mod m).
So no matter how many iterations we do, we only reach m distinct locations.
This equals m² only if m = 1 or m = 0. The problem statement
excludes m = 0, so we are left with m = 1 as the only value of m for which
this method works.
Appendix E
How to write mathematics
Suppose you want to write down some mathematics. How do you do it?
E.1 By hand
This method is recommended for CS202 assignments.
Advantages Don't need to learn any special formatting tools: any symbol
you can see you can copy. Very hard to make typographical errors.
Example
E.2 LATEX
This is what these notes are written in. It's also standard for writing papers
in most technical fields.
Disadvantages You have to install it and learn it. Can't tell what something
looks like until you run it through a program. Cryptic and often
unhelpful error messages.
Example
$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}.$$
The text above was generated by this source code:
\begin{displaymath}
\sum_{i=1}^n i = \frac{n(n+1)}{2}.
\end{displaymath}
E.3 ASCII and/or Unicode art
Advantages Everybody can read ASCII and most people can read Unicode.
No special formatting required. Results are mostly machine-readable.
  n
 ---
 \          n(n+1)
 /    i  =  ------
 ---          2
 i=1
Appendix F
Tools from calculus
Calculus is not a prerequisite for this course, and it is possible to have a perfectly
happy career as a computer scientist without learning any calculus at
all. But for some tasks, calculus is much too useful a tool to ignore. Fortunately,
even though typical high-school calculus courses run a full academic
year, the good parts can be understood with a few hours of practice.
F.1 Limits
The fundamental tool used in calculus is the idea of a limit. This is an
approximation by nearby values to the value of an expression that we can't
calculate exactly, typically because it involves division by zero.
The formal definition is that the limit as x goes to a of f(x) is c, written
$$\lim_{x \to a} f(x) = c,$$
if for any constant ε > 0 there exists a constant δ > 0 such that
$$|f(y) - c| \le \varepsilon$$
whenever
$$|y - a| \le \delta.$$
Some malevolent jackass picks ε, and says "oh yeah, smart guy, I bet
you can't force f(y) to be within ε of c."
You respond with a δ, claiming that f(y) will be within ε of c whenever
y is within δ of a.
Your opponent wins if he can find a nonzero y in this range with f(y)
outside [c − ε, c + ε]. Otherwise you win.
$$\lim_{\Delta z \to 0} \frac{(x + \Delta z)^2 - x^2}{\Delta z} = 2x.$$
We need to take a limit here because the left-hand side isn't defined when
Δz = 0.
Before playing the game, it helps to use algebra to rewrite the left-hand
side a bit:
$$\lim_{\Delta z \to 0} \frac{(x + \Delta z)^2 - x^2}{\Delta z} = \lim_{\Delta z \to 0} \frac{x^2 + 2x(\Delta z) + (\Delta z)^2 - x^2}{\Delta z} = \lim_{\Delta z \to 0} \frac{2x(\Delta z) + (\Delta z)^2}{\Delta z} = \lim_{\Delta z \to 0} (2x + \Delta z).$$
So now the adversary says "make $|(2x + \Delta z) - 2x| < \varepsilon$," and we say "that's
easy, let δ = ε; then no matter what Δz you pick, as long as $|\Delta z - 0| < \delta$, we
get $|(2x + \Delta z) - 2x| = |\Delta z| < \delta = \varepsilon$, QED." And the adversary slinks off with
its tail between its legs to plot some terrible future revenge.
Of course, a definition only really makes sense if it doesn't work when we
pick a different limit. If we try to show
$$\lim_{\Delta z \to 0} \frac{(x + \Delta z)^2 - x^2}{\Delta z} = 12$$
(assuming x ≠ 6), then the adversary picks ε < |12 − 2x|. Now we are out
of luck: no matter what δ we pick, the adversary can respond with some
value very close to 0 (say, min(δ/2, |12 − 2x|/2)), and we land inside
[2x − ε, 2x + ε] but outside [12 − ε, 12 + ε].
We can also take the limit as a variable goes to infinity. This has a
slightly different definition:
$$\lim_{x \to \infty} f(x) = c$$
holds if for any ε > 0, there exists an N > 0, such that for all x > N,
|f(x) − c| < ε. Structurally, this is the same 3-step game as before, except
now after we see ε, instead of constraining x to be very close to a, we constrain
x to be very big. Limits as x goes to infinity are sometimes handy
for evaluating asymptotic notation.
Limits don't always exist. For example, if we try to take
$$\lim_{x \to \infty} x^2,$$
no constant c works: whatever c we propose, the adversary can pick ε = 1,
and x² will exceed c + 1 for all sufficiently large x.
F.2 Derivatives
The derivative or differential of a function measures how much the function
changes if we make a very small change to its input. One way to think
about this is that for most functions, if you blow up a plot of them enough,
you don't see any curvature any more, and the function looks like a line that
we can approximate as ax + b for some coefficients a and b. This is useful for
determining whether a function is increasing or decreasing in some interval,
and for finding things like local minima or maxima.
The derivative f′(x) just gives this coefficient a for each particular x.
The notation f′ is due to Lagrange and is convenient for functions that have
names but not so convenient for something like x² + 3. For more general
functions, a different notation due to Leibniz is used. The derivative of f
with respect to x is written as $\frac{df}{dx}$ or $\frac{d}{dx}f$, and its value for a particular value
x = c is written using the somewhat horrendous notation
$$\left.\frac{d}{dx} f \right|_{x=c}.$$
There is a formal definition of f′(x), which nobody ever uses, given by
$$f'(x) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x},$$
where Δx is a single two-letter variable (not the product of Δ and x!) that
represents the change in x. In the preceding section, we calculated an example
of this kind of limit and showed that $\frac{d}{dx} x^2 = 2x$.
f(x)           f′(x)
c              0
x^n            n x^{n-1}
e^x            e^x
a^x            a^x ln a                 (follows from a^x = e^{x ln a})
ln x           1/x
c g(x)         c g′(x)                  (multiplication by a constant)
g(x) + h(x)    g′(x) + h′(x)            (sum rule)
g(x)h(x)       g(x)h′(x) + g′(x)h(x)    (product rule)
g(h(x))        g′(h(x))h′(x)            (chain rule)
$$\frac{d}{dx}\,\frac{x^2}{\ln x} = x^2 \cdot \frac{d}{dx}\left(\frac{1}{\ln x}\right) + \frac{1}{\ln x} \cdot \frac{d}{dx}\,x^2 \quad \text{[product rule]}$$
$$= x^2 \cdot \left(-1 \cdot (\ln x)^{-2} \cdot \frac{d}{dx}\ln x\right) \; \text{[chain rule]} \; + \frac{2x}{\ln x}$$
$$= -x^2 \cdot (\ln x)^{-2} \cdot \frac{1}{x} + \frac{2x}{\ln x} = -\frac{x}{\ln^2 x} + \frac{2x}{\ln x}.$$
The idea is that whatever the outermost operation in an expression is,
you can apply one of the rules above to move the differential inside it, until
there is nothing left. Even computers can be programmed to do this. You
can do it too.
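For instance, a computer algebra system will happily confirm the worked example above (sympy assumed available):

import sympy as sp

x = sp.symbols('x', positive=True)
f = x**2 / sp.log(x)
expected = 2*x/sp.log(x) - x/sp.log(x)**2
assert sp.simplify(sp.diff(f, x) - expected) == 0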
F.3 Integrals
First you have to know how to differentiate (see previous section). Having
learned how to differentiate, your goal in integrating some function f(x) is
to find another function F(x) such that F′(x) = f(x). You can then write
that the indefinite integral $\int f(x)\,dx$ of f(x) is F(x) + C (any constant
C works), and compute definite integrals with the rule
$$\int_a^b f(x)\,dx = F(b) - F(a).$$
Alternatively, one can also think of the definite integral $\int_a^b f(x)\,dx$ as the
area under the curve of f between a and b.
f(x)           F(x)
f(x) + g(x)    F(x) + G(x)
a f(x)         a F(x)              (a is constant)
f(ax)          F(ax)/a             (a is constant)
x^n            x^{n+1}/(n+1)       (n constant, n ≠ −1)
x^{-1}         ln x
e^x            e^x
a^x            a^x / ln a          (a constant)
ln x           x ln x − x
Appendix G
The natural numbers
$$\forall x : Sx \ne 0. \tag{P1}$$
This still allows for any number of nasty little models in which 0 is
nobody's successor, but we still stop before getting all of the naturals. For
example, let SS0 = S0; then we only have two elements in our model (0
and S0), because once we get to S0, any further applications of S keep us
where we are.
To avoid this, we need to prevent S from looping back round to some
number we've already produced. It can't get to 0, because of the first axiom,
and to prevent it from looping back to a later number, we take advantage
of the fact that any such number is already the successor of something:
$$\forall x : \forall y : Sx = Sy \rightarrow x = y. \tag{P2}$$
$$\bigl(P(0) \wedge \forall x\,(P(x) \rightarrow P(Sx))\bigr) \rightarrow \forall x\,P(x).$$
This is known as the induction schema, and says that, for any predicate
P, if we can prove that P holds for 0, and we can prove that P(x)
implies P(x + 1), then P holds for all x in ℕ. The intuition is that even
though we haven't bothered to write out a proof of, say, P(1337), we know
that we can generate one by starting with P(0) and modus-pwning our way
out to P(1337) using P(0) → P(1), then P(1) → P(2), then P(2) → P(3),
etc. Since this works for any number (eventually), there can't be some
number that we missed.
In particular, this lets us throw out the bogus numbers in the bad example
above. Let B(x) be true if x is bogus (i.e., it's equal to B or one
of the other values in its chain of successors). Let P(x) ≡ ¬B(x). Then
P(0) holds (0 is not bogus), and if P(x) holds (x is not bogus) then so does
P(Sx). It follows from the induction axiom that ∀x P(x): there are no bogus
numbers.
$$(x \ne 0) \rightarrow (\exists y : x = Sy).$$
This seems like a good candidate for P (our induction hypothesis), because
we do know a few things about 0. Let's see what happens if we try
plugging this into the induction schema:
Since we showed P(0) and ∀x (P(x) → P(Sx)), the induction schema tells
us ∀x P(x). This finishes the proof.
Having figured the proof out, we might go back and clean up any false
starts to produce a compact version. A typical mathematician might write
the preceding argument as:
Proof. Induction on x.
x + 0 = x.
x + Sy = S(x + y).
(We are omitting some quantifiers, since unbounded variables are implicitly
universally quantified.)
This definition is essentially a recursive program for computing x + y using
only successor, and there are some programming languages (e.g., Haskell)
that will allow you to define addition using almost exactly this notation. If
the definition works for all inputs to +, we say that + is well-defined. Not
working would include giving different answers depending on which parts of
the definitions we applied first, or giving no answer for some particular inputs.
These bad outcomes correspond to writing a buggy program. Though
we can in principle prove that this particular definition is well-defined (using
induction on y), we won't bother. Instead, we will try to prove things about
our new concept of addition that will, among other things, tell us that the
definition gives the correct answers.
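For the record, here is a direct transcription of the definition into Python rather than Haskell, a minimal sketch that fakes successor and predecessor with +1 and −1 on non-negative integers:

def add(x, y):
    # x + 0 = x
    if y == 0:
        return x
    # x + Sy = S(x + y)
    return add(x, y - 1) + 1

assert add(3, 4) == 7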
We start with a lemma, which is Greek for a result that is not especially
useful by itself but is handy for proving other results.
Lemma G.3.1. 0 + x = x.
(We could do a lot of QED-ish jumping around in the end zone there,
but it is more refined, and lazier, to leave off the end of the proof once it's
clear we've satisfied all of our obligations.)
Here's another lemma, which looks equally useless:
Lemma G.3.2. x + Sy = Sx + y.
Lemma G.3.4. x + y = x + z → y = z.
0 ≤ x.
x ≤ Sx.
x ≤ y ∧ y ≤ z → x ≤ z.
Footnote 5: This actually came up on a subtraction test I got in the first grade from the terrifying
Mrs Garrison at Mountain Park Elementary School in Berkeley Heights, New Jersey. She
insisted that −2 was not the correct answer, and that we should have recognized it as a
trick question. She also made us black out the arrow to the left of the zero on the number-line
stickers we had all been given to put on the top of our desks. Mrs Garrison was, on the
whole, a fine teacher, but she did not believe in New Math.
a ≤ b ∧ c ≤ d → a + c ≤ b + d.
x ≤ y ∧ y ≤ x → x = y.
0 · y = 0.
Sx · y = y + x · y.
Some properties of multiplication:
x · 0 = 0.
1 · x = x.
x · 1 = x.
x · y = y · x.
x · (y · z) = (x · y) · z.
x ≠ 0 ∧ x · y = x · z → y = z.
x · (y + z) = x · y + x · z.
x ≤ y → z · x ≤ z · y.
z ≠ 0 ∧ z · x ≤ z · y → x ≤ y.
[BD92] Dave Bayer and Persi Diaconis. Trailing the dovetail shuffle to its lair. Annals of Applied Probability, 2(2):294–313, 1992.
[Ber34] George Berkeley. THE ANALYST; OR, A DISCOURSE Addressed to an Infidel MATHEMATICIAN. WHEREIN It is examined whether the Object, Principles, and Inferences of the modern Analysis are more distinctly conceived, or more evidently deduced, than Religious Mysteries and Points of Faith. Printed for J. Tonson, London, 1734.
[Big02] Norman L. Biggs. Discrete Mathematics. Oxford University Press, second edition, 2002.
[Bou70] N. Bourbaki. Théorie des Ensembles. Hermann, Paris, 1970.
[Ded01] Richard Dedekind. Essays on the Theory of Numbers. The Open Court Publishing Company, Chicago, 1901. Translated by Wooster Woodruff Beman.
[Die10] R. Diestel. Graph Theory. Graduate Texts in Mathematics. Springer, 2010.
[Fer08] Kevin Ferland. Discrete Mathematics. Cengage Learning, 2008.
[Gen35a] Gerhard Gentzen. Untersuchungen über das logische Schließen. I. Mathematische Zeitschrift, 39(1):176–210, 1935.
[Gen35b] Gerhard Gentzen. Untersuchungen über das logische Schließen. II. Mathematische Zeitschrift, 39(1):405–431, 1935.
[GKP94] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics: A Foundation for Computer Science. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2nd edition, 1994.
[NWZ11] David Neumark, Brandon Wall, and Junfu Zhang. Do small businesses create more jobs? New evidence for the United States from the National Establishment Time Series. The Review of Economics and Statistics, 93(1):16–29, February 2011.
[Wil95] Andrew John Wiles. Modular elliptic curves and Fermat's Last Theorem. Annals of Mathematics, 141(3):443–551, 1995.
Index
supremum, 72
surjection, 57
surjective, 56, 57
symmetric, 125, 249
symmetric closure, 137
symmetric difference, 49
symmetry, 32
syntactic sugar, 55
tail, 141
tautology, 17
terminal vertex, 122, 141
theorem, 9, 34, 37, 354
  Wagner's, 149
theory, 9, 33
Theta
  big, 100
Three Stooges, 47
topological sort, 132
topologically sorted, 154
total order, 70, 128, 132
totally ordered, 128
totient, 119
transition function, 59
transition matrix, 249
transitive, 125
transitive closure, 137, 153
transitivity, 32, 70
translation invariance, 70
transpose, 249
tree, 154, 156
triangle inequality, 256
trichotomy, 70
truth table, 16
  proof using, 17
tuples, 55
turnstile, 34
two-path graph, 296
uncorrelated, 234
uncountable, 63
undirected graph, 140, 141
uniform discrete probability space, 213
uniform distribution, 221
union, 49
unit, 106
unit vector, 256
universal quantification, 26
universal quantifier, 26
universe, 49
universe of discourse, 25
unranking, 176
upper bound, 71, 86
upper limit, 86
valid, 35, 36
Vandermonde's identity, 179
variable
  indicator, 219
variance, 232
vector, 243, 246, 255
  unit, 256
vector space, 243, 246
vertex, 122
  initial, 122
  terminal, 122
vertices, 140
Von Neumann ordinals, 58
Wagner's theorem, 149
weakly connected, 153
web graph, 144
weight, 184
well order, 135
well-defined, 354
well-ordered, 58, 80
Zermelo-Fraenkel set theory with choice, 52
ZFC, 52
Zorn's lemma, 135