Bamberg Lectures

This document provides an overview and table of contents for a textbook on fundamental concepts in mathematics. The textbook covers topics such as functions, relations, sets, numbers and rings, vector spaces, metric spaces, and more. It aims to provide students with an introduction to key concepts from abstract algebra and analysis. Learning outcomes include gaining skills in mathematical proofs and definitions, exposure to philosophy of mathematics ideas, and a foundation in areas like ring theory, real analysis, and metric spaces to prepare students for more advanced courses.

Uploaded by

Nathan Dyson
JOHN BAMBERG

FUNDAMENTAL CONCEPTS IN MATHEMATICS
Contents

Preface

1 Functions, relations and sets
1.1 How big is big?
1.2 What is a “function” anyway?
1.3 A function is a relation
1.4 A relation is a set
1.5 An aside: naïve set theory
1.6 How to start writing proofs
1.7 Cardinality fundamentals
1.8 Countable and uncountable
1.9 Diagonal arguments as two-person games
1.10 Aside: What is a number?
1.11 Exercises

2 From numbers to rings
2.1 Divisibility
2.2 Prime numbers and canonical factorisation of integers
2.3 Aside: how are the primes distributed?
2.4 Mersenne primes and perfect numbers
2.5 Clock arithmetic
2.6 Aside: The RSA algorithm for public-key encryption
2.7 Exercises

3 Rings beyond numbers
3.1 Rings and fields
3.2 Ideals
3.3 An important interlude: Equivalence relations, partitions and quotients
3.4 A construction of the rational numbers
3.5 Geometric things as quotients
3.6 Polynomials are like numbers
3.7 Irreducibility and factorisation
3.8 Gauß’s Lemma
3.9 Eisenstein’s Criterion
3.10 Clock arithmetic on polynomials
3.11 Congruence modulo a polynomial and ideals
3.12 Algebraic versus transcendental
3.13 Aside: Constructible numbers
3.14 Number-theory type results in ring theory
3.15 Exercises

4 Normed vector spaces
4.1 Abstract vector spaces: things that behave like R^n
4.2 Aside: The Axiom of Choice
4.3 Norms
4.4 Boundedness
4.5 Epsilon versus delta
4.6 Continuity for normed vector spaces
4.7 Function spaces
4.8 Exercises

5 Metric Spaces
5.1 Limit of a function and limit of a sequence are really the same thing
5.2 A good definition?
5.3 Metrics
5.4 Boundedness
5.5 Cauchy sequences
5.6 Exercises
5.7 Complete metric spaces
5.8 The ring of Cauchy sequences
5.9 Construction of the real numbers
5.10 Arithmetic on R
5.11 What does 0.99999 . . . mean?
5.12 Aside: other ways to construct R
5.13 The topology of metric spaces
5.14 Continuity revisited
5.15 Function spaces as metric spaces
5.16 Contraction maps
5.17 Aside: Iteration Function System
5.18 Exercises

6 Compactness
6.1 From finiteness to compactness
6.2 Covers and the definition of compact sets
6.3 Closed, bounded and compact
6.4 Aside: Brouwer’s fixed-point theorem
6.5 Exercises

7 The twain shall meet
7.1 p-adic metric and the p-adic numbers
7.2 A bit more than what you know
7.3 Completing the rationals

8 Appendix

9 Index

Preface

A problem that I face in teaching this course is that I want to teach so
much! Alas, I cannot expect a student to understand as much as I or one
of my colleagues might, but I do not want to short-change the students.
There are elements of the fundamentals of mathematics that are truly
fascinating and one could spend a whole course just discussing these. For
example: “Set theory and logic” could be a thirteen-week course in itself;
“Vector spaces” is normally taught in a course dedicated to linear algebra;
“Ring theory” normally accompanies a primer on group theory. After a few
years of experimenting with a second year course on pure mathematics,
the balance has been found. The product is this course: a primer on the
fundamental concepts of number systems and ring theory, together with
the highlights of real analysis. Along the way, there will be “asides” which
expand upon some of the gems we find by the roadside.

Warning: The notes are by no means complete. They are skeletal and it is up to the student to fill them in with examples
and proofs. This can be achieved either by reading a variety of good books that can be found in the library, or by
attending lectures. These notes may not give you insight into how the writer thought about the proofs or came up
with them. This level of insight will be conveyed in lectures.

Theorems, lemmas and corollaries in this course can roughly be categorised into the following, and we will
mark each result with the respective symbol:
?: The proof is beyond the expected skill level of the student if the student were asked to create a proof without
ever having read a proof of the result before. However, if the student can recall the “gist” of the proof, or the
trick involved, then the student should be able to do it.
†: The proof is beyond the expected skill level of the student if the student were asked to create a proof without
ever having read a proof of the result before. The proof is long, tedious or very complex and intricate. The lecturer
does not think that being able to accomplish a proof such as this has any value to the student’s development.
: The proof of this result is sufficiently simple that by the end of the course, the student would be expected to be
able to prove this result or one of a similar complexity, having never seen it before.

Acknowledgements
I would like to thank Murray Smith for reading drafts of these notes.

Learning outcomes

It is always difficult in mathematics to accurately describe the outcomes of
learning this or that in the subject. I will attempt to list here what I think are
the things the student should come away with and what they should ideally be
able to do.

Proofs
This is really the first course in our undergraduate program where the
student learns how to write proofs in a variety of situations. At the very
least, the student should be able to do the auto-pilot component of a proof.
By this, I mean that the student should know what the necessary steps are
in a proof regardless of the difficult and ingenious step that some proofs
require. For example, if the student is asked to prove that a function f :
A → B is one-to-one, then they must begin the proof with a line like
“Suppose we have two elements a_1, a_2 ∈ A such that f(a_1) = f(a_2). We
will show that a_1 = a_2”.

Definitions and examples


It cannot be said enough: mathematics is about clear and precise formulation
of ideas. So the student must absolutely know the definitions of
every concept in the course. In support of definitions, the student needs
to know examples which satisfy a particular definition, and if possible, a
non-example. That is, in some situations there is a hierarchy of structures,
and there may be subtleties between the layers. For example, do you know
of an algebraic number that is not an algebraic integer?

Philosophy of mathematics
From the very beginning of the course, the student will be exposed to the
philosophical edge of mathematics. Are there different types of infinities?
What does 0.999 · · · mean? There is no better way for a student to get
a thorough training in critical thinking than to embark on the theory of
cardinalities and number systems, and to question the foundations of their
subject. The student should know by the end of the course how we can
construct the various number systems (e.g., rationals, reals) from the natural
numbers, and why this was an important contribution to mathematics over
a century ago. We come across diagonal arguments, and non-constructive
proofs, and the student will have learnt how to picture difficult abstract
ideas by analogy to concrete examples.

An introduction to the basics of abstract algebra


One of the clear goals of this course is as a primer on the basics of abstract
algebra. At the heart, the student needs to be au fait with equivalence
relations; they are everywhere in algebra! This course leads on to a third year
unit in group theory (the mathematics of symmetry) and more ring theory,
so some of the basics of ring theory will prepare the students for these two
units. The student must have a very good knowledge of the integers modulo
n and of polynomial rings. These are the basic structures of ring theory.

An introduction to the basics of analysis


Again, a clear goal of this course is as a primer on some sub-disciplines
of the diverse area of analysis: the study of the infinite and infinitesimal.
We will cover the elementary aspects of convergence and continuity, and
the student will come away with a solid background in the theory of metric
spaces. Normed vector spaces and metric spaces are the bedrock of
functional analysis and geometry, and the student will need to be able to
write good old δ/ε-proofs in this basic setting before embarking upon the more
general unit in third year. By the end of the course, the student will have
the preparation needed to understand the basics of differential geometry (e.g.
manifolds, tangent bundles, etc.) and functional analysis (e.g. operator
algebras, Hilbert spaces, etc.). So we cover the themes of completeness and
compactness, which historically give us the underpinning of modern analysis.
1
Functions, relations and sets

In this chapter we cover relations, basic set theory terminology, Russell’s Paradox, cartesian product of sets,
definition of a function, one-to-one, onto, and an introduction to writing proofs.

1.1 How big is big?

For two finite sets, it is straightforward to work out which set is bigger:
you simply count the elements of each set. What about infinite sets? Are
there some infinite sets that are bigger than other infinite sets? Does it make
sense to talk about infinity in the first instance?

Example 1.1.1. A European woman called Grietje counts out a set of
numbers as follows, and asserts that there is a pattern that this sequence
follows and that it goes on forever:

twee, vier, zes, acht, tien, twaalf, veertien, zestien, achttien, twintig, . . . .

You don’t know the language and so you guess that this person is just
counting:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, . . . .

Then it is revealed to you that the language Grietje speaks is Dutch and that
she was counting the even numbers.

Question 1.1.2. How big is the set of even whole numbers? Is it smaller
than the set of whole numbers? If we count the even numbers in a different
language, do we see any difference between the even numbers and the
whole set of numbers?

We will see that mathematics makes the answer to this question clear
and concrete by giving a definition of what the size of a set is. It turns out
that the even numbers have precisely the same size as the integers, that
the rational numbers have the same size as the integers, and that the real
numbers are definitely bigger than the integers.

1.2 What is a “function” anyway?

In secondary school, beyond polynomials we encounter functions that
have more complicated descriptions such as log, exp and trigonometric
functions. Often these functions were introduced to us passively, described
by association, yet did we really know what they were? What is the cosine
of a real number? Next we learn that a function is a rule [1] that assigns one
value for every input: there cannot be two values for the same input.

[1] What is a rule?

Example 1.2.1. The equation f(x)² = 1 has two solutions: f(x) = 1 and
f(x) = −1. So writing f(x) := ±√(x²) does not define [2] a function as it is
not single-valued.

[2] When we define an object, we will use the symbol :=. The symbol = is reserved as a relation comparing pairs of objects.

Questions such as “what is a function?” were the types of questions
some mathematicians of the 19th century were asking in their search for a
formal treatment of mathematics. We will see that on one hand, formalism
was a triumph in mathematical thought, but on the other hand, it brought
forth meta-mathematical and philosophical conundrums that shook the
foundations of modern mathematics.

1.3 A function is a relation

The relations we will encounter in this course deal with the interactions of
two things; they will be binary relations. The archetypical example will
be that of friendship on the set of human beings. Given two people, we
can establish whether they are friends or not. The reason to abstract and
clarify the idea of a relation is simply because they are so fundamental in
mathematics and occur so often. For example, we could say that a number
is less than another number by

x<y

or we could say that golden syrup (i.e. y) tastes better than honey (i.e.
x). Notice that we mostly use infix notation for relations. [3] A function is a
relation too! When we write

f(x) := x²

where x is a real number, what we are actually saying is that x is related to
x². Of course, when we write it as a relation in infix notation, it looks like

x f x².

This looks very weird, and in the case of functions, we do not really use
this notation in practice. Now a function f from X to Y is a relation, where
the left-hand things are in X and the right-hand things are in Y so that for
every element x of X there is a unique element of Y which is f-related to x. [4]

[3] By “infix notation” we mean that the symbol of interest goes between its arguments. So when we write that two things x and y are equal, we demonstrate this by x = y.

[4] A more formal definition will come later.

So what is a relation anyway?

1.4 A relation is a set

Well, it is nothing more than describing a pair of things, each having a left-hand
one and a right-hand one. The key device in making this idea concrete
is the notion of the Cartesian product of two sets. Given two sets A and
B, the Cartesian product [5] of A and B is the set of all (ordered) pairs (a, b)
with a ∈ A and b ∈ B. We could write this set as

A × B := {(a, b) : a ∈ A, b ∈ B}.

[5] One of the axioms of Zermelo-Fraenkel Set Theory, the most accepted set of axioms for the foundations of mathematics, is called the Axiom of Pairing, which allows us to ‘make’ cartesian products of sets. In other words, making pairs is one of the things in mathematics that it is necessary to establish as a fundamental construct, and it does not arise from any simpler construct.
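For finite sets, the idea that a relation is nothing more than a set of ordered pairs can be tried out directly. The following is a minimal sketch in Python; the sets A and B and the helper name is_relation are illustrative assumptions, not part of the notes.

```python
from itertools import product

# Two small illustrative sets (hypothetical data, not from the notes).
A = {1, 2, 3}
B = {"x", "y"}

# The Cartesian product A × B: every ordered pair (a, b) with a in A, b in B.
cartesian = set(product(A, B))

# The relation "less than" on A, stored as a set of ordered pairs from A × A.
less_than = {(a, b) for a, b in product(A, A) if a < b}

def is_relation(r, left, right):
    """Return True when r is a subset of left × right."""
    return r <= set(product(left, right))

print(len(cartesian))                 # 6 pairs, i.e. |A| * |B|
print(is_relation(less_than, A, A))   # True
print(sorted(less_than))              # [(1, 2), (1, 3), (2, 3)]
```

Storing the relation as a plain set means that membership, such as (1, 2) in less_than, plays the role of the infix statement 1 < 2.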

1.5 An aside: naïve set theory

We have come to accept defining sets by a property (or for a fancier word,
predicate). So for example,

{x ∈ R : x² + 3x + 1 = 0}.

This set consists of two elements −3/2 + √5/2 and −3/2 − √5/2, and it might be a
different set if we allow x to be more than a real number (like matrices, for
example). But does it make sense to define a set of all sets of size 10?

{A : |A| = 10}.

Note that we have dropped any requirement that A live in a universe of
some kind. This can have strange consequences, especially when we allow
the ‘set of all sets’ to be a set.

Example 1.5.1 (The set of non-penguins). Let S be the set of all things
which are not penguins. Since S is not a penguin, it is an element of itself:

S ∈ S.

So we see here an example of a set that is a member [6] of itself!

[6] Recall that the symbol ∈ is used for membership, whereas ⊆ is used for “is a subset of”.

Example 1.5.2 (The barber paradox). In a village, there is a male barber
who shaves all those men, and only those men, who do not shave themselves.
Who shaves the barber?

1.5.1 Russell’s paradox


As we saw in Example 1.5.1, a set can be a member of itself; that’s right,
a member of itself! Bertrand Russell formulated a paradox which shocked
the logicians of his time (1901), and it is similar to the barber paradox
(Example 1.5.2). [7]

[7] The solution to this paradox is well beyond the scope of this course, though the interested reader will not have any trouble finding interesting articles on the web that discuss Russell’s Paradox.

Example 1.5.3 (Russell’s paradox). Let A be the set of all sets S such that

S ∉ S.

Is A a member of itself?
If A is a member of itself, it would contradict its own definition as a set
containing all sets that are not members of themselves. On the other hand,
if A is not a member of itself, it would satisfy its predicate and hence be a
member of itself!
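The barber paradox has the same self-referential shape, and for a small village it can be checked by brute force. The sketch below is only an illustration under stated assumptions: the village, the names and the function consistent_villages are hypothetical, and the rule is read as "the barber shaves exactly those men who do not shave themselves". It tries every possible "shaves" relation and finds that none obeys the rule.

```python
from itertools import product

def consistent_villages(men, barber):
    """Yield every 'shaves' relation in which the barber shaves exactly
    those men who do not shave themselves."""
    pairs = [(a, b) for a in men for b in men]
    # Each relation is a subset of men × men; try all of them.
    for bits in product([False, True], repeat=len(pairs)):
        shaves = {p for p, bit in zip(pairs, bits) if bit}
        if all(((barber, m) in shaves) == ((m, m) not in shaves) for m in men):
            yield shaves

men = ["barber", "alf", "bert"]   # hypothetical villagers
print(len(list(consistent_villages(men, "barber"))))   # 0
```

The search fails for the same reason as in Russell's example: taking m to be the barber himself, the rule demands that he shaves himself exactly when he does not.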

1.6 How to start writing proofs

Mathematics is arguably one of the oldest intellectual disciplines, and since
at least antiquity, a part of the doing of mathematics has been establishing
truths and collating them into theorems, lemmas and the like. A theorem
is a true mathematical statement and its proof relies on established valid
statements. Of course, we have to start somewhere, which we alluded to in
the previous section’s discussion of the philosophical underpinnings of
mathematics. For the student, proofs can be difficult and take some time to
get used to. Some of the fun of doing a proof comes from the completion of
a water-tight and elegant proof. However, for most of the formative period,
proofs can be frustrating and not enjoyable at all. So we will keep things
simple. For at least half of this course we will throw away what I believe
is one of the greatest inhibitions to the enjoyment of proofs: the syntax.
Syntax, though important in writing clear and elegant proofs, can take a
while to master. It often inhibits the new learner in understanding how a
proof works and how to recognise whether it is correct or not. In particular,
the emphasis early in one’s learning ought to be focussed on the logic of a
proof, not whether the right expression was used in conveying an inference
or deduction.

1.6.1 Restricted syntax


We will just use one method for doing proofs before embarking on writing
out proofs with flair and poetic English language. After we have become
comfortable with the structure of a proof, we will then flavour them by
introducing more variety into the syntax.

Temporary rules:

• Only use “Suppose” for initialising statements like “Assume ...”, “Let ...”.

• Only use “Then” for logical connections like “Hence”, “Thus”, “So”.

• You can use “Choose ...” if you need to show that something exists.

• Give a reason for every logical deduction, no matter how trivial it is.

• Conclude the proof with “Therefore, ...”.

Example 1.6.1. We will prove that the sum of two odd numbers is an even
number.

Definitions that you’ll need:
An odd number is an integer of the form 2n + 1 for some integer n.
An even number is an integer of the form 2n for some integer n.

 Proof (Restricted Syntax). Suppose x and y are odd numbers.


Then there are integers m, n such that x = 2n + 1 and y = 2m + 1 by the definition of an odd number.
Then x + y = (2n + 1) + (2m + 1) by definition of x and y.
Then x + y = 2n + 2m + 2 since 1 + 1 = 2.
Then x + y = 2(n + m + 1) by factoring out a common factor of 2.
Then x + y is an even number by definition of an even number and since n + m + 1 is an integer.
Therefore, the sum of two odd numbers is an even number.

Joy and David Morris (University of Lethbridge, Canada) have a similar
technique for teaching proofs called “two-column proofs”.

1.7 Cardinality fundamentals

First we return to the definition of a function.


Definition 1.7.1 (Function). Let A and B be sets and let f be a subset of
A × B (i.e., f is a relation). We say that f is a function if for every element
a ∈ A, there is a unique element b ∈ B such that (a, b) ∈ f . The domain of
f is A, the codomain of f is B and the image of f is

f ( A) := { f (a) : a ∈ A}.
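On finite sets, Definition 1.7.1 can be turned into a direct check. The following is a minimal sketch, assuming the relation is stored as a Python set of pairs; the names is_function, image, square and plus_minus are illustrative and do not come from the notes.

```python
def is_function(f, A, B):
    """f is a set of pairs; check that every a in A has exactly one partner b in B."""
    if not all(a in A and b in B for a, b in f):
        return False  # f is not even a subset of A × B
    return all(sum(1 for (x, _) in f if x == a) == 1 for a in A)

def image(f):
    """The image f(A): the set of second coordinates that actually occur."""
    return {b for _, b in f}

A = {-1, 0, 1}
B = {0, 1, 2}
square = {(a, a * a) for a in A}          # a function from A to B
plus_minus = {(0, 0), (1, 1), (1, 2)}     # 1 has two partners and -1 has none

print(is_function(square, A, B))      # True
print(is_function(plus_minus, A, B))  # False
print(sorted(image(square)))          # [0, 1]
```

Note that the image of square is {0, 1}, which is strictly smaller than the codomain B, so image and codomain really are different pieces of data.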

So a function consists of three objects: a domain A, a codomain B, and a
relation f.

Note: In some texts, the word range is used instead of image. According to the Oxford English Dictionary, the earliest specific use of the term range (of a function) is found in 1914 in A. R. Forsyth’s “Theory of Functions of Two Complex Variables”. The term image can be used for the output of a single element under a function.

1.7.1 Kidney diagram of a function

One way to abstractly visualise the definition of a function is by a kidney
diagram:

Figure 1.1: This diagram depicts what cannot happen in the definition of a function.
1.7.2 Special kinds of functions
Definition 1.7.2 (One-to-one (injective)). A function f from A to B is one-to-one
if for each element b ∈ B, there is at most one element a ∈ A such
that f(a) = b.

Equivalently: A function f from A to B is one-to-one if for all a_1, a_2 ∈ A such that a_1 ≠ a_2, we have f(a_1) ≠ f(a_2).

Example 1.7.3.
• The function f defined by f(x) = x² is not one-to-one since f(−1) = f(1) but 1 ≠ −1.

• The identity function on a set X is one-to-one since for all y ∈ X, there is just one element, namely itself, that is mapped to y by the identity function.

• The “blood type” function is not one-to-one. The blood-type function has domain the set of all people, and codomain {A, B, AB, O}. Never mind the rhesus for this example!

• Pictorially, a function f on the reals is one-to-one if every horizontal line drawn on the graph of f intersects the values of f at most once. You should picture “f(x) = x²” and imagine drawing horizontal lines on the graph.
Figure 1.2: This diagram depicts what cannot happen in the definition of a one-to-one function.
To prove a function f : X → Y is one-to-one:

Suppose f ( x1 ) = f ( x2 ) for some x1 , x2 ∈ X. Show that x1 = x2 .
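The proof recipe above has a finite analogue that can be run on a computer: look for two different inputs with the same output. This is a sketch only; is_one_to_one is a hypothetical helper, and a check over a finite range is evidence, not a proof for an infinite domain.

```python
def is_one_to_one(f, domain):
    """Check that f(x1) == f(x2) forces x1 == x2 over a finite domain."""
    seen = {}
    for x in domain:
        y = f(x)
        if y in seen and seen[y] != x:
            return False  # two distinct inputs share the output y
        seen[y] = x
    return True

domain = range(-5, 6)
print(is_one_to_one(lambda x: x * x, domain))   # False, since f(-1) == f(1)
print(is_one_to_one(lambda x: x, domain))       # True: the identity function
```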

Example 1.7.4. Show that the function f : R\{0} → R+ defined by
f(x) = √(x + 1) for all x ∈ R\{0}, is one-to-one.

 Proof (Restricted Syntax).


Suppose x_1, x_2 ∈ R\{0} and f(x_1) = f(x_2).
Then √(x_1 + 1) = √(x_2 + 1) by definition of f.
Then x_1 + 1 = x_2 + 1 by squaring each side.
Then x_1 = x_2 by subtracting 1 from each side.
Therefore, f is one-to-one.

Definition 1.7.5 (Onto (surjective)). A function f : X → Y is onto if for
each element y ∈ Y, there is at least one element x ∈ X such that f(x) = y.

Equivalently: A function f : X → Y is onto if the image f(X) equals the entire set Y.

Figure 1.3: This diagram depicts what cannot happen in the definition of an onto function.
Example 1.7.6.

• The function f : R → R defined by f(x) = x² (for all x ∈ R) is not onto since no element maps to −1 under f. If however we restricted the definition of f by defining f : R → R+ ∪ {0}, then it would be onto.

• The “blood-type” function is onto, since for every blood type, there is at least one person in the world who has that blood type.

To prove a function f : X → Y is onto:


Let y be an element of Y. Find an element x ∈ X such that f ( x) = y.

Example 1.7.7. Show that the function f : R\{0} → [1, ∞) defined by
f(x) = √(x + 1) for all x ∈ R\{0}, is onto.

 Proof (Restricted Syntax). Suppose y ∈ [1, ∞).


Choose x = y² − 1.
Then f(x) = √(x + 1) by definition of f.
Then f(x) = √(y² − 1 + 1) by substituting x = y² − 1.
Then f(x) = √(y²) since −1 + 1 = 0.
Then f(x) = y since y is positive.
Therefore, f is onto.

Remark 1.7.8. In the above proof, the step “Choose x = y² − 1” was only
possible after doing some work on the back of an envelope first. I worked
backwards to find an element x ∈ R\{0} such that f ( x) = y, and then wrote
my proof in the logically correct direction!

1.7.3 Counting
Counting is one of the most central ideas of mathematics, and it wasn’t until
Cantor’s work in the 19th century that we began to understand fully what
it means to count. To say that one set has more elements than another is a
trivial problem in the finite context, but what about infinite sets? Are there
more integers than even numbers? What Cantor realised is that counting
can be thought of as pairing elements in a unique and exhaustive way. For
example, we can pair up the integers and even numbers in the following
way:

. . . , (−6, −3), (−4, −2), (−2, −1), (0, 0), (2, 1), (4, 2), (6, 3), . . . .

You can see here that every even number will appear in the first coordinate
in precisely one of these pairs and every integer will appear in the second
coordinate in just one of these pairs. So what we require is that there is a
function from the even numbers to the integers that is one-to-one and onto.
We call this a bijection.

Definition 1.7.9 (Bijection). A function is bijective if it is both one-to-one
and onto.

The function f from the even numbers to the integers defined by f(x) =
x/2 is a bijection. So essentially, we think of the integers and even numbers
as having the same number of elements.
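The pairing of even numbers with integers can be inspected on a finite window of the two sets. The sketch below assumes finite ranges as stand-ins for the infinite sets, and the helper names (is_one_to_one, is_onto, is_bijection) are illustrative only.

```python
def is_one_to_one(f, domain):
    """No output value is produced by two different inputs."""
    images = [f(x) for x in domain]
    return len(set(images)) == len(images)

def is_onto(f, domain, codomain):
    """Every element of the codomain is hit by some input."""
    return {f(x) for x in domain} == set(codomain)

def is_bijection(f, domain, codomain):
    return is_one_to_one(f, domain) and is_onto(f, domain, codomain)

# A finite window onto the pairing ..., (-2, -1), (0, 0), (2, 1), ...
evens = range(-20, 21, 2)      # even numbers from -20 to 20
integers = range(-10, 11)      # their halves, from -10 to 10

print(is_bijection(lambda x: x // 2, evens, integers))       # True
print(is_onto(lambda x: x * x, range(-3, 4), range(0, 10)))  # False: 2 is never hit
```

The window can be widened as far as one likes and the halving map stays a bijection, which is the finite shadow of the statement about the infinite sets.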

Definition 1.7.10 (Equinumerous sets and cardinality). Two sets X and
Y are equinumerous, or have the same size/cardinality [8], if there exists a
bijection from X to Y. We can express this symbolically as X ≈ Y or simply
|X| = |Y|.

[8] The word cardinal for the size of a set came from a metaphor made with the usual meaning of a Cardinal, that can be found in the “Bibliotheca Hispanica” by Richard Percival in 1591: “The numerals are either Cardinall, that is, principall, vpon which the rest depend, etc.”

Example 1.7.11. The unit interval (0, 1) and R are equinumerous, as we
shall see. Consider the following function f : R → (0, 1):

f(x) := (1/π) arctan(x) + 1/2.

It can be shown that f is one-to-one and onto, and so R and (0, 1) have the
same size!
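A numerical experiment can make the claim in Example 1.7.11 plausible, although it proves nothing. The sketch below assumes the candidate inverse y ↦ tan(π(y − 1/2)), which is obtained by solving y = f(x) for x and is not stated in the notes; the sample points are arbitrary.

```python
import math

def f(x):
    """The map from R to (0, 1) in the example: arctan(x)/π + 1/2."""
    return math.atan(x) / math.pi + 0.5

def f_inverse(y):
    """Candidate inverse from (0, 1) to R, found by solving y = f(x) for x."""
    return math.tan(math.pi * (y - 0.5))

for x in [-1000.0, -1.0, 0.0, 2.5, 1000.0]:
    y = f(x)
    assert 0.0 < y < 1.0                                               # lands in (0, 1)
    assert math.isclose(f_inverse(y), x, rel_tol=1e-6, abs_tol=1e-9)   # round trip

print(f(0.0))   # 0.5
```

Having a two-sided inverse is exactly what makes f a bijection, which is the content of Theorem 1.7.20 later in this section.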
Figure 1.4: (1/π) arctan(x) + 1/2

To prove two functions f : X → Y and g : X → Y are equal:
Let x ∈ X. Show that f(x) = g(x). A common mistake made by
students is that they prove that two functions are equal for a specific
element of X. You must prove that they compute the same value for
EVERY element of the domain!

1.7.4 Composition of functions


In algebra, we often want to create new things from old things. In this
section, we look at a way of creating a function from two old ones. This
operation is called function composition.

Definition 1.7.12 (Function Composition). Let g be a function from
a set A to a set B and let f be a function from a set B to a set C. Then
the composition of f and g, denoted f ◦ g, is the function defined by
(f ◦ g)(a) = f(g(a)) for all a ∈ A. A helpful picture: A −g→ B −f→ C.

Example 1.7.13.

• The function h defined by h(x) = sin²(x) (for all x ∈ R) is the composition of the two functions f and g defined by f(x) = x² and g(x) = sin(x). So h = f ◦ g. Note that g ◦ f is a completely different function which maps an element x ∈ R to sin(x²).

• Let X and Y be sets and let f : X → Y be a function. Then f ◦ id_X = f and id_Y ◦ f = f where id_X and id_Y are the identity functions on X and Y, respectively.

• Let R+ denote the positive real numbers, and let f : R+ → R+ be defined by f(x) = x² (for all x ∈ R+). Let g : R+ → R+ be defined by g(y) = √y (for all y ∈ R+). Then f ◦ g = id_{R+} and g ◦ f = id_{R+}.
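The examples of composition above translate almost word for word into code. This is a small sketch; compose and the variable names are illustrative choices, and the test points 2.0 and 7.0 are arbitrary.

```python
import math

def compose(f, g):
    """Return f ∘ g, the function a ↦ f(g(a))."""
    return lambda a: f(g(a))

square = lambda x: x * x
h = compose(square, math.sin)       # x ↦ sin²(x)
k = compose(math.sin, square)       # x ↦ sin(x²), a different function

x = 2.0
print(math.isclose(h(x), math.sin(x) ** 2))   # True
print(math.isclose(h(x), k(x)))               # False

# On the positive reals, squaring and taking the square root undo each other.
root_then_square = compose(square, math.sqrt)
square_then_root = compose(math.sqrt, square)
print(math.isclose(root_then_square(7.0), 7.0), math.isclose(square_then_root(7.0), 7.0))
```

The order of the arguments to compose matters, just as f ◦ g and g ◦ f are in general different functions.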

1.7.5 Images and inverses


Modern mathematics and its language have evolved into a reasonably stable
and uniform state. The words “one-to-one”, “onto”, and “real number”
are universally understood. We present here some more notions that are
fundamental to the communication of mathematics.

Definition 1.7.14 (Image). Let f be a function from a set X into a set Y,
and let A ⊆ X. The image f(A) of A under f is the subset

{f(a) : a ∈ A}

of Y.

Definition 1.7.15 (Preimage). Let f be a function from a set X into a set Y,
and let S ⊆ Y. Then the preimage of S under f is the subset of X defined by

f←(S) := {x ∈ X : f(x) ∈ S}.

Figure 1.5: Image and preimage.

Example 1.7.16. Consider the sine function sin : R → R. The image of R
under this function is sin(R) = [−1, 1]. The preimage of {0} under sine is
the set

sin←({0}) = {x ∈ R : sin(x) = 0} = {πk : k ∈ Z}.
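Images and preimages are easy to compute when the domain is finite. The sketch below uses the squaring function on a small range of integers as an assumed stand-in, since the sine example involves infinite sets; the helper names image and preimage are illustrative.

```python
def image(f, A):
    """f(A) = {f(a) : a in A}."""
    return {f(a) for a in A}

def preimage(f, X, S):
    """The preimage of S under f: {x in X : f(x) in S}. X must be finite here."""
    return {x for x in X if f(x) in S}

X = range(-3, 4)                      # a finite stand-in for the domain
square = lambda x: x * x

print(sorted(image(square, X)))            # [0, 1, 4, 9]
print(sorted(preimage(square, X, {4})))    # [-2, 2]
print(sorted(preimage(square, X, {2})))    # []
```

The preimage is defined for any function, invertible or not: the preimage of {4} has two elements and the preimage of {2} is empty.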

Definition 1.7.17 (Invertibility). A function f : X → Y is said to be
invertible if there exists a function g : Y → X such that f ◦ g = id_Y and
g ◦ f = id_X. If such a g exists, we call it the inverse of f, and denote it by
f⁻¹.

In order for the notation f⁻¹ to make sense, there has to be just one
inverse of an invertible function. We will assume without proof that the
operation of function composition is associative [9].

[9] Jargon: A binary operation ⋆ is . . .
• associative if the following always holds: a ⋆ (b ⋆ c) = (a ⋆ b) ⋆ c.
• commutative if the following always holds: a ⋆ b = b ⋆ a.

Lemma 1.7.18. Suppose a function f : X → Y is invertible. Then there is a
unique inverse of f.
 Proof (restricted syntax): Suppose there are two inverses g : Y → X and
h : Y → X for f . We will show that h = g.
Then h ◦ (f ◦ g) = (h ◦ f) ◦ g since composition is associative.
Then h ◦ id_Y = id_X ◦ g by definition of inverse.
Then h = g since composition with the identity function returns the original function.
Therefore, f has a unique inverse.

Example 1.7.19.

• The squaring function from R to R is not invertible. However the squaring function on R+ is invertible!

• The inverse of the identity function is itself, since for any set X, id_X ◦ id_X = id_X.
Theorem 1.7.20. A function f : X → Y is invertible if and only if it is
bijective.
 Proof: See the exercises at the end of the chapter. 

1.7.6 The Issue of “Well-Defined”


It is a common trait of mathematicians to “define” a function before actually
proving that it is a function. Ideally, we should define a set, show
it is a binary relation, and then show it is a function. As you can imagine,
this can be quite a cumbersome task in most situations. So mathematicians
have become accustomed to writing functions down before verifying that
they are indeed functions. When we prove that a binary relation is a binary
relation or a function is a function, we say that the object in question is
“well-defined”. Not until now have you needed to deal with this issue, but
you will find, especially when defining functions with a partition as their
domain, that you must take a minute to prove that what you’ve claimed is a
function actually is!
Example 1.7.21. “Let f : Q → Z be the function defined by f(q) = a
where a is the numerator of q.”
This is BAD! What we’ve written is not a function, since f (1/2) = 1
and f (2/4) = 2 but 1/2 = 2/4. We will see in Section 3.4 that the
rational numbers are like a partition of Z × (Z\{0}), and so we must
be extra careful when defining functions when there is an equivalence
relation hanging about! More of this issue will appear later when we study
equivalence relations in Section 3.3.
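The failure in Example 1.7.21 can be reproduced with Python's fractions module. This is a sketch under stated assumptions: bad_numerator is a hypothetical name for the rule in the example, and numerator_in_lowest_terms is one possible repair that the notes do not propose.

```python
from fractions import Fraction

def bad_numerator(a, b):
    """'Numerator' read off one particular representation a/b of a rational."""
    return a

# 1/2 and 2/4 are the same rational number ...
assert Fraction(1, 2) == Fraction(2, 4)
# ... yet the rule gives different answers depending on how it is written.
print(bad_numerator(1, 2), bad_numerator(2, 4))   # 1 2

def numerator_in_lowest_terms(q):
    """One possible repair: always reduce to lowest terms first."""
    return Fraction(q).numerator

print(numerator_in_lowest_terms(Fraction(2, 4)))  # 1
```

The repaired rule is well-defined because its answer depends only on the rational number and not on the pair of integers chosen to represent it.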

1.8 Countable and uncountable

Earlier we saw that the set of even numbers has the same size as the entire
set of integers, and to see this, we showed that there was a bijection between
these two sets. If we can embed a set S inside the natural numbers,
like we did for even numbers, then we say that S is countable [10].

[10] We will use a definition that includes finite sets, which differs from some texts such as Martin Liebeck’s book “A concise introduction to pure mathematics”.

Definition 1.8.1 (Countability). A set X is countable if there is a one-to-one
function f : X → N. A set is uncountable if it is not countable.

Lemma 1.8.2. Every finite set is countable.


 Proof: Let S be a finite set of size n. Then we can list the elements as
follows:
S = {s1 , s2 , . . . , sn }.
Let f : S → N be defined by f ( si ) := i. We will show that f is one-to-
one. Suppose f ( x) = f (y) for two elements x and y of S . Then there exist
i, j ∈ {1, 2, . . . , n} such that x = si and y = s j . Therefore f ( si ) = f ( s j ) and
hence i = j. So si = s j and f is one-to-one. 

Lemma 1.8.3. The set of integers is countable.

? Sketch: Define the following function f from N to Z:

\[ f(n) := \begin{cases} \dfrac{n}{2} & \text{if } n \text{ is even} \\[4pt] \dfrac{-n+1}{2} & \text{otherwise.} \end{cases} \]

A useful picture of how we ‘count’ Z:

... −3 −2 −1 0 1 2 3 ...

It turns out (exercise!) that f is a bijection.
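
A finite check of this sketch is easy to run. The code below assumes N = {1, 2, 3, ...} and reads the odd case as (−n + 1)/2; both are assumptions about the intended formula, and a finite check is of course not a proof of bijectivity.

```python
def f(n):
    """Map n in N = {1, 2, 3, ...} to an integer: n/2 if even, (1 - n)/2 if odd."""
    return n // 2 if n % 2 == 0 else (1 - n) // 2

m = 50
# Images of n = 1, 2, ..., 2m + 1.
images = [f(n) for n in range(1, 2 * m + 2)]
```

The first 2m + 1 natural numbers land exactly on the integers from −m to m, each hit once.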

Theorem 1.8.4. A countable union of countable sets is countable.

? Proof: See lectures. 

Theorem 1.8.5. The positive rational numbers are countable.

? Proof: The main part of the proof is Cantor’s “zigzag” argument:

1/1  →  1/2     1/3  →  1/4   ...
      ↙      ↗       ↙      ↗
2/1     2/2     2/3     2/4   ...
 ↓    ↗      ↙       ↗
3/1     3/2     3/3     3/4   ...
      ↙      ↗
4/1  →  4/2     4/3     4/4   ...
 ⋮       ⋮       ⋮       ⋮

The rest will be done in lectures.
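
The idea behind the zigzag can be sketched as a generator. This is a simplified variant, not the exact path drawn above: it walks each anti-diagonal (numerator + denominator constant) in one direction and skips fractions that are not in lowest terms, so every positive rational appears exactly once. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import islice
from math import gcd

def positive_rationals():
    """Yield every positive rational exactly once, diagonal by diagonal."""
    total = 2  # numerator + denominator on the current diagonal
    while True:
        for p in range(1, total):
            q = total - p
            if gcd(p, q) == 1:  # skip repeats such as 2/2 or 2/4
                yield Fraction(p, q)
        total += 1

first = list(islice(positive_rationals(), 200))
```

The position of a rational in this list is a one-to-one function into N, which is what countability asks for.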

Corollary 1.8.6. The rational numbers are countable.

 Proof: The map f ( x) := −x is a bijection from Q+ to Q− (exercise) and


so by Theorem 1.8.5, Q− is countable. The singleton set {0} is countable
by Lemma 1.8.2, and so by Theorem 1.8.4, Q− ∪ {0} ∪ Q+ is countable.
Therefore, Q is countable. 

1.9 Diagonal arguments as two-person games

Here is a (silly) game we could play that would take a long time to com-
plete. I give you a rational number in the interval (0, 1), and then you give
me a different rational number in this interval. Then we continue turn by
turn, writing down rational numbers in (0, 1) that we have not mentioned
in earlier turns. The person who cannot think of a new rational number or
repeats one that has already been said, loses.

Question 1.9.1. Is there a non-losing strategy for one of the players?

The answer is of course yes, but the most elegant answer is to use a
diagonal argument. Here’s how it works. Suppose the following ten moves
have been made in the game, accurate to ten decimal places:

0.0125000000
0.1000000000
0.3467891023
0.0041200340
0.9654102948
0.8475839102
0.0194857291
0.3240580293
0.3333333333
0.2121212121

Now look along the diagonal, after the decimal point: take the first digit of
the first number, the second digit of the second number, and so on. If the digit
is not equal to 3, then write the digit 3 next to the row. Otherwise, write “7”.

0.0125000000 3
0.1000000000 3
0.3467891023 3
0.0041200340 3
0.9654102948 3
0.8475839102 7
0.0194857291 3
0.3240580293 3
0.3333333333 7
0.2121212121 3

Then the new number 0.3333373373 must differ from the first number in
the first place, from the second number in the second place, and so on! So
we are guaranteed to obtain a new number!
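
The diagonal rule is mechanical enough to write down directly. A minimal sketch, using the ten moves listed above as strings (the function name `diagonal_number` is hypothetical):

```python
def diagonal_number(moves):
    """Build a decimal whose i-th digit differs from the i-th digit of the i-th move."""
    digits = []
    for i, move in enumerate(moves):
        d = move.split(".")[1][i]  # i-th digit after the decimal point
        digits.append("7" if d == "3" else "3")
    return "0." + "".join(digits)

moves = [
    "0.0125000000", "0.1000000000", "0.3467891023", "0.0041200340",
    "0.9654102948", "0.8475839102", "0.0194857291", "0.3240580293",
    "0.3333333333", "0.2121212121",
]
new = diagonal_number(moves)
```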
We can use this argument to show that the real numbers cannot be placed
in bijection with the natural numbers.

Theorem 1.9.2. The real numbers are uncountable.

? Proof: Done in lectures. 

Why does this argument not work for rational numbers? Well, the di-
agonal argument does not produce a rational number since the decimal
expression would be forced to be non-periodic.

Corollary 1.9.3. The irrational numbers are uncountable.

Proof: This is a simple proof by contradiction. Suppose that the irrational
numbers R \ Q were countable. Since Q is countable (Corollary 1.8.6),
Theorem 1.8.4 implies that (R \ Q) ∪ Q would be countable. This is a
contradiction, since (R \ Q) ∪ Q = R and R is uncountable by Theorem 1.9.2.
Therefore, the irrational numbers are uncountable.

1.9.1 Ordering cardinalities


Definition 1.9.4. Let A and B be sets. If there exists a one-to-one function
from A to B, we say that B is at least as numerous as A, and write A ≼ B. If
A and B are not equinumerous, and A ≼ B, then we write A ≺ B.

Example 1.9.5. So {1, 2, 3} ≺ {1, 2, 3, 4}. How about a more interesting
example? Suppose we take the set {1, 2, 3, 4}. Now consider all of its
subsets:

∅, {1}, {2}, {3}, {4}, {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}, {1, 2, 3}, {2, 3, 4}, {1, 3, 4}, {1, 2, 4}, {1, 2, 3, 4}.

Notice that the set of these 16 subsets is bigger than {1, 2, 3, 4}. We will see
that this is also true for infinite sets.

Lemma 1.9.6. Let A, B, and C be sets.

(i) If A ⊆ B, then A ≼ B.

(ii) If A ≼ B and B ≼ C, then A ≼ C.

(iii) A ≼ B or B ≼ A.

 Proof:

(i) Suppose A ⊆ B. Then the inclusion map f : A → B defined by
f (a) := a is one-to-one: if f (a) = f (a′) for two elements a, a′ ∈ A,
then a = a′. Therefore, A ≼ B.
(ii) Suppose A ≼ B and B ≼ C. Then, by definition of ≼, there exist
one-to-one functions f : A → B and g : B → C. By Exercise 1.11.10,
the composition of two one-to-one functions is one-to-one, and hence
g ◦ f is a one-to-one function from A to C. Therefore, A ≼ C.
(iii) This is left as a (difficult) exercise. See Exercise 1.11.12.

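The lemma above shows that ≼ behaves like what is known as a partial order
(strictly speaking, a total preorder, since A ≼ B and B ≼ A only give that A and B
are equinumerous, not that A = B). We do not cover partial orders in this course.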

Definition 1.9.7. Given a set S and a subset A of S , the characteristic
function of A with respect to S is the map χA defined by

\[ \chi_A(x) := \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A. \end{cases} \]

So how many subsets are there of a finite set of size n? Well, each subset
defines a characteristic n-tuple where we just write out the values of the
characteristic function. For example, for {2, 3} as a subset of {1, 2, 3, 4}, we
just write: (0, 1, 1, 0). The subsets are then in one-to-one correspondence
with the n-tuples whose entries have values either 0 or 1. So there
are 2^n subsets of a set of size n.
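
The correspondence between subsets and 0/1-tuples can be sketched in a few lines. The helper names below are hypothetical; the universe is kept as a list so that the positions of the tuple are fixed.

```python
from itertools import product

def characteristic_tuple(subset, universe):
    """Values of the characteristic function of `subset`, listed in universe order."""
    return tuple(1 if x in subset else 0 for x in universe)

def subsets(universe):
    """Turn every 0/1-tuple of length n back into the subset it describes."""
    for bits in product((0, 1), repeat=len(universe)):
        yield {x for x, bit in zip(universe, bits) if bit}

universe = [1, 2, 3, 4]
all_subsets = list(subsets(universe))
```

Running through all 0/1-tuples of length 4 produces each of the 16 subsets exactly once.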

Definition 1.9.8 (Power set). For a set A, the power set P( A) of A is the set
of all subsets of A.

We must be sure that we do not allow the “set of all sets” to be a set here.

Theorem 1.9.9. For any set A, we have A ≺ P( A).

? Proof: First we show that A ≼ P( A), and then we will show that |A| ≠
|P( A)|. Let f : A → P( A) be the function defined by

f (a) := {a}.

We will show that f is one-to-one. Suppose a, a′ ∈ A and f (a) = f (a′).
Then {a} = {a′}, from which it directly follows that a = a′. Therefore, f is
one-to-one and we have shown that A ≼ P( A).
Now suppose f : A → P( A) is any function. We will show, by contradiction,
that f cannot be onto. So suppose f is onto. Let

X = {a ∈ A : a ∉ f (a)}.

Notice that X is a subset of A, that is, X ∈ P( A), and so there exists a ∈ A
such that f (a) = X. Does a ∈ X or not? If a ∈ X, then a ∉ f (a), but
f (a) = X; a contradiction. If a ∉ X, then a ∈ f (a), but again f (a) = X;
another contradiction! So f cannot be onto. In particular, there is no bijection
from A to P( A), so |A| ≠ |P( A)|.
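
For a small finite set the argument can be checked exhaustively. The sketch below (hypothetical helper names) tries every function f : A → P(A) for A = {1, 2, 3} and confirms that the set X = {a ∈ A : a ∉ f(a)} is never in the image of f. This only illustrates the proof; it does not replace it.

```python
from itertools import combinations, product

def power_set(A):
    """All subsets of A, as frozensets."""
    A = list(A)
    return [frozenset(c) for r in range(len(A) + 1) for c in combinations(A, r)]

def missed_subset(f, A):
    """X = {a in A : a not in f(a)}, for a map f given as a dict."""
    return frozenset(a for a in A if a not in f[a])

A = [1, 2, 3]
P = power_set(A)
# Every function f : A -> P(A) is a choice of one subset per element (8**3 = 512 choices).
never_hit = all(
    missed_subset(dict(zip(A, images)), A) not in images
    for images in product(P, repeat=len(A))
)
```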

The above proof is actually a “diagonal argument” in disguise. Imagine
A was countable, so that we could actually list its elements. Now consider
an array where the elements of A represent the columns, and the elements
of P( A) represent the rows. We put a T (for ‘true’) if the element ai of A
belongs to the subset Aj of A, and an F (for ‘false’) otherwise. It would look
something like this:

                        A
           a1  a2  a3  a4  a5  a6  a7  ···
      A1   T   T   F   T   T   F   F   ···
      A2   F   F   T   F   T   T   F   ···
      A3   T   F   F   F   T   F   T   ···
P(A)  A4   T   T   F   T   T   F   F   ···
      A5   F   F   F   F   T   F   F   ···
      A6   F   F   F   F   F   F   T   ···
      A7   F   T   T   F   F   F   F   ···
      ⋮

The set X we chose in the proof is what you get when you ‘flip’ the truth
values along the diagonal of this array.

                        A
           a1  a2  a3  a4  a5  a6  a7  ···   X
      A1   T   T   F   T   T   F   F   ···   F
      A2   F   F   T   F   T   T   F   ···   T
      A3   T   F   F   F   T   F   T   ···   T
P(A)  A4   T   T   F   T   T   F   F   ···   F
      A5   F   F   F   F   T   F   F   ···   F
      A6   F   F   F   F   F   F   T   ···   T
      A7   F   T   T   F   F   F   F   ···   T
      ⋮

Theorem 1.9.10 (Cantor-Bernstein-Schröder Theorem). Let A and B be two
sets. If A ≼ B and B ≼ A, then A ≈ B.

† Proof: A sketch of the proof might be done in lectures (if there is suffi-
cient time). 

Theorem 1.9.11. The real numbers and P(N) are equinumerous.



? Proof: First we show that P(N) ≼ R. Let X ⊆ N. Define the real
number rX by11

rX := 0.a1 a2 a3 a4 . . .

where

\[ a_i := \begin{cases} 1 & \text{if } i \in X \\ 0 & \text{otherwise.} \end{cases} \]

11 This is just the characteristic function for X written as decimal digits behind “0.”.

Let f : P(N) → R be the function X ↦ rX . We will show that f is
one-to-one. Suppose X, X′ are subsets of N and f ( X ) = f ( X′ ). Then rX = rX′
where rX = 0.a1 a2 a3 a4 . . . and rX′ = 0.a′1 a′2 a′3 a′4 . . .. Since these expansions
use only the digits 0 and 1, each ai is equal to the corresponding a′i , and hence
i ∈ X if and only if i ∈ X′ (by definition of ai and a′i ). Therefore, X = X′ and f
is one-to-one.
Now we show that R ≼ P(N). We will use the fact from (reference)
that R ≈ (0, 1) (the unit open interval). Given r ∈ (0, 1) written in decimal
form r = 0.r1 r2 r3 r4 · · · (choosing, say, the expansion that does not end in
repeating 9s, so that the digits are determined by r), let Xr be the set

{r1 , r2 + 10, r3 + 100, r4 + 1000, . . .}.

Then Xr is a subset of N and no two elements are equal as written above.
So let g : (0, 1) → P(N) be defined by g(r ) := Xr . We will leave
it as a (simple) exercise for the reader to show that g is one-to-one. So
(0, 1) ≼ P(N), and since R ≈ (0, 1), we have R ≼ P(N) by Lemma 1.9.6(ii).
Together with P(N) ≼ R, the Cantor-Bernstein-Schröder Theorem 1.9.10
gives R ≈ P(N).

1.9.2 Aside: The Continuum Hypothesis


We have shown above that N ≺ R (see Theorems 1.9.9 and 1.9.11), but is
there a set whose size lies strictly between countable and the continuum12 ?

12 The continuum is an informal expression in mathematics for the cardinality of
the real numbers.

The Continuum Hypothesis: Every uncountable subset of the real numbers
is equinumerous to the real numbers.

The search for a proof of this statement13 baffled mathematicians in
the first half of the 20th century, and indeed, it was the first of David Hilbert’s
23 open problems presented at the 1900 International Congress of Mathematicians.
Kurt Gödel in 1940 and Paul Cohen in 1963 showed that the
hypothesis can neither be proved nor be disproved using the axioms of
Zermelo-Fraenkel set theory, the standard foundation of modern mathematics.

13 Posed by Georg Cantor in 1878.

1.10 Aside: What is a number?

There are a number of approaches to formalising the idea of a number, but


the one we will briefly go over here is the system introduced by Giuseppe
Peano.

Definition 1.10.1 (Peano’s Axioms). There is a set N satisfying the follow-


ing conditions:

1. There is an element called 1 that belongs to N.

2. There is a function S : N → N, called the “successor function”.



3. For every x ∈ N, we have S ( x) ≠ 1.

4. S is one-to-one.

5. Induction: If R is a subset of N such that

(a) 1 ∈ R, and
(b) x ∈ R =⇒ S ( x) ∈ R,

then R = N.

This definition of N seems at first to be extremely abstract, but it captures
the properties of N beautifully. For example, these axioms imply the
Well-ordering Principle of the natural numbers: every nonempty subset
of N has a least element. The meta-mathematics behind such a seemingly
innocuous statement is beyond the scope of this course, though you may be
surprised when you ask whether the analogous statement can be proved for
the real numbers!
So is there a set N with these properties? Here is the well-known example
introduced by John von Neumann that models the properties of N ∪ {0},
starting only with the empty set.14

14 We include 0 in the number system, but this is not a problem at all. Starting
from 0 or 1 does not really matter that much here.

• We start with defining 0 to be the empty set ∅.

• Then 1 is defined to be the set containing 0, that is, 1 := {∅}.

• Then 2 is defined to be the set containing 0 and 1, that is, 2 := {∅, {∅}}.

• So we have a successor function S ( x) := x ∪ {x}.

Addition on N is then defined inductively:

(i) x + 1 := S ( x)

(ii) x + S (y) := S ( x + y).
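
The von Neumann construction and the inductive definition of addition can be modelled with nested `frozenset`s. This is only a sketch: the names are hypothetical, and `predecessor` relies on the fact that the largest element of a nonzero von Neumann numeral is its predecessor.

```python
ZERO = frozenset()  # 0 := the empty set

def successor(x):
    """S(x) := x ∪ {x}."""
    return x | frozenset([x])

def numeral(n):
    """The set representing n, built by applying S to 0 exactly n times."""
    x = ZERO
    for _ in range(n):
        x = successor(x)
    return x

ONE = numeral(1)

def predecessor(y):
    """For a nonzero numeral y = S(p), recover p (the element of y with most elements)."""
    return max(y, key=len)

def add(x, y):
    """x + 1 := S(x) and x + S(p) := S(x + p). Requires y to represent a number >= 1."""
    if y == ONE:
        return successor(x)
    return successor(add(x, predecessor(y)))
```

For instance, `add(numeral(2), numeral(3))` unwinds to three applications of S to the set representing 2.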

As an exercise, you may like to think of how multiplication would be


defined . . .

1.11 Exercises

Exercise 1.11.1. In Google, the three operations of Boolean logic work like
this:

AND Whitespace between terms, e.g., "Review" "High School Musical".

OR Use the word OR, e.g., "Burma" OR "Myanmar".

NOT Use the minus symbol −, e.g., "Poffertjes recipe" -"sour cream"

You can also use brackets to group clauses, and AND has natural precedence over OR.

(a) Give an example of a Google expression which creates a search for


"Madonna" or "Justin Timberlake", but not both.

(b) Write a Google expression with the terms "drink coffee" and
"less likely to develop dementia" which represents the fol-
lowing implication:

“If you drink coffee, then you are less likely to develop dementia.”

Exercise 1.11.2 (Quantifiers). For each of the following sentences, rewrite
the statement making all the quantifiers15 explicit. Then form the negation
of the statement, and convert it back to English.

15 Quantifiers: ∀ (For All), ∃ (There Exists).

(a) The skies are not cloudy all day.

(b) The sun never sets on the British Empire.

(c) Every positive integer has a unique prime factorisation.

(d) There is no largest prime.

Exercise 1.11.3. Suppose A and B are sets. Give your answers in the nota-
tion of logic (not English).

(a) What does it mean to say that A is a subset of B?

(b) What does it mean to say that A is not a subset of B?

Exercise 1.11.4. Let f : Z → Z be the function defined by

f ( x) := 3x + 1 (mod 5)

What is the preimage of 4 under f ?

Exercise 1.11.5. What can you say about f : A → B if the preimage of


every singleton subset {b} of B is nonempty?

Exercise 1.11.6. Let f : A → B be a function, let A′ ⊆ A and let B′ ⊆ B.

1. Show that if f is one-to-one, then f ← ( f ( A′ )) = A′ .

2. Show that if f is onto, then f ( f ← ( B′ )) = B′ .

Exercise 1.11.7. What can you say about f : A → B if the preimage of


every singleton subset {b} of B is a singleton subset of A?

Exercise 1.11.8. Prove that a function f : A → B is a bijection if and only if


it is invertible (i.e., there exists a function g : B → A such that f ◦ g = idB
and g ◦ f = idA ).

Exercise 1.11.9. Let f : A → B be a function and let B1 and B2 be subsets


of B.

1. Show that f ← ( B1 ∪ B2 ) = f ← ( B1 ) ∪ f ← ( B2 ).

2. Show that f ← ( B1 ∩ B2 ) = f ← ( B1 ) ∩ f ← ( B2 ).

Exercise 1.11.10. Let f : A → B and g : B → C be functions.

(i) Show that if f and g are one-to-one, then g ◦ f is one-to-one.

(ii) Show that if f and g are onto, then g ◦ f is onto.

(iii) Show that if g ◦ f is one-to-one and f is onto, then g is one-to-one.

(iv) Show that if g ◦ f is onto and g is one-to-one, then f is onto.

(v) Let A = C. Show that if g ◦ f = idA , then f is one-to-one.

(vi) Let A = C. Show that if f ◦ g = idB , then f is onto.

Exercise 1.11.11. Let A and B be countable sets. Show that A × B is
countable. Also show that C is uncountable.

Exercise 1.11.12 (Hard). Let A and B be sets. Show that A ≼ B or B ≼ A.

Exercise 1.11.13. Prove that if A is an infinite set, then N ≼ A.


2
From numbers to rings

This chapter covers elementary number theory and its analogues in the general theory of rings. We first look at
the “Division Rule” for the integers, the greatest common divisor of two numbers, and the Euclidean Algorithm. We
then look at the prime numbers, the atoms of the integers. This basic number theory leads to a fundamental number
system in abstract algebra, the integers modulo m, otherwise known as “clock-arithmetic”.

2.1 Divisibility

Let x and y be two integers. We say that x divides y1 if there exists another
integer q such that

y = qx.

1 Synonyms:
• x divides y
• y is a multiple of x
• x is a factor of y

We use the “long bar” notation to denote this relation:

x | y.

From this definition of divisibility we note the following things:

• Any integer x divides 0 since 0 = 0 · x.

• The number 1 divides any nonzero integer and an integer x divides itself,
both because x = 1 · x.

• If a and b are positive integers and a | b, then2 a ≤ b.

2 Mini-proof: Since a | b, there exists an integer q such that b = qa. Now a and b
are positive, so q must be positive. So q ≥ 1 and hence b ≥ 1 · a.

Example 2.1.1 (A lattice of numbers). We can see how the divisibility
relation works by drawing a Hasse diagram that displays how the positive
divisors of 24 relate to one another.

A Hasse diagram is a diagram which represents a partially ordered set. Each
element of the set is a vertex, and we draw a line segment between vertices x and y
whenever x is smaller than y and there is no intermediate element of the set.

        24
       /  \
     12    8
    /  \  /
   6    4
  / \  /
 3    2
  \  /
   1
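
The line segments of the Hasse diagram can be computed rather than drawn by hand. A minimal sketch (hypothetical function names) that lists the pairs x, y of divisors of n where x divides y and no other divisor sits in between:

```python
def divisors(n):
    """Positive divisors of n in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]

def hasse_edges(n):
    """Pairs (x, y) with x | y, x != y, and no divisor z strictly between them."""
    ds = divisors(n)
    edges = []
    for x in ds:
        for y in ds:
            if x != y and y % x == 0:
                between = [z for z in ds
                           if z not in (x, y) and z % x == 0 and y % z == 0]
                if not between:
                    edges.append((x, y))
    return edges
```

For n = 24 this gives ten edges, and (2, 12) is not among them because 6 (and also 4) lies between 2 and 12.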

2.1.1 “Divides” is transitive


We didn’t point out that what makes the Hasse diagram so neat and tidy is
that it is not necessary to put in lines where there are paths; for instance,
we don’t need to draw a line between 2 and 12, because 2 | 6 and 6 | 12
together imply that 2 | 12.

Lemma 2.1.2. Let a, b and c be integers. If a | b and b | c, then a | c.

 Proof (Restricted Syntax). Suppose a | b and b | c. Then there are


integers m, n such that b = ma and c = nb by the definition of “divides”.
Then c = n(ma) by substituting ma for b.
Then c = ka where k = nm since n(ma) = (nm)a.
Then a | c by the definition of “divides”.
Therefore, if a | b and b | c, then a | c. 

Notice that in the above proof, the definition of “divides” had to be the
first deduction we made since it was all that we knew, and we reapplied the
definition of “divides” in the last deduction we made. So it is almost always
true that a mathematician, when devising a proof, constantly reflects on the
information known at each step, though being careful not to assume more
than what is allowed. At the same time, the mathematician keeps an eye on
the prize – the conclusion – and has a feeling of what the final steps need to
be.

2.1.2 Ring-like properties of things you know


When we have a binary operation on a set, like addition, and it always
produces elements in the same set, we say that the set is closed under this
operation.

Example 2.1.3. The set of even numbers 2N is closed under addition since
if we add two even numbers, the result is an even number. However, the set
of odd numbers is not closed under addition since, for example, 3 + 5 = 8 is even.

Here are some operations on some familiar sets and we summarise when
the given set is closed under a particular operation.

Table 2.1: Closure and non-closure of some well-known binary operations.

                                     +   ×   −
2N                                   ✓   ✓   ✗
N                                    ✓   ✓   ✗
Z                                    ✓   ✓   ✓
Q                                    ✓   ✓   ✓
R                                    ✓   ✓   ✓
C                                    ✓   ✓   ✓
n × n matrices over R                ✓   ✓   ✓
invertible n × n matrices over R     ✗   ✓   ✗

In this course, we will look at other common number systems in mathe-


matics. Some of the sets above (which ones?) have the property that every
nonzero element has a multiplicative inverse. That is, we can divide by
nonzero elements.

2.1.3 The Division Rule


Below is a very handy result in number theory, which is probably the most
fundamental property of the integers. It does not work for the rational
numbers or for the real numbers, but we will see later in the course that a
similar property holds in other number systems.

Lemma 2.1.4 (The Division Rule). Let a be a positive integer and let b be
an integer. Then there are unique integers q and r such that

b = qa + r and 0 ≤ r < a.

? Proof: To be done in lectures. 

We call q the quotient and r the remainder. Sometimes we write b
(mod a) for the remainder. So for example, if a = 5 and b = 12, then
b = 2a + 2 and there is no other way to write b in terms of a with the
remainder r satisfying 0 ≤ r < a.
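
In Python, the built-in `divmod` returns exactly this quotient and remainder when the divisor a is positive, including for negative b. The wrapper name `division_rule` below is hypothetical.

```python
def division_rule(b, a):
    """Return (q, r) with b = q*a + r and 0 <= r < a, for a positive integer a."""
    if a <= 0:
        raise ValueError("a must be a positive integer")
    return divmod(b, a)  # floor division keeps the remainder in [0, a)

q, r = division_rule(12, 5)
```

Note that for b = −7 and a = 5 the rule gives q = −2 and r = 3, not q = −1 and r = −2, because the remainder must be non-negative.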

Definition 2.1.5 (Greatest Common Divisor).


The greatest common divisor of two nonzero integers x and y is the
largest integer that divides both x and y:

gcd( x, y) = max{i ∈ N : i|x and i|y}.

Example 2.1.6.

• The divisors of 8 are 1, 2, 4, 8 and the divisors of 20 are 1, 2, 4, 5, 10, 20.


So gcd(8, 20) = 4.

• gcd(−12, 18) = 6.

• The gcd of two distinct primes is always 1.

One way to work out the gcd of 234 and 180, say, is to divide the smaller
number into the larger one, take its remainder (which is 54) and then notice
that gcd(234, 180) = gcd(54, 180). The next result is the basis for the next
section.

Lemma 2.1.7 (Behind Euclid’s Algorithm). Let a and b be nonzero inte-


gers. If b = qa + r then gcd(b, a) = gcd(r, a).

 Proof (restrictive syntax): Suppose d is a divisor of both a and b.


Then d divides b − qa since d divides qa and d divides b.
Then d divides r since r = b − qa.
Then d divides both r and a by just collecting facts.
Therefore every divisor of both a and b is also a divisor of r.
Conversely, suppose d is a divisor of both a and r.
Then d divides qa + r since d divides qa and d divides r.
Then d divides b since b = qa + r.
Therefore every divisor of both a and r is also a divisor of b.
Then {d ∈ N : d | a and d | b} = {d ∈ N : d | a and d | r} since we have shown above that both sets
are subsets of one another.
Therefore, gcd(b, a) = gcd(r, a) by definition of gcd.


2.1.4 The Euclidean Algorithm in Z


In this section we demonstrate a method for finding the gcd of two integers
which is attributed to the school of Euclid3 . We show the routine by example.

3 Euclid of Alexandria (c. 325 BCE – c. 265 BCE) is the most prominent
mathematician of antiquity, best known for his treatise on mathematics “The Elements”.

Example 2.1.8. To find the greatest common divisor of 234 and 180,
perform the following steps.

(i) Draw two columns.

(ii) Write 234 and 180 in the right column, the larger number 234 first.

234
180

(iii) Divide the smaller number 180 into the larger 234, and write the
quotient in the left column adjacent to 180, and the remainder below
180.

234
1 180
54

(iv) Now move down one row and repeat the last step over and over until
we get a remainder of 0.

234
1 180
3 54
3 18
0

(v) The second-last number in the right column is the greatest common
divisor of 180 and 234, namely 18.

Example 2.1.9. Consider 558 and 423. Here is the table we get when we
do the Euclidean Algorithm:

558
1 423
3 135
7 18
2 9
0

Then the last non-zero remainder is 9 so gcd(558, 423) = 9.
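
The two-column routine of Examples 2.1.8 and 2.1.9 can be sketched as follows. The function names are hypothetical; each row is a pair (quotient, number), with `None` where the left column is empty.

```python
def euclid_table(larger, smaller):
    """Rows (quotient, number) of the two-column table; None marks an empty quotient cell."""
    rows = [(None, larger)]
    a, b = larger, smaller
    while b != 0:
        q, r = divmod(a, b)   # divide the smaller number into the larger
        rows.append((q, b))   # quotient goes next to the divisor
        a, b = b, r           # move down one row
    rows.append((None, 0))    # the final remainder 0
    return rows

def gcd_from_table(rows):
    """The second-last number in the right column is the gcd."""
    return rows[-2][1]

table = euclid_table(558, 423)
```

For 558 and 423 the left column reads 1, 3, 7, 2, matching 558 = 1 · 423 + 135, 423 = 3 · 135 + 18, 135 = 7 · 18 + 9 and 18 = 2 · 9 + 0.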

Lemma 2.1.10 (Bézout’s identity). If a and b are nonzero, then there are
integers m and n such that

gcd(a, b) = ma + nb.

 Proof: Done in lectures. 



In fact, Bézout’s identity almost says that the greatest common divisor of
a and b is the smallest positive integer linear combination of a and b. We fill
in the details in what follows.
Corollary 2.1.11. Let a and b be two nonzero integers. Then an integer
x can be expressed as ma + nb for two integers m and n if and only if
gcd(a, b) divides x.
? Proof: We do the “ =⇒ ” direction first. Suppose that x can be ex-
pressed as ma + nb for some integers m and n. Let d be an integer that
divides both a and b. Then by transitivity of the divides relation (Lemma
2.1.2), d divides ma and d divides nb. So d divides their sum, and hence d
divides x. Therefore, gcd(a, b) must divide x, since by definition, gcd(a, b)
divides both a and b.
Now for the converse. Suppose gcd(a, b) divides x. So there is an inte-
ger c such that x = c · gcd(a, b). By Bézout’s identity (Lemma 2.1.10),
there exist a pair of integers m0 and n0 such that

gcd(a, b) = m0 a + n0 b.

Hence, x = (cm0 )a + (cn0 )b and the result follows by letting m = cm0 and
n = cn0 . 

Example 2.1.12. In Example 2.1.9, we showed that gcd(558, 423) =
9, and Bézout’s identity says that we can express 9 as an integer linear
combination of 558 and 423:

9 = 558m + 423n.

Here is how we do it with the table from before4 :

558
1 423
3 135
7 18
2 9
0

4 This trick is a shorthand for doing the traditional method of reverse substitution:
9 = 135 − 7 · 18
= 135 − 7(423 − 3 · 135)
= 22 · 135 − 7 · 423
= 22 · (558 − 423) − 7 · 423
= 22 · 558 − 29 · 423

We will make a new table with four columns, but retaining the numbers
in the left column but making them negative:

558 1 0
-1 423 0 1
-3 135
-7 18
-2 9
To fill out the third column, we take the product of the first and third
columns in the previous row and add it to the third-column entry two rows above.
For instance, the entry beside 135 is (−1) · 0 + 1 = 1.
First step            |  Second step           |  Third step
      558    1    0   |        558    1    0   |        558    1    0
 -1   423    0    1   |   -1   423    0    1   |   -1   423    0    1
 -3   135    1        |   -3   135    1        |   -3   135    1
 -7    18             |   -7    18   -3        |   -7    18   -3
 -2     9             |   -2     9             |   -2     9   22

Similarly for the fourth column:

First step            |  Second step           |  Third step
      558    1    0   |        558    1    0   |        558    1    0
 -1   423    0    1   |   -1   423    0    1   |   -1   423    0    1
 -3   135    1   -1   |   -3   135    1   -1   |   -3   135    1   -1
 -7    18   -3        |   -7    18   -3    4   |   -7    18   -3    4
 -2     9   22        |   -2     9   22        |   -2     9   22  -29

We just read off from the last row of the table that m = 22 and n = −29.
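
The four-column table is the extended Euclidean algorithm in disguise. The sketch below (the function name `bezout` is hypothetical) keeps the two most recent rows of the table and updates the third and fourth columns with the same "multiply by the negated quotient and add" rule.

```python
def bezout(a, b):
    """Return (g, m, n) with g = gcd(a, b) = m*a + n*b, for positive integers a and b."""
    r0, r1 = a, b   # second column: the remainders
    s0, s1 = 1, 0   # third column: coefficients of a
    t0, t1 = 0, 1   # fourth column: coefficients of b
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        s0, s1 = s1, s0 - q * s1
        t0, t1 = t1, t0 - q * t1
    return r0, s0, t0

g, m, n = bezout(558, 423)
```

At every step the invariant r = s · a + t · b holds for both rows being tracked, which is why the last nonzero remainder comes out together with its coefficients.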

Definition 2.1.13. Two nonzero integers are coprime5 if their gcd is equal
to 1.

5 Often the term ‘relatively prime’ is used for the same concept.

Example 2.1.14.

• 15 and 8 are coprime.

• Any two distinct primes are coprime.

There are some more nice consequences of Bézout’s identity, as we shall


see below.

Corollary 2.1.15. Two nonzero integers a and b are coprime if and only if
there exist integers m and n such that

1 = ma + nb.

 Proof: This is a direct application of Corollary 2.1.11 and Definition


2.1.13. That is, if we set x = 1 in the statement of Corollary 2.1.11, we see
that 1 can be written as ma + nb if and only if gcd(a, b) divides 1. Now
gcd(a, b) is a positive integer, so it divides 1 if and only if it is equal to 1.
Hence 1 = ma + nb for some integers m and n if and only if a and b are
coprime.

Corollary 2.1.16. Let a and b be integers, and let c be an integer.

(i) If a and b are coprime, a | c and b | c, then ab | c.

(ii) If a and b are coprime, and a | bc, then a | c.

(iii) (Euclid’s Lemma) If p is prime and p | ab, then p | a or p | b.

 Proof: To be done in lectures. 

Part (i) of Corollary 2.1.16 is really a fact about the least common multi-
ple of two integers.

Definition 2.1.17. The least common multiple of two nonzero integers x


and y is defined by

LCM( x, y) = min{i ∈ N : x|i and y|i}.



2.2 Prime numbers and canonical factorisation of integers

In the previous section, we already alluded to the prime numbers. These
are the indivisible parts of the integers and form the building blocks for all
other integers. First and foremost, we will give one of the most beautiful
elementary proofs in number theory, that there are infinitely many prime numbers.

Theorem 2.2.1 (Infinitude of primes). There are infinitely many prime
numbers.

This proof is attributed to Ernst Kummer (1810–1893).

? Proof: Suppose, for a proof by contradiction, that there are finitely many
primes
p1 < p2 < p3 < . . . < pn .
Let N be the product of all the primes; N := p1 p2 · · · pn . Because 2 and 3
are primes, we know at least that N > 2. Since N − 1 > 1, it has a prime
divisor pi . Since N is divisible by every prime, we know that pi divides
both N and N − 1, and so divides their difference. Therefore,

pi | N − ( N − 1) = 1

which is a contradiction as pi > 1! 
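
The mechanism of the proof can be watched on a short list of primes. This is only an illustration on one finite list (the variable names are hypothetical): the number N − 1 always has a prime divisor, and that divisor cannot be on the list.

```python
from math import prod

def smallest_prime_factor(n):
    """Smallest prime dividing n, for n >= 2, by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

primes = [2, 3, 5, 7]   # pretend these were all the primes
N = prod(primes)        # N = 210
p = smallest_prime_factor(N - 1)
```

Here N − 1 = 209 = 11 · 19, so the "missing" prime found is 11.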

Now we will give the most important result of number theory, the
“Fundamental Theorem of Arithmetic6 ”.

6 The Fundamental Theorem of Arithmetic was established at least by the Greeks
of the time of Euclid, since it appears in Volume VII of The Elements.

Theorem 2.2.2 (The Fundamental Theorem of Arithmetic). Let n be an
integer with n ≥ 2. Then n can be written uniquely as a product of prime
numbers:
n = p1 p2 · · · pk , p1 ≤ p2 ≤ · · · ≤ pk .
So if n = q1 q2 · · · qℓ where the qi are prime numbers, and q1 ≤ q2 ≤ · · · ≤
qℓ , then k = ℓ and qi = pi for all i ∈ {1, . . . , k}.

? Proof: This is an excellent example of a proof by induction.

Existence: For n ≥ 2, let P(n) be the statement

P(n): “all the integers m with 2 ≤ m ≤ n can be written as a product of primes.”

Clearly P(2) is true as 2 is a prime itself.7 So suppose P(k) is true for some
positive integer k ≥ 2. If k + 1 is prime, then we are done, so suppose k + 1
is not prime and that we can write k + 1 = ab where 1 < a, b < k + 1. Now
by our inductive hypothesis, a is a product of primes and b is a product
of primes. So ab is a product of primes! Therefore, P(k + 1) is true, and
hence by Mathematical Induction, P(n) is true for every positive integer
n ≥ 2.

7 Here we stress that a product of a list of integers L allows L to have just one
element!

Uniqueness: Suppose we can write n two ways as a product of primes

n = p1 p2 · · · pk = q1 q2 · · · qℓ .

For a proof by contradiction, we can suppose that the first factorisation is
different from the second. Cancelling the primes common to both sides, we may
suppose without loss of generality that

p1 p2 · · · pt = q1 q2 · · · qu

where
{p1 , p2 , . . . , pt } ∩ {q1 , q2 , . . . , qu } = ∅. (2.1)
Now p1 divides both sides, and so p1 | q1 q2 · · · qu . So from Euclid’s Lemma
(Corollary 2.1.16(iii)), we know that p1 must divide at least one of the qi , for some i.
This implies that p1 = qi , as qi is prime, which is a contradiction to (2.1).
Therefore, {p1 , p2 , . . . , pk } = {q1 , q2 , . . . , qℓ }.

This rudimentary property of the integers allows us to have a canonical


factorisation of a number, a blue-print of a number in terms of the prime
numbers.

Definition 2.2.3 (Canonical factorisation). For a positive integer n ≥ 2,
the canonical factorisation of n is the unique expression of it as a product of
primes:

\[ p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k} \]

where p1 < p2 < · · · < pk and each ai is a non-negative integer.

Remark 2.2.4. The ai are allowed to be zero, though in this case, we do not
list the prime in order for the expression to be unique. Since n ≥ 2, not all
of the ai can be zero.
A consequence of the Fundamental Theorem of Arithmetic is that it is
not difficult to calculate the greatest common divisor and least common
multiple of two integers.

Lemma 2.2.5. Let x and y be integers, both at least 2, and write them out
in their canonical factorisations:

\[ x = p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k}, \qquad y = p_1^{b_1} p_2^{b_2} \cdots p_\ell^{b_\ell}. \]

Suppose without loss of generality that k = ℓ, since we can take the ai or bj
to be 0 so that both factorisations have the same length. Then:

(i) \( \gcd(x, y) = p_1^{\min(a_1, b_1)} p_2^{\min(a_2, b_2)} \cdots p_k^{\min(a_k, b_k)} \)

(ii) \( \mathrm{LCM}(x, y) = p_1^{\max(a_1, b_1)} p_2^{\max(a_2, b_2)} \cdots p_k^{\max(a_k, b_k)} \)

Therefore,

\[ \mathrm{LCM}(x, y) = \frac{xy}{\gcd(x, y)}. \]
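
The lemma can be sanity-checked numerically. The sketch below uses trial division to obtain the exponents of the canonical factorisation and then takes minima and maxima prime by prime; the function names are hypothetical and trial division is chosen only for simplicity.

```python
from collections import Counter
from math import gcd, prod

def canonical_factorisation(n):
    """Return a Counter {prime: exponent} for an integer n >= 2 (trial division)."""
    exponents = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            exponents[d] += 1
            n //= d
        d += 1
    if n > 1:                 # whatever is left over is itself prime
        exponents[n] += 1
    return exponents

def gcd_lcm_by_exponents(x, y):
    """gcd and lcm from the min and max of exponents (a missing prime has exponent 0)."""
    fx, fy = canonical_factorisation(x), canonical_factorisation(y)
    primes = set(fx) | set(fy)
    g = prod(p ** min(fx[p], fy[p]) for p in primes)
    l = prod(p ** max(fx[p], fy[p]) for p in primes)
    return g, l

g, l = gcd_lcm_by_exponents(558, 423)
```

Since min(a, b) + max(a, b) = a + b for each prime, the product of the two results is xy, which is the final identity of the lemma.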
 Proof: To be done in lectures. 

2.3 Aside: how are the primes distributed?

2.3.1 The prime number function π(n)


There are infinitely many primes, but do they occur in a regular way on the
number line? Or do they become sparser as the numbers get larger?
One of the first results in this direction is the so-called Bertrand’s Postulate,
and it was proved by the great Russian mathematician Pafnuty Chebyshev.

Joseph Bertrand (1822-1900) was a French mathematician who made important
contributions to many areas of mathematics: number theory, differential geometry
and probability theory.

Theorem 2.3.1 (Bertrand’s Postulate). For every integer n ≥ 2, there is a
prime strictly between n and 2n.

Let π(n) be the number of primes that are less than or equal to n. For
example, π(20) = 8 as 2, 3, 5, 7, 11, 13, 17 and 19 are the prime numbers
less than 20. Another way to express Bertrand’s Postulate is by the
inequality
(∀n ≥ 2) π(2n) − π(n) ≥ 1.
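
The function π(n) is easy to tabulate with a sieve, which lets us test the inequality form of Bertrand's Postulate over a range of n. This is a numerical check over a finite range only; the function name `prime_counts` and the cut-off 2000 are arbitrary choices.

```python
def prime_counts(limit):
    """Return a list pi with pi[n] = number of primes <= n, for 0 <= n <= limit."""
    is_prime = [False, False] + [True] * (limit - 1)
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    pi, count = [], 0
    for n in range(limit + 1):
        count += is_prime[n]   # True counts as 1
        pi.append(count)
    return pi

pi = prime_counts(2000)
# Bertrand's Postulate in inequality form, checked for 2 <= n <= 1000.
bertrand_ok = all(pi[2 * n] - pi[n] >= 1 for n in range(2, 1001))
```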
However, Chebyshev proved something much stronger:

\[ (\forall n > 5) \qquad \frac{1}{3}\,\frac{n}{\log n} < \pi(2n) - \pi(n) < \frac{7}{5}\,\frac{n}{\log n}. \]

Pafnuty Chebyshev (1821-1894) was one of Russian history’s brightest minds
and he made far reaching contributions to mathematics of the 19th century. His
ability to write fluently in French meant that he was in his words a “world-wide
mathematician”, and his work then bore influence during his own lifetime.

At this point, we have given some indication that the prime numbers
occur rather frequently. Here is a simple result which shows that they are
also rare.

Theorem 2.3.2. For every k ∈ N, there are k consecutive integers which


are not prime.

? Proof: Let k ∈ N and let N = (k + 1)! + 2. Note that


2 | N        because k ≥ 1
3 | N + 1    because k ≥ 2
4 | N + 2    because k ≥ 3
5 | N + 3    because k ≥ 4
⋮
In particular, we know that for i ∈ {0, . . . , k − 1}, we have i + 2 | (k + 1)!
and hence i + 2 | (k + 1)! + (i + 2) = N + i. Hence N, N + 1, N +
2, . . . , N + k − 1 are not prime. 
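The construction in this proof can be carried out directly for small k. The sketch below is our own illustration (the names is_prime and composite_run are not from the notes); it builds the run N, N + 1, . . . , N + k − 1 with N = (k + 1)! + 2 and confirms by trial division that none of its members is prime.

```python
from math import factorial


def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def composite_run(k):
    """Return the k consecutive integers N, ..., N + k - 1 with N = (k + 1)! + 2."""
    start = factorial(k + 1) + 2
    return list(range(start, start + k))


for k in range(1, 7):
    run = composite_run(k)
    # Each line shows k, the first member of the run, and whether all are composite.
    print(k, run[0], all(not is_prime(m) for m in run))
```

For k = 3 the run is 26, 27, 28. Note that the proof gives a run of composites, not necessarily the first or shortest one of that length.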

From Bertrand’s Postulate, we know that π(n) grows at least as fast as log_2(n). In fact, Chebyshev attempted to prove something stronger, that the primes grow like n/log(n).

Theorem 2.3.3 (The Prime Number Theorem). The limit of π(n) / (n/log(n)) as n tends to ∞ is 1.

This famous theorem was conjectured by Carl Friedrich Gauß (in the 1790s) and proved independently by Jacques Hadamard and Charles de la Vallée Poussin (1896). Their proofs used “complex analysis”, and it wasn’t until 1949 that an elementary proof was given by Paul Erdős and Atle Selberg.

Figure 2.1: log_2(n) and n/log(n)

Remark 2.3.4. The prime number theorem says that π(n) grows asymptotically like n/log(n), but it says nothing about the difference π(n) − n/log(n).

2.3.2 Arithmetic progressions


Consider the following sequence:

1, 4, 7, 10, 13, 16, 19, 22, 25, . . .

These are the numbers which are of the form 3k + 1, where k ∈ N ∪ {0}. The prime numbers among the terms shown are 7, 13 and 19. Such a sequence is known as an arithmetic progression. In general, if we take a and b to be coprime⁸, we can ask how many primes occur in the arithmetic progression

a, a + b, a + 2b, a + 3b, . . . .

⁸ Why would you take a and b to be coprime? If they had a common factor, would we have a sensible sequence in which to study the occurrence of primes?

A surprising result to this point was proved by Dirichlet in 1837.

Theorem 2.3.5 (Dirichlet’s Theorem). For all a, b ∈ N coprime, the


arithmetic progression

a, a + b, a + 2b, a + 3b, . . .

contains infinitely many primes.

† Proof: The proof of this result is beyond the scope of this course. 

What about the opposite question? Can we build sequences of primes that have regular gaps between them? Consider the following interesting sequences of primes:

a   b   Sequence
3   2   3, 5, 7
3   4   3, 7, 11
5   6   5, 11, 17, 23

The best we seem to be able to do here is to list a sequence of 4 primes that are spaced 6 apart. At the time of writing, the longest known arithmetic progression consisting only of primes is due to Benoît Perichon (2010):

43142746595714191 + k · 23681770 · 223092870

where k ∈ {0, 1, . . . , 25}. This sequence of primes has length 26 and the distances between them are quite large compared to the sequences we had above. Can we do better than a sequence of length 26? This question was answered by Ben Green and Terence Tao⁹ (an Australian!):

⁹ Terry Tao was awarded a 2006 Fields Medal, the mathematical equivalent of a Nobel Prize. He is the first Australian to have been awarded a Fields Medal.

Theorem 2.3.6 (The Green-Tao Theorem). The primes contain arbitrarily long arithmetic progressions.

Of course, the proof is non-constructive!
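Although the theorem gives no recipe for finding long progressions, checking a proposed one is straightforward. The sketch below is our own illustration, not the method of any of the authors mentioned: is_prime is a Miller-Rabin test with the first twelve primes as bases (a choice that is known to give exact answers for all n below 2^64), and prime_progression and all_prime are hypothetical helper names.

```python
def is_prime(n):
    """Miller-Rabin with fixed bases; exact for all n < 2**64."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in bases:
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


def prime_progression(a, b, length):
    """The terms a, a + b, ..., a + (length - 1) * b."""
    return [a + k * b for k in range(length)]


def all_prime(sequence):
    return all(is_prime(m) for m in sequence)


print(all_prime(prime_progression(3, 2, 3)))   # True: 3, 5, 7
print(all_prime(prime_progression(5, 6, 6)))   # False: the sixth term is 35

# The length-26 progression quoted in the text; this should print True
# if the starting value and common difference are as stated there.
print(all_prime(prime_progression(43142746595714191, 23681770 * 223092870, 26)))
```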

2.4 Mersenne primes and perfect numbers

We saw in the previous section that there are infinitely many primes, but it was not a constructive proof. That is, we did not create an explicit set of infinitely many primes, we only proved that there cannot be finitely many primes. Mersenne primes are among the simplest families of prime numbers that we know of, and yet we still do not know whether there are infinitely many of them.

Definition 2.4.1 (Mersenne prime). A prime number of the form 2^n − 1 is a Mersenne prime.

It turns out (see the Exercises at the end of this chapter) that n must be a prime number in order for 2^n − 1 to be prime. Not all numbers of this form are prime though:

n     2^n − 1    Is prime?
2     3          Yes
3     7          Yes
5     31         Yes
7     127        Yes
11    2047       No, 2047 = 23 · 89
13    8191       Yes
17    131071     Yes
19    524287     Yes
23    8388607    No, 8388607 = 47 · 178481
The Greek civilisation had a particular fascination with what they called perfect numbers, and they knew of four perfect numbers at the time of Nicomachus in 100AD. A number n is perfect if it is equal to the sum of its proper divisors. For example, the proper divisors of 6 are 1, 2 and 3, and 6 = 1 + 2 + 3.

Margin note: The largest known Mersenne prime was discovered in 2008 and it is 2^p − 1 where p = 43,112,609. We currently know of 47 Mersenne primes. There is a world-wide “Great Internet Mersenne Prime Search” (GIMPS) that uses the CPU power of the machines of thousands of home users who have volunteered their desktop power to aid in the search for the next Mersenne prime. See www.mersenne.org for more.

Example 2.4.2 (Examples of perfect numbers).

Perfect number   Proper divisors
6                1, 2, 3
28               1, 2, 4, 7, 14
496              1, 2, 4, 8, 16, 31, 62, 124, 248
8128             1, 2, 4, 8, 16, 32, 64, 127, 254, 508, 1016, 2032, 4064
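The table above can be reproduced by brute force. This is a small sketch of our own (the names proper_divisors and is_perfect are not from the notes): it lists the proper divisors of n by trial division and then searches for perfect numbers below 10000.

```python
def proper_divisors(n):
    """All positive divisors of n other than n itself, in increasing order."""
    small, large = [], []
    d = 1
    while d * d <= n:
        if n % d == 0:
            small.append(d)
            if d != n // d:
                large.append(n // d)  # the partner divisor n / d
        d += 1
    divisors = small + large[::-1]
    return [d for d in divisors if d != n]


def is_perfect(n):
    """n is perfect if it equals the sum of its proper divisors."""
    return n > 1 and sum(proper_divisors(n)) == n


print(proper_divisors(28))                            # [1, 2, 4, 7, 14]
print([n for n in range(1, 10000) if is_perfect(n)])  # [6, 28, 496, 8128]
```

So the four perfect numbers known in antiquity are the only ones below 10000.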
In Euclid’s Elements (Book IX), it was proved that there is a direct
connection from Mersenne primes to perfect numbers.

Theorem 2.4.3 (Euclid). If 2^k − 1 is a Mersenne prime, then 2^{k−1}(2^k − 1) is a perfect number.

 Proof: To be done in lectures. 

Definition 2.4.4 (The sum-of-divisors function). For an integer n, let σ(n)


be the sum of the divisors of n.

So a perfect number n satisfies σ(n) = 2n. Here are some other interest-
ing properties of σ:

Lemma 2.4.5.

(i) If p is a prime and a is a positive integer, then

σ(p^a) = 1 + p + p^2 + ⋯ + p^a = (p^{a+1} − 1)/(p − 1).

(ii) If a and b are coprime, then σ(ab) = σ(a)σ(b).

 Proof: For both parts, we simply use the following corollary of the
Fundamental Theorem of Arithmetic: if we express a positive integer x in terms of its canonical factorisation p_1^{a_1} p_2^{a_2} ⋯ p_k^{a_k}, then every positive divisor of x has canonical factorisation of the form p_1^{b_1} p_2^{b_2} ⋯ p_k^{b_k}, where for each i, we have b_i ≤ a_i.

(i) Suppose p is a prime and a is a positive integer. Then the divisors of p^a must be of the form p^i for i ∈ {0, . . . , a}. So σ(p^a) = 1 + p + p^2 + ⋯ + p^a = (p^{a+1} − 1)/(p − 1).

(ii) Suppose a and b are coprime. Suppose we wrote out their canonical factorisations as follows

a = p_1^{a_1} p_2^{a_2} ⋯ p_k^{a_k}
b = q_1^{b_1} q_2^{b_2} ⋯ q_ℓ^{b_ℓ}.

We may assume that a and b are both greater than 1, since the result clearly holds if a = 1 or b = 1. With this assumption, we may further assume that each of the a_i and b_j is greater than 0 (otherwise we would suppress the term). Then σ(a) = ∏_i σ(p_i^{a_i}), and similarly, σ(b) = ∏_j σ(q_j^{b_j}). Since a and b are coprime, we have {p_1, . . . , p_k} ∩ {q_1, . . . , q_ℓ} = ∅. So the canonical factorisation of ab is simply

p_1^{a_1} p_2^{a_2} ⋯ p_k^{a_k} q_1^{b_1} q_2^{b_2} ⋯ q_ℓ^{b_ℓ}

and since we can easily read off the divisors of this integer by its canonical factorisation, it then follows that σ(ab) = σ(a)σ(b).
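Both parts of the lemma, and the way they combine to give Euclid’s Theorem 2.4.3, can be spot-checked numerically. The sketch below is our own illustration; sigma is a hypothetical helper that sums divisors by trial division, and the ranges tested are arbitrary.

```python
from math import gcd


def sigma(n):
    """Sum of all positive divisors of n (including 1 and n)."""
    total = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total


# Part (i): sigma(p^a) = (p^(a+1) - 1) / (p - 1) for a few primes p.
print(all(sigma(p ** a) == (p ** (a + 1) - 1) // (p - 1)
          for p in (2, 3, 5, 7) for a in range(1, 7)))

# Part (ii): sigma is multiplicative on coprime pairs.
print(all(sigma(a * b) == sigma(a) * sigma(b)
          for a in range(1, 41) for b in range(1, 41) if gcd(a, b) == 1))

# Coprimality is needed: sigma(8) = 15 but sigma(2) * sigma(4) = 3 * 7 = 21.
print(sigma(8), sigma(2) * sigma(4))

# Euclid: for these exponents k, 2^k - 1 is prime and n = 2^(k-1)(2^k - 1)
# satisfies sigma(n) = 2n, i.e. n is perfect.
for k in (2, 3, 5, 7, 13):
    n = 2 ** (k - 1) * (2 ** k - 1)
    print(k, n, sigma(n) == 2 * n)
```

The last loop is exactly the computation σ(2^{k−1}) σ(2^k − 1) = (2^k − 1) · 2^k, carried out by the machine rather than by the lemma.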

In particular, σ(2^k) = 2^{k+1} − 1, which will be used in the following result, which was first observed by the great mathematician, Leonhard Euler.

Theorem 2.4.6 (Euler, ca. 1740).
Every even perfect number is of the form 2^{k−1}(2^k − 1), where 2^k − 1 is prime.

? Proof: To be done in lectures. 

Question 2.4.7. Is there an odd perfect number?

The best known answer to date to this question (at the point of writing) is due to Ochem and Rao (2012):

Theorem 2.4.8 (Ochem and Rao, 2012). An odd perfect number must be larger than 10^1500.

2.5 Clock arithmetic

Suppose it is 10:02am right now. What is the time 6 hours from now? If
you use a 24-hour convention, then you would say that it is 16:02. But if
you use a 12-hour convention, using “am” or “pm”, then the answer would
be 4:02pm. What were you doing when you worked out that it would be 4:02 on the clock? You were counting to 12, and then starting back at 0 again. That is, 12 is the same as 0.
Here is another example. What day of the week is it on the 10th of the
next month? To work out the answer, you count out the number of days
until the 10th of the next month, divide by 7, and take the remainder. Then
you add this remainder to the day we are currently on.

Example 2.5.1. If it is now Wednesday August 22nd, then there are 19 days
until September 10. Now 19 = 2 × 7 + 5, so there is a remainder of 5 when
dividing 19 by 7. We then count forward by 5 in the days of the week and
discover that September 10 is a Monday.

What would happen to the examples above if a clock had 13 numbers, or if a week had five days? We would simply change which number we think of as ‘zero’.

Definition 2.5.2 (Congruence Modulo n).


Let n ∈ N. Integers a and b are said to be congruent modulo n if n
divides a − b. We denote this relation by a ≡n b.

Example 2.5.3.

• 14 ≡5 4 since 14 − 4 = 10 is a multiple of 5.

• −7 ≡3 2 since −7 − 2 = −9 is a multiple of 3.

The congruence relation is a little bit like the equality relation, but more flexible. It at least has the following properties¹⁰. Compare the following lemma to Lemma 2.1.2.

¹⁰ In fact, these ‘properties’ are the defining properties of a congruence. Congruences are studied in general in some areas of abstract algebra.

Lemma 2.5.4. For n ∈ N, congruence modulo n satisfies the following properties:

Reflexivity: For all a ∈ Z, a ≡n a.

Symmetry: For all a, b ∈ Z, if a ≡n b then b ≡n a.

Transitivity: For all a, b, c ∈ Z, if a ≡n b and b ≡n c then a ≡n c.

Compatible with addition: If a ≡n b and a′ ≡n b′, then a + a′ ≡n b + b′.

Compatible with multiplication: If a ≡n b and a′ ≡n b′, then aa′ ≡n bb′.

 Proof:

Reflexivity: Note for all a ∈ Z that n divides a − a = 0 and hence a ≡ a


(mod n).

Symmetry: If a ≡ b (mod n), then n divides a − b, and hence n also divides b − a = −(a − b). Therefore b ≡ a (mod n).

Transitivity: Suppose a ≡ b (mod n) and b ≡ c (mod n) for some integers


a, b, and c. Then n divides a − b and n divides b − c. So n divides the
sum of a − b and b − c, which is a − c, and hence a ≡ c (mod n).

Compatible with addition and multiplication: Suppose a ≡n b and a′ ≡n b′. Then n divides a − b and n divides a′ − b′. So n divides their sum a − b + a′ − b′ = (a + a′) − (b + b′). Hence, a + a′ ≡n b + b′.
For multiplication, we need to be more explicit. Write a − b = kn and a′ − b′ = k′n for some k, k′ ∈ Z. Then

aa′ = (b + kn)(b′ + k′n) = bb′ + n(bk′ + b′k + kk′n)

and hence n divides aa′ − bb′.

Corollary 2.5.5. If a ≡n b, then for all positive integers k, we have a^k ≡n b^k.

This allows us to make very difficult calculations with ease!


Example 2.5.6. Find the remainder of 7^50 when divided by 15.
Solution: First notice that 7^2 = 49 and 49 ≡15 4. So

7^4 = (7^2)^2 ≡15 4^2 = 16 ≡15 1.

Now 7^50 = 7^{4·12+2} and so

7^50 = (7^4)^12 · 7^2
     ≡15 1^12 · 7^2
     ≡15 4.

Therefore, if we divide 7^50 by 15, we end up with a remainder of 4.
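The idea used in this example, reducing modulo n after every multiplication so that the numbers never get large, is what a computer does as well. Here is a minimal sketch of the standard square-and-multiply method; the function name power_mod is our own, and Python’s built-in three-argument pow does the same job.

```python
def power_mod(base, exponent, modulus):
    """Compute base**exponent mod modulus by repeated squaring.

    Corollary 2.5.5 is what justifies reducing after every step.
    Assumes exponent >= 0 and modulus >= 1.
    """
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:              # current binary digit of the exponent is 1
            result = result * base % modulus
        base = base * base % modulus  # square for the next binary digit
        exponent >>= 1
    return result


print(power_mod(7, 50, 15))  # 4, as found by hand above
print(pow(7, 50, 15))        # the built-in gives the same answer
```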


The remainder of an integer when we divide by n always gives us a
number within a certain range

Zn := {0, 1, . . . , n − 1}

which we call the set of integers modulo n. We will see that this set can be equipped with addition-like and multiplication-like operations that make it into an interesting number system.
Lemma 2.5.7. For every integer x, there is a unique element y ∈ Zn such
that x ≡n y.
 Proof: Let x be an integer. By the Division Rule (Lemma 2.1.4), there
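exist unique integers q and r such that x = qn + r and 0 ≤ r < n. So r ∈ Zn and x ≡n r. Conversely, if y ∈ Zn and x ≡n y, then x = q′n + y for some integer q′ with 0 ≤ y < n, so y = r by the uniqueness of the remainder. So if we let y = r, we see that there is a unique element y ∈ Zn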
such that x ≡n y. 

We can make Zn into a ring by defining addition and multiplication


modulo n.
Definition 2.5.8 (Arithmetic Modulo n).
Let n ∈ N. Then we define the binary operations ⊕n and ⊗n on the
integers by
x ⊕n y = ( x + y) (mod n)
x ⊗n y = xy (mod n)
for all x, y ∈ Z. The output is always an element of Zn .

Example 2.5.9.
• 5 ⊕4 3 = 8 (mod 4) = 0,

• −7 ⊕5 13 = 6 (mod 5) = 1,

• 50 ⊗233 233 = 50 · 233 (mod 233) = 0.
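These operations are one-liners in most programming languages. The sketch below uses our own names add_mod and mul_mod for ⊕n and ⊗n. One assumption worth flagging: it relies on Python’s % operator returning a result in {0, . . . , n − 1} for a positive modulus even when the left operand is negative, which is not true of the remainder operator in every language.

```python
def add_mod(x, y, n):
    """x (+)_n y: the element of Z_n congruent to x + y."""
    return (x + y) % n


def mul_mod(x, y, n):
    """x (x)_n y: the element of Z_n congruent to x * y."""
    return (x * y) % n


print(add_mod(5, 3, 4))        # 0
print(add_mod(-7, 13, 5))      # 1
print(mul_mod(50, 233, 233))   # 0
print(-7 % 3)                  # 2, matching -7 ≡ 2 (mod 3) in Example 2.5.3
```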


We can sometimes ‘cancel’ both sides.
Lemma 2.5.10. Let n ∈ N and a ∈ Z, and suppose n and a are coprime. If
xa ≡n ya, then x ≡n y.
 Proof: Suppose xa ≡n ya. Then n divides xa − ya = ( x − y)a. Since
n and a are coprime, we have by Corollary 2.1.16 that n divides x − y.
Therefore, x ≡n y. 

2.5.1 Linear diophantine equations


A linear diophantine equation is an equation of the form

ax + by = c

where a, b, c are given integers and we want to solve for integers x and y.
Here is an example that we see often in real life.

Example 2.5.11. What amounts of money can you make from $2 and $5 denominations? That is, for what values c can we solve 2x + 5y = c?

We can rephrase a linear diophantine equation ax + by = c in terms of a


solution in one of these variables, giving us a congruence equation:

ax + by = c has a solution in x ⇐⇒ b | ax − c (2.2)


⇐⇒ ax ≡b c (2.3)

Theorem 2.5.12. The congruence equation ax ≡b c has a solution in x ∈ Z


if and only if gcd(a, b) | c.

 Proof: Done in lectures. 

The next question we might ask is, if a linear diophantine equation has a solution, does it have more, and how many more?

Example 2.5.13. Suppose a = 8, b = 6 and c = 14. Clearly there is a solution to ax + by = c by taking x = y = 1. Are there more solutions? First of all, from the argument above (2.2), we need 8x ≡6 14. This equation simplifies¹¹ to x ≡3 1. The set of all integers x which satisfy x ≡3 1 has a special name:

1 + 3Z = {x ∈ Z : x ≡3 1} = {. . . , −8, −5, −2, 1, 4, 7, 10, 13, . . .}.

¹¹ To see this, notice that the condition 6 | 8x − 14 is equivalent to 6 | 8x − 8 (as 14 ≡6 8), that is, to 3 | 4(x − 1), which is equivalent to 3 | x − 1 because 3 is coprime to 4.

Theorem 2.5.14 (The LDE Theorem). Suppose gcd(a, b) | c. If (x_0, y_0) is a solution to ax + by = c (i.e., ax_0 + by_0 = c), then

( x_0 + k · b/gcd(a, b),  y_0 − k · a/gcd(a, b) )

is a solution to ax + by = c, for all k ∈ Z.

In other words, the set of all x for which ax + by = c has an integer solution is infinite if it is non-empty, and it is the congruence class¹²

x_0 + tZ,   t = b/gcd(a, b).

¹² which we will see in Section 3.3
We will leave the proof as an exercise (see the last section of this chapter).
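Theorem 2.5.12 and the LDE Theorem together give a complete recipe for solving ax + by = c, which can be sketched as follows. This is our own illustration and assumes a and b are positive; extended_gcd and solve_lde are hypothetical names, and the first solution comes from the extended Euclidean Algorithm rather than from guessing.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and g = s*a + t*b (for a, b >= 0)."""
    if b == 0:
        return a, 1, 0
    g, s, t = extended_gcd(b, a % b)
    # g = s*b + t*(a - (a//b)*b) = t*a + (s - (a//b)*t)*b
    return g, t, s - (a // b) * t


def solve_lde(a, b, c):
    """Solve a*x + b*y = c in integers.

    Returns None if gcd(a, b) does not divide c. Otherwise returns
    ((x0, y0), (dx, dy)) so that (x0 + k*dx, y0 - k*dy) is a solution
    for every integer k, as in the LDE Theorem.
    """
    g, s, t = extended_gcd(a, b)
    if c % g != 0:
        return None
    scale = c // g
    return (s * scale, t * scale), (b // g, a // g)


(x0, y0), (dx, dy) = solve_lde(8, 6, 14)
print((x0, y0), (dx, dy))  # one solution and the step between solutions
print([(x0 + k * dx, y0 - k * dy) for k in range(-3, 2)])
print(solve_lde(4, 6, 3))  # None, since gcd(4, 6) = 2 does not divide 3
```

For 8x + 6y = 14 the x-values produced all lie in 1 + 3Z, in agreement with Example 2.5.13.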

2.5.2 A magic trick and the Chinese Remainder Theorem


Here is how the trick goes. You need four volunteers from the audience. The first three will receive an envelope each containing a piece of paper with a number on it. The fourth, call her Alice, chooses a number between 1 and 1000 and writes it on each of the three volunteers’ envelopes and secretly passes their envelopes to them. Neither you nor the rest of the audience knows what the number is. Each volunteer has a similar task to complete:

• Volunteer 1 works out what their number is modulo 7, call it a1 , and


writes it on the piece of paper inside the envelope.

• Volunteer 2 works out what their number is modulo 11, call it a2 , and
writes it on the piece of paper inside the envelope.

• Volunteer 3 works out what their number is modulo 13, call it a3 , and
writes it on the piece of paper inside the envelope.

They give you their scribed pieces of paper. With this information you
can recover the original number that Alice chose! Let

x = −2 · 143 · a1 + 4 · 91a2 − 77a3 .

Alice’s number is then congruent to x modulo 1001. Why 1001? Because


1001 = 7 · 11 · 13, and we used the proof of the Chinese Remainder
Theorem.

Theorem 2.5.15 (Chinese Remainder Theorem). Let m1 , m2 , . . . mk be


positive integers, pairwise coprime, and let a1 , a2 , . . . , ak be integers. Then
there is a solution x ∈ Z of the following simultaneous linear diophantine
equations:

x ≡m1 a1
x ≡m2 a2
..
.
x ≡mk ak .

Moreover, two solutions are congruent modulo m1 m2 · · · mk .

? Proof: Let M = m_1 m_2 ⋯ m_k. Now for each i, we have that m_i and M/m_i are coprime. By Theorem 2.5.12, for each i, there exists b_i such that

b_i (M/m_i) ≡_{m_i} 1.

Let x = Σ_{i=1}^{k} a_i b_i (M/m_i). Since m_i divides M/m_j whenever j ≠ i, for each i we have x ≡_{m_i} a_i b_i (M/m_i) ≡_{m_i} a_i. Therefore, x is a solution to the simultaneous linear diophantine equations.

Now suppose x′ is another solution. Then for each i, we have¹³

x − x′ ≡_{m_i} 0

and hence m_i divides x − x′, for each i ∈ {1, . . . , k}. So M divides x − x′ by Corollary 2.1.16, and hence

x ≡_M x′.

¹³ because x, x′ ≡_{m_i} a_i

The important part of the proof is that it gives us a solution:

x = Σ_{i=1}^{k} a_i b_i (M/m_i).

So in the magic trick above m_1 = 7, m_2 = 11 and m_3 = 13. We then work out the inverse of each product m_i m_j modulo the remaining m_k, and assign it to b_k:

b_1 : (11 · 13)^{−1} (mod 7) → −2
b_2 : (7 · 13)^{−1} (mod 11) → 4
b_3 : (7 · 11)^{−1} (mod 13) → −1.

Of course, we could’ve taken any number to be an inverse as long as it


returns 1 under multiplication modulo mi . For example, in the first case we
could have taken b1 = 5 since

5 · (11 · 13) = 715 ≡7 1.
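The construction in the proof translates almost line by line into code. The following is a sketch under some assumptions of our own: the function names crt and magic_trick are hypothetical, the moduli are assumed to be pairwise coprime (this is not checked), and the inverses b_i are obtained from Python’s built-in pow with exponent −1, which requires Python 3.8 or later.

```python
from math import prod


def crt(residues, moduli):
    """Return the solution x in {0, ..., M - 1} of x ≡ a_i (mod m_i).

    Follows the proof: x = sum of a_i * b_i * (M / m_i), where b_i is an
    inverse of M / m_i modulo m_i. Assumes the moduli are pairwise coprime.
    """
    M = prod(moduli)
    x = 0
    for a_i, m_i in zip(residues, moduli):
        partial = M // m_i
        b_i = pow(partial, -1, m_i)  # modular inverse (Python 3.8+)
        x += a_i * b_i * partial
    return x % M


def magic_trick(a1, a2, a3):
    """The magician's formula from the text, reduced modulo 1001."""
    return (-2 * 143 * a1 + 4 * 91 * a2 - 77 * a3) % 1001


secret = 537  # an arbitrary choice for Alice's number
remainders = [secret % 7, secret % 11, secret % 13]
print(remainders)
print(crt(remainders, [7, 11, 13]))  # recovers 537
print(magic_trick(*remainders))      # so does the formula from the trick
```

Because Alice’s number lies between 1 and 1000, reducing modulo 1001 identifies it uniquely.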

2.5.3 Fermat’s Little Theorem


Fermat’s Little Theorem, and its generalisation by Euler, are the essential ingredients in the RSA cryptosystem, a commonly used public-key method for sending encrypted information between two parties.

Theorem 2.5.16 (Fermat’s Little Theorem). Let p be a prime and let a be an integer. Then

a^p ≡p a.

? Proof: To be done in lectures. 

Example 2.5.17. Show that for all n ∈ Z, n^9 − n is divisible by 6.

Solution: Let n ∈ Z. Then n^9 = (n^3)^3 and so we can apply Fermat’s Little Theorem:
Now (n^3)^3 ≡3 n^3 and n^3 ≡3 n, both by Fermat’s Little Theorem.
Then (n^3)^3 ≡3 n by transitivity of ≡3.
Therefore, n^9 − n is divisible by 3.
Next we decompose n^9 into ((n^2)^2)^2 · n and use Fermat’s Little Theorem for p = 2:
Now n^2 ≡2 n by Fermat’s Little Theorem.
Then n^4 = (n^2)^2 ≡2 n^2 ≡2 n by reapplying n^2 ≡2 n.
Then n^8 = (n^4)^2 ≡2 n^2 ≡2 n by reapplying n^4 ≡2 n and n^2 ≡2 n.
Then n^8 · n ≡2 n · n, since n ≡2 n and ≡2 is compatible with multiplication.
Then n^9 ≡2 n, since n^2 ≡2 n.
Therefore, n^9 − n is divisible by 2.
Since 2 and 3 are coprime, we have by Corollary 2.1.16 that n^9 − n is divisible by 6.

2.6 Aside: The RSA algorithm for public-key encryption

For a long time, the world used private key cryptography to transmit secret information, until the breakthrough of Diffie and Hellman in the 1970s. We will see an example of public key cryptography, the so-called RSA cryptosystem.

Margin note: “We in science are spoiled by the success of mathematics. Mathematics is the study of problems so simple that they have good solutions.” – Whitfield Diffie (1944–)

It can often be difficult to find the (multiplicative) inverse of a number modulo m. For example, the inverse of 7 modulo 64 is 55, which may not be easy to guess off-hand. We can use the Euclidean Algorithm to find the inverse modulo m and we show how this is done by example.

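Example 2.6.1. We want to work out an inverse of 21 modulo 1430. The essential property which makes this work is that 21 and 1430 have no prime factors in common. By Bézout’s Identity, there are integers x and y such that

21x + 1430y = 1.

So

21x ≡1430 21x + 1430y = 1

and hence x will give us the inverse if we can work it out. We will use the Euclidean Algorithm. In the table below, each row records a remainder together with coefficients expressing it as a combination of 1430 and 21; the multiplier in the first column is applied to its own row, and the result is added to the row above to produce the row below:

Multiplier   Remainder   Coefficient of 1430   Coefficient of 21
             1430        1                     0
−68          21          0                     1
−10          2           1                     −68
             1           −10                   681

The last row says that 1 = −10 · 1430 + 681 · 21. Therefore, a multiplicative inverse of 21 modulo 1430 is 681.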
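The row-by-row bookkeeping of this example can be automated. The sketch below is our own illustration (inverse_mod is a hypothetical name); it keeps two consecutive rows of the form (remainder, coefficient of m, coefficient of a) and stops when the remainder reaches 0, at which point the previous row holds the gcd.

```python
def inverse_mod(a, m):
    """Return the inverse of a modulo m in {0, ..., m - 1}.

    Each row (r, s, t) satisfies r = s*m + t*a' where a' = a mod m.
    Raises ValueError if a and m are not coprime.
    """
    previous = (m, 1, 0)
    current = (a % m, 0, 1)
    while current[0] != 0:
        q = previous[0] // current[0]
        # next row = previous row - q * current row, entry by entry
        following = tuple(p - q * c for p, c in zip(previous, current))
        previous, current = current, following
    gcd_value, _, t = previous
    if gcd_value != 1:
        raise ValueError("no inverse: the two numbers are not coprime")
    return t % m


print(inverse_mod(21, 1430))  # 681
print(inverse_mod(7, 64))     # 55
print(21 * 681 % 1430)        # 1, confirming the inverse
```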

Sharing a ‘key’
The previous methods of enciphering depended on knowledge of the key to decipher the message, so the key had to be kept private between Alice and Bob. In the 1970s a radical new approach to cryptography was born, public key cryptography. The general way it works is this: Bob has two keys, one private and one public. The public key is used by Alice to encrypt messages sent to Bob, and the private key is used by Bob to decrypt messages.
Here is an analogy where Alice sends a treasure chest to Bob through the post. Alice has a padlock and both Alice and Bob have a key to this padlock. When Alice sends the treasure chest, she puts a padlock on the treasure chest, and then when Bob receives the chest, he opens it with his key. The problem with this approach is that Alice and Bob need to meet privately to ensure they have identical keys. This is private key cryptography. We can change this example to give an analogy for public key cryptography:

Example 2.6.2. Alice has a treasure chest and padlock, and Bob has a
padlock as well, but it is a different padlock. Can you think of a way for
Alice to send the treasure chest to Bob so that Bob can open it, but they
never meet? (You are allowed to use the postal system more than once!)
Solution: Alice locks the chest with her padlock and sends it to Bob. Bob

then places his padlock on the chest and sends it back to Alice. We now
have two padlocks on the treasure chest. Alice takes her padlock off and
sends it back to Bob. Then Bob can open the treasure chest by removing his
padlock.

One-way functions
Just as we saw in the previous example, the main idea in public key cryptography is a one-way function. A function f is said to be one-way if given x it is “easy” to compute f(x), but given y, it is “hard” to determine an x such that f(x) = y. The mathematics behind “easy” and “hard” is well beyond the scope of this course, but if you’re interested, there is plenty on the web about it, and related to this question is one of the biggest problems in mathematics: is P = NP? Diffie and Hellman were the first to conceive of the idea of using one-way functions in cryptography, and they actually constructed one, although it is not as good as the one we will see below.

Figure 2.2: Whitfield Diffie, Martin Hellman and Ralph Merkle.

The RSA cryptosystem

The RSA cryptosystem is named after the three mathematicians who in-
vented it around 1978: Ron Rivest, Adi Shamir and Leonard Adleman. It
was found out much later, that the same cryptosystem was discovered in
top-secret work by the GCHQ in the early seventies (by Clifford Cocks).
Before a message can be sent, a public key is set up by the receiver
(Bob) that everyone has access to.

• Choose two different primes p and q.

• Compute n = pq and ϕ = (p − 1)(q − 1). (The symbol ϕ is the Greek letter phi.)

• Choose e such that e has an inverse modulo ϕ, and let d be this inverse.

• The encryption function is E_{n,e}(x) = x^e (mod n).

• The decryption function is D_{n,d}(x) = x^d (mod n).

Bob then has the following:

Public Key (n, e)

Private Key d.

Example 2.6.3. The two chosen prime numbers are p = 47 and q = 59. So

n = 47 × 59 = 2773, ϕ = 46 × 58 = 2668.

Now we need an integer e such that

de ≡ϕ 1

for some d. It turns out that e = 157 is one of many choices for e. In fact,

17 × e = 2669 ≡2668 1.

So d = 17.

Public Key: (n, e) = (2773, 157).

Alice then converts her message to a sequence of integers between 0 and n − 1, and then encrypts them with

E_{n,e}(x) = x^e (mod n).

So for example, Alice sends the number 5 by encrypting it:

5^157 (mod n) = 1044.

Bob then applies his decryption function D_{n,d}(x) = x^d (mod n):

1044^17 (mod 2773) = 5.
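The whole of this example fits in a few lines of code. The sketch below is only a toy with the tiny primes from the example (real RSA keys use primes hundreds of digits long, together with padding schemes not discussed here); the function names encrypt and decrypt are our own, and the computation of d uses Python’s built-in pow with exponent −1, which needs Python 3.8 or later.

```python
# Key set-up, done by Bob.
p, q = 47, 59
n = p * q                  # 2773, public
phi = (p - 1) * (q - 1)    # 2668, kept secret
e = 157                    # public exponent, coprime to phi
d = pow(e, -1, phi)        # private exponent: the inverse of e modulo phi


def encrypt(x):
    """E_{n,e}(x) = x^e mod n, computable by anyone who knows (n, e)."""
    return pow(x, e, n)


def decrypt(y):
    """D_{n,d}(y) = y^d mod n, computable only with the private key d."""
    return pow(y, d, n)


print(n, phi, d)       # 2773 2668 17
print(encrypt(5))      # 1044
print(decrypt(1044))   # 5
```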

Why it works: Fermat’s Little Theorem


First we give a simple corollary of Fermat’s Little Theorem.

Corollary 2.6.4. Let p and q be different prime numbers and let a be an integer which is not divisible by p or q. Then

a^{(p−1)(q−1)} ≡pq 1.

Proof. By Fermat’s Little Theorem (2.5.16), a^p ≡p a. Since p does not divide a, we can cancel a (Lemma 2.5.10) to get a^{p−1} ≡p 1, and so

a^{(p−1)(q−1)} ≡p 1^{q−1} = 1.

Similarly, a^{q−1} ≡q 1 and so

a^{(p−1)(q−1)} ≡q 1^{p−1} = 1.

Therefore, both p and q divide a^{(p−1)(q−1)} − 1 and so, as p and q are coprime, pq divides this number. Hence a^{(p−1)(q−1)} ≡pq 1.

Let’s see what happens when we decrypt something that’s been encrypted:

D_{n,d}(E_{n,e}(x)) = (x^e)^d (mod n).

So in order for decryption to give us the same thing back again, we need to show that

x^{de} ≡n x.

Now d was chosen so that de ≡ϕ 1, that is, de = 1 + mϕ for some m. Hence

x^{de} = x^{1+mϕ} = x · (x^ϕ)^m.

Now x^ϕ = x^{(p−1)(q−1)} and so, provided x is not divisible by p or q, x^ϕ ≡pq 1 (by the Corollary above). It then follows that

x^{de} ≡n x.

Further questions on RSA for you to ponder


• Why is it computationally easy to carry out the calculations involved
such as finding d and computing powers modulo n?

• Why cannot another user discover the sent message when they know the
public key?

• How much extra information do you need to crack a message sent with RSA?

2.7 Exercises

1. Show that n must be a prime number in order for 2^n − 1 to be prime.

2. Find the set of all integer solutions in x to the following set of simulta-
neous linear diophantine equations:

x ≡6 2
x ≡7 1
x ≡11 3.

3. Use the Euclidean Algorithm to find the greatest common divisor of 56


and 1430, and find integers s and t such that

gcd(56, 1430) = 56s + 1430t.

4. Let n be an odd integer. Prove that n^2 ≡ 1 (mod 4).



5. Let n ≥ 2 be an integer. Suppose that for every prime p ≤ √n, p does not divide n. Prove that n is a prime.

6. Let q be an odd positive integer and let x ∈ Z. Show that x + 1 divides x^q + 1.

7. Show that if k ≥ 1 is such that 2^k + 1 is a prime, then k = 2^n for some n ≥ 0.

8. Let U_n be the nonzero elements of Z_n which have a multiplicative inverse and let a ∈ U_n. Prove

a^{|U_n|} ≡ 1 (mod n).

What do you notice when n is prime?

9. Are the following true or false?

(a) 4 ≡13 17
(b) 6 ≡7 42
(c) −1 ≡4 11
(d) 11 ≡4 −1
(e) −5 ≡8 −21

10. Find the remainder r (between 0 and 6) that we get when we divide 682
by 7.

11. Calculate the following:

(a) 2 ⊕5 4
(b) −4 ⊕3 10
(c) 25 ⊗9 94
(d) −2 ⊗5 7
(e) 25634578912 ⊗2 65.

12. John H. Conway once said that 91 is the smallest integer which looks
like a prime, but isn’t. Use Fermat’s Little Theorem to show that 91 is
not prime. (Hint: Decompose 90 into binary: 90 = 64 + 16 + 8 + 2).

13. We will do a magic trick starting with 27 cards (the joker, the dia-
monds and the hearts), and the audience member selects 4 cards. The
assistant hides one card and you, the magician, will use the other three
cards to figure out what the fourth card is. First we assign a number
from 0 up to 26 to each card.

Card:    Joker  A♦  2♦  3♦  4♦  5♦  6♦  7♦  8♦  9♦  10♦  J♦  Q♦  K♦
Number:  0      1   2   3   4   5   6   7   8   9   10   11  12  13

Card:    A♥  2♥  3♥  4♥  5♥  6♥  7♥  8♥  9♥  10♥  J♥  Q♥  K♥
Number:  14  15  16  17  18  19  20  21  22  23   24  25  26

The audience member selects four cards, and the assistant hides one of
the cards. The three cards that the magician can see are:
3♦    7♥    10♥
(a) Convert these cards to numbers from {0, . . . , 26} and find the re-
mainder of their sum s when divided by 4.
(b) There are 24 possibilities for the hidden card. How many numbers
are there from {0, . . . , 23} which are congruent to s modulo 4?
(c) From the last question, you know that there are N numbers from
{0, . . . , 23} that are congruent to s modulo 4. The assistant wants to
somehow give you a number from 0 up to N − 1 which will give rise
to the number that gives away the hidden card for you.
The assistant lays out the three cards to you in this order ...
7♥    10♥    3♦
i. When represented as numbers from {0, . . . , 26}, let di be the num-
ber of displayed cards appearing to the right of the i-th card that
are smaller than it. Find d1 and d2 .
ii. Compute p = 2d1 + d2 . You should get a number from {0, . . . , N −
1}.
iii. Now find R = 4p + (−s mod 4). The hidden card is the R-th
possible card. Be careful to start counting at 0 and to skip over the
three cards we already have; the hidden card may not be the R-th
card from the deck. Which card do you get?

14. Prove that a number is divisible by three if and only if the sum of
its digits is divisible by three. (Hint: The first step is to express the
unknown number N as some unknown sum of multiples of powers of
10.)

15. Find the day of the week you were born on by using the following formula:

W = (D + ⌊2.6((M + 9) mod 12) + 2.4⌋ − 2C + ⌊5Y/4⌋ + ⌊C/4⌋ − ⌊((M + 9) mod 12)/10⌋) mod 7

where

• D is the day (1 to 31),
• M is the month (1 to 12),
• C is the century (2011 has C = 20),
• Y is the year (2011 has Y = 11),
• W is the week day (Sunday=0, . . . , Saturday=6).

The symbol ⌊x⌋ means the “integer part” of x (i.e., round down to the nearest integer).
3
Rings beyond numbers

We investigate polynomial rings and their properties, to see which of those for the integers also hold for polyno-
mials. We finish with algebraic numbers and the famous impossible problems of antiquity. In the middle, we come
across one of the most important themes of this course: equivalence relations. From this chapter until the end of the
notes, the idea of an equivalence relation will crop up continuously (no pun intended for Chapter ??).

3.1 Rings and fields

By doing some elementary number theory first, we will have some motivation to study the greater context: the theory of rings. The theory of rings really began in the work of Richard Dedekind¹, in his seminal Vorlesungen über Zahlentheorie (1879, 1894), as a generalisation of different algebraic systems with common properties, and its early development was very much inspired by the quest to prove Fermat’s Last Theorem. The term ring was later coined by David Hilbert.

¹ http://www-history.mcs.st-and.ac.uk/HistTopics/Ring_theory.html

In the definition we use below, we will always assume that a ring has a unit; sometimes these are known specially as unital rings. It seems the definition is very long, but once the student learns what a group is (in a later course) it becomes much simpler.

Definition 3.1.1 (Ring). A ring is a set R equipped with two associative binary operations + and · called addition and multiplication, satisfying the following axioms:

(R, +) is an abelian group

• Addition is commutative.
• There is an additive identity called 0 in R, such that for all a ∈ R we have a + 0 = a.
• Every element has an additive inverse. That is, for each element a ∈ R, there is an element b ∈ R such that a + b = 0.

(R, ·) is a monoid

• There exists a multiplicative identity called 1 in R such that for all elements a ∈ R we have a · 1 = a = 1 · a.

Distributive laws

• For all a, b, c ∈ R, the equation a · (b + c) = (a · b) + (a · c) holds.
• For all a, b, c ∈ R, the equation (a + b) · c = (a · c) + (b · c) holds.

Margin note: We use 0 and 1 informally for the additive and multiplicative identities of a ring. Of course, we might use other notation in a particular context, for example, O and I in a matrix ring.

Example 3.1.2.

• The set of integers form a ring, and the set of n × n matrices over R form
a ring with the usual addition and multiplication operations.

• However, N is not a ring as no element has an additive inverse, nor is


there an additive identity!

• Under the operations of ⊕n and ⊗n , the set Zn forms a ring.

• Later we will see how polynomials form a ring. Polynomial rings are the bread-and-butter of algebraic number theory and algebraic geometry, and their influence and application to 20th-century mathematics cannot be overstated.

A field is a ring where there is extra multiplicative structure. The applied


mathematicians use the word field to mean something entirely different!

Definition 3.1.3 (Field). A field is a ring ( F, +, ·) such that ( F \ {0}, ·) is


an abelian group. That is,

• · is commutative: a · b = b · a for all a, b ∈ F.

• Every nonzero element has a multiplicative inverse. That is, for each nonzero element a ∈ F, there is an element b ∈ F such that a · b = 1.

Example 3.1.4. The set of real numbers R and the set of complex numbers C are both fields, and so too is the set of rational numbers Q. The integers Z, however, do not form a field since 2 does not have a multiplicative inverse. The n × n matrices over R only form a field if n = 1, since matrix multiplication is not commutative for n ≥ 2.

The ring and field properties can be explained by looking at their addi-
tion and multiplication tables (particularly if they are finite!).

Example 3.1.5. Consider Z4 . It is a ring and we can write out its addition
and multiplication tables as follows:

⊕4 0 1 2 3 ⊗4 0 1 2 3
0 0 1 2 3 0 0 0 0 0
1 1 0 3 2 1 0 1 2 3
2 2 3 0 1 2 0 2 0 2
3 3 2 1 0 3 0 3 2 1

For the left-hand table for ⊕4, we see that it is symmetric about the
diagonal, so this operation is commutative. We also see that it is a Latin
square [2] and so every element has an additive inverse (and it is unique). On
the other hand, the table for ⊗4 is symmetric, but not a Latin square. So
some elements here have multiplicative inverses, and some don’t, even if
we are careful to only look at the nonzero elements. If the nonzero elements
form a Latin square for ⊗4, then this is the same as saying that the ring is a
field.

[2] Recall from your school days that an n × n square made up of n numbers is a
Latin square if every number appears exactly once in each row and column.
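The table test described above is easy to automate. The following is a minimal sketch, not part of the original notes: the helper names `tables`, `is_latin_square` and `nonzero_part` are invented for illustration. It builds both tables for Zn and checks the Latin square property, once for the addition table and once for the nonzero part of the multiplication table.

```python
def tables(n):
    """Return the addition and multiplication tables of Z_n as lists of rows."""
    add = [[(a + b) % n for b in range(n)] for a in range(n)]
    mul = [[(a * b) % n for b in range(n)] for a in range(n)]
    return add, mul


def is_latin_square(table):
    """True if every row and every column contains each symbol exactly once."""
    symbols = sorted(table[0])
    if len(set(symbols)) != len(symbols):
        return False
    rows_ok = all(sorted(row) == symbols for row in table)
    cols_ok = all(sorted(col) == symbols for col in zip(*table))
    return rows_ok and cols_ok


def nonzero_part(table):
    """Drop the row and the column belonging to 0."""
    return [row[1:] for row in table[1:]]


for n in (4, 5):
    add, mul = tables(n)
    print(n, is_latin_square(add), is_latin_square(nonzero_part(mul)))
```

For n = 4 the addition table passes and the nonzero multiplication table fails; for n = 5 both pass, which fits the remark that the second test is the same as asking whether the ring is a field.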

Question 3.1.6. When do we get Latin squares?


When n is not prime, can we decide if an element of Zn has a multi-
plicative inverse? Such elements are called units of Zn . For example, in Z4
the units are just 1 and 3; there is no element s such that 2 ⊗4 s = 1.
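For small n the units can simply be found by trying every candidate inverse. The sketch below does that (the function name `units` is an assumption made for this illustration) and prints, next to each result, the elements that are coprime to n, so the pattern can be spotted by eye.

```python
from math import gcd


def units(n):
    """Elements of Z_n that have a multiplicative inverse, found by brute force."""
    return [a for a in range(n) if any((a * s) % n == 1 for s in range(n))]


for n in (4, 6, 7, 12):
    coprime = [a for a in range(n) if gcd(a, n) == 1]
    print(n, units(n), coprime)
```

The two lists agree for every n tried here. That is only evidence, of course, not a proof.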
Definition 3.1.7 (Units of a ring). A nonzero element r of a ring R is a unit
if there exists an element s such that r · s = 1. That is, r has a multiplicative
inverse.
Lemma 3.1.8. An element a ∈ Zn is a unit if and only if gcd(a, n) = 1.
 Proof: To be done in lectures. 

Corollary 3.1.9. Zn is a field if and only if n is prime.


 Proof: To be done in lectures. 

3.2 Ideals

The term ideal was introduced by Dedekind at the birth of the theory of
rings. An ideal is a special sub-ring of a ring which in some sense absorbs
elements of the ring when they are multiplied by elements inside the ideal.
First, let us consider the simplest ring, the ring of integers Z.
Example 3.2.1. Consider the even numbers 2Z of Z. It is closed under
addition and multiplication and so forms a ring; that is, 2Z is a subring of Z.
Now let 2x be a typical element of 2Z, and take another element of Z, say
z. Then 2x · z is divisible by 2 and hence the product of 2x with z is an even
number. So we have the property that

(∀z ∈ Z)(∀y ∈ 2Z) z · y ∈ 2Z.

So 2Z is what we call an ideal of Z.


Are there other ideals of Z, and why do we care about them? We will
see in a later section that we can form a new ring from an ideal, which in
some sense inherits the basic properties of the original ring but we collapse
the ideal so that anything in it is treated like “zero”.
Example 3.2.2 (A more elaborate example). Take all of the integers 7Z
that are multiples of 7. Again, if we multiply something inside of 7Z with
something outside of 7Z, the result lies inside 7Z. But there is more we can
do with this interesting set of numbers. We can consider all translates of this
set:

1 + 7Z := {. . . , −13, −6, 1, 8, 15, 22, . . .}
2 + 7Z := {. . . , −12, −5, 2, 9, 16, 23, . . .}
3 + 7Z := {. . . , −11, −4, 3, 10, 17, 24, . . .}
4 + 7Z := {. . . , −10, −3, 4, 11, 18, 25, . . .}
5 + 7Z := {. . . , −9, −2, 5, 12, 19, 26, . . .}
6 + 7Z := {. . . , −8, −1, 6, 13, 20, 27, . . .}
7 + 7Z := {. . . , −7, 0, 7, 14, 21, 28, . . .}

Notice that the last example gives us 7Z back again, and they will keep
repeating when we use higher values for the translate. For example

15 + 7Z is the same as 1 + 7Z, since 15 ≡7 1.

These sets themselves form a ring [3] if we consider the following operations:

Addition: (a + 7Z) ⊕ (b + 7Z) := (a ⊕7 b) + 7Z

Multiplication: (a + 7Z) ⊗ (b + 7Z) := (a ⊗7 b) + 7Z

[3] This is the first time you’ve really seen some high-order abstractness!

You might notice here that what we end up with is really not a new ring at
all: it looks and behaves just like Z7 ! The idea of rings being the same is
pursued further in a later course.

Lemma 3.2.3. The only ideals of Z are of the form nZ for some integer
n ≥ 0 (where 0Z = {0} and 1Z = Z).

? Proof: To be done in lectures. 

We will come back to ideals later once we have done equivalence rela-
tions and polynomial rings.

3.3 An important interlude: Equivalence relations, partitions and quotients

We saw in the previous section an example of a partition of a set. A more
common example is the division of the population of Australia into states
and territories.

Definition 3.3.1 (Partition). Let P be a collection of non-empty subsets of a
set S. We say that P is a partition of S if it satisfies the following:

Covers S: Every element of S lies in some element of P. In other words,
S = ∪P.

Disjointness: For every pair of distinct elements P1, P2 ∈ P, we have
P1 ∩ P2 = ∅. That is, different elements of P are disjoint.

The elements of the partition are sometimes called cells or parts of the
partition.
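For finite sets, the definition can be checked directly. The function below is an illustrative sketch (the name `is_partition` is not from the notes); it tests the three requirements in turn: parts are non-empty, they cover S, and distinct parts are disjoint.

```python
def is_partition(parts, s):
    """Check that `parts` (a collection of sets) is a partition of the set `s`."""
    parts = [set(p) for p in parts]
    if any(len(p) == 0 for p in parts):
        return False                       # every part must be non-empty
    if set().union(*parts) != set(s):
        return False                       # the parts must cover s exactly
    # distinct parts must have empty intersection
    return all(p.isdisjoint(q)
               for i, p in enumerate(parts)
               for q in parts[i + 1:]
               if p != q)


window = range(-14, 15)                    # a finite window of Z
translates = [{x for x in window if x % 7 == r} for r in range(7)]
print(is_partition(translates, window))
print(is_partition([{1, 2}, {2, 3}], {1, 2, 3}))
```

The first call succeeds because the seven residue classes modulo 7 split the window cleanly; the second fails because the two parts share the element 2.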

Example 3.3.2. The even and odd integers form a partition of the integers.
As a partition, we would write it as

{E, O}

where E is the set of even integers and O is the set of odd integers. So a
partition [4] is a “set of sets”. So in this sense, the set of integers is like the
population of Australia, only we divide it up according to West Australians
and non-West Australians!

[4] One of the most common difficulties we see in this course is that the student
does not understand that a partition is a set of sets. That is why I deliberately
used the term “collection” in the definition to make this clear.

Example 3.3.3. The singleton subsets {s} of a set S form a partition, and
we would write it as

P := {{s} : s ∈ S}.

For example, the collection of singleton subsets of {1, 2, 3, 4, 5} is

{{1}, {2}, {3}, {4}, {5}}.



The whole set S itself gives a partition with just one part!

P := {S }.

These two examples are considered the trivial partitions of a set.

Example 3.3.4. The translates of an ideal of a ring form a partition of the
ring. For the example we saw in the last section, we have that

{7Z, 1 + 7Z, 2 + 7Z, 3 + 7Z, 4 + 7Z, 5 + 7Z, 6 + 7Z}

is a partition of Z with seven parts, and each part is an infinite subset.

To prove a collection P of subsets of a set S is a partition:

- We will show that the union of P is S. Let s ∈ S. We will find a part P
of P such that s ∈ P. . . .

- We will show that any two different elements P1 and P2 of P are
disjoint, by using the contrapositive. Suppose P1 ∩ P2 is nonempty, that
is, there is an element x ∈ P1 ∩ P2. We will show that P1 = P2 (i.e.,
P1 ⊆ P2 and P2 ⊆ P1). . . .

(Margin note: Recall that the “disjointness” part of the definition of a partition
requires us to prove that if P1 ≠ P2, then P1 ∩ P2 = ∅. The contrapositive of this
statement is equivalent: If P1 ∩ P2 ≠ ∅, then P1 = P2. I have found it is easier to
prove the contrapositive statement than the original one! You will see when you try it.)

Figure 3.1: An ‘artist’s’ impression of the partition of Z into the translates of 7Z.

Now we will look at particular relations known as equivalence relations.
But first, some examples and some jargon. Let ∼ be a relation on a set X.
Then:

∼ is reflexive if: x ∼ x for all x ∈ X;

∼ is symmetric if: x ∼ y implies y ∼ x (for all x, y ∈ X);

∼ is transitive if: x ∼ y and y ∼ z implies x ∼ z (for all x, y, z ∈ X).

Example 3.3.5.

• “friendship” is an equivalence relation (at least it should be!), since
everyone is a friend of themselves (reflexivity), if I’m a friend of you then
you’re a friend of me (symmetry), and a friend of a friend is a friend
(transitivity).

• “<” is not an equivalence relation on R as it is not reflexive.

• “≤” is not an equivalence relation on R as it is not symmetric (but it is
reflexive and transitive!).

• By Lemma 2.5.4, ≡n is an equivalence relation.

• Equality is an equivalence relation.
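On a finite set, the three properties can be tested mechanically. Here is a small sketch (the function names and the choice of test window are assumptions made for illustration) that runs the numerical relations from the list above over the integers from −6 to 6. Passing on a finite window is only a sanity check, not a proof for all of R or Z.

```python
def is_reflexive(rel, xs):
    return all(rel(x, x) for x in xs)


def is_symmetric(rel, xs):
    return all(rel(y, x) for x in xs for y in xs if rel(x, y))


def is_transitive(rel, xs):
    return all(rel(x, z)
               for x in xs for y in xs for z in xs
               if rel(x, y) and rel(y, z))


xs = range(-6, 7)
relations = {
    "<": lambda x, y: x < y,
    "<=": lambda x, y: x <= y,
    "congruent mod 3": lambda x, y: (x - y) % 3 == 0,
    "=": lambda x, y: x == y,
}
for name, rel in relations.items():
    print(name, is_reflexive(rel, xs), is_symmetric(rel, xs), is_transitive(rel, xs))
```

Only the last two relations pass all three tests, in agreement with the examples.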

Definition 3.3.6 (Equivalence classes). Let ∼ be an equivalence relation on a
set X. Then for each x ∈ X, the equivalence class containing x is the set

[x] := {y ∈ X : y ∼ x}.

Example 3.3.7.

• For the “friendship” equivalence relation, the equivalence class
containing you is the set of all your friends.

• For congruence modulo 7, we have the following congruence classes:

[0] = 7Z
[1] = 1 + 7Z
 ⋮
[6] = 6 + 7Z

Notice in this example that we could write [7] for [0], we could write
[−13] for [1]. There are many ways to write down any equivalence class.

Example 3.3.8 (Spokespeople of soccer teams). Suppose we
have a soccer competition with 12 teams. So the set S of players in the
competition is split up into 12 teams, or in other words, the teams are a
partition of S. Each team has a spokesperson who does nothing more than
say “we’ll take one week at a time” and “the boys tried really hard today”.
To identify a team you could just identify the spokesperson. For example,
“Tom Richards’ team” is the same as saying “the Numbats”. But one day,
Tom Richards is deposed as spokesperson, and another person from
his team, Bill Bates, is put in his place. It doesn’t matter much because this
spokesperson can also recite the same things that Tom says every week. So
to identify “the Numbats” I could also say “Bill Bates’ team”.

The tale above is about representatives of an equivalence class. The
equivalence relation here is “being in the same team”. So we see that a
partition gives rise to an equivalence relation, and vice versa. In the
mathematical example we saw above, the congruence modulo 7 relation gives us
a partition of Z. This phenomenon is one of the most important concepts in
pure mathematics.

Theorem 3.3.9 (The Equivalence Relation Theorem). The equivalence
classes of an equivalence relation on a set X form a partition of X.
Conversely, a partition P of X yields an equivalence relation on X defined by
x1 ∼P x2 if and only if x1 and x2 belong to the same part of P.

 Proof: To be done in lectures. 

Lemma 3.3.10. Let ∼ be an equivalence relation on a set X, and let x, y ∈ X.
Then the following are equivalent:

(i) x ∼ y

(ii) x ∈ [y]

(iii) [ x] = [y].

 Proof: We will show that (iii) =⇒ (ii) =⇒ (i) =⇒ (iii).

(iii) =⇒ (ii): Suppose [ x] = [y]. By reflexivity, x ∈ [ x] and so x ∈ [y].

(ii) =⇒ (i): Suppose x ∈ [y]. Then, by definition of [y], we have x ∼ y.



(i) =⇒ (iii): Suppose x ∼ y. In order to show that [x] = [y], we need
to show that [x] ⊆ [y] and [y] ⊆ [x]. Let z ∈ [x]. Then z ∼ x, and so by
transitivity, z ∼ y. Therefore, z ∈ [y] and [x] ⊆ [y]. Conversely, if w ∈ [y],
we have w ∼ y. By symmetry, y ∼ x and then by transitivity w ∼ x.
Therefore, w ∈ [x] and [y] ⊆ [x]. Thus, [x] = [y].


Definition 3.3.11 (Quotient). The set of all equivalence classes of an
equivalence relation ∼ on a set X is called the quotient of X by ∼.
We will see a classical example of a quotient in the next section.
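For a finite set, the quotient can be computed by sorting elements into classes one at a time. The sketch below is an illustration only (the function name `quotient` is an assumption). By Lemma 3.3.10, comparing a new element with a single representative of each class is enough, provided the relation really is an equivalence relation.

```python
def quotient(xs, rel):
    """Group the elements of xs into the equivalence classes of rel.

    Assumes rel is an equivalence relation on xs, so each element only
    needs to be compared with one representative per class.
    """
    classes = []
    for x in xs:
        for cls in classes:
            if rel(x, cls[0]):
                cls.append(x)
                break
        else:                      # no existing class matched: start a new one
            classes.append([x])
    return classes


window = range(-7, 15)
for cls in quotient(window, lambda a, b: (a - b) % 7 == 0):
    print(cls)
```

Run on this window with congruence modulo 7, it prints seven classes, each a finite slice of one of the translates of 7Z.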

3.4 A construction of the rational numbers

Let X be equal to the Cartesian product Z × (Z \ {0}). This is the set of
all ordered pairs (a, b) of integers where b ≠ 0. Let ∼ be the relation on X
defined by

(a, b) ∼ (a′, b′) ⇐⇒ ab′ = a′b.

It turns out that ∼ is an equivalence relation:

Reflexivity: Let (a, b) ∈ X. Then ab is certainly equal to itself, and hence
(a, b) ∼ (a, b).

Symmetry: Let (a, b), (a′, b′) ∈ X and suppose that (a, b) ∼ (a′, b′). Then
ab′ = a′b. Since ‘=’ is symmetric, we know that a′b = ab′. Therefore,
(a′, b′) ∼ (a, b).

Transitivity: Let (a, b), (a′, b′), (a″, b″) ∈ X and suppose that (a, b) ∼
(a′, b′) and (a′, b′) ∼ (a″, b″). Then ab′ = a′b and a′b″ = a″b′. Now
we can multiply the first of these equations through by b″ and obtain

ab′b″ = a′bb″.

We can then use the second equation to substitute a″b′ in for a′b″:

ab′b″ = a′bb″ = (a′b″)b = (a″b′)b.

Since b′ is nonzero, we can cancel it from each side:

ab″ = a″b.

Therefore, (a, b) ∼ (a″, b″).


So all three properties of an equivalence relation are satisfied. It will be
interesting now to compute a few equivalence classes.

[(1, 1)] = {(a, b) ∈ X : a = b} = {(a, a) : a ∈ Z \ {0}}
[(1, 2)] = {(a, b) ∈ X : 2a = b} = {(a, 2a) : a ∈ Z \ {0}}
[(3, 5)] = {(a, b) ∈ X : 5a = 3b} = {(3a, 5a) : a ∈ Z \ {0}}

Perhaps we could write down a convenient shorthand for an equivalence
class [(a, b)]? How about

[(a, b)] ⟶ a/b.

Voilà! We see that the rational numbers are nothing else but the quotient of
X by the equivalence relation ∼.

3.4.1 Addition and multiplication on the rationals


In primary school we learnt how to add fractions:

a/b + c/d := (ad + bc)/(bd).

We can see by our construction in the previous section what this looks like
in terms of an operation on equivalence classes:

[(a, b)] + [(c, d)] := [(ad + bc, bd)].

Likewise for multiplication:

[(a, b)] × [(c, d )] := [(ac, bd )].
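Because the operations are defined on classes via representatives, they only make sense if the answer does not depend on which representatives are chosen. The following sketch (function names are illustrative, and pairs are plain Python tuples) adds and multiplies two different representatives of 1/2 and 1/3 and checks that the results are equivalent.

```python
def equivalent(p, q):
    """(a, b) ~ (c, d) exactly when a*d == c*b."""
    (a, b), (c, d) = p, q
    return a * d == c * b


def add(p, q):
    (a, b), (c, d) = p, q
    return (a * d + b * c, b * d)


def mul(p, q):
    (a, b), (c, d) = p, q
    return (a * c, b * d)


half, also_half = (1, 2), (-3, -6)      # two representatives of the same class
third, also_third = (1, 3), (5, 15)

print(add(half, third), add(also_half, also_third))
print(equivalent(add(half, third), add(also_half, also_third)))
print(equivalent(mul(half, third), mul(also_half, also_third)))
```

The two sums are different pairs, (5, 6) and (−75, −90), but they lie in the same class, which is what "well-defined" asks for.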

3.4.2 The field of fractions of a ring

The construction of the rationals in the previous section is an example of the
construction of the field of fractions of a ring R.

Definition 3.4.1 (Field of fractions). Let R be a ring that has no
zero-divisors [5]. Let X = R × (R \ {0}) and let ∼ be the equivalence relation defined
by

(a, b) ∼ (a′, b′) ⇐⇒ ab′ = a′b.

Define Frac(R) to be the set of equivalence classes of ∼ equipped with the
following operations:

Addition: [(a, b)] + [(c, d)] := [(ad + bc, bd)],

Multiplication: [(a, b)] × [(c, d)] := [(ac, bd)].

[5] A zero-divisor of a ring R is a nonzero element x such that there exists a
nonzero element y such that x · y = 0. For example, 2 is a zero-divisor of Z6,
since 2 ⊗6 3 = 0.

It turns out (see Exercise 3.15.2) that the operations of addition and
multiplication on Frac(R) are well-defined and that Frac(R) is in fact a
field! The multiplicative identity is [(1, 1)], where 1 is the multiplicative
identity of R, and a non-zero element [(a, b)] has multiplicative inverse
[(b, a)]. Notice that a and b need not have multiplicative inverses in R! So
we saw that the rational numbers are realised as Frac(Z), where we make a
shorthand notation for the element [(a, b)] by writing a fraction a/b. Here are
some other examples.

Example 3.4.2.

• If R is a field, then Frac(R) gives us a field that is essentially the same as
the original field R. So Frac(Q) is the same as Q, and the field of fractions
of the real numbers is the real numbers again.

• Let R[x] be the set of polynomials with real coefficients. Then Frac(R[x])
is the set of rational functions.

• We will see in the last chapter another interesting construction, whereby
the p-adic rationals are the field of fractions of the p-adic integers.

3.5 Geometric things as quotients

Let I be the unit interval [0, 1]. These are the real numbers x satisfying
0 ≤ x ≤ 1. Now take the Cartesian product I × I. We can see this geometrically
in the Cartesian plane as the square with corners (0, 0), (1, 0), (1, 1) and (0, 1).
Consider the following relation ∼ on I × I:

(x, y) ∼ (x′, y′) ⇐⇒ y = y′ and (x = x′ or |x − x′| = 1).

It turns out that ∼ is an equivalence relation! Let’s see what it means
geometrically. We see that if we take a point on the left-hand side of the square,
then it is equivalent to a point on the right-hand side of the square at the
same height. So what we are doing is identifying the sides of the square.
We can think of curling the square until its sides meet; we end
up with a cylinder! So geometrically, we can model the quotient of I × I by
∼ with a cylinder.
We can also model some other surfaces with equivalence relations, and
this will be explored in the Appendix.

3.6 Polynomials are like numbers

We have seen polynomials in high school, such as

x^3 + 3x^2 + 1

and the emphasis there was to understand their sets of values, to draw
graphs of these values and find out when they are zero. We will not be so
interested in their values; rather, we will be interested in polynomials as
objects themselves, in their entirety. This is what we have been doing with
numbers; we are not interested in single numbers on their own, but rather in
the properties of numbers in general and how they interact.
Definition 3.6.1 (Polynomial). A polynomial over a ring R is an expression

a_n x^n + a_{n−1} x^{n−1} + · · · + a_1 x + a_0

where the a_i are elements of R, called the coefficients of the polynomial.

So in other words, the powers of x are essentially place holders for the
coefficients. We could have easily defined polynomials as (n + 1)-tuples
(a_n, a_{n−1}, . . . , a_1, a_0), but as we shall see later, there are some advantages in
having easily digestible notation for polynomials.
Example 3.6.2.
Constant polynomials: We will use the functional notation a for the
constant function x ↦ a, where a is a fixed element of a given ring. We will
also use this notation for the constant polynomial.

Polynomials of matrices: Recall that the set of n × n matrices over a ring
R form another ring Mn×n(R). We can then consider polynomials of
matrices. For example, the characteristic polynomial of

[  2  0 ]
[ −1  1 ]

is x^2 − 3x + 2.

Now we can add polynomials simply by adding corresponding coefficients,
and multiplying polynomials is just like you did in high school. For
example,

(x^2 + 3)(x + 1) = x^3 + x^2 + 3x + 3.

So the polynomials have an addition operation and a multiplication
operation. We also have a zero polynomial, and multiplying by the constant
polynomial 1 does not change a polynomial.
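Treating a polynomial as its tuple of coefficients makes both operations easy to program. In the sketch below (an illustration, not the notes' own notation) a polynomial is a Python list `[a_0, a_1, ..., a_n]` with the constant term first, and the zero polynomial is the empty list; the function names are assumptions.

```python
def trim(f):
    """Drop trailing zero coefficients, so the zero polynomial becomes []."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def poly_add(f, g):
    """Add corresponding coefficients, padding the shorter list with zeros."""
    n = max(len(f), len(g))
    f = list(f) + [0] * (n - len(f))
    g = list(g) + [0] * (n - len(g))
    return trim([a + b for a, b in zip(f, g)])


def poly_mul(f, g):
    """The coefficient of x^(i+j) collects every product a_i * b_j."""
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return trim(out)


print(poly_mul([3, 0, 1], [1, 1]))   # (x^2 + 3)(x + 1) -> [3, 3, 1, 1]
print(poly_mul([3, 0, 1], [1]))      # multiplying by the constant polynomial 1
```

Reading `[3, 3, 1, 1]` from the right gives x^3 + x^2 + 3x + 3, matching the product computed by hand.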

Definition 3.6.3 (Polynomial ring). The set of all polynomials R[x] over R
is a ring, called the polynomial ring of R.

Here are some technical things we need to care about, to avoid
confusion. The largest number i such that a_i ≠ 0 is called the degree of the
polynomial f = a_n x^n + a_{n−1} x^{n−1} + · · · + a_1 x + a_0 and the shorthand
notation is deg(f). If a_i = 0, we ignore writing this term of the polynomial
down. We will also write x^i when the coefficient is 1.

As a convention, the degree of the zero polynomial is −1, whereas it
is 0 for all other constant polynomials.

Example 3.6.4. Let R be a ring and let P be the set of elements of R[x] that
have degree 0 or −1. Then P is just like R! The function

a_0 ↦ a_0

is a bijection from P to R. Not only is it a bijection, but it is compatible with
the operations of each ring [6].

[6] Aside: A function f from a ring R1 to a ring R2 is a homomorphism if for all
r, r′ ∈ R1, we have
• f(r +R1 r′) = f(r) +R2 f(r′),
• f(r ·R1 r′) = f(r) ·R2 f(r′).
If f is also a bijection, then we say that R1 and R2 are isomorphic (Greek: same
shape), which means that R1 and R2 are essentially the same up to re-labelling.

Earlier, we asked the question of which elements of Zn had multiplicative
inverses (i.e., units). Similarly, some polynomial rings have nonzero
elements which do not have multiplicative inverses.

Example 3.6.5. Recall that in Z6, the elements 1 and 5 are units since
1 ⊗6 1 = 1 and 5 ⊗6 5 = 1. However, none of the other elements are units! For
example, there is no element b ∈ Z6 such that 2 ⊗6 b = 1, and so 2 is not a
unit of Z6.

By Lemma 3.1.8, x ∈ Zn is a unit if and only if gcd(x, n) = 1. What
happens for polynomials?

Example 3.6.6. Consider the degree 1 polynomial x in R[x]. Is it a unit?
If it was a unit, then there would exist a nonzero polynomial g such that
x · g = 1. Let us suppose that [7]

g := a_n x^n + a_{n−1} x^{n−1} + · · · + a_1 x + a_0,

where a_n ≠ 0. Then x · g would be

a_n x^{n+1} + a_{n−1} x^n + · · · + a_1 x^2 + a_0 x

and so would have degree n + 1. Since 1 has degree 0, it follows that
n = −1 and so g is the zero polynomial, which is a contradiction! Therefore, x is
not a unit in R[x].

[7] It is wrong to divide by x here! I have seen before in this course that students
feel compelled to work with this equation as it is presented to them. So they will
often write “thus g = 1/x”. This assumes that we can do division in R[x], and the
point of this example is that we can’t.

The main theme of this section is to explore number-theoretic analogues
for polynomial rings. It is remarkable how number-like the polynomials
behave. The fundamental relation that we need in order to understand the
multiplicative structure of a polynomial ring is divisibility, as was the case
for the integers in order to establish important properties such as Bézout’s
identity and the canonical factorisation into products of primes.

Definition 3.6.7 (“Divides” relation). Let g and h be two polynomials in
R[x], where R is a ring. We say that g divides h, and write g | h, if there
exists a polynomial k ∈ R[x] such that

gk = h.

Example 3.6.8. Consider x + 1, x^2 − 1 in Q[x]. Since (x + 1)(x − 1) =
x^2 − 1, we see that x + 1 | x^2 − 1.

3.6.1 Long division


One of the most important things we learn, or should be learning, in school
is long division. There are a number of reasons for this: (i) it is the only
true mathematical algorithm we learn in school, (ii) long division works
for the integers and not the real numbers (because of the Division Rule,
Lemma 2.1.4), (iii) to get a grip on what a rational number is as a decimal
expansion.
I will not revise long division of numbers here, but rather show how it is
done with polynomials, by example.

Example 3.6.9. We would like to divide x^4 + 4x^3 − x + 5 by x^2 + 2x − 1 to
find a remainder and quotient. (Write this out in the box below).

Lemma 3.6.10 (The Division Rule for polynomials). Let f, g ∈ F[x] where
F is a field [8] and g is nonzero. Then there exist elements q, r ∈ F[x] satisfying

f = qg + r and deg(r) < deg(g).

[8] Why is a field necessary and why does it not work for rings? See Exercise 3.15.3.

? Proof: The proof essentially is an application of long division. See the


lectures. 
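Long division of polynomials is an algorithm, so it can be written as a short program. The sketch below is one possible rendering, under stated assumptions: coefficients are rational (Python's `Fraction`), polynomials are lists with the constant term first, and the names `divmod_poly`, `degree` and `trim` are invented here. Each pass of the loop kills the leading term of the running remainder, which is exactly one step of long division, and the division `r[-1] / g[-1]` is where a field is needed.

```python
from fractions import Fraction


def trim(f):
    """Drop trailing zero coefficients, so the zero polynomial becomes []."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def degree(f):
    """Degree of a trimmed coefficient list; -1 for the zero polynomial."""
    return len(f) - 1


def divmod_poly(f, g):
    """Return (q, r) with f = q*g + r and deg(r) < deg(g), over the rationals."""
    r = trim([Fraction(a) for a in f])
    g = trim([Fraction(b) for b in g])
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 0)
    while degree(r) >= degree(g):
        shift = degree(r) - degree(g)
        c = r[-1] / g[-1]               # requires division of coefficients
        q[shift] = c
        for i, b in enumerate(g):
            r[i + shift] -= c * b       # subtract c * x^shift * g
        r = trim(r)
    return trim(q), r


# x^4 + 4x^3 - x + 5 divided by x^2 + 2x - 1
q, r = divmod_poly([5, -1, 0, 4, 1], [-1, 2, 1])
print([str(c) for c in q], [str(c) for c in r])
```

This can be used to check a hand computation for Example 3.6.9: the printed lists are the coefficients of the quotient and the remainder, constant term first.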

A polynomial is monic if its leading coefficient is 1.

Definition 3.6.11 (Greatest common divisor for polynomials). The greatest
common divisor gcd(f, g) of two polynomials f, g of a polynomial ring
F[x] (where F is a field) is the (unique) monic polynomial d ∈ F[x] such
that:

(i) d | f, d | g;

(ii) if h | f and h | g, then h | d.

The uniqueness part of the definition follows from Lemma 3.6.10.

Example 3.6.12. Let F = Q and consider f = x^3 + x − 2 and g = x^4 − 1.
Then gcd(f, g) = x − 1.

How did we find the greatest common divisor of f and g in the last
example? Well, the Euclidean Algorithm works for polynomials just as it
did for integers.

Lemma 3.6.13 (Behind Euclid’s Algorithm for polynomials). Let F be a
field and let f and g be elements of F[x]. If f = qg + r then gcd(f, g) =
gcd(r, g).

 Proof: The proof is similar to its analogue, Lemma 2.1.7. Suppose


d is a divisor of both f and g. Then d divides f − qg and so d divides
both r and g. Therefore every divisor of both f and g is also a divisor of r.
Conversely, suppose d is a divisor of both g and r. Then d divides qg + r
and hence d divides f . Therefore every divisor of both g and r is also a
divisor of f . Moreover, the monic polynomials that divide f and g are the
same set of monic polynomials that divide r and g. Therefore, gcd( f , g) =
gcd(r, g). 

Example 3.6.14. We will now use the Euclidean Algorithm, but with
polynomials as input, to find the greatest common divisor of f = x^3 + x − 2 and
g = x^4 − 1. (We implicitly use long division throughout.)

                    x^4 − 1
x                   x^3 + x − 2
−x − 2              −x^2 + 2x − 1
−(1/4)x + 1/4       4x − 4
                    0

(Margin note: Long division establishes that
x^4 − 1 = x(x^3 + x − 2) + (−x^2 + 2x − 1).
For the next step, we use long division again:
x^3 + x − 2 = (−x − 2)(−x^2 + 2x − 1) + (4x − 4).
Then the last step is easy:
−x^2 + 2x − 1 = (−(1/4)x + 1/4)(4x − 4).)

So the last nonzero remainder here is 4x − 4, which we can simply scale by 1/4
to obtain a monic polynomial dividing both f and g. That is, gcd(f, g) =
x − 1.
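The same computation can be carried out by machine. This is a sketch under assumptions, not the notes' own method of presentation: polynomials are coefficient lists with the constant term first, arithmetic is over the rationals via `Fraction`, and `poly_mod` and `poly_gcd` are names chosen here. The loop repeatedly replaces the pair (f, g) by (g, remainder of f on division by g), which Lemma 3.6.13 justifies, and the final answer is scaled to be monic.

```python
from fractions import Fraction


def trim(f):
    """Drop trailing zero coefficients, so the zero polynomial becomes []."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def poly_mod(f, g):
    """Remainder of f on division by a nonzero polynomial g."""
    r = trim([Fraction(a) for a in f])
    g = trim([Fraction(b) for b in g])
    while len(r) >= len(g):
        c, shift = r[-1] / g[-1], len(r) - len(g)
        for i, b in enumerate(g):
            r[i + shift] -= c * b
        r = trim(r)
    return r


def poly_gcd(f, g):
    """Monic gcd of f and g (not both zero) by the Euclidean Algorithm."""
    f, g = trim(f), trim(g)
    while g:
        f, g = g, poly_mod(f, g)
    return [Fraction(a) / f[-1] for a in f]   # scale to make the result monic


f = [-2, 1, 0, 1]        # x^3 + x - 2
g = [-1, 0, 0, 0, 1]     # x^4 - 1
print([str(c) for c in poly_gcd(f, g)])   # ['-1', '1'], i.e. x - 1
```

Swapping `Fraction` for plain integers would break the step `r[-1] / g[-1]`, which is one way to see why the coefficients need to come from a field.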

In the last example, we saw rational numbers that are not integers ap-
pearing in the left-hand column. This is why it is necessary for the poly-
nomial ring to be defined over a field. It would not ‘work’ if we used Z
instead of Q in the last example.
A consequence of the Euclidean Algorithm for polynomials is a Bézout
identity for polynomials.

Lemma 3.6.15 (Bézout for polynomials). Let f, g be two polynomials in
F[x]. Then there exist two polynomials m, n ∈ F[x] such that

gcd(f, g) = m · f + n · g.

The next theorem gives us a way to easily check for affine divisors of
polynomials.
Theorem 3.6.16 (The Factor Theorem). Let f ∈ F [ x] and let c ∈ F. Then
f (c) = 0 if and only if the polynomial x − c divides f .
 Proof: To be done in lectures. 

Corollary 3.6.17. If f ∈ F[x] has degree d, then f has at most d zeroes in F.
 Proof: To be done in lectures. 

3.7 Irreducibility and factorisation

Throughout this section, F will be a field. The irreducible polynomials are
to F[x] what prime numbers are to Z.

Definition 3.7.1 (Irreducible polynomial). A non-constant polynomial p ∈ F[x]
whose only divisors are constant polynomials and constant multiples of itself is
said to be irreducible (over F).

Example 3.7.2. It is important what the field is that we are referring to.
For example, x^2 − 2 is irreducible when considered as an element of Q[x],
however, it is reducible if we consider it as an element of R[x]:

x^2 − 2 = (x − √2)(x + √2).

Lemma 3.7.3 (Euclid’s Lemma for polynomials). Let p ∈ F[x] be
irreducible over F and let f, g ∈ F[x]. If p | f · g then p | f or p | g.
? Proof: Suppose p divides f · g but p does not divide f . Then gcd( p, f ) is
the constant polynomial 1, since gcd( p, f ) is a monic polynomial properly
dividing the irreducible polynomial p. So by Lemma 3.6.15 there exist
polynomials m, n ∈ F [ x] such that

1 = m · f + n · p.

If we multiply through by g, we see that

g = m · ( f · g) + n · p · g

and we know that p divides the bracketed term above. So p divides the
right-hand side and hence p divides g. Therefore, if p does not divide f,
then p divides g (which is logically equivalent to proving “p divides f or p
divides g”).

Theorem 3.7.4 (Fundamental Theorem of Polynomial Arithmetic). The
factorisation of a polynomial in F[x] into irreducible factors is unique up to
ordering and constant factors.

We will not prove this result as it is almost identical to the Fundamental
Theorem of Arithmetic, with just polynomials put in place of integers and
“irreducible” put in place of “prime” (Theorem 2.2.2).

3.8 Gauß’s Lemma

Here we look at when polynomials are irreducible over the integers.

Definition 3.8.1 (Content of a polynomial). The content I(f) of a
polynomial f ∈ Z[x] is the greatest common divisor of its coefficients.

Example 3.8.2. So I(30x^3 − 18x^2 + 9) = 3.

Theorem 3.8.3 (The Content Theorem). If f, g ∈ Z[x], then I(f) I(g) =
I(f · g).

Proof: Let f′ = f/I(f) and g′ = g/I(g). Notice that these new
polynomials both have integer coefficients, because the gcd of the
coefficients divides each coefficient. Moreover, I(f′) = I(g′) = 1. Now
f · g = f′ · g′ I(f) I(g), and so

I(f · g) = I(f′ · g′) I(f) I(g).

So it suffices to show that I(f′ · g′) = 1. Suppose f′ = a_m x^m + · · · + a_1 x +
a_0 and g′ = b_n x^n + · · · + b_1 x + b_0, and suppose for a proof by contradiction
that there is some prime number p dividing I(f′ · g′). That is, p divides
each coefficient of f′ · g′. Now the contents of f′ and g′ are both equal to
1, so there is a least i and a least j such that p does not divide a_i and p does
not divide b_j. By assumption, the coefficient of x^{i+j} in f′ · g′ is divisible by
p, and this coefficient is

a_0 b_{i+j} + a_1 b_{i+j−1} + · · · + a_i b_j + a_{i+1} b_{j−1} + · · · + a_{i+j} b_0.

This is a contradiction since p divides all terms above, except a_i b_j.
Therefore, I(f′ · g′) = 1.
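A numerical spot check of the Content Theorem is straightforward. The sketch below is illustrative only: the function names and the second polynomial g = 6x^2 + 10x + 4 are choices made here, not taken from the notes. Polynomials are integer coefficient lists with the constant term first.

```python
from functools import reduce
from math import gcd


def content(f):
    """gcd of the coefficients of an integer polynomial (coefficient list)."""
    return reduce(gcd, f, 0)


def poly_mul(f, g):
    """Product of two nonzero polynomials given as coefficient lists."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


f = [9, 0, -18, 30]      # 30x^3 - 18x^2 + 9, content 3
g = [4, 10, 6]           # 6x^2 + 10x + 4, content 2
print(content(f), content(g), content(poly_mul(f, g)))   # 3 2 6
```

The content of the product is 6 = 3 · 2, as the theorem predicts for this pair.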

Lemma 3.8.4 (Gauß’s Lemma). Let g and h be two monic polynomials
in Q[x]. If g · h has integer coefficients, then so too have g and h. That is,
g · h ∈ Z[x] =⇒ g, h ∈ Z[x].

? Proof: To be done in lectures. 

So this means that if a polynomial with integer coefficients is reducible
over Q, then it is also reducible over Z. Moreover, the factorisation of a
polynomial over Z into irreducibles is unique up to order and signs
(multiplying by −1).

3.9 Eisenstein’s Criterion

There is an industry of “computational number theory” behind finding ways
to determine if an integer is a prime number or not, and it has applications
to cryptography and information security. One of the most basic “primality
tests” comes from Fermat’s Little Theorem: to determine if a number n is
prime, we choose random integers a satisfying 1 ≤ a < n and test to see if
a^{n−1} ≡n 1 holds. In this section, we look at one of the most basic tests for
the irreducibility of a polynomial.

Theorem 3.9.1 (Eisenstein’s Irreducibility Criterion). If f = x^n +
a_{n−1} x^{n−1} + · · · + a_1 x + a_0 ∈ Z[x] and p is a prime number dividing
each coefficient a_i where i ∈ {0, 1, . . . , n − 1}, but p^2 does not divide a_0, then
f is irreducible over Q.

 Proof: To be done in lectures. 

Example 3.9.2. The polynomial x^4 + 3x^2 + 15x + 6 is irreducible over
Q, since 3 divides each non-leading coefficient, but 9 does not divide the
constant coefficient.
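Checking the hypotheses of Eisenstein's Criterion is pure arithmetic on the coefficients. The function below is a sketch with an invented name; it takes a monic integer polynomial of degree at least 1 as a coefficient list with the constant term first, and it assumes the caller supplies a prime p. A result of `False` only means the criterion says nothing for that p; it does not show the polynomial is reducible.

```python
def eisenstein(coeffs, p):
    """Check the hypotheses of Eisenstein's Criterion for the prime p.

    coeffs = [a_0, a_1, ..., a_{n-1}, 1], constant term first.
    """
    *lower, leading = coeffs
    return (leading == 1
            and all(a % p == 0 for a in lower)   # p divides every non-leading coefficient
            and lower[0] % (p * p) != 0)         # p^2 does not divide the constant term


print(eisenstein([6, 15, 3, 0, 1], 3))   # x^4 + 3x^2 + 15x + 6 with p = 3 -> True
print(eisenstein([6, 15, 3, 0, 1], 2))   # p = 2 does not divide 15 -> False
```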

Example 3.9.3. It is not difficult to prove that x^3 − 3x − 1 is irreducible
over Q by noticing that if it was reducible, then it would have a linear
factor, and then we simply observe that there are no rational roots of this
polynomial, and then we apply the Factor Theorem (3.6.16). There is
another way, and it involves substitution.

Lemma 3.9.4. Let f be a polynomial in Z[x] of the form

f = x^n + a_{n−1} x^{n−1} + · · · + a_1 x + a_0

and let g be another polynomial in Z[x] of the form g = x^m + b_{m−1} x^{m−1} +
· · · + b_1 x + b_0. Let f ◦ g be the polynomial where we have substituted g for
the indeterminate x of f, to obtain another monic polynomial of Z[x] of
degree mn. If f ◦ g is irreducible over Q, then so too is f.

 Proof: We will prove the contrapositive. Suppose f is reducible over Q.


Then f = a · b for two non-constant polynomials a, b ∈ Q[x]. We see then that f ◦ g
factorises as the product of a ◦ g and b ◦ g, and hence f ◦ g is reducible over Q.

Example 3.9.5. Let f = x^3 − 3x − 1. We substitute g = x + 1 to create a
polynomial f ◦ g that we can apply Eisenstein’s Criterion to:

f ◦ g = (x + 1)^3 − 3(x + 1) − 1
      = x^3 + 3x^2 + 3x + 1 − 3x − 3 − 1
      = x^3 + 3x^2 − 3.

So we see that 3 divides the non-leading coefficients and 9 does not divide
the constant coefficient. Therefore, by Eisenstein’s Criterion (Theorem
3.9.1), f ◦ g is irreducible over Q, and so by Lemma 3.9.4, f is also
irreducible over Q.
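The substitution trick can be replayed numerically. In this sketch (names invented for illustration; coefficient lists have the constant term first) `shift` expands f(x + c) with the binomial theorem, and `eisenstein` tests the hypotheses of the criterion for a given prime.

```python
from math import comb


def shift(coeffs, c):
    """Coefficients of f(x + c), given the coefficients of f."""
    out = [0] * len(coeffs)
    for k, a in enumerate(coeffs):
        # expand a * (x + c)^k with the binomial theorem
        for i in range(k + 1):
            out[i] += a * comb(k, i) * c ** (k - i)
    return out


def eisenstein(coeffs, p):
    """Hypotheses of Eisenstein's Criterion for a monic polynomial and a prime p."""
    *lower, leading = coeffs
    return (leading == 1
            and all(a % p == 0 for a in lower)
            and lower[0] % (p * p) != 0)


f = [-1, -3, 0, 1]                  # x^3 - 3x - 1
print(eisenstein(f, 3))             # False: 3 does not divide the constant term
print(shift(f, 1))                  # [-3, 0, 3, 1], i.e. x^3 + 3x^2 - 3
print(eisenstein(shift(f, 1), 3))   # True
```

Trying a few small values of c in `shift(f, c)` is a cheap way to hunt for a substitution that makes the criterion applicable, although there is no guarantee that one exists.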

3.10 Clock arithmetic on polynomials

So far, we have seen that polynomials behave a little bit like numbers:
we have a Division Rule for F [ x], we have the greatest common divisor
function and the Euclidean Algorithm, and we have a notion of prime
numbers (i.e., the irreducible polynomials). We will now look at clock
arithmetic on R[ x] and an analogue of the ring Zn of integers modulo n.

Definition 3.10.1 (Congruence modulo a polynomial). Let p be a
polynomial in R[x]. Define ≡p on R[x] by

f ≡p g ⇐⇒ p | f − g.

Lemma 3.10.2. ≡ p is an equivalence relation on R[ x].



 Proof:

Reflexivity: Let f ∈ R[x]. Clearly, p divides the zero polynomial, and so
f ≡p f.

Symmetry: Suppose f ≡p g. Then p divides f − g, and hence p divides
−(f − g) = g − f. Therefore, g ≡p f.

Transitivity: Suppose f ≡p g and g ≡p h. Then p divides f − g and p
divides g − h. So p divides the sum of f − g and g − h, and hence p
divides f − h. Therefore, f ≡p h.

Recall that the quotient of R[x] by ≡p is the set of equivalence classes
of ≡p, the congruence classes modulo p. We will write R[x]/≡p for this
quotient. We can also define addition and multiplication on R[x]/≡p in the
following natural way:

Definition 3.10.3 (Addition and multiplication on R[x]/≡p). For all
polynomials a, a′ ∈ R[x], let:

Addition: [a] ⊕ [a′] := [a + a′];

Multiplication: [a] ⊗ [a′] := [aa′].

We leave it as an exercise (see the Exercises at the end of this chapter)
that the addition and multiplication operations defined above are
well-defined. Moreover,

Lemma 3.10.4. R[x]/≡p with the operations of addition and
multiplication defined above (3.10.3) forms a ring.

The proof of this result is simple, but perhaps a little tedious, so we leave
it to the reader to verify that it is a true statement.

Example 3.10.5. It is important now to give an example of how this all works, as we have probably presented the most abstract idea in this course: a quotient of a polynomial ring. Consider R[x], the ring of polynomials with real coefficients, and let p be the polynomial x^2 + 1. What does a congruence class modulo p look like?
Let's take the polynomial f := x^5 + 3x^2 − 2. When we do long division of f by p, we find that

x^5 + 3x^2 − 2 = (x^3 − x + 3) · p + x − 5.

In other words, we have a remainder of x − 5 when dividing f by p. So


f ≡ p x − 5, or in other words,

[ f ] = [ x − 5].
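The long division above can be checked mechanically. The following is a minimal sketch, assuming polynomials are stored as coefficient lists with the constant term first; the function name poly_divmod and this representation are illustrative choices, not notation from these notes.

```python
from fractions import Fraction

def poly_divmod(f, p):
    """Long division of f by p over Q.

    Polynomials are lists of coefficients, constant term first, and the
    last entry of p (its leading coefficient) is assumed to be nonzero.
    Returns (quotient, remainder) with deg(remainder) < deg(p).
    """
    r = [Fraction(c) for c in f]
    p = [Fraction(c) for c in p]
    q = [Fraction(0)] * max(len(r) - len(p) + 1, 1)
    while r and r[-1] == 0:          # drop zero leading coefficients of f
        r.pop()
    while len(r) >= len(p):
        shift = len(r) - len(p)
        coeff = r[-1] / p[-1]
        q[shift] = coeff
        for i, c in enumerate(p):
            r[i + shift] -= coeff * c   # cancels the leading term of r
        while r and r[-1] == 0:
            r.pop()
    return q, r

f = [-2, 0, 3, 0, 0, 1]   # x^5 + 3x^2 - 2
p = [1, 0, 1]             # x^2 + 1
quotient, remainder = poly_divmod(f, p)

# Reduce x^2 modulo p; the remainder represents the class of x^2.
_, x_squared_mod_p = poly_divmod([0, 0, 1], p)
```

Here quotient holds the coefficients of x^3 − x + 3 and remainder those of x − 5, so the remainder is a representative of the congruence class [f].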

In fact, we will always obtain a remainder that is a constant polynomial or


a linear polynomial, since the degree of the remainder has to be less than
the degree of p. So every congruence class modulo p is of the form

[ax + b]

for some a, b ∈ R. This quotient ring has a very interesting property. Consider the congruence class [x] and multiply it by itself:

[x] ⊗ [x] = [x^2].

Now p clearly divides the difference of x^2 and the constant polynomial −1, so x^2 ≡_p −1 and hence

[x] ⊗ [x] = [−1].

Does this look familiar? Yes, there is a natural bijection[9] between the quotient R[x]/≡_p and the complex numbers given by:

[ax + b] ↦ b + ai.

[9] You can check that this really does work. If you add [ax + b] and [a′x + b′], you get [(a + a′)x + (b + b′)], and their product is [aa′x^2 + (a′b + ab′)x + bb′]. But aa′x^2 + (a′b + ab′)x + bb′ ≡_p (a′b + ab′)x + bb′ − aa′, and so the product of [ax + b] and [a′x + b′] is what it would be if viewed as complex numbers.

Example 3.10.6 (A field with 4 elements). Consider the field Z_2 consisting of the two elements 0 and 1. This itself is a quotient ring of Z, where 0 represents the even numbers and 1 represents the odd numbers. Recall that Z_2 has the interesting property that 1 + 1 = 0. This makes the polynomial ring Z_2[x] particularly interesting. Consider all of the quadratic polynomials of Z_2[x]:

x^2, x^2 + 1, x^2 + x, x^2 + x + 1.

Clearly the first and third are reducible since x divides both of these. The
second one also turns out to be reducible:

(x + 1)(x + 1) = x^2 + x + x + 1 = x^2 + (1 + 1)x + 1 = x^2 + 0x + 1 = x^2 + 1.

Now consider the polynomial x^2 + x + 1. If we do long division by each of the degree 1 polynomials[10] of Z_2[x], we will find that:

x^2 + x + 1 = (x + 1)(x) + 1
x^2 + x + 1 = x(x + 1) + 1.

[10] The only degree 1 polynomials in Z_2[x] are x and x + 1.

The divisors are in the final pair of brackets in each line, and in both cases the remainder is 1, which is nonzero. So x^2 + x + 1 has no factor of degree 1 and is therefore irreducible. Call this polynomial p. We will now look at the quotient of Z_2[x] by ≡_p. Here are the equivalence classes of ≡_p:

[0], [1], [ x], [ x + 1]

Notice that every polynomial of Z_2[x] belongs to one of these equivalence classes: if we do long division by p we will always have a remainder that is zero or has degree 0 or 1, and the representatives of the equivalence classes above are precisely these polynomials. Note what happens when we add two equivalence classes:

[x] ⊕_p [x + 1] = [x + x + 1] = [(1 + 1)x + 1] = [0x + 1] = [1].

When we multiply, we must remember to take the remainder after dividing by p:

[x] ⊗_p [x + 1] = [x^2 + x] = [x^2 + x + 1 + 1] = [p + 1] = [1].

Let us now look at the addition and multiplication tables of Z_2[x]/≡_p:

⊕_p       [0]       [1]       [x]       [x + 1]
[0]       [0]       [1]       [x]       [x + 1]
[1]       [1]       [0]       [x + 1]   [x]
[x]       [x]       [x + 1]   [0]       [1]
[x + 1]   [x + 1]   [x]       [1]       [0]

⊗_p       [0]       [1]       [x]       [x + 1]
[0]       [0]       [0]       [0]       [0]
[1]       [0]       [1]       [x]       [x + 1]
[x]       [0]       [x]       [x + 1]   [1]
[x + 1]   [0]       [x + 1]   [1]       [x]

Notice that the multiplication table is very different from the table we got for Z_4 (see Example 3.1.5). First of all, if we delete the first row and column, we get a Latin square! In particular, [1] appears in every remaining row, so every nonzero element has a multiplicative inverse. In other words, Z_2[x]/≡_p is a field.
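The tables can be regenerated by a short computation. This is only a sketch under an assumed encoding: the class [ax + b] is stored as the pair (a, b), and the helper names (add, mul, is_latin_square) are hypothetical.

```python
# The class [a*x + b] in Z_2[x]/(x^2 + x + 1) is stored as the pair (a, b).
ELEMENTS = [(0, 0), (0, 1), (1, 0), (1, 1)]   # [0], [1], [x], [x + 1]

def add(u, v):
    # Add coefficients modulo 2.
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)

def mul(u, v):
    a, b = u
    c, d = v
    # (ax + b)(cx + d) = ac x^2 + (ad + bc) x + bd, and x^2 ≡ x + 1 modulo p.
    return ((a * d + b * c + a * c) % 2, (b * d + a * c) % 2)

def is_latin_square(rows):
    # Every symbol must appear exactly once in each row and each column.
    n = len(rows)
    return (all(len(set(r)) == n for r in rows)
            and all(len(set(c)) == n for c in zip(*rows)))

nonzero = ELEMENTS[1:]
mult_rows = [[mul(u, v) for v in nonzero] for u in nonzero]

# For comparison: the nonzero part of the multiplication table of Z_4.
z4_rows = [[(a * b) % 4 for b in (1, 2, 3)] for a in (1, 2, 3)]
```

With this encoding, is_latin_square(mult_rows) holds while is_latin_square(z4_rows) fails, because the row of 2 in Z_4 contains 0 and repeats 2.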

Theorem 3.10.7. Let p be a polynomial in F [ x], where F is a field. Then


the quotient ring F [ x]/ ≡ p is a field if and only if p is irreducible over F.

? Proof: The multiplication operation ⊗_p on F[x]/≡_p is commutative, so in order to show that F[x]/≡_p is a field, we only need to show that every nonzero element has a multiplicative inverse. We prove the “⇐” direction first. Suppose p is irreducible over F, and let [f] be a nonzero equivalence class of ≡_p (i.e., a nonzero element of F[x]/≡_p). By nonzero, we mean that f ≢_p 0, or in other words, p does not divide f.
By Bézout’s identity for polynomials (Lemma 3.6.15), there exist poly-
nomials m and n in F [ x] such that

gcd( f , p) = m · f + n · p.

Now gcd( f , p) is a divisor of p, and p is irreducible, so there are two


possibilities:

(i) gcd( f , p) = k · p for some nonzero k ∈ F, or

(ii) gcd( f , p) is a constant polynomial, say k.

The first case is impossible, since otherwise we would have p dividing f. Therefore, gcd(f, p) = k, and moreover, k ≠ 0. So we divide through by the constant k and find that

1 = ((1/k) · m) · f + ((1/k) · n) · p,

or in other words, ((1/k) · m) · f ≡_p 1. So [(1/k) · m] is the multiplicative inverse of [f], and we have thus shown that F[x]/≡_p is a field.
Conversely, suppose p is NOT irreducible. Then there exist polynomials
a, b ∈ F [ x] such that p = a · b and the degrees of a and b are smaller than
the degree of p. In particular, a and b are not divisible by p and so [a] and
[b] are nonzero elements of F [ x]/ ≡ p . However,

[a] ⊗ p [b] = [ p] = [0]

which shows that F [ x]/ ≡ p has zero divisors. Therefore, F [ x]/ ≡ p is not a
field.
All up, we have shown that the quotient ring F [ x]/ ≡ p is a field if and
only if p is irreducible over F. 
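The “⇐” direction of the proof is constructive: running the Euclidean Algorithm on f and p while tracking the coefficient of f produces the inverse of [f]. Below is a minimal sketch over F = Q, assuming polynomials are coefficient lists with the constant term first. All function names are illustrative, and only the Bézout coefficient m of f is tracked, since n is not needed for the inverse.

```python
from fractions import Fraction

def trim(a):
    """Remove zero leading coefficients; the zero polynomial becomes []."""
    a = list(a)
    while a and a[-1] == 0:
        a.pop()
    return a

def add(a, b):
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return trim([x + y for x, y in zip(a, b)])

def mul(a, b):
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def poly_divmod(f, p):
    """Quotient and remainder of f by a nonzero, trimmed polynomial p."""
    q, r = [], trim(f)
    while len(r) >= len(p):
        term = [Fraction(0)] * (len(r) - len(p)) + [r[-1] / p[-1]]
        q = add(q, term)
        r = add(r, mul([-t for t in term], p))
    return q, r

def inverse_mod(f, p):
    """Return m with m * f ≡ 1 modulo p, or raise if gcd(f, p) is not constant."""
    r0, s0 = trim(Fraction(c) for c in p), []
    r1, s1 = trim(Fraction(c) for c in f), [Fraction(1)]
    # Invariant: r0 ≡ s0 * f and r1 ≡ s1 * f modulo p.
    while r1:
        q, r = poly_divmod(r0, r1)
        r0, s0, r1, s1 = r1, s1, r, add(s0, mul([-c for c in q], s1))
    if len(r0) != 1:
        raise ValueError("f is not invertible modulo p")
    k = r0[0]                      # gcd(f, p) is the nonzero constant k
    return [c / k for c in s0]     # divide through by k, as in the proof

inv = inverse_mod([-5, 1], [1, 0, 1])   # inverse of [x - 5] modulo x^2 + 1
```

Under the bijection [ax + b] ↦ b + ai of Example 3.10.5, the result −x/26 − 5/26 corresponds to 1/(−5 + i). For a reducible modulus such as x^2 − 1 with f = x − 1, the gcd is not constant and the sketch raises an error, matching the zero-divisor argument in the converse direction.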

3.11 Congruence modulo a polynomial and ideals

Here we see that there is a direct relationship between the equivalence


classes of ≡ p and the ideal generated by p. The fancy term for this is a
principal ideal.

Definition 3.11.1 (Principal Ideal of R). Let p be an element of a commutative ring[11] R. Then the subset ⟨p⟩ := {r · p : r ∈ R} is an ideal of R and is called the principal ideal generated by p.

[11] This is just a ring where the multiplication operation is commutative: a · b = b · a for all a, b ∈ R.

Lemma 3.11.2. Let p be an element of R[x], where R is a commutative ring. Then the equivalence classes of ≡_p can be written as translates of the principal ideal ⟨p⟩:

[f] = f + ⟨p⟩.

So the quotient R[x]/≡_p can be realised in the usual sense in ring theory: it is a quotient R/I of a ring by an ideal, here R[x]/⟨p⟩. This is not an emphasis of this course, but will be in a later course on ring theory.

3.12 Algebraic versus transcendental


One of the triumphs of abstract algebra was the solution to the famous three problems of antiquity. Using the basic constructions of Euclidean geometry, we can make new numbers from old ones. If we are given a unit length, we can make the integers with a straight-edge and compass. In recent times, this beautiful part of mathematics has left the Australian secondary school curriculum, where in the past we enjoyed learning how to construct surds like √5 using Euclidean geometry.

[Margin note]
• A ruler may be used to draw a new line, extended as far as we like, through any two points. It cannot measure.
• The compass can draw circles.
• We can copy circles.

The three problems are about constructions in Euclidean geometry that use only a straight-edge and compass, and there are more details in Section 3.13.

Doubling the cube (Eratosthenes 240 BC): Can we construct a cube whose volume is twice that of a given cube?[12]

Trisecting an arbitrary angle (at least Hippocrates 450 BC): Given an arbitrary angle, such as π/3, can we construct a third of this angle?

Squaring the circle (Rhind papyrus 1650 BC): Can we construct a square whose area is equal to that of a given circle?

[12] Given the unit cube, a doubling would produce a cube with side length ∛2.

These problems can be rephrased in terms of algebraic numbers, and


will be pursued in Section 3.13.

Definition 3.12.1 (Algebraic number). Let a ∈ C. If there is a nonzero


polynomial f ∈ Q[ x] with f (a) = 0, then a is algebraic. Otherwise, a is
transcendental.

Example 3.12.2.

• √2 is algebraic since it is a zero of x^2 − 2.

• The complex number i is algebraic since it is a zero of x^2 + 1.

• The Golden Ratio (1 + √5)/2 is algebraic since it is a zero of x^2 − x − 1.

• e and π are transcendental.[13]

• Every rational number a/b is algebraic since it is a zero of x − a/b.

[13] By the Lindemann-Weierstrass Theorem, which we won't see in this course.

It is still not known whether π + e or π · e is transcendental, but we do know (see Exercise 3.15.7) that at least one of them is transcendental.
In the definition of an algebraic number, we can assume f is monic.

Lemma 3.12.3. Let f ∈ Q[x] be a nonzero polynomial and c ∈ C such that f(c) = 0. Then there is a unique monic irreducible polynomial m(x) ∈ Q[x] such that m(c) = 0. Moreover, m | f.

? Proof: To be done in lectures. 

The polynomial m( x) in Lemma 3.12.3 is called the minimal polynomial


of c.

Example 3.12.4. Let α = 1 + √2 and take f = x^2 − 2x − 1. Notice that α^2 = 3 + 2√2, so f(α) = (3 + 2√2) − 2(1 + √2) − 1 = 0 and α is a zero of f. How do we know if f is irreducible over Q?

First way: Since f is quadratic, we know that it is reducible if and only if it has a linear factor. So by Theorem 3.6.16 (see Exercise 3.15.4), we only need to look at the zeros of f and see if they are rational. Indeed, the zeros 1 + √2 and 1 − √2 are irrational!

Second way: We can use Eisenstein's Criterion (Theorem 3.9.1) and Lemma 3.9.4. Substitute the polynomial g = x − 1 to obtain the polynomial f ∘ g = (x − 1)^2 − 2(x − 1) − 1 = x^2 − 4x + 2. So we see that 2 divides the non-leading coefficients but 4 does not divide the constant coefficient. Hence, f ∘ g is irreducible over Q, and so too is f.
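The divisibility test in the second way is easy to automate. The sketch below assumes integer coefficients listed from the constant term up to the leading term. The name eisenstein_applies is hypothetical, and a return value of False only means the criterion gives no information for that prime, not that the polynomial is reducible.

```python
import math

def eisenstein_applies(coeffs, prime):
    """Check the hypotheses of Eisenstein's Criterion for a given prime.

    coeffs lists integer coefficients, constant term first.
    """
    *lower, leading = coeffs
    return (
        leading % prime != 0                       # prime does not divide the leading coefficient
        and all(c % prime == 0 for c in lower)     # prime divides every other coefficient
        and lower[0] % (prime * prime) != 0        # prime^2 does not divide the constant term
    )

f = [-1, -2, 1]         # x^2 - 2x - 1
f_shifted = [2, -4, 1]  # x^2 - 4x + 2, i.e. f composed with x - 1

# Numerical sanity check that alpha = 1 + sqrt(2) is a zero of f.
alpha = 1 + math.sqrt(2)
value_at_alpha = alpha**2 - 2 * alpha - 1
```

The criterion does not apply to f itself with the prime 2, since 2 does not divide the constant term −1. It does apply after the substitution, which is why the shift by g = x − 1 is needed.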

3.12.1 How big are the algebraic numbers?


Let A be the set of algebraic numbers. In Exercise 1.11.11, we saw that C is uncountable by noting that there is a bijection from R × R to C, and the Cartesian product of uncountable sets yields an uncountable set. The rational numbers are a countable subset of A, and we see below that the size of A is no bigger than the size of Q.

Theorem 3.12.5. A is countable.

? Proof: To be done in lectures. 

Corollary 3.12.6. The set of transcendental numbers C \ A is uncountable.

 Proof: Suppose the opposite, that C \ A is countable. Then by Theorem


1.8.4, we would have that (C \ A) ∪ A is countable, which is a contradic-
tion as this set is C. 

An important subset of the algebraic numbers, especially in algebraic


geometry and group representation theory, is the set of algebraic integers.

Definition 3.12.7 (Algebraic integer). A zero of a monic polynomial of


Z[ x] is called an algebraic integer.

Example 3.12.8.

• √−2 is an algebraic integer as it is a zero of x^2 + 2 (whose coefficients are integers).

• Integers are algebraic integers.[14]

• 1/2 is not an algebraic integer.

[14] Do you see why?

Lemma 3.12.9. The rational algebraic integers are integers.

? Proof: Let y be a rational algebraic integer, and write it in reduced form: y = m/n where n > 0 and gcd(m, n) = 1. By definition of an algebraic integer, there exists a monic polynomial f ∈ Z[x] such that f(y) = 0. Suppose f has degree k and write f = x^k + a_{k−1}x^{k−1} + · · · + a_1 x + a_0. Then when we evaluate n^k · f at m/n, we get an expression of the form

0 = m^k + a_{k−1} m^{k−1} n + · · · + a_1 m n^{k−1} + a_0 n^k.

Therefore, n divides m^k and hence[15] n = 1. So, y is an integer. □

[15] since gcd(m, n) = 1

Now we see how these sets of complex numbers behave under the usual
arithmetic operations.

Theorem 3.12.10. The algebraic numbers A form a field and the algebraic
integers form a ring.

† Proof: What we haven't seen in this course is that we can describe algebraic numbers as eigenvalues of matrices with rational entries. Suppose c is an algebraic number with minimal polynomial f. If we write f = x^n + a_{n−1}x^{n−1} + · · · + a_1 x + a_0, then it turns out that f is in fact the characteristic polynomial of the following matrix:

[   0      1      0    · · ·     0      ]
[   0      0      1    · · ·     0      ]
[   ⋮      ⋮      ⋮     ⋱        ⋮      ]
[   0      0      0    · · ·     1      ]
[ −a_0   −a_1   −a_2   · · ·  −a_{n−1}  ]

This matrix is known as the companion matrix of f. So we use the following facts:

(i) c is an algebraic number if and only if c is an eigenvalue of a matrix


over Q;

(ii) c is an algebraic integer if and only if c is an eigenvalue of a matrix


over Z.

The Kronecker product A ⊗ B of two matrices is the block matrix whose (i, j)-block is a_{ij} B, where a_{ij} is the (i, j)-entry of A. For example

[ 1 2 ]   [ 5 6 ]   [  5  6 10 12 ]
[ 3 4 ] ⊗ [ 7 8 ] = [  7  8 14 16 ]
                    [ 15 18 20 24 ]
                    [ 21 24 28 32 ].

This operation on matrices is particularly nice as it is compatible with matrix multiplication:

(A ⊗ B)(C ⊗ D) = AC ⊗ BD.

In particular, C and D can be column vectors if we think of them as matrices with a single column.
Let a and b be algebraic numbers. Then a is an eigenvalue of a matrix A
over Q, and b is an eigenvalue of a matrix B over Q. It turns out that:

• a + b is an eigenvalue of A ⊗ I + I ⊗ B, where I is an identity matrix of


the appropriate size.

• a · b is an eigenvalue of A ⊗ B.

We leave the details of these calculations to the interested reader. So we can


show using the Kronecker product of matrices that the algebraic numbers
are closed under addition and multiplication, and likewise for the algebraic
integers. It then needs to be observed that the nonzero elements of the
algebraic numbers have multiplicative inverses (just their reciprocal value).
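The closure claim can be tested on a small case. The sketch below is a pure-Python illustration with hypothetical helper names: it takes the companion matrices of x^2 − 2 and x^2 − 3, forms A ⊗ I + I ⊗ B, and checks that this integer matrix is annihilated by x^4 − 10x^2 + 1, the polynomial quoted for √2 + √3 in Section 3.13.

```python
def kron(A, B):
    """Kronecker product of matrices given as lists of rows."""
    return [
        [a * b for a in row_a for b in row_b]
        for row_a in A
        for row_b in B
    ]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

A = [[0, 1], [2, 0]]   # companion matrix of x^2 - 2, eigenvalues ±√2
B = [[0, 1], [3, 0]]   # companion matrix of x^2 - 3, eigenvalues ±√3
I2 = identity(2)

# M has the sums ±√2 ± √3 as eigenvalues.
M = matadd(kron(A, I2), kron(I2, B))
M2 = matmul(M, M)
M4 = matmul(M2, M2)

# Evaluate x^4 - 10x^2 + 1 at M; every entry should vanish.
residual = matadd(matadd(M4, [[-10 * e for e in row] for row in M2]), identity(4))

example = kron([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Because all entries are integers, the check is exact. Similarly, A ⊗ B squares to 6 times the identity, which is consistent with ±√6 being its eigenvalues.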


3.13 Aside: Constructible numbers

Here are the rules of the game:

1. We have an initial position of points {P_0, P_1, P_2, . . . , P_m}.

2. Operation 1: Draw a line through two old points P_i and P_j to get new points where this line intersects other lines and circles.

3. Operation 2: Draw a circle with centre at an old point P_i and radius equal to the distance between two old points P_j and P_k to get new points from intersections with other circles and lines.

Then a real number λ is constructible if we can construct points P_i and P_j whose distance apart is |λ| units by starting from an initial set of points {P_0, P_1} whose distance apart is 1 unit and then performing a finite number of Operations 1 and 2. It turns out that the following is true.[16]

[16] See Section 5.6 of John Stillwell's book “Elements of Algebra”.

Theorem 3.13.1. All constructible real numbers come from repeated square roots and field operations starting from numbers in Q.

Corollary 3.13.2. Every constructible number is an algebraic number.[17]

[17] Actually, this result requires us knowing that the algebraic numbers are algebraically closed, which is beyond the scope of this course.

The interesting thing about constructible numbers is that they have limited values for their degree. This quantity is defined as the degree of their minimal polynomial. For example:

• √2 has degree 2 (since the minimal polynomial is x^2 − 2);

• √2 + √3 has degree 4 (since the minimal polynomial is x^4 − 10x^2 + 1);

• cos(2π/17) has degree 8.


The last of these has minimal polynomial

x^8 + (1/2)x^7 − (7/4)x^6 − (3/4)x^5 + (15/16)x^4 + (5/16)x^3 − (5/32)x^2 − (1/32)x + 1/256.

Using the theory of field extensions[18] the following can be proved:

[18] Which you might encounter in 3rd year.

Theorem 3.13.3. If λ is constructible, then deg(λ) = 2^i for some i ≥ 0.


Example 3.13.4. ∛5 is not constructible as its minimal polynomial is x^3 − 5, which has degree 3.

So the famous problems of antiquity become:

Doubling the cube: Is ∛2 constructible?

Squaring the circle: Is √π constructible?

Trisecting π/3: Is cos(π/9) constructible?

It turns out that each of these problems is impossible to solve:

deg(∛2) = 3 (min. poly. X^3 − 2)
=⇒ ∛2 is NOT constructible.

π is transcendental =⇒ √π · √π is not constructible
=⇒ √π is NOT constructible.

deg(cos(π/9)) = 3 (min. poly. X^3 − (3/4)X − 1/8)
=⇒ cos(π/9) is NOT constructible.
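The polynomials quoted in this section can be sanity-checked numerically. The sketch below uses floating-point arithmetic, so it only confirms that the stated numbers are zeros up to rounding error. It does not prove minimality, and the helper names are hypothetical.

```python
import math

def is_power_of_two(n):
    """Necessary condition from Theorem 3.13.3 on the degree of a constructible number."""
    return n >= 1 and n & (n - 1) == 0

def evaluate(coeffs_high_to_low, x):
    # Horner's rule, highest-degree coefficient first.
    value = 0.0
    for c in coeffs_high_to_low:
        value = value * x + c
    return value

# X^3 - (3/4)X - 1/8 at cos(pi/9)
cubic_value = evaluate([1, 0, -3 / 4, -1 / 8], math.cos(math.pi / 9))

# The degree 8 polynomial quoted for cos(2*pi/17)
octic = [1, 1 / 2, -7 / 4, -3 / 4, 15 / 16, 5 / 16, -5 / 32, -1 / 32, 1 / 256]
octic_value = evaluate(octic, math.cos(2 * math.pi / 17))

# x^4 - 10x^2 + 1 at sqrt(2) + sqrt(3)
quartic_value = evaluate([1, 0, -10, 0, 1], math.sqrt(2) + math.sqrt(3))
```

Degrees 2, 4 and 8 pass the power-of-two test, whereas degree 3 (for ∛2 and cos(π/9)) fails it.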

3.14 Number-theory type results in ring theory


Below is a table that summarises one of the goals of this course: the synthesis of results in number theory to polynomial rings. Each row states a result in number theory aligned with its analogue in polynomial rings. The right-most column states the category of rings for which a more general result holds, that the reader can pursue at their own interest.

[Margin note] Sub-categories of “rings”: Fields ⊂ Euclidean domains ⊂ Principal ideal domains ⊂ Unique factorisation domains ⊂ Integral domains ⊂ Commutative rings ⊂ Rings.

Result                       Integers              Polynomials       Generally holds in ...
Division Rule                Lemma 2.1.4           Lemma 3.6.10      Euclidean domains
Euclidean Algorithm          Lemma 2.1.7           Lemma 3.6.13      Euclidean domains
Factorisation                Theorem 2.2.2         Theorem 3.7.4     Unique factorisation domains
Euclid's Lemma               Corollary 2.1.16(c)   Lemma 3.7.3       Integral domains
Bézout's identity            Lemma 2.1.10          Lemma 3.6.15      Principal ideal domains
When a quotient is a field   Corollary 3.1.9       Theorem 3.10.7    Commutative rings

What of the following results?

Result                       Integers              Polynomials       Generally holds in ...
The LDE Theorem              Theorem 2.5.14        ?                 ?
Chinese Remainder Theorem    Theorem 2.5.15        ?                 ?
Gauß's Lemma                 Lemma 3.8.4           ?                 ?
Eisenstein's Criterion       Theorem 3.9.1         ?                 ?

First of all, the LDE Theorem does have a direct analogue and we do
this in Exercise 3.15.8. The Chinese Remainder Theorem has an analogue
in the direct factorisation of a quotient ring R/I by coprime ideals. Gauß’s
Lemma holds when taking the field of fractions:
If a polynomial with coefficients in a ring R is reducible over Frac(R), then it
is also reducible over R.

Eisenstein's Irreducibility Criterion works for a commutative ring R with unit:
If f = x^n + a_{n−1}x^{n−1} + · · · + a_1 x + a_0 ∈ R[x] and p is a prime in R dividing each coefficient a_i where i ∈ {0, 1, . . . , n − 1}, but p^2 does not divide a_0, then f is irreducible over Frac(R).

3.15 Exercises

Exercise 3.15.1. For each of the following pairs of polynomials f(x), g(x) ∈ Q[x], find the quotient q(x) and remainder r(x) when f(x) is divided by g(x).

(i) f(x) = x^3 + x − 1, g(x) = x − 1.

(ii) f(x) = x^4 − 1, g(x) = −x^2 + 2.

(iii) f(x) = 2x^5 − 3x^2 + 2x + 1, g(x) = x − 2.

Exercise 3.15.2. Show that addition and multiplication on R[x]/≡_p, as given in Definition 3.10.3, are well-defined.
Exercise 3.15.3. Show that the Division Rule (Lemma 2.1.4) does not hold
in the polynomial ring Z[ x].
Exercise 3.15.4. Let F be a field and let f ∈ F[x]. Show that if f has degree at most 3 and is reducible[19], then it has a factor with degree 1.

[19] This is not true if we consider polynomials of degree 4. The polynomial (x^2 + 1)^2 in Q[x] is reducible but has no factors of degree 1.

Exercise 3.15.5. Show that if f, g ∈ F[x], then deg(f · g) = deg(f) + deg(g).

Exercise 3.15.6. Find the minimal polynomial over Q for the following
numbers:

(i) 1 + i,

(ii) 2 + 3i,

(iii) e^{2πi/5}.

Exercise 3.15.7. By using the fact that the algebraic numbers form an algebraically closed field[20], show that at least one of π + e or π · e is transcendental.

[20] A field F is algebraically closed if every non-constant polynomial in F[x] has a root in F.
Exercise 3.15.8. State and prove an analogue of the LDE Theorem 2.5.14
for the polynomial ring F [ x].
In the following, we will see how we can extend the integers to a particular lattice of the complex plane. It forms a ring, but it doesn't have the nice properties of the other rings we've seen so far ...


3.15.1 A different type of number system and its arithmetic: Z(√−5)
This is the set of all formal sums of the form

a + b√−5

where a, b ∈ Z. We will simply write a for a + 0√−5. We can define addition and multiplication of such numbers:

Addition: (a + b√−5) + (a′ + b′√−5) = (a + a′) + (b + b′)√−5

Multiplication: (a + b√−5) · (a′ + b′√−5) = (aa′ − 5bb′) + (ab′ + a′b)√−5
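To experiment with this arithmetic, the two rules can be transcribed directly. In this sketch the element a + b√−5 is stored as the integer pair (a, b); the representation and the function names are assumptions made for illustration.

```python
# The element a + b√−5 of Z(√−5) is stored as the pair of integers (a, b).

def add(z, w):
    a, b = z
    c, d = w
    return (a + c, b + d)

def mul(z, w):
    a, b = z
    c, d = w
    # (a + b√−5)(c + d√−5) = (ac − 5bd) + (ad + bc)√−5
    return (a * c - 5 * b * d, a * d + b * c)

root = (0, 1)                      # √−5 itself
root_squared = mul(root, root)     # should represent −5
product = mul((2, 1), (3, -1))     # (2 + √−5)(3 − √−5)
```

Here root_squared is (−5, 0), confirming that the formal symbol √−5 squares to −5 under the multiplication rule.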
Exercise 3.15.9. For z = a + b√−5 ∈ Z(√−5) define the norm N(z) = z · z̄ where z̄ = a − b√−5. Prove that

N(z · z′) = N(z) · N(z′)

for any z, z′ ∈ Z(√−5).

Exercise 3.15.10. A unit of Z(√−5) is an element u such that there exists an element v such that u · v = 1. What are the units of Z(√−5)? (Hint: You know that N(z · z′) ≥ N(z) for any z, z′ ∈ Z(√−5) with z′ ≠ 0.)

Exercise 3.15.11. An irreducible in Z(√−5) is a non-unit which cannot be written as a product of two non-units. Show that 3 and 1 + √−5 are irreducible.

Exercise 3.15.12. How many ways can 6 be written as a product of irreducibles in Z(√−5)?

Exercise 3.15.13. A prime in Z(√−5) is a nonzero element p, that is not a unit, satisfying the following:

given x and y in Z(√−5) such that p divides x · y, then p divides x or y.

Show that every prime is an irreducible, but the converse is not true.
4
Normed vector spaces

In this chapter we look at a generalisation of Euclidean space that encapsulates ‘spaces of functions’ and other
objects whereby we can measure the difference between things as we would the vectors of Rn .

4.1 Abstract vector spaces: things that behave like Rn

Gregory H. Moore wrote an interesting piece in Historia Mathematica[1] that describes how the idea of a vector space was developed to include the many mathematical objects we study today. Here, we reproduce the abstract of his article:

[1] ‘The axiomatization of linear algebra: 1875–1940’, Historia Mathematica 22 (1995), 262–303.

Modern linear algebra is based on vector spaces, or more generally, on


modules. The abstract notion of vector spaces was first isolated by Peano
(1888) in geometry. It was not influential then, nor when Weyl rediscovered it
in 1918. Around 1920 it was rediscovered again by three analysts – Banach,
Hahn, and Wiener – and an algebraist, Noether. Then the notion developed
quickly, but in two distinct areas: functional analysis, emphasizing infinite-
dimensional normed vector spaces, and ring theory, emphasizing finitely
generated modules which were often not vector spaces. Even before Peano,
a more limited notion of vector space over the reals was axiomatized by
Darboux (1875).

So what does Moore mean by ‘axiomatize’? We will motivate the axiomatic definition of a (normed) vector space by exploring a few examples.

Example 4.1.1 (Euclidean space). In R^n, we can add two elements to get another element of R^n, and we also have another operation called scalar multiplication:

λ(u_1, u_2, . . . , u_n) := (λu_1, λu_2, . . . , λu_n).

[Margin note] This is not a binary operation. Rather each element of R defines a unary (‘one’) operation R^n → R^n.

We can take subsets which are closed under these two operations, and we call them subspaces. For example, the set of elements (u_1, u_2, . . . , u_n) whose sum Σ u_i is zero forms a subspace. We can also define a basis of R^n so that we can write every element as a linear combination of the basis elements. For example, the vectors (1, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, . . . , 0, 1) form a linearly independent spanning set for R^n.

Example 4.1.2 (Vector space of code words). Let’s take the smallest field,
Z2 , with just the two elements 0 and 1. Much of the theory of codes is

about strings of 0’s and 1’s, of a common length, say n. We can add code
words, with the rule that 1 + 1 = 0. For example:

0011001 + 1001111 = 1010110.

Scalar multiplication by elements of Z2 is not very interesting here since


multiplying by 0 and 1 just annihilates or preserves the codewords re-
spectively. This example is really just like the previous example where we
replace n-tuples of real numbers with n-strings of elements from a different
field Z2 .

Example 4.1.3 (Vector space of functions). Consider the set V^X of all functions from a set X to a vector space V (think of R^n if you like). We can add functions by the so-called point-wise addition of functions, and we can also define scalar multiplication on functions:

(f + g)(x) := f(x) + g(x)
(λf)(x) := λf(x).

There are subsets of V^X that are closed under these operations, such as the constant functions. Can we write every element of V^X as a linear combination of a distinguished set of functions?

Example 4.1.4 (Vector space of polynomials). Consider the polynomial ring R[x]. We can add polynomials and multiply them by elements of R to give us more elements of R[x]. If f, g ∈ R[x] and α, β ∈ R, then α · f + β · g ∈ R[x]. There are subsets of R[x] that are closed under these operations, such as the polynomials which have degree at most a given number, say 5. We can write every element of R[x] as a linear combination of the polynomials

1, x, x^2, x^3, x^4, . . .

All of these examples satisfy the axioms[2] of a vector space.

[2] What we are demonstrating now is typical of pure mathematics: the distilling of the properties of interesting and widely studied examples in order to create a framework where all of these entities can be studied with an over-arching theory.

Definition 4.1.5 (Vector space). Let V be a set and F be a field, and suppose there is a binary operation ‘+’ on V, and a map which takes an element λ ∈ F and an element v ∈ V and returns an element λ · v of V (we call this scalar multiplication). We call V a vector space if these operations satisfy the following axioms:

(1) For all u, v ∈ V we have u + v = v + u.

(2) For all u, v, w ∈ V we have (u + v) + w = u + (v + w).

(3) There is an element 0 ∈ V, such that for all v ∈ V, we have v + 0 = v.

(4) For every v ∈ V, there is an element −v ∈ V such that v + (−v) = 0.

(5) For every v ∈ V, we have 1 · v = v.

(6) For all u, v ∈ V and λ ∈ F, we have λ(u + v) = λu + λv.

(7) For all v ∈ V and λ, µ ∈ F, we have (λ + µ)v = λv + µv.

(8) For all v ∈ V and λ, µ ∈ F, we have λ(µv) = (λµ)v.


This seems like a lot to remember! However, if we use some of the jargon of abstract algebra, we can cluster some of these axioms together to make it easier to remember.[3] The first four just say that + is associative and commutative, there is an additive identity, and every element has an additive inverse. In third year, we would say that (V, +) is an abelian group. The rest basically says that the field F acts as automorphisms on the group[4] V.

[3] I don't expect you to remember them.

[4] You will understand this once you see “group actions” in third year.

All of the examples we had before satisfy this definition. So what? Well, from this abstract and very general definition, we can do linear algebra in a similar way to what you did in first year mathematics with R^n. We can study subspaces, spanning sets, bases, linear maps, eigenvalues and so on. We do not need to spend much time on this generalised version of linear algebra, since you can more or less assume that whatever property you learnt about R^n has a direct analogue in a vector space V. The main difference that you will see is that we can have vector spaces which do not have a finite basis: infinite-dimensional vector spaces.

4.1.1 Vector spaces from fields


Recall that a subset S of a ring R is a subring if it itself is a ring under the
induced operations of R. Likewise, a subset K of a field F is a subfield if it
itself is a field. For example, we have the following lattice of fields:

[Diagram: a lattice of fields in which R and A both contain the subfield R ∩ A.]

If we take two of these fields, we can define a vector space.

Lemma 4.1.6. Let F be a field and let K be a subfield of F. Define ‘scalar multiplication’ on F as multiplication by elements of K. Then the addition operation on F together with scalar multiplication by elements of K makes F into a vector space over K.

† Proof: To check all the axioms of a vector space can be quite tedious. In
this case, many of them follow simply from the definition of field: e.g. (1) –
(4) follow immediately. It just suffices to check the axioms (5)–(8), but we
will leave this to the reader. 

Example 4.1.7. Consider F = C and K = R. This setup defines a two-


dimensional vector space on F. Why? Because, every element of F can be
written as a + b · i for two elements a, b ∈ R. That is, {1, i} forms a basis for
F (over K).

Example 4.1.8. Consider F = R and K = Q. This time, we end up with an infinite-dimensional vector space! We will see this as a proof by contradiction. Suppose there was a finite basis for F over K: {e_1, e_2, . . . , e_n}. Then every element of R could be written as a linear combination

q_1 · e_1 + q_2 · e_2 + · · · + q_n · e_n

where q_1, q_2, . . . , q_n ∈ Q. In fact, this implies that there is a bijection[5] f from R to the Cartesian product Q^n defined by

f(q_1 · e_1 + q_2 · e_2 + · · · + q_n · e_n) := (q_1, q_2, . . . , q_n).

[5] Mini-proof that f is a bijection: In Exercise 4.8.1, we show that f is well-defined. Suppose f(q_1 · e_1 + · · · + q_n · e_n) = f(q′_1 · e_1 + · · · + q′_n · e_n). Then (q_1, . . . , q_n) = (q′_1, . . . , q′_n) and hence q_i = q′_i for each i ∈ {1, . . . , n}. Therefore, q_1 · e_1 + · · · + q_n · e_n = q′_1 · e_1 + · · · + q′_n · e_n and f is one-to-one. Now for an element (q_1, q_2, . . . , q_n) ∈ Q^n, notice that f(q_1 · e_1 + · · · + q_n · e_n) = (q_1, . . . , q_n), so f is clearly onto.

So the cardinality of R is the same as the cardinality of Q^n, which is a contradiction as R is uncountable by Theorem 1.9.2, Q is countable by 1.8.6, and Q^n is countable by (repeated application of) Exercise 1.11.11.

4.2 Aside: The Axiom of Choice

Is there a basis for the vector space R over Q? Does every vector space
have a basis?
We have already seen Russell's Paradox and the Continuum Hypothesis as fundamental philosophical questions in mathematics that stirred the minds of early 20th century mathematicians. There is another taboo subject in mathematics, and it is the acceptance or non-acceptance of the Axiom of Choice.

[Margin note] “The Axiom of Choice is necessary to select a set from an infinite number of socks, but not an infinite number of shoes.” – Bertrand Russell

Axiom of choice: For any collection 𝒮 of nonempty sets, there exists a function f that assigns to each set S in 𝒮 an element f(S) of S.

The function f here is called a choice function, a map which selects one element from each set in a (possibly infinite) collection of sets. If we can always assume that such a function exists, then the real numbers can be well-ordered[6], that is,

there is a total order ⪯ such that every nonempty subset of the real numbers has a minimum element (with respect to ⪯).

[6] I'll suppress the details here.

This seems like nonsense: how can we order the real numbers so that any
open interval has a minimum element? Another reason to dislike the Axiom
of Choice is the Banach-Tarski paradox in measure theory. In 1924 Stefan
Banach and Alfred Tarski proved the following remarkable result: It is pos-
sible to take a solid ball in 3-dimensional space, cut it up into finitely many
pieces and, moving them using only rotation and translation, reassemble
the pieces into two balls of the same radius as the original. In other words,
we get two spheres exactly the same size as the original sphere merely by
cutting and shifting! Alternatively, we can cut up a ball the size of a pea
and reassemble it into a ball the size of the sun. This theorem has come to
be known as The Banach-Tarski Paradox not because it is a logical paradox
(like that of Russell), but rather because it goes against our intuition about
how the world works.
There are many, many results such as these that follow from assuming the Axiom of Choice, so why do we bother with it at all? It turns out that certain extremely useful results rely on an equivalent version of the Axiom of Choice known as Zorn's Lemma.[7] Here are some of the results which follow from Zorn's Lemma:

[7] Zorn's Lemma: Suppose a partially ordered set P has the property that every totally ordered subset has an upper bound in P. Then the set P contains at least one maximal element.

(i) Every vector space has a basis.

(ii) The Hahn-Banach Theorem, one of the fundamental theorems of
functional analysis, which is fundamental to quantum mechanics.

(iii) Every field has an algebraic closure.

(iv) Tychonoff’s Theorem: An arbitrary product of compact spaces is compact.

(v) Every subgroup of a free group is free.


There are ways around the Axiom of Choice (such as the Axiom of
Determinateness) that eliminate the bad stuff and keep the good, but this is
getting way beyond this course.

“The Axiom of Choice is obviously true; the Well Ordering Principle is
obviously false; and who can tell about Zorn’s Lemma?” – Jerry Bona

4.3 Norms

In high-school, the Euclidean norm is introduced in the context of problems
in physics, where we would like to calculate the magnitude of a vector. So
given a vector v := (a_1, a_2, . . . , a_n) ∈ R^n, the norm of v is the square root
of the sum of the squares of its coordinates:

$$\|v\| := \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}.$$

[Footnote 8: In high-school, a vector is thought of as an arrow, or as a linear
combination such as a_1 i + a_2 j + a_3 k.]

Another length-function appears in coding theory, the so-called Hamming
weight, and it has similar properties to the Euclidean norm.

Example 4.3.1 (Hamming weight). Given an n-string v of elements of Z_p,
the Hamming weight w(v) is the number of nonzero positions. For example,

w(000130520) = 4.

This function shares three properties with the Euclidean norm on R^n:

Non-degeneracy: w(v) = 0 if and only if v is the string of all 0’s (just like
the zero vector).

Positivity: w always returns a non-negative number.

Triangle inequality: w(u + v) ≤ w(u) + w(v), for all u, v.
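The three properties above are easy to spot-check by machine. Here is a minimal Python sketch; the modulus p = 7 and the second string u are illustrative assumptions (any prime larger than the digits used would do), and the helper names are made up for this sketch.

```python
def hamming_weight(v):
    """Number of nonzero positions in a string over Z_p (given as a list of ints)."""
    return sum(1 for a in v if a != 0)


def add_mod(u, v, p):
    """Position-wise sum of two equal-length strings over Z_p."""
    return [(a + b) % p for a, b in zip(u, v)]


p = 7                              # assumed modulus, large enough for the digit 5
v = [0, 0, 0, 1, 3, 0, 5, 2, 0]    # the string 000130520 from the example
u = [1, 0, 0, 6, 0, 0, 2, 0, 0]    # a second, hypothetical string

print(hamming_weight(v))                   # weight of the example string
print(hamming_weight([0] * 9))             # the all-zero string has weight 0
s = add_mod(u, v, p)
print(hamming_weight(s) <= hamming_weight(u) + hamming_weight(v))  # triangle inequality
```

Notice that positions can cancel when we add (here 1 + 6 and 5 + 2 both vanish mod 7), which is why the triangle inequality is an inequality and not an equality.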

We will be looking at vector spaces where the scalars are the real numbers,
since there is a natural ordering on R.

Definition 4.3.2 (Norm). A norm on a vector space V over R is a map

‖·‖ : V → R

such that for all u, v ∈ V and λ ∈ R, we have:

Non-degeneracy: ‖v‖ = 0 if and only if v = 0 (the zero vector).

Positive homogeneity: ‖λv‖ = |λ| · ‖v‖.

Triangle inequality: ‖u + v‖ ≤ ‖u‖ + ‖v‖.


In particular, these axioms imply that ‖v‖ is always non-negative.

[Footnote 9: Mini-proof: For every v ∈ V, we have
0 = ‖0‖ = ‖v − v‖ ≤ ‖v‖ + ‖−v‖ = ‖v‖ + |−1| · ‖v‖ = 2‖v‖, and so ‖v‖ ≥ 0.]

So the Hamming weight is not a norm, since it doesn’t satisfy “Positive
homogeneity”. A normed vector space is simply a vector space with a norm
on it, and we will write it as a pair [V, ‖·‖].

Example 4.3.3. Consider the n-dimensional vector space R^n over R. We
have already seen the Euclidean norm as a way of measuring distance and
magnitude on R^n, but there is another useful norm:

‖v‖_∞ := max{|v_1|, |v_2|, . . . , |v_n|}

where we write v as (v_1, v_2, . . . , v_n). We leave it as an exercise at the end of
the chapter that ‖·‖_∞ does indeed define a norm on R^n (Exercise 4.8.8). So
if n = 2, we see that ‖(3, 4)‖_∞ = 4 whereas for the Euclidean norm, we
have ‖(3, 4)‖ = √(3² + 4²) = 5.
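To see the two norms side by side, here is a small Python sketch. The function names are my own choice for illustration; vectors are plain tuples of numbers.

```python
import math


def euclidean_norm(v):
    """Square root of the sum of the squares of the coordinates."""
    return math.sqrt(sum(a * a for a in v))


def max_norm(v):
    """Largest absolute value among the coordinates."""
    return max(abs(a) for a in v)


v = (3, 4)
print(euclidean_norm(v))   # Euclidean length of (3, 4)
print(max_norm(v))         # max-norm of (3, 4)

# Positive homogeneity, checked on one scaled vector (a spot-check, not a proof).
lam = -2
scaled = tuple(lam * a for a in v)
print(max_norm(scaled) == abs(lam) * max_norm(v))
```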

Example 4.3.4. Consider the vector space V of all polynomials in R[x]
with degree at most n. The following defines a norm on V:

$$\|a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0\| := (n+1)|a_n| + n|a_{n-1}| + \cdots + 2|a_1| + |a_0|.$$

(See Exercise 4.8.7.)

4.4 Boundedness

Let S ⊂ R. We say that u ∈ R is an upper bound of S if for all s ∈ S we have s ≤ u.

Definition 4.4.1 (Least upper bound (supremum)). Let S ⊂ R. An upper
bound ℓ of S is a least upper bound if for every other upper bound u of S,
we have ℓ ≤ u.

Equivalently, we can use the following definition of least upper bound,
which is perhaps more amenable to giving proofs:

A real number ℓ is a least upper bound of S if it satisfies:

• (∀s ∈ S) s ≤ ℓ,
• (∀ε > 0)(∃s ∈ S) ℓ − ε < s.

[Figure: a number line marking ℓ − ε and ℓ, with some s ∈ S lying between them.]

Example 4.4.2. Let S := {2 − 1/n : n ∈ N}. There is no ‘maximum’ of this
set S, but 2 is a least upper bound for S.

[Margin note: We say that m is the maximum of S if m is an upper bound of S
that also lies in S.]

Proof. We need to prove that 2 is an upper bound of S, and then that it is
the smallest of the upper bounds of S.

[Margin note: We have highlighted the parts of the proof that are NOT
‘auto-pilot’. The rest of the proof is just applying the definition of a
least upper bound and the definition of S.]

Upper bound: Let s ∈ S. So there exists n ∈ N such that s = 2 − 1/n.
Now 1/n > 0 and hence −1/n < 0. Then 2 − 1/n < 2, and so s < 2.
Therefore, 2 is an upper bound for S.
Least upper bound: Let ε > 0. We need to find a suitable s ∈ S such that
2 − ε < s. Let n be the smallest integer greater than 1/ε and let s = 2 − 1/n.
Then n > 1/ε and so ε > 1/n. Thus 2 − ε < 2 − 1/n and hence
2 − ε < s. Since s ∈ S, we have found an element of S that is greater than
2 − ε and so 2 is the least upper bound of S. □

[Footnote 10: We will do a ‘backwards’ calculation, with ℓ = 2.
ℓ − ε < s ⇐ 2 − ε < 2 − 1/n
⇐ −ε < −1/n
⇐ ε > 1/n
⇐ n > 1/ε
So we should choose n such that n > 1/ε.]
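The choice of n in this proof is completely mechanical, so it can be written as a function of ε. The following Python sketch (the name `witness` is hypothetical, and floating-point arithmetic is assumed to be accurate enough for the sample values) returns the n chosen in the proof together with the element s = 2 − 1/n of S.

```python
import math


def witness(eps):
    """Return (n, s) where n is the smallest integer greater than 1/eps
    and s = 2 - 1/n is the corresponding element of S."""
    n = math.floor(1 / eps) + 1
    return n, 2 - 1 / n


for eps in (0.5, 0.18, 0.07):
    n, s = witness(eps)
    # s lies in S, sits strictly above 2 - eps, and is still below the bound 2.
    print(eps, n, s, 2 - eps < s < 2)
```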

Theorem 4.4.3 (The Least Upper Bound Property of R). Every nonempty
subset S of R which has an upper bound, has a least upper bound.

We will come back to the proof of this result once we properly define
what R is!
The generalisation of an open interval in higher dimensions is a sphere,
or ball, of Rn . We will go one step further and define balls for normed
vector spaces.

Definition 4.4.4 (Ball). Let x be an element of a normed vector space
[V, ‖·‖]. Then the (open) ball of radius r ∈ R+ about the element x is the
set
B_r(x) := {v ∈ V : ‖v − x‖ < r}.

In the mathematical discipline of topology, a ball is the canonical example
of an open set. Below we give some examples of what these balls look
like for the various norms we’ve seen so far.

Example 4.4.5 (Balls in R^2 with the Euclidean norm). In R^2 with the
Euclidean norm, balls are just discs.

Example 4.4.6 (Balls in R^2 with the max-norm). If we take the norm ‖·‖_∞
from Example 4.3.3, the balls look like squares! For example, take the ball
of radius 1 about the origin:

B_1((0, 0)) = {(x, y) ∈ R^2 : ‖(x, y)‖_∞ < 1}
= {(x, y) ∈ R^2 : max{|x|, |y|} < 1}
= {(x, y) ∈ R^2 : |x| < 1 and |y| < 1}.

Example 4.4.7 (Balls in R[x] (bounded degree)). Let R = C and consider
the polynomials in R[x] that have degree at most 2. We will use the norm
that was defined in Example 4.3.4. Consider the ball of radius 1 about the
polynomial x + i:

B_1(x + i) = {a_2 x^2 + a_1 x + a_0 ∈ C[x] : ‖(a_2 x^2 + a_1 x + a_0) − (x + i)‖ < 1}
= {a_2 x^2 + a_1 x + a_0 ∈ C[x] : 3|a_2| + 2|a_1 − 1| + |a_0 − i| < 1}.

So for example, (1/6)x^2 + x + (3/4)i ∈ B_1(x + i), since
3 · (1/6) + 2 · 0 + 1/4 = 3/4 < 1.

4.5 Epsilon versus delta

The definition of a limit has two quantifiers: ∀ followed by ∃. It is very
important that they are in this order. We have seen this already in the
definitions of onto and of the least upper bound of a set:

f : A → B is onto: (∀b ∈ B)(∃a ∈ A) f(a) = b

ℓ is a least upper bound of S ⊆ R: (∀ε > 0)(∃s ∈ S) ℓ − ε < s

[Footnote 11: Note the difference between the sentences (i) “everybody loves
somebody” and (ii) “there is somebody who loves everyone”!]

I like to think of such mathematical statements as games. For example,
with the definition of onto, I give you b ∈ B, and then it is up to you to
present me with a ∈ A such that f(a) = b. I then choose another b ∈ B,
and so on. A proof that f is onto requires you to come up
with a winning strategy for this game. This is the same for proving that a
function or sequence converges. I give you ε and you must come up with
a systematic way to provide a δ. For sequences, you are asked to provide a
threshold value N ∈ N.

Example 4.5.1 (Old chestnut: 1/n). Let us consider the sequence s_n := 1/n.
If I give you a window around 0 in the y-axis (the codomain) with a
particular size ε, can you find a cut-off point N so that the sequence beyond
s_N is contained in this window?
Below we look at the case that ε = 0.18.

[Figure 4.1: 1/n drawn as ‘n versus s_n’]

By the picture above, if we only regard the sequence after n = 5, we can be
sure that the sequence lies in the orange strip. How would we prove this?
Well . . .

n ≥ 6 ⟹ 1/n ≤ 1/6
⟹ 1/n < 0.18 = ε.

In general, we have a winning strategy for this game.

You give me ε, and I declare that N is an integer with N ≥ 1/ε.

So with this strategy, I can show:

n > N ⟹ n > 1/ε
⟹ 1/n < ε.
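The winning strategy is literally a function from ε to N, so we can write it down and play a few rounds of the game. In this Python sketch, rounding 1/ε up to an integer is my own assumption (so that N is a natural number), and checking finitely many n is only a sanity check, not a proof.

```python
import math


def threshold(eps):
    """A natural number N with N >= 1/eps, so that n > N forces 1/n < eps."""
    return math.ceil(1 / eps)


for eps in (0.5, 0.18, 0.003):
    N = threshold(eps)
    # Play the game for the next thousand terms beyond s_N.
    inside = all(1 / n < eps for n in range(N + 1, N + 1001))
    print(eps, N, inside)
```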

4.6 Continuity for normed vector spaces

One of the main successes of the theory of normed vector spaces is in the
generalisation of continuity from Euclidean spaces (over R or C) to ar-
bitrary vector spaces. We are then able to study functions or sequences
of polynomials, of matrices and sequences of continuous functions them-
selves! Continuity then gains a deeper meaning when we look for solutions
of differential equations, where the differentiable functions are the elements
of our vector space and the integration and differentiation operators are the
continuous maps!

Definition 4.6.1 (Continuous maps between normed vector spaces). A map
f : V → W between normed vector spaces (V, ‖·‖_V) and (W, ‖·‖_W) is
continuous at a ∈ V if for every ball B_ε(f(a)) of W with centre f(a), there
exists a ball B_δ(a) of V about a, such that

f(B_δ(a)) ⊂ B_ε(f(a)).

A map is continuous if it is continuous at each point in its domain.

[Figure: the ball B_δ(a) about a in V is mapped by f into the ball B_ε(f(a))
about f(a) in W.]

Example 4.6.2 (Constant functions are continuous). Suppose we have two
normed vector spaces (V, ‖·‖_V) and (W, ‖·‖_W) and consider a constant
function f : V → W that is defined by f(v) = w for all v ∈ V, where w is a
fixed element of W. We will show that f is continuous (everywhere).

Proof. Let ε > 0 and let v ∈ V. We want to find δ > 0 so that

f(B_δ(v)) ⊂ B_ε(f(v)).

This can be greatly simplified when we apply the definition of f:

{w} ⊂ B_ε(w).

In fact, we can choose δ to be anything we like, since

B_ε(w) = {w′ ∈ W : ‖w′ − w‖_W < ε}

and clearly w ∈ B_ε(w) as ‖w − w‖_W = ‖0‖_W = 0. So f is continuous at
every point of V. □

Example 4.6.3 (Dirichlet’s characteristic function of the rationals). Let χ_Q
be the function on R defined by

$$\chi_{\mathbb{Q}}(x) := \begin{cases} 1 & \text{if } x \in \mathbb{Q}, \\ 0 & \text{if } x \notin \mathbb{Q}. \end{cases}$$

Here our normed vector space is just [R, |·|]. We will show that χ_Q is not
continuous at any point a of R. To do this, we must take the negation of the
definition of continuous:

f is not continuous at a if

(∃ε ∈ R+)(∀δ ∈ R+) f(B_δ(a)) ⊄ B_ε(f(a)).

Choose ε = 1/2 and let δ ∈ R+. We will show that χ_Q(B_δ(a)) is not a
subset of B_ε(χ_Q(a)). If a ∈ Q, choose x ∈ (R \ Q) ∩ B_δ(a). If a ∉ Q,
choose x ∈ Q ∩ B_δ(a). This is possible because of a consequence of the
Archimedean property of the real numbers: every open interval in R contains
a rational number and an irrational number. So in both cases we will have
|χ_Q(a) − χ_Q(x)| = 1 > 1/2 and hence χ_Q(x) ∉ B_ε(χ_Q(a)). Therefore, χ_Q
is not continuous at any point of R.

[Footnote 12: To show that a set X is not a subset of a set Y, we only need
to find an element x ∈ X that is not in Y.]

Example 4.6.4 (A fully worked out, difficult, example). Let V and W both
be R^2, but equip V with the Euclidean norm and W with the norm ‖·‖_∞
from Example 4.3.3. Then the map f : V → W defined by

f((x, y)) := (x − y, xy)

is continuous at (1, 1). How would we devise a proof of this?

Let ε > 0. We want to find δ > 0 so that

f(B_δ((1, 1))) ⊂ B_ε(f((1, 1))) = B_ε((0, 1)).

We should figure out first what these two entities are and what they look
like.

• The set f(B_δ((1, 1))) is just the set of elements (x − y, xy) such that
‖(x, y) − (1, 1)‖ = √((x − 1)² + (y − 1)²) < δ.

• The set B_ε((0, 1)) is just the set of elements (u, v) such that
‖(u, v) − (0, 1)‖_∞ = max{|u|, |v − 1|} < ε.

[Figure: what happens to B_2((1, 1)) under f, shown before and after.]

What would the proof look like?


Choose δ to be [something depending on ε] and let (x, y) ∈ B_δ((1, 1)).
We want to show that f((x, y)) ∈ B_ε((0, 1)). Now

‖(x, y) − (1, 1)‖ < δ ⟹ √((x − 1)² + (y − 1)²) < δ
⟹ . . .
⟹ . . .
⟹ max{|x − y|, |xy − 1|} < ε
⟹ ‖f((x, y)) − (0, 1)‖_∞ < ε
⟹ f((x, y)) ∈ B_ε((0, 1))

The ‘ε’ was given to us, and we must find the δ which makes this work.
The picture in the margin shows that if ε = 5, then δ = 2 is a suitable
choice. How do we find δ in general, in terms of ε? We basically do
reverse-engineering to find a δ which will do the job. It doesn’t need to be
the best δ; any suitable δ will do!

[Figure: we can fit f(B_2((1, 1))) inside B_5((0, 1)). Notice that we have a
different norm in the codomain of f, so balls look like squares!]

max{|x − y|, |xy − 1|} < ε ⇐ |x − y| < ε and |xy − 1| < ε
⇐ |(x − 1) − (y − 1)| < ε and |xy − y + y − 1| < ε
⇐ |x − 1| + |y − 1| < ε and |xy − y| + |y − 1| < ε
⇐ |x − 1| + |y − 1| < ε and |y||x − 1| + |y − 1| < ε

Since |x − 1| and |y − 1| are each at most ‖(x, y) − (1, 1)‖, choosing
δ = min{1, ε/3} will do the job, as we will see.

Suppose ‖(x, y) − (1, 1)‖ < δ.
Then √((x − 1)² + (y − 1)²) < δ by definition of ‖·‖.
Then |x − 1| < δ and |y − 1| < δ since (x − 1)² and (y − 1)² are each at most (x − 1)² + (y − 1)².
Then |y| = |1 + (y − 1)| ≤ 1 + |y − 1| < 1 + δ ≤ 2 by the triangle inequality and since δ ≤ 1.
Then |x − y| = |(x − 1) − (y − 1)| ≤ |x − 1| + |y − 1| < 2δ by the triangle inequality.
Then |xy − y| + |y − 1| = |y||x − 1| + |y − 1| < 2δ + δ = 3δ since |y| < 2.
Then |xy − 1| = |xy − y + y − 1| ≤ |xy − y| + |y − 1| < 3δ by the triangle inequality.
Then |x − y| < ε and |xy − 1| < ε since 2δ < 3δ ≤ ε.
Then max{|x − y|, |xy − 1|} < ε by definition of max.
Then ‖f((x, y)) − (0, 1)‖_∞ < ε by definition of f and ‖·‖_∞.
Then f((x, y)) ∈ B_ε((0, 1)).
Therefore, there exists δ such that f(B_δ((1, 1))) ⊂ B_ε(f((1, 1))) and
hence f is continuous at (1, 1).
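A proof like this is easy to get slightly wrong, so it is worth sanity-checking a candidate δ numerically before trusting it. The Python sketch below samples random points of the Euclidean ball B_δ((1, 1)) and tests whether their images land in the max-norm ball B_ε((0, 1)). The sampling scheme, the helper names and the conservative choice δ = min{1, ε/3} are assumptions of this sketch; passing the check is evidence, not a proof.

```python
import math
import random


def f(x, y):
    return (x - y, x * y)


def max_norm_dist(p, q):
    """Distance between two points of R^2 in the max-norm."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def check(eps, delta, trials=10000, seed=0):
    """Sample points strictly inside B_delta((1, 1)) and report whether
    every image under f lies inside B_eps((0, 1))."""
    rng = random.Random(seed)
    for _ in range(trials):
        r = 0.999999 * delta * math.sqrt(rng.random())
        t = rng.uniform(0, 2 * math.pi)
        x, y = 1 + r * math.cos(t), 1 + r * math.sin(t)
        if max_norm_dist(f(x, y), (0, 1)) >= eps:
            return False
    return True


print(check(5, 2))                        # the margin picture: eps = 5, delta = 2
for eps in (0.1, 1, 3):
    print(eps, check(eps, min(1, eps / 3)))
print(check(0.1, 2))                      # a delta that is far too generous
```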

We have given the most general topological definition of continuity (except
that ‘open sets’ take the place of ‘open balls’). As we shall see below, this
definition simplifies radically when the map f is a linear operator.

Definition 4.6.5 (Linear operator). A map f : V → W between normed
vector spaces (V, ‖·‖_V) and (W, ‖·‖_W) (defined over a field F) is called a
linear operator if for all v_1, v_2 ∈ V and λ_1, λ_2 ∈ F, we have

f(λ_1 v_1 + λ_2 v_2) = λ_1 f(v_1) + λ_2 f(v_2).

The proof of the following result is an excellent example of an
‘analysis’-style proof.

Theorem 4.6.6. Let f : V → W be a linear operator between normed
vector spaces (V, ‖·‖_V) and (W, ‖·‖_W). Then f is continuous (everywhere)
if and only if f is continuous at 0.

⋆ Proof (mostly restricted syntax): Clearly the “⟹” direction holds, so
suppose f is continuous at 0. Let a ∈ V with a ≠ 0. We will show that f is
continuous at a.
Suppose ε ∈ R+.
Then there exists δ ∈ R+ such that f(B_δ(0)) ⊂ B_ε(f(0)) since f is continuous at 0.
Then f(B_δ(0)) ⊂ B_ε(0) since f(0) = 0.
(We will show that f(B_δ(a)) ⊂ B_ε(f(a)).)
Suppose x ∈ B_δ(a).
Then ‖x − a‖_V < δ by definition of B_δ(a).
Then x − a ∈ B_δ(0) by definition of B_δ(0).
Then f(x − a) ∈ f(B_δ(0)) by definition of f(B_δ(0)).
Then f(x − a) ∈ B_ε(0) since f(B_δ(0)) ⊂ B_ε(0).
Then ‖f(x − a)‖_W < ε by definition of B_ε(0).
Then ‖f(x) − f(a)‖_W < ε since f is linear.
Then f(x) ∈ B_ε(f(a)) by definition of B_ε(f(a)).
Then f(B_δ(a)) ⊂ B_ε(f(a)) by definition of f(B_δ(a)), since x ∈ B_δ(a) was arbitrary.
Therefore, f is continuous at a. □

Example 4.6.7. If f : V → W is a linear operator between normed vector
spaces (V, ‖·‖_V) and (W, ‖·‖_W), and V has finite dimension d ∈ N, then f
is continuous.

4.7 Function spaces

Let X be a set and let V be a vector space over a field F. As we saw in
Example 4.1.3, the set F(X, V) of all functions from X to V forms a vector
space under the operations of point-wise addition and scalar multiplication.
It forms a normed vector space if we consider just the bounded functions
B(X, V) in F(X, V). We can also consider subspaces of this normed vector
space to give us other interesting normed vector spaces.
Suppose X is a normed vector space. Then we have:

C( X, V ): The continuous bounded functions from X to V.

D( X, V ): The differentiable bounded functions from X to V.

L( X, V ): The linear operators from X to V.

These subspaces will recur in later sections.


In some sense, the opposite of infinite ought to be bounded. We lose too
much of the interesting mathematics and its applications if we only study
the finite structures. We will see in a later section that boundedness is a key
concept that guarantees that certain functions and sequences have limits or
fixed points.

Definition 4.7.1 (Bounded set). A subset S of a normed vector space
[V, ‖·‖] is bounded if it is contained in some ball. That is, there exist x ∈ V
and r ∈ R+ such that S ⊆ B_r(x).

Definition 4.7.2 (Bounded function). A function f from a set X to a normed
vector space [V, ‖·‖] is bounded if its image is a bounded set of [V, ‖·‖].
That is, there exist x ∈ V and r ∈ R+ such that f(X) ⊆ B_r(x).

The set of all bounded functions B(X, V) is a vector subspace of the set
F(X, V) of all functions from X to V (this will be left as an exercise for
those interested in vector spaces). So now we will turn B(X, V) into
a normed vector space by equipping it with an appropriate norm. In the
example below, we will look at the case where V = R, but it can be done in
general if V itself is a normed vector space.

Example 4.7.3 (The ‘sup’ norm). Let X be a set and let f : X → R be
a bounded function. An equivalent definition (prove this!) of f being
bounded is that there exists a positive constant C such that for all x ∈ X,
we have

|f(x)| < C.

By the Least Upper Bound Property of R (Theorem 4.4.3), there exists a
least upper bound for the set {|f(x)| : x ∈ X}. So we can define

‖f‖ := sup{|f(x)| : x ∈ X}.

Let’s check now that it is a norm on bounded functions from X to R:

Non-degeneracy: If f is the zero function 0, then clearly ‖f‖ = 0. Now
suppose we have a bounded function g : X → R such that ‖g‖ = 0. Then
|g(x)| ≤ 0 for all x ∈ X, and so we must have g(x) = 0 for all x ∈ X;
that is, g is the zero function.

Positive homogeneity: Let λ ∈ R and f ∈ B(X, R). Then |λf(x)| =
|λ||f(x)| for all x ∈ X. So it is not difficult to see that sup{|λf(x)| : x ∈
X} = |λ| sup{|f(x)| : x ∈ X} and hence ‖λf‖ = |λ|‖f‖.

Triangle inequality: Let f, g ∈ B(X, R) and let u := ‖f‖ + ‖g‖. Now
for a given element x ∈ X, we have from the triangle inequality for the
absolute value norm, the following:

|(f + g)(x)| = |f(x) + g(x)| ≤ |f(x)| + |g(x)|.

By definition, |f(x)| ≤ ‖f‖ and |g(x)| ≤ ‖g‖ and so

|(f + g)(x)| ≤ u.

Since ‖f + g‖ is the least upper bound of {|(f + g)(x)| : x ∈ X}, we must
have
‖f + g‖ ≤ u.

Example 4.7.4. What is the ‘distance’ (with respect to the sup norm)
between the functions f(x) := x and g(x) := x² on the closed interval
[0, 1]?

‖g − f‖ = sup{|x² − x| : x ∈ [0, 1]}
= sup{x − x² : x ∈ [0, 1]}
= 1/4.

This is because x − x² is a downward-opening parabola that is symmetric
about x = 1/2, and so attains its maximum at x = 1/2.
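We can approximate a sup-norm distance like this one by sampling the interval on a fine grid. The Python sketch below does just that; the grid size and the function name `sup_distance` are my own choices, and a grid maximum only approximates the supremum from below.

```python
def sup_distance(f, g, a, b, samples=100001):
    """Approximate sup{|f(x) - g(x)| : x in [a, b]} on an evenly spaced grid."""
    step = (b - a) / (samples - 1)
    return max(abs(f(a + i * step) - g(a + i * step)) for i in range(samples))


d = sup_distance(lambda x: x, lambda x: x * x, 0.0, 1.0)
print(d)   # should be very close to 1/4
```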

Now we show that a linear map f is continuous if and only if it preserves
bounded sets, in the following sense.

Theorem 4.7.5. Let f : V → W be a linear operator between normed
vector spaces [V, ‖·‖_V] and [W, ‖·‖_W]. Then f is continuous if and only if
there is a constant C > 0 such that for all x ∈ V

‖f(x)‖_W ≤ C · ‖x‖_V.

⋆ Proof: To be done in lectures. □

4.7.1 Aside: Fourier series


One of the most important concepts in applied mathematics and harmonic
analysis is the decomposition of a periodic function into trigonometric
functions. Here, the function space is the set of all functions f : R → R
which are integrable on [−π, π] and 2π-periodic:

f(x + 2π) = f(x), for all x ∈ R.

Such functions with only finitely many discontinuities are square integrable
and so the following defines a norm on this function space, and it is called
the L^2-norm.

$$\|f\| := \sqrt{\frac{1}{2\pi}\int_{-\pi}^{\pi} |f(\theta)|^2 \, d\theta}$$

It turns out that the cosine and sine functions form a basis for this func-
tion space, and the decomposition of f into this basis is known as a Fourier
series.

4.8 Exercises

Exercise 4.8.1. Show that f in Example 4.1.8 is well-defined.


Exercise 4.8.2. Let [V, ‖·‖] be a normed vector space, and consider R
equipped with the Euclidean norm. Show that the function f : V → R
defined by f(x) := ‖x‖ is continuous.

Exercise 4.8.3. Let [V, ‖·‖] be a normed vector space and let V* be the set
of all continuous linear functions from V to R.

[Margin note: A linear map from V to R is traditionally called a functional.
For example, the Riemann integral gives rise to a functional.]

(a) Define addition and scalar multiplication on V* as we have done for
function spaces and verify that V* is a vector space.

(b) Now we show that V* has a natural norm on it. Show that ‖·‖ : V* →
R defined by
‖f‖ := sup{|f(x)| : ‖x‖ ≤ 1}
is a norm on V*.

Exercise 4.8.4. Consider the following function M from R^n to R:

$$M((x_1, x_2, \ldots, x_n)) := \sum_{i=1}^{n} |x_i|.$$

(a) Show that M defines a norm on R^n.

(b) For n = 2, draw a picture of the ball B_1((0, 0)).

Exercise 4.8.5. Show that the continuous bounded functions from X to V
form a subspace of B(X, V).

Exercise 4.8.6. Let g : R^2 → R be the function defined by

g((x, y)) := x − y.

It turns out that g is continuous (everywhere). Let ε = 7. Find the best
possible δ ∈ R+ such that

g(B_δ((2, 1))) ⊆ B_ε(g((2, 1))).

Exercise 4.8.7. Prove that the ‖·‖ in Example 4.3.4 defines a norm.

Exercise 4.8.8. Prove that the ‖·‖_∞ in Example 4.3.3 defines a norm.

Exercise 4.8.9. Let S be the set {1 + 1/n : n ∈ N}. Guess a least upper bound
ℓ for S and prove that it is indeed the supremum of S.

Exercise 4.8.10. Let f : [1, 2] → R be the function with

$$f(x) = \begin{cases} x & \text{if } 1 \le x < 2, \\ 1 & \text{if } x = 2. \end{cases}$$

Show that ‖f‖ = 2.

Exercise 4.8.11. Consider sin and cos restricted to the interval [0, π] (so
they are elements of B([0, π])). Show that ‖sin − cos‖ = √2.
5
Metric Spaces

For the notions of continuity and limit in R or C, it is not the fact that these structures are ordered that makes
it all work, rather it is the notion of distance that prevails. In this chapter, we go beyond normed vector spaces to the
realm of ‘metric spaces’. These are one of the most widely available sources of interesting spaces and shapes that we
can study limits and continuity on.

5.1 Limit of a function and limit of a sequence are really the same
thing

In calculus we meet the definition of a limit of a function on the real
numbers.

lim_{x→a} f(x) = ℓ means . . .

(∀ε > 0)(∃δ > 0)(∀x ∈ R) 0 < |x − a| < δ ⟹ |f(x) − ℓ| < ε.

Then perhaps we learn about sequences of real numbers and what it
means for {s_n} to converge to a real number ℓ.

lim_{n→∞} s_n = ℓ means . . .

(∀ε > 0)(∃N ∈ N)(∀n ∈ N) n > N ⟹ |s_n − ℓ| < ε.

Are these notions of limit different? One of these definitions of a limit


applies to a function f : R → R, and the other applies to a sequence of real
numbers. The problem lies in understanding what a sequence is.
Definition 5.1.1 (Sequence). A sequence of elements of a set X is a function
s : N → X.
We often write s(n) simply as sn . We now use the idea of a ball from
the previous section to recast the two definitions above. Here, the normed
vector space we are contending with is simply the real numbers equipped
with the absolute value norm.
Definition of limit of f using balls:
(∀B_ε(ℓ))(∃B_δ(a) \ {a}) f(B_δ(a) \ {a}) ⊆ B_ε(ℓ).

Definition of limit of s using balls:

(∀B_ε(ℓ))(∃(N, ∞) ∩ N) s((N, ∞) ∩ N) ⊆ B_ε(ℓ).

What we will see later is that both of these definitions are of the form

(∀B)(∃A) f ( A) ⊆ B
where A and B are suitably defined sets (like a punctured open set).
Using the ball-definition allows us to be flexible with the space we are
working in, and what notion of distance or closeness that we can choose to
adopt.

5.2 A good definition?

In the last chapter, we explored continuity of functions on normed vector
spaces and gave the “ball”-definition there. The definition of continuity
comes about from thinking of drawing the function without lifting your
pen. Lifting your pen and moving to another spot would violate continuity.
If the function was ‘broken’ as we see in the figure, then we could find
an interval of R for which there is an unresolvable gap in the image of f.
So a function is NOT continuous if there is some a ∈ R and some width
ε ∈ R+ such that
f(B_δ(a)) ⊄ B_ε(f(a))
for every δ ∈ R+. We saw this in Example 4.6.3. What about the definition
of a limit?

[Figure 5.1: A function f : R → R that is not continuous.]
Many mathematicians of the late 18th century toiled hard to find a definition
for the limit of a function. They wanted to delineate the ‘good functions’
in calculus from the ‘bad ones’. The notion of a limit is critical in
distilling the infinitesimal approximation of a rate of change; the derivative
of a function. It is also vital in a water-tight theory of integration and in
order to have a Fundamental Theorem of Calculus (i.e., a way to invert
differentiation).

[Footnote 1: Indeed, Newton and Leibniz did not have a precise definition of a
limit and thought that you could get by without having one! The ensuing 150
years of mathematical development proved them wrong.]
Example 5.2.1. The series

$$1 + \frac{1}{1 \times 2} + \frac{1}{2 \times 3} + \frac{1}{3 \times 4} + \cdots$$

does have a finite sum, namely the number 2. Do you see why? (Hint: Take
the difference of this series with a similar one that starts at the second
term).
Now the famous series 1 + 1/4 + 1/9 + 1/16 + 1/25 + · · · (the sum of the
reciprocals of the squares, not to be confused with the harmonic series
1 + 1/2 + 1/3 + · · ·, which diverges) was shown to have a finite sum by Euler.
This is because this series is bounded term for term by the one we have
above. It turns out that the value of it is π²/6, but to show this rigorously
requires a precise definition of a limit.
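Partial sums make both claims believable before we have any theory of limits. Here is a short Python sketch (function names are my own); the first series is summed exactly with rational arithmetic, the second with floating point, so the second comparison is only approximate.

```python
from fractions import Fraction
import math


def telescoping_partial(m):
    """1 + sum of 1/(k(k+1)) for k = 1..m, computed exactly."""
    return 1 + sum(Fraction(1, k * (k + 1)) for k in range(1, m + 1))


def squares_partial(m):
    """Sum of 1/k^2 for k = 1..m, as a float."""
    return sum(1.0 / (k * k) for k in range(1, m + 1))


print(telescoping_partial(10))                  # equals 2 - 1/11
print(squares_partial(100000), math.pi ** 2 / 6)

# Term-for-term comparison: 1/k^2 <= 1/((k-1)k) for k >= 2.
print(all(Fraction(1, k * k) <= Fraction(1, (k - 1) * k) for k in range(2, 1000)))
```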
It was Bernhard Bolzano (1817), Augustin-Louis Cauchy (1821), and
Karl Weierstrass (1859/60) who made precise the definition of a limit as we
know it today.

“When the values successively attributed to a particular variable approach
indefinitely a fixed value so as to differ from it by as little as one wishes,
this latter value is called the limit of the others.”
–Augustin-Louis Cauchy.

Example 5.2.2 (A sequence in R^2). Define a sequence in R^2 by

$$x_n := \left(\frac{n}{n+1}, \frac{1}{n}\right).$$

Does this sequence converge to something?
By the graph, it seems that the sequence gets closer and closer to the
point (1, 0). Let’s try and prove this. Suppose ε > 0. On the back of an
envelope, I worked out that choosing N > √2/ε will do the job, as you will
now see.

[Figure 5.2: The sequence {x_n}, where only the values of the sequence are
plotted (and the domain isn’t).]

Proof. Suppose ε > 0. Choose N to be an integer greater than √2/ε, and
suppose n is an integer such that n > N.
Then n > √2/ε as N > √2/ε.
Then 2/n² < ε² by rearranging the inequality.
Then 1/n² + 1/n² < ε² since 2 = 1 + 1.
Then 1/(n + 1)² + 1/n² < ε² since 1/(n + 1)² < 1/n².
Then √(1/(n + 1)² + 1/n²) < ε by taking square roots of each side.
Then ‖(−1/(n + 1), 1/n)‖ < ε by definition of the Euclidean norm.
Then ‖(n/(n + 1) − 1, 1/n)‖ < ε since n/(n + 1) − 1 = n/(n + 1) − (n + 1)/(n + 1) = −1/(n + 1).
Then ‖x_n − (1, 0)‖ < ε by definition of x_n.
Therefore, there exists an integer N such that if n > N, then
‖x_n − (1, 0)‖ < ε. Therefore, {x_n} converges to (1, 0). □
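Here is the same back-of-the-envelope strategy played out numerically. In this Python sketch the helper names are hypothetical, and N is taken to be the smallest integer greater than √2/ε; checking a couple of thousand terms is a sanity check only.

```python
import math


def x(n):
    """The n-th term of the sequence in R^2."""
    return (n / (n + 1), 1 / n)


def dist_to_limit(n):
    """Euclidean distance from x_n to the claimed limit (1, 0)."""
    a, b = x(n)
    return math.hypot(a - 1, b)


def threshold(eps):
    """The smallest integer N with N > sqrt(2)/eps."""
    return math.floor(math.sqrt(2) / eps) + 1


eps = 0.01
N = threshold(eps)
print(N, all(dist_to_limit(n) < eps for n in range(N + 1, N + 2001)))
```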

Now we will look at generalising the notion of distance one step further
than norms on vector spaces. We will throw away the linear structure so
that we are left with a metric on a set.

5.3 Metrics

A norm ‖·‖ can be used to find the distance between two elements x and y
of a vector space V by simply computing ‖x − y‖. This gives us a function
that takes pairs from V × V and gives them a non-negative value in R. We
will see that this idea of measuring distance can be extended beyond norms
on vector spaces; we just need a similar type of function on a set X which
has the same properties as the example arising from a norm.

[Margin note: The great analyst Maurice Fréchet (1878 – 1973) introduced the
idea of a metric on a set in his doctoral dissertation (Sur quelques points du
calcul fonctionnel: “On some points of functional calculus”), though the term
metric was first coined by Hausdorff.]

Definition 5.3.1 (Metric). Given a nonempty set X, a function d : X × X →
R is a metric if it satisfies the following axioms:

(i) d(x, y) ≥ 0 for all x, y ∈ X,

(ii) d(x, y) = 0 if and only if x = y,

(iii) d(x, y) = d(y, x) for all x, y ∈ X,

(iv) d(x, y) ≤ d(x, z) + d(z, y) for all x, y, z ∈ X.

We say that [X, d] is a metric space.

Notice that the function defined by d(x, y) := ‖x − y‖ arising from a
normed vector space [V, ‖·‖] satisfies these properties directly from the
definition of a norm.

Example 5.3.2 (A metric not arising from a norm; the discrete metric). Let
X be a set and define d : X × X → R by

$$d(x, y) := \begin{cases} 1 & \text{if } x \neq y, \\ 0 & \text{if } x = y. \end{cases}$$

Clearly d satisfies (i), (ii) and (iii) of Definition 5.3.1. For the triangle
inequality, part (iv), let us consider x, y, z ∈ X. If x = y, then clearly
d(x, y) ≤ d(x, z) + d(z, y) holds as d(x, y) = 0 in this case. So suppose
x ≠ y. If y ≠ z, then

d(x, y) = 1 ≤ d(x, z) + 1 = d(x, z) + d(z, y).

So assume now that y = z. We cannot have x = z since otherwise x = y.
Therefore, d(z, y) = 0 and d(x, z) = 1, giving us d(x, z) + d(z, y) = 1 =
d(x, y). Hence in all cases, we have d(x, y) ≤ d(x, z) + d(z, y).
Almost everything that works for normed vector spaces can be ex-
tended to metric spaces. The canonical open sets, the balls of normed
vector spaces, have a direct analogue in the theory of metric spaces.
Definition 5.3.3 (Balls of metric spaces). Let x be an element of a metric
space [ X, d ]. Then the (open) ball of radius r ∈ R+ about the element x is
the set
Br ( x) := {y ∈ X : d ( x, y) < r}.

Example 5.3.4 (Manhattan metric). The Manhattan metric d M on R2 is


defined by
d M (( x1 , x2 ), (y1 , y2 )) := |x1 − y1 | + |x2 − y2 |
where | · | is the absolute value norm on R.
What does a ball look like?
So we can extend the notion of a limit in a normed vector space to the
context of metric spaces.
Definition 5.3.5 (Limit of a sequence in a metric space). Let [X, d] be a
metric space, let {s_n} be a sequence in X, and let x ∈ X. We say that x is
the limit of {s_n} or that {s_n} converges to x if for every ε ∈ R+, there exists
N ∈ N such that
n > N ⟹ s_n ∈ B_ε(x).
Example 5.3.6 (Post-Office metric). The Post-Office metric d_p on R^2 is
defined by

$$d_p(x, y) := \begin{cases} \|x\| + \|y\| & \text{if } x \neq y, \\ 0 & \text{if } x = y, \end{cases}$$

where ‖·‖ is the Euclidean norm. What does a ball look like here?

B_r(x) = {y ∈ R^2 : d_p(x, y) < r}
= {y ∈ R^2 : ‖y‖ < r − ‖x‖ or x = y}
= {y ∈ R^2 : ‖y‖ < r − ‖x‖} ∪ {x}.

[Figure 5.3: The ball B_{3/2}((1, 0)). Notice that it looks like a ‘Euclidean’
ball around the origin plus an extra point, namely, (1, 0).]

In particular, the sequence from Example 5.2.2 does not converge to
(1, 0). It turns out (see Exercise 5.6.1) that this metric does not arise from a
norm!
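To see why convergence to (1, 0) fails, we can compute the Post-Office distance from the terms of the sequence in Example 5.2.2 to (1, 0). The Python sketch below uses hypothetical helper names; the point is that every such distance is ‖x_n‖ + 1, which never drops below 1.

```python
import math


def norm(p):
    """Euclidean norm on R^2."""
    return math.hypot(p[0], p[1])


def post_office(p, q):
    """Post-Office metric: all mail travels via the origin, unless p = q."""
    return 0.0 if p == q else norm(p) + norm(q)


def x(n):
    """The sequence from Example 5.2.2."""
    return (n / (n + 1), 1 / n)


limit = (1.0, 0.0)
for n in (1, 10, 100, 1000):
    print(n, post_office(x(n), limit))   # stays above 1, so x_n never enters B_1((1, 0))
```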

Example 5.3.7 (Radar-screen metric). Let [V, ‖·‖] be a normed vector
space. The radar-screen metric d_r for [V, ‖·‖] is defined by

d_r(x, y) = min{1, ‖x − y‖}.

What does a ball look like? Does the sequence from Example 5.2.2
converge? Does this metric arise from a norm?

Example 5.3.8 (Induced metric). Let S be a subset of a metric space [X, d].
Then d induces a metric d_S on S, where we define d_S to be identical to
d on S.

5.4 Boundedness

Just as we did for normed vector spaces, we can readily extend the notion
of a bounded set, a bounded function or a bounded sequence to metric
spaces. A subset W of a metric space [X, d] is bounded if it is contained in
some ball; that is, there exist a ∈ X and r ∈ R+ such that
W ⊆ B_r(a).
Likewise, a sequence is bounded if the whole sequence is contained in
some ball. Finally, a function whose codomain is a metric space is bounded
if its image is contained in a ball.

Example 5.4.1. Every nonempty subset S of a discrete metric space [ X, d ]


is bounded. Just take an element s ∈ S and the ball B2 ( s). Then every
element of X is contained in B2 ( s).

Example 5.4.2. The set of natural numbers N is unbounded in R with


respect to the Euclidean metric. There is no open interval big enough that
contains N.

The brilliant insight of Fréchet and Hausdorff was that metric spaces share
many of the properties of Euclidean space.
For example, convergent sequences have unique limits and such
sequences do not ramble off to infinity. In the generality of topological
spaces, this is not true!

Theorem 5.4.3. In a metric space [ X, d ], a convergent sequence has a


unique limit.

? Proof: To be done in lectures. 

Theorem 5.4.4. In a metric space [ X, d ], a convergent sequence is bounded.

? Proof: To be done in lectures. 

5.5 Cauchy sequences

The Babylonians (1800–1600 BC) had a sophisticated method for finding
√2. First, it was known that √2 is the length of the side of a square that
has area 2, and their idea was to approximate such a square with rational
rectangles. Start with a 2 × 1 rectangle (it has an area of 2) and draw the line 'y = x'. This line
meets the other side of the rectangle at the point with x-coordinate 1, and so we take the
average of this x-coordinate with the width of the rectangle: 3/2. The next
rectangle we draw has width 3/2 and height 2/(3/2) = 4/3. This time, the diagonal
meets the top side of the rectangle in a point with x-coordinate equal to 4/3,
and so we take the average of this x-coordinate with the width of the rectangle:
(4/3 + 3/2)/2 = 17/12. Continuing in this way, the rectangle quickly converges
to a square with side lengths equal to √2.

[Figure: the first three rectangles, with widths 2, 3/2, 17/12 and heights 1, 4/3, 24/17.]

If we record the widths of these rectangles (from the second rectangle onwards, as x_2, x_3, . . .), we obtain a sequence:

x_{n+1} := x_n/2 + 1/x_n, x_1 = 1.
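
The recursion can be run exactly in rational arithmetic. The following is a minimal Python sketch, not part of the original notes; the function name babylonian is illustrative. It lists the first few terms together with how far their squares are from 2.

```python
from fractions import Fraction

def babylonian(n_terms):
    """First n_terms of x_{n+1} = x_n/2 + 1/x_n with x_1 = 1, as exact rationals."""
    x = Fraction(1)
    terms = [x]
    for _ in range(n_terms - 1):
        x = x / 2 + 1 / x
        terms.append(x)
    return terms

for x in babylonian(5):
    # every term is rational, yet x^2 - 2 shrinks very quickly
    print(x, float(x * x - 2))
```

Every term stays inside Q, so the sequence never reaches a number whose square is 2; it only gets closer and closer to itself.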

If we were living in a world only of rational numbers (as the Babylonians
did), then we would not know what this sequence is converging to.
But we do know that whatever it is, its square is equal to 2. Notice that if it
happened that x_1^2 = 2, then we would have

x_2^2 = (x_1/2 + 1/x_1)^2 = x_1^2/4 + 1 + 1/x_1^2 = 1/2 + 1 + 1/2 = 2.

That is, the sequence would be a constant sequence where every element
would have square equal to 2. This is an example of a Cauchy sequence,
or if you like, convergence of a sequence without a designated limit. The
sequence gets closer and closer to itself! More mathematically, no matter
what window we allow, at some point, the envelope of the sequence has a
width less than that of the window.

Definition 5.5.1 (Cauchy sequence). Let [X, d] be a metric space and
let {x_n} be a sequence in this metric space. We say that {x_n} is a Cauchy
sequence if for every ε ∈ R+, there is a number N ∈ N such that for all
n, m,
n, m > N ⟹ d(x_n, x_m) < ε.

Example 5.5.2. Consider the metric space on the open unit interval (0, 1)
with the Euclidean metric d. We will show that the following sequence is a
Cauchy sequence, but it has no limit in (0, 1):
x_n := 1 − 1/10^n.
Let ε ∈ R+. Choose N to be a natural number greater than log_10(1/ε) and suppose
n > m > N.
Then m > log_10(1/ε).
Then 1/10^m < ε.
Then 1/10^m − 1/10^n < ε.
Then (1 − 1/10^n) − (1 − 1/10^m) < ε.
Then |x_n − x_m| < ε.
Then d(x_n, x_m) < ε.
Therefore, {x_n} is a Cauchy sequence.

Example 5.5.3. Back to our original example. The metric space is [Q, d ]
where d is the Euclidean metric, and the sequence is

xn+1 := xn /2 + 1/xn , x1 = 1.

Let ε ∈ R+. What should we choose for N so that n, m > N ⟹
d(x_n, x_m) < ε?

Theorem 5.5.4 (Convergent implies Cauchy implies bounded). Let {sn } be


a sequence in a metric space [ X, d ].

(i) If {sn } is convergent, then it is also a Cauchy sequence.

(ii) If {sn } is a Cauchy sequence, then it is bounded.

? Proof: To be done in lectures. 

5.6 Exercises

Exercise 5.6.1. Show (by proof by contradiction) that the Post-Office metric
(Example 5.3.6) does not arise from a norm.
Exercise 5.6.2. Here we explore the Manhattan metric on R^n:

d(x, y) := Σ_{i=1}^{n} |x_i − y_i|

where x_i is the i-th component of x, and similarly for y.

(i) Prove that d defines a metric on R^n. (You may assume that the Euclidean norm
on R induces a metric.)
(ii) Assume now that n = 2. Draw the unit ball B1 (0) around the origin,
with respect to d.
(iii) Consider the sequence

s_n := (1 + (−1)^n/n, (−1)^n + 1/n), n = 1, 2, 3, . . .

(a) Plot the first ten points of the sequence.


(b) Let ℓ := (1, 0). Find a positive number ε so that for every n,

d(s_n, ℓ) > ε.

What have we shown?

Exercise 5.6.3. Consider the following sequence

x_n := − Σ_{i=1}^{n} (−1)^i · 4/(2i − 1).

(a) Compute x1 , x2 , x3 , x4 and x5 .

(b) It looks like this sequence is convergent. What do you think this se-
quence is convergent to?

(c) Simplify
|x_{m+1} − x_m|.

(d) Let m, n ∈ N and suppose n > m. Show that

|x_n − x_m| ≤ |x_{m+1} − x_m|.

(e) Suppose ε = 0.1. Find N ∈ N such that for all n, m > N,

|x_n − x_m| < ε.

Exercise 5.6.4. In this exercise, we will show that the sequence {1/n} is a
Cauchy sequence in the open interval (0, 2) (w.r.t., the Euclidean metric).

(i) Write the start of the proof.

(ii) Suppose, without loss of generality, that n > m > N. Show that

|1/m − 1/n| ≤ 1/m

and hence make a suitable, and simple, choice for N in terms of ε.

(iii) Write the remainder of your proof, starting with “Choose N = . . . and
suppose m, n ∈ N such that m, n > N. Then ...”.

Exercise 5.6.5. Give an example of a sequence {xn } of real numbers (w.r.t.,


the Euclidean metric) such that

(∀ε ∈ Q+)(∃N ∈ N)(∀n ∈ N) n > N ⟹ |x_{n+1} − x_n| < ε

but is NOT a Cauchy sequence.

5.7 Complete metric spaces

Normed vector spaces are an important framework for the applications
of mathematics in science and engineering, and in order to have
'calculus' work on normed vector spaces, we need the notion of a complete
normed vector space. These are also known as Banach spaces after
the great Polish mathematician Stefan Banach (1892–1945). The Fréchet
derivative on a normed vector space is then defined as a generalisation
of the usual derivative that one encounters with real-valued functions. In
particle physics and quantum mechanics, the notion of a Hilbert space is
vitally important: it is a Banach space (usually over the complex numbers)
whose norm arises from an inner product.
Suppose we want to find a solution to the following differential equation:

f′(x) = (1/4)(f(x)^2 + x^2), f(0) = 0,

where f is a function from the closed interval [−1/2, 1/2] to R.

• How do we know that this differential equation has a solution?

• Can the solution be written out nicely in terms of functions we already


know?

By the end of this chapter, we will have a solution to this question that
uses the theory of complete metric spaces and contraction maps.

Definition 5.7.1 (Complete metric space). A metric space [ X, d ] is com-


plete if every Cauchy sequence in [ X, d ] converges (to an element of [ X, d ]).

If X is a normed vector space and is complete, then we call it a Banach


space. Throughout, we will denote the Euclidean metric by dE .

Example 5.7.2. [Q, dE ] is not complete. Consider the sequence in Exam-


ple 5.5.3:
xn+1 := xn /2 + 1/xn , x1 = 1.
This is a Cauchy sequence and it gives rational approximations to the real
number √2. However, √2 is irrational and so this Cauchy sequence does
not converge in [Q, d_E].

Example 5.7.3. [R^n, d_E] is complete, but we will postpone a proof of this
until later. In fact, every finite-dimensional normed vector space over R is complete.

Example 5.7.4. [(0, 2), dE ] is not complete. Consider the following Cauchy
sequence:
xn := 2 − 1/n.
The sequence is increasing but has no limit in (0, 2).

Example 5.7.5. A discrete metric space is always complete. Let [ X, d ] be


a discrete metric space and let {xn } be a Cauchy sequence in [ X, d ]. Then
there exists N ∈ N such that

d ( xn , xm ) < 1

for all m, n > N. Moreover,

d ( xn , xN +1 ) < 1

for all n > N which means that xn = xN +1 (for all n > N) by definition of
the discrete metric. So we see that {xn } converges to xN +1 .

5.8 The ring of Cauchy sequences

We have already seen the vector space of all functions F ( X, V ) from a set
X to a vector space V, and that a sequence in a metric space [ X, d ] is just a
function N → X. Let us consider just the metric space [Q, dE ] and the set
of all Cauchy sequences R in [Q, dE ]. We will show that R is a ring under
suitably defined operations of addition and multiplication.

Addition of Cauchy sequences: Given two Cauchy sequences {xn } and


{yn } in [Q, dE ], define a new sequence {zn } by the term-by-term sum:
zn := xn + yn .

Multiplication of Cauchy sequences: Given two Cauchy sequences {xn }


and {yn } in [Q, dE ], define a new sequence {zn } by the term-by-term
product: zn := xn · yn .

The two constructions above yield Cauchy sequences.

Lemma 5.8.1. The term-by-term sum and product of two Cauchy se-
quences of [Q, dE ] are also Cauchy sequences.

 Proof: To be done in lectures. 

The zero-Cauchy sequence is just the constant sequence 0 and the unit-Cauchy
sequence is just the constant sequence 1. That is, 0 is the sequence 0, 0, 0, . . .,
and 1 is the sequence 1, 1, 1, . . .. We leave it as an exercise
to verify that these two sequences serve as additive and multiplicative
identities for R (respectively).

Definition 5.8.2 (Null sequence). A sequence {xn } in [Q, dE ] is null if it


converges to 0.

By Theorem 5.5.4, a null sequence is a Cauchy sequence. In fact, the set


of all null sequences not only forms a subring of R, but it is an ideal.

Theorem 5.8.3. The set of all null sequences of [Q, dE ] forms an ideal of
R.

? Proof: To be done in lectures. 

This means that we can take a quotient of R by the null sequences. In


this course, this means that there is a nice equivalence relation on R, which
we will use to define the real numbers.

5.9 Construction of the real numbers

Definition 5.9.1 (Equivalence of rational Cauchy sequences). For two


Cauchy sequences {xn } and {yn } in [Q, dE ], we say that they are equivalent
and write {xn } ∼ {yn } if and only if the term-by-term difference {xn } − {yn } is
a null sequence.

Example 5.9.2. The sequence {1/n} is equivalent to {−1/n}, which is in


turn equivalent to the zero-sequence!

Lemma 5.9.3. The relation ∼ on R is an equivalence relation.

? Proof: To be done in lectures. 

Now for the construction of the real numbers:

The real numbers R are defined to be the equivalence classes of ∼.

So what is Q as a subset of R?

Lemma 5.9.4. The function x ↦ x̄, where x̄ is the constant sequence
x, x, x, . . ., is a one-to-one function from Q to R. Moreover, if x, y ∈ Q, then
x̄ ∼ ȳ if and only if x = y.

Example 5.9.5. The only real numbers representable by Cauchy sequences
of integers are the integers! Let's see why this is true. Suppose we have a
Cauchy sequence {x_n} of integers. Then the difference of any two unequal
members of the sequence is at least 1. So let ε = 1/2. Since {x_n} is a Cauchy
sequence, there is some N ∈ N such that for every n, m > N we have
|x_n − x_m| < ε = 1/2. This condition forces x_n = x_m (as they are integers),
which in turn implies that the subsequence {x_n : n > N} is the constant
sequence {x_{N+1}}.

Figure 5.4: A Cauchy sequence of integers will eventually become a constant sequence.

5.10 Arithmetic on R

Now that we have finally defined the real numbers by simpler things (i.e.,
rational numbers), we can review how addition and multiplication are
defined on real numbers in the way we have defined them. In fact, we have
essentially already seen how arithmetic works on R when we defined the
term-by-term sums and products of Cauchy sequences.

Addition on R: Given two real numbers [{x_n}] and [{y_n}] (remember, a real
number is an equivalence class of Cauchy sequences of rationals), define

[{x_n}] + [{y_n}] := [{x_n + y_n}].

Multiplication on R: Given two real numbers [{xn }] and [{yn }], define

[{xn }] · [{yn }] := [{xn yn }].

Are these operations well-defined? Let us look at addition and leave


multiplication as an exercise. Suppose we have [{x_n}] = [{x′_n}] and [{y_n}] =
[{y′_n}]. Then {x_n} and {x′_n}, and {y_n} and {y′_n}, differ by null sequences:

• {x′_n} = {x_n} + {u_n}, where {u_n} converges to 0;

• {y′_n} = {y_n} + {v_n}, where {v_n} converges to 0.

So
{x′_n} + {y′_n} = {x_n} + {y_n} + {u_n} + {v_n}

and {u_n} + {v_n} is also a null sequence as the set of null sequences is a
subring of the Cauchy sequences. Therefore,

{x′_n} + {y′_n} ∼ {x_n} + {y_n}

and hence
[{x′_n}] + [{y′_n}] = [{x_n}] + [{y_n}].
So addition is well-defined.
In fact, it would then be not difficult to show (though tedious) that R
forms a ring under these operations, as we would hope it would! To show
that R is a field requires showing that every nonzero element has an in-
verse, which we have left as Exercise 5.18.4.

5.11 What does 0.99999 . . . mean?

A decimal expansion such as 0.717171717 · · · can be realised from a
sequence of rational numbers:

0.71, 0.7171, 0.717171, . . . .

In other words, the real number we have written above is the equivalence
class of the sequence of partial sums
{71/10^2 + 71/10^4 + · · · + 71/10^{2n}}.

It turns out that this real number is actually rational, since the n-th partial
sum equals (71/99)(1 − 1/10^{2n}) and so

{71/10^2 + 71/10^4 + · · · + 71/10^{2n}} ∼ {71/99}.

What about 0.999999 . . .? This is really the real number whose Cauchy
sequence of rationals is
{1 − 1/10^n}.

Consider the unit-sequence 1 = {1}. If we look at the term-by-term
difference of these two sequences, we obtain

{1/10^n}

which converges to 0; it is a null-sequence! So we formally consider the
two sequences {1 − 1/10^n} and 1 to be the same: they are equivalent
Cauchy sequences of rational numbers and so represent the same real
number. That is,
1 = 0.99999 · · · .
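
The two null sequences in this section can be inspected with exact rational arithmetic. The following is a minimal Python sketch, not part of the original notes; the helper names nines and partial_71 are illustrative. It prints the gap between each truncated decimal and the rational number it represents.

```python
from fractions import Fraction

def nines(n):
    """0.99...9 with n nines, as an exact rational: 1 - 1/10^n."""
    return 1 - Fraction(1, 10 ** n)

def partial_71(n):
    """0.7171...71 with n blocks of '71', as an exact rational."""
    return sum(Fraction(71, 100 ** k) for k in range(1, n + 1))

for n in (1, 2, 5, 10):
    # both gaps are positive rationals that shrink towards 0
    print(n, 1 - nines(n), Fraction(71, 99) - partial_71(n))
```

Neither gap is ever 0 at a finite stage; what matters for equality of real numbers is only that the sequence of gaps is null.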

5.12 Aside: other ways to construct R

There are two other mainstream ways of defining the real numbers in math-
ematics. One is the famous ‘Dedekind cuts’ method, the other is more
abstract and arises in the theory of field extensions. The latter can be found
in a book on ring theory and you would find it under the topic of transcen-
dental extensions of the rational numbers. We will explain here the former
and easier notion of Dedekind cuts (see also page 194 of Liebeck’s book).
A nonempty proper subset X of the rational numbers Q is called a Dedekind cut
if it satisfies the following conditions:

• for any x ∈ X, we have that X contains all the rationals less than x;

• X has no maximum.

For example, X := {x ∈ Q : x < 2/5} is a Dedekind cut.


Now for the construction of the real numbers:

The real numbers R are the set of all Dedekind cuts of Q.

So what is Q as a subset of R? Well, for q ∈ Q, the Dedekind cut
X_q := {x ∈ Q : x < q} represents q. We can make this notion stronger
by noting that the map q ↦ X_q is a one-to-one function. Moreover, real
numbers like √2 are defined as

{q ∈ Q : q^2 < 2 or q ≤ 0}.

We can also define addition and multiplication on R.

Addition on R: Given two real numbers X and Y (remember, a real number in
this small section is a Dedekind cut of rationals), define

X + Y := {x + y : x ∈ X, y ∈ Y}.

One can show that this operation gives us a Dedekind cut.



Multiplication on R: This one is more complicated. Given two positive


real numbers X and Y, define

X · Y := Q− ∪ {0} ∪ {xy : x ∈ X ∩ Q+ , y ∈ Y ∩ Q+ }.

Something similar can be done for non-positive real numbers. Again, it


is not difficult to see that this operation gives us a Dedekind cut.
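
One way to get a feeling for how a Dedekind cut relates to a Cauchy sequence is to bisect between a rational inside the cut and a rational outside it. The following is a minimal Python sketch under that assumption, not part of the original notes; the names in_sqrt2_cut and bisect_cut are illustrative, and the cut used is the one for √2 given earlier in this section.

```python
from fractions import Fraction

def in_sqrt2_cut(q):
    """Membership test for the Dedekind cut {q in Q : q^2 < 2 or q <= 0}."""
    return q <= 0 or q * q < 2

def bisect_cut(in_cut, lo, hi, steps):
    """Repeatedly average a rational inside the cut (lo) with one outside it (hi)."""
    assert in_cut(lo) and not in_cut(hi)
    for _ in range(steps):
        mid = (lo + hi) / 2
        if in_cut(mid):
            lo = mid   # the midpoint is still inside the cut
        else:
            hi = mid   # the midpoint is outside the cut
    return lo, hi

lo, hi = bisect_cut(in_sqrt2_cut, Fraction(1), Fraction(2), 20)
print(float(lo), float(hi), hi - lo)
```

The gap between the two rationals halves at every step, so the successive values of lo stay within ever smaller distances of each other, which is exactly the Cauchy condition.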
So why does it give us a structure that is the same as the set of equivalence
classes of Cauchy sequences? This question does not have a short
answer, though the interested reader may want to delve deeper to find out
why these two definitions are equivalent ways to construct R.

From a Dedekind cut X, select an element x_1 ∈ X and an element y_1 ∈ Q\X. Now
define for each n ≥ 2 the following two coupled sequences where we take averages
at each step:

x_n := (x_{n−1} + y_{n−1})/2 if (x_{n−1} + y_{n−1})/2 ∈ X, and x_n := x_{n−1} otherwise;
y_n := (x_{n−1} + y_{n−1})/2 if (x_{n−1} + y_{n−1})/2 ∉ X, and y_n := y_{n−1} otherwise.

It turns out that {x_n} is a Cauchy sequence of rational numbers.

5.13 The topology of metric spaces

In the study of topology, which you will see more of in 3rd year, the key
notion is that of an open set. This is yet another level of abstraction of
"closeness" and allows one to analyse interesting spaces that are not metric
spaces. We will see how this works in metric spaces as preparation for a
course in point-set topology.

Definition 5.13.1 (Open set). A subset S of a metric space [ X, d ] is open if


and only if S is a union of open balls.

In other words, every point x ∈ S lies inside a ball Br ( x) ⊆ S .

Example 5.13.2. The trivial examples of open sets are the empty set ∅ and
the whole metric space X itself. An open ball itself is open.

Example 5.13.3. Subspaces of metric spaces can offer counter-intuitive
ways to create open sets. Take for instance the closed interval [0, 1] equipped
with the Euclidean metric. Then the subset [0, 1/2) is an open set! To see this,
suppose we have an element x ∈ [0, 1/2). Then either x ∈ (0, 1/2), and we
can easily find an open ball centered at x contained in [0, 1/2), or x = 0. If
x = 0, then we take B_{1/4}(x), which is just [0, 1/4) and is contained in [0, 1/2).

Example 5.13.4. In a discrete metric space, every singleton subset is open.


Take a singleton subset {x}. Then the only element of this set is x and the
ball B_{1/2}(x) is just {x}!

The next lemma tells us that in a discrete metric space, every subset is
open. The two properties outlined in the lemma essentially give us the ax-
ioms of a topological space, which you will meet in 3rd year mathematics.

Lemma 5.13.5. Let [ X, d ] be a metric space.

(a) An intersection of finitely many open sets of [ X, d ] is an open set.

(b) Any union of open sets of [ X, d ] is an open set.

 Proof:

(a) Let {O_1, O_2, . . . , O_n} be a finite collection of open sets in [X, d]. Let
x ∈ ∩_{i=1}^{n} O_i. So x ∈ O_i for every i. Then for each i, there is a ball B_{ε_i}(x)
contained in O_i. Now {ε_i : i = 1, 2, . . . , n} is a finite set of positive real numbers
and so there exists a minimum value, say ε_min, of this set. Then B_{ε_min}(x)
is contained in each O_i, and therefore, is contained in ∩_{i=1}^{n} O_i. Thus,
∩_{i=1}^{n} O_i is open.

(b) Let {O_i : i ∈ I} be a collection of open sets of [X, d], where I is just
an index set. Let x ∈ ∪_{i∈I} O_i. So there exists i ∈ I such that x ∈ O_i.
Since O_i is open, there exists a ball B_ε(x) contained in O_i. This ball
is certainly contained in ∪_{i∈I} O_i, and so we have shown that ∪_{i∈I} O_i is
open.

A closed set is something like a closed interval, such as [0, 1]. One of
the most common mistakes of students is that they think that the opposite
of open is closed, as we would in normal everyday language. However,
in mathematics, this is not true! We will see examples of sets which are
neither open nor closed, and examples which are both open and closed.

Definition 5.13.6 (Closed set). A subset C of a metric space [ X, d ] is closed


if and only if its complement X\C is open.

Example 5.13.7. A set which is both closed and open is said to be clopen.
For example, the subset
{x ∈ Q : x^2 > 2}

of [Q, dE ] is clopen. We will leave the proof of this to Exercise 5.18.6.

One of the most useful results in the theory of metric spaces is the char-
acterisation of closed subsets by convergent sequences.

Lemma 5.13.8. Let C be a closed subset of a metric space [ X, d ]. Then


every convergent sequence {xn } of elements of C has its limit in C.

 Proof: To be done in lectures. 

And conversely . . .

Lemma 5.13.9. Let Y be a subset of a metric space [ X, d ] such that every


convergent sequence {y_n} of elements of Y has its limit in Y. Then Y is closed.

 Proof: To be done in lectures. 

The next result will be important when we look at sequences of functions,
and is one of the first key results in the theory of Hilbert spaces.

Theorem 5.13.10. Let [ X, d ] be a metric space and let Y be a subset of X.

(i) If [ X, d ] is complete and Y is closed then [Y, d Y ] is complete.

(ii) If [Y, d Y ] is complete, then Y is closed.

 Proof: To be done in lectures. 



5.14 Continuity revisited

Recall from Definition 1.7.15 that if f : X → Y is a function and S ⊆ Y,
then the preimage of S under f is the subset of X defined by

f ← (S ) := {x ∈ X : f ( x) ∈ S }.

The topological definition of continuity does not need ε or δ, and is a beautiful
and succinct way to define a continuous function on metric spaces:
the preimage of any open set is open. We prove now that the topological
definition fits with our usual notion of continuity.

Theorem 5.14.1. Let [ X, d ] and [Y, e] be two metric spaces and let f : X →
Y be a function. Then f is continuous if and only if the preimage of any
open set of Y is an open set of X.

 Proof: To be done in lectures. 

5.15 Function spaces as metric spaces

Consider the following functions on the closed interval [0, 1]:

f_n(x) := x^n, n ∈ N.

The functions seem to converge to another function:

f(x) := 1 if x = 1,
f(x) := 0 otherwise.

Figure 5.5: Graphs of f_n where n is a power of 2.

But what does convergence of functions mean? The problem is that we
have drawn the functions, so what we are seeing is that the graph of f_n
converges to the graph of f. In other words,

for every x, the limit of f_n(x) as n → ∞ is f(x).

This is what we call point-wise convergence. We see here that with this
notion of convergence, the limit of a sequence of continuous functions is a
discontinuous function!
We shall explore now the notion of uniform-convergence of functions
which preserves continuity. The difference between the two forms of con-
vergence is that instead of thinking about the values of fn and what they
tend to, we just look at the functions themselves.

Example 5.15.1. Consider the set of bounded functions B([0, 1], R) on


the closed interval [0, 1]. We have seen in Section 4.7 that B([0, 1], R) is
a normed vector space when equipped with the ‘sup’-norm (see Example
4.7.3). This norm gives us the sup-metric so that B([0, 1], R) can be viewed
as a metric space:

d_∞(f, g) := ‖f − g‖ = sup{|f(x) − g(x)| : x ∈ [0, 1]}.

This gives us a notion of the distance between two functions. So let us
look at the previous example, where we had a sequence of functions {f_n}
that point-wise converge to a discontinuous function f. Then for each
n ∈ N, we have

d_∞(f_n, f) = sup{|f_n(x) − f(x)| : x ∈ [0, 1]}
= sup{x^n − 0 : x ∈ [0, 1)}
= 1.

So with respect to the sup-metric, the sequence {f_n} does not converge!

Example 5.15.2. Now we look at a different sequence of functions, this
time on the closed interval [−π, π]:

g_n(x) := (1/n) cos(nx), n ∈ N.

This time, it seems the drawing of g_n converges to the x-axis, that is,
the zero function 0. We will see in this example that it also converges in
[B([−π, π], R), d_∞] to the same function (which is continuous, by the way).
Now

d_∞(g_n, 0) = sup{|g_n(x) − 0| : x ∈ [−π, π]}
= sup{(1/n)|cos(nx)| : x ∈ [−π, π]}
= (1/n) sup{|cos(nx)| : x ∈ [−π, π]}
= 1/n.

So here we see that d_∞(g_n, 0) tends to 0 as n tends to infinity, and therefore,
g_n, as a function, tends to 0.

Figure 5.6: Graphs of g_n where n is a power of 2.
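
The contrast between the two examples can be seen numerically. The following is a minimal Python sketch, not part of the original notes; sup_distance only approximates the supremum by sampling, and all function names are illustrative. For x^n it evaluates the function at the point x = 2^(−1/n), where the value is 1/2, so the sup-distance to the limit function is at least 1/2 for every n.

```python
import math

def sup_distance(f, g, a, b, samples=10001):
    """Approximate d_inf(f, g) on [a, b] by a maximum over equally spaced sample points."""
    step = (b - a) / (samples - 1)
    return max(abs(f(a + i * step) - g(a + i * step)) for i in range(samples))

def f_n(n, x):
    return x ** n

def g_n(n, x):
    return math.cos(n * x) / n

def zero(x):
    return 0.0

for n in (1, 4, 16, 64):
    witness = 0.5 ** (1.0 / n)   # a point of [0, 1) where x^n equals 1/2
    gap_f = f_n(n, witness)      # the limit function is 0 at this point
    gap_g = sup_distance(lambda x: g_n(n, x), zero, -math.pi, math.pi)
    print(n, gap_f, gap_g)
```

The first column of gaps never drops below 1/2, while the second behaves like 1/n.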

These two examples are interesting from another perspective. The first
sequence of functions (Example 5.15.1) was not convergent in [B([0, 1], R), d∞ ],
and nor was it a Cauchy sequence (see Exercise 5.18.9). The second exam-
ple (Example 5.15.2) is a convergent sequence in [B([−π, π], R), d∞ ], and
so by Theorem 5.5.4, it is also a Cauchy sequence. In fact, we will prove
now an important result about bounded functions: they form a complete
metric space.

Theorem 5.15.3. For any nonempty set X, we have that [B( X, R), d∞ ] is
complete.

? Proof: To be done in lectures. 

Theorem 5.15.4. For any nonempty metric space X, the subspace of bounded continuous
functions [C(X, R), d_∞] is closed in [B(X, R), d_∞] and hence complete.

 Proof: To be done in lectures. 

5.16 Contraction maps

Definition 5.16.1 (Lipschitz function). Suppose [ X, d ] and [Y, e] are metric


spaces. A function f : X → Y is Lipschitz if there is a constant c ∈ R+
such that for every x1 , x2 ∈ X we have

e(f(x_1), f(x_2)) ≤ c · d(x_1, x_2).

We call the smallest possible value of c the Lipschitz constant of f .



Example 5.16.2.

(a) The function
f : R → R : x ↦ c|x|
is Lipschitz with Lipschitz constant c.

(b) g : R → R : x ↦ x^2 is not Lipschitz.

(c) h : [−1, 1] → R : x ↦ x^2 is Lipschitz with Lipschitz constant 2.
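
Parts (b) and (c) can be explored by sampling difference quotients. The following is a minimal Python sketch, not part of the original notes; max_slope is an illustrative helper, and a finite sample gives evidence rather than a proof.

```python
def max_slope(f, a, b, samples=401):
    """Largest quotient |f(x) - f(y)| / |x - y| over a grid of sample points in [a, b]."""
    xs = [a + (b - a) * i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - f(y)) / abs(x - y) for x in xs for y in xs if x != y)

def square(x):
    return x * x

print(max_slope(square, -1.0, 1.0))      # stays below 2 on [-1, 1]
print(max_slope(square, -10.0, 10.0))    # much larger on a wider interval
```

For x ↦ x^2 the quotient equals |x + y|, which is bounded by 2 on [−1, 1] but grows without bound as the interval widens.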

Lemma 5.16.3. Lipschitz functions are continuous.

 Proof: Suppose [ X, d ] and [Y, e] are metric spaces and suppose f : X →


Y is Lipschitz with Lipschitz constant c ∈ R+. Suppose ε > 0 and let x ∈ X.
Choose δ = ε/c. Now if y ∈ B_δ(x), then f(y) would lie inside the closed
ball of radius c · d(x, y) centred at f(x),
which in turn is a subset of B_ε(f(x)), since c · d(x, y) < c · δ = ε.
So f(B_δ(x)) ⊆ B_ε(f(x)), and hence,
f is continuous at every point.

In fact, for a fixed constant c ∈ R+, the set of bounded functions X → R that
are Lipschitz with constant at most c forms a closed
subset of C(X, R), and hence is complete.

Definition 5.16.4 (Contraction map). Suppose [ X, d ] and [Y, e] are metric


spaces. A Lipschitz function f : X → Y is a contraction map if its Lipschitz
constant is less than 1.

A fixed point of a function f : X → X is an element x ∈ X such that


f ( x) = x.

Example 5.16.5.

(a) The map (x, y) ↦ (x^2 + x − 1, y) on R^2 fixes every point of the form
(1, y) or (−1, y), so it has more than one fixed point. We will see from the
theorem below that it is therefore not a contraction map.

(b) The map (x, y) ↦ (x, y) M + (1, 0), where M is the 2 × 2 matrix with rows
(0.4, −0.5) and (0.5, 0.5), is a contraction map (why?).
Now it fixes the point (x, y) if and only if (0.4x + 0.5y + 1, −0.5x +
0.5y) = (x, y). So we see that this contraction map has a unique fixed
point, namely (10/11, −10/11).
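
The fixed point in part (b) can also be found by simply iterating the map. The following is a minimal Python sketch, not part of the original notes; the function name T and the two starting points are arbitrary choices for illustration.

```python
import math

def T(p):
    """The affine map (x, y) -> (0.4x + 0.5y + 1, -0.5x + 0.5y) from part (b)."""
    x, y = p
    return (0.4 * x + 0.5 * y + 1.0, -0.5 * x + 0.5 * y)

p, q = (0.0, 0.0), (5.0, -3.0)   # two arbitrary starting points
for _ in range(100):
    p, q = T(p), T(q)

print(p, q, math.dist(p, q))
print(10 / 11, -10 / 11)
```

Both orbits end up at the same point regardless of where they start, which is the behaviour one expects from a contraction map on a complete metric space.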

Theorem 5.16.6 (Banach’s Contraction Mapping Theorem). Suppose


[ X, d ] is a nonempty complete metric space, and we have a contraction map
f : X → X. Then there exists a unique point x ∈ X such that f ( x) = x.

? Proof: To be done in lectures. 

Example 5.16.7. Now we return to our original example. Suppose we want


to find a solution to the following differential equation:

f′(x) = (1/4)(f(x)^2 + x^2), f(0) = 0,

where f is a function from the closed interval [−1/2, 1/2] to R.

• How do we know that this differential equation has a solution?



• Can the solution be written out nicely in terms of functions we already


know?

First we use the Fundamental Theorem of Calculus to see this differential
equation as an integral equation:

f(x) − f(0) = ∫_0^x (1/4)(f(t)^2 + t^2) dt

and hence, since f(0) = 0,

f(x) = ∫_0^x (1/4)(f(t)^2 + t^2) dt.
To make things more difficult, we will define a function Φ on the metric
space of continuous functions C([−1/2, 1/2], R). For such a function f, define
Φ(f) to be the function that maps x to ∫_0^x (1/4)(f(t)^2 + t^2) dt. It is not
difficult to see that
Φ(B_{1/2}(0)) ⊂ B_{1/2}(0),

that is, if we look at the space of functions that are distance at most 1/2 from
the zero function 0 with respect to the sup-metric, then Φ maps this set into
itself. It turns out that Φ is a contraction map on B_{1/2}(0) as we will now
see.
Let f_1, f_2 ∈ B_{1/2}(0). Then

‖Φ(f_1) − Φ(f_2)‖_∞ = ‖∫_0 (1/4)(f_1^2 − f_2^2)‖_∞

where ∫_0 (f_1^2 − f_2^2) is the map that takes x to the definite integral ∫_0^x (f_1^2 − f_2^2) dt.
Now we will use a fact from calculus that if f is continuous on the
interval [a, b] and c ∈ [a, b], then ‖∫_c f‖ ≤ (b − a)‖f‖. So

‖Φ(f_1) − Φ(f_2)‖_∞ ≤ (1/4)(1/2 − (−1/2)) ‖f_1^2 − f_2^2‖_∞
≤ (1/4) ‖f_1 − f_2‖_∞ ‖f_1 + f_2‖_∞
≤ (1/4) ‖f_1 − f_2‖_∞ (‖f_1‖_∞ + ‖f_2‖_∞)
≤ (1/4) ‖f_1 − f_2‖_∞ (1/2 + 1/2)

and so

‖Φ(f_1) − Φ(f_2)‖_∞ ≤ (1/4) ‖f_1 − f_2‖_∞.

Therefore, Φ is a contraction map on B_{1/2}(0) with Lipschitz constant at
most 1/4. Now B_{1/2}(0) is a closed subset of a complete metric space, and
so must also be complete. Therefore, by Banach's Contraction Mapping
Theorem 5.16.6, there exists a unique fixed-point of Φ.

That is, there is a continuous function f ∈ B_{1/2}(0) such that

Φ(f) = f,

which means that f is a solution to our original differential equation.
According to Mathematica, there is no nice way to write this function f in
terms of functions we know!

Figure 5.7: A graph of the unique solution to the DE.
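
Although the fixed point has no closed form, iterating Φ from the zero function gives polynomial approximations to it. The following is a minimal Python sketch, not part of the original notes; polynomials are stored as coefficient lists with the constant term first, and the names poly_mul, phi and evaluate are illustrative.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (constant term first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def phi(p):
    """Apply Phi: x -> integral from 0 to x of (1/4)(p(t)^2 + t^2) dt."""
    integrand = poly_mul(p, p)                                # p(t)^2
    integrand += [Fraction(0)] * max(0, 3 - len(integrand))   # make room for the t^2 term
    integrand[2] += 1                                         # + t^2
    integrand = [c / 4 for c in integrand]                    # times 1/4
    # integrate term by term; the constant of integration is 0
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(integrand)]

def evaluate(p, x):
    return sum(c * x ** k for k, c in enumerate(p))

f = [Fraction(0)]        # start from the zero function
for _ in range(3):
    f = phi(f)

print(f[:8])
print(float(evaluate(f, Fraction(1, 2))))
```

The first two iterates are x^3/12 and x^3/12 + x^7/4032, and by the contraction estimate each further iterate moves by at most a quarter of the previous step in the sup-metric.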

5.17 Aside: Iteration Function System

An Iteration Function System (IFS) in R^m (or a complete metric space,
if you prefer) is a finite set of contraction maps {w_1, . . . , w_n}, and they are
used to generate interesting fractals and dynamical systems. Let ℋ(R^m) be
the set of all nonempty closed and bounded subsets of R^m. We can define a metric on
this space known as the Hausdorff distance:

h(A, B) := max{ max{min{d_E(a, b) : b ∈ B} : a ∈ A}, max{min{d_E(a, b) : a ∈ A} : b ∈ B} }.

So the Hausdorff distance is the greatest of all the distances from a point in
one set to the closest point in the other set. It turns out that [ℋ(R^m), h] is a
complete metric space.
Now we define a function H on ℋ(R^m), known as the Hutchinson operator:

H(B) := w_1(B) ∪ w_2(B) ∪ · · · ∪ w_n(B).

This gives us a contraction map on ℋ(R^m). So by Banach's Contraction
Mapping Theorem 5.16.6, there exists a unique closed and bounded set B
that is a fixed-point of H. This set B is the fractal we want to generate.
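
For finite sets of points the Hausdorff distance can be computed directly. The following is a minimal Python sketch, not part of the original notes; it assumes the symmetric form of the distance (the larger of the two one-sided distances), and the names one_sided and hausdorff are illustrative.

```python
import math

def one_sided(A, B):
    """Greatest distance from a point of A to its nearest point of B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Hausdorff distance between two finite nonempty sets of points."""
    return max(one_sided(A, B), one_sided(B, A))

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (1.0, 0.0), (1.0, 3.0)]
print(one_sided(A, B), one_sided(B, A), hausdorff(A, B))
```

Here A is a subset of B, so one of the one-sided distances vanishes even though the sets differ; taking the larger of the two is what makes h(A, B) = 0 force A = B.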

Example 5.17.1 (Barnsley's Fern). Consider the following four functions (in fact,
they are affine maps) on R^2:

f_i(x, y) := (x, y) M_i + (e, f), i = 1, 2, 3, 4,

where M_i is the 2 × 2 matrix with rows (a, b) and (c, d), and we extract the
parameters from the following table:

i    a      b      c      d     e     f
1    0      0      0      0.16  0     0
2    0.85  -0.04   0.04   0.85  1.6   0.85
3    0.2    0.23  -0.26   0.22  1.6   0.07
4   -0.15   0.26   0.28   0.24  0.44  0.07
The Mathematica code to generate the fractal is:
Needs["ProgrammingInMathematica`IFS`"]
Needs["ProgrammingInMathematica`ChaosGame`"]
f1 = AffineMap[{{0, 0, 0}, {0, 0.16, 0}}];
f2 = AffineMap[{{0.85, 0.04, 0}, {-0.04, 0.85, 1.6}}];
f3 = AffineMap[{{0.2, -0.26, 0}, {0.23, 0.22, 1.6}}];
f4 = AffineMap[{{-0.15, 0.28, 0}, {0.26, 0.24, 0.44}}];
fern = IFS[{f1, f2, f3, f4}]
ChaosGame[fern, 50000, Coloring -> Automatic]
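
For readers without Mathematica, the same chaos game can be sketched in Python. This is a rough illustration, not a translation of the package above: it assumes each AffineMap row lists two matrix entries followed by a translation component, and the selection weights are the values conventionally used for this fern, which do not appear in the code above.

```python
import random

# Each tuple is (a, b, c, d, e, f) for the map (x, y) -> (a*x + b*y + e, c*x + d*y + f),
# read off from the four AffineMap calls (assumed row layout: two matrix entries, then translation).
MAPS = [
    (0.0, 0.0, 0.0, 0.16, 0.0, 0.0),
    (0.85, 0.04, -0.04, 0.85, 0.0, 1.6),
    (0.2, -0.26, 0.23, 0.22, 0.0, 1.6),
    (-0.15, 0.28, 0.26, 0.24, 0.0, 0.44),
]
WEIGHTS = [0.01, 0.85, 0.07, 0.07]   # assumed selection probabilities

def chaos_game(n_points, seed=0):
    """Apply a randomly chosen map at each step, starting from the origin."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    points = []
    for _ in range(n_points):
        a, b, c, d, e, f = rng.choices(MAPS, weights=WEIGHTS)[0]
        x, y = a * x + b * y + e, c * x + d * y + f
        points.append((x, y))
    return points

points = chaos_game(50000)
xs, ys = zip(*points)
print(min(xs), max(xs), min(ys), max(ys))
```

Plotting the points (for example as a scatter plot) shows the fern; the orbit stays bounded because each of the four maps is a contraction.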

5.18 Exercises

Exercise 5.18.1. Define d : Q × Q → R by d ( x, y) = |x − y| for all x, y ∈ Q.


Show that the metric space (Q, d ) is not complete.
Exercise 5.18.2. Let D([−1, 1], R) be the set of differentiable functions
from [−1, 1] to R.

(a) For each n ∈ N, define the following function f_n : [−1, 1] → R:

f_n(x) = √(x^2 + 1/n^2).

Figure 5.8: Barnsley's Fern: a fixed point of a contraction map.

Draw f1 , f2 and f3 .

(b) Is fn differentiable for every n?

(c) Which function f do you think the sequence {f_n} converges to in
B([−1, 1], ‖·‖_∞)? Give a proof. (Hint: You may use the fact that
the maximum of f_n(x) − f(x) is attained at x = 0.)

Proof. Let ε > 0. We want to find N ∈ N such that if n > N, then

‖f_n − f‖_∞ < ε.

...


(d) So what can you say about D([−1, 1], R)?

Exercise 5.18.3. For each i ∈ N, let mi be the set of Cauchy sequences on


[Q, dE ] whose i-th element is zero. Show that mi is an ideal of the ring of
all Cauchy sequences on [Q, dE ].
Exercise 5.18.4. Let {xn } be a Cauchy sequence of rationals, and suppose it
is not equivalent to the zero-sequence. Show that the real number [{xn }] has
a multiplicative inverse.
Exercise 5.18.5. Here we show that metric spaces are Hausdorff; a class of
spaces in topology which are key to generalising calculus to nice shapes
and geometries. Let [X, d] be a metric space and suppose x, y ∈ X with
x ≠ y. Find two disjoint open balls B_{ε_x}(x) and B_{ε_y}(y).

Hausdorff Property for [X, d]: For every pair of distinct points
x, y ∈ X, there exist disjoint open sets U_x and U_y containing x and y
respectively.

Exercise 5.18.6. Show that the subset {x ∈ Q : x² > 2} of [Q, dE] is
clopen.
Exercise 5.18.7. Prove that all functions from a discrete metric space to any
given topological space are continuous.
Exercise 5.18.8. Let [ X, d ] and [Y, e] be metric spaces and suppose f : X →
Y is continuous. Prove that if C ⊆ Y is closed, then f ← (C ) is closed in X.
Exercise 5.18.9. Show that the sequence of functions defined in Example
5.15.1 is not a Cauchy sequence.
Exercise 5.18.10. Let µ ∈ R+ and let T : R → R be the function defined
by T ( x) := 4µx(1 − x). This is the famous logistic map from the theory
of dynamical systems. For what values of µ is T a contraction map? (The
metric on R is the usual Euclidean one).
Exercise 5.18.11. Consider the differential equation over R

f′(x) = f(x)

where f (0) = C. Use Banach’s Contraction Mapping Theorem 5.16.6 to


show that this differential equation has a unique solution.
6
Compactness

In this chapter we explore the generalisation of finite sets to compact
spaces, which turn out to be the right kind of definition for nice functions
to behave as they ought to. Continuous functions are bounded and have
maxima and minima, and closed sets really look like closed sets as we
intuitively think of them.

“We have already pointed out and will recognize throughout this book the
importance of compact sets. All those concerned with general analysis have
seen that it is impossible to do without them”
– Maurice Fréchet

6.1 From finiteness to compactness

Calculus and analysis on a finite set X is very straight-forward:

The finite world



All functions f : X → R are bounded.
All functions f : X → R attain a maximum.
All sequences of elements of X have constant subsequences.
All covers have finite subcovers.

The latter property does not make much sense for the moment, so we
will explore a particular example. My topology teacher at La Trobe Uni-
versity, John Banks, described a compact set as something where you can
measure temperature. That is, there is a continuous function to the reals
whose image is bounded and attains a maximum. For example, the sphere
is such a set; on the planet earth we can measure the temperature sensibly
at each point of its surface. There is a maximum temperature, and it is a
continuous function.

Example 6.1.1 (Aside: Motivating the definition of compact). Suppose


we have a metric space [ X, d ]. Let K ⊆ X and we will see what happens
to K if we stipulate certain conditions on its image under an arbitrary
continuous function f . We would like the first property to hold above, so
we will suppose that f(K) is bounded, that is, there is a constant
C f such that | f (k)| < C f for all k ∈ K. Alternatively, we can think of
boundedness in a finer sense. Instead of bounding f ( K ) with one large
open ball, we can cover f ( K ) with a collection of open balls. Let’s see why
this would give us a bounded set. Suppose we have a collection {Ui : i ∈ I}
of bounded open sets that cover up f ( K ), where I ⊆ N. So f ( K ) ⊆ ∪i∈I Ui .

Then the preimages { f ← (Ui ) : i ∈ I} form a covering of K by open


sets, since f is continuous. Moreover, each f ( f ← (Ui )) is bounded since
f ( f ← (Ui )) ⊆ Ui .
Now suppose we can reduce this cover of K to just a finite collection
of bounded open sets. That is, we have a finite subset J of I such that K is
contained in the union of { f ← (U j ) : j ∈ J}. Then after a small calculation,
we would have
f(K) ⊆ ∪j∈J Uj.

Now a union of finitely many bounded sets is a bounded set, and so ∪j∈J Uj
is bounded. So what we have done is shown that if we can reduce any cover
of K by open sets to a finite one, then it ensures that f(K) is bounded. This
is the first thing we need in order for a maximum and minimum of f(K) to
exist. We also would like f(K) to be closed, and this will also be guaranteed.

6.2 Covers and the definition of compact sets

Definition 6.2.1 (Cover). Let K be a subset of X, I ⊆ N, and suppose


{Ui : i ∈ I} is a set of subsets of X such that

K ⊆ ∪i∈I Ui .

We say that {Ui : i ∈ I} is a cover for K.

If [ X, d ] is a metric space and K ⊆ X, then an open cover of K is a cover


consisting only of open subsets of X. Similarly, a cover {Ui : i ∈ I} is
finite if I is finite. A subcover of a cover {Ui : i ∈ I} for K is another cover
{Ui : i ∈ J} for K where J ⊆ I.

Example 6.2.2. Let K be the set of positive reals R+, and let Ui be the open
interval (0, i), for each i ∈ N. Then {Ui : i ∈ N} is an open cover for K.

However, no matter what you do, you cannot find a finite subset of these
open intervals which will cover all of K.
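
The reason no finite subfamily works can be made concrete. The following is a small sketch, with the hypothetical helper name uncovered_point: any finite choice of indices has a largest element m, the union of the chosen intervals is just (0, m), and so the positive real m itself is missed.

```python
def uncovered_point(indices):
    """Given a finite, nonempty set of indices i (each standing for the
    interval (0, i)), return a positive real that none of them contains."""
    # The union of (0, i) over the chosen indices is (0, max index),
    # so the right endpoint itself is left uncovered.
    return float(max(indices))


chosen = {1, 5, 12}
x = uncovered_point(chosen)
covered = any(0 < x < i for i in chosen)
```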

Mathematicians liken compactness to “almost finiteness”. In generalising
the finite world to the infinite world, one way is to see whether the
property preserved will work in the compact world.

The notion of compactness arose out of the notion of a uniformly continuous
function. It was shown by Heine that a continuous map on a closed and
bounded subset of a metric space is uniformly continuous. Let [X, d] and
[Y, e] be two metric spaces, and let f : X → Y be a function. We say that f
is uniformly continuous if

(∀ε ∈ R+)(∃δ ∈ R+)(∀x, x′ ∈ X)  d(x, x′) < δ =⇒ e(f(x), f(x′)) < ε.

Notice that uniform continuity is a global property of functions, whereas
continuity at a point is a local property.

Definition 6.2.3 (Compact). A subset K of a metric space [X, d] is compact
if every open cover of K contains a finite subcover.

So the example above shows that R+ is not compact: there exists an
open cover which does not have a finite subcover. We will see later that
the closed intervals of R are compact, and that many of the nice bounded
sets that naturally occur in nature are examples of compact sets. Notice that
it depends on what an open set is, and so what the metric is. We will also
see some strange compact sets from considering counter-intuitive metric
spaces.
We should also comment now on our motivating example above; the
converse turns out to also be true. By Exercise 6.5.5, if every continuous
function f : K → R is bounded, then K is compact.

The compact world



All continuous functions f : X → R are bounded.
All continuous functions f : X → R attain a maximum.
All sequences of elements of X have convergent subsequences.
All open covers have finite subcovers.

Lemma 6.2.4. If [ X, d ] is a metric space and K is a finite subset of X, then


K is compact.

 Proof: Suppose {Ui : i ∈ I} is an open cover for K. By definition


of union, for each k ∈ K, there exists ik ∈ I such that k ∈ Uik . Since
K is finite, {Uik : k ∈ K} is finite and we also have automatically that
K ⊆ ∪k∈K Uik . Therefore, we have a finite subcover of {Ui : i ∈ I}, and so K
is compact. 
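
The proof is really a recipe: for each point of K, pick one member of the cover that contains it. Here is a minimal sketch of that recipe, assuming the cover is given as a dictionary from indices to membership tests; the function name finite_subcover and the example cover are hypothetical.

```python
def finite_subcover(points, cover):
    """points: a finite iterable; cover: dict mapping an index i to a
    membership test for U_i. Returns one covering index per point."""
    chosen = set()
    for k in points:
        # By definition of a cover, some U_i contains k; pick the first found.
        i_k = next(i for i, contains in cover.items() if contains(k))
        chosen.add(i_k)
    return chosen


# Hypothetical example: K = {0.5, 2.5, 7.25}, covered by U_i = (i - 1, i + 1).
cover = {i: (lambda x, i=i: i - 1 < x < i + 1) for i in range(0, 10)}
K = [0.5, 2.5, 7.25]
J = finite_subcover(K, cover)
```

The set J has at most as many elements as K, which is exactly why the subcover is finite.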

Lemma 6.2.5. The closed interval [0, 1] is compact.

? Proof: To see this, suppose {Ui : i ∈ I} is an open cover for [0, 1]. Now
consider the following set A:

A := {x ∈ [0, 1] : [0, x] can be covered by finitely many Ui ’s}.

Notice that A is nonempty since 0 ∈ A (as [0, 0] = {0} and we can find one
element of {Ui : i ∈ I} containing 0). Moreover, 1 is an upper bound for A,
and so by the least upper bound property for R, we see that a least upper
bound α for A exists.
Suppose α < 1. By definition of union, there exists j ∈ I such that
α ∈ Uj. Now by definition of an open set, there exists ε ∈ R+ such that
Bε(α) ⊆ Uj; shrinking ε if necessary, we may assume that α + ε/2 ∈ [0, 1].
On the other hand, α − ε/2 < α, and since α is the least upper bound of A,
some element of A exceeds α − ε/2. So

[0, α − ε/2]

is covered by finitely many Ui’s. Let this set of Ui’s be
indexed by J ⊆ I where J is finite. Therefore

{Uj} ∪ {Ui : i ∈ J}

is a finite cover of [0, α + ε/2]; which is a contradiction as α + ε/2 > α and
α + ε/2 ∈ [0, 1].
Therefore, α = 1 and hence [0, 1] = A. It follows then that [0, 1] is
compact.

6.3 Closed, bounded and compact

Theorem 6.3.1. Every compact subset of a metric space [X, d] is closed
and bounded.

? Proof: To be done in lectures. 

We will see later that the converse holds for Euclidean spaces, though
there are examples of metric spaces that are non-compact but closed and
bounded.

Example 6.3.2. Suppose we have a discrete metric space [ X, d ]. Then the


collection
{B_{1/2}(x) : x ∈ X}
is an open cover of X, and in fact, it is just

{{x} : x ∈ X}.

If X is infinite, then this open cover has no finite subcover. So X is not


compact, but it is closed and bounded.

Revision on images mixed with preimages. Each of these state-


ments can be proved easily from the definitions of image and preim-
age, and we saw some of these properties in the exercises in Chapter 1.
Let f : A → B be a function. Then

(i) U ⊆ f ← ( f (U )) for all U ⊆ A.

(ii) U ⊆ V ⊆ A =⇒ f (U ) ⊆ f (V ).

(iii) X ⊆ Y ⊆ B =⇒ f ← ( X ) ⊆ f ← (Y ).

(iv) f←(∪i∈I Xi) = ∪i∈I f←(Xi), where Xi ⊆ B for each i ∈ I.

(v) f(∪i∈I Ui) = ∪i∈I f(Ui), where Ui ⊆ A for each i ∈ I.
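
These properties are easy to test on a small finite example. The sketch below checks (i) and (iv) for the squaring function on a hypothetical domain A = {−3, ..., 3}; the helper names image and preimage are our own.

```python
def image(f, U):
    """The image f(U) of a finite set U."""
    return {f(a) for a in U}


def preimage(f, A, X):
    """Preimage of X under f, where A is the (finite) domain of f."""
    return {a for a in A if f(a) in X}


A = set(range(-3, 4))
f = lambda a: a * a

U = {1, 2}
# (i) U is contained in the preimage of its image; the inclusion is
# strict here because f(-1) = f(1) and f(-2) = f(2).
back = preimage(f, A, image(f, U))

X1, X2 = {0, 1}, {4, 5}
# (iv) preimages commute with unions.
lhs = preimage(f, A, X1 | X2)
rhs = preimage(f, A, X1) | preimage(f, A, X2)
```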

We will be using the above properties in the proof of the next result.

Lemma 6.3.3. Let [ X, d ] and [Y, e] be metric spaces and suppose f : X →


Y is continuous. If K ⊆ X is compact, then f ( K ) is compact in [Y, e].

Proof: Let 𝒰 be an open cover for f(K). Then

K ⊆ f←(f(K)) ⊆ f←(∪𝒰) = ∪U∈𝒰 f←(U).

Since f is continuous, {f←(U) : U ∈ 𝒰} is an open cover for K. Since K is
compact, there is a subcover {f←(U) : U ∈ 𝒱}, where 𝒱 is a finite subset
of 𝒰. Now

f(K) ⊆ f(∪U∈𝒱 f←(U)) ⊆ ∪U∈𝒱 U

and hence 𝒱 is a finite subcover of 𝒰. Therefore, f(K) is compact.

Theorem 6.3.4 (Heine-Borel-Dirichlet Theorem). A nonempty subset K of


Rn is compact if and only if K is closed and bounded.

† Proof: Beyond the scope of this course. 



For example, the n-sphere is compact as it is a closed and bounded
subset of R^{n+1}. Also, every closed interval of R is compact, the torus is
compact in R^3, and the Klein bottle is compact in R^4.
For metric spaces, we have the following generalisation of Heine-Borel,
whose proof is also beyond the scope of this course. We say that a subset
K is totally bounded if, for every radius r > 0, it can be covered with
finitely many balls of radius r.

Theorem 6.3.5 (Generalised Heine-Borel-Dirichlet Theorem). A nonempty


subset K of a metric space [ X, d ] is compact if and only if K is complete
and totally bounded.

6.4 Aside: Brouwer’s fixed-point theorem

One of the successes of abstracting to the compact world is a truly
beautiful theorem due to Brouwer, known universally as Brouwer’s fixed-point
theorem. It is perhaps better known as the device upon which Nash’s
Equilibrium Theorem in game theory is based, along with its far sweeping
consequences in analysis and topology: the Jordan Curve Theorem, the
Hairy Ball Theorem and the Borsuk-Ulam Theorem.¹

¹ Luitzen E. J. Brouwer (1881 – 1966) was one of the best Dutch
mathematicians of all time, and his work transformed the disciplines of
topology, measure theory and analysis. He is also famous for his
contributions to the philosophy of mathematics, founding the intuitionist
movement.

Theorem 6.4.1 (Brouwer’s fixed-point theorem). Every continuous function
f from a convex compact subset K of a Euclidean space to K itself has
a fixed point.

There are many interesting consequences of this theorem where f is
thought of as a deformation of a geometric object:

1. No matter how much you stir a jar of honey, some point in the liquid
will end up in exactly the same place in the glass as before.

2. Take a map of Perth, and suppose that that map is laid out on a table
inside Perth. There will always be a point on the map which represents
that same point as its own position in Perth.

3. The game Hex cannot end in a draw.

4. If you want to go to sleep while standing up in a train that travels along


a perfectly straight track, there is some starting angle that will cause you
not to fall over.

6.5 Exercises

Exercise 6.5.1. Wherever possible, give a proper subcover of the following


covers of R. Justify your answer.

(a) {(−n, n) : n ∈ N}

(b) {( x − 2, x + 2) : x ∈ Z}

(c) {( x − 1, x + 1) : x ∈ Z}

Exercise 6.5.2. Which of the following subsets of R2 are compact, under


the usual topology?

(a) {(x, y) ∈ R² : x² + y² = 1}

(b) {(x, y) ∈ R² : x² + y² ≤ 1}

(c) {(x, y) ∈ R² : x² + y² < 1}

(d) {(x, y) ∈ R² : x² + y² > 1}

(e) {(x, y) ∈ R² : x² − y² ≤ 1}

Exercise 6.5.3. Let A have the discrete topology. Which subsets of A are
compact? Give a proof.
Exercise 6.5.4. Below, we look at an example in the metric space on the
closed interval [0, 1].

(a) Show that the sets

Ui := [0, 1/π − 1/i) ∪ (1/π + 1/i, 1],   i = 4, 5, . . . ,

form an open cover of Q ∩ [0, 1].

(b) Does {Ui } have a finite subcover of Q ∩ [0, 1]?

Exercise 6.5.5. Let [ X, d ] be a metric space and let K be a subset of X.


Show that if every continuous function f : K → R is bounded, then K is
compact.
7
The twain shall meet

Among the most important constructs in mathematics are the so-called
p-adic numbers, which find themselves in all sorts of areas of mathematics
from algebraic number theory to finite combinatorics. It takes a course such
as this to understand what the p-adic numbers are. They are a combination
of algebraic concepts and analytic ones, needing the notion of a completion
of a metric space and yielding a fundamental example of a ‘local field’ in
abstract algebra.

7.1 p-adic metric and the p-adic numbers

For a nonzero x/y ∈ Q, suppose we have integers a, b, n so that

x/y = p^n (a/b)

and neither a nor b is divisible by p. Then the p-adic valuation of x/y,
written |x/y|_p, is defined to be

|x/y|_p := p^{−n}.

By convention, |0|_p = 0. Another way to see the p-adic valuation is to look
at integers first. If z ∈ Z is nonzero, then |z|_p is p^{−n} where n is the
largest integer such that p^n divides z. Then for a rational number x/y, we
need not worry about reducing its expression and simply define
|x/y|_p := |x|_p / |y|_p.

Example 7.1.1.

• |5|_5 = 1/5 whereas |50|_5 = 1/25.

• |1/2|_5 = 1 and note that |20/40|_5 = |20|_5 / |40|_5 = (1/5)/(1/5) = 1.
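
The definition translates directly into a short computation. Below is a minimal sketch using exact rational arithmetic; the function name padic_abs is our own, and p is assumed to be prime.

```python
from fractions import Fraction


def padic_abs(q, p):
    """Return |q|_p for a rational q and a prime p, as an exact Fraction."""
    q = Fraction(q)
    if q == 0:
        return Fraction(0)  # the convention |0|_p = 0
    n = 0
    num, den = q.numerator, q.denominator
    while num % p == 0:  # each factor of p upstairs raises n
        num //= p
        n += 1
    while den % p == 0:  # each factor of p downstairs lowers n
        den //= p
        n -= 1
    return Fraction(1, p) ** n
```

For instance, padic_abs(50, 5) gives 1/25, and padic_abs(Fraction(20, 40), 5) gives 1 because Fraction reduces 20/40 to 1/2 first.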

When we defined a norm in Chapter 4, the positive homogeneity condition
was with respect to the absolute value norm on R. A more general definition
of norm is over a normed field where there is a more general definition
of ‘absolute value’. Here, we replace the absolute value with the p-adic
valuation, and we end up with a more general norm on Q.

Lemma 7.1.2. The p-adic valuation defines a norm on the rational num-
bers.

 Proof (sketch):

(a) Non-degeneracy: By convention, |0|_p = 0, and we cannot have
|v|_p = 0 when v ≠ 0.

(b) Pos. homogeneity: For all λ, v ∈ Q, we have |λv|_p = |λ|_p · |v|_p.

(c) Triangle inequality: To show that for all u, v ∈ Q, we have
|u + v|_p ≤ |u|_p + |v|_p, it suffices to show something stronger:
|p^n (a/b) + p^{n′} (a′/b′)|_p ≤ max{1/p^n, 1/p^{n′}}.

We then define the p-adic metric d p on Q by

d p (u, u0 ) := |u − u0 | p .

Example 7.1.3. Notice that d7(8, 1) = |7|_7 = 1/7 whereas d7(100, 1) =
|99|_7 = 1. But then d7(99, 1) = |98|_7 = 1/7².
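
These distances, and the stronger form of the triangle inequality, can be checked by machine for integers. The following is a sketch only: the names abs_p and d_p are our own, and the grid of integers from −20 to 20 is an arbitrary sample rather than a proof.

```python
from fractions import Fraction
from itertools import product


def abs_p(z, p):
    """|z|_p for an integer z: p**(-n), where p**n is the largest power dividing z."""
    if z == 0:
        return Fraction(0)
    n = 0
    while z % p == 0:
        z //= p
        n += 1
    return Fraction(1, p ** n)


def d_p(u, v, p):
    """The p-adic distance between two integers u and v."""
    return abs_p(u - v, p)


# Strong triangle inequality d(u, w) <= max(d(u, v), d(v, w)) on a small grid.
grid = range(-20, 21)
strong_triangle_holds = all(
    d_p(u, w, 7) <= max(d_p(u, v, 7), d_p(v, w, 7))
    for u, v, w in product(grid, repeat=3)
)
```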

Example 7.1.4. We will show that the geometric series

1 + p + p² + p³ + . . .

converges in [Q, d_p]. Let xn be the partial sum ∑_{i=0}^{n} p^i for each
integer n ≥ 0. Using the formula for a geometric series we see that
xn = (p^{n+1} − 1)/(p − 1), and so

xn − 1/(1 − p) = p^{n+1}/(p − 1).

Since p − 1 is coprime to p, we have |xn − 1/(1 − p)|_p = p^{−(n+1)}.

Let ε > 0. Choose N to be the next integer larger than −log_p(ε) − 1 and
suppose n > N. Then

n + 1 > −log_p(ε)

and thus p^{−(n+1)} < ε. Therefore

|xn − 1/(1 − p)|_p < ε.

Hence, {xn} converges to 1/(1 − p). For p = 2 the limit is −1!
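
The convergence can be checked numerically with exact fractions. In this sketch (the names partial_sum and gap are our own) the partial sums are compared against 1/(1 − p), which equals −1 when p = 2; the gap comes out as p^{n+1}/(p − 1), whose p-adic valuation is p^{−(n+1)} because p − 1 is coprime to p.

```python
from fractions import Fraction


def partial_sum(p, n):
    """x_n = 1 + p + ... + p^n."""
    return sum(Fraction(p) ** i for i in range(n + 1))


def gap(p, n):
    """x_n - 1/(1 - p), which should equal p^(n+1)/(p - 1)."""
    return partial_sum(p, n) - Fraction(1, 1 - p)


p = 5
gaps = [gap(p, n) for n in range(6)]
```

The numerators of the gaps are the powers 5, 25, 125, ..., so in the 5-adic metric the partial sums get closer and closer to −1/4.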

7.2 A bit more than what you know

Recall that we can write an integer in base 2 by listing 0’s and 1’s. So for
example, the number 22 can be written in base 2 as

10110₂

because 22 = 16 + 4 + 2 and we put zeros for where 8 and 1 appear. In
base 3, we write 22 as

211₃

since 22 = 2 · 3² + 1 · 3¹ + 1 · 3⁰. So we have used a finite series to
represent integers.
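
The digits are found by repeated division with remainder. A minimal sketch, with the hypothetical function name to_base:

```python
def to_base(n, p):
    """Digits of a positive integer n in base p, most significant first."""
    digits = []
    while n > 0:
        n, r = divmod(n, p)  # r is the next digit, read from the right
        digits.append(r)
    return digits[::-1]
```

Here to_base(22, 2) returns [1, 0, 1, 1, 0] and to_base(22, 3) returns [2, 1, 1].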
Any positive integer can be written in base p as

∑_{i=0}^{n} ai p^i

where each ai ∈ Zp. Can we do the same for positive rationals? Let us
consider the fraction 3/13. We can use negative powers of 3 to give a base 3
expansion of this rational number:

0 · (1/3) + 2 · (1/9) + 0 · (1/27) + 0 · (1/81) + 2 · (1/243) + · · ·

which we can write as

0.02002002002 . . .₃

Notice that the base 3 expansion here does not terminate, but it is periodic.
Now [Q, | · |] is not complete, but often the partial sums of
∑_{i=−n}^{∞} ai p^{−i} (where ai ∈ Zp) converge to a rational number.
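
The digits after the point come from repeatedly multiplying by the base and peeling off the integer part. The sketch below (fractional_digits is our own name) applies this to 3/13 in base 3.

```python
from fractions import Fraction


def fractional_digits(q, p, count):
    """First `count` base-p digits after the point of a rational 0 <= q < 1."""
    q = Fraction(q)
    digits = []
    for _ in range(count):
        q *= p
        d = q.numerator // q.denominator  # integer part is the next digit
        digits.append(d)
        q -= d
    return digits


digits = fractional_digits(Fraction(3, 13), 3, 9)
```

The remainders cycle through 9/13, 1/13 and 3/13, which is why the block of digits 0, 2, 0 repeats forever.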

Lemma 7.2.1. Let p be a prime number. Any positive rational number can
be written in the form

∑_{i=n}^{∞} ai p^{−i},   ai ∈ Zp,

where convergence of the series above is given by the Euclidean metric.

Except for the expressions where ai = p − 1 for all i beyond a certain


point, the base p expansion of a rational number is unique.
Notice that the series expansion of a rational number in terms of powers
of p is valid since we have been careful to prove that the result is something
that converges with respect to the absolute value metric. We will see now,
that if we use a different metric, something similar but “upside-down” hap-
pens. For the so-called p-adic expansion of a rational number, we use the
p-adic metric, which in some sense inverts the distances given by the abso-
lute value metric. So instead of summing from negative infinity, we need
to begin our summation at a determinate value, and continue to positive
infinity.

Lemma 7.2.2. Let p be a prime number. Any positive rational number can
be written uniquely in the form

∑_{i=n}^{∞} ai p^i

where each ai ∈ Zp, and convergence of the series above is given by the
p-adic metric.
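
One way to compute the first few p-adic digits of a rational is sketched below. It is an illustration under our own naming (padic_digits), not a construction taken from these notes: after pulling out the power p^n, each digit is the current value reduced modulo p, and the inverse of the denominator modulo p is found with Python's built-in pow (version 3.8 or later).

```python
from fractions import Fraction


def padic_digits(q, p, count):
    """Return (n, digits) so that q agrees p-adically with
    sum(digits[k] * p**(n + k)) to `count` digits; p is assumed prime."""
    q = Fraction(q)
    if q == 0:
        return 0, [0] * count
    n = 0
    while q.numerator % p == 0:      # pull factors of p out of the numerator
        q /= p
        n += 1
    while q.denominator % p == 0:    # and out of the denominator
        q *= p
        n -= 1
    digits = []
    for _ in range(count):
        # The digit is q mod p, using the inverse of the denominator mod p.
        a = (q.numerator * pow(q.denominator, -1, p)) % p
        digits.append(a)
        q = (q - a) / p
    return n, digits


n, digits = padic_digits(Fraction(1, 3), 5, 6)
approx = sum(d * 5 ** (n + k) for k, d in enumerate(digits))
```

For 1/3 with p = 5 the digits are 2, 3, 1, 3, 1, 3, and three times the truncated sum differs from 1 by a multiple of 5⁶. For a positive integer the digits are just its base p digits read from right to left.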

7.3 Completing the rationals

With respect to the p-adic metric, the rational numbers are not complete,
and we will see this by exhibiting a particular example. Consider the fol-
lowing sequence:
p,  p + p²,  p + p² + p³,  . . . ,   xn := ∑_{i=1}^{n} p^i.

Then {xn } is a Cauchy sequence but it does not converge.

Proof. Let ε > 0. We want to find an N ∈ N such that if n > m > N, then
d_p(xm, xn) < ε. Choose N to be the next largest integer after log_p(1/ε) − 1.
It turns out that if n > m > N, then 1/p^{m+1} < ε. Now ∑_{i=0}^{n−m−1} p^i
is coprime to p and so it has p-adic valuation 1. Therefore,

1/p^{m+1} = |p^{m+1}|_p · |∑_{i=0}^{n−m−1} p^i|_p
          = |∑_{i=m+1}^{n} p^i|_p
          = |∑_{i=1}^{n} p^i − ∑_{i=1}^{m} p^i|_p
          = |xn − xm|_p.

Hence d_p(xn, xm) < ε and {xn} is a Cauchy sequence.


However, this sequence does not converge in [Q, d_p]. For a proof by
contradiction, suppose {xn} converges to q, and write q = p^t (a/b) where a, b
are coprime to p (we won’t need to consider the case that q = 0, because
it is clear that the sequence does not converge to 0). Let ε = 1/p. Then there
exists N ∈ N such that for all n > N, we have |xn − q|_p < 1/p. If t > 1, then
p² divides xn b − p^t a, which is a contradiction as xn is divisible by p, and
the difference between (1/p) xn b and p^{t−1} a is coprime to p. Similarly, if
t ≤ 0, then p² divides xn b p^{−t} − a. So consider the remaining case t = 1. Then

|xn − q|_p = |xn b − pa|_p = (1/p) · |(1 + p + . . . + p^{n−1}) b − a|_p

Now we come to the culmination of this set of notes; the definition of


the p-adic numbers.

Definition 7.3.1 (p-adic numbers). The set of equivalence classes of Cauchy
sequences in [Q, d_p] is the set of p-adic numbers and we denote it by Q(p).
The closed ball of radius 1 about 0 is the set of p-adic integers, which we
denote by Z(p).

The standard notation for the p-adic numbers is usually Qp, and for the
p-adic integers, Zp. However, we have been using Zp for the field of order p,
and so instead, we change the notation usually used for p-adic integers.

We can add and multiply p-adic numbers just as we did when we constructed
the real numbers as Cauchy sequences of rationals.

Lemma 7.3.2. The p-adic numbers form a ring, and the p-adic integers are
a subring of Q( p) .

Theorem 7.3.3. The set of p-adic numbers is uncountable.


8
Appendix

In this course you will see proofs of statements which have layers of
difficulty. Most of the proofs you’ve seen so far have been of statements such as
the sum of two odd numbers is an even number or for every x > 4, we have
2^x > x². Now you will see statements such as

for every b ∈ B, there exists an element a ∈ A, such that f (a) = b.

You might recognise the above as being the definition of an onto function.
A similar type of statement was encountered in first year when we saw the
definition of limit. If you break down the statement into its fundamental
pieces, that is, into its quantifiers and clauses, you will see how to do the
proof.

Proving “(∀b ∈ B)(∃a ∈ A) Property(a, b)”:

This type of proof is used to prove that a function is onto or that a function
is continuous at a point.

Proof. Suppose b ∈ B. We want to find an element a ∈ A such that


Property(a, b).

Do some work somewhere else to figure out what a should be.

Choose a ∈ A. Then ...

⋮

Therefore, Property(a, b).

Converting “P =⇒ Q” to a “for all” statement

A handy thing to remember when proving statements like P =⇒ Q is that


logically, this is equivalent to proving

for all instances when P holds, we also have Q.



Proving “(∀x, y ∈ A) P( x, y) =⇒ Q( x, y)”:

Here we have statements P( x, y) and Q( x, y) depending on x and y. The


definition of a one-to-one function is an example of a statement with this
shape, and we can convert it to the following:
“(∀x, y ∈ A; satisfying P( x, y)) Q( x, y)”
So this is how we write the proof.

Proof. Suppose x, y ∈ A and suppose P(x, y) holds. Then ...

⋮

Therefore Q(x, y) holds.

Set equality

Two sets A and B are equal if they have the same elements. That is,

A = B ⇐⇒ A ⊆ B and B ⊆ A.

Proving containment

To prove A ⊆ B, we show that


if a ∈ A, then a ∈ B.

Example: Let A be the set of positive integers which are divisible


by 4, and let B be the set of integers which are greater than or equal to
−10. We will show that A ⊆ B.

Proof. Let a ∈ A. Then 4 divides a and hence 4 ≤ a. So −10 ≤ a and
hence a ∈ B. Therefore A ⊆ B.

Some set constructions


The union of two subsets A and B of a set X, is the set

A ∪ B = {x ∈ X : x ∈ A or x ∈ B}.

The intersection of two subsets A and B of a set X, is the set

A ∩ B = {x ∈ X : x ∈ A and x ∈ B}.

The complement of a subset A of a set X, is the set

X\A = {x ∈ X : x ∉ A}.

The power set of a set X, is the set of all subsets of X, and we denote it
P( X ). The Cartesian product of two sets A and B, is the set of all ordered
pairs (a, b) of elements a ∈ A and b ∈ B, and we write this set as

A × B = {(a, b) : a ∈ A and b ∈ B}.



De Morgan’s Laws for sets:


Let A and B be subsets of a set X. Then

• X\( A ∩ B) = ( X\A) ∪ ( X\B),

• X\( A ∪ B) = ( X\A) ∩ ( X\B),

• A ⊆ B ⇐⇒ X\B ⊆ X\A.
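
For a small set X, all three statements can be checked exhaustively with Python's built-in sets. This is a sanity check on a five-element X, not a proof; the variable names are our own.

```python
from itertools import combinations

X = set(range(5))
# All 32 subsets of X.
subsets = [set(c) for r in range(len(X) + 1) for c in combinations(X, r)]

de_morgan_holds = all(
    X - (A & B) == (X - A) | (X - B) and X - (A | B) == (X - A) & (X - B)
    for A in subsets
    for B in subsets
)
# A is a subset of B exactly when X\B is a subset of X\A.
containment_holds = all(
    (A <= B) == ((X - B) <= (X - A)) for A in subsets for B in subsets
)
```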