Mathematical Proof and Analysis and Introduction To Abstract Mathematics Michaelmas Term
MA103
Introduction to Abstract Mathematics
Michaelmas Term
Lecture Notes
These lecture notes were compiled by Peter Allen in 2022. However much of the content
comes from Martin Anthony, Graham Brightwell, Michele Harvey, Jan van den Heuvel and Amol
Sasane.
Contents
1 Introduction 6
1.1 What is this course about? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.1.1 How to get the most out of this course (and all the other maths courses) 9
1.1.2 Topics covered (MA102, first half of MA103) . . . . . . . . . . . . . . . . . 11
1.2 Moodle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3 Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4 Activities and sample exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
This chapter is intended to tell you what ‘abstract mathematics’ and ‘proof’ mean, and why
you should care about studying them.
1 × 1 + 1 + 1 + 1 + 1 + 4 = 9 = 3 × 3 = (1 + 2) × (1 + 2)
2 × 2 + 2 + 2 + 2 + 2 + 4 = 16 = 4 × 4 = (2 + 2) × (2 + 2)
3 × 3 + 3 + 3 + 3 + 3 + 4 = 25 = 5 × 5 = (3 + 2) × (3 + 2) and so on . . .
These are concrete examples. You probably see that there is a pattern to the answers we get.
We can write it more generally:
x × x + 4 × x + 4 = (x + 2) × (x + 2) .
This is a mathematical statement. It’s something which is either true or false (depending on
what x is). It means the same as the following English:
Choose a number, multiply it by itself, then add your chosen number four times, and finally
add four. You will get the same answer as if you add two to your chosen number to get a new
number, then multiply the new number by itself.
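If you like, you can also let a computer run through many more concrete cases. The following short Python sketch (my own illustration, not part of the original notes) checks the pattern for a range of integers; as discussed below, passing such checks is reassuring but is not a proof.

    # Check that x*x + 4*x + 4 equals (x + 2)*(x + 2) for many concrete values of x.
    # This only checks cases; it does not prove the statement for every integer x.
    for x in range(-50, 51):
        assert x * x + 4 * x + 4 == (x + 2) * (x + 2), f"pattern fails at x = {x}"
    print("checked x = -50, ..., 50: the pattern held in every case tested")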
Writing x in an equation, rather than ‘your chosen number’ in an English phrase, is an
example of a (simple) abstraction. Here the purpose is to simplify the presentation. There is
no need to write equations with xs in them; you could do it all in words—and indeed long ago
that is what people did. Of course, it’s hard to get anything done like that. If you show the
equation to a small child, it won’t mean anything to them, while they can read and understand
the sentence. But once you understand what the symbols in the equation mean, then it’s much
quicker and easier to read or write.
Now we come to proof. Is the statement above (however it’s written) true for some other
values of x than the three we checked by calculation? And if so, why? The purpose of a proof is
not just to be certain that a statement is true. It also explains why a statement is true. As you
probably know, the statement we wrote is true for all integers. Here is a proof.
Proof.
(x + 2) × (x + 2)
=(x + 2) × x + (x + 2) × 2 (multiplication distributes over addition)
=x × x + 2 × x + (x + 2) × 2 (multiplication distributes over addition)
=x × x + 2 × x + x × 2 + 2 × 2 (multiplication distributes over addition)
=x × x + 2 × x + x × 2 + 4 (2 × 2 = 4)
=x × x + 2 × x + 2 × x + 4 (multiplication is commutative)
=x × x + (2 + 2) × x + 4 (multiplication distributes over addition)
=x × x + 4 × x + 4 (2 + 2 = 4)
We can see that each line is equal to the previous one, for any integer x, because of the reason
given on the right. Most of the reasons are axioms—statements which we are assuming to be
true—and a couple are little calculations which you should check. So in particular the first and
last lines are equal for any integer x, in other words the statement
(x + 2) × (x + 2) = x × x + 4 × x + 4
is a true statement. This is a second reason abstraction is important: it is a time- and memory-
saving device. You can prove something once—or remember one fact—in an abstract setting
and use it in many different concrete examples.
Later on, you will see examples of mathematical structures which are not just numbers. For
some of these structures, the two axioms we mentioned above will be true, and (if you can find a
reasonable way of saying what ‘2’ and ‘4’ are!) the above proof still works. For other structures,
one or both of these axioms might not be true, so the proof will not work. That doesn’t mean
the statement is automatically false, but at least you should be suspicious.
Actually, you probably already know an example (or, at least, by the time you come to
revision you will know it). We can look at 2-by-2 matrices. Here, it’s reasonable to say that ‘2’
should mean the matrix [2 0; 0 2] (writing [a b; c d] for the 2-by-2 matrix with first row a b and second row c d), and ‘4’ should be [4 0; 0 4]. Assuming you know how to add and
multiply 2-by-2 matrices, you can make sense of the statement ‘(x + 2) × (x + 2) = x × x + 4 × x + 4’
now when x is a 2-by-2 matrix. Does the proof we gave still work, and is the statement true?
Well, multiplication of matrices does still distribute over addition, and the two small cal-
culations do still work. But matrix multiplication is not commutative; you can find pairs of
matrices where the order you multiply them makes a difference to the answer. So the proof does
not work.
But it happens (luckily!) to be the case that multiplication of any 2-by-2 matrix by [2 0; 0 2] does
commute (think about why!) and since the only place we used commutativity of multiplication
in our proof above was to say 2 × x = x × 2, we can make our proof work by changing the
reason ‘multiplication is commutative’ to ‘multiplication by [2 0; 0 2] is commutative’. Phew! The
statement is still true for 2-by-2 matrices, and we can prove it.
However, you should be a bit careful with matrices. Is it true that
([0 1; 0 0] + [0 0; 1 0])² = [0 1; 0 0]² + 2 ⋅ [0 1; 0 0] ⋅ [0 0; 1 0] + [0 0; 1 0]² ?
This looks like the same ‘expanding out the brackets’ that we just did, but (if you try to
mimic the proof above) you’ll see that there is a step where you would like to say that
[0 1; 0 0] ⋅ [0 0; 1 0] = [0 0; 1 0] ⋅ [0 1; 0 0], i.e. that these two matrices commute. They don’t, and
this is where the calculation goes wrong.
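If you want to see this concretely, you can multiply the matrices out by hand, or with a few lines of Python. The sketch below is my own illustration (the entries of the matrix x are arbitrary choices); it checks that multiplication by [2 0; 0 2] commutes with an example matrix, while [0 1; 0 0] and [0 0; 1 0] do not commute.

    def mult(a, b):
        """Multiply two 2-by-2 matrices given as ((a11, a12), (a21, a22))."""
        return tuple(
            tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
            for i in range(2)
        )

    two = ((2, 0), (0, 2))               # the matrix playing the role of '2'
    x = ((3, -1), (7, 5))                # an arbitrary 2-by-2 matrix
    print(mult(two, x) == mult(x, two))  # True: multiplying by [2 0; 0 2] commutes

    p = ((0, 1), (0, 0))
    q = ((0, 0), (1, 0))
    print(mult(p, q), mult(q, p))        # ((1, 0), (0, 0)) and ((0, 0), (0, 1)): not equal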
Next term, we’ll give axiomatic definitions of a group and a vector space, and start proving
theorems about abstract groups (and vector spaces). Here ‘abstract’ means we don’t assume
anything about the group except the axioms. This will seem painful and useless at first: you’ll
(by then) know a few concrete examples of groups and of vector spaces. It will usually be easier
to see how to prove the theorems for the concrete examples. Usually you will have some idea
already why the theorems should be true in the examples, while you won’t have much intuition
for how abstract groups behave. The natural response will be that you don’t want to study
abstract groups, you want to work with the concrete examples you know. But this is the wrong
reaction. The reason is that you will then only learn about the concrete examples you already
know, and you will suffer as soon as in future courses you see new examples of groups and are
expected to immediately know a bunch of facts about them (and also in the exam, where we
will likely test your ability to work with a new example of an abstract structure).
Finally, let’s return to proof. Why should you care that you can mathematically prove a
statement, when it’s obviously true (like the one above) or when you can check lots of cases and
become convinced?
First, we will not generally be interested in proving obvious statements, we will rather be
trying to prove statements which aren’t obvious. We will discuss later what exactly that word
‘obvious’ means, and we will see lots of examples of statements where you don’t immediately see
whether it is true or false, or how to decide which.
Second, what do you learn from checking cases? If you are trying to find out whether a
claim is true or false, it’s a good idea to start checking cases. That might give you an idea why
the claim is true, or find out if it is complete nonsense. But what if the statement is true for
most cases, but there are some special cases where it goes wrong? You most likely won’t find
them. Similarly, if you’re writing a computer program (a likely part of your future career!) and
your program works most of the time, but you don’t consider some special cases (‘edge cases’ is
the jargon), you might end up writing a program which causes a disaster—not at the level of
say crashing a plane, because such programs are checked in detail, but you could easily find
your automated trading program has lost your bank a lot of money and you your job. To avoid
that, you need to learn how to keep an overview of a complicated problem: which parts have I
checked, and what is still left that could go wrong? Learning to write formal proofs is a good
way to train.
In this course, we need to work with precise definitions and statements, and you will need to
know these. Not only will you need to know these, but you will have to understand them, and
be able (through the use of them) to demonstrate that you understand them. Simply learning
the definitions without understanding what they mean is not going to be adequate. I hope that
these words of warning don’t discourage you, but I think it’s important to make it clear that
this is a subject at a higher conceptual level than most of the mathematics you are likely to have
studied before. This does not mean it is incredibly hard and you will struggle. It is not incredibly
hard, and you are quite capable of doing well in this course (or you would not be here). It does
mean, though, that if you are used to getting through school courses by memorising material
without understanding it, then now is the time to change that (and, by the way, no-one will
hire you for your memorisation ability—a computer does that better!).
One of the standard problems students have in this course is around what it means to
‘know precise definitions’. We will be using English language—not, for the most part, logical
symbols—to define various concepts. If you know the string of words as it appears here by heart,
then, yes, you know the precise definition. But most likely I will not be completely consistent,
and certainly your textbooks and other courses will use slightly different strings of words for the
same concept. What will be changed will turn out to be things that do not alter the meaning of
the concept—you’re completely used to the idea that there are words one can change without
changing the intent of a sentence in English. Mathematical English is, however, a bit more picky
than the usual spoken English; there are some words which you cannot change, and in particular
the order of words is often important. I’ll highlight this when it gets relevant in the course. For
now, if you’re not sure whether two sentences mean the same thing, that tells you you don’t
understand either of them and you need to think a bit more and look at the examples.
1.1.1 How to get the most out of this course (and all the other
maths courses)
There are two theories about mathematical ability (and intelligence in general). One theory says
that you have what you are born with. The other says that (just like strength or stamina) it’s
something you develop by practice. Various studies have shown that broadly similar numbers
of students believe each theory, but the ones who believe ability is something you develop are
consistently the ones who do better—and almost all academic mathematicians believe ability is
something you learn and train.
Some people are faster than others, but speed is in the end not all that useful: no matter
how fast you are, if you switch off and coast for a while, you will have trouble catching up with
people who pay attention and work on understanding their courses. In particular—and this
is different to school maths—we will always assume you understood the previous lectures and
courses, and we will use things from those previous lectures and courses all the time. If you do
understand the previous material—even if you are not so fast—you’ll understand a good deal of
the current lecture (maybe all of it, maybe not quite) in real time, and you won’t need to spend
much time after the lecture going over the material. If you don’t really understand the previous
material, you won’t have a chance to understand large parts of the lecture and you’ll have to do
even more work afterwards to catch up.
In this course, all the theory will be introduced in the lectures, together with some examples.
There will be extra examples sessions (which don’t exist in most courses—don’t expect them!),
which you do not have to attend but which may well be useful.
In the lectures and examples sessions, you should be trying to understand what is going
on. Don’t waste time copying things down which will appear on Moodle (which is essentially
everything). Certainly don’t waste time with a newspaper or games on your phone, which annoys
me and distracts your classmates. No-one is taking attendance in the lectures; if you’re not going
to pay attention, go to a café instead. If you do not understand something I said or wrote, then
probably either I didn’t explain it properly or I made a mistake, so you should ask questions
(louder, or put your hand up, if I don’t hear). When I ask a question, I really want an answer.
Probably you need to think about the question to answer it, so I will wait until someone does
answer.
For many of you there will be some point in the lectures where you do not immediately
understand what I say, and when you ask a question the answer is still not very useful. You
should keep asking me to explain better in such a situation. It is possible that I will eventually
say that I want to move on and you should think about it after the lecture. That doesn’t mean
I think you are stupid, it generally means I am failing to understand what exactly you don’t
understand, or maybe I do understand but cannot think of a good way to explain it on the spot.
In either case, if you try to formulate clearly exactly what it is that you do not like (which will
take time, which is why you should do it after the lecture), you will probably find that doing so
also helps you figure out what is going on; once you understand something deeply in this way
you will not forget it. But it would still be useful to tell me about it after the lecture, so that I
can improve the lectures for next year (and if possible give some more explanation on Moodle
directly).
There will also be problems set every week, some online (for which you’ll see results immedi-
ately) and some which you will solve and hand in to your class teacher, who will mark them
and discuss in the next class. The class work will be marked, and in addition it will contribute
10% to your eventual course grade.
The purpose of the problems is for you to practice and check you really know what is going
on. If you get stuck, hand in half a solution with ‘I don’t know what to do next’ and your class
teacher will tell you (either written on the work, or maybe many people were stuck in the same
place and the class teacher will go over it in class; usually then there will be a short comment
like ‘Will discuss in class’). Then you learn something. If you don’t hand anything in, or you
only hand in the problems you could solve, you don’t learn anything. The written comments on
your work, and the explanations in class, are the most important piece of feedback you get—but
you only get it if you show us something on which we can give feedback. On that note—please
do not copy work from someone else (or from last year’s solutions). Doing this is a waste of
your time and ours, and it is plagiarism which can potentially land you in serious trouble. Your
mark for each week will reflect how well your class teacher feels you did on the exercises.
The contribution to the course grade is different. Each week, your work will be either judged
as acceptable or not—this is a binary system, you don’t get extra points for amazing work—and
to get all of the 10% course grade, you need sufficiently many acceptable pieces of work, handed
in on time, over the term.
There are two ways in which a piece of work can be judged acceptable. One is if you have a
‘Satisfactory’ or better grade. The other is if you do not have such a grade, but you have made a
serious attempt at all the questions. This is defined to mean: you have written down all the
definitions (often there will be only one) relevant to that question.
The intention of this contribution to the course grade is to reward students who keep up
with the course and make some effort to learn actively. If you know how to do most exercises,
you’re guaranteed to get the ‘acceptable’. If you don’t, then the first thing you should do is to
write down the definitions, not just because this will guarantee you your ‘acceptable’, but also
(and mainly) because the most common reason why students cannot do exercises is that they
do not know what the exercise is actually asking—writing down the definitions will often give
you an idea of how to get on with the solution.
The only way to fail to get the 10% for coursework is for you to decide that it is not worth
your time to make any serious attempt at the classwork. In recent years we noticed students
increasingly doing this, usually then telling us that they are ‘a bit behind and need to catch up’,
or that they will ‘do all the questions in revision’. Usually, these students failed their exams; we
hope that if you are thinking of studying ‘school style’, even if you believe that you personally
will be able to make it work, you will at least recognise that throwing away 10%—an entire class
grade—is a bad move.
Finally, there are office hours and the Maths Support Centre. If you don’t understand
something, you should first try to figure it out for yourself—if you manage, then you won’t
forget it (and you should be happy with yourself). But if you get stuck, then you should not
wait and hope that it magically gets clear. It probably will not, and you will suffer because you
don’t understand something I am assuming you do understand in my lectures. So go to office
hours or the Support Centre and ask questions. You have already paid for those office hours;
use them. You can also try talking with your friends on the course and seeing if you can figure
out what’s going on—group work can be fun and productive.
1.2 Moodle
All information and materials for this course are on Moodle:
http://moodle.lse.ac.uk/course/view.php?id=1989
On the course Moodle page, you will find assignments, solutions, lecture notes, and so on.
1.3 Reading
These notes are intended to be a comprehensive treatment. That means, I think you should not
need to buy or borrow any textbooks for this course.
However, you might disagree. If you don’t like my writing style, or you want to understand
a particular topic better, try looking at a textbook. If you want more exercises, and you are
actually going to do the exercises, look at a textbook. If you want more exercises in order to
read the solutions, you’re wasting your time!
There are many books that would be useful for this subject, since abstract mathematics is a
component of all university-level mathematics degree programmes I know of.
For the first half of the course (the part covered by these notes), the following two books are
recommended, and most chapters of the notes will start with a reference to the corresponding
chapters in these two books.
• Biggs, Norman L., Discrete Mathematics, Second edition. (Oxford University Press, 2002).
[ISBN 0198507178].
• Eccles, P.J., An Introduction to Mathematical Reasoning: numbers, sets and functions. (Cam-
bridge University Press, 1997). [ISBN 0521597188].
There is one topic that neither of these covers, which is the topic of Complex Numbers.
However, this is a topic that is well-covered in a number of other textbooks—look around.
2.1 Introduction
In this course, we want to make precise mathematical statements and establish whether they
are true or not—we want to prove things. But for that, we have to first understand what a
proof is. We will look at fairly simple types of mathematical statement, in order to emphasise
techniques of proof. Some of these statements are going to be interesting, others are not so
interesting—bear in mind that what you are doing in this part of the course is learning the rules
of the game: the play (and more of the fun) comes later.
In later chapters (such as those on numbers, analysis and algebra) we will use these proof
techniques extensively. You might think that some of the things we prove in this chapter are
very obvious and hardly merit proving, but proving even ‘obvious’ statements can be quite
tricky sometimes, and it is good preparation for proving more complicated things later.
(a) 20 is divisible by 4.
(c) 21 is divisible by 4.
(d) 21 is divisible by 3 or 5.
(f) n² is even.
(k) For natural numbers n, the number n² is even if and only if n is even.
(l) There are no natural numbers m and n such that √2 = m/n.
These are all mathematical statements, of different sorts (all of which will be discussed in
more detail in the remainder of this chapter).
Statements (a) to (e) are straightforward propositions about certain numbers, and these are
either true or false. Statements (d) and (e) are examples of compound statements. Statement (d)
is true precisely when either one (or both) of the statements ‘21 is divisible by 3’ and ‘21 is
divisible by 5’ is true. Statement (e) is true precisely when both of the statements ‘50 is divisible
by 2’ and ‘50 is divisible by 5’ are true.
Statement (f) is different, because the number n is not specified and whether the statement
is true or false will depend on the value of the so-called free variable n. Such a statement is
known as a predicate.
Statement (g) makes an assertion about all natural numbers and is an example of a universal
statement.
Statement (h) asserts the existence of a particular number and is an example of an existential
statement.
Statement (i) can be considered as an assertion about all even numbers, and so it is a
universal statement, where the ‘universe’ is all even numbers. But it can also be considered as
an implication, asserting that if n happens to be even, then n² is even.
Statement (j) is a universal statement about all odd numbers. It can also be thought of (or
rephrased) as an implication, for it says precisely the same as ‘if n is odd, then n² is odd’.
Statement (k) is an ‘if and only if’ statement: what it says is that n² is even, for a natural
number n, precisely when n is even. But this means two things: namely that n² is even if n is
even, and n is even if n² is even. Equivalently, it means that n² is even if n is even and that n²
is odd if n is odd. So statement (k) will be true precisely if (i) is true for all natural numbers,
and (j) is true.
Statement (l) asserts the non-existence of a certain pair of numbers (m, n). Another way
of thinking about this statement is that it says that for all choices of (m, n), it is not the case
that m/n = √2. (This is an example of the general rule that a non-existence statement can be
thought of as a universal statement, something to be discussed later in more detail.)
It’s probably worth giving some examples of things that are not proper mathematical
statements.
‘6 is a nice number’ is not a mathematical statement. This is because ‘nice number’ has
no mathematical meaning. However, if, beforehand, we had defined ‘nice number’ in some way,
then this would not be a problem. For example, suppose we said:
Let us say that a number is nice if it is the sum of all the positive numbers that
divide it and are less than it.
Then ‘6 is a nice number’ would be a proper mathematical statement, and it would be true,
because 6 has positive divisors 1, 2, 3, 6 and 6 = 1 + 2 + 3. But without defining what ‘nice’ means,
it’s not a mathematical statement. Definitions are important.
‘n² + n’ is not a mathematical statement, because it does not say anything about n² + n.
It is not a mathematical statement in the same way that ‘Boris Johnson’ is not a sentence: it
makes no assertion about what Boris Johnson did or did not do to get thrown out. However,
‘n² + n > 0’ is an example of a predicate with free variable n and, for a particular value of n, this
is a mathematical statement. Likewise, ‘for all natural numbers n, n² + n > 0’ is a mathematical
statement.
Finally, anything which does not make sense as an English sentence is not a mathematical
statement. We will use lots of symbols—some you know, like =, some you don’t yet, like ∀—which
all mean some English word or words. It’s easy to write something with symbols that, when you
read it out, doesn’t make sense. If when you read your work out, you are saying something like
‘five is true’ or ‘for every integer n we have n = 2’, something is wrong. Figure out what you
meant to write, then write that.
Now that we’re all clear on exactly what the statements mean, let’s see which ones are true
and prove them.
(a) 20 is divisible by 4.
This statement is true. Since 20 = 5 × 4, we see that (by the definition) 20 is divisible by 4.
And that’s a proof! It’s utterly convincing, watertight, and not open to debate. Nobody can
argue with it, not even a sociologist! Isn’t this fun? Well, maybe it’s not that impressive in
such a simple situation, but we will certainly prove more impressive results later.
(c) 21 is divisible by 4.
This is false, as can be established in a number of ways. First, we note that if the natural
number m satisfies m ≤ 5, then m × 4 will be no more than 20. And if m ≥ 6 then m × 4
will be at least 24. Well, any natural number m is either at most 5 or at least 6 so, for all
possible m, we do not have m × 4 = 21 and hence there is no natural number m for which
m × 4 = 21. In other words, 21 is not divisible by 4. Another argument (which is perhaps
more straightforward, but which relies on properties of rational numbers rather than just
simple properties of natural numbers) is to note that 21/4 = 5.25, and this is not a natural
number, so 21 is not divisible by 4. (This second approach is the same as showing that 21
has remainder 1, not 0, when we divide by 4.)
Most of you are probably completely happy with these proofs. Maybe one or two of you
would like to know things like: why is there no natural number between 5 and 6? Do we need
to prove it? We’ll get to that next term; for now, don’t worry about it.
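If you want to check divisibility by computer, the remainder argument translates directly; here is a minimal Python sketch (mine, assuming the usual definition of divisibility for natural numbers referred to above):

    def divisible(n, d):
        """True if the natural number n is divisible by the natural number d."""
        return n % d == 0   # remainder 0 means n = m * d for some natural number m

    print(divisible(20, 4))   # True:  20 = 5 * 4
    print(divisible(21, 4))   # False: 21 leaves remainder 1 when divided by 4
    print(divisible(21, 3) or divisible(21, 5))   # True: '21 is divisible by 3 or 5'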
(d) 21 is divisible by 3 or 5.
As we noted above, this is a compound statement. It is true precisely if one (or both) of the
following statements is true:
(i) 21 is divisible by 3
(ii) 21 is divisible by 5.
Statement (i) is true, because 21 = 7 × 3. Statement (ii) is false. Because at least one of these
two statements is true, statement (d) is true.
(e) 50 is divisible by 2 and 5.
As we noted above, this is a compound statement. It is true precisely if both of the following
statements are true:
(i) 50 is divisible by 2
(ii) 50 is divisible by 5.
Statements (i) and (ii) are indeed true because 50 = 25 × 2 and 50 = 10 × 5. So statement (e)
is true.
(f) n² is even
As mentioned above, whether this is true or false depends on the value of n. For example,
if n = 2 then n² = 4 is even, but if n = 3 then n² = 9 is odd. So, unlike the other statements
(which are propositions), this is a predicate P(n). The predicate will become a proposition
when we assign a particular value to n, and the truth or falsity of the proposition can
then be established. You probably implicitly assume that n has to be a natural number,
but there isn’t actually anything in the statement to tell you that—maybe n is a matrix, in
which case it’s not even clear what ‘even’ should mean for a matrix (we only defined ‘even’
for natural numbers). If we assume n is a natural number, then (i) and (j) cover all the
possibilities.
(g) For every natural number n, the number n² + n is even.
Suppose first that n is even, so that n = 2k for some integer k. Then n² + n = n(n + 1) = 2k(2k + 1).
Because k(2k + 1) is an integer, this shows that n² + n = n(n + 1) is divisible by 2; that is, it
is even. We supposed here that n was even. But it might be odd, in which case we would
have n = 2k + 1 for some integer k. Then n² + n = n(n + 1) = (2k + 1)(2k + 2) = 2(2k + 1)(k + 1),
which is again twice an integer, so n² + n is even in this case too.
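As before, a quick computer check of small cases is not a proof, but it is a useful sanity check on the algebra; a one-line Python sketch (mine):

    # Sanity check, not a proof: n*n + n should be even for every natural number n tested.
    print(all((n * n + n) % 2 == 0 for n in range(1, 10001)))   # True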
These examples hopefully demonstrate that there are a wide range of statements and proof
techniques, and in the rest of this chapter we will explore these further.
Right now, one thing I hope comes out very clearly from these examples is that to prove a
mathematical statement, you need to know precisely what it means. Well, that sounds obvious,
but you can see how detailed we had to be about the meanings (that is, the definitions) of the
terms ‘divisible’, ‘even’ and ‘odd’.
Something you can also notice is that we like to come up with special names to distinguish
things even when it’s ‘unnecessary’. For example, we talked about ‘propositions’ and ‘predicates’
as being different types of statement; why bother with these two funny words? Right now,
this no doubt feels like me inventing more words that you have to learn for no good reason.
If I write down one of the statements above, you’ll be able to see immediately whether it is a
simple true-or-false statement (a proposition) or whether there is some free variable in it and
its truth could depend on the value of the free variable (a predicate). However later, when we
are dealing with more complicated statements and have to explain something difficult, it will
be useful for me to be able to say ‘consider the proposition ...’ and ‘we have the predicate ...’
and expect that these words have made your life easier—you know already that what is coming
should be respectively a true-or-false statement, and have a free variable (or two) in it. Quite a
lot of mathematical vocabulary and notation is there ‘to help the reader’. It will always look
unnecessary when it’s introduced, because that will always be in a simple situation where what
is intended is obvious. We will never test you on it (there will not be an exam question asking
which of the following statements are predicates), but knowing it will help you understand and
write mathematics better.
2.3.1 Negation
The simplest way to take a statement and form another statement is to negate the statement.
The negation of a statement P is the statement ¬P (sometimes just denoted ‘not P ’), which is
defined to be true exactly when P is false. This can be described in the very simple truth table,
Table 2.1:
P ¬P
T F
F T
What does the table signify? Quite simply, it tells us that if P is true then ¬P is false and if
P is false then ¬P is true.
Example 2.1. If P is ‘20 is divisible by 3’ then ¬P is ‘20 is not divisible by 3’. Here, P is false
and ¬P is true.
It has, I hope, been indicated in the examples earlier in this chapter, that to disprove a
universal statement about natural numbers amounts to proving an existential statement. That
is, if we want to disprove a statement of the form ‘for all natural numbers n, property p(n) holds’
(where p(n) is some predicate, such as ‘n² is even’) we need only produce some N for which
p(N ) fails. Such an N is called a counterexample. Equally, to disprove an existential statement
of the form ‘there is some n such that property p(n) holds’, one would have to show that for
every n, p(n) fails. That is, to disprove an existential statement amounts to proving a universal
one. But, now that we have the notion of the negation of a statement we can phrase this a little
more formally. Proving that a statement P is false is equivalent to proving that the negation
¬P is true. In the language of logic, therefore, we have the following:
More precisely,
• The negation of the universal statement ‘for all n, property p(n) holds’ is the existential
statement ‘there is n such that property p(n) does not hold’.
• The negation of the existential statement ‘there is n such that property p(n) holds’ is the
universal statement ‘for all n, property p(n) does not hold’.
We could be a little more formal about this, by defining the negation of a predicate p(n) (which,
recall, only has a definitive true or false value once n is specified) to be the predicate ¬p(n)
which is true (for any particular n) precisely when p(n) is false. Then we might say that
• The negation of the universal statement ‘for all n, the statement p(n) is true’ is the existential
statement ‘there is n such that ¬p(n) is true’.
• The negation of the existential statement ‘there is n such that p(n) is true’ is the universal
statement ‘for all n, the statement ¬p(n) is true’.
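This is also exactly how you would disprove a universal statement by computer: search for a counterexample. A small Python sketch (my own; the predicate used is the example ‘n² is even’ mentioned above):

    def p(n):
        return (n * n) % 2 == 0   # the predicate 'n squared is even'

    # To disprove 'for all natural numbers n, p(n) holds', exhibit one counterexample.
    counterexample = next((n for n in range(1, 100) if not p(n)), None)
    print(counterexample)   # 1: since 1*1 = 1 is odd, the universal statement is false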
Now, let’s not get confused here. None of this is really difficult or new. We meet such logic
in everyday life. If I say ‘It rains every day in London’ then either this statement is true or
it is false. If it is false, it is because on (at least) one day it does not rain. The negation (or
disproof) of the statement ‘On every day, it rains in London’ is simply ‘There is a day on which
it does not rain in London’. The former is a universal statement (‘On every day, . . . ’) and the
latter is an existential statement (‘there is a day . . . ’). Or, consider the statement ‘There is a
student who enjoys reading these lecture notes’. This is an existential statement (‘There is . . . ’).
This is false if ‘No student enjoys reading these lecture notes’. Another way of phrasing this
last statement is ‘Every student reading these lecture notes does not enjoy it’. This is a more
awkward expression, but it emphasises that the negation of the initial, existential statement, is
a universal one (‘Every student . . . ’).
The former is an existential statement (‘there is something I will write that . . . ’) and the
latter is a universal statement (‘everything I write will . . . ). This second example is a little more
complicated, but it serves to illustrate the point that much of logic is simple common sense.
• 50 is divisible by 2
• 50 is divisible by 5.
Statement (e) is true because both of these two statements are true.
Table 2.2 gives the truth table for the conjunction P and Q:
P Q P ∧Q
T T T
T F F
F T F
F F F
What Table 2.2 says is simply that P ∧ Q is true precisely when both P and Q are true (and
in no other circumstances).
Suppose that P and Q are two mathematical statements. Then ‘P or Q’, also denoted P ∨ Q,
and called the disjunction of P and Q, is the statement that is true precisely when P , or Q, or
both, are true. For example, statement (d) above, which is
‘21 is divisible by 3 or 5’
is the disjunction of the two statements
• 21 is divisible by 3
• 21 is divisible by 5.
Statement (d) is true because at least one (namely the first) of these two statements is true.
Note one important thing about the mathematical interpretation of the word ‘or’. It is always
used in the ‘inclusive-or’ sense. So P ∨ Q is true in the case when P is true, or Q is true, or
both. In some ways, this use of the word ‘or’ contrasts with its use in normal everyday language,
where it is often used to specify a choice between mutually exclusive alternatives. (For example
‘You’re either with us or against us’.) But if I say ‘Tomorrow I will wear brown trousers or I will
wear a yellow shirt’ then, in the mathematical way in which the word ‘or’ is used, the statement
would be true if I wore brown trousers and any shirt, any trousers and a yellow shirt, and also if
I wore brown trousers and a yellow shirt. You might have your doubts about my dress sense in
this last case, but, logically, it makes my statement true.
Table 2.3 gives the truth table for the disjunction P and Q:
What Table 2.3 says is simply that P ∨ Q is true precisely when at least one of P and Q is
true.
P Q P ∨Q
T T T
T F T
F T T
F F F
P Q P ⇒ Q
T T T
T F F
F T T
F F T
Table 2.4: The truth table for ‘P ⇒ Q’
Note that the statement P ⇒ Q is false only when P is true but Q is false. To go back to
the previous example, the statement ‘If it rains, I wear a raincoat’ is false precisely if it does
rain but I do not wear a raincoat.
Warning 2.2. Many students focus on the ‘if the premise is true’ first two lines of the truth
table above, and forget the last two lines. We will need to use all four lines regularly, so do not
do this. Yes, the mathematical ⇒ is a bit different to the usual English ‘implies’, but this is
something you simply need to get used to. For the next few months, every time you use ⇒,
think for a few seconds about whether you have really written what you wanted to write.
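One way to keep all four lines of the table in mind is that P ⇒ Q has the same truth table as (¬P) ∨ Q; the short Python sketch below (my own, not from the notes) prints the table so you can compare it with Table 2.4.

    # P => Q is false only when P is true and Q is false; equivalently, (not P) or Q.
    print("P     Q     P => Q")
    for P in (True, False):
        for Q in (True, False):
            print(f"{P!s:5} {Q!s:5} {(not P) or Q}")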
The statement P ⇒ Q can also be written as Q ⇐ P. There are different ways of
describing P ⇒ Q, such as:
• if P then Q
• P implies Q
• P is sufficient for Q
• Q if P
• P only if Q
• Q whenever P
• Q is necessary for P .
All these mean the same thing. The first two are the ones I will use most frequently.
• P is equivalent to Q
P Q P ⇒ Q Q ⇒ P P ⇐⇒ Q
T T T T T
T F F T F
F T T F F
F F T T T
What the table shows is that P ⇐⇒ Q is true precisely when P and Q are either both true
or both false.
Activity 2.1. Look carefully at the truth table and understand why the values for P ⇐⇒ Q
are as they are. In particular, try to explain in words why the truth table is the way it is.
So far in mathematics, most statements you have seen are ‘if and only if’ statements. In
particular when you rearrange equations, you’re (usually!) saying ‘these two things are equal if
and only if those two things are equal’. In fact, most of the times that you have seen a ‘genuine’
⇒ (I mean, one where it would not be true to write ⇐⇒ ) it’s been as a warning that
something nasty might be around the corner: it’s true that if a = b then a² = b², but it’s not
always true that if a² = b² then a = b, so be careful.
That is not how things will be for most of the mathematics you will study, and you will get
used to ‘implies’ being the normal thing. That shouldn’t be surprising. There are usually several
different possible causes for the same effect, so any one of these causes will imply the effect. If
you stay inside, you won’t get sunburnt; if you use sunscreen, you won’t get sunburnt; if you
wear a spacesuit, you won’t get sunburnt. The converse is generally going to be false—it is not
true that if you don’t get sunburnt, then the reason is that you used sunscreen, and stayed
inside, and wore a spacesuit.
Another piece of vocabulary we will sometimes use, when we are told A ⇐⇒ B, is that
A and B are logically equivalent. Spelling it out, we say A and B are logically equivalent if
either they are both true, or they are both false. Generally, we will say things like ‘P is true if
and only if Q is true’ when we need to look at what the statements P and Q actually are—as
mathematical statements, maybe talking about integers—in order to see why the ⇐⇒ is the
case. We will say that A and B are ‘logically equivalent’ if we do not need to understand the
mathematical meaning of the statements at all, we only need to look at the logic. This is ‘to
help the reader’.
Example 2.3. The statements ¬(P ∨ Q) and ¬P ∧ ¬Q are logically equivalent.
To see that this is true, we can draw out the truth tables:
P Q P ∨ Q ¬(P ∨ Q) ¬P ¬Q ¬P ∧ ¬Q
T T T F F F F
T F T F F T F
F T T F T F F
F F F T T T T
We can see that the ¬(P ∨ Q) column and the ¬P ∧ ¬Q column are the same—these two statements are logically equivalent.
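Since ‘logically equivalent’ just means that the truth-table columns agree, a computer can confirm Example 2.3 by brute force over the four possible truth assignments. A minimal Python sketch of that idea (mine):

    from itertools import product

    lhs = lambda P, Q: not (P or Q)          # corresponds to ¬(P ∨ Q)
    rhs = lambda P, Q: (not P) and (not Q)   # corresponds to ¬P ∧ ¬Q

    # Logically equivalent: the same truth value for every assignment of P and Q.
    print(all(lhs(P, Q) == rhs(P, Q) for P, Q in product([True, False], repeat=2)))  # True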
So I might say ‘We know that the flobble is not either pretty or quick. It is logically equivalent
to say that the flobble is not pretty, and the flobble is not quick.’—and I presumably want to go
on for a few more lines of argument to tell you something interesting about the flobble. However
what I’ve signalled here is that you do not need to know what a flobble is, nor what it should
mean for one to be pretty or quick, in order to be happy with this particular line of argument.
If on the other hand, I say ‘a graph is bipartite if and only if it contains no odd cycle’ then
I’m signalling that in order to be happy that this statement is true (it is) you will need to look
up definitions of all the funny words in the sentence (don’t do that now!) and do some ‘real
maths’ not ‘just logic’.
Activity 2.2. Show that the statements ¬(P ∧ Q) and ¬P ∨ ¬Q are logically equivalent.
Activity 2.3. What is the converse of the statement ‘if the natural number n divides 4 then n
divides 12’ ? Is the converse true? Is the original statement true?
P Q P ⇒ Q ¬P ¬Q ¬Q ⇒ ¬P
T T T F F T
T F F F T F
F T T T F T
F F T T T T
If you think about it, the equivalence of the implication and its contrapositive makes sense.
For, ¬Q ⇒ ¬P says that if Q is false, P is false also. So, it tells us that we cannot have Q
false and P true, which is precisely the same information as is given by P ⇒ Q.
So what’s the point of this? Well, sometimes you might want to prove P ⇒ Q and it will,
in fact, be easier to prove instead the equivalent (contrapositive) statement ¬Q ⇒ ¬P. You
will see many examples of this through your degree.
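The same brute-force idea shows that an implication and its contrapositive always agree, while the converse genuinely does not; a quick Python sketch (mine):

    from itertools import product

    def implies(A, B):
        return (not A) or B   # the truth table of A => B

    assignments = list(product([True, False], repeat=2))
    print(all(implies(P, Q) == implies(not Q, not P) for P, Q in assignments))  # True
    print(all(implies(P, Q) == implies(Q, P) for P, Q in assignments))          # False: the converse differs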
Mistake 1 (The theorem doesn’t apply, so its conclusion is false). I’ve just finished (summer
2019) marking MA103 exams in which a large number of students wrote ‘the conditions of
Theorem A are not met, so the conclusion is false’. That is exactly the same error as the midnight
sunscreen advocate: Theorem A is an ‘if P then Q’ statement, and it can perfectly well be that
P is false but (for some other reason) Q is still true. So these answers received zero marks, and
this paragraph has been added.
Summer 2020: The same mistake again. I’ll keep adding to this each year that many students
lose marks for this class of error in the exam.
Summer 2021: Well, there were fewer of these mistakes, but it made the difference between
passing and failing for quite a few.
Summer 2022: We didn’t have a question of this form this year.
If you write down every single step, you’re in a great position if someone wants to argue with
your proof. If someone doesn’t agree with your conclusion—the statement you’re proving—it’s
their problem to find a mistake in your proof. That means they have to point at some statement
in your proof and say that they do not believe it. Now there are two sorts of statements in your
proof: ones which follow logically from earlier statements, and your axioms. If the doubter says
they don’t believe something which follows logically from earlier statements, then they have
to point at one of these earlier statements and say they don’t like that one either (or they tell
you they don’t believe in logic, in which case you can safely stop listening). Eventually they
will either be convinced you were right all along, or they will get back to one of your axioms
and say they disagree with that. Now, if you have some strange non-standard axiom, then there
might even be a good reason to argue. But if you stick to standard axioms, like ‘addition of
natural numbers is commutative’, then no-one is going to argue—which means you will convince
everyone that what you claim is true. This is the gold standard of proof.
The problem with writing down every single step is that it takes a very long time to actually
get anywhere. Look back to the proof on page 7—it takes eight lines to do a piece of algebra
which you would normally write out in one line, and even that proof skips the steps of proving
from axioms that 2 × 2 = 2 + 2 = 4 (which we’ll see how to do next term). You don’t want to
spend the next three years taking pages and pages to write out simple algebra, so we need to
agree on a way to write proofs which is shorter. There are two ways to do this, and we will use
both.
The first way is that, as we go through the course (and the degree) we will make for ourselves
a library of true statements—ones which we already proved—and we will not repeat the proofs
every time we want to use them. So, for example, we already proved that for every natural
number n, the number n² + n is even (We didn’t really write out every single step—if you don’t
like that, try doing it yourself). Next time we want to know that n² + n is even for some natural
number n, we won’t need to prove it, we can just say ‘proved in MA103’. There’s nothing much
anyone can object to here—it’s clear that we could have written out a gold standard proof just
by copying-and-pasting in the proof from MA103.
The second way we will save time is by not writing out every single step. When you need to
do a piece of algebra, do it just as you did in school, and we will assume you do know how to
justify all the steps by going back to the axioms (or at least that you know where to look in
order to find out how). We will also sometimes save steps by saying that something is ‘obvious’,
or ‘clear’. When you (or I) write ‘obvious’ or ‘clear’ in a proof, it is there to tell the reader that
there are some steps missing, that you (or I) know what those steps are, and that the reader
should have no trouble figuring out what the missing steps are. What this also means is: if you
cannot explain why a statement is true, then you cannot write that it is ‘obvious’
in a proof. You will need to make a judgement of how many steps it is OK to skip.
You will quickly get used to what is and what is not acceptable as a proof—assuming you
do the weekly exercises—because your class teacher will correct you. What you should keep in
mind is that whatever you write as a proof should be something which you could expand out to
a gold standard proof if you were forced to, either from memory or because you know where to
look for the missing pieces and previously proved statements.
As we go on, those ‘missing pieces and previously proved statements’ will get pretty long:
there will be proofs you write later this year in a page or two which might take a hundred or
more pages to write out in ‘gold standard’ style. For an example (which you shouldn’t expect to
understand when you read this the first time; but it will make sense when you’re revising) think
about how to prove that a piece of simple algebra with the rational numbers makes sense, in
terms of the axioms for the natural numbers. We prove in this course that you can do it (which
is enough—if I know something is possible, I don’t have to actually do it to check it works)—but
try actually doing it!
Example 2.5. For all real a, b we have ab ≤ (a² + b²)/2.

Proof.    (red: For all real a, b we have ab ≤ (a² + b²)/2.)
Let p and q be real numbers.    (red: We have pq ≤ (p² + q²)/2.)
    (red: We have 2pq ≤ p² + q².)
    (red: We have p² − 2pq + q² ≥ 0.)
    (red: We have (p − q)² ≥ 0.)
Since p and q are real numbers, p − q is a real number. Since the square of any real number is
non-negative, we have (p − q)² ≥ 0.
Expanding the brackets, we have p² − 2pq + q² ≥ 0.
Rearranging, we get pq ≤ (p² + q²)/2.
Since we proved pq ≤ (p² + q²)/2 for an arbitrary pair p and q of real numbers, we can conclude
that for all real a, b we have ab ≤ (a² + b²)/2.
What is going on here? The black text on the left is the proof we wanted. I’ve written it
out in a bit more detail than you would maybe feel necessary, in order to mention a couple of
important points. The red text on the right is the ‘current aim’—this is what we want to prove,
we have not yet proved it! The first line is simply repeating the text of the example. Let me
repeat what this aim is, in English. It is:
Pick any two real numbers. Then their product is at most half the sum of their squares.
Next, we pick a couple of real numbers p and q. We don’t assume anything about them apart
from that they are real numbers—that’s what the word ‘arbitrary’ means. We want to check
that for this particular pair of real numbers, we have the inequality we want—so the current
aim (the red text on the right) gets simpler. This is a standard approach to proving ‘for all’
statements; again, we’ll say more about this later.
At this point, I don’t see how to proceed ‘forwards’ in the proof; it’s not obvious what
the next black line should be, because the ‘aim’ inequality is complicated. So I try to ‘work
backwards’ and rearrange the ‘aim’ to something easier. That’s the next few red lines: get rid
of fractions, collect all the terms on one side, try to factorise—these are all things you can try.
If one doesn’t turn out to help, no problem, try another! In this example, we get to the nice
simple aim (p − q)2 ≥ 0.
Now I have reached an aim which I know how to prove true, so I write it down (that’s the
next black line). Finally, I can write out the rest of the proof, by writing out the red lines in
reverse; if you were trying this following the suggestion to work literally backwards from the
bottom of the paper, you’d already have written these lines from the bottom of the paper, and
this is where you would stop.
Finally—check that this proof makes sense! Does each black line really follow from the
previous ones?
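Before or after writing such a proof, a quick numerical spot check is cheap insurance against having mangled an inequality along the way; a small Python sketch (mine, with an arbitrary range of test values and a tiny tolerance for floating-point rounding):

    import random

    # Spot check, not a proof: a*b <= (a*a + b*b)/2 for randomly chosen real numbers.
    random.seed(0)
    for _ in range(10000):
        a = random.uniform(-100.0, 100.0)
        b = random.uniform(-100.0, 100.0)
        assert a * b <= (a * a + b * b) / 2 + 1e-9, (a, b)
    print("the inequality held in all 10000 random trials")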
It’s important to be a bit careful about what is going on with the ‘for all’, because many
students get confused here. Read this now, but come back and re-read it once you get to the
end of the next chapter and we have formally discussed quantifiers.
When we write ‘for all a, b ...’ the a and b are placeholders (we say bound variables, as
opposed to the free variables that appear in predicates) that we introduce just in order to write
the inequality conveniently. If you change these two letters for any others, it doesn’t change
the meaning of the sentence, or indeed if you write it in English without any algebra at all (as
in the box above). It doesn’t make sense to talk about ‘what a is’ on the first line; a is just a
placeholder. This is why I used different letters p and q on the second line: here we declare that
for the rest of the proof, we are going to work with a particular pair of real numbers p and q,
and they won’t change from line to line. I won’t normally bother with this (because normally
we are too lazy to use new letters) but you should be aware that this is a little bit naughty.
Finally, we wrapped up the proof by stressing that what ‘for all’ means is a promise: ‘pick
any pair of real numbers, check the inequality for that particular pair, and you will find that it
is a true inequality.’
One final point to note is that this use of red text on the right in a proof is not standard ;
don’t expect to see it elsewhere. This is just my best attempt to show you how we get to a proof.
I’ll do this in several proofs later in the notes: it will always be the case that if you completely
ignore the red lines, what you have is a complete proof. If you are ‘working backwards’, you can
avoid having to write red lines by literally working back from the bottom of the page; if you
want to copy my red lines style, feel free, but think of the red lines as being part of your rough
work that should be crossed out once you figured out and wrote down the complete proof.
the middle which explains why the last line is true. If (when) you get a proof back from your
class teacher marked as wrong even though ‘the answer is right’, before complaining, think: does
it make a difference to the story if the middle line is instead ‘You pull out your gun and shoot
the child’ ?
Mistake 4 (Backwards thinking). Working in reverse to obtain a proof but then not writing
the proof out forwards.
For example, consider trying to prove the following trigonometric identity: for all real numbers
x, we have
(cos x)² − sin x = 1 − (sin x)² + sin x . (2.1)
If you just work in reverse, your proof might be:
Proof. Fix a real number x.
where to get to the last line we used the identity (sin x)² + (cos x)² = 1, which holds for all real
numbers x by Pythagoras’ Theorem. The last line is true, so we are done.
Note that normally you wouldn’t write justifications for each line of simple algebra—it’s
obvious enough how we got from each line to the next—but I wanted to do this here for extra
clarity.
This looks a lot like what we did in the last section to prove Example 2.5; it’s a lot like the
red rearranging-the-inequality lines there. We just didn’t bother to write the remaining black
lines out. What’s the problem?
What the above proof shows is that if the identity we want to prove, (2.1), holds, then 0 = 0,
which is a true statement. But that is the converse of the statement we want to prove, if 0 = 0
then (2.1) holds. (Which is the same as just saying that (2.1) holds: 0 = 0 is True.) We already
know that the converse being true doesn’t tell us if the original statement is true. If we want to
prove the original statement, we need to end with the statement we want to prove, not start
with it.
That might seem picky—let’s see what happens if we try to write it out in the ‘right order’.
Proof, take 2. Fix a real number x. We have
Looks better—but wait! In the last section, I told you to check the proof. The first two ‘so’s
are fine, but the third ‘so’, ‘taking square roots’, boils down to ‘If a² = b² then −a = b’—and
that’s not true; it could equally well be that a = b. There is a problem with the proof here—and
the reason is that we are trying to prove a false statement! In fact,
(cos π/2)² − sin π/2 = 0² − 1 = −1 but 1 − (sin π/2)² + sin π/2 = 1 − 1² + 1 = 1.
so the ‘identity’ simply isn’t true.
What you should learn from this example is that it is not being picky to insist on writing
arguments (especially calculations with algebra) properly so that the statement to be proved
comes at the end not the beginning. It is very easy to do some operation to both sides which is
not reversible—in this example, squaring—without noticing and ‘prove’ a false statement. If
you write a proof properly, i.e. forwards, then you are more likely to notice a potential problem.
P Q P ∧ Q ¬(P ∧ Q) ¬P ¬Q ¬P ∨ ¬Q
T T T F F F F
T F F T F T T
F T F T T F T
F F F T T T T
Comment on Activity 2.3. The converse is ‘if n divides 12 then n divides 4’. This is false. For
instance, n = 12 is a counterexample. This is because 12 divides 12, but it does not divide 4.
The original statement is true, however. For, if n divides 4, then for some m ∈ N, 4 = nm and
hence 12 = 3 × 4 = 3nm = n(3m), which shows that n divides 12.
3.1 Sets
You have probably already met some basic ideas about sets and there is not too much more to
add at this stage, but they are such an important idea in abstract mathematics that they are
worth discussing here.
If you look around on the Internet, you might run into some things talking about ‘set theory’
and saying that this is all very subtle, and ‘unproveable’ and such things. This is not what we
are going to do. We are going to take a very simple view of sets (sometimes called naïve set
theory). We are not going to go looking for trouble, and we will not find it, so don’t worry. If
you are curious about what trouble you might find if you insist on looking for it, see Section 3.6.
3.1.1 Basics
Loosely speaking, a set may be thought of as a collection of objects. A set is usually described
by listing or describing its members, or elements, inside curly brackets. For example, when we
write A = {1, 2, 3}, we mean that the objects belonging to the set A are the numbers 1, 2, 3 (or,
equivalently, the set A consists of the numbers 1, 2 and 3). Equally (and this is what we mean
by ‘describing’ its members), this set could have been written as
A = {n ∣ n is a whole number and 1 ≤ n ≤ 3}.
Here, the symbol ∣ stands for ‘such that’. Often, the symbol ‘∶’ is used instead, so that we might
write
A = {n ∶ n is a whole number and 1 ≤ n ≤ 3}.
Let’s see why {1, 2, 3} = {1, 2, 3, 1} according to this definition. We have to check a certain
predicate (namely x ∈ {1, 2, 3} ⇐⇒ x ∈ {1, 2, 3, 1} ) is true for every x. Well, for x = 1 it’s true, 1
is in both sets. For x = 2 it is true, 2 is in both sets. For x = 3 it is true, 3 is in both sets. For
x = 4 it is true, 4 is in neither set. For x = banana it is true, banana is in neither set. And so on...
for any x except the ones we already checked, the predicate is true because x is in neither set.
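Python’s built-in sets follow exactly this definition of equality (only membership matters), so you can see the point about repetition, and about order, in a couple of lines; a tiny sketch (mine):

    A = {1, 2, 3}
    B = {1, 2, 3, 1}   # the repeated 1 changes nothing: same members, so the same set
    C = {3, 2, 1}      # the order of listing does not matter either
    print(A == B, A == C)   # True True
    print(len(B))           # 3: the set has exactly three members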
3.1.4 Subsets
We say that the set S is a subset of the set T , and we write S ⊆ T , if every member of S is a
member of T . For example, {1, 2, 5} ⊆ {1, 2, 4, 5, 6, 40}. (Be aware that some texts use ⊂ where
we use ⊆.) What this means is that we have
A rather obvious, but sometimes useful, observation is that, given two sets A and B, A = B
if and only if A ⊆ B and B ⊆ A. So to prove two sets are equal, we can prove that each of these
two ‘containments’ holds. That might seem clumsy, but it is, in many cases, the best approach.
For any set A, the empty set, ∅, is a subset of A. You might think this is strange, because
what it means is that ‘every member of ∅ is also a member of A’. But ∅ has no members—how
can that be true? Let’s go back to the logic: ‘every member of ∅ is also a member of A’ means
‘for each x, if x ∈ ∅ then x ∈ A’. Check the truth table of if—then ( Ô⇒ ). The only way some
x can be a counterexample to this statement is if x is in ∅ and not in A. But there is no x such
that x ∈ ∅, by definition—so we proved ∅ ⊆ A.
It’s very easy to get confused about what sets are equal, what are members and what are
subsets of a set. I’m about to give an example, which right now will look like a deliberate attempt
to trick you. But things like this will show up later, not as a trick, and you need to get it right.
Warning 3.2. Consider the set S = {0, 1, {0, 1}, {2}}. What are its members and subsets?
Well, 0 is a member. And so is 1, and so is {0, 1}, and so is {2}. But 2 is not a member
of S. Furthermore, {0, 1} is a subset of S (because 0 and 1 are both members of S) and so is
{{0, 1}}. These are two different sets—{0, 1} ≠ {{0, 1}}. And there are some other subsets of
S too—try to write them all out; you should get 16 in total.
If you don’t like the statements above, maybe think of it this way. Any (mathematical)
object can go in a set, so the number 1 can go in, or a function can go in, or even another set.
This is just the same thing as saying that you can put a (normal) object in a parcel, so an apple
can go in a parcel, or an orange can go in a parcel, or a parcel full of sweets can go in another
parcel, and so on. If you think a parcel containing a parcel full of sweets is the same as a parcel
full of sweets (or it’s the same as just having a lot of sweets), think back to childhood games of
Pass-the-Parcel. Just like that game, it really matters how many of the { and } set brackets
there are, and what exactly they go round.
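If you like experimenting, you can imitate Warning 3.2 in Python, with the caveat that Python's set type cannot contain an ordinary set, only a frozenset; so in the sketch below the frozensets are stand-ins for the inner sets {0, 1} and {2}:

```python
from itertools import combinations

inner = frozenset({0, 1})          # plays the role of {0, 1}
S = {0, 1, inner, frozenset({2})}  # plays the role of {0, 1, {0, 1}, {2}}

print(inner in S)                  # True: {0, 1} is a member of S
print(2 in S)                      # False: 2 itself is not a member
print({0, 1} <= S)                 # True: {0, 1} is a subset of S
print({inner} <= S)                # True: {{0, 1}} is also a subset of S

# Count all subsets of S: choose any number of its 4 members.
subsets = [set(c) for r in range(len(S) + 1) for c in combinations(S, r)]
print(len(subsets))                # 16
```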
Similarly, we define the intersection A ∩ B to be the set whose members belong to both A
and B:
A ∩ B = {x ∣ x ∈ A and x ∈ B} .
In other words,
for all x we have x ∈ A ∩ B ⇐⇒ (x ∈ A) ∧ (x ∈ B) .
CHAPTER 3. SETS AND QUANTIFIERS 37
really defines the same set as A1 ∪ A2 ∪ A3 ∪ A4 ∪ A5 , and similarly with the arbitrary intersection.
What do these definitions mean if I = ∅? It’s not very obvious, and we need to talk about
universal sets to understand it. We’ll get back to this later; for now, just think of ⋃ as a
convenient way to avoid writing a long string of ∪s.
the complement of A ∩ B equals Ā ∪ B̄ .
This looks a little like the fact (see Activity 2.2) that ¬(P ∧ Q) is equivalent to ¬P ∨ ¬Q. And
this is more than a coincidence. The negation operation, the conjunction operation, and the
disjunction operation on statements behave entirely in the same way as the complementation,
intersection, and union operations (respectively) on sets. In fact, when you start to prove things
about sets, you often end up giving arguments that are based on logic.
For example, how would we prove that the complement of A ∩ B equals Ā ∪ B̄? We could argue as follows:
x is in the complement of A ∩ B ⇐⇒ x ∉ A ∩ B
⇐⇒ ¬(x ∈ A ∩ B)
⇐⇒ ¬((x ∈ A) ∧ (x ∈ B))
⇐⇒ ¬(x ∈ A) ∨ ¬(x ∈ B)
⇐⇒ (x ∈ Ā) ∨ (x ∈ B̄)
⇐⇒ x ∈ Ā ∪ B̄.
What the result says is, in fact, easy to understand: if x is not in both A and B, then that’s
precisely because it fails to be in (at least) one of them.
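For small finite sets inside a chosen universal set you can check this law exhaustively by computer. A minimal Python sketch (the universal set E = {1, . . . , 9} and the sets A, B are just assumptions for the experiment):

```python
E = set(range(1, 10))   # an assumed universal set for this experiment
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def complement(X):
    return E - X        # complement relative to E

print(complement(A & B) == complement(A) | complement(B))   # True
```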
For two sets A and B (subsets of a universal set E), the complement of B in A, denoted by
A ∖ B, is the set of objects that belong to A but not to B. That is,
A ∖ B = {x ∈ A ∣ x ∉ B}.
R × R × R = R³ = {(a, b, c) ∶ a, b, c ∈ R} ,
P(A) = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}.
Activity 3.2. Write down the power set of the set A = {1, 2, 3, 4}.
Activity 3.3. Suppose that A has n members, where n ∈ N. How many members does P(A)
have?
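If you want to check your answers by computer, here is a rough Python sketch that lists all subsets of a finite set; compare the count it prints with your answer to Activity 3.3:

```python
from itertools import chain, combinations

def power_set(A):
    """All subsets of the finite set A, as a list of frozensets."""
    items = list(A)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

P = power_set({1, 2, 3, 4})
for s in P:
    print(set(s))
print(len(P))          # compare this with 2 ** 4
```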
3.2 Quantifiers
We have already met the ideas of universal and existential statements involving natural numbers.
More generally, given any set E, a universal statement on E is one of the form ‘for all x ∈ E,
P (x)’. This statement is true if P (x) is true for all x in E, and it is false if there is some x in
E (known as a counterexample) such that P (x) is false. We have a special symbol that is used
in universal statements: the symbol ‘∀’ means ‘for all’. So the typical universal statement can
be written as
∀x ∈ E, P (x).
(The comma is not necessary, but I think it looks better.) An existential statement on E is one
of the form ‘there is x ∈ E such that P (x)’, which is true if there is some x ∈ E for which P (x)
is true, and is false if for every x ∈ E, P (x) is false. Again, we have a useful symbol, ‘∃’, meaning
‘there exists’. So the typical existential statement can be written as
∃x ∈ E, P (x).
Here, we have omitted the phrase ‘such that’, but this is often included if the statement reads
better with it. For instance, we could write
∃n ∈ N, n² − 2n + 1 = 0,
but it would probably be easier to read
∃n ∈ N such that n² − 2n + 1 = 0.
Often ‘such that’ is abbreviated to ‘s.t.’. (By the way, this statement is true because n = 1
satisfies n² − 2n + 1 = 0.)
We have seen that the negation of a universal statement is an existential statement and
vice versa. In symbols, ¬(∀x ∈ E, P (x)) is logically equivalent to ∃x ∈ E, ¬P (x); and ¬(∃x ∈
E, P (x)) is logically equivalent to ∀x ∈ E, ¬P (x).
With these observations, we can now form the negations of more complex statements. Consider
the statement
∀n ∈ N, ∃m ∈ N, m > n.
Activity 3.4. What does the statement ∀n ∈ N, ∃m ∈ N, m > n mean? Is it true?
What would the negation of the statement be? Let’s take it gently. First, notice that the
statement is
∀n ∈ N, (∃m ∈ N, m > n).
The parentheses here do not change the meaning. According to the rules for negation of universal
statements, the negation of this is
∃n ∈ N, ¬(∃m ∈ N, m > n).
But what is ¬(∃m ∈ N, m > n)? According to the rules for negating existential statements, this
is equivalent to ∀m ∈ N, ¬(m > n). What is ¬(m > n)? Well, it’s just m ≤ n. So what we see is
that the negation of the initial statement is
∃n ∈ N, ∀m ∈ N, m ≤ n.
We can put this argument more succinctly, as follows:
¬ (∀n ∈ N(∃m ∈ N, m > n)) ⇐⇒ ∃n ∈ N, ¬(∃m ∈ N, m > n)
⇐⇒ ∃n ∈ N, ∀m ∈ N, ¬(m > n)
⇐⇒ ∃n ∈ N, ∀m ∈ N, m ≤ n.
Warning 3.4. This argument is succinct, but it is also hard to read, at least for me. Just to
understand what each line means requires some thought, and then some more thought to see
that it actually is equivalent to the previous line. It’s also fragile in the sense that making some
tiny change could break it.
In particular, the order of quantifiers is important. Change them, and you probably change
the meaning. If you change the order of the quantifiers in Activity 3.4, is what you get a true
statement? Try writing out what it means in English.
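On a small finite set you can even get a computer to evaluate both orderings, using Python's all and any as stand-ins for ∀ and ∃ (a toy illustration only, with an assumed test set S):

```python
S = {1, 2, 3, 4, 5}

# 'for all n in S there exists m in S with m != n' -- true as soon as S has two elements
stmt1 = all(any(m != n for m in S) for n in S)

# 'there exists m in S such that for all n in S, m != n' -- false: no m differs from itself
stmt2 = any(all(m != n for n in S) for m in S)

print(stmt1, stmt2)    # True False: swapping the quantifiers changed the meaning
```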
You want to prove an existential statement. That means you need to find one example.
There is a person who has run under 10 seconds for the 100m, because Usain Bolt did it.
You want to prove a universal statement. That means you need to check every single
possibility.
There is no person over 10 metres tall, because (you went round the world and measured the
heights of all 8 000 000 000 people).
You want to disprove an existential statement. That means you need to check every single
possibility doesn’t work—in other words, prove a universal statement.
You want to disprove a universal statement. That means you need to find one example where
it goes wrong—which is the same as proving an existential statement, and we normally call the
bad example a counterexample.
How do proofs actually look that do these things? I can’t help you much with proving an
existential statement (yet). Sit down and think about what the object you need to find is, and
hopefully at some point you can write down ‘Usain Bolt’ or ‘Lamont Marcell Jacobs’ or some
other example.
But there is a standard first thing to try if you are supposed to prove a universal statement.
If the statement is ‘for all z ∈ R, P (z)’ then the proof will often start ‘Pick z ∈ R’ or ‘Given
z ∈ R’. Then the aim is to prove P (z) for this one particular z.
We will get to using existential and universal statements later. You are told some universal
statement is true—what can you do with that information? It’s best to think of that as a
completely different thing to the process of proving existential and universal statements; again,
we’ll get to that later.
⋃_{i∈I} A_i = {x ∣ ∃i ∈ I, x ∈ A_i} ,
⋂_{i∈I} A_i = {x ∣ ∀i ∈ I, x ∈ A_i} .
Check that you see these definitions agree with the ones we gave earlier!
Now, what exactly do we do if I is an empty set? Well, for union it is intuitively clear: the
union of no sets had better be an empty set. That’s what the definition above says. If I is
empty, there is no i ∈ I, so whatever the condition after ‘∃i ∈ I’ is, it is irrelevant. The statement
‘∃x ∈ ∅, P (x)’ is False whatever P (x) is. This looks obvious written like this, but if P (x) is a
statement that looks ‘obviously true’ you will be tempted to say that ‘∃x ∈ ∅, P (x)’ should be
True, and then you will run into trouble.
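Python's conventions happen to agree with these definitions, which makes for a quick experiment (a toy illustration only):

```python
family = []                      # no sets at all: the index set I is empty

union = set().union(*family)     # union over an empty family of sets
print(union)                     # set(): the empty set, as the definition says

# any() over an empty collection is False, matching '∃i ∈ ∅, ...' being False;
# all() over an empty collection is True, matching '∀i ∈ ∅, ...' being True.
print(any([]), all([]))          # False True
```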
For the arbitrary intersection, it is not so clear what the right answer should be—and in fact
we will avoid using this notation—but what the answer should be is that
⋂_{i∈∅} A_i = E
where E is the universal set we’re working in. Why? Well, because ‘∀x ∈ ∅, P (x)’ is True
whatever P (x) is, so by definition every x we are considering is in the arbitrary intersection of
no sets. This might sound strange, and for sets it is a bit funny. But it is important in logic:
and again, if P (x) is some statement that looks ‘obviously false’ then you will be tempted to
say that ‘∀x ∈ ∅, P (x)’ should be False and get into trouble.
This proof has been written in a fairly informal and leisurely way to help explain what’s
happening. It could be written more succinctly and a bit more formally:
Proof. Suppose the set of prime numbers is not infinite. Then there are t prime numbers, for
some integer t. In other words, the set of prime numbers is {p1 , . . . , pt }. Consider the integer
N = (p1 × p2 × ⋅ ⋅ ⋅ × pt ) + 1. Now N is bigger than any of p1 , . . . , pt , so (by our assumption that
p1 , . . . , pt are all the prime numbers) it cannot be prime. And by construction N is not divisible
by any of p1 , . . . , pt (if we divide by any of them we have a remainder of 1). And since 2 and 3
are prime, certainly N is at least 7, in particular it is bigger than 1. But any integer bigger than
1 is either prime or it is divisible by a prime number, which is a contradiction.
This proof is still missing a few things—which you can see a bit more clearly because it’s
written formally. Why does the first sentence imply the second? Well, we didn’t formally define
the word ‘infinite’ yet. When we do, you’ll see that the second sentence is just writing out the
definition of ‘not infinite’, also known as ‘finite’. And we still didn’t prove the final sentence—but
hopefully it is a bit more clear what exactly we do need to prove. It’s worth thinking about this
a little bit now—what exactly is missing? We defined a prime number to be an integer greater
than 1 which is only divisible by 1 and itself. So we need to know what to do if we are given an
integer bigger than 1 which is not prime.
The other point which we should be careful about is the following. Suppose that we take the
first t prime numbers, multiply them together and add one. What we just proved is that either
we will get a new prime number or what we get will be divisible by a prime number which isn’t
one of the first t primes. We don’t have any idea which of these two things will happen. If you
try this for the first few values of t, you see
2+1=3
2×3+1=7
2 × 3 × 5 + 1 = 31
2 × 3 × 5 × 7 + 1 = 211
2 × 3 × 5 × 7 × 11 + 1 = 2311
which are all prime. It’s tempting to think this pattern will continue, but in fact
2 × 3 × 5 × 7 × 11 × 13 + 1 = 30031 = 59 × 509
is not prime.
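If you'd like to reproduce these computations (and find the factorisation of 30031) yourself, a rough Python sketch using trial division:

```python
def smallest_prime_factor(n):
    """Smallest prime factor of an integer n >= 2, found by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n          # no divisor found below sqrt(n), so n itself is prime

primes = [2, 3, 5, 7, 11, 13]
product = 1
for t, p in enumerate(primes, start=1):
    product *= p
    N = product + 1
    f = smallest_prime_factor(N)
    print(t, N, "prime" if f == N else f"divisible by {f}")
# The last line reports 30031 divisible by 59, and 30031 // 59 == 509.
```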
In this course, we’ll try to give you a mix of problems. Some will not be harder than the
ones you did at school—these are there to check you understand the concept we just introduced.
A few will be either very hard (so that while I might know how to solve them, you probably will
not be able to) or even unsolved problems. These are there so that you get some experience with
trying something genuinely difficult and seeing how far you can get. Most will be somewhere in
the middle: harder than anything you did in school, but you can solve some or most of them,
in more or less time. These are the kinds of questions that will appear on the exam—but by
then you will have more experience and things you find difficult now will not be so bad any
more—and so training yourself to solve questions at this level will be needed to pass the course.
Inevitably, when you read a proof, in the textbooks or in these notes, you will ask ‘How did
the writer know to do that?’ (‘magic steps’) and you will often find yourself asking ‘How
can I even begin to prove this?’. This is perfectly normal.
Look back to the two-line proof of Example 2.5. That proof has a ‘magic step’, but you know
how the writer thought of it. We will meet more proofs with magic steps in this course (and I
will generally try to explain why they are not really magic) and in future courses (where you
might be expected to figure things out for yourself a bit more), and there will always be some
reason why the step is not as magical as it seems.
We’ll discuss more strategies, more things to try, more tools to use, as we go on in the course.
At the same time, we’ll look at more difficult problems and more complicated concepts. You
may well feel the whole time that you are only barely coping with the course, and everything is
almost too hard. That’s what we are aiming for, more or less: to push your problem solving
ability to improve as fast as possible. Every so often, look back at the problems from the first
few weeks that you struggled with so that you can see how much you have moved on.
For now, the main thing to remember is: if you don’t try, you will never succeed. Try
something. You don’t have to justify to anyone why you should start with this particular
calculation, or why that theorem might help you. No-one will see your rough work. When you
fail, think about why—what is missing? What else could you try? Eventually you will get there.
This is a bit like integration—there are several methods, different substitutions and so forth; try
one until you get there. It’s more open ended in that there will be many more things to try.
One thing is vital: before you try to prove anything, you need to understand what it is that
you want to prove. That no doubt sounds totally obvious—but every year, I read lots of work
from students who obviously do not know what all the words in the question mean. If you do
not know what a word means, you have no chance of writing a correct solution! Look up the
definition. Then use the definition—there has to be a reason why that word is there!
This is particularly the case when a word has a meaning in mathematics and a meaning in
normal English, and these meanings are not the same. We saw that already with ‘implies’, and
there will be many more examples. You don’t get to choose, you have to use the mathematical
definition.
In general, you should expect that it takes time to read and understand even a rather short
mathematical statement. Take the time, look up any words you don’t know or are unsure about,
check that you know the meanings of all the symbols, and put all the pieces together. As a quick
example, what does A = B mean? Well, that depends what A and B are. Are they numbers?
vectors? sets? functions? In each of those cases, the symbol = means something different.
The first thing to notice with this statement is that the = is a set equality. That has a
definition, so we might get somewhere by writing it in. Since this is the aim—this is what we
want to prove—we’re going to be changing our aim, i.e. working backwards, to start with. Let’s
first give the proof, then explain it a little bit.
Proof. {x ∈ R ∣ x² + 4 ≥ 8} ∩ N = {y ∈ Z ∣ y ≥ 2}
∀z , (z ∈ {x ∈ R ∣ x² + 4 ≥ 8} ∩ N) ⇐⇒ (z ∈ {y ∈ Z ∣ y ≥ 2})
∀z , (z² + 4 ≥ 8 and z is a positive integer) ⇐⇒ (z ≥ 2 is an integer)
Fix z. (z² + 4 ≥ 8 and z is a positive integer) ⇐⇒ (z ≥ 2 is an integer)
If z is not an integer, then both sides are False, so the ⇐⇒ is True.
If z is an integer with z ≤ 0, then z is not a positive integer and z ≥ 2 fails, so both sides are False and the ⇐⇒ is True.
If z = 1, then 1² + 4 = 5 < 8 and 1 ≥ 2 fails, so both sides are False and the ⇐⇒ is True.
If z is an integer with z ≥ 2, then z is a positive integer and z² + 4 ≥ 4 + 4 = 8, so both sides are True and the ⇐⇒ is True.
These cases are exhaustive, so we have proved
∀z , (z² + 4 ≥ 8 and z ∈ N) ⇐⇒ (z ≥ 2 is an integer)
By definition (of ‘and’ and of the sets written out below), that is the same thing as
∀z , (z ∈ {x ∈ R ∣ x² + 4 ≥ 8} ∩ N) ⇐⇒ (z ∈ {y ∈ Z ∣ y ≥ 2}) ,
which by the definition of set equality says exactly that
{x ∈ R ∣ x² + 4 ≥ 8} ∩ N = {y ∈ Z ∣ y ≥ 2} ,
so we are done.
Again, the black lines are a complete proof. But we didn’t know how to get started without
thinking a bit first about what it is we actually wanted to prove. The first red line is just
repeating what we want to prove. The second red line is writing out the definition of set equality
in this particular example. That’s what I mean by ‘know and use’ the definition—it’s never
going to help much to simply copy the definition from your notes; what you need to do is to
write the definition as it applies to the thing you’re working with.
The third red line is, again, simply copying out the definitions as they apply in this example.
On the right, we’re simply filling in what it means for z to be in the set {y ∈ Z ∣ y ≥ 2}. On
the left, we’re filling in what it means for z to be in the intersection of the two sets: namely
(by definition) it is a real number such that z 2 + 4 ≥ 8, and also it is a positive integer. Since all
positive integers are real, I didn’t bother to write the ‘is a real number’ bit. So far, our ‘current
aim’ has been getting longer each line, which looks like negative progress—but it is also getting
more concrete; we replaced abstract notation with things that you are familiar with. Generally
that means it will be easier to handle.
At this point, we can see a standard strategy to try. We’re supposed to prove a ‘for all’
statement, so let’s pick a particular z and try to prove it for that particular z. This gives us the
first black line of our proof, and (for the first time) the current aim actually gets shorter. What
we now have to prove is something simple. Saying that z is an integer at least 2 is supposed to
be the same thing as saying that z is a positive integer such that z 2 + 4 ≥ 8.
There are a few ways to proceed at this point, but the one I chose is to illustrate another
standard technique, ‘proof by cases’. At this moment, we said nothing about what z is. The ⇐⇒
statement we are trying to prove could be true for any of several different reasons, depending on
what z is. We simply list a bunch of reasons, called dividing into cases, and then check that
every z is covered by one of these reasons.
Once we checked that our cases are exhaustive—that is, any possible z falls into at least one
of them—then what we have proved is that for any z we have
(z 2 + 4 ≥ 8 and z is a positive integer) ⇐⇒ (z ≥ 2 is an integer). So we can write that down as
the next black line; and then we finish the proof off by recopying the red lines from earlier in
the reverse order, and checking that they really make sense written out forwards.
It’s important that you are happy with this proof. If not, you should talk to me or your class
teacher for a better explanation.
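If you want some reassurance before (or after) writing the formal proof, you can always test a claimed set equality on a finite batch of values by computer; this is evidence, not a proof. A rough Python sketch:

```python
# Test the claimed equality on the integers and some non-integers in [-10, 10].
test_values = [k / 2 for k in range(-20, 21)]   # ..., -1.0, -0.5, 0.0, 0.5, ...

def in_left(z):
    # z is in {x in R : x^2 + 4 >= 8} intersected with N (taking N = {1, 2, 3, ...})
    return z == int(z) and z >= 1 and z * z + 4 >= 8

def in_right(z):
    # z is in {y in Z : y >= 2}
    return z == int(z) and z >= 2

print(all(in_left(z) == in_right(z) for z in test_values))   # True
```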
You may well feel that we’ve done a lot of unnecessary formalism to prove something ‘obvious’,
if you’re already happy with what the set notation means. This is a trap—it’s important to
be able to figure out how to write a formal proof from the definitions now, even though you
know how to write an intuitive and convincing explanation of why these two sets are the same
without bothering. This is because later (very soon) you’ll be dealing with statements which are
not so obvious, and you will not be able to rely on your intuition; then you need to be able to
get started with a formal proof.
In particular, you may well feel that the definition chasing we did—replacing the set equality
with its definition, and replacing the set membership and conjunction with their definitions—was
just some formal nonsense that you did not need in order to see why the statement is true. Later
in the course, the statements you can attack by definition chasing will not be obvious, but it
will still sometimes be the case that the only thing you need to do to get a solution is to replace
notation or terms with their definitions till you get to something obvious.
Finally, we saw a ‘proof by cases’ of a ‘for all’ statement. There will be lots more of these to
come. You haven’t seen anything like this before because simple algebra statements, when they’re
true, are true for exactly one reason—the calculation you do to prove them. More complicated
statements generally have multiple possible reasons for being true, as we saw here. If you’re
not happy with the logic, think of it the following way. Whatever z is, we need to provide a
reason why the predicate we’re looking at (the ⇐⇒ statement) is True for that particular z. If
there were only say 5 possible values of z, we could just do that by writing out each of the five
corresponding statements and checking them. Since there are infinitely many possible values of
z, we can’t do that.
But we can tell a Checker how they should go about checking any particular z they want.
You can imagine a dialogue with the Checker. The line ‘fix z’ means, we tell the Checker to
decide on a particular z that they should check; maybe it’s 5, or 0, or π, or banana. Then the ‘if
z is not an integer’ line means: we tell the Checker to first ask themselves whether their favourite
z is an integer; if it’s not, we explain to them why the ⇐⇒ is true (your favourite z is not an
integer, so both the left and right side of the ⇐⇒ come out to False, so the ⇐⇒ comes out
True). Then the next line tells the Checker what to do if z = 1, and so on. Finally we make sure
our cases are exhaustive—that means we are now confident that whatever z the Checker asks us
for help with checking, we have written down a reason for the Checker why the ⇐⇒ comes out
True.
In particular, ‘fix z’ does not mean that z is somehow ‘all the possibilities at once’.
How should you know to think about proving something by cases? This is simple to say
(but not always easy to do). If you can’t find one argument that works for every z, then find an
argument that works for some z, write it down, figure out which zs exactly it works for, and
then think about how to handle the other zs. Keep going until you find you’ve dealt with every
possible z.
x² = 1 Ô⇒ x = 1 or x = −1.
This is lazy, and it does not even save effort, since it’s just as easy to write:
x² = 1, hence x = 1 or x = −1.
Worse, it is probably not what was meant! The implication arrow “Ô⇒” has a logical
meaning “if . . . , then . . . ”. So if you write “x² = 1 Ô⇒ x = 1 or x = −1”, then that really means
“if x² = 1, then x = 1 or x = −1”. And hence this gives no real information about what x is. On
the other hand, writing
I know x² = 1, hence x = 1 or x = −1,
means that now we know x = 1 or x = −1 and can use that knowledge in what follows.
Some other unnecessary symbols that are sometimes used are “∴” and “ ∵ ”. They mean
something like “therefore/hence” and “since/because”. It is best not to use them, but to write
the word instead. It makes things so much easier to read.
Problem 3.8. For any natural numbers a, b, c with c ≥ 2, there is a natural number n such that
an² + bn + c is not a prime.
Recall from the proof that there are infinitely many primes: if we multiply together the first t primes and add 1, the
result is sometimes prime and sometimes not, depending on t (we saw examples of both). Are
there infinitely many values of t such that we get a prime number? No-one knows the answer;
that problem has been open for over 2 300 years.
Do it yourself
Here is one (of many possible) solutions to Problem 3.8:
Given: natural numbers a, b, c, with c ≥ 2.
To prove: there is a natural number n such that an² + bn + c is not a prime.
By definition, a natural number p is prime if p ≥ 2 and the only divisors of p are 1 and p itself.
Hence to prove: there is a natural number n for which an² + bn + c is smaller than 2 or it has
divisors other than 1 or itself.
Let’s take n = c. Then we have an² + bn + c = ac² + bc + c.
But we can write ac² + bc + c = c (ac + b + 1), which shows that ac² + bc + c has c and ac + b + 1 as
divisors.
Moreover, it’s easy to see that neither c nor ac + b + 1 can be equal to 1 or to ac² + bc + c.
We’ve found a value of n for which an² + bn + c has divisors other than 1 or itself.
The crucial step in the answer above is the one in which I choose to take n = c. Why did I
choose that? Because it works. How did I get the idea to take n = c? Ah, that’s far less obvious.
Probably some rough paper and lots of trying was involved. In the final answer, no information
about how this clever idea was found needs to be given.
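If you want to see the trick n = c in action, a few seconds of Python will confirm that it really does produce composite values; a spot-check only, of course, not a proof:

```python
def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

for (a, b, c) in [(1, 1, 2), (3, 5, 7), (2, 9, 4)]:
    n = c                                    # the choice made in the solution
    value = a * n * n + b * n + c
    print(a, b, c, value, is_prime(value))   # the last entry is always False
```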
You probably have no problems following the reasoning given above, and hence you may
think that you understand this problem. But being able to follow the answer, and being able to
find the answer yourself are two completely different matters. And it is the second skill you
are supposed to acquire in this course. (And hence the skill that will be tested in the examination.)
Once you have learnt how to approach questions such as the above and come up with the clever
trick yourself, you have some hope of being able to answer other questions of a similar type.
But if you only study answers, you will probably never be able to find new arguments for
yourself. And hence when you are given a question you’ve never seen before, how can you trust
yourself that you have the ability to see the “trick” that that particular question requires?
For many, abstract mathematics seems full of clever “tricks”. But these tricks have always
been found by people working very hard to get such a clever idea, not by people just studying
other problems and the tricks found by other people.
Why is it so important that c ≥ 2? If you look at the proof in the previous section, you see
that that proof goes wrong if c = 1. (Since we want to use that c is a divisor different from 1.)
Does that mean the statement is wrong if c = 1? (No, but a different proof is required.)
And what happens if we allow one or more of a, b, c to be zero or negative?
And what about more complicated expressions such as an³ + bn² + cn + d for some numbers
a, b, c, d with d ≥ 2? Could it be possible that there is an expression like this for which all n give
prime numbers? If you found the answer to the original question yourself, then you probably
immediately see that the answer has to be “no”, since similar arguments as before work. But if
you didn’t try the original question yourself, and just studied the ready-made answer, you’ll be
less well equipped to answer more general or slightly altered versions.
Once you start thinking like this, you are developing the skills required to be good at
mathematics. Trying to see beyond what is asked, asking yourself new questions and seeing
which you can answer, is the best way to train yourself to become a mathematician.
We’ve now reached the point in the course where you have all the basic tools you need to
start looking at problems. There will be more concepts to introduce in the next chapters, but
we will stop with introducing a new concept every page, and start spending much more time
finding out what we can do.
One of the properties we would rather like sets to have is that we can write things
like
{n ∈ N ∶ n is even}
and say that this too is a set. More generally, if we have some statement P (s) (whose truth
depends on s) and a set S, we would like to say that {s ∈ S ∶ P (s) is true} is a set. We’ll see
that this kind of statement shows up continually throughout your degree programme.
Now, so far this looks fine—if ‘anything goes’ then certainly this is OK. But if ‘anything
goes’, we can also ask about the set of all mathematical objects—this would also be a set, let’s
call it U for ‘universe’. And we can write our favourite statement P (s), for example P (s) could
be the statement ‘s is not a member of s’. In that case we get a set
X = {s ∈ U ∶ P (s) is true} .
Now, you might notice this statement P (s) is a bit funny—how can a set possibly be a member
of itself? Well, actually if U is a set, then U is a mathematical object so U has to contain itself.
That might already raise a warning sign that strange things are going to happen, but it’s not
actually a logical contradiction; it’s just a bit funny.
But what about this set X? Well, by definition X contains everything which is not a member
of itself (and nothing else). So it certainly contains anything which isn’t a set (because something
which isn’t a set doesn’t contain anything at all, let alone itself). And it certainly contains
a lot of sets, like ∅ and {1, 2, 53}. OK, does X contain X? Well, if not, then by definition it
should. So X must contain X. But then by definition, X cannot contain X. That’s a logical
contradiction, pointed out by Bertrand Russell.
That’s really nothing more than a mathematical version of the ‘Barber of Seville’, who shaves
everyone in Seville that doesn’t shave themself. Who shaves the Barber?
What this logical contradiction tells us is that ‘anything goes’ is not OK. Some things are
not sets. We need to give some rules which allow you to construct new sets from old sets; some
axioms of set theory. This is what most mathematicians do (when we think about such things
at all!), and usually we use some axioms called ZFC (Zermelo-Fraenkel with Choice). These
axioms don’t, for instance, allow you to construct a ‘set of everything’; in fact, they don’t allow
any set to contain itself (because you have to construct new sets from old sets you already have).
These rules don’t—as far as we know—lead to logical contradictions like Russell’s. If you are
worried about trying to explain everything in mathematics, then a good place to start is with
ZFC set theory.
However, ZFC set theory is hard work; you spend a lot of time and energy proving things
which look ‘obvious’. We had to make a choice: do we spend all year building up the basics of
mathematics from set theory, so that you have one (hopefully) consistent foundation for the rest
of your degree? Or do we want to actually do some mathematics? We chose to do the latter,
which means that in this course we are going to assume some things are true without proving
them. In particular, we are going to assume statements like that there is such a thing as the set
of natural numbers N, that it makes sense to talk about sets of pairs such as {(a, b) ∶ a, b ∈ N},
and so on. All these are things which one can prove from the ZFC axioms, but we will not do so.
If you dislike this, you should go study ZFC set theory (in the summer, when you have
time!). However don’t expect it to be particularly easy, and don’t expect it to be an ‘answer to
everything’. You’ll still need to assume that ZFC set theory itself makes sense; there is no proof
that it makes sense.
4.1 Introduction
In this chapter we will discuss what is meant by a ‘mathematical structure’, and explore some
of the properties of one of the most important mathematical structures: the natural numbers.
These will not be new to you, but they will be explained a little more formally. The chapter
also studies a very powerful proof method, known as proof by induction. This enables us to
prove many universal statements about natural numbers that would be extremely difficult to
prove by other means.
(6) The ‘clock numbers’ Z24 , which are the integers {0, 1, 2, . . . , 23} on a 24-hour clock, where
you add and multiply as you would on a clock; if you get 24 you replace it with 0, if you get
25 you replace it with 1, and so on.
(7) The 2 × 2 matrices ( a b ; c d ), written row by row (top row a, b; bottom row c, d), where
a, b, c, d are real numbers. Here too we can define addition and multiplication:
( a b ; c d ) + ( a′ b′ ; c′ d′ ) = ( a + a′  b + b′ ; c + c′  d + d′ )  and
( a b ; c d ) × ( a′ b′ ; c′ d′ ) = ( aa′ + bc′  ab′ + bd′ ; ca′ + dc′  cb′ + dd′ ).
These still look like structures where you can ‘do arithmetic as you’re used to’. But you
have to be a little careful now. In Z24 we have 4 × 5 = 20 = 4 × 11. So what should we say 20/4
is? You’re used to the idea that ‘division by zero’ doesn’t make sense, but in Z24 ‘division by
four’ also doesn’t make sense. When you work with 2 × 2 matrices, then multiplication turns out
not to be commutative:
( 0 1 ; −1 0 ) ( 1 0 ; 0 −1 ) = ( 0 −1 ; −1 0 )   but   ( 1 0 ; 0 −1 ) ( 0 1 ; −1 0 ) = ( 0 1 ; 1 0 ).
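If you want to play with these two structures, a short Python sketch (using % 24 for the clock numbers and nested lists for 2 × 2 matrices) reproduces both observations:

```python
# Clock numbers Z_24: arithmetic is done modulo 24.
print((4 * 5) % 24, (4 * 11) % 24)    # 20 20: two different candidates for '20 divided by 4'

# 2x2 matrices as [[a, b], [c, d]]; multiplication is not commutative.
def mat_mult(X, Y):
    return [[X[0][0] * Y[0][0] + X[0][1] * Y[1][0], X[0][0] * Y[0][1] + X[0][1] * Y[1][1]],
            [X[1][0] * Y[0][0] + X[1][1] * Y[1][0], X[1][0] * Y[0][1] + X[1][1] * Y[1][1]]]

A = [[0, 1], [-1, 0]]
B = [[1, 0], [0, -1]]
print(mat_mult(A, B))   # [[0, -1], [-1, 0]]
print(mat_mult(B, A))   # [[0, 1], [1, 0]]   -- a different answer
```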
(8) The set of social networks, where a social network consists of a (finite) collection of people
and a relation ‘friends’ between pairs of people.
Think of taking a snapshot of the Facebook network at some moment: there are something like
1 000 000 000 people in the network, and if I look at any particular pair I will find they are either
friends or they are not. That’s a social network (by the definition we gave); if we let some time
pass, some people join or leave, some pairs of people friend or de-friend each other, we get a
different social network.
It’s not clear what + or × should mean here—how can we multiply social networks? But
I probably don’t have to convince you that there are interesting things to study here; and in
fact the (results of the) mathematical study of networks (‘Graph Theory’) turns out to be very
important in today’s technology. We’re not going to go further into this in MA103; the point of
giving this example is to show you that we can be interested as mathematicians in things which
don’t involve arithmetic.
More or less, any time you find a precise, unambiguous definition of something, then you have
a mathematical structure which you can start studying. Mathematics is a much broader subject
than the arithmetic you saw in school. A lot of mathematics is not about numbers. Of
course, not everything interesting is mathematics—you (maybe) find politics interesting, but
you will not be able to come up with a definition of ‘left-wing’ or ‘economically good’ which is
generally agreed on, let alone one which is precise and unambiguous. We’ll have to leave politics
to the political scientists. The flip side of this is: it’s (more or less) true that all mathematicians
agree that all of mathematics is correct, which keeps fights to a minimum. That’s certainly not
true for political scientists, who (sometimes) write books whose messages boil down to ‘My idea
is right’, ‘You’re wrong’, ‘Am not!’, ‘Wrongy wrongy wrong!’... and so on.
If you’re thinking carefully, you might notice that the structures we mentioned above aren’t
really very clearly defined. What are ‘the points on the number line’? In fact, what are ‘the
natural numbers’? We probably all feel we know what is meant by a positive integer, how to
add and multiply them, and that all of us will get the same answers if we try it. But that’s not
good enough. It would be very embarrassing if it turned out that some of us made different
assumptions to others about the natural numbers, and we started arguing about what statements
are true.
The way we solve this in mathematics is to be very careful with assumptions. We will write
down a rather short list of assumptions, called axioms. And then we will prove that all the other
properties of the natural numbers which you are used to follow from those few axioms. This is
called the axiomatic approach to the natural numbers. We’ll develop it in MA103, in Lent Term,
as a warm-up to the axiomatic approach to groups and to abstract vector spaces—these are
structures which you quite possibly have not yet met, and about which you have no intuition.
The only way you can hope to prove anything about groups or abstract vector spaces is to get
good at working with axioms.
But for now, we will stick to structures that you are familiar with, like the natural numbers.
And we will not worry too much about justifying properties carefully from ‘axioms’, instead we
will get on with some mathematics.
Well, we can prove P (2) from this. We know P (1) is true, and we know P (1) Ô⇒ P (2) is
true. Look at the truth table for Ô⇒ ; the only way that that can happen is that P (2) is true.
Now we can prove P (3). We know (now!) that P (2) is true, and we know P (2) Ô⇒ P (3).
Again, the only way that can happen is that P (3) is true.
And so on...
That looks fairly convincing; and I said (truthfully) that this does prove P (2) and P (3).
Why is this not in fact a proof that the Principle of Induction is true? Well, in mathematics we
insist that a proof is always a finite argument: it has to be something you can get to the end of
and check, not an infinite sequence of statements, nor something finishing with a vague ‘carry
on like that’.
You will probably feel that this particular ‘and so on’ is clear enough that you would be
happy to accept this argument as valid, even though it doesn’t quite fit the definition of a
mathematical proof. This is a reasonable thing to say; though in Lent Term we will write down
some axioms for the natural numbers and use them to prove the Principle of Induction.
What is not fine, though, is saying the same thing about other similar arguments.
If you start allowing ‘and so on’ statements into proofs, it is easy to run into trouble. You
might miss something that works fine for the first ten steps but then goes wrong, because you
didn’t notice it before writing ‘and so on’. Your readers might guess a different pattern than you
intended for ‘and so on’—if I give you the sequence 1, 1, 2, 6, ... what is actually the next term?
There are a few integers you could reasonably argue for. Your readers might not be able to guess
at all what you meant ‘and so on’ to mean, because you are proving something complicated.
What certainly is the case is that your mathematics will no longer be something that anyone
can check and agree on; different readers might disagree on whether your proof is valid.
So we do not allow ‘and so on’ in proofs. If you can formulate clearly enough what you
intend ‘and so on’ to mean, then what you will find you have written is a proof by induction.
4.3.2 An example
Here’s an example of how we might prove by induction a result we proved directly earlier, in
the previous chapter: for every n ∈ N, the number n² + n is even.
Suppose you looked at this statement for a bit, and didn’t notice the ‘trick’ we used earlier.
You would probably see what n² + n is for a few integers first, to get some idea. 1² + 1 = 2 is even.
2² + 2 = 6 is even. 3² + 3 = 12 is even. Then you might think, the difference between consecutive
squares is always odd, and obviously the difference between consecutive integers is always odd,
so the difference between consecutive values of n² + n is always even—if I know that n² + n is
even, that tells me (n + 1)² + (n + 1) is even. As soon as you start thinking that it would be
useful to know an earlier case to prove a later case, that generally means you want to write a
proof by induction. Here it is.
Proof. Let Q(n) be the statement ‘n² + n is even’.
The base case is n = 1. The statement Q(1) is ‘1² + 1 is even’. That is true, because 1² + 1 = 2.
Fix a natural number k.
As an induction hypothesis, suppose Q(k) is true, i.e. k² + k is even.
We have (k + 1)² + (k + 1) = k² + 2k + 1 + k + 1 = (k² + k) + 2(k + 1). Since k² + k is an even
number by the induction hypothesis, and 2(k + 1) is obviously even, we see that (k + 1)² + (k + 1)
is even, which is Q(k + 1).
So for this k, we proved Q(k) Ô⇒ Q(k + 1), and since k ∈ N is arbitrary, we proved
∀k ∈ N , Q(k) Ô⇒ Q(k + 1), the induction step.
By the Principle of Induction, we conclude that Q(n) is true for all n ∈ N.
The reason why I used the letter Q rather than P is just to remind you that it’s not important
which particular letter we use.
Let’s recap the logic here quickly. The Principle of Induction says: if you know the base case
Q(1) and the induction step ∀k ∈ N , Q(k) Ô⇒ Q(k + 1) are true statements, then you also
know that ∀n ∈ N , Q(n) (which is our goal) is a true statement. So a proof by induction will
always mean proving the base case Q(1) (which is usually, as here, a simple calculation), and
then proving the induction step, and then saying ‘so we are done by induction’.
The induction step is a complicated statement: it is a universal statement, and the thing
inside the ‘for all’ that we want to show is itself an implication. Nevertheless, there is a standard
thing to try for both of these. Since we want to prove ∀k ∈ N , Q(k) Ô⇒ Q(k + 1), the proof of
the induction step will start ‘fix k ∈ N’ or ‘given k ∈ N’ (these mean the same) and then we just
have to prove the implication Q(k) Ô⇒ Q(k + 1) for this particular k, about which we are not
going to assume any more (it is ‘arbitrary’).
There is also a standard first thing to try when we want to prove an implication: assume
the premise Q(k). We give it the name induction hypothesis to help the reader; to remind them
that this is a standard part of the induction proof. We then just need to prove Q(k + 1) holds,
and somewhere along the way we presumably will use the statement Q(k) we assumed. I can’t
help you any more with this bit—this is usually the hard part of an induction proof, where you
need to figure out how in fact you want to prove your implication.
4.3.4 Variants
Sometimes you want to prove that a statement is true not for all positive integers (natural
numbers) but perhaps for all non-negative integers, or all integers at least 8, or something similar.
Something like induction still works, commonly called ‘induction with base case N ’. Here N is
some particular integer, which is the smallest case you want to prove (such as 0, or 8).
The Principle of Induction with base case N : Suppose P (n) is a statement involving
integers n ≥ N . Suppose furthermore that the following two statements are true.
(i) P (N ) is true; (the ‘Base case’)
(ii) For all integers k ≥ N , P (k) Ô⇒ P (k + 1). (the ‘Induction step’)
Then P (n) is true for all integers n ≥ N .
Note that the Principle of Induction is the same thing as the Principle of Induction with
base case 1. The more general statement above can be proved using the (original) Principle of
Induction: this is an exercise.
Example 4.2. Prove that
∀n ≥ 4 , n² ≤ 2ⁿ .
Let’s notice that we can’t prove this by the usual induction. The ‘base case’ n = 1 is true, so
is the n = 2 case, but the n = 3 case is false; 3² is bigger than 2³. That means that we would get
stuck proving the induction step if we tried. Try to figure out the proof for yourself!
Activity 4.1. Prove that ∀n ≥ 4 , n² ≤ 2ⁿ .
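Before hunting for the proof, it can help to see the claim numerically; a throwaway Python check, which of course proves nothing about all n ≥ 4:

```python
for n in range(1, 15):
    print(n, n * n, 2 ** n, n * n <= 2 ** n)
# Notice the pattern: True for n = 1 and 2, False for n = 3, and True again from n = 4 onwards.
```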
Another variant of the Induction Principle is the following, known as the Strong Induction
Principle:
The Strong Induction Principle: Suppose P (n) is a statement involving natural numbers
n. Suppose furthermore that the following statement is true:
For every k ∈ N, (∀s ∈ N with s < k, P (s)) Ô⇒ P (k).
Then P (n) is true for all natural numbers n.
Writing out what this condition says for k = 1, 2, 3, . . . gives the list of implications
(∀s ∈ N with s < 1, P (s)) Ô⇒ P (1)
(∀s ∈ N with s < 2, P (s)) Ô⇒ P (2)
(∀s ∈ N with s < 3, P (s)) Ô⇒ P (3) and so on.
Does that help? Well, yes, a bit. We can recognise that over on the right, the conclusions of
these implications are P (1), P (2), and so on—we’re hoping to find that all those statements are
true. So presumably we expect to find all the premises are true, for some reason. The premises
are still complicated, so let’s replace the quantifier by writing out the lists of statements explicitly.
These are all finite lists of statements. In fact, on the first line we are quantifying over an empty
set—there is no natural number less than 1—so for that line we need to check the definition to
remember that ‘for all’ things in an empty set is vacuously true. What we get is
true Ô⇒ P (1)
P (1) Ô⇒ P (2)
P (1) ∧ P (2) Ô⇒ P (3)
P (1) ∧ P (2) ∧ P (3) Ô⇒ P (4)
P (1) ∧ P (2) ∧ P (3) ∧ P (4) Ô⇒ P (5) and so on.
At this point, you can start to believe that this Strong Induction Principle makes sense. The
first line above is true; that tells us (check the truth table for Ô⇒ ) that P (1) is true.
But if P (1) is true, the second line tells us P (2) is true.
And then the third line says, since P (1) and P (2) are true, that P (3) is true. And so on.
This ‘and so on’ is of course not a proof of the Strong Induction Principle. But we can prove
it using the Principle of Induction.
Activity 4.2. Try to understand why the Strong Induction Principle follows from the Principle
of Induction. Hint: consider Q(n), the statement ‘∀s ≤ n, P (s) is true’.
This is difficult, so you may want to omit this activity at first.
Here is a reformulation, less ‘mathematically precise’ but maybe more useful, of the Strong
Induction Principle. Remember ‘assuming P (1) we prove P (2)’ is the same thing as ‘we prove
P (1) Ô⇒ P (2)’, and so on.
The Strong Induction Principle: Suppose P (n) is a statement about natural numbers n.
Suppose furthermore that you can prove true Ô⇒ P (1) (i.e. you can prove P (1) ).
And, if you assume P (1), you can prove P (2).
In fact for every k ∈ N, if you assume P (1), P (2), . . . , P (k − 1) are true, you can prove P (k).
Then P (n) is true for all natural numbers n.
It’s immediately worth pointing out that just because you assume P (1), P (2), ..., P (k − 1)
when you want to prove P (k) doesn’t mean you have to use all of them in your proof. It just
means you can if you want to.
A standard question at this point is ‘what is the base case in strong induction? is it P (1)?’
The answer to this is a bit complicated—it can be yes, it can be no, it depends. This will be
easier to understand once you’ve seen a few examples!
What is probably very unclear at this point is when or why you might want to use this
complicated-looking Strong Induction. The answer, below, is simple enough, but probably it is
not easy to understand until after you’ve read the next couple of sections.
Just as induction is what you should think of using when you try to prove a predicate P (n)
and think ‘it would really help me if I knew the last case P (n − 1) was true’, strong induction is
what you should think of using when you try to prove P (n) and think ‘it would really help me
if I knew one or several smaller cases were true’.
With this observation, we can use proof by induction to prove many results about the values
and properties of such sums. Here is a simple, classical, example.
Example 4.3. For all n ∈ N, ∑_{r=1}^{n} r = ½ n(n + 1). This is simply the statement that the sum of
the first n natural numbers is ½ n(n + 1).
Proof. We prove the result by induction. Let P (n) be the statement that ∑_{r=1}^{n} r = ½ n(n + 1).
Then P (1) states that 1 = ½ × 1 × 2, which is true; that is the base case.
Given k ∈ N, suppose (the induction hypothesis) ∑_{r=1}^{k} r = ½ k(k + 1) is true.
Consider ∑_{r=1}^{k+1} r. We have
∑_{r=1}^{k+1} r = ∑_{r=1}^{k} r + (k + 1)
= ½ k(k + 1) + (k + 1)    (by the induction hypothesis)
= ½ (k² + k + 2k + 2)
= ½ (k² + 3k + 2)
= ½ (k + 1)(k + 2)
= ½ (k + 1)((k + 1) + 1).
Checking the first and last lines, what we have proved (assuming the induction hypothesis) is
P (k + 1), i.e. we proved P (k) Ô⇒ P (k + 1). We did this for an arbitrary k, so we proved the
induction step. By the Principle of Induction, P (n) is true for all natural numbers n.
Note how the induction hypothesis was used. In the induction step, you always prove
P (k + 1) to be true assuming P (k) is. Unless you do so, it isn’t really a proof by induction.
If you write a ‘proof by induction’ and notice that you never use the induction hypothesis in
the induction step, then what you have is a fake induction. Cross out all the lines talking about
induction, and check that what is left is still a proof. Unless you’re answering a question that
explicitly says ‘prove by induction...’, we’re happier to get a direct proof than a fake induction
(even though both are proofs!).
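Incidentally, a two-line numerical spot-check of the formula in Example 4.3 is useful for catching slips, though it proves nothing:

```python
for n in range(1, 11):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
print("formula checks out for n = 1, ..., 10")
```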
Activity 4.3. Prove by induction that the sum of the first n terms of an arithmetic progression
with first term a and common difference d (that is, the sequence a, a + d, a + 2d, a + 3d, . . . ) is
½ n(2a + (n − 1)d).
Assume, as the strong induction hypothesis, that x_t = 5 + 2^{t+1} is true for each integer t < k.
In particular, the induction hypothesis means we assume x_{k−2} = 5 + 2^{k−1}, and x_{k−1} = 5 + 2^k.
Now by definition (since k ≥ 3) we have
x_k = 3x_{k−1} − 2x_{k−2}
= 3(5 + 2^k) − 2(5 + 2^{k−1})
= 15 + 6 × 2^{k−1} − 10 − 2 × 2^{k−1}
= 5 + 4 × 2^{k−1}
= 5 + 2^{k+1}
which is the statement we wanted to show, so we proved the induction step.
By strong induction, we conclude the formula holds for all natural numbers n.
Let’s notice that we could replace ‘the statement we want to show’ with a defined predicate
so that it looks more like Strong Induction. Then we would have written:
Proof. For each n ∈ N, let S(n) be the statement ‘x_n = 5 + 2^{n+1}’.
First, we can check that S(1) and S(2) hold, which we will call base cases.
We have 9 = x_1 = 5 + 2². And we have 13 = x_2 = 5 + 2³.
Now suppose k ≥ 3. We want to prove (∀t < k , S(t)) Ô⇒ S(k).
Assume, as the strong induction hypothesis, that S(t) is true for each integer t < k.
In particular, the induction hypothesis means we assume S(k − 2) and S(k − 1), i.e. that
x_{k−2} = 5 + 2^{k−1}, and x_{k−1} = 5 + 2^k.
Now by definition (since k ≥ 3) we have
x_k = 3x_{k−1} − 2x_{k−2}
= 3(5 + 2^k) − 2(5 + 2^{k−1})
= 15 + 6 × 2^{k−1} − 10 − 2 × 2^{k−1}
= 5 + 4 × 2^{k−1}
= 5 + 2^{k+1}
which is S(k), so we proved the induction step.
By strong induction, we conclude S(n) holds for all natural numbers n.
The second version looks ‘more formal’, but both are equally good. As long as you can write
clearly, you don’t need to write some predicate P (n) (or S(n), or whatever other letter) in an
induction proof. But if you do write some P (n), you need to define it. ‘It’s obvious from the
question what that should be’ isn’t acceptable.
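If you want to convince yourself numerically that the closed formula matches the recurrence (using, as in the example above, x_1 = 9, x_2 = 13 and x_k = 3x_{k−1} − 2x_{k−2}), a throwaway Python check:

```python
x = {1: 9, 2: 13}                       # the two base values from the example
for k in range(3, 21):
    x[k] = 3 * x[k - 1] - 2 * x[k - 2]  # the recurrence
for n in range(1, 21):
    assert x[n] == 5 + 2 ** (n + 1)     # the closed formula proved above
print("x_n = 5 + 2^(n+1) holds for n = 1, ..., 20")
```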
Warning 4.5. When you write a predicate, by definition that is a true-or-false statement. If you
write ‘Let P (k) = 5 + 2^{k+1}’ then you will definitely lose marks, because on the next few lines you
will write things like ‘7 is True’ which are nonsense. In general, any time you write P (n) = ..
where P (n) is supposed to be a predicate, the = sign has instantly lost you marks. We never
write that a logical statement is = something. (You can use the symbol ⇐⇒ , but usually it is
better to use words.)
Let’s check that we really are using Strong Induction correctly here. When we say ‘by Strong
Induction’ we’re claiming that we proved each of the implications that we have to prove.
We proved true Ô⇒ S(1) by proving S(1) directly. And we proved S(1) Ô⇒ S(2) by
proving S(2) directly.
And for each k ≥ 3, we proved S(1) ∧ S(2) ∧ ⋅ ⋅ ⋅ ∧ S(k − 1) Ô⇒ S(k) in the ‘induction step’.
So, yes, we did prove all the statements we were supposed to.
This also explains why we called S(1) and S(2) ‘base cases’ and the rest ‘the induction
step’. We did something special and different to prove those first two cases, which didn’t use
any induction hypothesis. To help the reader (who might expect us to have used S(1) to prove
S(2) !) we call it a base case; that’s just telling the reader ‘this case will be special’. And for
the rest of the cases, we used one argument that deals with all of them (and it does assume
some smaller cases are true) so we call it ‘the induction step’.
We’ll see later examples of strong induction arguments with one base case, or two, or three—or
even sometimes with no base case at all. The base cases are just the cases you find you need to
handle separately because the ‘main argument’ doesn’t work for them. In the example above,
in the ‘main argument’ we used the recursion formula x_k = 3x_{k−1} − 2x_{k−2}. We were only told that
that formula makes sense if k ≥ 3, so the ‘main argument’ can’t handle k = 1 or k = 2. You can
find out what base cases you need by reading over the induction step, once you figure it out,
and checking whether it really works for all values of k (if so, no base cases) or if there are a
few small values of k for which it doesn’t work (these are the base cases, make sure you write a
proof separately for each of them).
2k + 1 ≤ k² ⇐⇒ k² − 2k − 1 ≥ 0 ⇐⇒ (k − 1)² ≥ 2,
(k + 1)² ≤ 2^k + 2k + 1 ≤ 2^k + k² ≤ 2^k + 2^k = 2^{k+1} .
But if P (s) is true for all s ≤ k then its truth for all s ≤ k + 1 follows just from its truth when
s = k + 1. That is, Q(k) Ô⇒ Q(k + 1) is the same as (P (s) true ∀s ≤ k) Ô⇒ P (k + 1). The
(standard) Induction Principle applied to the statement Q(n) tells us that: Q(n) is true for all
n ∈ N if the following two statements are true:
(i) Q(1) is true;
(ii) for all k ∈ N, Q(k) Ô⇒ Q(k + 1).
What we’ve established is that (i) and (ii) can be rewritten as:
(i) P (1) is true;
(ii) for all k ∈ N, (P (s) true ∀s ≤ k) Ô⇒ P (k + 1).
We deduce that: P (n) is true for all n ∈ N if the following two statements are true:
(i) P (1) is true;
(ii) for all k ∈ N, (P (s) true ∀s ≤ k) Ô⇒ P (k + 1).
This is exactly the Strong Induction Principle. So the Strong Induction Principle follows from
the standard one and is, therefore, not really ‘stronger’.
Comment on Activity 4.3. Let P (n) be the statement that the sum of the first n terms is
(n/2)(2a + (n − 1)d). The base case is straightforward. The first term is a, and the formula
(n/2)(2a + (n − 1)d) gives a when n = 1. Suppose that P (k) holds, so the sum of the first k
terms is (k/2)(2a + (k − 1)d). Now, the (k + 1)st term is a + kd, so the sum of the first k + 1
terms is therefore
a + kd + (k/2)(2a + (k − 1)d) = a + kd + ak + (k(k − 1)/2) d
= (k + 1)a + (k(k + 1)/2) d
= ((k + 1)/2) (2a + kd)
= ((k + 1)/2) (2a + ((k + 1) − 1)d),
so P (k + 1) is true. The result follows for all n by induction.
Let P (n) be the statement that ∑_{r=1}^{n} r² = ⅙ n(n + 1)(2n + 1).
Then P (1) states that 1 = 1(2)(3)/6, which is true. Suppose P (k) is true for k ∈ N. Then
∑_{r=1}^{k} r² = ⅙ k(k + 1)(2k + 1) .
We have
∑_{r=1}^{k+1} r² = (k + 1)² + ∑_{r=1}^{k} r²
= (k + 1)² + ⅙ k(k + 1)(2k + 1)    (by the induction hypothesis)
= ⅙ (k + 1) [6(k + 1) + k(2k + 1)]
= ⅙ (k + 1) (2k² + 7k + 6)
= ⅙ (k + 1)(k + 2)(2k + 3),
so P (k + 1) is true. By induction, P (n) is true for all n ∈ N.
Solution to Exercise 4.4. Let P (n) be the statement that ∑_{i=1}^{n} 1/(i(i + 1)) = n/(n + 1). Then P (1) states
that 1/(1 × 2) = 1/(1 + 1), which is true. Suppose P (k) is true for k ∈ N. Then
∑_{i=1}^{k} 1/(i(i + 1)) = k/(k + 1) .
Now,
∑_{i=1}^{k+1} 1/(i(i + 1)) = 1/((k + 1)(k + 2)) + ∑_{i=1}^{k} 1/(i(i + 1))
= 1/((k + 1)(k + 2)) + k/(k + 1)    (by the induction hypothesis)
= (1 + k(k + 2)) / ((k + 1)(k + 2))
= (k² + 2k + 1) / ((k + 1)(k + 2))
= (k + 1)² / ((k + 1)(k + 2))
= (k + 1)/(k + 2) ,
so P (k + 1) is true. By induction, P (n) is true for all n ∈ N.
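These induction identities are easy to spot-check exactly with Python's Fraction type; for instance, for the sum just proved (again a spot-check, not a proof):

```python
from fractions import Fraction

for n in range(1, 11):
    total = sum(Fraction(1, i * (i + 1)) for i in range(1, n + 1))
    assert total == Fraction(n, n + 1)
print("sum_{i=1}^{n} 1/(i(i+1)) = n/(n+1) for n = 1, ..., 10")
```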
Solution to Exercise 4.5. Let P (n) be the statement that x_n = 3^{n+1} − 2^n. We use the Strong
Induction Principle to prove P (n) is true for all n ∈ N. The base cases are n = 1 and n = 2.
When n = 1, x_1 = 7 and 3^{n+1} − 2^n = 9 − 2 = 7. When n = 2, x_2 = 23 and 3^{n+1} − 2^n = 27 − 4 = 23, so
these are true. Suppose that k ≥ 2 and that for all s ≤ k, P (s) is true. In particular, P (k) and
which is a multiple of 7. So P (k + 1) is true. This proves P (k) Ô⇒ P (k + 1),
the induction step, and hence, by induction, P (n) is true for all n ∈ N.
Solution to Exercise 4.7. Let P (n) be the statement
∏_{r=1}^{n} (1 + x^{2^{r−1}}) = (1 − x^{2^n}) / (1 − x) .
When n = 1, the left hand side is 1 + x^{2^0} = 1 + x and the right hand side is (1 − x²)/(1 − x) = 1 + x,
so P (1) is true. Suppose P (k) is true for k ∈ N, i.e.
∏_{r=1}^{k} (1 + x^{2^{r−1}}) = (1 − x^{2^k}) / (1 − x) .
Then
∏_{r=1}^{k+1} (1 + x^{2^{r−1}}) = (1 + x^{2^{(k+1)−1}}) × ∏_{r=1}^{k} (1 + x^{2^{r−1}})
= (1 + x^{2^k}) × (1 − x^{2^k}) / (1 − x)    (by the induction hypothesis)
= (1 − (x^{2^k})²) / (1 − x)
= (1 − x^{2^{k+1}}) / (1 − x) ,
which shows that P (k + 1) is true. So P (n) is true for all n ∈ N, by induction.
5 Functions and counting
The material in this chapter is also covered in:
• Eccles, P.J. An Introduction to Mathematical Reasoning. Chapter 10, Sections 10.1 and 10.2,
and Chapter 11.
5.1 Introduction
In this chapter we look at the theory of functions, and we see how the idea of the ‘size’ of a set
can be formalised.
5.2 Functions
5.2.1 Basic definitions
You have worked extensively with functions in your previous mathematical study. Chiefly, you
will have worked with functions from the real numbers to the real numbers, these being the
primary objects of interest in calculus.
You are probably used to writing a function down by writing a formula, something like
‘f (x) = x² + sin x’. This is not the approach we are going to take, because it’s too restrictive.
For a very simple example, take the function g(x) which is defined as follows:
         ⎧ 0                          if x ≤ 11850 ,
g(x) =   ⎨ (1/5)(x − 11850)           if 11850 < x ≤ 46350 , and
         ⎩ (2/5)(x − 46350) + 6900    if x > 46350 .
This is a perfectly good function, but finding a single formula for it is a bit tricky. Furthermore,
once you find it you’ll notice that the formula is much less helpful than the definition above.
This is actually an important function (at least in the UK): it is the income tax (as of 2018) that you pay on an income of £x. I won’t bother trying to update that for 2022, because Liz Truss will probably have changed her mind at least twice between me writing this and you reading it.
Activity 5.1. Find a single formula which gives the function g(x) above.
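Before hunting for a single formula, notice that the case-by-case description translates directly into a short computation. Here is one way to write it in Python (the thresholds and rates are the ones from the definition above; the name g is just for illustration):

def g(x):
    # the 2018 UK income tax function, exactly as defined case by case
    if x <= 11850:
        return 0
    elif x <= 46350:
        return (x - 11850) / 5
    else:
        return 2 * (x - 46350) / 5 + 6900

print(g(10000), g(20000), g(50000))   # 0, 1630.0, 8360.0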
Definition 5.1. Suppose that X and Y are sets. Then a function (also known as a mapping)
from X to Y is a rule that associates a unique member of Y to each member of X. We write
f ∶ X → Y . The set X is called the domain of f and Y is called the codomain.
The element of Y that is assigned to x ∈ X is denoted by f (x) and is called the image of x.
We can write x ↦ f (x) to indicate that x maps to f (x).
There are lots of examples of functions you already know, such as sin x, or g(x) defined
above.
If you have a social network, then that social network contains a number of friendships (i.e.
pairs of people who are friends); that defines a function from social networks to the integers,
which given a social network returns the total number of friendships.
If you have a road map of some country, then there may or there may not be a way to drive
through all the villages without ever having to return to a village you already visited. That
defines a function from road maps to {Yes, No}.
You can also generate your own personal function as follows. Throw a die 1 000 000 times, and write down, in order, the numbers you get—that defines a function from {1, . . . , 1 000 000} to {1, . . . , 6}. (It’s extremely unlikely anyone ever wrote down your personal function before. Of course, the next time you try this you are very likely to get a different function..!)
Some of these functions are easier to work with, or more interesting, than others. You know
sin x shows up a lot in real-world calculations (in engineering, for example), and you know how
to do algebra and calculus with it.
What about the road map function? If you’re a fraudster, you need to keep moving on, and
you probably care a lot about not going back to villages where you already conned people—but
how do you actually work out, for a given road map with maybe 50 000 villages, whether the
answer is ‘Yes’ or ‘No’ ? It’s an interesting function, but it’s very hard to work with.
Finally, what about one of these generated-by-dice functions? It’s not easy to describe—you
don’t want to read a list a million characters long—and it’s not clear what it should be useful
for. Often (but certainly not all the time), we are really only interested in functions which we
can describe in some useful way.
There are various ways of describing a function.
If X has only finitely many members, we can simply list the images of the members of X.
You’re used to seeing a function defined by giving a formula for the function. For instance,
f ∶ R → R given by f (x) = 2x is the function that maps each real number a to the real number
2a.
Sometimes a function can be defined recursively: for example, we might define f ∶ N → N by specifying f (1) and giving a rule for f (n + 1) in terms of f (n).
We might also define a function by writing down some properties it has. For example, I could
say ‘let h ∶ R → R be the function such that h(0) = 1 and dh(x)/dx = h(x) holds for all x ∈ R.’ You
probably recognise from school that h(x) = ex is the exponential function. Here, we really need
to be careful: am I actually defining a function? In this case, yes: there is exactly one function
that satisfies the properties I wrote down. But if I left out the condition h(0) = 1 then I would
be writing something not well-defined, i.e. something that looks like it’s defining a function but
in fact isn’t. The reason is there would be many possible valid answers, such as h(x) = 2022ex .
Finally, we define one very basic function. For any set X, the identity function 1 ∶ X → X is
given by 1(x) = x.
If X and Z are distinct sets, there is only one way we can compose f and g.
If X = Z, then both f ○ g and g ○ f make sense—but they are generally not the same function:
the order is important.
In some textbooks you will see a different notation for function composition, leaving out
the ○. So you might see f g where I would write f ○ g. This notation f g can cause confusion
(which is why I won’t use it). For example, suppose X = Y = Z = R. Then you might be tempted
to think that gf denotes the product function x → g(x)f (x). But this would be wrong. The
notation g ○ f avoids this confusion.
Example 5.2. Suppose f ∶ N → N and g ∶ N → N are given by f (x) = x^2 + 1 and g(x) = (x + 1)^2 .
Then (f ○ g)(x) = f (g(x)) = f ((x + 1)^2 ) = ((x + 1)^2 )^2 + 1 = (x + 1)^4 + 1,
while (g ○ f )(x) = g(f (x)) = g(x^2 + 1) = ((x^2 + 1) + 1)^2 = (x^2 + 2)^2 ,
and g(x)f (x) = (x + 1)^2 (x^2 + 1).
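If you want to experiment with composition, this example translates directly into code. The helper compose below is ad hoc (it is not standard notation), but it makes the point that f ○ g, g ○ f and the pointwise product are three different functions:

def f(x):
    return x ** 2 + 1

def g(x):
    return (x + 1) ** 2

def compose(outer, inner):
    # (outer o inner)(x) = outer(inner(x))
    return lambda x: outer(inner(x))

fg = compose(f, g)   # f o g
gf = compose(g, f)   # g o f

x = 2
print(fg(x), (x + 1) ** 4 + 1)    # 82 82
print(gf(x), (x ** 2 + 2) ** 2)   # 36 36
print(g(x) * f(x))                # 45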
Definition 5.3 (Surjection). Suppose f is a function with domain X and codomain Y . Then f
is said to be a surjection (or ‘f is surjective’) if every y ∈ Y is the image of some x ∈ X; that is,
f is a surjection if and only if ∀y ∈ Y, ∃x ∈ X, s.t. f (x) = y.
Definition 5.4 (Injection). Suppose f is a function with domain X and codomain Y . Then f
is said to be an injection (or ‘f is injective’) if every y ∈ Y is the image of at most one x ∈ X. In
other words, the function is an injection if different elements of X have different images under
f . Thus, f is an injection if and only if ∀x, y ∈ X, x ≠ y ⟹ f (x) ≠ f (y); equivalently (taking the contrapositive), if and only if ∀x, y ∈ X, f (x) = f (y) ⟹ x = y.
This latter characterisation often provides the easiest way to verify that a function is an injection.
Definition 5.5 (Bijection). Suppose f is a function with domain X and codomain Y . Then
f is said to be a bijection (or ‘f is bijective’) if it is both an injection and a surjection. So this
means two things: each y ∈ Y is the image of some x ∈ X, and each y ∈ Y is the image of no
more than one x ∈ X. Well, of course, this is equivalent to: each y ∈ Y is the image of precisely
one x ∈ X.
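For functions between small finite sets, these definitions can be checked by brute force; the helpers below are nothing more than Definitions 5.3–5.5 written out in Python (the names are mine, not standard):

def is_injective(f, X):
    values = [f(x) for x in X]
    return len(values) == len(set(values))   # no value is hit twice

def is_surjective(f, X, Y):
    return {f(x) for x in X} == set(Y)       # every y is hit

def is_bijective(f, X, Y):
    return is_injective(f, X) and is_surjective(f, X, Y)

X = [1, 2, 3, 4]
Y = ["a", "b", "c", "d"]
f = {1: "a", 2: "c", 3: "b", 4: "d"}.get
print(is_bijective(f, X, Y))   # True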
We write (a, b) (which is called an open interval and we will meet again later) for the set
of real numbers x such that a < x < b. And we write ∣x∣ for the absolute value of x, defined by
∣x∣ = x if x ≥ 0 and ∣x∣ = −x if x < 0. Thus ∣x∣ is always non-negative.
We will prove: the function f ∶ R → (−1, 1) given by f (x) = x/(1 + ∣x∣) is a bijection.
Proof. First, we prove f is injective. To do this, we prove that f (x) = f (y) implies x = y. So, suppose f (x) = f (y). Then
x/(1 + ∣x∣) = y/(1 + ∣y∣).
Cross-multiplying and rearranging, we get x + x∣y∣ = y + y∣x∣.
Suppose x ≥ 0. If y < 0, then the left hand side of the above equation is non-negative and the
right hand side is negative—this cannot be a solution. So y ≥ 0. But then ∣x∣ = x and ∣y∣ = y, and
we get x + xy = y + xy, which tells us x = y.
Suppose x < 0. If y ≥ 0, then the left hand side of the above equation is negative and the right
hand side is non-negative—this cannot be a solution. So y < 0. Then ∣x∣ = −x and ∣y∣ = −y, we
get x − xy = y − xy and again x = y.
Next, we show f is surjective. We need to prove that, for each y ∈ (−1, 1), there is some x ∈ R such that x/(1 + ∣x∣) = y.
Suppose y ≥ 0. Then, to have x/(1 + ∣x∣) = y, we need x ≥ 0. So ∣x∣ = x and we need to solve x/(1 + x) = y. This has solution x = y/(1 − y), which is well-defined and non-negative because we know 0 ≤ y < 1.
Suppose y < 0. Then we’ll need to have x < 0 and the equation to solve is x/(1 − x) = y, for a solution x = y/(1 + y); this is well-defined and negative since 0 > y > −1.
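If you want to see the proof ‘in action’, here is a numerical sketch of f together with the inverse formulas that the surjectivity argument produced (floating point, so the comparison is only approximate):

def f(x):
    return x / (1 + abs(x))

def f_inverse(y):
    # read off from the surjectivity proof: y/(1-y) if y >= 0, y/(1+y) if y < 0
    return y / (1 - y) if y >= 0 else y / (1 + y)

for x in [-10, -1.5, 0, 0.25, 3, 100]:
    y = f(x)
    assert -1 < y < 1
    assert abs(f_inverse(y) - x) < 1e-9
print("f lands in (-1, 1) and f_inverse undoes it on the tested points")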
At first, you might well think that the above proof is difficult; it’s certainly not short, and
has a bunch of somewhat complicated formulae and cases to consider.
But actually, this proof is long, but not hard. We will see quite a few proofs which are long,
but not hard, in this course. This is one of the standard places where students are put off the
course (and maybe mathematics as a whole), because they feel that the difficulty of finding a proof must be proportional to the length of the proof, and the proofs are getting rapidly longer.
There will certainly be difficult proofs in the course. But proof difficulty doesn’t have much
to do with length. Let me explain why this proof is not hard.
To begin with, we’re supposed to prove a function is bijective. That means (definition chasing)
we need to prove it is injective and surjective (because that’s what ‘bijective’ means). Well, if
we want to prove two things, we should probably do them one after the other. So we do that
(and unsurprisingly, if we prove two things it will be twice as long).
Next, we look up the definition of ‘f is injective’ in order to prove it. We take the hint from
the lecture notes to use the contrapositive form: we should (definition chasing) try to prove
f (x) = f (y) ⟹ x = y is true for all x, y ∈ R. Well, this is a ‘for all’ statement, so we use the
standard first thing to try: fix x and y, and try to prove the statement for this particular x and
y. We write in the definition of f (x) and f (y) (definition chasing, again) and hope to get some
nice equation that we can hit with algebra and solve. What we get is x + x∣y∣ = y + y∣x∣.
This isn’t quite a nice equation, because of the ∣⋅∣ signs; we would be much happier if we could
get rid of them. How can we do that? Well, we can get rid of ∣x∣ by definition chasing (again!):
let’s think about the cases x ≥ 0 (which is when ∣x∣ = x) and x < 0 (so ∣x∣ = −x) separately. That
case distinction in the proof is not magic, it was copied straight from the definition of ∣x∣.
We still have a nasty ∣y∣ around. Let’s repeat the definition chasing: in each of our two cases
for x, let’s separately consider whether y ≥ 0 or y < 0 (so we have four cases in total).
At this point, we need to think for a couple of seconds to notice that two of our four cases
can’t really happen: if x ≥ 0 and y < 0 (or vice versa) then we don’t need to start doing algebra
because there can’t be a solution.
What would happen if you did just start doing algebra here? Well, you’d try to solve x − xy = y + xy (plugging in ∣x∣ = x and ∣y∣ = −y) and so x = y + 2xy, so y = x/(1 + 2x). Then you need to notice that since x ≥ 0, the ‘solution’ you’ve just found gives us y ≥ 0, whereas we assumed y < 0. So it’s not really a solution; it violates the assumption we made.
And then we do just do the algebra in the two remaining cases, and in both cases it is easy.
Now we move to ‘f is surjective’. Again, we definition-chase, and write out what that means.
Again, we need to deal with the ∣x∣ and again the right thing to do is to separate the two cases
(since we are given y and want to find x such that f (x) = y, it makes sense to consider the
two possible cases for y). And again, we can then do the algebra and double-check our solution
makes sense.
What you should notice is that although there are a lot of steps here (and this only got to ‘f
is injective’) all the steps are basic strategies: mostly definition chasing, plus a couple of times
we did some high-school algebra to solve equations. Because there are a lot of steps, you have
no chance of looking at the problem and seeing how the proof will go; it’s easy to get scared.
But if you simply try, you can write the proof down without ever having to pause for thought
for more than a minute (if you’re revising, you’re probably at the stage where it takes longer to
write the next line than to think what it should be). Whenever we had some not-so-nice concept
left over, we used definition chasing to replace it with something nicer (even when that means
considering cases, this is a winner: two nice things are better than one not-so-nice thing). Until we
finally got down to a problem you know how to do from high school ‘solve this nice equation’.
That is ‘long but not hard’; get used to it. Don’t get scared until standard strategies don’t help.
First, we prove:
f ∶ X → Y has an inverse ⇐⇒ f is bijective.
Proof. This is an ⇐⇒ theorem, so there are two things to prove: the ⇐ and the ⇒.
First, we show: f ∶ X → Y has an inverse ⇐ f is bijective.
Suppose f is a bijection. For each y ∈ Y there is exactly one x ∈ X with f (x) = y. Define
g ∶ Y → X by g(y) = x. Then this is an inverse of f . Check this!
Next, we show: f ∶ X → Y has an inverse ⇒ f is bijective.
Suppose f has an inverse function g. We know that for any y ∈ Y , f (g(y)) = (f ○ g)(y) = y,
so there is some x ∈ X (namely x = g(y)) such that f (x) = y. So f is surjective.
Now suppose f (x) = f (x′ ). Then g(f (x)) = g(f (x′ )). But g(f (x)) = (g ○ f )(x) = x and,
similarly, g(f (x′ )) = x′ . So: x = x′ and f is injective.
Now we prove that if an inverse function exists, it is unique.
Proof. Suppose that g and h are inverses of f . Then both have domain Y and codomain X, and
we just need to check that g(y) = h(y) for every y ∈ Y . Well, h ○ f is the identity function on X
and f ○ g is the identity function on Y . So, for any y ∈ Y we have
h(y) = h((f ○ g)(y)) = (h ○ f )(g(y)) = g(y),
so g = h.
Note that if f ∶ X → Y is a bijection, then its inverse function (which exists, by Theorem 5.8)
is also a bijection. The easiest way to see that is: if g is the inverse function of f , then also by
definition f is the inverse function of g. So g has an inverse function, so by Theorem 5.8 g is
bijective.
Again, you need to be a bit careful with the notation if your function is (for example) from
R to R. Do not confuse f −1 , the inverse function, with the function x ↦ (f (x))^{−1} = 1/f (x).
5.4.2 Examples
Example 5.9. The function f ∶ R → R is given by f (x) = 3x + 1. Find the inverse function.
To find a formula for f −1 , we use: y = f (x) ⇐⇒ x = f −1 (y). Now,
y = f (x) ⇐⇒ y = 3x + 1 ⇐⇒ x = (1/3)(y − 1),
so
f −1 (y) = (1/3)(y − 1).
Recall Z denotes the set of all integers (positive, zero, and negative). Define f ∶ Z → N ∪ {0} by
f (n) = 2n if n ≥ 0, and f (n) = −2n − 1 if n < 0.
Prove that f is a bijection and determine a formula for the inverse function f −1 .
First, we prove that f is injective: Suppose f (n) = f (m). Since 2n is even and −2n − 1 is
odd, either (i) n, m ≥ 0 or (ii) n, m < 0. (For otherwise, one of f (n), f (m) is odd and the other
even, and so they cannot be equal.)
In case (i), f (n) = f (m) means 2n = 2m, so n = m.
In case (ii), f (n) = f (m) means −2n − 1 = −2m − 1, so n = m. Therefore f is injective.
Next, we prove that f is surjective: We show that ∀m ∈ N ∪ {0}, ∃n ∈ Z such that f (n) = m.
Consider separately the case m even and the case m odd.
Suppose m is even. Then n = m/2 is a non-negative integer and f (n) = 2(m/2) = m.
If m is odd, then n = −(m + 1)/2 is a negative integer and
f (n) = f (−(m + 1)/2) = −2(−(m + 1)/2) − 1 = (m + 1) − 1 = m.
The proof that f is surjective reveals to us what the inverse function is. We have
f −1 (m) = m/2 if m is even, and f −1 (m) = −(m + 1)/2 if m is odd.
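Here are the bijection and its inverse written out as a quick check (the names are mine):

def f(n):
    return 2 * n if n >= 0 else -2 * n - 1

def f_inverse(m):
    return m // 2 if m % 2 == 0 else -(m + 1) // 2

for n in range(-8, 9):
    assert f_inverse(f(n)) == n
print(sorted(f(n) for n in range(-5, 5)))   # 0, 1, 2, ..., 9, each hit exactly once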
Now consider f (x) = x^2 and the function g, defined for x ≥ 0, given by g(x) = √x.
It’s tempting to think that g is the inverse function of f , and indeed (f ○ g)(x) = x for all x ∈ R≥0 . But (g ○ f )(−1) = g(1) = 1, because √x means the non-negative square root of x. If you check Theorem 5.8 you’ll see that in fact f doesn’t have an inverse function: it is not a bijection. For example f (1) = 1 = f (−1). It’s a somewhat common mistake in basic algebra to assume √(x^2 ) = x; as we just saw it’s not true when x < 0. We saw essentially this error as Mistake 4 in Section 2.7.
For any function f ∶ X → Y and any S ⊆ X, we define the set
f (S) = {f (x) ∶ x ∈ S} .
Note that f (∅) = ∅, and for any single x ∈ X we have f ({x}) = {f (x)}. It’s important to
remember that {f (x)} is not the same as f (x) (in the same way that an apple in a box is not
the same as an apple).
We also define, for any function f ∶ X → Y and any T ⊆ Y , the set
f −1 (T ) = {x ∈ X ∶ f (x) ∈ T } .
Again, it’s important to remember that for y ∈ Y , the set f −1 ({y}) is a set of elements in X,
and it always exists, in contrast to f −1 (y) which is a member of X and is only defined if f is an
invertible function.
If f is invertible, then for every y ∈ Y the set f −1 ({y}) contains exactly one element, namely
f −1 (y). However if f is not invertible, then by Theorem 5.8 either there will be some y ∈ Y such
that f −1 ({y}) = ∅ (i.e. f is not surjective) or there will be some y ∈ Y such that f −1 ({y}) has
two or more elements (i.e. f is not injective), or both.
Given a function f ∶ X → Y , the set f (X) is sometimes called the image of f . The image
f (X) of f is always a subset of the codomain Y (by definition!). It might be that f (X) = Y , or
it might not be—by definition, f (X) = Y if and only if f is surjective.
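For finite sets, both f (S) and f −1 (T ) can be computed directly from the definitions; here is a small sketch (with ad hoc helper names):

def image(f, S):
    return {f(x) for x in S}

def preimage(f, X, T):
    return {x for x in X if f(x) in T}

X = range(-3, 4)
f = lambda x: x * x
print(sorted(image(f, X)))              # [0, 1, 4, 9]
print(sorted(preimage(f, X, {1, 4})))   # [-2, -1, 1, 2]
print(preimage(f, X, {5}))              # set(): empty, but perfectly well-defined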
So, the set has m members if to each number from 1 to m, we can assign a corresponding
member of the set S, and all members of S are accounted for in this process. This is like the
attachment of labels ‘Object 1’, etc, described above.
Note that an entirely equivalent definition is to say that S has m members if there is a
bijection from S to Nm . This is because if f ∶ Nm → S is a bijection, then the inverse function
f −1 ∶ S → Nm is a bijection also. In fact, because of this, we can simply say that S has m
members if there is a bijection ‘between’ Nm and S. (Eccles uses the definition that involves a
bijection from Nm to S and Biggs uses the definition that involves a bijection from S to Nm .)
For m ∈ N, if S has m members, we say that S has cardinality m (or size m). The cardinality
of S is denoted by ∣S∣, so we would usually simply write ∣S∣ = m for ‘S has cardinality m’.
Warning 5.13. If you are very alert, you might notice that there is a potential problem with our
definition of cardinality. We said something about ‘the cardinality of S’. That means we have
some idea that there should only be one number m such that ∣S∣ = m. Well, if I have a set of
five fruit, you’ll probably happily agree with me that it has cardinality five and nothing else.
But is that kind of statement always true whatever S is a set of? What we’re worried about
here is whether cardinality is well-defined. We’ll shortly see that it is.
In general ‘well-defined’ means that whatever definition we just wrote down is not ‘cheating’
or ‘wrong’ in some way. What might be an example of a bad definition? Suppose I say ‘let t
be the number of cards in a deck’. I am claiming to define a number t here; there should be
only one answer to the question of what t is. But what deck of cards? A bridge deck (with 52
cards)? Or a skat deck (with 32)? Or something else? This t is not well-defined, and it’s exactly this kind of problem that the warning is getting at. Could it be that there is a set S such that
by our definition we have ∣S∣ = 32 and also ∣S∣ = 52?
It’s usually easiest to write down a definition and then try to argue that it makes sense; we
say we are showing the definition is well-defined. We’ll do that for cardinality shortly, but we
need some more theory first.
Theorem 5.14 (Pigeonhole Principle (PP)). Suppose that A and B are sets with ∣A∣ = n and
∣B∣ = m, where m, n ∈ N. If there is an injection from A to B, then n ≤ m.
We’ve just formalised the first statement above: if we place (the function f ) letters (the
set A) into pigeonholes (the set B) such that no pigeonhole contains more than one letter (f
is injective) then A cannot be bigger than B. This is now a clear formal statement: we know
exactly what we need to prove.
But coming up with a proof is not easy. We’ll need to talk about injective functions (because
there is an injective function in the statement), but we will also need to use the definition
of cardinality, because that also shows up (we say ∣A∣ = n) and that talks about (completely
different!) bijective functions. And furthermore, we will probably need to talk about the members
of A and of B, which are two arbitrary sets—we don’t know what the members are. To get
around that (temporarily!) let’s try to prove the statement for a couple of specific sets.
Theorem 5.15 (Pigeonhole Principle (PP), special case). The following statement is true for
all n ∈ N: For all natural numbers m, if there is an injection from Nn to Nm , then n ≤ m.
This version doesn’t talk about cardinality; we know (by definition!) that ∣Nn ∣ = n and
∣Nm ∣ = m, and we know what the elements of these two sets are. This will make it easier to write
a formal proof. But it’s still not easy to see what to do next.
We know we need to deal with injective functions to prove this special case. So let’s prove a
statement about injective functions. For now, it is going to be unclear what this statement has
to do with the Pigeonhole Principle; I’ll try to explain where it comes from later.
Lemma 5.16. Suppose that A and B are sets, each of which has at least two distinct elements.
Suppose that a is an element of A, and b is an element of B. If there is an injection f ∶ A → B,
then there is an injection g ∶ A ∖ {a} → B ∖ {b}.
Proof. Given A and B, and elements a and b, as in the lemma statement, we want to prove that
if there is an injection f ∶ A → B, then there is an injection g ∶ A ∖ {a} → B ∖ {b}.
So suppose that f ∶ A → B is an injection.
We want to use f to help us construct g. We consider two cases.
Case 1: f (x) ∈ B ∖ {b} for each x ∈ A ∖ {a}.
This case is easy. We define a function g ∶ A ∖ {a} → B ∖ {b} by g(x) = f (x) for each x. This
is well-defined because we assumed that for each x in A ∖ {a}, indeed f (x) is in B ∖ {b}. We
just need to check that g is indeed injective. Well, suppose g(x) = g(y). Then by definition
f (x) = f (y), and since f is injective x = y. So g is indeed injective.
Case 2: there is s ∈ A ∖ {a} such that f (s) = b.
This case is simply what we get when we say ‘we are not in Case 1’. It’s what is left over
after dealing with the easy case.
Let’s first check that there is only one s such that f (s) = b. Indeed, suppose that for some
x ∈ A we have f (x) = b. Then f (x) = f (s), and since f is injective, we conclude x = s.
This time, if we tried to define g as in Case 1, we would find g is not well-defined. g is
supposed to have codomain B ∖ {b}, but f (s) = b. But this is the only ‘problem’; we can hope
to define g(s) in some other way. The trick is: we define g ∶ A ∖ {a} → B ∖ {b} by
g(x) = f (x) if x ≠ s, and g(x) = f (a) if x = s.
This is well-defined—that is, g(x) is in B ∖ {b} for each x in the domain—because we know
f (x) is always in B and we don’t use f (s) which is the only way of getting b.
What is not quite so clear is that g is injective. Let’s check. Suppose that g(x) = g(y) for
some x, y. We want to show x = y.
If neither x nor y is equal to s, then by definition we have g(x) = f (x) and g(y) = f (y), so
f (x) = f (y), so since f is injective we have x = y. We need to deal with the case that at
least one of x and y is equal to s; suppose without loss of generality it is x. Then we have
g(y) = g(x) = f (a). If y ≠ x then g(y) = f (y) = f (a), but then since f is injective we have
y = a—and this is impossible, since y ∈ A ∖ {a}. So in this case also y = x and we are done.
In either case, we were able to construct an injective g as desired, and the two cases are
exhaustive.
This proof is not all that easy to understand—because it is quite abstract—so here is a
concrete ‘story’ of the proof.
Suppose you have a set of hotel guests (A) who are booked into the set of single rooms (B)
in a hotel. The function f ∶ A → B says which guest is booked into each room: that it is injective
is telling you that each room has at most one guest booked in (some rooms might be empty,
but no room has two guests booked in to it).
Now one guest (called a) checks out, and there is a water leak in one room (room number b)
so that room becomes unusable. What does the hotel manager do? Well, if none of the remaining
guests (the set A ∖ {a}) is booked into the wet room, they don’t have to do anything. That’s
case 1.
If on the other hand there is a guest s booked into the wet room, then the manager can
solve the problem by changing s’s room to the one a has vacated. That’s case 2.
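The two cases of the proof are exactly an algorithm, and writing it out may make the hotel story concrete. Below, an injection is represented as a Python dictionary (guest ↦ room), and reroom is a made-up name for the construction in the proof:

def reroom(f, a, b):
    # guest a leaves; room b floods; rebook the remaining guests
    g = {x: room for x, room in f.items() if x != a}
    # Case 2 of the proof: if someone was booked into room b,
    # move them into the room that a has just vacated, namely f(a)
    for x, room in g.items():
        if room == b:
            g[x] = f[a]
    return g

booking = {"a": 101, "s": 102, "t": 103}   # an injection A -> B
print(reroom(booking, "a", 102))           # {'s': 101, 't': 103}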
We’ll see this Lemma is what we need to prove Theorem 5.15 by induction. As a quick
remark, it’s maybe not clear why in the statement of the Lemma we say that A and B each
have at least two distinct elements. The reason is that we do not want A ∖ {a} or B ∖ {b} to be
the empty set; it’s not clear what a function with domain or codomain the empty set should be.
We can now prove Theorem 5.15.
Proof of Theorem 5.15. We prove this by induction. The statement we want to prove is the
statement P (n): ‘for all m ∈ N, if there is an injection from Nn to Nm , then n ≤ m.’
The base case, n = 1, is true because for all m ∈ N we have 1 ≤ m.
Given a natural number k, we want to prove P (k) ⟹ P (k + 1).
Suppose for an induction hypothesis that P (k) is true. We want to prove P (k + 1). That is,
given m, we want to show that if there is an injection f ∶ Nk+1 → Nm , then k + 1 ≤ m.
So suppose there is an injection f ∶ Nk+1 → Nm . We want to show k + 1 ≤ m.
Since k ≥ 1, we have k + 1 ≥ 2.
If m = 1, then the codomain of f is {1}, so f (1) = f (2) = 1. But this is a contradiction to
our assumption that f is injective; this case cannot occur.
If m ≥ 2, then f is an injective function from Nk+1 to Nm , and both of these sets have at least
two elements (both contain 1 and 2). So we can apply Lemma 5.16, with A = Nk+1 and a = k + 1,
and B = Nm and b = m. The Lemma says that there is an injective function g ∶ Nk → Nm−1 .
And now our induction hypothesis P (k) tells us that k ≤ m − 1. Adding 1 to both sides, we
conclude k + 1 ≤ m, which is what we wanted.
This proves the induction step. By the Principle of Induction, we conclude that P (n) is true
for all n ∈ N.
Finally, let’s explain why the special case of the Pigeonhole Principle implies the general
case, Theorem 5.14.
Proof of Theorem 5.14. From the definition of cardinality, there are bijections g ∶ Nn → A and
h ∶ Nm → B. We also have an inverse bijection h−1 ∶ B → Nm by Theorem 5.8.
Suppose there is an injection f ∶ A → B. Consider the composite function h−1 ○f ○g ∶ Nn → Nm .
If we can prove that this is an injection, then from Theorem 5.15 it follows that n ≤ m.
So, let us prove injectivity. Suppose a, b ∈ Nn with a ≠ b. Since g is a bijection g(a), g(b) ∈ A
with g(a) ≠ g(b). Since f is an injection, there are f (g(a)), f (g(b)) ∈ B with f (g(a)) ≠ f (g(b)).
Since h−1 is a bijection, h−1 (f (g(a))) and h−1 (f (g(b))) belong to Nm , and h−1 (f (g(a))) ≠
h−1 (f (g(b))). This last inequality is what we need.
This was a long proof. Before we make a couple of comments on what you should learn from
it, let’s deduce one important conclusion.
Theorem 5.17. Suppose n, m are two natural numbers. If there is a bijection from Nn to Nm ,
then n = m.
Proof. Suppose f ∶ Nn → Nm is a bijection. Then f is an injection. So from Theorem PP, n ≤ m.
But by Theorem 5.8 there is an inverse function f −1 ∶ Nm → Nn and this is also a bijection.
In particular, f −1 is an injection from Nm to Nn , and hence m ≤ n.
Now we have both n ≤ m and m ≤ n, hence n = m.
What this theorem tells us is that our definition of cardinality is well-defined. Remember we
were worried that possibly there is some set S such that we can write ∣S∣ = m and ∣S∣ = n, and
m and n aren’t the same; then it wouldn’t make sense to say that either is ‘the size of S’. But if
both ∣S∣ = m and ∣S∣ = n, then there are by definition bijections f ∶ Nm → S and g ∶ Nn → S, and
so f −1 ○ g is a bijection from Nn → Nm . And now Theorem 5.17 says n = m.
The pigeonhole principle is remarkably useful (even in some very advanced areas of mathe-
matics). It has many applications. For most applications, it is the contrapositive form of the
principle that is used. This states:
point. But students generally complained it was confusing, so now we separate the Lemma out
explicitly.
At last, we have one more question: how do we think of the proof of the Lemma? Well, the
first few lines are ‘automatic’; we’ve just written down the information we’re given in the lemma
statement, and then we have to prove an implication—so we go for the simplest route, namely
assume the premise and try to prove the conclusion from it.
Then we get to a case distinction. This case distinction looks a bit complicated at first,
but it follows the basic idea mentioned earlier in these notes: if you’re not sure how to prove
something, identify a special ‘easy’ case you can do, do it, then figure out how to do the rest.
The ‘easy case’ is Case 1; here f really immediately gives us the injection g we want, we just
need to write it down and check it.
The ‘hard’ case is Case 2. All we do to write it down is figure out what it means that ‘we are
not in Case 1’, but it turns out to give us a piece of abstract information; we get told something
about the function values of f , namely f (s) = b, which we did not know before, and which we
should try to use. And, finally, once we got this far it turns out not to be that hard!
This kind of understanding is what I want you to get from the proof of PP. For all the longer
proofs in these notes, I would like you to get an idea of why the proof works and what ideas you
are being shown that you can use elsewhere in your own proofs; this is why these proofs are
there. Sometimes, as here, I’ll break the proof into bitesize pieces and give more details of what
and why we are doing something, but not always. It is good for you to learn to break a long
complicated argument into pieces yourself—identify the key points, figure out which things are
‘automatic’ (i.e. the first thing you should try works) and which are ‘difficult’ (everything else,
especially the times where the second and third things you should try don’t work either). It’s
not quite as good as coming up with a long complicated proof of your own, but it’s the next best thing.
Theorem 5.18. In any group of 13 or more people, there are two persons whose birthday is in
the same month.
Proof. Consider the function that maps each of the people to their month of birth. Since 13 > 12, this function cannot be an injection (by the Pigeonhole Principle), so two people are born in the same month.
This next one is not hard, but perhaps not immediately obvious.
Theorem 5.19. In a room full of people, there will always be at least two people who have the
same number of friends in the room.
Proof. Let X be the set of people in the room and suppose ∣X∣ = n ≥ 2. Consider the function
f ∶ X → N ∪ {0} where f (x) is the number of friends x has in the room.
Let’s assume that a person can’t be a friend of themselves. (We could instead assume that a
person is always friendly with themselves: we simply need a convention one way or the other.)
Then f (X) = {f (x) ∶ x ∈ X} ⊆ {0, 1, . . . , n − 1}. But there can’t be x, y with f (x) = n − 1 and
f (y) = 0. Why? Well, such an x would be a friend of all the others, including y, which isn’t
possible since y has no friends in the room.
So either f (X) ⊆ {0, 1, . . . , n − 2} or f (X) ⊆ {1, . . . , n − 1}. In each case, since f (x) can take
at most n − 1 values, there must, by PP, be at least two x, y ∈ X with f (x) = f (y). And that’s
what we needed to prove.
Here’s an interesting geometrical example. For two points (x1 , y1 ), (x2 , y2 ) in the plane, the
midpoint of (x1 , y1 ) and (x2 , y2 ) is the point
((x1 + x2 )/2, (y1 + y2 )/2)
s0 = 0,
s1 = a1 ,
s2 = a1 + a2 ,
s3 = a1 + a2 + a3 ,
etc., until
sn = a1 + a2 + ⋯ + an .
(It is not obvious, at all, why we should do this, but it will work!)
For each of these si , consider the remainder upon division by n. Since there are n + 1
numbers si , but only n possible remainders (0, 1, . . . , n − 1), two of the si will have the same
remainder upon division by n.
So suppose sk and sℓ have the same remainder, where k < ℓ. Then sℓ − sk is divisible by n. But since sℓ − sk = ak+1 + ak+2 + ⋯ + aℓ , this means that the sum ak+1 + ak+2 + ⋯ + aℓ is divisible by n. So we have proved the result.
In fact we proved something even stronger than what we set out to prove:
Let a1 , a2 , . . . , an be a list of n integers (where n ≥ 2). Then there exists a non-empty collection of consecutive numbers from this list, ak+1 , ak+2 , . . . , aℓ , whose sum is divisible by n.
The theorem isn’t true if we have fewer than n integers. For instance, if for any n ≥ 2 we
take the numbers a1 , . . . , an−1 all equal to 1, then it’s impossible to find a sum that adds up to
something divisible by n.
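The proof is also an algorithm: compute the partial sums, track their remainders, and stop as soon as a remainder repeats. A short sketch (the function name is made up):

def consecutive_block_divisible_by_n(a):
    n = len(a)
    seen = {0: 0}          # remainder of s_0 = 0 occurs at position 0
    s = 0
    for i, x in enumerate(a, start=1):
        s += x
        r = s % n
        if r in seen:
            k = seen[r]
            return a[k:i]  # a_{k+1}, ..., a_i has sum divisible by n
        seen[r] = i
    # the pigeonhole argument guarantees we never reach this point

block = consecutive_block_divisible_by_n([3, 7, 8, 2, 5])
print(block, sum(block) % 5)   # [3, 7] 0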
Exercise 5.2. Define f ∶ Z → Z by f (x) = x + 1 if x is even, and f (x) = −x + 3 if x is odd. Prove that f is a bijection.
Exercise 5.5. Suppose that A and B are non-empty finite sets and that they are disjoint (i.e.
A ∩ B = ∅). Prove, using the formal definition of cardinality, that ∣A ∪ B∣ = ∣A∣ + ∣B∣.
Exercise 5.6. Suppose that X, Y are any two finite sets. By using the fact that
X ∪ Y = (X ∖ Y ) ∪ (Y ∖ X) ∪ (X ∩ Y ),
prove that
∣X ∪ Y ∣ = ∣X∣ + ∣Y ∣ − ∣X ∩ Y ∣.
Exercise 5.7. Suppose n ∈ N and that f ∶ N2n+1 → N2n+1 is a bijection. Prove that there is some
odd integer k ∈ N2n+1 such that f (k) is also odd. (State clearly any results you use.)
Comment on Activity 5.1. One possible formula is
g(x) = (1/10)((x − 11850) + ∣x − 11850∣) + (1/10)((x − 46350) + ∣x − 46350∣).
Would that formula be more or less useful to you than the description we gave to define it?
Comment on Activity 5.2. Given any y ∈ R, let x = y/2. Then f (x) = 2(y/2) = y. This shows
that f is surjective. Also, for x, y ∈ R,
f (x) = f (y) ⟹ 2x = 2y ⟹ x = y,
so z is the image of some x ∈ X under the mapping g ○ f . Since z was any element of Z, this shows
that g ○ f is surjective.
Solution to Exercise 5.2. Suppose one of x, y is even and the other odd. Without any loss of
generality, we may suppose x is even and y odd. (‘Without loss of generality’ signifies that there
is no need to consider also the case in which x is odd and y is even, because the argument we’d
use there would just be the same as the one we’re about to give, but with x and y interchanged.)
So f (x) = x + 1 and f (y) = −y + 3. But we cannot then have f (x) = f (y) because x + 1 must be
an odd number and −y + 3 an even number. So if f (x) = f (y), then x, y are both odd or both
even. If x, y are both even, this means x + 1 = y + 1 and hence x = y. If they are both odd, this
means −x + 3 = −y + 3, which means x = y. So we see that f is injective.
Is f surjective? Let z ∈ Z. If z is odd, then z − 1 is even and so f (z − 1) = (z − 1) + 1 = z. If
z is even, then 3 − z is odd and so f (3 − z) = −(3 − z) + 3 = z. So for z ∈ Z there is x ∈ Z with
f (x) = z and hence f is surjective.
Solution to Exercise 5.3. Suppose f is surjective and that h ○ f = g ○ f . Let y ∈ Y . We show
g(y) = h(y). Since y is any element of Y in this argument, this will establish that g = h.
Because f is surjective, there is some x ∈ X with f (x) = y. Then, because h ○ f = g ○ f , we have
h(f (x)) = g(f (x)), which means that h(y) = g(y). So we’ve achieved what we needed.
Solution to Exercise 5.4. Suppose g ○ f is injective. To show that f is injective we need to show
that f (x) = f (y) ⟹ x = y. Well, f (x) = f (y) ⟹ g(f (x)) = g(f (y)),
by definition of a function. Now g(f (x)) = (g ○ f )(x), and similarly for y; this is what ○ means.
And
(g ○ f )(x) = (g ○ f )(y) ⟹ x = y ,
because g ○ f is injective. So we proved
f (x) = f (y) ⟹ x = y ,
i.e. f is injective.
Now suppose g ○ f is surjective. So for all z ∈ Z there is some x ∈ X with (g ○ f )(x) = z. So
g(f (x)) = z. Denoting f (x) by y, we therefore see that there is y ∈ Y with g(y) = z. Since z was
any element of Z, this shows that g is surjective.
Solution to Exercise 5.5. Suppose ∣A∣ = m and ∣B∣ = n. We need to show that ∣A ∪ B∣ = m + n
which means, according to the definition of cardinality, that we need to show there is a bijection
from Nm+n to A ∪ B. Because ∣A∣ = m, there is a bijection f ∶ Nm → A and because ∣B∣ = n, there
is a bijection g ∶ Nn → B. Let us define h ∶ Nm+n → A ∪ B as follows: h(i) = f (i) if 1 ≤ i ≤ m, and h(i) = g(i − m) if m + 1 ≤ i ≤ m + n.
We claim h is an injection. Suppose h(i) = h(j). If both i and j are between 1 and m, then f (i) = f (j), so i = j because f is injective. If both are between m + 1 and m + n, then g(i − m) = g(j − m), so i − m = j − m and hence i = j, because g is injective. The only other possibility is that one of i, j is between 1 and m and the
other between m + 1 and m + n. In this case, the image under h of one of i, j belongs to A and
the image of the other to B and these cannot be equal because A ∩ B = ∅. So h is indeed an
injection. It is also a surjection. For, given a ∈ A, because f is a surjection, there is 1 ≤ i ≤ m
with f (i) = a. Then h(i) = a also. If b ∈ B then there is some 1 ≤ j ≤ n such that g(j) = b. But
then, this means that h(m + j) = g((m + j) − m) = b, so b is the image under h of some element
of Nm+n . So h is a bijection from Nm+n to A ∪ B and hence ∣A ∪ B∣ = m + n.
Solution to Exercise 5.6. Note first that the two sets (X ∖ Y ) ∪ (Y ∖ X) and X ∩ Y are disjoint.
Therefore,
∣X ∪ Y ∣ = ∣(X ∖ Y ) ∪ (Y ∖ X)∣ + ∣X ∩ Y ∣.
Now, (X ∖ Y ) and (Y ∖ X) are disjoint, so
∣(X ∖ Y ) ∪ (Y ∖ X)∣ = ∣X ∖ Y ∣ + ∣Y ∖ X∣,
and therefore
∣X ∪ Y ∣ = ∣(X ∖ Y )∣ + ∣(Y ∖ X)∣ + ∣X ∩ Y ∣.
Now, the sets X ∖ Y and X ∩ Y are disjoint and their union is X, so
∣X∣ = ∣(X ∖ Y ) ∪ (X ∩ Y )∣ = ∣X ∖ Y ∣ + ∣X ∩ Y ∣.
Similarly, the sets Y ∖ X and X ∩ Y are disjoint and their union is Y , so
∣Y ∣ = ∣(Y ∖ X) ∪ (X ∩ Y )∣ = ∣Y ∖ X∣ + ∣X ∩ Y ∣.
Adding these two equations and substituting into the expression for ∣X ∪ Y ∣ above gives ∣X ∪ Y ∣ = ∣X∣ + ∣Y ∣ − ∣X ∩ Y ∣, as required.
Solution to Exercise 5.7. Let E be the set of even integers, and O the set of odd integers, in the
range {1, 2, . . . , 2n + 1}. Then ∣E∣ = n and ∣O∣ = n + 1. If f was such that f (k) was even for all
k ∈ O, then f ∗ ∶ O → E given by f ∗ (x) = f (x) would be an injection. But, by the pigeonhole
principle, since ∣O∣ > ∣E∣, such an injection cannot exist. Hence there is some odd k such that
f (k) is odd.
6 Equivalence relations and the rational numbers
The material in this chapter is also covered in:
6.1 Introduction
In this chapter of the notes we study the important idea of an equivalence relation, a concept
that is central in abstract mathematics. As an important example, we show how to formally
construct the rational numbers using the integers and a carefully chosen equivalence relation.
We will return to this in Lent Term.
properties we met in the example above. For no x ∈ R do we have x R x because x is not greater
than x. Furthermore, if x R y then x > y, and we cannot therefore also have y R x, for that would
imply the contradictory statement that y > x.
In many cases, we use special symbols for relations. For instance ‘=’ is a relation, as is >. It
is often convenient to use a symbol other than R: for instance, many textbooks use x ∼ y rather
than x R y as a symbol for ‘some relation’, particularly if the relation is an equivalence relation
(see below).
A relation that has all three of these properties is called an equivalence relation.
For example, the relation R defined on N by m R n ⇐⇒ m + n is even
is reflexive and symmetric. It is also transitive. To prove that, suppose x, y, z are three natural
numbers and that x R y and y R z. Then x + y is even and y + z is even. To show that x R z we
need to establish that x + z is even. Well,
x + z = (x + y) + (y + z) − 2y,
and all three terms on the right (x + y, y + z, and 2y) are even. Therefore, x + z is even and so
x R z.
Example 6.5. Let X be the set of n × n real matrices. Define a relation ∼ on X by:
M^{rt} = (M^r )^t = (N^s )^t = (N^t )^s = (R^u )^s = R^{us} ,
Example 6.6. Let S be a set of people in a given social network, and let F be the relation
‘friendship’, i.e. aF b if a and b are people in S who are friends in the social network. This
relation is symmetric (in real life, it might be that a says they are friends with b but b disagrees.
Social networks such as Facebook don’t allow this one-sided ‘friendship’). Let’s say that you are
automatically a friend of yourself, so the relation is reflexive.
Is the relation transitive? Well, that depends on the social network. You probably want to
say ‘No’, because (if you’re on Facebook) you surely have some friend not all of whose friends
you know. So for the example of S and F coming from Facebook, you know the relation F is
not transitive; you have a counterexample—and hence it’s also not an equivalence relation. But
it doesn’t have to be that way. If S is all the people in this lecture hall—well, we’re all friends
(I hope!) and so from the lecture example we do get a transitive relation, and hence (because we
checked all three properties) an equivalence relation.
Given an equivalence relation R on a set X and an element x ∈ X, the equivalence class of x is the set
[x]R = {y ∈ X ∣ y R x}.
Often, we will want to talk about the set of all equivalence classes of R. This set is written
X/R, and referred to as the quotient set of X by R. So we have
X/R = {[x]R ∶ x ∈ X} .
Notice that each [x]R is a subset of X. If R is clear from the context—which it usually will
be; in general we will only be talking about one equivalence relation at any given time—we may
just write [x] for [x]R .
Example 6.8. Consider again R on N given by m R n ⇐⇒ m + n is even. Any even number is
related to any other even number; and any odd number to any odd number. So there are two
equivalence classes:
[1] = [3] = [5] = ⋯ = {n ∈ N ∣ n is odd} ,
[2] = [4] = [6] = ⋯ = {n ∈ N ∣ n is even} ,
and we have N/R = {[1], [2]}.
You should keep in mind that even though we use the word ‘equivalence class’, what an
equivalence class is, is simply a set: and you know how to handle sets. The name ‘equivalence
class’ is just to remind you that this particular set is a special set and (as we’ll shortly see)
they have some extra nice properties. Similarly, we might say that 3 is a representative of the
equivalence class [1] (in the above example). That means exactly the same as saying that 3 is a
member of the set [1]; we use the word ‘representative’ instead of ‘member’ to remind ourselves
that we are dealing with a special set.
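If you want to see quotient sets concretely, here is a small sketch that splits a finite set into equivalence classes. It quietly uses Theorem 6.10 below: to decide where x belongs it is enough to compare x with a single representative of each class (the helper name is ad hoc):

def quotient(X, related):
    classes = []
    for x in X:
        for cls in classes:
            if related(x, cls[0]):   # compare with one representative
                cls.append(x)
                break
        else:
            classes.append([x])      # x starts a new equivalence class
    return classes

X = range(1, 11)
R = lambda m, n: (m + n) % 2 == 0
print(quotient(X, R))   # [[1, 3, 5, 7, 9], [2, 4, 6, 8, 10]]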
Example 6.9. Given a function f ∶ X → Y , define a relation R on X by x R z ⇐⇒ f (x) = f (z).
Then R is an equivalence relation. If f is a surjection, the equivalence classes are the sets
{x ∈ X ∶ f (x) = y} = f −1 ({y}),
for y ∈ Y . Note that the place where we use that f is a surjection is that it implies each f −1 ({y})
is non-empty. If f is not a surjection, then the equivalence classes are the sets f −1 ({y}) for all
y ∈ Y such that there is an x ∈ X with y = f (x), in other words for each y ∈ f (X).
The equivalence classes have a number of important properties. These are given in the following result.
Theorem 6.10. Suppose R is an equivalence relation on a set X, and let x, y ∈ X. Then (i) [x] = [y] if and only if x R y; and (ii) if x and y are not related by R, then [x] ∩ [y] = ∅.
Proof. (i) This is an if and only if statement, so we have two things to prove: namely that
[x] = [y] ⟹ x R y and that x R y ⟹ [x] = [y].
Suppose, then, that [x] = [y]. The relation R is reflexive, so we have x R x. This means that
x ∈ [x]. But if [x] = [y], then we must have x ∈ [y]. But that means (by definition of [y]) that
x R y.
Conversely, suppose that x R y. We now want to show that [x] = [y]. So let z ∈ [x]. (We
will show that z ∈ [y].) Then z R x. But, because x R y and R is transitive, it follows that z R y
and hence z ∈ [y]. This shows [x] ⊆ [y]. We now need to show that [y] ⊆ [x]. Suppose w ∈ [y].
Then w R y and, since x R y, we also have, since R is symmetric, y R x. So w R y and y R x. By
transitivity of R, w R x and hence w ∈ [x]. This shows that [y] ⊆ [x]. Because [x] ⊆ [y] and
[y] ⊆ [x], [x] = [y], as required.
(ii) Suppose x and y are not related. We prove by contradiction that [x] ∩ [y] = ∅. So suppose
[x] ∩ [y] ≠ ∅. Let z be any member of the intersection [x] ∩ [y]. (The fact that we’re assuming
the intersection is non-empty means there is such a z.) Then z ∈ [x], so z R x and z ∈ [y], so
z R y. Because R is symmetric, x R z. So: x R z and z R y and, therefore, by transitivity, x R y.
But this contradicts the fact that x, y are not related by R. So [x] ∩ [y] = ∅.
Theorem 6.10 shows that either two equivalence classes are equal, or they are disjoint.
Furthermore, because an equivalence relation is reflexive, any x ∈ X is in some equivalence class
(since it certainly belongs to [x] because x R x). So what we see is that the equivalence classes
form a partition of X: their union is the whole of X, and no two equivalence classes overlap.
For example, consider again the relation R on N given by m R n ⇐⇒ m + n is even.
We have seen that there are precisely two equivalence classes: the set of odd positive integers
and the set of even positive integers. Note that, as the theory predicted, these form a partition
of all of N (since every natural number is even or odd, but not both).
For another example, let S be the relation on the set of all functions from R to R defined by
f S g ⇐⇒ ∃C ∈ R ∶ ∀x ∈ R , f (x) − g(x) = C .
What are the equivalence classes of S? Well, they are sets of functions. If f is any particular
function, then [f ]S is
the set of all functions g ∶ R → R such that g(x) = f (x) + C for some constant C.
We can say that ∫ 3x^2 dx is [x^3 ]S . That is, if we pick any function f in [x^3 ]S (remember, this is a set of functions) then we have (d/dx)f (x) = 3x^2 . And if we pick any function that’s not in [x^3 ]S , then its derivative (if it has one!) will not be 3x^2 .
The advantage of this is that we can work with several indefinite integrals without having to write several different letters for constants of integration, but while still being reminded by the notation [f ]S that when it matters we should put them in. We formalised the idea that x^3 + 5 and x^3 + 10 are ‘kind of the same’ as far as being indefinite integrals is concerned: what it means is they’re both representatives of [x^3 ]S .
In this case, we don’t really gain a lot by introducing this equivalence relation. It is also
pretty easy to write +C whenever we want to write an indefinite integral. What we are doing
when we write +C every time, is simply writing out explicitly the definition of the equivalence
relation S on every line. Whatever equivalence relation you are asked to work with, you can
always write out the definition on every line instead of talking about equivalence classes.
However, if the equivalence relation we want to work with is complicated, it very quickly
gets painful to write out the definition every time, and the mass of written-out-definition makes
it hard to see what’s important. Let’s see an (important!) example.
Define the relation Q on the set of all pairs (m, n) of integers with n ≠ 0 by
(m, n) Q (p, q) ⇐⇒ mq = np.
You should quickly check that this relation Q does what you think it should do: if (by your school-style calculation) the fractions m/n and m′ /n′ are the same, then indeed we have (m, n) Q (m′ , n′ ). However, so far in this course we did not define ‘division’ or ‘fraction’—that’s exactly what we want to do now. The relation Q only uses the properties of Z which we are already happy with.
Let’s pause for a moment to prove that Q is indeed an equivalence relation.
Q is Reflexive: (m, n)Q(m, n) because mn = nm.
Q is Symmetric: (m, n)Q(p, q) means mq = np. Rearranging we get pn = qm, which by
definition is the same as (p, q)Q(m, n).
Q is Transitive: Suppose (m, n)Q(p, q) and (p, q)Q(s, t). Then mq = np and pt = qs. So,
(mq)(pt) = (np)(qs) and, after cancelling qp, this gives mt = ns, so (m, n)Q(s, t).
But, wait a minute: can we cancel pq? Sure, if it’s nonzero. If it is zero then that means p = 0
(since we know that q ≠ 0). But then mq = 0, so m = 0; and qs = 0, so s = 0. So, in this case also
we get mt = ns (both sides are zero) and so (m, n)Q(s, t).
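As a quick reassurance that Q really does capture ‘these two pairs describe the same fraction’, we can compare it with Python’s built-in Fraction type, which (unlike us, at this point in the course) is allowed to use division:

from fractions import Fraction
from itertools import product

def Q(pair1, pair2):
    (m, n), (p, q) = pair1, pair2
    return m * q == n * p

pairs = [(m, n) for m, n in product(range(-4, 5), repeat=2) if n != 0]
for a, b in product(pairs, repeat=2):
    assert Q(a, b) == (Fraction(*a) == Fraction(*b))
print("Q(a, b) holds exactly when a and b describe the same fraction")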
equal). Why are they equal? You could prove it from the definition of set equality, but it’s
easier to use the theory we developed. We know (2, 5)Q(4, 10), because 2 × 10 = 4 × 5. And now
Theorem 6.10 tells us [(2, 5)]Q = [(4, 10)]Q .
We can now say what the set of fractions, written Q, is. It is the set of all the equivalence classes of the relation Q. So 2/5 is a fraction. So is 2/1. So is 4/10, and (as we just saw) in fact 2/5 is the same thing as 4/10.
What we have done here is to find a way of making sense of fractions, and being able to say
when two fractions are the same, without ever having to define ‘division’. We are only relying
on the properties of Z—adding and multiplying integers—that you are already happy with.
Logically, this is the ‘right thing to do’.
To see why, think about how you might try to define ‘division’ without using fractions. You can do that if you try hard enough (for example, you might write out exactly what you mean by decimal long division) but you will end up with a complicated definition that’s hard to work with (why is it true that 2/14 and 3/21 have the same decimal expansion..?). Whereas if we define fractions, we can rather easily define division: we’ll just say that dividing by n means multiplying by 1/n (we didn’t say what multiplying fractions means yet—we’ll get to that in Lent Term!), and this turns out to be much easier to work with.
What about simply saying ‘obviously fractions exist and we can just work with them’? The problem with this is: what if there is something we’re missing? It’s a bit hard to explain what exactly the problem might be here: you are so used to fractions that you probably cannot imagine what could possibly be a problem. But it is a bit funny that (−1/5) ⋅ (−1/5) = 1/25; it’s hard to explain what that should mean in ‘real world’ terms. The last time we saw something a bit funny was in Section 3.6, and there, we did run into problems with a way of defining sets that at first looks perfectly reasonable.
What we are doing here—we’re currently part-way through—is to construct the fractions
from the integers. So far, we defined an equivalence relation, and we created a set Q which is the
set of equivalence classes. A set on its own isn’t very interesting: we want to do things with it.
We would like to define addition and multiplication of fractions: which we will do in Lent Term.
This is of course something you know very well how to do from school, and in Lent Term what
you will see is nothing other than what you know from school. However, there is an important
point that you probably never thought about at school—briefly, why exactly is it that 1/7 + 2/8 and 2/14 + 3/12 come out to the same answer?
By the way, the rational numbers are described as such because they are (or, more formally,
can be represented by) ratios of integers.
(F1) Closure under addition and multiplication: for each a, b ∈ F both a + b and a × b are in F.
(F5) Additive and multiplicative identity: there are two different elements 0 and 1, such that
for each a ∈ F we have a + 0 = a and a × 1 = a.
(F6) Additive and multiplicative inverses: for each a ∈ F there is an element −a such that
a + (−a) = 0, and if a ≠ 0 there is an element a−1 such that a × a−1 = 1.
If you want to in LT, you can do this check for Q without too much trouble: it’s (very) long,
but not hard. You’ll need to use the definition of Q together with its operations, and you’ll need
to know things like that multiplication of integers is commutative, and the distributive law for
the integers.
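To see what ‘checking the axioms’ looks like in a case where brute force is possible, here is a check of the listed axioms for arithmetic modulo a prime p (this is the field Zp ; Z2 and Z3 are mentioned a little later). It is an illustration only and says nothing about Q:

def check_axioms(p):
    F = range(p)
    add = lambda a, b: (a + b) % p
    mul = lambda a, b: (a * b) % p
    # (F1) closure under addition and multiplication
    assert all(add(a, b) in F and mul(a, b) in F for a in F for b in F)
    # (F5) 0 and 1 are additive and multiplicative identities
    assert all(add(a, 0) == a and mul(a, 1) == a for a in F)
    # (F6) additive inverses for all a, multiplicative inverses for a != 0
    assert all(any(add(a, b) == 0 for b in F) for a in F)
    assert all(any(mul(a, b) == 1 for b in F) for a in F if a != 0)
    return True

print(check_axioms(2), check_axioms(3), check_axioms(7))   # True True True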
You should notice that the axioms for a field are all things you knew long ago are true for
the rational numbers, or for the real numbers. They are all part of the ‘doing algebra as normal’
that you learned in school. You’ll probably notice, too, that there are some properties that the
rational numbers have which aren’t listed. For example the rational numbers have an order
and you know how to do algebra with the < sign, too. We could have written down some more
axioms saying that there is an order and saying how ‘algebra as normal’ works with an order
(things like: if a < b then a + c < b + c). That would give us an ordered field. You will see later in
the course that some fields are not ordered—for example there is no way to put a sensible order
on C; no matter what you try, some piece of ‘algebra as normal with <’ won’t work.
This is your first (brief, and non-examinable!) introduction to the axiomatic approach to
mathematics, which we will see a good deal more of next term. What is the point?
Well, look to your MA100 notes on linear algebra (if you got that far—if not, it will start
in the next week or two) and look at a few statements. Matrix addition is commutative; both
addition and multiplication are associative. There isn’t really an explicit proof given for these
statements, but you can probably see how to check at least the statements for addition. You’ll
notice that all the things you do in your proof are using the field axioms. What that means is:
those statements are true not just for matrices of real numbers, but for any field. In fact, that’s
true for most of the linear algebra in MA100.
That turns out to be incredibly useful. There are many different fields in mathematics, and
some of them have practical applications too. For one example, the way that this document
was transmitted from my computer to yours makes use of linear algebra over the field Z2 .
Linear algebra over Z2 is the basis of coding theory, which is (part of) what makes Internet
communication work reliably.
But you do not have to learn a whole new MA100 course to understand coding theory. All
you need to know is what Z2 is (next term) and to quickly check which results in MA100 use
only the field axioms. Then you can simply use what you learned in MA100 (and when you
want to do linear algebra over Z3 , you don’t even need to check which results are allowed, since
you already did that for Z2 ).
Not everything in MA100 does use only the field axioms. For example, you can’t find a unit vector in the direction (1, 1), because the length of that vector is √2, which (we said earlier) isn’t rational.
If we tried to do mathematics without the axiomatic approach to structures, we’d either
spend forever re-proving (and trying to learn!) dozens of almost-identical theorems for all the
different fields we want to do linear algebra over (and so on), or we would have to say something
vague like ‘matrix addition is commutative whenever we can do algebra as normal’. The first
of these is a waste of time and energy, the second is dangerous: what is algebra as normal?
Does it include taking square roots? Either you have to guess, or you have to check the proof
for square-rooting every time you want to use a theorem (which is again a waste of time and
energy).
With the axiomatic approach, we can simply say ‘matrix addition is commutative over any
field’: easy to learn and precise, once you remember what a field is. Quite a lot of mathematics
is like this: you need to learn some definition or concept, which at first looks like formality for
the sake of it (you don’t do algebra by thinking explicitly about the field axioms!) but the payoff
is that in the long run it will make your life easier.
Don’t be fooled into believing that checking the field axioms for Q from our definition
(remember, Q is a set of equivalence classes with some funny way to define addition and
multiplication) is somehow ‘automatically’ going to work. It’s reasonable to think: we start with
integers (where multiplication is commutative), we construct something with pairs of integers
and define ‘multiplication’, of course the multiplication will be commutative.
That’s not a valid argument. To see why, think about 2-by-2 matrices (with integer entries, say).
We build the set of 2-by-2 integer matrices by starting with the integers (where multiplication is
commutative) and define ‘multiplication’ of matrices, but the multiplication is not commutative.
Exercise 6.2. Define the relation R on the set N by x R y if and only if there is some n ∈ Z such that x = 2^n y. Prove that R is an equivalence relation.
Exercise 6.3. Let X be the set of n × n real matrices. Define a relation ∼ on X by:
M ∼ N ⇐⇒ ∃ an invertible P ∈ X s.t. N = P −1 M P.
Exercise 6.5. Prove that the set {x ∈ Z ∣ x is a multiple of 4} has no lower bound.
7 Real and complex numbers
The treatment in Biggs is probably better for the purposes of this course.
Neither of these books covers complex numbers. You do not have to know very much about
complex numbers for this course, but because this topic is not in these books, I have included
quite a bit of material on complex numbers in this chapter.
You can find useful reading on complex numbers in a number of books, including the following
(which you might already have, given that it is the MA100 text).
• Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Cambridge University
Press 2012. Chapter 13.
7.1 Introduction
In this chapter, we explore real numbers and complex numbers.
We are going to stick, mainly, to your intuition and what you already know about numbers
from school—which means we are not going to formally construct the real numbers.
m^2 = 2n^2 becomes 4m_1^2 = 2n^2 , and so n^2 = 2m_1^2 . Well, this means n^2 is even and hence n must be even. So m and n are both divisible by 2. But this is a contradiction; we just said we can assume they are not both divisible by 2.
So our assumption that (m/n)^2 = 2 must have been wrong and we can deduce no such integers m and n exist.
Isn’t this theorem a thing of beauty?
Activity 7.1. Make sure you understand that this is a proof by contradiction, and that you
understand what the contradiction is.
What this theorem tells us is that, at least if we want to solve equations like x2 = 2, then the
rational numbers are not enough; we need more. Of course, we could just invent new symbols
and define them to satisfy the equations we want. But this is a dangerous thing to do—we
might be assuming something exists which doesn't in fact exist, whose supposed existence leads to a
contradiction. It is safer to actually construct the reals.
The second route is the ‘Cauchy sequence’ construction. This is rather more complicated.
The idea is the following: if I want to specify a real number, I’ll give a sequence of rational
numbers which get closer and closer to the number I want (the formal term is ‘Cauchy sequence’;
it'll be defined later), such as longer and longer parts of the decimal expansion of √2; I might
give the sequence
(1, 1.4, 1.41, 1.414, ...) .
It’s easy to add such sequences—I just add the terms: $(a_1, a_2, a_3, \ldots) + (b_1, b_2, b_3, \ldots) = (a_1 + b_1, a_2 + b_2, a_3 + b_3, \ldots)$.
Multiplication works similarly (this time negative numbers aren’t a problem). So far this looks
rather like the ‘decimals’ construction. But of course I might have many possible sequences of
rational numbers which get closer and closer to say 0 (or any other real number), so I should
write down an equivalence relation which says two such sequences are equivalent. And then the
real numbers are the equivalence classes of this relation.
By the end of this term, you will have studied sequences in sufficient detail to make sense of
this ‘Cauchy sequence’ construction, and prove that it really works. It’s worth trying—this is
fairly hard work, but it is also a good test of the Analysis you’ll learn.
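As a rough illustration of the idea (not the formal construction, which needs the equivalence relation just described), here is a small Python sketch: a real number is represented by a list of rational approximations, and two such sequences are added term by term. The variable names and the cut-off after five terms are my own choices.

```python
from fractions import Fraction

# Longer and longer decimal truncations of sqrt(2) and sqrt(3), as exact rationals.
sqrt2_approx = [Fraction(s) for s in ("1", "1.4", "1.41", "1.414", "1.4142")]
sqrt3_approx = [Fraction(s) for s in ("1", "1.7", "1.73", "1.732", "1.7320")]

# Termwise sum: a sequence of rationals getting closer to sqrt(2) + sqrt(3).
sum_approx = [a + b for a, b in zip(sqrt2_approx, sqrt3_approx)]
print([str(q) for q in sum_approx])
print([float(q) for q in sum_approx])   # roughly 3.146..., the value of sqrt(2) + sqrt(3)
```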
$a_0.a_1a_2\ldots a_n$
$a_0.a_1a_2a_3\ldots a_i\ldots,$
where $a_i \in \mathbb{N} \cup \{0\}$ and $a_i \le 9$ for $i \ge 1$. We allow for $a_i$ to be 0, so, in particular, it is possible that
$a_i = 0$ for all $i \ge N$ where N is some fixed number: such an expansion is known as a terminating
expansion. Given such an infinite decimal expansion, we say that it represents a real number a
if, for all $n \in \mathbb{N} \cup \{0\}$,
$a_0.a_1a_2\ldots a_n \le a \le a_0.a_1a_2\ldots a_n + 1/10^n.$
This formalism allows us to see that the infinite decimal expansion 0.99999 . . . , all of whose
digits after the decimal point are 9, is in fact the same as the number 1.0000000 . . . .
For example, two infinite decimal expansions are
3.1415926535 . . .
and
0.1833333333333 . . . .
(You’ll probably recognise the first as being the number π.) Suppose, in this second decimal
expansion, that every digit is 3 after the first three (that is, ai = 3 for i ≥ 3). Then we write this
as $0.18\overline{3}$ (or, in some texts, $0.18\dot{3}$). We can extend this notation to cases in which there is a
repeating pattern of digits. For example, suppose we have
0.1123123123123 . . . ,
[Long division of 4 by 7: the quotient digits come out as 5, 7, 1, 4, 2, 8, 5, . . . and the successive remainders are 5, 1, 3, 2, 6, 4, 5, . . . ]
So,
$4/7 = 0.\overline{571428}$.
Notice: we must have the same remainder re-appear at some point, and then the calculation
repeats. Here’s the calculation again, with the repeating remainder highlighted.
[The same long division of 4 by 7, with the repeated remainder highlighted: the remainder 5 re-appears, and from that point the quotient digits 571428 repeat.]
Theorem 7.3. If $p/q = a_0.a_1a_2a_3\ldots$ in decimal, where p and q > 0 are integers, then there exist
some natural numbers N and k such that for each n ≥ N we have $a_{n+k} = a_n$.
The idea here is that the first few digits might not fit the ‘repeating pattern’ (as is the case
for, for example, $1/6 = 0.16666\ldots$) but from digit N onwards, the repeating pattern starts, and
the length of the repeating block of digits is k.
Rather than just jumping into a proof of this theorem, let’s think about how we can get to
it. This is about the right level of difficulty for a (moderately hard) exam question (or it would
be if it wasn’t in the notes..!) and so you might want to close the notes for a while and try to
solve it yourself.
We’ve seen in an example how we get to a repeating pattern. When we do long division to
work out 4/7 as a decimal, at some point the remainder repeats and after that point the calculation
will repeat forever. Maybe the same statement is true if we replace 4/7 by p/q? Then we would be
done.
So we have two things to prove. First, at some point the remainder repeats. Second, after
that point the calculation repeats forever.
Why should the remainder repeat at some point? Intuitively, this is almost obvious. The
remainder on division by q is an integer between 0 and q − 1 inclusive. There are q such integers,
so after at most q + 1 steps we surely have to repeat. That is not quite a formal proof, but ‘once
we have more steps than possible remainders we have to repeat’ should sound like a special case
of something you know. That something is the Pigeonhole Principle, so we should be using the
Pigeonhole Principle. In order to avoid talking about ‘the first remainder’, it will help to give it
a name. Let’s say that r1 is the first remainder, i.e. when we try to divide p by q, we get the
quotient a0 and remainder r1 . Then r2 is the second remainder; when we try to divide 10r1 by
q, we get the quotient a1 and remainder r2 , and so on. The Pigeonhole Principle should tell us
that there exist N and k such that rN = rN +k .
Why does the calculation repeat from this point? Again, this is almost obvious. We know
that aN is the quotient when we try to divide 10rN by q, and rN +1 is the remainder. And we
know that aN +k is the quotient when we try to divide 10rN +k by q, and rN +k+1 is the remainder.
But that is the same calculation, so aN = aN +k and rN +1 = rN +k+1 .
Well, now we know that $r_{N+1} = r_{N+k+1}$, we can use exactly the same argument to show
aN +1 = aN +k+1 and rN +2 = rN +k+2 . And so on... in other words, this is an induction with base
case N .
Let’s write that formally.
Proof. We define two sequences recursively. We let a0 be the quotient when we try to divide p
by q, and r1 be the remainder. Then, for each integer i ≥ 1, we let ai be the quotient when we
try to divide $10r_i$ by q, and $r_{i+1}$ be the remainder.
Since each ri is an integer such that 0 ≤ ri ≤ q−1, we can define a function f from {1, 2, . . . , q+1}
to {0, 1, . . . , q − 1} by setting f (i) = ri . Since the domain is larger than the codomain, by the
Pigeonhole Principle there exist i, j ∈ {1, 2, . . . , q + 1} which are distinct such that f (i) = f (j).
Suppose that i < j, and define N = i and k = j − i. Then f (N ) = f (N + k), i.e. rN = rN +k .
We now try to prove by induction that for each n ≥ N we have the statement P (n), where
P (n) is ‘an = an+k and rn+1 = rn+k+1 ’.
The base case is n = N . We know aN is the quotient when we try to divide 10rN by q, and
rN +1 is the remainder. And we know that aN +k is the quotient when we try to divide 10rN +k by
q, and rN +k+1 is the remainder. Since 10rN = 10rN +k , this is the same calculation, so aN = aN +k
and rN +1 = rN +k+1 as required.
Now let s ≥ N , and suppose the induction hypothesis P (s) holds. In particular, we have
rs+1 = rs+k+1 . We know as+1 is the quotient when we try to divide 10rs+1 by q, and rs+2 is the
remainder. And we know that as+k+1 is the quotient when we try to divide 10rs+k+1 by q, and
rs+k+2 is the remainder. Since 10rs+1 = 10rs+k+1 , this is the same calculation, so as+1 = as+k+1 and
rs+2 = rs+k+2 . That is P (s + 1), so we proved the induction step. By the Principle of Induction,
we have P (n) for all n ≥ N .
In particular, we have an = an+k for all n ≥ N , which proves the theorem.
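The remainder-tracking in this proof is easy to watch in action. The sketch below (my own code, not part of the notes) performs exactly the recursion used above—quotient a_i and remainder r_{i+1} from dividing 10r_i by q—and stops as soon as a remainder re-appears, which the Pigeonhole Principle guarantees within q + 1 steps. With the proof's convention the first remainder for 4/7 is r_1 = 4, so it is 4 that re-appears here. The function name decimal_digits is made up for this illustration.

```python
def decimal_digits(p, q, max_steps=50):
    """Run the long-division recursion from the proof and report when a remainder repeats."""
    a0, r = divmod(p, q)              # a_0 and r_1
    digits, remainders = [], []
    seen = {}
    while r not in seen and len(digits) < max_steps:
        seen[r] = len(digits)
        a, r_next = divmod(10 * r, q)  # a_i and r_{i+1}
        digits.append(a)
        remainders.append(r)
        r = r_next
    return a0, digits, remainders, seen.get(r)

a0, digits, rems, repeat_at = decimal_digits(4, 7)
print(a0, digits)        # 0 [5, 7, 1, 4, 2, 8] -- so 4/7 = 0.571428571428...
print(rems, repeat_at)   # [4, 5, 1, 3, 2, 6] 0 -- the remainder 4 re-appears
```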
We’re calling it ‘obvious’ that when we divide p by q in the above, there is only one possible
answer for the quotient and remainder. If you’re not happy about that—maybe you shouldn’t
be—you will see a proper proof that this is true in Lent Term.
Next, we think about the second part of the statement: that if the decimal expansion repeats,
then the number is rational.
Clearly, if the decimal expansion is terminating, then the number is rational. But what about
the infinite, repeating, case? We’ve given two examples above. Let’s consider these in more
detail.
Example 7.4. Consider $a = 0.18\overline{3}$. Let $x = 0.00\overline{3}$. Then $10x = 0.0\overline{3}$ and so $10x - x = 0.0\overline{3} - 0.00\overline{3} = 0.03$. So, $9x = 0.03$ and hence $x = (3/100)/9 = 1/300$, so
$$0.18\overline{3} = 0.18 + 0.00\overline{3} = \frac{18}{100} + \frac{1}{300} = \frac{55}{300} = \frac{11}{60},$$
and this is the rational representation of a.
Example 7.5. Consider the number $0.1\overline{123}$. If $x = 0.0\overline{123}$, then $1000x = 12.3\overline{123}$ and $1000x - x = 12.3$. So $999x = 12.3$ and hence $x = 123/9990$. So,
$$0.1\overline{123} = \frac{1}{10} + x = \frac{1}{10} + \frac{123}{9990} = \frac{1122}{9990}.$$
In general, if the repeating block is of length k, then an argument just like the previous two,
in which we multiply by $10^k$, will enable us to express the number as a rational number.
Activity 7.2. Formalise this argument.
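For Activity 7.2 it may help to see the ‘multiply by $10^k$’ trick done mechanically. Here is a hedged Python sketch: the function name and the way the input is split into a non-repeating prefix and a repeating block (for a number with integer part 0) are my own conventions, not notation from the notes.

```python
from fractions import Fraction

def repeating_to_fraction(prefix, block):
    """prefix: digits after the point before the repeat, e.g. '18'
       block:  the repeating block of digits, e.g. '3'."""
    n, k = len(prefix), len(block)
    tail = Fraction(int(block), 10 ** n * (10 ** k - 1))   # the repeating part
    return Fraction(int(prefix or "0"), 10 ** n) + tail

print(repeating_to_fraction("18", "3"))    # 11/60, as in Example 7.4
print(repeating_to_fraction("1", "123"))   # 187/1665 (= 1122/9990), as in Example 7.5
```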
Theorem 7.6. Suppose q, q ′ ∈ Q with q < q ′ . Then there is r ∈ Q with q < r < q ′ .
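The notes state this theorem without an argument at this point; one standard proof (an assumption on my part—it may not be the argument intended in the omitted text) takes the midpoint of q and q′:

```latex
% A possible proof sketch, not necessarily the one intended in the notes.
Let $r = \frac{q + q'}{2}$. Since $q, q' \in \mathbb{Q}$ and $\mathbb{Q}$ is closed under
addition and under division by $2$, we have $r \in \mathbb{Q}$. From $q < q'$ we get
$2q < q + q' < 2q'$, and dividing through by $2$ gives $q < r < q'$.
```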
If you sketch the graph of p(x) you will find that the graph intersects the x-axis at the two real
solutions (or roots) of the equation p(x) = 0, and that the polynomial factors into the two linear
factors,
p(x) = x2 − 3x + 2 = (x − 1)(x − 2)
Sketching the graph of q(x), you will find that it does not intersect the x-axis. The equation
q(x) = 0 has no solution in the real numbers, and it cannot be factorised (or factored) over the
reals. Such a polynomial is said to be irreducible over the reals. In order to solve this equation,
we need to define the complex numbers.
If you met the complex numbers in school, then probably you were told to accept ‘there is a
symbol i which means the square root of −1’ and you did arithmetic with it. This isn’t a very
satisfactory way of doing things: why can we assume there is such a symbol? We could equally
well invent a symbol (say E) to be the result of trying to divide 1 by 0 and do arithmetic with
it—and if you do, you’ll find you can ‘prove’ 1 = 2. (Try it!)
What we will do is instead to write down a new number system, explain how to do arithmetic,
and then show that we can find a ‘square root of −1’ in this new system.
You should check that these definitions really work, that is, that (for example) the multipli-
cation is commutative, and that the distributive law holds. More generally, you should check
that C satisfies (F1)–(F6), i.e. it is a field: we can do all the algebra we’re used to. (What C
doesn’t have is an order: there isn’t any way of defining an order < on C which plays nicely with
addition and multiplication in the way that the order plays nicely in N, Z, Q or R.)
You can also check that the complex numbers of the form (x, 0) behave like the real numbers,
in other words that (x, 0)+(y, 0) = (x+y, 0), and (x, 0)×(y, 0) = (xy, 0), which is what you expect
for adding and multiplying real numbers. Finally, let’s remember why we began this: we wanted
to be able to solve the equation x2 + 1 = 0. Well, that means we want a complex number (a, b)
such that (a, b) × (a, b) + (1, 0) = (0, 0). And we can find such a number: (0, 1) × (0, 1) = (−1, 0),
so we are done.
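If you want a quick computational sanity check of some of these claims, you could try something like the following Python sketch. It assumes the standard pair multiplication (a, b) × (c, d) = (ac − bd, ad + bc), which is consistent with the calculation (0, 1) × (0, 1) = (−1, 0) above; it is an illustration, not a verification of all the field axioms.

```python
def add(z, w):
    return (z[0] + w[0], z[1] + w[1])

def mult(z, w):
    # (a, b) x (c, d) = (a*c - b*d, a*d + b*c)
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)

samples = [(1, 2), (-3, 5), (0, 1), (7, 0)]
assert all(mult(z, w) == mult(w, z) for z in samples for w in samples)  # commutative here
assert mult((3, 0), (4, 0)) == (12, 0)   # pairs (x, 0) multiply like real numbers
assert mult((0, 1), (0, 1)) == (-1, 0)   # (0, 1) is a square root of -1
print("all checks passed")
```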
Let’s return briefly to the 1/0 bad example from the last section. Suppose you try to construct
a new number system—maybe by taking pairs or triples or whatever of numbers, maybe with
some equivalence relation to say when two pairs are ‘equivalent’ (as we did to construct the
rationals). To do arithmetic with your new number system, you need to explain how to add and
to multiply, and (if you have some equivalence relation involved) you need to show that the
addition and multiplication you wrote down are well-defined. And you would like that there is
something like ‘subtraction’ and ‘division’ that are inverse operations, and you would like it to
be true that addition distributes over multiplication, and so on.
There are in fact lots of things you might come up with that make sense—not just the rational,
real and complex numbers—these other things are also fields and they are very important in
mathematics (and some of them turn out to be very important in modern technology). What
you will not find is a field that contains a solution to the equation 0 × x = 1, in the way that the
complex numbers we just defined contain a solution to the equation $x^2 + 1 = 0$. This is why we
cannot invent a symbol E = 1/0 and do arithmetic with it, but we can invent a symbol $i = \sqrt{-1}$.
rather than the equivalence class [(a, b)]R of the relation R we defined in Section 6.4.1). We
can then say what we mean by the complex numbers.
Definition 7.7. A complex number is a number of the form z = a + ib, where a and b are real
numbers, and i2 = −1. The set of all such numbers is
C = {a + ib ∶ a, b ∈ R} .
If z = a + ib is a complex number, then the real number a is known as the real part of z,
denoted Re(z), and the real number b is the imaginary part of z, denoted Im(z). Note that
Im(z) is a real number.
If b = 0, then z is a real number, so R ⊆ C. If a = 0, then z is said to be purely imaginary.
The quadratic polynomial $q(x) = x^2 + x + 1$ can be factorised over the complex numbers,
because the equation q(x) = 0 has two complex solutions. Solving in the usual way, we have
$$x = \frac{-1 \pm \sqrt{-3}}{2}.$$
We write $\sqrt{-3} = \sqrt{(-1)\cdot 3} = \sqrt{-1}\,\sqrt{3} = i\sqrt{3}$, so that the solutions are
$$w = -\frac{1}{2} + i\,\frac{\sqrt{3}}{2} \qquad \text{and} \qquad \bar{w} = -\frac{1}{2} - i\,\frac{\sqrt{3}}{2}.$$
Notice the form of these two solutions. They are what is called a conjugate pair. We have the
following definition.
Definition 7.8. If z = a+ib is a complex number, then the complex conjugate of z is the complex
number z = a − ib.
We can see by the application of the quadratic formula, that the roots of an irreducible
quadratic polynomial with real coefficients will always be a conjugate pair of complex numbers.
z + w = (1 + i) + (4 − 2i) = (1 + 4) + i(1 − 2) = 5 − i
and
zw = (1 + i)(4 − 2i) = 4 + 4i − 2i − 2i2 = 6 + 2i
You should check that this is really exactly the same as the definitions we gave when we
formally constructed the complex numbers: the only difference is the way we’re writing complex
numbers.
If z ∈ C, then $z\bar{z}$ is a real number:
$$z\bar{z} = (a + ib)(a - ib) = a^2 + b^2.$$
Activity 7.5. Carry out the multiplication to verify this: let z = a + ib and calculate $z\bar{z}$.
Division of complex numbers is then defined by $\dfrac{z}{w} = \dfrac{z\bar{w}}{w\bar{w}}$, since $w\bar{w}$ is real.
Example 7.10.
$$\frac{1+i}{4-2i} = \frac{(1+i)(4+2i)}{(4-2i)(4+2i)} = \frac{2+6i}{16+4} = \frac{1}{10} + \frac{3}{10}i.$$
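A numerical double-check of Example 7.10 using Python's built-in complex numbers (a sketch only; `1j` is Python's notation for i):

```python
z = 1 + 1j
w = 4 - 2j

direct = z / w                                          # Python's own division
via_conjugate = (z * w.conjugate()) / (w * w.conjugate()).real  # the z*conj(w)/(w*conj(w)) recipe

print(direct)          # (0.1+0.3j), i.e. 1/10 + 3/10 i
print(via_conjugate)   # the same value
```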
and
$$\overline{a_0} + \overline{a_1}\,\overline{z} + \overline{a_2}\,\overline{z}^{\,2} + \cdots + \overline{a_n}\,\overline{z}^{\,n} = 0.$$
Since the coefficients $a_i$ are real numbers, this becomes
$$a_0 + a_1\overline{z} + a_2\overline{z}^{\,2} + \cdots + a_n\overline{z}^{\,n} = 0.$$
Activity 7.7. Multiply out the last two factors above to check that their product is the irreducible
quadratic x2 + x + 1.
Theorem 7.13. Two complex numbers are equal if and only if their real and imaginary parts
are equal.
There are two ways to prove this. We can do it directly, using the fact that the complex
numbers are a field:
Proof. Two complex numbers with the same real parts and the same imaginary parts are clearly
the same complex number, so we only need to prove this statement in one direction. Let z = a + ib
and w = c + id. If z = w, we will show that their real and imaginary parts are equal. We have
a+ib = c+id, therefore a−c = i(d−b). Squaring both sides, we obtain (a−c)2 = i2 (d−b)2 = −(d−b)2 .
But a − c and (d − b) are real numbers, so their squares are non-negative. The only way this
equality can hold is for a − c = d − b = 0. That is, a = c and b = d.
The other, much shorter (by now!) way to prove this is simply to observe that the complex
numbers are the same as pairs of real numbers (with addition and multiplication as we defined
them when we formally constructed the complex numbers) and pairs of real numbers are by
definition equal if and only if both parts—which are precisely the real and imaginary parts—are
equal.
As a result of this theorem, we can think of the complex numbers geometrically, as points in
a plane. For, we can associate the vector (a, b)T uniquely to each complex number z = a + ib,
and all the properties of a two-dimensional real vector space apply. A complex number z = a + ib
is represented as a point (a, b) in the complex plane; we draw two axes, a horizontal axis to
represent the real parts of complex numbers, and a vertical axis to represent the imaginary parts
of complex numbers. Points on the horizontal axis represent real numbers, and points on the
vertical axis represent purely imaginary numbers.
[Figure: the complex number z = a + ib plotted as the point (a, b) in the complex plane, with θ the angle at the origin measured anticlockwise from the positive real axis.]
a = r cos θ, b = r sin θ
where $r = \sqrt{a^2 + b^2}$ is the length of the line joining the origin to the point (a, b) and θ is the
angle measured anticlockwise from the real (horizontal) axis to the line joining the origin to the
point (a, b). Then we can write z = a + ib = r cos θ + i r sin θ.
Definition 7.14. The polar form of the complex number z is $z = r(\cos\theta + i\sin\theta)$.
• $|z|^2 = z\bar{z}$.
$$\frac{z}{w} = \frac{r}{\rho}\big(\cos(\theta - \varphi) + i\sin(\theta - \varphi)\big)$$
Activity 7.10. Show these by performing the multiplication and the division as defined earlier,
and by using the facts that cos(θ+φ) = cos θ cos φ−sin θ sin φ and sin(θ+φ) = sin θ cos φ+cos θ sin φ.
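Activity 7.10 asks for an algebraic verification; a quick numerical check of the division rule (a sketch, using Python's cmath.polar to recover r and θ) might look like this:

```python
import cmath, math

z = 2 + 2j
w = 1 - 1j * math.sqrt(3)

r, theta = cmath.polar(z)    # modulus and argument of z
rho, phi = cmath.polar(w)    # modulus and argument of w

lhs = z / w
rhs = (r / rho) * (math.cos(theta - phi) + 1j * math.sin(theta - phi))
print(abs(lhs - rhs) < 1e-12)   # True: both give sqrt(2) * e^{i 7pi/12}
```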
DeMoivre’s Theorem
We can consider explicitly a special case of the multiplication result above, in which w = z. If we
apply the multiplication to z 2 = zz, we have
z2 = zz
= (r(cos θ + i sin θ))(r(cos θ + i sin θ))
= r2 (cos2 θ + i2 sin2 θ + 2i sin θ cos θ)
= r2 (cos2 θ − sin2 θ + 2i sin θ cos θ)
= r2 (cos 2θ + i sin 2θ).
Here we have used the double angle formulae for cos 2θ and sin 2θ.
Applying the product rule n times, where n is a positive integer, we obtain DeMoivre’s
Formula
Theorem 7.15.
(cos θ + i sin θ)n = cos nθ + i sin nθ
Proof.
$$z^n = \underbrace{z \cdots z}_{n\text{ times}} = \big(r(\cos\theta + i\sin\theta)\big)^n$$
Sure, I know addition is commutative, but that will only let me change places of finitely many
terms in the sum (which I don’t quite understand anyway), and I still have infinitely many more
things which I need to change places. The answer to that objection is: we’ll explain properly
some of it later this term, and some next year in MA203 Real Analysis. For now, take it on
faith that it does actually make sense.
Definition 7.16. The exponential form of a complex number z = a + ib is
z = reiθ
where r = ∣z∣ is the modulus of z and θ is the argument of z.
In particular, the following equality is of note because it combines the numbers e, π and i
in a single expression: eiπ = −1.
If $z = re^{i\theta}$, then its complex conjugate is given by $\bar{z} = re^{-i\theta}$. This is because, if $z = re^{i\theta} = r(\cos\theta + i\sin\theta)$, then
$$\bar{z} = r(\cos\theta - i\sin\theta) = r(\cos(-\theta) + i\sin(-\theta)) = re^{-i\theta}.$$
We can use either the exponential form, z = reiθ , or the standard form, z = a + ib, according
to the application or computation we are doing. For example, addition is simplest in the form
z = a + ib, but multiplication and division are simpler in exponential form. To change a complex
number between reiθ and a + ib, use Euler’s formula and the complex plane (polar form).
Example 7.17.
$$e^{i\frac{2\pi}{3}} = \cos\tfrac{2\pi}{3} + i\sin\tfrac{2\pi}{3} = -\tfrac{1}{2} + i\tfrac{\sqrt{3}}{2}.$$
$$e^{2+i\sqrt{3}} = e^2 e^{i\sqrt{3}} = e^2\cos\sqrt{3} + ie^2\sin\sqrt{3}.$$
Activity 7.11. Write each of the following complex numbers in the form a + ib:
$$e^{i\frac{\pi}{2}} \qquad e^{i\frac{3\pi}{2}} \qquad e^{i\frac{3\pi}{4}} \qquad e^{i\frac{11\pi}{3}} \qquad e^{1+i} \qquad e^{-1}$$
Example 7.18. Let $z = 2 + 2i = 2\sqrt{2}\,e^{i\pi/4}$ and $w = 1 - i\sqrt{3} = 2e^{-i\pi/3}$. Then
$$w^6 = (1 - i\sqrt{3})^6 = (2e^{-i\pi/3})^6 = 2^6 e^{-i2\pi} = 64,$$
$$zw = (2\sqrt{2}\,e^{i\pi/4})(2e^{-i\pi/3}) = 4\sqrt{2}\,e^{-i\pi/12}$$
and
$$\frac{z}{w} = \sqrt{2}\,e^{i\frac{7\pi}{12}}.$$
Notice that in the above example we are using certain properties of the complex exponential
function, that if z, w ∈ C,
ez+w = ez ew and (ez )n = enz for n ∈ Z.
This last property is easily generalised to include the negative integers.
Example 7.19. Solve the equation $z^6 = -1$ to find the 6th roots of −1.
Write $z^6 = (re^{i\theta})^6 = r^6 e^{i6\theta}$ and $-1 = e^{i\pi} = e^{i(\pi + 2n\pi)}$.
Equating these two expressions, and using the fact that r is a real positive number, we have
$$r = 1, \qquad 6\theta = \pi + 2n\pi, \qquad \theta = \frac{\pi}{6} + \frac{2n\pi}{6}.$$
This will give the six complex roots by taking n = 0, 1, 2, 3, 4, 5.
Activity 7.12. Show this. Write down the six roots of −1 and show that any one raised to the
power 6 is equal to −1. Show that n = 6 gives the same root as n = 0.
Use this to factor the polynomial x6 + 1 into linear factors over the complex numbers and
into irreducible quadratics over the real numbers.
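This does not replace the exact verification asked for in Activity 7.12, but a numerical sketch in Python (my own, using cmath) can at least confirm that the six candidate roots behave as expected:

```python
import cmath, math

roots = [cmath.exp(1j * (math.pi + 2 * n * math.pi) / 6) for n in range(6)]

for n, z in enumerate(roots):
    print(n, z, z ** 6)    # each z**6 equals -1 up to floating-point rounding

# n = 6 gives back the same root as n = 0:
print(abs(cmath.exp(1j * (math.pi + 2 * 6 * math.pi) / 6) - roots[0]) < 1e-12)
```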
$$a^2 = nb^2$$
and it follows that $a^2$ is divisible by n. But it doesn't follow that a is divisible by n, in general.
(For example, $6^2 = 36$ is divisible by 18, but 6 is certainly not divisible by 18.) In order to get
further, it helps to think about the prime factorisation of n—this is something we will meet in
MA103 next term.
Comment on Activity 7.7. We have $(x - w)(x - \bar{w}) = x^2 - (w + \bar{w})x + w\bar{w}$.
Now, $w + \bar{w} = 2\operatorname{Re}(w) = 2(-\tfrac{1}{2}) = -1$ and $w\bar{w} = \tfrac{1}{4} + \tfrac{3}{4} = 1$, so the product of the last two factors is $x^2 + x + 1$.
Comment on Activity 7.8. [Figure: the points z = 2 + 2i and $w = 1 - i\sqrt{3}$ plotted in the complex plane.]
Comment on Activity 7.9. Draw the line from the origin to the point z in the diagram above.
Do the same for w. For z, $|z| = 2\sqrt{2}$ and $\theta = \frac{\pi}{4}$, so $z = 2\sqrt{2}\big(\cos(\tfrac{\pi}{4}) + i\sin(\tfrac{\pi}{4})\big)$. The modulus of
w is 2 and its argument is $-\frac{\pi}{3}$, so $w = 2\big(\cos(-\tfrac{\pi}{3}) + i\sin(-\tfrac{\pi}{3})\big)$.
Figure 8.1: A hemisphere, cone and cylinder, with a slice through at height h
Take a sphere of radius 1, and cut it in half through the centre to obtain a hemisphere.
Place the hemisphere with its flat base on the table, and put next to it a cone of height
1, with its base a circle of radius 1, standing vertically on its point. And next to that, put a
cylinder of radius and height 1.
Now, the critical observation: if you take a slice through this picture at any height h between
0 and 1, you slice a circle out of each of the hemisphere, cone and cylinder. The area of the circle
sliced from the hemisphere, plus the area of the circle sliced from the cone, is equal to the area
of the circle sliced from the cylinder. This is easy to check using the Pythagoras theorem and
the formula for the area of a circle (which Archimedes knew). The slice through the hemisphere
is a circle of radius $\sqrt{1 - h^2}$ for an area $\pi(1 - h^2)$; the slice through the cone is again a circle,
this time of radius h for an area πh2 ; the slice through the cylinder is a circle of radius 1 for an
area π.
So, (in modern language) if we integrate we see that the volume of the hemisphere plus the
volume of the cone is equal to the volume of the cylinder. Archimedes, of course, did not say
‘integrate’, but he had a similar conception.
Putting it another way, the volume of the hemisphere is the volume of the cylinder minus
the volume of the cone: $\pi - \frac{1}{3}\pi = \frac{2}{3}\pi$. (Archimedes knew the volume of a cone, too.)
So the volume of the sphere is $\frac{4}{3}\pi$.
Why did I go through this argument? Well, because you can try to do something similar
to find the surface area of the sphere. Here is one way, which I’ll give in terms of modern
integration.
Look again at the hemisphere lying on the table, and take a slice through it at height h.
That gives us a circle of radius $\sqrt{1 - h^2}$, as before, and the length of that circle is $2\pi\sqrt{1 - h^2}$.
Integrating, we find the area of the curved bit of the hemisphere is
$$\int_{h=0}^{1} 2\pi\sqrt{1 - h^2}\, \mathrm{d}h = \tfrac{1}{2}\pi^2.$$
Activity 8.1. Why did we underestimate the surface area—what are we missing?
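If you want to see the numbers, here is a rough numerical version of the slicing calculation (my own sketch, not part of the notes): the sliced total approaches π²/2 ≈ 4.93, which is indeed smaller than 2π ≈ 6.28, the actual curved surface area of a unit hemisphere.

```python
import math

def sliced_area(steps=100_000):
    """Add up circumference-times-thickness over thin horizontal slices."""
    dh = 1 / steps
    return sum(2 * math.pi * math.sqrt(1 - (k * dh) ** 2) * dh for k in range(steps))

print(sliced_area())          # about 4.9348...
print(math.pi ** 2 / 2)       # 4.9348..., the value of the integral
print(2 * math.pi)            # 6.2831..., the true curved surface area
```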
Liu Hui developed quite a number of methods of this nature in China, and (importantly) he
gave intuitive explanations of why certain methods work and others do not; following his ideas,
one can solve rather a lot of geometrical problems accurately. But this is an unpleasant situation
for a mathematician to find themselves in, with methods which might or might not work.
While mathematicians were using calculus-like methods to find answers to geometrical
problems, these intuitive explanations were considered good enough. But Newton and Leibniz,
and the mathematicians who came later, were soon using the calculus as you know it to find
answers to problems which aren’t obviously geometrical. Newton himself did not really trust
the calculus (or at least, he didn’t expect others to trust it); while he used it to solve problems,
once he knew the answer he generally looked for a geometrical proof before publishing. But
doing this is hard work (and sometimes you will get completely stuck and be unable to do it at
all), and if you trust the calculus, it is a waste of time.
But you cannot always trust the calculus, as we have seen!
The solution to this—developed by Cauchy and Weierstrass in the 1800s—is to formalise all
of this properly, remove the appeals to intuition, and provide some clear rules: if you do this
it will always work, and we can prove it. This is where analysis begins. If that was all there
is to it, it would be a small part of mathematics: but in fact, analysis provides us with many
surprising tools, going far beyond the calculus of Newton and Leibniz, and it is one of the two
major branches of pure mathematics, along with Algebra (which MA103 will introduce you to
in Lent Term).
We will not get that far in this course. But we can at least lay the foundations.
You probably started thinking about the following: differentiate, set the derivative to zero
and solve...
What you are doing, if you do that, is too much work. You’re trying to find the maximum
value of f (x) on R. It turns out to be a pain to do this, and it’s not what the question asked
for. If you do it (correctly!) then you do have a right answer, but you have also spent a lot of
time getting it.
It’s much easier to observe that −1 ≤ sin x ≤ 1 is true for all x ∈ R (you know this from school,
I hope!), and −(x − 2)2 is never positive. So f (x) ≤ 1 − 0 = 1. So we can answer ‘1 is such a
number’ and write this very short proof, and we are done. The number 2022, by the way, would
have done just as well; there are no points for some ‘best possible’ answer.
For another example, if you are supposed to ‘choose ε > 0 such that 17ε + 283ε² < 0.1’ then
you don't need to start trying to solve a quadratic equation in ε (which is a pain). You can say:
I will choose some ε ≤ 1, so that ε² ≤ ε. Then I get
$$17\varepsilon + 283\varepsilon^2 \le 17\varepsilon + 283\varepsilon = 300\varepsilon,$$
and if I choose ε = 1/6000 (which is indeed ≤ 1) then 300ε = 1/20 < 0.1 and I am done.
A final comment: some of you most likely heard something about funny names or symbols
like ‘infinitesimals’ or ‘dx’ before, and maybe have some idea that ε is ‘really’ the ‘dx’. This is
not true; ε is a real number and (whatever those funny things are) they are not real numbers.
S = {q ∈ Q ∣ q 2 ≤ 2}
does not have a largest element in Q. So we see that the rational number system has “gaps”.
The real number system R includes numbers to fill all of these gaps. Thus the set
T = {x ∈ R ∣ x2 ≤ 2}
has a largest element. This is a consequence of a very important property of the real numbers,
called the least upper bound property. Before we state this property of R, we need a few
definitions.
(ii) The set S = {n ∣ n ∈ N} is bounded below; indeed, any real number x ≤ 1 serves as a lower
bound. It has no upper bound—we shall give a formal proof of this shortly—and so it is
not bounded above, and therefore also not bounded.
(iii) The set S = {(−1)n ∣ n ∈ N} is bounded. Note that S = {−1, 1}. It is bounded above by 1
and bounded below by −1.
More generally, any finite set S is bounded.
(iv) The set S = {1/n ∣ n ∈ N} is bounded. Any real number x satisfying 1 ≤ x is an upper bound,
and 0 is a lower bound.
(v) The sets Z and R are neither bounded above nor bounded below.
(vi) The set T = {x ∈ R ∣ x2 ≤ 2} is bounded. Any real number x satisfying x2 ≤ 2 also satisfies
$x^2 < 4$. Therefore, $x^2 - 4 < 0$, that is, $(x - 2)(x + 2) < 0$. It follows that either $x - 2 < 0$ and $x + 2 > 0$, or $x - 2 > 0$ and $x + 2 < 0$.
The second case is impossible. Thus the first case has to hold, that is −2 < x < 2. It follows
that T is bounded above by 2 and bounded below by −2.
(vii) The set ∅ is bounded. For instance, 1 is an upper bound for ∅, since the condition “for
every element x of ∅, x ≤ 1” is satisfied: there is certainly no element x of ∅ that doesn’t
satisfy x ≤ 1. Indeed, every real number is an upper bound for ∅, and similarly every real
number is a lower bound for ∅.
We now introduce the notions of a least upper bound (also called supremum) and a greatest
lower bound (also called infimum) of a subset S of R.
Definition 8.3. Let S be a subset of R.
1. An element u∗ ∈ R is said to be a least upper bound of S (or a supremum of S) if u∗ is an upper bound of S and u∗ ≤ u for every upper bound u of S.
Proof. Suppose that S has a least upper bound u∗ . Suppose that u′∗ is also a least upper bound
of S. Then in particular u∗ and u′∗ are also upper bounds of S. Now since u∗ is a least upper
bound of S and u′∗ is an upper bound of S, it follows that
u∗ ≤ u′∗ . (8.1)
Furthermore, since u′∗ is a least upper bound of S and u∗ is an upper bound of S, it follows that
u′∗ ≤ u∗ . (8.2)
Definition 8.6.
When the supremum and the infimum of a set belong to the set, we give them the following
familiar special names:
Definition 8.7.
Example 8.8.
(i) If S = {x ∈ R ∣ 0 ≤ x < 1}, then sup S = 1 ∉ S and so max S does not exist. But inf S = 0 ∈ S,
and so min S = 0.
(ii) If S = N, then sup S does not exist, inf S = 1, max S does not exist, and min S = 1.
(iii) If S = {(−1)n ∣ n ∈ N}, then sup S = 1, inf S = −1, max S = 1, min S = −1.
(iv) If S = {1/n ∣ n ∈ N}, then sup S = 1 and max S = 1. We show below (after Theorem 8.12)
that inf S = 0. So min S does not exist.
(v) For the sets Z and R, none of sup, inf, max, min exist.
(vi) For the set ∅, none of sup, inf, max, min exist.
1. S ≠ ∅, and
Example 8.9. Use the least upper bound property to show that there exists a number s ∈ R
such that s > 0 and s2 = 2.
Since the function x → x2 is increasing on the non-negative reals (i.e. if x ≥ 0) and it ‘doesn’t
have jumps’ (draw the graph!), the intuition is that we trace the number line (the x axis) from
0 upwards, until we reach the desired point x = s where x2 gets to 2. When x > s we will have
x2 > 2.
This is of course not a proof. We don’t know what ‘doesn’t have jumps’ means formally
(we’ll get to that!) and ‘trace the number line’ is not part of the least upper bound property.
But this idea does suggest something: we can split the reals into two parts: the part
S = {x ∈ R ∣ x2 < 2} of numbers that are ‘too small’ and the rest. Again, draw the graph and
look at where S is on the x axis. We would really like to say that s is the number ‘at the end of’
S. This informal idea is exactly what ‘least upper bound’ is supposed to formalise.
So we would like to say that s = sup S exists and that it satisfies s2 = 2. Let’s now formalise
this.
Proof. To begin with, we justify that x → x2 is an increasing function for x ≥ 0. That is, if
0 ≤ y < z then we want to prove y 2 < z 2 . That is the same as proving z 2 − y 2 > 0, and we can
factorise z 2 − y 2 = (z − y)(z + y) which is positive because both factors are positive.
Let S = {x ∈ R ∣ x2 < 2}. We now want to show sup S exists. This is what the least upper
bound property is for. We just need to show S is not empty and it has an upper bound.
First, 1 ∈ S (by definition), so S is not empty. And for example 3 is an upper bound for S.
This needs some justification: why is it that everything in S is at most 3? In other words, why
is everything bigger than 3 not in S? Well, 32 = 9, and so (because of the increasing property) if
we are given any x with 3 < x then 32 < x2 . That means in particular x2 > 2, so x is not in S.
Since we now know S is not empty and has an upper bound, by the least upper bound
property sup S exists. Let s = sup S. We can notice that s ≥ 1 since 1 ∈ S.
Finally, we need to prove s2 = 2. We do this by showing that each of the two alternatives
s2 > 2 and s2 < 2 leads to a contradiction.
Suppose first that s2 > 2. Intuitively, s is ‘too big’. We should be trying to contradict the
‘least’ part of ‘least upper bound’; we want to find an upper bound of S that is smaller than s.
That is, we want to find some small ε > 0 such that s − ε is an upper bound for S. By the
increasing property, that is the same as finding a small ε > 0 such that (s − ε)2 > 2.
We choose $\varepsilon = \frac{s^2-2}{2s}$; this formula comes out positive since $s^2 > 2$ and $s \ge 1$. Calculating, we get
$$(s-\varepsilon)^2 = s^2 - 2s\varepsilon + \varepsilon^2 > s^2 - 2s\varepsilon = s^2 - 2s\cdot\frac{s^2-2}{2s} = 2.$$
Here the inequality is since $\varepsilon^2 > 0$. We are done in this case.
Now suppose s2 < 2. Again, intuitively s is ‘too small’ so we should be trying to contradict
the ‘upper bound’ part of ‘least upper bound’; we want to find something in S which is bigger
than s.
That is, we want to find some small ε > 0 such that s + ε is in S, that is such that (s + ε)2 < 2.
We choose $\varepsilon = \min\left(\tfrac{1}{2}, \frac{2-s^2}{2s+1}\right)$. That is, ε is whichever is smaller out of $\tfrac{1}{2}$ and $\frac{2-s^2}{2s+1}$. Again, since
$s^2 < 2$ and $s \ge 1$ this formula comes out positive.
In the ‘checking that s2 = 2’ part of this proof, there are two big ‘magic steps’ where I just
pulled a weird formula for ε out of somewhere and it turned out to work. Of course, these steps
cannot really be magic. Where does the formula come from?
The first one is easier. We suppose s2 > 2, so s is ‘too big’ and we can think about removing
some tiny ε > 0.
We knew from the start we wanted to get (s−ε)2 > 2. That’s the same as saying s2 −2sε+ε2 > 2.
This looks like a quadratic in ε, which is nasty: let’s try to avoid solving a quadratic. Let’s work
backwards.
It’s enough to get s2 − 2sε ≥ 2, because then we can add ε2 to both sides (and 2 + ε2 > 2).
This is now a linear inequality, which is easy. Rearranging, it’s enough to get
$$\varepsilon \le \frac{s^2-2}{2s}.$$
Well, but we can choose ε, so in particular we can choose ε to satisfy this inequality. Maybe the
easiest option was the choice we made; though really any ε satisfying $0 < \varepsilon \le \frac{s^2-2}{2s}$ would work.
The other calculation needs a bit more explanation. The idea is the same. Since we assume
s2 < 2, we think s is ‘too small’ and we want to look at s + ε: think of ε as being tiny, and work
backwards.
We want (s + ε)2 < 2, or equivalently s2 + 2εs + ε2 < 2. This time the ε2 doesn’t help us. But
if ε is tiny, then ε2 should be even smaller. In particular, we should be able to write ε2 < ε. So
let’s do that.
It’s enough to get s2 + 2εs + ε ≤ 2, because ε2 < ε. But this is (again) linear and we can
rearrange it: it's enough to get $\varepsilon \le \frac{2-s^2}{2s+1}$. And again—we can choose ε; we can choose $\varepsilon = \frac{2-s^2}{2s+1}$.
This time, though, we need to be a bit careful. We said ε is tiny so ε² < ε. Our argument
relied on it! But what if this funny fraction $\frac{2-s^2}{2s+1}$ isn't tiny? If it is 1 or bigger, then our ‘ε² < ε’
assumption would go wrong and the proof would not work.
This is why we chose $\varepsilon = \min\left(\tfrac{1}{2}, \frac{2-s^2}{2s+1}\right)$. We insist ε is at most $\tfrac{1}{2}$ so that we can write ε² < ε.
And then we insist ε is at most $\frac{2-s^2}{2s+1}$ so the rest of the argument works.
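A small numerical sanity check of the two ε formulas (my own sketch, with sample values of s; it is an illustration, not part of the proof):

```python
s_big = 1.5                                   # s^2 = 2.25 > 2, so s is 'too big'
eps1 = (s_big ** 2 - 2) / (2 * s_big)
print((s_big - eps1) ** 2 > 2)                # True: s - eps1 is still an upper bound

s_small = 1.4                                 # s^2 = 1.96 < 2, so s is 'too small'
eps2 = min(0.5, (2 - s_small ** 2) / (2 * s_small + 1))
print(eps2 > 0, (s_small + eps2) ** 2 < 2)    # True True: s + eps2 is still in S
```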
Example 8.10. Prove the ‘greatest lower bound property’: if S is a non-empty subset of R
that is bounded below, then S has a greatest lower bound.
This is an exercise.
Theorem 8.12 (Archimedean property). If x, y ∈ R and x > 0, then there exists an n ∈ N such
that y < nx.
Proof. Suppose we are given x and y such that the conclusion is false. That is, that there does
not exist an n ∈ N such that y < nx. This means that for all n ∈ N, y ≥ nx. In other words, y is
an upper bound of the non-empty set S = {nx ∣ n ∈ N}.
By the least upper bound property of the reals, S has a least upper bound u∗ . Note that
u∗ − x is not an upper bound of S, since u∗ − x is smaller than u∗ (x is positive) and u∗ is the
least upper bound. Hence there exists an element mx ∈ S (with m ∈ N) such that u∗ − x < mx,
that is, u∗ < (m + 1)x. But (m + 1)x is also an element of S, and we just said (m + 1)x > u∗ : this
contradicts the fact that u∗ is an upper bound of S.
Let’s see why the Archimedean property tells us that 0.999... = 1. Well, if not, then x =
1 − 0.999... cannot be zero; suppose x > 0. (This is the obvious thing to assume, but formally
we should consider the ‘other’ case that it is negative.) And let y = 1. Then the Archimedean
property says that there is some natural number n such that 1 < nx, i.e. x > 1/n. Putting it another
way, we have
$$\frac{1}{n} + \frac{9}{10} + \frac{9}{100} + \frac{9}{1000} + \cdots < 1$$
for some fixed natural number n. But this is not possible. To see that, observe that
$$\frac{1}{n} + \frac{9}{10} + \frac{9}{100} + \frac{9}{1000} + \cdots \;>\; \frac{1}{n} + \frac{9}{10} + \frac{9}{100} + \cdots + \frac{9}{10^n} \;=\; \frac{1}{n} + \frac{10^n - 1}{10^n},$$
where the first inequality is simply because we're leaving out all the (infinitely many, positive)
terms of the series after $\frac{9}{10^n}$, and the equality uses the formula for the sum of a geometric series.
But $n < 10^n$ is true for every natural number n, so $\frac{1}{n} > \frac{1}{10^n}$, so the right hand side of the above
is bigger than one.
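The displayed inequality can be checked exactly with Python's Fraction type (a sketch, for a few sample values of n):

```python
from fractions import Fraction

for n in (1, 5, 50):
    total = Fraction(1, n) + sum(Fraction(9, 10 ** k) for k in range(1, n + 1))
    print(n, total > 1)    # True each time, because 1/n > 1/10^n
```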
As a consequence of the Archimedean property we are now able to prove that the set N of
natural numbers is not bounded above.
Proof. Suppose that N is bounded above. Then N has an upper bound y ∈ R. Since 1 ∈ N, 1 ≤ y,
and in particular, y > 0. Let x = 1. By the Archimedean property (Theorem 8.12), there exists
an n ∈ N such that y < nx = n. This contradicts the fact that y is an upper bound of N.
Warning 8.14. It’s rather common for students to say ‘the natural numbers are bounded above,
by ∞’. This is wrong. The symbol ∞ is a very convenient thing to use—and we will use it
repeatedly in what follows—but it is not a real number. Trying to treat ∞ as a real number is
one of the quickest ways to get to a wrong answer. Not just wrong in that your proof doesn’t
make sense, but wrong in that you get the wrong number out of your calculations, you end up
calculating things like 3 − 1 = 0.
Proof. We know that 0 is a lower bound of S. Suppose that l is a lower bound of S such that
l > 0. By the Archimedean property (with the real numbers x and y taken as x = 1 (> 0) and
y = 1/l), there exists n ∈ N such that 1/l = y < nx = n ⋅ 1 = n, and so 1/n < l, contradicting the fact
that l is a lower bound of S. Thus any lower bound of S must be less than or equal to 0. Hence
0 is the infimum of S.
For instance, the set S = {x ∈ R ∣ x < 2} is an interval: if x and y are both in S, and x ≤ z ≤ y,
then in particular z ≤ y < 2 so z ∈ S.
An interval may or may not have an upper bound: if it does have an upper bound, then it
has a supremum which may or may not be in the interval. Similarly, an interval may or may not
have a lower bound: if it does have a lower bound, then it has an infimum which may or may
not be in the interval. In the example above, S has an upper bound, and the supremum is not
in S, while S has no lower bound. There are thus three possible forms for the “lower” end of
an interval, and three possible forms for the “upper end”, making nine forms in all. These are
listed in the figure below, along with the notation for each type of interval.
An interval of the form (−∞, b), (a, b) or (a, ∞) is called an open interval. An interval of the
form (−∞, b], [a, b] or [a, ∞) is called a closed interval.
Thus in the notation for intervals used in Figure 8.2, a parenthesis ‘(’ or ‘)’ means that the
respective endpoint is not included, and a square bracket ‘[’ or ‘]’ means that the endpoint is
included. For example, [0, 1) is the set of all real numbers x such that 0 ≤ x < 1. (Note that the
use of the symbol ∞ in the notation for intervals is simply a matter of convenience and is not to
be taken as suggesting that there is a real number ∞.) We do not give any special name to
intervals of the form [a, b) or (a, b].
In analysis, in order to talk about notions such as convergence and continuity, we will need
a notion of ‘closeness’ between real numbers. This is provided by the absolute value ∣ ⋅ ∣, and the
distance between real numbers x and y is ∣x − y∣. We give the definitions below.
Definition 8.17.
1. For a real number x, the absolute value ∣x∣ of x is defined as follows:
$$|x| = \begin{cases} x & \text{if } x \ge 0,\\ -x & \text{if } x < 0. \end{cases}$$
2. The distance between two real numbers x and y is the absolute value ∣x − y∣ of their difference.
Note that ∣x∣ ≥ 0 for all real numbers x, and that ∣x∣ = 0 if and only if x = 0. Thus ∣1∣ = 1,
∣0∣ = 0, ∣−1∣ = 1, and the distance between the real numbers −1 and 1 is equal to ∣−1−1∣ = ∣−2∣ = 2.
The distance gives a notion of closeness of two points, which is crucial in the formalization of
the notions of analysis.
We can now specify regions comprising points close to a certain point x0 ∈ R in terms of
inequalities in absolute values, that is, by demanding that the distance of the points of the
region, to the point x0 , is less than a certain positive number δ, say δ = 0.01 or δ = 0.0000001,
and so on. See Figure 8.3.
Figure 8.3: The interval $I = (x_0 - \delta, x_0 + \delta) = \{x \in \mathbb{R} \mid |x - x_0| < \delta\}$ is the set of all points
in R whose distance to the point $x_0$ is strictly less than δ (> 0).
x + y ≥ 0.
Then ∣x + y∣ = x + y. As ∣x∣ ≥ x and ∣y∣ ≥ y, we obtain ∣x∣ + ∣y∣ ≥ x + y = ∣x + y∣.
x + y < 0.
Then ∣x + y∣ = −(x + y). Since ∣x∣ ≥ −x and ∣y∣ ≥ −y, it follows that ∣x∣ + ∣y∣ ≥ −x + (−y) =
−(x + y) = ∣x + y∣.
The second of these inequalities, (8.4), is often called the triangle inequality. If you draw
a triangle in the plane, with sides of length a, b and c, then c ≤ a + b; to go from one vertex
of a triangle to another, it’s never longer to go down the side connecting them than via the
other two sides. The equation ∣x + y∣ ≤ ∣x∣ + ∣y∣ is just the special case when all three points of
the triangle are on a line.
Remark 8.19. We won’t go into it in this course, but this is going in the direction of why analysis
is such an important part of mathematics. It’s often useful to be able to make precise things
like: this shape is more like a circle than that shape; the distance you have to travel in the space
of shapes to get to the circle is shorter. Or the same thing with similarity of shapes replaced by
similarity of functions, or of many other things. One real-world application: if you use Amazon,
it recommends you products which were bought by people whose purchase history is similar
to yours. But it would take far too much computing time to find all users on Amazon whose
purchase history is similar to yours and look up what they bought that you don’t have, every
time you log on.
What is much quicker is for Amazon to keep track of a few ‘model users’ whose purchase
histories are fairly different, and for each model user a list of what people similar to the model
user bought. Then when you log on, Amazon just has to look up the one or two model users
whose purchase history is closest to yours, and output the list of what people similar to them
bought (minus the things you already have). Why does that work? Well, because of the triangle
inequality. The distance (in purchase history space) between your purchase history and a model
user M is maybe 3 units, and the distance between M and any one of the people they are
similar to, say N, is at most 5 units. Why is something that N bought likely to be a good
suggestion for you? Well, because you and N cannot be more than 3 + 5 = 8 units apart: in this
‘purchase history space’ the triangle inequality holds. You have similar preferences to N , so you
might well like something they bought.
In MA203 you can study ‘metric spaces’—the abstract idea of a space together with an
idea of the distance between two points in the space. All one assumes is that two points are at
zero distance if and only if they are the same point (otherwise the distance is positive) and the
triangle inequality: the distance between any two points is never more than going via a third.
Amazingly, you can say quite a lot about these spaces without assuming anything more.
Exercise 8.2. Let A and B be non-empty subsets of R that are bounded above and define
A + B = {x + y ∣ x ∈ A and y ∈ B} .
(a) Show that sup A + sup B is an upper bound for A + B.
Deduce that sup(A + B) exists and that sup(A + B) ≤ sup A + sup B.
(b) For any real number ε > 0, show that (sup A − ε) + (sup B − ε) is not an upper bound for
A + B.
Deduce that sup(A + B) ≥ sup A + sup B − 2ε, for every ε > 0.
(c) Show that sup(A + B) = sup A + sup B.
Exercise 8.3. Let S be a non-empty set of positive real numbers, and define $S^{-1} = \{1/x \mid x \in S\}$.
(a) Show that, if inf S = 0, then S −1 is not bounded above.
(b) Show that, if inf S > 0, then $S^{-1}$ is bounded above and $\sup S^{-1} = \dfrac{1}{\inf S}$.
Exercise 8.4. In this exercise we define the floor and ceiling functions. For any x ∈ R define
the set Sx = {n ∈ Z ∣ n ≤ x}.
(a) Show that, for any x ∈ R, the set Sx is non-empty and bounded above.
Hint: To show that Sx ≠ ∅, you will need the Archimedean property.
(b) Show that sup Sx exists and sup Sx ∈ Sx for any x ∈ R. Explain why we obtain as a
consequence that the following gives a proper definition of the floor function ⌊⋅⌋ ∶ R → Z
⌊x⌋ = max {n ∈ Z ∣ n ≤ x} , x ∈ R.
Solution to Exercise 8.3. One key idea is that, provided we stay within the realm of positive
real numbers, a larger element of S corresponds to a smaller element of S −1 .
(a) We prove the contrapositive: if S −1 is bounded above, then inf S =/ 0. Take an upper
bound M for S −1 . [This is what we gain: we can get our hands on a specific object M , and
reason about it.] Note that M > 0, as S contains at least one positive real number. Then x ≤ M
for every element x ∈ S −1 , which is equivalent to saying that 1/z ≤ M for every element z ∈ S.
[Because the elements of S −1 are exactly the elements of the form 1/z for z ∈ S.] In turn, this is
equivalent to saying that z ≥ 1/M for every z ∈ S. [We use here that both M and z (an element
of S) are positive.] This means exactly that the positive real number 1/M is a lower bound for
S. The infimum of S is then at least 1/M , and therefore greater than 0.
(b) Suppose now that s = inf S > 0. We claim that 1/s is an upper bound for S. [It’s important
to have an idea of what is going on, so that you can see that this is what is likely to be true.]
Indeed, as s is a lower bound for S, we have s ≤ z for every z ∈ S, and therefore 1/z ≤ 1/s for
every z ∈ S. This means that x ≤ 1/s for every x ∈ S −1 , i.e., that indeed 1/s is an upper bound
for S −1 .
What do we have left to prove? We have seen that S −1 is bounded above, and that 1/ inf S = 1/s
is an upper bound for S −1 . We still need to show that 1/s is the least upper bound for S −1 .
Suppose then that 0 < u < 1/s, i.e., 1/u > s. As s is the infimum of S, it follows that 1/u is not
a lower bound for S, in other words there is some z ∈ S with z < 1/u, or equivalently u < 1/z.
But now 1/z is an element of S −1 , so u is not an upper bound for S −1 . Hence indeed 1/s is the
supremum of S −1 , as required.
(By the way, normally we do not attach any meaning to S −1 when S is a set: sets don’t have
“inverses”. If, in the future, you want the notation to mean what it does here, you either have
to define it afresh, or you could say “where, for a set S of positive real numbers, S −1 is as in
Exercise 8.3”.)
Solution to Exercise 8.4.
(a) The set Sx is certainly bounded above by x, so the main issue is to show that Sx is
non-empty. If x ≥ 0, then 0 ∈ Sx , so we can assume x is negative. Once you’ve realised that,
you should see that the Archimedean property is exactly what we need: given any (negative) x,
there is an integer m such that −m ≥ −x, and so m ≤ x. The integer m is then in the set Sx , so
Sx is non-empty.
(b) For each x, the set Sx is non-empty and bounded above, so has a supremum sup Sx .
Moreover, the set Sx is a set of integers. You have seen in the first half of the course that a set of
integers bounded above has a maximum, so the set has a maximum element (which is the same
as the supremum). Thus, for every real number x, max{n ∈ Z ∣ n ≤ x} is a well-defined integer.
(c) The fact that ⌊x⌋ ≤ x is immediate from the definition. To see the other inequality,
suppose that ⌊x⌋ ≤ x − 1. Then m = ⌊x⌋ + 1 is an integer with m ≤ (x − 1) + 1 = x, so m is in the
set {n ∈ Z ∣ n ≤ x}, and m > ⌊x⌋, contradicting the choice of ⌊x⌋ as the maximum of this set.
(d) Similar to (a) and (b).
(e) Evidently $k \le \sqrt{k^2 + k}$, so $k \in S_{\sqrt{k^2+k}}$. On the other hand, $k + 1 > \sqrt{k^2 + k}$: both sides are
non-negative, so this inequality is equivalent to $(k+1)^2 > k^2 + k$, which is indeed true. Thus no
integer greater than k is in $S_{\sqrt{k^2+k}}$, and hence k is the maximum of $S_{\sqrt{k^2+k}}$, i.e., $k = \lfloor\sqrt{k^2+k}\rfloor$,
whenever k ∈ N.
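A quick computational check of part (e) (my own sketch; math.isqrt(m) returns the largest integer n with n² ≤ m, which is exactly ⌊√m⌋):

```python
import math

for k in range(1, 10):
    m = k * k + k
    print(k, math.isqrt(m), math.isqrt(m) == k)   # the last entry is always True
```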
Solution to Exercise 8.5.
Recall the definition: if S has a supremum s, and s ∈ S, then s is the maximum of S.
(Otherwise S has no maximum.)
So there are two ways that S could fail to have a maximum: (i) S has a supremum, s, but s
is not a member of S; (ii) S has no supremum.
In case (i), take any element x of S. Then x ≤ s because s is an upper bound for S, but
x =/ s since x is in S and s isn’t. Thus x < s, and so x is not an upper bound for S. That means
9.1 Sequences
The notion of a sequence occurs in ordinary conversation. An example is the phrase “an
unfortunate sequence of events”. In this case, we envision one event causing another, which in
turn causes another event and so on. We can identify a first event, a second event, etcetera.
A sequence of real numbers is a list
a1 , a 2 , a 3 , . . .
of real numbers, where there is the first number (namely a1 ), the second number (namely a2 ),
and so on. For example,
1, 1/2, 1/3, . . .
is a sequence of real numbers. The first number is 1, the second number is 1/2 and so on. (There
may not be a connection between the numbers appearing in a sequence.) If we think of a1 as
f (1), a2 as f (2), and so on, then it becomes clear that a sequence of real numbers is a special
type of function, namely one with domain N and co-domain R. This leads to the following
formal definition.
Only the notation is somewhat unusual. Instead of writing f (n) for the value of f at a
natural number n, we write an . The entire sequence is then written in any one of the following
ways:
$(a_n)_{n\in\mathbb{N}}, \qquad (a_n)_{n=1}^{\infty}, \qquad (a_n)_{n\ge 1}, \qquad (a_n).$
In $(a_n)_{n=1}^{\infty}$, the ∞ symbol indicates that the assignment process $1 \mapsto a_1$, $2 \mapsto a_2$, . . . continues
indefinitely. In these notes, we shall normally use the notation $(a_n)_{n\in\mathbb{N}}$. In general, the terms of
a sequence need not be real numbers, but in these notes we shall only be dealing with sequences
a sequence need not be real numbers, but in these notes we shall only be dealing with sequences
whose entries are real numbers, so we shall simply refer to them as sequences from now on.
The n-th term an of a sequence may be defined explicitly by a formula involving n, as in the
example given above:
$a_n = \frac{1}{n}, \quad n \in \mathbb{N}.$
It might also sometimes be defined recursively. For example,
$a_1 = 1, \qquad a_{n+1} = \frac{n}{n+1}\, a_n \quad \text{for } n \in \mathbb{N}.$
(Write down the first few terms of this sequence.)
Example 9.2.
(i) $(\frac{1}{n})_{n\in\mathbb{N}}$ is a sequence with the n-th term given by $\frac{1}{n}$, for n ∈ N. This is the sequence
1, 1/2, 1/3, . . . .
(ii) $(1 + \frac{1}{n})_{n\in\mathbb{N}}$ is a sequence with the n-th term given by $1 + \frac{1}{n}$, for n ∈ N. This is the sequence
2, 3/2, 4/3, 5/4, 6/5, 7/6, . . . .
(iii) $((-1)^n(1 + \frac{1}{n}))_{n\in\mathbb{N}}$ is a sequence with the n-th term given by $(-1)^n(1 + \frac{1}{n})$, for n ∈ N. This
is the sequence
−2, 3/2, −4/3, 5/4, −6/5, 7/6, . . . .
(iv) ((−1)n )n∈N is a sequence with the n-th term given by (−1)n , for n ∈ N. This sequence is
simply
−1, 1, −1, 1, −1, 1, . . .
with the n-th term equal to −1 if n is odd, and 1 if n is even.
(v) (1)n∈N is a sequence with the n-th term given by 1, for n ∈ N. This is the constant sequence
1, 1, 1, . . . .
(vi) (n)n∈N is a sequence with the n-th term given by n, for n ∈ N. This is the strictly increasing
sequence
1, 2, 3, . . . .
(vii) $(\frac{1}{1^1} + \frac{1}{2^2} + \frac{1}{3^3} + \cdots + \frac{1}{n^n})_{n\in\mathbb{N}}$ is a sequence with the n-th term given by $\frac{1}{1^1} + \frac{1}{2^2} + \frac{1}{3^3} + \cdots + \frac{1}{n^n}$,
for n ∈ N. This is the sequence of ‘partial sums’
$\frac{1}{1^1}, \quad \frac{1}{1^1} + \frac{1}{2^2}, \quad \frac{1}{1^1} + \frac{1}{2^2} + \frac{1}{3^3}, \quad \ldots.$
(viii) $(n^{1\,000\,000}\, 2^{-n})_{n\in\mathbb{N}}$ is the sequence whose nth term is $n^{1\,000\,000}\, 2^{-n}$. Its first term is $\frac{1}{2}$, its
second term is a huge integer with about 300 000 decimal digits, its third term is even
bigger, and if you keep calculating, the terms will just keep getting bigger and bigger as
long as you have patience to keep going.
(ix) $(\frac{2}{n} + \frac{(-1)^n}{n})_{n\in\mathbb{N}}$ is the sequence whose terms are
$\frac{1}{1}, \quad \frac{3}{2}, \quad \frac{1}{3}, \quad \frac{3}{4}, \quad \frac{1}{5}, \quad \frac{3}{6}, \quad \ldots$
[Graph: the first few points of the sequence $(\frac{1}{n})_{n\in\mathbb{N}}$, plotted against n.]
This portion of the graph suggests that the terms of the sequence $(\frac{1}{n})_{n\in\mathbb{N}}$ “tend toward 0” as
n increases. This is consistent with the idea of convergence that you might have encountered
before: a sequence (an )n∈N converges to some real number L, if the terms an get “closer and
closer” to L as n “increases without bound”. Symbolically, this is represented using the notation
lim an = L,
n→∞
where L denotes the limit of the sequence. If there is no such finite number L to which the
terms of the sequence get arbitrarily close, then the sequence is said to diverge.
The problem with this characterization is its imprecision. Exactly what does it mean for the
terms of a sequence to get “closer and closer”, or “as close as we like”, or “arbitrarily close” to
some number L? Even if we accept this apparent ambiguity, how would one use the definition
given in the preceding paragraph to prove theorems that involve sequences? Since sequences are
used throughout analysis, the concepts of their convergence and divergence must be carefully
defined.
For example, the terms of $(1 + \frac{1}{n})_{n\in\mathbb{N}}$ get “closer and closer” to 0 (indeed the distance to 0
keeps decreasing), but its limit is 1:
[Graph: the first few points of the sequence $(1 + \frac{1}{n})_{n\in\mathbb{N}}$, plotted against n.]
Some of the terms of $((-1)^n(1 + \frac{1}{n}))_{n\in\mathbb{N}}$ get “as close as we like” or “arbitrarily close” to 1,
but the sequence has no limit:
Figure 9.3: First sixteen points of the graph of the sequence $((-1)^n(1 + \frac{1}{n}))_{n\in\mathbb{N}}$.
The terms of $(n^{1\,000\,000}\, 2^{-n})_{n\in\mathbb{N}}$, despite all appearances, do eventually get smaller. The first
few terms—indeed, the first few million terms—are enormous. But eventually (when n is very
large) the nth term is guaranteed to be very close to 0.
Finally, the terms of $(\frac{2}{n} + \frac{(-1)^n}{n})_{n\in\mathbb{N}}$ don't always get closer to 0. If you look at the nth
term where n is very large, that term will be very tiny—it will be either $\frac{1}{n}$ or $\frac{3}{n}$ depending on
whether n is odd or even—but the even-numbered terms are almost three times as big as the
odd-numbered term before: it keeps on getting further from 0:
Figure 9.4: First 11 points of the graph of the sequence $(\frac{2}{n} + \frac{(-1)^n}{n})_{n\in\mathbb{N}}$.
So which of the sequences from Example 9.2 converge to 0? The answer is: the ones where
(eventually, maybe when n is very large) the terms are guaranteed to be close to 0. These are
examples (i), (viii) (even though the first few terms are enormous and growing), and (ix) (even
though each even term is further from 0 than the previous odd term). Sequence (ii) does converge,
but to the limit 1 not 0 (even though the terms are always getting closer to 0). Sequence (iii)
doesn’t converge to any limit (even though lots of terms are very close to 1, and lots more to
−1). The same is true for sequence (iv). The sequence (v) (obviously!) converges to 1.
The sequence (vi) ‘obviously’ doesn’t converge; it just keeps getting bigger, so it can’t possibly
stay close to any fixed real number (whatever candidate limit you pick, when n is large enough
the nth term of the sequence is going to be much bigger than your candidate limit). And finally
the sequence (vii) does converge, but to a number bigger than 1. It’s obvious the terms are all
at least 1—but how do I know it converges? More on that later.
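To see some of this numerically, here is a rough sketch (my own code; the helper names are made up, and logarithms are used for sequence (viii) because its terms are far too large to compute directly):

```python
import math

def term_ii(n):
    return 1 + 1 / n                      # sequence (ii)

def term_ix(n):
    return 2 / n + (-1) ** n / n          # sequence (ix): 1/n or 3/n

def log10_term_viii(n):
    # log10 of n^1000000 * 2^(-n); the terms themselves are astronomically big at first
    return 1_000_000 * math.log10(n) - n * math.log10(2)

for n in (10, 1000, 1_000_000, 100_000_000):
    print(n, term_ii(n), term_ix(n), log10_term_viii(n))
# (ii) approaches 1, (ix) approaches 0, and the log for (viii) is enormous for a
# long while but is about -2.2e7 by n = 10**8, so those terms also shrink towards 0.
```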
More generally, we want to say a sequence (an )n∈N converges to the real number L if, when
n is very large, an is guaranteed to be close to L. The following, which formalises that idea, is
the key definition for this chapter.
Definition 9.3. The sequence (an )n∈N is said to converge to L if for every real number ε > 0,
there exists an N ∈ N (possibly depending on ε) such that for all n > N ,
∣an − L∣ < ε.
Then we say that (an )n∈N is convergent with limit L and write
lim an = L .
n→∞
Remember, you need to understand this definition not just memorise it—and the English
text version is easier to understand!
Warning 9.4. As we saw back in Chapter 3, if you swap around the order of quantifiers you can
change the meaning of a logical statement. This is the case here: if you change around the order
of the quantifiers in ‘converges to L’ then you will get a statement which means something, but
not any more what you want!
You can try to prove a sequence converges by following the general strategies in Chapter 3.
That is, the first quantifier in the definition is ∀ε > 0, a universal statement. So the first line of
the proof should be ‘Let ε > 0 be given.’ and then you need to prove the statement
$$\exists N \in \mathbb{N} \text{ such that } \forall n > N,\ |a_n - L| < \varepsilon,$$
where now you know that ε is some fixed positive real number. Now, this statement is an
existential statement: the easiest way to prove it is to find an N which works. In other words,
the next line of the proof is going to be something like ‘We choose N = ..’. Having chosen N ,
you need to prove
∀n > N , ∣an − L∣ < ε .
Back to a universal statement! So: ‘Fix n > N .’ And finally you need to prove
∣an − L∣ < ε
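To see the shape of such a proof concretely, here is a small Python sketch for the simple sequence (1/n)n∈N, using the choice N = ⌈1/ε⌉. This is an illustration only: checking finitely many terms on a computer proves nothing, and the names a and choose_N are just labels chosen for this sketch.

import math

# Illustration only: a finite computer check is not a proof.
# N is chosen first (it may depend on eps, never on n); only afterwards
# do we look at terms a_n with n > N.
def a(n: int) -> float:
    return 1 / n

def choose_N(eps: float) -> int:
    return math.ceil(1 / eps)

for eps in [0.5, 0.01, 1e-6]:
    N = choose_N(eps)
    ok = all(abs(a(n) - 0) < eps for n in range(N + 1, N + 1001))
    print("eps =", eps, " N =", N, " next 1000 terms all within eps:", ok)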
Some of these sequences are harder to work with than others. If you couldn’t do any of them,
the worked examples that follow should help. If you could do all of them except (vii) and (viii),
you’re doing very well. (If you think you have proofs for (vii) and (viii), then either you are
doing exceptionally well, or you saw this material before, or you assumed something unjustified!)
Note that ∣an − L∣ < ε if and only if an ∈ (L − ε, L + ε). Hence pictorially, for a convergent
sequence with limit L, this definition means the following, as illustrated in Figure 9.5.
Pick any ε > 0, and consider the shaded strip of width 2ε around the horizontal line passing through L. Then one can find an N ∈ N, large enough, such that all the terms an of the sequence, for n > N, lie in the shaded strip.

Figure 9.5: For n > N, all the terms an of the sequence lie in the shaded strip between L − ε and L + ε.
It is definitely worth keeping this picture in mind for the rest of the chapter. Of course you
can do everything in Analysis just by sticking to the algebra and logic without ever drawing a
picture, but (at least for most people) trying to do this is a good way to get confused and make
errors.
|1/(2n + 5 sin n) − 0| = 1/(2n + 5 sin n) ≤ 1/n < 1/N = 1/1000
We’ll prove this by starting with the model proof above, and changing it so that it works for any ε > 0 and not just 1/1000.
Proof. Given ε > 0, choose N to be the smallest integer which is at least as big as both 5 and 1/ε.
Then for all n > N, we have n + 5 sin n ≥ 0, so 2n + 5 sin n ≥ n, so
|1/(2n + 5 sin n) − 0| = 1/(2n + 5 sin n) ≤ 1/n < 1/N ≤ ε.
Definition 9.7. The ceiling of a real number x, written ⌈x⌉, is the smallest integer which is at least as big as x; the floor of x, written ⌊x⌋, is the largest integer which is not bigger than x.
Given a collection of real numbers s1, s2, . . . , st, we write max(s1, ..., st) for the largest of them, and min(s1, ..., st) for the smallest.
So ‘the smallest integer which is at least as big as both 5 and 1/ε’ is simply max(5, ⌈1/ε⌉).
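Here is a quick numerical check (again an illustration, not a proof) that the choice N = max(5, ⌈1/ε⌉) does what the proof above promises for an = 1/(2n + 5 sin n).

import math

def a(n: int) -> float:
    return 1 / (2 * n + 5 * math.sin(n))

for eps in [0.1, 0.001]:
    N = max(5, math.ceil(1 / eps))
    worst = max(abs(a(n) - 0) for n in range(N + 1, N + 10001))
    print("eps =", eps, " N =", N, " largest |a_n| for the next 10000 n:", worst)
# Each printed value is below the corresponding eps, as the proof guarantees
# for every n > N (not just the finitely many n checked here).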
You should now be able to do the following.
Activity 9.2. Use the definition of the limit of a sequence to show that (1/n)n∈N is a convergent sequence with limit 0.
Example 9.8. Use the definition of the limit of a sequence to show that (1 + 1/(2n^2 − n))n∈N is a convergent sequence with limit 1.
Example 9.9. Use the definition to show that (an)n∈N = ((−1)^n (1 + 1/n))n∈N is a divergent sequence.
There is only one reason why a sequence might converge: the terms get close to a limit and
stay there as you look at larger and larger n. But there are a few different ways a sequence
(an )n∈N can fail to converge. It might be that an bounces around crazily all over the place,
sometimes very big and sometimes very small, no matter how big you make n. It might be that
it just keeps getting bigger and bigger and eventually gets too big for any candidate limit (or
the same thing but in the negative direction). It might be that the sequence doesn’t look too
crazy, but it jumps between being close to different real numbers. This (see Figure 9.3) is an
example of the last: the odd-numbered terms get close to −1, and the even-numbered terms get
close to 1.
It’s very tempting to say ‘we prove that −1 is not a limit. Then we prove that 1 is not a limit.
And now we are done’. But this is not enough. If you say this, you didn’t rule out the possibility
that 0 is a limit, or π, or any other real number; you need to rule out all the real numbers.
So the proof has to start ‘Given a real number L..’ and then go on to show that L cannot be
a limit. Let’s see how that goes.
Proof. Given a real number L, we need to show that ∀ε > 0 , ∃N ∈ N , ∀n > N , ∣an − L∣ < ε is a
false statement.
Working from the start, that means we need to find a counterexample to the ‘for all ε > 0’.
That is, one particular ε > 0 such that ∃N ∈ N , ∀n > N , ∣an − L∣ < ε is false. We’ll do that for
ε = 1.
Now we need to show that ∃N ∈ N , ∀n > N , ∣an − L∣ < 1 is a false statement. So: given any
N ∈ N, we need to show ∀n > N , ∣an − L∣ < 1 is a false statement, so we need to come up with
some particular number n > N such that ∣an − L∣ < 1 is false.
At this point, we need to look at the terms an and what L is.
Case 1: L ≥ 0. Choose n to be any odd integer greater than N, then we have
|an − L| = |L − an| = L − (−1)^n (1 + 1/n) = L + 1 + 1/n ≥ 1 ,
so |an − L| < 1 is false, as required.
Case 2: L < 0. Choose n to be any even integer greater than N, then we have
|an − L| = (−1)^n (1 + 1/n) − L = 1 + 1/n − L ≥ 1 ,
so again |an − L| < 1 is false, as required.
Activity 9.3. Check that for any real number L, any sequence (an )n∈N , and any ε > 0, we have:
‘ ∃N ∈ N , ∀n > N , ∣an − L∣ < ε is a false statement’ if and only if ‘there are infinitely many n
such that ∣an − L∣ ≥ ε’.
We can (and generally we would) shorten the proof of Example 9.9 a bit:
Proof. Given a real number L, we choose ε = 1. We want to show there are infinitely many n
such that ∣an − L∣ ≥ 1.
Case 1: L ≥ 0. For every odd integer n, we have
|an − L| = |L − an| = L − (−1)^n (1 + 1/n) = L + 1 + 1/n ≥ 1 .
Case 2: L < 0. For every even integer n, we have
|an − L| = (−1)^n (1 + 1/n) − L = 1 + 1/n − L ≥ 1 .
In either case there are infinitely many n with |an − L| ≥ 1, so L is not a limit of the sequence.
We should also remember (from Chapter 3) that when we want to use a ‘for all’ statement,
what we will do generally doesn’t look like proving a ‘for all’ statement. As we’ve just seen, the
first line of proving the statement lim_{n→∞} an = L is generally going to be ‘Given ε > 0, . . . ’. What do we do if we are given that (an)n∈N is a convergent sequence with limit L, and we want to
prove something about (say) L?
The notation lim_{n→∞} an suggests that the limit is unique. But is this actually well-defined, or could it be that there is a convergent sequence with two different limits?
Warning 9.10. We saw the sequence ((−1)^n (1 + 1/n))n∈N before. It’s tempting to say ‘yes, this
sequence has two limits, 1 and −1’. But this is false: this sequence doesn’t tend to any limit at
all; it is divergent, as we just proved.
The proof of this is a good example of how we can use the fact that a given sequence is
convergent with a certain limit.
Proof. Formally, to prove ‘a convergent sequence has a unique limit’, we need to show two things: (i) a convergent sequence has at least one limit, and (ii) it has at most one limit.
Here, (i) is true by the definition of convergence, so we only have to prove (ii).
In other words, if lim_{n→∞} an = L1 and lim_{n→∞} an = L2, then we have to prove L1 = L2.
Suppose that (an )n∈N is a sequence which is convergent with limit both L1 and L2 .
If L1 = L2 , then there is nothing to prove. So suppose for a contradiction that this is not the
case.
We choose ε = (1/3)|L2 − L1|, which is positive since L2 ≠ L1.
Because lim_{n→∞} an = L1, there is some N1 ∈ N such that if n > N1 then an is guaranteed to be within ε of L1. And since lim_{n→∞} an = L2 there is a (maybe different) N2 ∈ N such that if n > N2 then an is guaranteed to be within ε of L2.
Now pick an n which is bigger than both N1 and N2 (for example n = N1 + N2 + 1). Then we have |an − L1| < ε and |an − L2| < ε. So by the triangle inequality, we have
|L1 − L2| ≤ |L1 − an| + |an − L2| < 2ε = (2/3)|L2 − L1| ,
which is a contradiction since |L2 − L1| > 0. Hence L1 = L2.
You should notice that this proof, where we use the assumption that a sequence tends to a
limit, looks nothing like proving that a sequence tends to a limit. We get to choose our favourite
ε > 0, and then we are given N1 and N2. I want to stress that the choice we made, ε = (1/3)|L2 − L1|,
isn’t ‘obvious’ at the point in the proof where we write ‘we choose...’. Again, if you were not
just reading this proof but trying to think it up, you’d leave a blank space here to fill in later,
once you see (at the second to last line) what you actually need: 2ε shouldn’t be bigger than |L2 − L1|. If you think a bit, in fact ε = (1/2)|L2 − L1| would actually work as well (because we have
strict inequalities) but it is better to write something which ‘works easily’; you’re less likely to
make a mistake.
Again, it’s not too easy to see where this proof comes from just by looking at the algebra.
How did I find it? Well, I drew Figure 9.5, with the ‘first’ limit L1 :
[Sketch: a grey strip from L1 − ε to L1 + ε containing all terms an with n > N1; and, drawn on top of it, a hashed strip from L2 − ε to L2 + ε for n > N2. With ε this large the two strips overlap.]
Now, what does this picture mean? In order for lim_{n→∞} an = L1 to be true, all the points after N1 have to be in the grey box, which they are. And for lim_{n→∞} an = L2 to be true, all the points have to be in the hashed box after N2. Which they are not in this picture—but they could have been:
they could have all been in the grey-hashed overlap. We want to get rid of that possibility—we
chose ε too big, we should choose a smaller value so the boxes don’t overlap:
[Sketch: the same two strips drawn with the smaller choice of ε; now the grey strip around L1 and the hashed strip around L2 are disjoint.]
And now we are happy: if N is at least as big as both N1 and N2, for lim_{n→∞} an = L1 to be true all the points an with n > N need to be in the grey box; but for lim_{n→∞} an = L2 to be true all the points an with n > N need to be in the hashed box. That can’t be: the boxes don’t overlap!
The proof we saw really came from drawing this final picture. It tells us how we need to
choose ε > 0. Then N1 and N2 are given to us by the definition of ‘converges to L1 ’ and ‘converges
to L2 ’ respectively. Then the picture tells us to choose N = max(N1 , N2 ).
Let’s finally prove one important fact about limits of sequences: if all the terms of a convergent
sequence are contained in a closed interval [a, b], then so is the limit.
Theorem 9.12. Suppose [a, b] is any closed interval, and (xn)n∈N is a convergent sequence with limit L, where xn ∈ [a, b] for all n ∈ N. Then L is also in [a, b].
Proof. We prove this theorem by contradiction. Suppose for a contradiction that L ∈/ [a, b]. Then
either L > b, or L < a.
Case 1: Suppose L > b. Choose ε = (L − b)/2, which is positive. Since (xn)n∈N converges to L, there exists N ∈ N such that for all n > N we have
|xn − L| < ε , and hence xn > L − ε = L − (L − b)/2 = (L + b)/2 > b ,
where the final inequality is since L > b. In particular, for n = N + 1 we have xn > b. But this is a contradiction to our assumption xn ∈ [a, b].
The second case is very similar.
Case 2: Suppose L < a. Choose ε = (a − L)/2, which is positive. Since (xn)n∈N converges to L, there exists N ∈ N such that for all n > N we have
|xn − L| < ε , and hence xn < L + ε = L + (a − L)/2 = (a + L)/2 < a ,
where the final inequality is since L < a. In particular, for n = N + 1 we have xn < a. But this is a contradiction to our assumption xn ∈ [a, b].
In either case, we found a contradiction, so the theorem is proved.
It’s worth noticing that this theorem would be false for an open interval; we’ve already seen
that (1/n)n∈N is a convergent sequence with limit 0. All the terms of this sequence are in (0, 2),
but the limit is not.
bigger than 4ε^(−2), so (yet again!) we can replace n with 4ε^(−2) to make the denominator smaller and the fraction bigger. And simplifying we have the ε we wanted: we proved |an − 1| < ε.
This proof works fine. It’s easy to check. It is not obvious how to think of it. We make a
‘magical’ choice of N at the beginning, and it just happens to be exactly what we need to get a
pretty ε at the end.
Of course, the truth is that I didn’t really make this choice of N at the beginning. I left a
blank space, and filled it in later after realising what I needed. Later on, we’ll see proofs which
are more complicated, and there you might have several ‘magical’ choices made at the beginning
of the proof.
Some people (and some textbooks too) prefer a more informal style, where we don’t choose
N at the beginning, but simply write down that it is to be chosen later and then write down
what we need as we go. Here’s the same proof, written that way.
Proof. Given ε > 0, we will choose an integer N later.
Suppose n > N . Then we have
|an − 1| = |1/(√n − π)| = |1/(2√(n/4) − π)| .
We will need N ≥ 400 in order to write 2√(n/4) − π > √(n/4) + 10 − π > √(n/4). Putting this in, we get
|an − 1| < 1/√(n/4) .
We need the right side of this to be less than ε, so we need N ≥ 4ε^(−2) for that to work. Putting this in, we get
|an − 1| < 1/√(4ε^(−2)/4) = 1/√(ε^(−2)) = ε .
Warning 9.13. If you choose N as you go in a proof, you might end up writing something like ‘we choose N bigger than (7 − ε)(n − π)’. This looks all fine: there certainly is such an integer, and there is an n all over the place in the proof, so why not, if it makes the inequalities come out the way you want?
Let’s try to prove that (n/(n − π))n∈N tends to 7.
Proof. Given ε > 0, we will choose an integer N later.
Suppose n > N. When n > 5, we have 6n > 30 > 7π, so 6n − 7π > 0, so 7(n − π) > n, so n/(n − π) < 7.
We want this last inequality, so we will choose N > 5. Then we have
|n/(n − π) − 7| = 7 − n/(n − π) .
We should choose N bigger than (7 − ε)(n − π), because then we can make the numerator smaller
(and so the RHS becomes larger) by writing
|n/(n − π) − 7| = 7 − n/(n − π) < 7 − N/(n − π) < 7 − (7 − ε)(n − π)/(n − π) = 7 − (7 − ε) = ε ,
which is what we wanted. So we should choose N = max (5, ⌈(7 − ε)(n − π)⌉).
If you try writing this proof in the formal style, you’ll see something is fishy rather fast:
Proof. Given ε > 0, choose N = max (5, ⌈(7 − ε)(n − π)⌉).
Wait—what is n? Something is wrong.
When you read the formal proof (as when you read any proof) one thing you should be
thinking is: do I know what each quantity is as it comes? With ‘given ε > 0’ there is no problem;
that means that ε is allowed to be any positive real number, from this point on we fix one
particular choice, and the rest of the proof should work whatever positive real number it happens
to be. Then ‘choose N to be..’ means that we are trying to define a quantity N . We want it
bigger than 5; no problem. And bigger than a formula. Well, the formula contains ε—we know
what that is, we just fixed it. And it contains π—that’s about 3.14. And it contains n. What
is n? We haven’t seen it before, we don’t know what it is—how should we work out what this
formula is? ERROR ERROR! COMPUTER SAYS NO!
What is wrong is not just a formality; this is not me being picky for the sake of it. The
sequence (n/(n − π))n∈N does not tend to 7, in fact (as you can convince yourself by working out a few
values, and as you will be able to prove easily by the end of this chapter) the sequence tends
to 1. If you want to use the informal style of writing a proof, you need to check that if you
would write it out in the formal style, then you wouldn’t ever try to use some letter in a formula
before you actually say what that letter is. If you’re trying to prove a sequence converges to a
limit, that means that when you choose N you can refer to ε, but not to n (or to anything that
depends on n!). Otherwise, you may ‘prove’ completely wrong statements, like the one above.
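If you are not convinced, a few computed values make the point. This little Python loop (an illustration only) prints n/(n − π) for some large n:

import math

for n in [10, 100, 10_000, 1_000_000]:
    print(n, n / (n - math.pi))
# The values head towards 1 (about 1.0000031 at n = 1,000,000), not towards 7.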
Theorem 9.14 (Bernoulli’s Inequality). Let x be a real number with x ≥ −1. Then for every n ∈ N,
(1 + x)^n ≥ 1 + nx.
Proof. We prove this result by induction on n. Note first that, for n = 1, the inequality states
that 1 + x ≥ 1 + x, which is certainly true.
Suppose now that, for some n ∈ N, (1 + x)^n ≥ 1 + nx. Now we have
(1 + x)^(n+1) = (1 + x)(1 + x)^n ≥ (1 + x)(1 + nx) = 1 + (n + 1)x + nx^2 ≥ 1 + (n + 1)x .
So the inequality holds for n + 1. Hence, by induction, the inequality holds for all n ∈ N.
You might wonder where we used the assumption x ≥ −1 in this proof. The answer is: we
multiplied the induction hypothesis (1 + x)n ≥ 1 + nx through by 1 + x and didn’t change the
direction of the inequality because x + 1 ≥ 0.
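If you like, you can spot-check Bernoulli’s Inequality on random inputs. The following Python sketch does exactly that; it is of course no substitute for the induction proof above, and the tolerance 1e-9 is only there to absorb floating-point rounding.

import random

random.seed(0)
for _ in range(1000):
    x = random.uniform(-1.0, 5.0)
    n = random.randint(1, 50)
    # check (1 + x)^n >= 1 + n*x, up to a tiny rounding tolerance
    assert (1 + x) ** n >= 1 + n * x - 1e-9
print("no counterexample found in 1000 random trials")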
Now we use Bernoulli’s Inequality to show that, whenever ∣x∣ < 1, (xn )n∈N is convergent with
limit 0.
Theorem 9.15. Let x be any real number with −1 < x < 1. Then (xn )n∈N is a convergent sequence
with limit 0.
If x = 0 then every term of the sequence is 0, and the result is clear. So suppose x ≠ 0, and set h = 1/|x| − 1, which is positive since 0 < |x| < 1. By Bernoulli’s Inequality,
(1/|x|)^n = (1 + h)^n ≥ 1 + nh ≥ nh ,
and so
0 ≤ |x|^n ≤ 1/(nh) .
Remembering this inequality, let’s begin the proof of convergence.
Given any ε > 0, we take N = ⌈1/(εh)⌉. Now, for n > N, we have
|x^n − 0| = |x|^n ≤ 1/(nh) < 1/(Nh) ≤ ε .
Hence lim_{n→∞} x^n = 0.
Let’s check quickly that we did not fall into the ‘trap’ of Warning 9.13. We chose N = ⌈1/(εh)⌉. That depends on ε, which is fine: we were given ε already. And it depends on h—what is h? We defined it to be 1/|x| − 1. What is x? That was given to us at the start of the proof too, so that’s
fine. What is important is that the choice of N we make doesn’t depend on n (because we don’t
know what n is yet, we only introduce it on the next line).
Example 9.16. Use Bernoulli’s Inequality to show that lim_{n→∞} 2^(1/n) = 1.
Proof. 2 > 1 and so 2^(1/n) > 1 (for otherwise 2 = (2^(1/n))^n ≤ 1, a contradiction). Let an := 2^(1/n) − 1 ≥ 0. Then Bernoulli’s inequality says 2 = (1 + an)^n ≥ 1 + n·an, and so
0 ≤ an ≤ 1/n .
Now, given any ε > 0, choose N = ⌈1/ε⌉. Then, for n > N ,
|2^(1/n) − 1| = |an| = an ≤ 1/n < 1/N ≤ ε .
Therefore the sequence 2, √2, 2^(1/3), 2^(1/4), 2^(1/5), . . . is convergent with limit 1.
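A few computed values illustrate both the limit and the bound from Bernoulli’s Inequality (an illustration only, not a proof):

for n in [1, 10, 100, 10_000]:
    a_n = 2 ** (1 / n) - 1
    print(n, a_n, 1 / n, a_n <= 1 / n)
# Every line ends with True, matching the bound 0 <= a_n <= 1/n obtained above,
# and a_n visibly shrinks towards 0, i.e. 2^(1/n) -> 1.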
I’ve so far been rather careful to make sure N is always chosen to be an integer (because
that’s what it is declared to be in Definition 9.3, the definition of convergence). This is why I’ve
put in these ⌈⋅⌉ symbols ‘the smallest integer at least..’.
However, it’s rather common to simply write ‘choose N ≥ 1/ε’ rather than ‘choose N = ⌈1/ε⌉’,
and leave it implicit that N is supposed to be an integer. I’ll be happy with you producing
either.
You might wonder why we don’t simply change the definition of convergence and allow N to
be any real number to avoid these issues. The answer to that is that it often is convenient to
assume N is a natural number in proofs; we can write things like aN and be assured that that
is actually a term of our sequence.
Definition 9.17. A sequence (an )n∈N is said to be bounded if there exists a real number M > 0
such that
for all n ∈ N, ∣an ∣ ≤ M. (9.1)
Note that a sequence is bounded if and only if the set S = {an ∣ n ∈ N} is bounded (this is an
exercise).
Example 9.18. Show that the sequence (an)n∈N given by an = 1/1^1 + 1/2^2 + 1/3^3 + ⋅ ⋅ ⋅ + 1/n^n is bounded.
Proof. Since every term in the sum is positive, for each n ∈ N we have
|an| = an
= 1/1^1 + 1/2^2 + 1/3^3 + ⋅ ⋅ ⋅ + 1/n^n
< 1/1 + 1/2^2 + 1/2^3 + ⋅ ⋅ ⋅ + 1/2^n
= 1 + (1/2)(1 − 1/2) + (1/2^2)(1 − 1/2) + ⋅ ⋅ ⋅ + (1/2^(n−1))(1 − 1/2)
= 1 + 1/2 − 1/2^2 + 1/2^2 − 1/2^3 + 1/2^3 − 1/2^4 + ⋅ ⋅ ⋅ + 1/2^(n−1) − 1/2^n
= 1 + 1/2 − 1/2^n
< 3/2 .
Thus all the terms are bounded by 3/2, and so the sequence is bounded.
As usual, I did not know 3/2 would turn out to be an upper bound at the start of the proof; I left a blank space and filled it in once I found it at the end. As with convergence, what’s
important is that whatever number I write for an upper bound has to be a number which does
not depend on n.
You might be a bit unhappy with the proof above. If so:
Activity 9.5. Write down a detailed proof that |an| ≤ 1 + 1/2 − 1/2^n using induction on n.
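You can also compute the partial sums directly. The following Python sketch (illustrative only; the helper name a is just a label chosen here) shows them settling down well below 3/2.

def a(n: int) -> float:
    # partial sum 1/1^1 + 1/2^2 + ... + 1/n^n
    return sum(1 / k ** k for k in range(1, n + 1))

for n in [1, 2, 3, 5, 10, 20]:
    print(n, a(n))
# The values settle around 1.29129..., comfortably below the bound 3/2.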
Example 9.20. Show that the sequence (an )n∈N given by an = n for n ∈ N, is not bounded.
Proof. Given any M > 0, there exists an N ∈ N such that M < N (Archimedean property with y = M and x = 1). Thus |aN| = N > M, so no M > 0 can satisfy (9.1), and the sequence is not bounded.

Figure 9.6: The set of convergent sequences is contained in the set of bounded sequences.
The sequences (1)n∈N, (1/n)n∈N, (1 + 1/n)n∈N are all convergent, and we have shown above that
these are also bounded. This is not a coincidence, and in the next theorem we show that the set
of all convergent sequences is contained in the set of all bounded sequences, as illustrated in
Figure 9.6.
Theorem 9.21. Every convergent sequence is bounded.
Proof. Let (an)n∈N be a convergent sequence with limit L. Set ε = 1. Then, using the definition of convergence for this value of ε, we see that there exists N ∈ N such that, for all n > N,
|an − L| < 1 , and hence |an| ≤ |an − L| + |L| < 1 + |L| .
Now set M = max(|a1|, |a2|, . . . , |aN|, 1 + |L|), which is a positive real number. Then for all n ∈ N we have
|an| ≤ M ,
so (an)n∈N is bounded.
Proof. We have seen above that (n)n∈N is unbounded. It follows from Theorem 9.21 that (n)n∈N
is not convergent.
Keep in mind that some divergent sequences are not bounded, but some other divergent
sequences are bounded, such as sequence (iv) of Example 9.2.
Example 9.24.
Sequence: monotonically increasing? / strictly increasing? / monotonically decreasing? / strictly decreasing? / monotone?
(1/n)n∈N: No / No / Yes / Yes / Yes
(1 + 1/n)n∈N: No / No / Yes / Yes / Yes
((−1)^n (1 + 1/n))n∈N: No / No / No / No / No
(1)n∈N: Yes / No / Yes / No / Yes
(n)n∈N: Yes / Yes / No / No / Yes
(1/1^1 + 1/2^2 + 1/3^3 + ⋅ ⋅ ⋅ + 1/n^n)n∈N: Yes / Yes / No / No / Yes
The following theorem can be useful for showing that sequences converge without knowing the limit beforehand (or with less work than using the definition).
Theorem 9.25. Every bounded monotone sequence is convergent.
Again, to see how to prove this it helps to draw a picture. We’ll draw the picture for the
case ‘monotonically increasing’.
[Sketch: an increasing bounded sequence, plotted together with horizontal lines at heights 1, 1.5 and 2.]
The black line at 2 is an upper bound for the sequence. But it doesn’t look like 2 is a good
candidate for the limit: it’s too big.
The dashed line at 1 isn’t an upper bound for the sequence. This proves 1 can’t be the limit,
it is too small (because the sequence is increasing, it can never go back down from above 1 to
get arbitrarily close to 1).
The dotted line at 1.5 looks like a better candidate for the limit. It is an upper bound for
the sequence, and this proves that 2 can’t be the limit: the sequence can never get above 1.5 so
it cannot get arbitrarily close to 2.
But if there is a smaller upper bound than 1.5 then that would prove 1.5 cannot be the limit.
What we are looking for is the least upper bound of the set {an ∣ n ∈ N}. Since this is a non-
empty, bounded set of real numbers, the least upper bound property says that L = sup {an ∣ n ∈ N}
exists. We just need to prove it is the limit. Let’s formalise that.
Proof. We first consider the case that our sequence is monotone increasing, then do the monotone
decreasing case.
Let (an )n∈N be a monotonically increasing sequence. Since (an )n∈N is bounded, it follows that
the set
S = {an ∣ n ∈ N}
has an upper bound and so L = sup S exists. We show that in fact (an )n∈N converges to L.
Given ε > 0, we want to find N such that L − ε < an < L + ε for all n > N . The right-hand
inequality is going to be easy: by definition of ‘upper bound’, an ≤ L < L + ε is true for all n ∈ N.
So the difficulty is to get L − ε < an .
Because L is the least upper bound of S, it follows L − ε is not an upper bound of S. That
means there exists some N such that aN > L − ε.
But because the sequence is increasing, we have for any n > N the fact
L − ε < aN ≤ aN+1 ≤ aN+2 ≤ ⋅ ⋅ ⋅ ≤ an ≤ L < L + ε ,
so |an − L| < ε, which is what we wanted: (an)n∈N converges to L.
The monotone decreasing case is similar, using the greatest lower bound inf S in place of sup S.
Example 9.27. The following table gives a summary of the valid implications, and gives
counterexamples to implications which are not true.
Question Answer Reason/Counterexample
The following activity is slightly tricky: think about how to put together the theorems you
already saw in order to prove it.
Activity 9.6. Let (an )n∈N be a sequence. We say it is a Cauchy sequence if the following is
true: For every δ > 0, there is an M ∈ N such that if m, m′ > M then we have ∣am − am′ ∣ < δ.
Prove that a sequence is convergent if and only if it is a Cauchy sequence.
9.4.1 Series
We won’t really talk about series in this course, but it is worth giving the definition.
Given real numbers a1, a2, . . . , when we write the ‘series’
∑_{n=1}^{∞} an
what we mean is the sequence (sn)n∈N of partial sums, where sn = a1 + a2 + ⋅ ⋅ ⋅ + an.
As we’ve seen, some sequences converge and other sequences diverge; that’s equally true for
sequences of partial sums. So some series converge (we can write down a real number which is
the ‘infinite sum’) and some diverge (the ‘infinite sum’ doesn’t make sense).
What Example 9.28 shows, in this language, is that
∑_{n=1}^{∞} 1/n^n
is a convergent series; it makes sense to say that this ‘infinite sum’ is a real number.
It would be reasonable to guess that there is perhaps some nice formula for the limit of this
series that lets us find out what it is more easily than adding up infinitely many terms. But
we don’t know any such formula. Not only that, we don’t even know if the limit is a rational
number or not! This is a long open problem: in 1697, Johann Bernoulli proved that
∑_{n=1}^{∞} 1/n^n = ∫_0^1 (1/x^x) dx ,
but this doesn’t help us, either with calculating the limit or finding out if it is rational.
One can fairly easily (with a computer) find out what the first few (or few million) digits of
the limit are, and from this calculation we can show that if the limit is rational, then the fraction p/q which is the limit needs to have a very large denominator: q has to be in the millions. So our best guess is that no such fraction exists: probably the limit is irrational.
This is about all I want to say about series in this course. If you take MA203, you’ll return
to the topic there. The only thing I have left to say is a warning. There is a reason that we
invented the new word ‘series’ rather than just say ‘infinite sum’. The reason is that ‘infinite
sum’ sounds friendly and well-behaved. You can do all kinds of things in a sum, like rearrange
the terms (because addition is commutative).
Series are not friendly and well-behaved. If you rearrange the terms you get a different series,
which might have a completely different limit.
Activity 9.7. Prove that if (an)n∈N and (bn)n∈N are both convergent, with limits respectively A and B, then (an + bn)n∈N is a convergent sequence too, with limit A + B.
However, once you have done this once, you will not learn much from doing the same thing for
(say) (an − bn )n∈N or (an bn )n∈N . The purpose of this section is to do that work for you. We’ll see
that a sequence which looks ‘complicated’ can often be broken down into ‘simple’ sequences by
algebraic operations like addition and subtraction, and we can find the limit of the complicated
sequences by doing the same algebra with the limits of the simple sequences (which we will
hopefully already know or be able to look up). This work-saving device is called the Algebra of
Limits.
Example 9.29. Show that the sequence (an)n∈N given by
an = (4n^2 + 9)/(3n^2 + 7n + 11)
converges to 4/3.
Proof. We could do this by going back to the definition of convergence, and writing half a page
of algebra.
But it is much easier to write:
an = n^2(4 + 9/n^2) / (n^2(3 + 7/n + 11/n^2)) = (4 + 9/n^2) / (3 + 7/n + 11/n^2) ,
and then appeal to the Algebra of Limits (Theorem 9.30 below): the numerator converges to 4 and the denominator to 3, so an → 4/3.
Theorem 9.30 (Algebra of Limits). If (an )n∈N and (bn )n∈N are convergent sequences, then the
following hold:
(a) For all α ∈ R, (αan)n∈N is a convergent sequence and lim_{n→∞} αan = α lim_{n→∞} an.
(c) (an + bn)n∈N is a convergent sequence and lim_{n→∞}(an + bn) = lim_{n→∞} an + lim_{n→∞} bn.

(d) (an bn)n∈N is a convergent sequence and lim_{n→∞}(an bn) = (lim_{n→∞} an)(lim_{n→∞} bn).

(e) For all k ∈ N, (an^k)n∈N is a convergent sequence and lim_{n→∞} an^k = (lim_{n→∞} an)^k.
(f) If for all n ∈ N, bn ≠ 0, and lim_{n→∞} bn ≠ 0, then (1/bn)n∈N is convergent and moreover, lim_{n→∞} 1/bn = 1/(lim_{n→∞} bn).
(g) For all k ∈ N, (a_{n+k})n∈N is convergent and lim_{n→∞} a_{n+k} = lim_{n→∞} an.
(h) If for all n ∈ N, an ≥ 0, then (√an)n∈N is convergent and lim_{n→∞} √an = √(lim_{n→∞} an).
That was a long theorem—which you should think of as good: there are lots of algebraic
operations you can do and you are guaranteed to get the right answer.
Before proving it (the proof comes in eight parts, so it is long, but no part is hard) it’s
probably best to highlight what the Algebra of Limits does not let you do.
Warning 9.31. The Algebra of Limits only works if the sequences (an)n∈N and (bn)n∈N are convergent. If they are not, sometimes you will end up with a nonsensical answer (like ‘infinity minus infinity’), and at least then you know something is wrong. Sometimes you will get a nice real number, but it happens to be the wrong real number.
The Algebra of Limits lets you add up (or subtract, multiply, et cetera) two sequences. By
using it repeatedly, you can also add up three sequences, or four, and so on. We’ll normally do
that without comment (as we did in Example 9.29). But let’s recall that (1)n∈N converges to 1,
and (1/n)n∈N converges to 0. So can we write
lim_{n→∞} 1 = lim_{n→∞} (1/n + ⋅ ⋅ ⋅ + 1/n)  [n times]  = lim_{n→∞} 1/n + ⋅ ⋅ ⋅ + lim_{n→∞} 1/n  [n times]  = 0 + ⋅ ⋅ ⋅ + 0  [n times]  = 0  ..?
Of course not, because that would say 1 = 0. The problem is the second equality, which looks
like the Algebra of Limits. It’s not. This is not a fixed number of sequences, and we’ve just seen
a way to misuse the Algebra of Limits to get the wrong answer. If you’re paying attention, you
will notice that the last two formulae don’t make sense: what should the n under the bracket at the bottom actually be? n is supposed to be some natural number, but which one?
What I mean by this is that in the first two formulae, n is a bound variable—that is, it is a
placeholder, it only makes sense inside of the ‘lim’ symbol. So for example, the formula lim_{n→∞} 1/n means exactly the same as lim_{z→∞} 1/z, which means the same as ‘the limit of the sequence whose terms are 1, 1/2, 1/3, 1/4, ...’. Any time you start a sequence of statements or equations with an n
(or some other letter) as a bound variable (inside a limit, or a quantifier) and at the end it’s
popped out to become a free variable (summing up n lots of zero) you can be fairly confident
that you have made a mistake.
Proof of Theorem 9.30. Throughout this proof, we assume (an )n∈N and (bn )n∈N are convergent
sequences, and their limits are La and Lb , respectively.
∣an − La ∣ < ε.
Then (as you will prove in an exercise) we have for all n > N :
(d) Before beginning this, let’s quickly notice why it is a bit tricky. We want to argue that if
an is close to La , and bn is close to Lb , then an bn is close to La Lb .
The easiest way to do this is to argue in two steps: first, an bn is close to La bn , then second
La bn is close to La Lb . If we can do that, then the triangle inequality tells us an bn is close
to La Lb .
The second part is about the same as what we already did in (a), and we can copy the
proof over. For the first part, the difficulty is that if bn is huge, then an might be close to
La but still an bn is not very close to La bn . To deal with this, we use Theorem 9.21 to say
that (bn )n∈N is bounded, which gives us an upper bound on how huge bn can be.
First, since (bn )n∈N is a convergent sequence, by Theorem 9.21 it is bounded. Let M > 0
be such that |bn| ≤ M for every n ∈ N.¹
¹ Even if bn = 0 for all n, the definition of ‘bounded’ insists that we choose a bound which is strictly positive—for example we could set M = 1 in this situation.
Given ε > 0, we choose Na such that for all n > Na we have |an − La| < ε/(2M), which we can do since (an)n∈N converges to La. We choose Nb such that for all n > Nb we have |bn − Lb| < ε/(2|La| + 1). And finally we let N = max(Na, Nb).
Now suppose n > N .
Step 1: We want to show |an bn − La bn| < ε/2.
We have |an − La| < ε/(2M) and |bn| ≤ M, so
|bn||an − La| ≤ M|an − La| < M · ε/(2M) = ε/2 .
Since |bn||an − La| = |an bn − La bn| by Theorem 8.18, that’s what we wanted for Step 1.
Step 2: We want to show |La bn − La Lb| < ε/2.
We have |bn − Lb| < ε/(2|La| + 1), so multiplying both sides by |La| we get
|La||bn − Lb| ≤ |La| · ε/(2|La| + 1) < ε/2 .
Since |La||bn − Lb| = |La bn − La Lb|, that’s what we wanted for Step 2.
Putting the two steps together, the triangle inequality gives
|an bn − La Lb| ≤ |an bn − La bn| + |La bn − La Lb| < ε/2 + ε/2 = ε ,
so (an bn)n∈N is convergent with limit La Lb.
(e) This can be shown by using induction on k and from part (d) above. It is trivially true
with k = 1. Suppose that it holds for some k, that is, (an^k)n∈N is convergent and
lim_{n→∞} an^k = (lim_{n→∞} an)^k .
Hence by part (d) above applied to the sequences (an)n∈N and (an^k)n∈N, we obtain that the sequence (an · an^k)n∈N is convergent and
lim_{n→∞} an·an^k = (lim_{n→∞} an)(lim_{n→∞} an^k) = (lim_{n→∞} an)(lim_{n→∞} an)^k = (lim_{n→∞} an)^(k+1) .
Thus (an^(k+1))n∈N is convergent and
lim_{n→∞} an^(k+1) = (lim_{n→∞} an)^(k+1) .
² δ (delta) is another Greek letter traditionally used for small quantities.
(f) This time, what could be tricky is if bn is very close to 0. To avoid that, let N1 ∈ N be such
that, for all n > N1 ,
|bn − Lb| < |Lb|/2 ,
which we can do since (bn )n∈N converges to Lb .
For all n > N1, we have
|Lb| − |bn| ≤ ||Lb| − |bn|| ≤ |bn − Lb| < |Lb|/2 ,
and so |bn| > |Lb|/2 > 0. Next, given ε > 0, let N2 ∈ N be such that, for all n > N2,
|bn − Lb| < ε|Lb|^2/2 ,
which exists since (bn )n∈N converges to Lb . Now we let N = max{N1 , N2 }.
Suppose n > N. Then we have
|1/bn − 1/Lb| = |bn − Lb| / (|bn||Lb|) = |bn − Lb| · |bn|^(−1) |Lb|^(−1) < (ε|Lb|^2/2) · (2/|Lb|) · (1/|Lb|) = ε .
So (1/bn)n∈N is convergent and lim_{n→∞} 1/bn = 1/Lb = 1/(lim_{n→∞} bn).
Remark 9.33. Of course the funny number ε|Lb|^2/2 is something we got to by saying ‘choose N2 such that if n > N2 then |bn − Lb| < δ’, then doing algebra with δ (as in (d)) to figure out how we should choose δ. We get to
|1/bn − 1/Lb| ≤ |bn − Lb| · |bn|^(−1) |Lb|^(−1) < δ |bn|^(−1) |Lb|^(−1) ≤ 2δ/|Lb|^2 ,
and so we should choose δ small enough that 2δ/|Lb|^2 ≤ ε, i.e. δ = ε|Lb|^2/2 works.
Activity 9.8. Show the remaining part of (h), i.e. that if (an )n∈N is a sequence of nonnegative
reals, converging to 0, then
lim_{n→∞} √an = 0 .
Example 9.34. Determine whether the following sequence is convergent and find its limit.
((n^2 − 24n^3 + 3n^4 − 12) / (1 + 7n + 21n^4))n∈N
Proof. By Activity 9.2 on page 138 we know that lim_{n→∞} 1/n = 0. We now use Theorem 9.30 after first factorizing out n^4 in both the numerator and denominator:
lim_{n→∞} (n^2 − 24n^3 + 3n^4 − 12)/(1 + 7n + 21n^4)
= lim_{n→∞} [ n^4 (1/n^2 − 24/n + 3 − 12/n^4) ] / [ n^4 (1/n^4 + 7/n^3 + 21) ]
= [ (lim_{n→∞} 1/n)^2 − 24 lim_{n→∞} 1/n + 3 − 12 (lim_{n→∞} 1/n)^4 ] / [ (lim_{n→∞} 1/n)^4 + 7 (lim_{n→∞} 1/n)^3 + 21 ]
= (0^2 − 24 · 0 + 3 − 12 · 0^4) / (0^4 + 7 · 0^3 + 21)
= 3/21 = 1/7.
Example 9.35. Determine whether the following sequence is convergent and find its limit.
((2^n + 3^n + 1) / (3^(n+1) + 3))n∈N
Proof. Divide numerator and denominator by the fastest growing term appearing, which is 3^n, and use that lim_{n→∞} x^n = 0 for |x| < 1:
lim_{n→∞} (2^n + 3^n + 1)/(3^(n+1) + 3) = lim_{n→∞} ((2/3)^n + 1 + (1/3)^n)/(3 + 3(1/3)^n)
= [ lim_{n→∞} (2/3)^n + lim_{n→∞} 1 + lim_{n→∞} (1/3)^n ] / [ lim_{n→∞} 3 + 3 lim_{n→∞} (1/3)^n ]
= (0 + 1 + 0)/(3 + 0) = 1/3.
Remark 9.36. It follows from Theorem 9.30.(c) that if we have three convergent sequences (an)n∈N, (bn)n∈N, (cn)n∈N, then their sum (an + bn + cn)n∈N is also convergent with limit
lim_{n→∞} an + lim_{n→∞} bn + lim_{n→∞} cn .
This is also true for the sum of four convergent sequences, the sum of five convergent sequences,
and by an easy induction proof, the sum of any fixed number of convergent sequences.
In general, we can apply any fixed number of algebraic operations via the Algebra of Limits,
as indeed we did in Example 9.35 (three additions and one division, all in one step).
But remember (Warning 9.31) that it doesn’t work (or make sense) for n (or anything not
fixed) sequences.
Theorem 9.37 (Sandwich Theorem). Suppose (an)n∈N and (bn)n∈N are convergent sequences with the same limit, and that (cn)n∈N is a sequence such that
for all n ∈ N, an ≤ cn ≤ bn .
Then (cn)n∈N is also convergent with the same limit, that is,
lim_{n→∞} cn = lim_{n→∞} an = lim_{n→∞} bn .
Proof. Let L denote the common limit of (an )n∈N and (bn )n∈N :
lim an = L = lim bn .
n→∞ n→∞
Given ε > 0, let N1 ∈ N be such that for all n > N1, |an − L| < ε. Hence for n > N1,
L − an ≤ |L − an| = |an − L| < ε ,
so an > L − ε. Similarly, let N2 ∈ N be such that for all n > N2, |bn − L| < ε, so that bn < L + ε for all n > N2. Let N = max(N1, N2). Then for all n > N,
L − ε < an ≤ cn ≤ bn < L + ε ,
that is,
|cn − L| < ε .
Hence (cn)n∈N converges to L.
Example 9.38. Use the Sandwich theorem to show that lim_{n→∞} n/10^n = 0.
Example 9.39. Let a and b be real numbers. Show that lim_{n→∞} (|a|^n + |b|^n)^(1/n) = max(|a|, |b|).
Proof. Without loss of generality, suppose max(|a|, |b|) = |a|. (That is, 0 ≤ |b| ≤ |a| holds.)
We have |a|^n ≤ |a|^n + |b|^n ≤ |a|^n + |a|^n = 2|a|^n. Taking nth roots of this, we see that for all n,
|a| ≤ (|a|^n + |b|^n)^(1/n) ≤ 2^(1/n) |a| .
Now (|a|)n∈N converges to |a|, and (2^(1/n) |a|)n∈N converges to |a| as well, by Example 9.16 and
the Algebra of Limits.
So using the Sandwich theorem, it follows that
lim_{n→∞} (|a|^n + |b|^n)^(1/n) = |a| = max(|a|, |b|) .
In particular, with a = 24 and b = 2005, we have that lim_{n→∞} (24^n + 2005^n)^(1/n) = 2005, that is, the sequence
sequence
2029, 2005.1436, 2005.001146260873, . . .
is convergent with limit 2005.
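The terms quoted above are easy to reproduce. This Python sketch (illustrative only) computes the first few, keeping 24^n + 2005^n as an exact integer before taking the nth root:

for n in [1, 2, 3, 5, 10]:
    term = (24**n + 2005**n) ** (1 / n)   # exact integer, then an n-th root as a float
    print(n, term)
# 2029.0, 2005.1436..., 2005.0011..., dropping quickly towards max(24, 2005) = 2005,
# just as the Sandwich argument predicts.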
Example 9.40. Show that lim_{n→∞} ( n/(n^2 + 1) + n/(n^2 + 2) + ⋅ ⋅ ⋅ + n/(n^2 + n) ) = 1.
Proof. There are n terms in the sum: the smallest is n/(n^2 + n) and the largest is n/(n^2 + 1). Thus, for all n ∈ N, we have
n^2/(n^2 + n) ≤ n/(n^2 + 1) + n/(n^2 + 2) + ⋅ ⋅ ⋅ + n/(n^2 + n) ≤ n^2/(n^2 + 1) ,
and since
lim_{n→∞} n^2/(n^2 + n) = 1 = lim_{n→∞} n^2/(n^2 + 1) ,
the Sandwich theorem gives the result.
Definition 9.41. Let (an )n∈N be a sequence and let (nk )k∈N be a strictly increasing sequence of
natural numbers. Then (ank )k∈N is called a subsequence of (an )n∈N .
Another way to think about this is: a subsequence is what you get from a sequence by
crossing out some terms (but not rearranging anything).
Example 9.42.
(i) (1/(2n))n∈N, (1/n^2)n∈N, (1/n!)n∈N and (1/n^n)n∈N are all subsequences of (1/n)n∈N.
(ii) Let pn be the n-th prime number. (Thus p1 = 2, p2 = 3, p3 = 5, p4 = 7, etc.) Then the sequence (an)n∈N defined by an = 1/pn is a subsequence of (1/n)n∈N.
(iv) The sequence ((−1)^(2n))n∈N, that is, the constant sequence
1, 1, 1, . . .
and the sequence ((−1)^(2n−1))n∈N, that is, the constant sequence
−1, −1, −1, . . .
are both subsequences of ((−1)^n)n∈N.
Theorem 9.43. If (an)n∈N is a convergent sequence with limit L, then any subsequence (a_{n_k})k∈N of (an)n∈N is also convergent with the limit L:
lim_{k→∞} a_{n_k} = L .
This theorem lets us build new convergent sequences from old ones; it also gives us a new
way to prove divergence of sequences.
Example 9.44.
(i) (1/(2n))n∈N, (1/n^2)n∈N, (2^(−n))n∈N, (1/n!)n∈N and (1/n^n)n∈N are convergent sequences with limit 0.
(ii) The sequence ((−1)n )n∈N is divergent, since the subsequence 1, 1, 1, . . . has limit 1, while
the subsequence −1, −1, −1, . . . has limit −1.
Proof. The sequence (an )n∈N is bounded: indeed, 0 ≤ an < 1 for every n ∈ N. So by the Bolzano–
Weierstrass theorem this sequence has a convergent subsequence.
We have seen that if (an )n∈N is convergent with limit L, then any subsequence also converges
to L (Theorem 9.43). We’ve also seen examples of divergent sequences (an )n∈N for which there
are exactly two limit points, that is numbers p ∈ R such that there is a subsequence of (an )n∈N
converging to p. Both sequences (iii) and (iv) from Example 9.2 have limit points 1 and −1, and
nothing else. It’s easy to give (in both cases) a subsequence which converges to 1, and another
one that converges to −1. Why are there no other limit points?
Activity 9.9. Show that, if p ∈ R is not equal to 1 or −1 then there is no subsequence of either
((−1)n )n∈N or of ((−1)n (1 + 1/n))n∈N that converges to p.
A sequence which is not bounded doesn’t have to have a convergent subsequence (because
the condition of the Bolzano-Weierstrass theorem isn’t satisfied).
Activity 9.10. Find a sequence (an )n∈N which is not bounded and which has no convergent
subsequence.
Find another sequence (bn )n∈N which is not bounded and which does have a convergent
subsequence.
It’s not too hard to come up with sequences which have two, or three, or ten, different limit
points. But there can be many more.
Activity 9.11. Show that for any real number x and any ε > 0, there are infinitely many rational
numbers in the interval (x − ε, x).
Suppose that (an )n∈N is any sequence such that every rational number is a term of the sequence.
Prove that for every real number x, there is a subsequence of (an )n∈N which converges to x.
You could reasonably object to the above: but maybe there isn’t any such sequence (an )n∈N ?
But there are in fact such sequences. Here is an example. For any rational number p/q written in lowest terms (i.e. q is positive, and p and q have no common factor bigger than 1) say the weight of p/q is |p| + q. For any w ∈ N, there are at most 2w rational numbers of weight w: we have to choose 1 ≤ q ≤ w, and then we are left with two possibilities, either p = w − q or p = −(w − q). So we can make a sequence listing all the rational numbers by first writing down all the ones with weight 1 (there is only one, 0/1) and then weight 2, weight 3, and so on.
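The listing just described is easy to carry out mechanically. Here is a Python sketch of it (the function name rationals_up_to_weight is just a label chosen here; it lists the rationals up to a given weight, in order of increasing weight):

from fractions import Fraction
from math import gcd

def rationals_up_to_weight(max_w):
    # list every rational p/q in lowest terms with weight |p| + q <= max_w,
    # in order of increasing weight
    out = []
    for w in range(1, max_w + 1):
        for q in range(1, w + 1):
            for p in (w - q, -(w - q)):
                frac = Fraction(p, q)
                # keep p/q only if it is already in lowest terms and not listed yet
                if gcd(abs(p), q) == 1 and frac not in out:
                    out.append(frac)
    return out

print(rationals_up_to_weight(4))
# [Fraction(0, 1), Fraction(1, 1), Fraction(-1, 1), Fraction(2, 1), ...]:
# every rational of weight at most 4 appears exactly once.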
In fact, the sequence from Example 9.46 has a similar property: for any x ∈ [0, 1], there is a
subsequence converging to x. This is rather harder to prove, though!
Activity 9.12. Given a sequence (an )n∈N , let n1 be an index such that an1 = max(a1 , a2 , . . . ), if
it exists. Let n2 be an index such that an2 = max(an1 +1 , an1 +2 , . . . ), if it exists, and so on: given
k ≥ 3 and nk−1 , let nk be an index such that ank = max(ank−1 +1 , ank−1 +2 , . . . ) if it exists.
Prove that either we obtain a monotone decreasing subsequence (ank )k∈N of (an )n∈N , or there
is some K ∈ N such that (aK , aK+1 , aK+2 , . . . ) has no biggest term.
Activity 9.13. Given a sequence (bn )n∈N which has no biggest term, let n1 = 1 and for each
k ≥ 2, given nk−1 , let nk be the smallest index such that bnk > bnk−1 , if it exists. Prove that this
gives a subsequence (bnk )k∈N which is strictly increasing.
Activity 9.14. Use the statements you proved in Activities 9.12 and 9.13 to prove Theorem 9.47.
Use Theorems 9.25 and 9.47 to prove Theorem 9.45.
Exercise 9.2. (a) Let (an )n∈N be a convergent sequence with limit L, and let M be some real
number with M ≠ L. Show that the set {n ∈ N | an = M} is bounded above.
Exercise 9.3. Use the definition of limit to prove directly that 1 is not a limit of the se-
quence (1/n)n∈N .
Exercise 9.4. In each of the cases listed below, give an example of a divergent sequence (an )n∈N
that satisfies the given conditions.
(a) For every ε > 0, there exists an N such that, for infinitely many n > N , ∣an − 1∣ < ε.
(b) There exists an ε > 0 and an N ∈ N such that for all n > N , ∣an − 1∣ < ε.
Exercise 9.5. A sequence (an )n∈N is said to be a Cauchy sequence if for every ε > 0, there
exists an N ∈ N such that for all n, m > N , ∣an − am ∣ < ε.
Show that every convergent sequence is Cauchy.
Hint: ∣an − am ∣ = ∣an − L + L − am ∣ ≤ ∣an − L∣ + ∣am − L∣.
Exercise 9.7. Suppose that the sequence (an )n∈N is bounded. Prove that the sequence (cn )n∈N
defined by
cn = (an^3 + 5n) / (an^2 + n)
is convergent, and find its limit.
Exercise 9.8. Recall the definition of a Cauchy sequence from Exercise 9.5, where we have
already seen that every convergent sequence is Cauchy. Use the Bolzano–Weierstrass theorem to
prove the converse: if a sequence is Cauchy, then it is convergent.
Hint: Proceed as follows. Let (an )n∈N be a Cauchy sequence. Show that (an )n∈N is bounded. By
the Bolzano–Weierstrass theorem, it follows that (an )n∈N has a convergent subsequence, say
(ank )k∈N with limit L. Prove (using the fact that (an )n∈N is Cauchy), that then (an )n∈N is itself
convergent with limit L.
(a) What does it mean to say that a sequence (an )n≥1 is convergent? Use this definition to show that
if (an )n≥1 is convergent, then (an+1 )n≥1 is also a convergent sequence and lim an = lim an+1 .
n→∞ n→∞
Let b be a real number with 2 < b < 3. We define a sequence (bn)n≥1 by b1 = b and b_{n+1} = bn^2 − 4bn + 6 for all n ≥ 1.
(b) Show that 2 < bn < 3 for all n ∈ N.
(c) Show that (bn)n≥1 is a decreasing sequence.
(d) Show that (bn)n≥1 is convergent, and determine its limit.
(e) Let S = {bn ∣ n ∈ N}. Find sup S, inf S, max S, min S. Justify your answers.
(i) For this, you should find that given ε > 0, choosing N = ⌈1/ε⌉, or anything bigger, will
work. If your N is smaller, then your proof is wrong.
(ii) As (i).
(iii) You need to consider all possible values of L ∈ R, and rule out all of them. If you only
consider L = 1 and L = −1, the ‘obvious’ limits, then you haven’t shown that (for example)
this sequence doesn’t converge to 0. Whatever L is, you should find that the definition of
convergence, with ε = 1, fails. Any smaller ε will also fail (but you only need to show that
some one ε is a counterexample). If you tried to use ε > 1, then your proof will not work
for L = 0.
(iv) As (iii).
(v) For any ε > 0, you can simply choose N = 1. But if you wrote something bigger (e.g. that N = ⌈1/ε⌉) your proof works, it’s just a bit more complicated than necessary.
(vi) This is a bit tricky. Given L ∈ R, we want to rule out that (n)n∈N converges to L. We will
use ε = 1. This is a counterexample to the definition of convergence to L for the following
reason.
Whatever N is given, it is not true that for all n > N we have ∣n − L∣ < 1. Indeed, we can
choose n = max(N + 1, ⌈∣L∣⌉ + 1). Now by definition we have n > N , and by definition we
have n ≥ ∣L∣ + 1, so ∣n − L∣ ≥ 1.
(vii) This is really the ‘wrong time’ to try to prove that this sequence is convergent; we will
prove it after we develop some tools that help us find when sequences are convergent.
(viii) Again, this is really the wrong time to look at this sequence. The idea is that (eventually!)
the exponential will tend to zero much faster than the polynomial grows, so it will win
in the long run. But formally proving this needs a bit of algebra, and it helps to know
Bernoulli’s Inequality.
(ix) For this, you should find that given ε > 0, choosing N = ⌈4/ε⌉, or anything bigger, will work.
If your choice of N tries to take into account whether N is odd or even, then probably
your proof is wrong. If it takes into account whether n is odd or even, then your proof is
definitely wrong: see Warning 9.13.
a_{s+1} = a_s + 1/(s+1)^(s+1) ≤ 3/2 − 1/2^s + 1/(s+1)^(s+1) ≤ 3/2 − 1/2^s + 1/2^(s+1) = 3/2 − 1/2^(s+1) ,
which is what we wanted for the induction step. By induction, we conclude the desired inequality
holds for all n.
Comment on Activity 9.7. Suppose lim an = A and lim bn = B.
n→∞ n→∞
Given ε > 0, let Na be such that for all n > Na we have |an − A| < ε/2, and let Nb be such that for all n > Nb we have |bn − B| < ε/2. Both Na and Nb exist by the definition of convergence. We choose N = max(Na, Nb).
Suppose n > N. Then we have
|(an + bn) − (A + B)| ≤ |an − A| + |bn − B| < ε/2 + ε/2 = ε ,
so (an + bn)n∈N converges to A + B.
In each case, we found the desired infinitely many rational numbers between x and x − ε.
Now suppose that (an )n∈N is a sequence whose terms contain all the rational numbers. Given
any real number x and any ε > 0, we just proved that there are infinitely many terms of the
sequence in (x − ε, x).
In particular, we can construct a subsequence (ank )k∈N as follows. Let n1 be any index such
that x − 1 < an1 < x (using the above observation with ε = 1, such an index exists).
Now for each k ≥ 2 in turn, given n_{k−1}, look at all the terms of (an)n∈N which are in (x − 1/k, x). There are infinitely many, and only finitely many have an index less than or equal to n_{k−1}. So we can choose nk > n_{k−1} such that x − 1/k < a_{n_k} < x.
The sequence (a_{n_k})k∈N is sandwiched by (x − 1/k)k∈N and (x)k∈N, so by the Sandwich Theorem we have lim_{k→∞} a_{n_k} = x.
Comment on Activity 9.12. If at each stage in the construction the ‘if it exists’ is true, then we
obtain a subsequence (ank )k∈N . By construction an1 is the biggest term in the entire sequence;
in particular it is at least as big as all the following terms, so whatever n2 > n1 we choose we
will have a_{n_1} ≥ a_{n_2}. Similarly, for each k ≥ 2, when we choose a_{n_k} it is at least as big as all of the following terms, and in particular whatever n_{k+1} we choose we get a_{n_k} ≥ a_{n_{k+1}}. So this is a
monotone decreasing subsequence.
What is left is the possibility that at some stage k in the construction the ‘if it exists’ is
false. That is, (ank−1 +1 , ank−1 +2 , ank−1 +3 , . . . ) has no maximum element. Letting K = nk−1 + 1, that
is precisely saying that (aK , aK+1 , . . . ) has no biggest term.
Comment on Activity 9.13. The point here which you need to see is the following. If for some
k ≥ 1 we have followed the construction up to stage k—that is, we have constructed nk —then
we know that bnk is not a biggest element in the sequence; there is certainly, somewhere in the
sequence, a bigger element. But we need to be sure that there is a bigger element which comes
after bnk . This is the reason for choosing the smallest index every time: we know bnk is (by
construction) bigger than all the terms from bnk−1 to bnk −1 , and so (by an induction argument)
it is the biggest term in {b1 , b2 , . . . , bnk }. So whatever the bigger term is that we know exists, it
has to come after bnk .
This shows that the ‘if it exists’ will always be true—there will always be such a term—and
then by construction we get a strictly increasing subsequence.
Comment on Activity 9.14. The proof of Theorem 9.47 is now the following. By Activity 9.12,
either (an )n∈N has a monotone decreasing subsequence, or it has a subsequence which has no
biggest term. But then by Activity 9.13, that subsequence has a subsequence which is strictly
increasing—and this is a subsequence of a subsequence of (an )n∈N , so by definition it is a
subsequence of (an )n∈N . Either way, we found a monotone subsequence.
Now to prove the Bolzano-Weierstrass theorem, suppose (an )n∈N is a bounded sequence. Then
any subsequence is also bounded, and by Theorem 9.47 there is a monotone subsequence (ank )k∈N .
Now (ank )k∈N is a bounded monotone sequence, so by Theorem 9.25 it is convergent.
tends to 3, as claimed.
Solution to Exercise 9.2.
(a) How are we going to use the fact that the sequence (an )n∈N converges to L? One idea you
might have is to say that, for large enough N , the elements of the sequence have to be “close
to L”, so that they cannot be equal to M . How close should close be: closer than the distance
∣L − M ∣ between L and M .
So set ε = ∣M − L∣. By the definition of a limit, there is some N ∈ N such that, for n > N ,
|an − L| < |M − L|, and that certainly implies that an ≠ M.
That means that, if n has the property that an = M , then n ≤ N . In other words, N is an
upper bound for the set {n ∈ N ∶ an = M }, as required.
(b) We can use part (a): that’s the point. Suppose the sequence does tend to a limit, and call
the limit L. If L ≠ 1, then {n ∈ N | (−1)^n = 1}, which is the set of even numbers, is bounded above.
That is false, which leaves only the possibility that L = 1. But then the set {n ∈ N | (−1)^n = −1},
which is the set of odd numbers, is bounded above, which again is false. We conclude that the
sequence does not converge to any limit, i.e., it is divergent.
Solution to Exercise 9.3.
To show that the sequence (an )n∈N does not tend to the limit L, we need to show that there
exists ε > 0 such that, for all N ∈ N, there is some n > N with ∣an − L∣ ≥ ε.
We can take ε = 1/2 here. Now, whatever N ∈ N is proposed, take n > max(2, N ). We see
that an < 1/2, and so ∣an − 1∣ > 1/2. Thus indeed an does not converge to 1.
Solution to Exercise 9.4.
(a) Take, for instance, the sequence (an)n∈N where an = 1 for n odd, and an = 0 for n even.
Given any ε > 0, take N = 1: then |an − 1| = 0 < ε for all odd n > 1.
(b) Take the same sequence (why not?), and take ε = 2 and N = 1. Then indeed |an − 1| < 2 for all n > 1.
[A sequence satisfies (a) if L = 1 is a limit point of the sequence. A sequence satisfies (b)
(whatever L is) if and only if the sequence is bounded.]
Solution to Exercise 9.5.
Let (an )n∈N be a convergent sequence, with limit L.
The idea is that, if we go far enough down the sequence, all the terms beyond that point are
close to L, and so they are all close to each other. More precisely (this is what the hint tells us),
if all the terms are within ε/2 of L, then the distance between any two terms is at most ε. We
now make this idea into a formal proof.
Take any ε > 0. As (an )n∈N converges to L, there is some N ∈ N such that, for n > N ,
∣an − L∣ < ε/2. Hence, for any n, m > N , we have (as in the hint)
|an − am| ≤ |an − L| + |am − L| < ε/2 + ε/2 = ε .
Hence the sequence (an )n∈N is a Cauchy sequence, as claimed.
Solution to Exercise 9.6.
You might want to generate a few terms of the sequence (an ) in order to get a feel for what
is going on. Whether or not this helps, you need to have the idea that the terms of the sequence
(an ) get smaller, while staying positive, and so that perhaps the sequence is decreasing and
bounded below. In fact, you might (correctly) suspect that the sequence converges to 0, but
you’re not asked to prove that.
Indeed, it is clear that since the first term a1 is positive, and each subsequent term is a
positive multiple of the previous term, all terms are positive. (By now, you should be confident
that you can write a formal induction argument if you really needed, but as far as I’m concerned
this will suffice here.)
Also, since 2n + 1 ≤ 3n for all n ≥ 2, it follows that an ≤ an−1 for all n ≥ 2, in other words that
(an ) is a decreasing sequence, bounded below by 0. Therefore the sequence is convergent.
Solution to Exercise 9.7.
There are various ways to tackle this exercise. But they have one thing in common: the first
thing to do is to understand how the sequence (cn )n∈N behaves for large n. Since the sequence
(an)n∈N is bounded, say with |an| ≤ M for every n ∈ N, then also |an^3| ≤ M^3 and an^2 ≤ M^2. So the numerator “behaves like” 5n, while the denominator “behaves like” n, in the sense that the other terms are “of smaller order” for large n. So we expect the limit to be equal to 5.
Here are three ways to show this.
(a) Write cn − 5 = (an^3 − 5an^2)/(an^2 + n), so
|cn − 5| ≤ |an^3 − 5an^2|/n ≤ (M^3 + 5M^2)/n .
Now, given ε > 0, we can choose N > (M^3 + 5M^2)/ε, so that, for n > N, we have
|cn − 5| ≤ (M^3 + 5M^2)/n ≤ (M^3 + 5M^2)/N < ε .
in particular close to some term of the subsequence. Therefore all the terms of the sequence are
close to L. Let’s take that vague plan and make it into a proof.
Take any ε > 0. As ank → L, there is some K such that, for k > K, ∣ank − L∣ < ε/2. Also, as
(an )n∈N is a Cauchy sequence, there is some N ∈ N such that, for all m, n > N , ∣an − am ∣ < ε/2.
Now, choose a natural number k > K so that also nk > N . Note that ∣ank − L∣ < ε/2. For
n > N , we have ∣an − ank ∣ < ε/2 (as both n and nk are greater than N ), and so
|an − L| ≤ |an − ank| + |ank − L| < ε/2 + ε/2 = ε .
We have now shown that an → L, as required.
(Imagine writing the displayed calculation first, then filling in the proof.)
Solution to Exercise 9.9. Since this was an exam question, I’ll give the markscheme too.
(a) [2pts] A sequence (an )n≥1 is convergent if there is L ∈ R so that for every ε > 0 there is
N ∈ N such that for all n > N we have ∣an − L∣ < ε.
[3pts] Suppose that (an )n∈N is convergent with limit L. Take any ε > 0. By the definition of
convergence, there is some N ∈ N such that, for all n > N , ∣an −L∣ < ε. Now, for any n > N , we also
have n + 1 > N and so ∣an+1 − L∣ < ε. Therefore, (an+1 )n≥1 is convergent and lim an = lim an+1 = L.
n→∞ n→∞
(b) [5pts: 1 for using induction and noting that the base case is valid, 2 each for the lower
and upper bound in the induction step]
We proceed by induction on n. For n = 1, we have 2 < b = b1 < 3. Suppose that for n = k, we have 2 < bk < 3. Then, for n = k + 1, we obtain
b_{k+1} − 2 = bk^2 − 4bk + 4 = (bk − 2)^2 > 0 and b_{k+1} − 3 = bk^2 − 4bk + 3 = (bk − 1)(bk − 3) < 0 ,
so 2 < b_{k+1} < 3, completing the induction step.
(c) From part (b), we have bn − 2 > 0 and bn − 3 < 0, hence b_{n+1} − bn = (bn − 2)(bn − 3) < 0 and (bn)n≥1 is a decreasing sequence.
(d) [7pts: see breakdown in square brackets]
Parts (b) and (c) show that (bn) is a bounded, monotone sequence. By Theorem 9.25, its limit lim_{n→∞} bn = B exists. [2] By part (a), lim_{n→∞} b_{n+1} = B as well. [1] Hence
B = lim_{n→∞} b_{n+1} = lim_{n→∞} (bn^2 − 4bn + 6) = B^2 − 4B + 6 ,
where we applied the results about the algebra of limits. The above quadratic equation has
solutions B = 2 and B = 3. [1]
However, since (bn )n≥1 is a decreasing sequence, we have bn ≤ bn−1 ≤ ⋅ ⋅ ⋅ ≤ b1 = b < 3 for all
natural numbers n. So B = lim bn ≤ b < 3. Hence B = 2. [1]
n→∞
(e) [4pts: 1 for each of max/sup/inf/min] max S = sup S = b1 = b, inf S = B = 2, and min S does not exist because 2 < bn for all n ∈ N and, therefore, inf S = 2 ∉ S.
Notes: A mark of 10/25 is equivalent to a bare pass; a mark of 17/25 is a First. As an exam
setter, I am asked to ensure that there are at least 10 relatively easy marks available: here I
would claim that (a) and (e) are certainly in this category, and at least some of the marks in
(b). On the other hand, some parts of each question are supposed to be hard, and here I would
nominate (d). In an ideal world, most marks would fall between 10 and (say) 20: in practice,
the range on any one question is always wider than this.
There is an alternative approach to the question, pointed out to me by a class teacher based
on some student answers. Set cn = bn − 2, and show that c_{n+1} = cn^2. Then show by induction that cn = (c1)^(2^(n−1)) for every n ∈ N. Therefore bn = 2 + (b1 − 2)^(2^(n−1)) for each n. From this exact formula, since 0 < b1 − 2 < 1, it is clear that bn → 2.
Figure 10.1: A function with a break at c. If x lies to the left of c, then f (x) is not close to f (c),
no matter how close x comes to c.
In everyday speech, a ‘continuous’ process is one that proceeds without gaps, interruptions or sudden changes. What does it mean for a function f ∶ R → R to be continuous? The common
informal definition of this concept states that a function f is continuous if one can sketch its
graph without lifting the pencil. In other words, the graph of f has no breaks in it. If a break
does occur in the graph, then this break will occur at some point. Thus (based on this visual
view of continuity), we first give the formal definition of the continuity of a function at a point.
Next, if a function is continuous at each point, then it will be called continuous.
If a function has a break at a point, say c, then even if points x are close to c, the points
f (x) might not get close to f (c), as illustrated in Figure 10.1.
Figure 10.2: The definition of continuity of a function at point c. If the function is continuous at
c, then given any ε > 0 (which determines a strip around the line y = f (c) of width 2ε), there
exists a δ > 0 (which determines an interval of width 2δ around the point c) such that whenever
x lies in this interval (so that x satisfies c − δ < x < c + δ, that is, ∣x − c∣ < δ), then f (x) satisfies
f (c) − ε < f (x) < f (c) + ε, that is, ∣f (x) − f (c)∣ < ε.
This motivates the following definition of continuity, which guarantees that if a function
is continuous at a point c, then we can make f (x) as close as we like to f (c), by choosing x
sufficiently close to c. This is illustrated in Figure 10.2.
Definition 10.1. Let I be an interval in R, let f ∶ I → R be a function, and let c ∈ I. We say that
f is continuous at c if for every ε > 0 there exists δ > 0 such that for all x ∈ I with ∣x − c∣ < δ we
have ∣f (x) − f (c)∣ < ε. We say that f is continuous (on I) if f is continuous at every point c ∈ I.
These definitions are (even) a bit more complicated than the definition of convergence of
a sequence you met in the last chapter. You’ll see that you need to work with ‘continuous at
a point’ in much the same way as ‘convergence to a limit’; the proof that some function is
‘continuous at a point c’ is rather like the proof that a sequence is convergent to a limit.
The statement that a function f is continuous on a whole interval I is, written out with
quantifiers:
∀c ∈ I ∀ε > 0 ∃δ > 0 such that ∀x ∈ I such that ∣x − c∣ < δ, ∣f (x) − f (c)∣ < ε .
This is the most complicated statement you’ve seen so far. It is important to keep in mind
what it is supposed to mean intuitively: name a point c, and a ‘how close’ ε, and then if x is
‘close enough (δ)’ to c, we’re guaranteed that f (x) is close to f (c). This will help you remember
what order the quantifiers above come in. It matters. You can (as you can guess from the
sentence above) swap the first two ‘for all’ quantifiers without changing anything, but it is very
important that first you name c and ε, then you decide on how small δ has to be, and only after
that comes a second point x. In particular, the formula you write for δ cannot depend on x (it
can, and usually will, depend on c and ε).
Let’s see an example. As with convergence, I’ll write it out in the formal style in order that
this issue of ‘what does δ depend on?’ is easy to keep clear.
Example 10.2. Show that the function f ∶ R → R given by f (x) = x for all x ∈ R is continuous.
Proof. Let c ∈ R and let ε > 0 be given. We choose δ = ε. Then for any x ∈ R with ∣x − c∣ < δ, we
have ∣f (x) − f (c)∣ = ∣x − c∣ < δ = ε.
This is what we need for the definition of ‘f is continuous at c’, and since we proved it for an
arbitrary c ∈ R, we conclude that f is continuous at c for all c ∈ R, i.e. f is continuous on R.
As usual, we do not know what we should choose δ to be when we write the line ‘we choose
δ = . . . ’; we leave it blank at first and fill it in later once we see what works.
Example 10.3. Show that the function f ∶ R → R given by f (x) = 2x + 1 for all x ∈ R is
continuous.
Proof. Let c ∈ R and let ε > 0 be given. We choose δ = ε/2. Then for any x ∈ R with ∣x − c∣ < δ,
we have ∣f (x) − f (c)∣ = ∣(2x + 1) − (2c + 1)∣ = 2∣x − c∣ < 2δ = ε.
Again, this is what we need for the definition of ‘f is continuous at c’, and since we proved it
for an arbitrary c ∈ R, we conclude that f is continuous on R.
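Here is a small computational illustration (mine, not part of the notes' formal development) of the last example: with the choice δ = ε/2, sampled points x with ∣x − c∣ < δ do satisfy ∣f (x) − f (c)∣ < ε, and δ is computed from ε alone, never from x.

```python
# Illustration (not a proof) of the epsilon-delta definition for f(x) = 2x + 1,
# with the choice delta = eps / 2.  Note that delta depends on eps (and in general
# on c), but never on x.
import random

def f(x):
    return 2 * x + 1

random.seed(0)
for _ in range(1000):
    c = random.uniform(-100, 100)
    eps = random.uniform(1e-6, 10)
    delta = eps / 2
    # pick a point with |x - c| < delta (scaled by 0.9 so that floating-point
    # rounding at the boundary cannot spoil the strict inequality)
    x = c + 0.9 * random.uniform(-delta, delta)
    assert abs(f(x) - f(c)) < eps
print("every sampled x with |x - c| < delta satisfied |f(x) - f(c)| < eps")
```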
Example 10.4. Show that the function f ∶ R → R given by f (x) = 1 for all x ∈ R is continuous.
Proof. Let c ∈ R = (−∞, ∞). We have to prove that f is continuous at c. Let ε > 0 be given. In
this case, any positive choice of δ will work; for instance, let δ = 1. Then if x ∈ R and ∣x − c∣ < δ = 1,
we have:
∣f (x) − f (c)∣ = ∣1 − 1∣ = ∣0∣ = 0 < ε.
So f is continuous at c. Since the choice of c ∈ R was arbitrary, it follows that f is continuous
on R.
The next example is a function which is not continuous at every point: define f ∶ R → R by
f (0) = 0 and f (x) = 1 for all x ≠ 0. We show that f is not continuous at 0, but is continuous at
every other point.
Proof. Suppose that f is continuous at 0. Then for any given ε > 0 there exists a δ > 0 such that
whenever ∣x − 0∣ < δ, ∣f (x) − f (0)∣ < ε.
We just need to find one example of ε > 0 for which the above statement fails (one counterexample
to the ‘for all’). One example that will work is ε = 1/2.
To show ε = 1/2 is a counterexample, we need to show there does not exist δ > 0 such that
whenever ∣x − 0∣ < δ, ∣f (x) − f (0)∣ < 1/2.
So let δ > 0 be given; we need to show ‘whenever ∣x − 0∣ < δ, ∣f (x) − f (0)∣ < 1/2’ is false. In
other words, we need to find one counterexample x, i.e. x with ∣x∣ < δ such that f (x) is not
within 1/2 of f (0). We can take x = δ/2. To see that this choice of x is a counterexample, we need
to observe that indeed ∣x∣ < δ, and furthermore
∣f (x) − f (0)∣ = ∣1 − 0∣ = 1
which is not smaller than 1/2.
So f is not continuous at 0.
Next we show that for all c ∈ R ∖ {0}, f is continuous at c. Let ε > 0 be given. Take δ = ∣c∣/2 > 0.
Then if x ∈ R and ∣x − c∣ < δ, we have
∣c∣ − ∣x∣ ≤ ∣∣c∣ − ∣x∣∣ ≤ ∣c − x∣ = ∣x − c∣ < δ = ∣c∣/2
and so
∣x∣ > ∣c∣/2 > 0.
Thus x ≠ 0 and so f (x) = 1. Hence if x ∈ R and ∣x − c∣ < δ, we obtain
∣f (x) − f (c)∣ = ∣1 − 1∣ = ∣0∣ = 0 < ε.
Consequently f is continuous at c.
In the above, note that the proof of continuity follows the same pattern as the previous
examples. It’s easy to get the proof of discontinuity at 0 wrong. If you are not sure what to
do, write out the statement of ‘f is continuous at 0’ clearly, with all the quantifiers, and then
negate it (i.e. follow the rules from Chapter 3), and check that the negation is a true statement.
This is what we did above. If you are happy with this logic, then you can afford to shorten it:
‘To prove f is not continuous at 0, pick ε = 1/2. Given δ > 0, pick x = δ/2. Then ∣x∣ < δ, but
∣f (x) − f (0)∣ = 1 > ε.’
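The same shortened argument can be played out numerically. The sketch below is an illustration only (the function f here is the jump function of this example, with f (0) = 0 and f (x) = 1 for x ≠ 0): for several values of δ, the witness x = δ/2 breaks the inequality with ε = 1/2.

```python
# For the jump function f (f(0) = 0, f(x) = 1 otherwise), no delta > 0 works
# for eps = 1/2: the witness x = delta / 2 always breaks the inequality.

def f(x):
    return 0.0 if x == 0 else 1.0

eps = 0.5
for delta in [1.0, 0.1, 1e-3, 1e-9, 1e-15]:
    x = delta / 2                      # a point with |x - 0| < delta
    assert abs(x - 0) < delta
    assert abs(f(x) - f(0)) >= eps     # ... yet f(x) is not within eps of f(0)
print("eps = 1/2 is a counterexample for every delta tried")
```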
Here is a similar example on the interval (0, ∞): we show that the function f ∶ (0, ∞) → R given
by f (x) = 1/x is continuous. Let c ∈ (0, ∞) and let ε > 0 be given. We choose δ = min{c/2, εc²/2},
which is positive. Then for any x ∈ (0, ∞) with ∣x − c∣ < δ we have (as in the previous example)
x > c/2, and so
∣1/x − 1/c∣ = ∣c − x∣/(xc) = ∣x − c∣ ⋅ (1/x) ⋅ (1/c) < δ ⋅ (2/c) ⋅ (1/c) = 2δ/c² ≤ ε.
So f is continuous at c. Since the choice of c ∈ (0, ∞) was arbitrary, it follows that f is continuous
on (0, ∞).
It’s very easy in this last proof to make a mistake, especially if you write it informally ‘let
δ > 0 be chosen later’ and only at the last line put in a value for δ. Much as in Warning 9.13, it
is very tempting to write ‘let δ = εxc’, since that would make the algebra work: we have
∣1/x − 1/c∣ = ∣x − c∣/(xc) < δ/(xc) = εxc/(xc) = ε.
But this ‘choice’ of δ doesn’t make sense: it depends on x, and (at the point where we choose it
in the formal proof above) there is no x around. What we need is to give some real number for
δ which guarantees that δ/(xc) will be at most ε whatever x gets chosen such that ∣x − c∣ < δ.
That would be difficult if x were very tiny (because then 1/x is huge) and this is why we choose
δ small enough that we can write x ≥ c/2, to rule out it being very close to 0.
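A small numerical check of this point. My reading of the proof above is that δ = min{c/2, εc²/2}; the notes only require δ to be small enough that x ≥ c/2 and 2δ/c² ≤ ε, and the sketch below simply samples points to confirm that this choice does the job.

```python
# Check numerically that delta = min(c/2, eps*c*c/2) works for f(x) = 1/x on (0, infinity):
# every sampled x with |x - c| < delta satisfies |1/x - 1/c| < eps.
import random

random.seed(1)
for _ in range(1000):
    c = random.uniform(0.01, 100.0)
    eps = random.uniform(1e-4, 1.0)
    delta = min(c / 2, eps * c * c / 2)
    x = c + 0.9 * random.uniform(-delta, delta)   # a point with |x - c| < delta
    assert x > c / 2                              # delta <= c/2 keeps x away from 0
    assert abs(1 / x - 1 / c) < eps
print("the chosen delta worked for every sampled (c, eps, x)")
```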
Theorem 10.7. Let I be an interval in R, let f ∶ I → R be a function, and let c ∈ I. Then f is
continuous at c if and only if
for every sequence (xn )n∈N contained in I with lim_{n→∞} xn = c, we have lim_{n→∞} f (xn ) = f (c). (10.1)
Proof. “Only if ” (Ô⇒) direction: Assume that f is continuous at c ∈ I and let (xn )n∈N be a
convergent sequence contained in I with limit c. We have to show that (f (xn ))n∈N converges to
f (c).
Let ε > 0 be given.
Since f is continuous at c ∈ I, for the given ε > 0, there exists δ > 0 such that for all x ∈ I
satisfying ∣x − c∣ < δ we have ∣f (x) − f (c)∣ < ε. Now since (xn )n∈N is convergent with limit c, by
the definition of convergence there exists N ∈ N such that for all n > N we have ∣xn − c∣ < δ. This
is the N we will use to verify the definition of lim_{n→∞} f (xn ) = f (c).
Suppose n > N . Then by choice of N we have ∣xn − c∣ < δ. Since xn ∈ I (because (xn )n∈N is
assumed to be contained in I), by choice of δ that means we have ∣f (xn ) − f (c)∣ < ε, which is
what we wanted to show.
CHAPTER 10. ANALYSIS: CONTINUITY 179
“If ” (⇐Ô) direction: Suppose that (10.1) holds. We have to show that f is continuous at c.
We prove this by contradiction. Assume that f is not continuous at c, that is,
¬ [∀ε > 0 ∃δ > 0 such that ∀x ∈ I such that ∣x − c∣ < δ, ∣f (x) − f (c)∣ < ε] ,
or equivalently,
∃ε > 0 such that ∀δ > 0 ∃x ∈ I such that ∣x − c∣ < δ but ∣f (x) − f (c)∣ ≥ ε (10.2)
is a true statement. Fix such an ε > 0.
For each n ∈ N, we want to choose a number xn. We do this as follows. First, choose δ = 1/n.
Since this is a positive number, by (10.2) there exists x ∈ I such that ∣x − c∣ < δ = 1/n and
∣f (x) − f (c)∣ ≥ ε. We let xn be any such x. That is, we choose xn such that ∣xn − c∣ < 1/n and
∣f (xn ) − f (c)∣ ≥ ε.
This gives us a sequence (xn )n∈N .
Claim 1: The sequence (xn )n∈N is contained in I and is convergent with limit c.
Indeed, we have for all n ∈ N, xn ∈ I. Furthermore, given any ζ > 0, we can find N ∈ N such
that 1/ζ < N (Archimedean property), that is, 1/N < ζ. Hence for n > N , we have ∣xn − c∣ < 1/N < ζ.
So (xn )n∈N is convergent with limit c.
Claim 2: The sequence (f (xn ))n∈N does not converge to f (c).
Recall that lim_{n→∞} f (xn ) = f (c) means (by definition):
∀ζ > 0 ∃N ∈ N such that ∀n > N , ∣f (xn ) − f (c)∣ < ζ .
We now show ζ = ε/2 is a counterexample to this statement. That is, for any N ∈ N, there is
some n > N (for instance n = N + 1) with ∣f (xn ) − f (c)∣ ≥ ε > ε/2, so the statement fails for
ζ = ε/2. This proves Claim 2.
Claims 1 and 2 together contradict (10.1), which asserts that (f (xn ))n∈N converges to f (c)
for every sequence (xn )n∈N in I with limit c. So our assumption that f is not continuous at c
must be false, i.e. f is continuous at c.
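The construction in the ‘if’ direction can be imitated concretely. In the sketch below I reuse the jump function from the earlier example as a stand-in for a discontinuous f (my choice, not the notes'), and pick, for each n, a point xn with ∣xn − c∣ < 1/n whose function value stays at least ε away from f (c).

```python
# Mimic the proof's construction: for a function discontinuous at c = 0,
# choose x_n with |x_n - c| < 1/n and |f(x_n) - f(c)| >= eps.

def f(x):                      # the jump function from the earlier example
    return 0.0 if x == 0 else 1.0

c, eps = 0.0, 0.5
xs = [1 / (2 * n) for n in range(1, 11)]      # x_n = 1/(2n) satisfies |x_n - c| < 1/n
for n, xn in enumerate(xs, start=1):
    assert abs(xn - c) < 1 / n
    assert abs(f(xn) - f(c)) >= eps
# (x_n) tends to c = 0, but f(x_n) = 1 for every n, so f(x_n) does not tend to f(0) = 0.
print("constructed a sequence witnessing the failure of (10.1)")
```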
The point of Theorem 10.7 is that it lets us ‘translate’ the Algebra of Limits to show that
doing algebraic operations with continuous functions gives us continuous functions; this is
Theorem 10.9 below. As with the Algebra of Limits, it is painful to prove from the definition that
a given function is continuous, and we would like tools that tell us that many of the functions
we want to study are indeed continuous.
Before stating it, we introduce some convenient notation.
Definition 10.8. Let I be an interval in R. Given functions f ∶ I → R and g ∶ I → R, we define
the following:
1. If α ∈ R, then we define the function αf ∶ I → R by (αf )(x) = α ⋅ f (x), x ∈ I.
2. We define the absolute value of f to be the function ∣f ∣ ∶ I → R given by ∣f ∣(x) = ∣f (x)∣, x ∈ I.
3. The sum of f and g is the function f + g ∶ I → R defined by (f + g)(x) = f (x) + g(x), x ∈ I.
4. The product of f and g is the function f g ∶ I → R defined by (f g)(x) = f (x)g(x), x ∈ I.
5. If k ∈ N, then we define the kth power of f to be the function f^k ∶ I → R given by f^k (x) =
(f (x))^k , x ∈ I.
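As an aside (not from the notes), the constructions of Definition 10.8 are all pointwise, which is easy to see if we write them as functions that build new functions; the names below are illustrative only.

```python
# Pointwise constructions from Definition 10.8, written as higher-order functions.
# The names (scale, abs_of, add, mul, power) are mine, chosen for illustration.

def scale(alpha, f):
    return lambda x: alpha * f(x)          # (alpha f)(x) = alpha * f(x)

def abs_of(f):
    return lambda x: abs(f(x))             # |f|(x) = |f(x)|

def add(f, g):
    return lambda x: f(x) + g(x)           # (f + g)(x) = f(x) + g(x)

def mul(f, g):
    return lambda x: f(x) * g(x)           # (f g)(x) = f(x) g(x)

def power(f, k):
    return lambda x: f(x) ** k             # f^k(x) = (f(x))^k

# Example: with f(x) = x and g(x) = 2x + 1 on R,
f = lambda x: x
g = lambda x: 2 * x + 1
h = add(scale(3.0, f), power(g, 2))        # h(x) = 3x + (2x + 1)^2
assert h(2) == 3 * 2 + (2 * 2 + 1) ** 2    # = 6 + 25 = 31
```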
[Figure: the graph of a continuous function f on [a, b] with f (a) < y < f (b); the value y is
attained more than once, the last such point being labelled c.]
The Intermediate Value theorem was first proved by Bernhard Bolzano in 1817.
The theorem statement says ‘if f (a) ≤ y ≤ f (b) or f (b) ≤ y ≤ f (a)’. This is two separate
(almost—unless f (a) = f (b) ) cases. It’s easy to prove the theorem if either f (a) = y or f (b) = y,
because we just take c = a or c = b respectively. So let’s suppose that y is not equal to f (a) or
f (b), but rather strictly in between them. And let’s look at the first case, f (a) < f (b), which is
the picture above.
As we can see from the above picture, it might be the case that f (x) = y occurs at several
points; in the figure, it occurs twice (once at c, and once earlier at what looks like a local
maximum of f ). We would like to avoid confusing the different possibilities, so we pick out some
special c; maybe the easiest is to pick the biggest. That is, we want to prove there exists c ∈ [a, b]
such that f (c) = y, and f (x) > y for all c < x ≤ b. Intuitively, the c we want should just be the
biggest element of {x ∶ f (x) ≤ y}; what we need to do is prove that that exists and works.
The ‘idea’—the reason that continuity comes in to the proof—is the following. We know f is
continuous at c. That is, if x is close to c then f (x) is close to f (c).
If f (c) is smaller than y, then so is f (x) for all x close to c. In particular, if we look at x
just a little bit bigger than c, then f (x) will be smaller than y—but then x should be in S, so c
isn’t an upper bound for S, a contradiction.
If f (c) is bigger than y, then so is f (x) for all x close to c. In particular, if we look at any x
just a little bit smaller than c, then f (x) is bigger than y—but then c isn’t a least upper bound
for S, again a contradiction.
We saw an argument like this before, in the proof of Example 8.9. This is really the same
argument as there, just written out in general.
Let’s now put in the formal details.
Proof. If y = f (a) then c = a satisfies the statement of the theorem. Similarly if y = f (b) then
c = b satisfies the statement of the theorem. So suppose y is not equal to f (a) or f (b).
Case 1: Suppose that f (a) < y < f (b).
Let S = {x ∈ [a, b] ∣ f (x) ≤ y}. We want to prove that c = sup S exists and satisfies f (c) = y.
The set S is not empty, since f (a) < y means a ∈ S, and it is bounded above by b. So by the
least upper bound property of the reals, c = sup S exists, and c ∈ [a, b].
Suppose for a contradiction that f (c) ≠ y, and let
ε = ∣f (c) − y∣/2 ,
which is positive. Since c ∈ [a, b], so f is continuous at c. That means that there exists δ > 0
such that
∣f (x) − f (c)∣ < ∣f (c) − y∣/2 for all x ∈ [a, b] such that ∣x − c∣ < δ . (10.3)
Fix δ > 0 such that (10.3) holds.
If f (c) < y: In this case, we have c < b since f (b) > y. Now observe that either c + δ/2 ≤ b, or
∣b − c∣ < δ (or both). In particular, there is an x > c such that x ∈ [a, b] and ∣x − c∣ < δ; fix any
such x. Now by (10.3), we have
f (x) < f (c) + ∣f (c) − y∣/2 < y
and so by definition of S, we have x ∈ S, and we already know x > c. But c is assumed to be an
upper bound of S, which is a contradiction.
If f (c) > y: In this case, we have c > a since f (a) < y. As before, observe that either c − δ/2 ≥ a,
or ∣a − c∣ < δ (or both) and so there is a z < c such that z ∈ [a, b] and ∣z − c∣ < δ; fix any such z.
Now given any x ∈ [z, c], we have x ∈ [a, b] and ∣x − c∣ < δ, so by (10.3) we have
f (x) > f (c) − ∣f (c) − y∣/2 > y,
and in particular x is not in S. That is, no element of [z, c] is in S, and (because c is an upper
bound of S) no element of (c, ∞) is in S. Putting these together, no element of [z, ∞) is in S,
so z is an upper bound of S. But we know z < c, which contradicts our assumption that c is a
least upper bound of S.
Assuming f (c) ≠ y, we considered both possibilities, f (c) < y and f (c) > y, and in either
case got to a contradiction. So the only possibility is that f (c) = y, which is what we wanted.
Case 2: Suppose that f (a) > y > f (b).
It would be easy to modify the above proof—just swap signs—to handle this case too. But it
is neater (and less work!) to reduce this case to Case 1. By Theorem 10.9, since f is continuous,
so is (−f ). Since f (a) > y > f (b), so (−f )(a) < −y < (−f )(b).
So by applying Case 1 to −f , there is some c ∈ [a, b] such that (−f )(c) = −y, and this tells
us f (c) = y, which is what we wanted.
Example 10.16. Show that every polynomial of odd degree with real coefficients has at least
one real root.
The idea here is simple. Suppose that our odd-degree polynomial is p(x) = a0 + a1 x + ⋅ ⋅ ⋅ + ak xk ,
where k is odd. We know ak is not zero (that’s what it means to say the degree is k). Dividing the
whole polynomial by ak doesn’t change the roots, so let’s assume p(x) = a0 +a1 x+⋅ ⋅ ⋅+ak−1 xk−1 +xk .
When ∣x∣ is very big, xk will be much bigger than all the other terms; the graph of p(x) and
of xk will look pretty similar. So when ∣x∣ is very big and x is negative, we should find p(x) is
also negative. So there is some a such that p(a) < 0. Similarly, if ∣x∣ is big and x > 0, then p(x)
will also be positive. So there is some b such that p(b) > 0. And then applying the Intermediate
Value theorem on [a, b], with y = 0, says that there is some c ∈ [a, b] such that p(c) = 0, which is
what we want.
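A quick numerical look at this idea (the cubic below is an arbitrary example of mine, not one from the notes): for large ∣x∣ the value p(x) has the same sign as x^k, and p(x)/x^k is already within 1/2 of 1, as in the proof that follows.

```python
# For an odd-degree monic polynomial, p(x) has the sign of x^k once |x| is large.
# The cubic below is an arbitrary example, not one from the notes.

def p(x):
    return x**3 - 7 * x**2 + 2 * x + 40    # monic, odd degree k = 3

for x in [-1000, -100, 100, 1000]:
    ratio = p(x) / x**3
    print(x, p(x), round(ratio, 3))
    assert (p(x) < 0) == (x < 0)           # sign of p(x) matches sign of x^3
    assert abs(ratio - 1) < 0.5            # p(x)/x^3 is within 1/2 of 1
```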
Proof. Suppose that p is a polynomial with degree k, where k is an odd natural number. Then
the coefficient of x^k in p(x) is not zero. Since p(x)/s has the same roots as p(x) for any non-zero
s, we can assume that the coefficient of x^k in p(x) is 1. That is, we have
p(x) = a0 + a1 x + ⋅ ⋅ ⋅ + ak−1 x^(k−1) + x^k .
We want to first justify that there is some b such that p(b) > 0 (the algebra is a bit simpler
than for a). We will actually choose b to be a positive integer. Observe that
lim_{n→∞} p(n)/n^k = lim_{n→∞} ( a0/n^k + a1/n^(k−1) + ⋅ ⋅ ⋅ + ak−1/n + 1 ) = 1
where the first equality is just writing out p(n) and simplifying, and the second equality is by
the Algebra of Limits.
Using the definition of lim_{n→∞} p(n)/n^k = 1, with ε = 1/2, there exists some N such that for all n > N
we have
∣ p(n)/n^k − 1 ∣ < 1/2 and so p(n)/n^k > 1/2 .
In particular, we can let b = N + 1 and obtain p(b) > (1/2) b^k > 0, which is what we wanted.
We do something similar for a (which will be a negative integer). We have
lim_{n→∞} p(−n)/(−n)^k = lim_{n→∞} ( a0/(−n)^k + a1/(−n)^(k−1) + ⋅ ⋅ ⋅ + ak−1/(−n) + 1 ) = 1
and again using the definition of the limit, with ε = 1/2, there is N such that for all n > N we have
∣ p(−n)/(−n)^k − 1 ∣ < 1/2 and so p(−n)/(−n)^k > 1/2 .
In particular, we can let a = −N − 1, and this gives p(a) < (1/2)(−N − 1)^k = −(1/2)(N + 1)^k < 0, which is
what we wanted.
Now we apply the Intermediate Value theorem to p(x), on the interval [a, b], with y = 0.
The function p is continuous by Example 10.10, and p(a) < 0 = y < p(b), so the conditions of the
Intermediate Value theorem are satisfied. We conclude there is some c ∈ [a, b] such that p(c) = 0,
which is what we wanted.
It’s worth noticing that a bit more is true. If we want to know where a polynomial (or any
continuous function!) has roots, it’s enough to find a point where the polynomial takes a value
smaller than 0 and another close by where it takes a value bigger than 0. Then there has to be
a root in between.
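This observation is what makes the bisection method work in practice: starting from a sign change, repeatedly halving the interval keeps a sign change, and the Intermediate Value theorem guarantees a root inside every interval produced. The routine below is a generic sketch of this (it is not something defined in the notes).

```python
# Bisection: given a continuous f with f(lo) < 0 < f(hi), the Intermediate Value
# theorem guarantees a root in [lo, hi]; halving the interval keeps the sign change.

def bisect(f, lo, hi, tol=1e-12):
    assert f(lo) < 0 < f(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid          # the sign change (and hence a root) is in [mid, hi]
        else:
            hi = mid          # the sign change is in [lo, mid]
    return (lo + hi) / 2

# Example: the odd-degree polynomial p(x) = x^3 - 2x - 5 is negative at 0 and positive at 3.
p = lambda x: x**3 - 2 * x - 5
root = bisect(p, 0, 3)
print(root, p(root))          # root is approximately 2.0946, and p(root) is close to 0
```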
Example 10.19. Show that at any given time, there exists a pair of diametrically opposite
points on the equator which have the same temperature.
Proof. Let T (Θ) denote the surface temperature at the point at longitude Θ. See Figure 10.4.
(Note that T (0) = T (2π).) Assuming that T is a continuous function of Θ, it follows that the
function d given by d(Θ) = T (Θ) − T (Θ + π) is also continuous on [0, π]. Now
d(0) = T (0) − T (π) and d(π) = T (π) − T (2π) = T (π) − T (0) = −d(0) ,
so either d(0) = 0, or one of d(0), d(π) is positive and the other negative. In either case, by the
Intermediate Value theorem there is some Θ ∈ [0, π] with d(Θ) = 0, that is, T (Θ) = T (Θ + π).
The points at longitudes Θ and Θ + π are diametrically opposite and have the same temperature.
Another such example is the function h given by h(x) = (1 − x) sin(1/x),
which also has infinitely many local maxima (as x gets closer to 0), where the function values
get closer and closer to 1, yet no global maximum exists. This function is also continuous—but
notice that it is not defined at 0, and in fact if we try to change the function to one defined on
[0, ∞) (i.e. we give the function a value at 0) then it will not be continuous at 0.
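Numerically, the difference between a supremum and a maximum is easy to see here. The sketch below evaluates h(x) = (1 − x) sin(1/x), as above, at the points where sin(1/x) = 1: the values increase towards 1 but never reach it.

```python
# h(x) = (1 - x) * sin(1/x): at the peaks x_n = 1/(2*pi*n + pi/2) we have sin(1/x_n) = 1,
# so h(x_n) = 1 - x_n.  These values increase towards 1, but h never attains the value 1.
import math

def h(x):
    return (1 - x) * math.sin(1 / x)

peaks = [1 / (2 * math.pi * n + math.pi / 2) for n in range(1, 6)]
values = [h(x) for x in peaks]
print([round(v, 6) for v in values])
assert all(v < 1 for v in values)                       # no peak reaches 1
assert all(a < b for a, b in zip(values, values[1:]))   # the peaks increase towards 1
```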
Activity 10.2. Prove that g(x) as defined above is continuous on R (you should assume the
sine function is continuous).
Prove that there is no continuous function h∗ ∶ [0, ∞) → R such that h∗ (x) = h(x) for all
x ∈ (0, ∞).
You might well imagine that there are continuous functions on closed intervals which have
some similar nasty behaviour: maybe they go off to infinity somewhere in the middle of the
interval, or they have infinitely many local maxima which (like h(x) above) keep getting bigger
and bigger so that even if the function stays bounded, there is no global maximum?
In fact, surprisingly, this is not the case. Any continuous function f on any closed interval is
bounded, and it has at least one global maximum and minimum (the extreme values).
[Figure: the graph of a continuous function on [a, b], with its extreme values attained at two
points c and d.]
In this figure the extreme values are both attained once each, at c and d. For a function like
x → cos x, on the interval [0, 1000], there are many global maxima (and minima!)—0, 2π, 4π
and so on are all global maxima. There can even be infinitely many global maxima. A trivial
example is a constant function on I: every point is a global maximum.
Activity 10.3. Find a function f ∶ [0, 1] → R which is continuous, not constant, and has
infinitely many global maxima.
Theorem 10.20 (Extreme Value theorem). Let [a, b] be any closed and bounded interval and
let f ∶ [a, b] → R be a continuous function. Then there exists c ∈ [a, b] such that
f (c) = sup {f (x) ∣ x ∈ [a, b]} , (10.4)
and there exists d ∈ [a, b] such that
f (d) = inf {f (x) ∣ x ∈ [a, b]} . (10.5)
Since c, d ∈ [a, b] in the above theorem, the supremum and infimum in (10.4) and (10.5) are
in fact the maximum and minimum, respectively. This proof is a beautiful application of the
Bolzano–Weierstrass theorem—actually two beautiful applications.
We’ll only prove the ‘maximum’ half of the theorem; the ‘minimum’ half is an exercise.
The idea is the following. Suppose for a contradiction that sup {f (x) ∣ x ∈ [a, b]} does not
exist. How can that happen? This is a non-empty set of real numbers; if it has an upper bound
then the least upper bound principle says it has a supremum. So the only possibility is that
this set doesn’t have an upper bound. Well, that would mean that there are points x1 , x2 , . . . in
[a, b] where the function values f (x1 ), f (x2 ), . . . are growing bigger and bigger: the sequence of
function values isn’t bounded above. But this looks suspiciously like a contradiction. We know
that if lim_{n→∞} xn = c, then lim_{n→∞} f (xn ) = f (c) by Theorem 10.7. But this can’t be: the sequence of
function values is unbounded, so it cannot be convergent.
But this isn’t quite a contradiction: we don’t know that (xn )n∈N is a convergent sequence—
what if that sequence bounces around in [a, b] without tending to a limit? The answer is: apply
the Bolzano-Weierstrass theorem to find a convergent subsequence (xnk )k∈N , and repeat the
above argument with this convergent subsequence—this time it works.
That argument shows that M = sup {f (x) ∣ x ∈ [a, b]} exists. But why is there actually a value
c ∈ [a, b] such that f (c) = M ? Well, M is supposed to be a least upper bound of {f (x) ∣ x ∈ [a, b]}.
That means that for every ε > 0,
M − ε isn’t an upper bound, i.e. there is some x ∈ [a, b] such that M ≥ f (x) > M − ε . (10.6)
Note that we can write that M ≥ f (x) since M is an upper bound for {f (x) ∣ x ∈ [a, b]}. We
want to use this to construct a sequence (xn )n∈N such that the function values converge to
M , i.e. limn→∞ f (xn ) = M . We’ll use the same trick we saw before: use (10.6) for a sequence
of ε values which tends to zero. The obvious ‘sequence tending to zero’ is ( n1 )n∈N , so let’s use
that. By (10.6), for each n ∈ N there is xn ∈ [a, b] such that M ≥ f (xn ) > M − 1/n. The Sandwich
Theorem tells us lim_{n→∞} f (xn ) = M . If lim_{n→∞} xn = c exists, then by Theorem 10.7 we have M = f (c),
which is what we want.
Again, there is no reason why lim_{n→∞} xn should exist: if f has several global maxima, then perhaps
(xn )n∈N bounces around between them without converging. But, again, the Bolzano-Weierstrass
theorem rescues us.
Proof. We prove the first half of the theorem, leaving the second half as an exercise.
Let S = {f (x) ∣ x ∈ [a, b]}. We first want to prove M = sup S exists, then we want to show
that there is some c ∈ [a, b] such that f (c) = M .
To show that sup S exists, we use the least upper bound property of R. We just need to
show that S is not empty and that it has an upper bound.
S is not empty because f (a) ∈ S.
Now suppose for a contradiction that S has no upper bound. Then in particular, for each
n ∈ N, the integer n is not an upper bound for S. That means there is an element of S which is
bigger than n, i.e. a function value of f which is bigger than n. There is some xn ∈ [a, b] such
that f (xn ) > n.
This gives us a sequence (xn )n∈N which is bounded: it is in [a, b]. By the Bolzano-Weierstrass
theorem, there is a convergent subsequence (xnk )k∈N , and by Theorem 9.12 the limit s of this
subsequence is in [a, b].
Since 1 ≤ n1 < n2 < . . . by the definition of a subsequence, we have nk ≥ k. Thus f (xnk ) > nk ≥ k
is true for each k ∈ N, and so (f (xnk ))k∈N is not a bounded sequence. By Theorem 9.21 it is
therefore not a convergent sequence. But this is a contradiction to Theorem 10.7, which says that
lim_{k→∞} f (xnk ) = f (s), since lim_{k→∞} xnk = s ∈ [a, b] and f is continuous on [a, b] (and so in particular
at s). This contradiction shows that our assumption—that S has no upper bound—is false.
Thus S has an upper bound, and we already observed that it is non-empty. So the least
upper bound property of the reals says that sup S exists.
Now we show that M = sup S is M = f (c) for some c ∈ [a, b].
For each n ∈ N, we let xn be a point in [a, b] such that f (xn ) > M − 1/n. This exists since M
is a least upper bound of S, and so in particular M − 1/n is not an upper bound of S.
Now the Sandwich Theorem says that, since M − 1/n < f (xn ) ≤ M holds for each n ∈ N, so
lim_{n→∞} f (xn ) = M .
By the Bolzano-Weierstrass theorem, (xn )n∈N has a convergent subsequence (xnk )k∈N . Let
c = lim_{k→∞} xnk ; by Theorem 9.12 we have c ∈ [a, b].
The sequence (f (xnk ))k∈N is a subsequence of (f (xn ))n∈N (which we just saw is convergent
with limit M ), so by Theorem 9.43 it is convergent with limit M .
Now by Theorem 10.7, since c ∈ [a, b] and so f is continuous at c, we have f (c) = M . This
finishes the proof of the first half of the theorem.
The proof that there exists d ∈ [a, b] such that f (d) = inf S is left as an exercise.
Something we should immediately notice at this point is the following. We have just proved
a theorem which isn’t obviously true, and the proof is not really all that easy. But all the work
is being done by the theorems we proved already! We are proving things about continuous
functions using convergent sequences, yet we never actually needed to use the definitions of
either concept. Nor are we producing any ‘pages of equations’ as you might have expected from
your school maths. This is good, because working with the definitions is painful and writing
pages of equations is boring and easy to mess up.
This is your first taste of how modern mathematics looks. We want to build up a beautiful
palace of a theorem, but we generally will not go all the way down to the bricks-and-mortar of
working with all the basic definitions. Rather, we want to play architect: we outline how the
proof should go, and call upon theorems we (or, usually, others!) already proved to serve as the
walls and roof.
Exercise 10.2. Prove that if f ∶ R → R is continuous and f (x) = 0 whenever x is rational, then
f (x) = 0 for all x ∈ R.
Hint: Given any real number c, there exists a sequence of rational numbers (qn )n∈N that
converges to c.
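The hint can be made concrete (this is only an illustration, not a substitute for the proof the exercise asks for): truncating the decimal expansion of c after n digits gives a rational number within 10^(−n) of c.

```python
# For any real c, q_n = floor(c * 10^n) / 10^n is rational and |q_n - c| <= 10^(-n),
# so (q_n) is a sequence of rationals converging to c.
from fractions import Fraction
import math

c = math.sqrt(2)                     # an example of an irrational target
for n in range(1, 8):
    q = Fraction(math.floor(c * 10**n), 10**n)   # a rational approximation of c
    assert abs(float(q) - c) <= 10 ** (-n)
    print(n, q)
```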
Exercise 10.3. (a) Let J = (a, b) be an open interval contained in another interval I. Let
f ∶ I → R be a function. Let c ∈ J and assume that f ∣J is continuous at c. Prove that f is
continuous at c.
(b) Give an example of intervals J and I with J ⊆ I, c ∈ J and a function f ∶ I → R such that
f ∣J is continuous at c, but f is not continuous at c.
(This shows that it is necessary to assume that J is an open interval in (a).)
Exercise 10.4. Show that the statement of Theorem 10.20 does not hold if [a, b] is replaced by
[a, b).
Exercise 10.5. A function f ∶ R → R is a periodic function if there exists T > 0 such that for
all x ∈ R, f (x + T ) = f (x). If f ∶ R → R is continuous and periodic, then prove that f is bounded,
that is, the set S = {f (x) ∣ x ∈ R} is bounded.
Give an example of a periodic function g such that g is not bounded.
Comment on Activity 10.1. We have
lim_{n→∞} 1/(πn) = (1/π) lim_{n→∞} 1/n = 0
using the Algebra of Limits and the known fact that lim_{n→∞} 1/n = 0.
Since f (1/(πn)) = sin(πn) = 0, we have
lim_{n→∞} f (1/(πn)) = f (0) .
However, this function f is not continuous at 0. To see this, observe that the sequence
( 1/(2πn + π/2) )n∈N
also converges to 0, but
f ( 1/(2πn + π/2) ) = sin(2πn + π/2) = 1
for all n ∈ N. Since 1 ≠ f (0), we see that (10.1) does not hold. So by Theorem 10.7, f is not
continuous at 0.
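As a numerical echo of this comment (a sketch; here I take f (x) = sin(1/x) for x ≠ 0 and f (0) = 0, which is my reading of Activity 10.1), the two sequences give function values 0 and 1 respectively:

```python
# f(x) = sin(1/x) for x != 0, f(0) = 0: two sequences tending to 0 whose
# function values tend to different limits (0 and 1), so f cannot be continuous at 0.
import math

def f(x):
    return 0.0 if x == 0 else math.sin(1 / x)

seq_a = [1 / (math.pi * n) for n in range(1, 6)]                    # f = 0 along this sequence
seq_b = [1 / (2 * math.pi * n + math.pi / 2) for n in range(1, 6)]  # f = 1 along this one
print([round(f(x), 12) for x in seq_a])
print([round(f(x), 12) for x in seq_b])
```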
Comment on Activity 10.2. We’ll need to cheat a bit to prove that g is continuous on R. We
don’t know that the sine function is continuous, and proving it is rather hard given that we do
not really know what the function is at all! We sort-of know that it comes from Euler’s identity,
but we don’t really know how to work with infinite series. So let’s just assume it is a continuous
function (which is true).
Now for any c ≠ 0, we can find an interval [a, b] which contains c but not 0. And on that
interval, g(x) is a composition of two continuous functions, so it is continuous.
Proving continuity at 0 is easier, assuming we are happy to say we know ∣sin x∣ ≤ 1 is true
for all x ∈ R. For any ε > 0, choose δ = ε. Then for any x with ∣x − 0∣ < δ we have
∣g(x) − g(0)∣ ≤ ∣x∣ < δ = ε ,
so g is continuous at 0.
For the second part, suppose for a contradiction that a continuous h∗ ∶ [0, ∞) → R with
h∗ (x) = h(x) for all x ∈ (0, ∞) does exist. Then for any sequence (xn )n∈N in (0, ∞) with
lim_{n→∞} xn = 0, we would have lim_{n→∞} h∗ (xn ) = h∗ (0)
by Theorem 10.7. In particular, there cannot be two choices of (xn )n∈N where the limits of
function values are different. But the two sequences in the solution above to Activity 10.1 do
give different limits, namely 0 and 1 respectively (though proving the second sequence gives
function values converging to 1 needs a little bit more work).
Comment on Activity 10.3. We can ‘cheat’ by for example letting
f (x) = 1 if x ≤ 1/2, and f (x) = 3/2 − x if x > 1/2.
But this only has infinitely many global maxima because it’s constant on part of [0, 1]. Can we
find a function which isn’t constant on any part of [0, 1] that does the job?
We can—try modifying one of the examples you saw before the Extreme Value theorem.
g(1/m) = m .