
G12MAN Mathematical Analysis

School of Mathematical Sciences, University of Nottingham


Yves van Gennip
Based on lecture slides which were created by Prof. J. K. Langley
and adapted by Dr J. F. Feinstein and Prof. K. Zhang

Last updated: September 19, 2018

Contents
1 Introduction 2
1.1 What is the deal with these notes? . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 A brief review of some fundamental concepts from logic . . . . . . . . . . . . . . 4
1.2.1 The logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.2 The axioms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2.3 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2.4 Definitions, Lemmas, Theorems, Propositions, and Corollaries . . . . . . . 10
1.2.5 Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.6 Scrap paper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.3 What do we use as our starting point? . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4 Rough lesson plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2 What is mathematical analysis and why are we interested in it? 20

3 Sequences in R 24

4 Distance in Rd 37

5 Sequences in Rd 41

6 Subsets of Rd and their boundaries 49

7 Interior points of subsets of Rd 53

8 Open subsets of Rd 55

9 Closed subsets of Rd 59

10 Continuous functions on subsets of Rd 64

11 Convergence of sequences and series of functions 76

12 Functions on the real line 83

13 Differentiability on the real line 89

14 The Riemann integral 100

A Background material 114


A.1 Subsets of Rd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
A.2 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
A.3 Limits and continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
A.4 Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
A.5 Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

B Optional material 128


B.1 Dirichlet’s test and convergence of the series in (2) . . . . . . . . . . . . . . . . . 128
B.2 A Fourier series formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
B.3 Proof of the monotone sequence theorem . . . . . . . . . . . . . . . . . . . . . . . 132
B.4 L’Hôpital’s rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
B.5 Limits of subsequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
B.6 The Cauchy–Schwarz inequality and the triangle inequality . . . . . . . . . . . . 138
B.7 Expressing real numbers in binary . . . . . . . . . . . . . . . . . . . . . . . . . . 139
B.8 A space-filling curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
B.9 Monotone surjective functions and continuity . . . . . . . . . . . . . . . . . . . . 142
B.10 A non-decreasing function on R which is discontinuous at every rational number 143
B.11 Algebra of derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
B.12 A continuous nowhere differentiable function . . . . . . . . . . . . . . . . . . . . 145
B.13 The effect of refinement on Riemann sums . . . . . . . . . . . . . . . . . . . . . . 146
B.14 The existence of a rational of the form p/10^q in an interval . . . . . . . . . . . . . . 147
B.15 Riemann integrability: from [a, c] and [c, b] to [a, b] . . . . . . . . . . . . . . . . . 148
B.16 Riemann integrability: from [a, b] to [a, c] and [c, b] . . . . . . . . . . . . . . . . . 148
B.17 The Riemann integral of a sum of two functions . . . . . . . . . . . . . . . . . . . 149
B.18 Uniform convergence and the Riemann integral . . . . . . . . . . . . . . . . . . . 150

1 Introduction
Before we dive into the mathematics, let me briefly say something about how to read these
notes. Section 1 explains a little bit about what you can expect in this module and goes
(briefly) over some of the background which you are expected to know and remember from
Year 1. (Of course it is not enough to only know the material which is revisited in Section 1;
G11ACF/MATH1005 (Analytical and Computational Foundations) and G11CAL/MATH1006
(Calculus) are prerequisites for this module and their content will be assumed known.) You
might be tempted to skip this section, but I strongly advise against that. Section 1 might be
the most important section in these notes, in terms of setting the right expectations for this
module and helping you revise the material from last year which you will need going forward.
This introduction addresses many of the difficulties with which we see students struggle every
year, so make good use of it. Do not skip it, and study it as often as you need to really take
in the ideas presented. We will not spend much, if any, time in the lectures on Section 1, since
it is supposed to be repetition from Year 1, but experience teaches that many students require
a refresher when coming into this module.
Section 1.4 at the end of the introduction gives a rough lesson plan, which can help you to
prepare in advance for the lectures. Of course this is only a guideline, not a strict plan. During
the semester we may go slower on some parts or faster on others, and there is some room at
the end of the semester if we need to go beyond 20 lectures, which can otherwise be used
for revision.
While Section 1 gives a very broad recap of the mathematical fundamentals you will need
when coming into this module, Section 2 addresses the question “why analysis?” Why do we
want to study mathematical analysis, what is it about (broadly speaking), and why do we
bother being mathematically rigorous? This is not only a good section to read if you have
already decided to take this module, but also if you are still considering whether or not to take
it. Hopefully it will give you a good idea of what to expect in this module in terms of the kind
of questions we are interested in asking and answering.
The new mathematical content of this module starts in Section 3 (but hopefully I have made
it clear enough that I think it would be very detrimental to skip the first two sections).
With that, let’s get right into it. Enjoy!

1.1 What is the deal with these notes?


These notes are meant to accompany the lectures for G12MAN/MATH2009 (“Mathematical
Analysis”) and follow the same structure as the lecture slides, which were created by Prof. J.
K. Langley and over the years slightly changed by Dr J. F. Feinstein, Prof. K. Zhang, and
myself. While the lecture slides are written in a more conversational way to introduce topics
in the context of a lecture, these notes are (mostly) written in a style that you might more
commonly find in a textbook. The aim is to be precise, detailed, and technically rigorous, in
content and presentation.
This is a highly theoretical module1 , with a strong emphasis on proofs. The module is much
more about ideas and concepts than computational techniques. The aim is to learn to think
and express yourself in a precise, mathematical, way. When you are studying these notes
I highly recommend that you not only try to understand the mathematical concepts and the
reasoning and arguments behind the proofs of the various lemmas and theorems, but that you
also learn from the way the results and proofs are written down. Learning how to present your
mathematical arguments is a very important part of learning how to do mathematics. Not only
will it help you to communicate your ideas with others (which is important in many situations,
not only later in life when your job will require you to communicate your ideas to others, but
also right now, when you are studying together with your friends, or when you are writing your
exam and have to explain a piece of mathematics to the marker through your writing), but it
will also be a good test to see if you understand —really understand— the material yourself.
If you cannot clearly and concisely explain a mathematical argument to someone
else, without being vague or leaving holes in your argument, then you probably do
not quite understand the argument yourself in the first place.
Ultimately, when studying mathematics there is no substitute for doing mathematics your-
self. That includes both attempting to solve problems —and probably failing many times before
succeeding, which is not only absolutely fine, it is expected and encouraged; it is the only way to
learn— as well as writing down your solutions in a concise and rigorous manner. Studying and
understanding the way other mathematicians write their proofs will help you in that process.
Each author has their own style of writing and I most certainly do not claim that mine is the
only valid choice, so I very strongly encourage you to pick up other books on Analysis (there
is a suggested reading list on Moodle and there are many more excellent books on Analysis
out there than are mentioned on that list) and have a look at how other authors approach the
[Footnote 1: or “course”, if you prefer the new terminology.]
subject. Most, if not all, introductory books on Analysis will cover the same topics we cover
(and probably more).
G12MAN/MATH2009 is an introduction to real analysis, mainly featuring the following:

(a) properties of functions on the real line R (in particular involving limits, continuity, differ-
entiation and integration);

(b) properties of sets in higher dimensional space Rd ;

(c) properties of functions on subsets of Rd (such as continuity).

In this module, we will not deal with functions whose domain and codomain are subsets of the
complex numbers C. If you are interested in the analysis of such functions, you can take the
modules G12COF/MATH2007 (“Complex Functions”) and G14COA/MATH4009 (“Complex
Analysis”).
This module follows on from G11ACF/MATH1005 (“Analytical and Computational Foun-
dations”) in the first year, so you need to be familiar with the material there. We will on occasion
use some results from that module or refer back to it for comparison reasons, but by far the most
important skills and knowledge from G11ACF/MATH1005 that you should be intimately famil-
iar and comfortable with are those which relate to the logic that underpins all of mathematics.
Because this is so very important for this module (and all of mathematics), we start with a brief
review of some of these very fundamental and basic ideas from G11ACF/MATH1005. This is
not meant to be an exhaustive or in-depth repetition of the things you have done before. If you
feel you need a more detailed refresher or more practice with these concepts, have a look back at
your G11ACF/MATH1005 notes and exercises and pick up a book on elementary mathematical
logic. These concepts are absolutely fundamental and are implicitly (if not explicitly) at the
heart of every piece of mathematics you will ever come across. In order to understand any
proof or mathematical argument, it is absolutely vital that the basic logical underpinnings
are second nature to you.
This module only provides the very first introduction to mathematical analysis. It is a
field with a long history and is still thriving today with active research happening every day
all over the world and with many different subfields and interconnections with other areas of
mathematics. If you would like to learn more analysis, some modules you can consider choosing in
Year 3 are G13MTS/MATH3003 (“Metric and Topological Spaces”) and G13LNA/MATH3020
(“Linear Analysis”).

1.2 A brief review of some fundamental concepts from logic


We need to start off somewhat philosophically here. What is mathematics? That is actually
quite a big question and one that we will not answer here. In fact, there is a whole philosophical
subfield called “philosophy of mathematics” that is concerned with the various aspects of that
question, but I do want to briefly touch on one particular subquestion of that very big question:
what sets mathematics apart from, say, the natural sciences or engineering? One very important
and fundamental difference is the standard by which these different areas determine what is
“true”; when do we say we “know” something?
In the natural sciences empirical evidence is the ultimate arbiter2 : If a hypothesis does
not line up with what is observed in reality, the hypothesis has to go. Engineering is more
concerned with applying the knowledge obtained by other areas, such as mathematics and the
natural sciences, to solve practical problems or improve some structure or process, for example
[Footnote 2: I am obviously simplifying things a lot here. Anyone with an interest in, let alone an expertise in, the philosophy of science will have lots more to add on this topic.]
through inventing new machines or materials, optimizing processes, or improving designs. The
standard by which an engineering application is judged is whether or not it solves the problem
or achieves the desired improvement3 .
How about mathematics then? Mathematics is fundamentally different from these other
areas in that it values internal consistency and logic over validation from an external source. In
mathematics we start from some basic fundamentals (called axioms) and build from there, only
using the rules of logical reasoning to find out what the logical consequences of our axioms are.
When we want to “prove” a statement in mathematics, we do not design a physical experiment
or a machine to test it4 , but instead we build a logical argument which takes us from the
assumptions to the conclusions in such a way that every step is logically5 justified.
In a way that might initially seem somewhat paradoxical, this greater level of abstraction
of mathematics which uncouples it from any external reality6 has turned out to be one of
its greatest strengths that has made mathematics applicable in most, if not all other sciences
(natural, social, and other), as well as in engineering.
If mathematics is this abstract enterprise which uses the rules of logic to prove new state-
ments based on an accepted set of starting axioms, that raises some questions: what are these
rules and what are these axioms? Given the many caveats sprinkled through the footnotes in
this section, it should not surprise you to hear that there is actually not just one possible choice
and there are different logics and different axioms one could start from, all leading to slightly
(or wildly) different mathematics. The fields of logic and the foundations of mathematics are
very much alive and study this in great detail. For our purposes though, we will stick with those
logical rules and those axioms that most professional mathematicians use in their research and
that lead, among many other things, to the kind of mathematics you have been learning in
school and in your first year at university, and that is so widely applicable in many real life
applications. Let us have a look!

1.2.1 The logic


The logic we use is called “first-order logic” or “predicate logic”. This is the logic you learned in
G11ACF/MATH1005 (and G11FPM/MATH1008 (“Foundations of Pure Mathematics”), if you
have taken that module). Its rules tell us how we can combine statements into new statements
and how the truth value of the new statement depends on the truth values of the statements
with which you started. Of course we do not have the space, or inclination, to go into too
much depth here and we will not repeat all the content from G11ACF/MATH1005, but we will
refresh your memory on a few topics which any mathematician will encounter on a daily basis
in their mathematical work:

• How can statements be combined into new statements?

• How do we use quantifiers?

• From which starting point do we set out?


• What distinguishes good notation from bad notation?

• What are definitions, lemmas, and theorems?

• How do we prove something?

• Scrap paper is your friend.

[Footnote 3: Again, there is a lot more to say on the topic. For example, there are engineers working all along the spectrum between the scientist’s “knowledge for knowledge’s sake” end and the very practical, applied problem-solving end.]
[Footnote 4: Here I will acknowledge, and then for simplicity’s sake conveniently ignore again, the use and value of computer-aided mathematical fields such as scientific computation and computer-aided proofs.]
[Footnote 5: By “logically” I mean here (in principle) according to the rigorous, formal kind of logic of the logician, not the many informal ways in which this term is used in common speech. Even though we will find that in mathematical practice the formal notation of the logician is often abandoned for a more easily readable natural language (e.g. English) expression of the same logical operations, it should always be remembered that the underlying concepts are/should be as rigorously employed in a mathematician’s work as in a logician’s.]
[Footnote 6: except possibly those bits that are required for actual humans to do maths while being part of that reality.]

P ∨ Q | P = T | P = F
------+-------+------
Q = T |   T   |   T
Q = F |   T   |   F

Table 1: The truth value of the statement “P ∨ Q” for all different combinations of truth values of P and Q
Combining statements into new statements. Given two statements P and Q, we can
use logical operators to combine them into new statements. We will always assume that the
statements we start with have a well-defined truth value, i.e. each statement is either “true” or
“not true”. To make the language a bit less cumbersome, we will write “false” instead of “not
true”. Note that this requires you to be very careful in the construction of the statements. It
occurs quite often in the imprecise language of some students that certain statements are so
badly (or confusingly) written that they cannot even be properly understood, let alone assigned
a truth value. Avoid this! It would make your argument fail before it has even gotten off the
ground. We will see some specific situations in which you should be careful later.
The most common logical operators you will encounter are ¬ (“not”), ∨ (“or”), ∧ (“and”),
⇒ (“implies”7 ) and ⇔ (“is equivalent to”8 ). These operators allow you to take one (in the case
of ¬) or two (for ∨, ∧, ⇒, and ⇔) statements and combine them into one new statement. From
a logical point of view, the only thing that matters (i.e. the only thing that needs to be defined)
is what the truth value is of the resulting statement, given the truth value(s) of the original
statement(s).
The operator ¬ takes one statement (P ) and turns it into a new statement (¬P ). If P is
true, then ¬P is false. If P is false, then ¬P is true. This completely defines the logical operator
¬. It just turns a true statement into a false one and vice versa. That is all it does.
Since the other logical operators above each take two statements as input (and give one new
statement as output), the truth values of the statements “P ∨ Q”, “P ∧ Q”, “P ⇒ Q”,
and “P ⇔ Q”, for all the different possible truth values of the statements P and Q, can be
easily defined using truth tables; see Tables 1, 2, 3, and 4. In the tables we write “T” for “true”
and “F” for “false”.
For example, Table 1 defines the operator ∨. Along the first row we see the truth values
which P can have: true, false. Along the first column we see the truth values which Q can have:
true, false. This gives four different combinations of truth values for P and Q and the table
gives the truth value of “P ∨ Q” in each of these cases. In particular, we see that “P ∨ Q” is
false if both P and Q are false and “P ∨ Q” is true in all other cases. This is the full definition of
what ∨ means! So do not use it in other situations where it is not applicable. You can only use
∨ when you have two statements with well-defined truth values and you would like to construct
a new statement whose truth value depends on the truth values of your starting statements in
the way defined by Table 1.
[Footnote 7: Note that “P ⇒ Q” is also read as “if P, then Q”.]
[Footnote 8: Remember that “P ⇔ Q” is also read as “P if and only if Q” or “P iff Q” (note the extra f to distinguish “iff” from “if”).]

P ∧ Q | P = T | P = F
------+-------+------
Q = T |   T   |   F
Q = F |   F   |   F

Table 2: The truth value of the statement “P ∧ Q”

P ⇒ Q | P = T | P = F
------+-------+------
Q = T |   T   |   T
Q = F |   F   |   T

Table 3: The truth value of the statement “P ⇒ Q”

We can now also define the other operators, in Tables 2, 3, and 4.


These definitions now also allow us to combine different operators, but always remember
that each of these operators takes only one or two statements as input. So use brackets where
needed to avoid ambiguity. For example, do not write “P ∨ Q ∧ R” (where P , Q, and R are all
statements), but rather “(P ∨ Q) ∧ R” or “P ∨ (Q ∧ R)”, depending on what you want to say. Do
you see that those two statements are different? Can you give truth values for P , Q, and R for
which the statements “(P ∨ Q) ∧ R” and “P ∨ (Q ∧ R)” have different truth values?
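As a quick illustration, take P true, Q false, and R false. Then “P ∨ Q” is true, so by Table 2 the statement “(P ∨ Q) ∧ R” is a true statement “and” a false one, hence false; on the other hand “Q ∧ R” is false, so by Table 1 the statement “P ∨ (Q ∧ R)” is a true statement “or” a false one, hence true. For this choice of P , Q, and R the two bracketings therefore have different truth values, which is exactly why the brackets matter.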
Exercise:
(a) Prove that “[P ⇔ Q] ⇔ [(P ⇒ Q) ∧ (Q ⇒ P )]”.

(b) Prove that “[¬(P ⇒ Q)] ⇔ [P ∧ (¬Q)]”.


It should also be noted that ⇒ has a very specific meaning. Do not use it as a shorthand
for the words “thus”, “therefore”, etc.

Quantifiers. Some mathematical phrases contain a variable. For example, “x+1=6” con-
tains the variable x. We could denote such a statement in general by P (x). Note that, while
I just called this a statement, it is not (yet) a statement in the sense of the statements above,
because we cannot (yet) assign it a truth value: we do not yet have enough information
about the variable x. There are three options here:
1. The variable x could have been defined earlier in the same mathematical argument. For
example, if we already know that x = 4, then “x+1=6” does have a truth value (“false”).

2. We might be intending to say that there exists an x (within whatever our set of interest
is) for which P (x) is true. In that case we use the quantifier ∃ (“there
exists”). In our example above we could write “∃x ∈ R P (x)” (i.e. “∃x ∈ R x + 1 = 6”)
which would be true. Note that for the determination of the truth value it is important
which set we are considering. For example, “∃x ∈ R \ Z x + 1 = 6” is false.

P ⇔ Q | P = T | P = F
------+-------+------
Q = T |   T   |   F
Q = F |   F   |   T

Table 4: The truth value of the statement “P ⇔ Q”

3. We could also be intending to say that P (x) is true for all x (in the set of
interest). In that case we use the quantifier ∀ (“for all”). For example, “∀x ∈ R P (x)”
(i.e. “∀x ∈ R x + 1 = 6”), which is false. Note again that the specification of the set is
very important for the determination of the truth value. The statement “∀x ∈ {5} x + 1 = 6”
is true.

In the second and third case above, we say that the variable x in P (x) is unbound. In the first
case we say it is bound. A statement P (x) which contains one (or more) unbound variable(s)
cannot be assigned a truth value. From the point of view of (first-order) logic it is a meaningless
statement and should not be used. All variables in your statements should be either bound or,
if they are unbound, should be quantified over.
This is also a good point to emphasise again that there is a difference between a statement
which is false, such as “∃x ∈ R \ Z x + 1 = 6”, and a statement to which no truth value can be
ascribed, such as “x+1=6” (in the absence of a prior definition of x). The former can be used
(and indeed can be useful; for example, when constructing a proof by contradiction, the goal is
to arrive at a statement which is false, namely a contradiction), the latter has no place in a
mathematical argument.
As the above suggests, there are only two quantifiers you will need: ∀ (“for all”) and ∃
(“there exists”). It is in the best interest of clarity if you stick with these names as other ways
of pronouncing these operators can be ambiguous. For example, some people use the phrase “for
a(n)” to mean “there exists” (as in, “for an x ∈ R, x + 1 = 6”), while others use it to mean “for
all” (as in “for an x ∈ R, x² ≥ 0”). Similar ambiguity arises when using the phrase “for some”.
Avoid such phrases and their associated ambiguity and just use “for all” and “there exists”.
Sometimes you might also see ∃! (“there exists a unique”). The notation “∃!x ∈ S P (x)” (where
S is a set) is convenient shorthand for “∃x ∈ S [P (x) ∧ ∀y ∈ S (P (y) ⇒ y = x)]”. In words:
“there exists an x ∈ S such that P (x) is true and if P (y) is true for y ∈ S, then y is equal to
x”, i.e. “there exists a unique x ∈ S such that P (x) is true”.
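As a small illustration: the statement “∃!x ∈ R x + 1 = 6” is true, since x = 5 is the only real number satisfying x + 1 = 6, whereas “∃!x ∈ R x² = 4” is false, because both 2 and −2 satisfy x² = 4, so a solution exists but it is not unique.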
When there are multiple variables in a statement and for some variables ∀ is used, while ∃
is used for others, then the order in which the quantifiers are placed is very important! The
statements “∀x ∈ R ∃y ∈ R y = x” and “∃y ∈ R ∀x ∈ R y = x”
mean very different things. In fact, the former is true, while the latter is false. Do you see why?
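(To see why: given any x ∈ R we may choose y := x, which shows that “∀x ∈ R ∃y ∈ R y = x” is true; but “∃y ∈ R ∀x ∈ R y = x” claims that one single real number y is equal to every real number x at once, which already fails when we compare y with the two different numbers 0 and 1.)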
It is also good form to put quantifiers at the start of a statement, not at the end, so
“∀x ∈ R x² ≥ 0” instead of “x² ≥ 0 ∀x ∈ R”. This is something to which many mathematicians
do not even stick (probably because when pronouncing the statement, it is much more natural
to say “x² is greater than or equal to 0 for all x in R” instead of “for all x in R, x² is greater
than or equal to 0”), but when writing mathematical/logical statements the preference is to put
quantifiers at the start. It also helps in making the order of the quantifiers clearer when there
is more than one.
It is important to know how quantifiers interact with negations (a short worked example is given after the two rules below). If S is a set and P (x) is a
statement dependent on an unbound variable x, then

• [¬(∀x ∈ S P (x))] ⇔ [∃x ∈ S (¬P (x))];

• [¬(∃x ∈ S P (x))] ⇔ [∀x ∈ S (¬P (x))].
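As a short worked example, consider the (true) statement “∀x ∈ R ∃y ∈ R y > x”. Applying the first rule and then the second, its negation satisfies

¬(∀x ∈ R ∃y ∈ R y > x) ⇔ ∃x ∈ R ¬(∃y ∈ R y > x) ⇔ ∃x ∈ R ∀y ∈ R ¬(y > x),

i.e. it is equivalent to “∃x ∈ R ∀y ∈ R y ≤ x”, which asserts that R has a largest element and is therefore false, as the negation of a true statement should be.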

1.2.2 The axioms


From what place do we start? There are two different answers here, or really, there are two
different questions: “From what place does (modern day, rigorous, standard) mathematics
start?” and “From what place do we start in these notes?”

The short answer to the first question is “from the axioms of set theory”. The longer answer
is very interesting, but falls very much outside the scope of this module. If you are interested
in finding out more about these areas, grab a book about set theory and the foundations
of mathematics, or take a look online, and get lost in the wondrous world that underlies
mathematics. You have only seen the tip of the iceberg in your exploration of set theory (and
its role in providing a foundation for mathematics) in G11ACF/MATH1005.
The second question is of more immediate importance to us now. The in-depth answer will
be given in Section 1.3, where we give an overview of the basic mathematical concepts and
objects which we expect you to be familiar with and which we use as a starting point in these
notes. As a short answer, we can say that we expect you to know basic set theory, to be familiar
with the notion of a function, with some specific sets of numbers (such as N, Z, Q, and R),
and with the additional algebraic structures which we can put on such sets (such as addition,
multiplication, subtraction, and division between elements of R).

1.2.3 Notation
When choosing the notation in which to express your mathematical ideas, there are
two key elements to keep in mind: correctness and clarity.
Correctness has to do with the mathematical content. Your logical argument should be
valid and its mathematical content true. All the steps in your argument should be explained
in sufficient detail. Correctness of your argument should be your first concern. Clarity of
presentation, however, is also very important. If you have constructed a correct mathematical
argument, but you cannot present it in such a way that your intended audience (whether that
is a fellow student or the marker of your exam, or someone else) understands it, then that poses
a problem. If nothing else, it will definitely cost you points on your exam, since the marker can
only give points for what can be understood. Here is a (non-exhaustive) list of advice to keep
in mind:

1. You should always keep your intended audience in mind. When you explain a piece of
mathematics, it makes a difference if you do so to your friend who is not a mathematics
student, to your other friend who is a mathematics student, to the marker of your course-
work or exam in your written solutions, or to the rest of the mathematical community in
a research paper. Who your audience is will in large part determine how many and
which details you should give and which you can leave out.

2. You should always define your notation, before you use it, unless it is standard notation
(and what counts as standard notation is of course again quite heavily dependent on who
your audience is). For example, on an exam you can usually use the plus sign (+) without
any explanation. Any exam marker will know what that means (unless perhaps you are
taking an exam in group and ring theory and you need to specify what your “addition”
operation means). But if what you are adding together are x and y, don’t forget to define
first what x and y are. For example, “let x, y ∈ R, then x+y ∈ R” or “let x = 4 and y = 2,
then x + y = 6” are both statements with the right amount of explanation of notation,
whereas “x + y = 6” by itself, without prior mention of x or y, is not. This might seem
obvious right now, but neglecting to explain notation is a very commonly made mistake.
Especially when in the notes a particular notation has been frequently used, students often
tend to assume that the notation is standard. For example, even though f is a commonly
used letter to denote a function, you would still need to announce that you are using it for
that purpose. For example, you could write “let f : R → R be a function...” to indicate
f is a function with domain and codomain both equal to R. Do not just use f without
prior introduction, hoping that your reader will somehow understand what you mean by
it.

3. Do not be afraid to use English words, but make sure that your statements are not am-
biguous. Strictly speaking, you should be able to express any rigorous mathematical
argument completely in the formal language of logic. If you look at mathematical text-
books or papers (as well as these notes), you will see, however, that this is usually not how
mathematical results and proofs are presented. This is because (and I hope this does not
shock you too much), mathematicians are humans. Even a simple proof can be very long
and very hard to read if it were written completely in formal language. We tend to find it
much easier to understand texts which are written in a natural language (such as English,
or whichever other natural language we can understand). In fact, most mathematicians
would have a hard time understanding a proof that was presented to them completely
in formal language. So use English! Words such as “thus”, “therefore”, “because”, “if”,
“then”, “and”, and “or” are really useful to indicate the structure of your argument, as
long as you are careful to use them correctly. Words such as “and” and “or” and a con-
struction such as “if ... then ...” have a very specific meaning in a mathematical context
(see Section 1.2.1), so make sure you use them correctly! When you read and study the
proofs in these notes, you will hopefully get an idea how natural language can be employed
both to elucidate the structure of your argument and as a sort of shorthand to convey
ideas that would be very long and very hard to read were they written down in formal
logical notation. The downside of using natural language is that it is less precise than
formal language (which is why formal languages are used in logic) and so you need to
be extra careful that your statements are clear and not ambiguous. A necessary, but
not sufficient, condition for this to be the case is to use correct grammar! If a
sentence cannot be properly understood from a grammatical point of view, it is unlikely
that it will be able to convey mathematical meaning with the required level of detail and
precision. This is something that many students dislike, but it is true nonetheless: You
need to write correct English9 in order to write correct mathematics. Details
are very important in mathematics and your writing should reflect this.

1.2.4 Definitions, Lemmas, Theorems, Propositions, and Corollaries


Reading mathematics, you will frequently encounter definitions, lemmas, theorems, proposi-
tions, and corollaries. But what do these words mean and what are the differences between
some of these terms?

• Definition. The definition of a mathematical concept is the precise and complete descrip-
tion of the meaning of the concept. It is very important to realise that a mathematical
term or symbol means exactly what it says in its definition, no more and no less. This is
a difficult concept to grasp for many students, especially when the term which is being
defined also has a meaning in everyday English. It is crucial in that case to forget (as
it were) the everyday meaning of the word and only use the term (in a mathematical
context) in a way which is consistent with its definition. For example, in Definitions 8.1
and 9.1 in these notes, we define the concepts “open set” and “closed set”. Even though
the words “open” and “closed” are also very common in everyday speech, they have very
specific and precise meanings in a mathematical context (as given by the definitions) and
[Footnote 9: both in your current role as a student at an English university, as well as potentially later in life as a professional mathematician, since English is the language in which mathematical ideas are currently communicated and published in the international community.]
you should not let their everyday usage influence the way you use them in mathematics.
For example, in everyday English “open” is usually the opposite of “closed”. This is not
the case for their mathematical meanings. The meaning of a mathematical term or
concept is completely determined by its definition. You should not bring extra
meaning which isn’t present in the definition, nor forget about things that are part of its
definition.
There is an issue of language related to definitions which is worth pointing out. For
example, have a look at the first definition we will encounter in these notes, the definition
of a sequence of real numbers in Definition 3.1. It says “a sequence (xn ) in R is a
non-terminating ordered list of real numbers”. We could also have phrased this as “if
(xn ) is a non-terminating ordered list of real numbers, then (xn ) is a sequence in R”
or “(xn ) is a sequence in R if (xn ) is a non-terminating ordered list of real numbers”
or “(xn ) is a sequence in R if and only if (xn ) is a non-terminating ordered list of real
numbers”. We know from Section 1.2.1 that strictly speaking these sentences all mean
different things, after all the statements “P ⇒ Q”, “Q ⇒ P ”, and “P ⇔ Q” are all
substantially different. Definitions, however, are special beasts and they have a distinctly
separate logical standing. No matter which of the previous phrases is used, when it
happens in the context of a definition (and in no other case!) they all mean the same
thing; they are all understood to introduce mathematical meaning to the phrase “(xn ) is
a sequence in R”. Before the definition is given, this phrase has no meaning yet and thus
the setting in which we might say such things as “if P , then Q”, “if Q, then P ”, or “P iff
Q” is subtly, but importantly, different when dealing with a definition than when dealing
with any other mathematical context. Which formulation people use in definitions is a
matter of taste, but the meaning in the context of a definition is the same: the phrase
“(xn ) is a sequence in R” means “(xn ) is a non-terminating ordered list of real numbers”.
Nothing more and nothing less.
In the context of definitions it is also useful to talk about the difference between the
equality sign = and the symbol :=. The former indicates that two mathematical objects,
which have been defined before, are the same. The latter indicates that the symbol which
appears on the left hand side of := is defined to be equal to the object on the right hand
side (sometimes you might also see =: where the roles of the left and right hand side are
interchanged). So, for example, I can define x := 1 and y := 2, in which case x + 1 = y.
Not all authors use := for definitions. Some will just use = instead.

• Lemma and theorem. Lemmas and theorems have the same logical status. They
are both mathematical statements (or collections of mathematical statements) which are
proven. In a mathematical text a lemma or theorem is usually immediately followed by its
proof, although sometimes the proof precedes the result or the proof is deferred to a later
part of the paper or book. Where that is the case, it is usually because this improves the
clarity of the presentation. So, if there is no strict logical difference between a theorem
and a lemma, why do we have two different words for them? This is again for presentation
purposes. Theorems are usually the main results in a piece of mathematical writing. For
example, they might be the important new pieces of mathematics that form the core of
a paper. Lemmas are usually smaller results which are needed in order to finally get to
the important theorem. This does most certainly not mean that you can be less precise
or rigorous when proving a lemma than when proving a theorem. Absolutely not. You
might wonder why you would not just put every result which is needed for the theorem
into the proof of the theorem, instead of into a lemma. There can be various reasons for
this. Sometimes a lemma gives a result that is interesting in its own right, not just in
service of a theorem. Or perhaps it is a result that is used multiple times in the proofs
of other results and putting it in a lemma makes it easier to refer to later. Or perhaps
the proof of the theorem is just so long and complicated, that it becomes much more
readable by stating some subresults as their own separate lemmas. Finally, it should be
mentioned here that sometimes the results from lemmas turn out to be really important
and interesting by themselves, yet the name “lemma” sticks, such as with the famous
“Zorn’s lemma” and “Urysohn’s lemma”.

• Proposition A proposition has exactly the same logical status as a lemma and a theorem.
It is also a proven mathematical statement (or collection of mathematical statements).
Some authors use “theorem” and “proposition” interchangeably (or just use one of the
two), others use “proposition” for a smaller, independent, result (which makes it
similar to a lemma in the sense that it is not the major, main result in a text,
but different from a lemma in that it is not directly in the service of a theorem). In these
notes I have avoided the use of “propositions”.

• Corollary A corollary, finally, also has exactly the same logical status as a lemma, theo-
rem, and proposition: It is a proven mathematical statement (or collection of mathematical
statements). It is called a corollary to indicate that it is a result which follows immedi-
ately, or via a very short proof, from a theorem (or sometimes a lemma or proposition)
which was proven earlier. It is typically an interesting result which is worth mentioning,
but whose inclusion in the theorem from which it follows, would make the presentation a
bit more awkward; so it is given its own spotlight as a separate corollary. A good example
in these notes is given by Theorem 11.8 and Corollary 11.9. The theorem says that the
uniform limit of a sequence of continuous functions {fn } is continuous. The corollary
concludes that, in that case, the order of the limits n → ∞ and x → a can be
interchanged without changing the value of lim_{n→∞} lim_{x→a} f_n(x), which is a very useful
and interesting result in itself. (Don’t worry if you do not yet understand the mathematics
here; this will be explained later in the notes. It is brought up here just as an example of
the role of a corollary.)

1.2.5 Proofs
Proofs are the lifeblood of mathematics and the bread and butter of mathematical practice. A
(correct and rigorous10 ) proof is what shows a mathematical statement to be correct. Proofs
are essential to mathematics and in this module you will be required to understand proofs and
produce them yourself. Moreover, you will need to develop the mathematical maturity and
confidence to be able to judge if a proof (either your own or someone else’s) is correct and, if it
is not, to pinpoint where the error(s) is (are).
So what is a proof and how do you know if a proof is correct? A proof is a chain of logically
correct statements which starts from accepted starting points (which can be axioms or results
which have already been proven, or, in our case, the concepts and ideas which we expect to be
known at the start of the module — see Section 1.3). Not only coming up with a proof can
be difficult, but also checking if a proof you have devised is indeed correct can be a non-trivial
task (and it is one that you should practise as well, for example by reading —and checking—
very carefully other people’s proofs, such as those in these notes). A frequently asked question
is how much detail a proof should contain. In principle a proof should be reducible to a long
chain of very elementary logical steps, but in practice you don’t write all of these down. If you
had to provide the full details every time you used “1 + 1 = 2” your proofs would be very long
[Footnote 10: of course]
indeed. As we already discussed earlier in Section 1.2.3, the amount of detail to give depends
in large part on the audience you are writing for. In the specific case of an exam for this module,
you are writing for someone whose task it is to find out how well you understand the material
of this module. So details that relate to material of this module should be given, but details
that relate to earlier, more basic material can be assumed known. It is difficult to give hard
and fast rules for this, but hopefully you get a feeling for the right level of your proof writing,
by studying the proofs in these notes. They can serve as good examples, especially the proofs
later in the notes (early on in the notes I have often written out extra details in the proofs to
help familiarise you with certain concepts; later on in the notes the proofs often become more
concise).
Every year there are some students who seem to think that longer proofs are better proofs.
That is certainly not the case. There is nothing that can show your understanding of a proof as
effectively as a very short, concise presentation of the proof. Constructing a proof that contains
all the relevant details, but does not get bogged down in extraneous matters, requires clear
understanding of the material. Quite often the best proofs are among the shortest. This of
course does not guarantee that a short proof is good; do not leave out required, crucial steps!
It is always a good idea, after you have found a proof, to see if you can simplify the argument
or the presentation.
In these notes we will assume that you are familiar with the common proof techniques which
you learned in G11ACF/MATH1005. I will not go into a lot of detail here, but as a reminder,
let me list a few. Look back at your notes from Year 1 if you need a refresher on these different
methods. A small worked example of one of these techniques is given after the list.
1. You can prove a statement of the form “P ⇒ Q” via a direct proof, i.e. by assuming P
and using logical deduction to arrive at Q.
2. You can prove a statement of the form “P ⇒ Q” also by proving its contrapositive,
“¬Q ⇒ ¬P ”, which is an equivalent statement (prove this, using the truth tables from
Section 1.2.1). Do this by assuming ¬Q and proving ¬P.
3. You can prove a statement of the form “P ⇒ Q” also via a proof by contradiction:
assume P ∧ (¬Q) and deduce a contradiction. If you did Exercise 1.2.1 in Section 1.2.1
you should be able to understand why this is a valid proof technique.
4. You can prove a statement of the form “P ⇔ Q” by proving both “P ⇒ Q” and “Q ⇒ P ”.
5. You can prove a statement of the form “∀n ∈ N P (n)” by mathematical induction, i.e.
by proving both “P (1)” and “∀k ∈ N [P (k) ⇒ P (k + 1)]”.
6. If S is a set, you can prove a statement of the form “∀x ∈ S P (x)” by proving “P (x)” for
a fixed, arbitrary element x from S. This means that you can only use properties of x in
your proof that are shared among all elements of S. Proofs of this nature typically start
with the phrase “Let x ∈ S.”
7. If S is a set, you can prove a statement of the form “¬[∀x ∈ S P (x)]” by proving the
statement “∃x ∈ S (¬P (x))”, i.e. by providing an x ∈ S which serves as a counterexample
to “P (x)”. See also the end of Section 1.2.1. In practice a counterexample can be useful
to disprove a statement of the form “P ⇒ Q”; in that context a counterexample is an
example for which P is true, but Q is false (it satisfies all the assumptions, but not the
conclusion of the statement).
8. If S is a set, you can prove a statement of the form “¬[∃x ∈ S P (x)]” by proving the
statement “∀x ∈ S (¬P (x))”. See also the end of Section 1.2.1.
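As a small worked example of technique 2, consider the statement “for all n ∈ N, if n² is even, then n is even”. Its contrapositive is “if n is not even (i.e. n is odd), then n² is not even (i.e. n² is odd)”. So let n ∈ N be odd. Then n = 2m + 1 for some m ∈ N ∪ {0}, and n² = 4m² + 4m + 1 = 2(2m² + 2m) + 1, which is odd. This proves the contrapositive, and hence the original statement.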

Which proof technique is necessary or useful in any given situation is up to you to find out. Again
the only piece of real advice that can be given here is: practise, practise, practise! Experience
will help you choose an appropriate approach.

1.2.6 Scrap paper


It might be strange to encounter a section about something seemingly as trivial as scrap paper,
after heavy topics like logic and proofs, but there is a very good reason to include this section
here. Up until now we have been occupied with the formal aspects of mathematics, with making
sure that once we have a good mathematical idea, we know how to write it down correctly and
how to check that it is indeed correct in the first place. But good mathematical ideas do not
appear magically out of nowhere fully formed in your (or mine, or anyone’s) mind. Coming up
with good ideas requires practice, trial and error (lots and lots of trial with plenty of error),
(self-)correction, questions for and conversations with others, and sometimes simply a stroke of
good luck. Often geometrical representations or visualisations of the problem, such as plots of
graphs, can also help to guide your thoughts. All the rules and writing tips which we discussed
in the sections above relate to the final, clean versions of your mathematical ideas, the versions
which you hand in to be marked (and the versions which professional mathematicians send off
to be peer reviewed and, hopefully, published). But those polished, final versions, are rarely
the first thing you put to paper. Like any creative endeavour, the final output is the result of
many attempts, some of which may fail completely, while others move you a little further in the
right direction. Just like your favourite band didn’t go from scratch to fully recorded album in
one try, you usually won’t be able to write down a fully complete, rigorous, well-written and
readable proof in one go. Scrap paper is your friend! Try things, talk to others, try and explain
your solution (or your difficulties!) to your fellow students. The final part of the process, that is
the turning of an idea into a mathematically rigorous proof, depends on the strict logic we have
discussed, but the part before that, the part when you generate the ideas in the first place, is
completely open. Whichever way you prefer to navigate the mathematical landscape is up to
you, as long as eventually the ideas you generate stand up to rigorous scrutiny and you write
them down properly in the ways discussed before for others to read and understand.
It is also good to keep in mind that the material in these notes and in any textbooks you
might read didn’t spring into being overnight. Mathematical analysis is a field in mathematics
with a long and interesting history. The results, proofs, ideas, and concepts you will learn
about in these notes are the product of many mathematicians over many centuries coming up
with new ideas, discussing them with others, and gradually fitting everything into a rigorous
framework. Moreover, the specific ways in which these results are presented in these notes have
come about through many years of my colleagues and myself teaching this material and trying
to find appropriate ways in which to present it. None of this is simple, or trivial, and there is
always more to say about any topic if you dig deeper, even more than is already presented in
these notes. So you shouldn’t be discouraged if it takes you some time to get to grips with this
material. Keep at it, keep trying, keep learning from mistakes, and then there is a really good
chance you will get there. You have the mathematicians of the ages helping you.

1.3 What do we use as our starting point?


Since our goal is to build up our theory of mathematical analysis rigorously from the ground
up, we have to decide what “the ground” is which we start from. As we discussed very briefly
in Section 1.2.2, we will not actually start all the way at set theory, but rather we assume some
higher level concepts which you are already familiar with from Year 1 (or possibly even from
before that). In this section we list these concepts. Of course (as has been made very clear
to you by now, hopefully) on top of familiarity with these concepts, we also expect you to be
comfortable with the relevant logical concepts and constructs that have been discussed so far.
It should also be mentioned that, since G11ACF/MATH1005 and G11CAL/MATH1006 are
necessary prerequisites for this module, we expect you to know the material from those modules.
Not all the material in those modules was rigorously justified in Year 1, however, and so we
cannot take those modules as starting points for mathematical analysis. We need to take a
few steps back and start building our rigorous theory from much more fundamental building
blocks. You might therefore see in these notes that sometimes we repeat proofs from Year 1, or
we prove results which you have seen in Year 1, but which were not proven rigorously at that
point. Of course you will also see plenty of new material that you have not yet seen at all!
These are the ingredients which we will consider our basic starting points.

1. The notions and notation from basic set theory, such as: “set” (and the various ways of
denoting and defining a set), “empty set” (∅), “element of” (∈), “subset” (⊂, see Footnote 11), “superset”
(⊃), “union” (∪), “intersection” (∩), “complement” (both B \ A and A^c, where A and
B are sets; do you understand the difference between these two notations?), “Cartesian
product” (×), “A meets B” (i.e. A ∩ B ≠ ∅), de Morgan’s laws (if A and B are subsets
of the same set X, then (A ∪ B)^c = A^c ∩ B^c and (A ∩ B)^c = A^c ∪ B^c).
Note that unions and intersections are not restricted to two (or even countably many)
sets. If X and I are sets, and, for all i ∈ I, A_i ⊂ X, then we can define the union

⋃_{i ∈ I} A_i := {x ∈ X : there is an i ∈ I such that x ∈ A_i}

and the intersection

⋂_{i ∈ I} A_i := {x ∈ X : for all i ∈ I, x ∈ A_i}.

2. The notion of “cardinality of a set”, specifically the difference between a finite set, a
countably infinite set, and an uncountably infinite set.

3. The following specific, often used, sets and the standard algebraic operations (and corre-
sponding symbols: +, −, /, ·) and orderings (and corresponding symbols: ≤, <, ≥, >)
which can be defined on them: N, Z, Q, R. Also standard properties, such as “parity”
(“even” or “odd”) for elements in N and Z, “sign” (“positive”, “negative”) of all non-zero
numbers in N, Z, Q, R, and properties such as “non-negative” or “non-positive” for all
numbers in those sets.

4. The meaning of the equality sign =.


5. The meaning of the sum and product symbols ∑ and ∏.

6. Notation for specific numbers, such as √2, π, and e in R.

7. Specific subsets of R, such as the “intervals”: ∅, (a, b), [a, b], (a, b], [a, b), (−∞, a), (−∞, a],
(a, ∞), [a, ∞) (where a, b ∈ R). The empty set ∅ and the interval [a, a] = {a} (which is
a “singleton”, i.e. a set containing exactly one element) are called “degenerate intervals”.
Any interval containing two or more elements is called “nondegenerate”. In all the intervals
[Footnote 11: Some authors prefer the notation A ⊆ B to denote that A is a subset of (and possibly equal to) B and they will use A ⊂ B to denote A being a strict (sometimes also called proper) subset of B (i.e. A ⊆ B and A ≠ B). I will not use that notation in these notes. I will use A ⊂ B to denote that A is a subset of (and possibly equal to) B and, where it is needed, I use A ⊊ B to denote A being a strict subset of B.]
above, (where present) a, b, −∞, and ∞ are “endpoints” of the
interval. (Note that −∞ and ∞ are of course not ‘points’ or numbers in any usual sense
of the word.)

8. Terminology such as “bounded”, “bounded above”, and “bounded below” for subsets of
R.

9. The concepts of “maximum” (max), “supremum” (sup, least upper bound), “minimum”
(min), and “infimum” (inf; greatest lower bound) for subsets of R.

10. The density of Q in R: For all x, y ∈ R with x < y there exists a q ∈ Q such that q ∈ (x, y). The
density of Q^c in R: For all x, y ∈ R with x < y there exists a p ∈ Q^c such that p ∈ (x, y).

11. The so-called “completeness property” or “greatest lower bound property” (and the equiv-
alent “least upper bound property”) of nonempty subsets of R: every nonempty subset of
R which is bounded above (below) has a least upper bound (greatest lower bound).

12. The definition of a “function”, f : A → B (or f : A → B, x ↦ f (x)), where A and B are


sets. Also the definition of function related concepts such as “domain” (A, if f : A → B),
“codomain” (B, if f : A → B), “image of a subset of the domain” (f (C), if C is a subset of
the domain), “range” (f (A), if A is the domain), “pre-image of a subset of the codomain”
(f −1 (D), if D is a subset of the codomain), “injective” (or “one-to-one”), “surjective”
(or “onto”), “bijective”, “inverse function” (f −1 , if it exists; avoid confusion with the
pre-image!).

13. Operations that can be performed on functions f and g (and the conditions under which
they can be performed!), such as addition (f + g), subtraction (f − g), multiplication (f g),
division (f /g) and composition (f ◦ g) of functions.12

14. Specific types of functions: constant functions, the identity function, polynomials, (in-
cluding multivariate polynomials), (multivariate) rational functions, square root function.

15. Roots of a polynomial (i.e. solutions x to the equation p(x) = 0, for a polynomial p) and,
for polynomials of degree 1 and 2, how to find them.

16. The factorial n! of a natural number n ∈ N. Also the binomial theorem, which says that,
for all x, y ∈ R and for all n ∈ N,
(x + y)^n = ∑_{k=0}^{n} \binom{n}{k} x^{n−k} y^k,

where \binom{n}{k} = n!/((n − k)! k!) for 0 ≤ k ≤ n (k ∈ N ∪ {0}).
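As a quick check of the formula, take n = 3: the binomial coefficients are \binom{3}{0} = 1, \binom{3}{1} = 3, \binom{3}{2} = 3, and \binom{3}{3} = 1, so the theorem gives (x + y)³ = x³ + 3x²y + 3xy² + y³, which agrees with expanding (x + y)(x + y)(x + y) directly.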

You might have noticed in point 14 above that we did not include trigonometric functions
(sin, cos and any functions derived from those) or the exponential function x ↦ e^x (or any functions
derived from it, such as its inverse, the natural logarithm log) in the list. This is because these
functions have not been rigorously defined in your Year 1 modules and we do not need them
to build our theory in this module (so we won’t spend the time to define them rigorously here
either). Thus, we will not use them as building blocks for our theory. You will notice, however,
12
You might want to also list here multiplication of a real-valued function f by a scalar c ∈ R (cf ) and addition
of a function and scalar (f + c), but those can be seen as special cases of multiplication and addition of functions,
respectively, by interpreting the scalar c as a constant function.

that sometimes we will use them in examples to illustrate the theory. This is because these
functions are often quite useful in that context and your non-rigorous knowledge of them will
suffice to understand the examples. If this lack of rigour worries you, then (1) very good! you
start to think like a mathematician, and (2) for the sake of this module rest assured that these
functions can be rigorously defined, even if we do not do so in this module, and feel free to look
up their rigorous foundations yourself.

1.4 Rough lesson plan


The semester has 22 one-hour lecture slots. It is important to attend the lectures and preferably
come in prepared, having read part or all of the sections of the notes that relate to the lecture,
so that you get more out of the lecture and can ask specific questions. Attending the lectures,
however, is only the beginning, not the end of your involvement with this module.
There is only so much we can talk about in 22 hours. Details are very important in this
module, because details are what make or break a proof. We will not have time to go through
all the details in the lectures, so it is very important (and expected) that you will be reading
and studying these notes as well. Table 5 gives a rough schedule of what I expect to be able to
do in the lectures. This is not set in stone. Some days we will be going a bit faster than the
schedule and on other days (more likely) a bit slower. This schedule is just meant to give a general
overview of what to expect.

Lectures | Section(s) | Topics | Broader topic
Year 1 background modules | 1 | required background | background and introduction
1–2 | 2 | introduction to the module; motivational examples (why do we need rigour?) | background and introduction
2–3 | 3 | sequences in R | sequences
3–5 | 4, 5 | distance in Rd; sequences in Rd | sequences
5–6 | 6 | subsets of Rd and their boundaries | subsets of Rd
6–7 | 7 | interior points of subsets of Rd | subsets of Rd
7–8 | 8 | open subsets of Rd | subsets of Rd
8–9 | 9 | closed subsets of Rd | subsets of Rd
9–12 | 10 | continuous functions on subsets of Rd | functions on subsets of Rd
12–13 | 11 | convergence of sequences and series of functions | functions on subsets of Rd
14–15 | 12 | functions on the real line | functions on subsets of the real line
15–18 | 13 | differentiability on the real line | functions on subsets of the real line
18–20 | 14 | the Riemann integral | functions on subsets of the real line
21–22 | – | extra time if we have not finished all material in 20 lectures; revision; final chance for questions about any part of the module! | buffer and revision

Table 5: Tentative schedule

You will have noted that these notes also contain two appendices. Appendix A contains
mostly material that should already be familiar from your Year 1 modules, but which you
might not have seen presented in full detail and completely rigorously before. This appendix
aims to provide the same level of rigour for that material as for the material in the main
text. It is not optional material, but rather it provides a new look at (what should be) familiar
material.
Appendix B gives optional material. This appendix contains proofs of results from the main
text. You should know the results, but will not be asked to reproduce those proofs. That being
said, with a few exceptions (such as Section B.2, which uses complex numbers which fall outside
the scope of this module), most of the material that is presented in Appendix B is made up of
elements that are all part of the main (non-optional) content of the module. It definitely will
not hurt to read, study, and try to understand the material in Appendix B. It could help you
to better understand the main material of the module.
Of course attending the lectures and reading the notes will not get you all the way to
understanding the material. The only way to truly learn mathematics is to do it! To repeat an
oft-quoted sentiment expressed by the mathematician George Pólya: “Mathematics, you see, is
not a spectator sport. To understand mathematics means to be able to do mathematics.” [1]
The main source of practice material for you will be the exercise sheet(s) that are part of this
module’s lecture material and which you can work on in the workshops and at home. You will
be able to hand in some of the exercises to get feedback. When you are working on the exercises,
do not turn to the solutions too quickly! A vitally important part of doing mathematics is the
struggle. As was mentioned before, it is absolutely normal to have to wrestle with this material
and to make plenty of mistakes as you go along. That is all an indispensable part of the learning
process. I would advise you to try to solve the exercises for at least a week or two before looking
at any solutions. And by “try”, I do not mean “look at it once and then wait two weeks”, but
actively think about it, discuss it with fellow students, ask the people involved with the module
questions about it.
It is also good to note that some of the questions on your exercise sheets (as well as the
solutions) are written by different people (and not all of them the author of these notes). So
you might run into presentation styles that you like better or worse than others and hopefully
you will find a style that you are comfortable with yourself, both to read and to write
in yourself. That is absolutely fine, as long as you have the required rigour and mathematical
correctness in your writing. In fact, I strongly encourage you to pick up textbooks from the
library as well, both to study the theory and to find more exercises to practice. There is a list
with suggested titles on Moodle, which has been compiled by various people who have taught
this module throughout the years, but most if not all books about mathematical analysis will
contain the material of this module. So take a visit to the library, look at the analysis books,
and pick one or two whose writing style you like. Seeing the same material from a different
point of view can be really helpful!
Let me also address the issue of examples. Examples are found all through these notes and
they are usually intended to illustrate some important point or show some application of the
general theory to a specific case. These notes are, however, first and foremost a place where
the theory is being explained. Sometimes students misinterpret this as “there are no (or not
enough) examples”. It is important to realise, however, that the main focus of this module is
on proofs. This module requires you to be able to understand and produce proofs yourself.
So every single proof in these notes can be seen as an example. When you have studied these
notes for a few weeks, you will (hopefully) come to realise that the structure of many of the
proofs in these notes is very similar. Of course not all proofs are exactly the same, but they
are often actually not as different as you might think at first. Scratch beyond the surface and
you will discover many commonalities. If you can come to realise that many of the proofs in
these notes share a common structure, you are on the right path. If you see each proof as its
own separate thing, these notes will appear overwhelming, but they do not need to be! Look

for the big picture in the details. These notes will try to guide you along the way to see this big
picture, but it does require effort on your part: For each new result and proof, try to fit it into
the overall structure that is being developed. Make sure you understand why it is relevant, how
it fits, and why the way the proof is built is a natural way of going about proving the result.
Finally, before we get to the main part of these notes, let me put your mind at ease if you
think these notes have too many pages. A large part is taken up by the introduction and the
appendices. The main part of these notes (Sections 2–14) only runs from page 20 to page 113.

2 What is mathematical analysis and why are we interested in
it?
Mathematical analysis (or “analysis” for short) studies mathematical concepts that have to do
with limits, especially in the context of functions (and in this module specifically, real-valued
functions). Such concepts include differentiation, integration, sequences, and series. This study
has a long history, but the foundations for its current modern form were laid in the 17th
century with the development of analytic geometry and calculus. In the following centuries a
lot of work was done expanding the theory and introducing new concepts that nowadays we
take for granted (such as “function”). In the 19th century mathematicians started realising that
there was a need to be really rigorous when dealing with these limit concepts (and with notions
such as “function” and “set”). In this section we will have a look at some examples of what
goes wrong if we are not fully rigorous!
The examples in this section contain limits, sequences, series, and derivatives. If you think
that means I am not following the rules I laid out in Section 1, because I am using concepts I have
not rigorously defined yet, then you are right. I am presenting these examples, however, in order
to motivate the need for rigour. We have not yet started our systematic build-up of our theory
(that will start in Section 3). Since you should have at least a working (if perhaps not entirely
rigorous) knowledge of limits, derivatives, sequences, and series from G11ACF/MATH1005, you
should still be able to understand these examples and hopefully pick up on the need for rigour
which they are intended to illustrate.
Limits are at the heart of many ideas and concepts from mathematical analysis. Limits are a
way to rigorously deal with concepts such as “infinitesimally small quantities” (a quantity less
than every other positive quantity, yet still larger than zero) and “infinitely large quantities” (a
quantity larger than every other quantity), which were used in the early days of analysis and
calculus. They might, at first glance, have an intuitive appeal, but it should not take too long to
realise that it is problematic to give precise definitions of such concepts13 and, as it turns out,
it is not necessary to work with these concepts. Everything we would want to do with them in
analysis, we can do with limits, which can be rigorously defined as we will see in this module.
(In fact, you have seen the rigorous definition of a limit already in G11ACF/MATH1005. Do you
remember it?) The best way to ensure that you make plenty of mistakes in analysis is to treat
∞ (or an infinitesimally small quantity) as a number. It is not one, and every time we encounter
the symbol ∞ we need to carefully keep in mind how it is defined in the context in which we
are encountering it.
Now let us turn our attention to the examples. We will review some of the basic limits from
G11ACF/MATH1005, and in particular look at what happens when these are combined.
In G11ACF/MATH1005 you encountered various different types of limits, such as the limit
of a sequence of numbers and the limit of a function. An example of the former is
\[ \lim_{n\to\infty} \frac{3n+4}{4n+7} = \lim_{n\to\infty} \frac{3+4/n}{4+7/n} = \frac{3+0}{4+0} = \frac{3}{4} \]
(where we used the algebra of limits from G11ACF/MATH1005). As an example of the latter,
can you compute
\[ \lim_{x\to 0^+} e^{-1/x}\,? \]
We will have a very close look at the rigorous definitions of these two types of limits and the
validity of the algebra of limits in Sections 3 and 10 and in Appendix A, but for now, let us ask
13
In fact, there is a branch of mathematics called “nonstandard analysis” which (successfully) attempts exactly
that. This, however, falls far outside the scope of this module.

Figure 1: Plot of the function g100 from (1) (plotted with MAPLE)

the question “what happens if we mix these ideas?” Are


   
n n
lim lim x and lim lim x
n→+∞ x→1− x→1− n→+∞

the same? In other words, does it make a difference if we first compute the limit for x → 1− and
then the limit for n → ∞, or if we first compute the limit for n → ∞ and then for x → 1−? If
you do the computations correctly, you will find that in this case we do find two different values,
so it does make a difference! A new question we can now ask is “Does it make a difference in
every case, or does that depend on the functions (and limits) under consideration?” It turns
out the answer is closely related to the difference between pointwise and uniform convergence
of sequences of functions, which we will study closely in Section 11.
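One way the computation can go (a sketch; fill in the details yourself): for each fixed n ∈ N we have
lim_{x→1−} x^n = 1, while for each fixed x ∈ (0, 1) we have lim_{n→∞} x^n = 0. Hence
\[ \lim_{n\to+\infty}\left(\lim_{x\to 1^-} x^n\right) = \lim_{n\to+\infty} 1 = 1, \qquad \lim_{x\to 1^-}\left(\lim_{n\to+\infty} x^n\right) = \lim_{x\to 1^-} 0 = 0, \]
so the two iterated limits indeed differ.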
Another situation in which you have seen limits being used in G11ACF/MATH1005 is in
the definition of the derivative of a function. For example, if f : R → R, x 7→ x2 , then the
derivative of f at a ∈ R is

\[ f'(a) = \lim_{x\to a} \frac{f(x)-f(a)}{x-a} = \lim_{x\to a} \frac{x^2-a^2}{x-a} = \lim_{x\to a} (x+a) = 2a. \]

What happens if we combine this concept with that of sequences of functions?


For x ∈ R and n ∈ N = {1, 2, . . .} consider the functions gn : R → R and g : R → R defined
by
\[ g_n(x) := \frac{nx}{1+n^2x^2}, \qquad g(x) := \lim_{n\to\infty} g_n(x). \tag{1} \]

Figure 1 shows one of the functions gn . Does g_n'(0) tend to g'(0) as n → ∞?
If you did the calculation above correctly, you will have found that the answer is “no”. In
Section 11 we will learn that not even uniform convergence of the sequence {gn } is enough
to guarantee that the derivative of the limit function is equal to the limit of the derivative
functions.
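For comparison, here is one way the calculation can go (a sketch; treat it as something to check rather
than as given): for every fixed x ≠ 0 we have |g_n(x)| = n|x|/(1 + n²x²) ≤ n|x|/(n²x²) = 1/(n|x|) → 0, and
g_n(0) = 0, so g ≡ 0 and hence g'(0) = 0. On the other hand,
\[ g_n'(x) = \frac{n(1+n^2x^2) - nx \cdot 2n^2x}{(1+n^2x^2)^2} = \frac{n(1-n^2x^2)}{(1+n^2x^2)^2}, \qquad \text{so} \qquad g_n'(0) = n \to +\infty \neq 0 = g'(0). \]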
Let us have a look at another type of limit: a series of numbers. For example, consider the
series
\[ T := \sum_{n=1}^{\infty} \left( \frac{1}{n} - \frac{1}{n+1} \right) = \lim_{N\to\infty} \sum_{n=1}^{N} \left( \frac{1}{n} - \frac{1}{n+1} \right). \]

Figure 2: Plot of the sum of the sine series S(x) up to the n = 20 term (plotted with MAPLE)

This series converges and we can exactly compute the value of this series.
\[ T = \lim_{N\to\infty} \left( 1 - \frac{1}{2} + \frac{1}{2} - \frac{1}{3} + \dots + \frac{1}{N} - \frac{1}{N+1} \right) = \lim_{N\to\infty} \left( 1 - \frac{1}{N+1} \right) = 1. \]

When dealing with series of functions we can get some very counterintuitive examples.
Suppose we take x ∈ (−π, π) and look at

\[ S(x) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}\sin(nx)}{n} = \lim_{N\to\infty} \left( \sin x - \frac{\sin(2x)}{2} + \frac{\sin(3x)}{3} - \dots + \frac{(-1)^{N+1}\sin(Nx)}{N} \right). \]

By Dirichlet’s test for the convergence of a series, we know this series converges. We present
the details of this argument in Appendix B.1. What the value is of the series S(x) for a given
x ∈ (−π, π) is not obvious at all. Figure 2 shows a partial sum (where instead of taking N → ∞,
we have set N = 20). In Section B.2 in the Appendix we present a proof which shows that, for
all x ∈ (−π, π), S(x) = x/2. This result and proof is completely optional in the context of the
current module and it uses techniques that go beyond the scope of this module.
This series is an example of a Fourier sine series. We will not study these in great de-
tail in this module; they feature in the module G12DEF/MATH2008 (Differential Equations
and Fourier Analysis) and are very important tools in applied mathematics (for example for
modelling waves or temperature distributions). What is important in this module is again
the question whether we can exchange certain limits without changing the corresponding limit
value. For example, are the following the same?
\[ \lim_{x\to\pi^-} \left( \sum_{n=1}^{\infty} \frac{(-1)^{n+1}\sin(nx)}{n} \right) \stackrel{???}{=} \sum_{n=1}^{\infty} \lim_{x\to\pi^-} \left( \frac{(-1)^{n+1}\sin(nx)}{n} \right). \]

Using the fact mentioned above that S(x) = x/2 when x ∈ (−π, π), we can compute that these
two values are in fact not the same. In Section 11 we will learn that in this example the key
fact is that the coefficients which multiply sin(nx) are cn = (−1)n+1 /n and

\[ \sum_{n=1}^{\infty} |c_n| = 1 + \frac{1}{2} + \frac{1}{3} + \dots \]

diverges (as you have seen in G11ACF/MATH1005).
In fact, as we will see in Section 11 when we get to the Weierstraß M -test, if (an ) is a real
sequence such that
\[ \sum_{n=1}^{\infty} |a_n| \]
converges, then we always have
\[ \lim_{x\to\pi^-} \left( \sum_{n=1}^{\infty} a_n \sin nx \right) = \sum_{n=1}^{\infty} a_n \sin n\pi = 0. \]

Hopefully these examples illustrate the need for rigour and proof. They are not meant to
be an exhaustive list of what can go wrong when you are not precise and rigorous, but they do
give a taste of the problems you can invite when you handle limits too carelessly. Intuition (or
perhaps hope) might tell you that if you interchange the order of limits you should get the same
answer, but the above examples make it clear that this is not always the case. Mathematical
analysis gives us tools to determine what does work and why.

3 Sequences in R
We will review briefly the idea of convergence, and consider the important topic of subsequences.

Definition 3.1. A sequence (xn ) in R is a non-terminating ordered list of real numbers

xp , xp+1 , xp+2 , . . .

Note: Usually we start labelling the elements in a sequence from p = 1, i.e.

x1 , x2 , x3 , . . .

but this is not necessarily the case. Unless we explicitly state otherwise, we will assume that
p = 1. Of course we can easily construct a sequence starting at n = p ≠ 1 from a sequence
starting at n = 1, by relabelling (adding p − 1 to each label). We can write $(x_n)_{n\in\mathbb{N}}$ or $(x_n)_{n=p}^{\infty}$
if we want to emphasize which values n takes, but the shorthand notation (xn ) is usually clear
enough if the context provides the necessary information about the values that n takes.

Example: An example of a sequence in R is the sequence (xn ), where, for all n ∈ N, xn := 2^{1/n}.
The first few numbers in the sequence are

2, 2^{1/2}, 2^{1/3}, 2^{1/4}, . . .

Note: Be careful not to restrict your thinking about sequences to those sequences that have
a ‘nice’ expression defining them like the example above. Any non-terminating ordered list of
real numbers is a sequence; there is no requirement that we should be able to write a simple
expression for each element. In particular, if we are just given a few numbers in a sequence,
without any further information, there is no way for us to determine what the other numbers
are. In the example above, if I had only given you the numbers

2, 2^{1/2}, 2^{1/3}, 2^{1/4}, . . .

without telling you that, for all n ∈ N, xn = 2^{1/n}, then you would have had no information
about what the fifth number should be. The numbers

2, 2^{1/2}, 2^{1/3}, 2^{1/4}, π, −17.7, 2^e, 9000000, . . .

form just as valid a start of a sequence as

2, 2^{1/2}, 2^{1/3}, 2^{1/4}, 2^{1/5}, 2^{1/6}, 2^{1/7}, 2^{1/8}, . . .

do. Remember: do not impose extra constraints, properties, or expectations on a


mathematical concept that are not justified by the definition! So next time you see
one of those puzzles that asks you to give the next number in a sequence when only the first
few numbers are given, it would be mathematically correct (but perhaps not very much fun) to
say that any number is a valid choice.
Another way of thinking about sequences in R (an equivalent definition, if you wish) is as
real-valued functions with domain {p, p + 1, p + 2, . . .}. For every n ∈ {p, p + 1, p + 2, . . .} the
sequence assigns a value xn ∈ R.

In order to define what convergence means for sequences in R, we will need the modulus
or absolute value function on R:
\[ |x| := \sqrt{x^2} = \begin{cases} x, & \text{if } x \geq 0, \\ -x, & \text{otherwise.} \end{cases} \tag{2} \]

Note: A useful property of the absolute value function is that, if c ∈ R, then |cx| = |c||x|. In
particular, if we take c = −1, we see that | − x| = |x|, thus | · | is an even function.

Definition 3.2. If (xn ) is a sequence in R and a ∈ R, then (xn ) converges to a as


n → ∞ if, for all ε > 0, there exists an N ∈ N, such that for all n ≥ N , |xn − a| < ε.
If (xn ) converges to a ∈ R we denote this by xn → a as n → ∞ or limn→∞ xn = a.
If an a ∈ R exists such that the sequence (xn ) converges to a, then (xn ) is called a
convergent sequence in R. Otherwise, if no such a ∈ R exists, the sequence is called
divergent.

Note: Take a close look at the order of the quantifiers in the definition above. We first have
“for all ε > 0” and then “there exists an N ∈ N”. This means that N can depend on ε, so
sometimes we might write N (ε) instead of N to emphasise this dependence. Make sure you
understand the difference between “for all ε > 0 there exists an N ∈ N . . . ” and “there exists
an N ∈ N such that for all ε > 0 . . . ”. The order in which variables are quantified is crucial
and in many cases means the difference between a really strong statement, a weaker statement,
or even a trivially true, and therefore useless, statement. What would happen in Definition 3.2
if you were to change the order in which the variables N and ε are quantified?
The definition of convergence expresses the idea that the numbers xn in a convergent se-
quence approximate the limit value a arbitrarily well for sufficiently large values of n: if we are
given a permitted error (tolerance) ε > 0, then xn is within ε of a for all sufficiently large n.
The images in Figure 3 illustrate this concept.
Note: There is no requirement for a convergent sequence to approach the limit value from
one side. For example, the sequences (xn ) and (yn ) defined by, for all n ∈ N, xn := 1/n and
yn := (−1)n /n both converge to zero as n → ∞. The sequence (xn ) contains only positive
numbers, while the elements of (yn ) alternate between negative and positive values.

Example: Probably the simplest examples of convergent sequences are constant sequences.
Prove the following yourself: If c ∈ R and the sequence (xn ) in R is given by, for all n ∈ N ,
xn := c, then xn → c as n → ∞.
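Here is a small worked illustration of Definition 3.2 (check each step yourself). Consider the sequence
(xn ) given by, for all n ∈ N, xn := 1/n, and let a := 0. Given ε > 0, choose any N ∈ N with N > 1/ε
(such an N exists since N is not bounded above in R). Then, for all n ≥ N,
\[ |x_n - a| = \left| \frac{1}{n} - 0 \right| = \frac{1}{n} \leq \frac{1}{N} < \varepsilon, \]
so xn → 0 as n → ∞.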
The following lemma shows that, if a sequence in R converges, it can only have one limit.

Lemma 3.3. Let (xn ) be a sequence in R and let x, y ∈ R. If (xn ) converges to x and to
y, then x = y.

Proof. Let ε > 0. By the definition of convergence, we know that there exist N1 , N2 ∈ N such
that, for all n ≥ N1 , |xn − x| < ε/2, and, for all n ≥ N2 , |xn − y| < ε/2. Define N := max(N1 , N2 )
and let n ≥ N . Then, by the triangle inequalitya for the absolute value function we find
\[ |x - y| = |x - x_n + x_n - y| \leq |x - x_n| + |x_n - y| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
Hence, for all ε > 0, |x − y| < ε. Thus |x − y| = 0 (prove this yourself! hint: proof by
contradiction) and thus x = y.


Figure 3: An illustration of Definition 3.2 with (xn ) = (an ). (a) The start of a sequence (an )
in R with n along the x-axis and xn along the y-axis. The dots indicate the points an . The
sequence converges to a ∈ R, which is indicated by the horizontal line. The full sequence
contains infinitely many elements an , so it is not completely depicted here. (b) For a given
ε > 0 there exists an N0 ∈ N such that for all n ≥ N0 the points an fall within the tube of
width ε around a. For every ε > 0 such an N0 exists. It can (and typically will) depend on ε.
So: (c) If ε1 > 0 is smaller than ε, there is an N1 ∈ N such that, for all n ≥ N1 , the points
an lie within the tube of width ε1 around a. In this case we had to choose N1 larger than N0 .
(d) For any given ε > 0 only finitely many elements an of the sequence lie outside the ε-tube
around a. This is the case because there exists an N ∈ N such that all points an with n ≥ N
lie inside the tube. Figures from [6] (by Ceranilo; distributed under the CC BY-SA 4.0 license)

a
The triangle inequality says that, for all x, y ∈ R, |x + y| ≤ |x| + |y|. One way to prove this is as a special
1-dimensional case of the generalised d-dimensional setting from Appendix B.6. A quick proof specific to this
1-dimensional situation goes as follows: Without loss of generality14 we can restrict ourselves to three cases:
(1) x ≥ 0 and y ≥ 0, (2) x < 0 and y < 0, (3) x ≥ 0 and y < 0. In case (1) |x + y| = x + y = |x| + |y|.
In case (2) |x + y| = −(x + y) = −x − y = |x| + |y|. In case (3) we have two subcases: (3a) |x| ≤ |y|
and (3b) |x| > |y|. In case (3a) we have |x + y| = −(x + y) = −x − y ≤ |x| − y = |x| + |y|. In case (3b) we have
|x + y| = x + y ≤ x + |y| = |x| + |y|.
a
The phrase “without loss of generality” is frequently used in mathematical proofs to indicate that the
assumption which is being made, while apparently being restrictive, does not actually reduce the generality of
the setting. In this particular case it appears we are neglecting the case in which x < 0 and y ≥ 0, but in fact,
in that we can redo the exact same proof as in case (3), just by interchanging x and y. Nothing else would
need changing. In order to save us from writing the (almost) same proof twice, we use the phrase “without
loss of generality” when we make the (seemingly, but not really) offending assumption. Be careful though!
This is not a magical phrase that you can use everywhere. Make sure that there is indeed no loss of generality,
before inserting this phrase in your proof!

Lemma 3.4. Let (xn ) and (ym ) be sequences in R and assume that there exist N, M ∈ N
such that, for all l ∈ N ∪ {0}, xN +l = yM +l . Then (xn ) converges if and only if (ym )
converges. Moreover, if (xn ) and (yn ) both converge, they have the same limit.

Proof. This proof is left for you to attempt. It is a very nice basic exercise to become
familiar with the definition of convergence.
Note: The lemma above is very useful. It might look difficult, but all it says is that if two
sequences have the exact same elements from some point onwards, then they either both diverge
or they both converge to the same limit. In particular, it tells us that if we change (or delete,
or add) finitely many entries in a sequence, the convergence behaviour does not change!

Definition 3.5. If (xn ) is a sequence in R then (xn ) tends to +∞ if, for all M ∈ R there
exists an N ∈ N such that for all n ≥ N , xn > M .
If (xn ) tends to +∞, we denote this by xn → +∞ as n → ∞ or limn→∞ xn = +∞.
We say (xn ) tends to −∞ if, for all M ∈ R there exists an N ∈ N such that for all
n ≥ N , xn < M .
If (xn ) tends to −∞, we denote this by xn → −∞ as n → ∞ or limn→∞ xn = −∞.

Note: Instead of +∞ we often also just write ∞.

Example: If (xn ) and (yn ) are the sequences given by, for all n ∈ N, xn := 2^n − n^2 and
yn := ln(1/n), then xn → +∞ as n → ∞ and yn → −∞ as n → ∞.
The definition of (xn ) tending to +∞ (or −∞) formalises the idea that the sequence grows
without bound (or decreases without bound) as n increases: for any real number M , xn will
be greater than M (or smaller than M ) for n large enough. Note that there is no requirement
for the sequence to grow (or decrease) monotonically (this is addressed below in Definition 3.6).
For example, consider the sequence which starts with

1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5, . . . (3)

and continues in this fashion through all n-tuples 1, 2, . . . , n, for n ∈ N. This sequence does not
grow monotonically, but it does tend to +∞.

Note: If a sequence tends to +∞ or −∞ it does not converge. We can see this from the
definitions. A sequence which satisfies the conditions in Definition 3.5, does definitely not satisfy
the conditions for convergence in Definition 3.2. It is a good little exercise to prove this! The
notation and terminology can be a bit confusing, since we still say that (xn ) tends to +∞ and we
write limn→∞ xn = +∞, which suggests that the limit does exist. It is important, however, to
understand that “convergence” and “tending to +∞” express two very different ideas. “Tending
to +∞ (or −∞)” is a special case of the way in which a sequence in R can diverge. (There are
other ways in which sequences can diverge; for example, consider the sequence given by, for all
n ∈ N, xn := cos(n).)

Definition 3.6. Let (xn ) be a sequence in R and let N ∈ N.


We say (xn ) is non-decreasing for n ≥ N if, for all n ≥ N , xn+1 ≥ xn . We say (xn )
is strictly increasing for n ≥ N if, for all n ≥ N , xn+1 > xn .
We say (xn ) is non-increasing for n ≥ N if, for all n ≥ N , xn+1 ≤ xn . We say (xn )
is strictly decreasing for n ≥ N if, for all n ≥ N , xn+1 < xn .
If (xn ) is non-decreasing for n ≥ N or non-increasing for n ≥ N , we say the sequence
is monotone (or monotonic) for n ≥ N . If (xn ) is strictly increasing for n ≥ N or
strictly decreasing for n ≥ N , we say the sequence is strictly monotone (or strictly
monotonic) for n ≥ N .

Note: When a sequence xp , xp+1 , xp+2 , . . . is non-increasing/non-decreasing/monotone/strictly


decreasing/strictly increasing/strictly monotone for n ≥ p, we usually just say it is non-
increasing/non-decreasing/monotone/strictly decreasing/strictly increasing/strictly monotone
(dropping the “for n ≥ p” part).
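As a small illustration (details for you to check): the sequence given by, for all n ∈ N, xn := 1/n is
strictly decreasing, since
\[ x_{n+1} = \frac{1}{n+1} < \frac{1}{n} = x_n, \]
and hence also non-increasing and monotone, while the sequence given by xn := (−1)^n is neither
non-decreasing nor non-increasing, so it is not monotone.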

Lemma 3.7. Let (xn ) be a convergent sequence in R with limit x ∈ R. If (xn ) is non-
decreasing, then, for all n ∈ N, xn ≤ x. If, on the other hand, (xn ) is non-increasing, then,
for all n ∈ N, xn ≥ x.
Proof. We leave this proof as an exercise to practice your understanding of convergence and
monotone sequences.
A key result from G11ACF/MATH1005 is the monotone sequence theorem.

Theorem 3.8 (Monotone sequence theorem). Suppose (xn ) is a sequence in R. If there


is an N ∈ N such that (xn ) is non-decreasing for n ≥ N , then (xn ) either tends to +∞
or converges. If there is an N ∈ N such that (xn ) is non-increasing for n ≥ N then (xn )
either tends to −∞ or converges.

Proof. The proof is optional for G12MAN/MATH2009, but it is useful to read it and try to
understand it. It is included in Section B.3 in the appendix. You can also look back at the
G11ACF/MATH1005 notes.
Note: In these proofs early in the notes, I will often write out quite a few of the details explicitly.
For example, in the proof of Theorem 3.8 I explicitly recalled the definition of “bounded above”,
to illustrate how I arrived at the definition of “not bounded above” by taking the negation. As
we get further into these notes, I will provide less and less of this explicit guidance and it will be
up to you to remember definitions and understand how to take negations and how to perform
other logical operations that lead from one statement to the next, without this being pointed
out explicitly.

Note: In proving Theorem 3.8 (see Section B.3) we have even shown something extra that
was not explicitly mentioned in the statement of the theorem: If a non-decreasing sequence
converges, then it converges to the supremum of the set A = {xn : n ≥ N } (and analogously, if
a non-increasing sequence converges, it converges to the infimum of A).

Lemma 3.9. Let (xn ) and (yn ) be sequences in R such that, for all n ∈ N, xn ≤ yn . Let
x, y ∈ R. If xn → x as n → ∞ and yn → y as n → ∞, then x ≤ y.

Proof. We prove this via a proof by contradiction. Assume x > y. Define ε := (x − y)/4,
then ε > 0. By definition of convergence (Definition 3.2) there exist N1 , N2 ∈ N such that,
for all n ≥ N1 , |xn − x| < ε and, for all n ≥ N2 , |yn − y| < ε. If we now let N := max(N1 , N2 ),
then, for all n ≥ N , |xn − x| < ε and |yn − y| < ε. Now fix an n ≥ N , then
\[ y_n < y + \varepsilon = \frac{1}{4}x + \frac{3}{4}y < \frac{3}{4}x + \frac{1}{4}y = x - \varepsilon < x_n, \]
where we used again that y < x. Now we deduced that yn < xn , which is a contradiction
with the assumption in the lemma. Hence we conclude that x ≤ y.
Note: Lemma 3.9 says that non-strict inequalities are preserved when taking limits, but care-
fully note that it makes no such claims about strict inequalities. In fact, those are not preserved,
as the following example shows. Consider the sequences (xn ) and (yn ) in R defined by, for all
n ∈ N, xn := 0 and yn := 1/n. Then, for all n ∈ N, xn < yn , but limn→∞ xn = 0 = limn→∞ yn , so
the strict inequality is no longer true for the limit values. Of course it is true that limn→∞ xn ≤
limn→∞ yn , as Lemma 3.9 says should be the case.

Corollary 3.10. Let (xn ) be a convergent sequence in R with limit x ∈ R. If a ∈ R is such


that, for all n ∈ N, xn ≥ a, then x ≥ a. If b ∈ R is such that, for all n ∈ N, xn ≤ b, then
x ≤ b.
Proof. The proof of the second statement follows immediately from Lemma 3.9 by using the
constant sequence (yn ), with, for all n ∈ N, yn = b. The proof of the first statement now
follows by applying the second statement to the sequence (−xn ).

Corollary 3.11. Let (xn ) be a non-decreasing sequence in R and assume that there exists
an a ∈ R such that, for all n ∈ N, xn ≤ a. Then (xn ) converges to a limit x ∈ R with
x ≤ a.
If (xn ) is a non-increasing sequence in R and a ∈ R is such that, for all n ∈ N, a ≤ xn ,
then (xn ) converges to a limit x ∈ R with a ≤ x.

Proof. Assume that (xn ) is non-decreasing and that, for all n ∈ N, xn ≤ a. By the monotone
sequence theorem (Theorem 3.8) the sequence (xn ) either converges or tends to ∞. Assume
that it tends to ∞; then by definition, for all M ∈ R there exists an N ∈ N such that for
all n ≥ N we have xn > M . Taking M = a, this is a contradiction as, for all n ∈ N, xn ≤ a. Hence (xn )
converges, say with limit x ∈ R. By Corollary 3.10 we conclude that x ≤ a.
The proof for the case in which (xn ) is a non-increasing sequence is left as an exercise. It is
very similar to the proof above.

Example: The sequence (xn ) given by $x_n = \sum_{i=1}^{n} \frac{1}{2^i}$ is clearly non-decreasing, since, for all
n ∈ N, $x_{n+1} - x_n = \frac{1}{2^{n+1}} \geq 0$. Does it converge or does it tend to ∞ or −∞? Prove your claim!
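One possible route, sketched here as a hint (the details are for you to fill in): the finite geometric sum gives,
for all n ∈ N,
\[ x_n = \sum_{i=1}^{n} \frac{1}{2^i} = 1 - \frac{1}{2^n} \leq 1, \]
so (xn ) is non-decreasing and bounded above by 1, and Corollary 3.11 then tells us that it converges to
some limit x ≤ 1.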

Corollary 3.12 (Sandwich theorem). Let (xn ), (yn ), and (zn ) be sequences in R. If, for
all n ∈ N, xn ≤ yn ≤ zn and if (xn ) and (zn ) both converge with the same limit value y ∈ R,
then (yn ) converges to y.

Proof. Let ε > 0. Since (xn ) and (zn ) both converge to y, there exist N1 , N2 ∈ N such that,
for all n ≥ N1 , |xn − y| < ε and, for all n ≥ N2 , |zn − y| < ε. Define N := max(N1 , N2 ) and let
n ≥ N . Then y − ε < xn ≤ yn ≤ zn < y + ε, so |yn − y| < ε. Hence (yn ) converges to y.
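As a quick illustration of how the sandwich theorem is typically used (check the details yourself): for all
n ∈ N we have −1/n ≤ sin(n)/n ≤ 1/n, and both (−1/n) and (1/n) converge to 0, so
\[ \lim_{n\to\infty} \frac{\sin(n)}{n} = 0, \]
even though we have no explicit formula for the values sin(n).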
It is also useful to have a kind of one-sided sandwich theorem for divergent sequences which
tend to −∞ or +∞.

Lemma 3.13. Let (xn ) and (yn ) be sequences in R. Assume that, for all n ∈ N, xn ≤ yn .

1. If xn → +∞ as n → ∞, then yn → +∞ as n → ∞.

2. If yn → −∞ as n → ∞, then xn → −∞ as n → ∞.

Proof. To prove the first result, assume xn → +∞ as n → ∞. Let M > 0. Then there exists
an N ∈ N such that, for all n ≥ N , xn > M and hence yn ≥ xn > M . Hence yn → +∞ as n → ∞.
To prove the second result, assume yn → −∞ as n → ∞. Let M > 0. Then there exists an
N ∈ N such that, for all n ≥ N , yn < −M and hence xn ≤ yn < −M . Hence xn → −∞ as n → ∞.

Lemma 3.14. Let (xn ) and (yn ) be sequences in R, let x, y ∈ R, and assume that xn → x
as n → ∞ and yn → y as n → ∞.

1. If (zn ) is the sequence in R defined by, for all n ∈ N, zn := xn + yn , then zn → x + y


as n → ∞. (Sum rule)

2. If c ∈ R and (zn ) is the sequence in R defined by, for all n ∈ N, zn := cxn , then
zn → cx as n → ∞.

3. If (zn ) is the sequence in R defined by, for all n ∈ N, zn := xn yn , then zn → xy as


n → ∞. (Product rule)

4. Assume y ≠ 0 and let (zn ) be a sequence in R which satisfies that, if yn ≠ 0, then
zn = xn /yn . Then zn → x/y as n → ∞. (Quotient rule)

Before we prove this lemma, let us have a closer look at the statement of the quotient rule.
It looks rather complicated, but all it says is that we assume that zn = xn /yn whenever this is
well-defined (i.e. whenever yn ≠ 0). For those values of n for which yn = 0, we do not care how
zn is defined. In the proof below we will see that, since (yn ) converges to y ≠ 0, yn = 0 can
be true for at most finitely many values of n and so the values zn corresponding to those n do
not play a role in the limit behaviour of the sequence (zn ). This is a particular application of
a general result: if a sequence (xn ) converges to a limit value x and (yn ) is a sequence which is
equal to (xn ) except for possibly at finitely many values of n, then (yn ) converges to the same
limit value x. (See if you can prove this, using the definition of convergence!) In fact, this is
not just true for converging sequences in R, but also for converging sequences in Rd , which we
will encounter in Definition 5.2.
The proofs (presented below) of the first two statements are fairly straightforward, once you
understand how to work with the definition of convergence. The proof of the product rule is a
bit more complicated and the proof of the quotient rule is the most involved, mainly because

we need to be careful that we are nowhere dividing by zero. It is very important that you
get comfortable thinking about and working with the definition of convergence. Not only is
it fundamental for all the other concepts we will encounter in this module, but the language
in which it is written and the type of logical argument you can build upon it will be found
throughout this module and you will be required to master those yourself. Understanding these
proofs and then attempting to reproduce them yourself (without looking at the notes) is good
practice.
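Before the proof, here is a small example of the lemma in use (compare the computation at the start of
Section 2): to find the limit of the sequence given by xn := (3n + 4)/(4n + 7), write
\[ x_n = \frac{3 + 4/n}{4 + 7/n}, \]
note that 4/n → 0 and 7/n → 0 as n → ∞ (using that 1/n → 0 and the second statement of the lemma),
and then apply the sum rule and the quotient rule (the denominator has limit 4 ≠ 0) to conclude that
xn → 3/4 as n → ∞.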
Proof of Lemma 3.14. To prove the first statement, let ε > 0. Define η := ε/2. By the
definition of convergence, we know that there exist N1 , N2 ∈ N, such that for all n ≥ N1 ,
|xn − x| < η and for all n ≥ N2 , |yn − y| < η. Define N := max(N1 , N2 ), then for all n ≥ N ,
|xn − x| < η = ε/2 and |yn − y| < η = ε/2. Using the triangle inequality for the absolute value
function we have that, for all n ≥ N ,
\[ |z_n - (x+y)| = |x_n + y_n - (x+y)| = |x_n - x + y_n - y| \leq |x_n - x| + |y_n - y| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
Hence (zn ) converges to x + y.
To prove the second statement, first note that if c = 0, then (zn ) is a constant sequence in
which every element is equal to zero, hence it converges to cx = 0. Now assume c ≠ 0 and let
ε > 0. Define η := ε/|c|. Then, by convergence of (xn ) to x, we have that there exists an N ∈ N
such that, for all n ≥ N , |xn − x| < η and thus |cxn − cx| = |c(xn − x)| = |c||xn − x| < |c|η = ε.
Hence (zn ) converges to cx.
To prove the third statement, let ε > 0. Define η1 := √(ε/2). Again, by the definition of
convergence, we know that there exist N1 , N2 ∈ N, such that for all n ≥ N1 , |xn − x| < η1
and for all n ≥ N2 , |yn − y| < η1 . Moreover, combining the first and second statement, we
know that if we define the sequence (wn ) in R by, for all n ∈ N, wn := yxn + xyn − 2xy, then
(wn ) converges to yx + xy − 2xy = 0. Thus, by definition of convergence, if η2 := ε/2, there exists
an N3 ∈ N such that, for all n ≥ N3 , |wn − 0| = |wn | < η2 . Define N := max{N1 , N2 , N3 },
then for all n ≥ N , |xn − x| < η1 = √(ε/2), |yn − y| < η1 = √(ε/2), and |wn | < η2 = ε/2. Using the
identity xn yn − xy = (xn − x)(yn − y) + yxn + xyn − 2xy and the triangle inequality for the
absolute value function we get, for all n ≥ N ,
\[ |z_n - xy| = |x_n y_n - xy| = |(x_n - x)(y_n - y) + yx_n + xy_n - 2xy| \leq |x_n - x||y_n - y| + |w_n| < \sqrt{\frac{\varepsilon}{2}}\sqrt{\frac{\varepsilon}{2}} + \frac{1}{2}\varepsilon = \varepsilon. \]
Thus (zn ) converges to xy.
To prove the final statement, we are first going to prove the following claim: There exists an
N1 ∈ N such that, for all n ≥ N1 , yn ≠ 0. (In particular this means there are at most a finite
number (namely at most N1 − 1) of values of n for which yn = 0.) To prove that, let η1 := |y|/2. Since
y ≠ 0 we have η1 > 0. By definition of convergence of (yn ) to y, we have that there exists an
N1 ∈ N such that, for all n ≥ N1 , |yn − y| < η1 = |y|/2. Writing this inequality out without
the absolute value function we have yn > y/2 if y > 0 and yn < y/2 if y < 0. In either case we
find that, for all n ≥ N1 , yn ≠ 0.
As a consequence of the claim we have just proven, we know that (by the assumption in the
final statement of the lemma), for all n ≥ N1 , zn = xn /yn . Lemma 3.4 tells us that we can
disregard the first N1 − 1 entries of (zn ) without changing the convergence behaviour. Hence,
from now on, we can assume that, for all n ∈ N, zn = xn /yn .
Next we will prove the following claim: Let (wn ) be the sequence in R given by wn := 1/yn .
Then wn → 1/y as n → ∞. If we can prove this claim, we are done, since we can then use the
product rule to prove that (zn ) = (xn wn ) converges to x/y. By the same argument as above
we know that there exists an N1 ∈ N such that, for all n ≥ N1 , |yn − y| < |y|/2. Hence, for all
n ≥ N1 , using the triangle inequality, |y| = |y − yn + yn | ≤ |y − yn | + |yn | < |y|/2 + |yn |. Thus
|y|/2 < |yn | and 1/|yn | < 2/|y|. Now let ε > 0 and set η2 := (|y|²/2)ε. By convergence of (yn ) to y we
know that there exists an N2 ∈ N such that, for all n ≥ N2 , we have |yn − y| < η2 . Defining
N := max(N1 , N2 ), we get, for all n ≥ N ,
\[ \left| w_n - \frac{1}{y} \right| = \left| \frac{1}{y_n} - \frac{1}{y} \right| = \left| \frac{y - y_n}{y_n y} \right| = \frac{1}{|y|} \frac{1}{|y_n|} |y_n - y| < \frac{1}{|y|} \frac{2}{|y|} \eta_2 = \varepsilon. \]
Hence (wn ) converges to 1/y and we have finished the proof.
Note: In the proof above, we have been quite explicit to elucidate the logical structure of the
proof. As these notes progress, we might choose to write things more efficiently. For example,
instead of explicitly introducing η = ε/2 (as we did in the proof of the first statement) and
clarifying that by the definition of convergence there is an N1 ∈ N such that, for all n ≥ N1 ,
|xn − x| < η = ε/2, we might directly say that, by convergence, there exists an N1 ∈ N such
that, for all n ≥ N1 , |xn − x| < ε/2. The more comfortable you get with reading (and creating)
such proofs, the more compact the writing can become (while always taking care, of course, not
to leave out essential information).

Note: In talking about limits, it is often useful to be able to talk about real numbers from
R and about the symbols (not numbers) −∞ and +∞ in one go. To this end, (re)familiarise
yourself with the extended real number line R̄ in Definition A.9.

Lemma 3.15. Let x, y ∈ R̄. Then all the results from Lemma 3.14 hold, as long as the
corresponding operation (x + y, cx, xy, or x/y, respectively) is well-defined on R̄.

Proof. We will prove the sum rule and leave the other proofs as exercises to the reader.
Let x, y ∈ R̄. If x, y ∈ R, then we have already proven the result in Lemma 3.14. The cases
that are left to prove are the following:

• x ∈ R and y = −∞,

• x ∈ R and y = +∞,

• x = −∞ and y = −∞,

• x = +∞ and y = +∞.

(The cases where y ∈ R and x ∈ {−∞, +∞} are of course equivalent to the first two cases
above and we do not need to prove them separately.) For any other combination of x, y ∈ R̄
the summation x + y is not well-defined in R̄ and so these cases are not included in the
statement of the lemma.
Let (xn ) and (yn ) be sequences in R and define, for all n ∈ N, zn := xn + yn .
First let x ∈ R and y = −∞. Let M > 0. Let ε > 0 and define K := max(M, M + x + ε) > 0.
Since (xn ) converges to x and (yn ) tends to −∞, there exist N1 , N2 ∈ N such that for all
n ≥ N1 , |xn − x| < ε and for all n ≥ N2 , yn < −K. Define N := max(N1 , N2 ) and let n ≥ N .
Then
zn = xn + yn < x + ε − K ≤ −M.
Hence zn → −∞ as n → ∞.

Next let x ∈ R and y = +∞. Let M > 0. Let ε > 0 and define K := max(M, M − x + ε) > 0.
Since (xn ) converges to x and (yn ) tends to +∞, there exist N1 , N2 ∈ N such that for all
n ≥ N1 , |xn − x| < ε and for all n ≥ N2 , yn > K. Define N := max(N1 , N2 ) and let n ≥ N .
Then
zn = xn + yn > x − ε + K ≥ M.
Hence zn → +∞ as n → ∞.
Now let x = −∞ and y = −∞. Let M > 0. Since (xn ) tends to −∞ and (yn ) tends to −∞,
there exist N1 , N2 ∈ N such that for all n ≥ N1 , xn < −M/2 and for all n ≥ N2 , yn < −M/2.
Define N := max(N1 , N2 ) and let n ≥ N . Then
\[ z_n = x_n + y_n < -\frac{1}{2}M - \frac{1}{2}M = -M. \]
Hence zn → −∞ as n → ∞.
We leave the proof of the sum rule in the final case, where x = +∞ and y = +∞, as an
exercise to the reader, since it is very similar to the proof(s) given above.
Note: Perhaps you are used to seeing the sum, product, and quotient rules in other forms,
such as limn→∞ (xn + yn ) = limn→∞ xn + limn→∞ yn , limn→∞ xn yn = (limn→∞ xn )(limn→∞ yn ),
and limn→∞ xn /yn = (limn→∞ xn )/(limn→∞ yn ). It is dangerous to remember these rules in these forms, because
these statements are not true if the conditions as listed in Lemma 3.14 are not satisfied. For
example, suppose (xn ) is the sequence given by, for all n ∈ N, xn := sin(n) and (yn ) is the sequence
defined by, for all n ∈ N, yn := −xn . Then clearly limn→∞ (xn + yn ) = limn→∞ 0 = 0, but it
makes no sense to write limn→∞ xn or limn→∞ yn , since neither (xn ) nor (yn ) converges, nor do
they tend to −∞ or +∞.
Another example, involving the product rule, is given by the sequences (wn ) and (zn ),
defined by, for all n ∈ N, wn := n and zn := 1/n. In that case limn→∞ wn zn = limn→∞ 1 = 1,
but limn→∞ zn = 0 and (wn ) does not even converge. If you think you can sneakily avoid this
problem by pretending that ∞ · 0 = 1 (which of course, is not true!), what if instead of zn := 1/n,
we had defined zn := 2/n?

Definition 3.16. If (xn ) is a sequence in R (n ∈ {p, p + 1, . . .}), then (yk ) is a subsequence
of (xn ) if there are integers nk with p ≤ n1 < n2 < n3 < . . . such that, for all k ∈ N,
yk = xnk .

From the definition above we see that a subsequence of the sequence (xn ) is itself a sequence
which is made up of infinitely many numbers xn while keeping the relative order of these
numbers the same as it was in the original sequence (xn ). Formally, this means that, if the
original sequence is xp , xp+1 , xp+2 , . . ., then we take integers

p ≤ n1 < n2 < n3 < . . .

and our subsequence is yk = xnk .


Note that a subsequence of (xn ) can, but does not need to, contain all the numbers from
(xn ). In particular, the sequence (xn ) is a subsequence of itself.
Example: If we start with the sequence (xn ) given by, for all n ∈ N, xn := 1/n, then
\[ \frac{1}{2}, \frac{1}{3}, \frac{1}{5}, \frac{1}{7}, \frac{1}{11}, \dots \]
can be the start of a subsequence, but
\[ \frac{1}{2}, \frac{1}{6}, \frac{1}{3}, \frac{1}{8}, \dots \]
cannot. (Why not?)

Theorem 3.17. If (xn ) is a sequence in R, it has a monotone subsequence.

Proof. Let us take any real sequence (xn ) (n ∈ {p, p + 1, . . .}). For each q ≥ p, we define
the set
Eq := {xn : n ≥ q} = {xq , xq+1 , . . .}.
So the set Eq contains all numbers that appear in the sequence for n ≥ q. We consider two
complementary cases and prove in each of these cases that (xn ) has a monotone subsequence.
Case I: suppose that every one of these sets Eq has a maximum element, i.e. there exists
r ≥ q such that, for all n ≥ q, xn ≤ xr . If p ≤ q < Q then EQ ⊆ Eq and so the maximum
element of EQ is not greater than the maximum element of Eq .
Let n1 ≥ p be such that xn1 is the maximum element of Ep . If xn2 is the maximum element
of the set E1+n1 , then n2 ≥ 1 + n1 > n1 and xn2 ≤ xn1 . If xn3 is the maximum element of
the set E1+n2 , then n3 ≥ 1 + n2 > n2 and xn3 ≤ xn2 . We can repeat this procedure and
inductively define, for k ∈ N, xnk+1 to be the maximum element of the set E1+nk . Then, for
all k ∈ N, nk+1 > nk and xnk+1 ≤ xnk . The former property shows that the sequence (xnk )
thus formed is a subsequence of (xn ), the latter shows that this subsequence is non-increasing
and thus monotone.
Case II: suppose that (at least) one of the sets Eq has no maximum element. Take this set
Eq , and let n1 = q. Since Eq has no maximum element, there is an element of Eq which is
strictly greater than xn1 . Let xn2 be such an element, then n2 > n1 . Because Eq has no
maximum element, there is an element of Eq which is strictly greater than all of the numbers

xn1 , x1+n1 , . . . , xn2 .

Let this element be xn3 , then n3 > n2 . We can now repeat this argument to inductively
define, for k ∈ N, xnk+1 to be an element of Eq which is strictly greater than all the numbers

xn1 , x1+n1 , . . . , xnk .

Then, for all k ∈ N, nk+1 > nk . This process gives us a strictly increasing, and thus monotone,
subsequence of (xn ).
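To connect the proof with an earlier example (check it against the two cases above): for the sequence
in (3), which runs through 1; 1, 2; 1, 2, 3; . . ., none of the sets Eq has a maximum element, since every
Eq contains arbitrarily large values. Case II therefore applies and produces a strictly increasing, hence
monotone, subsequence, for example
\[ 1, 2, 3, 4, 5, \dots \]
obtained by always picking a strictly larger value further along the sequence.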

Definition 3.18. If (xn ) is a sequence in R (n ∈ {p, p + 1, . . .}), we say (xn ) is bounded


if there exists an M ∈ R such that, for all n ≥ p,

|xn | < M.

Note: You might see definitions of boundedness of sequences in other notes or books that
have the condition “|xn | ≤ M ” instead of “|xn | < M ”. Do you see why this would lead to a
definition equivalent to the one in Definition 3.18? Hint: The quantifier is important.
We can use Theorem 3.17 to derive another famous theorem, the Bolzano–Weierstraß theo-
rem, which tells us something very interesting about bounded sequences in R.

Theorem 3.19 (Bolzano–Weierstraß theorem). If (xn ) is a bounded sequence in R, then
it has a convergent subsequence.

Proof. By Theorem 3.17 we know that (xn ) (n ∈ {p, p+1, . . .}) has a monotone subsequence (xnk ).
By the monotone sequence theorem, Theorem 3.8, we know that this monotone subsequence
either converges, or tends to +∞ or −∞. For a proof by contradiction, assume that (xnk ) tends
to +∞ or −∞; then the sequence (|xnk |) tends to +∞ (can you prove this?). By definition this
means that for all M ∈ R there exists a K(M ) ∈ N such that for all k ≥ K(M ), |xnk | ≥ M .
Since the sequence (xn ) is bounded, the subsequence (xnk ) is also bounded (can you prove this?).
Hence, there exists an M̃ ∈ R such that for all k ∈ N, |xnk | < M̃ . If k ≥ K(M̃ ) we
find that |xnk | ≥ M̃ and |xnk | < M̃ , which is a contradiction. Hence (xnk ) does not tend to +∞
and (xnk ) does not tend to −∞. We conclude that (xnk ) converges.
Of course the Bolzano-Weierstrass theorem does not tell you how to find the convergent
subsequence, but it is still an extremely powerful result. Later in the module we will use it in
results about the maximum of a function or the intersection of sets.
Example: The sequence (yn ) defined by, for n ∈ N,
\[ y_n := (-1)^n \left( 1 + \frac{1}{n} \right) \]

is an example of a sequence which does not converge. (Prove that statement! Hint: the
statement (yn ) converges means “there exists an y ∈ R such that for all ε > 0 there is an N ∈ N
such that for all n ≥ N , |yn − y| < ε”. In order to find out what it means that (yn ) does not
converge, we need to take the negation of that statement. What is the negation15 ?)
What happens if we take the subsequence corresponding to the odd values of n, i.e. (ynk )
where, for all k ∈ N, nk = 2k − 1? The first elements of that subsequence are
\[ y_1, y_3, y_5, \dots \quad \text{or} \quad -2, -\frac{4}{3}, -\frac{6}{5}, \dots \]
Does this subsequence converge? If so, what is its limit? (Prove your claims!)
Similarly we could look at the subsequence formed by taking the even n, i.e. (ynk ) where,
for all k ∈ N, nk = 2k. The first elements of this subsequence are
\[ y_2, y_4, y_6, \dots \quad \text{or} \quad \frac{3}{2}, \frac{5}{4}, \frac{7}{6}, \dots \]
Does this subsequence converge? If so, what is its limit? (Prove your claims!)
If you answered the questions above correctly, you will have found that both subsequences
converge, but to different limit values. This is another way of proving that the original sequence
does not converge. Using the description you found above for what it means for a sequence not
to converge, can you prove that if a sequence has two subsequences which converge to different
limit values, then the original sequence does not converge?

Example: The sequence with elements sin n (n ∈ N) starts with the numbers (rounded to two
decimal places)
0.84, 0.91, 0.14, −0.75, −0.95, −0.28, 0.66, 0.99, . . .
15
Both the general skill of being able to quickly find the negation of precise mathematical statements as well
as this particular case (the definition of a divergent sequence) are very useful in mathematical practice and you
will need them often.

and looks quite random. However, for every n ∈ N, | sin n| ≤ 1 < 1.1, thus the sequence is
bounded and hence the Bolzano–Weierstraß theorem tells us it has a convergent subsequence!
Question: Which real numbers can be obtained as the limit of a subsequence of the sequence
(sin n)? For the answer have a look at the optional material in Appendix B.5.

4 Distance in Rd
In Definition 3.2 we saw that the quantity |xn − a| plays an important role when defining what
it means for a sequence of real numbers (xn ) to converge to a limit a ∈ R. This quantity is the
distance in R between xn and a. If we want to generalise the concept of converging sequences
to sequences in Rd (for any d ∈ N), then we need a concept of distance on Rd . Once we have
that, we will see that it is useful in other contexts, besides converging sequences, as well.
Before delving into the general case, let us first look back at distances in R and consider
an example in R2 . Remember from (2) the modulus or absolute value function on R. One way
to think of this is as follows: |x| is the distance from 0 to x. The distance from x ∈ R to
y ∈ R equals the distance from x − y to 0 and is |x − y|. This generalises naturally to higher
dimensions.
Example: If we want to compute the distance between the points A and B in the plane as depicted
in Figure 4, we use Pythagoras’ theorem to find that the distance is $\sqrt{(2-7)^2 + (8-5)^2} = \sqrt{34}$.
If you go on to take G13MTS/MATH3003 in the third year, you will learn that the concept
of distance can be generalised and this distance we just computed (and which corresponds
to our local, flat space intuition) is just one of many possible distances. To distinguish it from
other choices, this distance “as the crow flies” is often called the Euclidean distance (named
after the ancient Greek mathematician and geometer Euclid). We get another concept of distance
if we are only allowed to travel horizontally and vertically. This is sometimes called the taxicab
distance and, in this case, would give us a distance of $|2-7| + |8-5| = 5 + 3 = 8 > \sqrt{34}$.
Note that here the taxicab distance is greater than the Euclidean distance. We will see that in
general the taxicab distance cannot be smaller than the Euclidean distance.

Figure 4: Straight line between the two points A = (7, 5) and B = (2, 8) (horizontal separation 5, vertical separation 3)

Definition 4.1. Let d ∈ N, then Rd is the set consisting of all d-tuples (x1 , . . . , xd ), where
for all j ∈ {1, . . . , d}, xj ∈ R. If x = (x1 , . . . , xd ) ∈ Rd , then we call the numbers xj the
coordinates of x. If x = (x1 , . . . , xd ), y = (y1 , . . . , yd ) ∈ Rd and c ∈ R, we define

x + y := (x1 + y1 , x2 + y2 , . . . , xd + yd ) and cx := (cx1 , cx2 , . . . , cxd ).

Note: When we write Rd without further specification of d, we will mean that d can be any
number in N. In particular, R is just a special case of Rd , with d = 1.
You will find a lot of different terminology and notation associated with Rd . Because Rd
together with the summation and scalar multiplication defined above is a vector space over
R16 , elements x ∈ Rd are often called vectors, especially when d > 117 . In some contexts it
16
Prove this! Look back at your G11LMA/MATH1007 notes, if you do not remember the definition of vector
space.
17
It is probably historically more accurate to say that vectors spaces are called vector spaces, because they are
generalisations of Rd whose elements are (or can be used to model) vectors.

37
is important to consider whether elements of Rd are row vectors or column vectors. For our
purposes this is irrelevant and, for notational simplicity, we will stick with the d-tuple notation
from Definition 4.1 (which can be interpreted as a row vector, if you wish).
Sometimes you see d-tuples of real numbers referred to as finite sequences, because they
can be obtained by taking a sequence (of infinite length, in the sense of our Definition 3.1)
and cutting it off after the dth element. We will not use this terminology and whenever we use
the term “sequence”, we mean it in the sense of Definition 3.1. Be careful, however, when you
encounter a variable with a subscript: xn can denote the nth entry in a sequence of real numbers
or the nth coordinate of an element x in Rd . A well written mathematical text will always make
sure that it is clear what is meant.
Following the convention you are used to from G11LMA/MATH1007, in these notes we
will use the special boldface notation for vectors in Rd if d ≥ 2, such as x. Note however
that this is not a universal convention. Some texts use other special notational flourishes to
indicate a vector, such as x̄, x, or ~x. Other texts use no special notation at all and just write
x. Whatever convention you choose to follow, no choice absolves you from the responsibility
to clearly explain what x, or x, or x̄, or . . . means, before using it. Typically you can do this
by writing “let x ∈ Rd ” or something along those lines. Once this has been clearly introduced,
there is usually not much more information or clarity to be gained from writing x, x̄, etc., but
to be in line with your notation from G11LMA/MATH1007 we will use the boldface convention
here.
The analogue in Rd of the modulus is the norm (or length).

Definition 4.2. Let x ∈ Rd . Then the norm (or Euclidean norm) of x is


\[ \|\mathbf{x}\| := \sqrt{x_1^2 + x_2^2 + \dots + x_d^2} = \sqrt{\sum_{j=1}^{d} x_j^2}. \]

Note: The quantity ‖x‖, for x ∈ Rd , can be interpreted as the distance from the point in Rd
with coordinates (x1 , . . . , xd ) to the origin 0.
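For instance (a small worked example; check the arithmetic yourself): for x = (3, 4) ∈ R² we get
\[ \|\mathbf{x}\| = \sqrt{3^2 + 4^2} = \sqrt{25} = 5, \]
which is exactly the length of the straight line segment from the origin to the point (3, 4).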

Lemma 4.3. Let x ∈ Rd and k ∈ {1, . . . , d}. Then
\[ 0 \leq |x_k| \leq \|\mathbf{x}\| \leq \sum_{j=1}^{d} |x_j|. \tag{4} \]
Moreover, ‖x‖ = 0 if and only if x = 0. Furthermore, if c ∈ R, then ‖cx‖ = |c|‖x‖.

Proof. For all j ∈ {1, . . . , d}, we have 0 ≤ x_j² = |x_j|². Hence we deduce that
\[ 0 \leq x_k^2 \leq x_1^2 + x_2^2 + \dots + x_d^2 = |x_1|^2 + |x_2|^2 + \dots + |x_d|^2 \leq (|x_1| + |x_2| + \dots + |x_d|)^2. \]
Taking square roots now gives the required inequalities in (4).
A direct computation shows that ‖0‖ = 0. On the other hand, if ‖x‖ = 0 then by (4) we
have |xk | = 0. Since k is an arbitrary element from {1, . . . , d}, we conclude that for all
j ∈ {1, . . . , d}, xj = 0 and thus x = 0.

The final statement in the lemma follows from a direct computation:
kcxk = √( ∑_{j=1}^d (cxj)^2 ) = √( c^2 ∑_{j=1}^d xj^2 ) = |c| √( ∑_{j=1}^d xj^2 ) = |c| kxk.

We use the norm to define a distance on Rd .

Definition 4.4. Let x, y ∈ Rd . We define the distance (also called Euclidean distance) from x to y to be

dist(x, y) := kx − yk = √( ∑_{j=1}^d (xj − yj)^2 ).

Note: We saw before that kxk is the distance from x to the origin, hence the distance between
x and y is the same as the distance from x − y to the origin 0.

Lemma 4.5. Let x = (x1 , . . . , xd ), y = (y1 , . . . , yd ) ∈ Rd . The norm satisfies the following
properties.

1. kx + yk ≤ kxk + kyk (the triangle inequality);

2. kx − yk ≥ |kxk − kyk| (the reverse triangle inequality).

Proof. The triangle inequality in R2 or R3 is easy to visualise by drawing a parallelogram. Drawing a picture, however, is not a proof. A proof which uses the Cauchy–Schwarz inequality from G11ACF/MATH1005 is given in Appendix B.6 (this proof is optional in the context of this module).
The reverse triangle inequality now follows quickly by using the triangle inequality:

kxk = ky + (x − y)k ≤ kyk + kx − yk,

thus
kx − yk ≥ kxk − kyk. (5)
Similarly
kyk = kx + (y − x)k ≤ kxk + ky − xk = kxk + kx − yk,
where we not only used the triangle inequality, but also the result from Lemma 4.3 which
tells us that ky − xk = k(−1)(x − y)k = | − 1|kx − yk = kx − yk. Hence

kx − yk ≥ kyk − kxk. (6)

Combining (5) and (6) we get the reverse triangle inequality.


Note: The reverse triangle inequality is sometimes also called the second triangle inequality.
We can interpret it as saying that the distance from x to y is at least the distance from x to 0
minus the distance from y to 0. This is made precise in the following Corollary, together with
other useful properties of the distance function.

Corollary 4.6. Let x, y, z ∈ Rd . The distance satisfies the following properties.

1. dist(x, y) = 0 if and only if x = y;

2. dist(x, y) = dist(y, x) (symmetry);

3. dist(x, y) ≤ dist(x, z) + dist(z, y) (triangle inequality);

4. dist(x, y) ≥ |dist(0, x) − dist(0, y)| (reverse triangle inequality);

5. dist(x, y) ≤ ∑_{j=1}^d |xj − yj| (the Euclidean distance is not greater than the taxicab distance).

Proof. By Lemma 4.3 we know that kx−yk = 0 if and only if x−y = 0, hence dist(x, y) = 0
if and only if x = y. As we saw before in the proof of Lemma 4.5, kx − yk = ky − xk. The
symmetry property follows immediately. The triangle inequality for the distance follows
directly from the triangle inequality for norms (Lemma 4.5):

dist(x, y) = kx − yk = kx − z + z − yk ≤ kx − zk + kz − yk = dist(x, z) + dist(z, y).

The fourth property follows immediately from the reverse triangle inequality from Lemma 4.5,
since dist(0, x) = kxk. Finally, the last statement in the corollary follows by using x − y instead
of x in (4).
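If you like to experiment on a computer, the short Python sketch below checks inequality (4) and the properties from Corollary 4.6 on randomly chosen vectors. This is only an illustration and certainly not a proof (a finite number of samples can never establish an inequality), the helper names norm and dist are labels we introduce just for this sketch, and the small tolerances only guard against rounding errors.

import math
import random

def norm(x):
    # Euclidean norm of a d-tuple, as in Definition 4.2
    return math.sqrt(sum(t ** 2 for t in x))

def dist(x, y):
    # Euclidean distance, as in Definition 4.4
    return norm([a - b for a, b in zip(x, y)])

random.seed(0)
d = 4
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(d)]
    y = [random.uniform(-10, 10) for _ in range(d)]
    z = [random.uniform(-10, 10) for _ in range(d)]
    assert all(abs(xk) <= norm(x) + 1e-9 for xk in x)                   # |x_k| <= ||x||, cf. (4)
    assert norm(x) <= sum(abs(t) for t in x) + 1e-9                     # ||x|| <= sum_j |x_j|, cf. (4)
    assert dist(x, y) <= dist(x, z) + dist(z, y) + 1e-9                 # triangle inequality
    assert dist(x, y) >= abs(norm(x) - norm(y)) - 1e-9                  # reverse triangle inequality
    assert dist(x, y) <= sum(abs(a - b) for a, b in zip(x, y)) + 1e-9   # taxicab bound
print("all sampled inequalities hold")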
Now that we have a concept of distance in Rd ready for use, we start looking at sequences
in Rd and their convergence in the next section.

5 Sequences in Rd
This section will look at sequences (xn ) in which each xn is a point in Rd .
The following definition is a straightforward generalisation of the definition of a sequence in
R (and in fact it contains Definition 3.1 as a special case).

Definition 5.1. Let U ⊂ Rd . A sequence (xn ) in U is a non-terminating ordered list of


elements from U :
xp , xp+1 , xp+2 , . . .
where, for all n ∈ {p, p + 1, . . .}, xn ∈ U .

Note: As with sequences in R, we often start counting at p = 1, but this is not
necessary. Unless we explicitly state otherwise, we will assume that p = 1. We can write
(xn )_{n∈N} or (xn )_{n=p}^∞ if we want to emphasize which values n takes, but the shorthand notation
(xn ) is usually clear enough if the context provides the necessary information about the values
that n takes.
Be careful when encountering the subscript notation: xn can be the nth entry in a sequence
of elements from Rd , but xn can also be the nth coordinate of a given x ∈ Rd . It will always
be clear from the context which situation we are in. This is one of many reasons why it is
important to always clearly state what your notation means. Sometimes the same, or similar,
notation can be used for different concepts, even if it is standard notation.
If (xn ) is a sequence in Rd , we sometimes write an element xn from the sequence in terms
of its coordinates as
xn = (xn,1 , xn,2 , . . . , xn,d ).
For each j ∈ {1, . . . , d}, the j th coordinate sequence (xn,j ) then forms a real sequence (with n
the label along the sequence).
Example: An example of a sequence (xn ) in R2 is given by, for all n ∈ N,

xn := ( (ln n)/√n , n^{1/n} ).

Be aware, just as we saw before when discussing sequences in R, that there is no requirement
whatsoever for us to be able to express the sequence in a nice, handy, formula. For example,
x1 := (1, 2, 3, 4),   x2 := ( π, (1/2)√2, −17 + √71, 0 ),
x3 := ( −1, −1, −2/63, −1 ),   x4 := (42, 47, −13, 8023),   . . .

is a perfectly valid candidate for the start of a sequence in R4 . Without further information we
just do not know how it will continue.
We can now also ask the question what it means for a sequence in Rd to converge. For
example, if we consider the first example sequence above (the sequence in R2 ), what happens
as n → ∞?

Definition 5.2. If (xn ) is a sequence in Rd and a ∈ Rd , then (xn ) converges to a as
n → ∞ if limn→∞ kxn − ak = 0. If (xn ) converges to a ∈ Rd we denote this by xn → a
as n → ∞ or limn→∞ xn = a. If an a ∈ Rd exists such that the sequence (xn ) converges
to a, then (xn ) is called a convergent sequence. Otherwise, if no such a ∈ Rd exists, the
sequence is called divergent.

Note: The definition of convergence of sequences in Rd above is a good example of a general idea
in mathematics: new concepts are always defined in terms of concepts which we have already
defined before. This way we can always be certain that we can trace back each definition all
the way back to the underlying fundamentals. In this particular case, we see that we define
convergence of the sequence (xn ) to a in Rd in terms of convergence of the sequence (kxn − ak)
to 0 in R. And from Definition 3.2 we already know what convergence in R means! Let us
have a closer look at what this means in the ε and N language. We know that
limn→∞ kxn − ak = 0 means that for all ε > 0, there exists an N ∈ N, such that for all n ≥ N ,
|kxn − ak − 0| < ε. Of course |kxn − ak − 0| is just kxn − ak. This observation is important
enough to state in its own lemma.

Lemma 5.3. Let (xn ) be a sequence in Rd and let a ∈ Rd . Then (xn ) converges to a
as n → ∞ if and only if for all ε > 0 there exists an N ∈ N, such that for all n ≥ N ,
kxn − ak < ε.

Proof. As explained in the note before, this follows directly from writing out the definition
of convergence of (xn ) to a in Rd , i.e. limn→∞ kxn − ak = 0 (from Definition 5.2), in terms
of the ε-N definition of convergence in R (from Definition 3.2).
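Purely as an illustration of Lemma 5.3 (not a proof), we can compute kxn − ak for the example sequence from earlier in this section, with the candidate limit a = (0, 1). The Python sketch below, whose helper names are our own, simply prints these distances; you should see them drop below any ε you care to pick once n is large enough.

import math

def norm(v):
    return math.sqrt(sum(t ** 2 for t in v))

def x(n):
    # the example sequence in R^2 from above: x_n = ((ln n)/sqrt(n), n^(1/n))
    return (math.log(n) / math.sqrt(n), n ** (1.0 / n))

a = (0.0, 1.0)   # candidate limit: an educated guess, which the computation can only illustrate
for n in (10, 100, 1000, 10**4, 10**5):
    print(n, norm((x(n)[0] - a[0], x(n)[1] - a[1])))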
Note: In some other notes or books you might encounter definitions of convergence as in
Definition 3.2 or Lemma 5.3 that write “n > N ” instead of “n ≥ N ” or “kxn − ak ≤ ε” instead
of “kxn − ak < ε”. Do you understand why those definitions are equivalent to the ones we gave?
Why does it not matter in this case if we use strict or non-strict inequalities? Hint: Consider
the role of the quantifier “for all” in these definitions.
From Lemma 5.3 we see that the geometric interpretation of convergence in Rd is essentially
the same as the geometric interpretation of convergence in R. The latter is just a special case
of the former (because the norm kxn − ak for xn , a ∈ Rd is just a generalisation of the modulus
|xn − a| for xn , a ∈ R). If (xn ) converges to a in Rd that means that the distance between xn
and a (kxn − ak) can be made arbitarily small (namely less than ε, for any ε we care to choose)
for all n that are large enough (namely for all n ≥ N , where we get to choose N , based on
which ε we started with). This is exactly the same discussion we had in Section 3 in the note
following Definition 3.2.

Example: Just as for sequences in R, the convergence behaviour of constant sequences in Rd


is simple. Can you prove the following? If x ∈ Rd and (xn ) is a sequence in Rd given by, for all
n ∈ N, xn := x, then xn → x as n → ∞.
Many of the properties of sequences in R also hold true for sequences in Rd . Uniqueness of
the limit is such a property.

Lemma 5.4. Let (xn ) be a sequence in Rd and let x, y ∈ Rd . If (xn ) converges to x and

to y, then x = y.

Proof. The proof is the same, mutatis mutandisa , as the proof of uniqueness of limits for
sequences in R (Lemma 3.3).
a
“Mutatis mutandis” is an expression coming from Latin meaning “things being changed that have to be
changed”. This sounds like a cop-out. Of course we need to change the things that need to be changed.
That seems trivially true. What is it actually telling us? What is meant in mathematical practice when you
encounter this phrase, is that none of the main, essential points of the proof need to be changed. In this
particular case, the proofs in R and in Rd are not exactly the same, because the former deals with (xn ), x, y,
and the absolute value function | · | in R, while the latter deals with (xn ), x, y, and the norm k · k in Rd . But
no essential part of the mathematical reasoning needs to be changed: the proof in Rd works for exactly the
same mathematical reasons that the proof in R works. In cases like that, you might see the phrase “mutatis
mutandis” used, to capture that idea without claiming that the proofs are exactly, word for word, the same.

The very useful Lemma 3.4 generalises immediately from R to Rd .

Lemma 5.5. Let (xn ) and (ym ) be sequences in Rd and assume that there exist N, M ∈
N such that, for all l ∈ N ∪ {0}, xN +l = yM +l . Then (xn ) converges if and only if (ym )
converges. Moreover, if (xn ) and (ym ) both converge, they have the same limit.

Proof. If you look back at the proof you constructed for Lemma 3.4 (which corresponds to
the case d = 1), you will hopefully find that the same proof, mutatis mutandis, works in this
case. Try it!
The following lemma tells us what convergence of a sequence (xn ) in Rd means for conver-
gence of the coordinate sequences.

Lemma 5.6. Let (xn ) be a sequence in Rd and let a ∈ Rd . For each n ∈ N we write the
coordinates of xn as (xn,1 , . . . , xn,d ) and similarly a = (a1 , . . . , ad ). Then (xn ) converges to
a (in Rd ) if and only if for all k ∈ {1, . . . , d} the sequence of coordinates (xn,k ) converges
to ak (in R).

Proof. Let k ∈ {1, . . . , d}. By the inequality for norms in (4), we find that
0 ≤ |xn,k − ak | ≤ kxn − ak ≤ ∑_{j=1}^d |xn,j − aj |.    (7)

To prove the “only if” part of the statement, note that, if (xn ) converges to a in Rd , then
limn→∞ kxn − ak = 0. Thus, by the sandwich theorem and (7), limn→∞ |xn,k − ak | = 0.
Hence (xn,k ) converges to ak in R.
To prove the “if” part of the statement, assume that for all k ∈ {1, . . . , d} the sequence of
coordinates (xn,k ) converges to ak (in R). Then, by the sum rule for limits, we find that
limn→∞ ∑_{j=1}^d |xn,j − aj | = 0. Using (7) together with the sandwich theorem, we find that
limn→∞ kxn − ak = 0 and thus (xn ) converges to a.
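To see the squeeze (7) at work numerically, the sketch below (an illustration only, using the same example sequence and the same assumed candidate limit a = (0, 1) as before) prints the largest coordinate error, the norm error and the sum of the coordinate errors; the first is never larger than the second, and the second never larger than the third.

import math

def x(n):
    # the example sequence in R^2 used before: x_n = ((ln n)/sqrt(n), n^(1/n))
    return (math.log(n) / math.sqrt(n), n ** (1.0 / n))

a = (0.0, 1.0)
for n in (10, 1000, 100000):
    coord_err = [abs(x(n)[j] - a[j]) for j in range(2)]    # the coordinate errors |x_{n,j} - a_j|
    norm_err = math.sqrt(sum(e ** 2 for e in coord_err))   # the norm error ||x_n - a||
    print(n, max(coord_err), norm_err, sum(coord_err))     # compare with the inequalities in (7)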
Now that we have a concept of distance in Rd , it is not only the idea of convergence which
we can generalise from R to Rd . Looking back at Definition 3.18 we see it is easy to extend the
idea of a bounded sequence to Rd .

Definition 5.7. If (xn ) is a sequence in Rd (n ∈ {p, p + 1, . . .}), we say (xn ) is bounded
if there exists an M ∈ R such that, for all n ≥ p,

kxn k < M.

Note: We see that a sequence (xn ) in Rd is bounded if the distances from the elements xn to
the origin 0 are bounded.
Sometimes it is of essential importance if an inequality is strict (i.e. < or >) or non-strict
(i.e. ≤ or ≥). However, in the definition of boundedness in Definition 5.7 (and similarly in
Definition 3.18) it actually does not matter if we write “kxn k < M ” or “kxn k ≤ M ”. Do you
understand why? Can you prove that we end up with equivalent conditions in both cases? (See
also the note following Definition 3.18.)

Lemma 5.8. Let (xn ) be a sequence in Rd . Then (xn ) is bounded if and only if for all
k ∈ {1, . . . , d} the coordinate sequence (xn,k ) is bounded.

Proof. We use again the very useful inequalities in (4). To prove the “only if” statement,
assume that (xn ) is bounded. Then, per definition, there is an M ∈ R such that, for all n,
kxn k < M and thus by (4), for all n and all k ∈ {1, . . . , d}, |xn,k | < M . This means each of the
coordinate sequences is bounded.
To prove the “if” statement, assume that for all k ∈ {1, . . . , d} the coordinate sequence (xn,k )
is bounded. Then, for all k ∈ {1, . . . , d}, there is an Mk ∈ R (which can depend on k!) such
that, for all n, |xn,k | < Mk . Now define M := ∑_{k=1}^d Mk . Then, for all n, ∑_{k=1}^d |xn,k | < M
and thus, by (4), kxn k < M . Hence (xn ) is bounded.

Lemma 5.9. Let (xn ) be a sequence in Rd . If (xn ) converges, then (xn ) is bounded.

Proof. We leave the proof as an exercise. It is a really good exercise to practice working
with the definitions of convergence and boundedness.
Note: The contrapositive of the statement in Lemma 5.9 is often useful in practice: if a
sequence in Rd is unbounded, then it diverges.
It is also important to note that the converse of the statement in Lemma 5.9 is not true.
For example, the sequence (xn ) in R given by, for all n ∈ N, xn = (−1)n , is bounded, but does
not converge. (Prove this!)
The definition of subsequence generalises straightforwardly from R (Definition 3.16) to Rd .

Definition 5.10. If (xn ) is a sequence in Rd (n ∈ {p, p + 1, . . .}), then (yk ) is a subse-


quence of (xn ) if, for all k ∈ N, there is an nk ∈ N, nk ≥ p such that yk = xnk and for
all k ∈ N, k ≤ nk .

Lemma 5.11. Let (xn ) be a sequence in Rd and let (xnk ) be a subsequence of (xn ).

• If (xn ) is a bounded sequence, then (xnk ) is a bounded sequence.

• If (xn ) is a convergent sequence with limit x ∈ Rd , then (xnk ) is a convergent sequence
with the same limit x.
Proof. This proof is a straightforward application of the definitions of subsequence, bounded,
and convergence, and is left as an exercise. Try to prove it yourself; it is a good little exercise
to test your basic proof construction skills!
Note: It is important to realize that the converse of each of the statements in Lemma 5.11
is not true. That is to say, there exists a sequence (many in fact) which has a bounded
subsequence, but is not bounded itself. There also exists a sequence (again, many) which has a
convergent subsequence, but does not converge itself. For example, the sequence in (3) provides
a counterexample to the converse of both statements. Do you see why? Can you come up with
more counterexamples yourself?

Lemma 5.12. Let (xn ) and (yn ) be sequences in Rd , let x, y ∈ Rd , and assume that xn → x
as n → ∞ and yn → y as n → ∞.

• If (zn ) is the sequence in Rd defined by, for all n ∈ N, zn := xn + yn , then zn → x + y


as n → ∞. (Sum rule)

• If c ∈ R and (zn ) is the sequence in Rd defined by, for all n ∈ N, zn := cxn , then
zn → cx as n → ∞.

Proof. These statements are proven in the same way (mutatis mutandis) as the correspond-
ing statements for sequences in R were in Lemma 3.14. I strongly encourage you to try
and prove these statements yourself, without looking back at the proof of Lemma 3.14. It is
a good exercise to see how much more comfortable you have gotten with the definition of
convergence since Section 3.

Lemma 5.13. Let (xn ) be a sequence in Rd and let x ∈ Rd . If xn → x as n → ∞, then


kxn k → kxk (in R) as n → ∞.

Proof. By the triangle inequality (Lemma 4.5), for all n ∈ N, kxn k = kxn − x + xk ≤
kxn − xk + kxk and kxk = kx − xn + xn k ≤ kxn − xk + kxn k. Combining these we get, for
all n ∈ N,
kxk − kxn − xk ≤ kxn k ≤ kxn − xk + kxk.
By definition of convergence (Definition 5.2) we have that kxn − xk → 0 as n → ∞. Then
using the first two results from Lemma 3.14 we find that

lim_{n→∞} (kxk − kxn − xk) = lim_{n→∞} (kxn − xk + kxk) = kxk,

and thus by the sandwich theorem (Corollary 3.12) we have kxn k → kxk as n → ∞.
Note: Later in Section 10, when we encounter the notion of continuous real-valued functions
on Rd , we will realize that Lemma 5.13 shows that the norm k · k is a continuous function on
Rd .
Lemma 5.8 allows us to extend the very important Bolzano–Weierstraß theorem, which we
encountered in Theorem 3.19 for sequences in R, to sequences in Rd . The generalisation of the
proof is not completely straightforward and requires a subtlety to ensure that we actually end
up with a subsequence.

Theorem 5.14 (Bolzano–Weierstraß theorem). If (xn ) is a bounded sequence in Rd , then
it has a convergent subsequence.

Lemma 5.8 tells us that if (xn ) is bounded, then so are all of its coordinate sequences. The
idea of the proof is to use the Bolzano–Weierstraß theorem in R (Theorem 3.19) on each of
these coordinate sequences separately. The key subtlety here is to take care to do this in such
a way that we end up with one subsequence of (xn ) in the end, instead of with d unrelated
subsequences, one for each coordinate sequence.
Proof of Theorem 5.14. If d = 1, then of course this theorem is the same as Theorem 3.19
and we are done. So now we assume d ≥ 2.
Since the sequence (xn ) is bounded, by Lemma 5.8 we know that its first coordinate sequence
(xn,1 ) is bounded. Hence by Theorem 3.19 the sequence (xn,1 ) has a subsequence (xnk ,1 )
which converges. Now we go back to the original sequence (xn ) and take a subsequence
which corresponds exactly to those labels nk which we used for the subsequence (xnk ,1 ); so
we take the subsequence (xnk ) of the original sequence. By construction we know that the
first coordinate sequence of this subsequence converges. Because the original sequence (xn ) is
bounded, we also know that (xnk ) is bounded. Hence, by Lemma 5.8, we know that its second
coordinate sequence (xnk ,2 ) is bounded. Thus, using Theorem 3.19 again, we deduce that
there is a subsequence (xnkl ,2 ) of (xnk ,2 ) which converges. Therefore (xnkl ) is a subsequence
of (xn ) whose second coordinate sequence converges, and (very importantly!), because (xnkl )
is also a subsequence of (xnk ) and the first coordinate sequence of the latter converges, we
also know that the first coordinate sequence of (xnkl ) converges (Lemma 5.11). To recap
what we have found so far: (xnkl ) is a subsequence of the original sequence (xn ) with the
property that both its first and second coordinate sequences converge.
If d = 2, we are done. If d ≥ 3 we repeat the argument above another d − 2 timesa to find
the existence of a subsequence of (xn ) which converges.
a
Note that, since d is finite, we do not need to use a proof by induction here. We just need to repeat the
same argument a finite number of times.

Example: Let us apply the construction from the proof of the Bolzano–Weierstraß theorem
explicitly to an example sequence in R2 . Consider the sequence (xn ) defined by

xn := ((−1)n , cos(nπ/5)).

This is a bounded sequence (can you prove that?) and thus, by the Bolzano–Weierstraß theorem,
it should have a (at least one) convergent subsequence. Let us try to find one.
If we take the subsequence (yk ) = (xnk ) constructed by letting nk := 2k (so we are just
selecting the even values of n), it has elements

yk = x2k = (1, cos(2kπ/5)).

The first coordinate sequence of (yk ) is just the constant sequence with, for all k ∈ N, yk,1 = 1.
This definitely converges (to 1). Now we take a further subsequence (zl ) = (ykl ) = (xnkl ),
by setting kl := 5l (and thus nkl = n5l = 10l). Then (zl ) has elements

zl = y5l = x10l = (1, cos(2πl)) = (1, 1).

Since also the second coordinate sequence of (zl ) is constant, we have found a converging
subsequence.

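If you want to make the index bookkeeping of this construction concrete, here is a small Python sketch (an illustration only; the index choices mirror the discussion above, and the helper name x is ours):

import math

def x(n):
    # the example sequence in R^2: x_n = ((-1)^n, cos(n*pi/5))
    return ((-1) ** n, math.cos(n * math.pi / 5))

y = [x(2 * k) for k in range(1, 51)]    # first subsequence: keep the even indices n_k = 2k
z = [x(10 * l) for l in range(1, 11)]   # further subsequence: k_l = 5l, i.e. original indices n = 10l
print(y[:4])   # first coordinate constantly 1, second coordinate still oscillates
print(z[:4])   # both coordinates constant: every term is (1, 1) up to rounding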
Note that of course not all bounded sequences have a constant subsequence. All Bolzano–
Weierstraß guarantees us is that all bounded sequences have a convergent subsequence. In our
example we were just lucky that we could find a really clear example of a convergent subsequence:
not only did it converge, but it did so in possibly the simplest way possible, by being a constant
sequence.
The final lemma in this section is one which is often very useful in proofs.

Lemma 5.15. Let (xn ) be a sequence in Rd and let x ∈ Rd . Then the following two
statements are equivalent.

1. The sequence (xn ) converges to x.

2. Every subsequence of (xn ) has a further subsequence which converges to x.

Proof. If (xn ) converges to x and (xnk ) is a subsequence of (xn ), then (xnk ) converges
to x and hence also every subsequence of (xnk ) converges to x. Thus statement 1 implies
statement 2.
Now assume that statement 2 is true. For a proof by contradiction assume that (xn ) does
not converge to x. Then there exists an ε > 0 such that, for all k ∈ N, there exists an nk ≥ k
such that kxnk − xk ≥ ε. In this way we have constructed a subsequence (xnk ) and so, by
assumption, there exists a further subsequence (xnkl ) of (xnk ) which converges to x. Hence,
there exists an N ∈ N such that, for all l ≥ N , we have kxnkl − xk < ε. This contradicts the
earlier finding that, for all k ∈ N, kxnk − xk ≥ ε. Hence the sequence (xn ) does converge to
x.
Note: Carefully note that the lemma above says that, if every subsequence has a convergent
subsequence with limit x, then the sequence itself converges to x. It is not necessarily true that
if the sequence has a convergent subsequence with limit x, then the sequence converges to x.
The sequence (−1)n , for n ∈ N, which we have encountered before, is a good counterexample.
It has a convergent subsequence, for example the constant sequence in which every element is
equal to 1, but it does not converge itself.
It is also not true that if every subsequence has a convergent subsequence, then the sequence
itself converges. Note that we left out the condition that every subsequence has a convergent
subsequence with limit x. It is possible that every subsequence has a convergent subsequence,
but with different limits. Again the sequence (xn ) given by, for all n ∈ N, xn := (−1)n
provides an excellent example: it does not converge, yet every subsequence has a convergent
(sub)subsequence. If (xnk ) is a subequence of (xn ), then it contains infinitely many 1’s or18
infinitely many −1’s. If it has infinitely many 1’s, we can extract a further subsequence (xnkl )
with, for all l ∈ N, xnkl = 1. Hence this (sub)subsequence converges to 1. Similarly, if (xnk )
contains infinitely many −1’s, then it has a subsequence which converges to −1. So every
subsequence has a converging (sub)subsequence. However, it is not true that we always have
a subsubsequence which converges to 1. If (xnk ) contains only finitely many 1’s, then any
subsequence of (xnk ) converges to −1. Similarly, we also do not always have a subsubsequence
which converges to −1.
We have seen that questions about sequences in Rd can often be reduced to real sequences
by looking at the coordinates. This allowed us to generalise many of the concepts and results
18
This is the usual logical use of “or”. Note in particular that it is not an “exclusive or”, meaning that it
could be that the subsequence has infinitely many 1’s and infinitely many −1’s; for example, this is true for the
sequence (xn ) itself.

we saw in Section 3 for sequences in R to sequences in Rd . Note that we did not talk about
monotonicity in the context of sequences in Rd (d ≥ 2) though, because we typically do not
consider Rd as an ordered set (by which I mean here that we usually do not have a relation like
≤ which allows us to compare two vectors x, y ∈ Rd and say that, for example, x ≤ y)19 .
In the next section we will see that sequences in Rd are extremely useful when we look at
properties of sets in Rd .

19
Although there are ways to equip Rd with an order, such as via a lexicographical order. We will not travel
further in that direction here.

6 Subsets of Rd and their boundaries
So far we have been concerned with sequences in Rd (including sequences in R). Eventually we
want to study functions whose domain is a subset of Rd . As we will see when we get there, when
we want to rigorously and precisely introduce concepts such as “derivative” for such functions,
we need to be careful and make sure that the domain of the function allows us to actually define
these concepts. For example, the notion of “derivative” has something to do with the change in
the output of a function when you change the input variables a little bit. If we want to make
that rigorous, we need to make sure that there is enough room in the domain for the input
variables to change. If there is no such room, it would be hard to imagine (even in a vague, let
alone precise, sense) which property of the function we would want to capture in the concept
of “derivative”. At this point, this is all still quite vague, so before we can have a look at
functions, we first need to rigorously study some properties of subsets of Rd . That is the goal
of Sections 6–9.
Example: Let us first have a look at some examples of subsets of Rd , such as half-planes in
R2 ; for example

E := {(x1 , x2 ) ∈ R2 : x2 ≥ 0} or F := {(x1 , x2 ) ∈ R2 : x1 + x2 > 2}.

Sketch these sets to get a good idea what they look like. There are also multidimensional
‘intervals’, such as

G = {(x1 , x2 , x3 ) ∈ R3 : 1 < x1 < 2, 3 ≤ x2 < 7, −π ≤ x3 ≤ π} ⊆ R3 .

We put ‘intervals’ in quotation marks above, because these subsets of Rd (d ≥ 2) are not
actually intervals. In fact, intervals are a very important class of subsets of R which we will
encounter a lot. (Re)familiarise yourself with the definition of “interval” (see Definition A.1 in
Appendix A.1).
Very important subsets of Rd are the open balls.

Definition 6.1. Let x ∈ Rd and r > 0, then the open ball in Rd with centre x and radius
r is
Br (x) := {y ∈ Rd : ky − xk < r}.
We will sometimes also use the notation B(x, r) instead.

Note: We see that the open ball with centre x ∈ Rd and radius r > 0 is the subset of Rd
consisting of all y ∈ Rd whose distance from x is less than r. Hence the name ball. The label
open will be explained later in Section 8.
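For computations it can be handy to have a membership test for open balls. The Python sketch below (the helper name in_open_ball is ours, introduced only for this illustration) implements Definition 6.1 directly; note how the strict inequality matters for points whose distance from the centre is exactly r.

import math

def in_open_ball(y, x, r):
    # is y an element of B_r(x) = {y in R^d : ||y - x|| < r}?  (Definition 6.1)
    return math.sqrt(sum((yj - xj) ** 2 for yj, xj in zip(y, x))) < r

print(in_open_ball((0.5, 0.5), (0.0, 0.0), 1.0))   # True: the distance is sqrt(0.5) < 1
print(in_open_ball((1.0, 0.0), (0.0, 0.0), 1.0))   # False: the distance is exactly 1, and < is strict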
Recall the definitions from set theory which are given in Definition A.2 in Appendix A.1.
Example: An example of a Cartesian product is

H := (0, 1) × [3, 4] = {(x1 , x2 ) ∈ R2 : 0 < x1 < 1, 3 ≤ x2 ≤ 4}.

What is the complement H c in R2 ?


A very important concept in mathematical analysis is the boundary of a set. This concept
will play a crucial role in the rest of this module.

Definition 6.2. Let E ⊂ Rd . The boundary of E (with respect to Rd ) is

∂E := {x ∈ Rd : there is a sequence (xn ) in E such that xn → x as n → ∞


and there is a sequence (yn ) in E c such that yn → x as n → ∞}.

Note: From the definition above we see that the boundary of E ⊂ Rd is the set of all x ∈ Rd
such that x is the limit of a sequence in E, and of a sequence in E c . Note that boundary points
of E do not need to be elements of E. They can be, but they can also be elements of E c . Or
perhaps some boundary points are in E and others in E c . The question whether or not a given
subset of Rd contains all its boundary points will play an important role later.
We will generally leave out the phrase “with respect to Rd ” when talking about the boundary
of a set E. This is not because it is not important —it is important! For example, the boundary
of a two-dimensional disc in R2 is quite different from its boundary in R3 ; do you see how
and why it differs?— but usually it will be clear which dimension we are working in from the
definition of the set E.
Another name for boundary is frontier, but boundary is much more common. Versions
of this module in earlier years used to use the term “frontier”, however, so you might still
encounter this if you look at old material for this module. In these notes we will stick with the
term “boundary”, as introduced in Definition 6.2.
Warning! In earlier sections we have already encountered the term “bounded” in the con-
text of sequences in R and in Rd . Later on (in Definition 9.4) we will also learn about the
concept of “bounded” for subsets of Rd . It is very important not to confuse “boundary”
with “bounded”. The definition of “bounded set” is given later, in Definition 9.4. Remember
back to what was said at the beginning of these notes: a mathematical term or concept means
exactly what the definition says it means, no more and no less. So even though the words
“boundary” and “bounded” are closely related in everyday English, we cannot and should
not assume that they are connected mathematically, unless we can actually prove precise math-
ematical statements that tell us that they are related. We will say more about this issue later
after we have introduced “bounded” in the context of subsets of Rd , but since this is an issue
with which many students struggle, this is a first warning to avoid potential confusion.

Example: Consider the open ball (disc) in R2 given by

E := B(0, 1) = {x ∈ R2 : kxk < 1} = {(x1 , x2 ) ∈ R2 : x21 + x22 < 1}.

Informally the boundary consists of all points which lie on the border between E and E c , so we
might suspect that the circle

C := {x ∈ R2 : kxk = 1} = {(x1 , x2 ) ∈ R2 : x21 + x22 = 1}

is the boundary of E. But of course our informal, intuitive ideas about what the boundary
might be, do not constitute a proof. We need to use the precise definition of boundary from
Definition 6.2 to actually prove that this is the case.
Let us be very precise and explicit in all our steps. We want to prove that ∂E = C. We
(hopefully) remember that the way to prove equality of two sets is to prove two subset relations:

1. C ⊂ ∂E and

2. ∂E ⊂ C.

Do not forget that statement 2 also needs to be proven! This is forgotten very often on exams,
to the detriment of the proof and the exam mark.
Let us first prove statement 1. How do we prove a statement like that? Well, what is the
definition of ⊂? What does C ⊂ ∂E mean? It means that for all x ∈ C, we have x ∈ ∂E. How
do we prove such a statement, which asserts something about all elements in the set C? We
pick an arbitrary element from the set C and show that it has the required property. So, in this
case we start by picking an arbitrary20 x ∈ C and we want to prove that x ∈ ∂E. How do we
show x ∈ ∂E? We have to prove that there exists a sequence in E and a sequence in E c which
both converge to x. Here we go:
Proof of statement 1. Let x ∈ C. Define the sequences (xn ) and (yn ) in Rd by, for all
n ∈ N,

xn = (1 − 1/n) x and yn = x.

Since x ∈ C, we know (by definition of C) that kxk = 1. Hence, for all n ∈ N, kxn k =
(1 − 1/n) kxk = 1 − 1/n < 1 and kyn k = kxk = 1. Thus (xn ) is a sequence in E and (yn ) is a
sequence in E c = {z ∈ R2 : kzk ≥ 1}. Moreover, as n → ∞, we have xn → x and yn → xa .
Thus, there exists a sequence in E which converges to x and there exists a sequence in E c
which converges to x; hence x ∈ ∂E.
a
Unless you are being asked to explicitly prove this, it is fine to just write down the limits for simple standard
sequences like this. If you do not yet feel fully comfortable using the definition of convergence (Definition 5.2)
in proofs, however, then I do encourage you to actually write out the full proof of the statements “xn → x as
n → ∞” and “yn → x as n → ∞” for the sequences (xn ) and (yn ) which we are using in this proof.
Next we have to prove statement 2. Even though the mathematical content of the proof
will be quite different from that in the proof of statement 1, the structure is very similar. Once
you start recognising structural similarities between different proofs, it will be much easier to
understand proofs and produce your own. What do I mean when I say that this proof has a
similar structure as the one we just saw? Again we need to prove a subset relationship; in this
case we want to show that ∂E ⊂ C. What does that mean? It means that for all y ∈ ∂E 21 we
have y ∈ C. How do we prove that? We start with an arbitrary y ∈ ∂E and prove that this y
is an element of C. In particular, we have to show that kyk = 1, because that (per definition of
C) means that y ∈ C. Let us do it:
Proof of statement 2. Let y ∈ ∂E. Then (by definition of ∂E) there exists a sequence
(un ) in E and a sequence (vn ) in E c which both converge to y. Since (un ) is in E, we
know (by definition of E) that, for all n ∈ N, kun k < 1. Because (un ) converges to y, using
Lemma 5.13 and Corollary 3.10 we find
kyk = lim_{n→∞} kun k ≤ 1.

Moreover, since (vn ) is in E c , we have, for all n ∈ N, kvn k ≥ 1. By a similar argument as


above, we get
kyk = lim_{n→∞} kvn k ≥ 1.
20
What do we mean by “arbitrary” in this context? We just mean that we pick an x ∈ C and we can only use
those properties of x that are common to all elements in C. So we are not picking, say, (0, 1) specifically. Yes,
(0, 1) is an element of C, but if our proof relies on particular properties of the element (0, 1) that are not shared
by the other elements in C (such as, for example, the property that its first coordinate is zero), then all we will
have shown in the end is that (0, 1) is in ∂E, not that every element from C is in ∂E. We can only show that, if
we only use properties that all elements in C have. That is what we mean when we say that we pick an arbitrary
element from C in a setting like this.
21
I am using y instead of x here just to set it apart from the x in the previous proof, but of course it does not
matter at all. Just make sure you do not use the same notation for different objects in the same proof.

Combining kyk ≤ 1 and kyk ≥ 1, we conclude that kyk = 1 and thus y ∈ C.
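To get a feeling for the two sequences used in the proof of statement 1, the sketch below evaluates them for one particular point x on C (a specific choice is fine for a numerical illustration, even though the proof has to work with an arbitrary x ∈ C; the helper names are ours):

import math

def norm(v):
    return math.sqrt(sum(t ** 2 for t in v))

theta = 0.7                               # any angle will do; it picks one point x on the circle C
x = (math.cos(theta), math.sin(theta))    # so ||x|| = 1
for n in (1, 10, 100, 1000):
    xn = ((1 - 1 / n) * x[0], (1 - 1 / n) * x[1])   # the sequence in E from the proof
    # in exact arithmetic ||x_n|| = 1 - 1/n < 1 and ||x_n - x|| = 1/n -> 0; y_n = x stays in E^c
    print(n, norm(xn), norm((xn[0] - x[0], xn[1] - x[1])))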

Example: Intuitively we might expect that

F := {x ∈ R2 : kxk ≤ 1} and G := {x ∈ R2 : kxk > 1}

have the same boundary as the set E from the previous example (namely the circle C). Can
you prove that this is indeed the case?
In the example above we saw that the boundary of F is the same as the boundary of G = F c .
This is actually generally true, as the following lemma shows.

Lemma 6.3. Let A ⊂ Rd . Then ∂A = ∂(Ac ).

Note that in the lemma, we used brackets to distinguish between the boundary of the comple-
ment, ∂(Ac ), and the complement of the boundary, (∂A)c . These two sets are never the same
—since ∂(Ac ) = ∂A ≠ (∂A)c — so it is important to distinguish between them in our notation.
Proof of Lemma 6.3. The proof follows immediately from the definition of boundary: Let
x ∈ Rd . Then x ∈ ∂A if and only if there exists a sequence in A and a sequence in Ac which
both converge to x. That is the case if and only if x ∈ ∂(Ac ).

Example: The definition of boundary can give (perhaps) surprising results: what is the bound-
ary of Q ⊆ R?

7 Interior points of subsets of Rd
Having defined the boundary of a set, we now consider points which lie in a set but not on its
boundary. Remember that, for r > 0 and x ∈ Rd , B(x, r) denotes the open ball {y ∈ Rd :
ky − xk < r}.

Lemma 7.1. Let E ⊂ Rd and let x ∈ Rd . Then the following are equivalent:

1. For every real r > 0, B(x, r) ∩ E 6= ∅.

2. There exists a convergent sequence in E with limit x.

Proof. First we prove that 1 implies 2. Let n ∈ N and define rn := 1/n. Then, by assumption,
there exists yn ∈ B(x, 1/n) ∩ E, i.e. there exists yn ∈ E with 0 ≤ kyn − xk < 1/n.
Taking the limit n → ∞ (and using the sandwich theorem from Corollary 3.12) we find
limn→∞ kyn − xk = 0, hence (yn ) is a sequence in E with limit x.
To prove that 2 implies 1, we use a proof by contradiction. Assume that (yn ) is a sequence
in E which converges to x and that there exists an r > 0 for which B(x, r) ∩ E = ∅. By
definition of convergence, there exists an N ∈ N such that, for all n ≥ N , kyn − xk < r.
That means that, for all sufficiently large n, yn ∈ B(x, r) and thus yn ∈ B(x, r) ∩ E (since
(yn ) is a sequence in E). This is a contradiction.

Corollary 7.2. Let E ⊂ Rd and let x ∈ Rd . Then the following are equivalent:

3. There exists a real r > 0, such that B(x, r) ∩ E = ∅.

4. If (xn ) is a sequence in E, it does not converge to x.

Proof. This follows directly from Lemma 7.1 by taking the contrapositive of each of the two
implication statements in that lemma, i.e. not-1 implies not-2, and not-2 implies not-1.
These results inspire the following definition.

Definition 7.3. Let A ⊂ Rd and x ∈ Rd . Then x is an interior point of A if there exists


a real r > 0 such that B(x, r) ⊂ A. We define the interior of the set A as

int A := {x ∈ Rd : x is an interior point of A}.

Note: Informally, this definition says that x and all its close neighbours lie in A.
If A ⊂ Rd and we take E := Ac in Corollary 7.2, then we get the following.

Corollary 7.4. Let A ⊂ Rd and x ∈ Rd . Then the following statements are equivalent.

1. x is an interior point of A.

2. There exists a real r > 0 such that B(x, r) ∩ Ac = ∅.

3. There exists no sequence in Ac with limit x.

Proof. Let x be an interior point of A, then there exists r > 0 such that B(x, r) ⊂ A, hence
B(x, r) ∩ Ac = ∅. On the other hand, if there exists a real r > 0 such that B(x, r) ∩ Ac = ∅,
then B(x, r) ⊂ A and thus x is an interior point of A. Hence the first two statements are
equivalent.
The equivalence between the second and third statement follows immediately from Corol-
lary 7.2 with E = Ac .

Lemma 7.5. Let A ⊂ Rd , then

A = (int A) ∪ (∂A ∩ A).

Moreover, int A and ∂A ∩ A are disjoint sets, i.e.

(int A) ∩ (∂A ∩ A) = ∅.

Proof. Let x ∈ int A, then there exists r > 0 such that B(x, r) ⊂ A. In particular x ∈ A,
hence int A ⊂ A and thus (int A) ∪ (∂A ∩ A) ⊂ A. To prove the subset relation in the other
direction, let x ∈ A. We have to prove that if x is not an interior point of A, then x ∈ ∂A.
So assume x is not an interior point of A, then by Corollary 7.4 there exists a sequence in Ac
with limit x. Moreover, the constant sequence (xn ) defined by, for all n ∈ N, xn := x, is a
sequence in A which converges to x. Hence x ∈ ∂A and thus x ∈ ∂A ∩ A. We conclude that A ⊂ (int A) ∪ (∂A ∩ A)
and thus A = (int A) ∪ (∂A ∩ A).
The disjointness statement also follows from Corollary 7.4: If x ∈ int A, then there exists no
sequence in Ac converging to x, so x is not in ∂A. Hence (int A) ∩ (∂A ∩ A) = ∅.
Note: Lemma 7.5 says that every point in a subset A ⊂ Rd is either an interior point or a
boundary point, but is never both. Some people refer to ∂A ∩ A as the set of non-interior points
in A (sometimes written nint A), but we will not use this terminology in these notes.

Example: Note that it is possible for int A, ∂A ∩ A, both, or neither to be empty. For
example, if x ∈ Rd and A := {x}, then int A = ∅ and ∂A ∩ A = {x} ≠ ∅. If x ∈ Rd , r > 0, and
B := B(x, r), then int B = B ≠ ∅ and ∂B ∩ B = ∅. If C := ∅, then int C = ∂C ∩ C = ∅. If
D := [0, 1) ⊂ R, then int D = (0, 1) ≠ ∅ and ∂D ∩ D = {0} ≠ ∅. Prove that the interiors and
boundaries of these sets A, B, C, and D are exactly as claimed.
This discussion leads to an important class of sets, called open sets, which we will introduce
in the next section.

8 Open subsets of Rd
We look at open subsets of Rd , which play a very important role in analysis.

Definition 8.1. Let A ⊂ Rd . Then A is called open if int A = A.

Note: Note that for every set A ⊂ Rd it is the case that int A ⊂ A, so if we are to prove that
A is an open set, all we need to prove is that A ⊂ int A. In other words, whenever we need to
prove that A is an open set, it suffices to prove that, if x ∈ A, then x ∈ int A.

Lemma 8.2. Let A ⊂ Rd . Then the following statements are equivalent.

1. A is open.

2. For all x ∈ A there exists an r > 0 such that B(x, r) ⊂ A.

3. ∂A ∩ A = ∅.

4. ∂A ⊂ Ac .

Proof. Since A is open if and only if for all x ∈ A we have x ∈ int A, the equivalence between
statement 1 and statement 2 follows directly from the definition of int A.
By Lemma 7.5 we know that A is the disjoint union of int A and ∂A ∩ A. Hence A = int A
if and only if ∂A ∩ A = ∅. Thus statement 1 and statement 3 are equivalent.
Finally, the equivalence between statement 3 and statement 4 is a simple statement about
sets. By now such basic results about sets should be a standard part of your mathematical
toolbox, but for completeness we present a proof here. If ∂A ∩ A = ∅ and x ∈ ∂A, then x is
not an element of A (otherwise x ∈ ∂A ∩ A), hence x ∈ Ac . On the other hand, if ∂A ⊂ Ac
and x ∈ A, then x is not an element of ∂A (otherwise x ∈ Ac ) and thus ∂A ∩ A = ∅.
The name open will be justified to some extent when we meet closed sets. Here is a way that
you might form an intuitive picture. Imagine that you own a field, but none of its boundary
(its edge). Can you prevent your neighbour(s) from stepping on your property?
Example: The set
H := {(x1 , x2 ) ∈ R2 : x1 > 0}
(also called an open half-plane) is an open set, because for all x = (x1 , x2 ) ∈ H, we have
B(x, x1 ) ⊆ H. Alternatively we can note that ∂H = {(x1 , x2 ) ∈ R2 : x1 = 0} (prove that!), so
that ∂H ∩ H = ∅. Figure 5 shows H with the open ball B(x, u) drawn for x = (u, v) ∈ H.
In Definition 6.1 we defined what an open ball was. At the time we could understand the
“ball” part of the name, but did not yet understand what the “open” part meant. Now we are
in a position to show that open balls are indeed open, in the sense of Definition 8.1. Note that
this is not circular reasoning. Before we defined what it meant for a set to be open, the name
“open ball” was just a label. We could have just as well given the sets B(x, r) any other name.
But now that we have defined what we mean by “open”, we can go back and check that “open
ball” was actually a pretty good name for the sets B(x, r), because these sets which we have
been calling open balls are indeed open sets in the sense of Definition 8.1.

Figure 5: H = {(x1 , x2 ) ∈ R2 : x1 > 0} is an open set.

Figure 6: The open ball B(x, r) is an open set. Here s := ky − xk.

Lemma 8.3. Let x ∈ Rd and r > 0. Then the set B(x, r) ⊂ Rd , as defined in Definition 6.1
is an open set.

The sketch in Figure 6 illustrates the proof of this lemma (but of course a sketch does not
constitute a proof!). Note that in the sketch we used s := ky − xk.
Proof of Lemma 8.3. For notational simplicity, define B := B(x, r). We have to prove
that int B = B. As noted before, per definition of int B (or by Lemma 7.5), we know that
int B ⊂ B, so we only need to prove that B ⊂ int B. Let y ∈ B, then, by definition of B, we
have that ky − xk < r. Define r0 := r − ky − xk, then r0 > 0. We will now prove the claim
that B(y, r0 ) ⊂ B. If that is true, we are done, because then we have found that for every
y ∈ B there exists an r0 > 0 such that B(y, r0 ) ⊂ B and thus, by Lemma 8.2, B is open.
To prove the claim, let z ∈ B(y, r0 ), then kz − yk < r0 and thus, using the triangle inequality,

kz − xk = kz − y + y − xk ≤ kz − yk + ky − xk < r0 + ky − xk = (r − ky − xk) + ky − xk = r.

Therefore, z ∈ B. This completes the proof.
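The following Python sketch mirrors the construction in the proof: for one particular y ∈ B(x, r) it computes the radius r0 = r − ky − xk and checks, on randomly sampled points, that B(y, r0 ) ⊂ B(x, r). As always, sampling illustrates the claim but does not prove it; the helper names and the specific choice of y are ours.

import math
import random

def norm(v):
    return math.sqrt(sum(t ** 2 for t in v))

x, r = (0.0, 0.0), 1.0
y = (0.6, 0.3)                                 # some point of B(x, r): here ||y - x|| < 1
r0 = r - norm((y[0] - x[0], y[1] - x[1]))      # the radius r0 = r - ||y - x|| from the proof
random.seed(1)
for _ in range(1000):
    angle, s = random.uniform(0, 2 * math.pi), r0 * random.random()   # a sample point of B(y, r0)
    z = (y[0] + s * math.cos(angle), y[1] + s * math.sin(angle))
    assert norm((z[0] - x[0], z[1] - x[1])) < r
print("every sampled point of B(y, r0) lies in B(x, r)")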

Example: What is the boundary ∂B(x, r) of the open ball B(x, r), for given x ∈ Rd and r > 0?
Try to answer this (and then prove your answer!) first without help. After you have struggled
for a while, if you have difficulty, look back at the open ball example in Section 6.

Example: Which of the following sets are open subsets of the given space?

(a, b] ⊆ R, Q ⊆ R, {x = (x1 , x2 ) ∈ R2 : kxk ≤ 1 and x1 > 0} ⊂ R2 .

Prove your claims! If you think that a particular set is not open, how would you prove that?
Hint: What is the negation of statement 2 in Lemma 8.2?
Now that we have defined the property of “open”, which subsets of Rd can have, it is
interesting (and important) to consider how this property behaves with respect to basic set
operations such as taking unions and intersections. If we take the union of open sets, do we
end up with an open set? If we take the intersection of open sets, do we get an open set? The
answers to these questions, as we will see, depend crucially on how many open sets are involved
in the union or intersection.

Theorem 8.4. Let T be a set and let {Ut }t∈T be a collection of sets such that, for all t ∈ T ,
Ut is an open subset of Rd . Then the set ⋃_{t∈T} Ut is an open subset of Rd .

Before we prove this theorem, let us recall some of the set terminology and notation that we
used here. The set T is not specified; it can be any set. All we use the elements of T for is
to label the sets Ut . What is very important here, is that we do not make any assumptions
whatsoever about the cardinality (the size, if you will) of T . It could contain finitely many
elements22 , countably infinitely many elements, or even uncountably infinitely many. We make
no assumption on that. If T contains finitely many elements, let us say T = {1, 2, . . . , n} for
some n ∈ N, then it is easy to interpret what ⋃_{t∈T} Ut means: it is just the union of finitely
many sets Ut : U1 ∪ U2 ∪ . . . ∪ Un . So if x ∈ ⋃_{t∈T} Ut , then x belongs to at least one of the
(finitely many) sets Ut (with t ∈ T ). But what if T is countably or uncountably infinite? Also
in that case we can define the union in exactly the same way, except that we cannot label the
sets with a finite set of numbers anymore (and in the uncountably infinite case we cannot even
label them with only natural numbers anymore). But also in this case the definition is
⋃_{t∈T} Ut := {x ∈ Rd : there is a t ∈ T such that x ∈ Ut }.

In words, x is in ⋃_{t∈T} Ut if x is in at least one of the Ut .
Proof of Theorem 8.4. For notational simplicity, define W := ⋃_{t∈T} Ut . Let x ∈ W , then
there exists a t ∈ T such that x ∈ Ut . Since Ut is open, there exists an r > 0 such that
B(x, r) ⊂ Ut ⊂ W . Hence x ∈ int W and thus W is open.
Note: In the proof of Theorem 8.4 we did not use the boundary. The boundary of a union may
be tricky to determine.
Now let us look at intersections.

Theorem 8.5. Let T be a set with finitely many elements and let {Ut }t∈T be a collection
of sets such that, for all t ∈ T , Ut is an open subset of Rd . Then the set ⋂_{t∈T} Ut is an open
subset of Rd .
22
which includes the possibility that it contains no elements, although in that case the statement of the theorem
is rather uninteresting

Before we go to the proof of this theorem, pay special attention to the fact that we now require
the labelling set to be finite23 . After the proof we will see an example of an intersection of
infinitely many open sets which is not open. So we definitely cannot leave out this condition on
T ! As a reminder, the intersection of sets is of course defined to be
⋂_{t∈T} Ut := {x ∈ Rd : for all t ∈ T , x ∈ Ut }.

In this particular case, where T is finite, say T = {1, . . . , n}24 , we of course get

⋂_{t∈T} Ut = ⋂_{t=1}^n Ut = U1 ∩ . . . ∩ Un = {x ∈ Rd : for all t ∈ {1, . . . , n}, x ∈ Ut }.

In words, if x ∈ ⋂_{t∈T} Ut , then x is an element of all the sets Ut .
Proof of Theorem 8.5. For notational simplicity, define W := ⋂_{t∈T} Ut . Let x ∈ W , then,
for all t ∈ T , x ∈ Ut . Since each of the sets Ut is open, we have that, for all t ∈ T , there exists
an rt > 0 such that B(x, rt ) ⊂ Ut . Define r := min{rt ∈ R : t ∈ T }. This minimum exists
because T is a finite set and thus {rt ∈ R : t ∈ T } is a finite set. Moreover, r > 0a . If we
have two radii s1 > 0 and s2 > 0 and if s1 ≤ s2 , then B(x, s1 ) ⊂ B(x, s2 ). Thus, since, for
all t ∈ T , r ≤ rt , we have, for all t ∈ T , B(x, r) ⊂ B(x, rt ) ⊂ Ut . Hence B(x, r) ⊂ W and
thus x ∈ int W . We conclude that W is open.
a
Note that if T had been an infinite set, we would have had no guarantee that the minimum of {rt ∈ R :
t ∈ T } existed. You might think that we could have used the infimum of {rt ∈ R : t ∈ T } instead (which does
exist, since {rt ∈ R : t ∈ T } is a nonempty subset of R —assuming that T 6= ∅— and is bounded below), but
there would be no guarantee that the infimum would be strictly positive, which is a necessary property when
we want to use it as the radius of an open ball.

Example: Note very carefully the point in the proof above where we used the fact that T is
finite. Let us have a look at an example that shows that the conclusion of Theorem 8.5 is not
necessarily true if T is not finite. If we define Un (n ∈ N) to be

Un := B(0, 1/n) = {x ∈ Rd : kxk < 1/n},


then each Un is an open ball and so open. But the intersection ⋂_{n∈N} Un of all the sets Un is
just {0}. This is not an open set. (Can you prove that?) So the intersection of infinitely many
open sets is not necessarily open.
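To see concretely why no point other than 0 survives the intersection, the tiny Python sketch below takes a fixed x ≠ 0 (the particular choice is arbitrary and only serves the illustration) and finds an n with 1/n ≤ kxk, so that x ∉ Un and hence x is not in the intersection:

x = (0.001, 0.0)                          # any point other than the origin
norm_x = (x[0] ** 2 + x[1] ** 2) ** 0.5
n = 1
while 1 / n > norm_x:                     # stop at the first n with 1/n <= ||x||
    n += 1
print(n)                                  # x is not in U_n = B(0, 1/n), so x is not in the intersection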

23
which again includes the possibility of it being empty, but then, as before, the statement of the theorem is
not very interesting
24
The use of an ellipsis (“. . .”) in mathematics is sometimes considered bad form. An ellipsis indicates an
omission. Strictly speaking the notation {1, . . . , n} (for example) does not tell us which elements of the set we
are omitting from our notation. The ellipsis could stand in for anything. In these notes, however, we will use the
convention that {1, . . . , n} := {k ∈ N : k ≤ n}.

9 Closed subsets of Rd
In the previous section we have learned what open sets are. In this section we introduce
closed subsets of Rd , which are also very important from the point of view of analysis. The
first big warning: “closed” does not mean “not open”! As we have seen many, many
times now, we need to stick strictly to the mathematical definitions of terms. Even though
in everyday speech you might consider “open” and “closed” to be opposites, this is not true
for the mathematical terms “open” and “closed”. This is a perennial source of confusion for
students, so right from the start this point should be stressed.
Now that we know (and hopefully do not forget) what “closed” does not mean, let us have
a look at what it does mean.

Definition 9.1. Let A ⊂ Rd . Then A is called closed if Ac is open.

Remember that, per definition Ac = Rd \ A.

Lemma 9.2. Let A ⊂ Rd . Then the following statements are equivalent.

1. A is closed.

2. ∂A ∩ Ac = ∅.

3. ∂A ⊂ A.

Also the following statements are equivalent.

4. A is not closed.

5. Ac is not open.

6. There exists a convergent sequence in A whose limit is in Ac .

Proof. To prove the equivalence of the first three statements, remember from Lemma 6.3
that ∂A = ∂(Ac ). Per definition A is closed if and only if Ac is open, which, by Lemma 8.2 is
equivalent to ∂A ∩ Ac = ∂(Ac ) ∩ Ac = ∅. This in turn, by a simple set theoretical argument
(as in the proof of Lemma 8.2) is equivalent to ∂A ⊂ A.
The equivalence between statements 4 and 5 follows immediately by negating the condition
in the definition of closed. To prove equivalence with the final statement, note that Ac is not
open if and only if there exists an x ∈ Ac which is not an interior point of Ac . This in turn,
by Corollary 7.4, is equivalent to the statement that there exists an x ∈ Ac such that there
is a sequence in (Ac )c = A which converges to x. This is equivalent to statement 6.
From Lemma 9.2 we can extract a very important result, which we will give its own theorem.

Theorem 9.3. Let A ⊂ Rd . Then A is closed if and only if every convergent sequence in
A has its limit in A.
Proof. This follows by taking the contrapositives of the statements “4 implies 6” and “6
implies 4” in Lemma 9.2. (Write out the details yourself for practice with negating logical
statements!)
The theorem above gives an intuitive reason for the name “closed set”: a convergent sequence
in a closed set A cannot ‘escape’ to a limit outside A.

A very common technique in practice to prove that a subset A ⊂ Rd is closed, is to assume
that (xn ) ⊂ A, x ∈ Rd , and xn → x as n → ∞, and then to prove that x ∈ A.
To prove that a subset A ⊂ Rd is not closed, we can try to find an x ∈ Ac (which we suspect
is in ∂A) and then construct a sequence (xn ) ⊂ A such that xn → x as n → ∞.
Example: At the start of this section, we warned against confusing “closed” with “not open”.
Hopefully it has become clearer now that those two things are not the same. In fact, it is
possible for a set to be “open and not closed”, “not open and closed”, “open and closed”, or
“not open and not closed”. Each of those four combinations is possible. Here are some examples
of each.
If r > 0, then the open ball B(0, r) is open and not closed (the same holds for B(x, r) with
any centre x ∈ Rd ; simply translate the argument below by x). In Lemma 8.3 we saw it is open.
To prove it is not closed, consider the sequence (xn ) in Rd defined by, for all n ∈ N,
xn = (r − r/n, 0, . . . , 0) (where there are d − 1 zeros following the first coordinate, unless d = 1,
in which case xn = r − r/n). Then kxn k = √( (r − r/n)^2 + 0^2 + . . . + 0^2 ) = r − r/n < r,
hence (xn ) is a sequence in B(0, r). Moreover, xn → a := (r, 0, . . . , 0) as n → ∞ (prove this!).
Since kak = √( r^2 + 0^2 + . . . + 0^2 ) = r, we have a ∈ B(0, r)c and thus (xn ) is a convergent sequence in
B(0, r) with limit in B(0, r)c . Hence, by Lemma 9.2, B(0, r) is not closed.
If x ∈ Rd , then the singleton set {x} is not open and closed. In the final example of Section 7
we saw that the interior of {x} is the empty set, thus {x} ≠ int {x} and hence {x} is not open.
To prove it is closed, let (xn ) be a sequence in {x}. Since {x} contains only one element, (xn )
has to be the constant sequence with, for all n ∈ N, xn = x. In an example in Section 5 we saw
that such a constant sequence converges to x. Hence every sequence in {x} converges to a limit
in {x} and thus {x} is closed by Theorem 9.3.
Examples of subsets of Rd which are both open and closed are ∅ and Rd . Can you prove
that they are indeed open and closed? If you are up for a bit of a challenge, also try to prove
that these are in fact the only subsets of Rd which are both open and closed25 .
An example of a set which is not closed and not open, is the interval [0, 1) ⊂ R. In the last
example of Section 7 we saw that int [0, 1) = (0, 1). Hence int [0, 1) ≠ [0, 1) and thus [0, 1) is
not open. Moreover, the sequence (xn ) defined by, for all n ∈ N, xn = 1 − 1/n, is a sequence
in [0, 1) with limit equal to 1, which is not in [0, 1). Hence [0, 1) is not closed. Can you think of
an example of a subset of R2 (or of Rd ) which is not open and not closed? Hint: Try to think
of a set which contains part, but not all, of its boundary. Prove your claims.
We have already seen the concept of boundedness for sequences in Rd . It is also a very useful
concept for subsets of Rd .

Definition 9.4. Let A ⊂ Rd . We say that A is bounded if there exists an M ∈ R such


that, for all x ∈ A, kxk < M .

Note: From the definition above we see that A ⊂ Rd is bounded if and only if there is an
M > 0 such that A ⊆ B(0, M ).
Similar to what was noted after Definition 5.7, it actually does not matter if we use a strict
or non-strict inequality in the condition in Definition 9.4. Do you see why both choices are
equivalent in this definition? (Careful! In other situations it often does matter if an inequality
is strict or not.)
Another warning! The words “bounded” and “boundary” are closely related in English
and based on their meanings in everyday speech, you might surmise that “being bounded” has
25
The fact that ∅ and Rd are the only subsets of Rd which are both open and closed is in fact a very important
property of the set Rd . It is called connectedness and if you want to learn more about it, you can take the
module G13MTS/MATH3003 next year.

something to do with “containing all of its boundary”. In our mathematical setting, however,
this is not true. Whether or not a set contains all of its boundary is completely independent
from the question whether or not it is bounded. For example, we have seen before that the
boundary of the open ball B(0, 1) ⊂ R2 is the circle {y ∈ R2 : kyk = 1} (Section 6), or
more generally (as you will know if you have done the example in Section 8), the boundary
of B(x, r) ⊂ Rd (for given x ∈ Rd and r > 0) is the hypersphere {y ∈ Rd : ky − xk = r}.
Thus the open ball B(x, r) contains no single element of its boundary (in fact, that is why it is
open, as we know from Lemma 8.2); it is, however, bounded, since for all y ∈ B(x, r) we have,
by the triangle inequality, kyk ≤ kxk + ky − xk < kxk + r, so M := kxk + r works in Definition 9.4.
Moreover, we can also find examples of sets which do contain their complete boundary, and
yet are not bounded. An example of those is the half-plane {(x1 , x2 ) ∈ R2 : x2 ≥ 0} which we
encountered in Section 6, or more generally a half-space of the form {x ∈ Rd : xd ≥ 0}. The
boundary of such a set is {x ∈ Rd : xd = 0} (prove this!), which is completely contained in the
half-space (or half-plane) itself. These sets, however, are not bounded. It is worthwhile looking
in detail at how we prove that, because it will give us more practice in negating statements to
prove that a property (in this case boundedness) does not hold.
Negating the defining statement in Definition 9.4, we see that a set A ⊂ Rd is unbounded if
and only if for all M > 0 there is an x ∈ A such that kxk ≥ M . In our example, for each M > 0
we can take xM := (M, 0, . . . , 0). Then xM is an element of the half-space {x ∈ Rd : xd ≥ 0}
and kxM k = M ≥ M . So the half-space is unbounded.
The following theorem describes a very important property of subsets of Rd which are both
closed and bounded.

Theorem 9.5 (Heine–Borel theorem). Let A ⊂ Rd . Then the following statements are
equivalent.

1. The set A is closed and bounded.

2. Every sequence in A has a convergent subsequence whose limit is an element of A.

Proof. We start by proving that statement 1 implies statement 2. Let A ⊂ Rd be closed


and bounded and let (xn ) be a sequence in A. Because A is bounded the sequence (xn )
is bounded (prove this!) and thus, by the Bolzano–Weierstraß theorem, the sequence has a
convergent subsequence. By Theorem 9.3 we then conclude that the limit of this subsequence
must be in A.
We prove that statement 2 implies statement 1 via a proof by contradiction. Assume that
every sequence in A has a convergent subsequence with limit in A and assume that A is not
closed or A is not bounded. If A is not closed, then by Theorem 9.3 there exists a sequence
(yn ) in A which converges to a limit y ∈ Ac . Then, by Lemma 5.11, every subsequence of
(yn ) also has limit y ∈ Ac . This contradicts the assumption. On the other hand, if A is not
bounded, then for every M > 0 there exists an x ∈ A such that kxk ≥ M . In particular,
for every n ∈ N, there exists an xn ∈ A such that kxn k ≥ n. We have found a sequence
(xn ) in A. If (xnk ) is a subsequence of (xn ), then, for all k ∈ N, kxnk k ≥ nk . Hence (xnk )
is unbounded and thus, by Lemma 5.9, it does not converge. Thus all subsequences of (xn )
diverge, which again contradicts the starting assumption (statement 2). We conclude that A
is closed and bounded.
Property 2 in Theorem 9.5 is so important that it has its own name.

Definition 9.6. Let A ⊂ Rd . The set A is called sequentially compact if every sequence
in A has a convergent subsequence whose limit is an element of A.

Note: By the Heine–Borel theorem we see that subsets of Rd are sequentially compact if
and only if they are both closed and bounded. The concept of sequential compactness can
be generalised to more abstract settings (in many of which closed and bounded is no longer
enough to guarantee sequential compactness). If you want to learn more about this, you can
take the module G13MTS/MATH3003 in the third year. Sequentially compact subsets of Rd
are sometimes also just called compact. In G13MTS/MATH3003 you will learn that in a more
general setting the notions “sequentially compact” and “compact” mean different things, but
that in the specific case of subsets of Rd they are equivalent.

Theorem 9.7 (Nested closed and bounded sets theorem). For all n ∈ N, let En ⊂ Rd be
non-empty, closed, and bounded, and assume that the sets are nested in the following way

E1 ⊃ E2 ⊃ E3 ⊃ . . .

Then
    ⋂_{n∈N} En 6= ∅.

Proof. Let (xn ) be a sequence in Rd with the property that, for all n ∈ N, xn ∈ En . Since,
for all n ∈ N, En ⊂ E1 , we have that (xn ) is a sequence in E1 . Since E1 is closed and bounded,
the Heine–Borel theorem tells us that (xn ) has a convergent subsequence (xnk ) with limit in
E1 . Let us call this limit a ∈ E1 . Let m ∈ N. By the properties of a subsequence, we have
for all k ≥ m, that nk ≥ m and hence, for all k ≥ m, xnk ∈ Enk ⊂ Em . By Lemma 5.5 we
know that the subsequence (xnk )∞ k=m of (xnk ) (i.e. the subsequence we get by deleting the
first m − 1 elements from (xnk )) also converges to a. Since this subsequence is in the closed
set Em , we have, by Theorem 9.3, that a ∈ Em . Since m was arbitrary, we conclude that,
for all m ∈ N, a ∈ Em , and thus a ∈ ⋂_{n∈N} En .
Note: The conclusion of Theorem 9.7 is not true if we drop either the closedness assumption
or the boundedness assumption. For example, if we define, for all n ∈ N, En := (0, 1/n) ⊂ R, then the
sets En are non-empty and bounded (prove!) and E1 ⊃ E2 ⊃ E3 ⊃ . . .. However, they are not
closed (prove!) and ⋂_{n∈N} En = ∅. We can prove this by contradiction. Assume x ∈ ⋂_{n∈N} En ,
then, for all n ∈ N, x ∈ (0, 1/n). If n ∈ N is such that n > 1/x, however, then x > 1/n, which is a
contradiction.
We can also find an example that shows that the boundedness assumption is necessary.
Define, for all n ∈ N, En := [n, ∞) ⊂ R. Then the sets En are non-empty, closed (prove!), and
E1 ⊃ E2 ⊃ E3 ⊃ . . .. However, they are not bounded (prove!) and again ⋂_{n∈N} En = ∅. To
prove this by contradiction, assume x ∈ ⋂_{n∈N} En , then, for all n ∈ N, x ∈ [n, ∞). If, however,
n ∈ N is such that n > x, then x is not in [n, ∞), which is a contradiction.
The version of Theorem 9.7 in R (with an extra uniqueness statement) is interesting to
mention separately.

Theorem 9.8 (Nested interval theorem). Let (an ) ⊂ R and (bn ) ⊂ R be sequences such

that, for all n ∈ N, an ≤ bn , and, for all n ∈ N, [an+1 , bn+1 ] ⊂ [an , bn ]. Then
    ⋂_{n∈N} [an , bn ] 6= ∅.

Moreover, if c ∈ R is such that limn→∞ an = c = limn→∞ bn , then


    ⋂_{n∈N} [an , bn ] = {c}.

Proof. Because, for all n ∈ N, the interval [an , bn ] is non-empty and closed, and moreover by
assumption the intervals [an , bn ] are nested in the sense of Theorem 9.7, the first statement
in this theorem follows immediately by applying that theorem.
Let k ∈ N. By induction we can prove that, for all n ∈ N with n ≥ k, [an , bn ] ⊂ [ak , bk ] (the
details are left to the reader as an exercise). In particular, we see that, for all n ∈ N, an ≤ bk
and bn ≥ ak . Hence, by Corollary 3.10 (and Lemma 3.4), we have c ≤ bk and c ≥ ak . Since
k was arbitrary, we deduce that, for all k ∈ N, c ∈ [ak , bk ], hence {c} ⊂ ⋂_{n∈N} [an , bn ].
It remains to prove that ⋂_{n∈N} [an , bn ] ⊂ {c}. Let d ∈ R be such that, for all n ∈ N, d ∈ [an , bn ].
Then, for all n ∈ N,
0 ≤ d − an ≤ bn − an .
Using the sum rule for limits, we have limn→∞ (bn −an ) = c−c = 0. By the sandwich theorem,
limn→∞ d − an = 0. Using the sum rule again, we find limn→∞ an = d. By uniqueness of the
limit (Lemma 3.3) we have d = c. This completes the proof.
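If you would like to see nested intervals “in action” on a computer, the following short Python sketch (purely an illustration, with all names and numbers chosen ad hoc; it is of course not part of the proof) builds nested intervals [an , bn ] by repeated bisection, each containing √2. Both endpoint sequences approach the same number, exactly as Theorem 9.8 predicts.

    # Bisection produces nested intervals [a_n, b_n] with a_n <= b_n and
    # [a_{n+1}, b_{n+1}] contained in [a_n, b_n]; here every interval contains sqrt(2).
    def nested_intervals(n_steps=40):
        a, b = 1.0, 2.0            # [a_1, b_1] = [1, 2] contains sqrt(2), since 1^2 < 2 < 2^2
        for _ in range(n_steps):
            m = (a + b) / 2.0      # midpoint of the current interval
            if m * m < 2.0:        # sqrt(2) lies in the right half [m, b]
                a = m
            else:                  # sqrt(2) lies in the left half [a, m]
                b = m
        return a, b

    a, b = nested_intervals()
    print(a, b, b - a)             # both endpoints approximate 1.41421356...; b - a is tiny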

10 Continuous functions on subsets of Rd
In this Section we will consider continuous functions defined on (subsets of) Rd and their
properties. This is a good time to carefully remember what the definition is of a function with
domain A and codomain B, written f : A → B. Informally, a function assigns (or perhaps we
should say it actually is the assignment) of an element from the codomain to each element of
the domain. In particular, we note that each element in the domain (no elements are left out)
is assigned one element in the codomain (no more and no fewer than one). Definition A.3 in
Appendix A provides the rigorous definition. Also have a look at the definitions of the restriction
of a function (Definition A.4) and of a real-valued function (Definition A.5).
Even if we limit our attention to real-valued functions whose domain and codomain are
subsets of Rd (for possibly different values of d for the domain and the codomain), as we will do
in this module, that still leaves us with a bewilderingly large set of possible functions. There are
some interesting (and provably true) things to say about this large set of all possible functions,
but if we want to think about operations such as differentiation and integration of functions,
we will need to restrict our attention even further to functions which are, in some sense, well-
behaved enough to allow such complex operations. The well-behavedness of a function is often
called its regularity. The first regularity property we will discuss is continuity. In Section 13
we will encounter differentiability (for functions on the real line), which is another regularity
property (or, in fact, which gives us a whole family of regularity properties, since we can consider
higher order differentiability as well).
Let us first recall from G11ACF/MATH1005 the definition of a limit for function values of
functions on the real line.

Definition 10.1. If f : R → R, a ∈ R, and L ∈ R, we write limx→a f (x) = L if, for every


ε > 0 there exists a δ > 0 such that, for all x ∈ R,

0 < |x − a| < δ ⇒ |f (x) − L| < ε. (8)

Have a look at Definition A.10 as well to remind yourself of the definitions of one-sided limits
and limits involving −∞ or +∞. Also remember the definition of the extended real number
line R̄ (Definition A.9) which we mentioned earlier in these notes already.
Next we address the relationship between convergent sequences and limits for function val-
ues.

Lemma 10.2. Let f : R → R, a ∈ R, and L ∈ R. Then the following statements are


equivalent.

1. limx→a f (x) = L.

2. For all sequences (xn ) in R \ {a} which converge to a, the sequence (f (xn )) converges
to L.

Proof. First we prove that statement 1 implies statement 2. Assume that limx→a f (x) =
L and let (xn ) be a sequence in R \ {a} converging to a. Let ε > 0. Then, by definition of
limx→a f (x) = L, there exists a δ > 0 such that, for all x ∈ R, if 0 < |x − a| < δ, then
|f (x) − L| < ε. Take this δ. Then by definition of convergence of (xn ) to a we know there
exists an N ∈ N such that, for all n ≥ N , |xn − a| < δ. Moreover, since (xn ) is a sequence
in R \ {a}, we have for all n ∈ N that xn 6= a, thus |xn − a| > 0. Therefore, for all n ≥ N ,

|f (xn ) − L| < ε. Hence the sequence (f (xn )) converges to L.
We prove that statement 2 implies statement 1 via a proof by contradiction. Assume that,
for all sequences (xn ) in R \ {a} which converge to a, the sequence (f (xn )) converges to L and
that limx→a f (x) = L does not hold. Negating the defining statement for limx→a f (x) = L in
Definition 10.1, we get that there exists an ε > 0 such that for all δ > 0 there exists x ∈ R
with 0 < |x−a| < δ and |f (x)−L| ≥ ε. In particular, if we define, for all n ∈ N, δn := n1 , then
there exist, for all n ∈ N, xn ∈ R with 0 < |xn − a| < δn = n1 and |f (xn ) − L| ≥ ε. Thus, the
sequence (xn ) which we constructed in this way is in R \ {a} and converges to a. (Prove this!)
Therefore, by assumption, the sequence (f (xn )) converges to L. This, however, contradicts
the statement that, for all n ∈ N, |f (xn ) − L| ≥ ε. (Prove this!) Hence limx→a f (x) = L.
Note: Can you formulate and prove similar equivalences between limits of function values and
limits of sequences for the other types of function value limits from Definition A.10 and the
note following it?
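Note: Purely as a numerical illustration of Lemma 10.2 (an experiment, not a proof; the function and the sequences below are chosen ad hoc), we can evaluate a function along several sequences in R \ {a} converging to a and watch the function values approach the same number. Here we take f (x) = sin(x)/x and a = 0, where we expect the limit 1.

    import math

    def f(x):
        # defined on R \ {0}; we expect lim_{x -> 0} f(x) = 1
        return math.sin(x) / x

    for x0 in (1.0, -0.7, 0.3):                   # three different sequences converging to 0
        xs = [x0 / 2**n for n in range(1, 30)]    # a sequence in R \ {0} with limit 0
        print(f(xs[-1]))                          # each printed value is very close to 1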
Before considering continuity of functions defined on subsets of Rd , let us have a quick look
at the special case of continuity for functions on the real line. Hopefully we do also remember
the definition from G11ACF/MATH1005.

Definition 10.3. Let a ∈ R. A function f : R → R is continuous at a, if for every ε > 0


there exists a δ > 0 such that, for all x ∈ R,

|x − a| < δ ⇒ |f (x) − f (a)| < ε. (9)

If f is not continuous at a, we say f is discontinuous at a.

This says that f (x) is as close as we need to f (a), for all x which are close enough to a. Here ε
can be thought of as εrror, or tolεrance, and δ as δisplacement, or δistance.

Lemma 10.4. Let f : R → R be a function and a ∈ R. Then f is continuous at a if and


only if limx→a f (x) = f (a).

Proof. The “only if” statement follows because (9) in Definition 10.3 implies (8) in Definition 10.1
with L = f (a). To prove the “if” statement, we note that (8) with L = f (a) implies that (9) holds for
all x ∈ R \ {a}. If x = a, however, then trivially |f (x) − f (a)| = |f (a) − f (a)| = 0 < ε and
so (9) holds for all x ∈ R.
We want to extend the definition of continuity in Definition 10.3 to functions defined on Rd .
Of course not all functions that take input from Rd have all of Rd as domain, so we will be
considering functions which have a subset of Rd as domain. We will also allow for more general
codomains. So far we have considered real-valued functions, now we will consider functions
whose codomain is Rq , for some q ∈ N.
Note: In the following we will use both Rd and Rq . Unless it is explicitly stated otherwise, we
assume that d, q ∈ N. We make no further assumptions on d and q; in particular, they could be
the same or different.

Definition 10.5. Let U ⊂ Rd , f : U → Rq , and assume a ∈ U . The function f is


continuous at a if, for every ε > 0 there exists a δ > 0 such that, for all x ∈ U ,

kx − ak < δ ⇒ kf (x) − f (a)k < ε. (10)

[Figure 7: A function f : U → Rq (with U an open subset of Rd ; in this drawing d = q = 2) which is continuous at a ∈ U : for each ε > 0, no matter how small, there is a δ > 0 with f (B(a, δ)) ⊂ B(f (a), ε).]

If f is not continuous at a, we say f is discontinuous at a.


If U 0 ⊂ U , the function f is continuous on U 0 if, for all a ∈ U 0 , f is continuous at a.
The function f is continuous if it is continuous on its domain U . If f is not continuous,
we say f is discontinuous.

Note: We see that Definition 10.5 is the straightforward generalisation of Definition 10.3 to
functions whose domain and codomain can be higher dimensional. Instead of the absolute value
function, we need to use the norm, but the structure of the definition is the same. Again, this
just says that the distance from f (x) to f (a) is as small as we like, provided the distance from
x to a is small enough. Note that the norm used in kx − ak is the norm in Rd , while the norm
in kf (x) − f (a)k is the norm in Rq .
Also note that the implication in (10) is only required to be true for x in the domain U . Of
course if x is not in the domain of f , then the expression f (x) (and by extension the statement
in (10)) makes no sense!
If we rewrite the condition in Definition 10.5 in terms of open balls, we find that f is
continuous at a if and only if, for every ε > 0 there exists a δ > 0 such that f (B(a, δ) ∩ U ) ⊂
B(f (a), ε).

Definition 10.6. Let U ⊂ Rd , f : U → Rq , a ∈ U ∪ ∂U , and L ∈ Rq . Then we write


limx→a f (x) = L if, for all ε > 0 there exists δ > 0 such that, for all x ∈ U ,

0 < kx − ak < δ ⇒ kf (x) − Lk < ε.

Note: The definition above is a generalisation of both the two-sided limit limx→a f (x) = L
as well as the one-sided limits limx→a− f (x) = L and limx→a+ f (x) = L from Definitions 10.1
and A.10. Do you understand why? Hint: Consider the quantification “for all x ∈ U ” in
Definition 10.6.
Do you see why we cannot generalise the definitions from Definition A.10 that involve ±∞
either on the side of the domain (if d ≥ 2) or the codomain (if q ≥ 2)?
Also note that we only require a ∈ U ∪ ∂U ; we do not need a to be in the domain U . We
could even have relaxed our requirement and written a ∈ Rd . In that case everything would
still be meaningfully defined, but it would have had the rather unfortunate consequence that,

for all a ∈ Rd \ (U ∪ ∂U ), limx→a f (x) = L would be true for every L ∈ Rq . Do you see why
that would have been the case?

Lemma 10.7. Let U ⊂ Rd , f : U → Rq , a ∈ U ∪ ∂U and L ∈ Rq . Then the following


statements are equivalent.

1. limx→a f (x) = L.

2. For every sequence (xn ) in U \ {a} which converges to a, the sequence (f (xn )) con-
verges to L in Rq .

Proof. The proof is the same, mutatis mutandis, as the proof of Lemma 10.2. We only need
to generalise all the concepts from that proof which are specific to R to the general setting
in which the domain is a subset of Rd and the codomain is Rq .
Note: Because limx→a f (x) = L is equivalent to a statement about converging sequences, many
of the results we proved before about convergence of sequences still hold for convergence of function
values. For example, when q = 1 the preservation of inequalities in the limit (as in Lemma 3.9
and Corollary 3.10) and the sandwich theorem(s) (as in Corollary 3.12 or Lemma 3.13) have
equivalent formulations for limits of the form limx→a f (x), as well as for the other limits defined
in Definition A.10, wherever these results can be meaningfully defined. I strongly encourage
you to try to prove this yourself.
Now we are in a position to give some different formulations of continuity.

Lemma 10.8. Let U ⊂ Rd , f : U → Rq , and assume a ∈ U . Then the following statements


are equivalent.

1. The function f is continuous at a.

2. limx→a f (x) = f (a).

3. For every sequence (xn ) in U \{a} which converges to a, the sequence (f (xn )) converges
to f (a) in Rq .

4. For every sequence (xn ) in U which converges to a, the sequence (f (xn )) converges to
f (a) in Rq .

Proof. The equivalence of the first two statements follows directly from Definitions 10.5
and 10.6. We deduce equivalence of the second and third statements by using Lemma 10.2
with L = f (a). It is immediately clear that the fourth statement implies the third. Conversely,
assume the third statement is true and let (xn ) be a sequence in U which converges to a.
Let ε > 0 and define S := {n ∈ N : xn 6= a}. If S is finite we can remove all the elements
f (xn ) corresponding to n ∈ S from the sequence (f (xn )) without changing its convergence
behaviour; what remains is a constant sequence in which every element equals f (a), so (f (xn ))
converges to f (a). If S is infinite, then the sequence (xn )n∈S , i.e. the subsequence of (xn ) which
contains only those elements xn which are not equal to a, is a sequence in U \ {a} and thus,
by assumption, (f (xn ))n∈S converges to f (a). Hence there exists an N ∈ N such that, for
all n ∈ S, with n ≥ N , we have kf (xn ) − f (a)k < ε. If n ∈ N \ S, then xn = a and thus
kf (xn ) − f (a)k = kf (a) − f (a)k = 0 < ε. Hence, for all n ≥ N , kf (xn ) − f (a)k < ε. Therefore
(f (xn )) converges to f (a) and thus the third statement implies the fourth.
Note: Contrary to our requirements in Definition 10.6, in Lemma 10.8 we do require a ∈ U ,
because we need f (a) to be defined.

Corollary 10.9. Let U ⊂ Rd , f : U → Rq , and assume a ∈ U . Write f in terms of its
coordinates as
f = (f1 , . . . , fq ).
Then f is continuous at a if and only if, for all j ∈ {1, . . . , q}, fj is continuous at a.

Proof. By Lemma 10.8 we know that f is continuous at a if and only if, for every sequence (xn )
in U which converges to a, we have that (f (xn )) converges to f (a) in Rq . This is equivalent,
by Lemma 5.6, to, for all j ∈ {1, . . . , q}, (fj (xn )) converges to fj (a). That in turn, again by
Lemma 10.8, is equivalent to, for all j ∈ {1, . . . , q}, fj is continuous at a.
It is useful to know how continuity properties of functions change (or not) if the domain or
codomain changes.

Lemma 10.10. Let U ⊂ Rd and assume a ∈ U . Let f : Rd → Rq be a function and define


f̃ : U → Rq to be the restriction of f to U , given by, for all x ∈ U , f̃ (x) := f (x).

1. If f is continuous at a, then f̃ is continuous at a.

2. If a ∈ int U and f̃ is continuous at a, then f is continuous at a.

Let V ⊂ Rq and let g : U → Rq be a function such that g(U ) ⊂ V . Define g̃ : U → V by,


for all x ∈ U , g̃(x) := g(x).

3. The function g is continuous at a ∈ U if and only if the function g̃ is continuous at


a ∈ U.

Proof. To prove statement 1, assume f is continuous at a. Let ε > 0, then there exists δ > 0
such that, for all x ∈ Rd with kx − ak < δ we have kf (x) − f (a)k < ε. Hence, if x ∈ U ⊂ Rd
and kx − ak < δ, then
kf̃ (x) − f̃ (a)k = kf (x) − f (a)k < ε.
Thus f̃ is continuous at a.
To prove statement 2, assume a ∈ int U and f̃ is continuous at a. Let ε > 0. Then there
exists δ̃ > 0 such that, for all x ∈ U with kx − ak < δ̃ we have kf̃ (x) − f̃ (a)k < ε. Since
a ∈ int U , there exists an r > 0 such that B(a, r) ⊂ U . Now define δ := min(δ̃, r). Let x ∈ Rd
be such that kx − ak < δ, then x ∈ B(a, r) ⊂ U and kx − ak < δ̃, hence

kf (x) − f (a)k = kf̃ (x) − f̃ (a)k < ε.

Hence f is continuous at a.
To prove statement 3, we note that g is continuous at a ∈ U if and only if, for all ε > 0 there
exists a δ > 0 such that for all x ∈ U with kx − ak < δ we have kg(x) − g(a)k < ε. Since
g(a) = g̃(a) and, for all x ∈ U , g(x) = g̃(x), we have that this condition is equivalent to, for
all ε > 0 there exists δ > 0 such that for all x ∈ U with kx−ak < δ we have kg̃(x)−g̃(a)k < ε.
This, per definition, is equivalent to g̃ being continuous at a.
Note: Lemma 10.10 tells us that if we restrict the domain (statement 1) or the codomain26
(statement 3) of a continuous function, the resulting function is still continuous. That is why
26
as long as we are careful to do this in a consistent way, meaning that the new codomain V still needs to be
large enough to contain the function’s range g(U )

we often say things like “polynomials are continuous”, or “sin and cos are continuous” (see also
Lemma A.13), without specifying their domain or codomain.
Note that the lemma very specifically requires a to be in the interior of U in statement 2.
This statement is in fact not true in general if a ∈ ∂U ∩ U . In the first example we consider
below, if we restrict the domain of the function f in (11) to the set U := {x ∈ R2 : x21 + x22 ≥ 1},
then the restriction f˜ is constant (and thus continuous, as we will see in Lemma A.13) on
its domain U . So we can immediately conclude that the original function f is continuous on
int U = {x ∈ R2 : x21 + x22 > 1}. But we cannot conclude from this that f is continuous on
∂U = {x ∈ R2 : x21 + x22 = 1}. We need to do extra work to prove that (which we will do later in
the example) and in general it need not be true. For example, consider the function g : R2 → R
given by

    g(x1 , x2 ) := 0 if 0 ≤ x21 + x22 < 1,    and    g(x1 , x2 ) := 1 if x21 + x22 ≥ 1.

By the same argument as above, g is continuous on int U , but is it continuous at any element a
in ∂U ?
Before we look at this example and others, it is useful to establish some general rules which
allow us to quickly identify, for certain functions, if they are continuous or not. First we
need to have a look at Definition A.11 and Lemma A.12 to remember how we can combine
functions into new functions (for example, through addition, multiplication, and division) and
to recall the corresponding limit rules (which you have probably seen for functions whose domain
and codomain are subsets of R; the results for functions with higher dimensional domains or
codomains are straightforward generalisations).
Lemma A.12 immediately gives us the tools to prove various statements about the continuity
of functions which are combinations of other continuous functions.

Lemma 10.11. Let U ⊂ Rd , f : U → Rq , g : U → Rq , c ∈ R, and assume a ∈ U . Assume


f and g are continuous at a. Then

1. f + g is continuous at a,

2. cf is continuous at a.

Now let f : U → Rq and g : U → R. Assume f and g are continuous at a. Then

3. f g = gf is continuous at a,

4. if, for all x ∈ U , g(x) 6= 0, then f /g is continuous at a.

Finally, let V ⊂ Rd , W ⊂ Rp , f : V → Rp , and g : W → Rq be such that f (V ) ⊂ W . Assume
f is continuous at a ∈ V and g is continuous at f (a) ∈ W . Then

5. g ◦ f is continuous at a.

Proof. These statements can be proved by applying the results from Lemma A.12. We leave
the details as an exercise to the reader.
Note: In both point 5 of Lemma A.12 and point 4 of Lemma 10.11, the requirement that, for
all x ∈ U , g(x) 6= 0, is necessary if we want to have f /g be well-defined on all of U . For the
sake of continuity at a or the existence of the limit value at a, however, it is enough for g to be
nonzero on an open set which contains a.

Now we know some ways in which we can combine continuous functions to form new contin-
uous functions, so it is good to establish (or remember) the continuity of some basic functions.
Since these are examples of continuous functions you are probably already familiar with, we
have included Lemma A.13 in Appendix A.
Note: When we say that a function is continuous, without specifying at which point in the
domain, or on which subset of the domain, it is implicitly understood that we mean “continuous
on the whole domain of the function”. In Lemma A.13 we have explicitly written this out, but
we will often not do so. Note carefully that (multivariate) rational functions are not defined, let
alone continuous, at elements where their denominator is zero.
Now let us have a look at some examples.
Example: Consider the function f : R2 → R defined by

0,   if (x1 , x2 ) = (0, 0),


1
f (x1 , x2 ) := exp 1 − x2 +x2 , if 0 < x21 + x22 < 1, (11)
 1 2
if x2 + x2 ≥ 1.

1,
1 2

Is this function continuous? We need to decide, for every a ∈ R2 , if the function f is continuous
at a. First of all, we already saw above in the note following Lemma 10.10 that f is continuous
on the open set {x ∈ R2 : x21 + x22 > 1} because it is constant there. On the open set
{x ∈ R2 : 0 < x21 + x22 < 1} the function x 7→ 1/(x21 + x22 ) is a well-defined rational function and thus,
by Lemma A.13, it is continuous on this set. By Lemma 10.11 (using statements 2, 1, and 5, respectively),
we then deduce that the functions x 7→ −1/(x21 + x22 ), x 7→ 1 − 1/(x21 + x22 ), and x 7→ exp(1 − 1/(x21 + x22 )) are all
continuous on {x ∈ R2 : 0 < x21 + x22 < 1}. Since this set is open, statement 2 in Lemma 10.10
tells us that f is continuous on {x ∈ R2 : 0 < x21 + x22 < 1}.
We spent quite a bit of attention here on the continuity of the function on the sets {x ∈ R2 :
x21 + x22 > 1} and {x ∈ R2 : 0 < x21 + x22 < 1} to point out exactly what our line of reasoning
is. In particular, we see that it is very important to be able to quickly determine if a set is open
or not and to identify if a function is made up (through the operations which were explored in
Lemma 10.11) of continuous building blocks or not. In future examples we will spend less time
on such observations, but it is important that you realise what underlies them!
Now we still need to determine if f is continuous at a if a = (0, 0) or a ∈ {x ∈ R2 :
x21 + x22 = 1}. We cannot do this by using the strategy we used above on other parts of the
domain, because int {(0, 0)} = int {x ∈ R2 : x21 + x22 = 1} = ∅, hence we need to go
back to the definition of continuity (either the original ε–δ definition or the characterisation
in terms of sequences which we proved to be equivalent in Lemma 10.8). To give an example
of both formulations in action, we will prove continuity at (0, 0) using the ε–δ definition of
continuity and we will prove continuity at a ∈ {x ∈ R2 : x21 + x22 = 1} using the sequence
characterisation.
Let ε ∈ (0, e). Note that if we can prove that (10) holds (with a = (0, 0)) for ε ∈ (0, e),
then it is of course also satisfied for any ε ≥ e. Define δ̃ := √(1/(1 − log ε)). This is well-defined, since
log ε < 1. Moreover, we observe that exp(1 − 1/δ̃ 2 ) = ε. Define δ := min(δ̃, 1). Now let x ∈ R2
be such that kx − (0, 0)k = kxk = √(x21 + x22 ) < δ, then 0 ≤ x21 + x22 < δ 2 ≤ 1. If x = (0, 0), then
|f (x) − f ((0, 0))| = 0. Otherwise, we have x21 + x22 < δ̃ 2 and thus

    |f (x) − f ((0, 0))| = f (x) = exp(1 − 1/(x21 + x22 )) < exp(1 − 1/δ̃ 2 ) = ε.

Hence f is continuous at (0, 0).
Finally, let a ∈ {x ∈ R2 : x21 + x22 = 1}. We will prove continuity at a using the sequence
characterisation of continuity. Let (xn ) be a sequence in R2 which converges to a. Because
x 7→ x21 + x22 is a multivariate polynomial, we know by Lemma A.13 that it is continuous.
Hence, limn→∞ (x2n,1 + x2n,2 ) = a21 + a22 = 1. Then there exists an N1 ∈ N such that, for all
n ≥ N1 , |x2n,1 + x2n,2 − 1| < 1. Hence, for all n ≥ N1 we know that xn 6= (0, 0). Now define
S := {n ∈ N : n ≥ N1 and 0 < x2n,1 + x2n,2 < 1}. Then, if n ≥ N1 , but n is not in S, we have
x2n,1 + x2n,2 ≥ 1 and thus f (xn ) = 1 and |f (xn ) − f (a)| = |1 − 1| = 0. Hence, if S is finite, we can
remove the (finitely many) elements f (xn ) corresponding to all n ∈ S and the (finitely many)
elements f (xn ) corresponding to all n < N1 from the sequence (f (xn )) without changing its
convergence behaviour and we are left with a constant sequence whose elements (f (xn )) are all
equal to 1. Hence this sequence converges to 1 = f (a). If S is infinite, we note that, if n ∈ S,
then f (xn ) = exp(1 − 1/(x2n,1 + x2n,2 )) and we know by our proof above that x 7→ exp(1 − 1/(x21 + x22 ))
is continuous. Since (xn )n∈S is a subsequence of (xn ) it converges to a and thus (f (xn ))n∈S
converges to f (a). Let ε > 0, then there exists N2 ∈ N such that for all n ∈ S with n ≥ N2
we have |f (xn ) − f (a)| < ε. Since we already know that, for all n ≥ N1 which are not in S, we
have |f (xn ) − f (a)| = 0 < ε, if we now define N := max(N1 , N2 ), then we conclude that, for all
n ≥ N we have |f (xn ) − f (a)| < ε. Hence also in this case (f (xn )) converges to f (a) = 1. Thus
f is continuous at a.
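If you want to see these two proofs reflected in a numerical experiment (which, to be clear, illustrates the sequence characterisation but proves nothing), you can evaluate f from (11) along sequences converging to (0, 0) and to the point (1, 0) on the unit circle. A possible Python sketch, with all choices made ad hoc:

    import math

    def f(x1, x2):
        r2 = x1 * x1 + x2 * x2
        if r2 == 0.0:
            return 0.0
        if r2 < 1.0:
            return math.exp(1.0 - 1.0 / r2)
        return 1.0

    # sequence converging to (0, 0): the f-values should tend to f(0, 0) = 0
    print([f(1.0 / 2**n, 0.0) for n in (2, 5, 10)])

    # sequence converging to (1, 0) from inside the open unit ball: the f-values should tend to 1
    print([f(1.0 - 1.0 / 2**n, 0.0) for n in (2, 5, 10)])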
Note: In the example above we give a very detailed description of the reasoning we used to
prove that f is continuous. In principle, when asked to prove continuity of a function you will
need to give these (or equivalent) arguments and especially when you are still getting familiar
with these concepts and with working with these kind of ε–δ and sequence arguments, it is
very helpful to write out all details for yourself. In the examples we will see below, however,
we will formulate these arguments more compactly. The details are still there, underlying the
arguments, but we might not write them out completely because we expect you to be able to
understand (and reproduce!) the details without these being spelled out explicitly. Writing is
always done for a particular audience and the level of the audience will determine to a large
degree which details are emphasised and which are glossed over quickly (but not ignored or
forgotten!). Just like I might write 8 + 5 = 13 without feeling the need to go into detail how
I arrived at that conclusion, the examples below might leave out some of the details that were
explicitly written out above (such as the manipulations that can be done by removing finitely
many elements from a sequence). You are strongly encouraged to be honest with yourself and
spend time trying to reproduce the necessary details yourself. Do not gloss over them, because
if you do not understand them in these examples, you will also not know when to apply them
yourself. Exams are very special beasts. Not only are you writing for a rather strange audience
(your goal on an exam is to explain a piece of mathematics to someone —the marker— who
already understands it), but you are also doing so under a time constraint. Therefore you will
have to use similar kinds of ‘compression techniques’ in your writing as I mentioned above. It
is difficult to give hard and fast rules that tell you exactly which details deserve close attention
and which ones you can combine in one or two higher level sentences. All I can say about that is
that, the better you understand the material, the better you will be able to judge which details
can be left out. That is not an easy skill to master, but it is an important one. And as with so
many others, it is one that can only be gained by practice.

Example: Consider the function f : R2 → R given by
    f (x1 , x2 ) := x41 x22 /(x81 + x42 ),  if (x1 , x2 ) 6= (0, 0),
    f (x1 , x2 ) := 0,                     if (x1 , x2 ) = (0, 0).

If (xn ) is a sequence of elements (xn,1 , xn,2 ) in R2 such that, for all n ∈ N, xn,2 = 0, and if the
first coordinate sequence (xn,1 ) converges to 0, then we get
    f (xn,1 , 0) = 0/x8n,1 = 0 → 0    as n → ∞.

If instead, for all n ∈ N , xn,1 = 0 and (xn,2 ) is a sequence which converges to 0, we get
    f (0, xn,2 ) = 0/x4n,2 = 0 → 0    as n → ∞.

So if we fix one variable (x1 or x2 ) and regard f as a function of the other variable then f is
continuous. But is f really continuous as a function of both variables?
If (xn,2 ) is a sequence which converges to 0 and, for all n ∈ N, xn,1 = mxn,2 for some
m ∈ R \ {0}, then the sequence (xn ) converges to (0, 0) and

    f (xn,1 , xn,2 ) = m4 x6n,2 /(m8 x8n,2 + x4n,2 ) = m4 x2n,2 /(m8 x4n,2 + 1) → 0    as n → ∞.

So as (xn,1 , xn,2 ) → (0, 0) along any straight line we have f (xn,1 , xn,2 ) → 0. If we let (xn,1 , xn,2 ) →
(0, 0) along the curve xn,2 = x2n,1 , however, we get

    f (xn,1 , xn,2 ) = f (xn,1 , x2n,1 ) = x4n,1 x4n,1 /(x8n,1 + x8n,1 ) = 1/2 → 1/2 6= f (0, 0).

This means that the sequence (f (xn,1 , xn,2 )) does not have the same limit for every sequence
(xn ) which converges to (0, 0). Hence f is not continuous at (0, 0).
This example teaches us that in deciding whether a function f = (f1 , . . . , fq ) is continuous
it is sufficient to look at each coordinate fj of the image separately, but it is not sufficient to
consider f with respect to each coordinate variable xk separately. Nor is it enough to look at
f as x approaches a point along straight lines. In general it can be quite tricky to determine
whether a function of several variables is continuous, but the next example illustrates how this
can be done in some cases.
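The calculations above can also be replayed numerically (again only as an experiment, with arbitrarily chosen lines): along straight lines through the origin the values of f shrink to 0, while along the parabola x2 = x21 they equal 1/2. A rough Python sketch:

    def f(x1, x2):
        if (x1, x2) == (0.0, 0.0):
            return 0.0
        return x1**4 * x2**2 / (x1**8 + x2**4)

    ts = [1.0 / 2**n for n in range(1, 15)]
    print([f(t, 3.0 * t) for t in ts][-3:])   # along the line x2 = 3 x1: the values tend to 0
    print([f(3.0 * t, t) for t in ts][-3:])   # along the line x1 = 3 x2: the values tend to 0
    print([f(t, t * t) for t in ts][-3:])     # along the parabola x2 = x1^2: every value is 1/2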

Example: Consider the function g : R2 → R given by


    g(x1 , x2 ) := x41 x32 /(x81 + x42 ),  if (x1 , x2 ) 6= (0, 0),
    g(x1 , x2 ) := 0,                     if (x1 , x2 ) = (0, 0).

(Note that the numerator is different than the one of f from the previous example.)
Note first that x81 + x42 ≥ x81 and x81 + x42 ≥ x42 . If (x1 , x2 ) is such that |x2 | ≤ |x1 |2 , then

|x41 x32 | ≤ |x1 |4 |x1 |6 = |x1 |10 ,

and thus
    |g(x1 , x2 )| ≤ |x1 |10 /|x1 |8 = |x1 |2 .

If (x1 , x2 ) is such that |x2 | > |x1 |2 , then |x1 | < |x2 |1/2 , hence

|x41 x32 | < |x2 |2 |x2 |3 = |x2 |5 ,

and therefore
    |g(x1 , x2 )| ≤ |x2 |5 /|x2 |4 = |x2 |.
Hence, for any (x1 , x2 ) ∈ R2 ,

|g(x1 , x2 )| ≤ max(|x1 |2 , |x2 |). (12)

If (xn ) = ((xn,1 , xn,2 )) is a sequence in R2 which converges to (0, 0) (in any fashion), then both
coordinate sequences (xn,1 ) and (xn,2 ) converge to 0 and thus, by (12), g(xn,1 , xn,2 ) → 0 = g(0, 0).
Hence g is continuous at (0, 0).
If you try this method on the previous example, you will find it is inconclusive.
We can also prove that g is continuous at (0, 0) using the ε–δ definition. Let ε > 0 and
define δ := min(√ε, ε). Let x = (x1 , x2 ) ∈ R2 be such that kx − (0, 0)k = √(x21 + x22 ) < δ. If
|x2 | ≤ |x1 |2 , then, by the same logic as before

|g(x1 , x2 )| ≤ |x1 |2 ≤ x21 + x22 < δ 2 ≤ ε.

If, on the other hand, |x2 | > |x1 |2 , then, by the same logic as before
    |g(x1 , x2 )| ≤ |x2 | ≤ √(x21 + x22 ) < δ ≤ ε.

Thus, for every ε > 0, there exists a δ > 0, such that for all x ∈ R2 with kx − (0, 0)k < δ we
have |g(x) − g(0, 0)| < ε. Thus g is continuous at (0, 0).
If you understand both methods above well, you will have noticed that the real mathematical
core of the argument —the part which is not just copy-pasting the standard structure of a proof
like this— consists of finding the right two (complementary) cases to consider and deriving an
estimate on |g(x1 , x2 )| in either case. This is, if you will, the new bit, the crucial bit of insight
you need for this particular example. You see it is present in both versions of the proof. Whether
we dress up the rest of the proof in sequence language (as in the first version of the proof) or
ε–δ language (the second version) does not change the fact that we used these crucial estimates
to complete the proof. Knowing how to structure your proof is very important: it will make
your proof readable for others and will help you decide which parts require some mathematical
work to fill out — which parts require a mathematical argument to make the logic work? These
‘holes’ that need filling out are often filled by estimates (inequalities27 ) of the kind we saw
above. Inequalities are the life blood of mathematical analysis! They are at the heart of many
a proof.
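As a sanity check on the crucial estimate (12) (once more: an experiment, not part of the proof; numpy and the sample size are merely convenient choices for this sketch), one can test the inequality at a large number of randomly chosen points:

    import numpy as np

    rng = np.random.default_rng(0)

    def g(x1, x2):
        if x1 == 0.0 and x2 == 0.0:
            return 0.0
        return x1**4 * x2**3 / (x1**8 + x2**4)

    # check |g(x1, x2)| <= max(|x1|^2, |x2|), which is estimate (12)
    worst = 0.0
    for _ in range(100000):
        x1, x2 = rng.uniform(-1.0, 1.0, size=2)
        worst = max(worst, abs(g(x1, x2)) - max(x1**2, abs(x2)))
    print(worst)   # never (noticeably) positive, in line with (12)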
Example: Let E := {x ∈ Rd : kxk ≤ 1}. The function h : E → R, given by h(x) := √(1 − kxk),
is continuous on its domain. We can very quickly see this with the results we have in our toolbox
now. First of all, looking back at Lemma 5.13 we realise now that what we showed in that lemma
is that the norm function x → kxk is continuous on all of Rd and so (by Lemma 10.10) also on
E. We know that the constant function x 7→ 1 is continuous on E and thus, by Lemma 10.11,
so is x 7→ 1 − kxk. From Lemma A.13 we know that the square root function is continuous on
[0, ∞). Since 1 − kxk ≥ 0 on E, a final application of Lemma 10.11 lets us conclude that h is
continuous on E.

27
Inequalities in this context are often called “estimates” in the mathematical analysis world.

Theorem 10.12. Let E ⊂ Rd be closed and bounded, and let f : E → Rq be a continuous
function. Then the image f (E) is a closed and bounded subset of Rq .

Proof. Let (yn ) be a sequence in f (E), then (by definition of the image f (E)), for each
n ∈ N there exists an xn ∈ E such that yn = f (xn ). Since E is a closed and bounded subset
of Rd , by the Heine–Borel theorem (Theorem 9.5) the sequence (xn ) in E has a convergent
subsequence (xnk ) whose limit, say a, is in E. Because f is continuous, we have that (f (xnk ))
converges to f (a). Because a ∈ E, we have f (a) ∈ f (E). Setting, for all k ∈ N, ynk := f (xnk ),
we have that (ynk ) is a convergent subsequence of (yn ) with limit f (a) ∈ f (E). Thus, using
the Heine–Borel theorem again, we conclude that f (E) is closed and bounded.
Remember from G11ACF/MATH1005 (see also Definition A.7) what a (global) maximum
and (global) minimum of a function are. Choosing q = 1 in Theorem 10.12 will allow us to
prove a very important theorem for real-valued functions.

Theorem 10.13 (Maximum and minimum theorem for continuous real-valued functions).
Let E ⊂ Rd be non-empty, closed, and bounded. Let f : E → R be a continuous function.
Then f has a maximum and a minimum on E.

Proof. For notational simplicity, define A := f (E). Applying Theorem 10.12 in the case
q = 1, we find that A is closed and bounded. Moreover, since E 6= ∅, we have A 6= ∅. The
least upper bound property of R (see Section 1.3) tells us that every non-empty subset of R
which is bounded above has a least upper bound, hence s := sup A exists. We want to prove
that there exists an x1 ∈ E such that f (x1 ) = s. For a proof by contradiction, assume that
for all x ∈ E, f (x) < s. We define the function g : E → R, by
    g(x) := 1/(s − f (x)).

Note that by our assumption, the denominator is nowhere zero on E and thus g is well-
defined. Moreover, because f is continuous on E, so is g (using the usual rules for combining
continuous functions to form new continuous functions, which should be very familiar to you
by now) and thus, by Theorem 10.12, g(E) is bounded. By definition of the least upper
bound, for every ε > 0 there exists an x ∈ E such that f (x) ≥ s − ε. In particular, if we let,
for all n ∈ N, εn := 1/n, then, for all n ∈ N there exists xn ∈ E such that f (xn ) ≥ s − 1/n and
thus g(xn ) ≥ n. This contradicts the boundedness of g(E) and thus there exists an x1 ∈ E
such that f (x1 ) = s. Hence, for all x ∈ E, f (x) ≤ f (x1 ).
Finally, define f˜ := −f . Then f˜ is a continuous function with non-empty, closed, and bounded
domain E. By what we have just proven, there exists an x0 ∈ E such that, for all x ∈ E,
f˜(x) ≤ f˜(x0 ). Hence, for all x ∈ E, f (x0 ) ≤ f (x).
Note: The maximum value f (x0 ) ∈ R in the theorem above is of course unique (there cannot be
two different values which are both the maximum), but the element x0 ∈ E at which this value
is achieved does not need to be unique. An example which shows this very clearly is a constant
function defined on a non-empty, closed, and bounded domain. Such a function achieves its
maximum at every element in its domain. Similar comments apply to the minimum as well.
The particular case of this theorem in which E = [a, b] is a nondegenerate, closed, and bounded
interval in R is one of the most important theorems in calculus.
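Theorem 10.13 guarantees that a maximum and a minimum exist, but it does not say where they are attained. In concrete cases we can at least approximate them numerically; the following Python sketch (the function, the interval, and the grid size are arbitrary choices for illustration, and a grid search only approximates, it proves nothing) evaluates a continuous function on a fine grid over a closed and bounded interval.

    import numpy as np

    def f(x):
        # continuous on the non-empty, closed, and bounded set E = [-2, 2],
        # so by Theorem 10.13 it attains a maximum and a minimum on E
        return x**3 - x

    xs = np.linspace(-2.0, 2.0, 100001)      # a fine grid on E, including both endpoints
    ys = f(xs)
    print(ys.max(), xs[ys.argmax()])         # approximately  6 at x =  2
    print(ys.min(), xs[ys.argmin()])         # approximately -6 at x = -2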
The property of a function having a bounded range is important enough to be given its own name.

Definition 10.14. Let E ⊂ Rd and let f : E → Rq . The function f is bounded if its
range f (E) is a bounded subset of Rq .

So far we have seen characterisations of continuity in ε–δ language and in sequence lan-
guage. We end this section with a third, equivalent, formulation in terms of pre-images. From
Definition A.3 we remember that, if V ⊂ Rq and we consider a function f : Rd → Rq , then the
pre-image of V under f is
f −1 (V ) = {x ∈ Rd : f (x) ∈ V }.
So the pre-image of V under f is the set of all elements in the domain which get mapped into
V by f . Note that the pre-image is a subset of the domain.28

Theorem 10.15. Consider a function f : Rd → Rq . Then the following statements are


equivalent.

1. The function f is continuous.

2. For every open subset V of Rq , the pre-image f −1 (V ) is an open subset of Rd .

Proof. Assume f is continuous on Rd . Let V ⊂ Rq be an open subset and let x0 ∈ f −1 (V ).


Then y0 := f (x0 ) is an element in V . Since V is open, there exists an r > 0 such that
B(y0 , r) ⊂ V . Moreover, since f is continuous on Rd and thus in particular at x0 , there exists
a δ > 0 such that, for all x ∈ Rd with kx − x0 k < δ, we have

kf (x) − y0 k = kf (x) − f (x0 )k < r.

Let x ∈ B(x0 , δ). Then kx − x0 k < δ and thus f (x) ∈ B(y0 , r) ⊂ V . Hence x ∈ f −1 (V )
and we find that B(x0 , δ) ⊂ f −1 (V ). Therefore f −1 (V ) is open and we have proven that
statement 1 implies statement 2.
To prove that statement 2 implies statement 1, assume that for every open subset V of Rq ,
the pre-image f −1 (V ) is an open subset of Rd . Let x1 ∈ Rd and let ε > 0. Define y1 := f (x1 )
and V := B(y1 , ε). We know that open balls are open, so V ⊂ Rq is open and thus, by
assumption, so is f −1 (V ). Because x1 is an element of the open set f −1 (V ), there exists
an s > 0 such that B(x1 , s) ⊂ f −1 (V ). This means that, for all x ∈ B(x1 , s), we have
x ∈ f −1 (V ) and thus f (x) ∈ V = B(f (x1 ), ε). Therefore, for all x ∈ Rd with kx − x1 k < s,
we have kf (x) − f (x1 )k < ε. Hence f is continuous.
Note: In Theorem 10.15 we only considered functions from Rd to Rq . There is a version of
the theorem for functions f : U → W , where U ⊂ Rd and W ⊂ Rq , but it is not quite so easy
to state. You can learn more about that, if you take G13MTS/MATH3003 next year. If you
do, then you will also see that the characterisation of continuity in terms of pre-images is very
useful indeed when considering continuous functions in the abstract setting of topological spaces.

28
Warning! As noted before, the notation can be a bit confusing. You might be used to the notation f −1
being used for the inverse of the function f . In the definition of pre-image, however, we are not assuming that f
has an inverse function. The set f −1 (V ) is perfectly well-defined even if f does not have an inverse function. The
reason why people use the same notation for both concepts, is because if the function f does have an inverse
function, say g, then the image of V under g is equal to the pre-image of V under f : g(V ) = f −1 (V ).

11 Convergence of sequences and series of functions
We have seen various types of limits so far. We have talked about the limit of sequences of
vectors in Rd (or numbers in R) —for example 1/n → 0 as n → ∞— and we have talked about the
limit of function values —such as limx→3 x2 = 9. In this section we will consider convergence
of sequences of functions. The elements in our sequence (and the potential limit) are no longer
numbers or vectors in Rd , but functions.
We saw in some of our motivating examples in Section 2 that taking sequences of functions
may lead to unexpected consequences. This chapter will introduce a strong form of convergence
called uniform convergence which in certain contexts helps avoid such unexpected results.

Definition 11.1. Let U ⊂ Rd . A sequence of functions (from U to Rq ) (fn ) is a


non-terminating ordered list of functions fn : U → Rq :

fp , fp+1 , fp+2 , . . .

Note: As we are used to by now, we usually start labelling the elements in a sequence from
p = 1, but this is not necessarily the case. Unless we explicitly state otherwise, we will assume
that p = 1.
Note that we can easily adapt the definition above if we want to specify a codomain for the
functions fn which is a subset of Rq . This will not be relevant for our purposes in these notes.

Definition 11.2 (Pointwise convergence of functions). Let U ⊂ Rd , for all n ∈ N let


fn : U → Rq , and let f : U → Rq . We say that the sequence of functions (fn ) converges
pointwise to f (on U ), if, for all x ∈ U ,

f (x) = lim fn (x).


n→∞

Note: If we leave out the specification “on U ”, it is implied (unless stated otherwise) that the
pointwise convergence holds at each element of the domain.
We can understand the name “pointwise convergence”: at each element or point x in the
domain U we consider the sequence of function values (fn (x)). Pointwise convergence of (fn )
to f means that at each such point x the sequence (fn (x)) converges (in Rq ) to a limit value
y ∈ Rq , where y is equal to the value of f at x, i.e. y = f (x).
Example: Consider the sequence of functions (fn ) with, for all n ∈ N, fn : [0, 1] → R defined
by fn (x) := xn . If x = 1, then, for all n ∈ N, fn (x) = fn (1) = 1n = 1. Hence in this case, the
sequence (fn (x)) in R is a constant sequence in which each element is 1. Thus this sequence
converges to 1. If x ∈ [0, 1), then xn → 0 as n → ∞ (prove this!) and thus the sequence (fn (x))
converges to 0. This means that the function f : [0, 1] → R defined by
    f (x) := 0 if 0 ≤ x < 1,    f (x) := 1 if x = 1,                    (13)

is the pointwise limit of the sequence (fn ). In other words, (fn ) converges pointwise to f .
We observe an interesting phenomenon here. Even though, for all n ∈ N, fn is a continuous
function, the pointwise limit f is not continuous on all of [0, 1] (it is discontinuous at 1). (We saw
something similar in Section 2.) Pointwise convergence is thus not a strong enough requirement

to guarantee preservation of continuity in the limit29 . In order to preserve such ‘nice’ properties
in the limit, we need a stronger form of convergence, called uniform convergence, which we will
introduce below.
Before we do that, however, let us have another look at the example above. We defined f
by (13) and proved that it is the pointwise limit of (fn ). Alternatively, we could have defined
f as the pointwise limit of (fn ),
f (x) := lim fn (x), (14)
n→∞
and subsequently have computed f to take the values given in (13). Warning! This second
way of defining f comes with the possibility of error! The definition of f in (14) only makes
sense if the pointwise limit exists! In this example we know it exists, because we have already
done the required computations above, but if you do not know this yet, be careful. If you define
functions in the way of (14), then that is only meaningful if you can prove that the pointwise
limit actually exists. In fact, there is a second subtlety here that we need to deal with. Even if
a pointwise limit of (fn ) exists, the expression in (14) only defines a unique f if the pointwise
limit is unique. The following lemma shows this is indeed the case.

Lemma 11.3. Let U ⊂ Rd , for all n ∈ N let fn : U → Rq , and let f : U → Rq and


g : U → Rq . If (fn ) converges pointwise to f and to g, then f = g.

Proof. Let x ∈ U . Then, by definition of pointwise convergence, the sequence (fn (x)) con-
verges to f (x) and to g(x) in Rq . By Lemma 5.4 we have that f (x) = g(x). Since x was
chosen arbitrarily from the domain U , f = g.

Definition 11.4. Let U ⊂ Rd , for all n ∈ N let fn : U → Rq , and let f : U → Rq . Then


we say that (fn ) converges uniformly to f (on U ), if the sequence (sn ) converges to zero
in R, where, for all n ∈ N

sn := sup{kfn (x) − f (x)k : x ∈ U }.

Note: If we leave out the specification “on U ”, it is implied (unless stated otherwise) that the
uniform convergence holds on the whole domain.
The following lemma helps to shed some light on the differences between pointwise conver-
gence and uniform convergence.

Lemma 11.5. Let U ⊂ Rd , for all n ∈ N let fn : U → Rq , and let f : U → Rq . The


following statements are equivalent.

1. (fn ) converges pointwise to f .

2. For all ε > 0 and for all x ∈ U there exists an N ∈ N such that, for all n ≥ N ,
kfn (x) − f (x)k < ε.

The following statements are also equivalent.

29
That does not mean that a pointwise limit function cannot be continuous, of course. A simple example is
provided by a constant sequence (fn ) in which each function fn : R → R is the zero function, i.e. for all n ∈ N
and for all x ∈ R, fn (x) = 0. Then each function fn is continuous and the limit function f : R → R is also the
zero function (prove this!) and thus also continuous.

3. (fn ) converges uniformly to f .

4. For all ε > 0 there exists an N ∈ N such that, for all n ≥ N and for all x ∈ U ,
kfn (x) − f (x)k < ε.

Proof. By definition statement 1 is equivalent to, for all x ∈ U , (fn (x)) converges to f (x).
Writing out the definition of convergence in R we see that this is equivalent to statement 2.
To prove that statement 3 implies statement 4, assume that the sequence (sn ) converges to
zero, where, for all n ∈ N, sn := sup{kfn (x) − f (x)k : x ∈ U }. Then for all ε > 0 there exists
N ∈ N such that, for all n ≥ N , |sn | = sn < ε. By definition of sn this means that, for all
ε > 0 there exists N ∈ N such that, for all n ≥ N and for all x ∈ U , kfn (x) − f (x)k < ε.
To prove that statement 4 implies statement 3, assume that, for all ε > 0 there exists an
N ∈ N such that, for all n ≥ N and for all x ∈ U , kfn (x) − f (x)k < ε. Let ε > 0
and let N correspond to ε as above. We claim that, for all n ≥ N , sn ≤ ε. We prove
this claim by contradiction. Assume that there exists an n ≥ N such that sn > ε. Define
η := sn − ε > 0. Then, by definition of the least upper bound, there exists x ∈ U such that
kfn (x) − f (x)k > sn − η = ε. This is a contradiction, hence the claim is proven and thus, for
all ε > 0 there exists N ∈ N such that, for all n ≥ N , |sn | = sn ≤ ε. We conclude that (sn )
converges to zero and thus (fn ) converges uniformly to f a .
a
If you are confused about the use of ≤ in |sn | ≤ ε instead of <, remember the note following Lemma 5.3.

Corollary 11.6. Let U ⊂ Rd , for all n ∈ N let fn : U → Rq , and let f : U → Rq . If the


sequence of functions (fn ) converges uniformly to f , then it converges pointwise to f .

Proof. This follows immediately from Lemma 11.5 as statement 4 in that lemma implies
statement 2 from the same lemma.

Lemma 11.7. Let U ⊂ Rd , for all n ∈ N let fn : U → Rq , and let f : U → Rq and


g : U → Rq . If (fn ) converges uniformly to f and to g, then f = g.

Proof. Since (fn ) converges uniformly to f and to g, by Corollary 11.6 (fn ) also converges
pointwise to f and to g. By Lemma 11.3 we conclude that f = g.
Note: The converse of the statement in Corollary 11.6 is not true. For example, the sequence
(fn ) of functions fn (x) := xn from our example above, did converge pointwise (to the function f
we computed), but it does not converge uniformly. We can see this as follows. If (fn ) converges
uniformly, then, by Lemma 11.6 the uniform limit has to be f . However,

sn := sup{|fn (x) − f (x)| : x ∈ [0, 1]} = sup{xn : x ∈ [0, 1]} = 1.

So the sequence (sn ) is a constant sequence with each element equal to 1 and thus it does
not converge to 0. Hence (fn ) does not converge uniformly to f and so it does not converge
uniformly at all.
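Numerically (just as an illustration; a grid only approximates the supremum from below, and the grid size here is an arbitrary choice), you can see both phenomena at once: at any fixed x < 1 the values fn (x) = xn go to 0, but the approximated value of sn stays close to 1 for every n.

    import numpy as np

    xs = np.linspace(0.0, 1.0, 10001)

    def f_limit(x):
        # the pointwise limit from (13)
        return np.where(x < 1.0, 0.0, 1.0)

    for n in (1, 5, 20, 100):
        approx_sn = np.abs(xs**n - f_limit(xs)).max()   # grid approximation of s_n
        print(n, approx_sn, 0.9**n)   # approx_sn stays close to 1, while f_n(0.9) = 0.9**n -> 0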
The next theorem shows that the uniform limit of continuous functions is continuous.

Theorem 11.8. Let U ⊂ Rd , for all n ∈ N , let fn : U → Rq be continuous, and let


f : U → Rq . If the sequence of functions (fn ) converges uniformly to f , then f is continuous.

Proof. Let a ∈ U and let (xm ) be a sequence in U which converges to a. We want to
prove that (f (xm )) converges to f (a). Let ε > 0. Then, since (fn ) converges uniformly to f ,
statement 4 in Lemma 11.5 tells us that there exists an N ∈ N such that, for all n ≥ N and
for all x ∈ U , kfn (x) − f (x)k < ε/3. In particular, for all x ∈ U ,

    kfN (x) − f (x)k < ε/3.
Since, by assumption, fN is continuous at a, there exists an M ∈ N such that, for all m ≥ M ,
    kfN (xm ) − fN (a)k < ε/3.
Now let m ≥ M , then by the triangle inequality we have

kf (xm ) − f (a)k = kf (xm ) − fN (xm ) + fN (xm ) − fN (a) + fN (a) − f (a)k


≤ kf (xm ) − fN (xm )k + kfN (xm ) − fN (a)k + kfN (a) − f (a)k
< ε/3 + ε/3 + ε/3 = ε.

Hence (f (xm )) converges to f (a).

Corollary 11.9. Let U ⊂ Rd , for all n ∈ N , let fn : U → Rq be continuous, and let


f : U → Rq . Assume that the sequence of functions (fn ) converges uniformly to f . Then,
for all a ∈ U ,

    limx→a limn→∞ fn (x) = limx→a f (x) = f (a) = limn→∞ fn (a) = limn→∞ limx→a fn (x).

Proof. Because (fn ) converges uniformly to f , it also converges pointwise to f (Corollary 11.6)
and thus, for all x ∈ U , limn→∞ fn (x) = f (x). By Theorem 11.8 we know that f is continuous,
thus limx→a f (x) = f (a). Using pointwise convergence again, we get f (a) = limn→∞ fn (a).
Finally, since, for all n ∈ N, fn is continuous, we have fn (a) = limx→a fn (x).
Note: Corollary 11.9 tells us that, if a sequence of functions converges uniformly, we can
interchange the limiting operations limn→∞ and limx→a . In Section 2 we saw examples where
we could not interchange these operations without changing the outcome. Go back and check
that in those examples the function sequences do not converge uniformly!
Next we want to consider series of functions. In order to do so, we first need to remind
ourselves of the definition and the properties of series of real numbers which we have seen in
G11ACF/MATH1005. The definition is given in Definition A.15 and some important properties
are proven in the rest of Appendix A.4. After we have refamiliarised ourselves with series of
real numbers, we can turn our attention to series of functions.

Definition 11.10. Let U ⊂ Rd , for all k ∈ N, let fk : U → Rq , and let f : U → Rq . For all
n ∈ N, define the partial sum function Fn : U → Rq by

    Fn := ∑_{k=1}^{n} fk .

We say that the series (or function series) ∑_{k=1}^{∞} fk converges pointwise to f (on U ),
if, for all x ∈ U , the sequence (Fn (x)) converges to f (x).
We say that the series ∑_{k=1}^{∞} fk converges uniformly to f (on U ), if the sequence (Fn )
converges uniformly to f (on U ).
Note: If we leave out the specification “on U ”, it is implied (unless stated otherwise) that the
pointwise or uniform convergence holds at each element of the domain.

Theorem 11.11 (The Weierstraß M -test). Let U ⊂ Rd , for all k ∈ N, let fk : U → Rq ,


and let f : U → Rq . Assume that, for all k ∈ N, there exists an Mk > 0 such that, for all
x ∈ U,
kfk (x)k ≤ Mk .
Furthermore, assume that the series ∑_{k=1}^{∞} Mk converges. Then the series

    F := ∑_{k=1}^{∞} fk

converges uniformly on U .
If, additionally, for all k ∈ N, fk is continuous on U , then F is continuous on U .
Proof. We know that if the series F converges uniformly, then its uniform limit must be
equal to its pointwise limit. We will first show that the pointwise limit exists and then that
the series F indeed converges uniformly to this pointwise limit.
Define M := ∑_{k=1}^{∞} Mk . We know this series converges. Also define the partial sums, for all
n ∈ N,
    Fn := ∑_{k=1}^{n} fk .

Let x ∈ U and consider a subsequence (Fnm (x)) of (Fn (x)). Then by the triangle inequality,
for all m ∈ N,
    kFnm (x)k = k∑_{k=1}^{nm} fk (x)k ≤ ∑_{k=1}^{nm} kfk (x)k ≤ ∑_{k=1}^{nm} Mk ≤ M,

where for the last inequality we used Lemma 3.7. Hence the subsequence (Fnm (x)) is bounded
and thus, by the Bolzano–Weierstraß theorem, it has a convergent subsequence (Fnml (x)).
Since the subsequence (Fnm (x)) was chosen arbitrarily, Lemma 5.15 now tells us that (Fn (x))
converges. Hence the series F converges pointwise. Then, using the continuity of the norm
k · k, the sum rule for limits, and the triangle inequality, we find, for all n ∈ N,

    kF(x) − Fn (x)k = k limm→∞ Fm (x) − Fn (x)k = k limm→∞ (Fm (x) − Fn (x))k
                    = limm→∞ kFm (x) − Fn (x)k = limm→∞ k∑_{k=n+1}^{m} fk (x)k
                    ≤ limm→∞ ∑_{k=n+1}^{m} kfk (x)k ≤ limm→∞ ∑_{k=n+1}^{m} Mk = ∑_{k=n+1}^{∞} Mk .

Taking the supremum over all x ∈ U and applying Lemmas 3.9 and A.17, we have
    limn→∞ sup{kF(x) − Fn (x)k : x ∈ U } ≤ limn→∞ ∑_{k=n+1}^{∞} Mk = limn→∞ (∑_{k=1}^{∞} Mk − ∑_{k=1}^{n} Mk ) = 0,

where the last equality follows from the convergence of ∑_{k=1}^{∞} Mk by the sum rule of limits. Hence
F converges uniformly.
Finally, if for all k ∈ N, fk is continuous, then we have, by Lemma 10.11, that, for all n ∈ N ,
Fn is continuous (prove this by induction!). Because (Fn ) converges uniformly to F, we
conclude by Theorem 11.8 that F is continuous.

Example: Let us have another look at the Fourier sine series example we brought up back in
Section 2. For all k ∈ N, let ak ∈ R be such that the series ∑_{k=1}^{∞} |ak | converges (if all terms in
a series are nonnegative, as they are here, we sometimes write ∑_{k=1}^{∞} |ak | < ∞ when we want to
say that the series converges). Consider the series

    U (x) := ∑_{k=1}^{∞} ak sin(kx),

for x ∈ R. Since
|ak sin(kx)| = |ak || sin(kx)| ≤ |ak |,
we have

    ∑_{k=1}^{∞} |ak sin(kx)| ≤ ∑_{k=1}^{∞} |ak |.

Corollary A.20 now tells us that the series ∑_{k=1}^{∞} |ak sin(kx)| converges. Hence, by the Weierstraß
M -test (Theorem 11.11) we know that the series U converges uniformly.
Moreover, since, for all k ∈ N, x 7→ ak sin(kx) is a continuous function, we know that the
limit function U is continuous. So, in particular, if we want to compute, say, limx→π U (x), we
know it is equal to U (π) (by continuity). What is the value of U (π)? If we define the partial
sums, for all n ∈ N,
    Un (x) := ∑_{k=1}^{n} ak sin(kx),

for x ∈ R, then we know that U is the uniform limit of the sequence (Un ). Hence U is
also the pointwise limit of (Un ). This means that U (π) = limn→∞ Un (π). Because Un (π) =
∑_{k=1}^{n} ak sin(kπ) = 0, we have limx→π U (x) = 0. We see that in this case, where we have a
uniformly converging series in which each term is continuous, we can compute the value at π
(or at any other number in the domain) by computing the value of each term at π and taking
the limit. If we do not have a uniformly converging series in which each term is continuous,
we are in general not allowed to do this kind of ‘term by term calculation’. We have seen an
example where this goes wrong back in Section 2. It is useful to note the general result in a
corollary.

Corollary 11.12. Let U ⊂ Rd , a ∈ U , and let (fk ) be a sequence of continuous functions
fk : U → Rq . If the series F := ∑_{k=1}^{∞} fk converges uniformly, then

    lim_{x→a} F(x) = F(a) = lim_{n→∞} ∑_{k=1}^{n} fk (a).
Proof. For all n ∈ N, define the partial sums Fn := ∑_{k=1}^{n} fk . Then (Fn ) converges uniformly
to F, hence by Corollary 11.9 we have

    lim_{x→a} F(x) = F(a) = lim_{n→∞} lim_{x→a} Fn (x).
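To see the term-by-term evaluation from the Fourier sine series example in action, here is a small numerical sketch in Python. It is purely illustrative and not part of the theory; the coefficient choice ak = 1/k² is a hypothetical example for which ∑ |ak| certainly converges.

    import math

    # Hypothetical coefficients a_k = 1/k**2, so that sum_k |a_k| converges and the
    # Weierstrass M-test applies to U(x) = sum_k a_k sin(k x).
    def a(k):
        return 1.0 / k**2

    def U_partial(x, n):
        """Partial sum U_n(x) = sum_{k=1}^{n} a_k sin(k x)."""
        return sum(a(k) * math.sin(k * x) for k in range(1, n + 1))

    # Every term a_k sin(k*pi) vanishes, so U_n(pi) = 0 for every n (up to rounding):
    print(U_partial(math.pi, 1000))

    # By continuity of the uniform limit, U(x) -> U(pi) = 0 as x -> pi:
    for x in [3.0, 3.1, 3.14, 3.1415]:
        print(x, U_partial(x, 1000))

The printed values shrink towards 0 as x approaches π, exactly as the corollary predicts.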

Example: We have seen that uniform convergence preserves continuity in the limit. Unfortu-
nately, even uniform convergence has its limitations. It is in general not true that the derivative
of the uniform limit of a sequence of functions equals the limit of the derivatives of the functions,
as the following counterexample shows30 . For all n ∈ N, define the functions fn : R → R by

    fn (x) := sin(n²x)/n .
The sequence (fn ) converges uniformly on R to the constant function f : R → R, defined by
f (x) := 0. Prove this yourself! (Hint: note that, for all n ∈ N and for all x ∈ R, |fn (x)−0| ≤ n1 ).
The function f is differentiable on R with derivative given by, for all x ∈ R, f ′(x) = 0. On
the other hand, however, the derivatives of the functions fn are given by, for all n ∈ N and
for all x ∈ R, fn′ (x) = n² cos(n²x)/n = n cos(n²x). It is not true that the sequence (fn′ ) converges
uniformly to f ′. (Prove this!)
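Although a numerical computation proves nothing, the contrast between the two suprema can be observed directly. The following sketch uses grid maxima as stand-ins for the suprema (an assumption that only gives estimates):

    import math

    def f_n(n, x):
        return math.sin(n**2 * x) / n

    def f_n_prime(n, x):
        return n * math.cos(n**2 * x)

    # A grid maximum only estimates the supremum over R, but the trend is clear.
    xs = [2 * math.pi * i / 10000 for i in range(10001)]

    for n in [1, 10, 100]:
        sup_f = max(abs(f_n(n, x)) for x in xs)        # about 1/n: consistent with uniform convergence to 0
        sup_fp = max(abs(f_n_prime(n, x)) for x in xs) # about n: the derivatives do not converge to 0
        print(n, sup_f, sup_fp)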

Example: An interesting and perhaps counterintuitive application of the Weierstraß M -test


is Schoenberg’s construction of a space-filling curve (from [2]). This is a curve which goes
through every point in the unit square [0, 1]2 . The construction requires us to represent real
numbers from [0, 1] in binary, which is an optional topic for this module which is discussed
in Appendix B.7. The actual construction of the space-filling curve is given in Section B.8.
The specifics of the construction are optional for this module, but what is not optional is the
realisation that limits (such as in the definition of a (uniformly) convergent series of functions)
can give rise to counterintuitive results. We would never have been able to answer the question
whether it is possible to ‘fill a square with a curve’ if we had not proceeded very carefully and
rigorously in building up all our theoretical ingredients. Within the context of our theoretical
framework as we have constructed it thus far it turns out that it is indeed possible to “fill a
square with a curve”. This is also a warning that your intuition might not always be trustworthy,
especially when dealing with such concepts as ‘infinity’.

30
Strictly speaking, we have not yet rigorously introduced derivatives in these notes. We will do so later in
Section 13. For the sake of this example, your knowledge from A Levels and Year 1 should suffice to follow along.

12 Functions on the real line
From Definition A.3 we recall the definition of a function f with set A as domain and set B as
codomain. Also have a look at Definition A.6 to remember what it means for a function to be
surjective, injective, or bijective.
In Sections 12, 13, and 14 we (will) study specifically real-valued functions whose domain
is a subset of R. In this section we derive the intermediate value theorem, which you might
remember from G11ACF/MATH1005 and we discuss properties of (strictly) increasing and
(strictly) decreasing functions. In Section 13 we will turn our eye to differentiation, before
concluding (the main part of) these notes with a study of (Riemann) integration in Section 14.
As we discussed near the beginning of Section 10 the concept “function” is very general and if
we want to have any hope of developing some general theory for differentiation and integration,
we need to restrict our attention to functions which have certain regularity properties, such as
continuity (which we discussed in Section 10) or differentiability (which we will see in Section 13
and which you also might remember from G11ACF/MATH1005 and G11CAL/MATH1006).
First we will have a look at the kind of wild behaviour a function can have if we do not impose
additional regularity conditions.
Example: Consider the function f : R → R, defined by
    f (x) := { 1,   if x ∈ Q,
             { −1,  if x ∈ R \ Q.
This is a perfectly good function, but it is nowhere continuous. One way to think about this
is that you cannot draw the graph of this function on any nonempty interval (a, b). We know,
however, that we need to be careful with such informal considerations. Can we prove that, for
all x ∈ R, f is not continuous at x? We provide a proof here showing that f is not continuous
at 0. After that, have a try yourself at proving that in fact f is nowhere continuous (meaning
that, for all x ∈ R, f is not continuous at x).
To prove that f is not continuous at 0, let (xn ) be a sequence in R defined by, for all
n ∈ N, xn := √2/n. Then xn → 0 as n → ∞, but, for all n ∈ N, f (xn ) = −1 and thus
limn→∞ f (xn ) = −1 ≠ 1 = f (0). By Lemma 10.8 we then conclude that f is not continuous at
0.
From Appendix A.3 and Section 10 we remember the definitions of one-side limits of a
function f : R → R in their ε-δ formulations, but also the equivalent formulations in terms of
sequences. For example, we say limx→a− f (x) = L if, for every sequence (xn ) which converges
to a with, for all n, xn < a (i.e., xn converges to a from the left), we have f (xn ) → L as
n → ∞. Similarly limx→a+ f (x) = L if, for every sequence (xn ) which converges to a with, for
all n, xn > a (i.e., xn converges to a from the right), we have f (xn ) → L as n → ∞. As long
as we are careful that the domain is such that the point a can be approached from the left or
right, we can give similar definitions for functions whose domain is not R, but a (strict) subset
of R.
Now we look at a class of functions for which the one-sided limits at all points in the domain
exist.

Definition 12.1. Let f be a real-valued function whose domain contains the interval I ⊂ R.
We say that (on I) the function f is:

• strictly increasing if, for all x, y ∈ I with x < y it holds that f (x) < f (y);

• non-decreasing if, for all x, y ∈ I with x < y it holds that f (x) ≤ f (y);

• non-increasing if, for all x, y ∈ I with x < y it holds that f (x) ≥ f (y);

• strictly decreasing if, for all x, y ∈ I with x < y it holds that f (x) > f (y).

If at least one of the above holds for f , we say that f is monotone (or monotonic) on
I. If f is strictly increasing on I or strictly decreasing on I, we say that f is strictly
monotone (or strictly monotonic) on I.

Note: Some authors use the terminology “increasing” and “decreasing” instead of “non-
decreasing” and “non-increasing”, respectively. To avoid confusion, I have chosen to stick with
the terminology from Definition 12.1 above.
Now we take a look at one-sided limits for monotone functions. Remember the definition of
the extended real number line R̄ from Definition A.9.

Theorem 12.2. Let f be a real-valued function. Let a ∈ R or a = −∞. Let b ∈ R or


b = +∞. Assume a < b and assume the domain of f contains the non-empty interval
(a, b) ⊂ R. If f is monotone on (a, b), then the following both hold:

• There exists an La ∈ R̄ such that limx→a+ f (x) = La . If f is non-decreasing on (a, b),


then La ≠ +∞. If f is non-increasing on (a, b), then La ≠ −∞.

• There exists an Lb ∈ R̄ such that limx→b− f (x) = Lb . If f is non-decreasing on (a, b),


then Lb ≠ −∞. If f is non-increasing on (a, b), then Lb ≠ +∞.

Let c ∈ (a, b). If f is non-decreasing on (a, b), then

    lim_{x→c−} f (x) ≤ f (c) ≤ lim_{x→c+} f (x).                    (15)

If f is non-increasing on (a, b), then

    lim_{x→c+} f (x) ≤ f (c) ≤ lim_{x→c−} f (x).                    (16)

Proof. The difficult part of the proofs of these various statements is to find a candidate
for what the limit is. Once we know that, it is fairly straightforward to actually prove that
this is indeed the limit.
First we consider the limit limx→b− f (x) in the case where f is non-decreasing on (a, b) with
a, b ∈ R. Define
C := {f (x) ∈ R : x ∈ (a, b)}
and
    Lb := { sup C,  if C is bounded above,
          { +∞,     otherwise.
Let (xn ) be a sequence in (a, b) such that xn → b as n → ∞. First assume that C is
not bounded above and let M > 0. Since C is not bounded above, there is a t ∈ (a, b)
such that f (t) > M . Since (xn ) converges to b, there exists an N ∈ N such that, for all
n ≥ N , xn ∈ (t, b). Because f is non-decreasing on (a, b), this means that, for all n ≥ N ,
f (xn ) ≥ f (t) > M . Hence f (xn ) → +∞ = Lb as n → ∞.
Now assume that C is bounded above instead and let ε > 0. Since Lb = sup C, there exists
a t ∈ (a, b) such that f (t) > Lb − ε. As before, since (xn ) converges to b, there exists an

N ∈ N such that, for all n ≥ N , xn ∈ (t, b). Hence, since f is non-decreasing, for all n ≥ N ,
f (xn ) ≥ f (t) > Lb − ε. Moreover, since for all n ∈ N, f (xn ) ∈ C, we also have, for all n ∈ N,
f (xn ) ≤ Lb . Thus, for all n ≥ N , f (xn ) ∈ (Lb − ε, Lb ] and hence |f (xn ) − Lb | < ε. This proves
that f (xn ) → Lb as n → ∞. Thus, we have proven that in both cases (C bounded above and
C not bounded above) f (xn ) → Lb as n → ∞.
In a similar way as above, it can be proven that, if (xn ) is a sequence in (a, b) such that
xn → a as n → ∞ and if
    La := { inf C,  if C is bounded below,
          { −∞,     otherwise,

then f (xn ) → La as n → ∞. If f is non-increasing instead of non-decreasing on (a, b) the


roles of La and Lb are interchanged. We will leave the details of these proofs as exercises to
the reader.
Next we consider the case where b = +∞. We leave the definitions of C and Lb unchanged
from above and again first consider the case where f is non-decreasing. Let (xn ) be a
sequence in (a, +∞) such that xn → +∞ as n → ∞. First consider the case when C is
not bounded above. Let M > 0. Since C is not bounded above, there is a t ∈ (a, +∞)
such that f (t) > M . Because xn → +∞ as n → ∞, there exists an N ∈ N such that, for
all n ≥ N , xn > t. Because f is non-decreasing, this means that for all n ≥ N , we have
f (xn ) ≥ f (t) > M . Hence f (xn ) → +∞ = Lb as n → ∞. Now assume C is bounded above,
hence Lb = sup C. Let ε > 0, then there exists t ∈ (a, +∞) such that f (t) > Lb − ε. Since
xn → +∞ as n → ∞, there exists an N ∈ N such that, for all n ≥ N , xn > t. Thus, because
f is non-decreasing, for all n ≥ N we have f (xn ) ≥ f (t) > Lb − ε. Since we also have, for all
n ∈ N, f (xn ) ∈ C and thus f (xn ) ≤ Lb , we deduce that, for all n ≥ N , f (xn ) ∈ (Lb − ε, Lb ]
and hence |f (xn ) − Lb | < ε. Thus f (xn ) → Lb as n → ∞.
We have seen above that the case when b = +∞ is very similar to the case when b ∈ R,
with minor changes. The remaining cases (non-increasing f and/or a = −∞) similarly follow
along the lines of the previous proofs with minor alterations required. The details are again
left to the reader as exercise.
It still remains to prove the inequalities in (15) and (16). We will prove the former now and
leave the proof of the latter as an exercise. For the proof of (15), assume f is non-decreasing
and let c ∈ (a, b). Applying the first part of the proof to the subintervals (a, c) and (c, b), we see
that limx→c− f (x) = sup{f (x) : x ∈ (a, c)} and limx→c+ f (x) = inf{f (x) : x ∈ (c, b)}; both are real
numbers, because f (c) is an upper bound for the first set and a lower bound for the second (as f
is non-decreasing). Hence limx→c− f (x) ≤ f (c) ≤ limx→c+ f (x), which is (15).
Note: In the proof above we actually proved a bit more than the statement of the theorem
required: We gave explicit characterisations of La and Lb .

Example: Consider the function g : R → R, defined by


    g(x) := { x,  if x < 0,
            { 1,  if x ≥ 0.

This function is non-decreasing on R, so in particular on (−∞, 0) and on (0, ∞). By Theorem 12.2 we thus
know that the one-sided limits limx→0− g(x) and limx→0+ g(x) either exist or are ±∞. Since g
is a bounded function (remember Definition 10.14) the cases ±∞ cannot occur, hence the one-
sided limits do actually exist. Can you compute them? Does the two-sided limit limx→0 g(x)
exist?
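A quick numerical look at g near 0 (an illustration only; it does not replace the computation of the one-sided limits):

    def g(x):
        return x if x < 0 else 1.0

    for h in [0.1, 0.01, 0.001]:
        print(g(-h), g(h))   # the left-hand values approach 0, the right-hand values are constantly 1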
The next theorem you have seen already in G11ACF/MATH1005, but we revisit it here and
provide a different proof for it.

Theorem 12.3 (Intermediate value theorem). Let a, b ∈ R with a < b. Let f : [a, b] → R
be continuous. Let T ∈ R. If f (a) < T < f (b) or f (b) < T < f (a), then there exists a
c ∈ (a, b) such that f (c) = T .

Proof. First consider the case where f (a) < T < f (b).
We recursively define two sequences, (xn ) and (yn ), in R. Let x0 := a and y0 := b. Given xn
and yn , define tn := (xn + yn )/2 (thus tn is the midpoint between xn and yn ) and
    xn+1 := { xn ,  if f (tn ) ≥ T,          yn+1 := { tn ,  if f (tn ) ≥ T,
            { tn ,  if f (tn ) < T,                  { yn ,  if f (tn ) < T.

We claim that, for all n ∈ N ∪ {0}, the following hold:

    a ≤ xn ≤ yn ≤ b,    yn − xn = (b − a)/2^n ,    f (xn ) ≤ T,    and    f (yn ) ≥ T.        (17)
We prove this by induction. All these claims are easily checked for n = 0. Now let k ∈ N∪{0}
and assume the claims hold for n = k, then we need to prove they hold for n = k + 1. Since
xk ≤ yk , we have xk ≤ tk ≤ yk . Per definition of xk+1 we have xk+1 ≥ min(xk , tk ) =
xk ≥ a. Moreover, xk+1 ≤ max(xk , tk ) = tk . Similarly, yk+1 ≥ min(yk , tk ) = tk and
yk+1 ≤ max(yk , tk ) = yk ≤ b. Combining these, we find

a ≤ xk+1 ≤ tk ≤ yk+1 ≤ b.
Next we note that, if f (tk ) ≥ T , then yk+1 − xk+1 = tk − xk = (yk − xk )/2. Similarly, if f (tk ) < T ,
then yk+1 − xk+1 = yk − tk = (yk − xk )/2. Hence, in either case

    yk+1 − xk+1 = (yk − xk )/2 = (1/2) · (b − a)/2^k = (b − a)/2^{k+1} .
If f (tk ) ≥ T , we have f (xk+1 ) = f (xk ) ≤ T and f (yk+1 ) = f (tk ) ≥ T . If, on the other hand,
f (tk ) < T , then f (xk+1 ) = f (tk ) < T (and thus f (xk+1 ) ≤ T ) and f (yk+1 ) = f (yk ) ≥ T .
This concludes the proof of the claims in (17).
For every n ∈ N ∪ {0}, define the interval In := [xn , yn ] ⊂ R. The properties in (17) guarantee
that, for all n ∈ N ∪ {0}, In is non-empty and In+1 ⊂ In . Thus

[a, b] = I0 ⊃ I1 ⊃ I2 ⊃ . . . .

Clearly, for all n ∈ N ∪ {0}, In is also closed and bounded. Hence, by Theorem 9.7, there exists a
c ∈ ⋂_{n∈N∪{0}} In . Thus, for all n ∈ N ∪ {0} we have c ∈ [xn , yn ] and thus 0 ≤ c − xn ≤ yn − xn .
By the second property in (17) we have yn − xn → 0 as n → ∞ and thus by the sandwich
theorem c − xn → 0 as n → ∞, hence xn → c as n → ∞. By a similar argument, since
xn − yn ≤ c − yn ≤ 0, we also have that yn → c as n → ∞. By continuity of f we then
have that f (xn ) → f (c) as n → ∞ and f (yn ) → f (c) as n → ∞. By the third and fourth
properties in (17) and the preservation of inequalities in the limit from Corollary 3.10, we
deduce that f (c) ≤ T and f (c) ≥ T , hence f (c) = T .
To prove the result in the case where f (b) < T < f (a), we apply the first case which we just
proved to −f and −T .

Note: The intermediate value theorem is a very powerful theorem in calculus and analysis.
For example, it allows us to conclude that any continuous function f : [a, b] → [a, b] must have
a fixpoint (i.e. a solution of f (x) = x) in [a, b]. Do you see how we can prove that? (Hint:
Consider g(x) = f (x) − x.)
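The proof of the intermediate value theorem is constructive: the sequences (xn ) and (yn ) are exactly the bisection method. A minimal Python sketch of that construction (assuming, as in the first case of the proof, f (a) < T < f (b)) might look as follows; the example functions and the number of steps are hypothetical choices for illustration.

    def bisect(f, a, b, T, steps=50):
        """Follow the construction in the proof of Theorem 12.3, assuming f(a) < T < f(b):
        halve [x_n, y_n] while keeping f(x_n) <= T <= f(y_n)."""
        x, y = a, b
        for _ in range(steps):
            t = (x + y) / 2
            if f(t) >= T:
                y = t
            else:
                x = t
        return (x + y) / 2   # both endpoints converge to a c with f(c) = T

    # Example: f(x) = x**3 - x on [1, 2] with T = 3; note f(1) = 0 < 3 < 6 = f(2).
    c = bisect(lambda x: x**3 - x, 1.0, 2.0, 3.0)
    print(c, c**3 - c)       # c**3 - c is (very nearly) 3

    # Fixpoint of a continuous map of [0, 1] into itself: apply the same idea to
    # g(x) = x - cos(x) with T = 0 (this is -(f(x) - x) for f = cos, written so that g(0) < 0 < g(1)).
    import math
    fix = bisect(lambda x: x - math.cos(x), 0.0, 1.0, 0.0)
    print(fix, math.cos(fix))   # fix is (very nearly) equal to cos(fix)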
The intermediate value theorem also allows us to determine what kind of function can be
continuous and injective on an interval.

Theorem 12.4. Let I ⊂ R be a non-empty interval and let f : I → R be a continuous and


injective function. Then f is strictly increasing on I or f is strictly decreasing on I.

Proof. First we prove the result when I = [a, b], with a, b ∈ R, a < b.
Because f is injective, we know f (a) 6= f (b). Suppose first that f (a) < f (b). Assume, for a
proof by contradiction, that f is not strictly increasing. Then there exist x, y ∈ [a, b] such
that x < y and f (x) ≥ f (y). By injectivity we know that f (x) 6= f (y) and thus f (x) > f (y).
We now consider two cases. If f (y) < f (a) then f (y) < f (a) < f (b). By the intermediate
value theorem there exists a c ∈ (y, b) such that f (c) = f (a), which contradicts the fact that
f is injective. On the other hand, if f (y) ≥ f (a) then f (y) > f (a) (by injectivity), and so
f (x) > f (y) > f (a). Then the intermediate value theorem gives the existence of a d ∈ (a, x)
such that f (d) = f (y). This again contradicts the fact that f is injective. Hence f is strictly
increasing. This proves the result when f (a) < f (b).
To prove the result when f (a) > f (b), we can replace f by −f and apply the result we
just proved to deduce that −f is strictly increasing and thus f is strictly decreasing. That
completes the proof for I = [a, b].
Now let I be any non-empty interval in R. For a proof by contradiction assume that f is not
strictly increasing and f is not strictly decreasing. Then there exist t, u, v, w, ∈ I such that
t < u, v < w, f (t) < f (u), and f (v) > f (w). Define m := min(t, v), m0 := max(u, w), and
J := [m, m0 ]. Then t, u, v, w ∈ J and f is neither strictly increasing, nor strictly decreasing
on J. This contradicts the result we just proved. Therefore f is strictly increasing on I or f
is strictly decreasing on I.
Note: The converse of Theorem 12.4 is not true, as a strictly increasing function need not be
continuous. For example, the function f : R → R defined by
    f (x) := { x,      if x < 0,
             { x + 1,  if x ≥ 0,

is strictly increasing, but it is not continuous at x = 0.


For functions which are surjective, we have the following result.

Theorem 12.5. Let I and J be intervals in R. Let f : I → J be a monotone surjective


function. Then f is continuous on I.

Proof. Here we prove the result in the case where J is an open interval. The result in the
general case is proven in Appendix B.9. The proof for the general case is optional in the
context of this module.
First we assume that f is non-decreasing.
Assume J is an open interval. Let x ∈ I and let (xn ) be a sequence in I which converges to
x. Let ε > 0. Since f (x) ∈ J and J is open, there exist A, B ∈ J such that
f (x) − ε < A < f (x) < B < f (x) + ε.

By surjectivity of f , there are a, b ∈ I such that f (a) = A and f (b) = B, hence

f (x) − ε < f (a) < f (x) < f (b) < f (x) + ε.

Since f is non-decreasing we have x ∈ (a, b). Since (xn ) converges to x, there exists N ∈ N
such that, for all n ≥ N , xn ∈ (a, b). Using again that f is non-decreasing, we find, for all
n ≥ N,
f (x) − ε < f (a) < f (xn ) < f (b) < f (x) + ε
and thus |f (xn ) − f (x)| < ε. Therefore f (xn ) → f (x) as n → ∞ and thus f is continuous at
x. We conclude that f is continuous on I.
The case where f is non-increasing is proven in a similar way. The details are left as an
exercise to the reader.

Example: The conclusion of Theorem 12.5 no longer holds in general if we leave out either
the assumption of monotonicity or the assumption of surjectivity. For example, the function
f : R → R, defined by
    f (x) := { x,      if x ≤ 0,
             { 1 + x,  if x > 0,
is monotone, but not surjective and not continuous. The function g : R → R, defined by
    g(x) := { x,       if x ≤ 0,
            { −1 + x,  if x > 0,

is surjective, but not monotone and not continuous. For a (perhaps) more interesting example,
see the optional example in Appendix B.10: a non-decreasing function which is discontinuous
at every rational number in R.

13 Differentiability on the real line
In this section we will take a new look at the concept of differentiability, which you have seen
before in G11ACF/MATH1005 and G11CAL/MATH1006 and look at some important conse-
quences of its definition.

Definition 13.1. Let I ⊂ R be an open interval, a ∈ I, and f : I → R. We say that f is


differentiable at a if there exists an La ∈ R such that

    lim_{x→a} (f (x) − f (a))/(x − a) = La .                    (18)
In that case we write f 0 (a) := La .
If J ⊂ I and, for all a ∈ J, f is differentiable at a, then f is differentiable on J.
If f is differentiable on its domain I, then f is differentiable. If f is differentiable, the
function f 0 : I → R, x 7→ f 0 (x) is called the derivative (or first derivative) of f .

Note: In the definition above we need I, the domain of f , to be open to make sense of the
two-sided limit x → a at each a ∈ I.

Lemma 13.2. Let I ⊂ R be an open interval, a ∈ I, and f : I → R. Then f is differentiable


at a with f 0 (a) = La ∈ R if and only if

    lim_{h→0} (f (a + h) − f (a))/h = La .
Proof. We leave this as an exercise to the readers. It is a simple consequence of Defini-
tion 13.1.

Lemma 13.3. Let I ⊂ R be an open interval, a ∈ I, and f : I → R. Then f is differentiable


at a with f 0 (a) = La ∈ R if and only if there exists a function εa : I → R such that, for all
x∈I
f (x) = f (a) + La (x − a) + εa (x)(x − a) (19)
and limx→a εa (x) = 0.

Proof. First we prove the “if” statement. Assume that there exists a function εa : I → R
such that, for all x ∈ I, (19) holds and limx→a εa (x) = 0. Then
    (f (x) − f (a))/(x − a) = La + εa (x).
Taking the limit x → a on both sides of the equality gives the condition in (18).
To prove the “only if” statement, assume that f is differentiable at a, with f 0 (a) = La . Then
(18) holds. Let η > 0. Then there exists δ > 0 such that, for all x ∈ I,
    0 < |x − a| < δ ⇒ |(f (x) − f (a))/(x − a) − La | < η.
Define the function εa : I → R by εa (a) := 0 and, for all x ∈ I \ {a},
εa (x) := (f (x) − f (a))/(x − a) − La ; then clearly, for all x ∈ I, (19)
follows. Moreover, by the inequality above we have, for all x ∈ I,
    0 < |x − a| < δ ⇒ |εa (x)| < η.

Hence limx→a εa (x) = 0.
Note: We can interpret the expression in (19) as follows. To approximate f (x) for x near a,
we can use the linear function g(x) = f (a) + f 0 (a)(x − a), and this approximation will be good
if x is close enough to a. Thus differentiability is really about whether you can approximate
f (x) by a linear function. This idea also has the advantage that you can generalise it to higher
dimensions.

Theorem 13.4. Let I ⊂ R be an open interval, a ∈ I, and f : I → R. If f is differentiable


at a, then f is continuous at a.

Proof. Since f is differentiable at a, we know that (19) holds and thus f (x) → f (a) as
x → a.
Note: The converse of Theorem 13.4 is false. For example, the function f : R → R, x 7→ |x|
is continuous at 0, but it is not differentiable at 0. Can you prove this?
It is very useful to know how to compute the derivative of sums, products, quotients, and
compositions of functions, given that the appropriate derivatives of the individual functions
exist. Hopefully you remember all those rules from Year 1. Their details and proofs are given
in Appendix A.5. In that section you will also find the derivatives of some commonly used
functions.
Once we know how to take the derivative of a function once, we can repeat this process to
obtain higher order derivatives, if they exist.

Definition 13.5. Let I ⊂ R be an open interval, a ∈ I, and let f : I → R be a differentiable


function. We say f has a derivative of order 2 at a if f (1) := f 0 is differentiable at a.
In that case
    f (2) (a) := lim_{x→a} (f (1) (x) − f (1) (a))/(x − a) .
If, for all x ∈ I, f (1) is differentiable at x, we say f is 2 times differentiable (or
twice differentiable). In that case the function f (2) : I → R, x 7→ f (2) (x) is called the
derivative of order 2 (or the second derivative).
Let n ∈ N \ {1, 2}, then we recursively define f to have a derivative of order n at a if
f (n−1) is differentiable at a. In that case

    f (n) (a) := lim_{x→a} (f (n−1) (x) − f (n−1) (a))/(x − a) .

If, for all x ∈ I, f (n−1) is differentiable at x, we say f is n times differentiable. In that


case the function f (n) : I → R, x 7→ f (n) (x) is called the derivative of order n (or the
nth derivative).

Note: Extrapolating the terminology of the preceding definitions, we can call the derivative
the derivative of order 1.
There are many different notations for derivatives. For example, f 0 , f 00 , and f 000 are quite
common for the first, second, and third derivatives. Some authors will continue in a similar
vein using Roman type numerals to indicate higher derivatives: f (iv) , f (v) , etc. Also the Leibniz
notation for derivatives, df/dx = f ′(x), d^n f/dx^n = f (n) (x), is commonly encountered.
We have set up enough theory now to have a look at some examples.

Example: Define f : R → R by
    f (x) := { x² sin(1/x²),  if x ≠ 0,
             { 0,             if x = 0.

If x ≠ 0, we can use the product rule and chain rule on the open intervals (−∞, 0) and (0, ∞)
to find
    f ′(x) = 2x sin(1/x²) − (2/x) cos(1/x²).
Does f ′(0) exist? For x ≠ 0 we have (f (x) − f (0))/(x − 0) = x sin(1/x²). Since

−|x| ≤ x sin(1/x²) ≤ |x|,

the sandwich theorem tells us that

    f ′(0) = lim_{x→0} x sin(1/x²) = 0.

Note that f ′(x) is not bounded as x → 0, so f ′ is not continuous at 0, and hence f ″(0) cannot exist
(because if f ′ were differentiable at 0, it would have to be continuous at 0, by Theorem 13.4).

Example: Define f : R → R by
    f (x) = { x³,  if x ≤ 0,
            { x²,  if x > 0.

On the open interval (0, ∞) Lemma A.25 gives f 0 (x) = 2x and f 00 (x) = 2. On the open interval
(−∞, 0), by the same lemma, we find f 0 (x) = 3x2 and f 00 (x) = 6x.
What happens at 0? We need to consider the one-sided limits from the left and right:

    lim_{x→0−} (f (x) − f (0))/(x − 0) = lim_{x→0−} x³/x = lim_{x→0−} x² = 0,
    lim_{x→0+} (f (x) − f (0))/(x − 0) = lim_{x→0+} x²/x = lim_{x→0+} x = 0.

Thus
    f ′(0) = lim_{x→0} (f (x) − f (0))/(x − 0) = 0.
But f ″(0) does not exist, as

    lim_{x→0+} (f ′(x) − f ′(0))/(x − 0) = lim_{x→0+} 2x/x = 2,
    lim_{x→0−} (f ′(x) − f ′(0))/(x − 0) = lim_{x→0−} 3x²/x = 0 ≠ 2.
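The two one-sided limits above can also be observed numerically with one-sided difference quotients of f ′ at 0 (an illustration, not a proof):

    def fprime(x):
        # The derivative computed in the example: 3*x**2 for x <= 0 (so f'(0) = 0) and 2*x for x > 0.
        return 3 * x**2 if x <= 0 else 2 * x

    for h in [0.1, 0.01, 0.001, 0.0001]:
        right = (fprime(h) - fprime(0.0)) / h       # tends to 2
        left = (fprime(-h) - fprime(0.0)) / (-h)    # tends to 0
        print(h, right, left)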

Example: As we have seen before, the function x 7→ |x| is continuous on R but not differentiable
at 0. It turns out that there are functions which are continuous on R but not differentiable
anywhere. Weierstraß discovered a whole class of these, including the function W : R → R
defined by
    W (x) := ∑_{n=0}^{∞} 2^{−n} cos((21)^n πx) .

Figure 8: Plot of the function x ↦ ∑_{n=0}^{20} 2^{−n} cos((21)^n πx) (MAPLE)

Figure 9: The functions f0 , f1 for 0 ≤ x ≤ 1

Since the geometric series ∑_{n=0}^{∞} 2^{−n} converges (see (32)) and, for all x ∈ R, | cos((21)^n πx)| ≤
1, the series W (x) converges uniformly on R and is continuous, by the Weierstraß M -test
(Theorem 11.11).
The effect of the powers (21)n is to make the graph of cos((21)n πx) so steep that the graph
of W turns out to have no tangent. (This is not trivial to prove.)
Figure 8 shows a partial sum of the Weierstraß function W .
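A picture such as Figure 8 can be reproduced from the partial sums with a few lines of Python; the sketch below only tabulates some values, and, as the comment notes, the floating-point evaluation of the high-order terms is merely indicative.

    import math

    def W_partial(x, N=20):
        """Partial sum sum_{n=0}^{N} 2**(-n) * cos(21**n * pi * x).
        For larger n the argument of cos is astronomically big, so its floating-point value
        is only indicative -- but those terms are damped by the factor 2**(-n)."""
        return sum(2.0**(-n) * math.cos((21**n) * math.pi * x) for n in range(N + 1))

    # Tabulating W_partial on a fine grid and plotting the values reproduces a picture like Figure 8.
    for x in [0.0, 0.1, 0.25, 0.5]:
        print(x, W_partial(x))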

Example: Here we give a slightly easier example of a continuous nowhere differentiable func-
tion, due to Van der Waerden in 1930 [4].
For all n ∈ N ∪ {0}, define fn : R → [0, ∞), x 7→ |x − pn |, where pn is the nearest rational
number to x of the form m/10^n , with m ∈ Z. So fn (x) is the distance from x to pn . Figure 9
shows f0 and f1 on (0, 1). Now define the function f : R → [0, ∞) by

    f (x) := ∑_{n=0}^{∞} fn (x).                    (20)

Note that, for all x ∈ R and for all n ∈ N ∪ {0}, |fn (x)| < 10^{−n} . Since the geometric series
∑_{n=0}^{∞} 10^{−n} converges (see (32)), we know by the Weierstraß M -test that f converges uniformly
on R and thus is continuous.
For all x ∈ R, f is not differentiable at x. We will prove this now for the case where x is
of the form r/10^s , with r ∈ Z and s ∈ N ∪ {0}. An (optional) proof for the general case can be
found in Appendix B.12.
Let r ∈ Z, s ∈ N ∪ {0} and x := r/10^s . Let q ∈ N be such that q > s, and define yq :=
x + 1/10^{q+1} . Let n ∈ N ∪ {0}. If n > q both x and yq are integer multiples of 1/10^n , so we
have fn (x) = fn (yq ) = 0. If s ≤ n ≤ q, x is an integer multiple of 1/10^n , but yq is not, so
fn (yq ) − fn (x) = 1/10^{q+1} = yq − x. For the final case, assume 0 ≤ n < s. Since |x − yq | = 1/10^{q+1} ,
the difference between fn (x) and fn (yq ) cannot be more than 1/10^{q+1} :

    fn (yq ) − fn (x) ≥ −1/10^{q+1} = −(yq − x).

Since yq → x as q → ∞, we find that

    (f (yq ) − f (x))/(yq − x) = ∑_{n=0}^{q} (fn (yq ) − fn (x))/(yq − x) ≥ (q + 1 − s) − s → ∞ as q → ∞.

Hence the limit limx′→x (f (x′) − f (x))/(x′ − x) does not exist and thus f is not differentiable at x.
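The growth of these difference quotients can be watched numerically. The sketch below (illustrative only) takes x = 1/2, i.e. r = 5 and s = 1, truncates the series at a hypothetical cut-off N = 25, and prints the quotients for increasing q:

    def f_n(n, x):
        """Distance from x to the nearest number of the form m / 10**n with m an integer."""
        scaled = x * 10**n
        return abs(scaled - round(scaled)) / 10**n

    def f(x, N=25):
        """Truncated Van der Waerden sum; the neglected tail is smaller than 10**(-25)."""
        return sum(f_n(n, x) for n in range(N + 1))

    x = 0.5   # of the form r / 10**s with r = 5, s = 1
    for q in range(2, 9):
        y = x + 10.0**(-(q + 1))
        print(q, (f(y) - f(x)) / (y - x))   # grows roughly like q - 1, in line with the estimate above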

Example: Define h : R → R by
    h(x) := { x,      if x < 0,
            { sin x,  if x ≥ 0.

A hypothetical student writes: “For x < 0 we have h′(x) = 1 and for x > 0 we have
h′(x) = cos x. Since

    lim_{x→0−} 1 = lim_{x→0+} cos x = 1

we have h′(0) = 1.” Is this student correct?


In order to answer that, we need to consider

    lim_{x→0−} (h(x) − h(0))/(x − 0) = lim_{x→0−} (x − 0)/(x − 0) = lim_{x→0−} 1 = 1

and
    lim_{x→0+} (h(x) − h(0))/(x − 0) = lim_{x→0+} (sin x − 0)/(x − 0) = 1.

The final equality above follows, because we know that sin is differentiable at 0 with derivative
value cos(0) and thus

    lim_{x→0+} (sin x − 0)/(x − 0) = lim_{x→0} (sin x − 0)/(x − 0) = cos(0) = 1.

Since the left and right hand limits are equal, we can conclude that

    h′(0) = lim_{x→0} (h(x) − h(0))/(x − 0) = 1
and thus the numerical answer which the student gave is correct. Theorem 13.12 shows that
this is no coincidence. The method which the student used here can indeed be used.
Before we state and prove Theorem 13.12 we will first prove Rolle’s theorem, the mean value
theorem and Cauchy’s mean value theorem, which are very interesting and useful in their own
rights. Moreover, in the proof of Theorem 13.12 L’Hôpital’s rule is used. You have seen this rule
in G11ACF/MATH1005, but for completeness we have included it and its proof in Theorem B.6
and Corollary B.7. We require Rolle’s theorem and Cauchy’s mean value theorem also to prove
L’Hôpital’s rule. To prove Rolle’s theorem, it is useful to be able to talk about local maxima
and local minima. Remember their definition from G11CAL/MATH1006 (“Calculus”). It is
also given in Definition A.8.

Lemma 13.6. Let U ⊂ R, f : U → R, and a ∈ U . Assume f is differentiable at a. If f


has a local maximum at a or if f has a local minimum at a, then f 0 (a) = 0.

Proof. First assume that f has a local maximum at a. If x ∈ U with x > a is close enough to a, then (f (x) −
f (a))/(x − a) ≤ 0. Since non-strict inequalities are preserved in the limit, we have f ′(a) ≤ 0.
Similarly, if x ∈ U with x < a is close enough to a, then (f (x) − f (a))/(x − a) ≥ 0, so f ′(a) ≥ 0. Thus f ′(a) = 0.
If f has a local minimum at a, then −f has a local maximum at a and thus, by the result we
have just proven, −f 0 (a) = 0. Hence, by point 2 in Theorem A.22 we have f 0 (a) = −0 = 0.
Note: In the lemma above, we have to assume that f is differentiable at a. It is possible for
a function to have a local maximum (or minimum) at a point, without being differentiable at
that point. For example, we have seen before that x 7→ |x| is not differentiable at 0, but it does
have a local minimum at 0. Another example would be the function g : R → R defined by
    g(x) := { 0,  if x ≠ 0,
            { 1,  if x = 0.

This function is not continuous at 0, thus also not differentiable at 0, but it does have a local
maximum at 0.

Lemma 13.7. Let U ⊂ R, f : U → R, and a ∈ int U . If f has a global maximum at a,


then f has a local maximum at a. If f has a global minimum at a, then f has a local
minimum at a.
Proof. This follows almost directly from the relevant definitions in Definition A.7 and Defi-
nition A.8. The details are left as an exercise to the reader.

Theorem 13.8 (Rolle’s theorem). Let a, b ∈ R with a < b and let f : [a, b] → R be a
continuous function which is differentiable on (a, b). Assume f (a) = f (b). Then there
exists a c ∈ (a, b) such that f 0 (c) = 0.

Proof. If f is a constant function (i.e. for all x ∈ [a, b], f (x) = f (a)), then by Lemma A.25,
for all c ∈ (a, b), f 0 (c) = 0.
Now assume there exists an x ∈ [a, b] such that f (x) > f (a). Since [a, b] is non-empty, closed,
and bounded, by Theorem 10.13 there exists a c ∈ [a, b] such that f has a global maximum
at c. By our assumption c 6= a and c 6= b, hence c ∈ (a, b). Thus, by Lemma 13.7 the global
maximum at c is also a local maximum. Since f is differentiable on (a, b) it is differentiable
at c, and thus, by Lemma 13.6, f 0 (c) = 0.

Finally, if there exists an x ∈ [a, b] such that f (x) < f (a), a similar argument to the one
above shows that f has a local minimum at a c ∈ (a, b) and thus f 0 (c) = 0.

Theorem 13.9 (Mean value theorem). Let a, b ∈ R with a < b and let f : [a, b] → R be a
continuous function which is differentiable on (a, b). Then there exists a c ∈ (a, b) such that

    f ′(c) = (f (b) − f (a))/(b − a) .

Proof. Define the function g : [a, b] → R by


 
    g(x) := f (x) − ((f (b) − f (a))/(b − a))(x − a) .

Then g(a) = f (a) = g(b) and thus, by Rolle’s theorem (Theorem 13.8) there exists a c ∈ (a, b)
such that g 0 (c) = 0. Since f is differentiable on (a, b) we compute

    0 = g ′(c) = f ′(c) − (f (b) − f (a))/(b − a) ,
from which the result follows.
There is an interesting generalisation of the mean value theorem which is not only very
interesting in its own right, but which will also be useful when we prove L’Hôpital’s theorem
(Theorem B.6 and Corollary B.7). It is called Cauchy’s mean value theorem or the extended
mean value theorem.

Theorem 13.10 (Cauchy’s mean value theorem). Let a, b ∈ R with a < b and let f :
[a, b] → R and g : [a, b] → R be continuous functions which are both differentiable on (a, b).
Then there exists a c ∈ (a, b) such that

    (f (b) − f (a)) g ′(c) = (g(b) − g(a)) f ′(c).
Moreover, if g(a) ≠ g(b) and g ′(c) ≠ 0, then

    (f (b) − f (a))/(g(b) − g(a)) = f ′(c)/g ′(c) .

Proof. Define the function h : [a, b] → R by


   
    h(x) := (f (b) − f (a))(g(x) − g(a)) − (f (x) − f (a))(g(b) − g(a)) .

Then h(a) = h(b) = 0, h is continuous, and h is differentiable on (a, b). Thus by Rolle’s
theorem (Theorem 13.8) there exists a c ∈ (a, b) such that h0 (c) = 0. Since

    h′(c) = g ′(c)(f (b) − f (a)) − f ′(c)(g(b) − g(a)) ,

this proves the first result. The second result follows immediately by dividing by (g(b) −
g(a)) g ′(c).
Note: The assumption g 0 (c) 6= 0 in the second part of Theorem 13.10 is a very natural one,
since we want to divide by g 0 (c). In practice, however, we usually do not know what c exactly
is. We only know it lies in the interval (a, b). Therefore, in practice one often requires that

it is known that g ′ ≠ 0 on all of (a, b) in order to use the statement in the second part of
Theorem 13.10.

Theorem 13.11. Let I be a nondegenerate interval in R and let f be a real-valued function


whose domain contains I. Assume f is continuous on I and differentiable on int I. Then
the following hold.

1. If, for all x ∈ int I, f 0 (x) > 0, then f is strictly increasing on I.

2. f is non-decreasing on I if and only if for all x ∈ int I, f 0 (x) ≥ 0.

3. f is constant on I if and only if for all x ∈ int I, f 0 (x) = 0.

4. If, for all x ∈ int I, f 0 (x) < 0, then f is strictly decreasing on I.

5. f is non-increasing on I if and only if for all x ∈ int I, f 0 (x) ≤ 0.

6. If, for all x ∈ int I, f 0 (x) 6= 0, then f is injective on I.

Proof. Proof of result 1. Assume that, for all x ∈ int I, f ′(x) > 0. Let x1 , x2 ∈ I with x2 > x1 ,
then by the mean value theorem (Theorem 13.9) there exists a c ∈ (x1 , x2 ) such that

    0 < f ′(c) = (f (x2 ) − f (x1 ))/(x2 − x1 ) ,
thus f (x2 ) − f (x1 ) > 0. Hence f is strictly increasing.
The proof of the “if” statement in result 2 follows in a very similar way as above (try it!). To
prove the “only if” statement, assume that f is non-decreasing on I and let x ∈ int I. Then,
for all y ∈ int I with y ≠ x, we have

    (f (y) − f (x))/(y − x) ≥ 0.
Since f is differentiable on int I, we have, by the definition of f 0 and by the preservation of
inequalities in the limit, that

    f ′(x) = lim_{y→x} (f (y) − f (x))/(y − x) ≥ 0.
The proofs of results 3, 4, and 5 follow in a similar way and we leave the details as exercises
to the reader.
Finally, to prove result 6, assume that, for all x ∈ int I, f 0 (x) 6= 0 and let x1 , x2 ∈ I with
x1 < x2 . For a proof by contradiction, assume that f (x1 ) = f (x2 ). Since f is continuous
on [x1 , x2 ] and differentiable on (x1 , x2 ), the mean value theorem tells us that there exists a
c ∈ (x1 , x2 ) ⊂ int I such that

    f ′(c) = (f (x2 ) − f (x1 ))/(x2 − x1 ) = 0,

which contradicts the assumption that f ′ ≠ 0 on int I.


Note: In results 1 and 2 in Theorem 13.11 we cannot replace the “if” by an “if and only if”.
For example, the function f : R → R, x 7→ x3 is strictly increasing, but f 0 (0) = 0.
Example: Define the function g : [0, 1] → R, x ↦ x/(1 + x²), and prove it is strictly increasing on its
domain.

It is not immediately obvious from a first glance that this function is strictly increasing,
because it is the quotient of two strictly increasing functions. If the numerator had been
non-increasing instead (with the denominator still strictly increasing), then we could have im-
mediately concluded that g is strictly increasing, but that does not work here. We can, however,
use Theorem 13.11, since g is continuous on [0, 1] and differentiable on (0, 1). We compute, for
x ∈ (0, 1),
    g ′(x) = (1 + x² − 2x²)/(1 + x²)² = (1 − x²)/(1 + x²)² .
Both numerator and denominator are strictly positive on (0, 1), hence g is strictly increasing on
[0, 1].
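A numerical sanity check of the sign of g ′ on (0, 1) (a spot check on a grid, not a proof):

    def g(x):
        return x / (1 + x**2)

    def g_prime(x):
        return (1 - x**2) / (1 + x**2)**2

    xs = [i / 100 for i in range(101)]                            # grid on [0, 1]
    print(min(g_prime(x) for x in xs[1:-1]))                      # positive at every sampled interior point
    print(all(g(xs[i]) < g(xs[i + 1]) for i in range(100)))       # True: the sampled values increase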
Example: Prove that, for all x > 0, (1 + x)^{−1/2} > 1 − x/2.
Define the function f : [0, ∞) → R, x ↦ (1 + x)^{−1/2} − 1 + x/2. Note that f (0) = 1 − 1 = 0.
Hence, if we can prove that f is strictly increasing on its domain, then we have proven what we
wanted to show. We compute, for x ∈ (0, ∞),
    f ′(x) = −(1/2)(1 + x)^{−3/2} + 1/2 = (1/2)(1 − (1 + x)^{−3/2}) .

Since x > 0, we have (1 + x)^{3/2} > 1 and thus (1 + x)^{−3/2} < 1. Hence, for all x ∈ (0, ∞), f ′(x) > 0.
By Theorem 13.11 we conclude that f is strictly increasing on [0, ∞) and we are done.
It can be instructive to prove this in an alternative way, by directly applying the mean value
theorem (Theorem 13.9) to the function g : [0, ∞) → R, x ↦ (1 + x)^{−1/2} . Compared to the first
method above, it is more difficult to see in advance if this method will be useful, but it is always
good to have more techniques to choose from when having to prove something.
Note that g(0) = 1. We compute, for x ∈ (0, ∞),
    g ′(x) = −(1/2)(1 + x)^{−3/2} .
Now let x ∈ (0, ∞) and apply the mean value theorem on the interval [0, x]. Then we find that
there exists a c ∈ (0, x) such that
g(x) − 1 g(x) − g(0) 1 3 1
= = g 0 (c) = − (1 + c)− 2 > − . (21)
x x−0 2 2
3
The last inequality above follows since (1 + c)− 2 < 1 and thus g 0 (c) > − 12 . Since x > 0 we can
multiply (21) by x to find that, for all x > 0, g(x) − 1 > − x2 . Hence, for all x > 0, g(x) > 1 − x2 .
From G11ACF/MATH1005 you remember L’Hôpital’s rule. In Appendix B.4 we state
this result and prove it (Theorem). In that section we also state and prove a useful lemma
(Lemma B.5) which we will not only use for the proof of L’Hôpital’s rule, but also again when
we prove the very important first fundamental theorem of calculus (Theorem 14.17).
We are now able to state and prove the following very useful theorem which we mentioned
earlier.

Theorem 13.12. Let a, b, c ∈ R with a < c < b. Let f be a real-valued function whose
domain contains the interval (a, b). Assume that f is continuous on (a, b) and that f is
differentiable on (a, c) and on (c, b). Also assume that there are L, M ∈ R such that

    lim_{x→c−} f ′(x) = L,    and    lim_{x→c+} f ′(x) = M.

1. If L = M , then f is differentiable at c and f 0 (c) = M .

2. If f is differentiable at c, then f 0 (c) = L = M .

Proof. To prove the result in 1, let L = M . Then limx→c− f 0 (x) = limx→c+ f 0 (x) and thus

    lim_{x→c} f ′(x) = L = M.

Because f is continuous at c we have limx→c (f (x) − f (c)) = 0. Of course we also have


limx→c x = c and thus we can apply L’Hôpital’s rule (Corollary B.7) and the definition of
f 0 (c) to find
    f ′(c) = lim_{x→c} (f (x) − f (c))/(x − c) = lim_{x→c} f ′(x)/1 = L = M.
To prove the result in 2, assume f is differentiable at c with f 0 (c) = L = M . By the same
arguments as above, we can again use L’Hôpital’s rule together with the definition of f 0 (c),
to get
    f ′(c) = lim_{x→c} (f (x) − f (c))/(x − c) = lim_{x→c} f ′(x)/1 .
Since limx→c f 0 (x) exists, it is equal to the left and right hand side limits:

    L = lim_{x→c−} f ′(x) = lim_{x→c} f ′(x) = lim_{x→c+} f ′(x) = M.

Thus L = f 0 (c) = M .
We end this section of the notes with an intermediate value theorem for derivative functions.
We have seen in the intermediate value theorem (Theorem 12.3) that a continuous function
f : [a, b] → R has the intermediate value property: If f (a) < T < f (b) or f (a) > T > f (b) then
there exists a c ∈ (a, b) such that f (c) = T . A non-continuous function may fail to have this
property. For example, the function g : R → R defined by
    g(x) := { −1,  if x < 0,
            { 1,   if x ≥ 0,

never takes the value 0. As the first example in Section 13 showed, it is possible for a function
to be differentiable everywhere on its domain, but to have a discontinuous derivative.
derivative function fail to have the intermediate value property? The following theorem shows
the answer is “no”.

Theorem 13.13 (Darboux’s theorem). Let a, b ∈ R with a < b. Let f be a real-valued


function whose domain includes [a, b]. Assume f is differentiable on [a, b]a . If T ∈ R is
such that f 0 (a) < T < f 0 (b) or f 0 (a) > T > f 0 (b), then there exists a c ∈ (a, b) such that
f 0 (c) = T .
a
Note that, for our definition of differentiability to be applicable for f at the points a and b, the domain
of f needs to contain an open interval which itself in turn contains the interval [a, b].

Proof. First we will prove the result in the case where f 0 (a) < T < f 0 (b).
We prove two cases separately: T = 0 and T 6= 0. First assume T = 0. For a proof by
contradiction, assume that, for all x ∈ (a, b), f 0 (x) 6= 0. Because f is differentiable on [a, b],
f is continuous on [a, b]. By Theorem 13.11 we have that f is injective on [a, b]. Hence by

Theorem 12.4 the function f is strictly monotone on [a, b]. We claim that

f 0 (a) ≥ 0. (22)

To prove this, we can use a very similar proof as that of result 2 in Theorem 13.11. Note
that we cannot apply this result directly, because it would only give us that, for all x ∈ (a, b),
f ′(x) ≥ 0 and we need the result for x = a. Assume first that f is strictly increasing on [a, b]; then we have
that, for all x ∈ (a, b],

    (f (x) − f (a))/(x − a) > 0.
Since f 0 (a) exists, we have

    f ′(a) = lim_{x→a} (f (x) − f (a))/(x − a) = lim_{x→a+} (f (x) − f (a))/(x − a) ≥ 0.
which proves the claim in (22). This contradicts the assumption that f 0 (a) < 0. A simi-
lar argument shows that, if f is strictly decreasing, then f 0 (b) ≤ 0, which contradicts the
assumption that f 0 (b) > 0. Hence, we conclude that there exists a c ∈ (a, b) such that
f ′(c) = 0.
To prove the result in the case where T 6= 0, define the function g : [a, b] → R, x 7→ f (x) − T x.
Then g is differentiable on (a, b) with g 0 (x) = f 0 (x) − T . By assumption g 0 (a) = f 0 (a) − T <
0 < f 0 (b) − T = g 0 (b). Hence, we can apply the result we just proved to g to deduce that
there exists a c ∈ (a, b) such that g 0 (c) = f 0 (c) − T = 0. Hence f 0 (c) = T .
In the case where f ′(b) < T < f ′(a) apply the result which we just proved to the function −f :
We have −f ′(a) < −T < −f ′(b), hence there exists a c ∈ (a, b) such that −f ′(c) = −T , i.e. f ′(c) = T .

14 The Riemann integral
In G11CAL/MATH1006 you have made your acquaintance with Riemann integration. Now we
are ready to revisit this topic with our newly acquired highly detailed and rigorous glasses on.
Throughout this section we will consider real-valued bounded functions with the closed
interval [a, b] ⊂ R as domain, for some given a, b ∈ R with a < b.
We need to define what is meant by the integral ∫_a^b f (x) dx, and to determine for which f
it exists. It may be tempting to define the integral as the “area under the graph of f ”, but
it is not obvious that the area exists. The function f may give a very messy graph, such as
the continuous, nowhere differentiable functions we discussed in Section 13. Moreover, it
is not obvious what to do if f changes sign infinitely often, as does, for example, the function
x ↦ x sin(1/x) on any domain that includes an interval of the form (−a, 0) or (0, a) (with
a > 0).
The idea of Riemann integration is to construct a sequence of approximations to the area
under the graph from above and a sequence which approximates the area from below. If
we can bring those approximations to within arbitrarily small distance of each other, we have a
way of defining the Riemann integral. In order to make this precise, we need to introduce some
machinery first.

Definition 14.1. Let a, b ∈ R with a < b. Let n ∈ N. Then the finite set P := {xi }_{i=0}^{n} ⊂ R
is a partition of the interval [a, b] if x0 = a, xn = b, and, for all i ∈ {1, . . . , n}, xi > xi−1 .
The elements xi are the vertices of the partition P .

Definition 14.2. Let a, b ∈ R with a < b. Let f be a bounded real-valued function with
domain [a, b]. Let n ∈ N and let P := {xi }_{i=0}^{n} be a partition of [a, b]. Then the Riemann
upper sum (or upper sum) corresponding to P and f is
    U (P, f ) := ∑_{k=1}^{n} Mk (P, f )(xk − xk−1 )

and the Riemann lower sum (or lower sum) corresponding to P and f is

    L(P, f ) := ∑_{k=1}^{n} mk (P, f )(xk − xk−1 ),

where, for all k ∈ {1, . . . , n},

    Mk (P, f ) := sup {f (x) : x ∈ (xk−1 , xk )} ,
    mk (P, f ) := inf {f (x) : x ∈ (xk−1 , xk )} .                    (23)

Figures 10a and 10b show examples of a Riemann upper sum and a Riemann lower sum,
respectively.

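For a continuous non-decreasing function the suprema and infima in Definition 14.2 are just the values at the endpoints of each subinterval, so upper and lower sums are easy to compute explicitly. The following Python sketch (with the hypothetical choice f (x) = x² on [0, 1] and uniform partitions) shows the two sums approaching each other as the partition is refined:

    def upper_lower_sums(f, a, b, n):
        """Upper and lower Riemann sums of a continuous non-decreasing f on [a, b] for the uniform
        partition with n subintervals (for such f the sup/inf on each piece are the endpoint values)."""
        xs = [a + (b - a) * k / n for k in range(n + 1)]
        upper = sum(f(xs[k]) * (xs[k] - xs[k - 1]) for k in range(1, n + 1))
        lower = sum(f(xs[k - 1]) * (xs[k] - xs[k - 1]) for k in range(1, n + 1))
        return upper, lower

    f = lambda x: x**2
    for n in [1, 10, 100, 1000]:
        U, L = upper_lower_sums(f, 0.0, 1.0, n)
        print(n, L, U)   # L and U squeeze together around 1/3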
Lemma 14.3. Let a, b ∈ R with a < b. Let f be a real-valued function with domain [a, b]
and assume there exists an M > 0 such that, for all x ∈ [a, b], |f (x)| ≤ M . Let n ∈ N and
let P := {xi }_{i=0}^{n} be a partition of [a, b]. Let U (P, f ) and L(P, f ) be the upper and lower
sums corresponding to P and f , respectively, with Mk (P, f ) and mk (P, f ) as in (23). Then,

Figure 10: (a) A Riemann upper sum; (b) a Riemann lower sum

for all k ∈ {1, . . . , n},


−M ≤ mk (P, f ) ≤ Mk (P, f ) ≤ M.
Moreover,
−M (b − a) ≤ L(P, f ) ≤ U (P, f ) ≤ M (b − a).

Proof. Assume, for a proof by contradiction, that there exists a k ∗ ∈ {1, . . . , n} such that
mk∗ (P, f ) < −M . Define ε := (1/2)(−M − mk∗ (P, f )) > 0. Then, by definition of the infimum,
there exists x ∈ (xk∗ −1 , xk∗ ) such that f (x) < mk∗ (P, f ) + ε = (1/2)(mk∗ (P, f ) − M ) < −M , which
contradicts the assumption that, for all x ∈ [a, b], |f (x)| ≤ M . Hence, for all k ∈ {1, . . . , n},
−M ≤ mk (P, f ). The analogous result for Mk (P, f ) is proven in a similar way.
Because P is a partition of [a, b] we have, for all k ∈ {1, . . . , n}, 0 ≤ xk − xk−1 ≤ b − a. Thus,
by the definition of the lower sum it follows that

    L(P, f ) = ∑_{k=1}^{n} mk (P, f )(xk − xk−1 ) ≥ ∑_{k=1}^{n} (−M )(xk − xk−1 ) = −M (b − a).

The result for the upper sum U (P, f ) follows in a similar way.
In Figures 10a and 10b we see that the area of the rectangles of the upper and lower sums
approximate the area under the graph. In order to improve this approximation, it is useful to
be able to talk about refinements of a partition.

Definition 14.4. Let a, b ∈ R with a < b. Let f be a real-valued, bounded function with
domain [a, b]. Let P and Q be partitions of [a, b]. Then Q is a refinement of P if P ⊂ Q.

Note: The definition above says that the partition Q is a refinement of the partition P if Q
contains at least all the vertices of P . The idea of a refinement is thus that extra vertices get
added to a partition, without taking out any already existing vertices. Note that according to
our definition P is a refinement of itself; it is not necessary for a refinement to have strictly
more vertices than the original partition. As long as none of the existing vertices are taken out
of P , the resulting new partition is a refinement of P .
Now that we have defined the concept of refinement, we need to figure out what refining a
partition does to its corresponding upper and lower sums.

Figure 11: A Riemann lower sum and the effect of refinement

Lemma 14.5. Let a, b ∈ R with a < b. Let f be a real-valued bounded function with domain
[a, b]. Using the notation as in Definition 14.2

1. Let P and Q be partitions of [a, b]. If Q is a refinement of P , then

L(P, f ) ≤ L(Q, f ), and U (P, f ) ≥ U (Q, f ).

2. Let P1 and P2 be partitions of [a, b]. Then

L(P1 , f ) ≤ U (P2 , f ).

Proof. The proof of result 1 is optional in the context of this module. It can be found in
Appendix B.13. An indication of why this statement might be true can be gleaned from
Figure 11: An extra vertex has been introduced, and the lower sum has increased. Of course
a sketch is not a proof!
To prove statement 2, let P1 and P2 be partitions of [a, b]. Define P := P1 ∪ P2 , then P
is a refinement of P1 and a refinement of P2 . We can now use the result from statement 1
together with Lemma 14.3 to conclude

L(P1 , f ) ≤ L(P, f ) ≤ U (P, f ) ≤ U (P2 , f ).

Note: Statement 1 from Lemma 14.5 tells us that, if we refine a partition, the value of the
Riemann lower sum cannot go down and the value of the Riemann upper sum cannot go up.
Statement 2 has nothing to do with refinements, but tells us that any lower sum is always less
than or equal to any upper sum (for the same function f ), regardless of the partitions used in
the lower and upper sum. Those two results together tell us something very interesting. If we
keep refining our partitions, the values of our lower sums will keep increasing (or at the very
least they will not decrease), but they can never become larger than any of the upper sums,
which themselves are decreasing (or at least not increasing) with each refinement. This conjures
up the image that the lower and upper sum approximations of the area are approaching each
other when we refine the partitions. But will they ever get close enough to each other to draw
the conclusion that we have approximated the area with sufficient accuracy? Let us have a
rigorous look to see what that means.

Lemma 14.6. Let a, b ∈ R with a < b. Let f be a real-valued bounded function with
domain [a, b]. Let U (P, f ) and L(P, f ) denote the upper and lower sum of f corresponding
to a partition P , respectively. Then

inf{U (P, f ) : P is a partition of [a, b]} ∈ R

and
sup{L(P, f ) : P is a partition of [a, b]} ∈ R.

Proof. Since f is bounded, there exists an M > 0 such that, for all x ∈ [a, b], |f (x)| < M .
By Lemma 14.3, for all partitions P of [a, b], we have U (P, f ) ≥ −M (b − a). Since there
exists a partition of [a, b], the set {U (P, f ) : P is a partition of [a, b]} is a nonempty subset
of R. Hence, by the greatest lower bound property of R, we know the infimum of this
set exists in R. Similarly, we can use the least upper bound property of R to show that
sup{L(P, f ) : P is a partition of [a, b]} ∈ R.
Note: Now that Lemma 14.6 has shown us that the particular infimum and supremum from
that lemma exist as real numbers, it turns out to be useful to give these quantities their own
notation and name, because they play a very important role in Riemann integration. In a
sense, inf{U (P, f ) : P is a partition of [a, b]} tells us what is the best approximation of the area
under the graph of f we can achieve by approximation ‘from above’ by upper sums. Similarly,
sup{L(P, f ) : P is a partition of [a, b]} tells us what is the best we can do approximating ‘from
below’ with lower sums. When thinking about these quantities in this way we are implicitly
assuming that our approximations via upper sums will never go below the ‘true value’ of the
area under the graph of f and that neither will our approximations with lower sums go above
this ‘true value’. This is not something we have shown to be the case. In fact, we have not
discussed the ‘true value’ of the area at all. There is a very good reason for this. As alluded
to before, we have not defined what “area” actually means. Even though we have been talking
about lower and upper sums as ‘approximating the area under the graph or curve’, what we are
actually doing here is using the lower and upper sums to define what the area under the graph
is. Let us continue with this programme, because we are not quite there yet.

Definition 14.7. Let a, b ∈ R with a < b. Let f be a real-valued bounded function with
domain [a, b]. The Riemann upper integral (or upper integral) of f from a to b is
    \overline{∫_a^b} f (x) dx := inf{U (P, f ) : P is a partition of [a, b]}.

The Riemann lower integral (or lower integral) of f from a to b is


    \underline{∫_a^b} f (x) dx := sup{L(P, f ) : P is a partition of [a, b]}.

Lemma 14.8. Let a, b ∈ R with a < b. Let f be a real-valued function with domain [a, b].

Assume that there exists an M > 0 such that, for all x ∈ [a, b], |f (x)| ≤ M . Then
    −M (b − a) ≤ \underline{∫_a^b} f (x) dx ≤ \overline{∫_a^b} f (x) dx ≤ M (b − a).

Proof. By Lemma 14.3 we know that, for all partitions of [a, b], L(P, f ) ≥ −M (b − a).
Therefore
    \underline{∫_a^b} f (x) dx = sup{L(P, f ) : P is a partition of [a, b]} ≥ −M (b − a).

The details of the proof of the previous statement are left to the reader as a good exercise to
practise with proofs containing a supremum. The proof that \overline{∫_a^b} f (x) dx ≤ M (b − a) follows
along similar lines.
Finally we need to prove that \underline{∫_a^b} f (x) dx ≤ \overline{∫_a^b} f (x) dx. For a proof by contradiction, assume
that \underline{∫_a^b} f (x) dx > \overline{∫_a^b} f (x) dx. Define ε := (1/2)(\underline{∫_a^b} f (x) dx − \overline{∫_a^b} f (x) dx) > 0. By the definitions
of supremum and infimum, there exist partitions P1 and P2 of [a, b] such that

    L(P1 , f ) > \underline{∫_a^b} f (x) dx − ε = (1/2)(\underline{∫_a^b} f (x) dx + \overline{∫_a^b} f (x) dx) = \overline{∫_a^b} f (x) dx + ε > U (P2 , f ).

This, however, contradicts result 2 in Lemma 14.5.


We are now finally ready to define the Riemann integral!

Definition 14.9. Let a, b ∈ R with a < b. Let f be a real-valued bounded function with
domain [a, b]. The function f is Riemann integrable on [a, b] if
    \underline{∫_a^b} f (x) dx = \overline{∫_a^b} f (x) dx.

If f is Riemann integrable, we define


    ∫_a^b f (x) dx := \underline{∫_a^b} f (x) dx = \overline{∫_a^b} f (x) dx.

and call this value the Riemann integral of f .

Note: Even though in Definition 14.9 we chose to call the integration variable x, it does
not matter whether you write ∫_a^b f (x) dx or ∫_a^b f (t) dt, etc. All these notations
mean exactly the same thing. We call x, t, etc. in these situations a dummy variable. It is
important to use a symbol for the dummy variable that is not yet in use for something else.31
The same goes for the upper and lower integrals \overline{∫_a^b} f (x) dx, \overline{∫_a^b} f (t) dt, \underline{∫_a^b} f (x) dx, \underline{∫_a^b} f (t) dt, etc.

Note: Remember that our aim was to define a quantity that can reasonably be considered to
be the area under the graph of f . The Riemann integral is what we have come up with. Note
31
Compare this, for example, to the dummy index k in a summation of the form ∑_{k=1}^{n} ak , where, for all
k ∈ {1, . . . , n}, ak ∈ R. You can replace the k by any other unused symbol, e.g. ∑_{i=1}^{n} ai , without changing the
meaning of the summation.

that if f changes sign, then what we are really defining is a signed area, in the sense that the
area under the x-axis contributes negatively to the value of the Riemann integral.
Note: You might wonder why we call ∫_a^b f (x) dx the “Riemann integral” and not just the
“integral”. We could do that and for the purposes of this module the Riemann integral is
the only kind of integral we will encounter, so it will not cause confusion to just call it the
“integral”. In the wider world of mathematics, however, there are other kinds of integrals and
in certain situations it is good to distinguish which kind of integral is meant. We will give a
little more information about this in the note on page 112.

Note: In all of this section we have been dealing with bounded functions. We needed the
boundedness of the function to assure that the quantities mk (P, f ) and Mk (P, f ) from Defini-
tion 14.2 are finite and hence the Riemann upper and lower sums and upper and lower integrals
are well-defined. In this module we will not deal with “improper (Riemann) integrals”, which
are a way to define (in certain cases) Riemann integrals of functions which are unbounded near
the endpoints of the (half-)open interval on which they are defined.

Corollary 14.10. Let a, b ∈ R with a < b. Let f be a real-valued function with domain
[a, b]. Assume that there exists an M > 0 such that, for all x ∈ [a, b], |f (x)| ≤ M . Moreover,
assume f is Riemann integrable on [a, b]. Then
\[
\left|\int_a^b f(x)\,dx\right| \le M(b-a).
\]

Proof. By Lemma 14.8 it follows immediately that
\[
-M(b-a) \le \int_a^b f(x)\,dx \le M(b-a).
\]

Example: Define the function f : [0, 1] → R by
\[
f(x) := \begin{cases} 1, & \text{if there exist } p, q \in \mathbb{N} \cup \{0\} \text{ such that } x = \frac{p}{10^q}, \\[2pt] 0, & \text{otherwise.} \end{cases}
\]
The function f is bounded, but is it Riemann integrable?


Let P := {x0 , . . . , xn } be a partition of [0, 1]. Then 0 = x0 < x1 < . . . < xn = 1. Let
k ∈ {1, . . . , n}. By the density of Q in R, there exists an x ∈ [xk−1 , xk ] of the form x = p/10^q,
for some p, q ∈ N ∪ {0}. For completeness, we prove this claim in the optional material in
Appendix B.14. Hence, for this x, f (x) = 1. Similarly, by the density of Qc , we find that there
exists an x ∈ Qc ∩ [xk−1 , xk ] and thus, for this x, f (x) = 0. Let mk (P, f ) and Mk (P, f ) be as in
Definition 14.2, then these considerations show that Mk (P, f ) = 1 and mk (P, f ) = 0.
Thus, for the upper sum we compute
\[
U(P, f) = \sum_{k=1}^{n} 1 \cdot (x_k - x_{k-1}) = x_n - x_0 = 1 - 0 = 1.
\]

For the lower sum we find
\[
L(P, f) = \sum_{k=1}^{n} 0 \cdot (x_k - x_{k-1}) = 0.
\]

Since these computations were for an arbitrary partition P , we deduce that
\[
\underline{\int_0^1} f(x)\,dx = 0 \ne 1 = \overline{\int_0^1} f(x)\,dx.
\]

Thus f is not Riemann integrable.


This example shows something very important: Not every function is Riemann inte-
grable, not even when we restrict ourselves to bounded functions which are defined on nonempty
closed intervals.

Example: Let M > 0 and assume the function g : [0, 1] → [0, M ] satisfies that, for all
x ∈ [0, 1], g(x) ≥ 0. Note that, by the definition of the function’s codomain, the function is
bounded. Moreover, assume there exists an m ∈ N and a finite set Y := {y1 , . . . , ym } ⊂ [0, 1]
such that, for all x ∈ [0, 1] \ Y , g(x) = 0. So g is a nonnegative function which is zero except
possibly at finitely many points. Is g Riemann integrable?
Let l ∈ N and let Q := {x0 , . . . , xl } be a partition of [0, 1]. Since Y is finite and any interval
[xk−1 , xk ] contains infinitely many points, each such interval contains an x such that g(x) = 0.
Hence, for all k ∈ {1, . . . , l}, mk (Q, g) = 0 and thus L(Q, g) = 0. Therefore $\underline{\int_0^1} g(x)\,dx = 0$.
Now let n ∈ N and let P := {0, 1/n, . . . , (n − 1)/n, 1} be a partition of [0, 1] with vertices, for all
k ∈ {1, . . . , n}, xk := k/n. Let y ∈ Y . If there exists a k ∈ {1, . . . , n − 1} such that y = xk , then y
belongs to both the intervals [xk−1 , xk ] and [xk , xk+1 ]. Otherwise y belongs to only one interval
of the form [xk−1 , xk ] with k ∈ {1, . . . , n}. If, for a given k ∈ {1, . . . , n}, the interval [xk−1 , xk ]
intervals has length 1/n, we find that (using the notation from Definition 14.2)
\[
0 \le U(P, g) = \sum_{k=1}^{n} M_k(P, g)(x_k - x_{k-1}) \le 2mM \cdot \frac{1}{n}.
\]

Let ε > 0. Since m and M are fixed constants, there exists an n ∈ N and a corresponding
partition P = {0, 1/n, . . . , (n − 1)/n, 1} such that 0 ≤ U (P, g) < ε. Thus $\overline{\int_0^1} g(x)\,dx = 0$, and
since the lower integral is also 0, we conclude that $\int_0^1 g(x)\,dx = 0$.
What we have shown in this example is a very useful fact: For any bounded real-valued
function on a nonempty closed domain which takes positive values at at most finitely many
points and which takes the value zero everywhere else, the Riemann integral is zero. In this
sense the Riemann integral ‘does not see’ what happens at single points.
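As an informal sanity check of the bound 0 ≤ U(P, g) ≤ 2mM/n obtained above, the following Python sketch (the set Y and the value M are arbitrary choices of ours, and the code is an illustration only) evaluates U(P, g) on uniform partitions and watches it shrink towards 0.

```python
# Upper sums of a nonnegative function that vanishes off a finite set Y.
Y = [0.1, 0.25, 0.5, 0.9]      # an arbitrary finite subset of [0, 1]
M = 3.0                        # an arbitrary bound; g takes values in [0, M]

def upper_sum(n):
    """U(P, g) for the uniform partition {0, 1/n, ..., 1}: M_k = M exactly when
    [x_{k-1}, x_k] meets Y, and M_k = 0 otherwise."""
    U = 0.0
    for k in range(1, n + 1):
        a, b = (k - 1) / n, k / n
        if any(a <= y <= b for y in Y):
            U += M * (b - a)
    return U

for n in (10, 100, 1000, 10000):
    print(n, upper_sum(n))     # bounded by 2 * len(Y) * M / n, so it tends to 0
```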
Another example of an explicit computation of a Riemann integral is provided near the end
of this section, on page 111.
We will have a look now at some more general classes of functions which are Riemann
integrable.

Theorem 14.11. Let a, b ∈ R with a < b and let f : [a, b] → R be a monotone function.
Then f is Riemann integrable on [a, b].

Proof. We prove the result in the case where f is non-decreasing. If f is non-increasing, the
proof follows in a similar way. We leave the details of that case as an exercise for the reader.
Let f be non-decreasing. We first note that then, for all x ∈ [a, b], f (a) ≤ f (x) ≤ f (b),
hence f is bounded and we can apply the apparatus we have developed in this section so far.
Let ε > 0 and let n ∈ N and let P := {x0 , . . . , xn } be a partition of [a, b] such that, for all
k ∈ {1, . . . , n}, we have xk −xk−1 ≤ ε. Since f is non-decreasing, we have for all k ∈ {1, . . . , n}

and for all x ∈ [xk−1 , xk ], that f (xk−1 ) ≤ f (x) ≤ f (xk ) and thus Mk (P, f ) = f (xk ) and
mk (P, f ) = f (xk−1 ). Hence, we compute
\[
U(P, f) - L(P, f) = \sum_{k=1}^{n} (M_k(P, f) - m_k(P, f))(x_k - x_{k-1}) = \sum_{k=1}^{n} (f(x_k) - f(x_{k-1}))(x_k - x_{k-1})
\le \sum_{k=1}^{n} (f(x_k) - f(x_{k-1}))\,\varepsilon = (f(x_n) - f(x_0))\,\varepsilon = (f(b) - f(a))\,\varepsilon.
\]

Using Lemma 14.8 together with the definitions of the upper and lower integral, we find
\[
\underline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f(x)\,dx \le U(P, f) \le L(P, f) + (f(b) - f(a))\,\varepsilon \le \underline{\int_a^b} f(x)\,dx + (f(b) - f(a))\,\varepsilon.
\]
Taking the limit ε → 0+ and using the sandwich theorem, we find that $\underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx$
and thus f is Riemann integrable.
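For a non-decreasing f and a uniform partition, the estimate used in this proof reads U(P, f) − L(P, f) ≤ (f(b) − f(a)) · (b − a)/n. The short Python sketch below (with the arbitrary choice f = arctan on [0, 2]) illustrates this numerically; it is an illustration only, not a substitute for the proof.

```python
# U(P, f) - L(P, f) for a non-decreasing f on uniform partitions of [a, b];
# for such partitions it equals (f(b) - f(a)) * (b - a) / n, the bound used above.
import math

def gap(f, a, b, n):
    total = 0.0
    for k in range(1, n + 1):
        x_prev = a + (k - 1) * (b - a) / n
        x_next = a + k * (b - a) / n
        total += (f(x_next) - f(x_prev)) * (x_next - x_prev)   # (M_k - m_k)(x_k - x_{k-1})
    return total

f, a, b = math.atan, 0.0, 2.0          # atan is non-decreasing on [0, 2]
for n in (10, 100, 1000):
    print(n, gap(f, a, b, n), (f(b) - f(a)) * (b - a) / n)
```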

Theorem 14.12. Let a, b ∈ R with a < b. Let f : [a, b] → R be a continuous function.


Then f is Riemann integrable on [a, b].

Proof. First we remember, from Theorem 10.12, that f is a bounded function, because it
is a continuous function on a closed and bounded domain in R. Hence, we can apply the
machinery developed in this section.
Define
\[
C := \overline{\int_a^b} f(x)\,dx - \underline{\int_a^b} f(x)\,dx \ge 0.
\]

We will prove the following claim: For every m ∈ N there exist sm , tm ∈ [a, b] with

\[
|s_m - t_m| < \frac{1}{m} \quad \text{and} \quad |f(s_m) - f(t_m)| \ge D := \frac{C}{2(b-a)}. \tag{24}
\]

If we prove this claim, then by the Bolzano–Weierstraß theorem (Theorem 3.19) there exists
a convergent subsequence (smk ) of the sequence (sm ) and a u ∈ [a, b] such that (smk ) → u,
as k → ∞. Hence tmk → u as k → ∞ and

\[
D := \frac{C}{2(b-a)} \le |f(s_{m_k}) - f(t_{m_k})|.
\]

By continuity of f and continuity of the absolute value function (i.e. the Euclidean norm in
R; see Lemma A.13), we have |f (smk ) − f (tmk )| → |f (u) − f (u)| = 0 as k → ∞. Thus, by
the sandwich theorem we find that D = 0 and thus C = 0, which proves that f is Riemann
integrable.
To prove the claim from (24), let m, n ∈ N and let P := {x0 , . . . , xn } be a partition of [a, b]
with the property that, for every k ∈ {1, . . . , n}, xk − xk−1 < 1/m. Then, in the notation of
Definition 14.2,
\[
0 \le C = \overline{\int_a^b} f(x)\,dx - \underline{\int_a^b} f(x)\,dx \le U(P, f) - L(P, f)
= \sum_{k=1}^{n} (M_k(P, f) - m_k(P, f))(x_k - x_{k-1}) \le \left(\sum_{k=1}^{n} (x_k - x_{k-1})\right) \max_k \{M_k(P, f) - m_k(P, f)\}
= (b - a) \max_k \{M_k(P, f) - m_k(P, f)\}.
\]

Hence, there exists a k ∈ {1, . . . , n} with Mk (P, f ) − mk (P, f ) ≥ C/(b − a) = 2D. By
the definitions of supremum and infimum, there exist sm , tm ∈ [xk−1 , xk ] such that f (sm ) <
mk (P, f ) + D/2 and f (tm ) > Mk (P, f ) − D/2. Hence |f (sm ) − f (tm )| ≥ D. Moreover,
|sm − tm | ≤ xk − xk−1 < 1/m. This proves (24).
The following two theorems (and corollary) show that we can ‘glue together’ or ‘cut apart’
the interval(s) on which f is Riemann integrable and still be left with a Riemann integrable
function on the new interval(s).

Theorem 14.13. Let a, b ∈ R with a < b. Let f : [a, b] → R be a bounded function.


Let c ∈ (a, b) and assume that f is Riemann integrable on [a, c] and on [c, b]. Then f is
Riemann integrable on [a, b] and
\[
\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx. \tag{25}
\]

Proof. The proof is not difficult, but it is optional for this module. We present it in Ap-
pendix B.15.

Corollary 14.14. Let a, b ∈ R with a < b. Let f : [a, b] → R be a bounded function which
is Riemann integrable on [a, b]. Let x, y ∈ [a, b] with x < y. Then
\[
\int_x^y f(t)\,dt = \int_a^y f(t)\,dt - \int_a^x f(t)\,dt.
\]

Proof. This follows from Theorem 14.13. Specifically, if we apply (25) on the domain [a, y]
(instead of [a, b]) with c = x, then the result follows.

Theorem 14.15. Let a, b ∈ R with a < b. Let f : [a, b] → R be a bounded function. Let
c ∈ (a, b) and assume that f is Riemann integrable on [a, b]. Then f is Riemann integrable
on [a, c] and [c, b].

Proof. Also this proof is not difficult, but optional for this module. It is presented in
Appendix B.16.

Theorem 14.16. Let a, b ∈ R with a < b. Let f : [a, b] → R and g : [a, b] → R both
be bounded functions which are Riemann integrable on [a, b]. Then the function f + g is

Riemann integrable on [a, b] and
\[
\int_a^b (f + g)(x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx.
\]

Proof. The proof is again not very difficult. It is optional for this module and we present it
in Appendix B.17.
Instead of treating the upper limit of the domain of integration b as a fixed number, we can
consider the function we get by varying this upper limit of the integral.

Theorem 14.17 (First fundamental theorem of (the) calculus). Let a, b ∈ R with a < b.
Let f : [a, b] → R be a bounded function which is Riemann integrable on [a, b]. Define the
function F : [a, b] → R, x 7→ $\int_a^x f(t)\,dt$. Let c ∈ (a, b). If f is continuous at c, then F is
differentiable at c and F 0 (c) = f (c).

Proof. Because f is bounded, there exists an M > 0 such that, for all t ∈ [a, b], |f (t)| < M .
Let x, y ∈ [a, b] with x < y. Then, by Corollary 14.14,
\[
|F(y) - F(x)| = \left|\int_x^y f(t)\,dt\right| \le M(y - x).
\]

To obtain the inequality we used Corollary 14.10. Thus F is continuous. (Can you fill in
the details yourself that show that F is indeed continuous? Why does that follow from the
inequality above?)
Let h > 0 such that c + h ≤ b. Such an h exists since c is an element of the open interval
(a, b). Now define

s(h) := sup{f (t) : c ≤ t ≤ c + h} and i(h) := inf{f (t) : c ≤ t ≤ c + h}.

Then, by a proof similar to that of Lemma 14.8, we have
\[
\int_c^{c+h} f(t)\,dt = \overline{\int_c^{c+h}} f(t)\,dt \le s(h)\,h \quad \text{and} \quad \int_c^{c+h} f(t)\,dt = \underline{\int_c^{c+h}} f(t)\,dt \ge i(h)\,h.
\]

Dividing by h > 0 gives
\[
i(h) \le \frac{\int_c^{c+h} f(t)\,dt}{h} = \frac{F(c+h) - F(c)}{h} \le s(h),
\]
where we used Corollary 14.14 again. Because f is continuous at c, we have limt→c+ f (t) = f (c)
and thus, by Lemma B.5, limh→0+ i(h) = limh→0+ s(h) = f (c). Thus, by the sandwich
theorem,
\[
\lim_{h \to 0^+} \frac{F(c+h) - F(c)}{h} = f(c).
\]
A similar argument as above applied on the interval [c − h, c] (details left to the reader) shows
that also $\lim_{h \to 0^-} \frac{F(c+h) - F(c)}{h} = f(c)$. Thus
\[
F'(c) = \lim_{h \to 0} \frac{F(c+h) - F(c)}{h} = f(c).
\]

Note: In the theorem above it is not true that F is necessarily differentiable at c if we do not
make extra assumptions, such as the assumption that f is continuous at c, which we make in
the theorem. Can you come up with an example of a function f for which F (x) = $\int_a^x f(t)\,dt$ is
well-defined, but such that F is not differentiable everywhere?
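The first fundamental theorem of calculus can also be 'seen' numerically. In the Python sketch below (the choices f = cos, a = 0, c = 1 and the midpoint-rule approximation of F are ours, purely for illustration) the difference quotient (F(c + h) − F(c))/h approaches f(c) as h shrinks.

```python
# Numerical illustration of the first fundamental theorem of calculus.
# F is approximated by a midpoint Riemann sum; the difference quotient of F
# at c should approach f(c).  (Sketch only; f, a and c are arbitrary choices.)
import math

def F(x, f=math.cos, a=0.0, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, x]."""
    dx = (x - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

c = 1.0
for h in (0.1, 0.01, 0.001):
    print(h, (F(c + h) - F(c)) / h, math.cos(c))   # difference quotient vs f(c)
```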

Definition 14.18. Let f and F be real-valued functions whose domains both contain an
open interval I ⊂ R. We say that F is an antiderivative of f on I, if, for all x ∈ I,
F 0 (x) = f (x).

Note: Carefully note that in Definition 14.18 we speak of an antiderivative, not the antideriva-
tive. This is because, if the function f has one antiderivative, it has infinitely many: if F is an
antiderivative of f on I, then, for all c ∈ R, F + c is an antiderivative of f on I. Can you prove
this?

Note: The implications of the first fundamental theorem of calculus (Theorem 14.17) are very
interesting. (There is a reason why it is called “fundamental”.) It shows that we can obtain
antiderivatives of continuous functions, by Riemann integration!

Theorem 14.19 (Second fundamental theorem of (the) calculus). Let a, b ∈ R with a < b.
Let f : [a, b] → R be Riemann integrable on [a, b] and let F : [a, b] → R be continuous on
[a, b]. Moreover, assume that, for all x ∈ (a, b), F 0 (x) = f (x). Then
\[
\int_a^b f(x)\,dx = F(b) - F(a).
\]

Proof. Let n ∈ N and let P := {x0 , . . . , xn } be a partition of [a, b]. By the mean value
theorem, for every k ∈ {1, . . . , n}, there exists a ck ∈ (xk−1 , xk ) such that F (xk ) − F (xk−1 ) =
f (ck )(xk − xk−1 ) and thus
\[
F(b) - F(a) = \sum_{k=1}^{n} (F(x_k) - F(x_{k-1})) = \sum_{k=1}^{n} f(c_k)(x_k - x_{k-1}).
\]

With the notation as in Definition 14.2, we have, for all k ∈ {1, . . . , n}, mk (P, f ) ≤ f (ck ) ≤
Mk (P, f ). Hence
\[
F(b) - F(a) \le \sum_{k=1}^{n} M_k(P, f)(x_k - x_{k-1}) = U(P, f),
\]
\[
F(b) - F(a) \ge \sum_{k=1}^{n} m_k(P, f)(x_k - x_{k-1}) = L(P, f). \tag{26}
\]

Taking the infimum over all partitions P of [a, b] in the first inequality above, gives
\[
F(b) - F(a) \le \overline{\int_a^b} f(t)\,dt = \int_a^b f(t)\,dt,
\]

where the equality follows since f is assumed to be Riemann integrable on [a, b]. If we take
the supremum over all partitions P of [a, b] in the inequality on the second line of (26), then
we get
\[
F(b) - F(a) \ge \underline{\int_a^b} f(t)\,dt = \int_a^b f(t)\,dt.
\]

Combining these final two inequalities, we conclude that
\[
F(b) - F(a) = \int_a^b f(t)\,dt.
\]

Note: The second fundamental theorem of calculus (Theorem 14.19) shows a remarkable con-
nection between two concepts which, a priori, do not appear to be connected: area under a
graph and the antiderivative of a function. The theorem says that we can find the area under
the graph of f between x = a and x = b (assuming that we are happy to consider $\int_a^b f(x)\,dx$ to
be this (signed) area) by computing the antiderivative of f at x = a and x = b. This is a very
deep connection, whose implications are hard to overstate. It draws direct connections between
geometry and calculus. A fundamental theorem indeed!
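Here is an informal numerical illustration of Theorem 14.19 in Python (the choices f = cos, F = sin and the interval [0, π/3] are ours, and the integral is approximated by a fine Riemann-type sum rather than computed exactly): the two quantities agree to high accuracy.

```python
# Numerical illustration of the second fundamental theorem of calculus:
# a fine Riemann-type sum for the integral of f = cos over [0, pi/3] is compared
# with F(b) - F(a) for the antiderivative F = sin.  (Sketch only.)
import math

a, b, n = 0.0, math.pi / 3, 100_000
dx = (b - a) / n
riemann_sum = sum(math.cos(a + (k + 0.5) * dx) for k in range(n)) * dx
print(riemann_sum, math.sin(b) - math.sin(a))      # both are approximately sqrt(3)/2
```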

Example: Now we provide another explicit calculation of a Riemann integral without using
antiderivatives.
Consider the curve y = x2 for x ∈ [0, 1] and the area A bounded by this curve, the positive
x-axis, and the line from (1, 0) to (1, 1) (the “area under the curve”). In other words, we are
looking for the area under the graph of the function f : [0, 1] → R, x 7→ x2 . Using the second
fundamental theorem of calculus (Theorem 14.19) together with the fact that, if F : [0, 1] →
R, x 7→ x³/3, then, for all x ∈ (0, 1), F 0 (x) = f (x), we find that
\[
A = \int_0^1 f(x)\,dx = F(1) - F(0) = \frac{1}{3}.
\]
To illustrate how Riemann integration works with another example, however, we will now also
compute the Riemann integral (the “area under the curve”) directly, without using the second
fundamental theorem of calculus.
Let n ∈ N and consider the partition P := {x0 , . . . , xn } of [0, 1] with, for all k ∈ {0, 1, . . . , n},
xk := k/n. Then the part of the required area between x = (k − 1)/n and x = k/n can be enclosed in a
rectangle of base 1/n and height (k/n)². Or, in the notation of Definition 14.2, Mk (P, f ) = (k/n)².
Thus
\[
\overline{\int_0^1} f(x)\,dx \le U(P, f) = \sum_{k=1}^{n} \frac{(k/n)^2}{n} = \sum_{k=1}^{n} k^2 n^{-3} = \frac{n(n+1)(2n+1)}{6n^3}.
\]
The final equality above is easily proved by induction on n. Have a try yourself to test your
proficiency with simple proofs by induction. For completeness, we have also included the result
as Lemma A.14 in Appendix A. We have

1 1 + n1 2 + n1
 
n(n + 1)(2n + 1) 1
3
= → , as n → ∞.
6n 6 3
Hence
\[
\overline{\int_0^1} f(x)\,dx \le \frac{1}{3}.
\]
We can also see that the part of the required area between x = (k − 1)/n and x = k/n encloses a
rectangle of base 1/n and height ((k − 1)/n)². In the notation of Definition 14.2, mk (P, f ) = ((k − 1)/n)².
Thus
\[
\underline{\int_0^1} f(x)\,dx \ge L(P, f) = \sum_{k=1}^{n} (k-1)^2 n^{-3} = \sum_{k=0}^{n-1} k^2 n^{-3} = n^{-3}\sum_{k=1}^{n-1} k^2 = \frac{(n-1)n(2n-1)}{6n^3},
\]

by Lemma A.14. We have
\[
\frac{(n-1)n(2n-1)}{6n^3} = \frac{1}{6}\left(1 - \frac{1}{n}\right)\left(2 - \frac{1}{n}\right) \to \frac{1}{3}, \quad \text{as } n \to \infty.
\]
Thus we have
\[
\frac{1}{3} \le \underline{\int_0^1} f(x)\,dx \le \overline{\int_0^1} f(x)\,dx \le \frac{1}{3}.
\]
Hence $\underline{\int_0^1} f(x)\,dx = \overline{\int_0^1} f(x)\,dx$ and we conclude that f is Riemann integrable on [0, 1] and that
\[
A = \int_0^1 f(x)\,dx = \frac{1}{3}.
\]
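The explicit sums appearing in this example are easy to check numerically. The following Python sketch (illustration only) computes U(P, f) and L(P, f) for the uniform partition xk = k/n and compares them with the closed forms (n−1)n(2n−1)/(6n³) and n(n+1)(2n+1)/(6n³); all of them tend to 1/3.

```python
# Upper and lower sums for f(x) = x^2 on [0, 1] with the partition x_k = k/n,
# compared with the closed forms used in the example above.  (Illustration only.)
def sums(n):
    U = sum((k / n) ** 2 / n for k in range(1, n + 1))          # U(P, f)
    L = sum(((k - 1) / n) ** 2 / n for k in range(1, n + 1))    # L(P, f)
    return L, U

for n in (10, 100, 1000):
    L, U = sums(n)
    print(n, L, (n - 1) * n * (2 * n - 1) / (6 * n ** 3),
             U, n * (n + 1) * (2 * n + 1) / (6 * n ** 3))       # all approach 1/3
```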

Note: In this section we have introduced Riemann integration as a way to make sense of the
notion of “(signed) area under the graph of a function”. The fundamental theorems of calculus
showed us that this notion of area is very closely related to the concept of antiderivative. Not
everything, however, works out well for Riemann integration. For example, on page 105 we
saw an example of a bounded function which is not Riemann integrable. To address issues
such as this and make integration more widely applicable, people have come up with different
kinds of integration. The most widely used form of integration for functions whose domains and
codomains are subsets of some Euclidean space Rd in modern mathematical practice is Lebesgue
integration. The definition of a Lebesgue integral proceeds via a very different route than our
definition of the Riemann integral in these notes. The good news, however, is that the Lebesgue
integral gives exactly the same values as the Riemann integral for those functions which are
Riemann integrable. But the Lebesgue integral can also deal with many functions which are
not Riemann integrable, such as our example from page 105 (for this example, it turns out, the
value of the Lebesgue integral is zero). If you are interested in learning more about Lebesgue
integration, then G13LNA/MATH3020 is a good module to choose in your third year.

Example: We end the main part of these notes with another example of a ‘shortcoming’ of
Riemann integration which is (at least partially) addressed by the theory of Lebesgue integra-
tion.
For all n ∈ N, define the functions fn : [0, 1] → R by
\[
f_n(x) := \begin{cases} 1, & \text{if there exists a } p \in \mathbb{N} \cup \{0\} \text{ such that } x = \frac{p}{10^n}, \\[2pt] 0, & \text{otherwise.} \end{cases}
\]

Then each function fn is zero everywhere on its domain except at finitely many points. Thus
each fn satisfies the assumptions for the function g from an example on page 106. Hence, for
all n ∈ N, $\int_0^1 f_n(x)\,dx = 0$.
Now consider the pointwise limit of the sequence (fn ), i.e. the function f : [0, 1] → R which
is defined by
\[
f(x) := \lim_{n \to \infty} f_n(x).
\]

Let x ∈ [0, 1]. If, for all n ∈ N, fn (x) = 0, then f (x) = 0. If, on the other hand, there exists a
q ∈ N such that fq (x) = 1, then that means that there exists p ∈ N ∪ {0} such that x = p/10^q.

Then, for all n ∈ N which satisfy n ≥ q, we have x = 10^{n−q}p/10^n. Since 10^{n−q} ∈ N ∪ {0}, this means
that, for all n ≥ q, fn (x) = 1 and thus f (x) = 1. Hence
\[
f(x) = \begin{cases} 1, & \text{if there exist } p \in \mathbb{N} \cup \{0\} \text{ and } q \in \mathbb{N} \text{ such that } x = \frac{p}{10^q}, \\[2pt] 0, & \text{otherwise,} \end{cases}
= \begin{cases} 1, & \text{if there exist } p, q \in \mathbb{N} \cup \{0\} \text{ such that } x = \frac{p}{10^q}, \\[2pt] 0, & \text{otherwise.} \end{cases}
\]

Hence this function f is exactly the same function as the f from an example on page 105.
In that example we proved that f is not Riemann integrable on [0, 1]. We thus see that the
pointwise limit of Riemann integrable functions is not necessarily Riemann integrable!
To end on a high note, we give a result that shows that Riemann integrability behaves a lot
better under uniform convergence.

Theorem 14.20. Let a, b ∈ R with a < b. For all n ∈ N, let fn : [a, b] → R be a bounded
function which is Riemann integrable on [a, b]. Assume that the sequence (fn ) converges
uniformly to the function f : [a, b] → R. Then f is bounded and Riemann integrable on
[a, b]. Moreover,
\[
\lim_{n \to \infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx.
\]

Proof. The proof is a nice combination of techniques we have encountered when we discussed
uniform convergence and techniques we have seen many times now in this section. It is
included in Appendix B.18.
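As an informal illustration of Theorem 14.20, the Python sketch below uses the sequence fn(x) = x + sin(nx)/n, which converges uniformly to f(x) = x on [0, 1] (these choices, and the midpoint approximation of the integrals, are ours); the approximated integrals of fn indeed approach ∫₀¹ x dx = 1/2.

```python
# Illustration of Theorem 14.20: f_n(x) = x + sin(n x)/n converges uniformly to
# f(x) = x on [0, 1], and the (approximated) integrals of f_n approach 1/2.
import math

def integral(g, a=0.0, b=1.0, m=100_000):
    dx = (b - a) / m
    return sum(g(a + (k + 0.5) * dx) for k in range(m)) * dx

for n in (1, 10, 100, 1000):
    f_n = lambda x, n=n: x + math.sin(n * x) / n
    print(n, integral(f_n))        # tends to the integral of x over [0, 1], i.e. 1/2
```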

A Background material
This section contains background material which we did not include in the main text, because it
would have interrupted the flow of the notes. However, this material is not optional. Rather,
most of this section can be seen as revision of G11ACF/MATH1005 and other Year 1 Core
modules, sometimes perhaps presented in a different way than in those Year 1 modules.
We have included it here as a reminder of some Year 1 material, but also to make these notes
as self-contained as possible and to provide the required rigour in the proofs of these results from
Year 1. For some of these concepts and results you might have already seen rigorous definitions
and proofs in Year 1, for others you probably have not. Because we do not want to interrupt
the flow of the main text too much, we include them here.

A.1 Subsets of Rd

Definition A.1. If I ⊂ R, then we say I is an interval if, for all x, y ∈ I, z ∈ [x, y]


implies that z ∈ I.

Example: The definition of interval above is consistent with all the usual examples of intervals
you are familiar with32 : (a, b), [a, b], (a, b], [a, b), (−∞, a), (−∞, a], (a, ∞), [a, ∞). Note that
R itself is also an interval. Those are not all the possible forms intervals can take, though.
According to Definition A.1 also the empty set ∅ and the singleton {a} are intervals. To
distinguish these from the intervals listed before (which all contain at least two elements), ∅ and
{a} are called degenerate intervals. All the other intervals are nondegenerate intervals.
It is useful to be able to talk about the endpoints of an interval, consisting of the left-hand
endpoint of an interval and the right-hand endpoint of an interval. For (a, b) and
[a, b],(a, b], and [a, b), a is the left-hand endpoint and b the right-hand endpoint of the interval.
For (−∞, a) and (−∞, a], −∞ is the left-hand endpoint and a is the right-hand endpoint of the
interval. For (a, ∞) and [a, ∞), a is the left-hand endpoint and ∞ is the right-hand endpoint
of the interval. For {a} = [a, a], a is both the left-hand endpoint and right-hand endpoint of
the interval. Finally, ∅ and R have no endpoints. Note that −∞ and +∞ are not ‘points’
or numbers in any usual sense of the word, but it is useful to include them under the name
“endpoints of the interval”.
The intervals ∅, R, (a, b), (−∞, a), and (a, ∞) are called open intervals. In Section 8
we will learn why. The intervals ∅, R, [a, b], (−∞, a], [a, ∞) are called closed intervals. In
Section 9 we will learn why. Note that ∅ and R are both open and closed. Sometimes the
intervals [a, b) and (a, b] are called half-open intervals or half-closed intervals.
Note that some authors use ]a, b[ to denote the open interval {x ∈ R : a < x < b}. I will use
the more standard notation (a, b).

Definition A.2. If A, B ⊂ Rd and C ⊂ R^{d′} (where d and d′ can be different), then

• the complement (or absolute complement) of A in Rd is

Ac := {x ∈ Rd : x is not an element of A};

32
Here I assume a, b ∈ R with a < b.

• the relative complement of B with respect to A (also called the set difference) is

A \ B := {x ∈ A : x ∈ B c };

• the Cartesian product of A and C is


A × C := {(x, y) ∈ R^{d+d′} : x ∈ A and y ∈ C}.

Note: In the definition above, note that Ac = Rd \ A.

A.2 Functions

Definition A.3. A function f is a triplet (A, B, C), where A and B are sets, C ⊂ A × B
and for every x ∈ A there is a unique y ∈ B such that (x, y) ∈ C. The set A is the domain
of the function, the set B is the codomain of the function. We denote this by f : A → B.
If (x, y) ∈ C, we write f (x) = y or x 7→ y.
If A0 ⊂ A, then the image of A0 under f is

f (A0 ) := {y ∈ B : there exists an x ∈ A0 such that f (x) = y}.

The range of f is f (A), the image of the domain A under f .


If B 0 ⊂ B, then the pre-image of B 0 under f is

f −1 (B 0 ) := {x ∈ A : f (x) ∈ B 0 }

Note: In the definition above, note carefully that in order to define the pre-image f −1 (B 0 ) we
do not need to assume that f has an inverse function. Despite the fact that the inverse function,
if it exists, is also denoted by f −1 , it is important to understand the difference between the
pre-image and the inverse function.

Definition A.4. If A and B are sets, A0 ⊂ A, and f : A → B, then the restriction of f


to A0 , is the function f |A0 : A0 → B, x 7→ f (x).

Note: It is often useful to consider restrictions of functions to a subset of their original domain.
For example, if we have a function f : R → R and we want to apply a result which is stated
for a function with domain [a, b] (where a, b ∈ R with a < b), then we can restrict f to [a, b],
apply the result to get the desired property for f |[a,b] and then (depending on exactly what the
property is — this needs to be carefully considered) we can often infer the same property for f
on the subset [a, b] ⊂ R.

Definition A.5. A function is real-valued if its codomain is a subset of R.

Note: When we say that a function is real-valued, this says nothing about its domain. It
just means that its codomain is a subset of R (possibly all of R). The choice of codomain is
very important when we are trying to determine if a function is surjective or bijective (see
Definition A.6), but in many other situations it is not very important whether the codomain is
R or one of its strict subsets (as long as the codomain contains the range of the function). Hence
it is often useful to just say that a function is real-valued instead of specifying its exact codomain.

In fact, all of the results in these notes which are formulated for functions with codomain R,
except for the ones pertaining to surjectivity and bijectivity, hold true for functions with a strict
subset of R as codomain (as long as it is large enough to contain the function’s range).

Definition A.6. Let A and B be sets and f : A → B. The function f is surjective (or
onto) if f (A) = B, i.e. if the range of f is equal to the codomain.
The function f is injective (or one-to-one or one-one) if, for all x1 , x2 ∈ A, f (x1 ) =
f (x2 ) ⇒ x1 = x2 .
The function f is called bijective if it is both surjective and injective.

Note: Remembering the definition of range of a function, we see from the definition above that
a function f : A → B is surjective if and only if for every y ∈ B there is an x ∈ A such that
f (x) = y.
A function is injective if different elements from the domain are mapped to different elements
of the codomain. This can be seen by considering the contrapositive of “[f (x1 ) = f (x2 )] ⇒
[x1 = x2 ]” in Definition A.6.

Definition A.7. Let E ⊂ Rd , f : E → R, and a ∈ E. The function f has a global


maximum (or maximum) at a if, for all x ∈ E, f (x) ≤ f (a). The function f has a
global minimum (or minimum) at a if, for all x ∈ E, f (x) ≥ f (a).

Definition A.8. Let U ⊂ R, f : U → R, and a ∈ U . Then f has a local maximum at a


if there exists an open interval I ⊂ U such that a ∈ I and, for all x ∈ I, f (x) ≤ f (a). The
function f has a local minimum at a if there exists an open interval J ⊂ U such that
a ∈ J and, for all x ∈ J, f (x) ≥ f (a).

A.3 Limits and continuity


When we want to combine various limits it is often very useful to be able to talk about ‘adding’
or ‘multiplying’, even when the symbols −∞ or ∞ are involved. To this end it is useful to think
about the extended real number line. It is very important to remember that −∞ and +∞ are
not real numbers and the operations we define below on the extended real number line are only
useful in very particular situations (where they will be clearly indicated).

Definition A.9. The extended real number line R̄ is the union of the set R together
with two extra elements, denoted by −∞ and +∞: R̄ := R ∪ {−∞, +∞}. We define the
following operations involving −∞ and +∞. Let x ∈ R̄.

• If x 6= −∞, then x + (+∞) := +∞ =: +∞ + x.

• If x 6= +∞, then x + (−∞) := −∞ =: −∞ + x.

• If x > 0 or x = +∞, then x(+∞) := +∞ =: +∞x.

• If x < 0 or x = −∞, then x(−∞) := −∞ =: −∞x

• If x 6= −∞ and x 6= +∞, then x/(+∞) := 0 =: x/(−∞).

• If x > 0a , then (+∞)/x := +∞ and (−∞)/x := −∞.

• If x < 0b , then (+∞)/x := −∞ and (−∞)/x := +∞.
a
Note that this does not include the case x = +∞!
b
Note that this does not include the case x = −∞!

Note: In Definition A.9 not all possible combinations of additions, multiplications, or divisions
of elements from R̄ are defined. For example, (+∞) + (−∞) is not defined.

Definition A.10. If f : R → R, a ∈ R, and L ∈ R, we write limx→a− f (x) = L if, for


every ε > 0 there exists δ > 0 such that, for all x < a,

0 < |x − a| < δ ⇒ |f (x) − L| < ε.

We write limx→a+ f (x) = L if, for every ε > 0 there exists δ > 0 such that, for all x > a,

0 < |x − a| < δ ⇒ |f (x) − L| < ε.

We write limx→a f (x) = ∞, if, for every M ∈ R there exists a δ > 0 such that, for all
x ∈ R,
0 < |x − a| < δ ⇒ f (x) > M.
We write limx→a f (x) = −∞, if, for every M ∈ R there exists a δ > 0 such that, for all
x ∈ R,
0 < |x − a| < δ ⇒ f (x) < M.
We write limx→∞ f (x) = L, if, for every ε > 0 there exists a M ∈ R such that, for all
x ∈ R,
x > M ⇒ |f (x) − L| < ε.
We write limx→−∞ f (x) = L, if, for every ε > 0 there exists a M ∈ R such that, for all
x ∈ R,
x < M ⇒ |f (x) − L| < ε.

Note: Can you give the definitions of limx→∞ f (x) = ∞, limx→∞ f (x) = −∞, limx→−∞ f (x) =
∞, and limx→−∞ f (x) = −∞ yourself? What about the definitions of the one-sided limits
limx→a− f (x) = ∞, limx→a− f (x) = −∞, limx→a+ f (x) = ∞, and limx→a+ f (x) = −∞?
Note that, in the definitions above which relate to a (two-sided or one-sided) limit involving
x tending to a, that we only have a condition on x ∈ R for which |x − a| > 0. That is to say,
we do not have any requirements at x = a.
It follows immediately from the definitions (both in the case L ∈ R and in the cases where
L = −∞ or L = +∞) that limx→a f (x) = L if and only if limx→a− f (x) = L and limx→a+ f (x) =
L.
Figure 12 shows an illustration of the condition in (8).
We define the usual operations on functions. When we write Rp , we assume p ∈ N.

Definition A.11. Assume U ⊂ Rd , f : U → Rq , g : U → Rq , and c ∈ R. Then the


function f + g : U → Rq is defined by, for all x ∈ U ,

(f + g)(x) := f (x) + g(x).

Figure 12: An illustration of (8) with a = c. Figure from [5] (by HiTe; distributed under the
GNU Free Documentation License)

The function cf : U → Rq is defined by, for all x ∈ U ,

(cf )(x) := cf (x).

Now assume instead that U ⊂ Rd , f : U → Rq , and g : U → R, then the function


f g : U → Rq is defined by, for all x ∈ U ,

(f g)(x) := f (x)g(x).

We define gf := f g.
Moreover, if, for all x ∈ U , g(x) 6= 0, then we define f /g : U → Rq (also written $\frac{f}{g}$ : U → Rq ) by,
for all x ∈ U ,
(f /g)(x) := f (x)/g(x).
Finally, if V ⊂ Rd , W ⊂ Rp , f : V → Rp , g : W → Rq , and f (V ) ⊂ W , then the
function g ◦ f : V → Rq is defined by, for all x ∈ V ,

(g ◦ f )(x) := g(f (x)).

For the definition of g ◦ f above, we remember from Definition A.3 that the definition of the
image of the set V ⊂ Rd under the function f : V → Rp is defined as
f (V ) := {y ∈ Rp : there exists an x ∈ V such that y = f (x)}.
So the image of V under f consists of all elements y in the codomain of f for which there is a
corresponding element in V which gets mapped to y by f .
The following lemma contains useful results that tell us how we can combine limits.

Lemma A.12. Let U ⊂ Rd , f : U → Rq , g : U → Rq , c ∈ R, and assume a ∈ U . Assume


limx→a f (x) = Lf ∈ Rq and limx→a g(x) = Lg ∈ Rq . Then

1. limx→a (f + g)(x) = Lf + Lg ,

2. limx→a (cf )(x) = cLf .

Now let f : U → Rq and g : U → R. Assume limx→a f (x) = Lf ∈ Rq and limx→a g(x) =
Lg ∈ R. Then

3. limx→a (f g)(x) = Lf Lg ,
4. if, for all x ∈ U , g(x) 6= 0, and if Lg 6= 0, then limx→a (f /g)(x) = Lf /Lg .

Now assume that V ⊂ Rd , W ⊂ Rp , f : V → Rp , g : W → Rq , f (V ) ⊂ W , limx→a f (x) = Lf ∈ W ,
and limy→Lf g(y) = Lg ∈ Rq . Moreover, assume that at least one of the following assumptions
holds.

(i) The function g is continuous at Lf , or

(ii) there exists an open subset V 0 ⊂ V such that, for all x ∈ V 0 \ {a}, f (x) 6= Lf .

Then

5. limx→a (g ◦ f )(x) = Lg .

If d = 1 the results above also hold if a = −∞ or a = +∞. If q = 1 and if we use


the addition, multiplication, and division rules from the extended real number line R̄ (Def-
inition A.9), then the results in 1, 2, 3, and 4 also hold for Lf , Lg ∈ R̄ as long as the
corresponding operation (Lf + Lg , cLf , Lf Lg , or Lf /Lg , respectively) is well-defined on R̄.
If p = 1 and Lf ∈ R̄ or if q = 1 and Lg ∈ R̄, then the result in 5 also holds.
Proof. We will prove statements 1 and 5 and leave the proofs of the other statements as an
exercise to the reader.
To prove statement 1, let ε > 0, then there exist δ1 > 0 and δ2 > 0 such that, for all x ∈ U
with kx − ak < δ1 we have kf (x) − Lf k < ε/2, and, for all x ∈ U with kx − ak < δ2 we have
kg(x) − Lg k < ε/2. Define δ := min(δ1 , δ2 ), then, for all x ∈ U with kx − ak < δ we have
\[
\|(f+g)(x) - (L_f + L_g)\| = \|f(x) + g(x) - L_f - L_g\| \le \|f(x) - L_f\| + \|g(x) - L_g\| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\]
Hence limx→a (f + g)(x) = Lf + Lg .
Now we prove result 5. Let ε > 0. Because limy→Lf g(y) = Lg there exists η > 0 such that
for all y ∈ W ,
0 < ky − Lf k < η ⇒ kg(y) − Lg k < ε. (27)
Since limx→a f (x) = Lf there exists δ > 0 such that for all x ∈ V ,

0 < kx − ak < δ ⇒ kf (x) − Lf k < η. (28)

If assumption (i) holds, then g is continuous at Lf and thus, since limy→Lf g(y) = Lg , we
have g(Lf ) = Lg , which means that kg(Lf ) − Lg k = 0 < ε. Hence we can extend the statement
in (27) to
ky − Lf k < η ⇒ kg(y) − Lg k < ε.
Combining this with (28) shows that, for all x ∈ V ,

0 < kx − ak < δ ⇒ kg(f (x)) − Lg k < ε. (29)

If assumption (ii) holds, then there exists a δ 0 ∈ (0, δ] such that, for all x ∈ V ,

0 < kx − ak < δ 0 ⇒ kf (x) − Lf k > 0.

Combining this with (28), we find that, for all x ∈ V ,

0 < kx − ak < δ 0 ⇒ 0 < kf (x) − Lf k < η.

Combining this with (27) we again find (29).


Note: The results in Lemma A.12 also hold if d = 1 and we replace the limits limx→a with
one-sided limits limx→a− or limx→a+ . Prove this yourself.

Lemma A.13. Let U ⊂ Rd .

1. Assume c ∈ Rq . If f : U → Rq is the constant function defined by, for all x ∈ U ,


f (x) := c, then f is continuous (on its domain U ).

2. If g : U → Rd is the identity function, defined by, for all x ∈ U , g(x) := x, then g is


continuous (on its domain U ).

3. The norm function k · k : Rd → R is continuous on Rd .

4. If p : Rd → R is a (multivariate) polynomial, then p is continuous (on Rd ).

5. If q : U → R is a (multivariate) rational function, then q is continuous (on its domain


U ).

6. The square root function √ · : [0, ∞) → R is continuous (on [0, ∞)).

7. The trigonometric functions sin : R → R and cos : R → R are continuous (on R).

8. The exponential function exp : R → R is continuous (on R); the natural logarithm
log : (0, ∞) → R is continuous (on (0, ∞)).
Proof. Let a ∈ U . To prove continuity at a of the constant function given by, for all x ∈ U ,
f (x) := c, let (xn ) be a sequence in U which converges to a. Then, for all n ∈ N, f (xn ) = c
and thus (f (xn )) converges to c. Hence, by Lemma 10.8, f is continuous at a. Since a is an
arbitrary element of the domain U , f is continuous.
Let a ∈ U . To prove continuity at a of the identity function g, let (xn ) be a sequence in U
which converges to a. Then, for all n ∈ N , g(xn ) = xn and thus the sequence (g(xn )) is
equal to the sequence (xn ) in Rd and hence converges to a = g(a). Thus, by Lemma 10.8, g
is continuous at a and therefore g is continuous.
The continuity of the norm follows from Lemma 5.13 combined with Lemma 10.8.
For multivariate polynomials note that each multivariate polynomial is a sum of terms, each
of which is a product of single-variable polynomials. So, by Lemma 10.11 it suffices to prove
that single-variable polynomials are continuous. We will use mathematical induction to prove
that, for all n ∈ N, all polynomials pn : R → R of degree n are continuous. First let n = 0.
Any polynomial p0 of degree 0 is a constant function and thus continuous by our proof above.
Now assume that, for some k ∈ N, all polynomials of degree k are continuous. Let pk+1
be a polynomial of degree k + 1, then $p_{k+1}(x) = \sum_{i=0}^{k+1} c_i x^i$ for some coefficients ci ∈ R
(i ∈ {0, . . . , k + 1}). Thus
\[
p_{k+1}(x) = x \sum_{i=1}^{k+1} c_i x^{i-1} + c_0.
\]
Because $\sum_{i=1}^{k+1} c_i x^{i-1}$ is a polynomial of degree k, it is continuous by our induction step
assumption. The identity function x 7→ x is continuous by our proof above. Moreover, the

constant function x 7→ c0 is continuous by our proof above as well. Hence, by Lemma 10.11
pk+1 is continuous. We conclude, by mathematical induction, that, for all n ∈ N, polynomials
of degree n are continuous.
Per definition, (multivariate) rational functions are of the form q = p1 /p2 , where p1 and
p2 are (multivariate) polynomials. We have just proved that (multivariate) polynomials are
continuous on their domain, thus by statement 4 in Lemma 10.11 we get that q is continuous
at every element in its domain which is not a root of the polynomial p2 . However, if x ∈ Rd
is such that p2 (x) = 0, then x cannot be in the domain of q, since p1 /p2 is not defined at x.
Hence q is continuous on its domain.
For the square root function, we consider two separate cases. First let a > 0. We will prove
continuity at a. If x > 0, then x − a = (√x − √a)(√x + √a), thus
\[
\left|\sqrt{x} - \sqrt{a}\right| = \frac{|x - a|}{|\sqrt{x} + \sqrt{a}|} < \frac{|x - a|}{\sqrt{a}},
\]
where the inequality follows from the fact that x > 0 and a > 0 and thus |√x + √a| =
√x + √a > √a. Let ε > 0 and let (xn ) be a sequence in [0, ∞) converging to a. Then, for n large enougha
we have that |xn − a| < √a ε. In particular, since a > 0, we can assume that for n large
enough we have xn > 0. Hence, by the inequality above, for n large enough we have
\[
\left|\sqrt{x_n} - \sqrt{a}\right| < \frac{|x_n - a|}{\sqrt{a}} < \frac{\sqrt{a}\,\varepsilon}{\sqrt{a}} = \varepsilon.
\]
Thus the square root function is continuous at a > 0. Now assume that a = 0 instead. Let
ε > 0 and assume (xn ) is a sequence in [0, ∞) which converges to a. Now we have that,
for all n ∈ N, |√xn − √a| = √xn . Since (xn ) converges to 0, for n large enough we have
xn = |xn − 0| < ε². Hence, for n large enough
\[
\left|\sqrt{x_n} - \sqrt{a}\right| = \sqrt{x_n} < \sqrt{\varepsilon^2} = \varepsilon.
\]

Hence, the square root function is also continuous at a = 0.


We will not include a proof of the continuity of sin, cos, exp, and log hereb . We will not
use these functions in the building of our theory, but sometimes we will encounter them in
examples and it will be useful to be able to use the fact that they are continuous. For a proof
of their continuity, see rigorous analysis or calculus textbooks.
a
“For n large enough” is shorthand language that we often use instead of writing “there exists an N ∈ N
such that for all n ≥ N ”. This is especially useful if we do not need the specific value of N later in the proof.
b
In fact, we have not even properly defined these functions! However, we assume you have a ‘working
knowledge’ of them from your A Levels and Year 1 modules.

A.4 Series
The following result for the finite sum is often useful.

Lemma A.14. Let n ∈ N, then
\[
\sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}. \tag{30}
\]

Proof. We prove this by induction. Let n = 1, then
\[
\sum_{k=1}^{n} k^2 = 1^2 = 1 = \frac{1(2)(3)}{6}.
\]
Now let m ∈ N and assume that (30) holds for n = m. Then
\[
\sum_{k=1}^{m+1} k^2 = (m+1)^2 + \sum_{k=1}^{m} k^2 = (m+1)^2 + \frac{m(m+1)(2m+1)}{6} = \frac{6(m+1)^2 + m(m+1)(2m+1)}{6}
= \frac{(m+1)\bigl(6(m+1) + m(2m+1)\bigr)}{6} = \frac{(m+1)(m+2)(2(m+1)+1)}{6}.
\]
The required result now follows by induction.
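Of course an identity such as (30) is proved by induction, not by checking cases, but a quick numerical check can be reassuring. A minimal Python sketch (illustration only):

```python
# A quick (non-proof!) check of the identity (30) for a few values of n.
for n in (1, 2, 10, 100):
    lhs = sum(k ** 2 for k in range(1, n + 1))
    rhs = n * (n + 1) * (2 * n + 1) // 6
    print(n, lhs, rhs, lhs == rhs)
```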
Next we remember the definition of a series. A series is often interpreted as an ‘infinite
sum’, but be warned! This does not mean anything more or less than what Definition A.15
says it means. Do not project your own expectations of what an ‘infinite sum’ should be onto
this definition. When working with series, you can only use the properties that follow from the
definition, not any unproven properties you would like to hold. In fact, we will see that many of
the properties you might wish (and intuitively expect) to hold, do not always hold for series.

Definition A.15. Let (xk ) be a sequence in R. The partial sum of the first n ∈ N terms
is
\[
S_n := \sum_{k=1}^{n} x_k.
\]
If the sequence (Sn ) converges in R to a limit S ∈ R, then we define
\[
\sum_{k=1}^{\infty} x_k := S.
\]
We say that the series $\sum_{k=1}^{\infty} x_k$ converges to S. If, for all S ∈ R, the series does not
converge to S, we say the series diverges.

Note: We have only given the definition here for a series of real numbers, but note that this
definition can be easily extended to a series of vectors in Rd . We will not need those, however,
in this module.
Note that there is a big difference between the sequence (xk ) converging and the series
$\sum_{k=1}^{\infty} x_k$ converging. The value of the series is given by $\sum_{k=1}^{\infty} x_k = \lim_{n \to \infty} \sum_{k=1}^{n} x_k$, if that
limit value exists, of course. The limit value of the sequence (xk ) is quite different. In fact,
hopefully you remember from G11ACF/MATH1005 that, if the series $\sum_{k=1}^{\infty} x_k$ converges, then
the sequence (xk ) converges to 0.
Example: From G11ACF/MATH1005 you remember the geometric series.
Let r ∈ R and a ∈ R, and define $S_n := \sum_{k=0}^{n} ar^k$. Then $rS_n = \sum_{k=0}^{n} ar^{k+1} = \sum_{k=1}^{n+1} ar^k$.
Hence rSn − Sn = ar^{n+1} − a and thus, if r 6= 1,
\[
S_n = a\,\frac{r^{n+1} - 1}{r - 1}. \tag{31}
\]
(If r = 1, then of course $S_n = \sum_{k=0}^{n} a = (n+1)a$.) From this we see that, if |r| < 1, the value
of the geometric series is
\[
\sum_{k=0}^{\infty} ar^k = \lim_{n \to \infty} S_n = \lim_{n \to \infty} a\,\frac{1 - r^{n+1}}{1 - r} = \frac{a}{1 - r}. \tag{32}
\]
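The formulas (31) and (32) are easy to explore numerically. The following Python sketch (with the arbitrary choices a = 2 and r = 1/2, purely as an illustration) compares the partial sums Sn with the closed form (31) and with the limit a/(1 − r) from (32).

```python
# Partial sums of the geometric series versus the closed forms (31) and (32).
a, r = 2.0, 0.5                                        # arbitrary choices with |r| < 1

def S(n):
    return sum(a * r ** k for k in range(n + 1))

for n in (1, 5, 10, 50):
    print(n, S(n), a * (r ** (n + 1) - 1) / (r - 1))   # formula (31)
print("limit:", a / (1 - r))                           # formula (32)
```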

Lemma A.16. Let (xk ) be a sequence in R and assume the series $\sum_{k=1}^{\infty} x_k$ converges.
Then the sequence (xk ) converges to 0.

Proof. For all n ∈ N, define the partial sum $S_n := \sum_{k=1}^{n} x_k$. Then, by definition of conver-
gence of the series $\sum_{k=1}^{\infty} x_k$, the sequence (Sn ) converges, say to S ∈ R. For all m ∈ N, define
nm := m + 1. Then (Snm ) is a subsequence of (Sn ) and thus also converges to S. Note that
$S_{n_m} = \sum_{k=1}^{m+1} x_k$ and thus, for all m ∈ N, xm+1 = Snm − Sm . Thus, by the sum rule for limits,
(xm+1 ) converges to S − S = 0 and thus so does (xn ) (since the sequence (xm+1 ), indexed by m ∈ N,
is equal to the sequence (xn ) with the first element removed).
Remember that there is no requirement to start labelling sequences with n = 1. This will
be useful in the following lemma.

Lemma A.17. Let (xk ) be a sequence in R and assume the series $\sum_{k=1}^{\infty} x_k$ converges. Let
m ∈ N, then the series $\sum_{k=m+1}^{\infty} x_k$ converges and
\[
\sum_{k=m+1}^{\infty} x_k = \sum_{k=1}^{\infty} x_k - \sum_{k=1}^{m} x_k.
\]
Pn
Proof. For all n ∈ N, define $S_n := \sum_{k=1}^{n} x_k$. Then, by assumption (Sn ) converges, say to
S ∈ R. So, per definition, $\sum_{k=1}^{\infty} x_k = S$. Also define, for all n ≥ m + 1, $\tilde{S}_n := \sum_{k=m+1}^{n} x_k$.
Then
\[
\tilde{S}_n = S_n - \sum_{k=1}^{m} x_k.
\]
Since (Sn ) converges and $-\sum_{k=1}^{m} x_k$ does not depend on n, we find by the sum rule of limits
that (S̃n ) converges and, moreover, that
\[
\lim_{n \to \infty} \tilde{S}_n = \lim_{n \to \infty} S_n - \sum_{k=1}^{m} x_k,
\]

which concludes the proof.

Lemma A.18. Let (xk ) be a sequence in R and assume the series $\sum_{k=1}^{\infty} x_k$ converges. If,
for all k ∈ N, xk ≥ 0, then $\sum_{k=1}^{\infty} x_k \ge 0$. If, for all k ∈ N, xk ≤ 0, then $\sum_{k=1}^{\infty} x_k \le 0$.

Proof. Assume that, for all k ∈ N, xk ≥ 0. For all n ∈ N, define the partial sum $S_n := \sum_{k=1}^{n} x_k$.
Then, for all n ∈ N, Sn ≥ 0 (prove this by induction!). By definition, the sequence
(Sn ) converges and the value of the converging series $\sum_{k=1}^{\infty} x_k$ is equal to the limit value of
(Sn ). By Lemma 3.9 this limit value is nonnegative.
If, for all k ∈ N, xk ≤ 0, the proof follows in the same way, mutatis mutandis.
Lemma A.19 and Corollary A.20 will look familiar if you remember the comparison test for
series from G11ACF/MATH1005.

Lemma A.19. Let (xk ), (yk ), and (zk ) be sequences in R. Assume that, for all k ∈ N,
xk ≤ yk ≤ zk .

1. If, for all k ∈ N, yk ≥ 0 and the series $\sum_{k=1}^{\infty} z_k$ converges, then the series $\sum_{k=1}^{\infty} y_k$
converges and
\[
\sum_{k=1}^{\infty} y_k \le \sum_{k=1}^{\infty} z_k.
\]

2. If, for all k ∈ N, xk ≥ 0 and the series $\sum_{k=1}^{\infty} x_k$ diverges, then the series $\sum_{k=1}^{\infty} y_k$
diverges.

3. If, for all k ∈ N, yk ≤ 0 and the series $\sum_{k=1}^{\infty} x_k$ converges, then the series $\sum_{k=1}^{\infty} y_k$
converges and
\[
\sum_{k=1}^{\infty} x_k \le \sum_{k=1}^{\infty} y_k.
\]

4. If, for all k ∈ N, zk ≤ 0 and the series $\sum_{k=1}^{\infty} z_k$ diverges, then the series $\sum_{k=1}^{\infty} y_k$
diverges.

Proof. For all n ∈ N, define the partial sums $Y_n := \sum_{k=1}^{n} y_k$ and $Z_n := \sum_{k=1}^{n} z_k$.
To prove statement 1, assume that, for all k ∈ N, yk ≥ 0. Then by assumption,
for all k ∈ N, zk ≥ 0, and thus (prove this by induction!), for all n ∈ N, Zn ≤ Zn+1 . By
assumption (Zn ) converges, say to Z ∈ R, and thus, by Lemma 3.7, for all n ∈ N, Zn ≤ Z.
A similar proof by induction (try it!) shows that, for all n ∈ N,

Yn ≤ Yn+1 ≤ Zn+1 ≤ Z.

Hence the sequence (Yn ) is a non-decreasing sequence which is bounded above and thus, by
Corollary 3.11 it converges to a limit Y ∈ R and Y ≤ Z.
Statement 3 follows in a very similar fashion and its proof is left as an exercise.
P∞
To prove statement 2, assume that, for all k ∈ N, xP k ≥ 0 and the series k=1 xk diverges.
For a proof by contradiction, assume that the series ∞ k=1 yk converges. Since
Pwe have that,
series ∞ ∞
P
for allPk ∈ N, yk ≥ xk ≥ 0, statement 1, applied to the P x
k=1 k and k=1 yk , shows
∞ ∞
that k=1 xk converges. This is a contradiction, hence k=1 yk diverges.
Statement 4 again follows in a similar way and we leave its proof as an exercise for you.

Corollary A.20. Let (xk ) and (yk ) be sequences in R. If, for all k ∈ N, |xk | ≤ |yk | and if
the series $\sum_{k=1}^{\infty} |y_k|$ converges, then the series $\sum_{k=1}^{\infty} |x_k|$ converges and
\[
0 \le \sum_{k=1}^{\infty} |x_k| \le \sum_{k=1}^{\infty} |y_k|.
\]

Proof. This follows immediately from statement 1 in Lemma A.19 and from Lemma A.18.

Lemma A.21 (Absolute convergence test). Let (ak ) be a sequence in Ra . If the series
$\sum_{k=1}^{\infty} |a_k|$ converges, then the series $\sum_{k=1}^{\infty} a_k$ converges.
a
If you are familiar with sequences of complex numbers, you can check that the result of this lemma

generalises to that context, by considering the real and imaginary parts of the sequence separately. This
falls outside the scope of this module.

Proof. First we note that, for all k ∈ N,

0 ≤ ak + |ak | ≤ 2|ak |.

In particular, if we define, for all N ∈ N, the partial sums


N
X
SN := (ak + |ak |)
k=1

then (SN ) is a non-decreasing sequence. Moreover, by a similar argument as in the proof of


Lemma A.19 (details left as an exercise to the reader) we find that
N
X ∞
X
0 ≤ SN ≤ 2 |ak | ≤ 2 |ak |.
k=1 k=1

Because the series on the right hand side converges, we see that the sequence (SN ) is bounded.
By the monotone convergence theorem (Theorem 3.8) we conclude that (SN ) converges and
thus the series $\sum_{k=1}^{\infty} (a_k + |a_k|)$ converges. Now, for all N ∈ N we have
\[
\sum_{k=1}^{N} a_k = S_N - \sum_{k=1}^{N} |a_k|.
\]

Since both terms on the right are partial sums of convergent series, the right hand side
converges as N → ∞ and thus so does the left hand side. (The details of this argument are
left as an exercise to the reader.)

A.5 Derivatives

Theorem A.22. Let I ⊂ R be an open interval, a ∈ I, and let f : I → R and g : I → R be


functions which are differentiable at a. Assume λ ∈ R. Then the functions f + g, λf , f g
are differentiable at a, and, if g(a) 6= 0, so is the function 1/g. Moreover,

1. (f + g)0 (a) = f 0 (a) + g 0 (a) (sum rule),

2. (λf )0 (a) = λf 0 (a),

3. (f g)0 (a) = f 0 (a)g(a) + f (a)g 0 (a) (product rule),


4. if g(a) 6= 0, then (1/g)′(a) = −g′(a)/(g(a))².

Proof. The proofs are optional in the context of this module and are included in Ap-
pendix B.11.

Corollary A.23 (Quotient rule). Let I ⊂ R be an open interval, a ∈ I, and let f :


I → R and g : I → R be functions which are differentiable at a. Then the function f /g is
differentiable at a and
\[
\left(\frac{f}{g}\right)'(a) = \frac{f'(a)g(a) - f(a)g'(a)}{(g(a))^2}.
\]

Proof. This follows from Theorem A.22 by combining the product rule in point 3 with the
derivative of the reciprocal in point 4. We leave the details as an exercise for the reader.

Theorem A.24 (Chain rule). Let I ⊂ R and J ⊂ R be open intervals and let f : I → R
and g : J → R. Assume that f (I) ⊂ J, let a ∈ I, and define b := f (a) ∈ J. If f is
differentiable at a and g is differentiable at b, then the function h := g ◦ f : I → R is
differentiable at a and
h0 (a) = f 0 (a)g 0 (b).

Proof. Since f is differentiable at a and g is differentiable at b, by Lemma 13.3 we have that


there exist functions εa : I → R and ρb : J → R such that, for all x ∈ I and for all y ∈ J,

\[
f(x) = f(a) + (x - a)\bigl(f'(a) + \varepsilon_a(x)\bigr),
\qquad
g(y) = g(b) + (y - b)\bigl(g'(b) + \rho_b(y)\bigr),
\]
and moreover limx→a εa (x) = 0 and limy→b ρb (y) = 0. If we combine these expressions for
y := f (x), we find

\[
h(x) - h(a) = g(f(x)) - g(b) = \bigl(f(x) - b\bigr)\bigl(g'(b) + \rho_b(f(x))\bigr)
= (x - a)\bigl(f'(a) + \varepsilon_a(x)\bigr)\bigl(g'(b) + \rho_b(f(x))\bigr)
= (x - a)f'(a)g'(b) + (x - a)\delta_{a,b}(x), \tag{33}
\]

where we defined the function δa,b : I → R, x 7→ εa (x)g 0 (b) + εa (x)ρb (f (x)) + f 0 (a)ρb (f (x)).
Because f is differentiable at a, by Theorem 13.4 f is also continuous at a and thus
limx→a f (x) = f (a) = b. Hence, by Lemma A.12, limx→a ρb (f (x)) = limy→b ρb (y) = 0. Thus
limx→a δa,b (x) = 0. From (33) and Lemma 13.3 we then conclude that h is differentiable at
a with h0 (a) = f 0 (a)g 0 (b) .

Note: Carefully note that in the chain rule above, g 0 is evaluated at b = f (a) ∈ J, not at a; in
fact, a might not even be an element of J, the domain of g.

Lemma A.25. Let I ⊂ R be an open interval.

1. Let c ∈ R and let f : I → R, x 7→ c be a constant function. Then, for all x ∈ I,


f 0 (x) = 0.

2. Let f : I → R, x 7→ x. Then, for all x ∈ I, f 0 (x) = 1.

3. Let n ∈ R \ {0, 1} and let f : I → R, x 7→ xn . Then, for all x ∈ I for which xn−1 is
well-defined, f 0 (x) = nxn−1 .

4. For all x ∈ R, (sin)0 (x) = cos(x) and (cos)0 (x) = − sin(x).

5. For all x ∈ R, (exp)0 (x) = exp(x).

6. For all x ∈ (0, ∞), (log)0 (x) = 1/x.

Proof. To prove 1, let x ∈ I. Then

\[
\lim_{x' \to x} \frac{f(x) - f(x')}{x - x'} = \lim_{x' \to x} \frac{c - c}{x - x'} = \lim_{x' \to x} 0 = 0.
\]

For statement 2, let x ∈ I. Then

\[
\lim_{x' \to x} \frac{f(x) - f(x')}{x - x'} = \lim_{x' \to x} \frac{x - x'}{x - x'} = \lim_{x' \to x} 1 = 1.
\]

We will not prove statement 3 in full generality here. We will provide the proof for n ∈ N \ {1}.
Note that in this case, for all x ∈ R, x^{n−1} is well-defined. Let x ∈ I. Using the formulation
of derivative from Lemma 13.2 together with the binomial theorem, we find
\[
f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} = \lim_{h \to 0} \frac{(x+h)^n - x^n}{h}
= \lim_{h \to 0} \frac{x^n + nx^{n-1}h + \sum_{k=2}^{n} \binom{n}{k} x^{n-k}h^k - x^n}{h}
= \lim_{h \to 0} \left( nx^{n-1} + \sum_{k=2}^{n} \binom{n}{k} x^{n-k}h^{k-1} \right) = nx^{n-1}.
\]

As we have discussed before, for example in the proof of Lemma A.13, we have not rigorously
defined the functions cos, sin, exp, and log in this module, but we will on some occasions
encounter them in examples (but they will never appear as necessary parts of the rigorous
build-up of the theory in this module). Therefore it is useful to know their derivatives, but
we will omit the proofs of 4, 5, and 6 here.

B Optional material
The extra material included here is optional, and some of it is quite difficult.
Many of the proofs in this appendix were written by Prof. J. K. Langley.

B.1 Dirichlet’s test and convergence of the series in (2)


In this section we present Dirichlet’s test for the convergence of a series and then apply it to
prove that the series in (2) converges. In this section we will make use of complex numbers
and in the proof of Lemma B.2 we assume you are familiar with trigonometric functions, their
relationship with complex numbers (in particular Euler’s formula), and trigonometric identities.
Strictly speaking these all fall outside the scope of this particular module, but you will have
seen many if not all of these in A Levels or Year 1.

Lemma B.1 (Dirichlet’s test for the convergence of a series). Let (an ) ⊂ R and (bn ) ⊂ R
be sequencesa . Assume that (an ) is non-increasing and converges to 0. Also assume there
exists M ≥ 0 such that, for all N ∈ N,
\[
\left|\sum_{n=1}^{N} b_n\right| \le M.
\]
P∞
Then the series n=1 an bn converges.
a
In fact, the result in this lemma also holds if (bn ) is a sequence of complex numbers. The proof is a
straightforward generalisation of the proof we give here. Since we have not studied such sequences in this
module, however, we will restrict ourselves to the case in which (bn ) is a sequence of real numbers.

Proof. For all N ∈ N, define the partial sums


\[
S_N := \sum_{n=1}^{N} a_n b_n \quad \text{and} \quad B_N := \sum_{n=1}^{N} b_n.
\]
Note that, for all n ∈ N with n ≥ 2, we have bn = Bn − Bn−1 and thus, for all N ∈ N,
\[
\sum_{n=1}^{N} a_n b_n = \sum_{n=2}^{N} a_n(B_n - B_{n-1}) + a_1 b_1 = \sum_{n=2}^{N} a_n B_n - \sum_{n=1}^{N-1} a_{n+1} B_n + a_1 b_1
= \sum_{n=2}^{N-1} B_n(a_n - a_{n+1}) + a_N B_N - a_2 B_1 + a_1 b_1
= \sum_{n=1}^{N} B_n(a_n - a_{n+1}) - a_1 B_1 + a_2 B_1 - a_N B_N + a_{N+1} B_N + a_N B_N - a_2 B_1 + a_1 b_1
= \sum_{n=1}^{N} B_n(a_n - a_{n+1}) + a_{N+1} B_N,
\]
where we used that B1 = b1 in the final equality.
By assumptions the sequence (BN ) is bounded and the sequence (an ) converges to 0, therefore
the sequence (aN +1 BN ) converges to zero (details left as an exercise to the reader). It remains
to show that the sequence (TN ) of partial sums
\[
T_N := \sum_{n=1}^{N} B_n(a_n - a_{n+1}) \qquad (N \in \mathbb{N})
\]
converges. Since (an ) is non-increasing, we have that, for all n ∈ N,

|Bn (an − an+1 )| ≤ |Bn ||an − an+1 | = |Bn |(an − an+1 ) ≤ M (an − an+1 )

and thus
\[
|T_N| \le \sum_{n=1}^{N} |B_n(a_n - a_{n+1})| \le \sum_{n=1}^{N} M(a_n - a_{n+1}) = M(a_1 - a_{N+1}).
\]

Hence the series $\sum_{n=1}^{\infty} |B_n(a_n - a_{n+1})|$ converges by the comparison test in Lemma A.19. Finally,
by the absolute convergence test in Lemma A.21, the series $\sum_{n=1}^{\infty} B_n(a_n - a_{n+1})$ converges as well,
i.e. the sequence (TN ) converges.

Lemma B.2. Let x ∈ R be such that cos x 6= 1 and let N ∈ N. Then


\[
\sum_{n=1}^{N} \sin(nx) = \frac{1}{2} \cdot \frac{\sin x - 2\sin\bigl(\tfrac{1}{2}x\bigr)\cos\bigl((N + \tfrac{1}{2})x\bigr)}{1 - \cos x}.
\]

Proof. Note that we have to exclude the x-values for which the denominator 1 − cos x is
zeroa .
If z ∈ C, we use Im z to denote the imaginary part of the complex number z. By Euler’s
formulab we know that sin(nx) = Im einx . Hence we compute
N N N
!
X X X e i(N +1)x − 1
sin(nx) = sin(nx) = Im einx = Im
eix − 1
n=1 n=0 n=0
!
1 ei(N +1)x − 1 e−i(N +1)x − 1
= −
2i eix − 1 e−ix − 1
1 1 
iN x i(N +1)x −ix −iN x −i(N +1)x ix

= e − e + 1 − e − e + e − 1 + e
2i 2 − eix − e−ix
1 1  
= 2i sin(N x) − 2i sin (N + 1)x + 2i sin x
2i 2 − 2 cos x 
1 sin x + sin(N x) − sin (N + 1)x
=
2 1 − cos x
1 sin x − sin (N + 12 )x + 12 x − sin (N + 12 )x − 21 x
 
=
2 1 − cos x
1
+ 21 )x
 
1 sin x − 2 sin 2 x cos (N
= .
2 1 − cos x
For the third equality we used the formula for the partial sum of a geometric series (see (31))
and for the final equality we used the trigonemetric identity that says that, for all α, β ∈ R,
sin(α + β) − sin(α − β) = 2 cos(α) sin(β).
a PN
For those values of x we of course have that, for all n ∈ N, sin(nx) = 0 and thus n=1 sin(nx) = 0.
b ix
e = sin x + i cos x

The following corollary shows that the series in (2) converges for all x ∈ R.

129
Corollary B.3. Let x ∈ R. Then the series

\[
\sum_{n=1}^{\infty} \frac{(-1)^{n+1}\sin(nx)}{n}
\]

converges.

Proof. If x is an odd integer multiple of π, then, for all n ∈ N, sin(nx) = 0, and thus the
series converges (and has value 0). (The details are left as an exercise to the reader.)
If x is not an odd integer multiple of π, then cos(x + π) 6= 1 and thus, by Lemma B.2, for all
N ∈ N,.
N
 1 sin(x + π) − 2 sin 21 (x + π) cos (N + 12 )(x + π)
 
X
sin n(x + π) = .
2 1 − cos(x + π)
n=1

Since, for all N ∈ N, | cos (N + 12 )(x + π) | ≤ 1, we have, for all N ∈ N,




N
1 sin(x + π) − 2 sin 21 (x + π) cos (N + 12 )(x + π)
 
X 
sin n(x + π) =
2 1 − cos(x + π)
n=1

sin 12 (x + π)

1 sin(x + π)
≤ + .
2 1 − cos(x + π) 1 − cos(x + π)

Note that the right hand side is independent of N . Since we are working pointwise, at a
fixed x, the right hand side is a nonnegative constant. Call it M
 . Hence, if we define, for all
n ∈ N, an := n1 and bn := (−1)n+1 sin(nx) = − sin n(x + π) , then we find that (an ) is a
non-increasing sequence which converges to 0 and (bn ) satisfies

X N
X 
bn ≤ sin n(x + π) ≤ M.
n=1 n=1

Hence, the conditions of Dirichlet’s test (Lemma B.1) are satisfied and thus
∞ ∞
X X (−1)n+1 sin(nx)
an bn =
n
n=1 n=1

converges.
In Appendix B.2 we compute the value of the series from (2) for all x ∈ (−π, π).
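Numerics are no substitute for the proof above, but it can be instructive to watch the partial sums of the series in Corollary B.3 settle down. The Python sketch below (the choice x = 1.3 is an arbitrary one of ours) prints a few partial sums; by Lemma B.4 in the next subsection they should approach x/2 = 0.65.

```python
# Partial sums of the series in Corollary B.3 at a fixed x (illustration only).
import math

x = 1.3                                            # an arbitrary fixed value of x

def partial_sum(N):
    return sum((-1) ** (n + 1) * math.sin(n * x) / n for n in range(1, N + 1))

for N in (10, 100, 1000, 10_000):
    print(N, partial_sum(N))                       # compare with x/2 = 0.65 (Lemma B.4)
```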

B.2 A Fourier series formula

Lemma B.4. For all x ∈ (−π, π) define



\[
S(x) := \sum_{n=1}^{\infty} \frac{(-1)^{n+1}\sin(nx)}{n} = \sin x - \frac{\sin 2x}{2} + \frac{\sin 3x}{3} - \dots.
\]

Then, for all x ∈ (−π, π), S(x) = x/2.

130
To prove this, we will use ideas from complex numbers, inverse functions, and (of course)
trigonometric functions and so this falls completely outside the scope of the current module.
This result, however, is not fundamental to the theory of mathematical analysis which we are
building in this module. As explained we use S simply as an illustrative example on page 22 to
show what kind of things can go wrong if we are not rigorous in our dealings with (in this case)
‘infinite sums’ (i.e. a series) of functions.
Proof of Lemma B.4. Let N ∈ N and, for all x ∈ (−π, π) set
N +1
X (−1)n+1 sin nx sin 2x (−1)N +2 sin(N + 1)x
YN (x) := = sin x − + ... + .
n 2 N +1
n=1

By Euler’s formula from G11LMA/MATH1007 (“Linear Mathematics”)

eit = cos t + i sin t,

we have YN (x) = Im ZN (x), where, for all x ∈ (−π, π),


N +1
X (−1)n+1 einx ei2x (−1)N +2 ei(N +1)x
ZN (x) := = eix − + ... +
n 2 N +1
n=1
!
ix N eiN x i(x+π) iN (x+π)
 
e (−1) e e
= eix 1 − + ... + = eix 1 + + ... +
2 N +1 2 N +1
Z 1
= eix 1 + sei(x+π) + . . . + sN eiN (x+π) ds.
0

Here we have used de Moivre’s formula (eit )n = eint (from G11LMA/MATH1007) for n ∈ Z
and the fact that eiπ = −1. Now 0 < x+π < 2π, and so for 0 ≤ s ≤ 1 we have u = sei(x+π) 6= 1
and hence

u(1 + u + . . . + uN ) = u + . . . + uN +1 = (1 + u + . . . + uN ) + uN +1 − 1,

which gives
1 − uN +1
1 + u + . . . + uN = .
1−u
This means that, for all x ∈ (−π, π),
1 1
1 − sN +1 ei(N +1)(x+π) eix + sN +1 ei(N +2)(x+π)
Z Z
ix
ZN (x) = e ds = ds,
0 1 − sei(x+π) 0 1 + seix

and YN (x) is the imaginary part of this integral.


The complex number 1 + seix has modulus

|1 + seix | ≥ Re (1 + seix ) = 1 + s cos x.

Let s ∈ [0, 1] and x ∈ (−π, π). If cos x ≥ 0, then 1 + s cos x ≥ 1. If cos x < 0, then 1 + s cos x ≥ 1 + cos x > 0 (note that cos x > −1 because x ∈ (−π, π)). If we define, for all x ∈ (−π, π),
$$c(x) := \begin{cases} 1 & \text{if } \cos x \ge 0,\\ 1 + \cos x & \text{if } \cos x < 0,\end{cases}$$
then, for all x ∈ (−π, π), 1 + s cos x ≥ c(x) > 0. Since, for all t ∈ R, |e^{it}| = √(cos²t + sin²t) = 1, this means that
$$\left|\frac{e^{i(N+2)(x+\pi)}}{1 + s e^{ix}}\right| \le \frac{1}{c(x)}$$
and thus
$$\left|\operatorname{Im}\frac{e^{i(N+2)(x+\pi)}}{1 + s e^{ix}}\right| \le \frac{1}{c(x)}.$$
Therefore we get
$$\left|\operatorname{Im}\int_0^1 \frac{s^{N+1} e^{i(N+2)(x+\pi)}}{1 + s e^{ix}}\, ds\right| = \left|\int_0^1 \operatorname{Im}\frac{s^{N+1} e^{i(N+2)(x+\pi)}}{1 + s e^{ix}}\, ds\right| \le \int_0^1 \frac{s^{N+1}}{c(x)}\, ds = \frac{1}{(N+2)c(x)} \to 0$$
as N → ∞. This means that for x ∈ (−π, π) and as N → ∞ we have
$$Y_N(x) \to \operatorname{Im}\left(\int_0^1 \frac{e^{ix}}{1 + s e^{ix}}\, ds\right) = \int_0^1 \operatorname{Im}\left(\frac{e^{ix}}{1 + s e^{ix}}\right) ds = \int_0^1 \operatorname{Im}\left(\frac{1}{s + e^{-ix}}\right) ds$$
and so
$$Y_N(x) \to I(x) := \int_0^1 \frac{\sin x}{1 + s^2 + 2s\cos x}\, ds.$$
So we just need to show that I(x) = x/2.
To compute I(x) we can note that the substitution s = 1/t gives
$$I(x) = \int_1^{\infty} \frac{\sin x}{1 + t^{-2} + 2t^{-1}\cos x}\,\frac{1}{t^2}\, dt = \int_1^{\infty} \frac{\sin x}{1 + t^2 + 2t\cos x}\, dt$$
and so
$$2I(x) = \int_0^{\infty} \frac{\sin x}{1 + s^2 + 2s\cos x}\, ds = \int_0^{\infty} \frac{\sin x}{(s + \cos x)^2 + \sin^2 x}\, ds.$$
For x ∈ (0, π) we now substitute s + cos x = u sin x to get
$$2I(x) = \int_{\cot x}^{\infty} \frac{du}{u^2 + 1} = \lim_{u\to\infty}\tan^{-1}(u) - \tan^{-1}(\cot x) = \lim_{u\to\infty}\tan^{-1}(u) - \tan^{-1}\Bigl(\tan\Bigl(\frac{\pi}{2} - x\Bigr)\Bigr) = \frac{\pi}{2} - \Bigl(\frac{\pi}{2} - x\Bigr) = x.$$
We see that I(0) = 0, while if x ∈ (−π, 0) we find I(x) = −I(−x) = −(−x/2) = x/2.
We conclude this section by remarking that, although this proof is evidently hard, Fourier
series are extremely powerful and versatile objects and the module G12DEF/MATH2008 (“Dif-
ferential Equations and Fourier Analysis”) will focus not on proof but rather on how to compute
and use them.
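Although numerical experiments are no substitute for a proof, the short Python sketch below (Python is not part of this module; the snippet is purely illustrative) computes partial sums of S(x) and compares them with x/2 for a few values of x ∈ (−π, π). The convergence is slow because the terms only decay like 1/n.

```python
import math

def partial_sum(x, N):
    """Partial sum of sum_{n=1}^{N} (-1)^(n+1) sin(n x) / n."""
    return sum((-1) ** (n + 1) * math.sin(n * x) / n for n in range(1, N + 1))

for x in [-2.0, 0.5, 1.0, 2.5]:
    print(x, [round(partial_sum(x, N), 4) for N in (10, 100, 10000)], "x/2 =", x / 2)
```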

B.3 Proof of the monotone sequence theorem


Remember the statement of the monotone sequence theorem from Theorem 3.8. Now we will
prove it.

Proof of Theorem 3.8. First assume that (x_n) is non-decreasing for n ≥ N. Let A be the set consisting of all the numbers x_n in the sequence for n ≥ N: A := {x_n : n ≥ N}. We consider two complementary cases: (i) A is not bounded above and (ii) A is bounded above.
In case (i), first we remember what it means for A to be bounded above. Per definition this means that there exists an M ∈ R such that for all a ∈ A, a ≤ M. In case (i), however, we have that A is not bounded above, so we need the negation of that statement: for all M ∈ R there exists an a ∈ A such that a > M. Since the elements a of A are numbers in our sequence, we get that for all M ∈ R there exists an m ≥ N such that x_m > M. Because the sequence (x_n) is non-decreasing for n ≥ N, we know that for all n ≥ m, x_m ≤ x_n. Hence, for all M ∈ R there exists an m ∈ N such that for all n ≥ m, x_n > M. By Definition 3.5 this means that (x_n) tends to +∞.
In case (ii) we first note that the set A is not empty. We also remember from G11ACF that the least upper bound property for subsets of R says that any nonempty subset of R which is bounded above has a supremum. Thus s := sup A exists. Let ε > 0. Then s − ε is not an upper bound for A and thus there exists an a ∈ A such that a > s − ε. Because the elements of A are numbers from the sequence (x_n), this means that there exists an m ≥ N such that x_m > s − ε. Because the sequence is non-decreasing from index N onwards, this means that for all n ≥ m, x_n > s − ε. Since s is an upper bound for A, we also have that, for all n ≥ m, x_n ≤ s. Hence, for all n ≥ m, |x_n − s| = s − x_n < ε. By Definition 3.2 this means that x_n → s as n → ∞.
This completes the proof for the statement about non-decreasing sequences. If (xn ) is a
non-increasing sequence, then (−xn ) is a non-decreasing sequence and thus, by the result we
have just proven, (−xn ) either tends to +∞ or converges. This means that (xn ) either tends
to −∞ or converges.
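To see the theorem "in action" (a numerical illustration only, not part of the proof), consider the non-decreasing bounded sequence x_n = (1 + 1/n)^n; by the monotone sequence theorem it converges, and its limit is the supremum of its terms, namely e.

```python
import math

# x_n = (1 + 1/n)^n is non-decreasing and bounded above (by 3, for instance),
# so the monotone sequence theorem says it converges to sup{x_n : n >= 1} = e.
terms = [(1 + 1 / n) ** n for n in range(1, 5001)]
print(all(terms[i] <= terms[i + 1] for i in range(len(terms) - 1)))  # monotonicity check
print(terms[9], terms[99], terms[4999], "limit:", math.e)
```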

B.4 L’Hôpital’s rule


From G11ACF/MATH1005 you remember L’Hôpital’s rule. Before we go on to prove it, we
first state and prove a very useful lemma.

Lemma B.5. Let f be a real-valued function whose domain contains an open interval
I ⊂ R. If I is bounded below, let c := inf I, otherwise let c := −∞. For all y ∈ I, define
the set S(y) := {f (x) : x ∈ (c, y)} and define the functions m : I → R̄ and M : I → R̄, by
$$m(y) := \begin{cases} \inf S(y) & \text{if } S(y) \text{ is bounded below},\\ -\infty & \text{otherwise},\end{cases} \qquad M(y) := \begin{cases} \sup S(y) & \text{if } S(y) \text{ is bounded above},\\ +\infty & \text{otherwise}.\end{cases}$$

Let L ∈ R̄. Then limx→c+ f (x) = L if and only if limy→c+ m(y) = limy→c+ M (y) = L.

If I is bounded above, let d := sup I, otherwise let d := +∞. For all y ∈ I, (re)define
the sets S(y) := {f (x) : x ∈ (y, d)} and define m and M as above. Let L ∈ R̄. Then
limx→d− f (x) = L if and only if limy→d− m(y) = limy→d− M (y) = L.

Proof. We will only prove the case where I is bounded below. The case where I is bounded
above can be proven in a similar way and we leave the details to the reader.
First we prove the “only if” statement. Let (yn ) be a sequence in I such that yn → c as
n → ∞. We consider two cases, based on whether L is finite or not. First consider the case
where L ∈ R. Let ε > 0. Then, by the assumption that limx→c+ f (x) = L, there exists a
δ > 0 such that, if x ∈ (c, c + δ) ∩ I, then f (x) ∈ (L − ε, L + ε). Since (yn ) is a sequence in I
which converges to c, there exists an N ∈ N such that for all n ≥ N , yn ∈ (c, c+δ)∩I. Hence,
for all n ≥ N , S(yn ) ⊂ (L − ε, L + ε). In particular, for all n ≥ N , m(yn ) ∈ [L − ε, L + ε] and
M (yn ) ∈ [L − ε, L + ε]. Hence both (m(yn )) and (M (yn )) converge to L.
Next consider the case where L = +∞. Let K > 0. Then there exists a δ > 0 such that,
if x ∈ (c, c + δ) ∩ I, then f (x) > K. Again, as before, there exists N ∈ N such that, for all
n ≥ N , yn ∈ (c, c+δ)∩I. Hence, for all n ≥ N , S(yn ) ⊂ (K, ∞) and in particular m(yn ) ≥ K
and M (yn ) ≥ K. Hence m(yn ) → +∞ as n → ∞ and M (yn ) → +∞ as n → ∞. The final
case, where L = −∞, is proven in a similar way.
Now we prove the “if” statement. Again we consider the cases where L ∈ R and where
L ∈ {−∞, +∞} separately. First assume L ∈ R. Let ε > 0. Since limy→c+ m(y) =
limy→c+ M (y) = L, there exists δm > 0 and δM > 0 such that, for all y ∈ (c, c + δm ) ∩ I,
m(y) ∈ (L − ε, L + ε) and, for all y ∈ (c, c + δM ) ∩ I, M (y) ∈ (L − ε, L + ε). Define
δ := min(δm , δM ), then, for all y ∈ (c, c + δ) ∩ I, S(y) ⊂ (L − ε, L + ε). This means that, if
x ∈ (c, y) ⊂ (c, c + δ) ∩ I, then f (x) ∈ (L − ε, L + ε). Thus limx→c+ f (x) = L.
Finally, consider the case where L = +∞ (as before, the case where L = −∞ follows in
a similar way and the details are left to the reader). Let K > 0. Then there exists a
δ > 0 such that, for all y ∈ (c, c + δ) ∩ I, m(y) > K. Hence, for all y ∈ (c, c + δ) ∩ I,
S(y) ∈ (K, ∞), and thus, for all x ∈ (c, y) ⊂ (c, c + δ) ∩ I, we have f (x) > K. We conclude
that limx→c+ f (x) = +∞.
Note: If L = +∞ in Lemma B.5, it is unnecessary for the "if" statement to include the assumption that lim_{y→c+} M(y) = L (or lim_{y→d−} M(y) = L), as that is a consequence of the assumption lim_{y→c+} m(y) = +∞ (or lim_{y→d−} m(y) = +∞). Similarly, if L = −∞, the assumption lim_{y→c+} m(y) = L (or lim_{y→d−} m(y) = L) can be left out of the "if" statement.

Theorem B.6 (L'Hôpital's rule). Let f and g be real-valued functions both of whose domains contain an open interval I ⊂ R. Let f and g be differentiable on I. Assume that, for all x ∈ I, g(x) ≠ 0, so that f/g is well-defined on I. Moreover, assume that, for all x ∈ I, g′(x) ≠ 0.
If I is bounded below, let c := inf I, otherwise let c := −∞. Moreover, assume one of the following two holds:

1. lim_{x→c+} f(x) = 0 and lim_{x→c+} g(x) = 0, or

2. lim_{x→c+} |g(x)| = ∞.

If there exists an L ∈ R̄ such that lim_{x→c+} f′(x)/g′(x) = L, then lim_{x→c+} f(x)/g(x) = L.

If I is bounded above, let d := sup I, otherwise let d := +∞. Assume that assumption 1 or 2 holds, with every instance of "x → c+" replaced by "x → d−". If there exists an L ∈ R̄ such that lim_{x→d−} f′(x)/g′(x) = L, then lim_{x→d−} f(x)/g(x) = L.

Proof taken from [3]. We prove the results for the case x → c+. The results for x → d− follow in a similar way and we leave those details as an exercise for the reader.
For all y ∈ I, define the set S_1(y) := {f′(w)/g′(w) : w ∈ (c, y)}, and define the functions m : I → R̄ and M : I → R̄ by
$$m(y) := \begin{cases} \inf S_1(y) & \text{if } S_1(y) \text{ is bounded below},\\ -\infty & \text{otherwise},\end{cases} \qquad M(y) := \begin{cases} \sup S_1(y) & \text{if } S_1(y) \text{ is bounded above},\\ +\infty & \text{otherwise}.\end{cases}$$

By Lemma B.5 we know that
$$\lim_{y\to c+} m(y) = \lim_{y\to c+} M(y) = L. \tag{34}$$
For use later in the proof, we also define, for all y ∈ I, the set S_2(y) := {f(w)/g(w) : w ∈ (c, y)}, and define the functions μ_1 : I → R̄ and μ_2 : I → R̄ by
$$\mu_1(y) := \begin{cases} \inf S_2(y) & \text{if } S_2(y) \text{ is bounded below},\\ -\infty & \text{otherwise},\end{cases} \qquad \mu_2(y) := \begin{cases} \sup S_2(y) & \text{if } S_2(y) \text{ is bounded above},\\ +\infty & \text{otherwise}.\end{cases}$$
Again using Lemma B.5 we find that
$$\lim_{y\to c+} \mu_1(y) = \lim_{y\to c+} \mu_2(y) = L \;\Longrightarrow\; \lim_{x\to c+} \frac{f(x)}{g(x)} = L. \tag{35}$$
Now let y ∈ I and x ∈ (c, y). Then f and g are continuous on [x, y] and differentiable on (x, y). By assumption, for all w ∈ (x, y), g′(w) ≠ 0, which also allows us to deduce, by Theorem 13.11, that g is injective on [x, y]. Hence g(x) ≠ g(y). Therefore, by Cauchy's mean value theorem (Theorem 13.10), there exists a z ∈ (x, y) such that (f(y) − f(x))/(g(y) − g(x)) = f′(z)/g′(z). Using the definitions of m and M above, we find, for all y ∈ I and for all x ∈ (c, y),
$$m(y) \le \frac{f(y) - f(x)}{g(y) - g(x)} \le M(y). \tag{36}$$
Dividing the numerator and denominator of the middle term in (36) by g(y) ≠ 0, we find
$$m(y) \le \frac{\frac{f(y)}{g(y)} - \frac{f(x)}{g(y)}}{1 - \frac{g(x)}{g(y)}} \le M(y).$$
If assumption 1 holds, take the limit x → c+ (while keeping y fixed) to find
$$m(y) \le \frac{f(y)}{g(y)} \le M(y).$$
The result then follows from taking y → c+ and applying (34) together with the sandwich theorem (Corollary 3.12 or Lemma 3.13).
It remains to prove the result when assumption 2 holds. Divide the numerator and denominator of the middle term in (36) by g(x) ≠ 0 to get, for all y ∈ I and all x ∈ (c, y),
$$m(y) \le \frac{\frac{f(y)}{g(x)} - \frac{f(x)}{g(x)}}{\frac{g(y)}{g(x)} - 1} \le M(y). \tag{37}$$

For all x ∈ (c, y), μ_1(y) ≤ f(x)/g(x) ≤ μ_2(y). Let ε > 0. Since lim_{x→c+} f(y)/g(x) = lim_{x→c+} g(y)/g(x) = 0 (note that we are keeping y fixed, and that |g(x)| → ∞ by assumption 2), there exist δ_1 > 0 and δ_2 > 0 such that, for all x ∈ (c, c + δ_1) ∩ I, f(y)/g(x) ∈ (−ε, ε), and, for all x ∈ (c, c + δ_2) ∩ I, g(y)/g(x) ∈ (−ε, ε). Set δ := min{δ_1, δ_2, y − c}; then, for all x ∈ (c, c + δ) ∩ I ⊂ (c, y),
$$\frac{\frac{f(y)}{g(x)} - \frac{f(x)}{g(x)}}{\frac{g(y)}{g(x)} - 1} < \frac{\varepsilon - \mu_1(y)}{-\varepsilon - 1} = \frac{\mu_1(y) - \varepsilon}{1 + \varepsilon} < \mu_1(y) - \varepsilon < \mu_1(y)$$
and
$$\frac{\frac{f(y)}{g(x)} - \frac{f(x)}{g(x)}}{\frac{g(y)}{g(x)} - 1} > \frac{-\varepsilon - \mu_2(y)}{\varepsilon - 1} = \frac{\mu_2(y) + \varepsilon}{1 - \varepsilon} > \mu_2(y) + \varepsilon > \mu_2(y).$$

Combining these with (37), we deduce that, for all y ∈ I,
$$m(y) < \mu_1(y) \le \mu_2(y) < M(y).$$
By (34) together with the sandwich theorem (Corollary 3.12 or Lemma 3.13) we find that lim_{y→c+} μ_1(y) = lim_{y→c+} μ_2(y) = L and thus, by (35), we conclude that lim_{x→c+} f(x)/g(x) = L.

Note: Assumption 2 in Theorem B.6 is often stated as “limx→c+ |g(x)| = ∞ and limx→c+ |f (x)| =
∞”. Note that we did not use the assumption on f in our proof, so this can be left out, as we
did in the statement of the theorem.

Corollary B.7 (L'Hôpital's rule). Let f and g be real-valued functions both of whose domains contain an open interval I ⊂ R. Let c ∈ I and let f and g be differentiable on I \ {c}. Assume that, for all x ∈ I \ {c}, g(x) ≠ 0, so that f/g is well-defined on I \ {c}. Moreover, assume that, for all x ∈ I \ {c}, g′(x) ≠ 0. Assume one of the following two holds:

(i) lim_{x→c} f(x) = 0 and lim_{x→c} g(x) = 0, or

(ii) lim_{x→c} |g(x)| = ∞.

If there exists an L ∈ R̄ such that lim_{x→c} f′(x)/g′(x) = L, then lim_{x→c} f(x)/g(x) = L.

Proof. Since c ∈ I, the intervals I_1 := (−∞, c) ∩ I and I_2 := (c, ∞) ∩ I are both non-empty. I_1 is bounded above with sup I_1 = c and I_2 is bounded below with inf I_2 = c. Furthermore, f and g are differentiable on I_1 and I_2, and g′ ≠ 0 on I_1 and on I_2. Moreover, since lim_{x→c} f′(x)/g′(x) = L, we also have lim_{x→c−} f′(x)/g′(x) = L and lim_{x→c+} f′(x)/g′(x) = L. Thus we can apply the corresponding results from Theorem B.6 on I_1 and I_2 to deduce that lim_{x→c−} f(x)/g(x) = L and lim_{x→c+} f(x)/g(x) = L. Hence lim_{x→c} f(x)/g(x) = L.
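As a quick numerical sanity check (purely illustrative, and no substitute for the theorem), the Python sketch below compares the quotient (1 − cos x)/x² with the quotient of derivatives sin x/(2x) as x → 0+; both approach 1/2, in line with L'Hôpital's rule.

```python
import math

# Compare f(x)/g(x) with f'(x)/g'(x) for f(x) = 1 - cos x, g(x) = x^2 near 0+.
# L'Hopital's rule predicts both quotients tend to the same limit, here 1/2.
for x in [1e-1, 1e-2, 1e-3, 1e-4]:
    print(x, (1 - math.cos(x)) / x ** 2, math.sin(x) / (2 * x))
```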

B.5 Limits of subsequences


In the example on page 35 we considered the sequence (sin n) (n = 1, 2, . . .).³³ It is clearly bounded, and so it has a convergent subsequence by the Bolzano–Weierstraß theorem, but it is not obvious what possible limit values we can get from convergent subsequences. The following lemma gives the answer.

³³ As we have noted before, we have not rigorously defined the trigonometric functions in this module, but we expect the reader to be familiar enough with their standard properties to be able to use them for examples that are not necessary to build our theory, but that are enlightening nonetheless.

Lemma B.8. Let α ∈ [−1, 1]. Then there exists a subsequence (sin n_k) (k ∈ N) of the sequence (sin n) (n ∈ N) such that sin n_k → α as k → ∞.

Proof. Let α ∈ [−1, 1], let ε > 0, and let N ∈ N. We claim that there exists an n∗(α, ε, N) ∈ N such that n∗(α, ε, N) ≥ N and |sin(n∗(α, ε, N)) − α| < ε.
If this claim is true, then we define the sequence (ε_k) by, for all k ∈ N, ε_k := 1/k. We also define the sequences (N_k) and (n_k) by N_1 := 1 and, for all k ∈ N, n_k := n∗(α, ε_k, N_k) and N_{k+1} := 1 + n_k. Then we get that, for all k ∈ N, n_{k+1} ≥ N_{k+1} = 1 + n_k. Since n_1 ≥ 1 we can prove by induction (details left to the reader) that, for all k ∈ N, n_k ≥ k. Hence (sin n_k) is a subsequence of (sin n). Moreover, by construction, for all k ∈ N, |sin n_k − α| < 1/k, and thus (sin n_k) converges to α.
We will now prove the claim. Let α ∈ [−1, 1], let ε > 0, and let N ∈ N. Given an M ∈ N, divide the interval [−2π, 2π] into 2M subintervals, which we denote by I_l := [−2π + (l − 1)·2π/M, −2π + l·2π/M] (l ∈ {1, . . . , 2M}). Since sin is a continuous functionᵃ, there exists an M ∈ N large enough such that, for all l ∈ {1, . . . , 2M} and for all x, y ∈ I_l, we have |sin x − sin y| < ε.
Now let p, q ∈ {1, . . . , 2M } be such that Ip ⊆ [−2π, 0], Iq ⊆ [0, 2π] and there exists xp ∈ Ip
and xq ∈ Iq such that sin xp = sin xq = α. We will now show that there exists an integer
n∗ ≥ N and an integer m(n∗ ) such that n∗ + m(n∗ )2π belongs to either Ip or Iq . Once
we have proven that, the claim (and thus also the result in the lemma) follows, because
sin(n∗ + m(n∗ )2π) = sin n∗ .
To do this consider all real numbers of the form n + m2π with m and n integers. If m, n, p, q are all integers and n + m2π = q + p2π, then n = q and m = p, because π is irrational. So for each positive integer n we choose an integer q(n) such that n + q(n)2π ∈ [0, 2π). There are infinitely many numbers of this form n + q(n)2π. By the reasoning before, if n ≠ n′ for positive integers n and n′, we have n + q(n)2π ≠ n′ + q(n′)2π. Let η > 0; then there exist two numbers of the form n + q(n)2π which are less than distance η apartᵇ. By subtracting these two we can find a positive integer r and an integer s(r) such that |r + s(r)2π| < η.
Consider a sequence (η_j) in (0, ∞) which converges to 0 as j → ∞. Then there is a sequence of positive integers r_j and integers s(r_j) such that 0 ≠ x_j := r_j + s(r_j)2π → 0 as j → ∞. By possibly passing to a subsequence (which for notational simplicity we label again by j) we can assume all these x_j are different (since none of them are 0 but they tend to 0) and that the integers r_j are different (because for a given r_j ∈ N there is only one s(r_j) ∈ Z such that r_j + s(r_j)2π ∈ [−1/2, 1/2]). So this gives us an integer r ≥ N and an integer s(r) such that r + s(r)2π ∈ [−2π/M, 0) ∪ (0, 2π/M] (because there are only finitely many different positive integers less than N and the sequence (x_j) converges to 0). Now we can find a positive integer t such that tr + ts(r)2π belongs to I_p or I_q, and our required n∗ is n∗ = tr.
ᵃ As can be seen, for example, by noting that
$$|\sin x - \sin y| = \left|\int_y^x \cos t\, dt\right| \le |x - y|,$$
if we are willing to use the (unproven in this module) fact that sin is the antiderivative of cos.
ᵇ This we can prove by contradiction. Let δ > 0 and assume that for all n, n′ ∈ N with n ≠ n′ we have |n + q(n)2π − n′ − q(n′)2π| > δ. Let K ∈ N with K > 2π/δ and consider K different positive integers n_i (i ∈ {1, . . . , K}). Then the K different numbers n_i + q(n_i)2π in [0, 2π) have pairwise distances of at least δ. Hence there are i, j ∈ {1, . . . , K} such that |n_i + q(n_i)2π − n_j − q(n_j)2π| > 2π, which is a contradiction.
Note: Not much in this proof depends on the function in question being the sine specifically;
something similar can be done for the cosine, and indeed the argument can be adapted for any
continuous function h : R → R which has an irrational period, in which case we need to take
α ∈ h(R).
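The following short search illustrates the lemma informally: for a few targets α ∈ [−1, 1] it looks for an index n with sin n within 10⁻³ of α. This is only a numerical illustration; the lemma is what guarantees such indices exist for every α and every tolerance.

```python
import math

def first_index_close(alpha, eps=1e-3, limit=10**6):
    """Return the first n in 1..limit with |sin n - alpha| < eps, or None."""
    for n in range(1, limit):
        if abs(math.sin(n) - alpha) < eps:
            return n
    return None

for alpha in [0.0, 0.5, -0.9, 1.0]:
    n = first_index_close(alpha)
    print(alpha, n, None if n is None else math.sin(n))
```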

B.6 The Cauchy–Schwarz inequality and the triangle inequality


In this section we prove the triangle inequality (statement 1 in Lemma 4.5). To do that, we
first will prove the famous Cauchy–Schwarz inequality (see also G11ACF/MATH1005), which
is very useful in its own right.

Lemma B.9 (Cauchy–Schwarz inequality). Let x, y ∈ R^d. Then
$$\left(\sum_{j=1}^{d} x_j y_j\right)^2 \le \left(\sum_{j=1}^{d} x_j^2\right)\left(\sum_{j=1}^{d} y_j^2\right).$$

Proof. Let λ ∈ R; then x + λy ∈ R^d and 0 ≤ ‖x + λy‖² = Σ_{j=1}^{d} (x_j + λy_j)². Expanding the brackets gives us
$$0 \le \sum_{j=1}^{d} x_j^2 + 2\lambda\sum_{j=1}^{d} x_j y_j + \lambda^2\sum_{j=1}^{d} y_j^2.$$
If we write A := Σ_{j=1}^{d} y_j², B := 2Σ_{j=1}^{d} x_j y_j and C := Σ_{j=1}^{d} x_j², we get the quadratic (in λ) inequality
$$A\lambda^2 + B\lambda + C \ge 0,$$
which holds for all λ ∈ R. This means that the discriminant B² − 4AC of the quadratic polynomial λ ↦ Aλ² + Bλ + C cannot be positive, for if it were, the equation Aλ² + Bλ + C = 0 would have two different real solutions for λ, and the quadratic would be negative between them. Thus
$$B^2 = 4\left(\sum_{j=1}^{d} x_j y_j\right)^2 \le 4AC = 4\left(\sum_{j=1}^{d} x_j^2\right)\left(\sum_{j=1}^{d} y_j^2\right),$$
which gives us the desired result.


Proof of the triangle inequality (statement 1 in Lemma 4.5). Let x, y ∈ R^d. Taking square roots on both sides of the Cauchy–Schwarz inequality from Lemma B.9, we get
$$\sum_{j=1}^{d} x_j y_j \le \sqrt{\left(\sum_{j=1}^{d} x_j^2\right)\left(\sum_{j=1}^{d} y_j^2\right)}.$$
By a direct computation we have
$$\|x + y\|^2 = \sum_{j=1}^{d} (x_j + y_j)^2 = \sum_{j=1}^{d} x_j^2 + 2\sum_{j=1}^{d} x_j y_j + \sum_{j=1}^{d} y_j^2$$
and thus
$$\|x + y\|^2 \le \sum_{j=1}^{d} x_j^2 + 2\sqrt{\left(\sum_{j=1}^{d} x_j^2\right)\left(\sum_{j=1}^{d} y_j^2\right)} + \sum_{j=1}^{d} y_j^2 = \left(\sqrt{\sum_{j=1}^{d} x_j^2} + \sqrt{\sum_{j=1}^{d} y_j^2}\right)^2 = (\|x\| + \|y\|)^2.$$
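For readers who like to see inequalities "tested", the Python sketch below checks the Cauchy–Schwarz and triangle inequalities on a few random vectors. This is a numerical spot check only; the proofs above are what establish the inequalities for all vectors.

```python
import math
import random

random.seed(0)

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

for _ in range(5):
    x = [random.uniform(-10, 10) for _ in range(5)]
    y = [random.uniform(-10, 10) for _ in range(5)]
    cs_ok = dot(x, y) ** 2 <= dot(x, x) * dot(y, y) + 1e-9                      # Cauchy-Schwarz
    tri_ok = norm([a + b for a, b in zip(x, y)]) <= norm(x) + norm(y) + 1e-9    # triangle inequality
    print(cs_ok, tri_ok)
```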

B.7 Expressing real numbers in binary


In this section we investigate how we can write a number x ∈ [0, 1] in binary (base 2) form. This
will be very useful for the construction of Schoenberg’s space-filling curve in Appendix B.8.

Lemma B.10. Let x ∈ [0, 1]. There exists a sequence (aj ) such that, for all j ∈ N,
aj ∈ {0, 1} and

$$x = \sum_{j=1}^{\infty} 2^{-j} a_j = \frac{a_1}{2} + \frac{a_2}{2^2} + \frac{a_3}{2^3} + \dots. \tag{38}$$

Proof. If x = 0, then we can take, for all j ∈ N, a_j := 0. If x = 1, then we use the geometric series from (32) to write 1 = Σ_{j=1}^{∞} 2^{-j}. Hence, we can take, for all j ∈ N, a_j := 1.
Now let x ∈ (0, 1). We will first prove the following claim: there exists a sequence (a_j) such that, for all j ∈ N, a_j ∈ {0, 1} and such that, for all n ∈ N,
$$\sum_{j=1}^{n} 2^{-j} a_j \le x < 2^{-n} + \sum_{j=1}^{n} 2^{-j} a_j. \tag{39}$$

To prove the claim by induction, first let a_1 be the largest element of {0, 1} such that 2^{-1} a_1 ≤ x. Assume, for a proof by contradiction, that 2^{-1} a_1 + 2^{-1} ≤ x. If a_1 = 0, then 2^{-1} ≤ x, which contradicts that a_1 = 0 is the largest element from {0, 1} such that 2^{-1} a_1 ≤ x. If a_1 = 1, then 1 ≤ x, which is a contradiction with x ∈ (0, 1). Hence x < 2^{-1} a_1 + 2^{-1} and thus (39) holds for n = 1.
Now let k ∈ N and assume that (39) holds for n = k. Let a_{k+1} be the largest integer such that
$$\sum_{j=1}^{k+1} 2^{-j} a_j \le x. \tag{40}$$
Then, since (39) holds for n = k, we get
$$\sum_{j=1}^{k+1} 2^{-j} a_j \le x < 2^{-k} + \sum_{j=1}^{k} 2^{-j} a_j$$
and thus 2^{-k-1} a_{k+1} < 2^{-k}. Hence a_{k+1} ∈ {0, 1}. Since a_{k+1} is the largest integer with the property in (40), we have that
$$\sum_{j=1}^{k+1} 2^{-j} a_j \le x < \sum_{j=1}^{k+1} 2^{-j} a_j + 2^{-k-1},$$
which completes the proof by induction of the claim.

Figure 13: Part of the graph of a periodic function φ which satisfies the properties described at the start of Appendix B.8.


Now we use (39) to prove that (38) holds. Because, for all j ∈ N, 2^{-j} a_j ≥ 0, the sequence (S_n) formed by the partial sums
$$S_n := \sum_{j=1}^{n} 2^{-j} a_j$$
is non-decreasing. Moreover, it is bounded above by x according to (39). Hence by the monotone sequence theorem the sequence (S_n), and thus the series Σ_{j=1}^{∞} 2^{-j} a_j, converges. Because 2^{-n} → 0 as n → ∞, we can now take the limit n → ∞ in (39) (where for the right hand side we use the sum rule for limits) and use the sandwich theorem to prove that Σ_{j=1}^{∞} 2^{-j} a_j = x.
Note: Instead of (38), sometimes the notation
x = 0·a_1 a_2 a_3 . . .
is used. So, for example, 0 = 0·0̇ and 1 = 0·1̇, where by the dot we indicate a repeating digit, i.e. 0·1̇ means that, for all j ∈ N, a_j = 1.
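The proof of Lemma B.10 is constructive: at each step it picks the largest admissible digit. The sketch below implements exactly that greedy choice for the first few digits (an illustration only; floating-point arithmetic is exact here only for dyadic rationals such as the example).

```python
def binary_digits(x, n):
    """First n binary digits a_1, ..., a_n of x in [0, 1), chosen greedily as in the proof of Lemma B.10."""
    digits, partial = [], 0.0
    for j in range(1, n + 1):
        # a_j is the largest element of {0, 1} keeping the partial sum <= x
        a_j = 1 if partial + 2.0 ** (-j) <= x else 0
        partial += a_j * 2.0 ** (-j)
        digits.append(a_j)
    return digits, partial

# 0.8125 = 1/2 + 1/4 + 1/16, so the digits are 1, 1, 0, 1, 0, 0, 0, 0.
print(binary_digits(0.8125, 8))
```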

B.8 A space-filling curve


In this section we will construct the famous Schoenberg curve, which is a so-called space-filling
curve. At the end of this section we will understand why it is called that. To construct it we
will need the Weierstraß M -test.
Let φ : R → [0, 1] be a continuous function with the following properties:

1. For all t ∈ R, φ(t + 2) = φ(t) (i.e. φ has period 2).

2. For all t ∈ [0, 1/3], φ(t) = 0.

3. For all t ∈ [2/3, 1], φ(t) = 1.


 

It is not difficult to prove that such a function exists. We leave it to the reader to provide
the details of that proof. Figure 13 shows part of the graph of the standard choice for the
Schoenberg function φ, but we really only need the properties above, so we could choose a
different φ which satisfies these properties if we wish.
Now we define the functions f_1 : R → R and f_2 : R → R by
$$f_1(t) := \sum_{n=1}^{\infty} \frac{\phi(3^{2n-2} t)}{2^n}, \qquad f_2(t) := \sum_{n=1}^{\infty} \frac{\phi(3^{2n-1} t)}{2^n}.$$
We need to make sure these functions are well-defined.

Lemma B.11. The series defining f_1 and f_2 above converge uniformly on R, and f_1 and f_2 are continuous functions on R.
Proof. We use the Weierstraß M-test (Theorem 11.11). Because the codomain of φ is [0, 1], we have, for all n ∈ N and for all t ∈ R,
$$\left|\frac{\phi(3^{2n-2} t)}{2^n}\right| \le \frac{1}{2^n} \quad\text{and}\quad \left|\frac{\phi(3^{2n-1} t)}{2^n}\right| \le \frac{1}{2^n}. \tag{41}$$
By (32) we know the geometric series Σ_{n=1}^{∞} 1/2^n converges and thus the Weierstraß M-test tells us that the series defining f_1 and f_2 converge uniformly on R.
Because the function φ is continuous, each term in the series for f_1 is continuous and so is each term in the series for f_2. Thus, since both series converge uniformly on R, both f_1 and f_2 are continuous on R.
Now we can define the Schoenberg curve. It is the function f : [0, 1] → R2 , t 7→
(f1 (t), f2 (t)). Because f1 and f2 are both continuous on [0, 1], we know by Corollary 10.9
that f is continuous on [0, 1]. A path in R2 is a continuous function with a nonempty closed
interval as domain and a subset of R2 as codomain. Thus f is a path. The image of a path,
thus f ([0, 1]) in this case, is often called a curve. In this sense f indeed defines a curve.
What is so special about this curve? The next theorem gets to the punchline.

Theorem B.12. If f is the Schoenberg curve as defined above, then f ([0, 1]) = [0, 1]2 .

Proof. We need to prove f([0, 1]) ⊂ [0, 1]² and [0, 1]² ⊂ f([0, 1]). The former follows quickly, since (41) and (32) tell us that, for all j ∈ {1, 2} and all t ∈ [0, 1],
$$0 \le f_j(t) \le \sum_{n=1}^{\infty} \frac{1}{2^n} = 1.$$

To prove that [0, 1]² ⊂ f([0, 1]), let (a, b) ∈ [0, 1]². We use Lemma B.10 to write a and b in binary as
$$a = \sum_{j=1}^{\infty} \frac{a_j}{2^j} = \frac{a_1}{2} + \frac{a_2}{4} + \frac{a_3}{8} + \dots, \qquad b = \sum_{j=1}^{\infty} \frac{b_j}{2^j} = \frac{b_1}{2} + \frac{b_2}{4} + \frac{b_3}{8} + \dots,$$
where, for all j ∈ N, a_j, b_j ∈ {0, 1}. Now we define
$$c := 2\sum_{n=1}^{\infty} \frac{c_n}{3^n} = 2\left(\frac{a_1}{3} + \frac{b_1}{9} + \frac{a_2}{27} + \frac{b_2}{81} + \dots\right),$$
where we defined, for all n ∈ N, c_{2n} := b_n and c_{2n−1} := a_n. In particular, for all n ∈ N, c_n ≤ 1. Thus, by (32), we have
$$0 \le c \le 2\sum_{n=1}^{\infty} \frac{1}{3^n} = \frac{2}{3}\sum_{n=0}^{\infty} \frac{1}{3^n} = \frac{2}{3}\cdot\frac{1}{1 - 1/3} = 1.$$

This not only shows that the series defining c converges (see Lemma A.19), and thus c ∈ R, but it shows that c is in [0, 1], the domain of f, specifically. We now want to calculate f(c). To do this let k ∈ N. Then
$$3^k c = 2\sum_{n=1}^{\infty} \frac{c_n}{3^{n-k}} = 2\sum_{n=1}^{k} \frac{c_n}{3^{n-k}} + 2\sum_{n=k+1}^{\infty} \frac{c_n}{3^{n-k}}.$$
The first term
$$2\sum_{n=1}^{k} \frac{c_n}{3^{n-k}} = 2\sum_{n=1}^{k} c_n 3^{k-n}$$
is a non-negative even integer, because, for all n ∈ {1, . . . , k}, 3^{k−n} is a positive integer and c_n ∈ {0, 1}. So since φ has period 2 we get that, for all k ∈ N, φ(3^k c) = φ(d_k), where
$$d_k := 2\sum_{n=k+1}^{\infty} \frac{c_n}{3^{n-k}} = 2\sum_{m=1}^{\infty} \frac{c_{k+m}}{3^m}.$$
Clearly φ(3^k c) = φ(d_k) also holds for k = 0, as in that case 3^k c = d_k.


Let k ∈ N ∪ {0}; then c_{k+1} is either 0 or 1. If c_{k+1} = 0 then, by (32),
$$0 \le d_k = 2\sum_{m=1}^{\infty} \frac{c_{k+m}}{3^m} = 2\sum_{m=2}^{\infty} \frac{c_{k+m}}{3^m} \le 2\sum_{m=2}^{\infty} \frac{1}{3^m} = \frac{2}{9}\sum_{m=0}^{\infty} \frac{1}{3^m} = \frac{1}{3},$$
and so φ(d_k) = 0. If c_{k+1} = 1 then, by (32),
$$\frac{2}{3} \le \frac{2}{3} + 2\sum_{m=2}^{\infty} \frac{c_{k+m}}{3^m} = d_k = 2\sum_{m=1}^{\infty} \frac{c_{k+m}}{3^m} \le 2\sum_{m=1}^{\infty} \frac{1}{3^m} = 1,$$
and thus φ(d_k) = 1. Hence, in either case we have φ(3^k c) = φ(d_k) = c_{k+1}. Therefore, for all n ∈ N,
$$\phi(3^{2n-2} c) = c_{2n-1} = a_n \quad\text{and}\quad \phi(3^{2n-1} c) = c_{2n} = b_n.$$
This gives
$$f_1(c) = \sum_{n=1}^{\infty} \frac{\phi(3^{2n-2} c)}{2^n} = \frac{a_1}{2} + \frac{a_2}{4} + \frac{a_3}{8} + \dots = a, \qquad f_2(c) = \sum_{n=1}^{\infty} \frac{\phi(3^{2n-1} c)}{2^n} = \frac{b_1}{2} + \frac{b_2}{4} + \frac{b_3}{8} + \dots = b.$$

We conclude that f (c) = (a, b) and thus [0, 1]2 ⊂ f ([0, 1]).
Note: We have shown that the range of the Schoenberg curve is the whole square [0, 1]2 .
Thus a square can be ‘filled up’ using a curve! This shows that some results can be highly
counterintuitive.
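To get a feeling for the construction, the Python sketch below evaluates truncated versions of f_1 and f_2 for one concrete choice of φ: a piecewise linear "tent-like" function with the three properties listed above. The exact shape is an assumption made here purely for illustration, since any continuous φ with those properties works; floating point also limits the accuracy of the high-order terms, which are tiny anyway.

```python
def phi(t):
    """One concrete 2-periodic choice of phi: 0 on [0,1/3], 1 on [2/3,1], linear in between, reflected on [1,2]."""
    t = t % 2.0
    if t > 1.0:
        t = 2.0 - t          # reflect so that phi is continuous and 2-periodic
    if t <= 1.0 / 3.0:
        return 0.0
    if t >= 2.0 / 3.0:
        return 1.0
    return 3.0 * t - 1.0     # linear interpolation on (1/3, 2/3)

def schoenberg(t, terms=20):
    """Truncated partial sums of f_1(t) and f_2(t); the omitted tail is at most 2**(-terms)."""
    f1 = sum(phi(3 ** (2 * n - 2) * t) / 2 ** n for n in range(1, terms + 1))
    f2 = sum(phi(3 ** (2 * n - 1) * t) / 2 ** n for n in range(1, terms + 1))
    return f1, f2

for t in [0.0, 0.1, 0.25, 0.5, 0.9, 1.0]:
    print(t, schoenberg(t))
```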

B.9 Monotone surjective functions and continuity


In this section we will prove Theorem 12.5. In the main text, we already proved it in the special
case where J is an open interval. Now we prove the general case where J can be any interval.
Proof of Theorem 12.5. We will first prove the result for the case where f is non-
decreasing. If J = ∅, then I = ∅. This follows from the fact that f is a function (see
Definition A.3). In that case f is continuous, as follows directly from the definition of conti-
nuity (Definition 10.3 or 10.5). So, now we assume J is not empty.
Let β ∈ I and let (xn ) be a sequence in I which converges to β. Let ε > 0. First we assume
that f (β) ∈ J is not an endpoint of J. Then there exist A, B ∈ J such that

f (β) − ε < A < f (β) < B < f (β) + ε.

Because f is surjective and thus J = f (I), there exists s, t ∈ I such that

f (β) − ε < f (s) = A < f (β) < B = f (t) < f (β) + ε.

Since f is non-decreasing on I we have s < β < t. Because xn → β as n → ∞, there


exists N ∈ N such that, for all n ≥ N , xn ∈ (s, t). Using again the assumption that f is
non-decreasing, we have, for all n ≥ N ,

f (β) − ε < f (s) ≤ f (xn ) ≤ f (t) < f (β) + ε,

and hence |f (xn ) − f (β)| < ε. Thus limn→∞ f (xn ) = f (β).


Now we consider the case where f (β) is the right-hand endpoint of J. If J consists of only
one element, then f is a constant function on I and so f is continuous. Assume now that J
has at least two elements. Then there exists A ∈ J such that

f (β) − ε < A < f (β).

Because J = f (I), there exists s ∈ I such that

f (β) − ε < f (s) = A < f (β),

and thus s < β, since f is non-decreasing. Because xn → β as n → ∞, there exists N ∈ N


such that, for all n ≥ N , xn > s. Again using the monotonicity (non-decreasing) of f ,
combined with the assumption that f (β) is the right-hand endpoint of J, we deduce that,
for all n ≥ N ,
f (β) − ε < f (s) ≤ f (xn ) ≤ f (β),
and hence |f (xn ) − f (β)| < ε. Thus limn→∞ f (xn ) = f (β).
The proof for the case where f (β) is the left-hand endpoint of J is similar and is left as an
exercise for the reader.
Finally, if f is non-increasing, then the result follows by applying the result proven above to
−f .

B.10 A non-decreasing function on R which is discontinuous at every rational number
Theorem 12.5 tells us that any monotone surjective function whose domain and codomain are
intervals in R is continuous. In one of the examples on page 88 we saw that we cannot leave
out the surjectivity assumption and still expect the statement to be true. In this section we
give another such example, which has the interesting property that it is discontinuous at every
rational number.
First we prove some results that are useful for us here, but that are also interesting in their
own right.

Lemma B.13. Let I ⊂ R be an interval and let (fn ) be a sequence of functions fn : I → R.


Let f : I → R and assume that (fn ) converges to f pointwise. If, for all n ∈ N, fn is
non-decreasing on I, then f is non-decreasing on I. If, for all n ∈ N, fn is non-increasing
on I, then f is non-increasing on I.

Proof. First assume that, for all n ∈ N, f_n is non-decreasing on I. For a proof by contradiction, assume that f is not non-decreasing. Then there exist x, y ∈ I such that x < y and f(y) < f(x). Define ε := f(x) − f(y) > 0. Because (f_n) converges to f pointwise, there exists an N_1 ∈ N such that, for all n ≥ N_1, |f_n(x) − f(x)| < ε/4. Moreover, there exists an N_2 ∈ N such that, for all n ≥ N_2, |f_n(y) − f(y)| < ε/4. Define N := max(N_1, N_2) and let n ≥ N. Since f_n(x) − f_n(y) ≤ 0 (because f_n is non-decreasing and x < y) and
$$f(x) - f(y) = \bigl(f(x) - f_n(x)\bigr) + \bigl(f_n(x) - f_n(y)\bigr) + \bigl(f_n(y) - f(y)\bigr),$$
we have, using the triangle inequality, that
$$\varepsilon = f(x) - f(y) \le \bigl(f(x) - f_n(x)\bigr) + \bigl(f_n(y) - f(y)\bigr) \le |f_n(x) - f(x)| + |f_n(y) - f(y)| < \tfrac{1}{2}\varepsilon.$$
This is a contradiction and thus f is non-decreasing.
In the case where, for all n ∈ N, fn is non-increasing, the result follows by applying the result
proven above to the sequence (−fn ) and the function −f . Details are left as an exercise to
the reader.

Corollary B.14. Let I ⊂ R be an interval and let (f_n) be a sequence of functions f_n : I → R. Assume that, for all x ∈ I, the series f(x) := Σ_{n=1}^{∞} f_n(x) converges. If, for all n ∈ N, f_n is non-decreasing on I, then f is non-decreasing on I. If, for all n ∈ N, f_n is non-increasing on I, then f is non-increasing on I.

Proof. For all N ∈ N and for all x ∈ I, define the partial sums S_N(x) := Σ_{n=1}^{N} f_n(x). Then, per definition, for all x ∈ I, f(x) = lim_{N→∞} S_N(x). If, for all n ∈ N, f_n is non-decreasing, then, for all N ∈ N, the function S_N is non-decreasing (as it is a finite sum of non-decreasing functions; prove this yourself). Hence, by Lemma B.13, f is non-decreasing. If, for all n ∈ N, f_n is non-increasing, then, for all N ∈ N, the function S_N is non-increasing and hence, again by Lemma B.13, f is non-increasing.
Now we are ready to provide a function which is monotone, yet not continuous. Because the set Q is countable, there exists a sequence (p_n) in Q such that, for all q ∈ Q, there exists an n ∈ N such that p_n = q. Given such a sequence (p_n), for all n ∈ N, define the functions f_n : R → [0, 1] by
$$f_n(x) := \begin{cases} 0 & \text{if } x < p_n,\\ 3^{-n} & \text{if } x \ge p_n.\end{cases}$$

X
f (x) := fn (x). (42)
n=1

This series converges pointwise by the comparison test (Lemma A.19 or Corollary
P∞ −n A.20) since,
−n
for all n ∈ N and for all x ∈ R, 0 ≤ fn (x) ≤ 3 , and the geometric series n=1 3 converges
by (32).

Lemma B.15. The function f defined in (42) is non-decreasing. Moreover, for all q ∈ Q,
f is not continuous at q.

Proof. We use the notation introduced above the lemma. By Corollary B.14, f is non-decreasing, because, for all n ∈ N, f_n is non-decreasing.

Let q ∈ Q and let k ∈ N be such that p_k = q. Define, for all x ∈ R,
$$g_k(x) := \sum_{n\in\mathbb{N},\, n\ne k} f_n(x).$$
Since, for all x ∈ R, 0 ≤ g_k(x) ≤ f(x), the series defining g_k converges pointwise by the comparison test. Again using Corollary B.14, we find that g_k is non-decreasing. By Theorem 12.2 we then deduce that there exists an L ∈ (−∞, g_k(p_k)] such that lim_{x→p_k−} g_k(x) = L. Note that, for all x ∈ (−∞, p_k), we have that f(x) = g_k(x) + f_k(x) = g_k(x). Since lim_{x→p_k−} f_k(x) = 0, we have
$$\lim_{x\to p_k-} f(x) = \lim_{x\to p_k-} g_k(x) \le g_k(p_k) < g_k(p_k) + f_k(p_k) = f(p_k).$$
Thus f is discontinuous at p_k.
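The jumps of f at the rationals can be made visible numerically. The sketch below uses a truncated enumeration of the rationals in (0, 1) only (the construction above enumerates all of Q; the truncation and this particular enumeration are simplifications made for illustration) and evaluates f just below and at p_1.

```python
from fractions import Fraction

def rationals_in_unit_interval(count):
    """A simple (truncated) enumeration p_1, p_2, ... of distinct rationals in (0, 1)."""
    seen, out, q = set(), [], 2
    while len(out) < count:
        for p in range(1, q):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out

ps = rationals_in_unit_interval(200)

def f(x):
    # truncated version of (42): sum of 3^{-n} over all n with x >= p_n
    return sum(3.0 ** -(n + 1) for n, p in enumerate(ps) if x >= p)

p1 = float(ps[0])            # ps[0] = 1/2
print(f(p1 - 1e-9), f(p1))   # a jump of about 3^{-1} at x = 1/2
```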

B.11 Algebra of derivatives


In this section we prove Theorem A.22 which provides some rules for the algebra of derivatives
(how do derivatives combine when we have sums and products of functions?).
Proof of Theorem A.22. We leave the proofs of statements 1 and 2 as exercises to the
reader, as they are straightforward applications of the definitions together with limit rule 1
and 2 from Lemma A.12, respectively.
To prove the product rule in statement 3, let x ∈ I. We have
$$\frac{f(x)g(x) - f(a)g(a)}{x - a} = \frac{f(x)g(x) - f(a)g(x)}{x - a} + \frac{f(a)g(x) - f(a)g(a)}{x - a} = \left(\frac{f(x) - f(a)}{x - a}\right)g(x) + f(a)\left(\frac{g(x) - g(a)}{x - a}\right).$$

Because f and g are differentiable at a (and thus in particular g is continuous at a and therefore lim_{x→a} g(x) = g(a)), we can take the limit x → a in the expression above and use the sum and product rules for limits (statements 1 and 3 in Lemma A.12) to find
$$\lim_{x\to a}\frac{f(x)g(x) - f(a)g(a)}{x - a} = \lim_{x\to a}\left[\left(\frac{f(x) - f(a)}{x - a}\right)g(x) + f(a)\left(\frac{g(x) - g(a)}{x - a}\right)\right] = g(a)f'(a) + f(a)g'(a).$$

To prove statement 4, we write
$$\frac{1/g(x) - 1/g(a)}{x - a} = \frac{g(a) - g(x)}{(x - a)g(x)g(a)}.$$
Using again the fact that g is both continuous and differentiable at a, by taking the limit x → a and applying the relevant limit rules from Lemma A.12, we find
$$\lim_{x\to a}\frac{1/g(x) - 1/g(a)}{x - a} = \lim_{x\to a}\frac{g(a) - g(x)}{(x - a)g(x)g(a)} = -g'(a)\,\frac{1}{(g(a))^2}.$$
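A quick way to "see" the product rule of statement 3 is to compare a one-sided difference quotient of fg with g(a)f′(a) + f(a)g′(a); the sketch below does this for f = sin and g = exp at a = 0.7 (an informal numerical check, not a proof).

```python
import math

a = 0.7
# product rule prediction for (sin * exp)'(a): g(a) f'(a) + f(a) g'(a)
predicted = math.exp(a) * math.cos(a) + math.sin(a) * math.exp(a)

for h in [1e-2, 1e-4, 1e-6]:
    quotient = (math.sin(a + h) * math.exp(a + h) - math.sin(a) * math.exp(a)) / h
    print(h, quotient, predicted)
```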

B.12 A continuous nowhere differentiable function


In the example on page 92 we constructed the Van der Waerden function f in (20).

Lemma B.16. Let f : R → [0, ∞) be the function as defined in (20). Then f is continuous
and, for all x ∈ R, f is not differentiable at x.

Proof. In the example on page 92 it is already proven that f is continuous. Now let x ∈ R.
We will prove f is not differentiable at x. We use the notation as introduced in the example
on page 92 where f was introduced.
Let q ∈ N. There exists a k ∈ Z such that x ∈ [k10^{-q}, (k + 1)10^{-q}). (We leave the details
of the proof of that statement to the reader; for ideas you might want to have a look at the
example on page 105.) Divide this interval into two ‘halves’:

I1 (q, k) := [k10−q , (k + 1/2)10−q ) and I2 (q, k) := [(k + 1/2)10−q , (k + 1)10−q ).

Now define
$$y_q := x \pm 10^{-q-1},$$
where we choose − or + in such a way that x, y_q ∈ I_1(q, k) or x, y_q ∈ I_2(q, k). In other words, we choose − or + such that x and y_q are both contained in the same 'half'.
We now want to find an expression for
$$\frac{f(y_q) - f(x)}{y_q - x} = \sum_{n=1}^{\infty} \frac{f_n(y_q) - f_n(x)}{y_q - x}. \tag{43}$$

Note that |y_q − x| = 10^{-q-1}. Let n ∈ N. If n > q, then 10^{-q-1} is an integer multiple of 10^{-n}. That means that the distance of x to the nearest number of the form m10^{-n} with m ∈ Z is the same as the distance of y_q to the nearest number of the form m10^{-n} with m ∈ Z (work out the details of the proof of that statement for yourself). Hence f_n(y_q) = f_n(x) and thus only terms with n ≤ q contribute to the sum in (43).
Since x and y_q are either both contained in I_1(q, k) or both contained in I_2(q, k), if n ≤ q, then there exists an r ∈ Z such that x, y_q ∈ I_1(n, r) or x, y_q ∈ I_2(n, r). Hence, z is the nearest point to x of the form m10^{-n} with m ∈ Z if and only if z is the nearest point to y_q of the form m10^{-n} with m ∈ Z. (We leave the details of the proofs of these statements as an exercise to the reader.) This means that f_n(y_q) − f_n(x) = x − y_q or f_n(y_q) − f_n(x) = y_q − x. Hence
$$\frac{f(y_q) - f(x)}{y_q - x} = \sum_{n=0}^{q} a_n =: A_q,$$
where, for all n ∈ {0, . . . , q}, a_n ∈ {−1, 1}. We do not know the value of A_q, but we know that it is
where, for all n ∈ N, an ∈ {−1, 1}. We do not know the value of Aq , but we know that it is
an integer and that A1 is equal to 0, 2 or −2. Moreover, each time we increase q by 1, we
add −1 or +1 to Aq and so the paritya of Aq changes. Thus, Aq is even if q is odd and Aq is
odd if q is even.
By the definition of yq , we have that yq → x as q → ∞, but the sequence (Aq ) diverges,
because its elements are alternately even and odd. Hence f is not differentiable at x.
a
The parity of an integer refers to its property of “being even” or “being odd”.

B.13 The effect of refinement on Riemann sums


In this section we prove statement 1 from Lemma 14.5. It shows the effect that refinement of a
partition has on the upper and lower Riemann sums.

Proof of statement 1 from Lemma 14.5. Because a partition is a finite set, it suffices to prove the statement for the case where Q = P ∪ {c}, where c ∈ I \ P (thus Q is P plus one extra point which was not yet in P). The general case then follows by adding points one at a time.
Let n ∈ N and let P := {x_0, . . . , x_n} be a partition of I. Let c ∈ I \ P and define Q := P ∪ {c}. Let k ∈ {1, . . . , n} be such that c ∈ (x_{k−1}, x_k). Then, using the notation as in Definition 14.2,
$$U(Q, f) - U(P, f) = \left(\sum_{j=1}^{k-1} M_j(Q, f)(x_j - x_{j-1}) + \sup\{f(x) : x_{k-1} \le x \le c\}\,(c - x_{k-1}) + \sup\{f(x) : c \le x \le x_k\}\,(x_k - c) + \sum_{j=k+1}^{n} M_j(Q, f)(x_j - x_{j-1})\right) - \sum_{j=1}^{n} M_j(P, f)(x_j - x_{j-1})$$
$$= (c - x_{k-1})\,\sup\{f(x) : x_{k-1} \le x \le c\} + (x_k - c)\,\sup\{f(x) : c \le x \le x_k\} - (x_k - x_{k-1})\,\sup\{f(x) : x_{k-1} \le x \le x_k\},$$
where we used that, if j ≠ k, then M_j(P, f) = M_j(Q, f). Because sup{f(x) : x_{k−1} ≤ x ≤ c} ≤ sup{f(x) : x_{k−1} ≤ x ≤ x_k} and sup{f(x) : c ≤ x ≤ x_k} ≤ sup{f(x) : x_{k−1} ≤ x ≤ x_k}, we conclude
$$U(Q, f) - U(P, f) \le \bigl[(c - x_{k-1}) + (x_k - c) - (x_k - x_{k-1})\bigr]\,\sup\{f(x) : x_{k-1} \le x \le x_k\} = 0.$$
The proof for the inequality of the lower sums uses the same idea. Its details are left to the reader.
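The effect of refinement can be observed directly on a simple example. The sketch below computes upper and lower sums of x ↦ x² on [0, 2] for a partition P and for a refinement Q of P; the upper sum does not increase and the lower sum does not decrease, as statement 1 of Lemma 14.5 predicts. (The suprema and infima are approximated by sampling, which is adequate here because the function is monotone on each subinterval.)

```python
def upper_lower(f, partition, samples=200):
    """Approximate upper and lower Riemann sums of f for a partition given as an increasing list of points."""
    U = L = 0.0
    for a, b in zip(partition, partition[1:]):
        values = [f(a + (b - a) * k / samples) for k in range(samples + 1)]
        U += max(values) * (b - a)
        L += min(values) * (b - a)
    return U, L

f = lambda x: x * x
P = [0.0, 0.5, 1.0, 2.0]
Q = sorted(P + [0.25, 1.5])   # a refinement of P
print("P:", upper_lower(f, P))
print("Q:", upper_lower(f, Q))
```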
B.14 The existence of a rational of the form p/10^q in an interval
In the example on page 105 we claimed that, by the density of Q in R, there exists an x ∈ [x_{k−1}, x_k] ⊂ [0, 1] of the form x = p/10^q, for some p, q ∈ N ∪ {0}. Here we provide the proof of that claim (we will actually prove a slightly more general claim).

Lemma B.17. Let x, y ∈ R with x < y. Then there exist p, q ∈ N ∪ {0} such that p/10^q ∈ [x, y].

Proof. We prove this claim for the case where 0 ≤ x < y. The proofs in the other cases (x < y ≤ 0 and x < 0 < y) are similar and are left to the reader.
If x or y is of the required form, we are done. If not, then applying the density property of Q in R twice tells us that there exist k, l, r, s ∈ N such that
$$x < \frac{k}{l} < x + \frac{1}{2}(y - x) < \frac{r}{s} < y.$$
Assume k/l and r/s are reduced fractions (i.e. for each of them the denominator and numerator have no common divisors besides 1 and −1). If k/l or r/s is of the form p/10^q, the claim is proven. Note that ls > 0. Since ks/(ls) < lr/(ls), we have ks < lr. Define N := lr − ks ∈ N. Let q ∈ N ∪ {0} be large enough such that 10^q > ls. Then, since N·10^q > ls, there exists a p ∈ N ∪ {0} such that ks·10^q < p·ls < lr·10^q and thus
$$x < \frac{k}{l} = \frac{ks\cdot 10^q}{ls\cdot 10^q} < \frac{p\cdot ls}{ls\cdot 10^q} = \frac{p}{10^q} < \frac{lr\cdot 10^q}{ls\cdot 10^q} = \frac{r}{s} < y.$$
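The proof can be turned into a small search procedure: increase q until some integer p satisfies x ≤ p/10^q ≤ y. The sketch below does exactly that for nonnegative x < y (the case treated in the proof); floating-point rounding can in principle disturb the comparisons, so this is only an illustration.

```python
import math

def decimal_rational_in(x, y):
    """Return (p, q) with p/10^q in [x, y], assuming 0 <= x < y.
    Terminates because once 10^q * (y - x) >= 1 some integer lies in [x*10^q, y*10^q]."""
    q = 0
    while True:
        p = math.ceil(x * 10 ** q)      # smallest integer with p/10^q >= x
        if p / 10 ** q <= y:
            return p, q
        q += 1

print(decimal_rational_in(0.123, 0.1234))   # -> (123, 3), i.e. 123/10^3 = 0.123
```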

B.15 Riemann integrability: from [a, c] and [c, b] to [a, b]
In this section we prove Theorem 14.13.
Proof of Theorem 14.13. We use the notation from Definition 14.2.
Let ε > 0. Since f is Riemann integrable on [a, c] and on [c, b], the upper and lower integrals of f over [a, c] agree, and so do those over [c, b]. Thus there exists a partition P of [a, c] and a partition Q of [c, b] such that
$$U(P, f) < \int_a^c f(x)\, dx + \varepsilon \quad\text{and}\quad U(Q, f) < \int_c^b f(x)\, dx + \varepsilon.$$
Now let S := P ∪ Q. Then S is a partition of [a, b] such that
$$U(S, f) = U(P, f) + U(Q, f).$$
Note that we do not yet know if ∫_a^b f(x) dx exists, but we do know that
$$\overline{\int_a^b} f(x)\, dx \le U(S, f) \le \int_a^c f(x)\, dx + \int_c^b f(x)\, dx + 2\varepsilon. \tag{44}$$
Similarly, there exists a partition P′ of [a, c] and a partition Q′ of [c, b] such that
$$L(P', f) > \int_a^c f(x)\, dx - \varepsilon \quad\text{and}\quad L(Q', f) > \int_c^b f(x)\, dx - \varepsilon.$$
If we define S′ := P′ ∪ Q′, then S′ is a partition of [a, b] with
$$L(S', f) = L(P', f) + L(Q', f)$$
and thus
$$\underline{\int_a^b} f(x)\, dx \ge L(S', f) \ge \int_a^c f(x)\, dx + \int_c^b f(x)\, dx - 2\varepsilon.$$
Combining this with (44), we deduce that
$$\int_a^c f(x)\, dx + \int_c^b f(x)\, dx - 2\varepsilon \le \underline{\int_a^b} f(x)\, dx \le \overline{\int_a^b} f(x)\, dx \le \int_a^c f(x)\, dx + \int_c^b f(x)\, dx + 2\varepsilon.$$
Taking the limit ε → 0 and using the sandwich theorem gives us
$$\underline{\int_a^b} f(x)\, dx = \overline{\int_a^b} f(x)\, dx = \int_a^c f(x)\, dx + \int_c^b f(x)\, dx.$$
Hence f is Riemann integrable on [a, b] and ∫_a^b f(x) dx = ∫_a^c f(x) dx + ∫_c^b f(x) dx.

B.16 Riemann integrability: from [a, b] to [a, c] and [c, b]


In this section we prove Theorem 14.15.
Proof of Theorem 14.15. We use the notation from Definition 14.2.

Let ε > 0 and let R and S be partitions of [a, b] such that
$$U(R, f) < \int_a^b f(x)\, dx + \varepsilon \quad\text{and}\quad L(S, f) > \int_a^b f(x)\, dx - \varepsilon.$$
Such partitions exist because the upper integral is the infimum of the upper sums over all partitions of [a, b] and the lower integral is the supremum of the lower sums over all partitions of [a, b]. Now let T := R ∪ S ∪ {c}; then by Lemma 14.5, U(T, f) ≤ U(R, f) and L(T, f) ≥ L(S, f). Hence
$$U(T, f) < \int_a^b f(x)\, dx + \varepsilon \quad\text{and}\quad L(T, f) > \int_a^b f(x)\, dx - \varepsilon.$$
Because f is Riemann integrable on [a, b] the upper and lower integrals are the same and thus
$$U(T, f) - L(T, f) < 2\varepsilon.$$
Now define P := {x ∈ T : x ≤ c} and Q := {x ∈ T : x ≥ c}. Then P is a partition of [a, c] and Q is a partition of [c, b]. Moreover,
$$U(T, f) = U(P, f) + U(Q, f) \quad\text{and}\quad L(T, f) = L(P, f) + L(Q, f).$$
Hence
$$U(P, f) - L(P, f) + U(Q, f) - L(Q, f) = U(T, f) - L(T, f) < 2\varepsilon.$$
In particular, since U(Q, f) − L(Q, f) ≥ 0, we deduce
$$0 \le U(P, f) - L(P, f) < 2\varepsilon.$$
Because U(P, f) ≥ $\overline{\int_a^c} f(x)\, dx$ and L(P, f) ≤ $\underline{\int_a^c} f(x)\, dx$, we get
$$0 \le \overline{\int_a^c} f(x)\, dx - \underline{\int_a^c} f(x)\, dx < 2\varepsilon.$$
Taking the limit ε → 0 and using the sandwich theorem gives us
$$\overline{\int_a^c} f(x)\, dx = \underline{\int_a^c} f(x)\, dx.$$
Thus f is Riemann integrable on [a, c]. A similar argument using the partition Q of [c, b] proves that f is Riemann integrable on [c, b]. We leave the details of that proof as an exercise to the reader.

B.17 The Riemann integral of a sum of two functions


In this section we prove Theorem 14.16.
Proof of Theorem 14.16. We use the notation from Definition 14.2.
If P := {x0 , . . . , xn } is a partition of [a, b], then, for all k ∈ {1, . . . , n},
sup{f (x) + g(x) : xk−1 ≤ x ≤ xk } ≤ sup{f (x) : xk−1 ≤ x ≤ xk } + sup{g(x) : xk−1 ≤ x ≤ xk }.
Therefore, for all k ∈ {1, . . . , n},
Mk (P, f + g) ≤ Mk (P, f ) + Mk (P, g).

A similar argument involving the infima (the details of which are left to the reader), shows
that, for all k ∈ {1, . . . , n},

mk (P, f + g) ≥ mk (P, f ) + mk (P, g).

So we have

U (P, f + g) ≤ U (P, f ) + U (P, g) and L(P, f + g) ≥ L(P, f ) + L(P, g).

Let ε > 0 and let R and S be partitions of [a, b] such that
$$U(R, f) < \int_a^b f(x)\, dx + \varepsilon \quad\text{and}\quad U(S, g) < \int_a^b g(x)\, dx + \varepsilon.$$
Such partitions exist by the properties of the infimum in the upper integral. Now define P := R ∪ S. Since refinement cannot increase upper sums (Lemma 14.5) this yields
$$\overline{\int_a^b} (f + g)(x)\, dx \le U(P, f + g) \le U(P, f) + U(P, g) \le U(R, f) + U(S, g) < \int_a^b f(x)\, dx + \int_a^b g(x)\, dx + 2\varepsilon.$$
In a similar way (details of which are left to the reader) we find that
$$\underline{\int_a^b} (f + g)(x)\, dx \ge L(P, f + g) \ge L(P, f) + L(P, g) > \int_a^b f(x)\, dx + \int_a^b g(x)\, dx - 2\varepsilon.$$
Because f and g are both Riemann integrable on [a, b], their upper and lower integrals over [a, b] coincide with ∫_a^b f(x) dx and ∫_a^b g(x) dx respectively. Hence we have
$$\int_a^b f(x)\, dx + \int_a^b g(x)\, dx - 2\varepsilon < \underline{\int_a^b} (f + g)(x)\, dx \le \overline{\int_a^b} (f + g)(x)\, dx < \int_a^b f(x)\, dx + \int_a^b g(x)\, dx + 2\varepsilon.$$
Taking the limit ε → 0, the sandwich theorem gives us
$$\underline{\int_a^b} (f + g)(x)\, dx = \overline{\int_a^b} (f + g)(x)\, dx = \int_a^b f(x)\, dx + \int_a^b g(x)\, dx.$$
Hence f + g is Riemann integrable on [a, b] and ∫_a^b (f + g)(x) dx = ∫_a^b f(x) dx + ∫_a^b g(x) dx.

B.18 Uniform convergence and the Riemann integral


In this section we prove Theorem 14.20.
Proof of Theorem 14.20. We use the notation from Definition 14.2.
Because (f_n) converges uniformly to f, there exists an N ∈ N such that, for all n ≥ N,
$$\sup\{|f_n(x) - f(x)| : x \in [a, b]\} \le 1.$$
Since f_N is a bounded function, there exists a K > 0 such that, for all x ∈ [a, b], |f_N(x)| < K. Thus, for all x ∈ [a, b], we have, by the triangle inequality,
$$|f(x)| \le |f_N(x)| + |f(x) - f_N(x)| \le K + 1.$$
This proves that f is bounded. Hence the upper and lower Riemann integrals $\overline{\int_a^b} f(x)\, dx$ and $\underline{\int_a^b} f(x)\, dx$ exist.
Let ε > 0. Again by the uniform convergence of (f_n) to f there exists L(ε) ∈ N such that, for all n ≥ L(ε),
$$\sup\{|f_n(x) - f(x)| : x \in [a, b]\} \le \varepsilon.$$
Let n ≥ L(ε). Let l ∈ N and let P := {x_0, . . . , x_l} be a partition of [a, b]. Then we have, for all k ∈ {1, . . . , l}, M_k(P, f) ≤ M_k(P, f_n) + ε, and thus
$$U(P, f) \le U(P, f_n) + \sum_{k=1}^{l} \varepsilon(x_k - x_{k-1}) = U(P, f_n) + \varepsilon(b - a). \tag{45}$$
Because f_n is Riemann integrable, there exists a partition Q_n of [a, b] such that U(Q_n, f_n) < ∫_a^b f_n(x) dx + ε. Using P = Q_n in (45) gives
$$\overline{\int_a^b} f(x)\, dx \le U(Q_n, f) < \int_a^b f_n(x)\, dx + \varepsilon(1 + b - a).$$
Similarly to above, we have, for all partitions P := {x_0, . . . , x_l} (with l ∈ N) of [a, b] and for all k ∈ {1, . . . , l}, that m_k(P, f) ≥ m_k(P, f_n) − ε and thus
$$L(P, f) \ge L(P, f_n) - \varepsilon(b - a). \tag{46}$$
Because f_n is Riemann integrable, there exists a partition R_n of [a, b] such that L(R_n, f_n) > ∫_a^b f_n(x) dx − ε. Using P = R_n in (46) gives
$$\underline{\int_a^b} f(x)\, dx \ge L(R_n, f) > \int_a^b f_n(x)\, dx - \varepsilon(1 + b - a).$$
Hence we have found that, for all n ≥ L(ε),
$$\int_a^b f_n(x)\, dx - \varepsilon(1 + b - a) < \underline{\int_a^b} f(x)\, dx \le \overline{\int_a^b} f(x)\, dx < \int_a^b f_n(x)\, dx + \varepsilon(1 + b - a). \tag{47}$$

Note that we cannot simply take the limit ε → 0 while keeping n fixed, because n ≥ L(ε) and L(ε) can depend on ε. We claim, however, that $\underline{\int_a^b} f(x)\, dx = \overline{\int_a^b} f(x)\, dx$. Assume for a proof by contradiction that
$$C := \frac{1}{2}\left(\overline{\int_a^b} f(x)\, dx - \underline{\int_a^b} f(x)\, dx\right) > 0.$$
Now let η > 0 be such that η(1 + b − a) = C, let L(η) ∈ N correspond to the choice ε = η, and let n ≥ L(η). From (47), we then find
$$\overline{\int_a^b} f(x)\, dx < \int_a^b f_n(x)\, dx + C \quad\text{and}\quad \underline{\int_a^b} f(x)\, dx > \int_a^b f_n(x)\, dx - C.$$
Hence
$$\frac{1}{2}\left(\overline{\int_a^b} f(x)\, dx + \underline{\int_a^b} f(x)\, dx\right) = \overline{\int_a^b} f(x)\, dx - C < \int_a^b f_n(x)\, dx < \underline{\int_a^b} f(x)\, dx + C = \frac{1}{2}\left(\overline{\int_a^b} f(x)\, dx + \underline{\int_a^b} f(x)\, dx\right).$$
This is a contradiction, hence C = 0 and f is Riemann integrable with $\int_a^b f(x)\, dx = \overline{\int_a^b} f(x)\, dx = \underline{\int_a^b} f(x)\, dx$. Going back to (47) this then tells us that, for all ε > 0 there exists an L(ε) ∈ N such that, for all n ≥ L(ε),
$$\left|\int_a^b f(x)\, dx - \int_a^b f_n(x)\, dx\right| \le \varepsilon(1 + b - a).$$
We conclude that
$$\lim_{n\to\infty} \int_a^b f_n(x)\, dx = \int_a^b f(x)\, dx.$$
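For a concrete instance of Theorem 14.20, take f_n(x) = √(x² + 1/n) on [0, 1]: since |f_n(x) − x| ≤ 1/√n, the sequence converges uniformly to f(x) = x, so the integrals of the f_n must converge to ∫_0^1 x dx = 1/2. The sketch below checks this numerically with a simple midpoint Riemann sum (the choice of f_n and the crude quadrature are just for illustration).

```python
import math

def midpoint_integral(g, a, b, steps=50000):
    """Midpoint Riemann-sum approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return sum(g(a + (k + 0.5) * h) for k in range(steps)) * h

for n in [1, 10, 100, 1000]:
    fn = lambda x, n=n: math.sqrt(x * x + 1.0 / n)   # converges uniformly to x on [0, 1]
    print(n, midpoint_integral(fn, 0.0, 1.0))

print("limit:", midpoint_integral(lambda x: x, 0.0, 1.0))   # = 1/2
```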

References
[1] Pólya, George, The goals of Mathematical Education, part one, transcription of a lecture
from the late 1960s, in ATM Mathematics Teaching 181, 1 December 2002

[2] Schoenberg, I. J. On the Peano curve of Lebesgue, Bull. Amer. Math. Soc. 44 (1938) 519.
doi:10.1090/S0002-9904-1938-06792-4.

[3] Taylor, A. E., L’Hospital’s Rule, The American Mathematical Monthly, Vol. 59, No. 1, (Jan.
1952), pp. 20–24.

[4] van der Waerden, B. L., Ein einfaches Beispiel einer nichtdifferenzierbaren stetigen Funktion, Math. Z. 32 (1930), 474–475 (German).

[5] Wikipedia: (ε, δ)-definition of limit, last accessed 16 August 2018; https://en.wikipedia.org/wiki/(%CE%B5,_%CE%B4)-definition_of_limit (illustration derived from Límite 01.svg).

[6] Wikipedia: Limit of a sequence, last accessed 16 August 2018; https://en.wikipedia.org/wiki/Limit_of_a_sequence (illustrations derived from: Folgenglieder im KOSY.svg, Epsilonschlauch.svg, Epsilonschlauch klein.svg, Epsilonschlauch2.svg).

Index
2 times differentiable function, 90 derivative of order 2, 90
Rd , 37 derivative of order 2 at a point for functions on
limx→... f (x) = . . . in R, 64, 117 R, 90
n times differentiable function, 90 derivative of order n, 90
nth derivative, 90 derivative of order n at a point for functions
on R, 90
absolute complement, 114 differentiable at a point for functions on R, 89
absolute convergence test, 124 differentiable function, 89
and, 6 differentiable function on a subset, 89
antiderivative, 110 direct proof, 13
axioms, 8 Dirichlet’s test, 22, 128
bijective, 116 discontinuous at a point for functions on R, 65
binary, 139 discontinuous at a point for functions on Rd ,
Bolzano–Weierstraß in R, 35 66
Bolzano–Weierstraß in Rd , 46 discontinuous function, 66
bound variable, 8 distance, 39
boundary, 50 divergent sequence in R, 25
bounded function, 75 divergent sequence in Rd , 42
bounded sequence in R, 34 divergent series, 122
bounded sequence in Rd , 44 domain, 115
bounded set, 60 endpoints of an interval, 114
Cartesian product, 115 equivalent, 6
Cauchy’s mean value theorem, 95 Euclidean distance, 39
chain rule for derivatives, 126 Euclidean norm, 38
closed interval, 114 everywhere continuous, nowhere differentiable
closed set, 59 function — Van der Waerden, 92
codomain, 115 everywhere continuous, nowhere differentiable
complement, 114 function — Weierstraß, 92
continuous at a point for functions on R, 65 extended real number line, 116
continuous at a point for functions on Rd , 65
first derivative, 89
continuous function, 66
first fundamental theorem of calculus, 109
continuous function on a subset, 66
fixpoint, 87
contrapositive, 13
frontier, 50
convergent sequence, 25
function, 115
convergent sequence in Rd , 42
function series, 79
convergent series, 122
coordinates, 37 geometric series, 123
corollary, 12 global maximum, 116
counterexample, 13 global minimum, 116
curve, 141
half-closed interval, 114
Darboux’s theorem, 98 half-open interval, 114
definition, 10 half-plane in R2 , 49
degenerate interval, 114 half-space in Rd , 61
derivative, 89 hypersphere, 61
derivative of order 1, 90

if . . . then, 6 open interval, 114
if and only if, 6 open set, 55
iff, 6 or, 6
image, 115
implies, 6 partial sum, 122
improper Riemann integral, 105 partial sum function, 79
injective, 116 partition of an interval, 100
integral, 105 path, 141
interior of a subset of Rd , 53 pointwise convergence, 76
interior point, 53 pointwise convergent function series, 79
intermediate value theorem, 86 pre-image, 115
interval, 114 product rule for derivatives, 125
product rule for sequences, 30
L’Hôpital’s rule for one-sided limits, 134 proof, 12
L’Hôpital’s rule for two-sided limits, 136 proof by contradiction, 13
left-hand endpoint of an interval, 114 proof by induction, 13
lemma, 11 proposition, 12
local maximum, 116
local minimum, 116 quantifiers, 7
lower integral, 103 quotient rule for derivatives, 125
lower sum, 100 quotient rule for sequences, 30

maximum, 116 range, 115


maximum and minimum theorem for continu- real-valued function, 115
ous real-valued functions, 74 refinement of a partition, 101
Mean value theorem, 95 regularity, 64
minimum, 116 relative complement, 115
monotone function, 84 restriction of a function, 115
monotone sequence, 28 reverse triangle inequality for the Euclidean
monotone sequence theorem, 28 distance, 40
monotonic function, 84 reverse triangle inequality for the Euclidean
monotonic sequence, 28 norm, 39
mutatis mutandis, 43 Riemann integrable function, 104
Riemann integral, 104
nested closed and bounded sets theorem, 62 Riemann lower integral, 103
nested interval theorem, 62 Riemann lower sum, 100
non-decreasing function, 84 Riemann upper integral, 103
non-decreasing sequence, 28 Riemann upper sum, 100
non-increasing function, 84 right-hand endpoint of an interval, 114
non-increasing sequence, 28 Rolle’s theorem, 94
non-Riemann integrable function, 106
nondegenerate interval, 114 sandwich theorem, 30
norm, 38 Schoenberg curve, 141
not, 6 second derivative, 90
second fundamental theorem of calculus, 110
one-one, 116 sequence in R, 24
one-to-one, 116 sequence in Rd , 41
onto, 116 sequence of functions, 76
open ball, 49 sequentially compact, 62
open half-plane in R2, 55

set, 15
set difference, 115
space-filling curve, 140
strictly decreasing function, 84
strictly decreasing sequence, 28
strictly increasing function, 83
strictly increasing sequence, 28
strictly monotone function, 84
strictly monotone sequence, 28
strictly monotonic function, 84
strictly monotonic sequence, 28
subsequence of a sequence in R, 33
subsequence of a sequence in Rd , 44
sum rule for derivatives, 125
sum rule for sequences, 30
surjective, 116

taxicab distance, 37
tends to +∞, 27
tends to −∞, 27
theorem, 11
triangle inequality for the Euclidean distance,
40
triangle inequality for the Euclidean norm, 39
twice differentiable function, 90

unbound variable, 8
unbounded subset of Rd , 61
uniform convergence, 77
uniformly convergent function series, 80
upper integral, 103
upper sum, 100

vertices of a partition, 100

Weierstraß M -test, 80
