Foundations of Calculus
John Hutchinson
email: [email protected]
Contents
Introduction
Chapter 2. Sequences
2.1. Examples and Notation
2.2. Convergence of Sequences
2.3. Properties of Sequence Limits
2.4. Proofs of limit properties
2.5. More results on sequences
2.6. Bolzano Weierstrass Theorem
2.7. ★Cauchy Sequences
In these notes you will study the real number system, the concepts of
limit and continuity, differentiability and integrability, and differential equa-
tions. While most of these terms will be familiar from high school in a more
or less informal setting, you will study them in a much more precise way.
This is necessary both for applications and as a basis for generalising these
concepts to other mathematical settings.
One important question to be investigated in the last chapter is: when
do certain types of differential equations have a solution, when is there ex-
actly one solution, and when is there more than one solution? The solution
of this problem uses almost all the earlier material. The study of differential
equations is of tremendous importance in mathematics and for its applica-
tions. Any phenomenon that changes with position and/or time is usually
represented by one or more differential equations.
The ideas developed here are basic to further developments in mathe-
matics. The concepts generalise in many ways, such as to functions of more
than one variable and to functions whose variables are themselves functions
(!); all these generalisations are fundamental to further applications.
At the end of the first semester you should have a much better under-
standing of all these ideas.
These notes are intended so that you can concentrate on the relevant
lectures rather than trying to write everything down. There may occasion-
ally be lecture material on this part of the course which is not mentioned in
the notes here, in which case that will be indicated.
There are quite a few footnotes. This can be annoying. You should
read the footnotes when you initially study the material. But after you have
noted and understood the footnotes, you do not need to reread them every
time. Instead, you should concentrate on the main ideas in the main body
of the notes.
References are to the seventh edition of the text Calculus by Adams,
and occasionally to the book Calculus by Michael Spivak.
Why go to the lectures? Because the notes are frequently rather formal
(this is a consequence of the precision of mathematics) and it is often very
difficult to see the underlying concepts. In the lectures the material is ex-
plained in a less formal manner, the key and underlying ideas are singled
out and discussed, and generally the subject is explained and discussed in a
manner which it is not possible to do efficiently in print. It would be a very
big mistake to skip lectures.
Do not think that you have covered any of this material in school; the
topics may not appear new, but the material certainly will be. Do the
assignments, read the lecture notes before class. Mathematics is not a body
of isolated facts; each lecture will depend on certain previous material and
you will understand the lectures much better if you keep up with the course.
In the end this approach will be more efficient as you will gain more from
the lectures.
Throughout these notes I make various digressions and additional re-
marks, marked clearly by a star ★. This material is generally non-examinable
and is (even) more challenging material. But you should still read and think
about it. It is included to put the subject in a broader perspective, to pro-
vide an overview, to indicate further directions, and to generally “round
out” the subject. In addition, studying this more advanced material will
help your understanding of the examinable material.
There are a number of places where I ask why? Don’t just convince
yourself informally that it is indeed so. Write down a careful proof and then
copy it into the margin of these notes.
Some of the proofs of theorems are quite tricky, certainly at first. In this
case just try to understand what the theorem is saying, think about some
examples, and think why the various hypotheses are necessary, and think
about how they are used in the proof. There are examples which discuss
some of these points before or after some of the more difficult theorems.
Studying mathematics is not like reading a book in other subjects. It
may take many hours to understand just one sentence or one paragraph.
When you get stuck, it will often help to eventually continue on, and then
later come back to the difficult points. Also, ask your tutor, your lecturer,
a fellow student, or an assistant in the mathematics “drop in” centre. Do
not let things slide!
The study of Mathematics is not easy, but it is challenging, rewarding
and enjoyable.
CHAPTER 1
The Real Numbers
Let a = .999 . . . . Then
10a = 9.999 . . . .
Subtracting gives 9a = 9, and hence a = 1.
The only way a real number can have two decimal expansions is for it
to be of the form
.a1 a2 . . . an−1 an = .a1 a2 . . . an−1 (an − 1)999 . . . .
For example,
.2356 = .2355999 . . . .
1.1.4. Density of the rationals and the irrationals. We claim that
between any two real numbers there is a rational number (in fact infinitely
many of them) and an irrational number (in fact infinitely many).
To see this, first suppose 0 < a < b.
Choose an integer n such that 1/n < b − a. Then at least one member m/n
of the sequence
1/n, 2/n, 3/n, 4/n, 5/n, . . .
will lie between a and b. To see this, take the first integer m such that
a < m/n. It follows that m/n < b. Why?1
Since we can similarly obtain another rational between m/n and b, and yet
another rational between this rational and b, etc., etc., there is in fact an
infinite number of rationals between a and b.
1First try to understand this geometrically. Then write out an algebraic proof, which
should only be a couple of lines long!
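For instance, take a = 0.7 and b = 0.75, so b − a = 0.05. Any n with
1/n < 0.05 will do, say n = 21; the first integer m with 0.7 < m/21 is
m = 15, and indeed 0.7 < 15/21 ≈ 0.714 < 0.75.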
1.2.1. Algebraic and Order Axioms. The real number system con-
sists of the real numbers, together with the two operations addition (denoted
by +) and multiplication (denoted by ×) and the less than relation (denoted
by <). One also singles out two particular real numbers, zero or 0 and one
or 1.
If a and b are real numbers, then so are a + b and a × b. We say that
the real numbers are closed under addition and multiplication. We usually
write
ab for a × b.
For any two real numbers a and b, the statement a < b is either true or
false.
We will soon see that one can define subtraction and division in terms
of + and ×. Moreover, ≤, > etc. can be defined from <.
There are a number of points that need to be made at this stage, before
we proceed to discuss the consequences of these axioms.
• By the symbol “=” for equality we mean “denotes the same thing
as”, or equivalently, “represents the same real number as”. We take
“=” to be a logical notion and do not write down axioms for it.3
Instead, we use any properties of “=” which follow from its logical
meaning. For example: a = a; if a = b then b = a; if a = b and
b = c then a = c; if a = b and something is true of a then it is also
true of b (since a and b denote the same real number!).
When we write a ≠ b, we just mean that a does not denote the
same real number as b.
• We are not really using subtraction in the algebraic Axiom 4; we
are merely asserting that a real number, with a certain property,
exists. It is convenient to denote this number by −a. A similar
remark applies for Axiom 8.
• The assertion 0 ≠ 1 in Axiom 7 may seem silly. But it does not
follow from the other axioms, since all the other axioms hold for
the set containing just the number 0.
• Parts of some of the axioms are redundant. For example, from
Axiom 1 and the property a + 0 = a it follows that 0 + a = a.
Similar comments apply to Axiom 4; and because of Axiom 5 to
Axioms 7 and 8.
3★One can write down basic properties, i.e. axioms, for “=” and the logic we use.
See later courses on the foundations of mathematics.
Proof. Assume
a + c = b + c.
Since a + c and b + c denote the same real number, we obtain the same result
if we add −c to both; i.e.
(a + c) + (−c) = (b + c) + (−c).
(This used the existence of the number −c from Axiom 4.) Hence
a + (c + (−c)) = b + (c + (−c))
from Axiom 2 applied twice, once to each side of the equation. Hence
a + 0 = b + 0, and so a = b by Axiom 3.
4A similar remark is not true for subtraction, since (a − b) − c and a − (b − c) are in
general not equal.
1. Write out your own proof, following the ideas of the proof of the similar
result for addition.
2. The trick here is to use the fact 0 + 0 = 0 (from A3), together with the
distributive axiom. The proof is as follows:
One has a(0 + 0) = a0
But the left side equals a0 + a0
and the right side equals 0 + a0
Hence a0 + a0 = 0 + a0
Hence a0 = 0.
4. Write out your own proof, along similar lines to the preceding proof.
You should first prove a cancellation theorem for multiplication.
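One possible sketch of such a cancellation theorem (if ac = bc and c ≠ 0
then a = b): by Axiom 8 there is a number c−1 with cc−1 = 1, so
(ac)c−1 = (bc)c−1
a(cc−1 ) = b(cc−1 ) by associativity of multiplication
a · 1 = b · 1
and hence a = b.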
5For example, if we prove that some statement P implies another statement Q, and
if we also prove that P is true, then it follows from rules of logic that Q is true.
6Since a can represent any number in Axiom 4, we can replace a in Axiom 4 by −a.
This might seem strange at first, but it is quite legitimate.
6.
a(−b) = a((−1)b)
= (a(−1))b
= ((−1)a)b
= (−1)(ab)
= −(ab)
Prove the second equality yourself.
8.
(−a)(−b) = ((−1)a)(−b)
= (−1)(a(−b))
= −(a(−b))
= −(−(ab))
= ab
In other words, we want to show that if cd = 0 then either c = 0 or d = 0
(possibly both).
The argument is written out as follows:
Claim: If c ≠ 0 and d ≠ 0 then cd ≠ 0.
We will establish the claim by proving that if cd = 0 then c = 0 or
d = 0.8
There are two possibilities concerning c:
either c = 0, in which case we are done,
or c ≠ 0. But in this case, since cd = 0, it follows
c−1 (cd) = c−1 0 and so
d = 0
(why?; fill in the steps).
10. Exercise
HINT: We want to prove
ac−1 + bd−1 = (ad + bc)(cd)−1 .
First prove that
(ac−1 + bd−1 )(cd) = ad + bc.
Then deduce the required result.
8Note; in mathematics, if we say P or Q (is true) then we always include the possibility
that both P and Q are true.
1.2.6. ★Fields.
The real numbers and the rationals, as well as the integers
modulo a fixed prime number, form a field.
Any set S, together with two operations ⊕ and ⊗ and two members 0⊕
and 1⊗ of S, which satisfies the corresponding versions of Axioms 1–9, is
called a field.
Thus R (together with the operations of addition and multiplication and
the two real numbers 0 and 1) is a field. The same is true for Q, but not for
Z since Axiom 8 does not hold, why?.
An interesting example of a field is the set
Fp = {0, 1, . . . , p − 1}
for any fixed prime p, together with addition and multiplication defined
“modulo p”; i.e. one performs the usual operations of addition and multi-
plication, but then takes the “remainder” after dividing by p.
Thus for p = 5 one has:
⊕ 0 1 2 3 4 ⊗ 0 1 2 3 4
0 0 1 2 3 4 0 0 0 0 0 0
1 1 2 3 4 0 1 0 1 2 3 4
2 2 3 4 0 1 2 0 2 4 1 3
3 3 4 0 1 2 3 0 3 1 4 2
4 4 0 1 2 3 4 0 4 3 2 1
It is not too hard to convince yourself that the analogues of Axioms 1–9
hold for any prime p. The axiom which fails if p is not prime is Axiom
8, why?. Note that since Fp is a field, we can solve simultaneous linear
equations in Fp .
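For example, to solve 3x = 2 in F5 : since 3 ⊗ 2 = 1, the inverse of 3 is 2,
so x = 2 ⊗ 2 = 4; and indeed 3 ⊗ 4 = 12 = 2 (mod 5), in agreement with
the table.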
The fields Fp are very important in coding theory and cryptography.
Thus we have:
Axiom 14 (Completeness Axiom): If a non-empty set A has an upper
bound then it has a least upper bound.
(Remember that when we say A “has” an upper bound or a least upper
bound x we do not require that x ∈ A.)
See Adams page A22, Example 1.
1.3.2. An equivalent formulation. There is an equivalent form of
the axiom, which says: If A is any non-empty set of real numbers with the
property that there is some real number x such that x ≤ a for every a ∈ A,
then there is a largest real number x with this same property. In other words
if a non-empty set A has a lower bound then it has a greatest lower bound.
It is not too hard to see that this form does indeed follow from the
Completeness Axiom. The trick is to consider, instead of A, the set
A∗ := { −x : x ∈ A },
which is obtained by “reflecting” A about 0.
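Indeed, x is a lower bound for A exactly when −x is an upper bound for
A∗ . So if A has a lower bound then A∗ has an upper bound, and hence a
least upper bound b by the Completeness Axiom; it is then easy to check
that −b is the greatest lower bound of A.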
Proof of claim. Since √2 is not rational it cannot be the required
rational number b.
On the other hand, if b < √2, since there is always a rational number
between b and √2, this gives a member of A between b and √2, and so b
cannot be an upper bound.
Finally, if b > √2, there is always a smaller rational number between
√2 and b, and so b cannot be the least rational number which is an upper
bound for A.
We have ruled out the three possibilities b = √2, b < √2 and b > √2.
This completes the proof of the claim. Hence there is no rational number
which is a least upper bound for A.
★ Proof. Suppose that the theorem were false. Then there would be
a real number x with the property that n < x for all n ∈ N. This implies N
is bounded above and so there must be a least upper bound b (say) by the
Completeness Axiom.
In other words,
n ≤ b for every n ∈ N.
It follows that
n + 1 ≤ b for every n ∈ N,
since n + 1 ∈ N if n ∈ N. But this implies
n ≤ b − 1 for every n ∈ N.
In other words, b − 1 is also an upper bound for N, which contradicts the
fact that b is the least upper bound.
Since we have obtained a contradiction by assuming the statement of the
theorem is false, the statement must in fact be true.
The only surprising thing about the Archimedean property is that it
needs the Completeness Axiom to prove it. But there are in fact models of
the algebraic and order axioms in which the Archimedean property is false.
They are sometimes called the Hyperreals! See the next starred section.
The following corollary says that between zero and any positive number
(no matter how small) there is always a number of the form 1/n, where
n ∈ N. This is the same type of result as in Section 1.1.4, which stated that
between any two different real numbers there is always a rational number.
But in Section 1.1.4 we were not very careful, and did not go back to the
axioms to prove the result (as we actually do need to!).
The symbol ε in the following is called “epsilon” and is a letter of the
Greek alphabet. You could replace ε by any other symbol such as x or a,
and the Corollary would have exactly the same meaning. However, it is
traditional in mathematics to use ε when we are thinking of a very small
positive number.
Corollary 1.3.3. For any real number ε > 0 there is a natural number
n such that 1/n < ε.
You will not use the hyperreals. You certainly should not refer to them
in any of your proofs.
1.4. Sets
The notion of a set is basic in mathematics. We will not need to study
the theory of sets, but we will need to know some notation and a few basic
properties.
1.4.1. Notation for sets. By a set (sometimes called a class or family)
we mean a collection, often infinite, of (usually mathematical) objects.9
Members of a set are often called elements of the set. If a is a member
(i.e. element) of the set S, we write
a ∈ S.
If a is not a member of S we write
a 6∈ S.
In particular, we frequently prove two sets A and B are equal by first proving
that every member of A is a member of B (i.e. A ⊆ B) and then proving
that every member of B is a member of A (i.e. B ⊆ A).
For example, using the standard notation for intervals of real numbers
in Section ??,
{ x | 0 < x < 2 and 1 ≤ x ≤ 3 } = (0, 2) ∩ [1, 3] = [1, 2),
{ x | 0 < x < 1 or 2 < x ≤ 3 } = (0, 1) ∪ (2, 3].
Also
(0, 2) = (0, 1) ∪ [1, 2) = (0, 1] ∪ [1, 2) = (0, 1) ∪ (1/2, 2),
etc.
1.4.2. Ordered pairs of real numbers. In the Linear Algebra course
you use both the column-vector notation, with a written above b, and the
notation (a, b) to represent vectors in R², which we also regard as ordered
pairs, or 2-tuples, of real numbers. Of course, (a, b) and (b, a) are distinct,
unless a = b. This is different from the situation for
the set containing a and b; i.e. {a, b} and {b, a} are just different ways of
describing the same set.
We also have ordered triples (a, b, c), and more generally ordered n-tuples
(a1 , . . . , an ), of real numbers.
Remark 1.4.1. ★ It is sometimes useful to know that we can define
ordered pairs in terms of sets. The only property we require of ordered pairs
is that
(3) (a, b) = (c, d) iff (a = c and b = d).
We could not define (a, b) = {a, b}, because we would not be able to
distinguish between (a, b) and (b, a). But there are a number of ways that
we can define ordered pairs in terms of sets. The standard definition is
(a, b) := {{a}, {a, b}}.
To show this is a good definition, we need to prove (3).
Proof. It is immediate from the definition that if a = c and b = d then
(a, b) = (c, d).
Next suppose (a, b) = (c, d), i.e. {{a}, {a, b}} = {{c}, {c, d}}. We con-
sider the two cases a = b and a ≠ b separately.
If a = b then {{a}, {a, b}} contains exactly one member, namely {a},
and so {{c}, {c, d}} also contains exactly the one member {a}. This means
{a} = {c} = {c, d}. Hence a = c and c = d. In conclusion, a = b = c = d.
If a ≠ b then {{a}, {a, b}} contains exactly two (distinct) members,
namely {a} and {a, b}. Since {{a}, {a, b}} = {{c}, {c, d}} it follows {c} ∈
{{a}, {a, b}} and so {c} = {a} or {c} = {a, b}. The second equality cannot
be true since {a, b} contains two members whereas {c} contains one member,
and so {c} = {a}, and so c = a.
Since also {c, d} ∈ {{a}, {a, b}} it now follows that {c, d} = {a, b} (oth-
erwise {c, d} = {a}, but since also {c} = {a} this would imply {{c}, {c, d}}
and hence {{a}, {a, b}} has only one member, and we have seen this is not
so). Since a and b are distinct and {c, d} = {a, b}, it follows c and d are
distinct; since a = c it then follows b = d. In conclusion, a = c and b = d.
This completes the proof.
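For example, (1, 2) = {{1}, {1, 2}} while (2, 1) = {{2}, {1, 2}}, and these
are different sets, as required.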
10★ In a manner that can be made precise, and will be in later courses, it is possible to
define a notion of countably infinite and uncountably infinite. The integers, the rationals,
and the set of possible rules, are all countably infinite. The reals, the irrationals and the
set [0, 1], are uncountably infinite. The set of functions from R into R, or even from [0, 1]
into [0, 1], are both uncountably infinite. In fact, these sets of functions can be shown to
be, in a precise manner, a larger infinity than the infinity of real numbers.
CHAPTER 2
Sequences
The reference here is Adams Section 9.1, and the material on pages A-22
and A-23, but we do considerably more. Another reference is Calculus by
M. Spivak.
See Adams page 498, Figure 9.1, for a graphical illustration of limit.
Example 2.2.2. Show that the sequence given by an = 1 + 1/n² converges
to 1 according to the definition.
Solution. Let ε > 0 be given.
We want to find N such that (4) is true with a = 1.
We have
|an − 1| = 1/n².
Since
1/n² < ε if n² > 1/ε,
i.e.
if n > 1/√ε,
we can take
N = [1/√ε],
or any larger integer, where [ ] denotes “the integer part of”.
Thus if ε = .1 we can take any integer N > 1/√.1, for example N = 4
(or anything larger). If ε = .01 we can take N = 10 (or anything larger).
If ε = .001 we can take N = 32 (or anything larger). But the above proof
works of course for any ε > 0.¹
1* In the example we took N = [1/√ε], the integer part of 1/√ε, or equivalently the
smallest integer greater than 1/√ε − 1.
The general statement
∀z > 0 ∃N ∈ N (N > z),
is just the Archimedean Property, which follows from the Completeness Axiom as we saw
before. Thus we are usually using the Archimedean Property when we prove the existence
of limits.
A similar remark applies to the following examples, but we will not usually explicitly
state this fact.
We will give the proofs in the next section. The theorem is more justi-
fication that Definition 2.2.1 does indeed capture the informal notion of a
limit.
The results are not very surprising. For example, if an is getting close
to a and bn is getting close to b then we expect that an + bn is getting close
to a + b.
Example 2.3.2. Let an = (1 + 1/√n)² − (1 + 2⁻ⁿ).
We can prove directly from the definition of convergence that 1/√n → 0
and 2⁻ⁿ → 0. It then follows from the previous theorem that 1 + 1/√n → 1
(since we can think of 1 + 1/√n as obtained by adding the term 1 from
the constant sequence (1) to the term 1/√n). Applying the theorem again,
(1 + 1/√n)² → 1. Similarly, 1 + 2⁻ⁿ → 1.
Hence (again from the theorem) an → 1 − 1 = 0.
Example 2.3.3. Let an = (2n² − 1)/(3n² − 7n + 1).
Write
(2n² − 1)/(3n² − 7n + 1) = (2 − 1/n²)/(3 − 7/n + 1/n²).
Since the numerator and denominator converge to 2 and 3 respectively, it
follows an → 2/3.
See also Adams, page 499, Example 6.
Before we prove Theorem 2.3.1 there is a technical point. We should
prove that a convergent sequence cannot have two different limits. This is
an easy consequence of the definition of convergence.
This is done in the next section, but try it yourself first of all.
Theorem 2.3.4. If (an ) is a convergent sequence such that an → a and
an → b then a = b.
The next easy result is useful in a number of situations.
Theorem 2.3.5. Suppose an → a. Then the sequence is bounded; i.e.
there is a real number M such that |an | ≤ M for all n.
The next theorem is not true if we replace both occurrences of “≤” by
“<”. For example −1/n < 1/n for all n, but the sequences (1/n) and (−1/n)
have the same limit 0.
Theorem 2.3.6. Suppose an → a, bn → b, and an ≤ bn ultimately.
Then a ≤ b.
The following theorem says that if a sequence is “squeezed” between two
sequences which both converge to the same limit, then the original sequence
also converges, and it converges to the same limit.
Theorem 2.3.7 (Squeeze Theorem). Suppose an ≤ bn ≤ cn ultimately.
Suppose an → L and cn → L. Then bn → L.
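For example, since −1/n ≤ (sin n)/n ≤ 1/n for all n ≥ 1, and ±1/n → 0,
the Squeeze Theorem gives (sin n)/n → 0.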
Notice in the proof how the definition of a limit is used three times; once
to get information from the fact an → a, once to get information from the
fact bn → b, and finally to deduce that an + bn → a + b.
By the way, why do we use ε/2 in (14) and (15), and why is this justifiable
by Definition 2.2.1?
Proof. (Adams gives a proof (see Theorem 3 section 9.1 page 501)
which uses continuity and properties of ln for the first part — here is another
proof that does not use this.)
Since |x| < 1 the sequence |x|ⁿ is decreasing10 and all terms are ≥ 0.
Hence |x|ⁿ → a (say) by Theorem 2.5.1.
Since |x|ⁿ → a, also |x|ⁿ⁺¹ → a (Why? ). But |x|ⁿ⁺¹ = |x| |x|ⁿ → |x| a.
Hence a = |x| a by uniqueness of limits, and so a = 0 as |x| ≠ 1.
6The statement “possibly depending on ε” is redundant. We include it here for
emphasis, but normally would not include it. Whenever we introduce a constant ε and
then say there exists a k such that “blah blah involving k and ε is true” we always mean
that k may, and indeed it almost always does, depend on ε.
7In fact there are infinitely many such k, as follows from the decreasing property
which we next use. But at this point we are just using the properties of greatest lower
bound, and so just get the existence of one k.
8It follows easily from the definition of convergence that if an → a, then also an+1 → a
(Exercise). This is frequently a useful fact.
9This can either be proved from the definition of a limit (Exercise). Later it will
follow easily from the fact that the function f given by f (x) = √(6 + x) is continuous.
10We could prove this by induction, but that is not really required at this level as it
is routine and assumed you can give a proof if asked.
Remark 2.6.3. The conclusion of the theorem is not true for the non-
bounded interval [1, ∞) or for the nonclosed interval [0, 1). In the first case
consider the sequence 1, 2, 3, . . . , n, . . . and in the second consider the se-
quence 1 − 1/2, 1 − 1/3, . . . , 1 − 1/n, . . . .
Where does the following proof break down in each of these two cases?
Proof of Theorem. Divide the interval [a, b] into two closed bounded
intervals [a, (a + b)/2] and [(a + b)/2, b] each of equal length and with the
common endpoint (a + b)/2. At least one of these two subintervals contains
an (infinite)13 subsequence of the original sequence (cn ). Choose one such
subinterval and denote it by [a1 , b1 ].
Similarly subdivide [a1 , b1 ] and chose a subinterval which contains an
(infinite) subsequence of the infinite subsequence in [a1 , b1 ]. Denote this
interval by [a2 , b2 ].
Similarly subdivide [a2 , b2 ] to obtain [a3 , b3 ] which contains a subse-
quence of the subsequence of the original sequence. Etc., etc.
Now define a convergent subsequence (xn ) from the original sequence
(cn ) as follows. First choose x1 to be any element from the (infinite) subse-
quence corresponding to [a1 , b1 ]. Next choose some x2 from the subsequence
corresponding to [a2 , b2 ] which occurs in the sequence (cn ) after x1 . (Why is
this possible? ) Next choose some x3 from the subsequence corresponding to
[a3 , b3 ] which occurs in the sequence (cn ) after x2 . (Why is this possible? )
Etc., etc.
We now have
a1 ≤ b1 , a1 ≤ a2 ≤ b2 ≤ b1 , a1 ≤ a2 ≤ a3 ≤ b3 ≤ b2 ≤ b1 , ... ,
x1 ∈ [a1 , b1 ], x2 ∈ [a2 , b2 ], x3 ∈ [a3 , b3 ], ... ,
12A bounded interval is one for which there is both an upper and lower bound, not
necessarily in the interval. In particular, (0, 1), (0, 1] and [0, 1] are all bounded. However,
(−∞, 0], and [1, ∞) are not bounded.
A closed interval is one which contains all of its “finite” endpoints. Thus [0, ∞) and
[0, 1] are both closed. The only closed bounded intervals are those of the form [a, b], where
a and b are both real numbers (and a < b in cases of interest!).
13We use the word “infinite” only for emphasis. By our definition, any sequence is
infinite in the sense it contains an infinite number of terms. Of course, some or even all
of the terms may be equal. Consider the sequence 1, 1, 1, . . . , 1, . . . .
Remark 2.7.3. The idea behind the second part of the above proof,
from “By the Bolzano Weierstrass ...” onwards, is as follows:
(1) if we go out far enough in the original sequence (an ) then every two
elements of the sequence are within ε/2 of one another;
(2) if we go out far enough in the subsequence then every element is
within ε/2 of a.
Putting these two facts together, if we go out far enough in the original
sequence then every element is within ε of a.
Theorem 2.7.4. Suppose (an ) is a sequence with |an+1 − an | ≤ Krⁿ for all n,
where K is a positive real and 0 ≤ r < 1. Then the sequence is Cauchy and
hence converges.
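For instance, suppose a1 = 0 and an+1 = 1 + an /2 for all n. Then
|an+2 − an+1 | = |an+1 − an |/2, so |an+1 − an | = (1/2)ⁿ⁻¹ and the hypothesis
holds with K = 2 and r = 1/2. Hence the sequence converges; since
an+1 = 1 + an /2, its limit L satisfies L = 1 + L/2, i.e. L = 2.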
CHAPTER 3
Continuous Functions
Proof. Let
f (x) = a0 + a1 x + a2 x² + · · · + ak xᵏ .
To show that f is continuous at some point c, suppose xn → c.
Then xn² → c², xn³ → c³, etc., by the theorem about products of con-
vergent sequences. It follows that a1 xn → a1 c, a2 xn² → a2 c², a3 xn³ → a3 c³,
etc., by the theorem about multiplying a convergent sequence by a constant.
Finally,
a0 + a1 xn + a2 xn² + · · · + ak xnᵏ → a0 + a1 c + a2 c² + · · · + ak cᵏ
by repeated applications of the theorem about sums of convergent sequences
(a0 is here regarded as a constant sequence).
The following diagram is misleading, since between any two real numbers
there is both a rational and an irrational number.
The function f is continuous at 0. To see this, suppose xn → 0. Then
|xn | → 0 (this follows from the definition of a limit). Since −|xn | ≤ f (xn ) ≤
|xn |, it follows from the Squeeze Theorem that f (xn ) → 0, i.e. f (xn ) → f (0).
On the other hand, f is not continuous at c if c ≠ 0. For example if c is
irrational then we can choose a sequence of rationals xn such that xn → c
(by repeated applications of the remark above in italics). It follows that
f (xn ) = xn → c ≠ f (c). Similarly if c is rational (and c ≠ 0).
We will later define the exponential, logarithm, and trigonometric func-
tions, and show they are continuous. Meanwhile, we will use them in exam-
ples (but not in the development of the theory).
The proof in the other cases is similar. Just note for the case f /g that
if xn → c and g(c) ≠ 0, then g(xn ) ≠ 0 for all sufficiently large n.6
The composition of two continuous functions is continuous. (See Adams
page 35 for a discussion about the composition of functions.)
Theorem 3.3.2. Suppose f and g are continuous. Then f ◦ g is contin-
uous.
8We define (xn , yn ) → (u, v) to mean that |(xn , yn ) − (u, v)| → 0. This can be easily
shown to be equivalent to xn → u and yn → v. Exercise.
CHAPTER 4
Limits
Example 4.1.3. For the two examples at the beginning of this Chapter
it follows from Definition 4.1.1 that limx→1 f (x) = 2. Why? In fact we have
in both cases that limx→a f (x) = 2a for every real number a. Why?
Other simple examples are
f (x) = 2x for x ∈ D(f ) = [0, 1], f (x) = 2x for x ∈ D(f ) = [0, 1).
In both cases limx→1 f (x) = 2.
Finally, if
f (x) = 2x for x ∈ D(f ) = [0, 1] ∪ {3},
then limx→3 f (x) is not defined, why? In particular, limx→3 f (x) = 6 is not
a true statement, since there is no limit of f at 3.
Proof. (We just do the quotient. The other cases are similar and
slightly easier. Do them yourself ! )
Suppose xn → a where (xn )n≥1 is any sequence such that for all n,
xn ∈ D(f ) ∩ D(g).
Then f (xn ) → L and g(xn ) → M . (Why? ) Moreover, since M ≠ 0 it
follows that g(xn ) ≠ 0 for all sufficiently large n (why? ) and so f (xn )/g(xn )
is a real number for such n.
Since M ≠ 0 it follows from Theorem 2.3.1 that f (xn )/g(xn ) → L/M .
The fourth claim in the Theorem now follows from Definition 4.1.1.
1It follows that a is a limit point of both D(f ) and D(g). Why? The important and
simple case to keep in mind is where f and g have the same domain and this domain is
an interval.
Note that Adams does not need to consider the first paragraph in the
Proposition, since the type of domain he considers does not contain isolated
points.2
Proposition 4.3.1. Suppose f is a function and c ∈ D(f ). If c is not
a limit point of D(f ), i.e. c is an isolated point of D, then f is continuous
at c.
If c is a limit point of D(f ) then f is continuous at c iff limx→c f (x) =
f (c).
The idea of the definition is that for any given “tolerance” ε > 0, there
is a corresponding “tolerance” δ > 0, such that any x in the domain of f
within distance δ of a, and not equal to a, gives an output (or value) f (x)
which is within ε of L.
Inputs and Outputs Analogy. For any allowable x (i.e. x ∈ D(f )), input
x into the f machine gives output f (x).
Suppose there are allowable inputs arbitrarily close to a. Then lim_{x→a} f (x) =
L is equivalent to the following:
For every output tolerance ε > 0 for the deviation of f (x) from L, there
is a corresponding input tolerance δ > 0 (normally depending on ε) for the
deviation of x from a, such that whenever the input tolerance is satisfied by
x ≠ a, then the output tolerance is satisfied by f (x).
Proof. Almost exactly the same as for the proof of Theorem 4.4.2.
Write it out yourself to consolidate your understanding.
If for each ε > 0 there is a δ > 0 which works for all c ∈ D(f ) then we
say that f is uniformly continuous on its domain.
Definition 4.5.1. A function f is uniformly continuous on its domain if
for every ε > 0 there is a corresponding δ > 0 such that, for every c ∈ D(f ),
x ∈ D(f ) and |x − c| < δ =⇒ |f (x) − f (c)| < ε.
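For example, the function f given by f (x) = 2x for x ∈ R is uniformly
continuous on its domain: given ε > 0, take δ = ε/2; then for every c and
every x, |x − c| < δ implies |f (x) − f (c)| = 2|x − c| < ε. The point is that
the same δ works for every c.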
CHAPTER 5
Differentiation
5.1. Introduction
The theory of differentiation allows us to analyse the concept of the slope
of the tangent to the graph of a function. Similarly, it allows us to find the
best linear approximation to a function near a given point.
If we write y = f (x) then we can interpret f ′(a) in the following way: “for
x near a, y is changing approximately f ′(a) times as fast as x is changing”.
Alternatively, for x ≈ a (“x approximately equal to a”) we have
f (x) ≈ f (a) + f ′(a)(x − a).
See Adams Chapter 4.9, where this is made more precise.
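For instance, with f (x) = √x and a = 4 we have f ′(4) = 1/(2√4) = 1/4,
so for x near 4
√x ≈ 2 + (x − 4)/4 ;
in particular √4.1 ≈ 2 + 0.1/4 = 2.025, close to the true value 2.0248 . . . .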
There are many problems that can then be analysed using the ideas of
differentiation and extensions of these ideas. For example, anything that
changes with time or position, as well as optimisation problems (e.g. in
economics or engineering) and approximation problems. See Adams Chapter
3 for a number of examples.
1Thus the allowable domains are (a, b), [a, b], (a, b], [a, b), (the four bounded possibil-
ities); (a, ∞), [a, ∞), (−∞, b), (−∞, b], (−∞, ∞), (the five unbounded possibilities).
The symbol |ₐ is the evaluation symbol, and signifies that the function pre-
ceding it should be evaluated at a. If there is any doubt as to what is the
dependent variable, one replaces |ₐ by |ₓ₌ₐ.
The dy/dx type notation is called Leibniz notation after its inventor. It is
very good for computations and for motivating some results. If one thinks
of
∆y = f (x + h) − f (x)
as being the increment in y and
∆x = (x + h) − x = h
as being the increment in x, then
dy/dx = lim_{∆x→0} ∆y/∆x.
However, the Leibniz notation should not be used when proving theorems
rigorously. It is often ambiguous in more complicated situations, and this
can easily lead to logical errors. See the discussion before Theorem 5.3.4 for
a good example of what can go wrong.
See Adams pp 103-105 for more discussion of notation.
But
f (a + h) = f (a) + h · (f (a + h) − f (a))/h .
Taking the limit as h → 0 of the right side, we see this limit exists and hence
so does the limit of the left side, and both are equal. That is
lim_{h→0} f (a + h) = f (a) + 0 · f ′(a) = f (a).
A similar proof applies if a is an endpoint of the domain of f .
Proof.
lim_{h→0} (f (x + h) − f (x))/h = lim_{h→0} ((c(x + h) + d) − (cx + d))/h = lim_{h→0} ch/h = c.
The next theorem follows in a fairly straightforward way from the prop-
erties of limits given in Theorem 4.2.1.
Theorem 5.3.2. If f and g are differentiable at x and c is a real number,
then the following functions are differentiable at x with derivatives as shown.
(f ± g)′(x) = f ′(x) ± g ′(x)
(cf )′(x) = c f ′(x)
(f g)′(x) = f ′(x)g(x) + f (x)g ′(x)
(1/g)′(x) = −g ′(x)/(g(x))²
(f /g)′(x) = (f ′(x)g(x) − f (x)g ′(x))/(g(x))²
In the last two cases we also assume g(x) ≠ 0.
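For example, with f (x) = x² and g(x) = 1 + x (so g(x) ≠ 0 provided
x ≠ −1), the last formula gives
(x²/(1 + x))′ = (2x(1 + x) − x² · 1)/(1 + x)² = (x² + 2x)/(1 + x)².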
5.3.2. The Chain Rule. In order to compute the derivatives of func-
tions such as √(1 + x²) we need the Chain Rule. You have probably seen the
Chain Rule in the form
dy/dx = (dy/du)·(du/dx),
where u = g(x), y = f (u) and y = f (g(x)).
The informally stated motivation is that for a given value of x = a and
corresponding value u = g(a),
at x = a, u is changing du/dx times as fast as x,
and
at u = g(a), y is changing dy/du times as fast as u
(where du/dx is evaluated at x = a and dy/du is evaluated at u = g(a)). So that
at x = a, y is changing (dy/du) × (du/dx) times as fast as x.
In functional notation
(f ◦ g)′(a) = f ′(g(a)) g ′(a) or (f ◦ g)′(x) = f ′(g(x)) g ′(x).
An incorrect “proof” along these lines is often given for the chain rule
by writing
dy/dx = lim_{∆x→0} ∆y/∆x = lim_{∆x→0} (∆y/∆u)·(∆u/∆x) = lim_{∆x→0} ∆y/∆u · lim_{∆x→0} ∆u/∆x
= lim_{∆u→0} ∆y/∆u · lim_{∆x→0} ∆u/∆x = (dy/du)·(du/dx).
The second last step is “justified” by saying that ∆u → 0 as ∆x → 0. This
is all rather sloppy, because it is not clear what depends on what.
When one tries to fix it up, there arises a serious difficulty. Namely,
the increment ∆u = u(x + ∆x) − u(x) (which depends on ∆x) may be
zero although ∆x ≠ 0. A trivial example is if u is the constant function.
There is the same difficulty when u is not constant, but there are points
x + ∆x arbitrarily close to x such that u(x + ∆x) = u(x) (such as with
u(x) = x² sin(1/x) for x ≠ 0 — see Example 5.3.5).
This becomes clearer when we write out the argument in a more precise
functional notation. See Adams Q46 p119.
We now state the Chain Rule precisely, and refer to Adams for a (correct)
proof.
Theorem 5.3.4 (Chain Rule). Assume the function f is differentiable at
g(x) and the function g is differentiable at x. Then the composite function
f ◦ g is differentiable at x and
(f ◦ g)′(x) = f ′(g(x)) g ′(x).
(We also assume that g is defined in some open interval containing x and
f is defined in some open interval containing g(x), although this can be
generalised a bit.)
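For example, for √(1 + x²) take g(x) = 1 + x² and f (u) = √u, so
g ′(x) = 2x and f ′(u) = 1/(2√u). The Chain Rule gives
(√(1 + x²))′ = (1/(2√(1 + x²))) · 2x = x/√(1 + x²).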
Proof. See Adams p118. To help understand the proof, note that the
“error” function E(k) is the difference between the slope of the line through
the points (u, f (u)) and (u + k, f (u + k)) on the graph of f , and the slope
of the tangent to the graph of f at the point (u, f (u)). (Draw a diagram
like the first one in this chapter.)
Example 5.3.5.
The last limit follows easily from the Squeeze Theorem in Adams p69 applied
with ±x. (This Squeeze Theorem for limits follows easily from the Squeeze
Theorem 2.3.7 here, for sequences. Exercise.)
Thus f is differentiable for all x, but the derivative is not continuous at
0.
Remark 5.5.2. The proof used the Max-Min Theorem 3.4.1, which in
turn required the Completeness Axiom. Give an example that shows the
result would not be true if we did not assume the Completeness Axiom.
(The fact that our proof relied on the Completeness Axiom does not by
itself answer the question, why? )
HINT: See Remark 3.4.2.
Proof. Suppose x1 < x2 . (The proof is similar if x2 < x1 and the result
is trivial if x1 = x2 .)
By the Mean Value theorem there exists a number c between x1 and x2
such that
f (x1 ) − f (x2 ) = f ′(c)(x1 − x2 ),
and so
|f (x1 ) − f (x2 )| = |f ′(c)| |x1 − x2 | ≤ K |x1 − x2 |.
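For example, since the derivative of sin is cos and |cos c| ≤ 1 for every c,
we may take K = 1 and conclude that |sin x1 − sin x2 | ≤ |x1 − x2 | for all
x1 , x2 .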
Corollary 5.5.5. If f ′(x) = 0 for every x in an interval I, then f is
constant on I.
The corollary is not true if the domain of f is a finite union of more than
one interval. In this case the function is constant on each interval, but the
constant may depend on the interval.
A useful application of Corollary 5.5.5 is to prove that complicated ex-
pressions are equal. For example, to prove that f (x) = g(x) for all x in
some interval, it is sufficient to prove that the functions f and g are equal
at a single point c and that their derivatives are equal everywhere.
To see this apply the corollary to the function f (x)−g(x). The derivative
is zero and so the function is constant; but the constant is zero since f (c) −
g(c) = 0.
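For example, the functions sin²x + cos²x and the constant function 1 are
equal at x = 0, and both have derivative zero everywhere (the first has
derivative 2 sin x cos x − 2 cos x sin x = 0). Hence sin²x + cos²x = 1 for all x.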
The Mean Value Theorem leads to a result which enables us to decide
where a function is increasing or decreasing.
Definition 5.5.6. We say a function f is
increasing if x1 < x2 implies f (x1 ) < f (x2 ),
decreasing if x1 < x2 implies f (x1 ) > f (x2 ),
non-decreasing if x1 < x2 implies f (x1 ) ≤ f (x2 ),
non-increasing if x1 < x2 implies f (x1 ) ≥ f (x2 ).
Draw a diagram.
The partial derivative with respect to y at (x0 , y0 ) ∈ A is defined by
∂f /∂y (x0 , y0 ) = lim_{h→0} (f (x0 , y0 + h) − f (x0 , y0 ))/h .
Think of the line parallel to the y-axis through the point (x0 , y0 ), and
think of f as a function with domain restricted to this line, i.e. f is a function
of y with x fixed to be x0 . Draw a diagram. Then ∂f /∂y(x0 , y0 ) is just the
ordinary derivative with respect to y.
If we know that
|∂f /∂y (x, y)| ≤ K
at every point (x, y) in the open rectangle A, then it follows from Corol-
lary 5.5.4 that
|f (x, y1 ) − f (x, y2 )| ≤ K|y1 − y2 |
for every (x, y1 ), (x, y2 ) ∈ A.
We will use this in the proof of Theorem 7.3.1.
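For instance, if f (x, y) = x sin y on the open rectangle A = (−2, 2) × (0, 1),
then ∂f /∂y (x, y) = x cos y, so |∂f /∂y| ≤ 2 on A, and hence
|f (x, y1 ) − f (x, y2 )| ≤ 2|y1 − y2 | for every (x, y1 ), (x, y2 ) ∈ A.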
CHAPTER 6
Integration
The main references in Adams are Sections 5.1–5.5 and Appendix IV.
Integration allows us to find areas and volumes bounded by curves and surfaces.
It is rather surprising at first, but there is a close relationship between inte-
gration and differentiation; each is the inverse of the other. This is known as the
Fundamental Theorem of Calculus. It allows us to find areas by doing the reverse
of differentiation.
Integrals are also used to express lengths of curves, work, energy and force,
probabilities, and various quantities in economics, for example.
6.1. Introduction
The topic of this chapter is the concept of “area under a graph” in a quantitative
sense and the elucidation of some of its properties.
Everyone would be happy with the definition: “the area of a rectangle is the
product of its length and breadth”. The problem is more difficult with more com-
plicated plane figures. The circle, for example, has “area πr²”; but is this “area”
the same concept as that applied to rectangles?
In everyday life one often needs only an approximation to the area of, say, a
country or a field. If pressed one would calculate it approximately by filling it as
nearly as possible with rectangles and summing their area. This is very close to
what we do here in giving a precise definition of the concept of area.
might expect, though it agrees with our intuition in the case when f (x) ≥ 0 for all
x ∈ [a, b].
We begin by defining a partition of [a, b]; this is simply a finite set of points
from [a, b] which includes a and b. Thus P = {0, 1/4, 1} is a partition of [0, 1], and
P = {−1, − 1/2, − 1/4, 3/4, 1} is a partition of [−1, 1].
The general notation for a partition P of [a, b] is
P = {a = x0 , x1 , x2 , . . . , xn = b }.
We will assume always that a = x0 < x1 < . . . < xn = b. The ith subinterval is
[xi−1 , xi ] and its length is defined to be
∆xi := xi − xi−1 .
With each partition P of [a, b] we associate the so-called upper and lower sums.
To define these we need the following notation: write
Mi = max{ f (x) : xi−1 ≤ x ≤ xi }, mi = min{ f (x) : xi−1 ≤ x ≤ xi }, 1 ≤ i ≤ n.
That is, Mi is the maximum value and mi the minimum value of f on the ith
sub-interval [xi−1 , xi ] of the partition. These exist because f is continuous on the
closed bounded interval [xi−1 , xi ].
The upper sum of f over P is defined by
U (P, f ) = Σ_{i=1}^{n} Mi ∆xi ,
and the lower sum of f over P by
L(P, f ) = Σ_{i=1}^{n} mi ∆xi .
(See Adams p289 for a discussion of the summation notation.) Roughly speak-
ing L(P, f ) is the sum of the areas of all the rectangles whose bases are the sub-
intervals [xi−1 , xi ] and which just fit under the graph of f . Similarly U (P, f ) is the
sum of the areas of all the rectangles whose bases are the sub-intervals [xi−1 , xi ]
and which just contain the graph of f . At least this is the case when f (x) ≥ 0. In
other cases the interpretation is less simple. Various possibilities are illustrated in
the diagrams below.
Example 6.2.1. Let f (x) = 1 − 2x on [0, 1], and let P = {0, 1/4, 1/3, 2/3, 1}.
Find L(P, f ) and U (P, f ).
Here
M1 = 1 m1 = 1/2 ∆x1 = 1/4
M2 = 1/2 m2 = 1/3 ∆x2 = 1/12
M3 = 1/3 m3 = −1/3 ∆x3 = 1/3
M4 = −1/3 m4 = −1 ∆x4 = 1/3
and so
L(P, f ) = Σ_{i=1}^{4} mi ∆xi = (1/2)·(1/4) + (1/3)·(1/12) + (−1/3)·(1/3) + (−1)·(1/3) = −7/24
U (P, f ) = Σ_{i=1}^{4} Mi ∆xi = 1·(1/4) + (1/2)·(1/12) + (1/3)·(1/3) + (−1/3)·(1/3) = 7/24
Exercise 6.2.2. Let f (x) = cos x on [−π/2, π] and P = {−π/2, −π/4, 0, π/2, π}.
Show that
L(P, f ) = (1/√2)·(π/4) − π/2
and
U (P, f ) = (1/√2)·(π/4) + 3π/4 .
We now develop the properties of upper and lower sums that we need.
Lemma 6.2.3. Let f be a continuous function on [a, b] and P be a partition of
[a, b]. Then L(P, f ) ≤ U (P, f ).
Draw a diagram and you will see how obvious this result is.
The following lemma is obvious from a few diagrams. We need a proof that
does not rely on particular examples. This is straightforward.
Proof. We can get P2 from P1 by successively adding one new point at a time.
If therefore, we can show that adding one new point to a partition has the effect of
not decreasing the lower sum and not increasing the upper sum, we will be done.
In other words we might as well suppose that P2 is obtained from P1 by adding one
more point.
Suppose therefore that P1 = {a = x0 , x1 , x2 , . . . , xn = b} and that P2 =
P1 ∪ {x} 2 with x ∈ (xi−1 , xi ). Let Mj , mj (1 ≤ j ≤ n) be the maximum and
minimum values of f on [xj−1 , xj ]. Let M ′, m′ be the maximum and minimum
values for f on [xi−1 , x]; and M ′′, m′′ be the maximum and minimum values for f
on [x, xi ]. Note that
m′ ≥ mi , m′′ ≥ mi
and
M ′ ≤ Mi , M ′′ ≤ Mi
because when passing from an interval to a subinterval, the minimum value cannot
decrease and the maximum value cannot increase.
Then
L(P2 , f ) − L(P1 , f ) = m′ (x − xi−1 ) + m′′ (xi − x) − mi (xi − xi−1 )
≥ mi (x − xi−1 ) + mi (xi − x) − mi (xi − xi−1 )
= 0,
and
U (P1 , f ) − U (P2 , f ) = Mi (xi − xi−1 ) − M ′ (x − xi−1 ) − M ′′ (xi − x)
≥ Mi (xi − xi−1 ) − Mi (x − xi−1 ) − Mi (xi − x)
= 0.
That is
L(P1 , f ) ≤ L(P2 , f ) and U (P2 , f ) ≤ U (P1 , f ).
2This notation just means that P2 is the union of the set P1 and the set {x} containing
the single point x.
In words: refining a partition does not decrease lower sums and does not increase
upper sums.
Corollary 6.2.5. If f is continuous on [a, b] and if P1 , P2 are arbitrary par-
titions of [a, b], then L(P1 , f ) ≤ U (P2 , f ).
Proof. The partition P obtained by using all the points of P1 and P2 together,
i.e. P is the union of P1 and P2 , is a refinement of both P1 and P2 . Hence
L(P1 , f ) ≤ L(P, f ) ≤ U (P, f ) ≤ U (P2 , f ),
by Lemma 6.2.3 and Lemma 6.2.4.
In other words : every lower sum is less than or equal to every upper sum.
The important consequence we need is this : since the lower sums L(P , f ) are
all bounded above (by every upper sum in fact) the set of lower sums has a least
upper bound. Similarly the set of upper sums is bounded below (by any lower sum)
so the set of upper sums has a greatest lower bound. We define the lower integral
of f from a to b and the upper integral of f from a to b by
L∫_a^b f := lub{ L(P, f ) : P is a partition of [a, b] },
U∫_a^b f := glb{ U (P, f ) : P is a partition of [a, b] },
respectively.
The next lemma just uses the fact that every lower sum is ≤ every upper sum.
It will soon be replaced by the stronger result that (for continuous functions) the
lower and upper integrals are in fact equal.
Lemma 6.2.6. Let f be a continuous function on [a, b]. Then L∫_a^b f ≤ U∫_a^b f .
Remark 6.2.7. ★ Everything we have done so far can also be done with an
arbitrary bounded 3 function f defined on [a, b], except that we must define
Mi = lub{ f (x) : xi−1 ≤ x ≤ xi }, 1 ≤ i ≤ n,
mi = glb{ f (x) : xi−1 ≤ x ≤ xi }, 1 ≤ i ≤ n.
Lemma 6.2.3, Lemma 6.2.4, Corollary 6.2.5 and Lemma 6.2.6 are still valid, with
similar proofs as for continuous functions, but with “min” replaced by “glb” and
“max” replaced by “lub”.
3A function is bounded if there exist numbers A and B such that A ≤ f (x) ≤ B for
every x in the domain of f . Thus any continuous function defined on [a, b] is bounded.
But the function f , with f (x) = 1/x for x 6= 0 and f (0) = 0, is not bounded on its domain
R.
implication the difference between the maximum value Mi and the minimum mi of
the function f on the ith interval must be < ε/(b − a). Hence
U (P, f ) − L(P, f ) = Σ_{i=1}^{N} Mi ∆xi − Σ_{i=1}^{N} mi ∆xi
= Σ_{i=1}^{N} (Mi − mi ) ∆xi
< (ε/(b − a)) (∆x1 + · · · + ∆xN )
= (ε/(b − a)) (b − a) = ε.
This proves (29).
From the definition of the lower and upper integrals, and Lemma 6.2.6,
L(P, f ) ≤ L∫_a^b f ≤ U∫_a^b f ≤ U (P, f ).
Since the difference between the outer two terms is < ε by (29), the difference
between the inner two terms is also < ε. That is
U∫_a^b f − L∫_a^b f < ε.
Since this holds for every ε > 0 it follows that U∫_a^b f = L∫_a^b f .
where ui and li are points in the ith interval [xi−1 , xi ] for which f takes its maximum
and minimum values respectively. More generally, we can define a general Riemann
sum corresponding to the partition P by
R(P, f ) = Σ_{i=1}^{N} f (ci ) ∆xi ,
where each ci is an arbitrary point in [xi−1 , xi ]. Note that this notation is a little
imprecise, since R(P, f ) depends not only on the partition P , but also on the points
ci chosen in each of the intervals given by P .
Note that
(30) L(P, f ) ≤ R(P, f ) ≤ U (P, f ).
Let the maximum length of the intervals in a partition P be denoted by ‖P ‖.
Theorem 6.3.1.
lim_{‖P ‖→0} R(P, f ) = ∫_a^b f.
More precisely, for any ε > 0 there exists a number δ > 0 (which may depend on
ε) such that
whenever ‖P ‖ < δ then |R(P, f ) − ∫_a^b f | < ε.
Proof. The proof of Theorem 6.2.10 in fact showed that if ‖P ‖ < δ then
U (P, f ) − L(P, f ) < ε.
Since
L(P, f ) ≤ R(P, f ) ≤ U (P, f )
and
L(P, f ) ≤ ∫_a^b f ≤ U (P, f )
it follows that
|R(P, f ) − ∫_a^b f | < ε.
Remark 6.4.1. The (easy) theorems in this section apply more generally with
minor modifications, provided the functions are Riemann integrable, and not neces-
sarily continuous. For example, in Theorem 6.4.3 we need to replace the minimum
m by the glb of the set of values, and similarly for the maximum M .
In particular, piecewise continuous functions (see Adams p309) on a closed
bounded interval are integrable, and have the same properties as below. This
essentially follows from writing each integral as a sum of integrals over intervals on
which all the relevant functions are continuous.
Proof. The main point in the proofs of (31) and (32) is that similar properties
are true for the Riemann sums used to define the integrals. See Adams, A-30,
Exercise 6.
To prove (33), note that
−|f (x)| ≤ f (x) ≤ |f (x)|
for all x. From (32)
∫_a^b (−|f |) ≤ ∫_a^b f ≤ ∫_a^b |f |.
From (31) this gives
−∫_a^b |f | ≤ ∫_a^b f ≤ ∫_a^b |f |.
This implies (33).
Proof. Consider the partition P = {a, b} containing just the two points a
and b. Since
L(P, f ) = m(b − a), U (P, f ) = M (b − a),
and
L(P, f ) ≤ L∫_a^b f = ∫_a^b f = U∫_a^b f ≤ U (P, f ),
the result follows.
Proof. The first is really a definition. It also follows if we use the same
definition of ∫_b^a f as in the case b < a, but allow “decreasing” partitions where
∆xi < 0.
The second is again by definition. It also follows if we use the same definition
of the integral as when the endpoints are distinct, except that now the points in
any “partition” are all equal and so ∆ xi = 0.
The third is straightforward. See **** for details.
Proof. Choose l and u to be minimum and maximum points for f on [a, b].
Then from (34) it follows that
f (l) ≤ (∫_a^b f )/(b − a) ≤ f (u).
By the Intermediate Value Theorem applied to the function f on the interval [l, u]
or [u, l] (depending on whether l ≤ u or u ≤ l), there exists c between l and u such
that
f (c) = (∫_a^b f )/(b − a).
This gives the result.
The following theorem essentially says that differentiation and integration are
reverse processes.
In the first part of the theorem we consider the integral ∫_a^x f as a function of
the endpoint x (we allow x ≤ a as well as x > a) and prove: the derivative of the
integral of f gives back f .
In the second part, we are saying that in order to compute ∫_a^b f it is sufficient
to find a function G whose derivative is f and then compute G(b) − G(a).
To put the second assertion in a form that looks more like the “reverse” of the
first, we could write it in the form
∫_a^x G′ = G(x) − G(a),
instead of
∫_a^b f = G(b) − G(a).
Proof.
Figure 6. The area of the shaded region is ∫_x^{x+h} f .
For the second assertion, suppose (d/dx) G(x) = f (x) on the interval I.
But we have just seen that (d/dx) ∫_a^x f = f (x). It follows that the derivative of
the function given by
G(x) − ∫_a^x f
is G′(x) − f (x) = 0 on the interval I. Thus this function is constant on I by
Corollary 5.5.5.
Setting x = a we see that the constant is G(a). Hence
G(x) − ∫_a^x f = G(a)
for all x ∈ I. Taking x = b now gives the second assertion.
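For example, with f (x) = x² on the interval [0, 1] we may take G(x) = x³/3, since
G′(x) = x². The second assertion then gives ∫_0^1 x² dx = G(1) − G(0) = 1/3.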
CHAPTER 7
★Differential Equations
7.1. Overview
The differential equation
(40) dy/dx = f (x, y)
requires that the gradient of the function y = y(x) at each point (x, y) on its graph
should equal f (x, y) for the given function f .
Suppose that at each point (x, y) on the x − y plane we draw a little line whose
slope is f (x, y); this is the slope field. Then at every point on the graph of any
solution to (40), the graph should be tangent to the corresponding little line. In
the following diagram we have shown the slope field for f (x, y) = y + cos x and the
graph of three functions satisfying the corresponding differential equation.
It is plausible from the diagram that for any given point (x0 , y0 ) there is exactly
one solution y = y(x) satisfying y(x0 ) = y0 . This is indeed the case here, and is
true under fairly general conditions.
But it is not always true. For example, if f (x, y) = y^(2/3) then there is an infinite
set of solutions satisfying y(0) = 0. Namely, for any real numbers a ≤ 0 ≤ b,
y = (x − a)³/27 for x ≤ a,   y = 0 for a ≤ x ≤ b,   y = (x − b)³/27 for x ≥ b
is a solution (check it). See the following diagram. The problem here is that
although f (x, y) is continuous everywhere, (∂/∂y)f (x, y) = (2/3) y^(−1/3) is not contin-
uous on the x-axis. Notice that the slope lines on the x-axis are horizontal.
Figure 2. Slope field for the function (x, y) ↦ y^(2/3). (In this
case the slope field does not depend on x.)
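To check the third piece, for instance: for x ≥ b we have dy/dx = 3(x − b)²/27 =
(x − b)²/9, while y^(2/3) = ((x − b)³/27)^(2/3) = (x − b)²/9, so dy/dx = y^(2/3) there. The
other pieces are similar, and the pieces join with matching value and slope (both zero)
at x = a and x = b.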
If the function f has even worse behaviour, there may be no solution at all.
In the two simple examples we just gave, we could write out the solutions in
terms of standard functions. But in practice, this is almost never the case. The
solutions of differential equations almost invariably cannot be expressed in terms
of standard functions. In fact, one of the most useful ways to introduce new and
useful functions is to define them as the solutions of certain differential equations.
But in order to do this, we first need to know that the differential equations have
unique solutions if we specify certain “initial conditions”. This is the main result
in this chapter.
The point to this chapter is to prove the Fundamental Existence and Uniqueness
Theorem for differential equations of the form (40). Such an equation is called first-
order, since only the first order derivative of y occurs in the differential equation.
Differential equations of the form (40) are essentially the most general first order
differential equation.
The following remark justifies the claim that we are about to prove a major result
in mathematics!
Remark 7.1.1. ★ A system of first order differential equations for the depen-
dent variables y1 , . . . , yn is a set of differential equations of the form
dy1 /dx = f1 (x, y1 , . . . , yn )
dy2 /dx = f2 (x, y1 , . . . , yn )
...
dyn /dx = fn (x, y1 , . . . , yn )
which are meant to be satisfied simultaneously by functions y1 = y1 (x), y2 =
y2 (x), . . . , yn = yn (x). Here the functions f1 , f2 , . . . , fn are given. If n = 2 you
can visualise this as in the one dimensional case, by considering three axes la-
beled x, y1 , y2 . The solution to a differential equation in this case will be repre-
sented by the graph (curve) over the x axis which for each point x gives the point
(x, y1 (x), y2 (x)).
A very similar proof as for a single differential equation, gives the analogous
Fundamental Existence and Uniqueness Theorem for a system of first-order differ-
ential equations.
A differential equation which involves higher derivatives can be reduced to a
system of differential equations of first order (essentially by introducing new vari-
ables for each of the higher order derivatives). Thus the Existence and Uniqueness
Theorem, suitably modified, applies also to higher order differential equations. In
fact it even applies to systems of higher order differential equations in a similar
manner!
Thus the main point is to first show that in some (small) rectangle Rδ , whose
base is of length 2δ and which is centred at the point (x0 , y0 ), there is a solution
which extends to both sides of this small rectangle.
The proof proceeds in 7 steps.
Step A Problem (41) is equivalent to showing the “integral equation”
(42) y(x) = y0 + ∫_{x0}^{x} f (t, y(t)) dt
has a solution. One sees this by integrating both sides of (41) from x0 to x. Con-
versely, differentiating the integral equation (42) gives back the differential equation,
and clearly y(x0 ) = y0 also follows from the integral equation.
Step B To find the solution of (42) we begin with the constant function
y(x) = y0
and plug it into the right side of (42) to get a new function of x. We plug this
again into the right side to get yet another function of x. And so on.
For example, with (43), substituting the constant function y = 0 in the right
side of (44), and then repeating by plugging in the new function obtained after
each step, we get
∫_0^x cos t dt −→ sin x
∫_0^x (sin t + cos t) dt −→ − cos x + 1 + sin x
∫_0^x (− cos t + 1 + sin t + cos t) dt −→ x − cos x + 1
∫_0^x (t − cos t + 1 + cos t) dt −→ (1/2)x² + x
∫_0^x ((1/2)t² + t + cos t) dt −→ (1/6)x³ + (1/2)x² + sin x
∫_0^x ((1/6)t³ + (1/2)t² + sin t + cos t) dt −→ (1/24)x⁴ + (1/6)x³ − cos x + 1 + sin x
We call this sequence of functions a “sequence of approximate solutions”. We see
from the diagram that this sequence is converging, at least near (0, 0).
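As a small numerical sketch of this same iteration (in Python, assuming NumPy is
available; the grid size, number of iterations and function names are illustrative choices
only), for the example f (x, y) = y + cos x with y(0) = 0:

import numpy as np

def f(x, y):
    return y + np.cos(x)                 # right hand side of the example above

def picard_iterates(x0=0.0, y0=0.0, half_width=1.0, n_points=2001, n_iter=8):
    # Grid centred at x0; start from the constant function phi_0 = y0.
    x = np.linspace(x0 - half_width, x0 + half_width, n_points)
    phi = np.full_like(x, y0)
    i0 = np.searchsorted(x, x0)          # grid index of the initial point x0
    for _ in range(n_iter):
        integrand = f(x, phi)
        # cumulative trapezoidal approximation of the integral from x[0] to each grid point
        cumulative = np.concatenate(([0.0], np.cumsum(
            0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x))))
        # phi_{n+1}(x) = y0 + integral from x0 to x of f(t, phi_n(t)) dt
        phi = y0 + cumulative - cumulative[i0]
    return x, phi

x, phi = picard_iterates()
exact = 0.5 * (np.exp(x) - np.cos(x) + np.sin(x))   # exact solution of y' = y + cos x, y(0) = 0
print(float(np.max(np.abs(phi - exact))))           # small: the iterates converge near x0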
Step E The next step is to show that the limit function y = y(x) satisfies the
integral equation. The fact it is continuous implies we can integrate the right
side of (42). And the fact that the approximate solutions converge to the function
y = y(x) geometrically fast enables us to prove that we can take the limit as n → ∞
on both sides of (45) and deduce (42)
Step F The next step is to show that any two solutions are equal. We show that
if d is the distance between two solutions of (42) then d ≤ rd for some r < 1, by an
argument like the one in Step C. This implies that d = 0 and so the two solutions
agree.
Step G The final step is to extend the solution from the small rectangle in step
C up to the boundary of R. We do this by essentially starting the process again at
a new point on the graph near the boundary of the small rectangle, getting a new
rectangle centred at the new point, and extending the solution out into the new
rectangle. We can show this process stops after a finite number of steps, when the
solution reaches the boundary of R.
Then there exists a number δ > 0 and a unique function φ(x), defined and
having a continuous derivative on the interval (x0 − δ, x0 + δ), such that
(46) φ′(x) = f (x, φ(x))
(47) φ(x0 ) = y0 .
In other words, φ(x) solves (i.e. satisfies) the initial value problem
dy/dx = f (x, y)
y(x0 ) = y0
on the interval (x0 − δ, x0 + δ).
Remark: Let
M = max{ |f (x, y)| : (x, y) ∈ R },   K = max{ |∂f /∂y (x, y)| : (x, y) ∈ R }.
We will see in the proof that if we define Rδ (x0 , y0 ) to be the (open) rectangle
consisting of all those (x, y) such that x0 −δ < x < x0 +δ and y0 −M δ < y < y0 +M δ,
i.e.
(48) Rδ (x0 , y0 ) = { (x, y) : x ∈ (x0 − δ, x0 + δ), y ∈ (y0 − M δ, y0 + M δ) },
then any δ > 0 for which Rδ (x0 , y0 ) ⊂ R and δ < K⁻¹ will work for the above
theorem.
Proof.
Step A We first claim that if φ(x) is a continuous function defined on some
interval (x0 −δ, x0 +δ), and (x, φ(x)) ∈ R for all x, then the following two statements
are equivalent:
(1) φ(x) has a continuous derivative on the interval (x0 − δ, x0 + δ) and solves
the given initial value problem there, i.e. (46) and (47) are true;
(2) φ(x) satisfies the integral equation
(49) φ(x) = y0 + ∫_{x0}^{x} f (t, φ(t)) dt.
Assume the first statement is true. Then both φ′(t), and f (t, φ(t)) by Sec-
tion 3.5, are continuous on (x0 − δ, x0 + δ) (it is convenient to use t here instead
of x for the dummy variable). Thus for any x in the interval (x0 − δ, x0 + δ) the
following integrals exist, and from (46) they are equal:
∫_{x0}^{x} φ′(t) dt = ∫_{x0}^{x} f (t, φ(t)) dt.
By the Fundamental Theorem of Calculus the left side equals φ(x) − φ(x0 ) = φ(x) − y0 ,
and so the integral equation (49) follows.
Next assume the second statement is true. Note that since φ(t) is continuous,
so is f (t, φ(t)) by Section 3.5, and so the integral does exist. Setting x = x0 we
immediately get (47)
Since the right side of (49) is differentiable and the derivative equals f (x, φ(x))
(by the Fundamental Theorem of Calculus), the left side must also be differentiable
and have the same derivative. That is, (46) is true for any x in the interval (x0 −
δ, x0 + δ). Moreover, we see that the derivative φ′(x) is continuous since f (x, φ(x))
is continuous.
Thus the first statement is true.
The functions in the above sequence will be defined for all x in some interval
(x0 −δ, x0 +δ), where the δ has yet to be chosen. We will first impose the restriction
on δ that
(50) Rδ (x0 , y0 ) ⊂ R,
Next, for x ∈ (x0 − δ, x0 + δ), we show that (x, φ1 (x)) ∈ Rδ (x0 , y0 ) and hence
∈ R. This follows from the fact that
|φ1 (x) − y0 | = |∫_{x0}^{x} f (t, φ0 (t)) dt|
≤ ∫_{x0}^{x} |f (t, φ0 (t))| dt      from (38)
≤ ∫_{x0}^{x} M dt                since |f | ≤ M in R
≤ M δ                        since |x − x0 | ≤ δ.
It follows as before that the definition of φ2 (x) makes sense for x ∈ (x0 −
δ, x0 + δ). (We also need the fact that f (t, φ1 (t)) is continuous. This follows from
the fact φ1 (t) is in fact differentiable by the Fundamental Theorem of Calculus, and
hence continuous; and the fact that f (t, φ1 (t)) is thus a composition of continuous
functions and hence continuous.)
Etc. etc. (or proof by induction, to be rigorous; but it is clear that it will work).
In this way we have a sequence of continuous functions φn (x) defined on the
interval (x0 − δ, x0 + δ), and for x in this interval we have (x, φn (x)) ∈ Rδ (x0 , y0 ).
Step C The next step is to prove there exists a function φ(x) defined on the
interval (x0 − δ, x0 + δ) such that
φn (x) → φ(x)
2We should be a little more careful here. Since the points (x, φn (x)) and (x, φn+1 (x))
both lie in Rδ (x0 , y0 ), it follows that |φn (x) − φn+1 (x)| < 2M δ. But the maximum may
be “achieved” only when x = x0 ± δ, which is not actually a point in the (open) interval
(x0 − δ, x0 + δ). To make the argument rigorous, we should replace “max” by “lub” in
Step C.
Then for n ≥ 1
dn = max_{x∈(x0 −δ,x0 +δ)} |φn (x) − φn+1 (x)|
= max_{x∈(x0 −δ,x0 +δ)} |∫_{x0}^{x} (f (t, φn−1 (t)) − f (t, φn (t))) dt|
≤ max_{x∈(x0 −δ,x0 +δ)} ∫_{x0}^{x} |f (t, φn−1 (t)) − f (t, φn (t))| dt
≤ max_{x∈(x0 −δ,x0 +δ)} ∫_{x0}^{x} K |φn−1 (t) − φn (t)| dt      by Section 5.6
≤ max_{x∈(x0 −δ,x0 +δ)} ∫_{x0}^{x} K dn−1 dt                 from the definition of dn−1
= Kδ dn−1
Repeating this argument we obtain
dn ≤ Kδ dn−1 ≤ (Kδ)2 dn−2 ≤ (Kδ)3 dn−3 ≤ · · · ≤ (Kδ)n d0 .
We now make the further restriction on δ that
(51) Kδ < 1.
Since
|φn (x) − φn+1 (x)| ≤ dn ≤ d0 (Kδ)n ,
it follows from Theorem 2.7.4 that the sequence φn (x) converges for each x ∈
(x0 − δ, x0 + δ). We define the function φ(x) on (x0 − δ, x0 + δ) by
φ(x) = lim φn (x).
n→∞
Moreover, by the comment following that theorem,
(52) |φn (x) − φ(x)| ≤ A rⁿ ,
where A = d0 /(1 − Kδ) and r = Kδ < 1. (Note that this is saying that the graph
of φn lies within distance A rⁿ of the graph of φ, see Figure 9.)
Step D (See Figure 9.) We next claim that φ(x) is continuous on the interval
(x0 − δ, x0 + δ).
To see this let a be any point in the interval (x0 − δ, x0 + δ); we will prove that
φ is continuous at a.
Let ε > 0 be an arbitrary positive number.
First choose n so that A rⁿ < ε/3 and hence from (52)
(53) x ∈ (x0 − δ, x0 + δ) implies |φn (x) − φ(x)| ≤ ε/3.
Step E We defined
(55) φn+1 (x) = y0 + ∫_{x0}^{x} f (t, φn (t)) dt.
Step F Next, we must show that any two solutions of (46) and (47), or equiva-
lently of (49), are equal.
Suppose that φ(x) and ψ(x) are any two solutions. Let
d = max |φ(x) − ψ(x)|,
where the maximum is taken over the interval (x0 − δ, x0 + δ).4 Then for any
x ∈ (x0 − δ, x0 + δ),
|φ(x) − ψ(x)| = |∫_{x0}^{x} (f (t, φ(t)) − f (t, ψ(t))) dt|
≤ ∫_{x0}^{x} |f (t, φ(t)) − f (t, ψ(t))| dt
≤ ∫_{x0}^{x} K |φ(t) − ψ(t)| dt      from Section 5.6
≤ Kδ d
Since this is true for any x ∈ (x0 − δ, x0 + δ), it follows that
d ≤ Kδd.
Since Kδ < 1, this implies d = 0!!
Hence φ(x) = ψ(x) for all x ∈ (x0 − δ, x0 + δ).
Step G So far we have a solution, and it is unique, whose graph lies in a rectangle
Rδ . The dimensions of Rδ depend only on K and M , and otherwise not on the initial
point, except that we also require Rδ ⊂ R. By starting the process again at a new
point close to the boundary of Rδ we can extend the solution outside Rδ , and after
a finite number of steps extend the solution up to the boundary of R.
4As in Step C we should really write “lub” instead of “max”. The proof is essentially
unchanged.