
Calculus I

for Computer Science and Statistics Students


Peter Philip

Lecture Notes
Originally Created for the Class of Winter Semester 2010/2011 at LMU Munich,
Revised and Extended for Several Subsequent Classes

April 14, 2016

Contents
1 Foundations: Mathematical Logic and Set Theory
1.1 Introductory Remarks
1.2 Propositional Calculus
1.2.1 Statements
1.2.2 Logical Operators
1.2.3 Rules
1.3 Set Theory
1.4 Predicate Calculus

2 Functions and Relations
2.1 Functions
2.2 Relations

3 Natural Numbers, Induction, and the Size of Sets
3.1 Induction and Recursion
3.2 Cardinality: The Size of Sets

4 Real Numbers
4.1 The Real Numbers as a Complete Totally Ordered Field

E-Mail: [email protected]

4.2 Important Subsets

5 Complex Numbers
5.1 Definition and Basic Arithmetic
5.2 Sign and Absolute Value (Modulus)
5.3 Sums and Products
5.4 Binomial Coefficients and Binomial Theorem

6 Polynomials
6.1 Arithmetic of K-Valued Functions
6.2 1-Dimensional Polynomials
6.3 n-Dimensional Polynomials

7 Limits and Convergence of Real and Complex Numbers
7.1 Sequences
7.2 Continuity
7.2.1 Definitions and First Examples
7.2.2 Continuity, Sequences, and Function Arithmetic
7.2.3 Bounded, Closed, and Compact Sets
7.2.4 Intermediate Value Theorem
7.2.5 Inverse Functions, Existence of Roots, Exponential Function, Logarithm
7.3 Series
7.3.1 Definition and Convergence
7.3.2 Convergence Criteria
7.3.3 Absolute Convergence and Rearrangements
7.3.4 b-Adic Representations of Real Numbers

8 Convergence of K-Valued Functions
8.1 Pointwise and Uniform Convergence
8.2 Power Series
8.3 Exponential Functions
8.4 Trigonometric Functions
8.5 Polar Form of Complex Numbers, Fundamental Theorem of Algebra

9 Differential Calculus
9.1 Definition of Differentiability and Rules
9.2 Higher Order Derivatives and the Sets C^k
9.3 Mean Value Theorem, Monotonicity, and Extrema
9.4 L'Hôpital's Rule

10 The Riemann Integral on Intervals in R
10.1 Definition and Simple Properties
10.2 Important Theorems
10.2.1 Fundamental Theorem of Calculus
10.2.2 Integration by Parts Formula
10.2.3 Change of Variables
10.3 Improper Integrals

A Logic and Set Theory
A.1 Principle of Duality
A.2 Russell's Antinomy
A.3 Power Sets and Characteristic Functions
A.4 The Axiom of Choice
A.5 Rules Concerning Functions and Set-Theoretic Operations
A.6 Cardinality

B Construction of the Real Numbers
B.1 Natural Numbers
B.2 Interlude: Orders on Groups
B.3 Integers
B.4 Rational Numbers
B.5 Real Numbers

C Series: Additional Material
C.1 Riemann Rearrangement Theorem
C.2 Absolute Convergence and Rearrangements
C.3 b-Adic Representations of Real Numbers

D Trigonometric Functions
D.1 Additional Trigonometric Formulas

E Cardinality of R and Some Related Sets

F Irrationality of e and π
F.1 Irrationality of e
F.2 Irrationality of π

G Riemann Integral for C-Valued Functions
G.1 Riemann Integrability
G.2 Fundamental Theorem of Calculus
G.3 Integration by Parts
G.4 Change of Variables

References
1 FOUNDATIONS: MATHEMATICAL LOGIC AND SET THEORY 5

1 Foundations: Mathematical Logic and Set Theory

1.1 Introductory Remarks


The task of mathematics is to establish the truth or falsehood of (formalizable) state-
ments using rigorous logic, and to provide methods for the solution of classes of (e.g.
applied) problems, ideally including rigorous logical proofs verifying the validity of the
methods (proofs that the method under consideration will, indeed, provide a correct
solution).
The topic of this class is calculus, which is short for infinitesimal calculus, usually
understood (as it is here) to mean differential and integral calculus of real and complex
numbers (more generally, calculus may refer to any method or system of calculation
guided by the symbolic manipulation of expressions; we will briefly touch on another
example in Sec. 1.2 below). In that sense, calculus is the beginning part of the broader
field of (mathematical) analysis, the section of mathematics concerned with the notion
of a limit (for us, the most important examples will be limits of sequences (Def. 7.1
below) and limits of functions (Def. 8.17 below)).
Before we can properly define our first limit, however, some preparatory work is still
needed. In modern mathematics, the objects under investigation are almost always
so-called sets. So one aims at deriving (i.e. proving) true (and interesting and useful)
statements about sets from other statements about sets known or assumed to be true.
Such a derivation or proof means applying logical rules that guarantee the truth of the
derived (i.e. proved) statement.
However, unfortunately, a proper definition of the notion of set is not easy, and is
actually beyond the scope of this class. Interested students might want to consider
taking a separate class on set theory at a later time. And the same is also true regarding
an appropriate treatment of logic and proof theory. Here, we will only be able to very
briefly touch on the bare necessities from logic and set theory needed to proceed to the
core matter of this class. We begin with logic in Sec. 1.2, followed by set theory in Sec.
1.3, combining both in Sec. 1.4.

1.2 Propositional Calculus


1.2.1 Statements

Mathematical logic is a large field in its own right. As indicated before, a rigorous
introduction is beyond the scope of this class; the interested reader may refer to [EFT07]
and references therein. Here, we will just introduce some basic concepts using common
English (rather than formal symbolic languages, a concept explained in books like
[EFT07]).
As mentioned before, mathematics establishes the truth or falsehood of statements. By
a statement or proposition we mean any sentence (any sequence of symbols) that can
reasonably be assigned a truth value, i.e. a value of either true, abbreviated T, or false,
abbreviated F. The following example illustrates the difference between statements and
sentences that are not statements:
Example 1.1. (a) Sentences that are statements:
Every dog is an animal. (T)
Every animal is a dog. (F)
The number 4 is odd. (F)
2 + 3 = 5. (T)
2 < 0. (F)
x + 1 > 0 holds for each natural number x. (T)

(b) Sentences that are not statements:
Let's study calculus!
Who are you?
3 · 5 + 7.
x + 1 > 0.
All natural numbers are green.

The fourth sentence in Ex. 1.1(b) is not a statement, as it cannot be said to be either
true or false without any further knowledge about x. The fifth sentence in Ex. 1.1(b) is
not a statement, as it lacks any meaning and can, hence, be neither true nor false. It
would become a statement if given a definition of what it means for a natural number
to be green.

1.2.2 Logical Operators

The next step now is to combine statements into new statements using logical operators,
where the truth value of the combined statements depends on the truth values of the
original statements and on the type of logical operator facilitating the combination.
The simplest logical operator is negation, denoted ¬. It is actually a so-called unary
operator, i.e. it does not combine statements, but is merely applied to one statement.
For example, if A stands for the statement "Every dog is an animal.", then ¬A stands
for the statement "Not every dog is an animal."; and if B stands for the statement "The
number 4 is odd.", then ¬B stands for the statement "The number 4 is not odd.", which
can also be expressed as "The number 4 is even."
To completely understand the action of a logical operator, one usually writes what is
known as a truth table. For negation, the truth table is

    A   ¬A
    T   F          (1.1)
    F   T

that means if the input statement A is true, then the output statement ¬A is false; if
the input statement A is false, then the output statement ¬A is true.
We now proceed to discuss binary logical operators, i.e. logical operators combining
precisely two statements. The following four operators are essential for mathematical
reasoning:
Conjunction: "A and B", usually denoted A ∧ B.
Disjunction: "A or B", usually denoted A ∨ B.
Implication: "A implies B", usually denoted A ⇒ B.
Equivalence: "A is equivalent to B", usually denoted A ⇔ B.
Here is the corresponding truth table:

    A   B   A ∧ B   A ∨ B   A ⇒ B   A ⇔ B
    T   T     T       T       T       T
    T   F     F       T       F       F          (1.2)
    F   T     F       T       T       F
    F   F     F       F       T       T
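
The four binary operators can be tabulated mechanically. The following is a minimal Python sketch, not part of the original notes; the helper names `implies`, `equivalent`, `truth_table`, and `show` are chosen here for illustration only. It prints the rows of the truth table for conjunction, disjunction, implication, and equivalence in the order (T,T), (T,F), (F,T), (F,F).

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    # An implication is false only when the premise is true and the conclusion is false.
    return not (a and not b)

def equivalent(a: bool, b: bool) -> bool:
    # An equivalence is true exactly when both truth values agree.
    return a == b

def truth_table():
    # Each row: (A, B, A and B, A or B, A implies B, A equivalent to B).
    rows = []
    for a, b in product([True, False], repeat=2):
        rows.append((a, b, a and b, a or b, implies(a, b), equivalent(a, b)))
    return rows

def show(value: bool) -> str:
    return "T" if value else "F"

for row in truth_table():
    print(" ".join(show(v) for v in row))
```

Python's built-in `and` and `or` already behave like conjunction and inclusive disjunction on truth values, so only implication and equivalence need helper functions.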

When first seen, some of the assignments of truth values in (1.2) might not be completely
intuitive, due to the fact that logical operators are often used somewhat differently in
common English. Let us consider each of the four logical operators of (1.2) in sequence:
For the use in subsequent examples, let A1 , . . . , A6 denote the six statements from Ex.
1.1(a).
Conjunction: Most likely the easiest of the four, basically identical to common language
use: A ∧ B is true if, and only if, both A and B are true. For example, using Ex. 1.1(a),
A1 ∧ A4 is the statement "Every dog is an animal and 2 + 3 = 5.", which is true since
both A1 and A4 are true. On the other hand, A1 ∧ A3 is the statement "Every dog is
an animal and the number 4 is odd.", which is false, since A3 is false.
Disjunction: The disjunction A ∨ B is true if, and only if, at least one of the statements
A, B is true. Here one already has to be a bit careful: A ∨ B defines the inclusive or,
whereas "or" in common English is often understood to mean the exclusive or (which is
false if both input statements are true). For example, using Ex. 1.1(a), A1 ∨ A4 is the
statement "Every dog is an animal or 2 + 3 = 5.", which is true since both A1 and A4
are true. The statement A1 ∨ A3, i.e. "Every dog is an animal or the number 4 is odd.",
is also true, since A1 is true. However, the statement A2 ∨ A5, i.e. "Every animal is a
dog or 2 < 0.", is false, as both A2 and A5 are false.
As you will have noted in the above examples, logical operators can be applied to
combine statements that have no obvious contents relation. While this might seem
strange, introducing contents-related restrictions is unnecessary as well as undesirable,
since it is often not clear which seemingly unrelated statements might suddenly appear
in a common context in the future. The same occurs when considering implications and
equivalences, where it might seem even more obscure at first.

Implication: Instead of "A implies B", one also says "if A then B", "B is a consequence
of A", "B is concluded or inferred from A", "A is sufficient for B", or "B is necessary for
A". The implication A ⇒ B is always true, except if A is true and B is false. At first
glance, it might be surprising that A ⇒ B is defined to be true for A false and B true;
however, there are many examples of incorrect statements implying correct statements.
For instance, squaring the (false) equality of integers −1 = 1 implies the (true) equality
of integers 1 = 1. However, as with conjunction and disjunction, it is perfectly valid
to combine statements without any obvious context relation: For example, using Ex.
1.1(a), the statement A1 ⇒ A6, i.e. "Every dog is an animal implies x + 1 > 0 holds for
each natural number x.", is true, since A6 is true, whereas the statement A4 ⇒ A2, i.e.
"2 + 3 = 5 implies every animal is a dog.", is false, as A4 is true and A2 is false.
Of course, the implication A ⇒ B is not really useful in situations where the truth
values of both A and B are already known. Rather, in a typical application, one tries
to establish the truth of A to prove the truth of B (a strategy that will fail if A happens
to be false).
Example 1.2. Suppose we know Sasha to be a member of a group of children. Then
the statement A "Sasha is a girl." implies the statement B "There is at least one girl
in the group." A priori, we might not know if Sasha is a girl or a boy, but if we can
establish Sasha to be a girl, then we also know B to be true. If we find Sasha to be a
boy, then we do not know whether B is true or false.

Equivalence: A ⇔ B means A is true if, and only if, B is true. Once again, using
input statements from Ex. 1.1(a), we see that A1 ⇔ A4, i.e. "Every dog is an animal
is equivalent to 2 + 3 = 5.", is true as well as A2 ⇔ A3, i.e. "Every animal is a dog is
equivalent to the number 4 is odd.". On the other hand, A4 ⇔ A5, i.e. "2 + 3 = 5 is
equivalent to 2 < 0.", is false.
Analogous to the situation of implications, A ⇔ B is not really useful if the truth values
of both A and B are known a priori, but can be a powerful tool to prove B to be true
or false by establishing the truth value of A. It is obviously more powerful than the
implication, as illustrated by the following example (compare with Ex. 1.2):
Example 1.3. Suppose we know Sasha is the tallest member of a group of children.
Then the statement A "Sasha is a girl." is equivalent to the statement B "The tallest
kid in the group is a girl." As in Ex. 1.2, if we can establish Sasha to be a girl, then we
also know B to be true. However, in contrast to Ex. 1.2, if we find Sasha to be a boy,
we know B to be false.
Remark 1.4. In computer science, the truth value T is often coded as 1 and the truth
value F is often coded as 0.

1.2.3 Rules

Note that the expressions in the first row of the truth table (1.2) (e.g. A ∧ B) are not
statements in the sense of Sec. 1.2.1, as they contain the statement variables (also known
as propositional variables) A or B. However, the expressions become statements if all
statement variables are substituted with actual statements. We will call expressions of
this form propositional formulas. Moreover, if a truth value is assigned to each statement
variable of a propositional formula, then this uniquely determines the truth value of the
formula. In other words, the truth value of the propositional formula can be calculated
from the respective truth values of its statement variables, which is a first justification
for the name propositional calculus.

Example 1.5. (a) Consider the propositional formula (A ∧ B) ∨ (¬B). Suppose A is
true and B is false. The truth value of the formula is obtained according to the
following truth table:

    A   B   A ∧ B   ¬B   (A ∧ B) ∨ (¬B)
    T   F     F      T          T               (1.3)

(b) The propositional formula A ∨ (¬A), also known as the law of the excluded middle,
has the remarkable property that its truth value is T for every possible choice of
truth values for A:

    A   ¬A   A ∨ (¬A)
    T   F       T               (1.4)
    F   T       T

Formulas with this property are of particular importance.

Definition 1.6. A propositional formula is called a tautology or universally true if,


and only if, its truth value is T for all possible assignments of truth values to all the
statement variables it contains.
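
Since a propositional formula has only finitely many statement variables, the tautology property can be checked by enumerating all assignments. The following Python sketch is an illustration added here, not part of the original notes; `is_tautology` is a hypothetical helper name, and formulas are modelled as Python functions of Boolean arguments.

```python
from itertools import product

def is_tautology(formula, num_vars: int) -> bool:
    # Try every assignment of truth values to the statement variables.
    return all(formula(*values) for values in product([True, False], repeat=num_vars))

def excluded_middle(a: bool) -> bool:
    # Law of the excluded middle: "A or not A".
    return a or (not a)

def plain_implication(a: bool, b: bool) -> bool:
    # "A implies B" on its own; false for A true, B false, so not a tautology.
    return not (a and not b)

print(is_tautology(excluded_middle, 1))
print(is_tautology(plain_implication, 2))
```

A formula with n statement variables requires 2^n evaluations, so this brute-force approach is only practical for small n.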

Notation 1.7. We write φ(A1, ..., An) if, and only if, the propositional formula φ
contains precisely the n statement variables A1, ..., An.

Definition 1.8. The propositional formulas φ(A1, ..., An) and ψ(A1, ..., An) are called
equivalent if, and only if, φ(A1, ..., An) ⇔ ψ(A1, ..., An) is a tautology.

Lemma 1.9. The propositional formulas φ(A1, ..., An) and ψ(A1, ..., An) are equivalent
if, and only if, they have the same truth value for all possible assignments of truth
values to A1, ..., An.

Proof. If φ(A1, ..., An) and ψ(A1, ..., An) are equivalent and Ai is assigned the truth
value ti, i = 1, ..., n, then φ(A1, ..., An) ⇔ ψ(A1, ..., An) being a tautology implies it
has truth value T. From (1.2) we see that either φ(A1, ..., An) and ψ(A1, ..., An) both
have truth value T or they both have truth value F.
If, on the other hand, we know φ(A1, ..., An) and ψ(A1, ..., An) have the same truth
value for all possible assignments of truth values to A1, ..., An, then, given such an
assignment, either φ(A1, ..., An) and ψ(A1, ..., An) both have truth value T or both
have truth value F, i.e. φ(A1, ..., An) ⇔ ψ(A1, ..., An) has truth value T in each case,
showing it is a tautology. ∎

For all logical purposes, two equivalent formulas are exactly the same; it does not
matter if one uses one or the other. The following theorem provides some important
equivalences of propositional formulas. As too many parentheses tend to make formulas
less readable, we first introduce some precedence conventions for logical operators:
Convention 1.10. ¬ takes precedence over ∧, ∨, which take precedence over ⇒, ⇔.
So, for example,
    (A ∨ ¬B ⇒ ¬B ∧ ¬A) ⇔ ¬C ∧ (A ∨ ¬D)
is the same as
    ((A ∨ (¬B)) ⇒ ((¬B) ∧ (¬A))) ⇔ ((¬C) ∧ (A ∨ (¬D))).

Theorem 1.11. (a) (A ⇒ B) ⇔ ¬A ∨ B. This means one can actually define
implication via negation and disjunction.

(b) (A ⇔ B) ⇔ ((A ⇒ B) ∧ (B ⇒ A)), i.e. A and B are equivalent if, and only if, A
is both necessary and sufficient for B. One also calls the implication B ⇒ A the
converse of the implication A ⇒ B. Thus, A and B are equivalent if, and only if,
both A ⇒ B and its converse hold true.

(c) Commutativity of Conjunction: A ∧ B ⇔ B ∧ A.

(d) Commutativity of Disjunction: A ∨ B ⇔ B ∨ A.

(e) Associativity of Conjunction: (A ∧ B) ∧ C ⇔ A ∧ (B ∧ C).

(f) Associativity of Disjunction: (A ∨ B) ∨ C ⇔ A ∨ (B ∨ C).

(g) Distributivity I: A ∧ (B ∨ C) ⇔ (A ∧ B) ∨ (A ∧ C).

(h) Distributivity II: A ∨ (B ∧ C) ⇔ (A ∨ B) ∧ (A ∨ C).

(i) De Morgan's Law I: ¬(A ∧ B) ⇔ ¬A ∨ ¬B.

(j) De Morgan's Law II: ¬(A ∨ B) ⇔ ¬A ∧ ¬B.

(k) Double Negative: ¬¬A ⇔ A.

(l) Contraposition: (A ⇒ B) ⇔ (¬B ⇒ ¬A).

Proof. Each equivalence is proved by providing a truth table and using Lem. 1.9.
(a):

    A   B   ¬A   A ⇒ B   ¬A ∨ B
    T   T   F      T        T
    T   F   F      F        F
    F   T   T      T        T
    F   F   T      T        T

(b)–(h): Exercise.

(i):

    A   B   ¬A   ¬B   A ∧ B   ¬(A ∧ B)   ¬A ∨ ¬B
    T   T   F    F      T        F          F
    T   F   F    T      F        T          T
    F   T   T    F      F        T          T
    F   F   T    T      F        T          T

(j): Exercise.
(k):

    A   ¬A   ¬¬A
    T   F     T
    F   T     F

(l):

    A   B   ¬A   ¬B   A ⇒ B   ¬B ⇒ ¬A
    T   T   F    F      T        T
    T   F   F    T      F        F
    F   T   T    F      T        T
    F   F   T    T      T        T

Having checked all the rules completes the proof of the theorem. ∎
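
The truth-table argument, including the parts left as exercises, can be replayed by machine. The following Python sketch is an added illustration and not the author's proof; `implies`, `same_truth_values`, and the dictionary `rules` are hypothetical names. It compares both sides of several rules of Th. 1.11 on every assignment, which by Lem. 1.9 is exactly what equivalence means.

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    # Defined from the truth table: false only for a true premise and a false conclusion.
    return not (a and not b)

def same_truth_values(lhs, rhs, num_vars: int) -> bool:
    # Lem. 1.9: two formulas are equivalent iff they agree on every assignment.
    return all(lhs(*v) == rhs(*v) for v in product([True, False], repeat=num_vars))

# Each entry: (left-hand side, right-hand side, number of statement variables).
rules = {
    "(a) implication": (lambda a, b: implies(a, b), lambda a, b: (not a) or b, 2),
    "(b) equivalence": (lambda a, b: a == b, lambda a, b: implies(a, b) and implies(b, a), 2),
    "(g) distributivity I": (lambda a, b, c: a and (b or c), lambda a, b, c: (a and b) or (a and c), 3),
    "(h) distributivity II": (lambda a, b, c: a or (b and c), lambda a, b, c: (a or b) and (a or c), 3),
    "(i) De Morgan I": (lambda a, b: not (a and b), lambda a, b: (not a) or (not b), 2),
    "(j) De Morgan II": (lambda a, b: not (a or b), lambda a, b: (not a) and (not b), 2),
    "(k) double negative": (lambda a: not (not a), lambda a: a, 1),
    "(l) contraposition": (lambda a, b: implies(a, b), lambda a, b: implies(not b, not a), 2),
}

results = {name: same_truth_values(lhs, rhs, n) for name, (lhs, rhs, n) in rules.items()}
for name, ok in results.items():
    print(name, "holds" if ok else "FAILS")
```

As a sanity check that the test can fail, comparing an implication with its converse via `same_truth_values` returns `False`.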

The importance of the rules provided by Th. 1.11 lies in their providing proof techniques,
i.e. methods for establishing the truth of statements from statements known or assumed
to be true. Instead of discussing these techniques right now, we will rather discuss each
new technique of proof whenever we first encounter it subsequently in an application.
At that time, the connection with the corresponding rule of Th. 1.11 will be pointed
out.
In subsequent proofs, we will also frequently use so-called transitivity of implication as
well as transitivity of equivalence (we will encounter equivalence again in the context
of relations in Sec. 1.3 below). In preparation for the transitivity rules, we need to
generalize implication to propositional formulas.

Definition 1.12. In generalization of the implication operator defined in (1.2), we say
the propositional formula φ(A1, ..., An) implies the propositional formula ψ(A1, ..., An)
(denoted φ(A1, ..., An) ⇒ ψ(A1, ..., An)) if, and only if, each assignment of truth values
to the A1, ..., An that makes φ(A1, ..., An) true, makes ψ(A1, ..., An) true as well.

Theorem 1.13. (a) Transitivity of Implication: (A ⇒ B) ∧ (B ⇒ C) ⇒ (A ⇒ C).

(b) Transitivity of Equivalence: (A ⇔ B) ∧ (B ⇔ C) ⇒ (A ⇔ C).

Proof. According to Def. 1.12, the rules can be verified by providing truth tables that
show that, for all possible assignments of truth values to the propositional formulas on
the left-hand side of the implications, either the left-hand side is false or both sides are
true. (a):

    A   B   C   A ⇒ B   B ⇒ C   (A ⇒ B) ∧ (B ⇒ C)   A ⇒ C
    T   T   T     T       T             T              T
    T   F   T     F       T             F              T
    F   T   T     T       T             T              T
    F   F   T     T       T             T              T
    T   T   F     T       F             F              F
    T   F   F     F       T             F              F
    F   T   F     T       F             F              T
    F   F   F     T       T             T              T

(b):

    A   B   C   A ⇔ B   B ⇔ C   (A ⇔ B) ∧ (B ⇔ C)   A ⇔ C
    T   T   T     T       T             T              T
    T   F   T     F       F             F              T
    F   T   T     F       T             F              F
    F   F   T     T       F             F              F
    T   T   F     T       F             F              F
    T   F   F     F       T             F              F
    F   T   F     F       F             F              T
    F   F   F     T       T             T              T

Having checked both rules, the proof is complete. ∎
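
Def. 1.12 translates directly into a finite check: look only at the assignments that make the left-hand formula true and confirm the right-hand formula is true there as well. The following Python sketch is an added illustration; `formula_implies` and the variable names are hypothetical.

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    return not (a and not b)

def formula_implies(phi, psi, num_vars: int) -> bool:
    # Def. 1.12: every assignment that makes phi true must make psi true as well.
    return all(psi(*v) for v in product([True, False], repeat=num_vars) if phi(*v))

# Th. 1.13(a): transitivity of implication.
trans_implication = formula_implies(
    lambda a, b, c: implies(a, b) and implies(b, c),
    lambda a, b, c: implies(a, c),
    3,
)
# Th. 1.13(b): transitivity of equivalence.
trans_equivalence = formula_implies(
    lambda a, b, c: (a == b) and (b == c),
    lambda a, b, c: a == c,
    3,
)
# The reverse direction of (a) does not hold (take A true, B false, C true).
reverse_of_a = formula_implies(
    lambda a, b, c: implies(a, c),
    lambda a, b, c: implies(a, b) and implies(b, c),
    3,
)
print(trans_implication, trans_equivalence, reverse_of_a)
```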

Definition and Remark 1.14. A proof of the statement B is a finite sequence of
statements A_1, A_2, ..., A_n such that A_1 is true; for 1 ≤ i < n, A_i implies A_{i+1}, and A_n
implies B. If there exists a proof for B, then Th. 1.13(a) guarantees that B is true.

1.3 Set Theory


In the previous section, we have had a first glance at statements and corresponding truth
values. In the present section, we will move our focus to the objects such statements
are about. Reviewing Example 1.1(a), and recalling that this is a mathematics class
rather than one in zoology, the first two statements of Example 1.1(a) are less relevant
for us than statements 3–6. As in these examples, we will nearly always be interested in
statements involving numbers or collections of numbers or collections of such collections
etc.
In modern mathematics, the term one usually uses instead of "collection" is "set". In
1895, Georg Cantor defined a set as "any collection into a whole M of definite and
separate objects m of our intuition or our thought". The objects m are called the
elements of the set M.

Notation 1.15. We write m ∈ M for the statement "m is an element of the set M".

Definition 1.16. The sets M and N are equal, denoted M = N , if, and only if, M and
N have precisely the same elements.

Definition 1.16 means we know everything about a set M if, and only if, we know all its
elements.
Definition 1.17. The set with no elements is called the empty set; it is denoted by the
symbol ∅.
Example 1.18. For a finite set, we can simply write down all its elements, for example,
A := {0}, B := {0, 17.5}, C := {5, 1, 5, 3}, D := {3, 5, 1}, E := {2, 2, 2}, where the
symbolism ":=" is to be read as "is defined to be equal to".
Note C = D, since both sets contain precisely the same elements. In particular, the
order in which the elements are written down plays no role and a set does not change if
an element is written down more than once.
If a set has many elements, instead of writing down all its elements, one might use
abbreviations such as F := {−4, −2, ..., 20, 22, 24}, where one has to make sure the
meaning of the dots is clear from the context.
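
Python's built-in `set` type follows the same convention for equality, which makes the remark about order and repetition easy to try out. This is an added illustration, not part of the original notes.

```python
# The sets C and D from Example 1.18, written as Python set literals.
C = {5, 1, 5, 3}
D = {3, 5, 1}

# Equality depends only on the elements, not on order or repetition.
print(C == D)

# Writing 5 twice does not add a second element.
print(len(C))
```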
Definition 1.19. The set A is called a subset of the set B (denoted A ⊆ B and also
referred to as the inclusion of A in B) if, and only if, every element of A is also an
element of B (one sometimes also calls B a superset of A and writes B ⊇ A). Please
note that A = B is allowed in the above definition of a subset. If A ⊆ B and A ≠ B,
then A is called a strict subset of B, denoted A ⊊ B.
If B is a set and P(x) is a statement about an element x of B (i.e., for each x ∈ B,
P(x) is either true or false), then we can define a subset A of B by writing
    A := {x ∈ B : P(x)}. (1.6)
This notation is supposed to mean that the set A consists precisely of those elements of
B such that P(x) is true (has the truth value T in the language of Sec. 1.2).
Example 1.20. (a) For each set A, one has A ⊆ A and ∅ ⊆ A.
(b) If A ⊆ B, then A = {x ∈ B : x ∈ A}.
(c) We have {3} ⊆ {6.7, 3, 0}. Letting A := {−10, −8, ..., 8, 10}, we have {−2, 0, 2} =
{x ∈ A : x^3 ∈ A}, ∅ = {x ∈ A : x + 21 ∈ A}.
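
The set-builder notation (1.6) corresponds closely to a Python set comprehension. The sketch below is an added illustration and assumes that the set A of Example 1.20(c) is the set of even integers from −10 to 10; the variable names are chosen here.

```python
# Assumption: A consists of the even integers from -10 to 10.
A = set(range(-10, 11, 2))

# {x in A : x^3 in A}: only -2, 0, 2 have cubes that are again in A.
cubes_in_A = {x for x in A if x ** 3 in A}

# {x in A : x + 21 in A}: x is even, so x + 21 is odd and never lies in A.
shifted_in_A = {x for x in A if x + 21 in A}

# Example 1.20(b): a subset is recovered by filtering the superset.
small = {3}
large = {6.7, 3, 0}
recovered = {x for x in large if x in small}

print(cubes_in_A, shifted_in_A, small <= large, recovered == small)
```

The operator `<=` on Python sets tests the subset relation, including the case of equality.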
Remark 1.21. As a consequence of Def. 1.16, the sets A and B are equal if, and only
if, one has both inclusions, namely A ⊆ B and B ⊆ A. Thus, when proving the equality
of sets, one often divides the proof into two parts, first proving one inclusion, then the
other.
Definition 1.22. (a) The intersection of the sets A and B, denoted A ∩ B, consists of
all elements that are in A and in B. The sets A, B are said to be disjoint if, and
only if, A ∩ B = ∅.

(b) The union of the sets A and B, denoted A ∪ B, consists of all elements that are in
A or in B (as in the logical disjunction in (1.2), the or is meant nonexclusively). If
A and B are disjoint, one sometimes writes A ∪̇ B and speaks of the disjoint union
of A and B.

(c) The difference of the sets A and B, denoted A \ B (read "A minus B" or "A without
B"), consists of all elements of A that are not elements of B, i.e. A \ B := {x ∈
A : x ∉ B}. If B is a subset of a given set A (sometimes called the universe in
this context), then A \ B is also called the complement of B with respect to A.
In that case, one also writes B^c := A \ B (note that this notation suppresses the
dependence on A).

Example 1.23. (a) Examples of Intersections:

    {1, 2, 3} ∩ {3, 4, 5} = {3}, (1.7a)
    {√2} ∩ {1, 2, ..., 10} = ∅, (1.7b)
    {−1, 2, −3, 4, 5} ∩ {−10, −9, ..., −1} ∩ {−1, 7, −3} = {−1, −3}. (1.7c)

(b) Examples of Unions:

    {1, 2, 3} ∪ {3, 4, 5} = {1, 2, 3, 4, 5}, (1.8a)
    {1, 2, 3} ∪̇ {4, 5} = {1, 2, 3, 4, 5}, (1.8b)
    {−1, 2, −3, 4, 5} ∪ {−99, −98, ..., −1} ∪ {−1, 7, −3}
        = {−99, −98, ..., −2, −1, 2, 4, 5, 7}. (1.8c)

(c) Examples of Differences:

    {1, 2, 3} \ {3, 4, 5} = {1, 2}, (1.9a)
    {1, 2, 3} \ {3, 2, 1, 5} = ∅, (1.9b)
    {−10, −9, ..., 9, 10} \ {0} = {−10, −9, ..., −1} ∪ {1, 2, ..., 9, 10}. (1.9c)

With respect to the universe {1, 2, 3, 4, 5}, it is

    {1, 2, 3}^c = {4, 5}; (1.9d)

with respect to the universe {0, 1, ..., 20}, it is

    {1, 2, 3}^c = {0} ∪ {4, 5, ..., 20}. (1.9e)
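
Intersection, union, and difference are available as the operators `&`, `|`, and `-` on Python sets, so some of the computations of this example can be reproduced directly. This is an added illustration; the variable names are chosen here, and the complement is computed as a difference from an explicitly given universe.

```python
X = {1, 2, 3}
Y = {3, 4, 5}

intersection = X & Y      # elements in X and in Y
union = X | Y             # elements in X or in Y (inclusive or)
difference = X - Y        # elements of X that are not in Y

# Disjointness means that the intersection is empty.
are_disjoint = X.isdisjoint({4, 5})

# Complements depend on the chosen universe.
complement_small = {1, 2, 3, 4, 5} - X
complement_large = set(range(0, 21)) - X

print(intersection, union, difference, are_disjoint)
print(complement_small, sorted(complement_large))
```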

As mentioned earlier, it will often be unavoidable to consider sets of sets. Here are first
examples: {∅, {0}, {0, 1}}, {{0, 1}, {1, 2}}.

Definition 1.24. Given a set A, the set of all subsets of A is called the power set of A,
denoted P(A) (for reasons explained in Appendix A.3, the power set is sometimes also
denoted as 2^A).

Example 1.25. Examples of Power Sets:

    P(∅) = {∅}, (1.10a)
    P({0}) = {∅, {0}}, (1.10b)
    P(P({0})) = P({∅, {0}}) = {∅, {∅}, {{0}}, P({0})}. (1.10c)
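
For finite sets, the power set can be computed by listing all subsets of each size. The following Python sketch is an added illustration; `power_set` is a hypothetical helper, and `frozenset` is used because Python sets may only contain hashable (immutable) elements, which matters as soon as sets of sets appear.

```python
from itertools import chain, combinations

def power_set(s):
    # Collect all subsets of sizes 0, 1, ..., len(s) as frozensets.
    items = list(s)
    subsets = chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))
    return frozenset(frozenset(sub) for sub in subsets)

empty = frozenset()
p_empty = power_set(empty)              # has the empty set as its only element
p_zero = power_set(frozenset({0}))      # the empty set and {0}
p_p_zero = power_set(p_zero)            # four elements, one of which is p_zero itself

print(len(p_empty), len(p_zero), len(p_p_zero))
```

A set with n elements has 2^n subsets, which is one way to read the alternative notation for the power set mentioned in Def. 1.24.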

So far, we have restricted our set-theoretic examples to finite sets. However, not
surprisingly, many sets of interest to us will be infinite (we will have to postpone a
mathematically precise definition of finite and infinite to Sec. 2). We will now introduce
the simplest infinite set.

Definition 1.26. The set N := {1, 2, 3, ...} is called the set of natural numbers.
Moreover, we define N0 := {0} ∪ N.

Remark 1.27. Mathematicians tend to desire as few fundamental objects as possible.
One of the consequences is the idea to actually define numbers as special sets: 0 := ∅,
1 := {0}, 2 := {0, 1}; in general, define the natural number n := {0, 1, ..., n − 1} =
(n − 1) ∪ {n − 1}.
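
The recursive description in this remark can be carried out literally for small n. The sketch below is an added illustration; `von_neumann` is a name chosen here, and `frozenset` stands in for mathematical sets so that sets can be elements of other sets.

```python
def von_neumann(n: int) -> frozenset:
    # Start with 0 := empty set, then apply k := (k - 1) united with {k - 1}.
    current = frozenset()
    for _ in range(n):
        current = current | frozenset({current})
    return current

zero, one, two, three = (von_neumann(k) for k in range(4))

# Each number n has exactly n elements, namely the numbers 0, ..., n - 1.
print(len(zero), len(one), len(two), len(three))
print(two == frozenset({zero, one}))
```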

The following theorem compiles important set-theoretic rules:

Theorem 1.28. Let A, B, C, U be sets.

(a) Commutativity of Intersections: A ∩ B = B ∩ A.

(b) Commutativity of Unions: A ∪ B = B ∪ A.

(c) Associativity of Intersections: (A ∩ B) ∩ C = A ∩ (B ∩ C).

(d) Associativity of Unions: (A ∪ B) ∪ C = A ∪ (B ∪ C).

(e) Distributivity I: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).

(f) Distributivity II: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).

(g) De Morgan's Law I: U \ (A ∩ B) = (U \ A) ∪ (U \ B).

(h) De Morgan's Law II: U \ (A ∪ B) = (U \ A) ∩ (U \ B).

(i) Double Complement: If A ⊆ U, then U \ (U \ A) = A.

Proof. In each case, the proof results from the corresponding rule of Th. 1.11:
(a):
    x ∈ A ∩ B ⇔ x ∈ A ∧ x ∈ B ⇔ (by Th. 1.11(c)) x ∈ B ∧ x ∈ A ⇔ x ∈ B ∩ A.

(g): Under the general assumption of x ∈ U, we have the following equivalences:

    x ∈ U \ (A ∩ B) ⇔ ¬(x ∈ A ∩ B) ⇔ ¬(x ∈ A ∧ x ∈ B)
        ⇔ (by Th. 1.11(i)) ¬(x ∈ A) ∨ ¬(x ∈ B)
        ⇔ x ∈ U \ A ∨ x ∈ U \ B ⇔ x ∈ (U \ A) ∪ (U \ B).

The proofs of the remaining rules are left as an exercise. ∎
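
Before attempting the remaining proofs, it can be reassuring to test the rules on a small universe. The following Python sketch is an added illustration; `subsets` and `rules_hold` are hypothetical helpers. It checks several rules of Th. 1.28 for all triples of subsets of a three-element set. Such a finite check is evidence only and does not replace the proof.

```python
from itertools import combinations, product

def subsets(universe):
    # All subsets of a finite set, as Python sets.
    items = list(universe)
    return [set(c) for k in range(len(items) + 1) for c in combinations(items, k)]

def rules_hold(U) -> bool:
    for A, B, C in product(subsets(U), repeat=3):
        checks = [
            A & (B | C) == (A & B) | (A & C),   # Distributivity I
            A | (B & C) == (A | B) & (A | C),   # Distributivity II
            U - (A & B) == (U - A) | (U - B),   # De Morgan's Law I
            U - (A | B) == (U - A) & (U - B),   # De Morgan's Law II
            U - (U - A) == A,                   # Double Complement (A is a subset of U)
        ]
        if not all(checks):
            return False
    return True

print(rules_hold({0, 1, 2}))
```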

Remark 1.29. The correspondence between Th. 1.11 and Th. 1.28 is no coincidence.
One can actually prove that, starting with an equivalence of propositional formulas
φ(A1, ..., An) ⇔ ψ(A1, ..., An), where both formulas contain only the operators ∧, ∨, ¬,
one obtains a set-theoretic rule (stating an equality of sets) by reinterpreting all
statement variables A1, ..., An as variables for sets, all subsets of a universe U, and replacing
∧ by ∩, ∨ by ∪, and ¬ by U \ (if there are no multiple negations, then we do not need
the hypothesis that A1, ..., An are subsets of U). The procedure also works in the
opposite direction: one can start with a set-theoretic formula for an equality of sets and
translate it into two equivalent propositional formulas.

Set theory using Cantor's definition given at the beginning of this section is known
as naive set theory. Unfortunately, it is not free of contradictions. The most famous
one is known as Russell's antinomy and is described in Appendix A.2. To avoid such
contradictions, in modern mathematics, one restricts the construction of sets according
to certain rules or axioms. The result is so-called axiomatic set theory, described, e.g.,
in [Kun80].

1.4 Predicate Calculus


Now that we have introduced sets in the previous section, we have to return to the
subject of mathematical logic once more. As it turns out, propositional calculus, which
we discussed in Sec. 1.2, does not quite suffice to develop the theory of calculus (nor
most other mathematical theories). The reason is that we need to consider statements
such as

x + 1 > 0 holds for each natural number x. (T) (1.11a)


All real numbers are positive. (F) (1.11b)
There exists a natural number bigger than 10. (T) (1.11c)
There exists a real number x such that $x^2 = -1$. (F) (1.11d)
For all natural numbers n, there exists a natural number bigger than n. (T) (1.11e)

That means we are interested in statements involving universal quantification via the
quantifier "for all" (one also often uses "for each" or "for every" instead), existential
quantification via the quantifier "there exists", or both. The quantifier of universal
quantification is denoted by $\forall$ and the quantifier of existential quantification is denoted
by $\exists$. Using these symbols as well as $\mathbb{N}$ and $\mathbb{R}$ to denote the sets of natural and real
numbers, respectively, we can restate (1.11) as

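$\forall_{x \in \mathbb{N}}\ x + 1 > 0$. (T) (1.12a)
$\forall_{x \in \mathbb{R}}\ x > 0$. (F) (1.12b)
$\exists_{n \in \mathbb{N}}\ n > 10$. (T) (1.12c)
$\exists_{x \in \mathbb{R}}\ x^2 = -1$. (F) (1.12d)
$\forall_{n \in \mathbb{N}}\ \exists_{m \in \mathbb{N}}\ m > n$. (T) (1.12e)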

Definition 1.30. A universal statement has the form

$\forall_{x \in A}\ P(x)$, (1.13a)

whereas an existential statement has the form

$\exists_{x \in A}\ P(x)$. (1.13b)

In (1.13), $A$ denotes a set and $P(x)$ is a sentence involving the variable $x$, a so-called
predicate of $x$, that becomes a statement (i.e. becomes either true or false) if $x$ is substituted
with any concrete element of the set $A$ (in particular, $P(x)$ is allowed to contain
further quantifiers, but it must not contain any other quantifier involving $x$; one says
$x$ must be a free variable in $P(x)$, not bound by any quantifier in $P(x)$).
The universal statement (1.13a) has the truth value T if, and only if, $P(x)$ has the truth
value T for all elements $x \in A$; the existential statement (1.13b) has the truth value T
if, and only if, $P(x)$ has the truth value T for at least one element $x \in A$.
Remark 1.31. Some people prefer to write $\bigwedge_{x \in A}$ instead of $\forall_{x \in A}$ and $\bigvee_{x \in A}$ instead of $\exists_{x \in A}$.
Even though this notation has the advantage of emphasizing that the universal statement
can be interpreted as a big logical conjunction and the existential statement can be
interpreted as a big logical disjunction, it is significantly less common. So we will stick
to $\forall$ and $\exists$ in this class.
Remark 1.32. According to Def. 1.30, the existential statement (1.13b) is true if, and
only if, $P(x)$ is true for at least one $x \in A$. So if there is precisely one such $x$, then
(1.13b) is true; and if there are several different $x \in A$ such that $P(x)$ is true, then
(1.13b) is still true. Uniqueness statements are often of particular importance, and one
sometimes writes
$\exists!_{x \in A}\ P(x)$ (1.14)

for the statement "there exists a unique $x \in A$ such that $P(x)$ is true". This notation
can be defined as an abbreviation for

$\exists_{x \in A} \big( P(x) \wedge \forall_{y \in A} (P(y) \Rightarrow x = y) \big)$. (1.15)

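Example 1.33. Here are some examples of uniqueness statements:

$\exists!_{n \in \mathbb{N}}\ n > 10$. (F) (1.16a)
$\exists!_{n \in \mathbb{N}}\ 12 > n > 10$. (T) (1.16b)
$\exists!_{n \in \mathbb{N}}\ 11 > n > 10$. (F) (1.16c)
$\exists!_{x \in \mathbb{R}}\ x^2 = -1$. (F) (1.16d)
$\exists!_{x \in \mathbb{R}}\ x^2 = 1$. (F) (1.16e)
$\exists!_{x \in \mathbb{R}}\ x^2 = 0$. (T) (1.16f)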
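
Over a finite set, quantified statements can be evaluated mechanically, which makes the "big conjunction" and "big disjunction" reading concrete. The sketch below is only an illustration: the helper names `forall`, `exists`, `exists_unique` and the finite set `A` are assumptions introduced here, and statements over all of $\mathbb{N}$ or $\mathbb{R}$ cannot be decided by such a finite check.

```python
def forall(A, P):
    """Universal statement over a finite collection A: a big conjunction."""
    return all(P(x) for x in A)

def exists(A, P):
    """Existential statement over a finite collection A: a big disjunction."""
    return any(P(x) for x in A)

def exists_unique(A, P):
    """Uniqueness statement, spelled out as in (1.15)."""
    return any(P(x) and all((not P(y)) or x == y for y in A) for x in A)

A = range(1, 21)  # hypothetical finite set {1, ..., 20}

print(forall(A, lambda n: n + 1 > 0))           # finite analogue of (1.12a)
print(exists(A, lambda n: n > 10))              # finite analogue of (1.12c)
print(exists_unique(A, lambda n: n > 10))       # several witnesses
print(exists_unique(A, lambda n: 12 > n > 10))  # only n = 11 qualifies
print(exists_unique(A, lambda n: 11 > n > 10))  # no witness at all
```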

Remark 1.34. As for propositional calculus, we also have some important rules for
predicate calculus:

(a) Consider the negation of a universal statement, $\neg \forall_{x \in A} P(x)$, which is true if, and
only if, $P(x)$ does not hold for each $x \in A$, i.e. if, and only if, there exists at least
one $x \in A$ such that $P(x)$ is false (such that $\neg P(x)$ is true). We have just proved
the rule
$\neg \forall_{x \in A} P(x) \Leftrightarrow \exists_{x \in A} \neg P(x)$. (1.17a)

Similarly, consider the negation of an existential statement. We claim the corresponding
rule is
$\neg \exists_{x \in A} P(x) \Leftrightarrow \forall_{x \in A} \neg P(x)$. (1.17b)

Indeed, we can prove (1.17b) from (1.17a):

$\neg \exists_{x \in A} P(x) \overset{\text{Th. 1.11(k)}}{\Leftrightarrow} \neg \exists_{x \in A} \neg\neg P(x) \overset{\text{(1.17a)}}{\Leftrightarrow} \neg\neg \forall_{x \in A} \neg P(x) \overset{\text{Th. 1.11(k)}}{\Leftrightarrow} \forall_{x \in A} \neg P(x)$. (1.18)

One can interpret (1.17) as a generalization of De Morgan's laws Th. 1.11(i),(j).
One can actually generalize (1.17) even a bit more: If a statement starts with several
quantifiers, then one negates the statement by replacing each $\forall$ with $\exists$ and vice versa
plus negating the predicate after the quantifiers (see the example in (1.21e) below).

(b) If $A$, $B$ are sets and $P(x, y)$ denotes a predicate of both $x$ and $y$, then $\forall_{x \in A} \forall_{y \in B} P(x, y)$
and $\forall_{y \in B} \forall_{x \in A} P(x, y)$ both hold true if, and only if, $P(x, y)$ holds true for each $x \in A$
and each $y \in B$, i.e. the order of two consecutive universal quantifiers does not
matter:
$\forall_{x \in A} \forall_{y \in B} P(x, y) \Leftrightarrow \forall_{y \in B} \forall_{x \in A} P(x, y)$. (1.19a)

In the same way, we obtain the following rule:

$\exists_{x \in A} \exists_{y \in B} P(x, y) \Leftrightarrow \exists_{y \in B} \exists_{x \in A} P(x, y)$. (1.19b)

If $A = B$, one also uses abbreviations of the form

$\forall_{x, y \in A} P(x, y)$ for $\forall_{x \in A} \forall_{y \in A} P(x, y)$, (1.20a)
$\exists_{x, y \in A} P(x, y)$ for $\exists_{x \in A} \exists_{y \in A} P(x, y)$. (1.20b)

Generalizing rules (1.19), we can always commute identical quantifiers. Caveat:
Quantifiers that are not identical must not be commuted (see Ex. 1.35(d) below).
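Example 1.35. (a) Negation of universal and existential statements:
Negation of (1.12a): $\exists_{x \in \mathbb{N}}\ \overbrace{x + 1 \le 0}^{\neg(x + 1 > 0)}$. (F) (1.21a)
Negation of (1.12b): $\exists_{x \in \mathbb{R}}\ \overbrace{x \le 0}^{\neg(x > 0)}$. (T) (1.21b)
Negation of (1.12c): $\forall_{n \in \mathbb{N}}\ \overbrace{n \le 10}^{\neg(n > 10)}$. (F) (1.21c)
Negation of (1.12d): $\forall_{x \in \mathbb{R}}\ \overbrace{x^2 \ne -1}^{\neg(x^2 = -1)}$. (T) (1.21d)
Negation of (1.12e): $\exists_{n \in \mathbb{N}}\ \forall_{m \in \mathbb{N}}\ \overbrace{m \le n}^{\neg(m > n)}$. (F) (1.21e)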

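(b) As a more complicated example, consider the negation of the uniqueness statement
(1.14), i.e. of (1.15):

$\neg \exists!_{x \in A} P(x) \Leftrightarrow \neg \exists_{x \in A} \big( P(x) \wedge \forall_{y \in A} (P(y) \Rightarrow x = y) \big)$
$\overset{\text{(1.17b), Th. 1.11(a)}}{\Leftrightarrow} \forall_{x \in A} \neg \big( P(x) \wedge \forall_{y \in A} (\neg P(y) \vee x = y) \big)$
$\overset{\text{Th. 1.11(i)}}{\Leftrightarrow} \forall_{x \in A} \big( \neg P(x) \vee \neg \forall_{y \in A} (\neg P(y) \vee x = y) \big)$
$\overset{\text{(1.17a)}}{\Leftrightarrow} \forall_{x \in A} \big( \neg P(x) \vee \exists_{y \in A} \neg (\neg P(y) \vee x = y) \big)$
$\overset{\text{Th. 1.11(j),(k)}}{\Leftrightarrow} \forall_{x \in A} \big( \neg P(x) \vee \exists_{y \in A} (P(y) \wedge x \ne y) \big)$. (1.22)

So how do we decode the expression we have obtained at the end? It states that
there are two possibilities: The first is that $\neg P(x)$ holds true for each $x \in A$. The
second is that there is, indeed, at least one $x \in A$ such that $P(x)$ is true. But then
$\exists_{y \in A} (P(y) \wedge x \ne y)$ must also be true, that means there must be at least a second,
different, element $y \in A$ such that $P(y)$ is true. These are, indeed, precisely the
two cases that can occur if $\exists!_{x \in A} P(x)$ is false.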

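(c) Identical quantifiers commute:

$\forall_{x \in \mathbb{R}} \forall_{n \in \mathbb{N}}\ x^{2n} \ge 0 \Leftrightarrow \forall_{n \in \mathbb{N}} \forall_{x \in \mathbb{R}}\ x^{2n} \ge 0$, (1.23a)
$\forall_{x \in \mathbb{R}} \exists_{y \in \mathbb{R}} \exists_{n \in \mathbb{N}}\ ny > x^2 \Leftrightarrow \forall_{x \in \mathbb{R}} \exists_{n \in \mathbb{N}} \exists_{y \in \mathbb{R}}\ ny > x^2$. (1.23b)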

(d) The following example shows that different quantifiers do, in general, not commute
(i.e. do not yield equivalent statements when commuted):
While the statement
$\forall_{x \in \mathbb{R}} \exists_{y \in \mathbb{R}}\ y > x$ (1.24a)

is true (for each real number $x$, there is a bigger real number $y$, e.g. $y := x + 1$ will
do the job), the statement
$\exists_{y \in \mathbb{R}} \forall_{x \in \mathbb{R}}\ y > x$ (1.24b)

is false (for example, since $y > y$ is false). In particular, (1.24a) and (1.24b) are not
equivalent.
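
The failure of commutation for different quantifiers can also be observed on a small finite set. In the sketch below, the set `A` and the predicate `P` ("y is the cyclic successor of x") are hypothetical choices made for illustration; they are not the real-number example from (1.24).

```python
A = range(3)                        # hypothetical finite set {0, 1, 2}
P = lambda x, y: y == (x + 1) % 3   # "y is the cyclic successor of x"

# For each x there is some y with P(x, y) ...
forall_exists = all(any(P(x, y) for y in A) for x in A)
# ... but no single y works for every x.
exists_forall = any(all(P(x, y) for x in A) for y in A)

# Negating a statement with several quantifiers: swap each quantifier
# and negate the predicate.
negated = any(all(not P(x, y) for y in A) for x in A)

print(forall_exists, exists_forall, negated == (not forall_exists))
```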
Remark 1.36. One can make the following observations regarding the strategy for
proving universal and existential statements:

(a) To prove that $\forall_{x \in A} P(x)$ is true, one must check the truth of $P(x)$ for every element
$x \in A$; examples are not enough!
(b) To prove that $\forall_{x \in A} P(x)$ is false, it suffices to find one $x \in A$ such that $P(x)$ is
false; such an $x$ is then called a counterexample, and one counterexample is always
enough to prove $\forall_{x \in A} P(x)$ is false!

(c) To prove that $\exists_{x \in A} P(x)$ is true, it suffices to find one $x \in A$ such that $P(x)$ is true;
such an $x$ is then called an example, and one example is always enough to prove
$\exists_{x \in A} P(x)$ is true!

The subfield of mathematical logic dealing with quantified statements is called predicate
calculus. In general, one does not restrict the quantified variables to range only over
elements of sets (as we have done above). Again, we refer to [EFT07] for a deeper
treatment of the subject.
As an application of quantified statements, let us generalize the notion of union and
intersection:
Definition 1.37. Let $I \ne \emptyset$ be a nonempty set, usually called an index set in the present
context. For each $i \in I$, let $A_i$ denote a set (some or all of the $A_i$ can be identical).

(a) The intersection
$\bigcap_{i \in I} A_i := \{ x : \forall_{i \in I}\ x \in A_i \}$ (1.25a)
consists of all elements $x$ that belong to every $A_i$.

(b) The union
$\bigcup_{i \in I} A_i := \{ x : \exists_{i \in I}\ x \in A_i \}$ (1.25b)
consists of all elements $x$ that belong to at least one $A_i$. The union is called disjoint
if, and only if, for each $i, j \in I$, $i \ne j$ implies $A_i \cap A_j = \emptyset$.
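Proposition 1.38. Let $I \ne \emptyset$ be an index set, let $M$ denote a set, and, for each $i \in I$,
let $A_i$ denote a set. The following set-theoretic rules hold:

(a) $\big( \bigcap_{i \in I} A_i \big) \cap M = \bigcap_{i \in I} (A_i \cap M)$.

(b) $\big( \bigcup_{i \in I} A_i \big) \cup M = \bigcup_{i \in I} (A_i \cup M)$.

(c) $\big( \bigcap_{i \in I} A_i \big) \cup M = \bigcap_{i \in I} (A_i \cup M)$.

(d) $\big( \bigcup_{i \in I} A_i \big) \cap M = \bigcup_{i \in I} (A_i \cap M)$.

(e) $M \setminus \bigcap_{i \in I} A_i = \bigcup_{i \in I} (M \setminus A_i)$.

(f) $M \setminus \bigcup_{i \in I} A_i = \bigcap_{i \in I} (M \setminus A_i)$.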

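Proof. We prove (c) and (e) and leave the remaining proofs as an exercise.
(c):
$x \in \big( \bigcap_{i \in I} A_i \big) \cup M \Leftrightarrow x \in M \vee \forall_{i \in I}\ x \in A_i \overset{(*)}{\Leftrightarrow} \forall_{i \in I} (x \in A_i \vee x \in M)$
$\Leftrightarrow x \in \bigcap_{i \in I} (A_i \cup M)$.

To justify the equivalence at $(*)$, we make use of Th. 1.11(b) and verify $\Rightarrow$ and $\Leftarrow$. For
$\Rightarrow$ note that the truth of $x \in M$ implies $x \in A_i \vee x \in M$ is true for each $i \in I$. If $x \in A_i$
is true for each $i \in I$, then $x \in A_i \vee x \in M$ is still true for each $i \in I$. To verify $\Leftarrow$, note
that the existence of $i \in I$ such that $x \in M$ implies the truth of $x \in M \vee \forall_{i \in I}\ x \in A_i$.
If $x \in M$ is false for each $i \in I$, then $x \in A_i$ must be true for each $i \in I$, showing
$x \in M \vee \forall_{i \in I}\ x \in A_i$ is true also in this case.

(e):
$x \in M \setminus \bigcap_{i \in I} A_i \Leftrightarrow x \in M \wedge \neg \forall_{i \in I}\ x \in A_i \Leftrightarrow x \in M \wedge \exists_{i \in I}\ x \notin A_i$
$\Leftrightarrow \exists_{i \in I}\ x \in M \setminus A_i \Leftrightarrow x \in \bigcup_{i \in I} (M \setminus A_i)$,

completing the proof.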
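
As a sanity check, rules (c), (e), and (f) of Prop. 1.38 can be evaluated on a small finite family. This is only a sketch: the family `family` (a dictionary mapping each index to its set), the set `M`, and the helper names `big_cap` and `big_cup` are assumptions made for illustration.

```python
def big_cap(sets):
    """Intersection of a nonempty finite family of sets."""
    sets = list(sets)
    return set.intersection(*sets)

def big_cup(sets):
    """Union of a finite family of sets."""
    return set().union(*sets)

# Hypothetical family (A_i) with index set I = {1, 2, 3}, and a set M.
family = {1: {1, 2, 3, 4}, 2: {2, 3, 5}, 3: {2, 3, 4, 6}}
M = {1, 2, 3, 10}

rule_c = (big_cap(family.values()) | M) == big_cap(A | M for A in family.values())
rule_e = (M - big_cap(family.values())) == big_cup(M - A for A in family.values())
rule_f = (M - big_cup(family.values())) == big_cap(M - A for A in family.values())

print(rule_c, rule_e, rule_f)
```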

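Example 1.39. We have the following identities of sets:

$\bigcap_{x \in \mathbb{R}} \mathbb{N} = \mathbb{N}$, (1.26a)
$\bigcap_{n \in \mathbb{N}} \{1, 2, \dots, n\} = \{1\}$, (1.26b)
$\bigcup_{x \in \mathbb{R}} \mathbb{N} = \mathbb{N}$, (1.26c)
$\bigcup_{n \in \mathbb{N}} \{1, 2, \dots, n\} = \mathbb{N}$, (1.26d)
$\mathbb{N} \setminus \bigcup_{n \in \mathbb{N}} \{2n\} = \{1, 3, 5, \dots\} = \bigcap_{n \in \mathbb{N}} \big( \mathbb{N} \setminus \{2n\} \big)$. (1.26e)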

2 Functions and Relations

2.1 Functions
Definition 2.1. Let $A$, $B$ be sets. Given $x \in A$, $y \in B$, the set
$(x, y) := \big\{ \{x\}, \{x, y\} \big\}$ (2.1)

is called the ordered pair (often shortened to just pair) consisting of $x$ and $y$. The set of
all such pairs is called the Cartesian product $A \times B$, i.e.

$A \times B := \{ (x, y) : x \in A \wedge y \in B \}$. (2.2)

Example 2.2. Let $A$ be a set.

$A \times \emptyset = \emptyset \times A = \emptyset$, (2.3a)
$\{1, 2\} \times \{1, 2, 3\} = \{(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)\}$ (2.3b)
$\ne \{1, 2, 3\} \times \{1, 2\} = \{(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)\}$. (2.3c)

Also note that, for $x \ne y$,

$(x, y) = \big\{ \{x\}, \{x, y\} \big\} \ne \big\{ \{y\}, \{x, y\} \big\} = (y, x)$. (2.4)
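
The set-theoretic encoding of pairs and the Cartesian product can be mimicked with immutable sets. The following is a minimal sketch: the function name `pair` is an assumption introduced here, and Python's built-in tuples (as produced by `itertools.product`) merely play the role of ordered pairs.

```python
from itertools import product

def pair(x, y):
    """Encoding (x, y) := {{x}, {x, y}} from (2.1), using frozensets."""
    return frozenset({frozenset({x}), frozenset({x, y})})

A, B = {1, 2}, {1, 2, 3}
AxB = set(product(A, B))   # Cartesian product as a set of tuples
BxA = set(product(B, A))

print(len(AxB), AxB != BxA)            # six pairs; the two products differ
print(set(product(A, set())) == set()) # product with the empty set is empty
print(pair(1, 2) != pair(2, 1))        # the order of the entries matters
```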

Definition 2.3. Given sets $A$, $B$, a function or map $f$ is an assignment rule that assigns
to each $x \in A$ a unique $y \in B$. One then also writes $f(x)$ for the element $y$. The set $A$
is called the domain of $f$, denoted $D(f)$, and $B$ is called the range of $f$, denoted $R(f)$.
The information about a map $f$ can be concisely summarized by the notation

$f : A \to B$, $x \mapsto f(x)$, (2.5)

where $x \mapsto f(x)$ is called the assignment rule for $f$, $f(x)$ is called the image of $x$, and
$x$ is called a preimage of $f(x)$ (the image must be unique, but there might be several
preimages). The set

$\operatorname{graph}(f) := \{ (x, y) \in A \times B : y = f(x) \}$ (2.6)

is called the graph of $f$ (not to be confused with pictures visualizing the function $f$,
which are also called graph of $f$). If one wants to be completely precise, then one
identifies the function $f$ with the ordered triple $(A, B, \operatorname{graph}(f))$.
The set of all functions with domain $A$ and range $B$ is denoted by $\mathcal{F}(A, B)$ or $B^A$, i.e.

$\mathcal{F}(A, B) := B^A := \{ (f : A \to B) : A = D(f) \wedge B = R(f) \}$. (2.7)

Caveat: Some authors reserve the word map for continuous functions, but we use function
and map synonymously.

Definition 2.4. Let $A$, $B$ be sets and $f : A \to B$ a function.

(a) If $T$ is a subset of $A$, then

$f(T) := \{ f(x) \in B : x \in T \}$ (2.8)

is called the image of $T$ under $f$.

(b) If $U$ is a subset of $B$, then

$f^{-1}(U) := \{ x \in A : f(x) \in U \}$ (2.9)

is called the preimage or inverse image of $U$ under $f$.

(c) $f$ is called injective or one-to-one if, and only if, every $y \in B$ has at most one
preimage, i.e. if, and only if, the preimage of $\{y\}$ has at most one element:

$f$ injective $\Leftrightarrow \forall_{y \in B} \big( f^{-1}\{y\} = \emptyset \vee \exists!_{x \in A}\ f(x) = y \big)$
$\Leftrightarrow \forall_{x_1, x_2 \in A} \big( x_1 \ne x_2 \Rightarrow f(x_1) \ne f(x_2) \big)$. (2.10)

(d) $f$ is called surjective or onto if, and only if, every element of the range of $f$ has a
preimage:

$f$ surjective $\Leftrightarrow \forall_{y \in B} \exists_{x \in A}\ y = f(x) \Leftrightarrow \forall_{y \in B}\ f^{-1}\{y\} \ne \emptyset$. (2.11)

(e) $f$ is called bijective if, and only if, $f$ is injective and surjective.

Example 2.5. Examples of Functions:

$f : \{1, 2, 3, 4, 5\} \to \{1, 2, 3, 4, 5\}$, $f(x) := -x + 6$, (2.12a)
$g : \mathbb{N} \to \mathbb{N}$, $g(n) := 2n$, (2.12b)
$h : \mathbb{N} \to \{2, 4, 6, \dots\}$, $h(n) := 2n$, (2.12c)
$\tilde{h} : \mathbb{N} \to \{2, 4, 6, \dots\}$, $\tilde{h}(n) := \begin{cases} n & \text{for $n$ even}, \\ n + 1 & \text{for $n$ odd}, \end{cases}$ (2.12d)
$G : \mathbb{N} \to \mathbb{R}$, $G(n) := n/(n + 1)$, (2.12e)
$F : \mathcal{P}(\mathbb{N}) \to \mathcal{P}\big(\mathcal{P}(\mathbb{N})\big)$, $F(A) := \mathcal{P}(A)$. (2.12f)

Instead of $f(x) := -x + 6$ in (2.12a), one can also write $x \mapsto -x + 6$ and analogously
in the other cases. Also note that, in the strict sense, functions $g$ and $h$ are different,
since their ranges are different (however, using Def. 2.4(a), they have the
same image in the sense that $g(\mathbb{N}) = h(\mathbb{N})$). Furthermore,

$f(\{1, 2\}) = \{5, 4\} = f^{-1}(\{1, 2\})$, $\tilde{h}^{-1}(\{2, 4, 6\}) = \{1, 2, 3, 4, 5, 6\}$, (2.13)

$f$ is bijective; $g$ is injective, but not surjective; $h$ is bijective; $\tilde{h}$ is surjective, but not
injective. Can you figure out if $G$ and $F$ are injective and/or surjective?
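
For maps between finite sets, injectivity and surjectivity can be tested directly from the definitions. The sketch below uses hypothetical helper names (`is_injective`, `is_surjective`) and, for the maps defined on $\mathbb{N}$, finite truncations of domain and range chosen only for illustration.

```python
def is_injective(f, domain):
    """No two distinct arguments share an image."""
    images = [f(x) for x in domain]
    return len(images) == len(set(images))

def is_surjective(f, domain, target):
    """Every element of the target set has a preimage."""
    return {f(x) for x in domain} == set(target)

D = range(1, 6)
f = lambda x: -x + 6                          # (2.12a)
g = lambda n: 2 * n                           # (2.12b), truncated below
h_tilde = lambda n: n if n % 2 == 0 else n + 1  # (2.12d), truncated below

print(is_injective(f, D), is_surjective(f, D, D))
print(is_injective(g, range(1, 11)), is_surjective(g, range(1, 11), range(1, 21)))
print(is_injective(h_tilde, range(1, 11)),
      is_surjective(h_tilde, range(1, 11), range(2, 11, 2)))
```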

Example 2.6. (a) For each nonempty set $A$, the map $\operatorname{Id} : A \to A$, $\operatorname{Id}(x) := x$, is
called the identity on $A$. If one needs to emphasize that $\operatorname{Id}$ operates on $A$, then one
also writes $\operatorname{Id}_A$ instead of $\operatorname{Id}$. The identity is clearly bijective.

(b) Let $A$, $B$ be nonempty sets. A map $f : A \to B$ is called constant if, and only if,
there exists $c \in B$ such that $f(x) = c$ for each $x \in A$. In that case, one also writes
$f \equiv c$, which can be read as "$f$ is identically equal to $c$". If $f \equiv c$, $\emptyset \ne T \subseteq A$, and
$U \subseteq B$, then
$f(T) = \{c\}$, $f^{-1}(U) = \begin{cases} A & \text{for } c \in U, \\ \emptyset & \text{for } c \notin U. \end{cases}$ (2.14)
$f$ is injective if, and only if, $A = \{x\}$; $f$ is surjective if, and only if, $B = \{c\}$.

(c) Given $A \subseteq X$, the map
$\iota : A \to X$, $\iota(x) := x$, (2.15)
is called inclusion (also embedding or imbedding). An inclusion is always injective;
it is surjective if, and only if, $A = X$, i.e. if, and only if, it is the identity on $A$.

(d) Given $A \subseteq X$ and a map $f : X \to B$, the map $g : A \to B$, $g(x) = f(x)$, is called
the restriction of $f$ to $A$; $f$ is called the extension of $g$ to $X$. In this situation, one
also uses the notation $f{\restriction}_A$ for $g$ (some authors prefer the notation $f|_A$ or $f|A$).

There are several important rules regarding functions and set-theoretic operations. How-
ever, we will not make use of them in this class, and the interested student can find
them in Appendix A.5.
Definition 2.7. The composition of maps $f$ and $g$ with $f : A \to B$, $g : C \to D$, and
$f(A) \subseteq C$ is defined to be the map

$g \circ f : A \to D$, $(g \circ f)(x) := g\big(f(x)\big)$. (2.16)

The expression $g \circ f$ is read as "$g$ after $f$" or "$g$ composed with $f$".

Example 2.8. Consider the maps

$f : \mathbb{N} \to \mathbb{R}$, $n \mapsto n^2$, (2.17a)
$g : \mathbb{N} \to \mathbb{R}$, $n \mapsto 2n$. (2.17b)

We obtain $f(\mathbb{N}) = \{1, 4, 9, \dots\} \subseteq D(g)$, $g(\mathbb{N}) = \{2, 4, 6, \dots\} \subseteq D(f)$, and the compositions

$(g \circ f) : \mathbb{N} \to \mathbb{R}$, $(g \circ f)(n) = g(n^2) = 2n^2$, (2.18a)
$(f \circ g) : \mathbb{N} \to \mathbb{R}$, $(f \circ g)(n) = f(2n) = 4n^2$, (2.18b)

showing that composing functions is, in general, not commutative, even if the involved
functions have the same domain and the same range.
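
The two compositions can be reproduced numerically. In this small sketch, the helper `compose` is a name introduced here for illustration; `compose(g, f)` plays the role of "g after f".

```python
def compose(g, f):
    """Return the map x -> g(f(x)), i.e. g after f."""
    return lambda x: g(f(x))

f = lambda n: n ** 2   # as in (2.17a)
g = lambda n: 2 * n    # as in (2.17b)

g_after_f = compose(g, f)
f_after_g = compose(f, g)

print([g_after_f(n) for n in range(1, 5)])  # values of 2 n^2
print([f_after_g(n) for n in range(1, 5)])  # values of 4 n^2
```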

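Proposition 2.9. Consider maps $f : A \to B$, $g : C \to D$, $h : E \to F$, satisfying
$f(A) \subseteq C$ and $g(C) \subseteq E$.

(a) Associativity of Compositions:

$h \circ (g \circ f) = (h \circ g) \circ f$. (2.19)

(b) One has the following law for forming preimages:

$\forall_{W \in \mathcal{P}(D)}\ (g \circ f)^{-1}(W) = f^{-1}\big(g^{-1}(W)\big)$. (2.20)

Proof. (a): Both $h \circ (g \circ f)$ and $(h \circ g) \circ f$ map $A$ into $F$. So it just remains to prove
$\big(h \circ (g \circ f)\big)(x) = \big((h \circ g) \circ f\big)(x)$ for each $x \in A$. One computes, for each $x \in A$,

$\big(h \circ (g \circ f)\big)(x) = h\big((g \circ f)(x)\big) = h\big(g(f(x))\big) = (h \circ g)(f(x)) = \big((h \circ g) \circ f\big)(x)$, (2.21)

establishing the case.

(b): Exercise.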

Definition 2.10. A function $g : B \to A$ is called a right inverse (resp. left inverse)
of a function $f : A \to B$ if, and only if, $f \circ g = \operatorname{Id}_B$ (resp. $g \circ f = \operatorname{Id}_A$). Moreover,
$g$ is called an inverse of $f$ if, and only if, it is both a right and a left inverse. If $g$ is
an inverse of $f$, then one also writes $f^{-1}$ instead of $g$. The map $f$ is called (right, left)
invertible if, and only if, there exists a (right, left) inverse for $f$.

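Example 2.11. (a) Consider the map

$f : \mathbb{N} \to \mathbb{N}$, $f(n) := 2n$. (2.22a)

The maps

$g_1 : \mathbb{N} \to \mathbb{N}$, $g_1(n) := \begin{cases} n/2 & \text{if $n$ even}, \\ 1 & \text{if $n$ odd}, \end{cases}$ (2.22b)
$g_2 : \mathbb{N} \to \mathbb{N}$, $g_2(n) := \begin{cases} n/2 & \text{if $n$ even}, \\ 2 & \text{if $n$ odd}, \end{cases}$ (2.22c)

both constitute left inverses of $f$. It follows from Th. 2.12(c) below that $f$ does not
have a right inverse.

(b) Consider the map

$f : \mathbb{N} \to \mathbb{N}$, $f(n) := \begin{cases} n/2 & \text{for $n$ even}, \\ (n + 1)/2 & \text{for $n$ odd}. \end{cases}$ (2.23a)

The maps

$g_1 : \mathbb{N} \to \mathbb{N}$, $g_1(n) := 2n$, (2.23b)
$g_2 : \mathbb{N} \to \mathbb{N}$, $g_2(n) := 2n - 1$, (2.23c)

both constitute right inverses of $f$. It follows from Th. 2.12(c) below that $f$ does
not have a left inverse.

(c) The map
$f : \mathbb{N} \to \mathbb{N}$, $f(n) := \begin{cases} n - 1 & \text{for $n$ even}, \\ n + 1 & \text{for $n$ odd}, \end{cases}$ (2.24a)
is its own inverse, i.e. $f^{-1} = f$. For the map

$g : \mathbb{N} \to \mathbb{N}$, $g(n) := \begin{cases} 2 & \text{for } n = 1, \\ 3 & \text{for } n = 2, \\ 1 & \text{for } n = 3, \\ n & \text{for } n \notin \{1, 2, 3\}, \end{cases}$ (2.24b)

the inverse is

$g^{-1} : \mathbb{N} \to \mathbb{N}$, $g^{-1}(n) := \begin{cases} 3 & \text{for } n = 1, \\ 1 & \text{for } n = 2, \\ 2 & \text{for } n = 3, \\ n & \text{for } n \notin \{1, 2, 3\}. \end{cases}$ (2.24c)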

While Examples 2.11(a),(b) show that left and right inverses are usually not unique,
they are unique provided $f$ is bijective (see Th. 2.12(c)).
Theorem 2.12. Let $A$, $B$ be nonempty sets.

(a) $f : A \to B$ is right invertible if, and only if, $f$ is surjective.

(b) $f : A \to B$ is left invertible if, and only if, $f$ is injective.

(c) $f : A \to B$ is invertible if, and only if, $f$ is bijective. In this case, the right inverse
and the left inverse are unique and both identical to the inverse.

Proof. (a): If $f$ is surjective, then, for each $y \in B$, there exists $x_y \in f^{-1}\{y\}$ such that
$f(x_y) = y$. Define
$g : B \to A$, $g(y) := x_y$ (2.25)
(note to the interested reader: the definition of $g$ is, in general, not as unproblematic
as it might seem; $g$ is a so-called choice function, and its definition makes use of the
axiom of choice, see Appendix A.4). Then, for each $y \in B$, $f(g(y)) = y$, showing $g$ is
a right inverse of $f$. Conversely, if $g : B \to A$ is a right inverse of $f$, then, for each
$y \in B$, it is $y = f(g(y))$, showing that $g(y) \in A$ is a preimage of $y$, i.e. $f$ is surjective.
(b): Fix $a \in A$. If $f$ is injective, then, for each $y \in B$ with $f^{-1}\{y\} \ne \emptyset$, let $x_y$ denote
the unique element in $A$ satisfying $f(x_y) = y$. Define
$g : B \to A$, $g(y) := \begin{cases} x_y & \text{for } f^{-1}\{y\} \ne \emptyset, \\ a & \text{otherwise}. \end{cases}$ (2.26)

Then, for each $x \in A$, $g(f(x)) = x$, showing $g$ is a left inverse of $f$. Conversely, if
$g : B \to A$ is a left inverse of $f$ and $x_1, x_2 \in A$ with $f(x_1) = f(x_2) = y$, then
$x_1 = (g \circ f)(x_1) = g(f(x_1)) = g(f(x_2)) = (g \circ f)(x_2) = x_2$, showing $y$ has precisely one
preimage and $f$ is injective.
The first part of (c) follows immediately by combining (a) and (b). It merely remains
to verify the uniqueness of right and left inverse for bijective maps. So let $g$ be a left
inverse of $f$, let $h$ be a right inverse of $f$, and let $f^{-1}$ be an inverse of $f$. Then, for each
$y \in B$,
$g(y) = \big(g \circ (f \circ f^{-1})\big)(y) = \big((g \circ f) \circ f^{-1}\big)(y) = f^{-1}(y)$, (2.27a)
$h(y) = \big((f^{-1} \circ f) \circ h\big)(y) = \big(f^{-1} \circ (f \circ h)\big)(y) = f^{-1}(y)$, (2.27b)

thereby proving the uniqueness of left and right inverse for bijective maps.
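
For finite sets, the constructions (2.25) and (2.26) from the proof can be carried out explicitly, with no appeal to the axiom of choice. The sketch below makes several assumptions for illustration: the helper names `right_inverse` and `left_inverse`, the representation of the inverse as a dictionary, and the finite truncations of the maps from Example 2.11.

```python
def right_inverse(f, A, B):
    """For surjective f: A -> B, pick one preimage x_y for each y, cf. (2.25)."""
    g = {}
    for x in A:
        g.setdefault(f(x), x)  # keep the first preimage that is found
    if set(g) != set(B):
        raise ValueError("f is not surjective onto B")
    return g

def left_inverse(f, A, B, a):
    """For injective f: A -> B, send f(x) back to x and all other y to a, cf. (2.26)."""
    g = {y: a for y in B}
    for x in A:
        g[f(x)] = x
    return g

# Finite truncation of (2.23a): n -> n/2 (n even), (n+1)/2 (n odd).
f_surj = lambda n: (n + 1) // 2
r = right_inverse(f_surj, range(1, 7), range(1, 4))

# Finite truncation of (2.22a): n -> 2n.
f_inj = lambda n: 2 * n
l = left_inverse(f_inj, range(1, 4), range(1, 7), a=1)

print(r, l)
```

With these particular choices, `r` coincides with $g_2$ from (2.23c) on $\{1, 2, 3\}$ and `l` coincides with $g_1$ from (2.22b) on $\{1, \dots, 6\}$.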
Theorem 2.13. Consider maps $f : A \to B$, $g : B \to C$. If $f$ and $g$ are both injective
(resp. both surjective, both bijective), then so is $g \circ f$. Moreover, in the bijective case,
one has
$(g \circ f)^{-1} = f^{-1} \circ g^{-1}$. (2.28)

Proof. 

Definition 2.14. (a) Given an index set $I$ and a set $A$, a map $f : I \to A$ is sometimes
called a family (of elements in $A$), and is denoted in the form $f = (a_i)_{i \in I}$ with
$a_i := f(i)$. When using this representation, one often does not even specify $f$ and
$A$, especially if the $a_i$ are themselves sets.

(b) A sequence in a set $A$ is a family of elements in $A$, where the index set is the set of
natural numbers $\mathbb{N}$. In this case, one writes $(a_n)_{n \in \mathbb{N}}$ or $(a_1, a_2, \dots)$. More generally,
a family is called a sequence, given a bijective map between the index set $I$ and a
subset of $\mathbb{N}$.

(c) Given a family of sets $(A_i)_{i \in I}$, we define the Cartesian product of the $A_i$ to be the
set of functions
$\prod_{i \in I} A_i := \Big\{ \big( f : I \to \bigcup_{j \in I} A_j \big) : \forall_{i \in I}\ f(i) \in A_i \Big\}$. (2.29)

If $I$ has precisely $n$ elements with $n \in \mathbb{N}$, then the elements of the Cartesian product
$\prod_{i \in I} A_i$ are called (ordered) $n$-tuples, (ordered) triples for $n = 3$.

Example 2.15. (a) Using the notion of family, we can now say that the intersection
$\bigcap_{i \in I} A_i$ and union $\bigcup_{i \in I} A_i$ as defined in Def. 1.37 are the intersection and union of
the family of sets $(A_i)_{i \in I}$, respectively. As a concrete example, let us revisit (1.26b),
where we have
$(A_n)_{n \in \mathbb{N}}$, $A_n := \{1, 2, \dots, n\}$, $\bigcap_{n \in \mathbb{N}} A_n = \{1\}$. (2.30)

(b) Examples of Sequences:

Sequence in $\{0, 1\}$: $(1, 0, 1, 0, 1, 0, \dots)$, (2.31a)
Sequence in $\mathbb{N}$: $(n^2)_{n \in \mathbb{N}} = (1, 4, 9, 16, 25, \dots)$, (2.31b)
Sequence in $\mathbb{R}$: $\big((-1)^n \sqrt{n}\big)_{n \in \mathbb{N}} = \big(-1, \sqrt{2}, -\sqrt{3}, \dots\big)$, (2.31c)
Sequence in $\mathbb{R}$: $(1/n)_{n \in \mathbb{N}} = \big(1, \tfrac{1}{2}, \tfrac{1}{3}, \dots\big)$, (2.31d)
Finite Sequence in $\mathcal{P}(\mathbb{N})$: $\big(\{3, 2, 1\}, \{2, 1\}, \{1\}, \emptyset\big)$. (2.31e)

(c) The Cartesian product $\prod_{i \in I} A$, where all sets $A_i = A$, is the same as $A^I$, the set
of all functions from $I$ into $A$. So, for example, $\prod_{n \in \mathbb{N}} \mathbb{R} = \mathbb{R}^{\mathbb{N}}$ is the set of all
sequences in $\mathbb{R}$. If $I = \{1, 2, \dots, n\}$ with $n \in \mathbb{N}$, then

$\prod_{i \in I} A = A^{\{1, 2, \dots, n\}} =: \prod_{i=1}^{n} A =: A^n$ (2.32)

is the set of all $n$-tuples with entries from $A$.



2.2 Relations
Definition 2.16. Given sets $A$ and $B$, a relation is a subset $R$ of $A \times B$ (if one wants
to be completely precise, a relation is an ordered triple $(A, B, R)$, where $R \subseteq A \times B$).
If $A = B$, then we call $R$ a relation on $A$. One says that $a \in A$ and $b \in B$ are related
according to the relation $R$ if, and only if, $(a, b) \in R$. In this context, one usually writes
$a\, R\, b$ instead of $(a, b) \in R$.
Example 2.17. (a) The relations we are probably most familiar with are $=$ and $\le$.
The relation $R$ of equality, usually denoted $=$, makes sense on every nonempty set
$A$:
$R := \Delta(A) := \{ (x, x) \in A \times A : x \in A \}$. (2.33)
The set $\Delta(A)$ is called the diagonal of the Cartesian product, i.e., as a subset of
$A \times A$, the relation of equality is identical to the diagonal:

$x = y \Leftrightarrow x\, R\, y \Leftrightarrow (x, y) \in R = \Delta(A)$. (2.34)

Similarly, the relation $\le$ on $\mathbb{R}$ is identical to the set

$R_{\le} := \{ (x, y) \in \mathbb{R}^2 : x \le y \}$. (2.35)

(b) Every function $f : A \to B$ is a relation, namely the relation

$R_f = \{ (x, y) \in A \times B : y = f(x) \} = \operatorname{graph}(f)$. (2.36)

Conversely, if $B \ne \emptyset$, then every relation $R \subseteq A \times B$ uniquely corresponds to the
function
$f_R : A \to \mathcal{P}(B)$, $f_R(x) = \{ y \in B : x\, R\, y \}$. (2.37)
Definition 2.18. Let $R$ be a relation on the set $A$.

(a) $R$ is called reflexive if, and only if,

$\forall_{x \in A}\ x\, R\, x$, (2.38)

i.e. if, and only if, every element is related to itself.

(b) $R$ is called symmetric if, and only if,

$\forall_{x, y \in A} \big( x\, R\, y \Rightarrow y\, R\, x \big)$, (2.39)

i.e. if, and only if, each $x$ is related to $y$ if, and only if, $y$ is related to $x$.
(c) $R$ is called antisymmetric if, and only if,

$\forall_{x, y \in A} \big( (x\, R\, y \wedge y\, R\, x) \Rightarrow x = y \big)$, (2.40)

i.e. if, and only if, the only possibility for $x$ to be related to $y$ at the same time that
$y$ is related to $x$ is in the case $x = y$.

(d) $R$ is called transitive if, and only if,

$\forall_{x, y, z \in A} \big( (x\, R\, y \wedge y\, R\, z) \Rightarrow x\, R\, z \big)$, (2.41)

i.e. if, and only if, the relatedness of $x$ and $y$ together with the relatedness of $y$ and
$z$ implies the relatedness of $x$ and $z$.
Example 2.19. The relations $=$ and $\le$ on $\mathbb{R}$ (or $\mathbb{N}$) are reflexive, antisymmetric, and
transitive; $=$ is also symmetric, whereas $\le$ is not; $<$ is antisymmetric (since $x < y \wedge y < x$
is always false) and transitive, but neither reflexive nor symmetric. The relation

$R := \{ (x, y) \in \mathbb{N}^2 : (x, y \text{ are both even}) \vee (x, y \text{ are both odd}) \}$ (2.42)

on $\mathbb{N}$ is not antisymmetric, but reflexive, symmetric, and transitive. The relation

$S := \{ (x, y) \in \mathbb{N}^2 : y = x^2 \}$ (2.43)

is not transitive (for example, $2\, S\, 4$ and $4\, S\, 16$, but not $2\, S\, 16$), not reflexive, not symmetric;
it is only antisymmetric.
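
The four properties of Def. 2.18 translate directly into checks on a finite relation given as a set of pairs. The sketch below restricts the relations (2.42) and (2.43) to the hypothetical finite set $\{1, \dots, 20\}$; the helper names are assumptions introduced here, and results for the restricted relations need not carry over to $\mathbb{N}$ in general (here they do).

```python
def is_reflexive(R, A):
    return all((x, x) in R for x in A)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_antisymmetric(R):
    return all(x == y for (x, y) in R if (y, x) in R)

def is_transitive(R):
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)

A = range(1, 21)  # hypothetical finite stand-in for N
R = {(x, y) for x in A for y in A if x % 2 == y % 2}  # same parity, cf. (2.42)
S = {(x, y) for x in A for y in A if y == x ** 2}     # cf. (2.43)

for name, rel in (("R", R), ("S", S)):
    print(name, is_reflexive(rel, A), is_symmetric(rel),
          is_antisymmetric(rel), is_transitive(rel))
```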
Definition 2.20. A relation $R$ on a set $A$ is called an equivalence relation if, and only
if, $R$ is reflexive, symmetric, and transitive. If $R$ is an equivalence relation, then one
often writes $x \sim y$ instead of $x\, R\, y$.
Example 2.21. (a) The equality relation $=$ is an equivalence relation on each $A \ne \emptyset$.
(b) The relation $R$ defined in (2.42) is an equivalence relation on $\mathbb{N}$.
(c) Given a disjoint union $A = \bigcup_{i \in I} A_i$ with every $A_i \ne \emptyset$ (which is sometimes called a
decomposition of $A$), an equivalence relation on $A$ is defined by

$x \sim y \ :\Leftrightarrow\ \exists_{i \in I} \big( x \in A_i \wedge y \in A_i \big)$. (2.44)

Conversely, given an equivalence relation $\sim$ on a nonempty set $A$, we can construct
a decomposition $A = \bigcup_{i \in I} A_i$ such that (2.44) holds: For each $x \in A$, define

$[x] := \{ y \in A : x \sim y \}$, (2.45)

called the equivalence class of $x$; each $y \in [x]$ is called a representative of $[x]$. One
verifies that the properties of $\sim$ guarantee
$\big( [x] = [y] \Leftrightarrow x \sim y \big) \wedge \big( [x] \cap [y] = \emptyset \Leftrightarrow \neg(x \sim y) \big)$. (2.46)

The set of all equivalence classes $I := A/{\sim} := \{ [x] : x \in A \}$ is called the quotient set
of $A$ by $\sim$, and $A = \bigcup_{i \in I} A_i$ with $A_i := i$ for each $i \in I$ is the desired decomposition
of $A$.
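
The passage from an equivalence relation to its decomposition can be sketched for a finite set as follows. The function name `quotient_set` and the example (parity on $\{1, \dots, 10\}$) are assumptions made for illustration; the code relies on the relation actually being an equivalence relation, since it compares each element with only one representative per class.

```python
def quotient_set(A, related):
    """Return the set of equivalence classes of A under `related`."""
    classes = []
    for x in A:
        for c in classes:
            # By (2.46), comparing with one representative suffices.
            if related(x, next(iter(c))):
                c.add(x)
                break
        else:
            classes.append({x})
    return {frozenset(c) for c in classes}

A = range(1, 11)
same_parity = lambda x, y: x % 2 == y % 2  # restriction of (2.42)

Q = quotient_set(A, same_parity)
print(Q)
```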
Definition 2.22. A relation $R$ on a set $A$ is called a partial order if, and only if, $R$ is
reflexive, antisymmetric, and transitive. If $R$ is a partial order, then one usually writes
$x \le y$ instead of $x\, R\, y$. A partial order $\le$ is called a total or linear order if, and only if,
for each $x, y \in A$, one has $x \le y$ or $y \le x$.

Notation 2.23. Given a (partial or total) order $\le$ on $A \ne \emptyset$, we write $x < y$ if, and
only if, $x \le y$ and $x \ne y$, calling $<$ the strict order corresponding to $\le$ (note that the
strict order is never a partial order).

Definition 2.24. Let $\le$ be a partial order on $A \ne \emptyset$, $\emptyset \ne B \subseteq A$.

(a) $x \in A$ is called lower (resp. upper) bound for $B$ if, and only if, $x \le b$ (resp. $b \le x$)
for each $b \in B$. Moreover, $B$ is called bounded from below (resp. from above) if, and
only if, there exists a lower (resp. upper) bound for $B$; $B$ is called bounded if, and
only if, it is bounded from above and from below.

(b) $x \in B$ is called minimum or just min (resp. maximum or max) of $B$ if, and only if,
$x$ is a lower (resp. upper) bound for $B$. One writes $x = \min B$ if $x$ is minimum and
$x = \max B$ if $x$ is maximum.

(c) A maximum of the set of lower bounds of $B$ (i.e. a largest lower bound) is called
infimum of $B$, denoted $\inf B$; a minimum of the set of upper bounds of $B$ (i.e. a
smallest upper bound) is called supremum of $B$, denoted $\sup B$.

Example 2.25. (a) For each $A \subseteq \mathbb{R}$, the usual relation $\le$ defines a total order on $A$.
For $A = \mathbb{R}$, we see that $\mathbb{N}$ has 0 and 1 as lower bounds with $1 = \min \mathbb{N} = \inf \mathbb{N}$. On
the other hand, $\mathbb{N}$ is unbounded from above. The set $M := \{1, 2, 3\}$ is bounded
with $\min M = 1$, $\max M = 3$. The positive real numbers $\mathbb{R}^+ := \{ x \in \mathbb{R} : x > 0 \}$
have $\inf \mathbb{R}^+ = 0$, but they do not have a minimum (if $x > 0$, then $0 < x/2 < x$).

(b) Consider $A := \mathbb{N} \times \mathbb{N}$. Then

$(m_1, m_2) \le (n_1, n_2) \ :\Leftrightarrow\ m_1 \le n_1 \wedge m_2 \le n_2$, (2.47)

defines a partial order on $A$ that is not a total order (for example, neither $(1, 2) \le (2, 1)$
nor $(2, 1) \le (1, 2)$). For the set

$B := \{ (1, 1), (2, 1), (1, 2) \}$, (2.48)

we have $\inf B = \min B = (1, 1)$, $B$ does not have a max, but $\sup B = (2, 2)$ (if
$(m, n) \in A$ is an upper bound for $B$, then $(2, 1) \le (m, n)$ implies $2 \le m$ and
$(1, 2) \le (m, n)$ implies $2 \le n$, i.e. $(2, 2) \le (m, n)$; since $(2, 2)$ is clearly an upper
bound for $B$, we have proved $\sup B = (2, 2)$).
A different order on $A$ is the so-called lexicographic order defined by

$(m_1, m_2) \le (n_1, n_2) \ :\Leftrightarrow\ m_1 < n_1 \vee (m_1 = n_1 \wedge m_2 \le n_2)$. (2.49)

In contrast to the order from (2.47), the lexicographic order does define a total
order on $A$.
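
The bounds of $B$ under the order (2.47) can be computed by brute force on a finite part of $\mathbb{N} \times \mathbb{N}$. This sketch assumes the grid $\{1, \dots, 4\}^2$ as a stand-in for $A$ (large enough to contain the relevant bounds); the name `leq` and the result lists are introduced here for illustration. Python's built-in tuple comparison happens to be the lexicographic order (2.49), which is used in the last line.

```python
def leq(p, q):
    """Componentwise order from (2.47)."""
    return p[0] <= q[0] and p[1] <= q[1]

A = [(m, n) for m in range(1, 5) for n in range(1, 5)]  # finite part of N x N
B = [(1, 1), (2, 1), (1, 2)]

upper = [x for x in A if all(leq(b, x) for b in B)]   # upper bounds in the grid
lower = [x for x in A if all(leq(x, b) for b in B)]   # lower bounds in the grid

sup_B = [u for u in upper if all(leq(u, v) for v in upper)]  # least upper bound
inf_B = [l for l in lower if all(leq(k, l) for k in lower)]  # greatest lower bound
max_B = [b for b in B if all(leq(c, b) for c in B)]          # empty: no maximum

print(sup_B, inf_B, max_B)
print(sorted(B), max(B))  # tuple comparison is lexicographic, a total order
```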

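Lemma 2.26. Let $\le$ be a partial order on $A \ne \emptyset$, $\emptyset \ne B \subseteq A$. Then the relation $\ge$,
defined by
$x \ge y \ :\Leftrightarrow\ y \le x$, (2.50)

is also a partial order on $A$. Moreover, using obvious notation, we have, for each $x \in A$,

$x$ $\le$-lower bound for $B$ $\Leftrightarrow$ $x$ $\ge$-upper bound for $B$, (2.51a)
$x$ $\le$-upper bound for $B$ $\Leftrightarrow$ $x$ $\ge$-lower bound for $B$, (2.51b)
$x = \min_{\le} B \Leftrightarrow x = \max_{\ge} B$, (2.51c)
$x = \max_{\le} B \Leftrightarrow x = \min_{\ge} B$, (2.51d)
$x = \inf_{\le} B \Leftrightarrow x = \sup_{\ge} B$, (2.51e)
$x = \sup_{\le} B \Leftrightarrow x = \inf_{\ge} B$. (2.51f)

Proof. Reflexivity, antisymmetry, and transitivity of $\le$ clearly imply the same properties
for $\ge$, respectively. Moreover

$x$ $\le$-lower bound for $B$ $\Leftrightarrow \forall_{b \in B}\ x \le b \Leftrightarrow \forall_{b \in B}\ b \ge x \Leftrightarrow$ $x$ $\ge$-upper bound for $B$,

proving (2.51a). Analogously, we obtain (2.51b). Next, (2.51c) and (2.51d) are implied
by (2.51a) and (2.51b), respectively. Finally, (2.51e) is proved by

$x = \inf_{\le} B \Leftrightarrow x = \max_{\le} \{ y \in A : y \text{ $\le$-lower bound for } B \}$
$\Leftrightarrow x = \min_{\ge} \{ y \in A : y \text{ $\ge$-upper bound for } B \} \Leftrightarrow x = \sup_{\ge} B$,

and (2.51f) follows analogously.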


Proposition 2.27. Let $\le$ be a partial order on $A \ne \emptyset$, $\emptyset \ne B \subseteq A$. The elements
$\max B$, $\min B$, $\sup B$, $\inf B$ are all unique, provided they exist.

Proof. Exercise.
Definition 2.28. Let $A$, $B$ be nonempty sets with partial orders, both denoted by $\le$
(even though they might be different). A function $f : A \to B$ is called (strictly)
isotone, order-preserving, or increasing if, and only if,

$\forall_{x, y \in A} \big( x < y \Rightarrow f(x) \le f(y) \ (\text{resp. } f(x) < f(y)) \big)$; (2.52a)

$f$ is called (strictly) antitone, order-reversing, or decreasing if, and only if,

$\forall_{x, y \in A} \big( x < y \Rightarrow f(x) \ge f(y) \ (\text{resp. } f(x) > f(y)) \big)$. (2.52b)

Functions that are (strictly) isotone or antitone are called (strictly) monotone.
Proposition 2.29. Let $A$, $B$ be nonempty sets with partial orders, both denoted by $\le$.

(a) A (strictly) isotone function $f : A \to B$ becomes a (strictly) antitone function
and vice versa if precisely one of the relations $\le$ is replaced by $\ge$.

(b) If the order $\le$ on $A$ is total and $f : A \to B$ is strictly isotone or strictly antitone,
then $f$ is one-to-one.

(c) If the order $\le$ on $A$ is total and $f : A \to B$ is invertible and strictly isotone (resp.
antitone), then $f^{-1}$ is also strictly isotone (resp. antitone).

Proof. (a) is immediate from (2.52).

(b): Due to (a), it suffices to consider the case that $f$ is strictly isotone. If $f$ is strictly
isotone and $x \ne y$, then $x < y$ or $y < x$ since the order on $A$ is total. Thus, $f(x) < f(y)$
or $f(y) < f(x)$, i.e. $f(x) \ne f(y)$ in every case, showing $f$ is one-to-one.
(c): Again, due to (a), it suffices to consider the isotone case. If $u, v \in B$ such that $u < v$,
then $u = f(f^{-1}(u))$, $v = f(f^{-1}(v))$, and the isotonicity of $f$ imply $f^{-1}(u) < f^{-1}(v)$ (we
are using that the order on $A$ is total; otherwise, $f^{-1}(u)$ and $f^{-1}(v)$ need not be
comparable).
Example 2.30. (a) $f : \mathbb{N} \to \mathbb{N}$, $f(n) := 2n$, is strictly increasing, every constant map
on $\mathbb{N}$ is both increasing and decreasing, but not strictly increasing or decreasing.
All maps occurring in (2.24) are neither increasing nor decreasing.
(b) The map $f : \mathbb{R} \to \mathbb{R}$, $f(x) := -2x$, is invertible and strictly decreasing, and so is
$f^{-1} : \mathbb{R} \to \mathbb{R}$, $f^{-1}(x) := -x/2$.
(c) The following counterexamples show that the assertions of Prop. 2.29(b),(c) are no
longer correct if one does not assume the order on $A$ is total. Let $A$ be the set from
(2.48) (where it had been called $B$) with the (nontotal) order from (2.47). The map

$f : A \to \mathbb{N}$, $f(1, 1) := 1$, $f(1, 2) := 2$, $f(2, 1) := 2$, (2.53)

is strictly isotone, but not one-to-one. The map

$f : A \to \{1, 2, 3\}$, $f(1, 1) := 1$, $f(1, 2) := 2$, $f(2, 1) := 3$, (2.54)

is strictly isotone and invertible, however $f^{-1}$ is not isotone (since $2 < 3$, but
$f^{-1}(2) = (1, 2)$ and $f^{-1}(3) = (2, 1)$ are not comparable, i.e. $f^{-1}(2) \le f^{-1}(3)$ is not
true).

3 Natural Numbers, Induction, and the Size of Sets

3.1 Induction and Recursion


One of the most useful proof techniques is the method of induction; it is used in
situations where one needs to verify the truth of statements $\phi(n)$ for each $n \in \mathbb{N}$, i.e.
the truth of the statement
$\forall_{n \in \mathbb{N}}\ \phi(n)$. (3.1)

Induction is based on the fact that $\mathbb{N}$ satisfies the so-called Peano axioms:

P1: $\mathbb{N}$ contains a special element called one, denoted 1.

P2: There exists an injective map $S : \mathbb{N} \to \mathbb{N} \setminus \{1\}$, called the successor function (for
each $n \in \mathbb{N}$, $S(n)$ is called the successor of $n$).

P3: If a subset $A$ of $\mathbb{N}$ has the property that $1 \in A$ and $S(n) \in A$ for each $n \in A$, then
$A$ is equal to $\mathbb{N}$. Written as a formula, the third axiom is:

$\forall_{A \in \mathcal{P}(\mathbb{N})} \big( (1 \in A \wedge S(A) \subseteq A) \Rightarrow A = \mathbb{N} \big)$.

Remark 3.1. In Def. 1.26, we had introduced the natural numbers $\mathbb{N} := \{1, 2, 3, \dots\}$.
The successor function is $S(n) = n + 1$. In axiomatic set theory, one starts with the
Peano axioms and shows that the axioms of set theory allow the construction of a
set $\mathbb{N}$ which satisfies the Peano axioms. One then defines $2 := S(1)$, $3 := S(2)$, $\dots$,
$n + 1 := S(n)$. The interested reader can find more details in Appendix B.1.

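Theorem 3.2 (Principle of Induction). Suppose, for each $n \in \mathbb{N}$, $\phi(n)$ is a statement
(i.e. a predicate of $n$ in the language of Def. 1.30). If (a) and (b) both hold, where

(a) $\phi(1)$ is true,

(b) $\forall_{n \in \mathbb{N}} \big( \phi(n) \Rightarrow \phi(n + 1) \big)$,

then (3.1) is true, i.e. $\phi(n)$ is true for every $n \in \mathbb{N}$.

Proof. Let $A := \{ n \in \mathbb{N} : \phi(n) \}$. We have to show $A = \mathbb{N}$. Since $1 \in A$ by (a), and

$n \in A \Rightarrow \phi(n) \overset{\text{(b)}}{\Rightarrow} \phi(n + 1) \Rightarrow S(n) = n + 1 \in A$, (3.2)

i.e. $S(A) \subseteq A$, the Peano axiom P3 implies $A = \mathbb{N}$.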

Remark 3.3. To prove some $\phi(n)$ for each $n \in \mathbb{N}$ by induction according to Th. 3.2
consists of the following two steps:

(a) Prove $\phi(1)$, the so-called base case.

(b) Perform the inductive step, i.e. prove that $\phi(n)$ (the induction hypothesis) implies
$\phi(n+1)$.

Example 3.4. We use induction to prove the statement
$$\forall_{n \in \mathbb{N}} \; \underbrace{\Big( 1 + 2 + \dots + n = \frac{n(n+1)}{2} \Big)}_{\phi(n)}: \tag{3.3}$$

Base Case ($n = 1$): $1 = \frac{1 \cdot 2}{2}$, i.e. $\phi(1)$ is true.
Induction Hypothesis: Assume $\phi(n)$, i.e. $1 + 2 + \dots + n = \frac{n(n+1)}{2}$ holds.
Induction Step: One computes
$$1 + 2 + \dots + n + (n+1) \overset{\phi(n)}{=} \frac{n(n+1)}{2} + n + 1 = \frac{n(n+1) + 2n + 2}{2} = \frac{n^2 + 3n + 2}{2} = \frac{(n+1)(n+2)}{2}, \tag{3.4}$$
i.e. $\phi(n+1)$ holds and the induction is complete.
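
The closed form proved in Example 3.4 can be spot-checked numerically. The following is only a sanity check over finitely many $n$, not a replacement for the induction proof; the function name `gauss_sum_holds` and the bound 200 are illustrative choices that do not appear in these notes.

```python
def gauss_sum_holds(n: int) -> bool:
    """Check the statement 1 + 2 + ... + n == n(n+1)/2 for one natural number n."""
    left = sum(range(1, n + 1))  # left-hand side, summed term by term
    right = n * (n + 1) // 2     # right-hand side; n(n+1) is always even
    return left == right


# Base case n = 1 and the first 200 instances of the statement.
assert gauss_sum_holds(1)
assert all(gauss_sum_holds(n) for n in range(1, 201))
```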

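Corollary 3.5. Theorem 3.2 remains true if (b) is replaced by
$$\forall_{n \in \mathbb{N}} \; \Big( \big( \forall_{1 \le m \le n} \; \phi(m) \big) \Rightarrow \phi(n+1) \Big). \tag{3.5}$$

Proof. If, for each $n \in \mathbb{N}$, we use $\psi(n)$ to denote $\forall_{1 \le m \le n} \; \phi(m)$, then (3.5) is equivalent to
$\forall_{n \in \mathbb{N}} \; \big( \psi(n) \Rightarrow \psi(n+1) \big)$, i.e. to Th. 3.2(b) with $\phi$ replaced by $\psi$. Thus, Th. 3.2 implies
$\psi(n)$ holds true for each $n \in \mathbb{N}$, i.e. $\phi(n)$ holds true for each $n \in \mathbb{N}$. ∎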

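Corollary 3.6. Let $I$ be an index set. Suppose, for each $i \in I$, $\phi(i)$ is a statement. If
there is a bijective map $f : \mathbb{N} \to I$ and (a) and (b) both hold, where

(a) $\phi\big(f(1)\big)$ is true,

(b) $\forall_{n \in \mathbb{N}} \; \Big( \phi\big(f(n)\big) \Rightarrow \phi\big(f(n+1)\big) \Big)$,

then $\phi(i)$ is true for every $i \in I$.

Finite Induction: The above assertion remains true if $f : \{1, \dots, m\} \to I$ is bijective
for some $m \in \mathbb{N}$ and $\mathbb{N}$ in (b) is replaced by $\{1, \dots, m-1\}$.

Proof. If, for each $n \in \mathbb{N}$, we use $\psi(n)$ to denote $\phi\big(f(n)\big)$, then Th. 3.2 shows $\psi(n)$ is
true for every $n \in \mathbb{N}$. Given $i \in I$, we have $n := f^{-1}(i) \in \mathbb{N}$ with $f(n) = i$, showing that
$\phi(i) = \phi\big(f(n)\big) = \psi(n)$ is true.

For the finite induction, let $\psi(n)$ denote $\big( n \le m \wedge \phi(f(n)) \big) \vee n > m$. Then, for $1 \le n < m$,
we have $\psi(n) \Rightarrow \psi(n+1)$ due to (b). For $n \ge m$, we also have $\psi(n) \Rightarrow \psi(n+1)$
due to $n \ge m \Rightarrow n + 1 > m$. Thus, Th. 3.2 shows $\psi(n)$ is true for every $n \in \mathbb{N}$. Given
$i \in I$, it is $n := f^{-1}(i) \in \{1, \dots, m\}$ with $f(n) = i$. Since $n \le m \wedge \psi(n) \Rightarrow \phi\big(f(n)\big)$, we
obtain that $\phi(i)$ is true. ∎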

Apart from providing a widely employable proof technique, the most important
application of Th. 3.2 is the possibility to define sequences inductively, using so-called
recursion:

Theorem 3.7 (Recursion Theorem). Let $A$ be a nonempty set and $x \in A$. Given a
sequence of functions $(f_n)_{n \in \mathbb{N}}$, where $f_n : A^n \to A$, there exists a unique sequence
$(x_n)_{n \in \mathbb{N}}$ in $A$ satisfying the following two conditions:

(i) $x_1 = x$.

(ii) $\forall_{n \in \mathbb{N}} \; x_{n+1} = f_n(x_1, \dots, x_n)$.

The same holds if $\mathbb{N}$ is replaced by an index set $I$ as in Cor. 3.6.

Proof. To prove uniqueness, let $(x_n)_{n \in \mathbb{N}}$ and $(y_n)_{n \in \mathbb{N}}$ be sequences in $A$, both satisfying
(i) and (ii), i.e.
$$x_1 = y_1 = x \quad \text{and} \tag{3.6a}$$
$$\forall_{n \in \mathbb{N}} \; \big( x_{n+1} = f_n(x_1, \dots, x_n) \,\wedge\, y_{n+1} = f_n(y_1, \dots, y_n) \big). \tag{3.6b}$$

We prove by induction (in the form of Cor. 3.5) that $(x_n)_{n \in \mathbb{N}} = (y_n)_{n \in \mathbb{N}}$, i.e.
$$\forall_{n \in \mathbb{N}} \; \underbrace{x_n = y_n}_{\phi(n)}: \tag{3.7}$$

Base Case ($n = 1$): $\phi(1)$ is true according to (3.6a).
Induction Hypothesis: Assume $\phi(m)$ for each $m \in \{1, \dots, n\}$, i.e. $x_m = y_m$ holds for
each $m \in \{1, \dots, n\}$.
Induction Step: One computes
$$x_{n+1} \overset{\text{(3.6b)}}{=} f_n(x_1, \dots, x_n) \overset{\phi(1), \dots, \phi(n)}{=} f_n(y_1, \dots, y_n) \overset{\text{(3.6b)}}{=} y_{n+1}, \tag{3.8}$$
i.e. $\phi(n+1)$ holds and the induction is complete.

Proving existence is not as easy as one might think at first glance, and we refer to
[EHH+ 95, Sec. 1.2.2] for the proof. ∎
Example 3.8. In many applications of Th. 3.7, one has functions $g_n : A \to A$ and
uses
$$\forall_{n \in \mathbb{N}} \; \big( f_n : A^n \to A, \quad f_n(a_1, \dots, a_n) := g_n(a_n) \big). \tag{3.9}$$

Here are some important concrete examples:

(a) The factorial function $F : \mathbb{N}_0 \to \mathbb{N}$, $n \mapsto n!$, is defined recursively by
$$0! := 1, \quad 1! := 1, \quad \forall_{n \in \mathbb{N}} \; (n+1)! := (n+1) \cdot n!, \tag{3.10a}$$
i.e. we have $A = \mathbb{N}$ and $g_n(x) := (n+1) \cdot x$. So we obtain
$$(n!)_{n \in \mathbb{N}_0} = (1, 1, 2, 6, 24, 120, \dots). \tag{3.10b}$$

(b) For each $a \in \mathbb{R}$ and each $d \in \mathbb{R}$, we define the following arithmetic progression (also
called arithmetic sequence) recursively by
$$a_1 := a, \quad \forall_{n \in \mathbb{N}} \; a_{n+1} := a_n + d, \tag{3.11a}$$
i.e. we have $A = \mathbb{R}$ and $g_n = g$ with $g(x) := x + d$. For example, for $a = 2$ and
$d = -0.5$, we obtain
$$(a_n)_{n \in \mathbb{N}} = (2, 1.5, 1, 0.5, 0, -0.5, -1, -1.5, \dots). \tag{3.11b}$$

(c) For each $a \in \mathbb{R}$ and each $q \in \mathbb{R} \setminus \{0\}$, we define the following geometric progression
(also called geometric sequence) recursively by
$$x_1 := a, \quad \forall_{n \in \mathbb{N}} \; x_{n+1} := x_n \cdot q, \tag{3.12a}$$
i.e. we have $A = \mathbb{R}$ and $g_n = g$ with $g(x) := x \cdot q$. For example, for $a = 3$ and
$q = 2$, we obtain
$$(x_n)_{n \in \mathbb{N}} = (3, 6, 12, 24, 48, \dots). \tag{3.12b}$$

For the time being, we will continue to always specify $A$ and the $g_n$ or $f_n$ in subsequent
recursive definitions, but in the literature, most of the time, the $g_n$ or $f_n$ are not provided
explicitly.

Example 3.9. (a) The Fibonacci sequence consists of the Fibonacci numbers, defined
recursively by
$$F_0 := 0, \quad F_1 := 1, \quad \forall_{n \in \mathbb{N}} \; F_{n+1} := F_n + F_{n-1}, \tag{3.13a}$$
i.e. we have $A = \mathbb{N}_0$ and
$$f_n : A^n \to A, \quad f_n(a_1, \dots, a_n) := \begin{cases} 1 & \text{for } n = 1, \\ a_n + a_{n-1} & \text{for } n \ge 2. \end{cases} \tag{3.13b}$$

So we obtain
$$(F_n)_{n \in \mathbb{N}_0} = (0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, \dots). \tag{3.13c}$$

(b) For $A := \mathbb{N}$, $x := 1$, and
$$\forall_{n \in \mathbb{N}} \; \big( f_n : A^n \to A, \quad f_n(a_1, \dots, a_n) := a_1 + \dots + a_n \big), \tag{3.14a}$$
one obtains
$$x_1 = 1, \quad x_2 = f_1(1) = 1, \quad x_3 = f_2(1, 1) = 2, \quad x_4 = f_3(1, 1, 2) = 4,$$
$$x_5 = f_4(1, 1, 2, 4) = 8, \quad x_6 = f_5(1, 1, 2, 4, 8) = 16, \quad \dots \tag{3.14b}$$
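
The recursion scheme of Th. 3.7 translates directly into a short program. The following Python sketch is an illustration only: the helper name `recursive_sequence` and the convention of passing the whole family $(f_n)$ as one callable `f(n, history)` are assumptions made here, not notation from these notes.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def recursive_sequence(x: T, f: Callable[[int, Sequence[T]], T], count: int) -> List[T]:
    """Return [x_1, ..., x_count] with x_1 = x and x_{n+1} = f_n(x_1, ..., x_n).

    The family (f_n) is passed as one callable f(n, history), where
    history is the list [x_1, ..., x_n]. Assumes count >= 1.
    """
    xs = [x]
    for n in range(1, count):
        xs.append(f(n, xs))
    return xs


# Factorial as in (3.10a), started at 1! = 1: g_n(x) = (n + 1) * x.
factorials = recursive_sequence(1, lambda n, xs: (n + 1) * xs[-1], 5)

# Fibonacci as in (3.13b): f_1 = 1, f_n = a_n + a_{n-1} for n >= 2; start x = F_0 = 0.
fibonacci = recursive_sequence(0, lambda n, xs: 1 if n == 1 else xs[-1] + xs[-2], 10)

# Example 3.9(b): f_n(a_1, ..., a_n) = a_1 + ... + a_n with x = 1.
doubling = recursive_sequence(1, lambda n, xs: sum(xs), 6)

print(factorials)  # [1, 2, 6, 24, 120]
print(fibonacci)   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(doubling)    # [1, 1, 2, 4, 8, 16]
```

Passing the full history to `f` mirrors the general form $f_n : A^n \to A$; the special case (3.9) only ever reads the last entry `xs[-1]`.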

Definition 3.10. (a) Summation Symbol: On $A = \mathbb{R}$ (or, more generally, on every set
where an addition $+ : A \times A \to A$ is defined), define recursively, for each given
(possibly finite) sequence $(a_1, a_2, \dots)$ in $A$:
$$\sum_{i=1}^{1} a_i := a_1, \qquad \sum_{i=1}^{n+1} a_i := a_{n+1} + \sum_{i=1}^{n} a_i \quad \text{for } n \ge 1, \tag{3.15a}$$
i.e.
$$f_n : A^n \to A, \quad f_n(x_1, \dots, x_n) := x_n + a_{n+1}. \tag{3.15b}$$
In (3.15a), one can also use other symbols for $i$, except $a$ and $n$; for a finite sequence,
$n$ needs to be less than the maximal index of the finite sequence.
More generally, if $I$ is an index set and $\varphi : \{1, \dots, n\} \to I$ a bijective map, then
define
$$\sum_{i \in I} a_i := \sum_{i=1}^{n} a_{\varphi(i)}. \tag{3.15c}$$
The commutativity of addition implies that the definition in (3.15c) is actually
independent of the chosen bijective map $\varphi$. Also define
$$\sum_{i \in \emptyset} a_i := 0. \tag{3.15d}$$

(b) Product Symbol: On $A = \mathbb{R}$ (or, more generally, on every set where a multiplication
$\cdot : A \times A \to A$ is defined), define recursively, for each given (possibly finite)
sequence $(a_1, a_2, \dots)$ in $A$:
$$\prod_{i=1}^{1} a_i := a_1, \qquad \prod_{i=1}^{n+1} a_i := a_{n+1} \cdot \prod_{i=1}^{n} a_i \quad \text{for } n \ge 1, \tag{3.16a}$$
i.e.
$$f_n : A^n \to A, \quad f_n(x_1, \dots, x_n) := x_n \cdot a_{n+1}. \tag{3.16b}$$
In (3.16a), one can also use other symbols for $i$, except $a$ and $n$; for a finite sequence,
$n$ needs to be less than the maximal index of the finite sequence.
More generally, if $I$ is an index set and $\varphi : \{1, \dots, n\} \to I$ a bijective map, then
define
$$\prod_{i \in I} a_i := \prod_{i=1}^{n} a_{\varphi(i)}. \tag{3.16c}$$
The commutativity of multiplication implies that the definition in (3.16c) is actually
independent of the chosen bijective map $\varphi$. Also define
$$\prod_{i \in \emptyset} a_i := 1. \tag{3.16d}$$
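
The recursive definitions (3.15a) and (3.16a), together with the conventions (3.15d) and (3.16d) for the empty index set, can be mirrored in code as follows. This is a minimal sketch; the function names `rec_sum` and `rec_prod` are hypothetical, and Python lists are 0-based, so `a[n - 1]` plays the role of $a_n$.

```python
from typing import Sequence


def rec_sum(a: Sequence[float], n: int) -> float:
    """Sum of a_1, ..., a_n following (3.15a); the empty sum is 0 as in (3.15d)."""
    if n == 0:
        return 0
    if n == 1:
        return a[0]                      # a_1
    return a[n - 1] + rec_sum(a, n - 1)  # a_n + (sum of the first n - 1 terms)


def rec_prod(a: Sequence[float], n: int) -> float:
    """Product of a_1, ..., a_n following (3.16a); the empty product is 1 as in (3.16d)."""
    if n == 0:
        return 1
    if n == 1:
        return a[0]
    return a[n - 1] * rec_prod(a, n - 1)


values = [1, 2, 3, 4]
print(rec_sum(values, 4))   # 10
print(rec_prod(values, 4))  # 24
```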

Example 3.11. (a) Given $a, d \in \mathbb{R}$, let $(a_n)_{n \in \mathbb{N}}$ be the arithmetic sequence as defined
in (3.11a). It is an exercise to prove by induction that
$$\forall_{n \in \mathbb{N}} \; a_n = a + (n-1)d, \tag{3.17a}$$
$$\forall_{n \in \mathbb{N}} \; S_n := \sum_{i=1}^{n} a_i = \frac{n}{2} (a_1 + a_n) = \frac{n}{2} \big( 2a + (n-1)d \big), \tag{3.17b}$$
where the $S_n$ are called arithmetic sums.

(b) Given $a \in \mathbb{R}$ and $q \in \mathbb{R} \setminus \{0\}$, let $(x_n)_{n \in \mathbb{N}}$ be the geometric sequence as defined in
(3.12a). We will prove by induction that
$$\forall_{n \in \mathbb{N}} \; x_n = a\, q^{n-1}, \tag{3.18a}$$
$$\forall_{n \in \mathbb{N}} \; S_n := \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} (a\, q^{i-1}) = a \sum_{i=0}^{n-1} q^i = \begin{cases} n a & \text{for } q = 1, \\ \dfrac{a(1 - q^n)}{1 - q} & \text{for } q \ne 1, \end{cases} \tag{3.18b}$$
where the $S_n$ are called geometric sums.

For the induction proof of (3.18a), $\phi(n)$ is $x_n = a\, q^{n-1}$. The base case, $\phi(1)$, is the
statement $x_1 = a\, q^0 = a$, which is true. For the induction step, we assume $\phi(n)$
and compute
$$x_{n+1} = x_n \cdot q \overset{\phi(n)}{=} a\, q^{n-1} \cdot q = a\, q^n, \tag{3.19}$$
showing $\phi(n) \Rightarrow \phi(n+1)$ and completing the proof.
For $q = 1$, the sum $S_n$ is actually arithmetic with $d = 0$, i.e. $S_n = na$ can be
obtained from (3.17b). For the induction proof of (3.18b) with $q \ne 1$, $\phi(n)$ is
$S_n = \frac{a(1 - q^n)}{1 - q}$. The base case, $\phi(1)$, is the statement $S_1 = \frac{a(1 - q)}{1 - q} = a$, which is true.
For the induction step, we assume $\phi(n)$ and compute
$$S_{n+1} = S_n + x_{n+1} \overset{\phi(n)}{=} \frac{a(1 - q^n)}{1 - q} + a q^n = \frac{a(1 - q^n) + a q^n (1 - q)}{1 - q} = \frac{a(1 - q^{n+1})}{1 - q}, \tag{3.20}$$
showing $\phi(n) \Rightarrow \phi(n+1)$ and completing the proof.
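
As a sketch of how (3.18b) can be checked on examples, the following Python code compares the term-by-term sum with the closed form, using exact rational arithmetic to avoid rounding effects. The function names are illustrative assumptions; the sample values $a = 3$, $q = 2$ are taken from (3.12b).

```python
from fractions import Fraction


def geometric_sum_direct(a: Fraction, q: Fraction, n: int) -> Fraction:
    """S_n = x_1 + ... + x_n with x_1 = a and x_{i+1} = x_i * q, as in (3.12a)."""
    total, x = Fraction(0), a
    for _ in range(n):
        total += x
        x *= q
    return total


def geometric_sum_closed(a: Fraction, q: Fraction, n: int) -> Fraction:
    """Closed form from (3.18b), with the separate case q = 1."""
    if q == 1:
        return n * a
    return a * (1 - q**n) / (1 - q)


a, q = Fraction(3), Fraction(2)
print(geometric_sum_direct(a, q, 5))  # 93
print(geometric_sum_closed(a, q, 5))  # 93
```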

3.2 Cardinality: The Size of Sets


Cardinality measures the size of sets. For a finite set $A$, it is precisely the number of
elements in $A$. For an infinite set, it classifies the set's degree or level of infinity (it turns
out that not all infinite sets have the same size).

Definition 3.12. (a) The sets $A$, $B$ are defined to have the same cardinality or the
same size if, and only if, there exists a bijective map $\varphi : A \to B$. One can show
that this defines an equivalence relation on every set of sets (see Th. A.7 of the
Appendix).

(b) The cardinality of a set $A$ is $n \in \mathbb{N}$ (denoted $\#A = n$) if, and only if, there exists
a bijective map $\varphi : A \to \{1, \dots, n\}$. The cardinality of $\emptyset$ is defined as 0, i.e.
$\#\emptyset := 0$. A set $A$ is called finite if, and only if, there exists $n \in \mathbb{N}_0$ such that
$\#A = n$; $A$ is called infinite if, and only if, $A$ is not finite, denoted $\#A = \infty$. In the
strict sense, this is an abuse of notation, since $\infty$ is not a cardinality: for example,
$\#\mathbb{N} = \infty$ and $\#\mathcal{P}(\mathbb{N}) = \infty$, but $\mathbb{N}$ and $\mathcal{P}(\mathbb{N})$ do not have the same cardinality, since
the power set $\mathcal{P}(A)$ is always strictly bigger than $A$ (see Th. 3.20 below). Thus, $\#A = \infty$
is merely an abbreviation for the statement "$A$ is infinite". The interested student
finds additional material regarding the uniqueness of finite cardinality in Th. A.8
and Cor. A.9, and regarding characterizations of infinite sets in Th. A.10 of the
Appendix.

(c) The set $A$ is called countable if, and only if, $A$ is finite or $A$ has the same cardinality
as $\mathbb{N}$. Otherwise, $A$ is called uncountable.
Theorem 3.13. Let $A \ne \emptyset$ be a finite set.

(a) If $B \subseteq A$ with $A \ne B$, then $B$ is finite with $\#B < \#A$.

(b) If $a \in A$, then $\#\big( A \setminus \{a\} \big) = \#A - 1$.

Proof. For $\#A = 0$, i.e. $A = \emptyset$, (a) and (b) are trivially true, since $A$ has neither
strict subsets nor elements. For $\#A = n \in \mathbb{N}$, we use induction to prove (a) and (b)
simultaneously, i.e. we show
$$\forall_{n \in \mathbb{N}} \; \underbrace{\Big( \#A = n \;\Rightarrow\; \forall_{B \in \mathcal{P}(A) \setminus \{A\}} \; \forall_{a \in A} \; \big( \#B \in \{0, \dots, n-1\} \,\wedge\, \#(A \setminus \{a\}) = n - 1 \big) \Big)}_{\phi(n)}.$$

Base Case ($n = 1$): In this case, $A$ has precisely one element, i.e. $B = A \setminus \{a\} = \emptyset$, and
$\#\emptyset = 0 = n - 1$ proves $\phi(1)$.
Induction Step: For the induction hypothesis, we assume $\phi(n)$ to be true, i.e. we assume
(a) and (b) hold for each $A$ with $\#A = n$. We have to prove $\phi(n+1)$, i.e., we consider
$A$ with $\#A = n + 1$. From $\#A = n + 1$, we conclude the existence of a bijective map
$\varphi : A \to \{1, \dots, n+1\}$. We have to construct a bijective map $\psi : A \setminus \{a\} \to \{1, \dots, n\}$.
To this end, set $k := \varphi(a)$ and define the auxiliary function
$$f : \{1, \dots, n+1\} \to \{1, \dots, n+1\}, \quad f(x) := \begin{cases} n + 1 & \text{for } x = k, \\ k & \text{for } x = n + 1, \\ x & \text{for } x \notin \{k, n+1\}. \end{cases}$$

Then $f \circ \varphi : A \to \{1, \dots, n+1\}$ is bijective by Th. 2.13, and
$$(f \circ \varphi)(a) = f(\varphi(a)) = f(k) = n + 1.$$

Thus, the restriction $\psi := (f \circ \varphi){\restriction}_{A \setminus \{a\}}$ is the desired bijective map $\psi : A \setminus \{a\} \to \{1, \dots, n\}$,
proving $\#\big( A \setminus \{a\} \big) = n$. It remains to consider the strict subset $B$ of $A$. Since $B$ is a
strict subset of $A$, there exists $a \in A \setminus B$. Thus, $B \subseteq A \setminus \{a\}$ and, as we have already
shown $\#\big( A \setminus \{a\} \big) = n$, the induction hypothesis applies and yields $B$ is finite with
$\#B \le \#\big( A \setminus \{a\} \big) = n$, i.e. $\#B \in \{0, \dots, n\}$, proving $\phi(n+1)$, thereby completing the
induction. ∎

Theorem 3.14. For $\#A = \#B = n \in \mathbb{N}$ and $f : A \to B$, the following statements
are equivalent:

(i) $f$ is injective.

(ii) $f$ is surjective.

(iii) $f$ is bijective.

Proof. It suffices to prove the equivalence of (i) and (ii).
If $f$ is injective, then $f : A \to f(A)$ is bijective. Since $\#A = n$, there exists a bijective
map $\varphi : A \to \{1, \dots, n\}$. Then $(\varphi \circ f^{-1}) : f(A) \to \{1, \dots, n\}$ is also bijective,
showing $\#f(A) = n$, i.e., according to Th. 3.13(a), $f(A)$ cannot be a strict subset of
$B$, i.e. $f(A) = B$, proving $f$ is surjective.
If $f$ is surjective, then $f$ has a right inverse $g : B \to A$ by Th. 2.12(a), i.e. $f \circ g = \mathrm{Id}_B$.
But this also means $f$ is a left inverse for $g$, such that $g$ must be injective by Th. 2.12(b).
According to what we have already proved above, $g$ injective implies $g$ surjective, i.e.
$g$ must be bijective. From Th. 2.12(c), we then know the left inverse of $g$ is unique,
implying $f = g^{-1}$. In particular, $f$ is injective. ∎
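
For very small $n$, the equivalence in Th. 3.14 can be confirmed by exhaustively enumerating all maps from $\{1, \dots, n\}$ to itself. The sketch below is only an illustration on finitely many cases; the helper names are hypothetical, and maps are modeled as Python dictionaries.

```python
from itertools import product


def is_injective(f: dict) -> bool:
    """No two arguments share a value."""
    return len(set(f.values())) == len(f)


def is_surjective(f: dict, codomain: set) -> bool:
    """Every element of the codomain is attained."""
    return set(f.values()) == codomain


def check_theorem_3_14(n: int) -> bool:
    """Check injective <=> surjective for every map {1..n} -> {1..n}."""
    domain = range(1, n + 1)
    codomain = set(domain)
    for values in product(domain, repeat=n):  # all n**n maps
        f = dict(zip(domain, values))
        if is_injective(f) != is_surjective(f, codomain):
            return False
    return True


assert all(check_theorem_3_14(n) for n in range(1, 5))
```

The hypothesis $\#A = \#B$ matters: the map `{1: 1}` into the codomain `{1, 2}` is injective but not surjective.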

Lemma 3.15. For each finite set $A$ (i.e. $\#A = n \in \mathbb{N}_0$) and each $B \subseteq A$, one has
$\#(A \setminus B) = \#A - \#B$.

Proof. For $B = \emptyset$, the assertion is true since $\#(A \setminus B) = \#A = \#A - 0 = \#A - \#B$.
For $B \ne \emptyset$, the proof is conducted over the size of $B$, i.e. as a finite induction (cf. Cor.
3.6) over the set $\{1, \dots, n\}$, showing
$$\forall_{m \in \{1, \dots, n\}} \; \underbrace{\big( \#B = m \;\Rightarrow\; \#(A \setminus B) = \#A - \#B \big)}_{\phi(m)}.$$

Base Case ($m = 1$): $\phi(1)$ is precisely the statement provided by Th. 3.13(b).
Induction Step: For the induction hypothesis, we assume $\phi(m)$ with $1 \le m < n$. To
prove $\phi(m+1)$, consider $B \subseteq A$ with $\#B = m + 1$. Fix an element $b \in B$ and set
$B_1 := B \setminus \{b\}$. Then $\#B_1 = m$ by Th. 3.13(b), $A \setminus B = (A \setminus B_1) \setminus \{b\}$, and we compute
$$\#(A \setminus B) = \#\big( (A \setminus B_1) \setminus \{b\} \big) \overset{\text{Th. 3.13(b)}}{=} \#(A \setminus B_1) - 1 \overset{\phi(m)}{=} \#A - \#B_1 - 1 = \#A - \#B,$$
proving $\phi(m+1)$ and completing the induction. ∎

Theorem 3.16. If $A$, $B$ are finite sets, then $\#(A \cup B) = \#A + \#B - \#(A \cap B)$.

Proof. The assertion is clearly true if $A$ or $B$ is empty. If $A$ and $B$ are nonempty, then
there exist $m, n \in \mathbb{N}$ such that $\#A = m$ and $\#B = n$, i.e. there are bijective maps
$f : A \to \{1, \dots, m\}$ and $g : B \to \{1, \dots, n\}$.
We first consider the case $A \cap B = \emptyset$. We need to construct a bijective map
$h : A \cup B \to \{1, \dots, m + n\}$. To this end, we define
$$h : A \cup B \to \{1, \dots, m + n\}, \quad h(x) := \begin{cases} f(x) & \text{for } x \in A, \\ g(x) + m & \text{for } x \in B. \end{cases}$$
The bijectivity of $f$ and $g$ clearly implies the bijectivity of $h$, proving $\#(A \cup B) =
m + n = \#A + \#B$.

Finally, we consider the case of arbitrary $A$, $B$. Since $A \cup B = A \,\dot\cup\, (B \setminus A)$ and $B \setminus A =
B \setminus (A \cap B)$, we can compute
$$\#(A \cup B) = \#\big( A \,\dot\cup\, (B \setminus A) \big) = \#A + \#(B \setminus A) = \#A + \#\big( B \setminus (A \cap B) \big) \overset{\text{Lem. 3.15}}{=} \#A + \#B - \#(A \cap B),$$
thereby establishing the case. ∎
Theorem 3.17. If $(A_1, \dots, A_n)$, $n \in \mathbb{N}$, is a finite sequence of finite sets, then
$$\# \prod_{i=1}^{n} A_i = \#\big( A_1 \times \dots \times A_n \big) = \prod_{i=1}^{n} \#A_i. \tag{3.21}$$

Proof. If at least one $A_i$ is empty, then (3.21) is true, since both sides are 0.
The case where all $A_i$ are nonempty is proved by induction over $n$, i.e. we know $k_i :=
\#A_i \in \mathbb{N}$ for each $i \in \{1, \dots, n\}$ and show by induction
$$\forall_{n \in \mathbb{N}} \; \underbrace{\Big( \# \prod_{i=1}^{n} A_i = \prod_{i=1}^{n} k_i \Big)}_{\phi(n)}.$$
Base Case ($n = 1$): $\# \prod_{i=1}^{1} A_i = \#A_1 = k_1 = \prod_{i=1}^{1} k_i$, i.e. $\phi(1)$ holds.
Induction Step: From the induction hypothesis $\phi(n)$, we obtain a bijective map $\varphi :
A \to \{1, \dots, N\}$, where $A := \prod_{i=1}^{n} A_i$ and $N := \prod_{i=1}^{n} k_i$. To prove $\phi(n+1)$, we need
to construct a bijective map $h : A \times A_{n+1} \to \{1, \dots, N \cdot k_{n+1}\}$. Since $\#A_{n+1} = k_{n+1}$,
there exists a bijective map $f : A_{n+1} \to \{1, \dots, k_{n+1}\}$. We define
$$h : A \times A_{n+1} \to \{1, \dots, N \cdot k_{n+1}\}, \quad h(a_1, \dots, a_n, a_{n+1}) := \big( f(a_{n+1}) - 1 \big) \cdot N + \varphi(a_1, \dots, a_n).$$
Since $\varphi$ and $f$ are bijective, and since every $m \in \{1, \dots, N \cdot k_{n+1}\}$ has a unique
representation in the form $m = a \cdot N + r$ with $a \in \{0, \dots, k_{n+1} - 1\}$ and $r \in \{1, \dots, N\}$
(exercise), $h$ is also bijective. This proves $\phi(n+1)$ and completes the induction. ∎
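
The map $h$ from the induction step of Th. 3.17 can be tried out on a small example. In the following sketch, the concrete sets `A` and `A_next` and the enumerations `phi` and `f` are made-up sample data (assumptions for illustration); only the formula for `h` is taken from the proof.

```python
from itertools import product

# Sample data: A plays the role of A_1 x ... x A_n (here N = 6 elements),
# A_next plays the role of A_{n+1} (here k = 4 elements).
A = [("x", 0), ("x", 1), ("y", 0), ("y", 1), ("z", 0), ("z", 1)]
A_next = ["r", "s", "t", "u"]

phi = {a: i for i, a in enumerate(A, start=1)}     # bijection A -> {1, ..., N}
f = {b: j for j, b in enumerate(A_next, start=1)}  # bijection A_next -> {1, ..., k}
N, k = len(A), len(A_next)


def h(a, b):
    """h(a, b) := (f(b) - 1) * N + phi(a), as in the proof of Th. 3.17."""
    return (f[b] - 1) * N + phi[a]


# h hits every number in {1, ..., N * k} exactly once, i.e. h is bijective.
images = sorted(h(a, b) for a, b in product(A, A_next))
assert images == list(range(1, N * k + 1))
```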

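Theorem 3.18. For each finite set $A$ (i.e. $\#A = n \in \mathbb{N}_0$), one has $\#\mathcal{P}(A) = 2^n$.

Proof. The proof is conducted by induction by showing
$$\forall_{n \in \mathbb{N}_0} \; \underbrace{\big( \#A = n \;\Rightarrow\; \#\mathcal{P}(A) = 2^n \big)}_{\phi(n)}.$$

Base Case ($n = 0$): For $n = 0$, we have $A = \emptyset$, i.e. $\mathcal{P}(A) = \{\emptyset\}$. Thus, $\#\mathcal{P}(A) = 1 = 2^0$,
proving $\phi(0)$.
Induction Step: Assume $\phi(n)$ and consider $A$ with $\#A = n + 1$. Then $A$ contains
at least one element $a$. For $B := A \setminus \{a\}$, we then know $\#B = n$ from Th. 3.13(b).
Moreover, setting $M := \big\{ C \cup \{a\} : C \in \mathcal{P}(B) \big\}$, we have the disjoint decomposition
$\mathcal{P}(A) = \mathcal{P}(B) \,\dot\cup\, M$. As the map $\varphi : \mathcal{P}(B) \to M$, $\varphi(C) := C \cup \{a\}$, is clearly bijective,
$\mathcal{P}(B)$ and $M$ have the same cardinality. Thus,
$$\#\mathcal{P}(A) \overset{\text{Th. 3.16}}{=} \#\mathcal{P}(B) + \#M = \#\mathcal{P}(B) + \#\mathcal{P}(B) \overset{\phi(n)}{=} 2 \cdot 2^n = 2^{n+1},$$
thereby proving $\phi(n+1)$ and completing the induction. ∎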
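
The disjoint decomposition used in the proof of Th. 3.18 also yields a recursive way to compute power sets. The following Python sketch assumes sets are modeled as `frozenset` objects; the function name `power_set` is an illustrative choice.

```python
def power_set(A: frozenset) -> set:
    """Build P(A) as in the proof: P(A) is the disjoint union of P(B) and M, B = A without a."""
    if not A:
        return {frozenset()}    # P(empty set) = {empty set}
    a = next(iter(A))           # pick some element a of A
    P_B = power_set(A - {a})    # P(B) for B = A without a
    M = {C | {a} for C in P_B}  # M = {C united with {a} : C in P(B)}
    assert P_B.isdisjoint(M)    # the decomposition is disjoint
    return P_B | M


for n in range(7):
    assert len(power_set(frozenset(range(n)))) == 2**n
```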
Remark 3.19. In the proof of the following Th. 3.20, we will encounter a new proof
technique that we did not use before, the so-called proof by contradiction, also called
indirect proof. It is based on the observation, called the principle of contradiction, that
$A \wedge \neg A$ is always false:
$$\begin{array}{c|c|c} A & \neg A & A \wedge \neg A \\ \hline \mathrm{T} & \mathrm{F} & \mathrm{F} \\ \mathrm{F} & \mathrm{T} & \mathrm{F} \end{array} \tag{3.22}$$
Thus, one possibility of proving a statement $B$ to be true is to show $\neg B \Rightarrow A \wedge \neg A$ for
some arbitrary statement $A$. Since the right-hand side of the implication is false, the
left-hand side must also be false, proving $B$ is true.
Theorem 3.20. Let $A$ be a set. There can never exist a surjective map from $A$ onto
$\mathcal{P}(A)$ (in this sense, the size of $\mathcal{P}(A)$ is always strictly bigger than the size of $A$; in
particular, $A$ and $\mathcal{P}(A)$ can never have the same size).

Proof. If $A = \emptyset$, then there is nothing to prove. For nonempty $A$, as mentioned above,
the idea is to conduct a proof by contradiction. To this end, assume there does exist a
surjective map $f : A \to \mathcal{P}(A)$ and define
$$B := \{x \in A : x \notin f(x)\}. \tag{3.23}$$
Now $B$ is a subset of $A$, i.e. $B \in \mathcal{P}(A)$, and the assumption that $f$ is surjective implies
the existence of $a \in A$ such that $f(a) = B$. If $a \in B$, then $a \notin f(a) = B$, i.e. $a \in B$
implies $a \in B \wedge \neg(a \in B)$, so that the principle of contradiction tells us $a \notin B$ must be
true. However, $a \notin B$ implies $a \in f(a) = B$, i.e., this time, the principle of contradiction
tells us $a \in B$ must be true. In conclusion, we have shown our original assumption that
there exists a surjective map $f : A \to \mathcal{P}(A)$ implies $a \in B \wedge \neg(a \in B)$, i.e., according
to the principle of contradiction, no surjective map from $A$ onto $\mathcal{P}(A)$ can exist. ∎
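
For small finite sets, the diagonal argument in the proof of Th. 3.20 can be observed directly: for every map $f : A \to \mathcal{P}(A)$, the set $B$ from (3.23) is missing from the image of $f$. The sketch below enumerates all such maps; the helper names are hypothetical, and the check covers only the finitely many cases tested.

```python
from itertools import combinations, product


def subsets(A):
    """All subsets of A as frozensets, i.e. the power set P(A)."""
    items = sorted(A)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def diagonal_set(A, f):
    """B = {x in A : x not in f(x)} as in (3.23)."""
    return frozenset(x for x in A if x not in f[x])


def no_map_hits_diagonal(A) -> bool:
    """For every map f : A -> P(A), check that B is not in the image of f."""
    items = sorted(A)
    for images in product(subsets(A), repeat=len(items)):
        f = dict(zip(items, images))
        if diagonal_set(A, f) in images:
            return False
    return True


assert no_map_hits_diagonal({1, 2, 3})  # checks all 8**3 = 512 maps
```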

We conclude the section with a number of important results regarding the natural
numbers and countability.
Theorem 3.21. (a) Every nonempty finite subset of a totally ordered set has a
minimum and a maximum.
(b) Every nonempty subset of $\mathbb{N}$ has a minimum.

Proof. The induction proof for (a) is left as an exercise.
(b): Let $\emptyset \ne A \subseteq \mathbb{N}$. We have to show $A$ has a min. If $A$ is finite, then $A$ has a min by (a).
If $A$ is infinite, let $n$ be an element from $A$. Then the finite set $B := \{k \in A : k \le n\}$
must have a min $m$ by (a). Since $m \le x$ for each $x \in B$ and $m \le n < x$ for each
$x \in A \setminus B$, we have $m = \min A$. ∎
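Proposition 3.22. Every subset $A$ of $\mathbb{N}$ is countable.

Proof. Since $\emptyset$ is countable, we may assume $A \ne \emptyset$. From Th. 3.21(b), we know that
every nonempty subset of $\mathbb{N}$ has a min. We recursively define a sequence in $A$ by
$$a_1 := \min A, \qquad a_{n+1} := \begin{cases} \min \big( A \setminus \{a_i : 1 \le i \le n\} \big) & \text{if } A \setminus \{a_i : 1 \le i \le n\} \ne \emptyset, \\ a_n & \text{if } A \setminus \{a_i : 1 \le i \le n\} = \emptyset. \end{cases}$$

This sequence is the same as the function $f : \mathbb{N} \to A$, $f(n) = a_n$. An easy induction
shows that, for each $n \in \mathbb{N}$, $a_n \ne a_{n+1}$ implies the restriction $f{\restriction}_{\{1, \dots, n+1\}}$ is injective.
Thus, if there exists $n \in \mathbb{N}$ such that $a_n = a_{n+1}$, then $f{\restriction}_{\{1, \dots, k\}} : \{1, \dots, k\} \to A$ is
bijective, where $k := \min\{n \in \mathbb{N} : a_n = a_{n+1}\}$, showing $A$ is finite, i.e. countable. If
there does not exist $n \in \mathbb{N}$ with $a_n = a_{n+1}$, then $f$ is injective. Another easy induction
shows that, for each $n \in \mathbb{N}$, $f(\{1, \dots, n\}) \supseteq \{k \in A : k \le n\}$, showing $f$ is also
surjective, proving $A$ is countable. ∎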
Proposition 3.23. For each set $A \ne \emptyset$, the following three statements are equivalent:

(i) $A$ is countable.
(ii) There exists an injective map $f : A \to \mathbb{N}$.
(iii) There exists a surjective map $g : \mathbb{N} \to A$.

Proof. Directly from the definition of countable in Def. 3.12(c), one obtains (i) $\Rightarrow$ (ii) and
(i) $\Rightarrow$ (iii). To prove (ii) $\Rightarrow$ (i), let $f : A \to \mathbb{N}$ be injective. Then $f : A \to f(A)$ is
bijective, and, since $f(A) \subseteq \mathbb{N}$, $f(A)$ is countable by Prop. 3.22, proving $A$ is countable
as well. To prove (iii) $\Rightarrow$ (i), let $g : \mathbb{N} \to A$ be surjective. According to Th. 2.12(a), $g$
has a right inverse $f : A \to \mathbb{N}$, i.e. $g \circ f = \mathrm{Id}_A$. But this means $g$ is a left inverse for $f$,
showing $f$ is injective according to Th. 2.12(b). Then $A$ is countable by an application
of (ii). ∎
Theorem 3.24. If $(A_1, \dots, A_n)$, $n \in \mathbb{N}$, is a finite family of countable sets, then $\prod_{i=1}^{n} A_i$
is countable.

Proof. We first consider the special case $n = 2$ with $A_1 = A_2 = \mathbb{N}$ and show the map
$$\varphi : \mathbb{N} \times \mathbb{N} \to \mathbb{N}, \quad \varphi(m, n) := 2^m \cdot 3^n,$$
is injective: If $\varphi(m, n) = \varphi(p, q)$, then $2^m \cdot 3^n = 2^p \cdot 3^q$. Moreover $m \le p$ or $p \le m$.
If $m \le p$, then $3^n = 2^{p-m} \cdot 3^q$. Since $3^n$ is odd, $2^{p-m} \cdot 3^q$ must also be odd, implying
$p - m = 0$, i.e. $m = p$ (the case $p \le m$ is treated analogously). Moreover, we now have
$3^n = 3^q$, implying $n = q$, showing $(m, n) = (p, q)$, i.e. $\varphi$ is injective.
We now come back to the general case stated in the theorem. If at least one of the $A_i$ is
empty, then the product $\prod_{i=1}^{n} A_i$ is empty and, thus, countable. So it remains to consider
the case where all $A_i$ are nonempty. The proof is conducted by induction by showing
$$\forall_{n \in \mathbb{N}} \; \underbrace{\Big( \prod_{i=1}^{n} A_i \text{ is countable} \Big)}_{\phi(n)}.$$

Base Case ($n = 1$): $\phi(1)$ is merely the hypothesis that $A_1$ is countable.
Induction Step: Assuming $\phi(n)$, Prop. 3.23(ii) provides injective maps $f_1 : \prod_{i=1}^{n} A_i \to \mathbb{N}$
and $f_2 : A_{n+1} \to \mathbb{N}$. To prove $\phi(n+1)$, we provide an injective map $h : \prod_{i=1}^{n+1} A_i \to \mathbb{N}$:
Define
$$h : \prod_{i=1}^{n+1} A_i \to \mathbb{N}, \quad h(a_1, \dots, a_n, a_{n+1}) := \varphi\big( f_1(a_1, \dots, a_n), f_2(a_{n+1}) \big).$$

The injectivity of $f_1$, $f_2$, and $\varphi$ clearly implies the injectivity of $h$, thereby proving
$\phi(n+1)$ and completing the induction. ∎
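
The injectivity of the map $(m, n) \mapsto 2^m \cdot 3^n$ from the proof of Th. 3.24 can be made concrete by writing down a left inverse. In the sketch below, the names `pair` and `unpair` are hypothetical; `unpair` recovers $(m, n)$ by counting the factors 2 and 3, which is an alternative to the parity argument in the proof rather than a transcription of it.

```python
def pair(m: int, n: int) -> int:
    """The map (m, n) -> 2**m * 3**n from the proof of Th. 3.24."""
    return 2**m * 3**n


def unpair(value: int):
    """Recover (m, n) from a positive integer of the form 2**m * 3**n."""
    if value < 1:
        raise ValueError("value must be a positive integer")
    m = 0
    while value % 2 == 0:  # count the factors 2
        value //= 2
        m += 1
    n = 0
    while value % 3 == 0:  # count the factors 3
        value //= 3
        n += 1
    if value != 1:
        raise ValueError("not of the form 2**m * 3**n")
    return m, n


# unpair is a left inverse of pair on a finite test range, so pair is injective there.
assert all(unpair(pair(m, n)) == (m, n) for m in range(1, 20) for n in range(1, 20))
```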

Theorem 3.25. If $(A_i)_{i \in I}$ is a countable family of countable sets (i.e. $\emptyset \ne I$ is countable
and each $A_i$, $i \in I$, is countable), then the union $A := \bigcup_{i \in I} A_i$ is also countable.

Proof. It suffices to consider the case that all $A_i$ are nonempty. Moreover, according to
Prop. 3.23(iii), it suffices to construct a surjective map $\varphi : \mathbb{N} \to A$. Also according
to Prop. 3.23(iii), the countability of $I$ and the $A_i$ provides us with surjective maps
$f : \mathbb{N} \to I$ and $g_i : \mathbb{N} \to A_i$. Define
$$F : \mathbb{N} \times \mathbb{N} \to A, \quad F(m, n) := g_{f(m)}(n).$$

Then $F$ is surjective: Given $x \in A$, there exists $i \in I$ such that $x \in A_i$. Since $f$ is
surjective, there is $m \in \mathbb{N}$ satisfying $f(m) = i$. Moreover, since $g_i$ is surjective, there
exists $n \in \mathbb{N}$ with $g_i(n) = x$. Then $F(m, n) = g_i(n) = x$, verifying that $F$ is surjective.
As $\mathbb{N} \times \mathbb{N}$ is countable by Th. 3.24, there exists a surjective map $h : \mathbb{N} \to \mathbb{N} \times \mathbb{N}$. Thus,
$F \circ h$ is the desired surjective map from $\mathbb{N}$ onto $A$. Note: The axiom of choice (AC, see
Appendix A.4) is used when choosing each $g_i$ from the set of all surjective maps from $\mathbb{N}$
onto $A_i$. It has actually been shown that it is impossible to prove the theorem without
using AC. ∎

4 Real Numbers

4.1 The Real Numbers as a Complete Totally Ordered Field


The set of real numbers, denoted R, is a set with special properties, namely a so-called
complete totally ordered field. We already know what totally ordered means, but we still
need to explain what a field is, what an ordered field is, and what it means for a total
order to be complete. We begin with the last part.
Definition 4.1. A total order $\le$ on a nonempty set $A$ is called complete if, and only if,
every nonempty subset $B$ of $A$ that is bounded from above has a supremum, i.e.
$$\forall_{B \in \mathcal{P}(A) \setminus \{\emptyset\}} \; \Big( \big( \exists_{x \in A} \; \forall_{b \in B} \; b \le x \big) \;\Rightarrow\; \exists_{s \in A} \; s = \sup B \Big). \tag{4.1}$$

Lemma 4.2. A total order $\le$ on a nonempty set $A$ is complete if, and only if, every
nonempty subset $B$ of $A$ that is bounded from below has an infimum.

Proof. According to Lem. 2.26, it suffices to prove one implication. We show that (4.1)
implies that every nonempty $B$ bounded from below has an infimum: Define
$$C := \{x \in A : x \text{ is lower bound for } B\}. \tag{4.2}$$

Then $C \ne \emptyset$, since $B$ is bounded from below. Moreover, every $b \in B$ is an upper bound
for $C$ and (4.1) implies there exists $s = \sup C \in A$.
To verify $s = \inf B$, it remains to show $s \in C$, i.e. that $s$ is a lower bound for $B$.
However, every $b \in B$ is an upper bound for $C$ and $s = \sup C$ is the min of all upper
bounds for $C$, i.e. $s \le b$ for each $b \in B$, showing $s \in C$. ∎
Definition 4.3. Let $A$ be a nonempty set with a map
$$\circ : A \times A \to A, \quad (x, y) \mapsto x \circ y \tag{4.3}$$
(called a composition on $A$, the examples we have in mind are addition and multiplication
on $\mathbb{R}$). Then $A$ is called a group with respect to $\circ$ if, and only if, the following three
conditions are satisfied:

(i) Associativity: $x \circ (y \circ z) = (x \circ y) \circ z$ holds for all $x, y, z \in A$.

(ii) There exists a neutral element $e \in A$, i.e. an element $e \in A$ such that
$$\forall_{x \in A} \; x \circ e = x.$$

(iii) For each $x \in A$, there exists an inverse element $\bar{x} \in A$, i.e. an element $\bar{x} \in A$ such
that
$$x \circ \bar{x} = e.$$

$A$ is called a commutative or abelian group if, and only if, it is a group and satisfies the
additional condition:

(iv) Commutativity: $x \circ y = y \circ x$ holds for all $x, y \in A$.


Definition 4.4. Let $A$ be a nonempty set with two maps
$$+ : A \times A \to A, \quad (x, y) \mapsto x + y,$$
$$\cdot : A \times A \to A, \quad (x, y) \mapsto x \cdot y \tag{4.4}$$
($+$ is called addition and $\cdot$ is called multiplication; often one writes $xy$ instead of $x \cdot y$).
Then $A$ is called a field if, and only if, the following three conditions are satisfied:

(i) $A$ is a commutative group with respect to $+$. The neutral element with respect
to $+$ is denoted 0.

(ii) $A \setminus \{0\}$ is a commutative group with respect to $\cdot$. The neutral element with respect
to $\cdot$ is denoted 1.

(iii) Distributivity:
$$\forall_{x, y, z \in A} \; x \cdot (y + z) = x \cdot y + x \cdot z. \tag{4.5}$$

If $A$ is a field and $\le$ is a total order on $A$, then $A$ is called a totally ordered field if, and
only if, the following condition is satisfied:

(iv) Compatibility with Addition and Multiplication:
$$\forall_{x, y, z \in A} \; \big( x \le y \;\Rightarrow\; x + z \le y + z \big), \tag{4.6a}$$
$$\forall_{x, y \in A} \; \big( 0 \le x \,\wedge\, 0 \le y \;\Rightarrow\; 0 \le xy \big). \tag{4.6b}$$

Finally, $A$ is called a complete totally ordered field if, and only if, $A$ is a totally ordered
field that is complete in the sense of Def. 4.1.
Theorem 4.5. There exists a complete totally ordered field $\mathbb{R}$ (it is called the set of
real numbers). Moreover, $\mathbb{R}$ is unique up to isomorphism, i.e. if $A$ is a complete totally
ordered field, then there exists an isomorphism $\varphi : A \to \mathbb{R}$, i.e. a bijective map $\varphi :
A \to \mathbb{R}$, satisfying
$$\forall_{x, y \in A} \; \varphi(x + y) = \varphi(x) + \varphi(y), \tag{4.7a}$$
$$\forall_{x, y \in A} \; \varphi(xy) = \varphi(x) \varphi(y), \tag{4.7b}$$
$$\forall_{x, y \in A} \; \big( x < y \;\Rightarrow\; \varphi(x) < \varphi(y) \big). \tag{4.7c}$$

It also turns out that the isomorphism $\varphi$ is unique.

Proof. To really prove the existence of the real numbers by providing a construction is
tedious and not easy. One possible construction is provided in Appendix B. For several
different existence proofs as well as for a proof of uniqueness in the above sense, see
[EHH+ 95, Ch. 2]. ∎

Theorem 4.6. The following statements and rules are valid in the set of real numbers
$\mathbb{R}$ (and, more generally, in every field):

(a) Inverse elements are unique. For each $x \in \mathbb{R}$, the unique inverse with respect to
addition is denoted by $-x$. Also define $y - x := y + (-x)$. For each $x \in \mathbb{R} \setminus \{0\}$, the
unique inverse with respect to multiplication is denoted by $x^{-1}$. For $x \ne 0$, define
the fractions $\frac{y}{x} := y/x := y x^{-1}$ with numerator $y$ and denominator $x$.

(b) $-(-x) = x$ and $(x^{-1})^{-1} = x$ for $x \ne 0$.

(c) $(-x) + (-y) = -(x + y)$ and $x^{-1} y^{-1} = (xy)^{-1}$ for $x, y \ne 0$.

(d) $x + a = y + a \Rightarrow x = y$ and, for $a \ne 0$, $xa = ya \Rightarrow x = y$.

(e) $x \cdot 0 = 0$.

(f) $x(-y) = -(xy)$.

(g) $(-x)(-y) = xy$.

(h) $x(y - z) = xy - xz$.

(i) $xy = 0 \Rightarrow x = 0 \vee y = 0$.

(j) Rules for Fractions:
$$\frac{a}{c} + \frac{b}{d} = \frac{ad + bc}{cd}, \qquad \frac{a}{c} \cdot \frac{b}{d} = \frac{ab}{cd}, \qquad \frac{a/c}{b/d} = \frac{ad}{bc},$$
where all denominators are assumed $\ne 0$.

Proof. (a): Let $a$, $b$ be additive inverses to $x$. Then $a = a + 0 = a + x + b = 0 + b = b$.
The multiplicative case is proved completely analogously.
(b): $-x + x = 0$ already shows that $x$ is the inverse to $-x$, i.e. $-(-x) = x$. The
multiplicative case is proved completely analogously.
(c): $x + y + (-x) + (-y) = x - x + y - y = 0$, showing $(-x) + (-y)$ is the inverse to
$(x + y)$. The multiplicative case is proved completely analogously.
(d): If $x + a = y + a$, then $x = x + a - a = y + a - a = y$. Again, the multiplicative
case is proved completely analogously.
(e): One computes
$$x \cdot 0 + x \cdot 1 \overset{\text{(4.5)}}{=} x \cdot (0 + 1) = x \cdot 1 = 0 + x \cdot 1,$$
i.e. $x \cdot 0 = 0$ follows from (d).
(f): $xy + x(-y) = x(y - y) = x \cdot 0 = 0$, where we used (4.5) and (e). This shows $x(-y)$
is the additive inverse to $xy$.
(g): $xy = -(-(xy)) = -(x(-y)) = -((-y)x) = (-y)(-x)$, where (f) was used twice.
(h): $x(y - z) = x(y + (-z)) = xy + x(-z) = xy - xz$.
(i): If $xy = 0$ and $x \ne 0$, then $y = 1 \cdot y = x^{-1} x y = x^{-1} \cdot 0 = 0$.
(j): One computes
$$\frac{a}{c} + \frac{b}{d} = a c^{-1} + b d^{-1} = a d d^{-1} c^{-1} + b c c^{-1} d^{-1} = (ad + bc)(cd)^{-1} = \frac{ad + bc}{cd}$$
and
$$\frac{a}{c} \cdot \frac{b}{d} = a c^{-1} b d^{-1} = ab (cd)^{-1} = \frac{ab}{cd}$$
and
$$\frac{a/c}{b/d} = a c^{-1} (b d^{-1})^{-1} = a c^{-1} b^{-1} d = ad (bc)^{-1} = \frac{ad}{bc},$$
completing the proof. ∎
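Theorem 4.7. The following statements and rules are valid in the set of real numbers
$\mathbb{R}$ (and, more generally, in every totally ordered field):

(a) $x \le y \Rightarrow -x \ge -y$.

(b) $x \le y \wedge z \ge 0 \Rightarrow xz \le yz$ holds as well as $x \le y \wedge z \le 0 \Rightarrow xz \ge yz$.

(c) $x \ne 0 \Rightarrow x^2 := x \cdot x > 0$. In particular $1 > 0$.

(d) $x > 0 \Rightarrow 1/x > 0$, whereas $x < 0 \Rightarrow 1/x < 0$.

(e) If $0 < x < y$, then $x/y < 1$, $y/x > 1$, and $1/x > 1/y$.

(f) $x < y \wedge u < v \Rightarrow x + u < y + v$.

(g) $0 < x < y \wedge 0 < u < v \Rightarrow xu < yv$.

(h) $x < y \wedge 0 < \lambda < 1 \Rightarrow x < \lambda x + (1 - \lambda) y < y$. In particular $x < \frac{x + y}{2} < y$.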

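Proof. (a): Using (4.6a): $x \le y \Rightarrow 0 \le y - x \Rightarrow -y \le -x$.
(b): One argues, for $z \ge 0$,
$$x \le y \;\Rightarrow\; 0 \le y - x \;\overset{\text{(4.6b)}}{\Rightarrow}\; 0 \le (y - x) z = yz - xz \;\Rightarrow\; xz \le yz,$$
and, for $z \le 0$,
$$x \le y \;\Rightarrow\; 0 \le y - x \;\overset{\text{(4.6b)}}{\Rightarrow}\; 0 \le (y - x)(-z) = xz - yz \;\Rightarrow\; xz \ge yz.$$

(c): From (4.6b), one obtains $x^2 \ge 0$ (for $x < 0$, apply (4.6b) to $-x > 0$ and use
$x^2 = (-x)(-x)$ from Th. 4.6(g)). From Th. 4.6(i), one then gets $x^2 > 0$.
(d): If $x > 0$, then $x^{-1} < 0$ implies the false statement $1 = x x^{-1} < 0$, i.e. $x^{-1} > 0$. The
case $x < 0$ is treated analogously.
(e): Using (d), we obtain from $0 < x < y$ that $x/y = x y^{-1} < y y^{-1} = 1$ and $1 = x x^{-1} <
y x^{-1} = y/x$.
(f): $x < y \Rightarrow x + u < y + u$ and $u < v \Rightarrow y + u < y + v$; both combined yield
$x + u < y + v$.
(g): $0 < x < y \wedge 0 < u < v \Rightarrow xu < yu \wedge yu < yv \Rightarrow xu < yv$.
(h): Since $0 < \lambda$ and $1 - \lambda > 0$, $x < y$ implies
$$\lambda x < \lambda y \;\wedge\; (1 - \lambda) x < (1 - \lambda) y.$$

Using (4.6a), we obtain
$$x = \lambda x + (1 - \lambda) x < \lambda x + (1 - \lambda) y < \lambda y + (1 - \lambda) y = y,$$
completing the proof of the theorem. ∎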

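Theorem 4.8. Let $\emptyset \ne A, B \subseteq \mathbb{R}$, $\lambda \in \mathbb{R}$, and define
$$A + B := \{a + b : a \in A \wedge b \in B\}, \tag{4.8a}$$
$$\lambda A := \{\lambda a : a \in A\}. \tag{4.8b}$$

If $A$ and $B$ are bounded, then
$$\sup(A + B) = \sup A + \sup B, \tag{4.9a}$$
$$\inf(A + B) = \inf A + \inf B, \tag{4.9b}$$
$$\sup(\lambda A) = \begin{cases} \lambda \cdot \sup A & \text{for } \lambda \ge 0, \\ \lambda \cdot \inf A & \text{for } \lambda < 0, \end{cases} \tag{4.9c}$$
$$\inf(\lambda A) = \begin{cases} \lambda \cdot \inf A & \text{for } \lambda \ge 0, \\ \lambda \cdot \sup A & \text{for } \lambda < 0. \end{cases} \tag{4.9d}$$

Proof. Exercise. ∎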
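
Before attempting the exercise, it can help to test (4.9a) to (4.9d) on finite sets, where the supremum is the maximum and the infimum is the minimum. The following sketch does this with exact rationals; the sample sets and the helper names `set_sum` and `scale` are assumptions chosen for illustration, and the finite case is of course only a special case of the theorem.

```python
from fractions import Fraction


def set_sum(A, B):
    """A + B := {a + b : a in A and b in B}, as in (4.8a)."""
    return {a + b for a in A for b in B}


def scale(lam, A):
    """lam * A := {lam * a : a in A}, as in (4.8b)."""
    return {lam * a for a in A}


A = {Fraction(-1, 2), Fraction(3), Fraction(7, 4)}
B = {Fraction(-2), Fraction(5, 3)}

# For finite nonempty sets, sup is max and inf is min.
assert max(set_sum(A, B)) == max(A) + max(B)       # (4.9a)
assert min(set_sum(A, B)) == min(A) + min(B)       # (4.9b)
for lam in (Fraction(2), Fraction(0), Fraction(-3)):
    if lam >= 0:
        assert max(scale(lam, A)) == lam * max(A)  # (4.9c), lam >= 0
        assert min(scale(lam, A)) == lam * min(A)  # (4.9d), lam >= 0
    else:
        assert max(scale(lam, A)) == lam * min(A)  # (4.9c), lam < 0
        assert min(scale(lam, A)) == lam * max(A)  # (4.9d), lam < 0
```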

4.2 Important Subsets


Remark 4.9. We would like to recover the natural numbers N as a subset of R. Indeed,
if we start with 1 as the neutral element of multiplication and define 2 := 1+1, 3 := 2+1,
. . . , then N := {1, 2, . . . } is a subset of R, satisfying the Peano axioms P1, P2, P3 of Sec.
3.1. However, if one does actually construct R according to the axioms of axiomatic
set theory, then one starts by constructing N first (basically as we did in Rem. 1.27
and Def. 1.26), constructing R from N in several steps (cf. Appendix B). Depending
on the construction used, the original set of natural numbers will typically not be the
same set as the natural numbers as a subset of R. However, both sets will satisfy the
Peano axioms and you will have a canonical bijection between the two sets. Which
one you consider the genuine set of natural numbers depends on your personal taste
and philosophy and is completely irrelevant. Any two models of N will always produce
equivalent results, since they must both satisfy the three Peano axioms.

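We now introduce a zoo of important subsets of $\mathbb{R}$ together with corresponding notation:

$\mathbb{N} := \{1, 2, 3, \dots\}$ (natural numbers), (4.10a)
$\mathbb{N}_0 := \mathbb{N} \cup \{0\}$, (4.10b)
$\mathbb{Z}^- := \{-n : n \in \mathbb{N}\}$ (negative integers), (4.10c)
$\mathbb{Z} := \mathbb{Z}^- \cup \mathbb{N}_0$ (integers), (4.10d)
$\mathbb{Q}^+ := \{m/n : m, n \in \mathbb{N}\}$ (positive rational numbers), (4.10e)
$\mathbb{Q}^+_0 := \mathbb{Q}^+ \cup \{0\}$ (nonnegative rational numbers), (4.10f)
$\mathbb{Q}^- := \{-q : q \in \mathbb{Q}^+\}$ (negative rational numbers), (4.10g)
$\mathbb{Q}^-_0 := \mathbb{Q}^- \cup \{0\}$ (nonpositive rational numbers), (4.10h)
$\mathbb{Q} := \mathbb{Q}^+_0 \cup \mathbb{Q}^-$ (rational numbers), (4.10i)
$\mathbb{R}^+ := \{x \in \mathbb{R} : x > 0\}$ (positive real numbers), (4.10j)
$\mathbb{R}^+_0 := \{x \in \mathbb{R} : x \ge 0\}$ (nonnegative real numbers), (4.10k)
$\mathbb{R}^- := \{x \in \mathbb{R} : x < 0\}$ (negative real numbers), (4.10l)
$\mathbb{R}^-_0 := \{x \in \mathbb{R} : x \le 0\}$ (nonpositive real numbers). (4.10m)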

For $a, b \in \mathbb{R}$ with $a \le b$, one also defines the following intervals:

$[a, b] := \{x \in \mathbb{R} : a \le x \le b\}$ (bounded closed interval), (4.11a)
$]a, b[ \, := \{x \in \mathbb{R} : a < x < b\}$ (bounded open interval), (4.11b)
$]a, b] := \{x \in \mathbb{R} : a < x \le b\}$ (bounded half-open interval), (4.11c)
$[a, b[ \, := \{x \in \mathbb{R} : a \le x < b\}$ (bounded half-open interval), (4.11d)
$]-\infty, b] := \{x \in \mathbb{R} : x \le b\}$ (unbounded closed interval), (4.11e)
$]-\infty, b[ \, := \{x \in \mathbb{R} : x < b\}$ (unbounded open interval), (4.11f)
$[a, \infty[ \, := \{x \in \mathbb{R} : a \le x\}$ (unbounded closed interval), (4.11g)
$]a, \infty[ \, := \{x \in \mathbb{R} : a < x\}$ (unbounded open interval). (4.11h)

For $a = b$, one says that the intervals defined by (4.11a) to (4.11d) are degenerate or
trivial, where $[a, a] = \{a\}$ and $]a, a[ \, = \, ]a, a] = [a, a[ \, = \emptyset$; it is sometimes convenient to have
included the degenerate cases in the definition. It is sometimes also useful to abandon
the restriction $a \le b$, to let $c := \min\{a, b\}$, $d := \max\{a, b\}$, and to define
$$[a, b] := [c, d], \quad ]a, b[ \, := \, ]c, d[, \quad ]a, b] := \, ]c, d], \quad [a, b[ \, := [c, d[. \tag{4.11i}$$

Theorem 4.10 (Archimedean Property). Let , x be real numbers. If > 0 and x > 0,
then there exists n N such that n > x.
Proof. We conduct the proof by contradiction: Suppose $x$ is an upper bound for the set
$A := \{n\varepsilon : n \in \mathbb{N}\}$. Since the order on $\mathbb{R}$ is complete, according to (4.1), there exists
$s \in \mathbb{R}$ such that $s = \sup A$. In particular, $s - \varepsilon$ is not an upper bound for $A$, i.e. there
exists $n \in \mathbb{N}$ satisfying $n\varepsilon > s - \varepsilon$. But then $(n + 1)\varepsilon > s$, in contradiction to $s = \sup A$.
This shows $x$ is not an upper bound for $A$, thereby establishing the case. $\blacksquare$
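
The proof above is nonconstructive, but for rational inputs a witness can be computed directly. The following is a minimal sketch, not part of the original notes: the function name `archimedean_witness` is hypothetical, and exact rational arithmetic is assumed so that no rounding interferes.

```python
from fractions import Fraction
import math

def archimedean_witness(eps, x):
    """Return some n in N with n * eps > x, for rational eps > 0 and x > 0."""
    eps, x = Fraction(eps), Fraction(x)
    if eps <= 0 or x <= 0:
        raise ValueError("eps and x must both be positive")
    # floor(x / eps) * eps <= x, so the next larger integer is a witness.
    return math.floor(x / eps) + 1

n = archimedean_witness(Fraction(1, 1000), 7)
print(n, n * Fraction(1, 1000) > 7)
```

The returned number is in fact the smallest witness, although the theorem only asserts existence.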

5 Complex Numbers

5.1 Definition and Basic Arithmetic


According to Th. 4.7(c), $x^2 \ge 0$ holds for every real number $x \in \mathbb{R}$, i.e. the equation
$x^2 + 1 = 0$ has no solution in $\mathbb{R}$. This deficiency of the real numbers motivates the
effort to extend the field of real numbers to a larger field $\mathbb{C}$, the so-called complex
numbers. The two requirements that $\mathbb{C}$ is to be a field containing $\mathbb{R}$ and that there is to
be some complex number $i \in \mathbb{C}$ satisfying $i^2 = -1$ already dictate the following laws
of addition and multiplication for complex numbers $z = x + iy$ and $w = u + iv$ with
$x, y, u, v \in \mathbb{R}$:

\begin{align*}
z + w &= x + iy + u + iv = x + u + i(y + v), \tag{5.1a}\\
zw &= (x + iy)(u + iv) = xu - yv + i(xv + yu). \tag{5.1b}
\end{align*}

Moreover, if $x + iy = u + iv$, then $(x - u)^2 = -(v - y)^2$, i.e. $x - u = 0 = v - y$,
implying $x = u$ and $y = v$. This suggests defining complex numbers as pairs of
real numbers. Indeed, this works:
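Definition 5.1. We define the set of complex numbers $\mathbb{C} := \mathbb{R} \times \mathbb{R}$, where, keeping in
mind (5.1), addition on $\mathbb{C}$ is defined by

$$+ : \mathbb{C} \times \mathbb{C} \longrightarrow \mathbb{C}, \quad \big((x, y), (u, v)\big) \mapsto (x, y) + (u, v) := (x + u, y + v), \tag{5.2}$$

and multiplication on $\mathbb{C}$ is defined by

$$\cdot : \mathbb{C} \times \mathbb{C} \longrightarrow \mathbb{C}, \quad \big((x, y), (u, v)\big) \mapsto (x, y) \cdot (u, v) := (xu - yv, xv + yu). \tag{5.3}$$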

Theorem 5.2. (a) The set of complex numbers $\mathbb{C}$ with addition and multiplication as
defined in Def. 5.1 forms a field, where $(0, 0)$ and $(1, 0)$ are the neutral elements
with respect to addition and multiplication, respectively,

$$-z := (-x, -y) \tag{5.4a}$$

is the additive inverse to $z = (x, y)$, whereas

$$z^{-1} := \frac{1}{z} := \left(\frac{x}{x^2 + y^2},\ \frac{-y}{x^2 + y^2}\right) \tag{5.4b}$$

is the multiplicative inverse to $z = (x, y) \neq (0, 0)$.


(b) Defining subtraction and division in the usual way, for each $z, w \in \mathbb{C}$, by $w - z :=
w + (-z)$, and $w/z := wz^{-1}$ for $z \neq (0, 0)$, respectively, all the rules stated in Th.
4.6 are valid in $\mathbb{C}$.

(c) The map

$$\iota : \mathbb{R} \longrightarrow \mathbb{C}, \quad \iota(x) := (x, 0), \tag{5.5}$$

is a monomorphism, i.e. it is injective and satisfies

\begin{align*}
\forall_{x, y \in \mathbb{R}} \quad \iota(x + y) &= \iota(x) + \iota(y), \tag{5.6a}\\
\forall_{x, y \in \mathbb{R}} \quad \iota(xy) &= \iota(x) \cdot \iota(y). \tag{5.6b}
\end{align*}

It is customary to identify $\mathbb{R}$ with $\iota(\mathbb{R})$, as it usually does not cause any confusion.
One then just writes $x$ instead of $(x, 0)$.

Proof. All computations required for (a) and (c) are straightforward and are left as
an exercise; (b) is a consequence of (a), since Th. 4.6 and its proof are valid in every
field. $\blacksquare$
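
The computations left as an exercise can be spot-checked on concrete pairs. The following is a small illustrative sketch, assuming rational components so that the arithmetic is exact; the helper names `c_add`, `c_mul`, and `c_inv` are hypothetical and merely transcribe (5.2), (5.3), and (5.4b).

```python
from fractions import Fraction

def c_add(z, w):
    """Addition of pairs as in (5.2)."""
    (x, y), (u, v) = z, w
    return (x + u, y + v)

def c_mul(z, w):
    """Multiplication of pairs as in (5.3)."""
    (x, y), (u, v) = z, w
    return (x * u - y * v, x * v + y * u)

def c_inv(z):
    """Multiplicative inverse as in (5.4b); requires z != (0, 0)."""
    x, y = z
    d = x * x + y * y
    if d == 0:
        raise ZeroDivisionError("(0, 0) has no multiplicative inverse")
    return (Fraction(x) / d, Fraction(-y) / d)

i = (0, 1)
z = (Fraction(3), Fraction(-4))

print(c_mul(i, i))          # (-1, 0)
print(c_mul(z, c_inv(z)))   # (Fraction(1, 1), Fraction(0, 1))
```

Such checks on sample values do not replace the proof, of course; they only confirm that the formulas were transcribed consistently.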

Notation 5.3. The number $i := (0, 1)$ is called the imaginary unit (note that, indeed,
$i^2 = i \cdot i = (0, 1) \cdot (0, 1) = (0 \cdot 0 - 1 \cdot 1,\ 0 \cdot 1 + 1 \cdot 0) = (-1, 0) = -1$). Using $i$, one obtains
the commonly used representation of a complex number $z = (x, y) \in \mathbb{C}$:

$$z = (x, y) = x \cdot (1, 0) + y \cdot (0, 1) = x + iy, \tag{5.7}$$

where one calls $\operatorname{Re} z := x$ the real part of $z$ and $\operatorname{Im} z := y$ the imaginary part of $z$.
Moreover, $z$ is called purely imaginary if, and only if, $\operatorname{Re} z = 0$.

Remark 5.4. There does not exist a total order $\le$ on $\mathbb{C}$ that makes $\mathbb{C}$ into a totally
ordered field (i.e. no total order on $\mathbb{C}$ can be compatible with addition and multiplication
in the sense of (4.6)): Indeed, if there were such a total order $\le$ on $\mathbb{C}$, then all the rules
of Th. 4.7 would have to be valid with respect to that total order $\le$. In particular, $0 < 1^2 = 1$
and $0 < i^2 = -1$ would have to be valid by Th. 4.7(c), and, then, $0 < 1 + (-1) = 0$ would have to
be valid by Th. 4.7(f). However, $0 < 0$ is false, showing that there is no total order on
$\mathbb{C}$ that satisfies (4.6). Caveat: Of course, there do exist total orders on $\mathbb{C}$, just none
compatible with addition and multiplication; for example, the lexicographic order on
$\mathbb{R} \times \mathbb{R}$ (defined as it was in (2.49) for $\mathbb{N} \times \mathbb{N}$) constitutes a total order on $\mathbb{C}$.

Definition and Remark 5.5. Conjugation: For each complex number $z = x + iy$, we
define its complex conjugate or just conjugate to be the complex number $\bar{z} := x - iy$.
We then have the following rules that hold for each $z = x + iy$, $w = u + iv \in \mathbb{C}$:

(a) $\overline{z + w} = x + u - iy - iv = \bar{z} + \bar{w}$ and $\overline{zw} = xu - yv - (xv + yu)i = (x - iy)(u - iv) = \bar{z}\,\bar{w}$.

(b) $z + \bar{z} = 2x = 2 \operatorname{Re} z$ and $z - \bar{z} = 2yi = 2i \operatorname{Im} z$.

(c) $z = \bar{z} \Leftrightarrow x + iy = x - iy \Leftrightarrow y = 0 \Leftrightarrow z \in \mathbb{R}$.

(d) $z\bar{z} = (x + iy)(x - iy) = x^2 + y^2 \in \mathbb{R}_0^+$.

Notation 5.6. Exponentiation with Integer Exponents: Define recursively for each
$z \in \mathbb{C}$ and each $n \in \mathbb{N}_0$:

$$z^0 := 1, \quad \forall_{n \in \mathbb{N}_0}\ z^{n+1} := z\, z^n, \quad \text{and for } z \neq 0:\ z^{-n} := (z^{-1})^n. \tag{5.8}$$

Theorem 5.7. Exponentiation Rules: Let $z, w \in \mathbb{C}$. For $z, w \neq 0$, the following rules
hold for every $m, n \in \mathbb{Z}$; otherwise they hold for each $m, n \in \mathbb{N}_0$:

(a) $z^{m+n} = z^m z^n$.

(b) $z^n w^n = (zw)^n$.

(c) $(z^m)^n = z^{mn}$.

Proof. (a): First, we prove the statement for each $m \in \mathbb{N}_0$ by induction: The base case
($m = 0$) is $z^n = z^n$, which is true. For the induction step, we compute

$$z^{m+1+n} \overset{(5.8)}{=} z\, z^{m+n} \overset{\text{ind. hyp.}}{=} z\, z^m z^n \overset{(5.8)}{=} z^{m+1} z^n,$$

completing the induction step. The above proof allows $n < 0$ for $z \neq 0$. Interchanging
$m$ and $n$ covers the case $m < 0$ and $n \ge 0$. If $m < 0$ and $n < 0$, then

$$z^{m+n} = z^{-(-m-n)} \overset{(5.8)}{=} (z^{-1})^{-m-n} = (z^{-1})^{-m} (z^{-1})^{-n} \overset{(5.8)}{=} z^m z^n.$$

(b): For $n \in \mathbb{N}_0$, the statement is proved by induction: The base case ($n = 0$) is
$z^0 w^0 = 1 = (zw)^0$, which is true. For the induction step, we compute

$$z^{n+1} w^{n+1} \overset{(5.8)}{=} z\, z^n\, w\, w^n \overset{\text{ind. hyp.}}{=} zw\, (zw)^n \overset{(5.8)}{=} (zw)^{n+1},$$

completing the induction step. For $n < 0$ and $z, w \neq 0$:

$$z^n w^n \overset{(5.8)}{=} (z^{-1})^{-n} (w^{-1})^{-n} = (z^{-1} w^{-1})^{-n} \overset{\text{Th. 4.6(c)}}{=} \big((zw)^{-1}\big)^{-n} \overset{(5.8)}{=} (zw)^n.$$

(c): First, we prove the statement for each $n \in \mathbb{N}_0$ by induction: The base case ($n = 0$)
is $(z^m)^0 = 1 = z^0$, which is true. For the induction step, we compute

$$(z^m)^{n+1} \overset{(5.8)}{=} z^m (z^m)^n \overset{\text{ind. hyp.}}{=} z^m z^{mn} \overset{\text{(a)}}{=} z^{mn+m} = z^{m(n+1)},$$

completing the induction step. From (a), we also have $(z^m)^{-1} = z^{-m}$ for $z \neq 0$. Thus,
for $n < 0$ and $z \neq 0$:

$$(z^m)^n \overset{(5.8)}{=} \big((z^m)^{-1}\big)^{-n} = (z^{-m})^{-n} = z^{(-m)(-n)} = z^{mn},$$

thereby completing the proof. $\blacksquare$


5.2 Sign and Absolute Value (Modulus)


We face a certain conundrum regarding the handling of square roots. The problem
is that we will need the notion of a continuous function to prove the existence of a
unique square root $\sqrt{x}$ for every nonnegative real number $x$ and, in consequence, we
will have to wait until Section 7.2.5 below to carry out this proof. On the other hand, it
is extremely desirable to present the theory of convergence simultaneously for real and
for complex numbers, which requires the notion of the absolute value or modulus of a
complex number, to be defined in Def. 5.9(b) below as the square root of a nonnegative
real number.
Faced with this difficulty, we will introduce the notion of square root now, assuming the
existence, until we can add the proof in Section 7.2.5. Some students might be worried
that this might lead to a circular argument, where our later proof of the existence of
square roots would somehow make use of our previous assumption of that existence. Of
course, we will be careful not to make such a circular (and, thereby, logically invalid)
argument. The point is that, for real numbers, the notion of absolute value in no
way depends on the notion of a square root (see Lem. 5.10 below).

Definition and Remark 5.8. We define a nonnegative real number $y \in \mathbb{R}_0^+$ to be the
square root of the nonnegative real number $x \in \mathbb{R}_0^+$ if, and only if, $y^2 = x$. If $y$ is the
square root of $x$, then one uses the notation $\sqrt{x} := y$. We will see in Rem. and Def.
7.61 that every $x \in \mathbb{R}_0^+$ has a unique square root and that the function $f : \mathbb{R}_0^+ \longrightarrow \mathbb{R}_0^+$,
$f(x) := \sqrt{x}$, is strictly increasing (in particular, injective).

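Definition 5.9. (a) The sign function is defined by

$$\operatorname{sgn} : \mathbb{R} \longrightarrow \mathbb{R}, \quad \operatorname{sgn}(x) := \begin{cases} 1 & \text{for } x > 0, \\ 0 & \text{for } x = 0, \\ -1 & \text{for } x < 0. \end{cases} \tag{5.9}$$

It is emphasized that the sign function is only defined for real numbers (cf. Rem.
5.4)!

(b) The absolute value or modulus function is defined by

$$\operatorname{abs} : \mathbb{C} \longrightarrow \mathbb{R}_0^+, \quad z = x + iy \mapsto |z| := \sqrt{z\bar{z}} = \sqrt{x^2 + y^2}, \tag{5.10}$$

where the term absolute value is often preferred for real numbers $z \in \mathbb{R}$ and the
term modulus is often preferred if one also considers complex numbers $z \notin \mathbb{R}$.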

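Lemma 5.10. For each $x \in \mathbb{R}$, one has

$$|x| = x \operatorname{sgn}(x) = \begin{cases} x & \text{for } x \ge 0, \\ -x & \text{for } x < 0. \end{cases} \tag{5.11}$$

Proof. One has

$$|x| = \sqrt{x^2} = \begin{cases} x & \text{for } x \ge 0, \\ -x & \text{for } x < 0, \end{cases} \tag{5.12}$$

as claimed. $\blacksquare$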
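Theorem 5.11. The following rules hold for each $z, w \in \mathbb{C}$:

(a) $z \neq 0 \Rightarrow |z| > 0$.

(b) $\big||z|\big| = |z|$.

(c) $|\bar{z}| = |z|$.

(d) $\max\{|\operatorname{Re} z|, |\operatorname{Im} z|\} \le |z| \le |\operatorname{Re} z| + |\operatorname{Im} z|$.

(e) $|zw| = |z||w|$.

(f) For $w \neq 0$, one has $\left|\frac{z}{w}\right| = \frac{|z|}{|w|}$.

(g) Triangle Inequality:

$$|z + w| \le |z| + |w|. \tag{5.13}$$

(h) Inverse Triangle Inequality:

$$\big||z| - |w|\big| \le |z - w|. \tag{5.14}$$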

Proof. We carry out the proofs for $z, w \in \mathbb{C}$. However, for $z, w \in \mathbb{R}$, everything can
easily be shown directly from (5.11), without making use of square roots.
Let $z = x + iy$ with $x, y \in \mathbb{R}$.

(a): If $z \neq 0$, then $x \neq 0$ or $y \neq 0$, i.e. $x^2 > 0$ or $y^2 > 0$ by Th. 4.7(c), implying
$x^2 + y^2 > 0$ by Th. 4.7(f), i.e. $|z| = \sqrt{x^2 + y^2} > 0$.

(b): Since $a := |z| \in \mathbb{R}_0^+$, we have $|a| = \sqrt{a^2} = a = |z|$.

(c): Since $\bar{z} = x - iy$, we have $|\bar{z}| = \sqrt{x^2 + (-y)^2} = \sqrt{x^2 + y^2} = |z|$.

(d): It is $x = \operatorname{Re} z$, $y = \operatorname{Im} z$. Let $a := \max\{|x|, |y|\}$. As remarked in Def. and Rem.
5.8, the square root function is increasing and, thus, taking square roots in the chain of
inequalities $a^2 \le x^2 + y^2 \le (|x| + |y|)^2$ implies $a \le |z| \le |x| + |y|$ as claimed.

(e): As remarked in Def. and Rem. 5.8, the square root function is injective, and, thus,
(e) follows from

$$|zw|^2 = zw\,\overline{zw} \overset{\text{Def. and Rem. 5.5(a)}}{=} zw\,\bar{z}\,\bar{w} = z\bar{z}\, w\bar{w} = |z|^2 |w|^2.$$

(f): Let $w = u + iv$ with $u, v \in \mathbb{R}$. We first consider the special case $z = 1$. Applying
the formula (5.4b) for the inverse to $w$, one obtains

$$|w^{-1}|^2 = \frac{u^2}{(u^2 + v^2)^2} + \frac{v^2}{(u^2 + v^2)^2} = \frac{1}{u^2 + v^2} = \big(|w|^{-1}\big)^2,$$

i.e. $|w^{-1}| = |w|^{-1}$. Now (f) follows from (e): $\left|\frac{z}{w}\right| = |zw^{-1}| = |z||w^{-1}| = |z||w|^{-1} = \frac{|z|}{|w|}$.

(g) follows from

\begin{align*}
|z + w|^2 &= (z + w)\,\overline{(z + w)} = z\bar{z} + w\bar{z} + z\bar{w} + w\bar{w} \\
&\overset{\text{Def. and Rem. 5.5(b)}}{=} |z|^2 + 2 \operatorname{Re}(z\bar{w}) + |w|^2 \\
&\overset{\text{(d)}}{\le} |z|^2 + 2|z\bar{w}| + |w|^2 = \big(|z| + |w|\big)^2,
\end{align*}

where the last equality uses $|z\bar{w}| = |z||w|$ from (c) and (e), once again using that the square root function is increasing.

(h): Using (g), we obtain

\begin{align*}
|z| &= |z - w + w| \le |z - w| + |w| &&\Rightarrow\quad |z| - |w| \le |z - w|, \\
|w| &= |w - z + z| \le |z - w| + |z| &&\Rightarrow\quad -(|z| - |w|) \le |z - w|,
\end{align*}

implying $\big||z| - |w|\big| \le |z - w|$ by (5.11) (notice $|z| - |w| \in \mathbb{R}$). $\blacksquare$
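
As a sanity check on Th. 5.11, the rules (d), (e), (g), and (h) can be tested numerically on sample values. The sketch below is only illustrative: it relies on Python's built-in `complex` type and `abs`, works in floating point, and therefore compares up to a small tolerance `tol` (an assumption of this example, not something from the notes).

```python
import random

def check_modulus_rules(z, w, tol=1e-9):
    """Check Th. 5.11(d), (e), (g), (h) for two complex numbers, up to rounding."""
    # (d): max{|Re z|, |Im z|} <= |z| <= |Re z| + |Im z|
    assert max(abs(z.real), abs(z.imag)) <= abs(z) + tol
    assert abs(z) <= abs(z.real) + abs(z.imag) + tol
    # (e): |zw| = |z||w|
    assert abs(abs(z * w) - abs(z) * abs(w)) <= tol
    # (g): triangle inequality
    assert abs(z + w) <= abs(z) + abs(w) + tol
    # (h): inverse triangle inequality
    assert abs(abs(z) - abs(w)) <= abs(z - w) + tol
    return True

print(check_modulus_rules(3 + 4j, -1 + 2j))
```

A passing check is, of course, no substitute for the proof; it merely guards against misreading the inequalities.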
Remark 5.12. Each complex number $(x, y) = x + iy$ can be visualized as a point in
the so-called complex plane, where the horizontal $x$-axis represents real numbers and
the vertical $y$-axis represents purely imaginary numbers. Then the addition of complex
numbers is precisely the vector addition of 2-dimensional vectors in the complex plane,
and conjugation is represented by reflection through the $x$-axis. Moreover, the modulus
$|z|$ of a complex number is precisely its distance from the origin $(0, 0)$, and $|z - w|$
is the distance between the points $z = (x, y)$ and $w = (u, v)$ in the plane. Complex
multiplication can also be interpreted geometrically in the plane: If $\varphi$ denotes the angle
that the vector representing $z = (x, y)$ forms with the $x$-axis, and, likewise, $\psi$ denotes
the angle that the vector representing $w = (u, v)$ forms with the $x$-axis, then $zw$ is
the vector of length $|z||w|$ that forms the angle $\varphi + \psi$ with the $x$-axis (we will better
understand this geometrical interpretation of complex multiplication later (see Def. and
Rem. 8.29), when writing complex numbers in the polar form $z = x + iy = |z| \exp(i\varphi)$,
making use of the exponential function $\exp$).

5.3 Sums and Products


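Here we compile some important rules involving sums and products of complex numbers
(the exceptions are the estimates in Th. 5.13(d),(e) below, which actually require real
numbers):

Theorem 5.13. (a) For each $n \in \mathbb{N}$ and each $\lambda, \mu, z_j, w_j \in \mathbb{C}$, $j \in \{1, \dots, n\}$:

$$\sum_{j=1}^n (\lambda z_j + \mu w_j) = \lambda \sum_{j=1}^n z_j + \mu \sum_{j=1}^n w_j.$$

(b) For each $n \in \mathbb{N}_0$ and each $z \in \mathbb{C}$:

$$(1 - z)(1 + z + z^2 + \dots + z^n) = (1 - z) \sum_{j=0}^n z^j = 1 - z^{n+1}.$$

(c) For each $n \in \mathbb{N}_0$ and each $z, w \in \mathbb{C}$:

$$w^{n+1} - z^{n+1} = (w - z) \sum_{j=0}^n z^j w^{n-j} = (w - z)(w^n + zw^{n-1} + \dots + z^{n-1} w + z^n).$$

(d) For each $n \in \mathbb{N}$ and each $x_j, y_j \in \mathbb{R}$, $j \in \{1, \dots, n\}$:

$$\Big(\forall_{j \in \{1, \dots, n\}}\ x_j \le y_j\Big) \ \Rightarrow\ \sum_{j=1}^n x_j \le \sum_{j=1}^n y_j,$$

where equality can only hold if $x_j = y_j$ for each $j \in \{1, \dots, n\}$.

(e) For each $n \in \mathbb{N}$ and each $x_j, y_j \in \mathbb{R}$, $j \in \{1, \dots, n\}$:

$$\Big(\forall_{j \in \{1, \dots, n\}}\ 0 < x_j \le y_j\Big) \ \Rightarrow\ \prod_{j=1}^n x_j \le \prod_{j=1}^n y_j,$$

where equality can only hold if $x_j = y_j$ for each $j \in \{1, \dots, n\}$.

(f) Triangle Inequality: For each $n \in \mathbb{N}$ and each $z_j \in \mathbb{C}$, $j \in \{1, \dots, n\}$:

$$\left|\sum_{j=1}^n z_j\right| \le \sum_{j=1}^n |z_j|.$$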

Proof. In each case, the proof can be conducted by an easy induction. We carry out
(c) and leave the other cases as exercises. For (c), the base case ($n = 0$) is provided by
the true statement $w^{0+1} - z^{0+1} = w - z = (w - z) z^0 w^{0-0}$. For the induction step, one
computes

\begin{align*}
(w - z) \sum_{j=0}^{n+1} z^j w^{n+1-j} &= (w - z) \left( z^{n+1} w^0 + \sum_{j=0}^{n} z^j w^{n+1-j} \right) \\
&= (w - z) z^{n+1} + (w - z)\, w \sum_{j=0}^{n} z^j w^{n-j} \\
&\overset{\text{ind. hyp.}}{=} (w - z) z^{n+1} + w (w^{n+1} - z^{n+1}) = w^{n+2} - z^{n+2},
\end{align*}

completing the induction. $\blacksquare$
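
The factorization in Th. 5.13(c), and with it the geometric sum formula (b) as the special case $w = 1$, can be verified exactly for sample values. The following sketch assumes rational inputs (a special case of complex numbers) so that Python's `Fraction` gives exact arithmetic; the function name is hypothetical.

```python
from fractions import Fraction

def factored_difference(w, z, n):
    """Right-hand side of Th. 5.13(c): (w - z) * sum_{j=0}^{n} z^j * w^(n-j)."""
    return (w - z) * sum(z**j * w**(n - j) for j in range(n + 1))

w, z = Fraction(3, 2), Fraction(-2, 5)
for n in range(6):
    # Compare with the left-hand side w^(n+1) - z^(n+1).
    print(n, factored_difference(w, z, n) == w**(n + 1) - z**(n + 1))
```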

5.4 Binomial Coefficients and Binomial Theorem


The goal in this section is to expand $(z + w)^n$ into a sum. This sum involves the so-called
binomial coefficients $\binom{n}{k}$, which are also useful in other contexts. To obtain an idea for
what to expect, let us compute the cases $n = 0, 1, 2, 3$: $(z + w)^0 = 1$, $(z + w)^1 = z + w$,
$(z + w)^2 = z^2 + 2zw + w^2$, $(z + w)^3 = z^3 + 3z^2 w + 3zw^2 + w^3$. One finds that the
coefficients form what is known as Pascal's triangle, which we write for $n = 0, \dots, 5$:

\begin{equation*}
\begin{array}{lcccccccccccc}
n = 0: & & & & & & 1 & & & & & \\
n = 1: & & & & & 1 & & 1 & & & & \\
n = 2: & & & & 1 & & 2 & & 1 & & & \\
n = 3: & & & 1 & & 3 & & 3 & & 1 & & \\
n = 4: & & 1 & & 4 & & 6 & & 4 & & 1 & \\
n = 5: & 1 & & 5 & & 10 & & 10 & & 5 & & 1
\end{array} \tag{5.15}
\end{equation*}

The entries of the $n$th row of Pascal's triangle are denoted by $\binom{n}{0}, \dots, \binom{n}{n}$. One also
observes that one obtains each entry of the $(n + 1)$st row, except the first and last entry,
by adding the corresponding entries in row $n$ to the left and to the right of the considered
entry in row $n + 1$. The first and last entry of each row are always set to 1. This can
be summarized as

$$\forall_{n \in \mathbb{N}_0} \quad \left( \binom{n}{0} = \binom{n}{n} = 1, \quad \binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k} \ \text{ for } k \in \{1, \dots, n\} \right). \tag{5.16}$$

The following Def. 5.14 provides a different and more general definition of binomial
coefficients. We will then prove in Prop. 5.15 that the binomial coefficients as defined
in Def. 5.14 do, indeed, satisfy (5.16).

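Definition 5.14. For each $\alpha \in \mathbb{C}$ and each $k \in \mathbb{N}_0$, we define the binomial coefficient

$$\binom{\alpha}{0} := 1, \quad \binom{\alpha}{k} := \prod_{j=1}^k \frac{\alpha + 1 - j}{j} = \frac{\alpha(\alpha - 1) \cdots (\alpha - k + 1)}{1 \cdot 2 \cdots k} \quad \text{for } k \in \mathbb{N}. \tag{5.17}$$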

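Proposition 5.15. (a) For each $\alpha \in \mathbb{C}$ and each $k \in \mathbb{N}$:

$$\binom{\alpha}{0} = 1, \quad \binom{\alpha + 1}{k} = \binom{\alpha}{k - 1} + \binom{\alpha}{k}. \tag{5.18}$$

(b) For each $n \in \mathbb{N}_0$:

$$\binom{n}{n} = 1. \tag{5.19}$$

The above statements include (5.16) as a special case.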

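Proof. (a): The first identity is part of the definition in (5.17). For the second identity,
we first observe, for each $k \in \mathbb{N}$,

$$\binom{\alpha}{k} = \prod_{j=1}^k \frac{\alpha + 1 - j}{j} = \frac{\alpha + 1 - k}{k} \prod_{j=1}^{k-1} \frac{\alpha + 1 - j}{j} = \binom{\alpha}{k - 1} \frac{\alpha + 1 - k}{k}, \tag{5.20}$$

which implies

\begin{align*}
\binom{\alpha}{k - 1} + \binom{\alpha}{k} &= \binom{\alpha}{k - 1} \left( 1 + \frac{\alpha + 1 - k}{k} \right) = \binom{\alpha}{k - 1} \frac{\alpha + 1}{k} \\
&= \frac{\alpha + 1}{k} \prod_{j=1}^{k-1} \frac{\alpha + 1 - j}{j} = \prod_{j=1}^{k} \frac{\alpha + 2 - j}{j} = \binom{\alpha + 1}{k}. \tag{5.21}
\end{align*}

(b): $\binom{0}{0} = 1$ according to (5.17). For $n \in \mathbb{N}$, (5.19) is proved by induction. The base
case ($n = 1$) is provided by the true statement $\binom{1}{1} = \frac{1 + 1 - 1}{1} = 1$. For the induction step,
one computes

$$\binom{n + 1}{n + 1} = \prod_{j=1}^{n+1} \frac{n + 1 + 1 - j}{j} = \frac{n + 1}{n + 1} \prod_{j=1}^{n} \frac{n + 1 - j}{j} = \binom{n}{n} \overset{\text{ind. hyp.}}{=} 1, \tag{5.22}$$

which completes the induction. $\blacksquare$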
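
To see Def. 5.14 and Prop. 5.15 at work, one can evaluate the product in (5.17) exactly and test the recursion (5.18), including for a non-integer upper entry. This is a minimal sketch assuming rational $\alpha$; the function name `binom` is a hypothetical choice for this illustration.

```python
from fractions import Fraction

def binom(alpha, k):
    """Binomial coefficient as in (5.17): prod_{j=1}^{k} (alpha + 1 - j) / j."""
    result = Fraction(1)          # empty product for k = 0
    for j in range(1, k + 1):
        result *= Fraction(alpha + 1 - j) / j
    return result

# Row n = 5 of Pascal's triangle (5.15):
print([int(binom(5, k)) for k in range(6)])   # [1, 5, 10, 10, 5, 1]

# Recursion (5.18) for a non-integer alpha:
alpha, k = Fraction(1, 2), 3
print(binom(alpha + 1, k) == binom(alpha, k - 1) + binom(alpha, k))   # True
```

Note that, for $\alpha = n \in \mathbb{N}_0$ and $k > n$, the product contains the factor $0$, so the coefficient vanishes.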


Theorem 5.16 (Binomial Theorem). For each $z, w \in \mathbb{C}$ and each $n \in \mathbb{N}_0$, the following
formula holds:

$$(z + w)^n = \sum_{k=0}^n \binom{n}{k} z^{n-k} w^k = z^n + \binom{n}{1} z^{n-1} w + \dots + \binom{n}{n-1} z w^{n-1} + w^n. \tag{5.23}$$

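Proof. We first prove the special case $w = 1$ by induction on $n$. The base case ($n = 0$)
is provided by the correct statement $(z + 1)^0 = 1 = \binom{0}{0} z^{0-0}\, 1^0$. For the induction step,
we compute

\begin{align*}
(z + 1)^{n+1} &= (z + 1)(z + 1)^n \overset{\text{ind. hyp.}}{=} (z + 1) \sum_{k=0}^n \binom{n}{k} z^{n-k} \\
&\overset{\text{Th. 5.13(a)}}{=} \sum_{k=0}^n \binom{n}{k} z^{n-k} + \sum_{k=0}^n \binom{n}{k} z^{n+1-k} \\
&= \sum_{k=1}^{n+1} \binom{n}{k-1} z^{n+1-k} + \sum_{k=0}^n \binom{n}{k} z^{n+1-k} \\
&\overset{\text{Th. 5.13(a)}}{=} \binom{n}{0} z^{n+1} + \sum_{k=1}^n \left( \binom{n}{k-1} + \binom{n}{k} \right) z^{n+1-k} + \binom{n}{n} z^0 \\
&\overset{\text{Prop. 5.15}}{=} \binom{n+1}{0} z^{n+1} + \sum_{k=1}^n \binom{n+1}{k} z^{n+1-k} + \binom{n+1}{n+1} z^0 \\
&= \sum_{k=0}^{n+1} \binom{n+1}{k} z^{n+1-k}, \tag{5.24}
\end{align*}

completing the induction and proving the special case. For the general case, first consider
$w = 0$. Then (5.23) is proved by

$$\sum_{k=0}^n \binom{n}{k} z^{n-k}\, 0^k = \binom{n}{0} z^{n-0}\, 0^0 = z^n \cdot 1 = z^n = (z + 0)^n. \tag{5.25}$$

For $w \neq 0$, we apply the special case with $z$ replaced by $z/w$, yielding

$$\left( \frac{z}{w} + 1 \right)^n = \sum_{k=0}^n \binom{n}{k} \left( \frac{z}{w} \right)^{n-k}. \tag{5.26}$$

Multiplying (5.26) by $w^n$ proves (5.23). $\blacksquare$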
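
The expansion (5.23) can be checked exactly on sample values. The sketch below is an illustration under the assumption of rational $z, w$ (a special case of the theorem), using Python's `math.comb` for the integer binomial coefficients; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_expansion(z, w, n):
    """Right-hand side of (5.23): sum_{k=0}^{n} C(n, k) * z^(n-k) * w^k."""
    return sum(comb(n, k) * z**(n - k) * w**k for k in range(n + 1))

z, w = Fraction(2, 3), Fraction(-5, 7)
for n in range(6):
    print(n, binomial_expansion(z, w, n) == (z + w)**n)
```

The case $w = 0$ works here because Python, like the convention used in (5.25), evaluates $0^0$ as $1$.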

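The binomial theorem can now be used to infer a few more rules that hold for the
binomial coefficients:

Corollary 5.17. One has the following identities:

\begin{align*}
\forall_{n \in \mathbb{N}_0} \quad \sum_{k=0}^n \binom{n}{k} &= \binom{n}{0} + \binom{n}{1} + \dots + \binom{n}{n} = 2^n, \tag{5.27a}\\
\forall_{n \in \mathbb{N}} \quad \sum_{k=0}^n (-1)^k \binom{n}{k} &= \binom{n}{0} - \binom{n}{1} + \binom{n}{2} - + \dots + (-1)^n \binom{n}{n} = 0. \tag{5.27b}
\end{align*}

Proof. (5.27a) is just (5.23) with $z = w = 1$; (5.27b) is just (5.23) with $z = 1$ and
$w = -1$. $\blacksquare$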

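The formulas provided by the following proposition are also sometimes useful.

Proposition 5.18. (a) For each $\alpha \in \mathbb{C}$ and each $k \in \mathbb{N}_0$:

$$\sum_{j=0}^k \binom{\alpha + j}{j} = \binom{\alpha}{0} + \binom{\alpha + 1}{1} + \dots + \binom{\alpha + k}{k} = \binom{\alpha + k + 1}{k}. \tag{5.28}$$

(b) For each $n, k \in \mathbb{N}_0$ with $k \le n$:

$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}. \tag{5.29}$$

Moreover, for $n \ge 1$, one has $\binom{n}{k} = \# \mathcal{P}_k(\{1, \dots, n\})$, where

$$\mathcal{P}_k(A) := \big\{ B \in \mathcal{P}(A) : \# B = k \big\} \tag{5.30}$$

denotes the set of all subsets of a set $A$ that have precisely $k$ elements.

(c) For each $n, k \in \mathbb{N}_0$:

$$\sum_{j=0}^k \binom{n + j}{n} = \binom{n}{n} + \binom{n + 1}{n} + \dots + \binom{n + k}{n} = \binom{n + k + 1}{n + 1}. \tag{5.31}$$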

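Proof. The induction proofs of (a) and (b) are left as exercises. For (c), one computes

\begin{align*}
\sum_{j=0}^k \binom{n + j}{n} &\overset{(5.29)}{=} \sum_{j=0}^k \frac{(n + j)!}{n!\,(n + j - n)!} \overset{(5.29)}{=} \sum_{j=0}^k \binom{n + j}{j} \\
&\overset{(5.28)}{=} \binom{n + k + 1}{k} \overset{(5.29)}{=} \frac{(n + k + 1)!}{k!\,(n + 1)!} = \binom{n + k + 1}{n + 1},
\end{align*}

thereby establishing the case. $\blacksquare$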
6 Polynomials

6.1 Arithmetic of K-Valued Functions


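Notation 6.1. We will write $\mathbb{K}$ in situations where we allow $\mathbb{K}$ to be $\mathbb{R}$ or $\mathbb{C}$.

Notation 6.2. If $A$ is any nonempty set, then one can add and multiply arbitrary
functions $f, g : A \longrightarrow \mathbb{K}$, and one can define several further operations to create new
functions from $f$ and $g$:

\begin{align*}
(f + g) &: A \longrightarrow \mathbb{K}, & (f + g)(x) &:= f(x) + g(x), \tag{6.1a}\\
(\lambda f) &: A \longrightarrow \mathbb{K}, & (\lambda f)(x) &:= \lambda f(x) \quad \text{for each } \lambda \in \mathbb{K}, \tag{6.1b}\\
(f g) &: A \longrightarrow \mathbb{K}, & (f g)(x) &:= f(x) g(x), \tag{6.1c}\\
(f / g) &: A \longrightarrow \mathbb{K}, & (f / g)(x) &:= f(x) / g(x) \quad \text{(assuming } g(x) \neq 0\text{)}, \tag{6.1d}\\
\operatorname{Re} f &: A \longrightarrow \mathbb{R}, & (\operatorname{Re} f)(x) &:= \operatorname{Re}(f(x)), \tag{6.1e}\\
\operatorname{Im} f &: A \longrightarrow \mathbb{R}, & (\operatorname{Im} f)(x) &:= \operatorname{Im}(f(x)). \tag{6.1f}
\end{align*}

For $\mathbb{K} = \mathbb{R}$, we further define

\begin{align*}
\max(f, g) &: A \longrightarrow \mathbb{R}, & \max(f, g)(x) &:= \max\{f(x), g(x)\}, \tag{6.1g}\\
\min(f, g) &: A \longrightarrow \mathbb{R}, & \min(f, g)(x) &:= \min\{f(x), g(x)\}, \tag{6.1h}\\
f^+ &: A \longrightarrow \mathbb{R}, & f^+ &:= \max(f, 0), \tag{6.1i}\\
f^- &: A \longrightarrow \mathbb{R}, & f^- &:= \max(-f, 0). \tag{6.1j}
\end{align*}

Finally, once again also allowing $\mathbb{K} = \mathbb{C}$,

$$|f| : A \longrightarrow \mathbb{R}, \quad |f|(x) := |f(x)|. \tag{6.1k}$$

One calls $f^+$ and $f^-$ the positive part and the negative part of $f$, respectively. For
$\mathbb{R}$-valued functions $f$, we have

$$|f| = f^+ + f^-. \tag{6.1l}$$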
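
The pointwise definitions of the positive and negative part translate directly into higher-order functions. The following sketch is purely illustrative; the helper names `positive_part` and `negative_part` and the sample function `f` are assumptions made for this example.

```python
def positive_part(f):
    """f^+ := max(f, 0), defined pointwise as in (6.1i)."""
    return lambda x: max(f(x), 0)

def negative_part(f):
    """f^- := max(-f, 0), defined pointwise as in (6.1j)."""
    return lambda x: max(-f(x), 0)

f = lambda x: x**3 - x            # sample real-valued function
fp, fm = positive_part(f), negative_part(f)

for x in (-2, -1, 0, 1, 2):
    # (6.1l): |f| = f^+ + f^-
    print(x, abs(f(x)) == fp(x) + fm(x))
```

Note that both parts are nonnegative functions, even though $f^-$ is called the negative part.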

6.2 1-Dimensional Polynomials


Definition 6.3. Let $n \in \mathbb{N}$. Each function from $\mathbb{K}$ into $\mathbb{K}$, $x \mapsto x^n$, is called a monomial.
A function $P$ from $\mathbb{K}$ into $\mathbb{K}$ is called a polynomial if, and only if, it is a linear combination
of monomials, i.e. if, and only if, $P$ has the form

$$P : \mathbb{K} \longrightarrow \mathbb{K}, \quad P(x) = \sum_{j=0}^n a_j x^j = a_0 + a_1 x + \dots + a_n x^n, \quad a_j \in \mathbb{K}. \tag{6.2}$$

The $a_j$ are called the coefficients of $P$. The largest number $d \le n$ such that $a_d \neq 0$ is
called the degree of $P$, denoted $\deg(P)$. If all coefficients are 0, then $P$ is called the zero
polynomial; the degree of the zero polynomial is defined as $-1$ (in Th. 6.6(b) below, we
will see that each polynomial of degree $n \in \mathbb{N}_0$ is uniquely determined by its coefficients
$a_0, \dots, a_n$ and vice versa).
Polynomials of degree 0 are constant. Polynomials of degree 1 have the form
$P(x) = a + bx$ and are called affine functions (often they are also called linear functions,
even though this is not really correct for $a \neq 0$, since every function $P$ that is linear (in
the sense of linear algebra) must satisfy $P(0) = 0$). Polynomials of degree 2 have the
form $P(x) = a + bx + cx^2$ and are called quadratic functions.
Each $\xi \in \mathbb{K}$ such that $P(\xi) = 0$ is called a zero or a root of $P$.
A rational function is a quotient $P/Q$ of two polynomials $P$ and $Q$.

Remark 6.4. Let $\lambda \in \mathbb{K}$ and let $P, Q$ be polynomials. Then $\lambda P$, $P + Q$, and $PQ$ defined
according to Not. 6.2 are polynomials as well. More precisely, if $\lambda = 0$ or $P \equiv 0$, then
$\lambda P = 0$; if $P \equiv 0$, then $P + Q = Q$; if $Q \equiv 0$, then $P + Q = P$; if $P \equiv 0$ or $Q \equiv 0$, then
$PQ = 0$. If $\lambda \neq 0$ and

\begin{equation*}
\begin{gathered}
P(x) = \sum_{j=0}^n a_j x^j, \quad Q(x) = \sum_{j=0}^m b_j x^j, \\
\text{with } \deg(P) = n \ge 0, \quad \deg(Q) = m \ge 0, \quad n \ge m \ge 0,
\end{gathered} \tag{6.3}
\end{equation*}

then, defining $b_j := 0$ for each $j \in \{m + 1, \dots, n\}$ in case $n > m$,

\begin{align*}
(\lambda P)(x) &= \sum_{j=0}^n (\lambda a_j)\, x^j, & \deg(\lambda P) &= n, \tag{6.4a}\\
(P + Q)(x) &= \sum_{j=0}^n (a_j + b_j)\, x^j, & \deg(P + Q) &\le n = \max\{m, n\}, \tag{6.4b}\\
(PQ)(x) &= \sum_{j=0}^{m+n} c_j\, x^j, & \deg(PQ) &= m + n, \tag{6.4c}
\end{align*}

where, setting $a_k := 0$ for each $k \in \{n + 1, \dots, m + n\}$ and $b_k := 0$ for each $k \in
\{m + 1, \dots, m + n\}$,

$$\forall_{j \in \{0, \dots, n + m\}} \quad c_j = \sum_{k=0}^j a_k b_{j-k}. \tag{6.4d}$$

Formula (6.4c) can be proved by induction on $m = \deg(Q) \in \mathbb{N}_0$ as follows: For $m = 0$,
we compute

$$(PQ)(x) = b_0 \sum_{j=0}^n a_j x^j = \sum_{j=0}^{n+0} b_0 a_j x^j,$$

i.e. $c_j = b_0 a_j = \sum_{k=0}^j a_k b_{j-k}$, which establishes the base case, remembering $b_{j-k} = 0$ for
$j > k$. For the induction step, we compute, for $\deg(Q) = m + 1$,

\begin{align*}
(PQ)(x) &= \left( \sum_{j=0}^n a_j x^j \right) \left( \sum_{\nu=0}^{m+1} b_\nu x^\nu \right) = \left( \sum_{j=0}^n a_j x^j \right) \left( b_{m+1} x^{m+1} + \sum_{\nu=0}^{m} b_\nu x^\nu \right) \\
&\overset{\text{ind. hyp.}}{=} \sum_{j=0}^n a_j b_{m+1} x^{m+1+j} + \sum_{j=0}^{m+n} \left( \sum_{k=0}^j a_k b_{j-k} \right) x^j \\
&= \sum_{j=m+1}^{m+n+1} a_{j-m-1} b_{m+1} x^j + \sum_{j=0}^{m+n} \left( \sum_{k=0}^j a_k b_{j-k} \right) x^j \\
&= \sum_{j=0}^{m+n+1} \left( \sum_{k=0}^j a_k b_{j-k} \right) x^j,
\end{align*}

which completes the induction step. There is a notational issue in the second and third
lines of the above computation, since, in both lines, the $b_{m+1}$ in the first sum is the
actual $b_{m+1}$ from $Q$, but $b_{m+1} = 0$ in the second sum in both lines, which is due to the
induction hypothesis being applied for $m < m + 1$. This is actually used when combining
both sums in the last step, computing, for $m + 1 \le j \le m + n$: $a_{j-m-1} b_{m+1} x^j + a_{j-m-1} \cdot
0 \cdot x^j = a_{j-m-1} b_{m+1} x^j$. For $j = m + n + 1$, one has $\sum_{k=0}^{m+n+1} a_k b_{m+n+1-k} = a_n b_{m+1}$, since
$b_{m+n+1-k} = 0$ for $n > k$ and $a_k = 0$ for $k > n$.
Finally, $\deg(PQ) = m + n$ follows from $c_{m+n} = a_n b_m \neq 0$.
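
The coefficient formula (6.4d) is a discrete convolution of the two coefficient lists, which is easy to carry out mechanically. The sketch below is one possible illustration: it assumes polynomials are represented as Python lists `[a_0, a_1, ..., a_n]`, and the names `poly_mul` and `poly_eval` are hypothetical.

```python
def poly_mul(a, b):
    """Coefficients c_j = sum_k a_k * b_(j-k) of the product polynomial, cf. (6.4d)."""
    c = [0] * (len(a) + len(b) - 1)
    for k, a_k in enumerate(a):
        for l, b_l in enumerate(b):
            c[k + l] += a_k * b_l     # a_k * b_l contributes to x^(k+l)
    return c

def poly_eval(coeffs, x):
    """Evaluate sum_j coeffs[j] * x^j."""
    return sum(c * x**j for j, c in enumerate(coeffs))

P = [1, 1]      # 1 + x
Q = [-1, 1]     # -1 + x
print(poly_mul(P, Q))   # [-1, 0, 1], i.e. x^2 - 1
```

The leading coefficient of the product is the product of the leading coefficients, in line with $\deg(PQ) = m + n$.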
Theorem 6.5. (a) For each polynomial $P$ given in the form of (6.3) and each $\xi \in \mathbb{K}$,
we have the identity

$$P(x) = \sum_{j=0}^n b_j (x - \xi)^j, \tag{6.5}$$

where

$$\forall_{j \in \{0, \dots, n\}} \quad b_j = \sum_{k=j}^n a_k \binom{k}{j} \xi^{k-j}, \quad \text{in particular } b_0 = P(\xi), \quad b_n = a_n. \tag{6.6}$$

(b) If $P$ is a polynomial with $n := \deg(P) \ge 1$, then, for each $\xi \in \mathbb{K}$, there exists a
polynomial $Q$ with $\deg(Q) = n - 1$ such that

$$P(x) = P(\xi) + (x - \xi)\, Q(x). \tag{6.7}$$

In particular, if $\xi$ is a zero of $P$, then $P(x) = (x - \xi)\, Q(x)$.

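Proof. (a): For $\xi = 0$, there is nothing to prove. For $\xi \neq 0$, defining the auxiliary
variable $\eta := x - \xi$, we obtain $x = \xi + \eta$ and

\begin{align*}
P(x) &= \sum_{k=0}^n a_k (\xi + \eta)^k \overset{(5.23)}{=} \sum_{k=0}^n a_k \sum_{j=0}^k \binom{k}{j} \xi^{k-j} \eta^j = \sum_{k=0}^n \sum_{j=0}^n a_k \binom{k}{j} \xi^{k-j} \eta^j \\
&= \sum_{j=0}^n \sum_{k=0}^n a_k \binom{k}{j} \xi^{k-j} \eta^j = \sum_{j=0}^n \left( \sum_{k=j}^n a_k \binom{k}{j} \xi^{k-j} \right) \eta^j, \tag{6.8}
\end{align*}

which is (6.5).

(b): According to (a), we have

$$P(x) = P(\xi) + (x - \xi)\, Q(x), \quad \text{with } Q(x) = \sum_{j=1}^n b_j (x - \xi)^{j-1} = \sum_{j=0}^{n-1} b_{j+1} (x - \xi)^j, \tag{6.9}$$

proving (b). $\blacksquare$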
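
Formula (6.6) gives an explicit recipe for re-expanding a polynomial around a point $\xi$. The following sketch is an illustration with integer coefficients, assuming the coefficient-list representation `[a_0, ..., a_n]`; the function names `recenter` and `eval_centered` are hypothetical.

```python
from math import comb

def recenter(a, xi):
    """Coefficients b_j = sum_{k=j}^{n} a_k * C(k, j) * xi^(k-j), cf. (6.6)."""
    n = len(a) - 1
    return [sum(a[k] * comb(k, j) * xi**(k - j) for k in range(j, n + 1))
            for j in range(n + 1)]

def eval_centered(b, xi, x):
    """Evaluate sum_j b_j * (x - xi)^j, the right-hand side of (6.5)."""
    return sum(b_j * (x - xi)**j for j, b_j in enumerate(b))

a = [1, -3, 0, 2]          # P(x) = 1 - 3x + 2x^3
b = recenter(a, 2)
print(b)                   # [11, 21, 12, 2]; note b_0 = P(2) = 11 and b_3 = a_3 = 2
```

Dropping $b_0$ from the list yields the coefficients of $Q$ in powers of $(x - \xi)$, as in (6.9).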
Theorem 6.6. (a) If $P$ is a polynomial with $n := \deg(P) \ge 0$, then $P$ has at most $n$
zeros.

(b) Let $P, Q$ be polynomials as in (6.3) with $n = m$, $\deg(P) \le n$, and $\deg(Q) \le n$. If
$P(x_j) = Q(x_j)$ at $n + 1$ distinct points $x_0, x_1, \dots, x_n \in \mathbb{K}$, then $a_j = b_j$ for each
$j \in \{0, \dots, n\}$.
Consequence 1: If $P, Q$ with degree $\le n$ agree at $n + 1$ distinct points, then $P = Q$.
Consequence 2: If we know $P = Q$, then they agree everywhere, in particular at
$\max\{\deg(P), \deg(Q)\} + 1$ distinct points, which implies they have the same
coefficients.

Proof. (a): For $n = 0$, $P$ is constant, but not the zero polynomial, i.e. $P \equiv a_0 \neq 0$ with
no zeros as claimed. For $n \in \mathbb{N}$, the proof is conducted by induction. The base case
($n = 1$) is provided by the observation that $\deg(P) = 1$ implies $P$ is the affine function
with $P(x) = a_0 + a_1 x$, $a_1 \neq 0$, i.e. $P$ has precisely one zero at $\xi = -a_0/a_1$. For the
induction step, assume $\deg(P) = n + 1$. If $P$ has no zeros, then the assertion of (a)
holds true. Otherwise, $P$ has at least one zero $\xi \in \mathbb{K}$, and, according to Th. 6.5(b),
there exists a polynomial $Q$ such that $\deg(Q) = n$ and

$$P(x) = (x - \xi)\, Q(x). \tag{6.10}$$

From the induction hypothesis, we gather that $Q$ has at most $n$ zeros, i.e. (6.10) implies
$P$ has at most $n + 1$ zeros, which completes the induction.

(b): If $P(x_j) = Q(x_j)$ at $n + 1$ distinct points $x_j$, then each of these points is a zero of
$P - Q$. Thus $P - Q$ is a polynomial of degree $\le n$ with at least $n + 1$ zeros. Then (a)
implies $\deg(P - Q) = -1$, i.e. $P - Q$ is the zero polynomial, i.e. $a_j - b_j = 0$ for each
$j \in \{0, \dots, n\}$. $\blacksquare$
Remark 6.7. Let $P$ be a polynomial with $n := \deg(P) \ge 0$. According to Th. 6.6(a), $P$
has at most $n$ zeros. Using Th. 6.5(b) for an induction shows there exist $k \in \{0, \dots, n\}$
and a polynomial $Q$ of degree $n - k$ such that

$$P(x) = Q(x) \prod_{j=1}^k (x - \xi_j) = (x - \xi_1)(x - \xi_2) \cdots (x - \xi_k)\, Q(x), \tag{6.11a}$$

where $Q$ does not have any zeros in $\mathbb{K}$ and $\{\xi_1, \dots, \xi_k\} = \{\xi \in \mathbb{K} : P(\xi) = 0\}$ is the set
of zeros of $P$. It can of course happen that $P$ does not have any zeros and $P = Q$ (no
$\xi_j$ exist). It can also occur that some of the $\xi_j$ in (6.11a) are identical. Thus, we can
rewrite (6.11a) as

$$P(x) = Q(x) \prod_{j=1}^l (x - \lambda_j)^{m_j} = (x - \lambda_1)^{m_1} (x - \lambda_2)^{m_2} \cdots (x - \lambda_l)^{m_l}\, Q(x), \tag{6.11b}$$

where $\lambda_1, \dots, \lambda_l$, $l \in \{0, \dots, k\}$, are the distinct zeros of $P$, and $m_j \in \mathbb{N}$ with $\sum_{j=1}^l m_j =
k$. Then $m_j$ is called the multiplicity of the zero $\lambda_j$ of $P$.

6.3 n-Dimensional Polynomials


In the previous section, we have studied polynomials as functions $P : \mathbb{K} \longrightarrow \mathbb{K}$. One
can generalize the notion of polynomial to functions $P : \mathbb{K}^n \longrightarrow \mathbb{K}$ with $n \in \mathbb{N}$. We will
briefly discuss this situation in the present section.

Definition 6.8. Let $n \in \mathbb{N}$. An element $p = (p_1, \dots, p_n) \in (\mathbb{N}_0)^n$ is called a multi-index;
$|p| := p_1 + \dots + p_n$ is called the degree of the multi-index. If $x = (x_1, \dots, x_n) \in \mathbb{K}^n$ and
$p = (p_1, \dots, p_n)$ is a multi-index, then we define

$$x^p := x_1^{p_1} x_2^{p_2} \cdots x_n^{p_n}. \tag{6.12}$$

Each function from $\mathbb{K}^n$ into $\mathbb{K}$, $x \mapsto x^p$, is called a monomial; the degree of $p$ is called
the degree of the monomial. A function $P$ from $\mathbb{K}^n$ into $\mathbb{K}$ is called a polynomial if, and
only if, it is a linear combination of monomials, i.e. if, and only if, $P$ has the form

$$P : \mathbb{K}^n \longrightarrow \mathbb{K}, \quad P(x) = \sum_{|p| \le k} a_p x^p, \quad k \in \mathbb{N}_0, \quad a_p \in \mathbb{K}. \tag{6.13}$$

The degree of $P$, still denoted $\deg(P)$, is the largest number $d \le k$ such that there is $p$
with $|p| = d$ and $a_p \neq 0$. If all $a_p = 0$, i.e. if $P \equiv 0$, then $P$ is the ($n$-dimensional) zero
polynomial and, as for $n = 1$, its degree is defined to be $-1$. A rational function is once
again a quotient of two polynomials.

Example 6.9. Writing $x, y, z$ instead of $x_1, x_2, x_3$, the functions $xy^3 z$, $x^2 y^2$, $x^2 y$, $x^2$, $y$, $1$ are examples
of monomials of degree 5, 4, 3, 2, 1, and 0, respectively, $P(x, y) := 5x^2 y - 3x^2 + y - 1$
and $Q(x, y, z) := xy^3 z - 2x^2 y^2 + 1$ are polynomials of degree 3 and 5, respectively,
and $P(x, y)/Q(x, y, z)$ is a rational function defined for each $(x, y, z) \in \mathbb{K}^3$ such that
$Q(x, y, z) \neq 0$.

7 Limits and Convergence in the Real and Complex Numbers

7.1 Sequences
Recall from Def. 2.14(b) that a sequence in $\mathbb{K}$ is a function $f : \mathbb{N} \longrightarrow \mathbb{K}$, in this context
usually denoted as $f = (z_n)_{n \in \mathbb{N}}$ or $(z_1, z_2, \dots)$ with $z_n := f(n)$. Sometimes the sequence
also has the form $(z_n)_{n \in I}$, where $I \neq \emptyset$ is a countable index set (e.g. $I = \mathbb{N}_0$) different
from $\mathbb{N}$ (in the context of convergence (see the following Def. 7.1), $I$ must be $\mathbb{N}$ or it
must have the same cardinality as $\mathbb{N}$, i.e. finite $I$ are not permissible).

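Definition 7.1. The sequence $(z_n)_{n \in \mathbb{N}}$ in $\mathbb{K}$ is said to be convergent with limit $z \in \mathbb{K}$ if,
and only if, for each $\varepsilon > 0$, there exists an index $N \in \mathbb{N}$ such that $|z_n - z| < \varepsilon$ for every
index $n > N$. The notation for $(z_n)_{n \in \mathbb{N}}$ converging to $z$ is $\lim_{n \to \infty} z_n = z$ or $z_n \to z$ for
$n \to \infty$. Thus, by definition,

$$\lim_{n \to \infty} z_n = z \quad \Leftrightarrow \quad \forall_{\varepsilon \in \mathbb{R}^+}\ \exists_{N \in \mathbb{N}}\ \forall_{n > N} \quad |z_n - z| < \varepsilon. \tag{7.1}$$

The sequence $(z_n)_{n \in \mathbb{N}}$ in $\mathbb{K}$ is called divergent if, and only if, it is not convergent.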

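Example 7.2. (a) For every constant sequence $(z_n)_{n \in \mathbb{N}} = (a)_{n \in \mathbb{N}}$ with $a \in \mathbb{K}$, one has
$\lim_{n \to \infty} z_n = \lim_{n \to \infty} a = a$: Since, for each $n \in \mathbb{N}$, $|z_n - a| = |a - a| = 0$, one can
choose $N = 1$ for each $\varepsilon > 0$.

(b) $\lim_{n \to \infty} \frac{1}{n + a} = 0$ for each $a \in \mathbb{C}$: Here $z_n := 1/(n + a)$ (if $n = -a$, then set $z_n := w$
with $w \in \mathbb{C}$ arbitrary). Given $\varepsilon > 0$, choose an arbitrary $N \in \mathbb{N}$ with $N \ge \varepsilon^{-1} + |a|$.
Then, for each $n > N$, we compute $|n + a| = |n - (-a)| \ge \big|n - |a|\big| = n - |a| >
N - |a| \ge \varepsilon^{-1}$, and, thus, $|z_n| = |n + a|^{-1} < \varepsilon$ as desired.

(c) $((-1)^n)_{n \in \mathbb{N}}$ is not convergent: We have $z_n = 1$ for each even $n$ and $z_n = -1$ for each
odd $n$. Thus, for each $z \neq 1$ and each even $n$, $|z_n - z| = |1 - z| > |1 - z|/2 =: \varepsilon > 0$,
i.e. $z$ is not a limit of $(z_n)_{n \in \mathbb{N}}$. However, $z = 1$ is also not a limit of the sequence,
since, for each odd $n$, $|z_n - 1| = |-1 - 1| = 2 > 1 =: \varepsilon > 0$, proving that the
sequence has no limit.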
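
The choice of $N$ in Example 7.2(b) is explicit, so it can be computed and tested for concrete $\varepsilon$ and $a$. The following is a rough sketch in floating-point arithmetic (so the bound is only reproduced up to rounding); the function name `index_for` is hypothetical.

```python
import math

def index_for(eps, a):
    """Some N in N with N >= 1/eps + |a|, as chosen in Example 7.2(b)."""
    return math.floor(1 / eps + abs(a)) + 1

a, eps = 3 + 4j, 0.01
N = index_for(eps, a)

# Every later term of the sequence z_n = 1/(n + a) stays within eps of the limit 0.
print(all(abs(1 / (n + a)) < eps for n in range(N + 1, N + 1000)))
```

The definition only requires some admissible $N$ for each $\varepsilon$; there is no need to find the smallest one.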

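Theorem 7.3. (a) Let $(z_n)_{n \in \mathbb{N}}$ be a sequence in $\mathbb{C}$. Then $(z_n)_{n \in \mathbb{N}}$ is convergent in $\mathbb{C}$
if, and only if, both $(\operatorname{Re} z_n)_{n \in \mathbb{N}}$ and $(\operatorname{Im} z_n)_{n \in \mathbb{N}}$ are convergent in $\mathbb{R}$. Moreover, in
that case,

$$\lim_{n \to \infty} z_n = z \quad \Leftrightarrow \quad \lim_{n \to \infty} \operatorname{Re} z_n = \operatorname{Re} z \ \wedge\ \lim_{n \to \infty} \operatorname{Im} z_n = \operatorname{Im} z. \tag{7.2}$$

(b) Let $(x_n)_{n \in \mathbb{N}}$ be a sequence in $\mathbb{R}$ and $z \in \mathbb{C}$. Then

$$\lim_{n \to \infty} x_n = z \quad \Rightarrow \quad z \in \mathbb{R}. \tag{7.3}$$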

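Proof. (a): Suppose $(z_n)_{n \in \mathbb{N}}$ converges to $z \in \mathbb{C}$. Then, given $\varepsilon > 0$, there exists $N \in \mathbb{N}$
such that, for each $n > N$, $|z_n - z| < \varepsilon$. In consequence, for each $n > N$,

$$|\operatorname{Re} z_n - \operatorname{Re} z| = |\operatorname{Re}(z_n - z)| \overset{\text{Th. 5.11(d)}}{\le} |z_n - z| < \varepsilon, \tag{7.4}$$

proving $\lim_{n \to \infty} \operatorname{Re} z_n = \operatorname{Re} z$. The proof of $\lim_{n \to \infty} \operatorname{Im} z_n = \operatorname{Im} z$ is completely analogous.
Conversely, suppose there are $x, y \in \mathbb{R}$ such that $\lim_{n \to \infty} \operatorname{Re} z_n = x$ and
$\lim_{n \to \infty} \operatorname{Im} z_n = y$. Here we encounter, for the first time, what is sometimes called an $\varepsilon/2$
argument: Given $\varepsilon > 0$, there exists $N \in \mathbb{N}$ such that, for each $n > N$, $|\operatorname{Re} z_n - x| < \varepsilon/2$
and $|\operatorname{Im} z_n - y| < \varepsilon/2$, implying, for each $n > N$,

\begin{align*}
|z_n - (x + iy)| &= |\operatorname{Re} z_n + i \operatorname{Im} z_n - (x + iy)| \\
&\le |\operatorname{Re} z_n - x| + |i|\,|\operatorname{Im} z_n - y| < \varepsilon/2 + \varepsilon/2 = \varepsilon, \tag{7.5}
\end{align*}

proving $\lim_{n \to \infty} z_n = x + iy$.

(b) is a direct consequence of (a). $\blacksquare$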

Example 7.4. (a) According to Th. 7.3(a), we have

$$\lim_{n \to \infty} \left( 2 + \frac{i}{n - 17} \right) \overset{\text{Ex. 7.2(a),(b)}}{=} 2 + 0i = 2.$$

(b) According to Th. 7.3(a) and Ex. 7.2(c), the sequence $\left( \frac{1}{n} + (-1)^n i \right)_{n \in \mathbb{N}}$ is divergent.

Another important example relies on the following inequality:

Proposition 7.5 (Bernoulli's Inequality). For each $n \in \mathbb{N}_0$ and each $x \in [-1, \infty[$, we
have

$$(1 + x)^n \ge 1 + nx, \tag{7.6}$$

with strict inequality whenever $n > 1$ and $x \neq 0$.

Proof. For $n = 0$, (7.6) reads $1 \ge 1$, for $n = 1$, (7.6) reads $1 + x \ge 1 + x$, for $n = 2$,
(7.6) reads $(1 + x)^2 = 1 + 2x + x^2 \ge 1 + 2x$, all three statements being trivially true, in
the case $n = 2$ with strict inequality for $x \neq 0$. We now proceed by induction for $n \ge 2$.
For the induction step, one estimates

\begin{align*}
(1 + x)^{n+1} = (1 + x)^n (1 + x) &\overset{\text{ind. hyp., } x \ge -1}{\ge} (1 + nx)(1 + x) = 1 + (n + 1)x + nx^2 \\
&\ge 1 + (n + 1)x, \tag{7.7}
\end{align*}

with strict inequality for $x \neq 0$. $\blacksquare$

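Example 7.6. We have, for each $q \in \mathbb{C}$,

$$|q| < 1 \quad \Rightarrow \quad \lim_{n \to \infty} q^n = 0: \tag{7.8}$$

For $q = 0$, there is nothing to prove. For $0 < |q| < 1$, it is $|q|^{-1} > 1$, i.e. $h := |q|^{-1} - 1 > 0$.
Thus, for each $\varepsilon > 0$ and $N \ge 1/(\varepsilon h)$, we obtain

$$n > N \quad \Rightarrow \quad |q|^{-n} = (1 + h)^n \overset{(7.6)}{\ge} 1 + nh > nh > 1/\varepsilon \quad \Rightarrow \quad |q^n| = |q|^n < \varepsilon. \tag{7.9}$$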
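
The index $N$ obtained from Bernoulli's inequality in Example 7.6 can likewise be computed for concrete values. The sketch below works in floating point and is only meant as an illustration; the function name `index_for_power` is hypothetical, and $0 < |q| < 1$ is assumed.

```python
import math

def index_for_power(q, eps):
    """Some N >= 1/(eps*h) with h = 1/|q| - 1, following Example 7.6 (needs 0 < |q| < 1)."""
    h = 1 / abs(q) - 1            # h > 0 because 0 < |q| < 1
    return math.ceil(1 / (eps * h))

q, eps = 0.9, 0.01
N = index_for_power(q, eps)

# All powers beyond index N are smaller than eps in modulus.
print(all(abs(q) ** n < eps for n in range(N + 1, N + 200)))
```

This $N$ is typically far from optimal, since Bernoulli's inequality gives only a linear lower bound for the exponentially growing $|q|^{-n}$; for the definition of convergence, any valid $N$ suffices.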

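Definition 7.7. (a) Given $z \in \mathbb{K}$ and $\varepsilon \in \mathbb{R}^+$, we call the set $B_\varepsilon(z) := \{w \in \mathbb{K} :
|w - z| < \varepsilon\}$ the $\varepsilon$-neighborhood of $z$ or, in anticipation of Calculus II, the (open) $\varepsilon$-ball
with center $z$ (in fact, for $\mathbb{K} = \mathbb{C}$, $B_\varepsilon(z)$ represents an open disk in the complex
plane with center $z$ and radius $\varepsilon$, whereas, for $\mathbb{K} = \mathbb{R}$, $B_\varepsilon(z) = \,]z - \varepsilon, z + \varepsilon[$ is the
open interval with center $z$ and length $2\varepsilon$). More generally, a set $U \subseteq \mathbb{K}$ is called
a neighborhood of $z$ if, and only if, there exists $\varepsilon > 0$ with $B_\varepsilon(z) \subseteq U$ (so, for
example, for $\varepsilon > 0$, $B_\varepsilon(z)$ is always a neighborhood of $z$, whereas $\mathbb{R}$ and $[z - \varepsilon, \infty[$
are neighborhoods of $z$ for $\mathbb{K} = \mathbb{R}$, but not for $\mathbb{K} = \mathbb{C}$ ($[z - \varepsilon, \infty[$ not even being
defined for $z \notin \mathbb{R}$); the sets $\{z\}$, $\{w \in \mathbb{K} : \operatorname{Re} w \ge \operatorname{Re} z\}$, $\{w \in \mathbb{K} : \operatorname{Re} w \ge \operatorname{Re} z + \varepsilon\}$
are never neighborhoods of $z$).

(b) If $\varphi(n)$ is a statement for each $n \in \mathbb{N}$, then $\varphi(n)$ is said to be true for almost all
$n \in \mathbb{N}$ if, and only if, there exists a finite subset $A \subseteq \mathbb{N}$ such that $\varphi(n)$ is true for
each $n \in \mathbb{N} \setminus A$, i.e. if, and only if, $\varphi(n)$ is always true, with the possible exception
of finitely many cases.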
Remark 7.8. In the language of Def. 7.7, the sequence $(z_n)_{n \in \mathbb{N}}$ converges to $z$ if, and
only if, every neighborhood of $z$ contains almost all $z_n$.

Definition 7.9. The sequence $(z_n)_{n \in \mathbb{N}}$ in $\mathbb{K}$ is called bounded if, and only if, the set
$\{|z_n| : n \in \mathbb{N}\}$ is bounded in the sense of Def. 2.24(a).

Proposition 7.10. Let $(z_n)_{n \in \mathbb{N}}$ be a sequence in $\mathbb{K}$.

(a) Limits are unique, that means if $z, w \in \mathbb{K}$ such that $\lim_{n \to \infty} z_n = z$ and $\lim_{n \to \infty} z_n =
w$, then $z = w$.

(b) If $(z_n)_{n \in \mathbb{N}}$ is convergent, then it is bounded.

Proof. (a): Exercise.

(b): If $\lim_{n \to \infty} z_n = z$, then $A := \{|z_n| : |z_n - z| \ge 1\} \cup \{|z_1|\}$ is nonempty and finite.
According to Th. 3.21(a), $A$ has an upper bound $M$. Then $\max\{M, |z| + 1\}$ is an upper
bound for $\{|z_n| : n \in \mathbb{N}\}$, and 0 is always a lower bound, showing that the sequence is
bounded. $\blacksquare$
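Proposition 7.11. Let $(z_n)_{n\in\mathbb{N}}$ be a sequence in $\mathbb{C}$ with $\lim_{n\to\infty} z_n = 0$.

(a) If $(b_n)_{n\in\mathbb{N}}$ is a sequence in $\mathbb{C}$ such that there exists $C \in \mathbb{R}^+$ with $|b_n| \le C|z_n|$ for almost all $n$, then $\lim_{n\to\infty} b_n = 0$.
(b) If $(c_n)_{n\in\mathbb{N}}$ is a bounded sequence in $\mathbb{C}$, then $\lim_{n\to\infty} (c_n z_n) = 0$.

Proof. (a): Given $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that $|z_n| < \epsilon/C$ and $|b_n| \le C|z_n|$ for each $n > N$. Then, for each $n > N$, $|b_n| \le C|z_n| < \epsilon$, proving $\lim_{n\to\infty} b_n = 0$.
(b): If $(c_n)_{n\in\mathbb{N}}$ is bounded, then there exists $C \in \mathbb{R}^+$ such that $|c_n| \le C$ for each $n \in \mathbb{N}$. Thus, $|c_n z_n| \le C|z_n|$ for each $n \in \mathbb{N}$, implying $\lim_{n\to\infty} (c_n z_n) = 0$ via (a). ∎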
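Example 7.12. The sequences $((-1)^n)_{n\in\mathbb{N}}$ and $(b)_{n\in\mathbb{N}}$ with $b \in \mathbb{C}$ are bounded. Since, for each $a \in \mathbb{C}$, $\lim_{n\to\infty} \frac{1}{n+a} = 0$ by Example 7.2(b), we obtain

$$\lim_{n\to\infty} \frac{(-1)^n}{n+a} = \lim_{n\to\infty} \frac{b}{n+a} = 0 \tag{7.10}$$

from Prop. 7.11(b).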



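Theorem 7.13. (a) Let $(z_n)_{n\in\mathbb{N}}$ and $(w_n)_{n\in\mathbb{N}}$ be sequences in $\mathbb{C}$. Moreover, let $z, w \in \mathbb{C}$ with $\lim_{n\to\infty} z_n = z$ and $\lim_{n\to\infty} w_n = w$. We have the following identities:

$$\lim_{n\to\infty} (\lambda z_n) = \lambda z \quad \text{for each } \lambda \in \mathbb{C}, \tag{7.11a}$$
$$\lim_{n\to\infty} (z_n + w_n) = z + w, \tag{7.11b}$$
$$\lim_{n\to\infty} (z_n w_n) = zw, \tag{7.11c}$$
$$\lim_{n\to\infty} z_n / w_n = z/w \quad \text{given all } w_n \neq 0 \text{ and } w \neq 0, \tag{7.11d}$$
$$\lim_{n\to\infty} |z_n| = |z|, \tag{7.11e}$$
$$\lim_{n\to\infty} \bar{z}_n = \bar{z}, \tag{7.11f}$$
$$\lim_{n\to\infty} z_n^p = z^p \quad \text{for each } p \in \mathbb{N}. \tag{7.11g}$$

(b) Let $(x_n)_{n\in\mathbb{N}}$ and $(y_n)_{n\in\mathbb{N}}$ be sequences in $\mathbb{R}$. Moreover, let $x, y \in \mathbb{R}$ with $\lim_{n\to\infty} x_n = x$ and $\lim_{n\to\infty} y_n = y$. Then

$$\lim_{n\to\infty} \max\{x_n, y_n\} = \max\{x, y\}, \tag{7.12a}$$
$$\lim_{n\to\infty} \min\{x_n, y_n\} = \min\{x, y\}. \tag{7.12b}$$

(c) If, in the situation of (b) (i.e. for real sequences), $x_n \le y_n$ holds for almost all $n \in \mathbb{N}$, then $x \le y$. In particular, if almost all $x_n \ge 0$, then $x \ge 0$.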

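Proof. We start with the identities of (a).

(7.11a): For $\lambda = 0$, there is nothing to prove. For $\lambda \neq 0$ and $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that, for each $n > N$, $|z_n - z| < \epsilon/|\lambda|$, implying

$$\forall_{n>N} \quad |\lambda z_n - \lambda z| = |\lambda|\,|z_n - z| < \epsilon. \tag{7.13a}$$

(7.11b): Given $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that, for each $n > N$, $|z_n - z| < \epsilon/2$ and $|w_n - w| < \epsilon/2$, implying

$$\forall_{n>N} \quad |z_n + w_n - (z + w)| \le |z_n - z| + |w_n - w| < \epsilon/2 + \epsilon/2 = \epsilon. \tag{7.13b}$$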

(7.11c): Let $M_1 := \max\{|z|, 1\}$. According to Prop. 7.10(b), there exists $M_2 \in \mathbb{R}^+$ such that $M_2$ is an upper bound for $\{|w_n| : n \in \mathbb{N}\}$. Moreover, given $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that, for each $n > N$, $|z_n - z| < \epsilon/(2M_2)$ and $|w_n - w| < \epsilon/(2M_1)$, implying

$$\forall_{n>N} \quad |z_n w_n - zw| = \big|(z_n - z)w_n + z(w_n - w)\big| \le |w_n|\,|z_n - z| + |z|\,|w_n - w| < \frac{M_2\,\epsilon}{2M_2} + \frac{M_1\,\epsilon}{2M_1} = \epsilon. \tag{7.13c}$$

(7.11d): We first consider the case where all $z_n = 1$. Given $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that, for each $n > N$, $|w_n - w| < \epsilon |w|^2/2$ and $|w_n - w| < |w|/2$ (since $w \neq 0$ for this case), implying $|w| \le |w - w_n| + |w_n| < |w|/2 + |w_n|$ and $|w_n| > |w|/2$. Thus,

$$\forall_{n>N} \quad \left|\frac{1}{w_n} - \frac{1}{w}\right| = \left|\frac{w_n - w}{w_n w}\right| \le \frac{2\,|w_n - w|}{|w|^2} < \frac{2}{|w|^2} \cdot \frac{\epsilon\,|w|^2}{2} = \epsilon. \tag{7.13d}$$

The general case now follows from (7.11c).


(7.11e): This is a consequence of the inverse triangle inequality (5.14): Given $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that, for each $n > N$, $|z_n - z| < \epsilon$, implying

$$\forall_{n>N} \quad \big||z_n| - |z|\big| \le |z_n - z| < \epsilon. \tag{7.13e}$$

(7.11f): Write $z_n = x_n + i y_n$ and $z = x + iy$ with $x_n, y_n, x, y \in \mathbb{R}$, $n \in \mathbb{N}$. Then we know $\lim_{n\to\infty} x_n = x$ and $\lim_{n\to\infty} y_n = y$ from (7.2), and

$$\lim_{n\to\infty} \bar{z}_n = \lim_{n\to\infty} (x_n - i y_n) \overset{(7.11a),(7.11b)}{=} x - iy = \bar{z}, \tag{7.13f}$$

which establishes the case.

(7.11g) follows by induction from (7.11c) (cf. (7.16b) below).
The proofs for the two identities of (b) are left as exercises.
(c): Proceeding by contraposition, assume $x > y$ and set $s := (x+y)/2$. Then $y < s < x$ and $y_n < s < x_n$ holds for almost all $n$, i.e. $x_n \le y_n$ does not hold for almost all $n$. ∎

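Example 7.14. (a) $\lim_{n\to\infty} \frac{n+a}{n+b} = 1$ for each $a, b \in \mathbb{C}$: Here $z_n := (n+a)/(n+b)$ (if $n = -b$, then set $z_n := w$ with $w \in \mathbb{C}$ arbitrary). Using (7.11b) and (7.11d), one obtains

$$\lim_{n\to\infty} \frac{n+a}{n+b} = \lim_{n\to\infty} \frac{1 + a/n}{1 + b/n} = \frac{\lim_{n\to\infty} 1 + \lim_{n\to\infty} \frac{a}{n}}{\lim_{n\to\infty} 1 + \lim_{n\to\infty} \frac{b}{n}} = \frac{1+0}{1+0} = 1. \tag{7.14}$$

(b) Using (7.11b), (7.11d), and (7.11g), one obtains

$$\lim_{n\to\infty} \frac{2n^5 - 3i\,n^3 + 2i}{3n^5 + 17n} = \lim_{n\to\infty} \frac{2 - 3i/n^2 + 2i/n^5}{3 + 17/n^4} = \frac{2+0+0}{3+0} = \frac{2}{3}. \tag{7.15}$$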
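The limit in (7.15) can be illustrated numerically. The following Python sketch evaluates the sequence for a few indices and records the distance to $2/3$; the function name `z` and the chosen indices are illustrative assumptions. The leading error term is $-i/n^2$, so the distance should shrink roughly by a factor of $100$ when $n$ grows by a factor of $10$.

```python
def z(n: int) -> complex:
    """n-th member of the sequence in (7.15)."""
    return (2 * n**5 - 3j * n**3 + 2j) / (3 * n**5 + 17 * n)


limit = 2 / 3
# Distance |z_n - 2/3| for a few sample indices (chosen for illustration).
errors = [abs(z(n) - limit) for n in (10, 100, 1000)]
print(errors)
```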
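Corollary 7.15. For $k \in \mathbb{N}$, let $(z_n^{(1)})_{n\in\mathbb{N}}, \dots, (z_n^{(k)})_{n\in\mathbb{N}}$ be sequences in $\mathbb{C}$. Moreover, let $z^{(1)}, \dots, z^{(k)} \in \mathbb{C}$ with $\lim_{n\to\infty} z_n^{(j)} = z^{(j)}$ for each $j \in \{1, \dots, k\}$. Then

$$\lim_{n\to\infty} \sum_{j=1}^{k} z_n^{(j)} = \sum_{j=1}^{k} z^{(j)}, \tag{7.16a}$$
$$\lim_{n\to\infty} \prod_{j=1}^{k} z_n^{(j)} = \prod_{j=1}^{k} z^{(j)}. \tag{7.16b}$$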

Proof. (7.16) follows by simple inductions from (7.11b) and (7.11c), respectively. 

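Theorem 7.16 (Sandwich Theorem). Let $(x_n)_{n\in\mathbb{N}}$, $(y_n)_{n\in\mathbb{N}}$, and $(a_n)_{n\in\mathbb{N}}$ be sequences in $\mathbb{R}$. If $x_n \le a_n \le y_n$ holds for almost all $n \in \mathbb{N}$, then

$$\lim_{n\to\infty} x_n = \lim_{n\to\infty} y_n = x \in \mathbb{R} \;\Rightarrow\; \lim_{n\to\infty} a_n = x. \tag{7.17}$$

Proof. Given $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that, for each $n > N$, $x_n \le a_n \le y_n$, $|x_n - x| < \epsilon$, and $|y_n - x| < \epsilon$, implying

$$\forall_{n>N} \quad x - \epsilon < x_n \le a_n \le y_n < x + \epsilon, \tag{7.18}$$

which establishes the case. ∎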


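Example 7.17. Since $0 < \frac{1}{n!} \le \frac{1}{n}$ holds for each $n \in \mathbb{N}$, and since both the constant sequence $(0)_{n\in\mathbb{N}}$ and $(\frac{1}{n})_{n\in\mathbb{N}}$ converge to $0$, the Sandwich Th. 7.16 implies

$$\lim_{n\to\infty} \frac{1}{n!} = 0. \tag{7.19}$$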
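The bounds used in Example 7.17 can be confirmed with exact rational arithmetic. The following Python sketch is only an illustration on finitely many indices; the helper name `sandwich_holds` is an assumption made for this example.

```python
from fractions import Fraction
from math import factorial


def sandwich_holds(n: int) -> bool:
    """Check 0 < 1/n! <= 1/n exactly for a single index n >= 1."""
    middle = Fraction(1, factorial(n))
    return Fraction(0) < middle <= Fraction(1, n)


# Exact check for the first few indices (finitely many only).
checked = all(sandwich_holds(n) for n in range(1, 50))
print(checked)
```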

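Definition 7.18. Let $(x_n)_{n\in\mathbb{N}}$ be a sequence in $\mathbb{R}$. The sequence is said to diverge to $\infty$ (resp. to $-\infty$), denoted $\lim_{n\to\infty} x_n = \infty$ (resp. $\lim_{n\to\infty} x_n = -\infty$) if, and only if, for each $K \in \mathbb{R}$, almost all $x_n$ are bigger (resp. smaller) than $K$. Thus,

$$\lim_{n\to\infty} x_n = \infty \;\Leftrightarrow\; \forall_{K\in\mathbb{R}}\; \exists_{N\in\mathbb{N}}\; \forall_{n>N}\; x_n > K, \tag{7.20a}$$
$$\lim_{n\to\infty} x_n = -\infty \;\Leftrightarrow\; \forall_{K\in\mathbb{R}}\; \exists_{N\in\mathbb{N}}\; \forall_{n>N}\; x_n < K. \tag{7.20b}$$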

Theorem 7.19. Suppose $S := (x_n)_{n\in\mathbb{N}}$ is a monotone sequence in $\mathbb{R}$ (increasing or decreasing). Defining $A := \{x_n : n \in \mathbb{N}\}$, the following holds:

$$\lim_{n\to\infty} x_n = \begin{cases} \sup A & \text{if $S$ is increasing and bounded},\\ \infty & \text{if $S$ is increasing and not bounded},\\ \inf A & \text{if $S$ is decreasing and bounded},\\ -\infty & \text{if $S$ is decreasing and not bounded}. \end{cases} \tag{7.21}$$

Proof. We treat the increasing case; the decreasing case is proved completely analogously. If $A$ is bounded and $\epsilon > 0$, let $K := \sup A - \epsilon$; if $A$ is unbounded, then let $K \in \mathbb{R}$ be arbitrary. In both cases, since $K$ cannot be an upper bound, there exists $N \in \mathbb{N}$ such that $x_N > K$. Since the sequence is increasing, for each $n > N$, $x_N \le x_n$, showing $|\sup A - x_n| < \epsilon$ in the bounded case, and $x_n > K$ in the unbounded case. ∎

Example 7.20. Theorem 7.19 implies

$$\forall_{k\in\mathbb{N}} \quad \Big(\lim_{n\to\infty} n^k = \infty, \quad \lim_{n\to\infty} (-n^k) = -\infty\Big). \tag{7.22}$$



It is sometimes necessary to consider so-called subsequences and reorderings of a given


sequence. Here, we are interested in sequences in R or C, but for subsequences and
reorderings it is irrelevant in which set A the sequence takes its values. As it presents
virtually no extra difficulty to introduce the notions for general sequences, and since we
will need to consider sequences with values in sets other than R or C in Calculus II, we
admit general sequences in the following definition.

Definition 7.21. Let $A$ be an arbitrary nonempty set. Consider a sequence $\sigma : \mathbb{N} \to A$. Given a function $\phi : \mathbb{N} \to \mathbb{N}$ (that means $(\phi(n))_{n\in\mathbb{N}}$ constitutes a sequence of indices), the new sequence $(\sigma \circ \phi) : \mathbb{N} \to A$ is called a subsequence of $\sigma$ if, and only if, $\phi$ is strictly increasing (i.e. $1 \le \phi(1) < \phi(2) < \dots$). Moreover, $\sigma \circ \phi$ is called a reordering of $\sigma$ if, and only if, $\phi$ is bijective. One can write $\sigma$ in the form $(z_n)_{n\in\mathbb{N}}$ by setting $z_n := \sigma(n)$, and one can write $\sigma \circ \phi$ in the form $(w_n)_{n\in\mathbb{N}}$ by setting $w_n := (\sigma \circ \phi)(n) = z_{\phi(n)}$. Especially for a subsequence of $(z_n)_{n\in\mathbb{N}}$, it is also common to write $(z_{n_k})_{k\in\mathbb{N}}$. This notation corresponds to the one above if one lets $n_k := \phi(k)$. Analogous definitions work if the index set $\mathbb{N}$ of $\sigma$ is replaced by a general countable nonempty index set $I$.

Example 7.22. Consider the sequence $(1, 2, 3, \dots)$. Then $(2, 4, 6, \dots)$ constitutes a subsequence and $(2, 1, 4, 3, 6, 5, \dots)$ constitutes a reordering. Using the notation of Def. 7.21, the original sequence is given by $\sigma : \mathbb{N} \to \mathbb{N}$, $\sigma(n) := n$; the subsequence is selected via $\phi_1 : \mathbb{N} \to \mathbb{N}$, $\phi_1(n) := 2n$; and the reordering is accomplished via

$$\phi_2 : \mathbb{N} \to \mathbb{N}, \quad \phi_2(n) := \begin{cases} n + 1 & \text{if $n$ is odd},\\ n - 1 & \text{if $n$ is even}. \end{cases}$$
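The index maps of Example 7.22 translate directly into code. The following Python sketch composes $\sigma$ with $\phi_1$ and $\phi_2$ on the first few indices and checks the two defining properties from Def. 7.21 on that finite range. The names `sigma`, `phi1`, `phi2` mirror the notation of the example; the cutoff at index 8 is an arbitrary assumption.

```python
def sigma(n: int) -> int:
    """Original sequence sigma(n) := n."""
    return n


def phi1(n: int) -> int:
    """Strictly increasing index map selecting the subsequence."""
    return 2 * n


def phi2(n: int) -> int:
    """Bijective index map swapping each odd index with its successor."""
    return n + 1 if n % 2 == 1 else n - 1


indices = range(1, 9)
subsequence = [sigma(phi1(n)) for n in indices]
reordering = [sigma(phi2(n)) for n in indices]

# phi1 is strictly increasing on the tested range.
assert all(phi1(n) < phi1(n + 1) for n in indices)
# phi2 maps {1, ..., 8} bijectively onto itself.
assert sorted(phi2(n) for n in indices) == list(indices)
print(subsequence, reordering)
```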

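Proposition 7.23. Let $(z_n)_{n\in\mathbb{N}}$ be a sequence in $\mathbb{C}$. If $\lim_{n\to\infty} z_n = z$, then every subsequence and every reordering of $(z_n)_{n\in\mathbb{N}}$ is also convergent with limit $z$.

Proof. Let $(w_n)_{n\in\mathbb{N}}$ be a subsequence of $(z_n)_{n\in\mathbb{N}}$, i.e. there is a strictly increasing function $\phi : \mathbb{N} \to \mathbb{N}$ such that $w_n = z_{\phi(n)}$. If $\lim_{n\to\infty} z_n = z$, then, given $\epsilon > 0$, there is $N \in \mathbb{N}$ such that $z_n \in B_\epsilon(z)$ for each $n > N$. For $\tilde{N}$ choose any number from $\mathbb{N}$ that is $\ge N$ and in $\phi(\mathbb{N})$. Take $M := \phi^{-1}(\tilde{N})$ (where $\phi^{-1} : \phi(\mathbb{N}) \to \mathbb{N}$). Then, for each $n > M$, one has $\phi(n) > \tilde{N} \ge N$, and, thus, $w_n = z_{\phi(n)} \in B_\epsilon(z)$, showing $\lim_{n\to\infty} w_n = z$.
Let $(w_n)_{n\in\mathbb{N}}$ be a reordering of $(z_n)_{n\in\mathbb{N}}$, i.e. there is a bijective function $\phi : \mathbb{N} \to \mathbb{N}$ such that $w_n = z_{\phi(n)}$. Let $\epsilon$ and $N$ be as before. Define

$$M := \max\{\phi^{-1}(n) : n \le N\}. \tag{7.23}$$

As $\phi$ is bijective, we have $\phi(n) > N$ for each $n > M$. Then, for each $n > M$, one has $w_n = z_{\phi(n)} \in B_\epsilon(z)$, showing $\lim_{n\to\infty} w_n = z$. ∎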

Definition 7.24. Let $(z_n)_{n\in\mathbb{N}}$ be a sequence in $\mathbb{K}$. A point $z \in \mathbb{K}$ is called a cluster point or an accumulation point of the sequence if, and only if, for each $\epsilon > 0$, $B_\epsilon(z)$ contains infinitely many members of the sequence (i.e. $\#\{n \in \mathbb{N} : z_n \in B_\epsilon(z)\} = \infty$).

Example 7.25. The sequence $((-1)^n)_{n\in\mathbb{N}}$ has cluster points $1$ and $-1$.

Proposition 7.26. A point $z \in \mathbb{K}$ is a cluster point of the sequence $(z_n)_{n\in\mathbb{N}}$ in $\mathbb{K}$ if, and only if, the sequence has a subsequence converging to $z$.

Proof. If $(w_n)_{n\in\mathbb{N}}$ is a subsequence of $(z_n)_{n\in\mathbb{N}}$ with $\lim_{n\to\infty} w_n = z$, then every $B_\epsilon(z)$, $\epsilon > 0$, contains infinitely many $w_n$, i.e. infinitely many $z_n$, i.e. $z$ is a cluster point of $(z_n)_{n\in\mathbb{N}}$. Conversely, if $z$ is a cluster point of $(z_n)_{n\in\mathbb{N}}$, then, inductively, define $\phi : \mathbb{N} \to \mathbb{N}$ as follows: For $\phi(1)$, choose the index $k$ of any point $z_k$ in $B_1(z)$ (such a point exists, since $z$ is a cluster point of the sequence). Now assume that $n > 1$ and that $\phi(m)$ has already been defined for each $m < n$. Let $M := \max\{\phi(m) : m < n\}$. Since $B_{1/n}(z)$ contains infinitely many $z_k$, there must be some $z_k \in B_{1/n}(z)$ such that $k > M$. Choose this $k$ as $\phi(n)$. Thus, by construction, $\phi$ is strictly increasing, i.e. $(w_n)_{n\in\mathbb{N}}$ with $w_n := z_{\phi(n)}$ is a subsequence of $(z_n)_{n\in\mathbb{N}}$. Moreover, for each $\epsilon > 0$, there is $N \in \mathbb{N}$ such that $1/N < \epsilon$. Then, for each $n > N$, $w_n \in B_{1/n}(z) \subseteq B_{1/N}(z) \subseteq B_\epsilon(z)$, showing $\lim_{n\to\infty} w_n = z$. ∎

Theorem 7.27 (Bolzano-Weierstrass). Every bounded sequence $S := (x_n)_{n\in\mathbb{N}}$ in $\mathbb{K}$ has at least one cluster point in $\mathbb{K}$. Moreover, for $\mathbb{K} = \mathbb{R}$, the set $A := \{x \in \mathbb{R} : x \text{ is cluster point of } S\}$ has a max $x^* \in \mathbb{R}$ and a min $x_* \in \mathbb{R}$, i.e. every bounded sequence in $\mathbb{R}$ has a largest and a smallest cluster point. In addition, for each $\epsilon > 0$, the inequality $x_* - \epsilon < x_n < x^* + \epsilon$ holds for almost all $n$.

Proof. We first consider the case $\mathbb{K} = \mathbb{R}$. Define

$$A^* := \{x \in \mathbb{R} : x_n \le x \text{ for almost all } n\}, \tag{7.24a}$$
$$A_* := \{x \in \mathbb{R} : x_n \ge x \text{ for almost all } n\}. \tag{7.24b}$$

We claim $A^* \neq \emptyset$ is bounded from below and $x^* = \max A = \inf A^*$; $A_* \neq \emptyset$ is bounded from above and $x_* = \min A = \sup A_*$. We prove the claim for $A^*$; the proof for $A_*$ is conducted completely analogously. Let $m, M \in \mathbb{R}$ be a lower and an upper bound for $S$, respectively. Then $M \in A^*$, showing $A^* \neq \emptyset$; and $m$ is a lower bound for $A^*$. Since $A^*$ is bounded from below, $a := \inf A^* \in \mathbb{R}$ by the completeness of $\mathbb{R}$. Moreover, for each $\epsilon > 0$, $a - \epsilon \notin A^*$, as $a$ is a lower bound for $A^*$, i.e. $x_n > a - \epsilon$ holds for infinitely many $n \in \mathbb{N}$. On the other hand, $a + \epsilon/2 \in A^*$ follows from $a$ being the largest lower bound of $A^*$, i.e. $x_n > a + \epsilon/2$ holds for only finitely many $n$ (if any). In particular, we have shown $x_n < a + \epsilon$ holds for almost all $n$, and $a - \epsilon < x_n < a + \epsilon$ must hold for infinitely many $n$, showing $a$ is a cluster point of $S$. To see that $a$ is the largest cluster point of $S$ (i.e. $a = \max A$), we have to show that $x > a$ implies $x$ is not a cluster point of $S$. However, letting $\epsilon := x - a > 0$, we had seen above that $x_n > a + \epsilon/2$ holds for only finitely many $n$, i.e. $B_{\epsilon/2}(x)$ contains only finitely many $x_n$, showing $x$ is not a cluster point of $S$.
It now remains to consider the complex case, i.e. a bounded sequence $S := (z_n)_{n\in\mathbb{N}}$ in $\mathbb{C}$. For each $n \in \mathbb{N}$, let $z_n = x_n + i y_n$ with $x_n, y_n \in \mathbb{R}$. Due to Th. 5.11(d), we have $|x_n| \le |z_n|$ and $|y_n| \le |z_n|$, i.e. the boundedness of $S$ implies the boundedness of both $(x_n)_{n\in\mathbb{N}}$ and $(y_n)_{n\in\mathbb{N}}$. Then we know that $(x_n)_{n\in\mathbb{N}}$ has a cluster point $x$ and, by Prop. 7.26, $S$ has a subsequence $(z_{n_j})_{j\in\mathbb{N}}$ such that $x = \lim_{j\to\infty} x_{n_j}$. As the subsequence $(y_{n_j})_{j\in\mathbb{N}}$ is still bounded, it must have a cluster point $y$ and a subsequence $(y_{n_{j_k}})_{k\in\mathbb{N}}$ such that $y = \lim_{k\to\infty} y_{n_{j_k}}$. Since $x = \lim_{k\to\infty} x_{n_{j_k}}$ as well, we now have $\lim_{k\to\infty} z_{n_{j_k}} = x + iy =: z$, i.e. $S$ has a subsequence converging to $z$. According to Prop. 7.26, $z$ is a cluster point of $S$. ∎

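Definition 7.28. A sequence $(z_n)_{n\in\mathbb{N}}$ in $\mathbb{C}$ is defined to be a Cauchy sequence if, and only if, for each $\epsilon \in \mathbb{R}^+$, there exists $N \in \mathbb{N}$ such that $|z_n - z_m| < \epsilon$ for each $n, m > N$, i.e.

$$(z_n)_{n\in\mathbb{N}} \text{ Cauchy} \;\Leftrightarrow\; \forall_{\epsilon\in\mathbb{R}^+}\; \exists_{N\in\mathbb{N}}\; \forall_{n,m>N}\; |z_n - z_m| < \epsilon. \tag{7.25}$$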

Theorem 7.29. The sequence $(z_n)_{n\in\mathbb{N}}$ in $\mathbb{C}$ is convergent if, and only if, it is a Cauchy sequence.

Proof. Suppose the sequence is convergent with $\lim_{n\to\infty} z_n = z$. Then, given $\epsilon > 0$, there is $N \in \mathbb{N}$ such that $z_n \in B_{\epsilon/2}(z)$ for each $n > N$. If $n, m > N$, then $|z_n - z_m| \le |z_n - z| + |z - z_m| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$, establishing that $(z_n)_{n\in\mathbb{N}}$ is a Cauchy sequence.
Conversely, suppose the sequence is a Cauchy sequence. Using similar reasoning as in the proof of Prop. 7.10(b), we first show the sequence is bounded. If the sequence is Cauchy, then there exists $N \in \mathbb{N}$ such that $|z_n - z_m| < 1$ for all $n, m > N$. Thus, the set $A := \{|z_n| : |z_n - z_{N+1}| \ge 1\} \cup \{|z_1|\} \subseteq \mathbb{R}_0^+$ is nonempty and finite. According to Th. 3.21(a), $A$ has an upper bound $M$. Then $\max\{M, |z_{N+1}| + 1\}$ is an upper bound for $\{|z_n| : n \in \mathbb{N}\}$, showing that the sequence is bounded. From Th. 7.27, we obtain that the sequence has a cluster point $z$. It remains to show $\lim_{n\to\infty} z_n = z$. Given $\epsilon > 0$, choose $N \in \mathbb{N}$ such that $|z_n - z_m| < \epsilon/2$ for all $n, m > N$. Since $z$ is a cluster point, there exists $k > N$ such that $|z_k - z| < \epsilon/2$. Thus,

$$\forall_{n>N} \quad |z_n - z| \le |z_n - z_k| + |z_k - z| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon, \tag{7.26}$$

proving $\lim_{n\to\infty} z_n = z$. ∎

Example 7.30. Consider the sequence $S := (s_n)_{n\in\mathbb{N}}$ defined by

$$s_n := \sum_{k=1}^{n} \frac{1}{k} = 1 + \frac{1}{2} + \dots + \frac{1}{n}. \tag{7.27}$$

We claim $S$ is not a Cauchy sequence and, thus, not convergent by Th. 7.29: For each $N \in \mathbb{N}$, we find $n, m > N$ such that $s_n - s_m > 1/2$, namely $m = N + 1$ and $n = 2(N + 1)$:

$$s_{2(N+1)} - s_{N+1} = \sum_{k=N+2}^{2(N+1)} \frac{1}{k} = \frac{1}{N+2} + \frac{1}{N+3} + \dots + \frac{1}{2(N+1)} > (N+1) \cdot \frac{1}{2(N+1)} = \frac{1}{2}. \tag{7.28}$$

While we have just seen that $S$ is not convergent, it is clearly increasing, i.e. Th. 7.19 implies $S$ is unbounded and $\lim_{n\to\infty} s_n = \infty$. Sequences defined by longer and longer sums are known as series and will be studied further in Sec. 7.3 below. The series of the present example is known as the harmonic series. It has become famous as the simplest example of a series that does not converge even though its summands converge to 0. In terms of the notation introduced in Sec. 7.3 below, we have shown

$$\sum_{k=1}^{\infty} \frac{1}{k} = 1 + \frac{1}{2} + \frac{1}{3} + \dots = \infty. \tag{7.29}$$
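The estimate (7.28) can be checked with exact rational arithmetic. The following Python sketch computes the partial sums of (7.27) as fractions and confirms that the gap $s_{2(N+1)} - s_{N+1}$ exceeds $1/2$ for the first few $N$. The function names `s` and `cauchy_gap` and the tested range are illustrative assumptions; the check covers only finitely many $N$.

```python
from fractions import Fraction


def s(n: int) -> Fraction:
    """Partial sum s_n = 1 + 1/2 + ... + 1/n of (7.27), computed exactly."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def cauchy_gap(N: int) -> Fraction:
    """The difference s_{2(N+1)} - s_{N+1} estimated in (7.28)."""
    return s(2 * (N + 1)) - s(N + 1)


# The gap stays above 1/2, so the Cauchy condition fails for eps = 1/2.
assert all(cauchy_gap(N) > Fraction(1, 2) for N in range(1, 40))
print(cauchy_gap(1))
```

For $N = 1$, the gap is $\frac{1}{3} + \frac{1}{4} = \frac{7}{12}$.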

7.2 Continuity
7.2.1 Definitions and First Examples

Roughly, a function is continuous if a small change in its input results in a small change of its output. For functions defined on an interval, the notion of continuity makes precise the idea of a function having no jump (no discontinuity) at some point $x$ in its domain. For example, we would say the sign function of (5.9) has precisely one jump (one discontinuity) at $x = 0$, whereas quadratic functions (or, more generally, polynomials) do not have any jumps, i.e. they are continuous.
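Definition 7.31. Let $M \subseteq \mathbb{C}$. If $\xi \in M$, then a function $f : M \to \mathbb{K}$ is said to be continuous in $\xi$ if, and only if, for each $\epsilon > 0$, there is $\delta > 0$ such that the distance between the values $f(z)$ and $f(\xi)$ is less than $\epsilon$, provided the distance between $z$ and $\xi$ is less than $\delta$, i.e. if, and only if,

$$\forall_{\epsilon\in\mathbb{R}^+}\; \exists_{\delta\in\mathbb{R}^+}\; \forall_{z\in M} \quad \Big(|z - \xi| < \delta \;\Rightarrow\; |f(z) - f(\xi)| < \epsilon\Big). \tag{7.30}$$

Moreover, $f$ is called continuous if, and only if, $f$ is continuous in every $\xi \in M$. The set of all continuous functions $f : M \to \mathbb{K}$ is denoted by $C(M, \mathbb{K})$, $C(M) := C(M, \mathbb{R})$.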
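Example 7.32. (a) Every constant map $f : M \to \mathbb{K}$, $\emptyset \neq M \subseteq \mathbb{C}$, is continuous: In this case, given $\epsilon$, we can choose any $\delta > 0$ we want, say $\delta := 42$: If $\xi, z \in M$, then $|f(\xi) - f(z)| = 0 < \epsilon$, which holds independently of $\delta$, in particular, if $|\xi - z| < \delta$.
(b) Every affine function $f : \mathbb{K} \to \mathbb{K}$, $f(z) := az + b$, is continuous: For $a = 0$, this follows from (a). For $a \neq 0$, given $\epsilon > 0$, choose $\delta := \epsilon/|a|$. Then,

$$\forall_{\xi,z\in\mathbb{K}} \quad |z - \xi| < \delta = \frac{\epsilon}{|a|} \;\Rightarrow\; |f(z) - f(\xi)| = |az + b - a\xi - b| = |a|\,|z - \xi| < |a|\,\frac{\epsilon}{|a|} = \epsilon. \tag{7.31}$$

(c) The sign function of (5.9) is not continuous: It is continuous in each $\xi \in \mathbb{R} \setminus \{0\}$, but not continuous in $0$: If $\xi \neq 0$, then, given $\epsilon > 0$, choose $\delta := |\xi|$. If $|x - \xi| < \delta$, then $\operatorname{sgn}(x) = \operatorname{sgn}(\xi)$, i.e. $|\operatorname{sgn}(x) - \operatorname{sgn}(\xi)| = 0 < \epsilon$, proving continuity in $\xi$. However, at $0$, for $\epsilon := 1/2$, we have

$$\forall_{\delta>0} \quad |\operatorname{sgn}(0) - \operatorname{sgn}(\delta/2)| = |0 - 1| = 1 > \frac{1}{2} = \epsilon, \tag{7.32}$$

showing sgn is not continuous in $0$.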
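The choices of $\delta$ in Example 7.32(b),(c) can be tried out numerically. The following Python sketch samples points $z$ with $|z - \xi| < \delta = \epsilon/|a|$ for an affine map and checks $|f(z) - f(\xi)| < \epsilon$; it then evaluates the witness $x = \delta/2$ from (7.32) for the sign function. The concrete values of `a`, `b`, `xi`, `eps`, the safety factor `0.999` (used to stay clear of floating-point rounding at the boundary), and the helper `sgn` are assumptions made for illustration.

```python
import cmath
import random

a, b = 3 - 4j, 1 + 2j          # sample coefficients with |a| = 5


def f(z: complex) -> complex:
    """Affine map f(z) := a*z + b."""
    return a * z + b


def sgn(x: float) -> int:
    """Sign function: 1 for x > 0, 0 for x = 0, -1 for x < 0."""
    return (x > 0) - (x < 0)


eps = 1e-2
delta = eps / abs(a)           # the choice delta := eps/|a| from (b)
xi = 0.5 - 1j

random.seed(0)
for _ in range(1000):
    # Random point z with |z - xi| < delta.
    r = 0.999 * delta * random.random()
    z = xi + cmath.rect(r, random.uniform(0.0, 2.0 * cmath.pi))
    assert abs(f(z) - f(xi)) < eps

# For eps = 1/2, every delta > 0 admits x = delta/2 with |sgn(0) - sgn(x)| = 1.
jumps = [abs(sgn(0) - sgn(d / 2)) for d in (1.0, 1e-3, 1e-9)]
print(delta, jumps)
```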

Some subtleties arise from the possibility that f can be defined on subsets of C with
very different properties. The notions introduced in Def. 7.33 help to deal with these
subtleties.
Definition 7.33. Let $M \subseteq \mathbb{C}$.

(a) The point $z \in \mathbb{C}$ is called a cluster point or accumulation point of $M$ if, and only if, each $\epsilon$-neighborhood of $z$, $\epsilon \in \mathbb{R}^+$, contains infinitely many points of $M$, i.e. if, and only if,

$$\forall_{\epsilon\in\mathbb{R}^+} \quad \#\big(M \cap B_\epsilon(z)\big) = \infty. \tag{7.33}$$

Note: A cluster point of $M$ is not necessarily in $M$.

(b) The point $z$ is called an isolated point of $M$ if, and only if, there is $\epsilon \in \mathbb{R}^+$ such that $B_\epsilon(z) \cap M = \{z\}$. Note: An isolated point of $M$ is always in $M$.
Proposition 7.34. If $M \subseteq \mathbb{C}$, then each point of $M$ is either a cluster point or an isolated point of $M$, i.e.

$$M = \{z \in M : z \text{ cluster point of } M\} \;\dot\cup\; \{z \in M : z \text{ isolated point of } M\}. \tag{7.34}$$

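Proof. Consider $z \in M$ that is not a cluster point of $M$. We have to show that $z$ is an isolated point of $M$. Since $z$ is not a cluster point of $M$, there exists $\tilde\epsilon > 0$ such that $A := (M \cap B_{\tilde\epsilon}(z)) \setminus \{z\}$ is finite. Define

$$\epsilon := \begin{cases} \min\{|a - z| : a \in A\} & \text{if } A \neq \emptyset,\\ \tilde\epsilon & \text{if } A = \emptyset. \end{cases} \tag{7.35}$$

Then $B_\epsilon(z) \cap M = \{z\}$, showing $z$ is an isolated point of $M$. Finally, the union in (7.34) is clearly disjoint. ∎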
Lemma 7.35. Let $M \subseteq \mathbb{C}$, $f : M \to \mathbb{K}$. If $\xi$ is an isolated point of $M$, then $f$ is always continuous in $\xi$.

Proof. Independently of the concrete definition of $f$, we know there is $\delta > 0$ such that $B_\delta(\xi) \cap M = \{\xi\}$. In other words, if $z \in M$ with $|z - \xi| < \delta$, then $z = \xi$, implying $|f(z) - f(\xi)| = 0 < \epsilon$ for each $\epsilon > 0$, showing $f$ to be continuous in $\xi$. ∎
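Example 7.36. (a) The sign function restricted to the set $M := \,]-\infty, -1] \cup \{0\} \cup [1, \infty[$, i.e.

$$\operatorname{sgn}(x) = \begin{cases} 1 & \text{for } x \in [1, \infty[,\\ 0 & \text{for } x = 0,\\ -1 & \text{for } x \in \,]-\infty, -1] \end{cases}$$

is continuous: As in Ex. 7.32(c), one sees that sgn is continuous in each $\xi \in M \setminus \{0\}$. However, now it is also continuous in $0$, since $0$ is an isolated point of $M$.
(b) Every function $f : \mathbb{N} \to \mathbb{K}$ is continuous, since every $n \in \mathbb{N}$ is an isolated point of $\mathbb{N}$ (due to $\{n\} = \mathbb{N} \cap B_{1/2}(n)$).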

7.2.2 Continuity, Sequences, and Function Arithmetic

To make available the power of the results on convergent sequences from Sec. 7.1 to
investigations regarding the continuity of functions, we need to understand the relation-
ship between both notions. The core of this relationship is the contents of the following
Th. 7.37, which provides a criterion allowing one to test continuity in terms of convergent
sequences:

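Theorem 7.37. Let $M \subseteq \mathbb{C}$, $f : M \to \mathbb{K}$. If $\xi \in M$, then $f$ is continuous in $\xi$ if, and only if, for each sequence $(z_n)_{n\in\mathbb{N}}$ in $M$ with $\lim_{n\to\infty} z_n = \xi$, the sequence $(f(z_n))_{n\in\mathbb{N}}$ converges to $f(\xi)$, i.e.

$$\lim_{n\to\infty} z_n = \xi \;\Rightarrow\; \lim_{n\to\infty} f(z_n) = f(\xi). \tag{7.36}$$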

Proof. If $\xi \in M$ is an isolated point of $M$, then there is $\delta > 0$ such that $M \cap B_\delta(\xi) = \{\xi\}$. Then every $f : M \to \mathbb{K}$ is continuous in $\xi$ according to Lem. 7.35. On the other hand, every sequence in $M$ converging to $\xi$ must be finally constant and equal to $\xi$, i.e. (7.36) is trivially valid at $\xi$. Thus, the assertion of the theorem holds if $\xi \in M$ is an isolated point of $M$.
If $\xi \in M$ is not an isolated point of $M$, then $\xi$ is a cluster point of $M$ according to Prop. 7.34. So, for the remainder of the proof, let $\xi \in M$ be a cluster point of $M$. Assume that $f$ is continuous in $\xi$ and $(z_n)_{n\in\mathbb{N}}$ is a sequence in $M$ with $\lim_{n\to\infty} z_n = \xi$. For each $\epsilon > 0$, there is $\delta > 0$ such that $z \in M$ and $|z - \xi| < \delta$ implies $|f(z) - f(\xi)| < \epsilon$. Since $\lim_{n\to\infty} z_n = \xi$, there is also $N \in \mathbb{N}$ such that, for each $n > N$, $|z_n - \xi| < \delta$. Thus, for each $n > N$, $|f(z_n) - f(\xi)| < \epsilon$, proving $\lim_{n\to\infty} f(z_n) = f(\xi)$. Conversely, assume that $f$ is not continuous in $\xi$. We have to construct a sequence $(z_n)_{n\in\mathbb{N}}$ in $M$ with $\lim_{n\to\infty} z_n = \xi$, but such that $(f(z_n))_{n\in\mathbb{N}}$ does not converge to $f(\xi)$. Since $f$ is not continuous in $\xi$, there must be some $\epsilon_0 > 0$ such that, for each $1/n$, $n \in \mathbb{N}$, there is at least one $z_n \in M$ satisfying $|z_n - \xi| < 1/n$ and $|f(z_n) - f(\xi)| \ge \epsilon_0$. Then $(z_n)_{n\in\mathbb{N}}$ is a sequence in $M$ with $\lim_{n\to\infty} z_n = \xi$ and $(f(z_n))_{n\in\mathbb{N}}$ does not converge to $f(\xi)$. ∎

We can now apply the rules of Th. 7.13 to see that all the arithmetic operations defined
in Not. 6.2 preserve continuity:

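Theorem 7.38. Let $M \subseteq \mathbb{C}$, $f, g : M \to \mathbb{K}$, $\lambda \in \mathbb{K}$, $\xi \in M$. If $f, g$ are both continuous in $\xi$, then $\lambda f$, $f + g$, $fg$, $f/g$ for $g(\xi) \neq 0$, $|f|$, $\operatorname{Re} f$, and $\operatorname{Im} f$ are all continuous in $\xi$. If $\mathbb{K} = \mathbb{R}$, then $\max(f, g)$, $\min(f, g)$, $f^+$ and $f^-$ are also all continuous in $\xi$.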

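Proof. Let $(z_n)_{n\in\mathbb{N}}$ be a sequence in $M$ such that $\lim_{n\to\infty} z_n = \xi$. Then the continuity of $f$ and $g$ in $\xi$ yields $\lim_{n\to\infty} f(z_n) = f(\xi)$ and $\lim_{n\to\infty} g(z_n) = g(\xi)$. Then

$$\begin{aligned}
(7.11\text{a}) &\;\Rightarrow\; \lim_{n\to\infty} (\lambda f)(z_n) = (\lambda f)(\xi),\\
(7.11\text{b}) &\;\Rightarrow\; \lim_{n\to\infty} (f + g)(z_n) = (f + g)(\xi),\\
(7.11\text{c}) &\;\Rightarrow\; \lim_{n\to\infty} (fg)(z_n) = (fg)(\xi),\\
(7.11\text{d}) &\;\Rightarrow\; \lim_{n\to\infty} (f/g)(z_n) = (f/g)(\xi) \quad \text{for } g(\xi) \neq 0,\\
(7.11\text{e}) &\;\Rightarrow\; \lim_{n\to\infty} |f|(z_n) = |f|(\xi),\\
(7.2) &\;\Rightarrow\; \lim_{n\to\infty} (\operatorname{Re} f)(z_n) = (\operatorname{Re} f)(\xi),\\
(7.2) &\;\Rightarrow\; \lim_{n\to\infty} (\operatorname{Im} f)(z_n) = (\operatorname{Im} f)(\xi).
\end{aligned}$$

For the fourth case, i.e. for $f/g$, one might need to discard some initial part of the sequence $((f/g)(z_n))_{n\in\mathbb{N}}$ to make sure that all the $g(z_n) \neq 0$. If $f, g$ are both $\mathbb{R}$-valued, then we also have

$$\begin{aligned}
(7.12\text{a}) &\;\Rightarrow\; \lim_{n\to\infty} \max(f, g)(z_n) = \max(f, g)(\xi),\\
(7.12\text{b}) &\;\Rightarrow\; \lim_{n\to\infty} \min(f, g)(z_n) = \min(f, g)(\xi),
\end{aligned}$$

and, finally, the continuity of $f^+$ and $f^-$ follows from the continuity of $\max(f, g)$. ∎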

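Corollary 7.39. A function $f : M \to \mathbb{C}$, $M \subseteq \mathbb{C}$, is continuous in $\xi \in M$ if, and only if, both $\operatorname{Re} f$ and $\operatorname{Im} f$ are continuous in $\xi$.

Proof. If $f$ is continuous in $\xi$, then $\operatorname{Re} f$ and $\operatorname{Im} f$ are both continuous in $\xi$ by Th. 7.38. If $\operatorname{Re} f$ and $\operatorname{Im} f$ are both continuous in $\xi$, then, as

$$f = \operatorname{Re} f + i \operatorname{Im} f, \tag{7.37}$$

$f$ is continuous in $\xi$, once again, by Th. 7.38. ∎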

Example 7.40. (a) The continuity of the absolute value function $z \mapsto |z|$ on $\mathbb{K}$ can be concluded directly from (7.11e) and, alternatively, from combining the continuity of $f : \mathbb{K} \to \mathbb{K}$, $f(z) = z$, according to Ex. 7.32(b), with the continuity of $|f|$ according to Th. 7.38.

(b) Every polynomial $P : \mathbb{K} \to \mathbb{K}$, $P(x) = \sum_{j=0}^{n} a_j x^j$, $a_j \in \mathbb{K}$, is continuous: First note that every monomial $x \mapsto x^j$ is continuous on $\mathbb{K}$ by (7.11g). Then Th. 7.38 implies the continuity of $x \mapsto a_j x^j$ on $\mathbb{K}$. Now the continuity of $P$ follows from (7.16a) or, alternatively, by an induction from the $f + g$ part of Th. 7.38.

(c) Let $P, Q : \mathbb{K} \to \mathbb{K}$ be polynomials and let $A := Q^{-1}\{0\}$ be the set of all zeros of $Q$ (if any). Then the rational function $(P/Q) : \mathbb{K} \setminus A \to \mathbb{K}$ is continuous as a consequence of (b) plus the $f/g$ part of Th. 7.38.

Theorem 7.41. Let $D_f, D_g \subseteq \mathbb{C}$, $f : D_f \to \mathbb{C}$, $g : D_g \to \mathbb{K}$, $f(D_f) \subseteq D_g$. If $f$ is continuous in $\xi \in D_f$ and $g$ is continuous in $f(\xi) \in D_g$, then $g \circ f : D_f \to \mathbb{K}$ is continuous in $\xi$. In consequence, if $f$ and $g$ are both continuous, then the composition $g \circ f$ is also continuous.

Proof. Let $\xi \in D_f$ and assume $f$ is continuous in $\xi$ and $g$ is continuous in $f(\xi)$. If $(z_n)_{n\in\mathbb{N}}$ is a sequence in $D_f$ such that $\lim_{n\to\infty} z_n = \xi$, then the continuity of $f$ in $\xi$ implies that $\lim_{n\to\infty} f(z_n) = f(\xi)$. Then the continuity of $g$ in $f(\xi)$ implies $\lim_{n\to\infty} g(f(z_n)) = g(f(\xi))$, thereby establishing the continuity of $g \circ f$ in $\xi$. ∎

7.2.3 Bounded, Closed, and Compact Sets

Subsets A of C (and even subsets of R) can be extremely complicated. If the set A has
one or more of the benign properties defined in the following, then this can often be
exploited in some useful way (we will see an important example in Th. 7.54 below).

Definition 7.42. Consider $A \subseteq \mathbb{C}$.

(a) $A$ is called bounded if, and only if, $A = \emptyset$ or the set $\{|z| : z \in A\}$ is bounded in $\mathbb{R}$ in the sense of Def. 2.24(a), i.e. if, and only if,

$$\exists_{M\in\mathbb{R}^+} \quad A \subseteq B_M(0).$$

(b) $A$ is called closed if, and only if, every sequence in $A$ that converges in $\mathbb{C}$ has its limit in $A$ (note that $\emptyset$ is, thus, closed).

(c) $A$ is called compact if, and only if, $A$ is both closed and bounded.

Example 7.43. (a) Clearly, $\emptyset$ and sets containing single points $\{z\}$, $z \in \mathbb{C}$, are compact. The sets $\mathbb{C}$ and $\mathbb{R}$ are simple examples of closed sets that are not bounded.

(b) Let $a, b \in \mathbb{R}$, $a < b$. Each bounded interval $]a, b[$, $]a, b]$, $[a, b[$, $[a, b]$ is, indeed, bounded (by $M := \max\{|a|, |b|\}$). If $(x_n)_{n\in\mathbb{N}}$ is a sequence in $[a, b]$, converging to $x \in \mathbb{R}$, then Th. 7.13(c) shows $a \le x \le b$, i.e. $x \in [a, b]$ and $[a, b]$ is, indeed, closed. Analogously, one sees that the unbounded intervals $[a, \infty[$ and $]-\infty, a]$ are also closed. On the other hand, open and half-open intervals are not closed: For sufficiently large $n$, the convergent sequence $(b - \frac{1}{n})_{n\in\mathbb{N}}$ is in $[a, b[$, but $\lim_{n\to\infty} (b - \frac{1}{n}) = b \notin [a, b[$, and the other cases are treated analogously. In particular, only intervals of the form $[a, b]$ (and trivial intervals) are compact.

(c) For each $\epsilon > 0$ and each $z \in \mathbb{C}$, the set $B_\epsilon(z)$ is bounded (since $B_\epsilon(z) \subseteq B_{\epsilon+|z|}(0)$ by the triangle inequality), but not closed (since, for sufficiently large $n \in \mathbb{N}$, $(z + \epsilon - \frac{1}{n})_{n\in\mathbb{N}}$ is a sequence in $B_\epsilon(z)$, converging to $z + \epsilon \notin B_\epsilon(z)$). In particular, $B_\epsilon(z)$ is not compact.

Proposition 7.44. (a) Finite unions of bounded (resp. closed, resp. compact) sets are bounded (resp. closed, resp. compact), i.e. if $A_1, \dots, A_n \subseteq \mathbb{C}$, $n \in \mathbb{N}$, are bounded (resp. closed, resp. compact), then $A := \bigcup_{j=1}^{n} A_j$ is also bounded (resp. closed, resp. compact).
(b) Arbitrary (i.e. finite or infinite) intersections of bounded (resp. closed, resp. compact) sets are bounded (resp. closed, resp. compact), i.e. if $I \neq \emptyset$ is an arbitrary index set and, for each $j \in I$, $A_j \subseteq \mathbb{C}$ is bounded (resp. closed, resp. compact), then $A := \bigcap_{j\in I} A_j$ is also bounded (resp. closed, resp. compact).

Proof. (a): Exercise.

(b): Fix $j_0 \in I$. If all $A_j$, $j \in I$, are bounded, then, in particular, there is $M \in \mathbb{R}_0^+$ such that $A_{j_0} \subseteq B_M(0)$. Thus, $A = \bigcap_{j\in I} A_j \subseteq A_{j_0} \subseteq B_M(0)$ shows $A$ is also bounded. If all $A_j$, $j \in I$, are closed and $(a_n)_{n\in\mathbb{N}}$ is a sequence in $A$ that converges to some $z \in \mathbb{C}$, then $(a_n)_{n\in\mathbb{N}}$ is a sequence in each $A_j$, $j \in I$, and, since each $A_j$ is closed, $z \in A_j$ for each $j \in I$, i.e. $z \in A = \bigcap_{j\in I} A_j$. If all $A_j$, $j \in I$, are compact, then they are all closed and bounded and, thus, $A$ is closed and bounded, i.e. $A$ is compact. ∎
Example 7.45. (a) According to Prop. 7.44(a), all finite subsets of $\mathbb{C}$ are compact.
(b) $\mathbb{N} = \bigcup_{n\in\mathbb{N}} \{n\}$ shows that infinite unions of compact sets can be unbounded, and $]0, 1[ \,= \bigcup_{n\in\mathbb{N}} [\frac{1}{n}, 1 - \frac{1}{n}]$ shows that infinite unions of compact sets are not always closed.
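The second union in Example 7.45(b) can be made concrete: each $x \in \,]0, 1[$ lies in $[\frac{1}{n}, 1 - \frac{1}{n}]$ for some $n$, while the endpoints $0$ and $1$ lie in none of these intervals. The following Python sketch searches for the smallest such $n$ using exact fractions. The function name `covering_index` is an assumption made for this illustration.

```python
from fractions import Fraction
from typing import Optional


def covering_index(x: Fraction) -> Optional[int]:
    """Smallest n with 1/n <= x <= 1 - 1/n, or None if x is not in ]0, 1[."""
    if not (0 < x < 1):
        return None  # 0 and 1 (and points outside) are in no interval
    n = 2  # for n = 1 the interval [1, 0] is empty
    while not (Fraction(1, n) <= x <= 1 - Fraction(1, n)):
        n += 1
    return n


print(covering_index(Fraction(1, 2)), covering_index(Fraction(1, 100)))
```

The search terminates for every $x \in \,]0, 1[$, since $\frac{1}{n} \le \min\{x, 1 - x\}$ holds for all sufficiently large $n$.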

Many more examples of closed sets can be obtained as preimages of closed sets under
continuous maps according to the following remark:
Remark 7.46. In Calculus II, it will be shown in the more general context of maps $f$ between metric spaces that a map $f$ is continuous if, and only if, all preimages $f^{-1}(A)$ under $f$ of closed sets $A$ are closed. Here, we will only prove the following special case:

$$f : \mathbb{C} \to \mathbb{K} \text{ continuous and } A \subseteq \mathbb{K} \text{ closed} \;\Rightarrow\; f^{-1}(A) \subseteq \mathbb{C} \text{ closed}. \tag{7.38}$$

Indeed, suppose $f$ is continuous and $A \subseteq \mathbb{K}$ is closed. If $(z_n)_{n\in\mathbb{N}}$ is a sequence in $f^{-1}(A)$ with $\lim_{n\to\infty} z_n = z \in \mathbb{C}$, then $(f(z_n))_{n\in\mathbb{N}}$ is a sequence in $A$. The continuity of $f$ then implies $\lim_{n\to\infty} f(z_n) = f(z)$ and, then, $f(z) \in A$, since $A$ is closed. Thus, $z \in f^{-1}(A)$, showing $f^{-1}(A)$ is closed.
Example 7.47. (a) For each $z \in \mathbb{C}$ and each $r > 0$, the closed disk $\overline{B}_r(z) := \{w \in \mathbb{C} : |z - w| \le r\}$ with radius $r$ and center $z$ is, indeed, closed by (7.38), since

$$\overline{B}_r(z) = f^{-1}[0, r], \tag{7.39}$$

where $f$ is the continuous map $f : \mathbb{C} \to \mathbb{R}$, $f(w) := |z - w|$. Since $\overline{B}_r(z)$ is clearly bounded, it is also compact.
(b) For each $z \in \mathbb{C}$ and each $r > 0$, the circle (also called a 1-sphere) $S_r(z) := \{w \in \mathbb{C} : |z - w| = r\}$ with radius $r$ and center $z$ is closed by (7.38), since $S_r(z) = f^{-1}\{r\}$, where $f$ is the same map as in (7.39). Moreover, $S_r(z)$ is also clearly bounded, and, thus, compact.
(c) According to (7.38), for each $x \in \mathbb{R}$, the closed half-spaces $\{z \in \mathbb{C} : \operatorname{Re} z \ge x\} = \operatorname{Re}^{-1}[x, \infty[$ and $\{z \in \mathbb{C} : \operatorname{Im} z \ge x\} = \operatorname{Im}^{-1}[x, \infty[$ are, indeed, closed.
Theorem 7.48. A subset $K$ of $\mathbb{C}$ is compact if, and only if, every sequence in $K$ has a subsequence that converges to some limit $z \in K$.

Proof. If $K$ is closed and bounded, and $(z_n)_{n\in\mathbb{N}}$ is a sequence in $K$, then the boundedness, the Bolzano-Weierstrass Th. 7.27, and Prop. 7.26 yield a subsequence that converges to some $z \in \mathbb{C}$. However, since $K$ is closed, $z \in K$.
Conversely, assume every sequence in $K$ has a subsequence that converges to some limit $z \in K$. Let $(z_n)_{n\in\mathbb{N}}$ be a sequence in $K$ that converges to some $w \in \mathbb{C}$. Then this sequence must have a subsequence that converges to some $z \in K$. However, according to Prop. 7.23, it must be $w = z \in K$, showing $K$ is closed. If $K$ is not bounded, then there exists a sequence $(z_n)_{n\in\mathbb{N}}$ in $K$ such that $\lim_{n\to\infty} |z_n| = \infty$. Every subsequence $(z_{n_k})_{k\in\mathbb{N}}$ then still has the property that $\lim_{k\to\infty} |z_{n_k}| = \infty$, in particular, each subsequence is unbounded and cannot converge to some $z \in \mathbb{C}$ (let alone in $K$). ∎
Caveat 7.49. In Calculus II, we will generalize the notion of compactness to subsets of
so-called metric spaces, defining a set K to be compact if, and only if, every sequence
in K has a subsequence that converges to some limit in K. While it remains true that
every compact set is closed and bounded, the converse does not(!) hold in general metric
spaces (in general, even in closed sets, there exist bounded sequences that do not have
convergent subsequences).

One reason that compact sets are useful is that real-valued continuous functions on
compact sets assume a maximum and a minimum, which is the contents of Th. 7.54
below. In preparation, we now define maxima and minima for real-valued functions.
Definition 7.50. Let $M \subseteq \mathbb{C}$, $f : M \to \mathbb{R}$.

(a) Given $z \in M$, $f$ has a (strict) global min at $z$ if, and only if, $f(z) \le f(w)$ ($f(z) < f(w)$) for each $w \in M \setminus \{z\}$. Analogously, $f$ has a (strict) global max at $z$ if, and only if, $f(z) \ge f(w)$ ($f(z) > f(w)$) for each $w \in M \setminus \{z\}$. Moreover, $f$ has a (strict) global extreme value at $z$ if, and only if, $f$ has a (strict) global min or a (strict) global max at $z$.
(b) Given $z \in M$, $f$ has a (strict) local min at $z$ if, and only if, there exists $\epsilon > 0$ such that $f(z) \le f(w)$ ($f(z) < f(w)$) for each $w \in \{w \in M : |z - w| < \epsilon\} \setminus \{z\}$. Analogously, $f$ has a (strict) local max at $z$ if, and only if, there exists $\epsilon > 0$ such that $f(z) \ge f(w)$ ($f(z) > f(w)$) for each $w \in \{w \in M : |z - w| < \epsilon\} \setminus \{z\}$. Moreover, $f$ has a (strict) local extreme value at $z$ if, and only if, $f$ has a (strict) local min or a (strict) local max at $z$.
Remark 7.51. In the context of Def. 7.50, it is immediate from the respective definitions that $f$ has a (strict) global min at $z \in M$ if, and only if, $-f$ has a (strict) global max at $z$. Moreover, the same holds if "global" is replaced by "local". It is equally obvious that every (strict) global min/max is a (strict) local min/max.

Theorem 7.52. If $K \subseteq \mathbb{C}$ is compact, and $f : K \to \mathbb{C}$ is continuous, then $f(K)$ is compact.

Proof. If $(w_n)_{n\in\mathbb{N}}$ is a sequence in $f(K)$, then, for each $n \in \mathbb{N}$, there is some $z_n \in K$ such that $f(z_n) = w_n$. As $K$ is compact, there is a subsequence $(a_n)_{n\in\mathbb{N}}$ of $(z_n)_{n\in\mathbb{N}}$ with $\lim_{n\to\infty} a_n = a$ for some $a \in K$. Then $(f(a_n))_{n\in\mathbb{N}}$ is a subsequence of $(w_n)_{n\in\mathbb{N}}$ and the continuity of $f$ yields $\lim_{n\to\infty} f(a_n) = f(a) \in f(K)$, showing that $(w_n)_{n\in\mathbb{N}}$ has a convergent subsequence with limit in $f(K)$. By Th. 7.48, we have therefore established that $f(K)$ is compact. ∎

Lemma 7.53. If $K$ is a nonempty compact subset of $\mathbb{R}$, then $K$ contains a smallest and a largest element, i.e. there exist $m, M \in K$ such that $m \le x \le M$ for each $x \in K$.

Proof. Since the compact set $K$ is bounded, we know that

$$-\infty < m := \inf K \le \sup K =: M < \infty.$$

According to the definition of the inf and sup as largest lower bound and smallest upper bound, respectively, for each $n \in \mathbb{N}$, there must be elements $x_n, y_n \in K$ such that $m \le x_n \le m + \frac{1}{n}$ and $M - \frac{1}{n} \le y_n \le M$. Since the compact set $K$ is also closed, we get $m = \lim_{n\to\infty} x_n \in K$ and $M = \lim_{n\to\infty} y_n \in K$. ∎

Theorem 7.54. If $K \subseteq \mathbb{C}$ is compact, and $f : K \to \mathbb{R}$ is continuous, then $f$ assumes its max and its min, i.e. there are $z_m \in K$ and $z_M \in K$ such that $f$ has a global min at $z_m$ and a global max at $z_M$. In particular, the continuous function $f$ assumes its max and min on each compact interval $K = [a, b] \subseteq \mathbb{R}$, $a, b \in \mathbb{R}$.

Proof. Since $K$ is compact and $f$ is continuous, $f(K) \subseteq \mathbb{R}$ is compact according to Th. 7.52. Then, by Lem. 7.53, $f(K)$ contains a smallest element $m$ and a largest element $M$. This, in turn, implies that there are $z_m, z_M \in K$ such that $f(z_m) = m$ and $f(z_M) = M$. ∎

Example 7.55. On an unbounded set, a continuous function does not necessarily have
a global max or a global min, as one can already see from $x \mapsto x$. An example of a
continuous function on a bounded, but not closed, interval that does not have a global
max is $f : ]0, 1] \to \mathbb{R}$, $f(x) := 1/x$, which is continuous by Th. 7.38.

7.2.4 Intermediate Value Theorem

Theorem 7.56 (Bolzano's Theorem). Let $a, b \in \mathbb{R}$ with $a < b$. If $f : [a, b] \to \mathbb{R}$ is
continuous with $f(a) > 0$ and $f(b) < 0$, then $f$ has at least one zero in $]a, b[$. More
precisely, the set $A := f^{-1}\{0\}$ has a min $\xi_1$ and a max $\xi_2$, $a < \xi_1 \le \xi_2 < b$, where $f > 0$
on $[a, \xi_1[$ and $f < 0$ on $]\xi_2, b]$.

Proof. Let $\xi_1 := \inf f^{-1}(\mathbb{R}_0^-)$.

(a): $f(\xi_1) \le 0$: This is clear if $\xi_1 = b$. If $\xi_1 < b$, then, for each $n \in \mathbb{N}$ sufficiently large,
there exists $x_n \in ]\xi_1, \xi_1 + 1/n[ \cap [a, b]$ such that $f(x_n) \le 0$. Then $\lim_{n\to\infty} x_n = \xi_1$ and
the continuity of $f$ implies $\lim_{n\to\infty} f(x_n) = f(\xi_1)$. Now $f(\xi_1) \le 0$ is a consequence of
Th. 7.13(c). In particular, (a) yields $a < \xi_1$ and $f > 0$ on $[a, \xi_1[$.
(b): $f(\xi_1) \ge 0$: The continuity of $f$ implies $\lim_{n\to\infty} f(\xi_1 - 1/n) = f(\xi_1)$ and, since we
have already seen $f(\xi_1 - 1/n) > 0$ for each $n \in \mathbb{N}$ sufficiently large, $f(\xi_1) \ge 0$ is again a
consequence of Th. 7.13(c). In particular, we have $\xi_1 < b$.
Combining (a) and (b), we have $f(\xi_1) = 0$ and $a < \xi_1 < b$.
Defining $\xi_2 := \sup f^{-1}(\mathbb{R}_0^+)$, one shows $f(\xi_2) = 0$ and $a < \xi_2 < b$ completely analogously.
Then $f < 0$ on $]\xi_2, b]$ is also clear, as well as $\xi_1 \le \xi_2$.

Theorem 7.57 (Intermediate Value Theorem). Let $a, b \in \mathbb{R}$ with $a < b$. If $f : [a, b] \to \mathbb{R}$
is continuous, then $f$ assumes every value between $f(a)$ and $f(b)$, i.e.

    $[\min\{f(a), f(b)\}, \max\{f(a), f(b)\}] \subseteq f([a, b])$. (7.40)

Proof. If $f(a) = f(b)$, then there is nothing to prove. If $f(a) < f(b)$ and $\eta \in ]f(a), f(b)[$,
then consider the auxiliary function $g : [a, b] \to \mathbb{R}$, $g(x) := \eta - f(x)$. Then $g$ is
continuous with $g(a) = \eta - f(a) > 0$ and $g(b) = \eta - f(b) < 0$. According to Bolzano's
Th. 7.56, there exists $\xi \in ]a, b[$ such that $g(\xi) = \eta - f(\xi) = 0$, i.e. $f(\xi) = \eta$ as claimed.
If $f(b) < f(a)$ and $\eta \in ]f(b), f(a)[$, then consider the auxiliary function $g : [a, b] \to \mathbb{R}$,
$g(x) := f(x) - \eta$. Then $g$ is continuous with $g(a) = f(a) - \eta > 0$ and $g(b) = f(b) - \eta < 0$.
Once again, according to Bolzano's Th. 7.56, there exists $\xi \in ]a, b[$ such that $g(\xi) =
f(\xi) - \eta = 0$, i.e. $f(\xi) = \eta$.
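The proofs of Th. 7.56 and Th. 7.57 are not constructive, but the sign-change idea behind them can be turned into a numerical procedure. The following is a minimal sketch of interval bisection, not part of the original argument; the function name `bisect_value` and the tolerance parameter `tol` are our own choices, and the code assumes that `f` is continuous on [a, b] and that `eta` lies between f(a) and f(b).

```python
def bisect_value(f, a, b, eta, tol=1e-12):
    """Approximate some xi in [a, b] with f(xi) = eta by repeated halving.

    Assumes f is continuous on [a, b] and eta lies between f(a) and f(b),
    so that g(x) = f(x) - eta changes sign on [a, b] (cf. Th. 7.57).
    """
    g = lambda x: f(x) - eta
    ga, gb = g(a), g(b)
    if ga == 0:
        return a
    if gb == 0:
        return b
    if ga * gb > 0:
        raise ValueError("eta is not between f(a) and f(b)")
    while b - a > tol:
        m = (a + b) / 2
        gm = g(m)
        if gm == 0:
            return m
        if ga * gm < 0:
            b = m            # sign change is in [a, m]
        else:
            a, ga = m, gm    # sign change is in [m, b]
    return (a + b) / 2


# Usage: solve x^3 = 5 on [0, 2].
xi = bisect_value(lambda x: x ** 3, 0.0, 2.0, 5.0)
```

Each step keeps an interval on which g changes sign, so the intermediate value theorem guarantees a solution inside every such interval.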

Theorem 7.58. If $I \subseteq \mathbb{R}$ is an interval (of one of the 8 types listed in (4.11)) and
$f : I \to \mathbb{R}$ is continuous, then $f(I)$ is also an interval (it can degenerate to a single
point if $f$ is constant). More precisely, if $\emptyset \neq I = [a, b]$ is a compact interval, then
$\emptyset \neq f(I) = [\min f(I), \max f(I)]$; if $I$ is not a compact interval, then one of the following
9 cases occurs:

    $f(I) = \mathbb{R}$, (7.41a)
    $f(I) = ]-\infty, \sup f(I)]$, (7.41b)
    $f(I) = ]-\infty, \sup f(I)[$, (7.41c)
    $f(I) = [\inf f(I), \infty[$, (7.41d)
    $f(I) = [\inf f(I), \sup f(I)]$, (7.41e)
    $f(I) = [\inf f(I), \sup f(I)[$, (7.41f)
    $f(I) = ]\inf f(I), \infty[$, (7.41g)
    $f(I) = ]\inf f(I), \sup f(I)]$, (7.41h)
    $f(I) = ]\inf f(I), \sup f(I)[$. (7.41i)

Proof. If $I$ is a compact interval, then we merely combine Th. 7.54 with Th. 7.57.
Otherwise, let $\eta \in f(I)$. If $f(I)$ has an upper bound, then Th. 7.57 implies $[\eta, \sup f(I)[ \subseteq f(I)$
and $f(I) \cap [\eta, \infty[ \subseteq [\eta, \sup f(I)]$. If $f(I)$ does not have an upper bound, then Th.
7.57 implies $f(I) \cap [\eta, \infty[ = [\eta, \infty[$. Analogously, one obtains $f(I) \cap ]-\infty, \eta] = ]-\infty, \eta]$
or $f(I) \cap ]-\infty, \eta] = [\inf f(I), \eta]$ or $f(I) \cap ]-\infty, \eta] = ]\inf f(I), \eta]$, showing that there are
precisely the 9 possibilities of (7.41) for $f(I) = (f(I) \cap ]-\infty, \eta]) \cup (f(I) \cap [\eta, \infty[)$.

The above results will have striking consequences in the following Sec. 7.2.5.

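Example 7.59. The piecewise affine function $f : ]0, 1] \to \mathbb{R}$,

    $f(x) := \begin{cases} (-1)^n n - \dfrac{2n-1}{\frac{1}{n-1} - \frac{1}{n}} \left(x - \frac{1}{n}\right) & \text{for } x \in [\frac{1}{n}, \frac{1}{n-1}],\ n \text{ even}, \\[2ex] (-1)^n n + \dfrac{2n-1}{\frac{1}{n-1} - \frac{1}{n}} \left(x - \frac{1}{n}\right) & \text{for } x \in [\frac{1}{n}, \frac{1}{n-1}],\ n \ge 3 \text{ odd}, \end{cases}$

satisfies $f(1/n) = (-1)^n n$ for each $n \in \mathbb{N}$ and is an example of a continuous function
on the bounded half-open interval $I := ]0, 1]$ with $f(I) = \mathbb{R}$.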
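To make Example 7.59 concrete, the function can be evaluated exactly with rational arithmetic. The following sketch is our own illustration (the name `zigzag` is hypothetical); it implements the affine interpolation between the prescribed values $f(1/n) = (-1)^n n$ and $f(1/(n-1)) = (-1)^{n-1}(n-1)$ on each interval $[1/n, 1/(n-1)]$, under the assumption that the argument is a rational number in $]0, 1]$.

```python
import math
from fractions import Fraction


def zigzag(x):
    """Piecewise affine function on ]0, 1] with zigzag(1/n) = (-1)^n * n."""
    x = Fraction(x)
    if not 0 < x <= 1:
        raise ValueError("x must lie in ]0, 1]")
    # Pick n >= 2 with 1/n <= x <= 1/(n-1).
    n = max(2, math.ceil(1 / x))
    left = Fraction(1, n)
    width = Fraction(1, n - 1) - left
    sign = 1 if n % 2 == 0 else -1   # (-1)^n
    # The value moves from (-1)^n n at 1/n to (-1)^(n-1) (n-1) at 1/(n-1),
    # i.e. its absolute change across the interval is 2n - 1.
    return sign * (n - (2 * n - 1) * (x - left) / width)


# The values at 1, 1/2, 1/3, 1/4, ... alternate in sign and grow in modulus.
values = [zigzag(Fraction(1, n)) for n in range(1, 6)]
```

Since the values at the points $1/n$ are unbounded in both directions, the intermediate value theorem yields $f(I) = \mathbb{R}$.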

7.2.5 Inverse Functions, Existence of Roots, Exponential Function, Logarithm

Theorem 7.60. Let $I \subseteq \mathbb{R}$ be an interval (of one of the 8 types listed in (4.11)). If
$f : I \to \mathbb{R}$ is continuous and strictly increasing (resp. decreasing), then $f$ has an
inverse function $f^{-1}$ defined on the interval $J := f(I)$, i.e. $f^{-1} : J \to I$, and $f^{-1}$ is
also continuous and strictly increasing (resp. decreasing).

Proof. From Prop. 2.29(b), we know $f : I \to \mathbb{R}$ is one-to-one. Then $f : I \to f(I)$
is invertible and Prop. 2.29(c) shows $f^{-1}$ is strictly monotone in the same sense as $f$.
Furthermore, we know from Th. 7.58 that $J = f(I)$ is an interval. It remains to verify
$f^{-1} : J \to I \subseteq \mathbb{R}$ is continuous. Let $\eta \in J$, $\epsilon > 0$, and $\xi \in I$ with $f(\xi) = \eta$. Then
$I_\epsilon := B_\epsilon(\xi) \cap I$ is an interval, $J_\epsilon := f(I_\epsilon)$ is an interval, and $\eta \in J_\epsilon$. Choose $\delta > 0$ such
that $B_\delta(\eta) \cap J \subseteq J_\epsilon$. Then $y \in J$ and $|y - \eta| < \delta$ (i.e. $y \in B_\delta(\eta) \cap J$) implies $f^{-1}(y) \in I_\epsilon$,
i.e. $|f^{-1}(y) - f^{-1}(\eta)| = |f^{-1}(y) - \xi| < \epsilon$, proving the continuity of $f^{-1}$.

Remark and Definition 7.61 (Roots). We are now in a position to fulfill the promise
made in Def. and Rem. 5.8, i.e. to prove the existence of unique roots for nonnegative
real numbers: For each $n \in \mathbb{N}$, the function $f : \mathbb{R}_0^+ \to \mathbb{R}$, $f(x) := x^n$, is continuous
and strictly increasing with $J := f(\mathbb{R}_0^+) = \mathbb{R}_0^+$. Then Th. 7.60 implies the existence
of a continuous and strictly increasing inverse function $f^{-1} : \mathbb{R}_0^+ \to \mathbb{R}_0^+$. For each
$x \in \mathbb{R}_0^+$, we call $f^{-1}(x)$ the $n$th root of $x$ and write $\sqrt[n]{x} := x^{\frac{1}{n}} := f^{-1}(x)$. Then
$(\sqrt[n]{x})^n = (x^{\frac{1}{n}})^n = x$ is immediate from the definition. Caveat: By definition, roots are
always nonnegative and they are only defined for nonnegative numbers (when studying
complex numbers and $\mathbb{C}$-valued functions more deeply in the field of Complex Analysis,
one typically extends the notion of root, but we will not pursue this route in this class).
As anticipated in Def. and Rem. 5.8, one also writes $\sqrt{x}$ instead of $\sqrt[2]{x}$ and calls $\sqrt{x}$ the
square root of $x$.


Remark and Definition 7.62. It turns out that $\sqrt{2}$ (and many other roots) are not
rational numbers, i.e. $\sqrt{2} \notin \mathbb{Q}$. This is easily proved by contradiction: If $\sqrt{2} \in \mathbb{Q}$, then
there exist natural numbers $m, n \in \mathbb{N}$ such that $\sqrt{2} = m/n$. Moreover, by canceling
possible factors of 2, we may assume at least one of the numbers $m, n$ is odd. Now
$\sqrt{2} = m/n$ implies $m^2 = 2n^2$, i.e. $m^2$ and, thus, $m$ must be even. In consequence, there
exists $p \in \mathbb{N}$ such that $m = 2p$, implying $2n^2 = m^2 = 4p^2$ and $n^2 = 2p^2$. Thus $n^2$ and $n$
must also be even, in contradiction to $m, n$ not both being even.
The elements of $\mathbb{R} \setminus \mathbb{Q}$ are called irrational numbers. It turns out that most real
numbers are irrational numbers: one can show that $\mathbb{Q}$ is countable, whereas $\mathbb{R} \setminus \mathbb{Q}$ is not
countable (actually, every interval contains countably many rational and uncountably
many irrational numbers, see Appendix E, in particular, Th. E.1(c) and Cor. E.4).

Theorem 7.63 (Inequality Between the Arithmetic Mean and the Geometric Mean).
If $n \in \mathbb{N}$ and $x_1, \dots, x_n \in \mathbb{R}_0^+$, then

    $\sqrt[n]{x_1 \cdots x_n} \le \frac{x_1 + \dots + x_n}{n}$, (7.42)

where the left-hand side is called the geometric mean and the right-hand side is called
the arithmetic mean of the numbers $x_1, \dots, x_n$. Equality occurs if, and only if, $x_1 =
\dots = x_n$.

Proof. If at least one of the $x_j$ is 0, then (7.42) becomes the true statement $0 \le \frac{x_1 + \dots + x_n}{n}$
with strict inequality if at least one $x_j > 0$. If $x_1 = \dots = x_n = x$, then (7.42) also holds
since both sides are equal to $x$. Thus, for the remainder of the proof, we assume all
$x_j > 0$ and not all $x_j$ are equal. First, we consider the special case where $\frac{x_1 + \dots + x_n}{n} = 1$.
Since not all $x_j$ are equal, there exists $k$ with $x_k \neq 1$. We prove (7.42) by induction for
$n \in \{2, 3, \dots\}$ in the form

    $\left( \sum_{j=1}^{n} x_j = n \ \wedge\ \exists_{k \in \{1,\dots,n\}}\ x_k \neq 1 \right) \Rightarrow \prod_{j=1}^{n} x_j < 1$. (7.43)

Base Case ($n = 2$): Since $x_1 + x_2 = 2$, $0 < x_1, x_2$ and not both $x_1$ and $x_2$ are equal to
1, there is $\epsilon > 0$ such that $x_1 = 1 + \epsilon$ and $x_2 = 1 - \epsilon$, i.e. $x_1 x_2 = 1 - \epsilon^2 < 1$, which
establishes the base case. Induction Step: We now have $n \ge 2$ and $0 < x_1, \dots, x_{n+1}$
with $\sum_{j=1}^{n+1} x_j = n + 1$ plus the existence of $k, l \in \{1, \dots, n + 1\}$ such that $x_k = 1 + \alpha$,
$x_l = 1 - \beta$ with $\alpha, \beta > 0$. Then define $y := x_k + x_l - 1 = 1 + \alpha - \beta$. One observes $y > 0$
(since $\beta < 1$) and

    $y + \sum_{j=1,\ j \neq k,l}^{n+1} x_j = -1 + \sum_{j=1}^{n+1} x_j = n \ \overset{\text{ind. hyp.}}{\Rightarrow}\ y \prod_{j=1,\ j \neq k,l}^{n+1} x_j \le 1$ (7.44)

(we cannot exclude equality as $y$ and all the remaining $x_j$ might be equal to 1). Since
$x_k x_l = (1 + \alpha)(1 - \beta) = 1 + \alpha - \beta - \alpha\beta = y - \alpha\beta < y$, (7.44) implies $\prod_{j=1}^{n+1} x_j < 1$,
concluding the induction proof of (7.43). It remains to consider the case $\frac{x_1 + \dots + x_n}{n} =: \lambda >
0$, not all $x_j$ equal. One estimates

    $\sqrt[n]{x_1 \cdots x_n} = \lambda \sqrt[n]{\frac{x_1}{\lambda} \cdots \frac{x_n}{\lambda}} \overset{\text{special case}}{<} \lambda \cdot \frac{x_1 + \dots + x_n}{n \lambda} = \frac{x_1 + \dots + x_n}{n}$, (7.45)

completing the proof of the theorem.
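As a quick numerical sanity check of (7.42), one can compare both means for a few sample tuples. The following sketch is only an illustration under floating-point arithmetic; the function names are our own, and the sample data are arbitrary.

```python
import math


def arithmetic_mean(xs):
    """(x_1 + ... + x_n) / n."""
    return sum(xs) / len(xs)


def geometric_mean(xs):
    """n-th root of x_1 * ... * x_n for nonnegative x_j."""
    return math.prod(xs) ** (1 / len(xs))


samples = [[1, 4], [1, 2, 3, 4], [0.5, 8, 2], [3, 3, 3]]
for xs in samples:
    gm, am = geometric_mean(xs), arithmetic_mean(xs)
    # Small tolerance, since floating-point roots are not exact.
    assert gm <= am + 1e-12
```

In the last sample all entries coincide, which is exactly the case where both means agree (up to rounding).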
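Corollary 7.64. For each $a \in \mathbb{R}_0^+ \setminus \{1\}$, $n \in \{2, 3, \dots\}$, $p \in \{1, \dots, n - 1\}$:

    $\sqrt[n]{a^p} < 1 + \frac{p}{n}(a - 1)$; $p = 1$ yields $\sqrt[n]{a} < 1 + \frac{a - 1}{n}$. (7.46)

Proof. The simple application

    $\sqrt[n]{a^p} = \sqrt[n]{a^p \prod_{j=1}^{n-p} 1} \overset{\text{Th. 7.63}}{<} \frac{p\,a + n - p}{n} = 1 + \frac{p}{n}(a - 1)$ (7.47)

of Th. 7.63 establishes the case.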

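Example 7.65. We use (7.42) to show

    $\lim_{n\to\infty} \sqrt[n]{n} = 1$: (7.48)

First note that $0 < x \le 1$ implies $0 < x^n \le 1$, so $(\sqrt[n]{n})^n = n > 1$ forces $\sqrt[n]{n} > 1$ for each
$n > 1$. Now write $n$ as the product of $n$ factors, $n = \sqrt{n} \cdot \sqrt{n} \cdot \prod_{k=1}^{n-2} 1$. Then, for $n > 1$,

    $\sqrt[n]{n} = \sqrt[n]{\sqrt{n} \cdot \sqrt{n} \cdot \prod_{k=1}^{n-2} 1} \overset{\text{Th. 7.63}}{\le} \frac{2\sqrt{n} + n - 2}{n} < 1 + \frac{2}{\sqrt{n}}$. (7.49)

It is an exercise to show

    $\lim_{n\to\infty} \frac{1}{\sqrt{n}} = 0$. (7.50)

Now this together with $1 \le \sqrt[n]{n} \le 1 + \frac{2}{\sqrt{n}}$ and the Sandwich Th. 7.16 proves (7.48).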
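The sandwich estimate behind (7.48) can be observed numerically. The sketch below is our own illustration in floating-point arithmetic; it tabulates the n-th root of n next to the upper bound $1 + 2/\sqrt{n}$ for a few arbitrarily chosen values of n.

```python
import math


def nth_root_of_n(n):
    """Floating-point approximation of the n-th root of n."""
    return n ** (1 / n)


def upper_bound(n):
    """The bound 1 + 2 / sqrt(n) from the sandwich argument."""
    return 1 + 2 / math.sqrt(n)


for n in (2, 10, 100, 10_000, 1_000_000):
    root, bound = nth_root_of_n(n), upper_bound(n)
    # Both columns approach 1 as n grows, the root from above.
    print(f"n = {n:>9}: root = {root:.6f}, bound = {bound:.6f}")
```

The bound converges to 1 only at the slow rate $2/\sqrt{n}$, whereas the root itself approaches 1 considerably faster; the estimate is convenient for the proof, not sharp.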

Example 7.66 (Euler's Number). We use Th. 7.63 to prove the limit

    $e := \lim_{n\to\infty} \left(1 + \frac{1}{n}\right)^n$ (7.51)

exists. It is known as Euler's number. One can show it is an irrational number (see
Appendix F.1) and its first digits are $e = 2.71828\dots$ It is of exceptional importance for
analysis and mathematics in general, as it pops up in all kinds of different mathematical
contexts. From Th. 7.63, we obtain

    $\forall_{n \in \mathbb{N}}\ \forall_{x \in [-n, \infty[,\ x \neq 0}\quad \left(1 + \frac{x}{n}\right)^n = 1 \cdot \left(1 + \frac{x}{n}\right)^n < \left(1 + \frac{x}{n+1}\right)^{n+1}$, (7.52)

where we have used that, on both sides of the inequality in (7.52), there are $n + 1$ factors
having the same sum, namely $n + 1 + x$; and the inequality in (7.42) is strict, unless
all factors are equal. We now apply (7.52) to the sequences $(a_n)_{n\in\mathbb{N}}$, $(b_n)_{n\in\mathbb{N}}$, $(c_n)_{n\in\mathbb{N}}$,
where

    $\forall_{n \in \mathbb{N}}:\quad a_n := \left(1 + \frac{1}{n}\right)^n, \quad b_n := \left(1 - \frac{1}{n}\right)^n, \quad c_n := b_{n+1}^{-1} = \left(\frac{1}{1 - \frac{1}{n+1}}\right)^{n+1} = \left(1 + \frac{1}{n}\right)^{n+1}$. (7.53)

Applying (7.52) with $x = 1$ and $x = -1$, respectively, yields $(a_n)_{n\in\mathbb{N}}$ and $(b_n)_{n\in\mathbb{N}}$ are
strictly increasing, and $(c_n)_{n\in\mathbb{N}}$ is strictly decreasing. On the other hand, $a_n < c_n$ holds
for each $n \in \mathbb{N}$, showing $(a_n)_{n\in\mathbb{N}}$ is bounded from above by $c_1$, and $(c_n)_{n\in\mathbb{N}}$ is bounded
from below by $a_1$. In particular, Th. 7.19 implies the convergence of both $(a_n)_{n\in\mathbb{N}}$ and
$(c_n)_{n\in\mathbb{N}}$. Moreover, $\lim_{n\to\infty} c_n = \lim_{n\to\infty} a_n (1 + 1/n) = e \cdot 1 = e$, which, together with
$a_n < e < c_n$ for each $n \in \mathbb{N}$, can be used to compute $e$ to an arbitrary precision.
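The enclosure $a_n < e < c_n$ is easy to try out. The following sketch is an illustration only: it uses floating-point numbers (so it cannot deliver arbitrary precision; exact rational arithmetic would be needed for that), and the function names `a` and `c` simply mirror the sequence names of (7.53).

```python
import math


def a(n):
    """a_n = (1 + 1/n)^n, strictly increasing with limit e."""
    return (1 + 1 / n) ** n


def c(n):
    """c_n = (1 + 1/n)^(n+1), strictly decreasing with limit e."""
    return (1 + 1 / n) ** (n + 1)


for n in (1, 10, 100, 1000):
    # The interval ]a_n, c_n[ contains e, and its length is a_n / n.
    print(f"n = {n:>4}: {a(n):.6f} < e < {c(n):.6f}")
```

Since $c_n - a_n = a_n / n < e / n$, the enclosure shrinks only like $1/n$; this makes it a clean theoretical tool, but a slow way to compute digits of e.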

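Definition 7.67. Let $A \subseteq \mathbb{R}$ be a subset of the real numbers. Then $A$ is called dense
in $\mathbb{R}$ if, and only if, every $\epsilon$-neighborhood of every real number contains a point from $A$,
i.e. if, and only if,

    $\forall_{x \in \mathbb{R}}\ \forall_{\epsilon \in \mathbb{R}^+}\quad A \cap B_\epsilon(x) \neq \emptyset$.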

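Theorem 7.68. (a) $\mathbb{Q}$ is dense in $\mathbb{R}$.

(b) $\mathbb{R} \setminus \mathbb{Q}$ is dense in $\mathbb{R}$.

(c) For each $x \in \mathbb{R}$, there exist sequences $(r_n)_{n\in\mathbb{N}}$ and $(s_n)_{n\in\mathbb{N}}$ in the rational numbers
    $\mathbb{Q}$ such that $x = \lim_{n\to\infty} r_n = \lim_{n\to\infty} s_n$, $(r_n)_{n\in\mathbb{N}}$ is strictly increasing and $(s_n)_{n\in\mathbb{N}}$
    is strictly decreasing.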

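Proof. (a): Since each $B_\epsilon(x)$ is an interval, it suffices to prove that every interval $]a, b[$,
$a < b$, contains a rational number. If $0 \in ]a, b[$, then there is nothing to prove. Suppose
$0 \le a < b$ and set $\delta := b - a > 0$. Choose $n \in \mathbb{N}$ such that $1/n < \delta$ and let

    $q := \max\left\{ \frac{k}{n} : k \in \mathbb{N} \ \wedge\ \frac{k}{n} < b \right\}$.

Then $q \in \mathbb{Q}$ and $a < q < b$. If $a < b \le 0$, choose $\delta$ and $n$ as above, but let

    $q := \min\left\{ \frac{k}{n} : -k \in \mathbb{N} \ \wedge\ \frac{k}{n} > a \right\}$.

Then, once again, $q \in \mathbb{Q}$ and $a < q < b$.

(b): Analogous to (a), we show that every interval $]a, b[$, $a < b$, contains an irrational
number: According to (a), we choose $q \in \mathbb{Q} \cap ]a, b[$, $\delta := b - q > 0$ and $n \in \mathbb{N}$ such
that $\sqrt{2}/n < \delta$. Then $a < \lambda := q + \sqrt{2}/n < b$ and also $\lambda \in \mathbb{R} \setminus \mathbb{Q}$ (otherwise,
$\sqrt{2} = n(\lambda - q) \in \mathbb{Q}$).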

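(c): Using (a), for each $n \in \mathbb{N}$, we choose rational numbers $r_n$ and $s_n$ such that

    $r_n \in \left]x - \frac{1}{n},\ x - \frac{1}{n+1}\right[, \qquad s_n \in \left]x + \frac{1}{n+1},\ x + \frac{1}{n}\right[$.

Then, clearly, $(r_n)_{n\in\mathbb{N}}$ is strictly increasing, $(s_n)_{n\in\mathbb{N}}$ is strictly decreasing, and the
Sandwich Th. 7.16 implies $x = \lim_{n\to\infty} r_n = \lim_{n\to\infty} s_n$.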
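The construction in the proof of Th. 7.68 can be carried out explicitly. The sketch below is a variation of the argument in (a), not a literal transcription: instead of separate max and min constructions it uses one formula with floor and ceiling that works for intervals of either sign. The names `rational_in`, `lower_sequence` and `upper_sequence` are our own, and a floating-point number converted to an exact `Fraction` stands in for the real number x.

```python
import math
from fractions import Fraction


def rational_in(a, b):
    """Return a rational q with a < q < b (assumes a < b, exact inputs)."""
    n = math.floor(1 / (b - a)) + 1   # guarantees 1/n < b - a
    k = math.ceil(b * n) - 1          # largest integer k with k/n < b
    return Fraction(k, n)             # then k/n >= b - 1/n > a


def lower_sequence(x, count):
    """r_n in ]x - 1/n, x - 1/(n+1)[ for n = 1..count (strictly increasing)."""
    return [rational_in(x - Fraction(1, n), x - Fraction(1, n + 1))
            for n in range(1, count + 1)]


def upper_sequence(x, count):
    """s_n in ]x + 1/(n+1), x + 1/n[ for n = 1..count (strictly decreasing)."""
    return [rational_in(x + Fraction(1, n + 1), x + Fraction(1, n))
            for n in range(1, count + 1)]


x = Fraction(math.sqrt(2))   # exact rational value of the float nearest sqrt(2)
r = lower_sequence(x, 10)
s = upper_sequence(x, 10)
```

Because consecutive target intervals are disjoint and ordered, monotonicity of both sequences comes for free, exactly as in the proof of (c).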

Definition and Remark 7.69 (Exponentiation). In Not. 5.6, we had defined $a^x$ for
$(a, x) \in \mathbb{C} \times \mathbb{N}_0$ and for $(a, x) \in (\mathbb{C} \setminus \{0\}) \times \mathbb{Z}$. We will now extend the definition to
$(a, x) \in \mathbb{R}^+ \times \mathbb{R}$ (later, we will further extend the definition to $(a, z) \in \mathbb{R}^+ \times \mathbb{C}$). The
present extension to $(a, x) \in \mathbb{R}^+ \times \mathbb{R}$ is accomplished in two steps: first, in (a), for
rational $x$, then, in (b), for irrational $x$.

(a) For rational $x = k/n$ with $k \in \mathbb{Z}$ and $n \in \mathbb{N}$, define

    $a^x := a^{\frac{k}{n}} := \sqrt[n]{a^k}$. (7.54)

For this definition to make sense, we have to check it does not depend on the special
representation of $x$, i.e., we have to verify $x = \frac{k}{n} = \frac{km}{nm}$ with $k \in \mathbb{Z}$ and $m, n \in \mathbb{N}$
implies $a^{\frac{k}{n}} = a^{\frac{km}{nm}}$. To this end, observe, using Rem. and Def. 7.61,

    $(a^{\frac{k}{n}})^{nm} = (\sqrt[n]{a^k})^{nm} = a^{km}$ and $(a^{\frac{km}{nm}})^{nm} = (\sqrt[nm]{a^{km}})^{nm} = a^{km}$, (7.55)

proving $a^{\frac{k}{n}} = a^{\frac{km}{nm}}$ (here, as in Rem. and Def. 7.61, we used that $\lambda \mapsto \lambda^N$ is
one-to-one on $\mathbb{R}_0^+$ for each $N \in \mathbb{N}$). The exponentiation rules of Th. 5.7 now extend to
rational exponents in a natural way, i.e., for each $a, b > 0$ and each $x, y \in \mathbb{Q}$:

    $a^{x+y} = a^x a^y$, (7.56a)
    $a^x b^x = (ab)^x$, (7.56b)
    $(a^x)^y = a^{xy}$. (7.56c)

For the proof, by possibly multiplying numerator and denominator by some natural
number, we can assume $x = k/n$ and $y = l/n$ with $k, l \in \mathbb{Z}$ and $n \in \mathbb{N}$. Then

    $(a^{x+y})^n = (a^{\frac{k+l}{n}})^n = a^{k+l} \overset{\text{Th. 5.7(a)}}{=} a^k a^l = (a^{\frac{k}{n}})^n (a^{\frac{l}{n}})^n \overset{\text{Th. 5.7(b)}}{=} (a^x a^y)^n$,

proving (7.56a);

    $(a^x b^x)^n \overset{\text{Th. 5.7(b)}}{=} (a^{\frac{k}{n}})^n (b^{\frac{k}{n}})^n = a^k b^k \overset{\text{Th. 5.7(b)}}{=} (ab)^k = (ab)^{\frac{k}{n} n} \overset{\text{Th. 5.7(c)}}{=} ((ab)^x)^n$,

proving (7.56b);

    $((a^x)^y)^{n^2} \overset{\text{Th. 5.7(c)}}{=} \left(((a^x)^{\frac{l}{n}})^n\right)^n = ((a^{\frac{k}{n}})^l)^n \overset{\text{Th. 5.7(c)}}{=} ((a^{\frac{k}{n}})^n)^l \overset{\text{Th. 5.7(c)}}{=} a^{kl} = (a^{\frac{kl}{n^2}})^{n^2} = (a^{xy})^{n^2}$,

proving (7.56c).
Moreover, we obtain the following monotonicity rules for each $a, b \in \mathbb{R}^+$ and each
$x, y \in \mathbb{Q}$:

    $\forall_{x > 0}\ (a < b \Rightarrow a^x < b^x)$, (7.57a)
    $\forall_{x < 0}\ (a < b \Rightarrow a^x > b^x)$, (7.57b)
    $\forall_{a > 1}\ (x < y \Rightarrow a^x < a^y)$, (7.57c)
    $\forall_{0 < a < 1}\ (x < y \Rightarrow a^x > a^y)$. (7.57d)

If $x = k/n$ with $k, n \in \mathbb{N}$ and $a < b$, then $a^{1/n} < b^{1/n}$ according to Rem. and
Def. 7.61, which, in turn, implies $a^x = (a^{1/n})^k < (b^{1/n})^k = b^x$, proving (7.57a); and, for $x < 0$,
$a^{-1} > b^{-1}$ implies $a^x = (a^{-1})^{-x} > (b^{-1})^{-x} = b^x$, proving (7.57b). If $x < y$, set $q :=
y - x > 0$. Then $1 < a$ and (7.57a) imply $1 = 1^q < a^q$, i.e. $a^x < a^x a^q = a^y$, proving
(7.57c). Similarly, $0 < a < 1$ and (7.57a) imply $a^q < 1^q = 1$, i.e. $a^y = a^x a^q < a^x$,
proving (7.57d).
The following estimates will also come in handy: For $a \in \mathbb{R}^+$ and $x, y \in \mathbb{Q}$:

    $(a > 1 \ \wedge\ x > 0) \Rightarrow a^x - 1 < x \cdot a^{x+1}$, (7.58)

    $\forall_{m \in \mathbb{N}}\ \left( x, y \in [-m, m] \Rightarrow |a^x - a^y| \le L\,|x - y| \right)$, where $L := \max\{a^{m+1}, (1/a)^{m+1}\}$. (7.59)

For $x \ge 1$, (7.58) is proved by $a^x < a^{x+1} < x \cdot a^{x+1} + 1$; for $x < 1$, write $x = p/n$
with $p, n \in \mathbb{N}$ and $p < n$, and apply (7.46) to obtain $a^x < 1 + x(a - 1) < 1 + xa <
1 + x \cdot a^{x+1}$. For the proof of (7.59), first consider $a > 1$. Moreover, by possibly
renaming $x$ and $y$, we may assume $x < y$, i.e. $z := y - x > 0$. Thus, (7.58) holds
with $x$ replaced by $z$. Multiplying the resulting inequality by $a^x$ yields

    $a^x a^z - a^x = a^y - a^x < z \cdot a^x a^{z+1} = (y - x)\, a^{y+1} \le (y - x)\, a^{m+1}$,

proving (7.59) for $a > 1$. For $a = 1$, it is clearly true, and for $a < 1$, it is $a^{-1} > 1$,
i.e.
    $|a^x - a^y| = |(a^{-1})^{-x} - (a^{-1})^{-y}| \le |y - x|\, (a^{-1})^{m+1}$,
finishing the proof of (7.59).

(b) We now define $a^x$ for irrational $x$ by letting

    $a^x := \lim_{n\to\infty} a^{q_n}$, where $(q_n)_{n\in\mathbb{N}}$ is a sequence in $\mathbb{Q}$ with $\lim_{n\to\infty} q_n = x$. (7.60)

For this definition to make sense, we have to know such sequences $(q_n)_{n\in\mathbb{N}}$ exist,
which we do know from Th. 7.68(c). We also know from Th. 7.68(c) that there
exists an increasing sequence $(q_n)_{n\in\mathbb{N}}$ in $\mathbb{Q}$ converging to $x$, in particular, bounded
by $x$. Then, by (7.57c) and (7.57d), respectively, $(a^{q_n})_{n\in\mathbb{N}}$ is increasing for $a > 1$
and decreasing for $0 < a < 1$. Moreover, the sequence is bounded from above by
$a^N$ with $N \in \mathbb{N}$, $N > x$, for $a > 1$; and bounded from below by 0 for $0 < a < 1$.
In both cases, Th. 7.19 implies convergence of the sequence to some limit that we
may call $a^x$. However, we still need to verify that, for each sequence $(r_n)_{n\in\mathbb{N}}$ in $\mathbb{Q}$
with $\lim_{n\to\infty} r_n = x$, the sequence $(a^{r_n})_{n\in\mathbb{N}}$ converges to the same limit $a^x$ in $\mathbb{R}$. If
$\lim_{n\to\infty} r_n = x$, then $\lim_{n\to\infty} |q_n - r_n| = 0$. Since $(r_n)_{n\in\mathbb{N}}$ and $(q_n)_{n\in\mathbb{N}}$ are bounded,
(7.59) implies

    $\exists_{L \in \mathbb{R}^+}\ \forall_{n \in \mathbb{N}}\quad |a^{q_n} - a^{r_n}| \le L\,|q_n - r_n|$, (7.61)

such that Prop. 7.11(a) implies $\lim_{n\to\infty} |a^{q_n} - a^{r_n}| = 0$ and

    $\lim_{n\to\infty} a^{r_n} = \lim_{n\to\infty} (a^{r_n} - a^{q_n} + a^{q_n}) = 0 + a^x = a^x$, (7.62)

showing (7.60) does not depend on the chosen sequence.
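Definition (7.60) can be watched at work for $a = 2$ and $x = \sqrt{2}$. The sketch below is our own illustration: it takes the decimal truncations of x as the increasing rational sequence $(q_n)$, and it lets Python's floating-point power stand in for the exact value $\sqrt[n]{a^k}$ of (7.54). The function names are hypothetical.

```python
import math
from fractions import Fraction


def truncation(x, n):
    """Rational q_n = floor(x * 10^n) / 10^n, so q_n <= x and q_n -> x."""
    return Fraction(math.floor(x * 10 ** n), 10 ** n)


def power_via_rationals(a, x, n):
    """Approximate a^x by a^(q_n), with a float power replacing the exact root."""
    return a ** float(truncation(x, n))


a, x = 2.0, math.sqrt(2)
approximations = [power_via_rationals(a, x, n) for n in range(1, 9)]

# Lipschitz estimate (7.59) on [-2, 2] with L = max(a^3, (1/a)^3) = 8:
L = max(a ** 3, (1 / a) ** 3)
errors_ok = all(
    abs(power_via_rationals(a, x, n) - a ** x) <= L * abs(float(truncation(x, n)) - x)
    for n in range(1, 9)
)
```

For $a > 1$ the approximations increase with n, in line with (7.57c), and the estimate (7.59) controls how fast they approach the limit.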


Proposition 7.70. The exponentiation rules (7.56), the monotonicity rules (7.57), and
the estimates (7.58) and (7.59) remain valid if $x, y \in \mathbb{Q}$ is replaced by $x, y \in \mathbb{R}$.
Moreover, for each $a > 0$ and each sequence $(x_n)_{n\in\mathbb{N}}$ in $\mathbb{R}$:

    $\lim_{n\to\infty} x_n = x \in \mathbb{R} \ \Rightarrow\ \lim_{n\to\infty} a^{x_n} = a^x$. (7.63)

Proof. Given $x, y \in \mathbb{R}$, let $(p_n)_{n\in\mathbb{N}}$ and $(q_n)_{n\in\mathbb{N}}$ be sequences in $\mathbb{Q}$ such that $\lim_{n\to\infty} p_n =
x$ and $\lim_{n\to\infty} q_n = y$.
We start by verifying (7.59). As we can assume $(p_n)_{n\in\mathbb{N}}$ and $(q_n)_{n\in\mathbb{N}}$ to be monotone,
we may also assume $p_n, q_n \in [-m, m]$ for each $n \in \mathbb{N}$. Then the rational case of (7.59)
implies

    $\forall_{n \in \mathbb{N}}\quad |a^{p_n} - a^{q_n}| \le L\,|p_n - q_n|$,

and Th. 7.13(c) establishes the case. Then (7.63) also follows, since

    $0 \le |a^{x_n} - a^x| \le L\,|x_n - x| \to 0$.

We deal with (7.56) next. For each $a, b > 0$:

    $a^{x+y} = \lim_{n\to\infty} a^{p_n + q_n} \overset{(7.56a)}{=} \lim_{n\to\infty} (a^{p_n} a^{q_n}) = a^x a^y$,
    $a^x b^x = \lim_{n\to\infty} a^{p_n} \lim_{n\to\infty} b^{p_n} = \lim_{n\to\infty} (a^{p_n} b^{p_n}) \overset{(7.56b)}{=} \lim_{n\to\infty} (ab)^{p_n} = (ab)^x$,
    $\forall_{k \in \mathbb{N}}\quad (a^x)^{q_k} = \lim_{n\to\infty} (a^{p_n})^{q_k} \overset{(7.56c)}{=} \lim_{n\to\infty} a^{p_n q_k} = a^{x q_k}$,
    $(a^x)^y = \lim_{n\to\infty} (a^x)^{q_n} = \lim_{n\to\infty} a^{x q_n} \overset{(7.59)}{=} a^{xy}$,

thereby proving (7.56).

Proceeding to (7.57c), let $a > 1$ and $h > 0$. If $(q_n)_{n\in\mathbb{N}}$ is an increasing sequence in $\mathbb{Q}^+$
with $\lim_{n\to\infty} q_n = h$, then $a^h = \lim_{n\to\infty} a^{q_n} > a^{q_1} > 1$. Thus, if $x < y$, let $h := y - x > 0$
to obtain $a^y = a^x a^h > a^x$, i.e. (7.57c). If $0 < a < 1$ and $x < y$, then $(1/a)^x < (1/a)^y$,
yielding (7.57d). For (7.57a), consider $x > 0$ and $0 < a < b$. Then

    $\frac{b}{a} > 1 \ \Rightarrow\ \frac{b^x}{a^x} = \left(\frac{b}{a}\right)^x > 1 \ \Rightarrow\ b^x > a^x$,

proving (7.57a). If $x < 0$ and $0 < a < b$, then $a^x = (1/a)^{-x} > (1/b)^{-x} = b^x$, proving
(7.57b).
Finally, it remains to verify (7.58). For $x \ge 1$, the proof for rational $x$ still works for
irrational $x$. For $0 < x < 1$, one uses the usual sequence $(q_n)_{n\in\mathbb{N}}$ in $\mathbb{Q}$ with $\lim_{n\to\infty} q_n = x$
and obtains (recalling $a > 1$)

    $a^x = \lim_{n\to\infty} a^{q_n} \overset{(7.46)}{\le} \lim_{n\to\infty} \left(1 + q_n (a - 1)\right) = 1 + x(a - 1) < 1 + x \cdot a^{x+1}$,

proving (7.58).
Definition 7.71 (Exponential and Power Functions). (a) Each function of the form
    $f : \mathbb{R}^+ \to \mathbb{R}$, $f(x) := x^\alpha$, $\alpha \in \mathbb{R}$, (7.64)
is called a power function. For $\alpha > 0$, the power function is extended to $x = 0$ by
setting $0^\alpha := 0$; for $\alpha \in \mathbb{Z}$, it is defined on $\mathbb{R} \setminus \{0\}$; for $\alpha \in \mathbb{N}_0$ even on $\mathbb{R}$.
(b) Each function of the form
    $f : \mathbb{R} \to \mathbb{R}^+$, $f(x) := a^x$, $a > 0$, (7.65)
is called a (general) exponential function. The case where $a = e$ with $e$ being Euler's
number from (7.51) is of particular interest and importance. Most of the time, when
referring to an exponential function, one actually means $x \mapsto e^x$. It is also common
to write $\exp(x)$ instead of $e^x$.
Theorem 7.72. (a) Every power function as defined in Def. 7.71(a) is continuous on
its respective domain. Moreover, for each $\alpha > 0$, it is strictly increasing on $[0, \infty[$;
for each $\alpha < 0$, it is strictly decreasing on $]0, \infty[$.
(b) Every exponential function as defined in Def. 7.71(b) is continuous. Moreover, for
each $a > 1$, it is strictly increasing; for each $0 < a < 1$, it is strictly decreasing.

Proof. (a): The monotonicity claims are provided by (7.57a) and (7.57b), respectively.
For each $\alpha \in \mathbb{N}_0$, the power function is a polynomial, for each $\alpha \in \mathbb{Z}$, a rational function,
i.e. continuity is provided by Ex. 7.40(b) and Ex. 7.40(c), respectively. For a general
$\alpha \in \mathbb{R}$, the continuity proof on $\mathbb{R}^+$ will be postponed to Ex. 7.76(a) below, where it can
be accomplished more easily. So it remains to show the continuity in $x = 0$ for $\alpha > 0$.
However, if $(x_n)_{n\in\mathbb{N}}$ is a sequence in $\mathbb{R}^+$ with $\lim_{n\to\infty} x_n = 0$ and $k \in \mathbb{N}$ with $1/k \le \alpha$,
then, at least for $n$ sufficiently large such that $x_n \le 1$, $0 < x_n^\alpha \le x_n^{1/k}$ by (7.57d). Then
the continuity of $x \mapsto x^{1/k}$ implies $\lim_{n\to\infty} x_n^{1/k} = 0$ and the Sandwich Th. 7.16 implies
$\lim_{n\to\infty} x_n^\alpha = 0$, proving continuity in $x = 0$.
(b): Everything has already been proved: continuity is provided by (7.63), monotonicity
is provided by (7.57c) and (7.57d).

Remark and Definition 7.73 (Logarithm). According to Th. 7.72(b), for each $a \in
\mathbb{R}^+ \setminus \{1\}$, the exponential function $f : \mathbb{R} \to \mathbb{R}^+$, $f(x) := a^x$, is continuous and strictly
monotone with $f(\mathbb{R}) = \mathbb{R}^+$ (verify that the image is all of $\mathbb{R}^+$ as an exercise). Then
Th. 7.60 implies the existence of a continuous and strictly monotone inverse function
$f^{-1} : \mathbb{R}^+ \to \mathbb{R}$. For each $x \in \mathbb{R}^+$, we call $f^{-1}(x)$ the logarithm of $x$ to base $a$ and write
$\log_a x := f^{-1}(x)$. The most important special case is where the base is Euler's number,
$a = e$. This is called the natural logarithm. Bases $a = 2$ and $a = 10$ also carry special
names, binary and common logarithm, respectively. The notation is

    $\ln x := \log_e x$, $\operatorname{lb} x := \log_2 x$, $\lg x := \log_{10} x$, (7.66)

however, the notation in the literature varies: one finds log used instead of ln, lb, and
lg; one also finds lg instead of lb. So you always need to verify what precisely is meant
by either notation.

Corollary 7.74. For each $a \in \mathbb{R}^+ \setminus \{1\}$, the logarithm function $f : \mathbb{R}^+ \to \mathbb{R}$, $f(x) =
\log_a x$ is continuous. For $a > 1$, it is strictly increasing; for $0 < a < 1$, it is strictly
decreasing.

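Theorem 7.75. One obtains the following logarithm rules:

    $\forall_{a \in \mathbb{R}^+ \setminus \{1\}}\quad \log_a 1 = 0$, (7.67a)
    $\forall_{a \in \mathbb{R}^+ \setminus \{1\}}\quad \log_a a = 1$, (7.67b)
    $\forall_{a \in \mathbb{R}^+ \setminus \{1\}}\ \forall_{x \in \mathbb{R}^+}\quad a^{\log_a x} = x$, (7.67c)
    $\forall_{a \in \mathbb{R}^+ \setminus \{1\}}\ \forall_{x \in \mathbb{R}}\quad \log_a a^x = x$, (7.67d)
    $\forall_{a \in \mathbb{R}^+ \setminus \{1\}}\ \forall_{x, y \in \mathbb{R}^+}\quad \log_a(xy) = \log_a x + \log_a y$, (7.67e)
    $\forall_{a \in \mathbb{R}^+ \setminus \{1\}}\ \forall_{x \in \mathbb{R}^+}\ \forall_{y \in \mathbb{R}}\quad \log_a(x^y) = y \log_a x$, (7.67f)
    $\forall_{a \in \mathbb{R}^+ \setminus \{1\}}\ \forall_{x, y \in \mathbb{R}^+}\quad \log_a(x/y) = \log_a x - \log_a y$, (7.67g)
    $\forall_{a \in \mathbb{R}^+ \setminus \{1\}}\ \forall_{x \in \mathbb{R}^+}\ \forall_{n \in \mathbb{N}}\quad \log_a \sqrt[n]{x} = \frac{1}{n} \log_a x$, (7.67h)
    $\forall_{a, b \in \mathbb{R}^+ \setminus \{1\}}\ \forall_{x \in \mathbb{R}^+}\quad \log_b x = (\log_b a) \log_a x$. (7.67i)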

Proof. All the rules are easy consequences of the logarithm being defined as the inverse
function to $f : \mathbb{R} \to \mathbb{R}^+$, $f(x) := a^x$.
(7.67a): It is $\log_a 1 = f^{-1}(1) = 0$, as $f(0) = a^0 = 1$.
(7.67b): It is $\log_a a = f^{-1}(a) = 1$, as $f(1) = a^1 = a$.
(7.67c): It is $a^{\log_a x} = f(f^{-1}(x)) = x$.
(7.67d): It is $\log_a a^x = f^{-1}(f(x)) = x$.
(7.67e): It is $\log_a(xy) = f^{-1}(xy) = f^{-1}(f(\log_a x + \log_a y)) = \log_a x + \log_a y$, since

    $f(\log_a x + \log_a y) = a^{\log_a x + \log_a y} = a^{\log_a x} a^{\log_a y} \overset{(7.67c)}{=} xy$.

(7.67f): It is $\log_a(x^y) = f^{-1}(x^y) = f^{-1}(f(y \log_a x)) = y \log_a x$, since

    $f(y \log_a x) = a^{y \log_a x} = (a^{\log_a x})^y \overset{(7.67c)}{=} x^y$.

(7.67g) is just a combination of (7.67e) and (7.67f): $\log_a(x/y) = \log_a(x y^{-1}) = \log_a x -
\log_a y$.

(7.67h) is just a special case of (7.67f): $\log_a \sqrt[n]{x} = \log_a x^{1/n} = \frac{1}{n} \log_a x$.
(7.67i): One computes

    $(\log_b a) \log_a x \overset{(7.67f)}{=} \log_b a^{\log_a x} \overset{(7.67c)}{=} \log_b x$.

Thus, we have verified all the rules and concluded the proof.
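The rules (7.67) can be spot-checked numerically. The sketch below is an illustration with arbitrarily chosen sample values; it relies on the two-argument form `math.log(x, a)` of Python's standard library for $\log_a x$ and compares both sides of each rule up to floating-point tolerance.

```python
import math


def log_base(a, x):
    """log_a(x) for a in R^+ without 1 and x in R^+, via math.log(x, a)."""
    return math.log(x, a)


a, b, x, y, n = 3.0, 7.0, 5.0, 2.5, 4

checks = {
    "(7.67a)": math.isclose(log_base(a, 1.0), 0.0, abs_tol=1e-12),
    "(7.67b)": math.isclose(log_base(a, a), 1.0),
    "(7.67c)": math.isclose(a ** log_base(a, x), x),
    "(7.67d)": math.isclose(log_base(a, a ** y), y),
    "(7.67e)": math.isclose(log_base(a, x * y), log_base(a, x) + log_base(a, y)),
    "(7.67f)": math.isclose(log_base(a, x ** y), y * log_base(a, x)),
    "(7.67g)": math.isclose(log_base(a, x / y), log_base(a, x) - log_base(a, y)),
    "(7.67h)": math.isclose(log_base(a, x ** (1 / n)), log_base(a, x) / n),
    "(7.67i)": math.isclose(log_base(b, x), log_base(b, a) * log_base(a, x)),
}
```

Rule (7.67i) is the change-of-base formula; it is the reason why a single logarithm, such as ln, suffices to compute logarithms to every admissible base.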

Example 7.76. (a) For each $\alpha \in \mathbb{R}$, the power function

    $f : \mathbb{R}^+ \to \mathbb{R}$, $f(x) := x^\alpha = e^{\alpha \ln x}$, (7.68)

is continuous, which follows from Th. 7.41, since $f = \exp \circ (\alpha \ln)$, $\ln$ is continuous
by Cor. 7.74, and $\exp$ is continuous by Th. 7.72(b).

(b) As a consequence of Th. 7.41, each of the following functions f1 , f2 , f3 , where



f1 : R R, f1 (x) := exp( + x2 ) ,
1
f2 : R R, f2 (x) := ,
ex
+
x5
f3 : R R, f3 (x) := ,
( + |x|)

is continuous for each R and each R+ .

7.3 Series
7.3.1 Definition and Convergence

Series are a special type of sequences, namely sequences whose members arise from
summing up the members of another sequence. We have, on occasion, already encoun-
tered series, for example the harmonic series (sn )nN , whose members sn were defined
in (7.27). In the present section, we will study series more systematically.

Definition 7.77. Given a sequence $(a_n)_{n\in\mathbb{N}}$ in $\mathbb{K}$ (or, more generally, in any set $A$,
where an addition is defined), the sequence $(s_n)_{n\in\mathbb{N}}$, where

    $\forall_{n \in \mathbb{N}}\quad s_n := \sum_{j=1}^{n} a_j$, (7.69)

is called an (infinite) series and is denoted by

    $\sum_{j=1}^{\infty} a_j := \sum_{j \in \mathbb{N}} a_j := (s_n)_{n\in\mathbb{N}}$. (7.70)

The $a_n$ are called the summands of the series, the $s_n$ its partial sums. Moreover, each
series $\sum_{j=k}^{\infty} a_j$ with $k \in \mathbb{N}$ is called a remainder (series) of the series $(s_n)_{n\in\mathbb{N}}$.

The example of the remainder series already shows that it is useful to allow countable
index sets other than $\mathbb{N}$. Thus, if $(a_j)_{j \in I}$, where $I$ is a countable index set and $\phi : \mathbb{N} \to
I$ a bijective map, then define

    $\sum_{j \in I} a_j := \sum_{j=1}^{\infty} a_{\phi(j)}$ (7.71)

(compare the definition in (3.15c) for finite sums). Note that the definition depends on
$\phi$, which is suppressed in the notation $\sum_{j \in I} a_j$.

For sequences in $\mathbb{K}$, the notion of convergence is available, and, thus, it is also available
for series arising from real or complex sequences (as such series are, again, sequences in
$\mathbb{K}$).

Definition 7.78. If $(s_n)_{n\in\mathbb{N}}$ is a series with the $s_n$ defined as in (7.69) and with
summands $a_j \in \mathbb{K}$, then the series is called convergent with limit $s \in \mathbb{K}$ if, and only if,
$\lim_{n\to\infty} s_n = s$ in the sense of (7.1). In that case, one writes

    $\sum_{j=1}^{\infty} a_j = s$ (7.72)

and calls $s$ the sum of the series. The series is called divergent if, and only if, it is
not convergent. We write $\sum_{j=1}^{\infty} a_j = \infty$ (resp. $\sum_{j=1}^{\infty} a_j = -\infty$) if, and only if, $(s_n)_{n\in\mathbb{N}}$
diverges to $\infty$ (resp. $-\infty$) in the sense of Def. 7.18.

Caveat 7.79. One has to use care as the symbol $\sum_{j=1}^{\infty} a_j$ is used with two completely
different meanings. If it is used according to (7.70), then it means a sequence; if it is
used according to (7.72), then it means a real or complex number (or, possibly, $\infty$ or
$-\infty$). It should always be clear from the context, if it means a sequence or a number.
For example, in the statement "the series $\sum_{j=1}^{\infty} 2^{-j}$ is convergent", it means a sequence;
whereas in the statement "$\sum_{j=1}^{\infty} 2^{-j} = 1$", it means a number.

Example 7.80. (a) For each $q \in \mathbb{C}$ with $|q| < 1$, $\sum_{j=0}^{\infty} q^j$ is called a geometric series.
From (3.18b) (the reader is asked to go back and check that (3.18b) and its proof,
indeed, remain valid for each $q \in \mathbb{C}$), we obtain the partial sums $s_n = \sum_{j=0}^{n} q^j =
\frac{1 - q^{n+1}}{1 - q}$. Since $|q| < 1$, we know $\lim_{n\to\infty} q^{n+1} = 0$ from Ex. 7.6. Thus, the series is
convergent with

    $\forall_{|q| < 1}\quad \sum_{j=0}^{\infty} q^j = \lim_{n\to\infty} s_n = \lim_{n\to\infty} \frac{1 - q^{n+1}}{1 - q} = \frac{1}{1 - q}$. (7.73)

(b) In Ex. 7.30, we obtained the divergence of the harmonic series:

    $\sum_{k=1}^{\infty} \frac{1}{k} = \infty$. (7.74)
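Both examples are easy to explore numerically. The following sketch is our own illustration: it compares partial sums of the geometric series with the closed form in (7.73), including a complex q, and it shows how slowly the partial sums of the harmonic series grow (numerics alone cannot prove divergence; that is the content of Ex. 7.30).

```python
def geometric_partial_sum(q, n):
    """s_n = sum_{j=0}^{n} q^j, computed term by term."""
    total, power = 0, 1
    for _ in range(n + 1):
        total += power
        power *= q
    return total


def geometric_limit(q):
    """Value 1 / (1 - q) of the geometric series for |q| < 1."""
    return 1 / (1 - q)


def harmonic_partial_sum(n):
    """s_n = sum_{k=1}^{n} 1/k."""
    return sum(1 / k for k in range(1, n + 1))


q = 0.3 + 0.4j                      # |q| = 0.5 < 1
gap = abs(geometric_partial_sum(q, 60) - geometric_limit(q))
slow_growth = [harmonic_partial_sum(10 ** e) for e in range(1, 6)]
```

By (7.73), the gap between $s_n$ and the limit equals $|q|^{n+1} / |1 - q|$, so it decays geometrically, whereas the harmonic partial sums increase by roughly $\ln 10 \approx 2.3$ each time n is multiplied by 10.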
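Corollary 7.81. Let $\sum_{j=1}^{\infty} a_j$ and $\sum_{j=1}^{\infty} b_j$ be convergent series in $\mathbb{C}$.

(a) Linearity:

    $\forall_{\lambda, \mu \in \mathbb{C}}\quad \sum_{j=1}^{\infty} (\lambda a_j + \mu b_j) = \lambda \sum_{j=1}^{\infty} a_j + \mu \sum_{j=1}^{\infty} b_j$. (7.75)

(b) Complex Conjugation:

    $\sum_{j=1}^{\infty} \overline{a_j} = \overline{\sum_{j=1}^{\infty} a_j}$. (7.76)

(c) Monotonicity:

    $\left( \forall_{j \in \mathbb{N}}\ a_j, b_j \in \mathbb{R} \ \wedge\ a_j \le b_j \right) \Rightarrow \sum_{j=1}^{\infty} a_j \le \sum_{j=1}^{\infty} b_j$. (7.77)

(d) Each remainder series $\sum_{j=n+1}^{\infty} a_j$, $n \in \mathbb{N}$, converges, and, letting $S := \sum_{j=1}^{\infty} a_j$,
    $s_n := \sum_{j=1}^{n} a_j$, $r_n := \sum_{j=n+1}^{\infty} a_j$, one has

    $\forall_{n \in \mathbb{N}}\ S = s_n + r_n, \qquad \lim_{n\to\infty} a_n = \lim_{n\to\infty} r_n = 0$. (7.78)

Proof. (a) follows from the first two identities of Th. 7.13(a), (b) is due to

    $\sum_{j=1}^{\infty} \overline{a_j} = \lim_{n\to\infty} \sum_{j=1}^{n} \overline{a_j} \overset{\text{Def. and Rem. 5.5(a)}}{=} \lim_{n\to\infty} \overline{\sum_{j=1}^{n} a_j} \overset{(7.11f)}{=} \overline{\lim_{n\to\infty} \sum_{j=1}^{n} a_j} = \overline{\sum_{j=1}^{\infty} a_j}$,

(c) follows from Th. 7.13(c), and, for (d), one computes

    $\lim_{n\to\infty} a_n = \lim_{n\to\infty} (s_n - s_{n-1}) = S - S = 0$,
    $\forall_{n \in \mathbb{N}}\quad r_n = \lim_{k\to\infty} \sum_{j=n+1}^{k} a_j = \lim_{k\to\infty} (s_k - s_n) = S - s_n$,
    $\lim_{n\to\infty} r_n = \lim_{n\to\infty} (S - s_n) = S - S = 0$,

completing the proof.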

7.3.2 Convergence Criteria


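Corollary 7.82. Let $\sum_{j=1}^{\infty} a_j$ be a series such that all $a_j \in \mathbb{R}_0^+$. If $s_n := \sum_{j=1}^{n} a_j$ are the
partial sums of $\sum_{j=1}^{\infty} a_j$, then

    $\lim_{n\to\infty} s_n = \begin{cases} \sup\{s_n : n \in \mathbb{N}\} & \text{if } (s_n)_{n\in\mathbb{N}} \text{ is bounded}, \\ \infty & \text{if } (s_n)_{n\in\mathbb{N}} \text{ is not bounded}. \end{cases}$ (7.79)

Proof. Since $(s_n)_{n\in\mathbb{N}}$ is increasing, (7.79) is a consequence of (7.21).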


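Theorem 7.83. Let $\sum_{j=1}^{\infty} a_j$ and $\sum_{j=1}^{\infty} b_j$ be series in $\mathbb{C}$ such that $|a_j| \le |b_j|$ holds for
each $j \ge k$ for some fixed $k \in \mathbb{N}$.

(a) If $\sum_{j=1}^{\infty} |b_j|$ is convergent, then $\sum_{j=1}^{\infty} a_j$ is convergent as well, and, moreover,

    $\left| \sum_{j=k}^{\infty} a_j \right| \le \sum_{j=k}^{\infty} |b_j|$. (7.80)

(b) If $\sum_{j=1}^{\infty} a_j$ is divergent, then $\sum_{j=1}^{\infty} |b_j|$ is divergent as well.

Proof. Since (b) is merely the contraposition of (a), it suffices to prove (a). To this end,
let $s_n := \sum_{j=1}^{n} a_j$ and $t_n := \sum_{j=1}^{n} |b_j|$ be the partial sums of $\sum_{j=1}^{\infty} a_j$ and $\sum_{j=1}^{\infty} |b_j|$,
respectively. Since $(t_n)_{n\in\mathbb{N}}$ converges, it must be a Cauchy sequence by Th. 7.29. Thus,

    $\forall_{\epsilon \in \mathbb{R}^+}\ \exists_{N \in \mathbb{N},\ N \ge k}\ \forall_{n > m > N}\quad |t_n - t_m| = |b_{m+1}| + \dots + |b_n| < \epsilon$

and the triangle inequality for finite sums implies

    $\forall_{\epsilon \in \mathbb{R}^+}\ \exists_{N \in \mathbb{N},\ N \ge k}\ \forall_{n > m > N}\quad |s_n - s_m| = |a_{m+1} + \dots + a_n| \le |a_{m+1}| + \dots + |a_n| \le |b_{m+1}| + \dots + |b_n| < \epsilon$,

showing $(s_n)_{n\in\mathbb{N}}$ is a Cauchy sequence as well, i.e. convergent by Th. 7.29. Since the
triangle inequality for finite sums also implies $\left| \sum_{j=k}^{n} a_j \right| \le \sum_{j=k}^{n} |b_j|$ for each $n \ge k$,
(7.80) is now a consequence of Th. 7.13(c).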
Definition 7.84. A series $\sum_{j=1}^{\infty} a_j$ in $\mathbb{R}$ is called alternating if, and only if, its summands
alternate between positive and negative signs, i.e. if $\operatorname{sgn}(a_{j+1}) = -\operatorname{sgn}(a_j) \neq 0$ for each
$j \in \mathbb{N}$.

Theorem 7.85 (Leibniz Criterion). Let $\sum_{j=1}^{\infty} a_j$ be an alternating series. If the sequence
$(|a_n|)_{n\in\mathbb{N}}$ of absolute values is strictly decreasing and $\lim_{n\to\infty} a_n = 0$, then the series is
convergent and

    $\forall_{n \in \mathbb{N}}\ \exists_{0 < \theta_n < 1}\quad r_n := \sum_{j=n+1}^{\infty} a_j = \theta_n a_{n+1}$, (7.81)

that means the error made when approximating the limit by the partial sum $s_n$ has the
same sign as the first neglected summand $a_{n+1}$, and its absolute value is less than $|a_{n+1}|$.

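Proof. We first consider the case where $a_1 > 0$, i.e. where there exists a strictly
decreasing sequence of positive numbers $(b_n)_{n\in\mathbb{N}}$ such that $a_n = (-1)^{n+1} b_n$. As the $b_n$ are
strictly decreasing, we obtain $b_n - b_{n+1} > 0$ for each $n \in \mathbb{N}$, such that the sequences
$(u_n)_{n\in\mathbb{N}}$ and $(v_n)_{n\in\mathbb{N}}$, defined by

    $\forall_{n \in \mathbb{N}}\quad u_n := s_{2n} = \sum_{j=1}^{n} (b_{2j-1} - b_{2j}) = (b_1 - b_2) + (b_3 - b_4) + \dots + (b_{2n-1} - b_{2n})$,
    $\forall_{n \in \mathbb{N}}\quad v_n := s_{2n+1} = b_1 - \sum_{j=1}^{n} (b_{2j} - b_{2j+1}) = b_1 - (b_2 - b_3) - (b_4 - b_5) - \dots - (b_{2n} - b_{2n+1})$,

are strictly monotone, namely $(u_n)_{n\in\mathbb{N}}$ strictly increasing and $(v_n)_{n\in\mathbb{N}}$ strictly decreasing.
Since $0 < u_n < u_n + b_{2n+1} = v_n < b_1$ for each $n \in \mathbb{N}$, both sequences $(u_n)_{n\in\mathbb{N}}$ and
$(v_n)_{n\in\mathbb{N}}$ are also bounded, and, thus, convergent by Th. 7.19, i.e. $U := \lim_{n\to\infty} u_n \in \mathbb{R}$
and $V := \lim_{n\to\infty} v_n \in \mathbb{R}$. Since

    $V - U = \lim_{n\to\infty} (v_n - u_n) = \lim_{n\to\infty} (s_{2n+1} - s_{2n}) = \lim_{n\to\infty} a_{2n+1} = 0$,

we obtain $U = V$ and $\lim_{n\to\infty} s_n = U$ and $0 < U < b_1 = a_1$. In particular, there is
$\theta \in ]0, 1[$ satisfying $\sum_{j=1}^{\infty} a_j = \theta a_1$.
In the case $a_1 < 0$, the above proof yields convergence of $-\sum_{j=1}^{\infty} a_j = \sum_{j=1}^{\infty} (-a_j)$ with
$\sum_{j=1}^{\infty} (-a_j) = \theta (-a_1)$ for a suitable $\theta \in ]0, 1[$. However, this then yields, as before,
$\sum_{j=1}^{\infty} a_j = \theta a_1$.
Applying the above result to each remainder series $\sum_{j=n+1}^{\infty} a_j$, $n \in \mathbb{N}$, completes the
proof of (7.81) and the theorem.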
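The error statement (7.81) can be checked on the alternating harmonic series. The sketch below is an illustration only; it uses the standard fact, not proved in this section, that $\sum_{j=1}^{\infty} (-1)^{j+1}/j = \ln 2$, and the function names are our own. It computes $\theta_n = r_n / a_{n+1}$ and confirms that it lies strictly between 0 and 1.

```python
import math


def term(j):
    """a_j = (-1)^(j+1) / j, the summands of the alternating harmonic series."""
    return (-1) ** (j + 1) / j


def partial_sum(n):
    """s_n = a_1 + ... + a_n, summed with math.fsum to limit rounding error."""
    return math.fsum(term(j) for j in range(1, n + 1))


def theta(n):
    """theta_n = r_n / a_(n+1), with r_n = ln 2 - s_n (assumed known limit)."""
    return (math.log(2) - partial_sum(n)) / term(n + 1)


thetas = {n: theta(n) for n in (1, 2, 10, 101, 1000)}
```

For this series the values of $\theta_n$ settle near 1/2, i.e. the error is about half the first neglected summand, which is consistent with, and slightly sharper than, the bound $|r_n| < |a_{n+1}|$.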
Example 7.86. (a) Each of the following alternating series clearly converges, as the
Leibniz criterion of Th. 7.85 clearly applies in each case:

    $\sum_{j=1}^{\infty} \frac{(-1)^{j+1}}{j} = 1 - \frac{1}{2} + \frac{1}{3} - + \dots$, (7.82a)
    $\sum_{j=1}^{\infty} \frac{(-1)^{j+1}}{2j - 1} = 1 - \frac{1}{3} + \frac{1}{5} - + \dots$, (7.82b)
    $\sum_{j=1}^{\infty} \frac{(-1)^{j+1}}{\ln(j + 1)} = \frac{1}{\ln 2} - \frac{1}{\ln 3} + \frac{1}{\ln 4} - + \dots$ (7.82c)

(b) To see that Th. 7.85 is false without its monotonicity requirement, take any
divergent series with $\sum_{j=1}^{\infty} a_j = \infty$, $0 < a_j$, $\lim_{j\to\infty} a_j = 0$ (for example the harmonic
series), any convergent series with $\sum_{j=1}^{\infty} c_j = s \in \mathbb{R}^+$ and $0 < c_j$ (for example any
geometric series with $0 < q < 1$), and define

    $d_n := \begin{cases} a_{(n+1)/2} & \text{for } n \text{ odd}, \\ -c_{n/2} & \text{for } n \text{ even}. \end{cases}$

It is an exercise to show that $\sum_{j=1}^{\infty} d_j$ is an alternating series with $\lim_{n\to\infty} d_n = 0$
and $\sum_{j=1}^{\infty} d_j = \infty$.
Definition 7.87. The series $\sum_{j=1}^{\infty} a_j$ in $\mathbb{C}$ is said to be absolutely convergent if, and
only if, $\sum_{j=1}^{\infty} |a_j|$ is convergent.

Corollary 7.88. Every absolutely convergent series $\sum_{j=1}^{\infty} a_j$ is also convergent and
satisfies the triangle inequality for infinite series:

    $\left| \sum_{j=1}^{\infty} a_j \right| \le \sum_{j=1}^{\infty} |a_j|$. (7.83)

Proof. The corollary is given by the special case $a_j = b_j$ for each $j \in \mathbb{N}$ of Th. 7.83(a).
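Theorem 7.89. We consider the series $\sum_{j=1}^{\infty} a_j$ in $\mathbb{C}$.

(a) If $\sum_{j=1}^{\infty} c_j$ is a convergent series such that $c_j \in \mathbb{R}_0^+$ and $|a_j| \le c_j$ for each $j \in \mathbb{N}$,
    then $\sum_{j=1}^{\infty} a_j$ is absolutely convergent.

(b) Root Test:

    $\exists_{0 < q < 1}\ \left( \sqrt[n]{|a_n|} \le q < 1 \text{ for almost all } n \in \mathbb{N} \right) \ \Rightarrow\ \sum_{j=1}^{\infty} a_j$ is absolutely convergent, (7.84a)
    $\#\left\{ n \in \mathbb{N} : \sqrt[n]{|a_n|} \ge 1 \right\} = \infty \ \Rightarrow\ \sum_{j=1}^{\infty} a_j$ is divergent. (7.84b)

(c) Ratio Test: If all $a_n \neq 0$, then

    $\exists_{0 < q < 1}\ \left( \left| \frac{a_{n+1}}{a_n} \right| \le q < 1 \text{ for almost all } n \in \mathbb{N} \right) \ \Rightarrow\ \sum_{j=1}^{\infty} a_j$ is absolutely convergent, (7.85a)
    $\left| \frac{a_{n+1}}{a_n} \right| \ge 1$ for almost all $n \in \mathbb{N}$ $\ \Rightarrow\ \sum_{j=1}^{\infty} a_j$ is divergent. (7.85b)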

Proof. (a) is just another special case of Th. 7.83(a).

(b): If there is $q \in ]0, 1[$ and $N \in \mathbb{N}$ such that $\sqrt[n]{|a_n|} \le q$ for each $n > N$, i.e. $|a_n| \le q^n$
for each $n > N$, then, by (7.73), $\sum_{j=1}^{\infty} |a_j|$ is bounded by $\frac{1}{1-q} + \sum_{j=1}^{N} |a_j|$ and, thus,
convergent. If $\sqrt[n]{|a_n|} \ge 1$ for infinitely many $n \in \mathbb{N}$, then $|a_n| \ge 1$ for infinitely many
$n \in \mathbb{N}$, showing that $(a_n)_{n\in\mathbb{N}}$ does not converge to 0, proving the divergence of $\sum_{j=1}^{\infty} a_j$.

(c): If there is $q \in ]0, 1[$ and $N \in \mathbb{N}$ such that $\left| \frac{a_{n+1}}{a_n} \right| \le q$ for each $n > N$, then,
letting $C := |a_{N+1}|$, an induction shows $|a_{N+1+k}| \le C q^k$ for each $k \in \mathbb{N}$, i.e., by (7.73),
$\sum_{j=1}^{\infty} |a_j|$ is bounded by $\frac{C}{1-q} + \sum_{j=1}^{N+1} |a_j|$ and, thus, convergent. If there is $N \in \mathbb{N}$ such
that $\left| \frac{a_{n+1}}{a_n} \right| \ge 1$ for each $n > N$, then $|a_n| \ge |a_{N+1}| > 0$ for each $n > N$, showing $(a_n)_{n\in\mathbb{N}}$
does not converge to 0 and proving the divergence of $\sum_{j=1}^{\infty} a_j$.
Caveat 7.90. In (7.84a), it does not suffice to have $\sqrt[n]{|a_n|} < 1$ to conclude convergence,
and, likewise, $\left| \frac{a_{n+1}}{a_n} \right| < 1$ does not suffice in (7.85a): As a counterexample, consider
the harmonic series, which does not converge, but $\sqrt[n]{1/n} < 1$ for each $n \ge 2$ and
$\frac{1/(n+1)}{1/n} = \frac{n}{n+1} < 1$ for each $n \in \mathbb{N}$.
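A short numerical illustration of the caveat, given as a sketch only (the helper names `root_and_ratio` and `harmonic_partial_sum` are hypothetical): for the harmonic series, both the $n$th root and the quotient stay strictly below 1, yet they approach 1, so no fixed bound $q < 1$ exists and the partial sums keep growing.

```python
import math


def root_and_ratio(n: int) -> tuple[float, float]:
    """Return (n-th root of |a_n|, |a_{n+1} / a_n|) for a_n = 1/n."""
    a_n = 1.0 / n
    return a_n ** (1.0 / n), (1.0 / (n + 1)) / a_n


def harmonic_partial_sum(n: int) -> float:
    """Partial sum 1 + 1/2 + ... + 1/n, summed with compensated addition."""
    return math.fsum(1.0 / k for k in range(1, n + 1))


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        print(n, root_and_ratio(n), harmonic_partial_sum(n))
```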
Example 7.91. (a) For each $z \in \mathbb{C}$ with $|z| < 1$ and each $p \in \mathbb{N}_0$, the series $\sum_{n=1}^{\infty} n^p z^n$
is absolutely convergent: We have $\lim_{n\to\infty} \sqrt[n]{n^p} = 1$ as a consequence of Ex. 7.65.
This implies $\lim_{n\to\infty} \sqrt[n]{|a_n|} = \lim_{n\to\infty} \sqrt[n]{n^p |z|^n} = |z| < 1$. Thus, the root test of
(7.84a) applies and proves convergence of the series.

(b) Let $z \in \mathbb{C}$. The series $\sum_{n=1}^{\infty} \frac{z^n\, n!}{n^n}$ is absolutely convergent for $|z| < e$ and divergent
for $|z| \ge e$, where $e$ is Euler's number from (7.51). We have, for each $n \in \mathbb{N}$,
\[ \left| \frac{a_{n+1}}{a_n} \right| = \frac{|z|\,(n+1)\, n^n}{(n+1)^{n+1}} = \frac{|z|}{\left(1 + \frac{1}{n}\right)^n} \to \frac{|z|}{e} \quad \text{for } n \to \infty. \tag{7.86} \]
Thus, the ratio test of (7.85a) applies and proves absolute convergence of the series
for $|z| < e$. For $|z| > e$, (7.85b) applies and proves divergence. Since, according to
Ex. 7.66, $\left(1 + \frac{1}{n}\right)^n < e$ for each $n \in \mathbb{N}$, (7.85b) applies to prove divergence also for
$|z| = e$.

7.3.3 Absolute Convergence and Rearrangements

In general, one has to use care when dealing with infinite series, as convergence properties
and even the limit in case of convergence can depend on the order of the summands (in
obvious contrast to the situation of finite sums). For real series that are convergent,
but not absolutely convergent, one has the striking Riemann rearrangement theorem
(provided as Th. C.2 of the Appendix), that states one can choose an arbitrary number
$S \in \mathbb{R} \cup \{-\infty, \infty\}$ and reorder the summands such that the new series converges to $S$
(actually, Th. C.2 says even more, namely that one can prescribe an entire interval of
cluster points for the rearranged series). However, the situation is better for absolutely
convergent series. In the present section, we will prove results that show the sum of
absolutely convergent series does not depend on the order of the summands.
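The dependence on the order of the summands can be observed numerically. The sketch below uses a standard fact not stated in these notes, namely that the alternating harmonic series (7.82a) has the sum $\ln 2$, whereas the rearrangement taking two positive terms followed by one negative term converges to $\frac{3}{2}\ln 2$. The function names are hypothetical.

```python
import math


def alternating_harmonic(n_terms: int) -> float:
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ... with n_terms summands."""
    return math.fsum((-1) ** (j + 1) / j for j in range(1, n_terms + 1))


def rearranged(n_blocks: int) -> float:
    """Partial sum of the rearrangement 1 + 1/3 - 1/2 + 1/5 + 1/7 - 1/4 + ...

    Block k contributes 1/(4k-3) + 1/(4k-1) - 1/(2k), so every summand of the
    original series is used exactly once.
    """
    return math.fsum(
        1 / (4 * k - 3) + 1 / (4 * k - 1) - 1 / (2 * k)
        for k in range(1, n_blocks + 1)
    )


if __name__ == "__main__":
    print(alternating_harmonic(100000), math.log(2))
    print(rearranged(100000), 1.5 * math.log(2))
```

Both series consist of the same summands, but the limits differ; this is only possible because the series is not absolutely convergent.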
Theorem 7.92. Let $\sum_{j=1}^{\infty} a_j$ and $\sum_{j=1}^{\infty} b_j$ be series in $\mathbb{C}$ such that $(b_n)_{n\in\mathbb{N}}$ is a reordering
of $(a_n)_{n\in\mathbb{N}}$ in the sense of Def. 7.21. If $\sum_{j=1}^{\infty} a_j$ is absolutely convergent, then so is
$\sum_{j=1}^{\infty} b_j$ and $\sum_{j=1}^{\infty} a_j = \sum_{j=1}^{\infty} b_j$.

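Proof. Let $s_n := \sum_{j=1}^{n} a_j$, $\tilde{s}_n := \sum_{j=1}^{n} |a_j|$, and $t_n := \sum_{j=1}^{n} b_j$ denote the respective
partial sums. We will show that $\lim_{n\to\infty} (s_n - t_n) = 0$. Given $\epsilon > 0$, since $(\tilde{s}_n)_{n\in\mathbb{N}}$ is a
Cauchy sequence by Th. 7.29, there exists $N \in \mathbb{N}$, such that
\[ \forall_{n>m>N} \quad |\tilde{s}_n - \tilde{s}_m| = |a_{m+1}| + \dots + |a_n| < \epsilon. \]
Since $(b_n)_{n\in\mathbb{N}}$ is a reordering of $(a_n)_{n\in\mathbb{N}}$, there exists a bijective map $\phi : \mathbb{N} \to \mathbb{N}$ such
that $b_n = a_{\phi(n)}$ for each $n \in \mathbb{N}$. Since $\phi$ is bijective, there exists $M \in \mathbb{N}$ such that
$\{1, 2, \dots, N+1\} \subseteq \phi\{1, 2, \dots, M\}$. Then $n > M$ implies $\phi(n) > N + 1$, and
\[ \forall_{n>M} \ \exists_{k\in\mathbb{N}} \quad |s_n - t_n| \le |a_{N+2}| + \dots + |a_{N+k}| < \epsilon, \]
since all $a_j$ with $j \le N+1$ occur in both $s_n$ and $t_n$ and cancel in $s_n - t_n$ (i.e. all $a_j$ that do
not cancel must have an index $j > N + 1$). So we have shown that $\lim_{n\to\infty} (s_n - t_n) = 0$,
which, in turn, implies
\[ \sum_{j=1}^{\infty} b_j = \lim_{n\to\infty} t_n = \lim_{n\to\infty} (t_n - s_n + s_n) = 0 + \sum_{j=1}^{\infty} a_j = \sum_{j=1}^{\infty} a_j. \]
Applying this to $\tilde{s}_n := \sum_{j=1}^{n} |a_j|$ yields $\sum_{j=1}^{\infty} |b_j| = \sum_{j=1}^{\infty} |a_j|$, proving absolute convergence
of $\sum_{j=1}^{\infty} b_j$. ∎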

Theorem 7.93. Let $I$ be an arbitrary infinite countable index set and let
\[ I = \bigcup_{n\in\mathbb{N}} I_n \tag{7.87} \]
be a disjoint decomposition of $I$ into (empty, finite, or infinite) countable index sets $I_n$.

(a) If the series $\sum_{j\in I} a_j$ (cf. (7.71)) is absolutely convergent, then
\[ \sum_{j\in I} a_j = \sum_{n=1}^{\infty} \sum_{\alpha\in I_n} a_\alpha. \tag{7.88} \]

(b) The following statements are equivalent:

(i) $\sum_{j\in I} a_j$ is absolutely convergent.
(ii) There exists a constant $C \in \mathbb{R}_0^+$ such that $\sum_{j\in J} |a_j| \le C$ for each finite subset
$J$ of $I$.
(iii) $\sum_{n=1}^{\infty} \sum_{\alpha\in I_n} |a_\alpha| < \infty$.

Proof. The proof needs some work and is provided in Appendix C.2. ∎

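Example 7.94. We apply Th. 7.93 to so-called double series, i.e. to series with index
set $I := \mathbb{N} \times \mathbb{N}$. The following notation is common:
\[ \sum_{m,n=1}^{\infty} a_{mn} := \sum_{(m,n)\in\mathbb{N}\times\mathbb{N}} a_{(m,n)}, \tag{7.89} \]
where one writes $a_{mn}$ (also $a_{m,n}$) instead of $a_{(m,n)}$. Recall from Th. 3.24 that $\mathbb{N} \times \mathbb{N}$ is
countable. In general, the convergence properties of the double series and, if it exists,
the value of the sum, will depend on the chosen bijection $\phi : \mathbb{N} \to \mathbb{N} \times \mathbb{N}$.
However, we will now assume our double series to be absolutely convergent. Then Th.
7.92 guarantees the sum does not depend on the chosen bijection and we can apply Th.
7.93. Applying Th. 7.93 to the decompositions
\[ \mathbb{N} \times \mathbb{N} = \bigcup_{m\in\mathbb{N}} \{(m,n) : n \in \mathbb{N}\}, \tag{7.90a} \]
\[ \mathbb{N} \times \mathbb{N} = \bigcup_{n\in\mathbb{N}} \{(m,n) : m \in \mathbb{N}\}, \tag{7.90b} \]
\[ \mathbb{N} \times \mathbb{N} = \bigcup_{k\in\mathbb{N}} \{(m,n) \in \mathbb{N} \times \mathbb{N} : m + n = k\}, \tag{7.90c} \]
yields
\[ \sum_{(m,n)\in\mathbb{N}\times\mathbb{N}} a_{(m,n)} \overset{(7.90a)}{=} \sum_{m=1}^{\infty} \sum_{n=1}^{\infty} a_{mn} \overset{(7.90b)}{=} \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} a_{mn} \overset{(7.90c)}{=} \sum_{k=2}^{\infty} \sum_{m+n=k} a_{mn} := \sum_{k=2}^{\infty} \sum_{m=1}^{k-1} a_{m,k-m}. \tag{7.91} \]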

Theorem 7.95. It is possible to compute the product of two absolutely convergent (real
or complex) series $\sum_{m=1}^{\infty} a_m$ and $\sum_{m=1}^{\infty} b_m$ as a double series:
\[ \left( \sum_{m=1}^{\infty} a_m \right) \left( \sum_{m=1}^{\infty} b_m \right) = \sum_{m,n=1}^{\infty} a_m b_n = \sum_{k=2}^{\infty} \sum_{m=1}^{k-1} a_m b_{k-m} = \sum_{k=2}^{\infty} c_k, \tag{7.92} \]
\[ \text{where } c_k := \sum_{m=1}^{k-1} a_m b_{k-m} = a_1 b_{k-1} + a_2 b_{k-2} + \dots + a_{k-1} b_1. \]
This form of computing the product is known as a Cauchy product.

Proof. We first show that $\sum_{m,n=1}^{\infty} a_m b_n$ is absolutely convergent: By letting $A :=
\sum_{m=1}^{\infty} |a_m|$ and $B := \sum_{m=1}^{\infty} |b_m|$, we obtain
\[ \sum_{m=1}^{\infty} \sum_{n=1}^{\infty} |a_m b_n| = \sum_{m=1}^{\infty} \big( |a_m|\, B \big) = AB < \infty, \]
i.e. $\sum_{m,n=1}^{\infty} a_m b_n$ is absolutely convergent according to Th. 7.93(b)(iii). Now the second
equality in (7.92) is just the third equality in (7.91), and the first equality in (7.92) also
follows from (7.91):
\[ \sum_{m,n=1}^{\infty} a_m b_n = \sum_{m=1}^{\infty} \sum_{n=1}^{\infty} a_m b_n = \sum_{m=1}^{\infty} \left( a_m \sum_{n=1}^{\infty} b_n \right) = \left( \sum_{m=1}^{\infty} a_m \right) \left( \sum_{m=1}^{\infty} b_m \right), \]
completing the proof. ∎

Theorem 7.95 will be useful in Sec. 8.2 below.
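The Cauchy product formula (7.92) can be checked numerically on simple examples. The following sketch assumes the two geometric series $a_m = 2^{-m}$ and $b_m = 3^{-m}$ (with sums $1$ and $\frac{1}{2}$, respectively); the function name `cauchy_product_partial` is hypothetical.

```python
from typing import Callable


def cauchy_product_partial(
    a: Callable[[int], float], b: Callable[[int], float], k_max: int
) -> float:
    """Sum of c_k for k = 2..k_max, where c_k = sum_{m=1}^{k-1} a(m) * b(k - m)."""
    total = 0.0
    for k in range(2, k_max + 1):
        total += sum(a(m) * b(k - m) for m in range(1, k))
    return total


def a(m: int) -> float:
    """Assumed example: a_m = 2**-m, so the series sums to 1."""
    return 0.5 ** m


def b(m: int) -> float:
    """Assumed example: b_m = 3**-m, so the series sums to 1/2."""
    return (1.0 / 3.0) ** m


if __name__ == "__main__":
    print(cauchy_product_partial(a, b, 80))  # close to 1 * 1/2
```

Note that the indices start at 1 here, matching (7.92); for power series the analogous formula with indices starting at 0 is used.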

7.3.4 b-Adic Representations of Real Numbers

We are mostly used to representing real numbers in the decimal system. For example,
we write
\[ x = \frac{395}{3} = 131.\overline{6} = 1 \cdot 10^2 + 3 \cdot 10^1 + 1 \cdot 10^0 + \sum_{n=1}^{\infty} 6 \cdot 10^{-n}, \tag{7.93a} \]
where
\[ \sum_{n=1}^{\infty} 6 \cdot 10^{-n} \overset{(7.73)}{=} 6 \cdot \left( \frac{1}{1 - \frac{1}{10}} - 1 \right) = 6 \cdot \frac{1}{9} = \frac{2}{3}. \]
The decimal system represents real numbers as, in general, infinite series of decimal
fractions. Digital computers represent numbers in the dual system, using base 2 instead
of 10. For example, the number from (7.93a) has the dual representation
\[ x = 10000011.\overline{10} = 2^7 + 2^1 + 2^0 + \sum_{n=0}^{\infty} 2^{-(2n+1)}, \tag{7.93b} \]
where it is an exercise to verify
\[ \sum_{n=0}^{\infty} 2^{-(2n+1)} = \frac{2}{3}. \]
Representations with base 16 (hexadecimal) and 8 (octal) are also of importance when
working with digital computers. More generally, each natural number $b \ge 2$ can be used
as a base.
Definition 7.96. Let $b \ge 2$ be a natural number.

(a) Given an integer $N \in \mathbb{Z}$ and a sequence $(d_N, d_{N-1}, d_{N-2}, \dots)$ in $\{0, \dots, b-1\}$, the
series
\[ \sum_{\nu=0}^{\infty} d_{N-\nu}\, b^{N-\nu} \tag{7.94} \]
is called a b-adic series. The number $b$ is called the base or the radix, and the
numbers $d_\nu$ are called digits.

(b) If $x \in \mathbb{R}_0^+$ is the sum of the b-adic series given by (7.94), then one calls the b-adic
series a b-adic representation or a b-adic expansion of $x$.

Theorem 7.97. Given a natural number $b \ge 2$ and a nonnegative real number $x \in
\mathbb{R}_0^+$, there exists a b-adic series representing $x$, i.e. there is $N \in \mathbb{Z}$ and a sequence
$(d_N, d_{N-1}, d_{N-2}, \dots)$ in $\{0, \dots, b-1\}$ such that
\[ x = \sum_{\nu=0}^{\infty} d_{N-\nu}\, b^{N-\nu}. \tag{7.95} \]
If one introduces the additional requirement that $0 \ne d_N$, then each $x > 0$ has either a
unique b-adic representation or precisely two b-adic representations. More precisely, for
$0 \ne d_N$ and $x > 0$, the following statements are equivalent:

(i) The b-adic representation of $x$ is not unique.

(ii) There are precisely two b-adic representations of $x$.

(iii) There exists a b-adic representation of $x$ such that $d_n = 0$ for each $n \le n_0$ for
some $n_0 < N$.

(iv) There exists a b-adic representation of $x$ such that $d_n = b - 1$ for each $n \le n_0$ for
some $n_0 \le N$.

Proof. The proof is a bit lengthy and is provided in Appendix C.3. ∎

Example 7.98. Every natural number has precisely two decimal (i.e. 10-adic) representations. For instance,
\[ 2 = 2.\overline{0} = 1.\overline{9} = 1 + 9 \sum_{n=1}^{\infty} 10^{-n} \overset{(7.73)}{=} 1 + 9 \left( \frac{1}{1 - \frac{1}{10}} - 1 \right) = 1 + 9 \cdot \frac{1}{9}, \tag{7.96} \]
and analogously for all other natural numbers.
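For rational numbers, the digits of a b-adic representation can be computed exactly. The following is a minimal sketch (the function `b_adic_digits` and its signature are hypothetical), using Python's `fractions.Fraction` to avoid rounding. It uses the usual greedy procedure, which, for numbers with two representations, produces the one ending in zeros (e.g. $2.000\dots$ rather than $1.999\dots$).

```python
from fractions import Fraction


def b_adic_digits(x: Fraction, b: int, n_frac: int) -> tuple[list[int], list[int]]:
    """Return (digits of the integer part, first n_frac fractional digits) of x in base b."""
    if x < 0 or b < 2:
        raise ValueError("need x >= 0 and base b >= 2")
    integer = x.numerator // x.denominator   # floor of x
    frac = x - integer                        # fractional part in [0, 1[

    int_digits: list[int] = []
    while integer > 0:
        integer, digit = divmod(integer, b)
        int_digits.append(digit)
    int_digits = int_digits[::-1] or [0]      # most significant digit first

    frac_digits: list[int] = []
    for _ in range(n_frac):
        frac *= b
        digit = frac.numerator // frac.denominator
        frac_digits.append(digit)
        frac -= digit
    return int_digits, frac_digits


if __name__ == "__main__":
    print(b_adic_digits(Fraction(395, 3), 10, 5))
    print(b_adic_digits(Fraction(395, 3), 2, 6))
```

Applied to $x = 395/3$, this reproduces the digit patterns of (7.93a) and (7.93b).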

8 Convergence of K-Valued Functions

8.1 Pointwise and Uniform Convergence


So far we have studied the convergence of sequences in $\mathbb{K}$. We will now also need to
study the convergence of sequences $(f_n)_{n\in\mathbb{N}}$, where each member $f_n$ of the sequence is
a function $f_n : M \to \mathbb{K}$, $M \subseteq \mathbb{C}$. Here, for the first time, we encounter the situation
that there exist different useful notions of convergence for such sequences.

Definition 8.1. Let $(f_n)_{n\in\mathbb{N}}$ be a sequence of functions, $f_n : M \to \mathbb{K}$, $\emptyset \ne M \subseteq \mathbb{C}$.

(a) We say $(f_n)_{n\in\mathbb{N}}$ converges pointwise to $f : M \to \mathbb{K}$ if, and only if, $\lim_{n\to\infty} f_n(z) =
f(z)$ for each $z \in M$, i.e. if, and only if,
\[ \forall_{z\in M} \ \forall_{\epsilon\in\mathbb{R}^+} \ \exists_{N\in\mathbb{N}} \ \forall_{n>N} \quad |f_n(z) - f(z)| < \epsilon. \tag{8.1} \]
So, in general, $N$ in (8.1) depends on both $z$ and $\epsilon$.

(b) We say $(f_n)_{n\in\mathbb{N}}$ converges uniformly to $f : M \to \mathbb{K}$ if, and only if,
\[ \forall_{\epsilon\in\mathbb{R}^+} \ \exists_{N\in\mathbb{N}} \ \forall_{n>N} \ \forall_{z\in M} \quad |f_n(z) - f(z)| < \epsilon. \tag{8.2} \]
In (8.2), $N$ is still allowed to depend on $\epsilon$, but, in contrast to the situation of (8.1),
not on $z$; in that sense, the convergence is uniform in $z$.
Remark 8.2. It is immediate from Def. 8.1(a),(b) that uniform convergence implies
pointwise convergence, but Ex. 8.3(b) below will show the converse is not true.
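Example 8.3. (a) Let $\emptyset \ne M \subseteq \mathbb{C}$ (for example $M = [0,1]$ or $M = B_1(0)$), and
$f_n : M \to \mathbb{K}$, $f_n(z) = 1/n$ for each $n \in \mathbb{N}$. Then, clearly, $(f_n)_{n\in\mathbb{N}}$ converges
uniformly to $f \equiv 0$.
(b) The sequence $(f_n)_{n\in\mathbb{N}}$, where $f_n : [0,1] \to \mathbb{R}$, $f_n(x) := x^n$, converges pointwise,
but not uniformly, to
\[ f : [0,1] \to \mathbb{R}, \quad f(x) := \begin{cases} 0 & \text{for } 0 \le x < 1, \\ 1 & \text{for } x = 1: \end{cases} \tag{8.3} \]
For $x = 1$, $\lim_{n\to\infty} x^n = \lim_{n\to\infty} 1 = 1$, and, for $0 \le x < 1$, $\lim_{n\to\infty} x^n = 0$ by Ex.
7.6. To see that the convergence is not uniform, consider $\epsilon := \frac{1}{2}$. Then, for every
$n \in \mathbb{N}$, according to the intermediate value Th. 7.57, there exists $\xi_n \in ]0,1[$ such
that $f_n(\xi_n) = \xi_n^n = \frac{1}{2}$, i.e.
\[ \forall_{n\in\mathbb{N}} \quad |f_n(\xi_n) - f(\xi_n)| = \xi_n^n = \frac{1}{2} = \epsilon, \tag{8.4} \]
proving the convergence is not uniform.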
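The difference between the two notions in Ex. 8.3(b) can be made concrete with a small computation. In the sketch below (all names hypothetical), the point $\xi_n = 2^{-1/n}$ is the explicit solution of $\xi_n^n = \frac{1}{2}$: at a fixed $x < 1$ the deviation $|f_n(x) - f(x)|$ tends to 0, but at the moving point $\xi_n$ it stays at $\frac{1}{2}$.

```python
def f_limit(x: float) -> float:
    """Pointwise limit of x**n on [0, 1], cf. (8.3)."""
    return 1.0 if x == 1.0 else 0.0


def deviation(n: int, x: float) -> float:
    """|f_n(x) - f(x)| for f_n(x) = x**n."""
    return abs(x ** n - f_limit(x))


def xi(n: int) -> float:
    """Explicit point in ]0, 1[ with xi(n)**n = 1/2 (up to rounding)."""
    return 0.5 ** (1.0 / n)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, deviation(n, 0.9), xi(n), deviation(n, xi(n)))
```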
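Theorem 8.4. Let $(f_n)_{n\in\mathbb{N}}$ be a sequence of functions, $f_n : M \to \mathbb{K}$, $\emptyset \ne M \subseteq \mathbb{C}$. If
$(f_n)_{n\in\mathbb{N}}$ converges uniformly to $f : M \to \mathbb{K}$ and all $f_n$ are continuous at $\xi \in M$, then
$f$ is also continuous at $\xi$. In particular, if each $f_n$ is continuous, then so is $f$ (uniform
limits of continuous functions are continuous).

Proof. Let $\epsilon > 0$. Due to the uniform convergence of $(f_n)_{n\in\mathbb{N}}$,
\[ \exists_{m\in\mathbb{N}} \ \forall_{z\in M} \quad |f_m(z) - f(z)| < \frac{\epsilon}{3}. \tag{8.5} \]
Due to the continuity of $f_m$ in $\xi$,
\[ \exists_{\delta>0} \ \forall_{z\in M\cap B_\delta(\xi)} \quad |f_m(z) - f_m(\xi)| < \frac{\epsilon}{3}. \tag{8.6} \]
Thus,
\[ \forall_{z\in M\cap B_\delta(\xi)} \quad |f(z) - f(\xi)| \le |f(z) - f_m(z)| + |f_m(z) - f_m(\xi)| + |f_m(\xi) - f(\xi)| < 3 \cdot \frac{\epsilon}{3} = \epsilon, \tag{8.7} \]
proving continuity of $f$ in $\xi$. ∎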

8.2 Power Series


Definition 8.5. (a) In Def. 7.77, it was mentioned that series can be formed from each
sequence in a set $A$, where an addition is defined. Letting $\emptyset \ne M \subseteq \mathbb{C}$, we now
consider $A := \mathcal{F}(M, \mathbb{K})$, i.e. the set of functions from $M$ into $\mathbb{K}$. Then the addition
on $A$ is defined according to (6.1a) and, given a sequence of functions $(f_n)_{n\in\mathbb{N}}$ in $A$,
the series
\[ \sum_{j=1}^{\infty} f_j := (s_n)_{n\in\mathbb{N}} \tag{8.8} \]
is defined as the sequence of partial sums $s_n := \sum_{j=1}^{n} f_j$.
(b) Given a sequence of functions $(f_n)_{n\in\mathbb{N}}$, where $f_n : \mathbb{K} \to \mathbb{K}$, $f_n(z) = a_n z^n$ with
$a_n \in \mathbb{K}$, then
\[ \sum_{j=0}^{\infty} a_j z^j := \sum_{j=0}^{\infty} f_j \tag{8.9} \]
is called a power series and the $a_j$ are called the coefficients of the power series.
Note: The notation $\sum_{j=0}^{\infty} a_j z^j$ introduced in (8.9) is very common, but not entirely
correct, since one writes $a_j z^j = f_j(z)$ for the summands, even though one actually
means $f_j$. Moreover, one uses the same notation if one actually does mean the series
$\sum_{j=0}^{\infty} f_j(z)$ in $\mathbb{K}$, so one has to see from the context if $\sum_{j=0}^{\infty} a_j z^j$ means a series of
$\mathbb{K}$-valued functions or a series of numbers.
Definition 8.6. Consider a series of $\mathbb{K}$-valued functions $\sum_{j=1}^{\infty} f_j$ as in Def. 8.5(a), in
particular, $s_n := \sum_{j=1}^{n} f_j$ for each $n \in \mathbb{N}$.

(a) The series converges pointwise to $f : M \to \mathbb{K}$ if, and only if, it (i.e. $(s_n)_{n\in\mathbb{N}}$)
converges pointwise in the sense of Def. 8.1(a). In that case, we use the notation
\[ f = \sum_{j=1}^{\infty} f_j. \tag{8.10} \]
If (8.10) holds, then the series is sometimes called a series expansion of $f$, in particular,
a power series expansion if the series happens to be a power series.
Analogous to the situation of series in $\mathbb{K}$, the notation $\sum_{j=1}^{\infty} f_j$ is also used with
two different meanings: it can mean the sequence of partial sums as in (8.8) or, in
the case of convergent series, the limit function as in (8.10) (cf. Caveat 7.79).
(b) The series converges uniformly to $f : M \to \mathbb{K}$ if, and only if, it converges uniformly
in the sense of Def. 8.1(b).
Corollary 8.7. Consider a function series $\sum_{j=1}^{\infty} f_j$ with $f_j : M \to \mathbb{K}$, $\emptyset \ne M \subseteq \mathbb{C}$.

(a) The series converges uniformly to some $f : M \to \mathbb{K}$ if, and only if, for each
$n \in \mathbb{N}$ and each $z \in M$, the remainder series $\sum_{j=n+1}^{\infty} f_j(z)$ in $\mathbb{K}$ converges to some
$r_n(z) \in \mathbb{K}$ such that
\[ \forall_{\epsilon\in\mathbb{R}^+} \ \exists_{N\in\mathbb{N}} \ \forall_{n>N} \ \forall_{z\in M} \quad |r_n(z)| < \epsilon. \tag{8.11} \]

(b) If $\sum_{j=1}^{\infty} a_j$ is a convergent series in $\mathbb{R}_0^+$, then the condition
\[ \forall_{z\in M} \ \forall_{j\in\mathbb{N}} \quad |f_j(z)| \le a_j \tag{8.12} \]
implies uniform convergence of $\sum_{j=1}^{\infty} f_j$.

(c) If each $f_j$ is continuous in $\xi \in M$ and the series converges uniformly to $f : M \to
\mathbb{K}$, then $f$ is continuous in $\xi$. In particular, if each $f_j$ is continuous, then $f$ is
continuous.

Proof. (a): If $\sum_{j=1}^{\infty} f_j$ converges uniformly to $f$, then $f(z) = \sum_{j=1}^{\infty} f_j(z)$ holds for each
$z \in M$, and $r_n(z) = f(z) - s_n(z)$ for each $n \in \mathbb{N}$, $z \in M$ according to (7.78), where
$s_n(z) := \sum_{j=1}^{n} f_j(z)$. Then (8.11) is just (8.2), where the $s_n$ now play the role of the $f_n$
in (8.2). Conversely, if the remainder series converge for each $z \in M$, then we can define
$f : M \to \mathbb{K}$, $f(z) := f_1(z) + r_1(z) = \sum_{j=1}^{\infty} f_j(z)$. Then, once again, $r_n(z) = f(z) - s_n(z)$
for each $n \in \mathbb{N}$, $z \in M$, and (8.11) is just (8.2), yielding the uniform convergence of the
series.

(b): First, (8.12) implies each remainder series $\sum_{j=n+1}^{\infty} f_j(z)$ converges absolutely. Thus,
with $r_n(z)$ as in (a),
\[ \forall_{z\in M} \quad |r_n(z)| \overset{(7.83)}{\le} \sum_{j=n+1}^{\infty} |f_j(z)| \le \sum_{j=n+1}^{\infty} a_j \to 0 \quad \text{for } n \to \infty, \]
such that (a) yields uniform convergence.

(c) is immediate from Th. 8.4. ∎
Remark 8.8. Given a function series $\sum_{j=1}^{\infty} f_j$ with $f_j : M \to \mathbb{K}$, $\emptyset \ne M \subseteq \mathbb{C}$; for each
$z \in M$, $\sum_{j=1}^{\infty} f_j(z)$ constitutes a series in $\mathbb{K}$. Typically, one will only have convergence
of $\sum_{j=1}^{\infty} f_j(z)$ in $\mathbb{K}$ on a subset $C \subseteq M$. The series then converges pointwise in the
sense of Def. 8.6(a) if all $f_j$ are restricted to $C$. It can be very difficult to determine
if $\sum_{j=1}^{\infty} f_j(z)$ converges or diverges for some $z \in M$, and such investigations are often
of particular interest in the context of function series. Even for power series, studying
convergence can still be difficult, but the availability of the following Th. 8.9 does help
to (at least partially) settle the question in many cases.
Theorem 8.9. For each power series $\sum_{j=0}^{\infty} a_j z^j$, $a_j \in \mathbb{K}$, there exists a number $r \in
[0, \infty] := \mathbb{R}_0^+ \cup \{\infty\}$, called the radius of convergence of the power series, such that
\[ \forall_{z\in\mathbb{K}} \ \Big( |z| < r \;\Rightarrow\; \sum_{j=0}^{\infty} a_j z^j \text{ converges absolutely in } \mathbb{K} \Big), \tag{8.13a} \]
\[ \forall_{z\in\mathbb{K}} \ \Big( |z| > r \;\Rightarrow\; \sum_{j=0}^{\infty} a_j z^j \text{ diverges in } \mathbb{K} \Big) \tag{8.13b} \]
(for $r = \infty$, (8.13a) claims absolute convergence for each $z \in \mathbb{K}$). In particular,
$\sum_{j=0}^{\infty} a_j z^j$ converges pointwise in the sense of Def. 8.6(a) for each $z \in B_r(0)$ (cf. Def.
7.7(a)). Moreover,
\[ \forall_{0<r_0<r} \ \Big( \sum_{j=0}^{\infty} a_j z^j \text{ converges uniformly on } \overline{B}_{r_0}(0) \text{ (cf. Ex. 7.47(a)) in the sense of Def. 8.6(b)} \Big). \tag{8.14} \]
For the radius of convergence, one has the formula
\[ r = \frac{1}{L}, \quad \text{where } L := \limsup_{n\to\infty} \sqrt[n]{|a_n|}. \tag{8.15} \]
In (8.15), lim sup denotes the so-called limit superior, which is defined as the largest
cluster point of the sequence $(\sqrt[n]{|a_n|})_{n\in\mathbb{N}}$ if the sequence is bounded (cf. Th. 7.27) and
$\infty$ if the sequence is unbounded. As the limit superior can be $0$ or $\infty$, we also define
$1/0 := \infty$ and $1/\infty := 0$ in (8.15).
One has the simpler formula
\[ r = \lim_{n\to\infty} \left| \frac{a_n}{a_{n+1}} \right|, \tag{8.16} \]
provided all $a_n$ are nonzero and provided the limit in (8.16) either exists in $\mathbb{R}_0^+$ or is $\infty$.

Proof. For the proof of (8.15), we apply the root test from Th. 7.89(b). Here, for the
root test, we have to consider the sequence $(\sqrt[n]{|a_n||z|^n})_{n\in\mathbb{N}}$. As a consequence of (7.11a)
and Prop. 7.26, $\limsup_{n\to\infty} (\lambda x_n) = \lambda \limsup_{n\to\infty} x_n$ for each $\lambda > 0$ and each sequence
$(x_n)_{n\in\mathbb{N}}$ in $\mathbb{R}$ (with $\lambda \cdot \infty := \infty$, this also holds if the limit superior is infinite). Thus,
\[ \limsup_{n\to\infty} \sqrt[n]{|a_n||z|^n} = |z| \limsup_{n\to\infty} \sqrt[n]{|a_n|} = |z|\, L. \]
If $|z| > 1/L$, then $|z|\, L > 1$ and (7.84b) applies, i.e. (8.13b) holds for $r = 1/L$. If
$|z| < 1/L$, then $|z|\, L < 1$, and, recalling the Bolzano-Weierstrass Th. 7.27, one sees that
(7.84a) applies, i.e. (8.13a) holds for $r = 1/L$.
Next, if $0 < r_0 < r$, then $\sum_{j=0}^{\infty} |a_j r_0^j|$ converges according to (8.13a). Since, for each
$z \in \overline{B}_{r_0}(0)$ and each $j \in \mathbb{N}$, we have $|a_j z^j| \le |a_j r_0^j|$, (8.14) is a consequence of Cor.
8.7(b).
The validity of (8.16) follows from the ratio test of Th. 7.89(c): If all $a_n \ne 0$ and $z \ne 0$,
then
\[ \lim_{n\to\infty} \left| \frac{a_{n+1} z^{n+1}}{a_n z^n} \right| = |z| \lim_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right| = \frac{|z|}{\lim_{n\to\infty} \left| \frac{a_n}{a_{n+1}} \right|}. \]
If $|z| < l := \lim_{n\to\infty} \left| \frac{a_n}{a_{n+1}} \right|$, then $|z|/l < 1$, i.e. (7.85a) applies, proving (8.13a) for $r = l$.
If $|z| > l$, then $|z|/l > 1$, i.e. (7.85b) applies, proving (8.13b) for $r = l$. ∎

Corollary 8.10. If $\sum_{j=0}^{\infty} a_j z^j$, $a_j \in \mathbb{K}$, is a power series with radius of convergence
$r \in ]0, \infty]$, then the function
\[ f : B_r(0) \to \mathbb{K}, \quad f(z) := \sum_{j=0}^{\infty} a_j z^j, \tag{8.17} \]
is continuous. In particular, if $r = \infty$, then $f$ is continuous on $\mathbb{K}$.

Proof. Each partial sum $z \mapsto \sum_{j=0}^{n} a_j z^j$ is a polynomial, i.e. continuous on $\mathbb{K}$. Moreover,
if $\xi \in B_r(0)$, then the power series converges uniformly on $M := \overline{B}_{|\xi|}(0)$ by (8.14), i.e.
it is continuous at $\xi \in M$ by Th. 8.4. ∎
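Example 8.11. (a) For each $\alpha \in \mathbb{R}$, the radius of convergence of $\sum_{n=1}^{\infty} n^\alpha z^n$ is $r = 1$,
since
\[ \limsup_{n\to\infty} \sqrt[n]{|a_n|} = \lim_{n\to\infty} \sqrt[n]{n^\alpha} = 1, \tag{8.18} \]
which, for each $\alpha \in \mathbb{Z}$, follows from (7.48) and Th. 7.13(a), and, then, for all $\alpha \in \mathbb{R}$
from the Sandwich Th. 7.16.
Let us investigate what can happen for $|z| = r = 1$ for some cases: The series
$\sum_{n=1}^{\infty} z^n$ ($\alpha = 0$) is divergent for each $z \in \mathbb{C}$ with $|z| = 1$ by the observation that
$(z^n)_{n\in\mathbb{N}}$ does not converge to 0 for $n \to \infty$ (as $|z^n| = 1$ for each $n \in \mathbb{N}$); the
series $\sum_{n=1}^{\infty} n^{-1} z^n$ ($\alpha = -1$) is the harmonic series, i.e. divergent, for $z = 1$, but
convergent for $z = -1$ according to Ex. 7.86(a).
(b) The radius of convergence of both $\sum_{n=0}^{\infty} \frac{z^n}{n!}$ and $\sum_{n=0}^{\infty} \frac{z^n}{n^n}$ is $r = \infty$ by (8.16) and
(8.15), respectively, since
\[ \lim_{n\to\infty} \left| \frac{a_n}{a_{n+1}} \right| = \lim_{n\to\infty} \frac{(n+1)!}{n!} = \lim_{n\to\infty} (n+1) = \infty, \tag{8.19a} \]
\[ \limsup_{n\to\infty} \sqrt[n]{|a_n|} = \lim_{n\to\infty} \sqrt[n]{\frac{1}{n^n}} = \lim_{n\to\infty} \frac{1}{n} = 0. \tag{8.19b} \]
(c) The radius of convergence of $\sum_{n=0}^{\infty} n!\, z^n$ is $r = 0$ by (8.16), since
\[ \lim_{n\to\infty} \left| \frac{a_n}{a_{n+1}} \right| = \lim_{n\to\infty} \frac{n!}{(n+1)!} = \lim_{n\to\infty} \frac{1}{n+1} = 0. \tag{8.20} \]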
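The quotients $|a_n / a_{n+1}|$ from (8.16) can be evaluated exactly for the coefficient sequences of this example. The sketch below (the function `ratio_estimate` is hypothetical) uses exact rational arithmetic; a single quotient at a finite index is of course only an indication of the limit, not a proof.

```python
from fractions import Fraction
from math import factorial
from typing import Callable


def ratio_estimate(a: Callable[[int], Fraction], n: int) -> Fraction:
    """Exact value of |a_n / a_{n+1}|, the n-th term of the sequence in (8.16)."""
    return abs(Fraction(a(n)) / Fraction(a(n + 1)))


if __name__ == "__main__":
    n = 50
    # a_n = 1/n!: quotients n + 1, unbounded, suggesting r = infinity
    print(ratio_estimate(lambda k: Fraction(1, factorial(k)), n))
    # a_n = n!: quotients 1/(n + 1), tending to 0, suggesting r = 0
    print(ratio_estimate(lambda k: Fraction(factorial(k)), n))
    # a_n = n**2: quotients (n/(n + 1))**2, tending to 1, suggesting r = 1
    print(float(ratio_estimate(lambda k: Fraction(k ** 2), n)))
```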
Caveat 8.12. Theorem 8.9 does not claim the uniform convergence of $\sum_{j=0}^{\infty} a_j z^j$ on
$B_r(0)$, which is usually not true (e.g., it is an exercise to show that $\sum_{j=0}^{\infty} z^j$ does not
converge uniformly on $B_1(0)$). Theorem 8.9 also claims nothing about the convergence
or divergence of $\sum_{j=0}^{\infty} a_j z^j$ for $|z| = r$, which has to be determined case by case (cf. Ex.
8.11(a)).

Definition and Remark 8.13. Given two power series $p := \sum_{j=0}^{\infty} a_j z^j$ and $q :=
\sum_{j=0}^{\infty} b_j z^j$ in $\mathbb{K}$, we define their Cauchy product
\[ p * q := \sum_{j=0}^{\infty} c_j z^j, \quad \text{where } c_j := \sum_{k=0}^{j} a_k b_{j-k} = a_0 b_j + a_1 b_{j-1} + \dots + a_j b_0. \tag{8.21} \]
Note that we have not assumed any convergence of the series so far, i.e. $p$, $q$, and $p * q$
are not $\mathbb{K}$-valued functions, but sequences of $\mathbb{K}$-valued functions according to Def. 8.5
(sequences of polynomials, actually). Sometimes one also calls the Cauchy product $p * q$
the convolution of $p$ and $q$.
Now, if we do assume $p$ and $q$ to have some nonzero radii of convergence, say $r_p, r_q \in
]0, \infty]$, respectively, then, by (8.13a), both series are absolutely convergent for each
$z \in B_r(0)$, where $r := \min\{r_p, r_q\}$. Thus, the functions
\[ f : B_r(0) \to \mathbb{K}, \quad f(z) := \sum_{j=0}^{\infty} a_j z^j, \qquad g : B_r(0) \to \mathbb{K}, \quad g(z) := \sum_{j=0}^{\infty} b_j z^j, \tag{8.22} \]
are well-defined, and (7.92) implies
\[ \forall_{z\in B_r(0)} \quad f(z)\, g(z) = \sum_{j=0}^{\infty} c_j z^j \quad \text{with } c_j \text{ as in (8.21)}. \tag{8.23} \]
8.3 Exponential Functions


The notion of power series allows us to extend the definition of exponential functions to
complex arguments:
Definition and Remark 8.14. We define the exponential function
\[ \exp : \mathbb{C} \to \mathbb{C}, \quad \exp(z) := \sum_{n=0}^{\infty} \frac{z^n}{n!} = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \dots \tag{8.24} \]
From Ex. 8.11(b), we already know the radius of convergence of the power series in
(8.24) is $\infty$, such that the function in (8.24) is well-defined.
For the time being, we also redefine Euler's number as $e := \exp(1) > 1 > 0$ and, for each
$x \in \mathbb{R}^+$, $\ln x := \log_{\exp(1)}(x)$. This, as well as calling the function of (8.24) exponential
function, will be justified as soon as we will have proved
\[ \lim_{n\to\infty} \left( 1 + \frac{1}{n} \right)^n = \sum_{n=0}^{\infty} \frac{1}{n!} = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \dots \tag{8.25} \]
and
\[ \forall_{x\in\mathbb{R}} \quad e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots \tag{8.26} \]
in (8.36) of Th. 8.18 and in Th. 8.16(c) below, respectively.
Proposition 8.15. If a continuous function $E : \mathbb{R} \to \mathbb{R}$ satisfies
\[ a := E(1) > 0 \quad \text{and} \tag{8.27a} \]
\[ \forall_{x,y\in\mathbb{R}} \quad E(x + y) = E(x)E(y), \tag{8.27b} \]
then $E$ is an exponential function; more precisely,
\[ \forall_{x\in\mathbb{R}} \quad E(x) = a^x. \tag{8.28} \]

Proof. First, $a = E(1) = E(0 + 1) = E(0)E(1) = E(0) \cdot a$ and $a > 0$ shows $E(0) = 1$.
Then, for each $x \in \mathbb{R}$, $1 = E(0) = E(x - x) = E(x)E(-x)$, i.e. $E(-x) = E(x)^{-1}$,
showing $E(x) \ne 0$ for each $x \in \mathbb{R}$. Thus, $E(1) > 0$, the continuity of $E$, and the
intermediate value Th. 7.57 imply $E(x) > 0$ for each $x \in \mathbb{R}$. Next, an induction shows
\[ \forall_{x\in\mathbb{R}} \ \forall_{n\in\mathbb{N}} \quad E(nx) = \big( E(x) \big)^n: \tag{8.29} \]
The base case is trivially true and the induction step is
\[ E((n+1)x) = E(nx)E(x) \overset{\text{ind. hyp.}}{=} (E(x))^n E(x) = (E(x))^{n+1}. \]
Applying (8.29) with $x = 1$ shows $E(n) = a^n$ for each $n \in \mathbb{N}$. Applying (8.29) with
$x = 1/n$, $n \in \mathbb{N}$, shows $a = E(1) = (E(1/n))^n$, i.e. $E(1/n) = a^{1/n}$ since $E(1/n) > 0$.
Next,
\[ \forall_{n,k\in\mathbb{N}} \quad E(k/n) \overset{(8.29)}{=} \big( E(1/n) \big)^k = (a^{1/n})^k = a^{\frac{k}{n}}, \]
showing (8.28) holds for each $x \in \mathbb{Q}^+$. Then (8.28) also holds for each $x \in \mathbb{R}^+$, since, if
$(q_n)_{n\in\mathbb{N}}$ is a sequence in $\mathbb{Q}^+$ with $\lim_{n\to\infty} q_n = x$, then the continuity of $E$ implies
\[ a^x = \lim_{n\to\infty} a^{q_n} = \lim_{n\to\infty} E(q_n) = E(x). \]
Finally, if $x \in \mathbb{R}^-$, then
\[ a^x = (a^{-x})^{-1} = E(-x)^{-1} = E(x), \]
completing the proof that (8.28) holds for each $x \in \mathbb{R}$. ∎
Theorem 8.16. We consider the exponential function exp as defined in (8.24). The
following holds:

(a) exp is continuous on $\mathbb{C}$.

(b) $\exp(z + w) = \exp(z) \exp(w)$ is valid for all $z, w \in \mathbb{C}$.

(c) With $e := \exp(1)$ (cf. Def. and Rem. 8.14), it is
\[ \forall_{x\in\mathbb{R}} \quad e^x = \exp(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!}. \]

Proof. (a) holds by Cor. 8.10; for (b), we compute (using (7.92)),
\[ \forall_{z,w\in\mathbb{C}} \quad \exp(z) \exp(w) = \sum_{n=0}^{\infty} c_n, \quad \text{where } c_n = \sum_{j=0}^{n} \frac{z^j w^{n-j}}{j!\,(n-j)!} = \frac{1}{n!} \sum_{j=0}^{n} \binom{n}{j} z^j w^{n-j} \overset{(5.23)}{=} \frac{(z+w)^n}{n!}; \tag{8.30} \]
and then (c) is an immediate consequence of (a), (b), and Prop. 8.15. ∎
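The power series definition (8.24) and the functional equation of Th. 8.16(b) can be checked numerically. The following sketch (the function `exp_series` and the default of 60 summands are assumptions chosen for illustration) sums the series by updating each summand from the previous one and compares the result with the library function `cmath.exp`.

```python
import cmath
import math


def exp_series(z: complex, terms: int = 60) -> complex:
    """Partial sum of sum_{n>=0} z**n / n! with the given number of summands."""
    total = 0j
    term = 1 + 0j                 # summand for n = 0
    for n in range(terms):
        total += term
        term *= z / (n + 1)       # z**(n+1)/(n+1)! from z**n/n!
    return total


if __name__ == "__main__":
    z, w = 1 + 2j, -0.5 + 0.3j
    print(exp_series(1), math.e)
    print(exp_series(z + w), exp_series(z) * exp_series(w), cmath.exp(z + w))
```

For arguments of moderate size, 60 summands are far more than needed; for large $|z|$, more summands would be required.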

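Definition 8.17. Let $M \subseteq \mathbb{C}$. If $\zeta \in \mathbb{C}$ is a cluster point of $M$, then a function
$f : M \to \mathbb{K}$ is said to tend to $\eta \in \mathbb{K}$ (or to have the limit $\eta \in \mathbb{K}$) for $z \to \zeta$
(denoted by $\lim_{z\to\zeta} f(z) = \eta$) if, and only if, for each sequence $(z_k)_{k\in\mathbb{N}}$ in $M \setminus \{\zeta\}$ with
$\lim_{k\to\infty} z_k = \zeta$, the sequence $(f(z_k))_{k\in\mathbb{N}}$ converges to $\eta \in \mathbb{K}$, i.e.
\[ \lim_{z\to\zeta} f(z) = \eta \quad \Leftrightarrow \quad \forall_{(z_k)_{k\in\mathbb{N}} \text{ in } M\setminus\{\zeta\}} \ \Big( \lim_{k\to\infty} z_k = \zeta \;\Rightarrow\; \lim_{k\to\infty} f(z_k) = \eta \Big). \tag{8.31} \]

Theorem 8.18. We consider the exponential function exp as defined in (8.24). With
$e^z := \exp(z)$ for each $z \in \mathbb{C}$ and $\ln x := \log_{\exp(1)}(x)$ for each $x \in \mathbb{R}^+$ (cf. Th. 8.16(c)
and Def. and Rem. 8.14), we have the following limits:
\[ \lim_{z\to 0} \frac{e^z - 1}{z} = 1 \quad \big( z \in M := \mathbb{C} \setminus \{0\} \big), \tag{8.32} \]
\[ \lim_{x\to 0} \frac{\ln(1+x)}{x} = 1 \quad \big( x \in M := ]-1, \infty[ \setminus \{0\} \big), \tag{8.33} \]
\[ \forall_{\xi\in\mathbb{R}} \quad \lim_{x\to 0} \ln (1 + \xi x)^{\frac{1}{x}} = \xi \quad \big( x \in M := \{x \in \mathbb{R} : 1 + \xi x > 0\} \setminus \{0\} \big), \tag{8.34} \]
\[ \forall_{\xi\in\mathbb{R}} \quad \lim_{x\to 0} (1 + \xi x)^{\frac{1}{x}} = e^\xi \quad \big( x \in M := \{x \in \mathbb{R} : 1 + \xi x > 0\} \setminus \{0\} \big), \tag{8.35} \]
\[ \forall_{x\in\mathbb{R}} \quad \lim_{n\to\infty} \left( 1 + \frac{x}{n} \right)^n = e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}. \tag{8.36} \]

Proof. (8.32): From (8.24) and $e^z = \exp(z)$, we obtain
\[ \forall_{z\ne 0} \quad \frac{e^z - 1}{z} = \sum_{n=0}^{\infty} \frac{z^n}{(n+1)!} = 1 + \frac{z}{2!} + \frac{z^2}{3!} + \dots, \]
which, since $z \mapsto \sum_{n=0}^{\infty} \frac{z^n}{(n+1)!}$ is continuous on $\mathbb{C}$ by Cor. 8.10, implies (8.32).
(8.33): Consider the auxiliary function $f : ]-1, \infty[ \to \mathbb{R}$, $f(x) := \ln(x + 1)$, with
$f^{-1}(x) = e^x - 1$. Now, given a sequence $(x_k)_{k\in\mathbb{N}}$ in $]-1, \infty[ \setminus \{0\}$ with $\lim_{k\to\infty} x_k = 0$,
one obtains
\[ \lim_{k\to\infty} \frac{\ln(1 + x_k)}{x_k} = \lim_{k\to\infty} \frac{\ln\big(1 + f^{-1}(f(x_k))\big)}{f^{-1}(f(x_k))} = \lim_{k\to\infty} \frac{\ln\big(1 + e^{f(x_k)} - 1\big)}{e^{f(x_k)} - 1} = \lim_{k\to\infty} \frac{f(x_k)}{e^{f(x_k)} - 1} \overset{(8.32)}{=} 1, \]
where, in the last step, it was used that $\lim_{k\to\infty} x_k = 0$ and the continuity of $f$ implies
$\lim_{k\to\infty} f(x_k) = \ln 1 = 0$.
Similarly, but simpler, one obtains (8.34) and (8.35) (exercise). Finally, for the sequence
$(x_n)_{n\in\mathbb{N}}$ with $x_n := 1/n$, (8.35) implies (8.36). ∎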
Definition 8.19 (Exponentiation with Complex Exponents). For each $(a, z) \in \mathbb{R}^+ \times \mathbb{C}$,
we define
\[ a^z := \exp(z \ln a), \tag{8.37} \]
where exp is the function defined in (8.24). For $a = e$, (8.37) yields $e^z = \exp(z)$, i.e.
(8.37) is consistent with (8.26).

Theorem 8.20. (a) The first two exponentiation rules of (7.56) still hold for each
$a, b > 0$ and each $z, w \in \mathbb{C}$:
\[ a^{z+w} = a^z a^w, \tag{8.38a} \]
\[ a^z b^z = (ab)^z. \tag{8.38b} \]

(b) For each $a \in \mathbb{R}^+$, the exponential function
\[ f : \mathbb{C} \to \mathbb{C}, \quad f(z) := a^z, \tag{8.39a} \]
is continuous, and, for each $\zeta \in \mathbb{C}$, the power function
\[ g : \mathbb{R}^+ \to \mathbb{C}, \quad g(x) := x^\zeta, \tag{8.39b} \]
is continuous.

(c) The limit in (8.36) extends to complex numbers:
\[ \forall_{z\in\mathbb{C}} \quad \lim_{n\to\infty} \left( 1 + \frac{z}{n} \right)^n = e^z = \sum_{n=0}^{\infty} \frac{z^n}{n!}. \tag{8.40} \]

Proof. (a): We compute
\[ a^{z+w} \overset{(8.37)}{=} \exp\big((z + w) \ln a\big) = \exp(z \ln a + w \ln a) \overset{\text{Th. 8.16(b)}}{=} \exp(z \ln a) \exp(w \ln a) \overset{(8.37)}{=} a^z a^w, \]
proving (8.38a), and
\[ a^z b^z \overset{(8.37)}{=} \exp(z \ln a) \exp(z \ln b) \overset{\text{Th. 8.16(b)}}{=} \exp(z \ln a + z \ln b) \overset{(7.67e)}{=} \exp\big(z \ln(ab)\big) \overset{(8.37)}{=} (ab)^z, \]
proving (8.38b).
(b): The continuity of both functions follows from the continuity of exp (according to
Th. 8.16(a)) and from the fact that continuity is preserved by compositions (according
to Th. 7.41): The exponential function $f$, given by $f(z) = e^{z \ln a}$, is the composition of
the continuous functions $z \mapsto z \ln a$ and $w \mapsto e^w$, whereas (analogous to Ex. 7.76(a)),
the power function $g$, given by $g(x) = e^{\zeta \ln x}$, is the composition $g = \exp \circ\, (\zeta \ln)$, where
ln is continuous by Cor. 7.74.
(c): We have to show that
\[ \lim_{n\to\infty} \left| \left( 1 + \frac{z}{n} \right)^n - \sum_{k=0}^{\infty} \frac{z^k}{k!} \right| = 0. \]

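Given $\epsilon > 0$, choose $K \in \mathbb{N}$ such that
\[ \forall_{n\ge K} \quad \sum_{k=n}^{\infty} \frac{(|z| + 1)^k}{k!} < \frac{\epsilon}{3}. \]
We continue by using (5.23) to estimate
\[ \forall_{n\in\mathbb{N}} \quad A_n := \left| \left( 1 + \frac{z}{n} \right)^n - \sum_{k=0}^{\infty} \frac{z^k}{k!} \right| \le R_n + S_n + T, \]
where
\[ \forall_{n\in\mathbb{N}} \quad R_n := \left| \sum_{k=0}^{K-1} \left( \binom{n}{k} \frac{z^k}{n^k} - \frac{z^k}{k!} \right) \right|, \quad S_n := \sum_{k=K}^{n} \binom{n}{k} \frac{|z|^k}{n^k}, \quad T := \sum_{k=K}^{\infty} \frac{|z|^k}{k!}. \]
We proceed to estimate each of the three terms $R_n$, $S_n$, and $T$, starting with the last:
\[ T = \sum_{k=K}^{\infty} \frac{|z|^k}{k!} < \sum_{k=K}^{\infty} \frac{(|z| + 1)^k}{k!} < \frac{\epsilon}{3}. \]
To estimate $S_n$, we first estimate
\[ \forall_{n\in\mathbb{N}} \ \forall_{1\le k\le n} \quad \binom{n}{k} \frac{1}{n^k} = \frac{n!}{k!\,(n-k)!} \cdot \frac{1}{n^k} = \frac{1}{k!} \prod_{j=1}^{k} \frac{n - k + j}{n} \le \frac{1}{k!}. \]
We then obtain
\[ \forall_{n\ge K} \quad S_n = \sum_{k=K}^{n} \binom{n}{k} \frac{|z|^k}{n^k} \le \sum_{k=K}^{\infty} \frac{|z|^k}{k!} = T < \frac{\epsilon}{3}. \]
To estimate $R_n$, we first compute the limit
\[ \lim_{n\to\infty} \binom{n}{k} \frac{1}{n^k} = \frac{1}{k!} \lim_{n\to\infty} \prod_{j=1}^{k} \frac{n - k + j}{n} = \frac{1}{k!} \prod_{j=1}^{k} 1 = \frac{1}{k!}, \]
implying $\lim_{n\to\infty} R_n = 0$ and
\[ \exists_{N\ge K} \ \forall_{n>N} \quad R_n < \frac{\epsilon}{3}. \]
Combining the three estimates shows
\[ \forall_{n>N\ge K} \quad A_n \le R_n + S_n + T < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon, \]
completing the proof. ∎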
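The limit (8.40) can be observed numerically for a complex argument. The sketch below (the function `compound` is a hypothetical name) compares $(1 + z/n)^n$ with the library value `cmath.exp(z)`; the observed error shrinks roughly like $1/n$, which illustrates that the convergence, while guaranteed, is slow.

```python
import cmath


def compound(z: complex, n: int) -> complex:
    """The n-th term (1 + z/n)**n of the sequence in (8.40)."""
    return (1 + z / n) ** n


if __name__ == "__main__":
    z = 1 + 1j
    for n in (10, 1000, 10 ** 6):
        print(n, compound(z, n), abs(compound(z, n) - cmath.exp(z)))
```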

8.4 Trigonometric Functions


The first definition of the trigonometric functions sine and cosine is the one based on
geometric visualization usually given in high school: $\cos x$ and $\sin x$ are the coordinates
of the point $p = (p_1, p_2) \in \mathbb{R}^2$ on the unit circle, such that $x$ is the angle, measured in
radians, between the line segment from $(0, 0)$ to $(1, 0)$ and the line segment from
$(0, 0)$ to $p$.
While this definition allows one to obtain many important properties of sine and cosine
using geometric arguments, it is not mathematically rigorous, and, for example, provides
no clue how to compute values like $\sin 1$. The problem is related to the fact that the
angle, measured in radians, between the line segment from $(0, 0)$ to $(1, 0)$ and the line
segment from $(0, 0)$ to $p$ is supposed to be the length of the segment of the unit
circle between $(1, 0)$ and $p$ (taken in the counter-clockwise direction).
In the following Def. and Rem. 8.21, we will provide a mathematically rigorous definition
of sine and cosine using power series, and we will then verify that the functions have the
familiar properties one learns in high school. However, as the computation of lengths
of curved paths is actually beyond the scope of this lecture, we will not be able to see
that our sine and cosine functions are precisely the same ones we visualized in high school
(the interested reader is referred to Ex. 1 in Sec. 5.14 of [Wal02]).
Definition and Remark 8.21. We define the sine function, denoted sin, and the
cosine function, denoted cos, by
\[ \sin : \mathbb{C} \to \mathbb{C}, \quad \sin z := \sum_{n=0}^{\infty} \frac{(-1)^n z^{2n+1}}{(2n+1)!} = z - \frac{z^3}{3!} + \frac{z^5}{5!} - + \dots, \tag{8.41a} \]
\[ \cos : \mathbb{C} \to \mathbb{C}, \quad \cos z := \sum_{n=0}^{\infty} \frac{(-1)^n z^{2n}}{(2n)!} = 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - + \dots. \tag{8.41b} \]

(a) sin and cos are well-defined and continuous: For both series and each $z \in \mathbb{C}$, we can
estimate the absolute value of the $n$th summand by the $n$th summand of the series
for the exponential function $e^{|z|}$ (cf. (8.36)), which we know to be convergent from
Ex. 8.11(b). Thus, by Th. 8.9, both series in (8.41) have radius of convergence $\infty$
and are continuous by Cor. 8.10.
(b) $\cos : \mathbb{R} \to \mathbb{R}$ (i.e. $\cos{\restriction_{\mathbb{R}}}$) has a smallest positive zero $\alpha \in \mathbb{R}^+$. We define $\pi := 2\alpha$.
One can show $\pi$ is an irrational number (see Appendix F.2) and its first digits are
$\pi = 3.14159\dots$
To see cos has a smallest positive zero and to obtain a first (very coarse) estimate,
note
\[ \forall_{x\in\mathbb{R}^+} \ \forall_{k\in\mathbb{N}} \quad \left( \frac{x^k}{k!} > \frac{x^{k+1}}{(k+1)!} \;\Leftrightarrow\; 1 > \frac{x}{k+1} \;\Leftrightarrow\; k + 1 > x \right), \]
showing $\frac{x^k}{k!} > \frac{x^{k+1}}{(k+1)!}$ holds for each $k \ge 2$ and each $x \in ]0, 3[$. In particular, the
summands of the series in (8.41) converge monotonically to 0 (for $k \ge 2$) and, since
the series are alternating for $x \ne 0$, Th. 7.85 applies and (7.81) yields
\[ \forall_{0<x<3} \quad \begin{aligned} f(x) := 1 - \frac{x^2}{2} &< \cos x < 1 - \frac{x^2}{2} + \frac{x^4}{24} =: g(x), \\ x - \frac{x^3}{6} &< \sin x < x - \frac{x^3}{6} + \frac{x^5}{120}. \end{aligned} \tag{8.42} \]
The zeros of $x \mapsto f(x)$ are $-\sqrt{2}, \sqrt{2}$, i.e. $\sqrt{2}$ is its smallest positive zero; the zeros
of $x \mapsto g(x)$ are $-\sqrt{6 - 2\sqrt{3}}$, $-\sqrt{6 + 2\sqrt{3}}$, $\sqrt{6 - 2\sqrt{3}}$, $\sqrt{6 + 2\sqrt{3}}$, i.e. $\sqrt{6 - 2\sqrt{3}}$
is its smallest positive zero. Thus, as $f(0) = g(0) = 1$, the intermediate value Th.
7.57 implies cos has a smallest positive zero $\alpha$ and
\[ 1.4 < \sqrt{2} < \alpha := \frac{\pi}{2} < \sqrt{6 - 2\sqrt{3}} < 1.6. \tag{8.43} \]
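The coarse estimate (8.43) can be refined numerically. The following sketch (function names, the number of summands, and the number of bisection steps are assumptions chosen for illustration) evaluates the cosine series (8.41b) directly and applies bisection on the bracket $[1.4, 1.6]$ from (8.43), where the series changes sign.

```python
import math


def cos_series(x: float, terms: int = 30) -> float:
    """Partial sum of the cosine power series (8.41b) with the given number of summands."""
    total, term = 0.0, 1.0                              # summand for n = 0
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))    # next summand from the current one
    return total


def zero_in_bracket(lo: float = 1.4, hi: float = 1.6, iterations: int = 60) -> float:
    """Bisection for a zero of cos_series, assuming cos_series(lo) > 0 > cos_series(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if cos_series(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    alpha = zero_in_bracket()
    print(alpha, 2 * alpha, math.pi)
```

Since (8.42) shows $\cos x > 0$ on $]0, \sqrt{2}]$, a zero found in this bracket lies to the right of every point where the cosine is known to be positive; the computed value agrees with $\pi/2$ to within rounding error.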
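Theorem 8.22. We have the following identities:
\[ \sin 0 = 0, \quad \cos 0 = 1, \tag{8.44a} \]
\[ \forall_{z\in\mathbb{C}} \quad \sin z = -\sin(-z), \quad \cos z = \cos(-z), \tag{8.44b} \]
\[ \forall_{z,w\in\mathbb{C}} \quad \sin(z + w) = \sin z \cos w + \cos z \sin w, \tag{8.44c} \]
\[ \forall_{z,w\in\mathbb{C}} \quad \cos(z + w) = \cos z \cos w - \sin z \sin w, \tag{8.44d} \]
\[ \forall_{z\in\mathbb{C}} \quad (\sin z)^2 + (\cos z)^2 = 1, \tag{8.44e} \]
\[ \cos \frac{\pi}{2} = 0, \quad \sin \frac{\pi}{2} = 1, \quad \forall_{x\in[0,\frac{\pi}{2}[} \ \cos x > 0, \tag{8.44f} \]
\[ \forall_{z\in\mathbb{C}} \quad \sin\left( z + \frac{\pi}{2} \right) = \cos z, \quad \cos\left( z + \frac{\pi}{2} \right) = -\sin z, \tag{8.44g} \]
\[ \forall_{z\in\mathbb{C}} \quad \sin(z + \pi) = -\sin z, \quad \cos(z + \pi) = -\cos z, \tag{8.44h} \]
\[ \forall_{z\in\mathbb{C}} \quad \sin(z + 2\pi) = \sin z, \quad \cos(z + 2\pi) = \cos z, \tag{8.44i} \]
\[ \lim_{z\to 0} \frac{\sin z}{z} = 1, \quad \lim_{z\to 0} \frac{\cos z - 1}{z^2} = -\frac{1}{2}. \tag{8.44j} \]
Identities (8.44i) can be restated as sine and cosine being periodic functions with period
$2\pi$.

Proof. (8.44a) is immediate from (8.41) since, for $z = 0$, all summands of the sine
series are 0 and all summands of the cosine series are 0, except the first one, which is
$\frac{(-1)^0\, 0^0}{0!} = 1$.
(8.44b) is also immediate from (8.41), since $(-z)^{2n+1} = (-1)^{2n+1} z^{2n+1} = -z^{2n+1}$ and
$(-z)^{2n} = (-1)^{2n} z^{2n} = z^{2n}$.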

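(8.44c) and (8.44d) can be verified using the Cauchy product: According to (7.92),
\[ \forall_{z,w\in\mathbb{C}} \quad \sin z \cos w = \sum_{n=0}^{\infty} c_n, \quad \cos z \sin w = \sum_{n=0}^{\infty} d_n, \]
\[ \text{where } c_n = \sum_{j=0}^{n} \frac{(-1)^j z^{2j+1}}{(2j+1)!} \cdot \frac{(-1)^{n-j} w^{2(n-j)}}{(2(n-j))!}, \quad d_n = \sum_{j=0}^{n} \frac{(-1)^j z^{2j}}{(2j)!} \cdot \frac{(-1)^{n-j} w^{2(n-j)+1}}{(2(n-j)+1)!}, \]
that means, for each $z, w \in \mathbb{C}$,
\[ \begin{aligned} c_n + d_n &= \sum_{j=0}^{n} \frac{(-1)^n z^{2j+1} w^{2(n-j)}}{(2j+1)!\,(2(n-j))!} + \sum_{j=0}^{n} \frac{(-1)^n z^{2j} w^{2(n-j)+1}}{(2j)!\,(2(n-j)+1)!} \\ &= \sum_{j=0}^{n} \frac{(-1)^n z^{2j+1} w^{2n+1-(2j+1)}}{(2j+1)!\,(2n+1-(2j+1))!} + \sum_{j=0}^{n} \frac{(-1)^n z^{2j} w^{2n+1-2j}}{(2j)!\,(2n+1-2j)!} \\ &= (-1)^n \sum_{j=0}^{2n+1} \frac{z^j w^{2n+1-j}}{j!\,(2n+1-j)!} = \frac{(-1)^n}{(2n+1)!} \sum_{j=0}^{2n+1} \binom{2n+1}{j} z^j w^{2n+1-j} \\ &= \frac{(-1)^n (z+w)^{2n+1}}{(2n+1)!}, \end{aligned} \]
proving (8.44c). Similarly, according to (7.92),
\[ \forall_{z,w\in\mathbb{C}} \quad \cos z \cos w = \sum_{n=0}^{\infty} c_n, \quad \sin z \sin w = \sum_{n=0}^{\infty} d_n, \]
\[ \text{where } c_n = \sum_{j=0}^{n} \frac{(-1)^j z^{2j}}{(2j)!} \cdot \frac{(-1)^{n-j} w^{2(n-j)}}{(2(n-j))!}, \quad d_n = \sum_{j=0}^{n} \frac{(-1)^j z^{2j+1}}{(2j+1)!} \cdot \frac{(-1)^{n-j} w^{2(n-j)+1}}{(2(n-j)+1)!}, \]
that means, for each $z, w \in \mathbb{C}$, $c_0 = 1$ and
\[ \begin{aligned} \forall_{n\in\mathbb{N}} \quad c_n - d_{n-1} &= \sum_{j=0}^{n} \frac{(-1)^n z^{2j} w^{2(n-j)}}{(2j)!\,(2(n-j))!} - \sum_{j=0}^{n-1} \frac{(-1)^{n-1} z^{2j+1} w^{2(n-1-j)+1}}{(2j+1)!\,(2(n-1-j)+1)!} \\ &= \sum_{j=0}^{n} \frac{(-1)^n z^{2j} w^{2n-2j}}{(2j)!\,(2n-2j)!} + \sum_{j=0}^{n-1} \frac{(-1)^n z^{2j+1} w^{2n-(2j+1)}}{(2j+1)!\,(2n-(2j+1))!} \\ &= (-1)^n \sum_{j=0}^{2n} \frac{z^j w^{2n-j}}{j!\,(2n-j)!} = \frac{(-1)^n}{(2n)!} \sum_{j=0}^{2n} \binom{2n}{j} z^j w^{2n-j} \\ &= \frac{(-1)^n (z+w)^{2n}}{(2n)!}, \end{aligned} \]
proving (8.44d).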
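(8.44e): One computes for each $z \in \mathbb{C}$:
$$(\sin z)^2 + (\cos z)^2 \overset{(8.44b)}{=} \cos z \cos(-z) - \sin z \sin(-z) \overset{(8.44d)}{=} \cos(z - z) = \cos 0 = 1.$$

(8.44f): $\cos \frac{\pi}{2} = 0$ and $\cos x > 0$ for $0 \le x < \frac{\pi}{2}$ hold according to the definition of $\pi$ in Def. and Rem. 8.21(b). Then
$$\left(\sin \frac{\pi}{2}\right)^2 \overset{(8.44e)}{=} 1 - \left(\cos \frac{\pi}{2}\right)^2 = 1 \quad\text{and}\quad \sin \frac{\pi}{2} > \frac{\pi}{2} - \frac{(\pi/2)^3}{6} \overset{(8.43)}{>} 1.4 - \frac{(1.6)^3}{6} > 0.7 > 0.$$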

(8.44g) is immediate from (8.44c), (8.44d), and (8.44f).


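(8.44h): One obtains
$$\sin \pi = \sin\left(\frac{\pi}{2} + \frac{\pi}{2}\right) \overset{(8.44c)}{=} 1 \cdot 0 + 0 \cdot 1 = 0,$$
$$\cos \pi = \cos\left(\frac{\pi}{2} + \frac{\pi}{2}\right) \overset{(8.44d)}{=} 0 \cdot 0 - 1 \cdot 1 = -1,$$
$$\forall\, z \in \mathbb{C}: \quad \sin(z + \pi) \overset{(8.44c)}{=} -\sin z + 0 = -\sin z,$$
$$\forall\, z \in \mathbb{C}: \quad \cos(z + \pi) \overset{(8.44d)}{=} -\cos z + 0 = -\cos z.$$

(8.44i): One obtains
$$\sin(2\pi) = \sin(\pi + \pi) \overset{(8.44c)}{=} 0 + 0 = 0,$$
$$\cos(2\pi) = \cos(\pi + \pi) \overset{(8.44d)}{=} (-1)(-1) - 0 = 1,$$
$$\forall\, z \in \mathbb{C}: \quad \sin(z + 2\pi) \overset{(8.44c)}{=} \sin z + 0 = \sin z,$$
$$\forall\, z \in \mathbb{C}: \quad \cos(z + 2\pi) \overset{(8.44d)}{=} \cos z - 0 = \cos z.$$

(8.44j): One obtains, for each $z \in \mathbb{C} \setminus \{0\}$,
$$\frac{\sin z}{z} = \sum_{n=0}^{\infty} \frac{(-1)^n z^{2n}}{(2n+1)!} = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - + \dots,$$
$$\frac{\cos z - 1}{z^2} = \sum_{n=0}^{\infty} \frac{(-1)^{n+1} z^{2n}}{(2(n+1))!} = -\frac{1}{2!} + \frac{z^2}{4!} - \frac{z^4}{6!} + - \dots$$

For both series on the right-hand side and each $z \in \mathbb{C}$, we can estimate the absolute value of each summand by the corresponding summand of the exponential series for $e^{|z|}$ (cf. (8.36)), showing they have radius of convergence $\infty$ and are continuous by Cor. 8.10. In particular, their continuity in $z = 0$ proves (8.44j). ∎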

Theorem 8.23. One has $\sin(\mathbb{R}) = \cos(\mathbb{R}) = [-1, 1]$, i.e. the range of both sine and cosine is $[-1, 1]$. Moreover, for each $k \in \mathbb{Z}$:
$$\sin \text{ is strictly increasing on } \left[-\frac{\pi}{2} + 2k\pi,\ \frac{\pi}{2} + 2k\pi\right], \tag{8.45a}$$
$$\sin \text{ is strictly decreasing on } \left[\frac{\pi}{2} + 2k\pi,\ \frac{3\pi}{2} + 2k\pi\right], \tag{8.45b}$$
$$\cos \text{ is strictly increasing on } [(2k-1)\pi,\ 2k\pi], \tag{8.45c}$$
$$\cos \text{ is strictly decreasing on } [2k\pi,\ (2k+1)\pi], \tag{8.45d}$$
which, due to (8.44e), can be summarized (and visualized) by saying that, if $x$ runs from $2k\pi$ to $2(k+1)\pi$, then $(\cos x, \sin x)$ runs once counterclockwise through the unit circle, starting at $(1, 0)$.

Proof. From (8.44e), we know $\sin(\mathbb{R}) \subseteq [-1, 1]$ and $\cos(\mathbb{R}) \subseteq [-1, 1]$. As
$$\sin \frac{\pi}{2} \overset{(8.44f)}{=} 1, \quad \sin\left(-\frac{\pi}{2}\right) \overset{(8.44b)}{=} -1, \quad \cos 0 \overset{(8.44a)}{=} 1, \quad \cos \pi \overset{(8.44h)}{=} -\cos 0 = -1,$$
the continuity of sine and cosine together with the intermediate value Th. 7.57 implies $\sin(\mathbb{R}) = \cos(\mathbb{R}) = [-1, 1]$.
From (8.42), we know $0 < x - \frac{x^3}{6} < \sin x$ and $\cos x < 1 - \frac{x^2}{2} + \frac{x^4}{24} < 1$ for each $x \in\, ]0, \frac{\pi}{2}]$, implying
$$\forall\ 0 \le x < x + y \le \frac{\pi}{2}: \quad \cos(x + y) = \cos x \cos y - \sin x \sin y \le \cos x \cos y < \cos x,$$
showing cos is strictly decreasing on $[0, \frac{\pi}{2}]$. Then cos is strictly increasing on $[-\frac{\pi}{2}, 0]$ by (8.44b), sin is strictly increasing on $[0, \frac{\pi}{2}]$ and strictly decreasing on $[\frac{\pi}{2}, \pi]$ by (8.44g), implying sin is strictly increasing on $[-\frac{\pi}{2}, 0]$ and strictly decreasing on $[-\pi, -\frac{\pi}{2}]$ by (8.44b), i.e. sin is strictly increasing on $[\frac{3\pi}{2}, 2\pi]$ and strictly decreasing on $[\pi, \frac{3\pi}{2}]$ by (8.44i), implying cos is strictly decreasing on $[\frac{\pi}{2}, \pi]$ and strictly increasing on $[\pi, \frac{3\pi}{2}]$ by (8.44g). Since this fixes the monotonicity properties of both sine and cosine over more than one period, the general statements in (8.45) are provided by (8.44i). ∎

We now come to important complex number relations between sine, cosine, and the
exponential function.
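Theorem 8.24. One has the following formulas, relating the (complex) sine, cosine, and exponential function:
$$\forall\, z \in \mathbb{C}: \quad e^{iz} = \cos z + i \sin z \quad \text{(Euler formula)}, \tag{8.46a}$$
$$\forall\, z \in \mathbb{C}: \quad \cos z = \frac{e^{iz} + e^{-iz}}{2}, \tag{8.46b}$$
$$\forall\, z \in \mathbb{C}: \quad \sin z = \frac{e^{iz} - e^{-iz}}{2i}. \tag{8.46c}$$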

Proof. Exercise. 
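While the proof is left as an exercise, the three formulas can at least be sanity-checked in floating point. The sketch below assumes Python's `cmath` implementations of exp, sin, and cos; the function name `euler_residuals` is a hypothetical helper introduced only for this illustration.

```python
import cmath


def euler_residuals(z: complex) -> tuple[float, float, float]:
    """Absolute errors of (8.46a), (8.46b), (8.46c) at the point z."""
    e_pos = cmath.exp(1j * z)    # e^{iz}
    e_neg = cmath.exp(-1j * z)   # e^{-iz}
    r_a = abs(e_pos - (cmath.cos(z) + 1j * cmath.sin(z)))  # Euler formula
    r_b = abs(cmath.cos(z) - (e_pos + e_neg) / 2)          # cosine via exp
    r_c = abs(cmath.sin(z) - (e_pos - e_neg) / (2j))       # sine via exp
    return r_a, r_b, r_c
```

For a moderately sized argument such as `0.5 - 1.5j`, all three residuals should be on the order of machine precision.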

As a first application of (8.46), we can now determine all solutions to the equation $e^z = 1$ and all zeros (if any) of exp, sin, and cos:
Theorem 8.25. The set of (complex) solutions to the equation $e^z = 1$ consists precisely of all integer multiples of $2\pi i$, the exponential function has no zeros (neither in $\mathbb{R}$ nor in $\mathbb{C}$), and the set of all (real or complex) zeros of sine and cosine consists of a discrete set of real numbers. More precisely:
$$\exp^{-1}\{1\} = \{2k\pi i : k \in \mathbb{Z}\}, \tag{8.47a}$$
$$\exp^{-1}\{0\} = \emptyset, \tag{8.47b}$$
$$\sin^{-1}\{0\} = \{k\pi : k \in \mathbb{Z}\}, \tag{8.47c}$$
$$\cos^{-1}\{0\} = \left\{(2k+1)\tfrac{\pi}{2} : k \in \mathbb{Z}\right\}. \tag{8.47d}$$

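Proof. We start by considering the zeros of the functions $\cos, \sin : \mathbb{R} \to \mathbb{R}$: Due to (8.44f), $\cos x > 0$ for each $x \in [0, \frac{\pi}{2}[$ such that $\cos(-x) = \cos x$ (by (8.44b)) implies $\frac{\pi}{2}$ to be the only zero of cos in the interval $]-\frac{\pi}{2}, \frac{\pi}{2}]$. Then, since $\cos(x + \pi) = -\cos x$ for each $x \in \mathbb{R}$ by (8.44h), $\frac{\pi}{2}$ and $\frac{\pi}{2} + \pi$ are the only zeros of cos in the interval $]-\frac{\pi}{2}, \frac{\pi}{2} + \pi]$, and, thus, using that cos has period $2\pi$ according to (8.44i), adding integer multiples of $2\pi$ to $\frac{\pi}{2}$ and $\frac{\pi}{2} + \pi$ must generate precisely all zeros of $\cos : \mathbb{R} \to \mathbb{R}$, i.e.
$$\mathbb{R} \cap \cos^{-1}\{0\} = \left\{\tfrac{\pi}{2} + k\pi : k \in \mathbb{Z}\right\} = \left\{(2k+1)\tfrac{\pi}{2} : k \in \mathbb{Z}\right\}.$$
Since, by (8.44g), $\sin x = -\cos(x + \frac{\pi}{2})$ for each $x \in \mathbb{R}$, we also obtain
$$\mathbb{R} \cap \sin^{-1}\{0\} = \left\{-\tfrac{\pi}{2} + x : x \in \mathbb{R} \cap \cos^{-1}\{0\}\right\} = \{k\pi : k \in \mathbb{Z}\}.$$
We consider (8.47a) next. If $k \in \mathbb{Z}$, then
$$e^{2k\pi i} = \left(e^{2\pi i}\right)^k \overset{(8.46a)}{=} \left(\cos(2\pi) + i \sin(2\pi)\right)^k = 1^k = 1,$$
proving "$\supseteq$". For the remaining inclusion, assume $z \in \exp^{-1}\{1\}$, i.e. $e^z = 1$, and write $z = x + iy$ with $x, y \in \mathbb{R}$. Then
$$1 = |e^z| = e^x\, |e^{iy}| \overset{(8.46a)}{=} e^x\, |\cos y + i \sin y| = e^x \sqrt{(\sin y)^2 + (\cos y)^2} \overset{(8.44e)}{=} e^x,$$
first implying $x = 0$ and, then, using (8.46a) once again, $1 = e^z = e^{iy} = \cos y + i \sin y$ implies $\cos y = 1$ and $\sin y = 0$, i.e. $y \in \{2k\pi : k \in \mathbb{Z}\}$, proving "$\subseteq$".
To finish the proof of (8.47c), assume $\sin z = 0$. Then $e^{iz} = \cos z = \cos(-z) = e^{-iz}$, implying $e^{2iz} = 1$ and, by (8.47a), there is $k \in \mathbb{Z}$ such that $2iz = 2k\pi i$, i.e. $z = k\pi$, proving (8.47c). Since, by (8.44g), $\cos z = \sin(z + \frac{\pi}{2})$ for each $z \in \mathbb{C}$, we also obtain (8.47d):
$$\cos^{-1}\{0\} = \left\{-\tfrac{\pi}{2} + z : z \in \sin^{-1}\{0\}\right\} = \left\{(2k+1)\tfrac{\pi}{2} : k \in \mathbb{Z}\right\}.$$
Finally, if $z = x + iy$ with $x, y \in \mathbb{R}$, then $|e^z| = e^x |e^{iy}| = e^x \ne 0$ proves (8.47b). ∎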
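Definition and Remark 8.26. We define tangent and cotangent by
$$\tan : \mathbb{C} \setminus \cos^{-1}\{0\} \to \mathbb{C}, \quad \tan z := \frac{\sin z}{\cos z}, \tag{8.48a}$$
$$\cot : \mathbb{C} \setminus \sin^{-1}\{0\} \to \mathbb{C}, \quad \cot z := \frac{\cos z}{\sin z}, \tag{8.48b}$$
respectively, where $\mathbb{C} \setminus \cos^{-1}\{0\} = \mathbb{C} \setminus \{(2k+1)\frac{\pi}{2} : k \in \mathbb{Z}\}$ by (8.47d) and $\mathbb{C} \setminus \sin^{-1}\{0\} = \mathbb{C} \setminus \{k\pi : k \in \mathbb{Z}\}$ by (8.47c). Since sine and cosine are both continuous, tangent and cotangent are also both continuous on their respective domains. Both functions have period $\pi$, since, for each $z$ in the respective domains,
$$\tan(z + \pi) = \frac{\sin(z + \pi)}{\cos(z + \pi)} \overset{(8.44h)}{=} \frac{-\sin z}{-\cos z} = \tan z, \qquad \cot(z + \pi) \overset{(8.44h)}{=} \frac{-\cos z}{-\sin z} = \cot z. \tag{8.49}$$
Since
$$\lim_{n \to \infty} \sin\left(\tfrac{\pi}{2} - \tfrac{1}{n}\right) = \sin \tfrac{\pi}{2} = 1 \ \wedge\ \lim_{n \to \infty} \cos\left(\tfrac{\pi}{2} - \tfrac{1}{n}\right) = \cos \tfrac{\pi}{2} = 0 \ \wedge\ \cos\left(\tfrac{\pi}{2} - \tfrac{1}{n}\right) > 0 \ \Rightarrow\ \lim_{n \to \infty} \tan\left(\tfrac{\pi}{2} - \tfrac{1}{n}\right) = \infty,$$
$$\lim_{n \to \infty} \sin\left(-\tfrac{\pi}{2} + \tfrac{1}{n}\right) = \sin\left(-\tfrac{\pi}{2}\right) = -1 \ \wedge\ \lim_{n \to \infty} \cos\left(-\tfrac{\pi}{2} + \tfrac{1}{n}\right) = \cos\left(-\tfrac{\pi}{2}\right) = 0 \ \wedge\ \cos\left(-\tfrac{\pi}{2} + \tfrac{1}{n}\right) > 0 \ \Rightarrow\ \lim_{n \to \infty} \tan\left(-\tfrac{\pi}{2} + \tfrac{1}{n}\right) = -\infty,$$
$$\lim_{n \to \infty} \sin \tfrac{1}{n} = \sin 0 = 0 \ \wedge\ \lim_{n \to \infty} \cos \tfrac{1}{n} = \cos 0 = 1 \ \wedge\ \sin \tfrac{1}{n} > 0 \ \Rightarrow\ \lim_{n \to \infty} \cot \tfrac{1}{n} = \infty,$$
$$\lim_{n \to \infty} \sin\left(\pi - \tfrac{1}{n}\right) = \sin \pi = 0 \ \wedge\ \lim_{n \to \infty} \cos\left(\pi - \tfrac{1}{n}\right) = \cos \pi = -1 \ \wedge\ \sin\left(\pi - \tfrac{1}{n}\right) > 0 \ \Rightarrow\ \lim_{n \to \infty} \cot\left(\pi - \tfrac{1}{n}\right) = -\infty,$$
we obtain $\tan(\mathbb{R} \setminus \cos^{-1}\{0\}) = \cot(\mathbb{R} \setminus \sin^{-1}\{0\}) = \mathbb{R}$.

For each $k \in \mathbb{Z}$,
$$\tan \text{ is strictly increasing on } \left]-\tfrac{\pi}{2} + k\pi,\ \tfrac{\pi}{2} + k\pi\right[, \tag{8.50a}$$
$$\cot \text{ is strictly decreasing on } \left]k\pi,\ (k+1)\pi\right[: \tag{8.50b}$$
On $]0, \frac{\pi}{2}[$, sin is strictly increasing and cos is strictly decreasing, i.e. tan is strictly increasing and cot is strictly decreasing. Since $\tan(x) = -\sin(-x)/\cos(-x) = -\tan(-x)$, on $]-\frac{\pi}{2}, 0[$, tan is strictly increasing and cot is strictly decreasing. Taking into account the signs of tan and cot on the respective intervals and their $\pi$-periodicity according to (8.49) proves (8.50).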
Definition and Remark 8.27. Since we have seen sin to be strictly increasing on $[-\frac{\pi}{2}, \frac{\pi}{2}]$ with range $[-1, 1]$, cos to be strictly decreasing on $[0, \pi]$ with range $[-1, 1]$, tan to be strictly increasing on $]-\frac{\pi}{2}, \frac{\pi}{2}[$ with range $\mathbb{R}$, and cot to be strictly decreasing on $]0, \pi[$ with range $\mathbb{R}$; and since all four functions are continuous, Th. 7.60 implies the existence of inverse functions, denoted by
$$\arcsin : [-1, 1] \to [-\pi/2, \pi/2], \tag{8.51a}$$
$$\arccos : [-1, 1] \to [0, \pi], \tag{8.51b}$$
$$\arctan : \mathbb{R} \to\ ]-\pi/2, \pi/2[, \tag{8.51c}$$
$$\operatorname{arccot} : \mathbb{R} \to\ ]0, \pi[, \tag{8.51d}$$
respectively, where all four inverse functions are continuous, arcsin is strictly increasing, arccos is strictly decreasing, arctan is strictly increasing, and arccot is strictly decreasing. Of course, using (8.45) and (8.50), respectively, one can also obtain the inverse functions on different intervals, and, in the literature, such inverse functions are, indeed, considered as well. Somewhat confusingly, it is common to denote all these different functions by the same symbols, namely the ones introduced in (8.51). Here, we will not need to pursue this any further, i.e. we will only consider the inverse functions precisely as defined in (8.51), which are also known as the principal inverse functions of sin, cos, tan, and cot, respectively.

8.5 Polar Form of Complex Numbers, Fundamental Theorem of Algebra
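Theorem 8.28. For each complex number $z \in \mathbb{C}$, there exist real numbers $r \ge 0$ and $\varphi \in \mathbb{R}$ such that
$$z = r\, e^{i\varphi}. \tag{8.52}$$
Moreover, if (8.52) holds with $r \ge 0$ and $\varphi \in \mathbb{R}$, then $r$ is the modulus of $z$ and, for $z \ne 0$, $\varphi$ is uniquely determined up to addition of an integer multiple of $2\pi$, i.e.
$$\forall\, z \in \mathbb{C} \setminus \{0\}: \quad \left( z = r\, e^{i\varphi_1} = r\, e^{i\varphi_2} \ \wedge\ r \ge 0 \ \Rightarrow\ r = |z| \ \wedge\ \exists\, k \in \mathbb{Z}: \ \varphi_1 - \varphi_2 = 2\pi k \right). \tag{8.53}$$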

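Proof. For $z = 0$, there is nothing to prove, so we assume $z \ne 0$ and set $r := |z|$. We write $z = x + iy$ with $x, y \in \mathbb{R}$, first assuming $y \ge 0$. Then
$$\frac{z}{r} = \xi + i\eta, \quad\text{where}\quad \xi = \frac{x}{r}, \quad \eta = \frac{y}{r} \ge 0, \quad \xi^2 + \eta^2 = 1. \tag{8.54}$$
In particular, $-1 \le \xi \le 1$. Thus, letting
$$\varphi := \arccos \xi,$$
we obtain $\varphi \in [0, \pi]$, $\xi = \cos \varphi$, and $\sin \varphi \ge 0$, yielding
$$\sin \varphi = \sqrt{1 - (\cos \varphi)^2} = \sqrt{1 - \xi^2} \overset{(8.54)}{=} \eta.$$
In consequence,
$$\frac{z}{r} = \xi + i\eta = \cos \varphi + i \sin \varphi \overset{(8.46a)}{=} e^{i\varphi},$$
as desired. If $y \le 0$, then the above shows the existence of $\psi \in \mathbb{R}$ such that $\bar{z} = x - iy = r e^{i\psi} = r \cos \psi + i r \sin \psi$. Letting $\varphi := -\psi$, we, once again, have $z = r \cos \psi - i r \sin \psi = r e^{-i\psi} = r e^{i\varphi}$, as desired, completing the existence proof for the representation (8.52).
Now assume (8.52) holds with $r \ge 0$. Then
$$|z| = r\, |e^{i\varphi}| = r \sqrt{(\sin \varphi)^2 + (\cos \varphi)^2} = r.$$
Finally, if $r e^{i\varphi_1} = r e^{i\varphi_2}$ with $r > 0$, then $e^{i(\varphi_1 - \varphi_2)} = 1$, i.e. $i(\varphi_1 - \varphi_2) \in \{2k\pi i : k \in \mathbb{Z}\}$ by (8.47a). ∎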
Definition and Remark 8.29. The representation of $z \in \mathbb{C}$ given by (8.52) is called its polar form, where $(r, \varphi)$ are also called polar coordinates of $z$, and $\varphi$ is called an argument of $z$. For $z \ne 0$, one can fix the argument uniquely by the additional requirement $\varphi \in [0, 2\pi[$ (but one also finds other choices, for example $\varphi \in\ ]-\pi, \pi]$, in the literature). The above terminology is consistent with the common use of calling $(r, \varphi)$ polar coordinates of the vector $z = (x, y) \in \mathbb{R}^2$ ($= \mathbb{C}$) (in contrast to the Cartesian coordinates $(x, y)$), where $r$ constitutes the distance of the point $z = (x, y)$ from the origin $(0, 0)$ and $\varphi$ is the angle between the vector $z = (x, y)$ and the $x$-axis (cf. the three introductory paragraphs of the previous Sec. 8.4). As promised, we can now better understand the geometric interpretation of complex multiplication already described in Rem. 5.12: If $z_1 = r_1 e^{i\varphi_1}$ and $z_2 = r_2 e^{i\varphi_2}$, then $z_1 z_2 = r_1 r_2\, e^{i(\varphi_1 + \varphi_2)}$, i.e. complex multiplication, indeed, means multiplying absolute values and adding arguments.
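The geometric reading of complex multiplication can be tried out directly. The following sketch assumes Python's `cmath.polar` and `cmath.rect` for converting between Cartesian and polar coordinates (note that `cmath.polar` reports the argument in $[-\pi, \pi]$ rather than $[0, 2\pi[$); `polar_product` is a hypothetical helper name.

```python
import cmath


def polar_product(z1: complex, z2: complex) -> complex:
    """Multiply z1 and z2 via polar form: multiply moduli, add arguments."""
    r1, phi1 = cmath.polar(z1)   # (modulus, argument) of z1
    r2, phi2 = cmath.polar(z2)   # (modulus, argument) of z2
    # cmath.rect(r, phi) returns r * e^{i*phi}
    return cmath.rect(r1 * r2, phi1 + phi2)
```

Up to rounding, `polar_product(z1, z2)` agrees with the ordinary product `z1 * z2`; the sum of the two arguments may leave $[-\pi, \pi]$, which is harmless since the argument is only determined up to multiples of $2\pi$.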
Corollary 8.30. If $z \in \mathbb{C}$, then $|z| = 1$ holds if, and only if, there exists $\varphi \in \mathbb{R}$ such that $z = e^{i\varphi}$; in other words, the map
$$f : \mathbb{R} \to \{z \in \mathbb{C} : |z| = 1\}, \quad f(\varphi) := e^{i\varphi}, \tag{8.55}$$
is surjective. Moreover, $f(\varphi_1) = f(\varphi_2)$ holds if, and only if, $\varphi_1 - \varphi_2 = 2\pi k$ for some $k \in \mathbb{Z}$.

Proof. Everything is immediate from Th. 8.28. ∎

Corollary 8.31 (Roots of Unity). For each $n \in \mathbb{N}$, the equation $z^n = 1$ has precisely $n$ distinct solutions $\zeta_1, \dots, \zeta_n \in \mathbb{C}$, where
$$\forall\, k = 1, \dots, n: \quad \zeta_k := e^{k 2\pi i / n} \overset{(8.46a)}{=} \cos \frac{k 2\pi}{n} + i \sin \frac{k 2\pi}{n} = \zeta_1^k. \tag{8.56}$$
The numbers $\zeta_1, \dots, \zeta_n$ defined in (8.56) are called the $n$th roots of unity.

Proof. It is $\zeta_k^n = e^{k 2\pi i} = 1$ for each $k \in \{1, \dots, n\}$ and the $\zeta_1, \dots, \zeta_n$ are all distinct by Cor. 8.30, since, for $k, l \in \{1, \dots, n\}$ with $k \ne l$, $(k - l)/n \notin \mathbb{Z}$. As $\zeta_1, \dots, \zeta_n$ are $n$ distinct zeros of the polynomial $P : \mathbb{C} \to \mathbb{C}$, $P(z) := z^n - 1$, and $P$ has at most $n$ zeros by Th. 6.6(a), $\zeta_1, \dots, \zeta_n$ constitute all solutions to $z^n = 1$. ∎
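Formula (8.56) translates directly into a short computation. The sketch below is illustrative only: `roots_of_unity` is a hypothetical helper, and the roots are written `zetas` here as a naming assumption.

```python
import cmath


def roots_of_unity(n: int) -> list[complex]:
    """Return [zeta_1, ..., zeta_n] with zeta_k = exp(2*pi*i*k/n), cf. (8.56)."""
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(1, n + 1)]


zetas = roots_of_unity(6)
# Each zeta_k should solve z^6 = 1 and equal the k-th power of zeta_1
# (both up to floating-point rounding).
residuals = [abs(z ** 6 - 1) for z in zetas]
power_gaps = [abs(zetas[k - 1] - zetas[0] ** k) for k in range(1, 7)]
```

Plotting `zetas` in the plane would show the vertices of a regular hexagon on the unit circle, with the last root being $1$ itself (up to rounding).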

We are now in a position to prove one of the central results of analysis and algebra, namely the fundamental theorem of algebra. The following proof does not need any tools beyond the ones provided by this class; it is actually mainly founded on continuous functions attaining a min and a max on compact sets according to Th. 7.54 and the existence of $n$th roots of unity according to Cor. 8.31.

Theorem 8.32 (Fundamental Theorem of Algebra). Every polynomial $P : \mathbb{C} \to \mathbb{C}$, $P(z) := \sum_{j=0}^{n} a_j z^j$, of degree $n \ge 1$ (i.e. $a_0, \dots, a_n \in \mathbb{C}$ with $a_n \ne 0$) has at least one zero $z_0 \in \mathbb{C}$.

Proof. Dividing the equation $P(z) = 0$ by $a_n \ne 0$, it suffices to consider the case $a_n = 1$. We therefore assume
$$\forall\, z \in \mathbb{C}: \quad P(z) = z^n + a_{n-1} z^{n-1} + \dots + a_1 z + a_0.$$

Claim 1. The function $|P|$ attains its global min on $\mathbb{C}$, i.e. there exists $z_0 \in \mathbb{C}$ such that $|P|$ is minimal in $z_0$.

Proof. We first note
$$\forall\, z \ne 0: \quad P(z) = z^n \left(1 + r(z)\right), \quad\text{where}\quad r(z) := \frac{a_{n-1}}{z} + \dots + \frac{a_0}{z^n}.$$
Set $M := |a_0| + \dots + |a_{n-1}|$ and $R := \max\{1, 2M\}$. Then
$$\forall\, |z| \ge R: \quad |r(z)| \overset{|z| \ge 1}{\le} \frac{M}{|z|} \overset{|z| \ge 2M}{\le} \frac{1}{2}$$
and, thus,
$$\forall\, |z| \ge R: \quad |P(z)| = |z|^n\, |1 + r(z)| \ge \frac{|z|^n}{2} \ge M.$$
This estimate together with $|P(0)| = |a_0| \le M$ shows that the min of $|P|$ on the compact disk $\overline{B}_R(0)$ (see Ex. 7.47(a)) (such a min $z_0 \in \overline{B}_R(0)$ exists due to Th. 7.54) must be the global min of $|P|$ on $\mathbb{C}$. ▲

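Claim 2. If $|P|$ has a min in $z_0 \in \mathbb{C}$, then $P(z_0) = 0$.

Proof. Proceeding by contraposition, we assume $P(z_0) \ne 0$ and show that $|P|$ does not have a min in $z_0$. We need to construct $z_1 \in \mathbb{C}$ such that $|P(z_1)| < |P(z_0)|$. To this end, define
$$p : \mathbb{C} \to \mathbb{C}, \quad p(z) := \frac{P(z_0 + z)}{P(z_0)}.$$
Then $p$ is still a polynomial of degree $n$. Since $p(0) = 1$,
$$\exists\, k \in \{1, \dots, n\}\ \ \exists\, b_k, \dots, b_n \in \mathbb{C}\ \ \forall\, z \in \mathbb{C}: \quad p(z) = 1 + \sum_{j=k}^{n} b_j z^j, \quad b_k \ne 0.$$
Write $-b_k^{-1}$ in polar form, i.e. $-b_k^{-1} = r e^{i\varphi}$ with $r \in \mathbb{R}^+$ and $\varphi \in \mathbb{R}$. Define
$$\beta := \sqrt[k]{r}\, e^{i\varphi/k}, \quad\text{i.e.}\quad \beta^k = r e^{i\varphi} = -b_k^{-1},$$
and
$$q : \mathbb{C} \to \mathbb{C}, \quad q(z) := p(\beta z) = 1 + b_k \beta^k z^k + \sum_{j=k+1}^{n} b_j \beta^j z^j = 1 - z^k + z^{k+1} S(z),$$
where $S$ is the polynomial
$$S : \mathbb{C} \to \mathbb{C}, \quad S(z) := \sum_{j=0}^{n-k-1} b_{k+1+j}\, \beta^{k+1+j} z^j \quad (S \equiv 0 \text{ in case } k = n).$$
Then, according to Th. 7.54,
$$\exists\, C \in \mathbb{R}^+\ \ \forall\, z \in \overline{B}_1(0): \quad |S(z)| \le C.$$
Letting
$$c := \min\{1, C^{-1}\},$$
one obtains
$$\forall\, 0 < |z| < c: \quad \left|z^{k+1} S(z)\right| \le C\, |z|^{k+1} < |z|^k$$
and, thus,
$$\forall\, x \in\ ]0, c[\,: \quad |q(x)| \le 1 - x^k + \left|x^{k+1} S(x)\right| < 1 - x^k + x^k = 1.$$
Thus, finally,
$$\forall\, x \in\ ]0, c[\,: \quad \frac{|P(z_0 + \beta x)|}{|P(z_0)|} = |p(\beta x)| = |q(x)| < 1,$$
showing $|P|$ does not have a min in $z_0$. ▲

Combining Claims 1 and 2 completes the proof of the theorem. ∎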

Corollary 8.33. For every polynomial $P : \mathbb{C} \to \mathbb{C}$ of degree $n \ge 1$, there exist numbers $c, \zeta_1, \dots, \zeta_n \in \mathbb{C}$ such that
$$P(z) = c \prod_{j=1}^{n} (z - \zeta_j) = c\,(z - \zeta_1)(z - \zeta_2) \cdots (z - \zeta_n) \tag{8.57}$$
(the $\zeta_1, \dots, \zeta_n$ are precisely all the zeros of $P$, some or all of which might be identical).

Proof. One just combines Th. 8.32 with Rem. 6.7. ∎

9 Differential Calculus

9.1 Definition of Differentiability and Rules


The basic idea of differential calculus is to locally approximate nonlinear functions $f$ by linear functions. In our case, $f$ will be defined on a subset $M$ of $\mathbb{R}$ and, given $\xi \in M$ and $\mathbb{R}$-valued $f$, we will investigate the question if we can define a number $f'(\xi) \in \mathbb{R}$ that represents the slope of the graph of $f$ at $\xi$ such that the line through $(\xi, f(\xi))$ with slope $f'(\xi)$ (called the tangent of $f$ in $\xi$) can be considered as a local approximation of the graph of $f$.
If such a local approximation of $f$ in $\xi$ is at all reasonable, then, for $x \ne \xi$,
$$\frac{f(x) - f(\xi)}{x - \xi}$$
should provide good approximations of $f'(\xi)$ if $x$ tends to $\xi$. This leads to the following Def. 9.1, where we also allow $\mathbb{C}$-valued functions (while the above-described geometric interpretation only works for $\mathbb{R}$-valued functions, it can be applied to both the real and the imaginary parts of a $\mathbb{C}$-valued function, cf. Rem. 9.2 below); but note that we do not consider differentiability of functions $f : \mathbb{C} \to \mathbb{C}$, which would lead to the notion of complex differentiability or holomorphicity, which is studied in the field of Complex Analysis and is beyond the scope of this class.

Definition 9.1. Let $a < b$, $f : \ ]a, b[\ \to \mathbb{K}$ ($a = -\infty$, $b = \infty$ is admissible), and $\xi \in\ ]a, b[$. Then $f$ is said to be differentiable at $\xi$ if, and only if, the following limit in (9.1) exists in the sense of Def. 8.17 (where $x \mapsto \frac{f(x) - f(\xi)}{x - \xi}$ plays the role of $x \mapsto f(x)$ in Def. 8.17). The limit is then called the derivative of $f$ in $\xi$. Many symbols are used in the literature to denote derivatives, the following provides a selection:
$$f'(\xi) := \partial_x f(\xi) := \frac{df(\xi)}{dx} := \lim_{x \to \xi} \frac{f(x) - f(\xi)}{x - \xi} = \lim_{h \to 0} \frac{f(\xi + h) - f(\xi)}{h}. \tag{9.1}$$
Note both limits occurring in (9.1) are, indeed, identical, since the sequence $(x_k)_{k \in \mathbb{N}}$ in $]a, b[\ \setminus \{\xi\}$ converges to $\xi$ if, and only if, the sequence $(h_k)_{k \in \mathbb{N}}$ with $h_k := x_k - \xi$ converges to $0$. The number in (9.1) (if it exists) is also called a differential quotient, whereas $\frac{f(x) - f(\xi)}{x - \xi}$ is known as a difference quotient.
$f$ is called differentiable if, and only if, it is differentiable at each $\xi \in\ ]a, b[$. In that case, one calls the function
$$f' : \ ]a, b[\ \to \mathbb{K}, \quad x \mapsto f'(x), \tag{9.2}$$
the derivative of $f$.
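To see the limit in (9.1) at work, one can tabulate difference quotients for shrinking $h$. This is a minimal sketch, assuming $f = \exp$ and $\xi = 0$ as an arbitrary test case (where the derivative is known to be $1$); `difference_quotient` is a hypothetical helper name, and floating-point arithmetic only approximates the limit.

```python
import math


def difference_quotient(f, xi: float, h: float) -> float:
    """Return (f(xi + h) - f(xi)) / h, the difference quotient from (9.1)."""
    return (f(xi + h) - f(xi)) / h


# Difference quotients of exp at xi = 0 for h = 10^-1, ..., 10^-6.
quotients = [difference_quotient(math.exp, 0.0, 10.0 ** (-k)) for k in range(1, 7)]
errors = [abs(q - 1.0) for q in quotients]
```

The entries of `errors` shrink roughly in proportion to $h$. For much smaller $h$, rounding errors in the numerator would eventually dominate, which is a limitation of the numerical experiment, not of the definition.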
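Remark 9.2. In the situation of Def. 9.1, the complex-valued function $f : \ ]a, b[\ \to \mathbb{C}$ is differentiable at $\xi \in\ ]a, b[$ if, and only if, both functions $\operatorname{Re} f, \operatorname{Im} f : \ ]a, b[\ \to \mathbb{R}$ are differentiable at $\xi$, and, in that case
$$f'(\xi) = (\operatorname{Re} f)'(\xi) + i\, (\operatorname{Im} f)'(\xi). \tag{9.3}$$
Indeed, we merely have to note
$$\forall\, x, \xi \in\ ]a, b[,\ x \ne \xi: \quad \frac{f(x) - f(\xi)}{x - \xi} = \frac{\operatorname{Re} f(x) - \operatorname{Re} f(\xi)}{x - \xi} + i\, \frac{\operatorname{Im} f(x) - \operatorname{Im} f(\xi)}{x - \xi} \tag{9.4}$$
and that, by (7.2), a sequence $(z_n)_{n \in \mathbb{N}}$ in $\mathbb{C}$ converges to $\zeta \in \mathbb{C}$ if, and only if, both $\lim_{n \to \infty} \operatorname{Re} z_n = \operatorname{Re} \zeta$ and $\lim_{n \to \infty} \operatorname{Im} z_n = \operatorname{Im} \zeta$ hold.
Definition 9.3. If $f : \ ]a, b[\ \to \mathbb{R}$ as in Def. 9.1 is differentiable at $\xi \in\ ]a, b[$, then the graph of the affine function
$$L : \mathbb{R} \to \mathbb{R}, \quad L(x) := f(\xi) + f'(\xi)(x - \xi), \tag{9.5}$$
i.e. the line through $(\xi, f(\xi))$ with slope $f'(\xi)$ is called the tangent to the graph of $f$ at $\xi$.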
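Theorem 9.4. If $f : \ ]a, b[\ \to \mathbb{K}$ as in Def. 9.1 is differentiable at $\xi \in\ ]a, b[$, then it is continuous at $\xi$. In particular, if $f$ is everywhere differentiable, then it is everywhere continuous.

Proof. Let $(x_k)_{k \in \mathbb{N}}$ be a sequence in $]a, b[\ \setminus \{\xi\}$ such that $\lim_{k \to \infty} x_k = \xi$. Then
$$\lim_{k \to \infty} \left(f(x_k) - f(\xi)\right) = \lim_{k \to \infty} \frac{(x_k - \xi)\left(f(x_k) - f(\xi)\right)}{x_k - \xi} = 0 \cdot f'(\xi) = 0, \tag{9.6}$$
proving the continuity of $f$ in $\xi$. ∎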
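Example 9.5. (a) For each $a, b \in \mathbb{K}$, the affine function $f : \mathbb{R} \to \mathbb{K}$, $f(x) := ax + b$, is differentiable with $f'(x) = a$ for each $x \in \mathbb{R}$: If $x \in \mathbb{R}$ and $(h_k)_{k \in \mathbb{N}}$ is a sequence with $h_k \ne 0$ such that $\lim_{k \to \infty} h_k = 0$, then
$$\lim_{k \to \infty} \frac{f(x + h_k) - f(x)}{h_k} = \lim_{k \to \infty} \frac{a(x + h_k) + b - ax - b}{h_k} = \lim_{k \to \infty} \frac{a h_k}{h_k} = a. \tag{9.7}$$
In particular, each constant function $f \equiv b$ has derivative $f' \equiv 0$.

(b) For each $c \in \mathbb{K}$, the function $f : \mathbb{R} \to \mathbb{K}$, $f(x) := e^{cx}$, is differentiable with $f'(x) = c\, e^{cx}$ for each $x \in \mathbb{R}$ (in particular, $c = 1$ yields $f'(x) = e^x$ for $f(x) = e^x$, and $c = \ln a$ yields $f'(x) = (\ln a)\, a^x$ for $f(x) = a^x = e^{x \ln a}$, $a \in \mathbb{R}^+$): The case $c = 0$ was treated in (a). Thus, let $c \ne 0$. If $x \in \mathbb{R}$ and $(h_k)_{k \in \mathbb{N}}$ is a sequence with $h_k \ne 0$ such that $\lim_{k \to \infty} h_k = 0$, then
$$\lim_{k \to \infty} \frac{f(x + h_k) - f(x)}{h_k} = \lim_{k \to \infty} \frac{e^{cx + c h_k} - e^{cx}}{h_k} = c\, e^{cx} \lim_{k \to \infty} \frac{e^{c h_k} - 1}{c h_k} \overset{(8.32)}{=} c\, e^{cx}. \tag{9.8}$$

(c) The sine and the cosine function $f, g : \mathbb{R} \to \mathbb{R}$, $f(x) := \sin x$, $g(x) := \cos x$, are differentiable with $f'(x) = \cos x$ and $g'(x) = -\sin x$ for each $x \in \mathbb{R}$: If $x \in \mathbb{R}$ and $(h_k)_{k \in \mathbb{N}}$ is a sequence with $h_k \ne 0$ such that $\lim_{k \to \infty} h_k = 0$, then
$$\lim_{k \to \infty} \frac{f(x + h_k) - f(x)}{h_k} = \lim_{k \to \infty} \frac{\sin(x + h_k) - \sin x}{h_k} \overset{(8.44c)}{=} \lim_{k \to \infty} \frac{\sin x \cos h_k + \cos x \sin h_k - \sin x}{h_k}$$
$$= \sin x \lim_{k \to \infty} \frac{h_k (\cos h_k - 1)}{h_k^2} + \cos x \lim_{k \to \infty} \frac{\sin h_k}{h_k} \overset{(8.44j)}{=} (\sin x) \cdot 0 \cdot \left(-\frac{1}{2}\right) + (\cos x) \cdot 1 = \cos x. \tag{9.9}$$
The proof of $g'(x) = -\sin x$ is left as an exercise.

(d) The absolute value function $f : \mathbb{R} \to \mathbb{R}$, $f(x) := |x|$, is not differentiable at $\xi = 0$:
$$\lim_{n \to \infty} \frac{f(0 + \frac{1}{n}) - f(0)}{\frac{1}{n}} = \lim_{n \to \infty} \frac{\frac{1}{n}}{\frac{1}{n}} = 1, \tag{9.10a}$$
$$\lim_{n \to \infty} \frac{f(0 - \frac{1}{n}) - f(0)}{-\frac{1}{n}} = \lim_{n \to \infty} \frac{\frac{1}{n}}{-\frac{1}{n}} = -1, \tag{9.10b}$$
showing that $\frac{f(0 + h) - f(0)}{h}$ does not have a limit for $h \to 0$.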
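Theorem 9.6. Let $a < b$, $f, g : \ ]a, b[\ \to \mathbb{K}$ ($a = -\infty$, $b = \infty$ is admissible), and $\xi \in\ ]a, b[$. Assume $f$ and $g$ are differentiable at $\xi$.

(a) For each $\lambda \in \mathbb{K}$, $\lambda f$ is differentiable at $\xi$ and $(\lambda f)'(\xi) = \lambda f'(\xi)$.

(b) $f + g$ is differentiable at $\xi$ and $(f + g)'(\xi) = f'(\xi) + g'(\xi)$.

(c) Product Rule: $fg$ is differentiable at $\xi$ and $(fg)'(\xi) = f'(\xi) g(\xi) + f(\xi) g'(\xi)$.

(d) Quotient Rule: If $g(\xi) \ne 0$, then $f/g$ is differentiable at $\xi$ and
$$(f/g)'(\xi) = \frac{f'(\xi) g(\xi) - f(\xi) g'(\xi)}{(g(\xi))^2}, \quad\text{in particular}\quad (1/g)'(\xi) = -\frac{g'(\xi)}{(g(\xi))^2}.$$

Proof. Let $(h_k)_{k \in \mathbb{N}}$ be a sequence with $h_k \ne 0$ such that $\lim_{k \to \infty} h_k = 0$.
For (a), one computes
$$\lim_{k \to \infty} \frac{(\lambda f)(\xi + h_k) - (\lambda f)(\xi)}{h_k} = \lim_{k \to \infty} \frac{\lambda f(\xi + h_k) - \lambda f(\xi)}{h_k} = \lambda \lim_{k \to \infty} \frac{f(\xi + h_k) - f(\xi)}{h_k} = \lambda f'(\xi).$$
For (b), one computes
$$\lim_{k \to \infty} \frac{(f + g)(\xi + h_k) - (f + g)(\xi)}{h_k} = \lim_{k \to \infty} \frac{f(\xi + h_k) - f(\xi) + g(\xi + h_k) - g(\xi)}{h_k}$$
$$= \lim_{k \to \infty} \frac{f(\xi + h_k) - f(\xi)}{h_k} + \lim_{k \to \infty} \frac{g(\xi + h_k) - g(\xi)}{h_k} = f'(\xi) + g'(\xi).$$
For (c), one computes
$$\lim_{k \to \infty} \frac{(fg)(\xi + h_k) - (fg)(\xi)}{h_k} = \lim_{k \to \infty} \frac{f(\xi + h_k) g(\xi + h_k) - f(\xi) g(\xi + h_k) + f(\xi) g(\xi + h_k) - f(\xi) g(\xi)}{h_k}$$
$$= \lim_{k \to \infty} g(\xi + h_k) \lim_{k \to \infty} \frac{f(\xi + h_k) - f(\xi)}{h_k} + f(\xi) \lim_{k \to \infty} \frac{g(\xi + h_k) - g(\xi)}{h_k} = f'(\xi) g(\xi) + f(\xi) g'(\xi),$$
where, in the last equality, we used the continuity of $g$ in $\xi$ according to Th. 9.4.
For (d), one first proves the special case $f \equiv 1$ by
$$\lim_{k \to \infty} \frac{(1/g)(\xi + h_k) - (1/g)(\xi)}{h_k} = \lim_{k \to \infty} \frac{g(\xi) - g(\xi + h_k)}{g(\xi + h_k)\, g(\xi)\, h_k} = -\frac{g'(\xi)}{(g(\xi))^2},$$
which implies the general case using (c):
$$(f/g)'(\xi) = \left(f \cdot \frac{1}{g}\right)'(\xi) = \frac{f'(\xi)}{g(\xi)} - \frac{f(\xi) g'(\xi)}{(g(\xi))^2} = \frac{f'(\xi) g(\xi) - f(\xi) g'(\xi)}{(g(\xi))^2},$$
completing the proof. ∎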
Example 9.7. (a) Each polynomial is differentiable and the derivative is, again, a polynomial. More precisely,
$$P : \mathbb{R} \to \mathbb{K},\ P(x) = \sum_{j=0}^{n} a_j x^j,\ a_j \in \mathbb{K} \quad\Rightarrow\quad P' : \mathbb{R} \to \mathbb{K},\ P'(x) = \sum_{j=1}^{n} j\, a_j x^{j-1}: \tag{9.11}$$
The cases $n = 0, 1$ are provided by Ex. 9.5(a). To complete the induction proof of (9.11), we carry out the induction step for each $n \in \mathbb{N}$: Writing $P(x) = \sum_{j=0}^{n} a_j x^j + a_{n+1}\, x \cdot x^n$ and applying the induction hypothesis as well as the rules of Th. 9.6 yields
$$P'(x) = \sum_{j=1}^{n} j\, a_j x^{j-1} + a_{n+1} \left(1 \cdot x^n + x \cdot n\, x^{n-1}\right) = \sum_{j=1}^{n+1} j\, a_j x^{j-1},$$
which establishes the case.

(b) Clearly, the derivatives of rational functions $P/Q$ with polynomials $P$ and $Q$ can be computed from (9.11) and the quotient rule of Th. 9.6(d).

(c) The functions tan and cot as defined in (8.48) and restricted to $\mathbb{R} \setminus \cos^{-1}\{0\}$ and $\mathbb{R} \setminus \sin^{-1}\{0\}$, respectively, are differentiable and one obtains
$$\tan' : \mathbb{R} \setminus \{(2k+1)\tfrac{\pi}{2} : k \in \mathbb{Z}\} \to \mathbb{R}, \quad \tan' x = \frac{1}{(\cos x)^2} = 1 + (\tan x)^2, \tag{9.12a}$$
$$\cot' : \mathbb{R} \setminus \{k\pi : k \in \mathbb{Z}\} \to \mathbb{R}, \quad \cot' x = -\frac{1}{(\sin x)^2} = -\left(1 + (\cot x)^2\right): \tag{9.12b}$$
One merely needs the derivatives of sin and cos from Ex. 9.5(c) and the quotient rule of Th. 9.6(d):
$$\tan' x = \frac{\cos x \cos x - \sin x\, (-\sin x)}{(\cos x)^2} \overset{(8.44e)}{=} \frac{1}{(\cos x)^2} \overset{(8.44e)}{=} 1 + (\tan x)^2,$$
$$\cot' x = \frac{-\sin x \sin x - \cos x \cos x}{(\sin x)^2} \overset{(8.44e)}{=} -\frac{1}{(\sin x)^2} \overset{(8.44e)}{=} -\left(1 + (\cot x)^2\right).$$
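Formula (9.11) acts on the coefficient list alone, which makes it easy to mechanize. The following sketch assumes a polynomial is stored as the list `[a_0, ..., a_n]` (a representation chosen here for illustration); `poly_derivative` and `poly_eval` are hypothetical helper names.

```python
def poly_derivative(coeffs: list[float]) -> list[float]:
    """Map the coefficients [a_0, ..., a_n] of P to those of P', cf. (9.11)."""
    # The coefficient of x^(j-1) in P' is j * a_j; the j = 0 term drops out.
    return [j * a for j, a in enumerate(coeffs)][1:]


def poly_eval(coeffs: list[float], x: float) -> float:
    """Evaluate sum_j a_j x^j using Horner's scheme."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result
```

For example, $P(x) = 5 + 3x^2 + 2x^3$ is stored as `[5, 0, 3, 2]`, and `poly_derivative` returns the coefficients of $P'(x) = 6x + 6x^2$. A constant polynomial yields the empty list, which `poly_eval` treats as the zero polynomial.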
Theorem 9.8 (Derivative of Inverse Functions). Let $a < b$, $I :=\ ]a, b[$ ($a = -\infty$, $b = \infty$ is admissible). If $f : I \to \mathbb{R}$ is differentiable and strictly increasing (resp. decreasing), then $f$ has a continuous, strictly increasing (resp. decreasing) inverse function $f^{-1}$ defined on the interval $J := f(I)$, i.e. $f^{-1} : J \to I$, and, for each $\xi \in I$ with $f'(\xi) \ne 0$, $f^{-1}$ is differentiable at $\eta := f(\xi)$ with
$$(f^{-1})'(\eta) = \frac{1}{f'(\xi)} = \frac{1}{f'\left(f^{-1}(\eta)\right)}. \tag{9.13}$$

Proof. As a differentiable function, $f$ is continuous by Th. 9.4, i.e. Th. 7.60 provides all the present assertions, except differentiability at $\eta$ and (9.13). Let $(y_k)_{k \in \mathbb{N}}$ be a sequence in $J \setminus \{\eta\}$ such that $\lim_{k \to \infty} y_k = \eta$. Then, as $f^{-1}$ is bijective and continuous, $(f^{-1}(y_k))_{k \in \mathbb{N}}$ is a sequence in $I \setminus \{\xi\}$ such that $\lim_{k \to \infty} f^{-1}(y_k) = \xi$, and one obtains
$$\lim_{k \to \infty} \frac{f^{-1}(y_k) - f^{-1}(\eta)}{y_k - \eta} = \lim_{k \to \infty} \frac{f^{-1}(y_k) - f^{-1}(\eta)}{f\left(f^{-1}(y_k)\right) - f\left(f^{-1}(\eta)\right)} = \frac{1}{f'\left(f^{-1}(\eta)\right)}, \tag{9.14}$$
establishing the case. ∎

Example 9.9. (a) The function $\ln : \mathbb{R}^+ \to \mathbb{R}$ is differentiable and, for each $x \in \mathbb{R}^+$, $\ln' x = 1/x$: If $f(x) = e^x$, then $f'(x) = e^x \ne 0$ for each $x \in \mathbb{R}$, $\ln x = f^{-1}(x)$, and (9.13) yields
$$\ln' x = \frac{1}{f'(\ln x)} = \frac{1}{e^{\ln x}} = \frac{1}{x}.$$
(b) The function $\arcsin : \ ]-1, 1[\ \to\ ]-\pi/2, \pi/2[$ is differentiable and, for each $x \in\ ]-1, 1[$, $\arcsin' x = 1/\sqrt{1 - x^2}$: If $f(x) = \sin x$, then $f'(x) = \cos x \ne 0$ for each $x \in\ ]-\pi/2, \pi/2[$, $\arcsin x = f^{-1}(x)$, and (9.13) yields
$$\arcsin' x = \frac{1}{f'(\arcsin x)} = \frac{1}{\cos \arcsin x} \overset{(*)}{=} \frac{1}{\sqrt{1 - (\sin \arcsin x)^2}} = \frac{1}{\sqrt{1 - x^2}},$$
where, at $(*)$, it was used that $\cos^2 = 1 - \sin^2$ and $\cos t > 0$ for each $t \in\ ]-\pi/2, \pi/2[$.
(c) The function $\arccos : \ ]-1, 1[\ \to\ ]0, \pi[$ is differentiable and, for each $x \in\ ]-1, 1[$, $\arccos' x = -1/\sqrt{1 - x^2}$: If $f(x) = \cos x$, then $f'(x) = -\sin x \ne 0$ for each $x \in\ ]0, \pi[$, $\arccos x = f^{-1}(x)$, and (9.13) yields
$$\arccos' x = \frac{1}{f'(\arccos x)} = \frac{-1}{\sin \arccos x} \overset{(*)}{=} \frac{-1}{\sqrt{1 - (\cos \arccos x)^2}} = -\frac{1}{\sqrt{1 - x^2}},$$
where, at $(*)$, it was used that $\sin^2 = 1 - \cos^2$ and $\sin t > 0$ for each $t \in\ ]0, \pi[$.
(d) The function $\arctan : \mathbb{R} \to\ ]-\pi/2, \pi/2[$ is differentiable and, for each $x \in \mathbb{R}$, $\arctan' x = 1/(1 + x^2)$: Apply Th. 9.8 with $f(x) = \tan x$ as an exercise.
(e) The function $\operatorname{arccot} : \mathbb{R} \to\ ]0, \pi[$ is differentiable and, for each $x \in \mathbb{R}$, $\operatorname{arccot}' x = -1/(1 + x^2)$: Apply Th. 9.8 with $f(x) = \cot x$ as an exercise.
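Formula (9.13) can be evaluated without knowing a closed form for the derivative of the inverse. The sketch below assumes the caller supplies $f'$ and $f^{-1}$ as Python callables; `inverse_derivative` is a hypothetical helper, and the sample point $x = 0.3$ is arbitrary.

```python
import math


def inverse_derivative(f_prime, f_inverse, eta: float) -> float:
    """Evaluate (f^{-1})'(eta) = 1 / f'(f^{-1}(eta)), cf. (9.13)."""
    return 1.0 / f_prime(f_inverse(eta))


x = 0.3
# f = sin on ]-pi/2, pi/2[, so f' = cos and f^{-1} = arcsin.
via_rule = inverse_derivative(math.cos, math.asin, x)
closed_form = 1.0 / math.sqrt(1.0 - x * x)   # arcsin' x from part (b)
```

The two values agree up to rounding, mirroring step $(*)$ in part (b); the same helper applied to `math.exp` and `math.log` reproduces $\ln' x = 1/x$ from part (a).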
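Theorem 9.10 (Chain Rule). Let $a < b$, $c < d$, $f : \ ]a, b[\ \to \mathbb{R}$, $g : \ ]c, d[\ \to \mathbb{K}$, $f(]a, b[) \subseteq\ ]c, d[$ ($a, c = -\infty$; $b, d = \infty$ is admissible). If $f$ is differentiable in $\xi \in\ ]a, b[$ and $g$ is differentiable in $f(\xi) \in\ ]c, d[$, then $g \circ f : \ ]a, b[\ \to \mathbb{K}$ is differentiable in $\xi$ and
$$(g \circ f)'(\xi) = f'(\xi)\, g'(f(\xi)). \tag{9.15}$$

Proof. Let $\eta := f(\xi)$ and define the auxiliary function
$$\tilde{g} : \ ]c, d[\ \to \mathbb{K}, \quad \tilde{g}(x) := \begin{cases} \dfrac{g(x) - g(\eta)}{x - \eta} & \text{for } x \ne \eta, \\[1ex] g'(\eta) & \text{for } x = \eta. \end{cases} \tag{9.16}$$
Then
$$\forall\, x \in\ ]c, d[\,: \quad g(x) - g(\eta) = \tilde{g}(x)(x - \eta). \tag{9.17}$$
Let $(x_k)_{k \in \mathbb{N}}$ be a sequence in $]a, b[\ \setminus \{\xi\}$ such that $\lim_{k \to \infty} x_k = \xi$. One obtains
$$\lim_{k \to \infty} \frac{g\left(f(x_k)\right) - g\left(f(\xi)\right)}{x_k - \xi} \overset{(9.17)}{=} \lim_{k \to \infty} \frac{\tilde{g}\left(f(x_k)\right) \left(f(x_k) - f(\xi)\right)}{x_k - \xi} = \lim_{k \to \infty} \tilde{g}\left(f(x_k)\right) \lim_{k \to \infty} \frac{f(x_k) - f(\xi)}{x_k - \xi} = f'(\xi)\, g'(f(\xi)), \tag{9.18}$$
establishing the case. ∎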


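Example 9.11. (a) According to the chain rule of Th. 9.10, the function $h : \mathbb{R} \to \mathbb{R}$, $h(x) := \sin(x^3)$ is differentiable and, for each $x \in \mathbb{R}$, $h'(x) = 3x^2 \cos(x^3)$.
(b) According to the chain rule of Th. 9.10, each power function $h : \mathbb{R}^+ \to \mathbb{K}$, $h(x) := x^\alpha = e^{\alpha \ln x}$, $\alpha \in \mathbb{K}$, is differentiable and, for each $x \in \mathbb{R}^+$, $h'(x) = \frac{\alpha}{x}\, e^{\alpha \ln x} = \alpha\, x^{\alpha - 1}$. Indeed, $h = g \circ f$, where $f : \mathbb{R}^+ \to \mathbb{R}$, $f(x) := \ln x$ with $f' : \mathbb{R}^+ \to \mathbb{R}$, $f'(x) = x^{-1}$, according to Ex. 9.9(a), and $g : \mathbb{R} \to \mathbb{K}$, $g(x) := e^{\alpha x}$, with $g' : \mathbb{R} \to \mathbb{K}$, $g'(x) = \alpha\, e^{\alpha x}$ according to Ex. 9.5(b).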
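The derivative claimed in part (a) can be compared against a numerical approximation. This is a minimal sketch, assuming a symmetric (central) difference quotient as the approximation scheme, which is a common numerical choice and not part of the notes; all function names and the step size are illustrative.

```python
import math


def h(x: float) -> float:
    """h(x) = sin(x^3) from Ex. 9.11(a)."""
    return math.sin(x ** 3)


def h_prime(x: float) -> float:
    """Chain rule result: f'(x) * g'(f(x)) with f(x) = x^3, g = sin."""
    return 3 * x ** 2 * math.cos(x ** 3)


def central_difference(func, x: float, step: float = 1e-6) -> float:
    """Approximate func'(x) by (func(x + step) - func(x - step)) / (2 * step)."""
    return (func(x + step) - func(x - step)) / (2 * step)


gap = abs(central_difference(h, 1.2) - h_prime(1.2))
```

A small `gap` supports the formula for $h'$ at the sample point; it does not replace the chain rule argument.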

9.2 Higher Order Derivatives and the Sets $C^k$

Definition 9.12. Let $a < b$, $I :=\ ]a, b[$, $f : I \to \mathbb{K}$ ($a = -\infty$, $b = \infty$ is admissible). If $f$ is differentiable, then $f'$ might or might not itself be differentiable. If $f'$ is differentiable, then its derivative is denoted by $f''$ and is called the second derivative of $f$. Clearly, this process can be iterated, leading to the following general recursive definition of higher-order derivatives:
Let $f^{(0)} := f$. For $k \in \mathbb{N}_0$ assume the $k$th derivative of $f$, denoted by $f^{(k)}$, exists on $I$. Then $f$ is said to have a derivative of order $k + 1$ at $\xi \in I$ if, and only if, $f^{(k)}$ is differentiable at $\xi$. In that case, define
$$f^{(k+1)}(\xi) := (f^{(k)})'(\xi). \tag{9.19}$$
If $f^{(k+1)}(\xi)$ exists for all $\xi \in I$, then $f$ is said to be $(k + 1)$-times differentiable and the function $f^{(k+1)} : I \to \mathbb{K}$, $x \mapsto f^{(k+1)}(x)$, is called the $(k + 1)$st derivative of $f$. It is common to write $f' := f^{(1)}$, $f'' := f^{(2)}$, $f''' := f^{(3)}$, but $f^{(k)}$ if $k \ge 4$.
If $f^{(k)}$ exists, it might or might not be continuous (cf. Ex. 9.13(c) below). One defines
$$\forall\, k \in \mathbb{N}_0: \quad C^k(I, \mathbb{K}) := \left\{ f \in \mathcal{F}(I, \mathbb{K}) : f^{(k)} \text{ exists and is continuous on } I \right\}, \tag{9.20}$$
$$C^\infty(I, \mathbb{K}) := \bigcap_{k \in \mathbb{N}_0} C^k(I, \mathbb{K}) \tag{9.21}$$
(note $C^0(I, \mathbb{K}) = C(I, \mathbb{K})$ and $C(I, \mathbb{K}) \supseteq C^1(I, \mathbb{K}) \supseteq C^2(I, \mathbb{K}) \supseteq \dots$). Finally, we define the notation $C^k(I) := C^k(I, \mathbb{R})$ for $k \in \mathbb{N}_0 \cup \{\infty\}$.
Example 9.13. (a) One has $\sin \in C^\infty(\mathbb{R})$ with $\sin' = \cos$, $\sin'' = -\sin$, $\sin''' = -\cos$, $\sin^{(4)} = \sin$, ...
(b) A simple induction shows, for each polynomial $P : \mathbb{R} \to \mathbb{K}$, $P(x) = \sum_{j=0}^{n} a_j x^j$, $a_j \in \mathbb{K}$, $n \in \mathbb{N}_0$, that $P^{(n)}(x) = n!\, a_n$. In particular, $P \in C^\infty(\mathbb{R}, \mathbb{K})$.
(c) It is an exercise to show the following function $f$ is differentiable, but $f'$ is not continuous, i.e. $f \notin C^1(\mathbb{R})$:
$$f : \mathbb{R} \to \mathbb{R}, \quad f(x) := \begin{cases} x^2 \cos \frac{1}{x} & \text{for } x \ne 0, \\ 0 & \text{for } x = 0. \end{cases}$$
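The discontinuity of $f'$ in part (c) can be made visible numerically. The sketch below assumes the derivative formula $f'(x) = 2x \cos\frac{1}{x} + \sin\frac{1}{x}$ for $x \ne 0$ (product and chain rule) and $f'(0) = 0$ (from $|f(h)/h| \le |h|$), which is what the exercise asks one to verify; the sample points $x_k = 1/(\frac{\pi}{2} + 2k\pi)$ are chosen here for illustration.

```python
import math


def f_prime(x: float) -> float:
    """Derivative of f(x) = x^2 cos(1/x) for x != 0, with f'(0) = 0."""
    if x == 0.0:
        return 0.0
    return 2 * x * math.cos(1 / x) + math.sin(1 / x)


# Points x_k -> 0 at which cos(1/x_k) = 0 and sin(1/x_k) = 1 (up to rounding).
points = [1 / (math.pi / 2 + 2 * k * math.pi) for k in range(1, 6)]
values = [f_prime(x) for x in points]
```

Along these points, `values` stays at approximately $1$ while the points tend to $0$, so $f'(x_k)$ does not approach $f'(0) = 0$.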

9.3 Mean Value Theorem, Monotonicity, and Extrema


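Theorem 9.14. Let $a < b$. If $f : \ ]a, b[\ \to \mathbb{R}$ is differentiable in $\xi \in\ ]a, b[$ and $f$ has a local min or max in $\xi$, then $f'(\xi) = 0$.

Proof. Suppose $f$ has a local max at $\xi$. Then there exists $\epsilon > 0$ such that $|h| < \epsilon$ implies $f(\xi + h) - f(\xi) \le 0$. Now let $(h_k)_{k \in \mathbb{N}}$ be a sequence in $]0, \epsilon[$ with $\lim_{k \to \infty} h_k = 0$. Then $f(\xi \pm h_k) - f(\xi) \le 0$ for all $k \in \mathbb{N}$ implies
$$f'(\xi) = \lim_{k \to \infty} \frac{f(\xi + h_k) - f(\xi)}{h_k} \le 0, \qquad f'(\xi) = \lim_{k \to \infty} \frac{f(\xi - h_k) - f(\xi)}{-h_k} \ge 0, \tag{9.22}$$
showing $f'(\xi) = 0$. Now, if $f$ has a local min at $\xi$, then $-f$ has a local max at $\xi$, and $f'(\xi) = -(-f)'(\xi) = 0$ establishes the case. ∎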

Remark 9.15. For $f : \mathbb{R} \to \mathbb{R}$, $f(x) := x^3$, it is $f'(0) = 0$, but $f$ does not have a local min or max at $0$, showing that, while being necessary for a differentiable function $f$ to have a local extremum at $\xi$, $f'(\xi) = 0$ is not a sufficient condition for such an extremum at $\xi$. Points $\xi$ with $f'(\xi) = 0$ are sometimes called stationary or critical points of $f$.

Now, we first prove an important special case of the mean value theorem:

Theorem 9.16 (Rolle's Theorem). Let $a < b$. If $f : [a, b] \to \mathbb{R}$ is continuous on the compact interval $[a, b]$, differentiable on the open interval $]a, b[$, and $f(a) = f(b)$, then there exists $\xi \in\ ]a, b[$ such that $f'(\xi) = 0$.

Proof. If $f$ is constant, then $f'(\xi) = 0$ holds for each $\xi \in\ ]a, b[$. If $f$ is nonconstant, then there exists $x \in\ ]a, b[$ with $f(x) \ne f(a)$. If $f(x) > f(a)$, then Th. 7.54 implies the existence of $\xi \in\ ]a, b[$ such that $f$ attains its (global and, thus, local) max in $\xi$. Then Th. 9.14 yields $f'(\xi) = 0$. The case $f(x) < f(a)$ is treated analogously. ∎

Theorem 9.17 (Mean Value Theorem). Let $a < b$. If $f : [a, b] \to \mathbb{R}$ is continuous on the compact interval $[a, b]$ and differentiable on the open interval $]a, b[$, then there exists $\xi \in\ ]a, b[$ such that
$$\frac{f(b) - f(a)}{b - a} = f'(\xi). \tag{9.23}$$

Proof. One applies Rolle's Th. 9.16 to the auxiliary function
$$\phi : [a, b] \to \mathbb{R}, \quad \phi(x) := f(x) - \sigma (x - a), \quad\text{where}\quad \sigma := \frac{f(b) - f(a)}{b - a}. \tag{9.24}$$
Since $f$ is continuous on $[a, b]$ and differentiable on $]a, b[$, so is $\phi$. Moreover $\phi(a) = f(a) = \phi(b)$, i.e. Rolle's Th. 9.16 applies and yields $\xi \in\ ]a, b[$ satisfying $0 = \phi'(\xi) = f'(\xi) - \sigma$, proving (9.23). ∎

Corollary 9.18. Let $c < d$ and $f : \ ]c, d[\ \to \mathbb{R}$ be differentiable ($c = -\infty$, $d = \infty$ is admissible).

(a) If $f' \ge 0$ (resp. $f' \le 0$), then $f$ is increasing (resp. decreasing). Moreover, if the inequalities are strict, then the monotonicity of $f$ is strict as well.
(b) If $f' \equiv 0$, then $f$ is constant.

Proof. If $c < a < b < d$ and $f' \ge 0$ (resp. $f' \le 0$, resp. $f' \equiv 0$), then (9.23) implies $f(b) \ge f(a)$ (resp. $f(b) \le f(a)$, resp. $f(b) = f(a)$). Moreover, strict inequalities for $f'$ yield strict inequality between $f(b)$ and $f(a)$. ∎
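Lemma 9.19. Let $a < b$, $f : \ ]a, b[\ \to \mathbb{R}$, $\xi \in\ ]a, b[$, and assume $f$ is differentiable at $\xi$. If $f'(\xi) > 0$ (resp. $f'(\xi) < 0$), then there exists $\epsilon > 0$ such that $]\xi - \epsilon, \xi + \epsilon[\ \subseteq\ ]a, b[$ and
$$\forall\, a_1 \in\ ]\xi - \epsilon, \xi[\ \ \forall\, b_1 \in\ ]\xi, \xi + \epsilon[\,: \quad f(a_1) < f(\xi) < f(b_1) \quad \left(\text{resp. } f(a_1) > f(\xi) > f(b_1)\right).$$

Proof. If there does not exist $\epsilon > 0$ such that $f(a_1) < f(\xi) < f(b_1)$ for each $a_1 \in\ ]\xi - \epsilon, \xi[$ and each $b_1 \in\ ]\xi, \xi + \epsilon[$, then there exists a sequence $(x_k)_{k \in \mathbb{N}}$ in $]a, b[\ \setminus \{\xi\}$ such that $\lim_{k \to \infty} x_k = \xi$ and
$$\forall\, k \in \mathbb{N}: \quad \frac{f(x_k) - f(\xi)}{x_k - \xi} \le 0,$$
showing $f'(\xi) \le 0$. Analogously, one obtains that $f'(\xi) \ge 0$ provided there does not exist $\epsilon > 0$ such that $f(a_1) > f(\xi) > f(b_1)$ for each $a_1 \in\ ]\xi - \epsilon, \xi[$ and each $b_1 \in\ ]\xi, \xi + \epsilon[$. ∎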
Theorem 9.20 (Sufficient Conditions for Extrema). Let $c < d$, let $f : \ ]c, d[\ \to \mathbb{R}$ be differentiable, and assume $f'(\xi) = 0$ for some $\xi \in\ ]c, d[$.

(a) If $f'(x) > 0$ for each $x \in\ ]c, \xi[$ and $f'(x) < 0$ for each $x \in\ ]\xi, d[$, then $f$ has a strict max at $\xi$. Likewise, if $f''(\xi)$ exists and is negative, then $f$ has a strict max at $\xi$.
(b) If $f'(x) < 0$ for each $x \in\ ]c, \xi[$ and $f'(x) > 0$ for each $x \in\ ]\xi, d[$, then $f$ has a strict min at $\xi$. Likewise, if $f''(\xi)$ exists and is positive, then $f$ has a strict min at $\xi$.

Proof. We just present the proof for (a); (b) is proved analogously. If $f'(x) > 0$ for each $x \in\ ]c, \xi[$, then (9.23) shows $f(\xi) - f(a) > 0$ for each $c < a < \xi$; analogously, if $f'(x) < 0$ for each $x \in\ ]\xi, d[$, then (9.23) shows $f(\xi) - f(b) > 0$ for each $\xi < b < d$. Altogether, we have shown $f$ to have a strict max at $\xi$. If $f''(\xi)$ exists and is negative, then Lem. 9.19 (applied to $f'$) yields the existence of $\epsilon > 0$ such that $f'$ is positive on $]\xi - \epsilon, \xi[$ and negative on $]\xi, \xi + \epsilon[$. Applying what we have already proved with $c := \xi - \epsilon$ and $d := \xi + \epsilon$ establishes the case. ∎
Example 9.21. One obtains
$$f : \mathbb{R} \to \mathbb{R}, \quad f(x) := x\, e^x, \tag{9.25a}$$
$$f' : \mathbb{R} \to \mathbb{R}, \quad f'(x) = e^x + x\, e^x = (1 + x)\, e^x, \tag{9.25b}$$
$$f'' : \mathbb{R} \to \mathbb{R}, \quad f''(x) = 2 e^x + x\, e^x = (2 + x)\, e^x. \tag{9.25c}$$
From Th. 9.14, we know that $f$ can have at most one extremum, namely at $\xi = -1$, where $f'(\xi) = 0$. Since $f''(\xi) = e^{-1} > 0$, Th. 9.20(b) implies that $f$ has a strict min at $-1$.
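The conclusions of Ex. 9.21 are easy to confirm numerically. The following is a small sketch using the formulas (9.25); the function names `f`, `f1`, `f2` and the comparison points $-1 \pm 0.1$ are illustrative choices.

```python
import math


def f(x: float) -> float:
    """f(x) = x * e^x, cf. (9.25a)."""
    return x * math.exp(x)


def f1(x: float) -> float:
    """First derivative (1 + x) * e^x, cf. (9.25b)."""
    return (1 + x) * math.exp(x)


def f2(x: float) -> float:
    """Second derivative (2 + x) * e^x, cf. (9.25c)."""
    return (2 + x) * math.exp(x)


xi = -1.0
is_critical = f1(xi) == 0.0          # f'(-1) = 0
is_convex_at_xi = f2(xi) > 0.0       # f''(-1) = e^{-1} > 0
is_below_neighbors = f(xi) < f(xi - 0.1) and f(xi) < f(xi + 0.1)
```

All three flags evaluate to `True`, in line with Th. 9.20(b): the critical point $\xi = -1$ carries a strict minimum with value $f(-1) = -e^{-1}$.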

9.4 L'Hôpital's Rule


We need a slight generalization of the mean value Th. 9.17:

Theorem 9.22. Let $a < b$. If $f, g : [a, b] \to \mathbb{R}$ are continuous on the compact interval $[a, b]$, differentiable on the open interval $]a, b[$, and $g'(x) \ne 0$ for each $x \in\ ]a, b[$, then there exists $\xi \in\ ]a, b[$ such that
$$\frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(\xi)}{g'(\xi)}. \tag{9.26}$$

Proof. First note that the mean value Th. 9.17 and $g' \ne 0$ imply $g(b) - g(a) \ne 0$. Define the auxiliary function
$$h : [a, b] \to \mathbb{R}, \quad h(x) := f(x) - \left(g(x) - g(a)\right) \frac{f(b) - f(a)}{g(b) - g(a)}. \tag{9.27}$$
Then $h$ is continuous on $[a, b]$ and differentiable on $]a, b[$. Moreover, $h(a) = f(a) = h(b)$. Applying Th. 9.17 to $h$ yields the existence of some $\xi \in\ ]a, b[$ satisfying $h'(\xi) = 0$. However, (9.27) implies $h'(\xi) = 0$ is equivalent to (9.26). ∎

L'Hôpital's rule is a result that can help to determine (function) limits (cf. Def. 8.17).

Theorem 9.23 (L'Hôpital's Rule). Let $\xi \in \mathbb{R}$ and either $I = ]a,\xi[$ with $a < \xi$ or $I := ]\xi,b[$ with $\xi < b$. Moreover, assume $f, g : I \to \mathbb{R}$ are differentiable, $g'(x) \ne 0$ for each $x \in I$, and one of the following two conditions (a), (b) is satisfied:

(a) $\lim_{x\to\xi} f(x) = \lim_{x\to\xi} g(x) = 0$.

(b) $\lim_{x\to\xi} g(x) = \infty$ or $\lim_{x\to\xi} g(x) = -\infty$, where Def. 8.17 is extended to the case $\eta \in \{-\infty, \infty\}$ in the obvious way.

Then
\[ \lim_{x\to\xi} \frac{f'(x)}{g'(x)} = \eta \quad\Rightarrow\quad \lim_{x\to\xi} \frac{f(x)}{g(x)} = \eta. \tag{9.28} \]
The above statement also holds for $\eta \in \{-\infty, \infty\}$ and/or $\xi \in \{-\infty, \infty\}$ if, as in (b), one extends Def. 8.17 to these cases in the obvious way.

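Proof. First, we assume (a). Consider the case $\xi \in \mathbb{R}$. Since $f$ and $g$ are continuous, (a) implies $f$ and $g$ remain continuous, if we extend them to $\xi$ by letting $f(\xi) := g(\xi) := 0$. This extension will now allow us to apply Th. 9.22 to $f$ and $g$. To prove (9.28), let $(x_k)_{k\in\mathbb{N}}$ be a sequence in $I$ with $\lim_{k\to\infty} x_k = \xi$. Then (9.26) yields, for each $k \in \mathbb{N}$, some $\xi_k \in ]x_k, \xi[$ if $x_k < \xi$ and some $\xi_k \in ]\xi, x_k[$ if $\xi < x_k$, satisfying
\[ \frac{f(x_k)}{g(x_k)} = \frac{f(x_k)-f(\xi)}{g(x_k)-g(\xi)} = \frac{f'(\xi_k)}{g'(\xi_k)}. \tag{9.29} \]
From the Sandwich Th. 7.16, we obtain $\lim_{k\to\infty} \xi_k = \xi$, i.e. (9.29) and $\lim_{x\to\xi} \frac{f'(x)}{g'(x)} = \eta$ imply $\lim_{x\to\xi} \frac{f(x)}{g(x)} = \eta$ (also for $\eta \in \{-\infty, \infty\}$). Now consider the case $\xi \in \{-\infty, \infty\}$ and let $(x_k)_{k\in\mathbb{N}}$ be as before. If $\xi = \infty$, then choose $1 \le c \in I$ and set $\tilde I := ]0, c^{-1}[$; if $\xi = -\infty$, then choose $-1 \ge c \in I$ and set $\tilde I := ]c^{-1}, 0[$. We apply what we have already proved above to the auxiliary functions
\[ \tilde f : \tilde I \to \mathbb{R}, \quad \tilde f(x) := f(1/x), \qquad \tilde g : \tilde I \to \mathbb{R}, \quad \tilde g(x) := g(1/x) \]
at $\tilde\xi := 0$. From the chain rule (9.15), we know $\tilde f'(x) = -\frac{f'(1/x)}{x^2}$ and $\tilde g'(x) = -\frac{g'(1/x)}{x^2}$ for each $x \in \tilde I$. Thus, $\lim_{x\to\xi} \frac{f'(x)}{g'(x)} = \eta$ implies
\[ \eta = \lim_{k\to\infty} \frac{f'(x_k)}{g'(x_k)} = \lim_{k\to\infty} \frac{-x_k^2\, f'(x_k)}{-x_k^2\, g'(x_k)} = \lim_{k\to\infty} \frac{\tilde f'(1/x_k)}{\tilde g'(1/x_k)} = \lim_{k\to\infty} \frac{\tilde f(1/x_k)}{\tilde g(1/x_k)} = \lim_{k\to\infty} \frac{f(x_k)}{g(x_k)}, \]
proving $\lim_{x\to\xi} \frac{f(x)}{g(x)} = \eta$.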
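We now assume (b), still letting $(x_k)_{k\in\mathbb{N}}$ be as before. Note that $g' \ne 0$ implies $g$ is injective by Rolle's Th. 9.16. Then the intermediate value theorem implies $g$ is either strictly increasing or strictly decreasing. We proceed with the proof for the case $I = ]a,\xi[$; the proof for $I = ]\xi,b[$ can be done completely analogously. We first consider the case where $g$ is strictly increasing, i.e. $\lim_{x\to\xi} g(x) = \infty$. Assume $\eta \in \mathbb{R}$ and $\epsilon > 0$. Then $\lim_{x\to\xi} \frac{f'(x)}{g'(x)} = \eta$ and $\lim_{x\to\xi} g(x) = \infty$ imply
\[ \underset{c\in]a,\xi[}{\exists}\ \underset{x\in]c,\xi[}{\forall} \quad \Big( g(x) > 0 \ \wedge\ \eta - \frac{\epsilon}{2} < \frac{f'(x)}{g'(x)} < \eta + \frac{\epsilon}{2} \Big). \]
Since $\lim_{k\to\infty} x_k = \xi$, there exists $N_0 \in \mathbb{N}$ such that, for each $k > N_0$, $c < x_k < \xi$. Next, according to Th. 9.22,
\[ \underset{k>N_0}{\forall}\ \underset{\xi_k\in]c,x_k[}{\exists} \quad \eta - \frac{\epsilon}{2} < \frac{f(x_k)-f(c)}{g(x_k)-g(c)} = \frac{f'(\xi_k)}{g'(\xi_k)} < \eta + \frac{\epsilon}{2}. \]
In consequence, using $g(x_k) > g(c)$, as $g$ is strictly increasing,
\[ \underset{k>N_0}{\forall} \quad \Big(\eta - \frac{\epsilon}{2}\Big)\big(g(x_k)-g(c)\big) < f(x_k) - f(c) < \Big(\eta + \frac{\epsilon}{2}\Big)\big(g(x_k)-g(c)\big) \]
and
\[ \underset{k>N_0}{\forall} \quad \Big(\eta - \frac{\epsilon}{2}\Big) + \frac{f(c) - \big(\eta - \frac{\epsilon}{2}\big)\, g(c)}{g(x_k)} < \frac{f(x_k)}{g(x_k)} < \Big(\eta + \frac{\epsilon}{2}\Big) + \frac{f(c) - \big(\eta + \frac{\epsilon}{2}\big)\, g(c)}{g(x_k)}. \]
Since $\lim_{k\to\infty} g(x_k) = \infty$,
\[ \underset{N\ge N_0}{\exists}\ \underset{k>N}{\forall} \quad \Big( -\frac{\epsilon}{2} < \frac{f(c) - \big(\eta - \frac{\epsilon}{2}\big)\, g(c)}{g(x_k)} \ \wedge\ \frac{f(c) - \big(\eta + \frac{\epsilon}{2}\big)\, g(c)}{g(x_k)} < \frac{\epsilon}{2} \Big), \]
that means
\[ \underset{k>N}{\forall} \quad \eta - \epsilon < \frac{f(x_k)}{g(x_k)} < \eta + \epsilon, \]
proving $\lim_{x\to\xi} \frac{f(x)}{g(x)} = \eta$. For $\eta = \infty$ and given $n \in \mathbb{N}$, the argument is similar: $\lim_{x\to\xi} \frac{f'(x)}{g'(x)} = \infty$ and $\lim_{x\to\xi} g(x) = \infty$ imply
\[ \underset{c\in]a,\xi[}{\exists}\ \underset{x\in]c,\xi[}{\forall} \quad \Big( g(x) > 0 \ \wedge\ n < \frac{f'(x)}{g'(x)} \Big). \]
As before, since $\lim_{k\to\infty} x_k = \xi$, there exists $N_0 \in \mathbb{N}$ such that, for each $k > N_0$, $c < x_k < \xi$. Again, according to Th. 9.22,
\[ \underset{k>N_0}{\forall}\ \underset{\xi_k\in]c,x_k[}{\exists} \quad n < \frac{f(x_k)-f(c)}{g(x_k)-g(c)} = \frac{f'(\xi_k)}{g'(\xi_k)}. \]
In consequence, using $g(x_k) > g(c)$, as $g$ is strictly increasing,
\[ \underset{k>N_0}{\forall} \quad n\,\big(g(x_k)-g(c)\big) < f(x_k) - f(c) \]
and
\[ \underset{k>N_0}{\forall} \quad n + \frac{f(c) - n\, g(c)}{g(x_k)} < \frac{f(x_k)}{g(x_k)}. \]
Since $\lim_{k\to\infty} g(x_k) = \infty$,
\[ \underset{N\ge N_0}{\exists}\ \underset{k>N}{\forall} \quad \left| \frac{f(c) - n\, g(c)}{g(x_k)} \right| < 1, \]
that means
\[ \underset{k>N}{\forall} \quad n - 1 < \frac{f(x_k)}{g(x_k)}, \]
proving $\lim_{x\to\xi} \frac{f(x)}{g(x)} = \infty$. If $\eta = -\infty$, then, using what we have already shown (applied to $-f$ and $g$),
\[ \lim_{x\to\xi} \frac{-f'(x)}{g'(x)} = -\lim_{x\to\xi} \frac{f'(x)}{g'(x)} = \infty \quad\Rightarrow\quad \infty = \lim_{x\to\xi} \frac{-f(x)}{g(x)} \quad\Rightarrow\quad \lim_{x\to\xi} \frac{f(x)}{g(x)} = -\infty. \]
Finally, if $g$ is strictly decreasing, then $-g$ is strictly increasing and we obtain
\[ \lim_{x\to\xi} \frac{f'(x)}{g'(x)} = \lim_{x\to\xi} \frac{-f'(x)}{-g'(x)} = \eta \quad\Rightarrow\quad \eta = \lim_{x\to\xi} \frac{-f(x)}{-g(x)} \quad\Rightarrow\quad \lim_{x\to\xi} \frac{f(x)}{g(x)} = \eta, \]
concluding the proof. ∎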



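Example 9.24. (a) Applying L'Hôpital's rule to $f : ]-\pi/2, \pi/2[ \to \mathbb{R}$, $f(x) := \tan x$, $g : ]-\pi/2, \pi/2[ \to \mathbb{R}$, $g(x) := e^x - 1$, with $\xi = 0$ yields
\[ \lim_{x\to 0} \frac{\tan x}{e^x - 1} = \lim_{x\to 0} \frac{1 + \tan^2 x}{e^x} = \frac{1}{1} = 1 \tag{9.30} \]
(note $g'(x) = e^x \ne 0$ for each $x \in ]-\pi/2, \pi/2[$).

(b) It can happen that a single application of L'Hôpital's rule does not, yet, yield a useful result, but that a repeated application does. An example is provided by considering $\lambda > 0$, $n \in \mathbb{N}$, and $f : \mathbb{R}^+ \to \mathbb{R}$, $f(x) := e^{\lambda x}$, $g : \mathbb{R}^+ \to \mathbb{R}$, $g(x) := x^n$, $\xi := \infty$. Applying L'Hôpital's rule $n$ times yields
\[ \underset{\lambda\in\mathbb{R}^+}{\forall}\ \underset{n\in\mathbb{N}}{\forall} \quad \lim_{x\to\infty} \frac{e^{\lambda x}}{x^n} = \lim_{x\to\infty} \frac{\lambda^n\, e^{\lambda x}}{n!} = \infty \tag{9.31} \]
(note $g^{(k)}(x) = n(n-1)\cdots(n-k+1)\,x^{n-k} \ne 0$ for each $k \in \{1,\dots,n\}$ and each $x \in \mathbb{R}^+$).

(c) It can also happen that even repeated applications of L'Hôpital's rule do not help at all, even though $\lim_{x\to\xi} \frac{f(x)}{g(x)}$ does exist and the hypotheses of Th. 9.23 are all satisfied. A simple example is given by $f : \mathbb{R} \to \mathbb{R}$, $f(x) := e^{-x}$, $g : \mathbb{R} \to \mathbb{R}$, $g(x) := 2e^{-x}$, and $\xi = \infty$. Even though $\lim_{x\to\infty} \frac{f(x)}{g(x)} = \frac{1}{2}$, one has $\lim_{x\to\infty} f^{(n)}(x) = \lim_{x\to\infty} g^{(n)}(x) = 0$ for every $n \in \mathbb{N}$.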
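The limits (9.30) and (9.31) can be observed numerically. The sketch below is only an illustration: the evaluation points and the parameter values $\lambda = 0.5$, $n = 3$ are arbitrary choices, and `math.expm1` is used for $e^x - 1$ to avoid cancellation near $0$.

```python
import math

def ratio_a(x):
    # tan(x) / (e^x - 1) from Example 9.24(a)
    return math.tan(x) / math.expm1(x)

def ratio_b(x, lam, n):
    # e^{lam x} / x^n from Example 9.24(b)
    return math.exp(lam * x) / x**n

# x -> 0 from the right: the quotient approaches 1, cf. (9.30).
values_a = [ratio_a(10.0 ** (-k)) for k in range(1, 8)]

# x -> infinity: the quotient grows without bound, cf. (9.31).
values_b = [ratio_b(x, 0.5, 3) for x in (50.0, 100.0, 200.0)]

print(values_a)
print(values_b)
```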

10 The Riemann Integral on Intervals in R

10.1 Definition and Simple Properties


We will restrict ourselves to considering the Riemann integral for R-valued functions.
However, by applying the theory to the R-valued functions Re f and Im f , many results
can be extended to C-valued functions f . Details can be found in Appendix G. When
stating some important R-valued result that has a C-valued analogue, we will usually
provide the corresponding reference to the Appendix.
Given a nonnegative function $f : M \to \mathbb{R}_0^+$, $M \subseteq \mathbb{R}$, we aim to compute the area $\int_M f$ of the set "under the graph" of $f$, i.e. of the set
\[ \big\{ (x,y) \in \mathbb{R}^2 : x \in M \text{ and } 0 \le y \le f(x) \big\}. \tag{10.1} \]
This area $\int_M f$ (if it exists) will be called the integral of $f$ over $M$. Moreover, for functions $f : M \to \mathbb{R}$ that are not necessarily nonnegative, we would like to count areas of sets of the form (10.1) (which are below the graph of $f$ and above the set $M \cong \{(x,0) \in \mathbb{R}^2 : x \in M\} \subseteq \mathbb{R}^2$) with a positive sign, whereas we would like to count areas of sets above the graph of $f$ and below the set $M$ with a negative sign. In other words, making use of the positive and negative parts $f^+$ and $f^-$ of $f = f^+ - f^-$ as defined in (6.1i) and (6.1j), respectively, we would like our integral to satisfy
\[ \int_M f = \int_M f^+ - \int_M f^-. \tag{10.2} \]
Difficulties arise from the fact that both the function f and the set M can be extremely
complicated. To avoid dealing with complicated sets M , we restrict ourselves to the
situation of integrals over compact intervals, i.e. to integrals over sets of the form M =
[a, b]. Moreover, we will also restrict ourselves to bounded functions f , which we now
define:

Definition 10.1. Let $\emptyset \ne M \subseteq \mathbb{R}$ and $f : M \to \mathbb{R}$. Then $f$ is called bounded if, and only if, the set $\{|f(x)| : x \in M\} \subseteq \mathbb{R}_0^+$ is bounded, i.e. if, and only if,
\[ \|f\|_{\sup} := \sup\{|f(x)| : x \in M\} \in \mathbb{R}_0^+. \tag{10.3} \]

The basic idea for the definition of the Riemann integral $\int_M f$ is rather simple: Decompose the set $M$ into small pieces $I_1, \dots, I_N$ and approximate $\int_M f$ by the finite sum $\sum_{j=1}^N f(x_j)\,|I_j|$, where $x_j \in I_j$ and $|I_j|$ denotes the size of the set $I_j$. Define $\int_M f$ as the limit of such sums as the size of the $I_j$ tends to zero (if the limit exists). However, to carry out this idea precisely and rigorously does require some work.
As stated before, we will assume that M is a closed finite interval, and we will choose
the Ij to be closed finite intervals as well. To emphasize we are dealing with intervals,
in the following, we will prefer to use the symbol I instead of M .

Definition 10.2. If $a, b \in \mathbb{R}$, $a \le b$, and $I := [a,b]$, then we call
\[ |I| := b - a = |a - b|, \tag{10.4} \]
the length or the (1-dimensional) size, volume, or measure of $I$.

Definition 10.3. Given a real interval $I := [a,b] \subseteq \mathbb{R}$, $a, b \in \mathbb{R}$, $a < b$, the $(N+1)$-tuple $\Delta := (x_0, \dots, x_N) \in \mathbb{R}^{N+1}$, $N \in \mathbb{N}$, is called a partition of $I$ if, and only if, $a = x_0 < x_1 < \dots < x_N = b$. We call $x_0, \dots, x_N$ the nodes of $\Delta$, and let $\nu(\Delta) := \{x_0, \dots, x_N\}$ be the set of all nodes. A tagged partition of $I$ is a partition together with an $N$-tuple $(t_1, \dots, t_N) \in \mathbb{R}^N$ such that $t_j \in [x_{j-1}, x_j]$ for each $j \in \{1, \dots, N\}$. Given a partition $\Delta$ (with or without tags) of $I$ as above and letting $I_j := [x_{j-1}, x_j]$, the number
\[ |\Delta| := \max\big\{ |I_j| : j \in \{1, \dots, N\} \big\}, \tag{10.5} \]
is called the mesh size of $\Delta$. It is sometimes convenient to extend our definitions to trivial intervals, consisting of just one point: For $a = b$, we have $I = [a,a] = \{a\}$. We then define $\Delta = x_0 = a$ to be a partition of $I$, $\nu(\Delta) = \{x_0\}$, and $a$ is then the only tag that makes $\Delta$ into a tagged partition. We also set $I_0 := I = \{a\}$, and the mesh size in this case is $|\Delta| := 0$.

Definition 10.4. Let $\Delta$ be a partition of $I = [a,b] \subseteq \mathbb{R}$, $a \le b$, as in Def. 10.3. Given a function $f : I \to \mathbb{R}$ that is bounded according to Def. 10.1, define
\[ m_j := m_j(f) := \inf\{f(x) : x \in I_j\}, \qquad M_j := M_j(f) := \sup\{f(x) : x \in I_j\}, \tag{10.6} \]
and
\[ r(\Delta, f) := \sum_{j=1}^N m_j\,|I_j| = \sum_{j=1}^N m_j\,(x_j - x_{j-1}), \tag{10.7a} \]
\[ R(\Delta, f) := \sum_{j=1}^N M_j\,|I_j| = \sum_{j=1}^N M_j\,(x_j - x_{j-1}), \tag{10.7b} \]
where $r(\Delta, f)$ is called the lower Riemann sum and $R(\Delta, f)$ is called the upper Riemann sum associated with $\Delta$ and $f$. If $\Delta$ is tagged by $\tau := (t_1, \dots, t_N)$, then we also define the intermediate Riemann sum
\[ \rho(\Delta, f) := \sum_{j=1}^N f(t_j)\,|I_j| = \sum_{j=1}^N f(t_j)\,(x_j - x_{j-1}). \tag{10.7c} \]
Note that, for $a = b$, all the above sums are empty and we have $r(\Delta, f) = R(\Delta, f) = \rho(\Delta, f) = 0$.
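To make (10.7a) to (10.7c) concrete, here is a minimal sketch that evaluates the three Riemann sums. It assumes that $f$ is increasing, so that the infimum and supremum on each $I_j$ are attained at the left and right node, respectively; the helper names, the uniform partition, and the test function $x \mapsto x^2$ on $[0,1]$ are illustrative choices.

```python
def riemann_sums(f, nodes, tags):
    """Lower, upper, and intermediate Riemann sums as in (10.7a)-(10.7c).

    Assumption: f is increasing, hence m_j = f(x_{j-1}) and M_j = f(x_j).
    """
    lower = upper = intermediate = 0.0
    for j in range(1, len(nodes)):
        length = nodes[j] - nodes[j - 1]        # |I_j|
        lower += f(nodes[j - 1]) * length       # m_j |I_j|
        upper += f(nodes[j]) * length           # M_j |I_j|
        intermediate += f(tags[j - 1]) * length # f(t_j) |I_j|
    return lower, upper, intermediate

def uniform_partition(a, b, n):
    # Nodes x_0 < ... < x_n of [a, b] with mesh size (b - a) / n.
    return [a + (b - a) * j / n for j in range(n + 1)]

def square(x):
    return x * x

nodes = uniform_partition(0.0, 1.0, 1000)
tags = [(nodes[j - 1] + nodes[j]) / 2 for j in range(1, len(nodes))]  # midpoints
r, R, rho = riemann_sums(square, nodes, tags)
print(r, rho, R)
```

For this example, the three sums enclose the value $1/3$, and the gap $R - r$ equals the mesh size times $f(1) - f(0)$.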

Definition 10.5. Let $I = [a,b] \subseteq \mathbb{R}$ be an interval, $a \le b$, and suppose $f : I \to \mathbb{R}$ is bounded.

(a) Define
\[ J_*(f, I) := \sup\big\{ r(\Delta, f) : \Delta \text{ is a partition of } I \big\}, \tag{10.8a} \]
\[ J^*(f, I) := \inf\big\{ R(\Delta, f) : \Delta \text{ is a partition of } I \big\}. \tag{10.8b} \]
We call $J_*(f, I)$ the lower Riemann integral of $f$ over $I$ and $J^*(f, I)$ the upper Riemann integral of $f$ over $I$.

(b) The function $f$ is called Riemann integrable over $I$ if, and only if, $J_*(f, I) = J^*(f, I)$. If $f$ is Riemann integrable over $I$, then
\[ \int_a^b f(x)\,dx := \int_I f(x)\,dx := \int_a^b f := \int_I f := J_*(f, I) = J^*(f, I) \tag{10.9} \]
is called the Riemann integral of $f$ over $I$. The set of all functions $f : I \to \mathbb{R}$ that are Riemann integrable over $I$ is denoted by $\mathcal{R}(I)$.

Remark 10.6. If $I = [a,b] \subseteq \mathbb{R}$, $\Delta$, and $f$ are as before, then (10.6) implies
\[ m_j(f) \overset{(4.9c)}{=} -M_j(-f) \quad\text{and}\quad m_j(-f) \overset{(4.9d)}{=} -M_j(f), \tag{10.10a} \]
(10.7) implies
\[ r(\Delta, f) = -R(\Delta, -f) \quad\text{and}\quad r(\Delta, -f) = -R(\Delta, f), \tag{10.10b} \]
and (10.8) implies
\[ J_*(f, I) = -J^*(-f, I) \quad\text{and}\quad J_*(-f, I) = -J^*(f, I). \tag{10.10c} \]

Example 10.7. (a) If $I = [a,b] \subseteq \mathbb{R}$ as before and $f : I \to \mathbb{R}$ is constant, i.e. $f \equiv c$ with $c \in \mathbb{R}$, then $f \in \mathcal{R}(I)$ and
\[ \int_a^b f = c\,(b-a) = c\,|I| : \tag{10.11} \]
We have, for each partition $\Delta$ of $I$,
\[ r(\Delta, f) = \sum_{j=1}^N m_j\,|I_j| = c \sum_{j=1}^N |I_j| = c\,|I| = c\,(b-a) = \sum_{j=1}^N M_j\,|I_j| = R(\Delta, f), \tag{10.12} \]
proving $J_*(f, I) = c\,(b-a) = J^*(f, I)$.
(b) An example of a function that is not Riemann integrable for $a < b$ is given by the Dirichlet function
\[ f : [a,b] \to \mathbb{R}, \quad f(x) := \begin{cases} 0 & \text{for } x \text{ irrational}, \\ 1 & \text{for } x \text{ rational}, \end{cases} \qquad a < b. \tag{10.13} \]
Since $r(\Delta, f) = 0$ and $R(\Delta, f) = \sum_{j=1}^N |I_j| = b - a$ for every partition $\Delta$ of $I$, one obtains $J_*(f, I) = 0 \ne (b-a) = J^*(f, I)$, showing that $f \notin \mathcal{R}(I)$.
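Definition 10.8. (a) If $\Delta$ is a partition of $[a,b] \subseteq \mathbb{R}$ as in Def. 10.3, then another partition $\Delta'$ of $[a,b]$ is called a refinement of $\Delta$ if, and only if, $\nu(\Delta) \subseteq \nu(\Delta')$, i.e. if, and only if, the nodes of $\Delta'$ include all the nodes of $\Delta$.
(b) If $\Delta$ and $\Delta'$ are partitions of $[a,b] \subseteq \mathbb{R}$, then the superposition of $\Delta$ and $\Delta'$, denoted $\Delta + \Delta'$, is the unique partition of $[a,b]$ having $\nu(\Delta) \cup \nu(\Delta')$ as its set of nodes. Note that the superposition of $\Delta$ and $\Delta'$ is always a common refinement of $\Delta$ and $\Delta'$.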
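Lemma 10.9. Let $a, b \in \mathbb{R}$, $a < b$, $I := [a,b]$, and suppose $f : I \to \mathbb{R}$ is bounded with $M := \|f\|_{\sup} \in \mathbb{R}_0^+$. Let $\Delta'$ be a partition of $I$ and assume
\[ \alpha := \#\big( \nu(\Delta') \setminus \{a, b\} \big) \ge 1 \tag{10.14} \]
is the number of interior nodes that occur in $\Delta'$. Then, for each partition $\Delta$ of $I$, the following holds:
\[ r(\Delta, f) \le r(\Delta + \Delta', f) \le r(\Delta, f) + 2\,\alpha\,M\,|\Delta|, \tag{10.15a} \]
\[ R(\Delta, f) \ge R(\Delta + \Delta', f) \ge R(\Delta, f) - 2\,\alpha\,M\,|\Delta|. \tag{10.15b} \]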

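Proof. We carry out the proof of (10.15a); the proof of (10.15b) can be conducted completely analogously. Consider the case $\alpha = 1$ and let $\xi$ be the single element of $\nu(\Delta') \setminus \{a, b\}$. If $\xi \in \nu(\Delta)$, then $\Delta + \Delta' = \Delta$, and (10.15a) is trivially true. If $\xi \notin \nu(\Delta)$, then $x_{k-1} < \xi < x_k$ for a suitable $k \in \{1, \dots, N\}$. Define
\[ I' := [x_{k-1}, \xi], \qquad I'' := [\xi, x_k] \tag{10.16} \]
and
\[ m' := \inf\{f(x) : x \in I'\}, \qquad m'' := \inf\{f(x) : x \in I''\}. \tag{10.17} \]
Then we obtain
\[ r(\Delta + \Delta', f) - r(\Delta, f) = m'\,|I'| + m''\,|I''| - m_k\,|I_k| = (m' - m_k)\,|I'| + (m'' - m_k)\,|I''|. \tag{10.18} \]
Together with the observation
\[ 0 \le m' - m_k \le 2M, \qquad 0 \le m'' - m_k \le 2M, \tag{10.19} \]
(10.18) implies
\[ 0 \le r(\Delta + \Delta', f) - r(\Delta, f) \le 2M\,\big( |I'| + |I''| \big) \le 2M\,|\Delta|. \tag{10.20} \]
The general form of (10.15a) follows by an induction on $\alpha$. ∎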


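Theorem 10.10. Let $a, b \in \mathbb{R}$, $a \le b$, $I := [a,b]$, and let $f : I \to \mathbb{R}$ be bounded.

(a) Suppose $\Delta$ and $\Delta'$ are partitions of $I$ such that $\Delta'$ is a refinement of $\Delta$. Then
\[ r(\Delta, f) \le r(\Delta', f), \qquad R(\Delta, f) \ge R(\Delta', f). \tag{10.21} \]

(b) For arbitrary partitions $\Delta$ and $\Delta'$, the following holds:
\[ r(\Delta, f) \le R(\Delta', f). \tag{10.22} \]

(c) $J_*(f, I) \le J^*(f, I)$.

(d) For each sequence of partitions $(\Delta^n)_{n\in\mathbb{N}}$ of $I$ such that $\lim_{n\to\infty} |\Delta^n| = 0$, one has
\[ \lim_{n\to\infty} r(\Delta^n, f) = J_*(f, I), \qquad \lim_{n\to\infty} R(\Delta^n, f) = J^*(f, I). \tag{10.23} \]
In particular, if $f \in \mathcal{R}(I)$, then
\[ \lim_{n\to\infty} r(\Delta^n, f) = \lim_{n\to\infty} R(\Delta^n, f) = \int_I f, \tag{10.24a} \]
and if $f \in \mathcal{R}(I)$ and the $\Delta^n$ are tagged, then also
\[ \lim_{n\to\infty} \rho(\Delta^n, f) = \int_I f. \tag{10.24b} \]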

Proof. (a): If $\Delta'$ is a refinement of $\Delta$, then $\Delta' = \Delta + \Delta'$. Thus, (10.21) is immediate from (10.15).
(b): This also follows from (10.15):
\[ r(\Delta, f) \overset{(10.15a)}{\le} r(\Delta + \Delta', f) \overset{(10.7)}{\le} R(\Delta + \Delta', f) \overset{(10.15b)}{\le} R(\Delta', f). \tag{10.25} \]
(c): One just combines (10.8) with (b).
(d): For $a = b$, there is nothing to show. For $a < b$, let $(\Delta^n)_{n\in\mathbb{N}}$ be a sequence of partitions of $I$ such that $\lim_{n\to\infty} |\Delta^n| = 0$, and let $\Delta'$ be an arbitrary partition of $I$ with numbers $\alpha$ and $M$ defined as in Lem. 10.9. Then, according to (10.15a):
\[ r(\Delta^n, f) \le r(\Delta^n + \Delta', f) \le r(\Delta^n, f) + 2\,\alpha\,M\,|\Delta^n| \quad \text{for each } n \in \mathbb{N}. \tag{10.26} \]
From (b), we conclude the sequence $\big(r(\Delta^n, f)\big)_{n\in\mathbb{N}}$ is bounded. According to the Bolzano-Weierstrass Th. 7.27, if we can show that the sequence has $J_*(f, I)$ as its only cluster point, then the first equality of (10.23) must hold. Thus, according to Prop. 7.26, it suffices to show that every converging subsequence of $\big(r(\Delta^n, f)\big)_{n\in\mathbb{N}}$ converges to $J_*(f, I)$. To this end, suppose $\big(r(\Delta^{n_k}, f)\big)_{k\in\mathbb{N}}$ is a converging subsequence of $\big(r(\Delta^n, f)\big)_{n\in\mathbb{N}}$ with $\beta := \lim_{k\to\infty} r(\Delta^{n_k}, f)$. First note $\beta \le J_*(f, I)$ due to the definition of $J_*(f, I)$. Moreover, (10.26) implies $\lim_{k\to\infty} r(\Delta^{n_k} + \Delta', f) = \beta$. Since $r(\Delta', f) \le r(\Delta^{n_k} + \Delta', f)$ and $\Delta'$ is arbitrary, we obtain $J_*(f, I) \le \beta$, i.e. $J_*(f, I) = \beta$. Thus, we have shown that, indeed, every converging subsequence of $\big(r(\Delta^n, f)\big)_{n\in\mathbb{N}}$ converges to $\beta = J_*(f, I)$. In the same manner, one conducts the proof of $J^*(f, I) = \lim_{n\to\infty} R(\Delta^n, f)$. Then (10.24a) is immediate from the definition of Riemann integrability, and (10.24b) follows from (10.24a), since (10.7) implies $r(\Delta, f) \le \rho(\Delta, f) \le R(\Delta, f)$ for each tagged partition $\Delta$ of $I$. ∎
Theorem 10.11. Let $a, b \in \mathbb{R}$, $a \le b$, $I := [a,b]$.

(a) The integral is linear: More precisely, if $f, g \in \mathcal{R}(I)$ and $\lambda, \mu \in \mathbb{R}$, then $\lambda f + \mu g \in \mathcal{R}(I)$ and
\[ \int_I (\lambda f + \mu g) = \lambda \int_I f + \mu \int_I g. \tag{10.27} \]
This result still holds in the C-valued situation (see Th. G.5(a)).

(b) Let $\Delta = (y_0, \dots, y_M)$, $M \in \mathbb{N}$, be a partition of $I$, $J_k := [y_{k-1}, y_k]$. Then $f \in \mathcal{R}(I)$ if, and only if, $f \in \mathcal{R}(J_k)$ for each $k \in \{1, \dots, M\}$. If $f \in \mathcal{R}(I)$, then
\[ \int_a^b f = \int_I f = \sum_{k=1}^M \int_{J_k} f = \sum_{k=1}^M \int_{y_{k-1}}^{y_k} f. \tag{10.28} \]
This result still holds for C-valued $f$ (see Th. G.5(b)).

(c) Monotonicity of the Integral: If $f, g : I \to \mathbb{R}$ are bounded and $f \le g$ (i.e. $f(x) \le g(x)$ for each $x \in I$), then $J_*(f, I) \le J_*(g, I)$ and $J^*(f, I) \le J^*(g, I)$. In particular, if $f, g \in \mathcal{R}(I)$ and $f \le g$, then
\[ \int_I f \le \int_I g. \tag{10.29} \]

(d) Triangle Inequality: For each $f \in \mathcal{R}(I)$, one has
\[ \left| \int_I f \right| \le \int_I |f|. \tag{10.30} \]
This result still holds for C-valued $f$ (see Th. G.5(c)).

(e) Mean Value Theorem for Integration: If $f \in \mathcal{R}(I)$, then, for each $m, M \in \mathbb{R}$ with $m \le f \le M$:
\[ m\,(b-a) = m\,|I| \le \int_a^b f = \int_I f \le M\,|I| = M\,(b-a). \tag{10.31} \]
The theorem's name comes from the fact that, for $a < b$, $|I|^{-1} \int_I f$ is sometimes referred to as the mean value of $f$ on $I$.

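Proof. (a): For $a = b$, there is nothing to prove, so let $a < b$. Let $(\Delta^n)_{n\in\mathbb{N}}$ be a sequence of partitions of $I$, $\Delta^n = (x_{n,0}, \dots, x_{n,N_n})$, $I_{n,j} := [x_{n,j-1}, x_{n,j}]$, satisfying $\lim_{n\to\infty} |\Delta^n| = 0$. Note that, for each $n \in \mathbb{N}$ and each $j \in \{1, \dots, N_n\}$,
\[ m_{n,j}(f+g) = \inf\{f(x) + g(x) : x \in I_{n,j}\} \ge \inf\{f(x) : x \in I_{n,j}\} + \inf\{g(x) : x \in I_{n,j}\} = m_{n,j}(f) + m_{n,j}(g), \tag{10.32a} \]
\[ M_{n,j}(f+g) = \sup\{f(x) + g(x) : x \in I_{n,j}\} \le \sup\{f(x) : x \in I_{n,j}\} + \sup\{g(x) : x \in I_{n,j}\} = M_{n,j}(f) + M_{n,j}(g), \tag{10.32b} \]
\[ \underset{\lambda\in\mathbb{R}}{\forall} \quad m_{n,j}(\lambda f) = \inf\{\lambda f(x) : x \in I_{n,j}\} \overset{(4.9d)}{=} \begin{cases} \lambda \inf\{f(x) : x \in I_{n,j}\} = \lambda\, m_{n,j}(f) & \text{for } \lambda \ge 0, \\ \lambda \sup\{f(x) : x \in I_{n,j}\} = \lambda\, M_{n,j}(f) & \text{for } \lambda < 0, \end{cases} \tag{10.32c} \]
\[ \underset{\lambda\in\mathbb{R}}{\forall} \quad M_{n,j}(\lambda f) = \sup\{\lambda f(x) : x \in I_{n,j}\} \overset{(4.9c)}{=} \begin{cases} \lambda \sup\{f(x) : x \in I_{n,j}\} = \lambda\, M_{n,j}(f) & \text{for } \lambda \ge 0, \\ \lambda \inf\{f(x) : x \in I_{n,j}\} = \lambda\, m_{n,j}(f) & \text{for } \lambda < 0. \end{cases} \tag{10.32d} \]
Thus,
\[ J_*(f+g, I) \overset{(10.23)}{=} \lim_{n\to\infty} r(\Delta^n, f+g) \overset{(10.7a)}{=} \lim_{n\to\infty} \sum_{j=1}^{N_n} m_{n,j}(f+g)\,|I_{n,j}| \overset{(10.32a)}{\ge} \lim_{n\to\infty} \big( r(\Delta^n, f) + r(\Delta^n, g) \big) = J_*(f, I) + J_*(g, I), \tag{10.33a} \]
\[ J^*(f+g, I) \overset{(10.23)}{=} \lim_{n\to\infty} R(\Delta^n, f+g) \overset{(10.7b)}{=} \lim_{n\to\infty} \sum_{j=1}^{N_n} M_{n,j}(f+g)\,|I_{n,j}| \overset{(10.32b)}{\le} \lim_{n\to\infty} \big( R(\Delta^n, f) + R(\Delta^n, g) \big) = J^*(f, I) + J^*(g, I), \tag{10.33b} \]
\[ \underset{\lambda\in\mathbb{R}}{\forall} \quad J_*(\lambda f, I) \overset{(10.23)}{=} \lim_{n\to\infty} r(\Delta^n, \lambda f) \overset{(10.7a)}{=} \lim_{n\to\infty} \sum_{j=1}^{N_n} m_{n,j}(\lambda f)\,|I_{n,j}| \overset{(10.32c)}{=} \begin{cases} \lim_{n\to\infty} \lambda\, r(\Delta^n, f) = \lambda\, J_*(f, I) & \text{for } \lambda \ge 0, \\ \lim_{n\to\infty} \lambda\, R(\Delta^n, f) = \lambda\, J^*(f, I) & \text{for } \lambda < 0, \end{cases} \tag{10.33c} \]
\[ \underset{\lambda\in\mathbb{R}}{\forall} \quad J^*(\lambda f, I) \overset{(10.23)}{=} \lim_{n\to\infty} R(\Delta^n, \lambda f) \overset{(10.7b)}{=} \lim_{n\to\infty} \sum_{j=1}^{N_n} M_{n,j}(\lambda f)\,|I_{n,j}| \overset{(10.32d)}{=} \begin{cases} \lim_{n\to\infty} \lambda\, R(\Delta^n, f) = \lambda\, J^*(f, I) & \text{for } \lambda \ge 0, \\ \lim_{n\to\infty} \lambda\, r(\Delta^n, f) = \lambda\, J_*(f, I) & \text{for } \lambda < 0. \end{cases} \tag{10.33d} \]
Thus, if $f$ and $g$ are both Riemann integrable over $I$, then we obtain $J^*(f+g, I) \le J^*(f, I) + J^*(g, I) = J_*(f, I) + J_*(g, I) \le J_*(f+g, I)$, i.e., by Th. 10.10(c), $(f+g) \in \mathcal{R}(I)$; and $J_*(\lambda f, I) = \lambda J_*(f, I) = J^*(\lambda f, I)$ for $\lambda \ge 0$, $J_*(\lambda f, I) = \lambda J^*(f, I) = J^*(\lambda f, I)$ for $\lambda < 0$, i.e. $(\lambda f) \in \mathcal{R}(I)$ in each case. In particular, for each $\lambda, \mu \in \mathbb{R}$,
\[ \int_I (\lambda f + \mu g) = J_*(\lambda f + \mu g, I) = \lambda\, J_*(f, I) + \mu\, J_*(g, I) = \lambda \int_I f + \mu \int_I g, \tag{10.34} \]
proving (10.27).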
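(b): Once again, for $a = b$, there is nothing to prove, so let $a < b$. For $M = 1$, there is still nothing to prove. For $M = 2$, we have $a = y_0 < y_1 < y_2 = b$. Consider a sequence $(\Delta^n)_{n\in\mathbb{N}}$ of partitions of $I$, $\Delta^n = (x_{n,0}, \dots, x_{n,N_n})$, such that $\lim_{n\to\infty} |\Delta^n| = 0$ and $y_1 \in \nu(\Delta^n)$ for each $n \in \mathbb{N}$. Define ${\Delta'}^n := (x_{n,0}, \dots, y_1)$, ${\Delta''}^n := (y_1, \dots, x_{n,N_n})$. Then ${\Delta'}^n$ and ${\Delta''}^n$ are partitions of $J_1$ and $J_2$, respectively, and $\lim_{n\to\infty} |{\Delta'}^n| = \lim_{n\to\infty} |{\Delta''}^n| = 0$. Moreover,
\[ \underset{n\in\mathbb{N}}{\forall} \quad \Big( r(\Delta^n, f) = r({\Delta'}^n, f) + r({\Delta''}^n, f), \quad R(\Delta^n, f) = R({\Delta'}^n, f) + R({\Delta''}^n, f) \Big), \]
implying $J_*(f, I) = J_*(f, J_1) + J_*(f, J_2)$ and $J^*(f, I) = J^*(f, J_1) + J^*(f, J_2)$. This proves $\int_I f = \int_{J_1} f + \int_{J_2} f$ provided $f \in \mathcal{R}(I) \cap \mathcal{R}(J_1) \cap \mathcal{R}(J_2)$. So it just remains to show the claimed equivalence between $f \in \mathcal{R}(I)$ and $f \in \mathcal{R}(J_1) \cap \mathcal{R}(J_2)$. If $f \in \mathcal{R}(J_1) \cap \mathcal{R}(J_2)$, then $J_*(f, I) = J_*(f, J_1) + J_*(f, J_2) = J^*(f, J_1) + J^*(f, J_2) = J^*(f, I)$, showing $f \in \mathcal{R}(I)$. Conversely, $J_*(f, I) = J^*(f, I)$ implies $J^*(f, J_1) = J_*(f, J_1) + J_*(f, J_2) - J^*(f, J_2) \le J_*(f, J_1)$, showing $J^*(f, J_1) = J_*(f, J_1)$ and $f \in \mathcal{R}(J_1)$; $f \in \mathcal{R}(J_2)$ follows completely analogously. The general case now follows by induction on $M$.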
(c): If $f, g : I \to \mathbb{R}$ are bounded and $f \le g$, then, for each partition $\Delta$ of $I$, $r(\Delta, f) \le r(\Delta, g)$ and $R(\Delta, f) \le R(\Delta, g)$ are immediate from (10.7). As these inequalities are preserved when taking the sup and the inf, respectively, all claims of (c) are established.
(d): We will see in Th. 10.17(b) below that $f \in \mathcal{R}(I)$ implies $|f| \in \mathcal{R}(I)$. Since $f \le |f|$ and $-f \le |f|$, (c) implies $\int_I f \le \int_I |f|$ and $-\int_I f \le \int_I |f|$, i.e. (10.30).
(e): We compute
\[ m\,|I| \overset{(10.11)}{=} \int_I m \overset{(c)}{\le} \int_I f \overset{(c)}{\le} \int_I M \overset{(10.11)}{=} M\,|I|, \tag{10.35} \]
thereby establishing the case. ∎

Theorem 10.12 (Riemann's Integrability Criterion). Let $I = [a,b] \subseteq \mathbb{R}$ and suppose $f : I \to \mathbb{R}$ is bounded. Then $f$ is Riemann integrable over $I$ if, and only if, for each $\epsilon > 0$, there exists a partition $\Delta$ of $I$ such that
\[ R(\Delta, f) - r(\Delta, f) < \epsilon. \tag{10.36} \]

Proof. Suppose, for each $\epsilon > 0$, there exists a partition $\Delta$ of $I$ such that (10.36) is satisfied. Then
\[ J^*(f, I) - J_*(f, I) \le R(\Delta, f) - r(\Delta, f) < \epsilon, \tag{10.37} \]
showing $J^*(f, I) \le J_*(f, I)$. As the opposite inequality always holds, we have $J^*(f, I) = J_*(f, I)$, i.e. $f \in \mathcal{R}(I)$ as claimed. Conversely, if $f \in \mathcal{R}(I)$ and $(\Delta^n)_{n\in\mathbb{N}}$ is a sequence of partitions of $I$ with $\lim_{n\to\infty} |\Delta^n| = 0$, then (10.24a) implies that, for each $\epsilon > 0$, there is $N \in \mathbb{N}$ such that $R(\Delta^n, f) - r(\Delta^n, f) < \epsilon$ for each $n > N$. ∎

The previous theorem will allow us to prove that every continuous function on [a, b] is
Riemann integrable. However, we will also need to make use of the following result:

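Proposition 10.13. Let $I = [a,b] \subseteq \mathbb{R}$, $a \le b$, $f : I \to \mathbb{R}$. If $f$ is continuous, then $f$ is even uniformly continuous, i.e.
\[ \underset{\epsilon\in\mathbb{R}^+}{\forall}\ \underset{\delta\in\mathbb{R}^+}{\exists}\ \underset{x,y\in I}{\forall} \quad \Big( |x - y| < \delta \ \Rightarrow\ |f(x) - f(y)| < \epsilon \Big). \tag{10.38} \]

Proof. Arguing by contraposition, we assume $f$ not to be uniformly continuous on $I$. Then the negation of (10.38) must hold, i.e.
\[ \underset{\epsilon_0\in\mathbb{R}^+}{\exists}\ \underset{\delta\in\mathbb{R}^+}{\forall}\ \underset{x,y\in I}{\exists} \quad \Big( |x - y| < \delta \ \wedge\ |f(x) - f(y)| \ge \epsilon_0 \Big). \tag{10.39} \]
In particular, for each $n \in \mathbb{N}$, there exist $x_n, y_n \in I$ such that
\[ |x_n - y_n| < \delta_n := 1/n \tag{10.40} \]
and $|f(x_n) - f(y_n)| \ge \epsilon_0$. Then the sequence $(x_n)_{n\in\mathbb{N}}$ is bounded and the Bolzano-Weierstrass Th. 7.27 provides a convergent subsequence $(x_{\phi(n)})_{n\in\mathbb{N}}$, i.e. there is $\xi \in \mathbb{R}$ with $\lim_{n\to\infty} x_{\phi(n)} = \xi$. Clearly, $\xi \in [a,b]$ and (10.40) implies $\lim_{n\to\infty} y_{\phi(n)} = \xi$ as well. However, due to $|f(x_{\phi(n)}) - f(y_{\phi(n)})| \ge \epsilon_0 > 0$, the sequences $\big(f(x_{\phi(n)})\big)_{n\in\mathbb{N}}$ and $\big(f(y_{\phi(n)})\big)_{n\in\mathbb{N}}$ cannot both converge to $f(\xi)$, showing that $f$ cannot be continuous. ∎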

Caveat 10.14. It is important in Prop. 10.13 that $f$ is defined on a compact interval $I$. The functions $f : ]0,1] \to \mathbb{R}$, $f(x) := 1/x$, and $f : \mathbb{R} \to \mathbb{R}$, $f(x) := x^2$, are examples of continuous functions that are not uniformly continuous.
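The failure of uniform continuity for $x \mapsto 1/x$ on $]0,1]$ can be seen along explicit sequences, as in the negation (10.39). The sketch below uses the hypothetical choice $x_n := 1/n$, $y_n := 1/(2n)$, which is not taken from the notes: the distances $|x_n - y_n|$ tend to $0$, while $|f(x_n) - f(y_n)| = n$ stays at least $1$.

```python
def f(x):
    # f(x) = 1/x on ]0, 1], continuous but not uniformly continuous
    return 1.0 / x

# Illustrative sequences x_n = 1/n and y_n = 1/(2n).
pairs = [(1.0 / n, 1.0 / (2 * n)) for n in (1, 10, 100, 1000)]

distances = [abs(x - y) for x, y in pairs]   # |x_n - y_n| = 1/(2n) -> 0
jumps = [abs(f(x) - f(y)) for x, y in pairs] # |f(x_n) - f(y_n)| = n

print(distances)
print(jumps)
```

So no single $\delta$ can work for $\epsilon := 1$ on all of $]0,1]$.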

Theorem 10.15. Let $I = [a,b] \subseteq \mathbb{R}$, $a \le b$, $f : I \to \mathbb{R}$.

(a) If $f$ is continuous, then $f$ is Riemann integrable over $I$.

(b) If $f$ is increasing or decreasing, then $f$ is Riemann integrable over $I$.

Proof. (a): For $a = b$, there is nothing to prove, so let $a < b$. First note that, if $f$ is continuous on $I = [a,b]$, then $f$ is bounded by Th. 7.54. Moreover, $f$ is uniformly continuous due to Prop. 10.13. Thus, given $\epsilon > 0$, there is $\delta > 0$ such that $|x - y| < \delta$ implies $|f(x) - f(y)| < \epsilon/|I|$ for each $x, y \in I$. Then, for each partition $\Delta$ of $I$ satisfying $|\Delta| < \delta$, we obtain
\[ R(\Delta, f) - r(\Delta, f) = \sum_{j=1}^N (M_j - m_j)\,|I_j| \le \frac{\epsilon}{|I|} \sum_{j=1}^N |I_j| = \epsilon, \tag{10.41} \]
as $|\Delta| < \delta$ implies $|x - y| < \delta$ for each $x, y \in I_j$ and each $j \in \{1, \dots, N\}$. Finally, (10.41) implies $f \in \mathcal{R}(I)$ due to Riemann's integrability criterion of Th. 10.12.
(b): Suppose $f : [a,b] \to \mathbb{R}$ is increasing. Then $f$ is bounded, as $f(a) \le f(x) \le f(b)$ for each $x \in [a,b]$. Moreover, if $\Delta = (x_0, \dots, x_N)$ is a partition of $I$ as in Def. 10.3, then
\[ R(\Delta, f) - r(\Delta, f) = \sum_{j=1}^N (M_j - m_j)\,|I_j| = \sum_{j=1}^N \big( f(x_j) - f(x_{j-1}) \big)\,|I_j| \le |\Delta|\,\big( f(b) - f(a) \big). \tag{10.42} \]
Thus, given $\epsilon > 0$, we have $R(\Delta, f) - r(\Delta, f) < \epsilon$ for each partition $\Delta$ of $I$ satisfying $|\Delta| < \epsilon/(f(b) - f(a))$. In consequence, $f \in \mathcal{R}(I)$, once again due to Riemann's integrability criterion of Th. 10.12. If $f$ is decreasing, then $-f$ is increasing, and Th. 10.11(a) establishes the case. ∎
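The estimate (10.42) can be tested on a non-uniform partition. The following sketch is only an illustration: it assumes an increasing $f$ (here the exponential function on $[0,2]$), and the 200 randomly placed interior nodes with a fixed seed are an arbitrary choice.

```python
import math
import random

def gap_and_bound(f, nodes):
    """Return (R - r, |Delta| * (f(b) - f(a))) for an increasing f, cf. (10.42)."""
    # For increasing f: M_j - m_j = f(x_j) - f(x_{j-1}).
    gap = sum((f(nodes[j]) - f(nodes[j - 1])) * (nodes[j] - nodes[j - 1])
              for j in range(1, len(nodes)))
    mesh = max(nodes[j] - nodes[j - 1] for j in range(1, len(nodes)))
    return gap, mesh * (f(nodes[-1]) - f(nodes[0]))

random.seed(0)  # fixed seed, so that the example is reproducible
interior = sorted(random.uniform(0.0, 2.0) for _ in range(200))
nodes = [0.0] + interior + [2.0]

gap, bound = gap_and_bound(math.exp, nodes)
print(gap, bound)
```

Refining the partition shrinks the mesh size and, by (10.42), forces $R(\Delta, f) - r(\Delta, f)$ below any prescribed $\epsilon$.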
Definition and Remark 10.16. Let $M \subseteq \mathbb{R}$. A function $f : M \to \mathbb{R}$ is called Lipschitz continuous in $M$ with Lipschitz constant $L$ if, and only if,
\[ \underset{L\in\mathbb{R}_0^+}{\exists}\ \underset{x,y\in M}{\forall} \quad |f(x) - f(y)| \le L\,|x - y|. \tag{10.43} \]
Every Lipschitz continuous function is, indeed, continuous, since, if $\xi \in M$ and $(y_n)_{n\in\mathbb{N}}$ is a sequence in $M$ with $\lim_{n\to\infty} y_n = \xi$, then (10.43) implies
\[ \underset{n\in\mathbb{N}}{\forall} \quad |f(\xi) - f(y_n)| \le L\,|\xi - y_n|, \tag{10.44} \]
proving $\lim_{n\to\infty} f(y_n) = f(\xi)$. Moreover, it is not too much harder to prove Lipschitz continuous functions are even uniformly continuous, but we will not pursue this right now. On the other hand, $f : \mathbb{R}_0^+ \to \mathbb{R}$, $f(x) := \sqrt{x}$, is an example of a continuous function (actually, even uniformly continuous) that is not Lipschitz continuous.
Theorem 10.17. Let $a, b \in \mathbb{R}$, $a \le b$, $I := [a,b]$.

(a) If $f \in \mathcal{R}(I)$ and $\phi : f(I) \to \mathbb{R}$ is Lipschitz continuous, then $\phi \circ f \in \mathcal{R}(I)$. For C-valued extensions of this result, see Th. G.4(b),(c).
(b) If $f \in \mathcal{R}(I)$, then $|f|, f^2, f^+, f^- \in \mathcal{R}(I)$. In particular, we, indeed, have (10.2) from the introduction (with $M$ replaced by $I$). If, in addition, there exists $\delta > 0$ such that $f(x) \ge \delta$ for each $x \in I$, then $1/f \in \mathcal{R}(I)$.

(c) If $f, g \in \mathcal{R}(I)$, then $fg, \max(f,g), \min(f,g) \in \mathcal{R}(I)$. If, in addition, there exists $\delta > 0$ such that $g(x) \ge \delta$ for each $x \in I$, then $f/g \in \mathcal{R}(I)$. For the product and the quotient, the result remains true for C-valued $f, g$ (see Th. G.4(a)).

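Proof. (a): Let $f \in \mathcal{R}(I)$ and let $\phi : f(I) \to \mathbb{R}$ be Lipschitz continuous. Then there exists $L \ge 0$ such that
\[ |\phi(x) - \phi(y)| \le L\,|x - y| \quad \text{for each } x, y \in f(I). \tag{10.45} \]
As $f \in \mathcal{R}(I)$, given $\epsilon > 0$, Th. 10.12 provides a partition $\Delta$ of $I$ such that $R(\Delta, f) - r(\Delta, f) < \epsilon/L$, and we obtain
\[ R(\Delta, \phi \circ f) - r(\Delta, \phi \circ f) = \sum_{j=1}^N \big( M_j(\phi \circ f) - m_j(\phi \circ f) \big)\,|I_j| \le \sum_{j=1}^N L\,\big( M_j(f) - m_j(f) \big)\,|I_j| = L\,\big( R(\Delta, f) - r(\Delta, f) \big) < \epsilon. \tag{10.46} \]
Thus, $\phi \circ f \in \mathcal{R}(I)$ by another application of Th. 10.12.
(b): $|f|, f^2, f^+, f^- \in \mathcal{R}(I)$ follows from (a), since each of the maps $x \mapsto |x|$, $x \mapsto x^2$, $x \mapsto \max\{x, 0\}$, $x \mapsto -\min\{x, 0\}$ is Lipschitz continuous on the bounded set $f(I)$ (recall that $f \in \mathcal{R}(I)$ implies that $f$ is bounded). Since $f = f^+ - f^-$, (10.2) is implied by (10.27). Finally, if $f(x) \ge \delta > 0$, then $x \mapsto x^{-1}$ is Lipschitz continuous on the bounded set $f(I)$, and $f^{-1} \in \mathcal{R}(I)$ follows from (a).
(c): Since
\[ fg = \frac{1}{4}(f+g)^2 - \frac{1}{4}(f-g)^2, \tag{10.47a} \]
\[ \max(f, g) = f + (g - f)^+, \tag{10.47b} \]
\[ \min(f, g) = g - (f - g)^-, \tag{10.47c} \]
everything is a consequence of (b). ∎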

10.2 Important Theorems


This section compiles a number of important theorems on Riemann integrals, which, in
particular, provide powerful tools to actually evaluate such integrals.

10.2.1 Fundamental Theorem of Calculus

We provide two variants of the fundamental theorem with slightly different flavors: In the first variant, Th. 10.19(a), we start with a function $f$, obtain another function $F$ by means of integrating $f$, and recover $f$ by taking the derivative of $F$. In the second variant, Th. 10.19(b), one first differentiates the given function $F$, obtaining $f := F'$, followed by integrating $f$, recovering $F$ up to an additive constant.
Notation 10.18. If $a, b \in \mathbb{R}$, $a \le b$, $I := [a,b]$, $f : I \to \mathbb{R}$, then denote
\[ \int_a^b f := \int_I f, \qquad \int_b^a f := -\int_a^b f, \tag{10.48a} \]
\[ [f(t)]_a^b := [f]_a^b := f(b) - f(a), \qquad [f(t)]_b^a := [f]_b^a := f(a) - f(b), \tag{10.48b} \]
where $f \in \mathcal{R}(I)$ for (10.48a).


Theorem 10.19. Let $a, b \in \mathbb{R}$, $a < b$, $I := [a,b]$.

(a) If $f \in \mathcal{R}(I)$ is continuous in $\xi \in I$, then, for each $c \in I$, the function
\[ F_c : I \to \mathbb{R}, \quad F_c(x) := \int_c^x f(t)\,dt, \tag{10.49} \]
is differentiable in $\xi$ with $F_c'(\xi) = f(\xi)$. In particular, if $f \in C(I)$, then $F_c \in C^1(I)$ and $F_c'(x) = f(x)$ for each $x \in I$.
(b) If $F \in C^1(I)$ or, alternatively, $F$ is differentiable with integrable derivative $F' \in \mathcal{R}(I)$, then
\[ F(b) - F(a) = [F(t)]_a^b = \int_a^b F'(t)\,dt, \tag{10.50a} \]
and
\[ F(x) = F(c) + \int_c^x F'(t)\,dt \quad \text{for each } c, x \in I. \tag{10.50b} \]

Both (a) and (b) extend to the C-valued situation (see Th. G.6).

Proof. (a): We need to show that
\[ \lim_{h\to 0} A(h) = 0, \quad \text{where} \quad A(h) := \frac{F_c(\xi + h) - F_c(\xi)}{h} - f(\xi). \tag{10.51} \]
One computes
\[ A(h) = \frac{1}{h} \int_\xi^{\xi+h} f(t)\,dt - \frac{1}{h} \int_\xi^{\xi+h} f(\xi)\,dt = \frac{1}{h} \int_\xi^{\xi+h} \big( f(t) - f(\xi) \big)\,dt. \tag{10.52} \]
Now, given $\epsilon > 0$, the continuity of $f$ in $\xi$ allows us to find $\delta > 0$ such that $|f(t) - f(\xi)| < \epsilon/2$ for each $t \in I$ with $|t - \xi| < \delta$. Thus, for each $h$ with $|h| < \delta$, we obtain
\[ |A(h)| \le \frac{1}{h} \int_\xi^{\xi+h} \big| f(t) - f(\xi) \big|\,dt \le \frac{h\,\epsilon}{2h} < \epsilon, \tag{10.53} \]
thereby proving $\lim_{h\to 0} A(h) = 0$, i.e. $f(\xi) = F_c'(\xi)$.
(b): First assume $F \in C^1(I)$. Then $F'$ is continuous on $I$, and we can apply (a) to the function
\[ G : I \to \mathbb{R}, \quad G(x) := \int_a^x F'(t)\,dt, \tag{10.54} \]
to obtain $G' = F'$. Thus, for $H := F - G$, we obtain $H' \equiv 0$, showing that $H$ must be constant on $I$, i.e. $H(x) = H(a) = F(a) - G(a) = F(a)$ for each $x \in I$. Evaluating at $x = b$ yields
\[ F(a) = H(b) = F(b) - \int_a^b F'(t)\,dt, \tag{10.55} \]
thereby establishing the case.
Now we consider the remaining case of a differentiable $F$ with integrable derivative $F' \in \mathcal{R}(I)$. Consider a partition $\Delta = (x_0, \dots, x_N)$ of $I$ as in Def. 10.3. Then, for each $j \in \{1, \dots, N\}$, the mean value theorem provides $\xi_j \in ]x_{j-1}, x_j[$ such that $F(x_j) - F(x_{j-1}) = (x_j - x_{j-1})\,F'(\xi_j)$. Thus,
\[ F(b) - F(a) = \sum_{j=1}^N \big( F(x_j) - F(x_{j-1}) \big) = \sum_{j=1}^N (x_j - x_{j-1})\,F'(\xi_j) = \rho(\Delta, F'). \tag{10.56} \]
If we choose a sequence of partitions of $I$ such that $|\Delta| \to 0$, then the integrability of $F'$ implies that the right-hand side of (10.56) converges to $\int_a^b F'$, once again establishing the case. ∎
Definition 10.20. If $I \subseteq \mathbb{R}$, $f : I \to \mathbb{K}$, and $F : I \to \mathbb{K}$ is a differentiable function with $F' = f$, then $F$ is called a primitive or antiderivative of $f$.
Example 10.21. Due to the fundamental theorem, if we know a function's antiderivative, we can easily compute its integral over a given interval. Here are three simple examples:
\[ \int_0^1 (x^5 - 3x)\,dx = \left[ \frac{x^6}{6} - \frac{3x^2}{2} \right]_0^1 = \frac{1}{6} - \frac{3}{2} = -\frac{4}{3}, \tag{10.57a} \]
\[ \int_1^e \frac{1}{x}\,dx = [\ln x]_1^e = \ln e - \ln 1 = 1, \tag{10.57b} \]
\[ \int_0^\pi \sin x\,dx = [-\cos x]_0^\pi = 2. \tag{10.57c} \]
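The three values in Example 10.21 can be cross-checked against intermediate Riemann sums as in (10.7c). The sketch below assumes a uniform partition tagged by midpoints; the helper name `midpoint_sum` and the number of subintervals are illustrative choices.

```python
import math

def midpoint_sum(f, a, b, n=100_000):
    """Intermediate Riemann sum (10.7c) for a uniform partition of [a, b]
    into n subintervals, tagged by the midpoints."""
    h = (b - a) / n
    return sum(f(a + (j + 0.5) * h) for j in range(n)) * h

i1 = midpoint_sum(lambda x: x**5 - 3 * x, 0.0, 1.0)  # compare with (10.57a)
i2 = midpoint_sum(lambda x: 1.0 / x, 1.0, math.e)    # compare with (10.57b)
i3 = midpoint_sum(math.sin, 0.0, math.pi)            # compare with (10.57c)

print(i1, i2, i3)
```

By (10.24b), such sums converge to the integral as the mesh size tends to zero, which is why the printed numbers agree with $-4/3$, $1$, and $2$ up to a small error.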

10.2.2 Integration by Parts Formula

Theorem 10.22. Let $a, b \in \mathbb{R}$, $a < b$, $I := [a,b]$. If $f, g \in C^1(I)$, then the following integration by parts formula holds:
\[ \int_a^b f g' = [fg]_a^b - \int_a^b f' g. \tag{10.58} \]
The theorem extends to the C-valued situation (see Th. G.7).

Proof. If $f, g \in C^1(I)$, then, according to the product rule, $fg \in C^1(I)$ with $(fg)' = f'g + fg'$. Applying (10.50a), we obtain
\[ [fg]_a^b = \int_a^b (fg)' = \int_a^b f'g + \int_a^b fg', \tag{10.59} \]
which is precisely (10.58). ∎


Example 10.23. We compute the integral $\int_0^{2\pi} \sin^2 t\,dt$:
\[ \int_0^{2\pi} \sin^2 t\,dt = [-\sin t \cos t]_0^{2\pi} + \int_0^{2\pi} \cos^2 t\,dt = \int_0^{2\pi} \cos^2 t\,dt. \tag{10.60} \]
Adding $\int_0^{2\pi} \sin^2 t\,dt$ on both sides of (10.60) and using $\sin^2 + \cos^2 \equiv 1$ yields
\[ 2 \int_0^{2\pi} \sin^2 t\,dt = \int_0^{2\pi} 1\,dt = 2\pi, \tag{10.61} \]
i.e. $\int_0^{2\pi} \sin^2 t\,dt = \pi$.

10.2.3 Change of Variables

Theorem 10.24. Let $I, J \subseteq \mathbb{R}$ be intervals, $\phi \in C^1(I)$ and $f \in C(J)$. If $\phi(I) \subseteq J$, then the following change of variables formula holds for each $a, b \in I$:
\[ \int_{\phi(a)}^{\phi(b)} f = \int_{\phi(a)}^{\phi(b)} f(x)\,dx = \int_a^b f(\phi(t))\,\phi'(t)\,dt = \int_a^b (f \circ \phi)\,\phi'. \tag{10.62} \]
The theorem extends to the situation where $f$ is C-valued (see Th. G.8).

Proof. We consider the function
\[ F : J \to \mathbb{R}, \quad F(x) := \int_{\phi(a)}^x f(t)\,dt. \tag{10.63} \]
According to Th. 10.19(a) and the chain rule of Th. 9.10, we obtain
\[ (F \circ \phi)' : I \to \mathbb{R}, \quad (F \circ \phi)'(x) = \phi'(x)\,f(\phi(x)). \tag{10.64} \]
Thus, we can apply (10.50a), which yields
\[ \int_{\phi(a)}^{\phi(b)} f = F(\phi(b)) - F(\phi(a)) = \int_a^b (f \circ \phi)\,\phi', \tag{10.65} \]
proving (10.62). ∎
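Example 10.25. We compute the integral $\int_0^1 t^2 \sqrt{1-t}\,dt$ using the change of variables $x := \phi(t) := 1 - t$, $\phi'(t) = -1$:
\[ \int_0^1 t^2 \sqrt{1-t}\,dt = -\int_1^0 (1-x)^2 \sqrt{x}\,dx = \int_0^1 \big( \sqrt{x} - 2x\sqrt{x} + x^2\sqrt{x} \big)\,dx = \left[ \frac{2x^{3/2}}{3} - \frac{4x^{5/2}}{5} + \frac{2x^{7/2}}{7} \right]_0^1 = \frac{16}{105}. \tag{10.66} \]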
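Both sides of the change of variables in Example 10.25 can be compared numerically with the closed-form value $16/105$. The following is a rough sketch using midpoint Riemann sums on a uniform partition; the helper name and the number of subintervals are illustrative choices, and the tolerance reflects that the square root has an unbounded derivative at the endpoint.

```python
import math

def midpoint_sum(f, a, b, n=200_000):
    # Intermediate Riemann sum (10.7c) with midpoint tags on a uniform partition.
    h = (b - a) / n
    return sum(f(a + (j + 0.5) * h) for j in range(n)) * h

# Left-hand side of (10.66): integrand in the original variable t.
lhs = midpoint_sum(lambda t: t**2 * math.sqrt(1.0 - t), 0.0, 1.0)

# After substituting x = 1 - t: integrand in the new variable x.
rhs = midpoint_sum(lambda x: (1.0 - x)**2 * math.sqrt(x), 0.0, 1.0)

exact = 16.0 / 105.0
print(lhs, rhs, exact)
```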

10.3 Improper Integrals


For our definition of the Riemann integral in Def. 10.5, it was important that we considered bounded functions on compact intervals (where the boundedness of the intervals was more important than the closedness); for unbounded functions and/or unbounded intervals, even Def. 10.4 of lower and upper Riemann sums no longer makes sense. Still, for sufficiently benign functions, it is possible to extend the notion of a definite Riemann integral to both unbounded intervals and unbounded functions, and in such situations we will speak of improper integrals (cf. Def. 10.29 below).
Definition 10.26. Let $\emptyset \ne I \subseteq \mathbb{R}$ be an interval. We call $f : I \to \mathbb{R}$ locally Riemann integrable if, and only if, $f \in \mathcal{R}(J)$ for each compact interval $J \subseteq I$. Let $\mathcal{R}_{\mathrm{loc}}(I)$ denote the set of all locally Riemann integrable functions on $I$.
Remark 10.27. In particular, locally Riemann integrable functions are bounded on every compact interval. Moreover, $\mathcal{R}_{\mathrm{loc}}(I) = \mathcal{R}(I)$ if, and only if, $I$ is a compact interval. For example, for each $a, b \in \mathbb{R}$ with $a < b$, the function given by the assignment rule
\[ f(x) := \frac{1}{(x-a)(x-b)} \]
is clearly locally Riemann integrable, but not bounded on each of the intervals $]-\infty, a[$, $]a, b[$, and $]b, \infty[$.

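Before we can define improper Riemann integrals, we define, in partial extension of Def. 8.17:
Definition 10.28. Let $M \subseteq \mathbb{R}$. If $M$ is unbounded from above (resp. below), then $f : M \to \mathbb{K}$ is said to tend to $\eta \in \mathbb{K}$ (or to have the limit $\eta \in \mathbb{K}$) for $x \to \infty$ (resp. for $x \to -\infty$) (denoted by $\lim_{x\to\pm\infty} f(x) = \eta$) if, and only if, for each sequence $(\xi_k)_{k\in\mathbb{N}}$ in $M$ with $\lim_{k\to\infty} \xi_k = \infty$ (resp. with $\lim_{k\to\infty} \xi_k = -\infty$), the sequence $(f(\xi_k))_{k\in\mathbb{N}}$ converges to $\eta \in \mathbb{K}$, i.e.
\[ \lim_{x\to\pm\infty} f(x) = \eta \quad\Leftrightarrow\quad \underset{(\xi_k)_{k\in\mathbb{N}} \text{ in } M}{\forall} \Big( \lim_{k\to\infty} \xi_k = \pm\infty \ \Rightarrow\ \lim_{k\to\infty} f(\xi_k) = \eta \Big). \tag{10.67} \]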

Definition 10.29. Let $a < c < b$ ($a = -\infty$, $b = \infty$ is admissible).

(a) Let $I := [c, b[$, $f \in \mathcal{R}_{\mathrm{loc}}(I)$, and assume $b = \infty$ or $f$ is unbounded. Consider the
function
\[ F : I \to \mathbb{R}, \quad F(x) := \int_c^x f. \]
If the limit
\[ \lim_{x \to b} F(x) = \lim_{x \to b} \int_c^x f \tag{10.68} \]
exists in $\mathbb{R}$, then we define
\[ \int_I f := \int_c^b f(t)\,dt := \int_c^b f := \lim_{x \to b} \int_c^x f. \]

(b) Let $I := \,]a, c]$, $f \in \mathcal{R}_{\mathrm{loc}}(I)$, and assume $a = -\infty$ or $f$ is unbounded. Consider the
function
\[ F : I \to \mathbb{R}, \quad F(x) := \int_x^c f. \]
If the limit
\[ \lim_{x \to a} F(x) = \lim_{x \to a} \int_x^c f \tag{10.69} \]
exists in $\mathbb{R}$, then we define
\[ \int_I f := \int_a^c f(t)\,dt := \int_a^c f := \lim_{x \to a} \int_x^c f. \]

(c) Let $I = \,]a, b[$, $f \in \mathcal{R}_{\mathrm{loc}}(I)$. If the conditions of both (a) and (b) hold, i.e. (i)-(iv),
where

(i) $b = \infty$ or $f$ is unbounded on $[c, b[$,
(ii) $\lim_{x \to b} \int_c^x f$ exists in $\mathbb{R}$,
(iii) $a = -\infty$ or $f$ is unbounded on $]a, c]$,
(iv) $\lim_{x \to a} \int_x^c f$ exists in $\mathbb{R}$,

then we define
\[ \int_I f := \int_a^b f(t)\,dt := \int_a^b f := \int_a^c f + \int_c^b f. \]

All the above limits of Riemann integrals (if they exist) are called improper Riemann
integrals. In each case, if the limit exists, we call $f$ improperly Riemann integrable and
write $f \in \mathcal{R}(I)$.

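Remark 10.30. (a) The definitions in Def. 10.29 are consistent with what occurs if
the limits are proper Riemann integrals: Let $a, c, b \in \mathbb{R}$, $a < c < b$, and $f \in \mathcal{R}[a, b]$.
Then
\[ \lim_{x \to b} \int_c^x f = \int_c^b f \quad \text{and} \quad \lim_{x \to a} \int_x^c f = \int_a^c f. \tag{10.70} \]
Indeed, since $f \in \mathcal{R}[a, b]$, $|f|$ is bounded by some $M \in \mathbb{R}^+_0$; and if $(x_k)_{k \in \mathbb{N}}$ is a
sequence in $[a, b[$ such that $\lim_{k \to \infty} x_k = b$, then
\[ \left| \int_{x_k}^b f \right| \le \int_{x_k}^b |f| \le M\,(b - x_k) \to 0 \quad \text{for } k \to \infty, \]
implying
\[ \lim_{k \to \infty} \int_c^{x_k} f \overset{\text{Th. 10.11(b)}}{=} \lim_{k \to \infty} \left( \int_c^b f - \int_{x_k}^b f \right) = \int_c^b f - 0 = \int_c^b f. \]
An analogous argument shows the remaining equality in (10.70).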



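(b) In Def. 10.29(c), it can occur that $\int_{-\infty}^{\infty} f$ does not exist, even though the limit
$\lim_{x \to \infty} \int_{-x}^{x} f$ exists: For example, if $f : \mathbb{R} \to \mathbb{R}$, $f(x) = x$, then $f \in \mathcal{R}_{\mathrm{loc}}(\mathbb{R})$, and,
for each sequence $(x_k)_{k \in \mathbb{N}}$ in $\mathbb{R}$ such that $\lim_{k \to \infty} x_k = \infty$ and each $c \in \mathbb{R}$, one has
\[ \lim_{k \to \infty} \int_{-x_k}^{x_k} t\,dt = \lim_{k \to \infty} \left[ \frac{t^2}{2} \right]_{-x_k}^{x_k} = \lim_{k \to \infty} \frac{x_k^2 - x_k^2}{2} = 0, \]
\[ \lim_{k \to \infty} \int_{c}^{x_k} t\,dt = \lim_{k \to \infty} \left[ \frac{t^2}{2} \right]_{c}^{x_k} = \lim_{k \to \infty} \frac{x_k^2 - c^2}{2} = \infty, \]
\[ \lim_{k \to \infty} \int_{-x_k}^{c} t\,dt = \lim_{k \to \infty} \left[ \frac{t^2}{2} \right]_{-x_k}^{c} = \lim_{k \to \infty} \frac{c^2 - x_k^2}{2} = -\infty, \]
i.e. $\lim_{x \to \infty} \int_{-x}^{x} t\,dt = 0$, but neither $\lim_{x \to \infty} \int_c^x t\,dt$ nor $\lim_{x \to -\infty} \int_x^c t\,dt$ exists in
$\mathbb{R}$.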
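(c) Let $a < c_1 < c_2 < b$ ($a = -\infty$, $b = \infty$ is admissible). If $I := [c_1, b[$, $f \in \mathcal{R}_{\mathrm{loc}}(I)$, and
$b = \infty$ or $f$ is unbounded, then $\int_{c_1}^b f$ exists if, and only if, $\int_{c_2}^b f$ exists. Moreover, if
the integrals exist, then
\[ \int_{c_1}^b f = \int_{c_1}^{c_2} f + \int_{c_2}^b f. \tag{10.71a} \]
Indeed, if $(x_k)_{k \in \mathbb{N}}$ is a sequence in $[c_1, b[$ such that $\lim_{k \to \infty} x_k = b$ and if $\int_{c_1}^b f$ exists,
then
\[ \lim_{k \to \infty} \int_{c_2}^{x_k} f \overset{\text{Th. 10.11(b)}}{=} \lim_{k \to \infty} \left( \int_{c_1}^{x_k} f - \int_{c_1}^{c_2} f \right) = \int_{c_1}^{b} f - \int_{c_1}^{c_2} f, \]
proving $\int_{c_2}^b f$ exists and (10.71a) holds. Conversely, if $\int_{c_2}^b f$ exists, then
\[ \lim_{k \to \infty} \int_{c_1}^{x_k} f \overset{\text{Th. 10.11(b)}}{=} \lim_{k \to \infty} \left( \int_{c_2}^{x_k} f + \int_{c_1}^{c_2} f \right) = \int_{c_2}^{b} f + \int_{c_1}^{c_2} f, \]
proving $\int_{c_1}^b f$ exists and (10.71a) holds. Analogously, one shows that if $I := \,]a, c_2]$,
$f \in \mathcal{R}_{\mathrm{loc}}(I)$, and $a = -\infty$ or $f$ is unbounded, then $\int_a^{c_1} f$ exists if, and only if, $\int_a^{c_2} f$
exists, where, if the integrals exist, then
\[ \int_a^{c_2} f = \int_{c_1}^{c_2} f + \int_a^{c_1} f. \tag{10.71b} \]

In particular, we see that neither the existence nor the value of the improper integral
in Def. 10.29(c) depends on the choice of $c$.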
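Example 10.31. (a) Let $0 < \alpha < 1$. We claim that
\[ \int_0^1 \frac{1}{t^\alpha}\,dt = \frac{1}{1 - \alpha} \qquad \left( \text{e.g. } \alpha = \tfrac{1}{2} \text{ yields } \int_0^1 \frac{1}{\sqrt{t}}\,dt = 2 \right). \tag{10.72} \]
Indeed, if $(x_k)_{k \in \mathbb{N}}$ is a sequence in $]0, 1]$ such that $\lim_{k \to \infty} x_k = 0$, then
\[ \lim_{k \to \infty} \int_{x_k}^1 \frac{1}{t^\alpha}\,dt = \lim_{k \to \infty} \left[ \frac{t^{1-\alpha}}{1 - \alpha} \right]_{x_k}^1 = \lim_{k \to \infty} \frac{1 - x_k^{1-\alpha}}{1 - \alpha} = \frac{1}{1 - \alpha}. \]

(b) If $(x_k)_{k \in \mathbb{N}}$ is a sequence in $]0, 1]$ such that $\lim_{k \to \infty} x_k = 0$, then
\[ \lim_{k \to \infty} \int_{x_k}^1 \frac{1}{t}\,dt = \lim_{k \to \infty} \big[ \ln t \big]_{x_k}^1 = \lim_{k \to \infty} \big( 0 - \ln x_k \big) = \infty, \]
showing the limit does not exist in $\mathbb{R}$, but diverges to $\infty$. Sometimes, this is stated
in the form
\[ \int_0^1 \frac{1}{t}\,dt = \infty. \tag{10.73} \]

(c) We claim that
\[ \int_{-\infty}^0 e^t\,dt = 1. \tag{10.74} \]
Indeed, if $(x_k)_{k \in \mathbb{N}}$ is a sequence in $\mathbb{R}^-_0$ such that $\lim_{k \to \infty} x_k = -\infty$, then
\[ \lim_{k \to \infty} \int_{x_k}^0 e^t\,dt = \lim_{k \to \infty} \big[ e^t \big]_{x_k}^0 = \lim_{k \to \infty} \big( 1 - e^{x_k} \big) = 1. \]

(d) Consider the function
\[ f : \mathbb{R}^+_0 \to \mathbb{R}, \quad f(t) := \begin{cases} n & \text{for } n \le t \le n + \frac{1}{n 2^n},\ n \in \mathbb{N}, \\ 0 & \text{otherwise}. \end{cases} \]
Then $\lim_{t \to \infty} f(t)$ does not exist and $f$ is not even bounded. However, $f \in \mathcal{R}(\mathbb{R}^+_0)$
and
\[ \int_0^\infty f = \sum_{n=1}^\infty \int_n^{n + 1/(n 2^n)} n\,dt = \sum_{n=1}^\infty 2^{-n} = \frac{1}{1 - \frac{1}{2}} - 1 = 1. \]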
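The limits in Example 10.31(a),(b) can be observed numerically along the particular sequence $x_k = 10^{-k}$. The following sketch uses the antiderivatives from the example; the function name `truncated_power_integral` is a hypothetical helper introduced only for this illustration.

```python
import math

def truncated_power_integral(alpha, x):
    """Value of the proper integral of t**(-alpha) over [x, 1], for 0 < x <= 1.

    Uses the antiderivatives from Example 10.31: ln t for alpha = 1 and
    t**(1 - alpha) / (1 - alpha) otherwise.
    """
    if alpha == 1.0:
        return -math.log(x)
    return (1.0 - x ** (1.0 - alpha)) / (1.0 - alpha)

xs = [10.0 ** (-k) for k in range(1, 13)]                    # x_k -> 0
sqrt_case = [truncated_power_integral(0.5, x) for x in xs]   # alpha = 1/2
log_case = [truncated_power_integral(1.0, x) for x in xs]    # alpha = 1

for x, s, l in zip(xs, sqrt_case, log_case):
    print(f"x = {x:.0e}:  alpha=1/2 -> {s:.6f},  alpha=1 -> {l:.3f}")
```

For $\alpha = 1/2$ the values approach $2$ as in (10.72), whereas for $\alpha = 1$ they keep growing, in line with (10.73). A single sequence only illustrates the definition; the example itself treats every sequence $x_k \to 0$.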

Lemma 10.32. Let $a < c < b$ ($a = -\infty$, $b = \infty$ is admissible). Let $I \subseteq \,]a, b[$ be one of
the three kinds of intervals occurring in Def. 10.29 (i.e. $I = [c, b[$, $I = \,]a, c]$, or $I = \,]a, b[$),
and assume $f, g : I \to \mathbb{R}$ to be improperly Riemann integrable over $I$.

(a) Linearity: For each $\lambda, \mu \in \mathbb{R}$, $\lambda f + \mu g$ is improperly Riemann integrable over $I$ and
\[ \int_I (\lambda f + \mu g) = \lambda \int_I f + \mu \int_I g. \]

(b) Monotonicity: If $f \le g$, then
\[ \int_I f \le \int_I g. \]

Proof. We conduct the proof for the case $I = [c, b[$; the case $I = \,]a, c]$ can be shown
analogously, and the case $I = \,]a, b[$ then also follows. Let $(x_k)_{k \in \mathbb{N}}$ be a sequence in $I$
such that $\lim_{k \to \infty} x_k = b$.

(a): One computes
\[ \lim_{k \to \infty} \int_c^{x_k} (\lambda f + \mu g) \overset{\text{Th. 10.11(a)}}{=} \lim_{k \to \infty} \left( \lambda \int_c^{x_k} f + \mu \int_c^{x_k} g \right) = \lambda \int_c^b f + \mu \int_c^b g, \]
showing $(\lambda f + \mu g) \in \mathcal{R}(I)$ and proving (a).

(b): One estimates
\[ \int_c^b f = \lim_{k \to \infty} \int_c^{x_k} f \overset{\text{Th. 10.11(c)}}{\le} \lim_{k \to \infty} \int_c^{x_k} g = \int_c^b g, \]
proving (b). $\blacksquare$

Definition 10.33. Let $a < c < b$ ($a = -\infty$, $b = \infty$ is admissible). Let $I \subseteq \,]a, b[$ be
one of the three kinds of intervals occurring in Def. 10.29 (i.e. $I = [c, b[$, $I = \,]a, c]$, or
$I = \,]a, b[$), and assume $f \in \mathcal{R}_{\mathrm{loc}}(I)$. Then, by Th. 10.17(b), $|f| \in \mathcal{R}_{\mathrm{loc}}(I)$. If $\int_I |f|$ exists
as an improper integral, then we call the improper integral $\int_I f$ absolutely convergent.

Before we can proceed to Prop. 10.35 about convergence criteria for improper integrals,
we need to prove the analogue of Th. 7.19 for limits of functions.

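Proposition 10.34. Let $\emptyset \neq M \subseteq \mathbb{R}$, $a \in \mathbb{R} \cup \{-\infty\}$, $b \in \mathbb{R} \cup \{\infty\}$, and assume
\[ a = \begin{cases} \inf(M \setminus \{a\}) & \text{if } M \text{ is bounded from below}, \\ -\infty & \text{if } M \text{ is unbounded from below}, \end{cases} \tag{10.75a} \]
\[ b = \begin{cases} \sup(M \setminus \{b\}) & \text{if } M \text{ is bounded from above}, \\ \infty & \text{if } M \text{ is unbounded from above}. \end{cases} \tag{10.75b} \]

Let $f : M \to \mathbb{R}$ be monotone (increasing or decreasing). Defining $A := f(M) =
\{f(x) : x \in M\}$, the following holds:
\[ \lim_{x \to b} f(x) = \begin{cases} \sup A & \text{if } f \text{ is increasing and } A \text{ is bounded from above}, \\ \infty & \text{if } f \text{ is increasing and } A \text{ is not bounded from above}, \\ \inf A & \text{if } f \text{ is decreasing and } A \text{ is bounded from below}, \\ -\infty & \text{if } f \text{ is decreasing and } A \text{ is not bounded from below}, \end{cases} \tag{10.76a} \]
\[ \lim_{x \to a} f(x) = \begin{cases} \sup A & \text{if } f \text{ is decreasing and } A \text{ is bounded from above}, \\ \infty & \text{if } f \text{ is decreasing and } A \text{ is not bounded from above}, \\ \inf A & \text{if } f \text{ is increasing and } A \text{ is bounded from below}, \\ -\infty & \text{if } f \text{ is increasing and } A \text{ is not bounded from below}. \end{cases} \tag{10.76b} \]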

Proof. We prove (10.76a) for the case where $f$ is increasing; the remaining case of
(10.76a) as well as (10.76b) can be proved completely analogously. Let $(x_k)_{k \in \mathbb{N}}$ be a
sequence in $M \setminus \{b\}$ such that $\lim_{k \to \infty} x_k = b$. We have to show that $\lim_{k \to \infty} f(x_k) = \eta$,
where $\eta := \sup A$ for $A$ bounded from above and $\eta := \infty$ for $A$ not bounded from above.
Seeking a contradiction, assume $\lim_{k \to \infty} f(x_k) = \eta$ does not hold. Due to the choice of
$b$, there then must be $\epsilon > 0$ and a subsequence $(y_k)_{k \in \mathbb{N}}$ of $(x_k)_{k \in \mathbb{N}}$ such that $(y_k)_{k \in \mathbb{N}}$ is
strictly increasing and
\[ \forall_{k \in \mathbb{N}} \quad f(y_k) \le \begin{cases} \eta - \epsilon & \text{if } \eta = \sup A, \\ \epsilon & \text{if } \eta = \infty. \end{cases} \]
Since $\lim_{k \to \infty} y_k = b$ and $f$ is increasing, this means $\sup A \le \eta - \epsilon$ or $\sup A \le \epsilon$, which
means a contradiction in each case. Thus, $\lim_{k \to \infty} f(x_k) = \eta$ must hold and the proof
is complete. $\blacksquare$
Proposition 10.35. Let $a < c < b$ ($a = -\infty$, $b = \infty$ is admissible). Let $I \subseteq \,]a, b[$ be
one of the three kinds of intervals occurring in Def. 10.29 (i.e. $I = [c, b[$, $I = \,]a, c]$, or
$I = \,]a, b[$), and assume $f \in \mathcal{R}_{\mathrm{loc}}(I)$.

(a) If $g \in \mathcal{R}_{\mathrm{loc}}(I)$, $0 \le f \le g$, and $\int_I g$ exists, then $\int_I f$ exists as well. Conversely, if
$0 \le g \le f$ and $\int_I g$ diverges, then $\int_I f$ diverges as well.

(b) If $\int_I f$ is an improper integral that is absolutely convergent, then it is also convergent.

Proof. (a): We consider the case $I = [c, b[$; the proof for the case $I = \,]a, c]$ is completely
analogous, and the case $I = \,]a, b[$ then also follows. First, suppose $0 \le f \le g$, and $\int_I g$
exists. Since $0 \le f$, the function
\[ F : [c, b[ \to \mathbb{R}^+_0, \quad F(x) := \int_c^x f, \]
is increasing. Due to
\[ \forall_{x \in [c, b[} \quad F(x) = \int_c^x f \le \int_c^x g \le \int_c^b g \in \mathbb{R}^+_0, \]
$F$ is also bounded from above (in the sense that $\{F(x) \in \mathbb{R} : x \in [c, b[\}$ is bounded from
above), i.e. Prop. 10.34 yields that $\lim_{x \to b} F(x) = \lim_{x \to b} \int_c^x f$ exists in $\mathbb{R}$ as claimed.
Now suppose $0 \le g \le f$ and $\int_I g$ diverges. As the function $F$ above, the function
\[ G : [c, b[ \to \mathbb{R}^+_0, \quad G(x) := \int_c^x g, \]
is increasing. Since we assume that $\lim_{x \to b} G(x)$ does not exist in $\mathbb{R}$, Prop. 10.34 implies
$\lim_{x \to b} G(x) = \infty$. As a consequence, if $(x_k)_{k \in \mathbb{N}}$ is a sequence in $[c, b[$ such that
$\lim_{k \to \infty} x_k = b$, then
\[ \lim_{k \to \infty} F(x_k) = \lim_{k \to \infty} \int_c^{x_k} f \ge \lim_{k \to \infty} \int_c^{x_k} g = \infty, \]
showing that $\int_I f$ diverges as well.

(b): We assume $\int_I f$ to converge absolutely, i.e. $\int_I |f|$ must exist in $\mathbb{R}$. Since $0 \le f^+ \le |f|$
and $0 \le f^- \le |f|$, (a) then implies the existence of $\int_I f^+$ and of $\int_I f^-$. Thus, according
to Lem. 10.32(a), $\int_I f = \int_I f^+ - \int_I f^-$ must also exist. $\blacksquare$
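Example 10.36. (a) We will use Prop. 10.35(a) to show that the improper integral
\[ \int_0^\infty e^{-t^2}\,dt \]
exists. Indeed,
\[ \forall_{t \in \mathbb{R}} \quad \Big( (t - 1)^2 = t^2 - 2t + 1 \ge 0 \;\Rightarrow\; -t^2 \le -2t + 1 \;\Rightarrow\; 0 \le e^{-t^2} \le e^{-2t+1} \Big), \]
and, since
\[ \int_0^\infty e^{-2t+1}\,dt = \lim_{x \to \infty} \int_0^x e^{-2t+1}\,dt = \lim_{x \to \infty} \left[ -\frac{e^{-2t+1}}{2} \right]_0^x = \lim_{x \to \infty} \frac{e - e^{-2x+1}}{2} = \frac{e}{2}, \]
Prop. 10.35(a) implies that $\int_0^\infty e^{-t^2}\,dt$ exists in $\mathbb{R}$.

(b) We will use Prop. 10.35(a) to show that
\[ \int_0^\infty e^{t^2}\,dt \]
diverges. Indeed,
\[ \forall_{t \in \mathbb{R}} \quad \Big( t^2 \ge 0 \;\Rightarrow\; e^{t^2} \ge 1 \Big), \]
and, since
\[ \lim_{x \to \infty} \int_0^x 1\,dt = \lim_{x \to \infty} x = \infty, \]
Prop. 10.35(a) implies that $\int_0^\infty e^{t^2}\,dt = \infty$.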

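(c) We provide an example that shows an improper integral can converge without
converging absolutely: Consider the function
\[ f : [0, \infty[ \to \mathbb{R}, \quad f(t) := \begin{cases} (-1)^{n+1} & \text{for } n \le t < n + \frac{1}{n},\ n \in \mathbb{N}, \\ 0 & \text{otherwise}. \end{cases} \tag{10.77} \]
Then
\[ \int_0^\infty |f| = \lim_{n \to \infty} \sum_{k=1}^n \int_k^{k + \frac{1}{k}} 1\,dt = \lim_{n \to \infty} \sum_{k=1}^n \frac{1}{k} \overset{(7.74)}{=} \infty, \tag{10.78} \]
showing $\int_0^\infty f$ does not converge absolutely. However, we will show
\[ \int_0^\infty f = \sum_{j=1}^\infty \frac{(-1)^{j+1}}{j} =: \alpha > 0. \tag{10.79} \]
We know $\alpha > 0$ from Ex. 7.86(a) and Th. 7.85. Let $(x_k)_{k \in \mathbb{N}}$ be a sequence in $\mathbb{R}^+_0$
such that $\lim_{k \to \infty} x_k = \infty$. Given $\epsilon > 0$, choose $K \in \mathbb{N}$ such that $\frac{1}{K} < \frac{\epsilon}{2}$ and $N \in \mathbb{N}$
such that
\[ \forall_{k > N} \quad x_k > K. \tag{10.80} \]
Then, for each $k > N$, there exists $K_1 \in \mathbb{N}$ such that $K \le K_1 \le x_k < K_1 + 1$. Thus
\[ \int_0^{x_k} f(t)\,dt = (-1)^{K_1+1} \min\left\{ x_k - K_1, \frac{1}{K_1} \right\} + \sum_{j=1}^{K_1 - 1} \frac{(-1)^{j+1}}{j} \tag{10.81} \]
and
\[ \left| \alpha - \int_0^{x_k} f(t)\,dt \right| = \left| \sum_{j=K_1}^\infty \frac{(-1)^{j+1}}{j} - (-1)^{K_1+1} \min\left\{ x_k - K_1, \frac{1}{K_1} \right\} \right| \overset{(7.81)}{<} \frac{1}{K_1} + \frac{1}{K_1} \le \frac{2}{K} < 2 \cdot \frac{\epsilon}{2} = \epsilon, \tag{10.82} \]
thereby proving (10.79).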
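Parts (a) and (c) of Example 10.36 can be illustrated numerically. The sketch below makes two assumptions that are not part of the notes: it uses the standard identity $\int_0^x e^{-t^2}\,dt = \frac{\sqrt{\pi}}{2}\,\mathrm{erf}(x)$ to evaluate the truncated integrals in (a), and it reads the intervals in (10.77) as $[n, n + \frac{1}{n}[$. It also uses the well-known fact that the alternating harmonic series sums to $\ln 2$. All function names are hypothetical helpers.

```python
import math

# (a) Truncated integrals of exp(-t^2) over [0, x], via the error function.
def gauss_integral(x):
    return 0.5 * math.sqrt(math.pi) * math.erf(x)

majorant_total = math.e / 2  # integral of exp(-2t + 1) over [0, oo[, see (a)
gauss_values = [gauss_integral(x) for x in (1, 2, 4, 8, 16)]

# (c) Integral over [0, x] of the step function f, following (10.81).
def signed_integral(x):
    if x < 1:
        return 0.0
    k1 = math.floor(x)
    full = math.fsum((-1) ** (j + 1) / j for j in range(1, k1))
    return full + (-1) ** (k1 + 1) * min(x - k1, 1.0 / k1)

# Same computation for |f|: all contributions are counted positively.
def absolute_integral(x):
    if x < 1:
        return 0.0
    k1 = math.floor(x)
    return math.fsum(1.0 / j for j in range(1, k1)) + min(x - k1, 1.0 / k1)

print(gauss_values, majorant_total)
print(signed_integral(20000.3), math.log(2))
print(absolute_integral(20000.3))
```

The truncated integrals in (a) stay below $e/2$, as the comparison criterion requires, while in (c) the signed integrals settle near $\ln 2$ and the integrals of $|f|$ grow like the harmonic series.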

A Logic and Set Theory

A.1 Principle of Duality


In Th. 1.11, there are several pairs of rules that have an analogous form: (c) and (d), (e)
and (f), (g) and (h), (i) and (j). These analogies are due to the general law called the
principle of duality: If $\phi(A_1, \dots, A_n) \Rightarrow \psi(A_1, \dots, A_n)$ and only the operators $\wedge$, $\vee$, $\neg$
occur in $\phi$ and $\psi$, then the reverse implication $\Phi(A_1, \dots, A_n) \Leftarrow \Psi(A_1, \dots, A_n)$ holds,
where one obtains $\Phi$ from $\phi$ and $\Psi$ from $\psi$ by replacing each $\wedge$ with $\vee$ and each $\vee$ with
$\wedge$. In particular, if, instead of an implication, we start with an equivalence (as in the
examples from Th. 1.11), then we obtain another equivalence.
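The principle of duality can be checked by brute force on small examples. The following sketch encodes formulas as nested tuples (an ad hoc representation chosen for this illustration, as are all function names) and verifies, via truth tables, that the dualized implication holds in reversed direction for two sample implications.

```python
from itertools import product

def evaluate(formula, env):
    """Evaluate a formula built from ('var', name), ('not', f), ('and', f, g), ('or', f, g)."""
    op = formula[0]
    if op == "var":
        return env[formula[1]]
    if op == "not":
        return not evaluate(formula[1], env)
    if op == "and":
        return evaluate(formula[1], env) and evaluate(formula[2], env)
    if op == "or":
        return evaluate(formula[1], env) or evaluate(formula[2], env)
    raise ValueError(f"unknown operator: {op}")

def dual(formula):
    """Swap every 'and' with 'or' and vice versa; leave 'not' and variables alone."""
    op = formula[0]
    if op == "var":
        return formula
    if op == "not":
        return ("not", dual(formula[1]))
    swapped = "or" if op == "and" else "and"
    return (swapped, dual(formula[1]), dual(formula[2]))

def implies_everywhere(phi, psi, names):
    """True if phi => psi holds for every truth assignment of the given variables."""
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(phi, env) and not evaluate(psi, env):
            return False
    return True

A, B, C = ("var", "A"), ("var", "B"), ("var", "C")
names = ["A", "B", "C"]
samples = [
    (("and", A, B), A),                        # (A and B) => A
    (("and", A, ("not", B)), ("or", A, C)),    # (A and not B) => (A or C)
]
results = [
    (implies_everywhere(phi, psi, names),
     implies_everywhere(dual(psi), dual(phi), names))
    for phi, psi in samples
]
print(results)
```

Each pair reports whether the original implication and its reversed dual are both tautologies; a finite check like this illustrates the principle but is not a proof of it.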

A.2 Russell's Antinomy


Russell's antinomy is a contradiction named after Bertrand Russell, who described it
in 1901, showing that naive set theory, founded on the definition of a set according
to Cantor (as stated at the beginning of Sec. 1.3), is not suitable to be used in the
foundation of mathematics. This led to the development of axiomatic set theory, where
the construction of sets is restricted via so-called axioms (see, e.g., [Kun80]).
Russell's antinomy is obtained when considering the set $X$ of all sets that do not
contain themselves as an element: When asking the question if $X \in X$, one obtains the
contradiction that $X \in X \Leftrightarrow X \notin X$:
Suppose $X \in X$. Then $X$ is a set that contains itself. But $X$ was defined to contain
only sets that do not contain themselves, i.e. $X \notin X$.
So suppose $X \notin X$. Then $X$ is a set that does not contain itself. Thus, by the definition
of $X$, $X \in X$.
Perhaps you think Russell's construction is rather academic, but it is easily translated
into a practical situation. Consider a library. The catalog $C$ of the library should contain
all the library's books. Since the catalog itself is a book of the library, it should occur
as an entry in the catalog. So there can be catalogs such as $C$ that have themselves as
an entry and there can be other catalogs that do not have themselves as an entry. Now
one might want to have a catalog $X$ of all catalogs that do not have themselves as an
entry. As in Russell's antinomy, one is led to the contradiction that the catalog $X$ must
have itself as an entry if, and only if, it does not have itself as an entry.
One can construct arbitrarily many versions, which we will not do. Just one more:
Consider a small town with a barber who, each day, shaves all inhabitants who do not
shave themselves. The poor barber now faces a terrible dilemma: He will have to shave
himself if, and only if, he does not shave himself.

A.3 Power Sets and Characteristic Functions


In the following, we explain the common notation $2^A$ for the power set $\mathcal{P}(A)$ of a set
$A$. It is related to a natural identification between subsets and their corresponding
characteristic functions.
Definition A.1. Let $A$ be a set and let $B \subseteq A$ be a subset of $A$. Then
\[ \chi_B : A \to \{0, 1\}, \quad \chi_B(x) := \begin{cases} 1 & \text{if } x \in B, \\ 0 & \text{if } x \notin B, \end{cases} \tag{A.1} \]
is called the characteristic function of the set $B$ (with respect to the universe $A$). One
also finds the notations $1_B$ and $\mathbb{1}_B$ instead of $\chi_B$ (note that all the notations suppress
the dependence of the characteristic function on the universe $A$).
Proposition A.2. Let $A$ be a set. Then the map
\[ \chi : \mathcal{P}(A) \to \{0, 1\}^A, \quad \chi(B) := \chi_B, \tag{A.2} \]
is bijective (recall that $\mathcal{P}(A)$ denotes the power set of $A$ and $\{0, 1\}^A$ denotes the set of
all functions from $A$ into $\{0, 1\}$).

Proof. $\chi$ is injective: Let $B, C \in \mathcal{P}(A)$ with $B \neq C$. By possibly switching the names
of $B$ and $C$, we may assume there exists $x \in B$ such that $x \notin C$. Then $\chi_B(x) = 1$,
whereas $\chi_C(x) = 0$, showing $\chi(B) \neq \chi(C)$, proving $\chi$ is injective.
$\chi$ is surjective: Let $f : A \to \{0, 1\}$ be an arbitrary function and define $B := \{x \in A :
f(x) = 1\}$. Then $\chi(B) = \chi_B = f$, proving $\chi$ is surjective. $\blacksquare$

Proposition A.2 allows one to identify the sets $\mathcal{P}(A)$ and $\{0, 1\}^A$ via the bijective map
$\chi$. This fact and recalling that one can define the number 2 as the set $\{0, 1\}$ (cf. Rem.
1.27) explains the notation $2^A$ for $\mathcal{P}(A)$.
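For a finite set, the identification between subsets and characteristic functions can be checked directly. The sketch below represents a function $A \to \{0, 1\}$ as a tuple of values in a fixed ordering of $A$; this encoding and the helper names are assumptions made only for the illustration.

```python
from itertools import combinations, product

def power_set(universe):
    """All subsets of the finite list `universe`, as frozensets."""
    return [frozenset(c)
            for r in range(len(universe) + 1)
            for c in combinations(universe, r)]

def characteristic(subset, universe):
    """Values of the characteristic function of `subset`, listed along `universe`."""
    return tuple(1 if x in subset else 0 for x in universe)

def subset_from(chi, universe):
    """Inverse direction: recover B = {x : chi(x) = 1}."""
    return frozenset(x for x, bit in zip(universe, chi) if bit == 1)

A = ["a", "b", "c"]
subsets = power_set(A)
images = {characteristic(B, A) for B in subsets}
all_functions = set(product((0, 1), repeat=len(A)))

print(len(subsets), len(images), len(all_functions))
```

The map hits each of the $2^{\#A}$ functions exactly once, which is the finite instance of Prop. A.2 and explains the count $\#\mathcal{P}(A) = 2^{\#A}$.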

A.4 The Axiom of Choice


The axiom of choice is one of the axioms of axiomatic set theory for the admissible
construction of sets (cf. Sec. A.2).
Usually, modern mathematics is founded on a collection of axioms called ZF, the axioms
of Zermelo-Fraenkel set theory (named after two mathematicians), plus the axiom of
choice. For details, we once again refer to [Kun80]. The axioms in ZF include rules
that guarantee the existence of the empty set, pairs, functions, the natural numbers, etc.
Nearly every construction one can think of is admissible in ZF, with certain exceptions
to avoid contradictions such as Russell's antinomy from Sec. A.2.

Definition A.3 (Axiom of Choice). The axiom of choice postulates, for each nonempty
set $\mathcal{M}$, whose elements are all nonempty sets, the existence of a choice function, that
means a function that assigns, to each $M \in \mathcal{M}$, an element $m \in M$. Thus, the axiom
of choice postulates the truth of the following implication for each set $\mathcal{M}$:
\[ \emptyset \notin \mathcal{M} \quad \Rightarrow \quad \exists_{f : \mathcal{M} \to \bigcup_{N \in \mathcal{M}} N} \;\; \forall_{N \in \mathcal{M}} \;\; f(N) \in N. \tag{A.3} \]

Example A.4. For example, the axiom of choice postulates, for each nonempty set $A$,
the existence of a choice function on $\mathcal{P}(A) \setminus \{\emptyset\}$ that assigns each nonempty subset of $A$ one of its
elements.

The axiom of choice is remarkable since, at first glance, it seems so natural that one
can hardly believe it is not provable from the axioms in ZF. However, one can actually
show that it is neither provable nor disprovable from ZF (such a result is called an
independence proof and this particular independence proof is one of several included in
[Kun80]).
If you want to convince yourself that the existence of choice functions is, indeed, a tricky
matter, try to define a choice function on $\mathcal{P}(\mathbb{R}) \setminus \{\emptyset\}$ without AC (but do not spend too
much time on it; one can show this is actually impossible to accomplish).
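By contrast, for a finite set of integers an explicit choice function is easy to write down, and no appeal to the axiom of choice is needed. The sketch below uses "take the least element" as the selection rule; this rule and the helper names are merely one convenient choice for illustration.

```python
from itertools import combinations

def nonempty_subsets(universe):
    """Yield every nonempty subset of a finite set as a frozenset."""
    items = sorted(universe)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def choice(subset):
    """An explicit choice function: pick the least element of a finite nonempty set."""
    return min(subset)

A = {3, 1, 4, 5}
selection = {B: choice(B) for B in nonempty_subsets(A)}
print(len(selection), selection[frozenset({4, 5})])
```

The rule works because every finite nonempty set of integers has a least element. No analogous uniform rule is available for arbitrary nonempty subsets of $\mathbb{R}$, which is where the difficulty described above begins.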

A.5 Rules Concerning Functions and Set-Theoretic Operations


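Theorem A.5. Let $f : A \to B$ be a map, let $\emptyset \neq I$ be an index set, and assume $S, T, S_i$,
$i \in I$, are subsets of $A$, whereas $U, V, U_i$, $i \in I$, are subsets of $B$. Then we have the
following rules concerning functions and set-theoretic operations:
\[ f(S \cap T) \subseteq f(S) \cap f(T), \tag{A.4a} \]
\[ f\Big( \bigcap_{i \in I} S_i \Big) \subseteq \bigcap_{i \in I} f(S_i), \tag{A.4b} \]
\[ f(S \cup T) = f(S) \cup f(T), \tag{A.4c} \]
\[ f\Big( \bigcup_{i \in I} S_i \Big) = \bigcup_{i \in I} f(S_i), \tag{A.4d} \]
\[ f^{-1}(U \cap V) = f^{-1}(U) \cap f^{-1}(V), \tag{A.4e} \]
\[ f^{-1}\Big( \bigcap_{i \in I} U_i \Big) = \bigcap_{i \in I} f^{-1}(U_i), \tag{A.4f} \]
\[ f^{-1}(U \cup V) = f^{-1}(U) \cup f^{-1}(V), \tag{A.4g} \]
\[ f^{-1}\Big( \bigcup_{i \in I} U_i \Big) = \bigcup_{i \in I} f^{-1}(U_i), \tag{A.4h} \]
\[ f(f^{-1}(U)) \subseteq U, \qquad f^{-1}(f(S)) \supseteq S, \tag{A.4i} \]
\[ f^{-1}(U \setminus V) = f^{-1}(U) \setminus f^{-1}(V). \tag{A.4j} \]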

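Proof. For (A.4b), which includes (A.4a) as a special case, one argues
\[ y \in f\Big( \bigcap_{i \in I} S_i \Big) \;\Leftrightarrow\; \exists_{x \in A} \Big( x \in \bigcap_{i \in I} S_i \wedge y = f(x) \Big) \;\Rightarrow\; \forall_{i \in I}\; y \in f(S_i) \;\Leftrightarrow\; y \in \bigcap_{i \in I} f(S_i). \]

Since (A.4c) is a special case of (A.4d), it suffices to prove (A.4d):
\[ y \in f\Big( \bigcup_{i \in I} S_i \Big) \;\Leftrightarrow\; \exists_{x \in A} \Big( x \in \bigcup_{i \in I} S_i \wedge y = f(x) \Big) \;\Leftrightarrow\; \exists_{i \in I}\; y \in f(S_i) \;\Leftrightarrow\; y \in \bigcup_{i \in I} f(S_i). \]

Next, we prove (A.4f), which includes (A.4e) as a special case:
\[ x \in f^{-1}\Big( \bigcap_{i \in I} U_i \Big) \;\Leftrightarrow\; f(x) \in \bigcap_{i \in I} U_i \;\Leftrightarrow\; \forall_{i \in I}\; f(x) \in U_i \;\Leftrightarrow\; \forall_{i \in I}\; x \in f^{-1}(U_i) \;\Leftrightarrow\; x \in \bigcap_{i \in I} f^{-1}(U_i). \]

We proceed to prove (A.4h), which includes (A.4g) as a special case:
\[ x \in f^{-1}\Big( \bigcup_{i \in I} U_i \Big) \;\Leftrightarrow\; f(x) \in \bigcup_{i \in I} U_i \;\Leftrightarrow\; \exists_{i \in I}\; f(x) \in U_i \;\Leftrightarrow\; \exists_{i \in I}\; x \in f^{-1}(U_i) \;\Leftrightarrow\; x \in \bigcup_{i \in I} f^{-1}(U_i). \]

Proof of the first part of (A.4i):
\[ y \in f(f^{-1}(U)) \;\Leftrightarrow\; \exists_{x \in A} \big( x \in f^{-1}(U) \wedge y = f(x) \big) \;\Rightarrow\; y \in U. \]

The observation
\[ x \in S \;\Rightarrow\; f(x) \in f(S) \;\Leftrightarrow\; x \in f^{-1}(f(S)) \]
establishes the second part of (A.4i).

Finally,
\[ x \in f^{-1}(U \setminus V) \;\Leftrightarrow\; f(x) \in U \wedge f(x) \notin V \;\Leftrightarrow\; x \in f^{-1}(U) \setminus f^{-1}(V), \]
which proves (A.4j). $\blacksquare$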

Example A.6. The following example shows that one cannot, in general, replace
the four subset symbols in (A.4) by equalities: For the map $f : \{1, 2\} \to \{1, 2\}$, $f(1) =
f(2) = 1$, it is $f(\{1\} \cap \{2\}) = \emptyset \subsetneq \{1\} = f(\{1\}) \cap f(\{2\})$, $f(f^{-1}(\{1, 2\})) = \{1\} \subsetneq \{1, 2\}$,
$f^{-1}(f(\{1\})) = \{1, 2\} \supsetneq \{1\}$.
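The strict inclusions of Example A.6 can be reproduced with a few lines of code. In the sketch below the map $f$ is stored as a dictionary, and `image` and `preimage` are hypothetical helper names for $f(S)$ and $f^{-1}(U)$.

```python
def image(f, S):
    """f(S) for a map f given as a dict and a subset S of its domain."""
    return {f[x] for x in S}

def preimage(f, U):
    """f^{-1}(U): all domain elements that f maps into U."""
    return {x for x in f if f[x] in U}

f = {1: 1, 2: 1}  # the map from Example A.6: f(1) = f(2) = 1

lhs_a = image(f, {1} & {2})                # f({1} ∩ {2})
rhs_a = image(f, {1}) & image(f, {2})      # f({1}) ∩ f({2})
lhs_i1 = image(f, preimage(f, {1, 2}))     # f(f^{-1}({1, 2}))
lhs_i2 = preimage(f, image(f, {1}))        # f^{-1}(f({1}))

print(lhs_a, rhs_a, lhs_i1, lhs_i2)
# An instance of the equality (A.4j), for comparison:
print(preimage(f, {1, 2} - {2}) == preimage(f, {1, 2}) - preimage(f, {2}))
```

The first three comparisons show proper inclusions, while the preimage rule (A.4j) holds with equality for this map, as the theorem asserts in general.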

A.6 Cardinality
Theorem A.7. Let $\mathcal{M}$ be a set of sets. Then the relation $\sim$ on $\mathcal{M}$, defined by
\[ A \sim B \quad :\Leftrightarrow \quad A \text{ and } B \text{ have the same cardinality}, \tag{A.5} \]
constitutes an equivalence relation on $\mathcal{M}$.

Proof. According to Def. 2.20, we have to prove that $\sim$ is reflexive, symmetric, and
transitive. According to Def. 3.12(a), $A \sim B$ holds for $A, B \in \mathcal{M}$ if, and only if, there
exists a bijective map $f : A \to B$. Thus, since the identity $\mathrm{Id} : A \to A$ is bijective,
$A \sim A$, showing $\sim$ is reflexive. If $A \sim B$, then there exists a bijective map $f : A \to B$,
and $f^{-1}$ is a bijective map $f^{-1} : B \to A$, showing $B \sim A$ and that $\sim$ is symmetric.
If $A \sim B$ and $B \sim C$, then there are bijective maps $f : A \to B$ and $g : B \to C$.
Then, according to Th. 2.13, the composition $(g \circ f) : A \to C$ is also bijective, proving
$A \sim C$ and that $\sim$ is transitive. $\blacksquare$

It is intuitively clear that finite cardinalities are uniquely determined. Still one has to
provide a rigorous proof. The key is the following theorem:

Theorem A.8. If $m, n \in \mathbb{N}$ and the map $f : \{1, \dots, m\} \to \{1, \dots, n\}$ is bijective,
then $m = n$.

Proof. We conduct the proof via induction on $m$. If $m = 1$, then the surjectivity of $f$
implies $n = 1$. For the induction step, we now consider $m > 1$. From the bijective map
$f$, we define the map
\[ g : \{1, \dots, m\} \to \{1, \dots, n\}, \quad g(x) := \begin{cases} n & \text{for } x = m, \\ f(m) & \text{for } x = f^{-1}(n), \\ f(x) & \text{otherwise}. \end{cases} \tag{A.6} \]
Then $g$ is bijective, since it is the composition $g = h \circ f$ of the bijective map $f$ with the
bijective map
\[ h : \{f(m), n\} \to \{f(m), n\}, \quad h(f(m)) := n, \quad h(n) := f(m). \tag{A.7} \]
Thus, the restriction $g{\restriction}_{\{1, \dots, m-1\}} : \{1, \dots, m-1\} \to \{1, \dots, n-1\}$ must also be bijective,
such that the induction hypothesis yields $m - 1 = n - 1$, which, in turn, implies $m = n$
as desired. $\blacksquare$
Corollary A.9. Let $m, n \in \mathbb{N}$ and let $A$ be a set. If $\#A = m$ and $\#A = n$, then $m = n$.

Proof. If $\#A = m$, then, according to Def. 3.12(b), there exists a bijective map $f :
A \to \{1, \dots, m\}$. Analogously, if $\#A = n$, then there exists a bijective map $g :
A \to \{1, \dots, n\}$. In consequence, we have the bijective map $(g \circ f^{-1}) : \{1, \dots, m\} \to
\{1, \dots, n\}$, such that Th. A.8 yields $m = n$. $\blacksquare$

The next theorem provides two interesting, and sometimes useful, characterizations of
infinite sets:
Theorem A.10. Let $A$ be a set. Then the following statements (i)-(iii) are equivalent:

(i) $A$ is infinite.
(ii) There exists $M \subseteq A$ and a bijective map $f : M \to \mathbb{N}$.
(iii) There exists a strict subset $B \subsetneq A$ and a bijective map $g : A \to B$.

One sometimes expresses the equivalence between (i) and (ii) by saying that a set is
infinite if, and only if, it contains a copy of the natural numbers. The property stated in
(iii) might seem strange at first, but infinite sets are, indeed, precisely those that are identical
in size to some of their strict subsets (as an example think of the natural bijection $n \mapsto 2n$
between all natural numbers and the even numbers).

Proof. (i) $\Rightarrow$ (ii): Inductively, we construct a strictly increasing sequence $M_1 \subsetneq M_2 \subsetneq
\dots$ of subsets $M_n$ of $A$, $n \in \mathbb{N}$, and a sequence of functions $f_n : M_n \to \{1, \dots, n\}$
satisfying
\[ \forall_{n \in \mathbb{N}} \quad f_n \text{ is bijective}, \tag{A.8a} \]
\[ \forall_{m, n \in \mathbb{N}} \quad \Big( m \le n \;\Rightarrow\; f_n{\restriction}_{M_m} = f_m \Big). \tag{A.8b} \]
Since $A \neq \emptyset$, there exists $m_1 \in A$. Set $M_1 := \{m_1\}$ and $f_1 : M_1 \to \{1\}$, $f_1(m_1) := 1$.
Then $M_1 \subseteq A$ and $f_1$ bijective are trivially clear. Now let $n \in \mathbb{N}$ and suppose $M_1, \dots, M_n$
and $f_1, \dots, f_n$ satisfying (A.8) have already been constructed. Since $A$ is infinite, there
must be $m_{n+1} \in A \setminus M_n$ (otherwise $M_n = A$ and the bijectivity of $f_n : M_n \to \{1, \dots, n\}$
shows $A$ is finite with $\#A = n$). Set $M_{n+1} := M_n \cup \{m_{n+1}\}$ and
\[ f_{n+1} : M_{n+1} \to \{1, \dots, n+1\}, \quad f_{n+1}(x) := \begin{cases} f_n(x) & \text{for } x \in M_n, \\ n + 1 & \text{for } x = m_{n+1}. \end{cases} \tag{A.9} \]
Then the bijectivity of $f_n$ implies the bijectivity of $f_{n+1}$, and, since $f_{n+1}{\restriction}_{M_n} = f_n$ holds
by definition of $f_{n+1}$,
\[ m \le n + 1 \;\Rightarrow\; f_{n+1}{\restriction}_{M_m} = f_m \]
holds true as well. An induction also shows $M_n = \{m_1, \dots, m_n\}$ and $f_n(m_n) = n$ for
each $n \in \mathbb{N}$. We now define
\[ M := \bigcup_{n \in \mathbb{N}} M_n = \{m_n : n \in \mathbb{N}\}, \quad f : M \to \mathbb{N}, \quad f(m_n) := f_n(m_n) = n. \tag{A.10} \]
Clearly, $M \subseteq A$, and $f$ is bijective with $f^{-1} : \mathbb{N} \to M$, $f^{-1}(n) = m_n$.

(ii) $\Rightarrow$ (iii): Let $E$ denote the even numbers. Then $E \subsetneq \mathbb{N}$ and $h : \mathbb{N} \to E$,
$h(n) := 2n$, is a bijection, showing that (iii) holds for the natural numbers. According to
(ii), there exists $M \subseteq A$ and a bijective map $f : M \to \mathbb{N}$. Define $B := (A \setminus M) \mathbin{\dot\cup} f^{-1}(E)$
and
\[ g : A \to B, \quad g(x) := \begin{cases} x & \text{for } x \in A \setminus M, \\ f^{-1} \circ h \circ f(x) & \text{for } x \in M. \end{cases} \tag{A.11} \]
Then $B \subsetneq A$ since $B$ does not contain the elements of $M$ that are mapped to odd
numbers under $f$. Still, $g$ is bijective, since $g{\restriction}_{A \setminus M} = \mathrm{Id}_{A \setminus M}$ and $g{\restriction}_M = f^{-1} \circ h \circ f$ is the
composition of the bijective maps $f$, $h$, and $f^{-1}{\restriction}_E : E \to f^{-1}(E)$.
(iii) $\Rightarrow$ (i): The proof is conducted by contraposition, i.e. we assume $A$ to be finite and
prove that (iii) does not hold. If $A = \emptyset$, then there is nothing to prove. If $\emptyset \neq A$ is finite,
then, by Def. 3.12(b), there exists $n \in \mathbb{N}$ and a bijective map $f : A \to \{1, \dots, n\}$. If
$B \subsetneq A$, then, according to Th. 3.13(a), there exists $m \in \mathbb{N}_0$, $m < n$, and a bijective
map $h : B \to \{1, \dots, m\}$. If there were a bijective map $g : A \to B$, then $h \circ g \circ f^{-1}$
would be a bijective map from $\{1, \dots, n\}$ onto $\{1, \dots, m\}$ with $m < n$, in contradiction to
Th. A.8. $\blacksquare$

Theorem A.11 (Schröder-Bernstein). Let $A, B$ be sets. The following statements are
equivalent (even without assuming the axiom of choice):

(i) The sets $A$ and $B$ have the same cardinality (i.e. there exists a bijective map
$\phi : A \to B$).

(ii) There exist an injective map $f : A \to B$ and an injective map $g : B \to A$.



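Proof. (i) trivially implies (ii), as one can simply set $f := \phi$ and $g := \phi^{-1}$. It remains
to show (ii) implies (i). We first assume that $A$ and $B$ are disjoint. To define $\phi$, we first
construct a suitable partition of $A \mathbin{\dot\cup} B$, where the subsets of the partition are given via
sequences defined by using $f$ and $g$. The idea is to assign a unique sequence $\sigma(a)$ to
each $a \in A$ and a unique sequence $\sigma(b)$ to each $b \in B$ by alternately applying $f$ and $g$ to
advance the sequence to the right and by alternately applying $f^{-1}$ and $g^{-1}$ to advance
the sequence to the left, if possible (for a given $a \in A$, $g^{-1}(a)$ might not be defined and,
for a given $b \in B$, $f^{-1}(b)$ might not be defined). Thus, for $a \in A$, $\sigma(a)$ has the form
\[ \dots,\; f^{-1}\big(g^{-1}(a)\big),\; g^{-1}(a),\; a,\; f(a),\; g\big(f(a)\big),\; \dots \tag{A.12} \]

More precisely, for each $a \in A$, we define $\sigma(a) = (\sigma_i(a))_{i \in I_a}$ recursively by
\[ \sigma_i(a) := a \quad \text{for } i = 0, \tag{A.13a} \]
\[ \sigma_i(a) := f\big(\sigma_{i-1}(a)\big) \quad \text{for } i > 0 \text{ odd}, \tag{A.13b} \]
\[ \sigma_i(a) := g\big(\sigma_{i-1}(a)\big) \quad \text{for } i > 0 \text{ even}, \tag{A.13c} \]
\[ \sigma_i(a) := g^{-1}\big(\sigma_{i+1}(a)\big) \quad \text{for } i < 0 \text{ odd and } \sigma_{i+1}(a) \in g(B), \tag{A.13d} \]
\[ m_a := i + 1, \quad I_a := \{k \in \mathbb{Z} : m_a \le k\} \quad \text{for } i < 0 \text{ odd and } \sigma_{i+1}(a) \notin g(B), \tag{A.13e} \]
\[ \sigma_i(a) := f^{-1}\big(\sigma_{i+1}(a)\big) \quad \text{for } i < 0 \text{ even and } \sigma_{i+1}(a) \in f(A), \tag{A.13f} \]
\[ m_a := i + 1, \quad I_a := \{k \in \mathbb{Z} : m_a \le k\} \quad \text{for } i < 0 \text{ even and } \sigma_{i+1}(a) \notin f(A), \tag{A.13g} \]
where the conditions in (A.13e) and (A.13g) include the requirement that $\sigma_{i+1}(a)$ is defined. By
induction, one shows $\sigma_{i-1}(a) \in A$ for each $i > 0$ odd, $\sigma_{i-1}(a) \in B$ for each $i > 0$ even,
$\sigma_{i+1}(a) \in A$ for each $m_a \le i < 0$ odd, and $\sigma_{i+1}(a) \in B$ for each $m_a \le i < 0$ even, such
that $\sigma_i(a)$ is well-defined by (A.13) for each $i \in I_a$ (with $I_a = \mathbb{Z}$ if (A.13e) and (A.13g)
are never satisfied). Analogously, for each $b \in B$, we define $\sigma(b) = (\sigma_i(b))_{i \in I_b}$ recursively
by
\[ \sigma_i(b) := b \quad \text{for } i = 0, \tag{A.14a} \]
\[ \sigma_i(b) := g\big(\sigma_{i-1}(b)\big) \quad \text{for } i > 0 \text{ odd}, \tag{A.14b} \]
\[ \sigma_i(b) := f\big(\sigma_{i-1}(b)\big) \quad \text{for } i > 0 \text{ even}, \tag{A.14c} \]
\[ \sigma_i(b) := f^{-1}\big(\sigma_{i+1}(b)\big) \quad \text{for } i < 0 \text{ odd and } \sigma_{i+1}(b) \in f(A), \tag{A.14d} \]
\[ m_b := i + 1, \quad I_b := \{k \in \mathbb{Z} : m_b \le k\} \quad \text{for } i < 0 \text{ odd and } \sigma_{i+1}(b) \notin f(A), \tag{A.14e} \]
\[ \sigma_i(b) := g^{-1}\big(\sigma_{i+1}(b)\big) \quad \text{for } i < 0 \text{ even and } \sigma_{i+1}(b) \in g(B), \tag{A.14f} \]
\[ m_b := i + 1, \quad I_b := \{k \in \mathbb{Z} : m_b \le k\} \quad \text{for } i < 0 \text{ even and } \sigma_{i+1}(b) \notin g(B), \tag{A.14g} \]
where the conditions in (A.14e) and (A.14g) include the requirement that $\sigma_{i+1}(b)$ is defined. By
induction, one shows $\sigma_{i-1}(b) \in B$ for each $i > 0$ odd, $\sigma_{i-1}(b) \in A$ for each $i > 0$ even,
$\sigma_{i+1}(b) \in B$ for each $m_b \le i < 0$ odd, and $\sigma_{i+1}(b) \in A$ for each $m_b \le i < 0$ even, such
that $\sigma_i(b)$ is well-defined by (A.14) for each $i \in I_b$ (with $I_b = \mathbb{Z}$ if (A.14e) and (A.14g)
are never satisfied). The $\sigma(a)$ and $\sigma(b)$ now allow us to define the sets
\[ \forall_{x \in A \mathbin{\dot\cup} B} \quad S_x := \{\sigma_i(x) : i \in I_x\} \subseteq A \mathbin{\dot\cup} B. \tag{A.15} \]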

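Moreover, we call $x \in A \mathbin{\dot\cup} B$ an A-stopper if, and only if, $\sigma(x)$ terminates to the left
with some element in $A$; a B-stopper if, and only if, $\sigma(x)$ terminates to the left with
some element in $B$; and a non-stopper if $\sigma(x)$ never terminates to the left. Thus,
\[ x \text{ A-stopper} \;\Leftrightarrow\; I_x \neq \mathbb{Z} \wedge \big( (x \in A \wedge m_x \text{ even}) \vee (x \in B \wedge m_x \text{ odd}) \big), \]
\[ x \text{ B-stopper} \;\Leftrightarrow\; I_x \neq \mathbb{Z} \wedge \big( (x \in A \wedge m_x \text{ odd}) \vee (x \in B \wedge m_x \text{ even}) \big), \]
\[ x \text{ non-stopper} \;\Leftrightarrow\; I_x = \mathbb{Z}. \tag{A.16} \]
Next, we prove that the $S_x$ form a partition of $A \mathbin{\dot\cup} B$. Since, for each $x \in A \mathbin{\dot\cup} B$,
$x = \sigma_0(x) \in S_x$, it only remains to show
\[ \forall_{x, y \in A \mathbin{\dot\cup} B} \quad \big( S_x = S_y \;\vee\; S_x \cap S_y = \emptyset \big). \tag{A.17} \]
To prove (A.17), it clearly suffices to show
\[ \forall_{x, z \in A \mathbin{\dot\cup} B} \quad \big( z \in S_x \;\Rightarrow\; S_x = S_z \big). \tag{A.18} \]
To verify (A.18), let $z \in S_x$. Then there exists $i \in I_x$ such that $z = \sigma_0(z) = \sigma_i(x)$, and
simple inductions show $\sigma_k(z) = \sigma_{k+i}(x)$ for each $k \in I_z$ and $\sigma_{k-i}(z) = \sigma_k(x)$ for each
$k \in I_x$ (in particular, $i + I_z = I_x$), proving $S_x = S_z$.
We are now in a position to define the desired bijection $\phi : A \to B$:
\[ \phi : A \to B, \quad \phi(a) := \begin{cases} f(a) & \text{if } a \text{ is an A-stopper or a non-stopper}, \\ g^{-1}(a) & \text{if } a \text{ is a B-stopper}. \end{cases} \tag{A.19} \]
Indeed, $\phi$ is injective: If $a_1, a_2 \in \{a \in A : a \text{ A-stopper or non-stopper}\}$ with $a_1 \neq a_2$,
then $\phi(a_1) \neq \phi(a_2)$ due to $f$ being injective; if $a_1, a_2 \in \{a \in A : a \text{ B-stopper}\}$ with
$a_1 \neq a_2$, then $\phi(a_1) \neq \phi(a_2)$ due to $g^{-1}$ being injective; and if $a_1, a_2 \in A$ with $a_2$ a B-stopper
and $a_1$ not a B-stopper, then $S_{a_1} = S_{f(a_1)}$ and $S_{a_2} = S_{g^{-1}(a_2)}$, i.e. $\phi(a_2)$ is also a
B-stopper, whereas $\phi(a_1)$ is not a B-stopper; in particular, $\phi(a_1) \neq \phi(a_2)$. Moreover,
$\phi$ is also surjective: If $b \in B$ is a B-stopper, then, due to $S_b = S_{g(b)}$, so is $g(b)$, and
$b = g^{-1}(g(b)) = \phi(g(b))$; if $b \in B$ is not a B-stopper, then $f^{-1}(b)$ is defined and in $S_b$,
i.e. $f^{-1}(b)$ is not a B-stopper, either, and $b = f(f^{-1}(b)) = \phi(f^{-1}(b))$.
To conclude the proof, we consider the case that $A$ and $B$ are not necessarily disjoint.
Since $A \times \{0\}$ and $B \times \{1\}$ are always disjoint with
\[ \tilde{f} : A \times \{0\} \to B \times \{1\}, \quad \tilde{f}(a, 0) := (f(a), 1), \tag{A.20a} \]
\[ \tilde{g} : B \times \{1\} \to A \times \{0\}, \quad \tilde{g}(b, 1) := (g(b), 0), \tag{A.20b} \]
still being injective if $f, g$ are, the first part of the proof yields a bijective function
$\tilde{\phi} : A \times \{0\} \to B \times \{1\}$. Then, using the clearly bijective functions
\[ \alpha : A \to A \times \{0\}, \quad \alpha(a) := (a, 0), \tag{A.21a} \]
\[ \beta : B \to B \times \{1\}, \quad \beta(b) := (b, 1), \tag{A.21b} \]
$\phi := \beta^{-1} \circ \tilde{\phi} \circ \alpha : A \to B$ is also bijective. $\blacksquare$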

Remark A.12. The proof of the Schröder-Bernstein Th. A.11 is nonconstructive, since,
in general, one has no algorithm to determine if a given element is an A-stopper, a B-
stopper, or a non-stopper. However, as the following Ex. A.13 shows, in particular
situations, determining A-stoppers, B-stoppers, and non-stoppers does not have to be
difficult.

Example A.13. Let $A := \mathbb{N}_0$, $B := \{n \in \mathbb{N}_0 : n \text{ even}\}$. We consider $A$ and $B$ as being
made disjoint (for example, by using the trick employed in the last part of the proof
of Th. A.11 above), but, for the sake of readability, we will not reflect this in the
notation. Define the maps
\[ f : A \to B, \quad f(n) := 4n, \tag{A.22a} \]
\[ g : B \to A, \quad g(n) := n, \tag{A.22b} \]
both being clearly injective, but not surjective. The goal is to find, explicitly, the
bijective map $\phi : A \to B$ given by (A.19). As an intermediate step, we determine
which elements of $A$ are non-stoppers, A-stoppers, and B-stoppers, and likewise for the
elements of $B$. Clearly, $0 \in A$ and $0 \in B$ are non-stoppers. We will see that all other
elements are either A-stoppers or B-stoppers. The precise claim is
\[ A_1 := \{a \in A : a \text{ is A-stopper}\} = \{a \in A : a = n 4^k,\ n \text{ odd},\ k \in \mathbb{N}_0\}, \tag{A.23a} \]
\[ A_2 := \{a \in A : a \text{ is B-stopper}\} = A \setminus (A_1 \cup \{0\}), \tag{A.23b} \]
\[ B_1 := \{b \in B : b \text{ is A-stopper}\} = B \setminus (B_2 \cup \{0\}), \tag{A.23c} \]
\[ B_2 := \{b \in B : b \text{ is B-stopper}\} = \{b \in B : b = n 2^k;\ n, k \text{ odd};\ n, k \ge 1\}. \tag{A.23d} \]

To prove (A.23), denote the sets on the right-hand side of (A.23) by $C_1, C_2, D_1, D_2$,
respectively. If $c = n 4^k \in C_1$, then $(f^{-1} \circ g^{-1})^k(c) = n$ is odd, i.e. $n \notin g(B)$, showing $c$
is an A-stopper, proving $C_1 \subseteq A_1$. If $d = n 2^k \in D_2$, then $k - 1 = 2m$ with $m \in \mathbb{N}_0$, i.e.
$d = n \cdot 2 \cdot 4^m$ and $(g^{-1} \circ f^{-1})^m(d) = 2n$ is not divisible by 4, i.e. $2n \notin f(A)$, showing $d$ is
a B-stopper, proving $D_2 \subseteq B_2$. Clearly, each $a \in \mathbb{N}$ either has the form $a = n 4^k$ with $n$
odd and $k \in \mathbb{N}_0$ (i.e. $a \in C_1$) or $a = 2 n 4^k$ with $n$ odd and $k \in \mathbb{N}_0$, i.e.
\[ C_2 = \{a \in A : a = 2 n (2 \cdot 2)^k;\ n \text{ odd};\ k \in \mathbb{N}_0\} = \{a \in A : a = n 2^k;\ n, k \text{ odd};\ n, k \ge 1\} = g(D_2). \tag{A.24} \]

Since $D_2 \subseteq B_2$, all elements of $D_2$ are B-stoppers, and, thus, so are all elements of $C_2$,
proving $C_2 \subseteq A_2$. Since $A = C_1 \mathbin{\dot\cup} C_2 \mathbin{\dot\cup} \{0\}$,
we then also obtain $A_1 = C_1$ and $A_2 = C_2$.
Clearly, each even $b \in \mathbb{N}$ either has the form $b = n 2^k$ with odd $n, k \ge 1$ (i.e. $b \in D_2$) or
$b = n 4^k$ with $n$ odd and $k \in \mathbb{N}$, i.e.
\[ D_1 = \{b \in B : b = n 4^k,\ n \text{ odd},\ k \in \mathbb{N}\} = f(C_1). \tag{A.25} \]

Since $C_1 = A_1$, all elements of $C_1$ are A-stoppers, and, thus, so are all elements of $D_1$.
Since $B = D_1 \mathbin{\dot\cup} D_2 \mathbin{\dot\cup} \{0\}$,
we then also obtain $B_1 = D_1$ and $B_2 = D_2$.

Now that we have identified explicit formulas for $A_1$ and $A_2$, we can write the assignment
rule for the bijective $\phi : A \to B$, given by (A.19), in the explicit form
\[ \phi(a) := \begin{cases} 0 & \text{if } a = 0, \\ 4a & \text{if } a = n 4^k \text{ with } n \text{ odd and } k \in \mathbb{N}_0, \\ a & \text{if } a = 2 n 4^k \text{ with } n \text{ odd and } k \in \mathbb{N}_0. \end{cases} \tag{A.26} \]
Thus, $\phi$ starts out with the assignments
\[ \phi : \quad \begin{array}{c|ccccccccc} a & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline \phi(a) & 0 & 4 & 2 & 12 & 16 & 20 & 6 & 28 & 8 \end{array} \quad \dots \tag{A.27} \]
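The stopper classification of Example A.13 is simple enough to be carried out by a program, which then reproduces the table (A.27). The sketch below traces each chain to the left exactly as in the proof of Th. A.11, using $g^{-1}(x) = x$ on the even numbers and $f^{-1}(x) = x/4$ on the multiples of 4; the names `classify` and `phi` are hypothetical.

```python
def classify(x, side):
    """Trace the chain of x to the left; `side` says whether x lives in 'A' or 'B'.

    Returns 'A' for an A-stopper, 'B' for a B-stopper, 'non' for a non-stopper.
    """
    while True:
        if x == 0:
            return "non"          # 0 can be pulled back indefinitely
        if side == "A":
            if x % 2 != 0:        # x not in g(B): the chain stops in A
                return "A"
            side = "B"            # g^{-1}(x) = x, now viewed as an element of B
        else:
            if x % 4 != 0:        # x not in f(A): the chain stops in B
                return "B"
            x //= 4               # f^{-1}(x) = x / 4, an element of A
            side = "A"

def phi(a):
    """The bijection from (A.19): g^{-1} on B-stoppers, f on everything else."""
    return a if classify(a, "A") == "B" else 4 * a

table = [phi(a) for a in range(9)]
values = [phi(a) for a in range(1000)]
print(table)
```

On an initial segment one can also confirm that `phi` is injective and reaches every even number below the bound, as the theorem guarantees. The loop terminates for $x > 0$ because $x$ is divided by 4 on every second step.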
Theorem A.14. Let A, B be nonempty sets. Using the axiom of choice (AC) of Def.
A.3, the following statements are equivalent:

(i) There exists an injective map $f : A \to B$.

(ii) There exists a surjective map $g : B \to A$.

Proof. According to Th. 2.12(b), (i) is equivalent to $f$ having a left inverse $g : B \to A$
(i.e. $g \circ f = \mathrm{Id}_A$), which is equivalent to $g$ having a right inverse, which, according to
Th. 2.12(b), is equivalent to (ii) (AC is used in the proof of Th. 2.12(b)). $\blacksquare$
Corollary A.15. Let A, B be nonempty sets. Using the axiom of choice (AC) of Def.
A.3, we can expand the two equivalent statements of Th. A.11 to the following list of
equivalent statements:

(i) The sets $A$ and $B$ have the same cardinality (i.e. there exists a bijective map
$\phi : A \to B$).
(ii) There exist an injective map $f : A \to B$ and an injective map $g : B \to A$.
(iii) There exist a surjective map $f : A \to B$ and a surjective map $g : B \to A$.
(iv) There exist an injective map $f_1 : A \to B$ and a surjective map $f_2 : A \to B$.
(v) There exist an injective map $g_1 : B \to A$ and a surjective map $g_2 : B \to A$.

Proof. The equivalences are an immediate consequence of combining Th. A.11 with Th.
A.14. 

B Construction of the Real Numbers


In Th. 4.5, we have defined the set of real numbers R as a complete totally ordered field
and we claimed that such a complete totally ordered field does actually exist. In the
following, we will describe how $\mathbb{R}$ can be constructed. We will follow [EHH+95, Chs.
1, 2], which contains several different approaches for the construction of $\mathbb{R}$.

B.1 Natural Numbers


In the first step, one constructs the natural numbers N or N0 , basically as we did in
Rem. 1.27 and Def. B.9. More precisely, one can proceed as follows:
Definition B.1. For each set $A$, define $S(A) := A \cup \{A\}$ (it is no coincidence that
the same symbol $S$ has been used as in the Peano axioms P1-P3 of Sec. 3.1). One
calls the set $S(A)$ the successor of the set $A$. A set $n$ is called a natural number if,
and only if, it can be obtained by applying $S$ to $0 := \emptyset$ (once or multiple times, i.e.
$n = S(\dots(S(0))\dots)$). One now employs the following axiom of axiomatic set theory,
sometimes called the axiom of infinity (as it ensures the existence of infinite sets),
\[ \exists_{X} \Big( 0 \in X \ \wedge\ \forall_{n \in X}\ \big( S(n) \in X \big) \Big), \tag{B.1a} \]
which postulates the existence of a set $X$, containing 0 and all natural numbers. The
axiom does not prevent $X$ from containing additional elements, but we can now proceed
to define
\[ \mathbb{N} := \{ n \in X : n \text{ is a natural number} \}, \qquad \mathbb{N}_0 := \mathbb{N} \cup \{0\}. \tag{B.1b} \]
Notation B.2. Define $0 := \emptyset$, $1 := S(0) = \{0\}$, $2 := S(1) = \{0, 1\}$, $3 := S(2) = \{0, 1, 2\}$, \dots, $n + 1 := S(n) = n \cup \{n\} = \{0, 1, \dots, n\}$.
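The set-theoretic numerals of Notation B.2 can be built literally from nested sets. The sketch below assumes Python's `frozenset` as a stand-in for (finite) sets; the helper names `successor` and `numeral` are illustrative.

```python
def successor(a: frozenset) -> frozenset:
    """S(A) := A ∪ {A}."""
    return a | frozenset([a])


def numeral(n: int) -> frozenset:
    """Apply S to the empty set n times, giving the set that represents n."""
    x = frozenset()          # 0 := empty set
    for _ in range(n):
        x = successor(x)
    return x


three = numeral(3)
print(len(three))                                            # number of elements of 3
print(three == frozenset(numeral(k) for k in range(3)))      # 3 = {0, 1, 2}
```

In this model, the set representing $n$ has exactly $n$ elements, namely the sets representing $0, \dots, n-1$.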

One can now prove that $\mathbb{N}$ (or $\mathbb{N}_0$ if one prefers, where 0 takes over the role of 1) satisfies
the Peano axioms P1-P3 of Sec. 3.1 (see [Kun80, Th. 1.7.16]). Theorem 3.7 allows one to
define addition and multiplication on $\mathbb{N}_0$ via recursion:
Definition B.3. (a) For each $m, n \in \mathbb{N}_0$, $m + n$ is defined recursively by
\[ m + 0 := m, \qquad m + 1 := S(m), \qquad \forall_{n \in \mathbb{N}}\ \ m + S(n) := S(m + n). \tag{B.2} \]
This fits into the framework of Th. 3.7, using $A := \mathbb{N}_0$, $x_1 := S(m)$, and, for each
$n \in \mathbb{N}$, $f_n : A^n \to A$, $f_n(x_1, \dots, x_n) := S(x_n)$ (due to the different initializations,
one obtains a different recursion for each $m \in \mathbb{N}_0$).

(b) For each $m, n \in \mathbb{N}_0$, $mn := m \cdot n$ is defined recursively by
\[ m \cdot 0 := 0, \qquad m \cdot 1 := m, \qquad \forall_{n \in \mathbb{N}}\ \ m \cdot (n + 1) := m \cdot n + m. \tag{B.3} \]
This fits into the framework of Th. 3.7, using $A := \mathbb{N}_0$, $x_1 := m$, and, for each
$m, n \in \mathbb{N}$, $f_{m,n} : A^n \to A$, $f_{m,n}(x_1, \dots, x_n) := x_n + m$.
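The recursions (B.2) and (B.3) translate directly into recursive functions. The following sketch assumes that Python's nonnegative integers model $\mathbb{N}_0$ and that `S` (here simply `n + 1`) stands in for the successor map; only `S` and a predecessor step are used, never the built-in addition or multiplication of two arbitrary numbers.

```python
def S(n: int) -> int:
    """Successor map (assumption: Python ints model N0)."""
    return n + 1


def add(m: int, n: int) -> int:
    # (B.2): m + 0 := m and m + S(n) := S(m + n)
    return m if n == 0 else S(add(m, n - 1))


def mul(m: int, n: int) -> int:
    # (B.3): m * 0 := 0 and m * (n + 1) := m * n + m
    return 0 if n == 0 else add(mul(m, n - 1), m)


print(add(3, 4), mul(3, 4))
```

On small inputs one can then test associativity, commutativity, and distributivity by brute force, which of course does not replace the induction proofs.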
Theorem B.4. The set $\mathbb{N}_0$ of the natural numbers (including 0) with the maps of
addition and multiplication
\[
\begin{aligned}
+ &: \mathbb{N}_0 \times \mathbb{N}_0 \to \mathbb{N}_0, & (x, y) &\mapsto x + y,\\
\cdot &: \mathbb{N}_0 \times \mathbb{N}_0 \to \mathbb{N}_0, & (x, y) &\mapsto x \cdot y,
\end{aligned}
\]
as defined in Def. B.3(a) and Def. B.3(b), respectively, satisfies Def. 4.3(i),(ii),(iv) for
both addition and multiplication, i.e. associativity, commutativity, and the existence of a
neutral element. This can be summarized as the statement that $\mathbb{N}_0$ forms a commutative
semigroup with respect to both addition and multiplication (however, no group, as the
existence of inverse elements is lacking). Moreover, distributivity, i.e. Def. 4.4(iii) is
also satisfied.

Proof. Detailed proofs can be found in [Lan65, Ch. 1, 2] and [Lan65, Ch. 1, 4]. As
examples, let us prove the associativity and commutativity of addition, i.e.
\[ \forall_{k,m,n \in \mathbb{N}_0}\ \ (k + m) + n = k + (m + n), \tag{B.4a} \]
\[ \forall_{m,n \in \mathbb{N}_0}\ \ m + n = n + m. \tag{B.4b} \]
The proof of (B.4a) is carried out by induction on $n$. The base case ($n = 0$) follows
from the first definition in (B.2): $(k + m) + 0 = k + m = k + (m + 0)$ for every $k, m \in \mathbb{N}_0$.
For the induction step, one computes, for every $k, m, n \in \mathbb{N}_0$,
\[
\begin{aligned}
(k + m) + (n + 1) &\overset{\text{(B.2)}}{=} (k + m) + S(n) \overset{\text{(B.2)}}{=} S((k + m) + n) \overset{\text{ind. hyp.}}{=} S(k + (m + n))\\
&\overset{\text{(B.2)}}{=} k + S(m + n) \overset{\text{(B.2)}}{=} k + (m + S(n))\\
&\overset{\text{(B.2)}}{=} k + (m + (n + 1)),
\end{aligned}
\tag{B.5}
\]
completing the induction.


The proof of (B.4b) is also carried out by induction on $n$. More precisely, we prove
$n = 0$ separately, and then carry out the induction for $n \in \mathbb{N}$. The case $n = 0$ is proved
by induction on $m$: The base case ($m = 0$) is the true statement $0 + 0 = 0 = 0 + 0$. For
the induction step, one computes $(m + 1) + 0 = m + 1 = S(m) = S(m + 0) = S(0 + m) = 0 + S(m) = 0 + (m + 1)$. The base case for the induction on $n$, i.e. $n = 1$, is also proved by
induction on $m$: The base case ($m = 0$) is the true statement $0 + 1 = S(0) = 1 = 1 + 0$.
For the induction step, one computes, for every $m \in \mathbb{N}_0$,
\[
\begin{aligned}
(m + 1) + 1 &\overset{\text{(B.2)}}{=} S(m + 1) \overset{\text{ind. hyp.}}{=} S(1 + m) \overset{\text{(B.2)}}{=} (1 + m) + 1\\
&\overset{\text{(B.4a)}}{=} 1 + (m + 1).
\end{aligned}
\tag{B.6a}
\]
Now, for the induction step of the induction on $n$, one computes, for every $(m, n) \in \mathbb{N}_0 \times \mathbb{N}$,
\[
\begin{aligned}
m + (n + 1) &\overset{\text{(B.2)}}{=} m + S(n) \overset{\text{(B.2)}}{=} S(m + n) \overset{\text{ind. hyp.}}{=} S(n + m) \overset{\text{(B.2)}}{=} n + S(m)\\
&\overset{\text{(B.2)}}{=} n + (m + 1) \overset{\text{base case}}{=} n + (1 + m) \overset{\text{(B.4a)}}{=} (n + 1) + m,
\end{aligned}
\tag{B.6b}
\]
completing the induction.

Next, one defines an order on N0 :


Definition B.5. For each $n, m \in \mathbb{N}_0$, let
\[ n \le m \ :\Leftrightarrow\ \exists_{k \in \mathbb{N}_0}\ \ n + k = m. \tag{B.7} \]

Theorem B.6. The relation $\le$ defined in (B.7) constitutes a total order on $\mathbb{N}_0$ that is
compatible with addition and multiplication, i.e. it satisfies Def. 4.4(iv).

Proof. The proofs are carried out in [Lan65, Ch. 1, 3]. 

B.2 Interlude: Orders on Groups


In the succeeding sections, we will construct the set of integers Z, the set of rational
numbers Q, and the set of real numbers R. In each case, we will use the same method
to define a total order on the constructed set, making use of the algebraic structure of
its additive group. It is therefore economical as well as mathematically interesting, to
study this construction once in its abstract form, which is the purpose of the present
section.
Recall the definition of a group from Def. 4.3.
Theorem B.7. Let $(G, +)$ constitute a group (where $G$ plays the role of $A$ in Def. 4.3
and $+$ plays the role of $\circ$ in Def. 4.3), and assume we have a disjoint decomposition
\[ G = P \,\dot\cup\, \{0\} \,\dot\cup\, (-P), \qquad -P := \{ x \in G : -x \in P \}, \tag{B.8} \]
where $-x$ denotes the inverse of $x$ with respect to $+$, then, given that $P$ is closed under
$+$ (i.e. $x, y \in P$ implies $x + y \in P$),
\[ y \le x \ :\Leftrightarrow\ x - y \in P \cup \{0\} \tag{B.9} \]
defines a total order on $G$ that is compatible with addition, i.e. it satisfies (4.6a). Moreover,
if a multiplication is also defined on $G$ and $P \cup \{0\}$ is closed under this multiplication,
then $\le$ is also compatible with multiplication, i.e. it satisfies (4.6b). Of course,
one refers to the elements of $P$ as positive and to the elements of $-P$ as negative.

Proof. For each $x \in G$, one has $x - x = 0 \in P \cup \{0\}$, i.e. $x \le x$ and the relation is reflexive.
If $x, y \in G$, $x \le y$ and $y \le x$, then $x - y \in P \cup \{0\}$ and $-(x - y) = y - x \in P \cup \{0\}$,
and the disjointness of the union in (B.8) implies $x - y = 0$, i.e. $x = y$, showing the
relation is antisymmetric. If $x, y, z \in G$ with $x \le y$ and $y \le z$, then $y - x \in P \cup \{0\}$,
$z - y \in P \cup \{0\}$, and $z - x = z - y + y - x \in P \cup \{0\}$ since $P$ is closed under $+$, showing the
relation is transitive. So we have shown $\le$ constitutes a partial order on $G$. It remains
to show the order is total. However, given the decomposition in (B.8), for each $x, y \in G$,
precisely one of the statements $x - y \in P$ (i.e. $y < x$), $x - y = 0$ (i.e. $x = y$), $x - y \in -P$
(i.e. $x < y$) must be true, proving that the order is total. To see $\le$ satisfies (4.6a), let
$x, y, z \in G$. If $x \le y$, then $y - x \in P \cup \{0\}$, i.e. $y + z - (x + z) = y + z - z - x \in P \cup \{0\}$,
showing $x + z \le y + z$. The proof is completed by noting (4.6b) is precisely the statement
that $P \cup \{0\}$ is closed under multiplication.
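To see the construction of Th. B.7 at work on a group other than the number systems built below, the following sketch uses $G = \mathbb{Z} \times \mathbb{Z}$ with componentwise addition and the (assumed, illustrative) positive cone $P = \{(a, b) : a > 0 \text{ or } (a = 0 \text{ and } b > 0)\}$. The resulting order is the lexicographic one, which happens to coincide with Python's built-in tuple comparison.

```python
from itertools import product


def in_P(x: tuple) -> bool:
    """Membership in the illustrative positive cone P of Z x Z."""
    a, b = x
    return a > 0 or (a == 0 and b > 0)


def add(x: tuple, y: tuple) -> tuple:
    return (x[0] + y[0], x[1] + y[1])


def sub(x: tuple, y: tuple) -> tuple:
    return (x[0] - y[0], x[1] - y[1])


def leq(y: tuple, x: tuple) -> bool:
    """y <= x  iff  x - y lies in P ∪ {0}, as in (B.9)."""
    d = sub(x, y)
    return d == (0, 0) or in_P(d)


sample = list(product(range(-2, 3), repeat=2))
total = all(leq(x, y) or leq(y, x) for x in sample for y in sample)
print(total)
```

The checks below confirm totality, antisymmetry, and compatibility with addition on the finite sample only; they illustrate the theorem and do not prove it.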

B.3 Integers
As compared to our goal, the set of real numbers $\mathbb{R}$, the set $\mathbb{N}_0$ still has three deficiencies,
namely the lack of inverse elements for addition, the lack of inverse elements
for multiplication, and that the order lacks completeness. The construction of the
integers will remedy (only) the first of the three deficiencies by providing the inverse
elements of addition.
Definition and Remark B.8. The relation $\sim$ on $\mathbb{N}_0 \times \mathbb{N}_0$ defined by
\[ (a, b) \sim (c, d) \ :\Leftrightarrow\ a + d = b + c, \tag{B.10} \]
constitutes an equivalence relation on $\mathbb{N}_0 \times \mathbb{N}_0$ (cf. Def. 2.20).
Definition B.9. (a) Define the set of integers $\mathbb{Z}$ as the set of equivalence classes of the
equivalence relation $\sim$ defined in (B.10), i.e.
\[ \mathbb{Z} := (\mathbb{N}_0 \times \mathbb{N}_0)/\!\sim\ = \big\{ [(a, b)] : (a, b) \in \mathbb{N}_0 \times \mathbb{N}_0 \big\} \tag{B.11} \]
is the quotient set of $\mathbb{N}_0 \times \mathbb{N}_0$ with respect to $\sim$ (cf. Ex. 2.21(c)). To simplify
notation, in the following, we will write
\[ [a, b] := [(a, b)] \tag{B.12} \]
for the equivalence class of $(a, b)$ with respect to $\sim$.
(b) Addition on $\mathbb{Z}$ is defined by
\[ + : \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}, \qquad \big( [a, b], [c, d] \big) \mapsto [a, b] + [c, d] := [a + c, b + d]. \tag{B.13} \]
Subtraction on $\mathbb{Z}$ is defined by
\[ - : \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}, \qquad \big( [a, b], [c, d] \big) \mapsto [a, b] - [c, d] := [a, b] + [d, c]. \tag{B.14} \]

For the definitions in Def. B.9(b) to make sense, one needs to check that they do not
depend on the chosen representatives of the equivalence classes. Moreover, one needs to
convince oneself that these definitions yield the desired familiar operations of addition
and subtraction. Let us start by verifying the independence of the representatives in the
following Lem. B.10.
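Lemma B.10. The definitions in Def. B.9(b) do not depend on the chosen representatives,
i.e.
\[
\forall_{a,b,c,d,\tilde a,\tilde b,\tilde c,\tilde d \in \mathbb{N}_0}\ \Big( [a, b] = [\tilde a, \tilde b] \ \wedge\ [c, d] = [\tilde c, \tilde d] \ \Rightarrow\ [a + c, b + d] = [\tilde a + \tilde c, \tilde b + \tilde d] \Big)
\tag{B.15}
\]
and
\[
\forall_{a,b,c,d,\tilde a,\tilde b,\tilde c,\tilde d \in \mathbb{N}_0}\ \Big( [a, b] = [\tilde a, \tilde b] \ \wedge\ [c, d] = [\tilde c, \tilde d] \ \Rightarrow\ [a, b] - [c, d] = [\tilde a, \tilde b] - [\tilde c, \tilde d] \Big).
\tag{B.16}
\]

Proof. (B.15): $[a, b] = [\tilde a, \tilde b]$ means $a + \tilde b = b + \tilde a$, $[c, d] = [\tilde c, \tilde d]$ means $c + \tilde d = d + \tilde c$,
implying $a + c + \tilde b + \tilde d = b + \tilde a + d + \tilde c$, i.e. $[a + c, b + d] = [\tilde a + \tilde c, \tilde b + \tilde d]$.

(B.16) is just (B.14) combined with (B.15).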
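The independence of representatives can be observed on concrete pairs. In the following sketch, an integer is represented by a pair `(a, b)` of nonnegative Python integers, read as "$a - b$"; the function names and the helper `normalize` (which picks the representative with a zero component) are illustrative and not part of the notes.

```python
def equivalent(p: tuple, q: tuple) -> bool:
    """(a, b) ~ (c, d)  iff  a + d = b + c, cf. (B.10)."""
    (a, b), (c, d) = p, q
    return a + d == b + c


def add(p: tuple, q: tuple) -> tuple:
    """[a, b] + [c, d] := [a + c, b + d], cf. (B.13)."""
    return (p[0] + q[0], p[1] + q[1])


def sub(p: tuple, q: tuple) -> tuple:
    """[a, b] - [c, d] := [a, b] + [d, c], cf. (B.14)."""
    return add(p, (q[1], q[0]))


def normalize(p: tuple) -> tuple:
    """Representative of the class of p with at least one zero component."""
    m = min(p)
    return (p[0] - m, p[1] - m)


p, p2 = (2, 5), (4, 7)      # two representatives of -3
q, q2 = (3, 1), (10, 8)     # two representatives of 2
print(normalize(add(p, q)), normalize(add(p2, q2)))
print(normalize(sub(p, q)), normalize(sub(p2, q2)))
```

Both lines print the same representative twice, in line with (B.15) and (B.16).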
Theorem B.11. The set of integers $\mathbb{Z}$ forms a commutative group with respect to addition
as defined in Def. B.9(b), where $[0, 0]$ is the neutral element, $[b, a]$ is the inverse
element of $[a, b]$ for each $a, b \in \mathbb{N}_0$, and, denoting the inverse element of $[a, b]$ by $-[a, b]$
in the usual way, $[a, b] - [c, d] = [a, b] + (-[c, d])$ for each $a, b, c, d \in \mathbb{N}_0$.

Proof. One easily verifies that associativity and commutativity of the addition on $\mathbb{N}_0$
imply the respective laws on $\mathbb{Z}$. For every $a, b \in \mathbb{N}_0$, one obtains $[a, b] + [0, 0] = [a + 0, b + 0] = [a, b]$,
proving neutrality of $[0, 0]$, whereas $[a, b] + [b, a] = [a + b, b + a] = [a + b, a + b] = [0, 0]$
(since $(a + b, a + b) \sim (0, 0)$) shows $[b, a] = -[a, b]$. Now $[a, b] - [c, d] = [a, b] + (-[c, d])$
is immediate from (B.14).
Remark B.12. The map
\[ \iota : \mathbb{N}_0 \to \mathbb{Z}, \qquad \iota(n) := [n, 0], \tag{B.17} \]
is a monomorphism, i.e. it is injective (since $\iota(m) = [m, 0] = \iota(n) = [n, 0]$ implies
$m + 0 = 0 + n$, i.e. $m = n$) and satisfies
\[ \forall_{m,n \in \mathbb{N}_0}\ \ \iota(m + n) = [m + n, 0] = [m, 0] + [n, 0] = \iota(m) + \iota(n). \tag{B.18} \]
It is customary to identify $\mathbb{N}_0$ with $\iota(\mathbb{N}_0)$, as it usually does not cause any confusion.
One then just writes $n$ instead of $[n, 0]$ and $-n$ instead of $[0, n] = -[n, 0]$.
Lemma B.13. We have the disjoint decomposition
\[ \mathbb{Z} = \mathbb{N} \,\dot\cup\, \{0\} \,\dot\cup\, \mathbb{Z}^-, \qquad \mathbb{Z}^- := -\mathbb{N} = \{ n \in \mathbb{Z} : -n \in \mathbb{N} \}. \tag{B.19} \]

Proof. Note that, due to (B.10), an equivalence class remains the same if a natural
number is added or subtracted in both components: $[a, b] = [a + m, b + m]$. Thus, for
each $x = [a, b] \in \mathbb{Z}$, if $a > b$, then $x = [a - b, 0] \in \mathbb{N}$; if $a = b$, then $x = [0, 0] = 0$; if
$a < b$, then $x = [0, b - a] = -[b - a, 0] \in \mathbb{Z}^-$. It just remains to verify that the union in
(B.19) is disjoint. However, if $[n, 0] = [0, m]$ with $m, n \in \mathbb{N}_0$, then $n + m = 0$, proving
$n = m = 0$, completing the proof.
Remark B.14. In the above construction, we obtained the commutative group $(\mathbb{Z}, +)$
from the commutative semigroup $(\mathbb{N}_0, +)$. It is worth pointing out that the same construction
always works when, instead of with $\mathbb{N}_0$, one starts with any commutative
semigroup $(H, +)$ that satisfies the cancellation law $a + c = b + c \Rightarrow a = b$, to obtain a
commutative group $(G, +)$ and a monomorphism $\iota : H \to G$.

To obtain the expected laws of arithmetic, multiplication on $\mathbb{Z}$ needs to be defined such
that $(a - b) \cdot (c - d) = (ac + bd) - (ad + bc)$, which leads to the following definition.

Definition B.15. Multiplication on $\mathbb{Z}$ is defined by
\[ \cdot : \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}, \qquad \big( [a, b], [c, d] \big) \mapsto [a, b] \cdot [c, d] := [ac + bd, ad + bc]. \tag{B.20} \]

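Lemma B.16. The definition in Def. B.15 does not depend on the chosen representatives,
i.e.
\[
\forall_{a,b,c,d,\tilde a,\tilde b,\tilde c,\tilde d \in \mathbb{N}_0}\ \Big( [a, b] = [\tilde a, \tilde b] \ \wedge\ [c, d] = [\tilde c, \tilde d] \ \Rightarrow\ [ac + bd, ad + bc] = [\tilde a \tilde c + \tilde b \tilde d, \tilde a \tilde d + \tilde b \tilde c] \Big).
\tag{B.21}
\]

Proof. As mentioned before, due to (B.10), an equivalence class remains the same if a
natural number is added or subtracted in both components. Thus, using $a + \tilde b = b + \tilde a$ and $c + \tilde d = d + \tilde c$, one computes
\[
\begin{aligned}
[ac + bd, ad + bc] &\overset{\text{(B.10)}}{=} [ac + bd + \tilde b c,\ ad + bc + \tilde b c] = [(a + \tilde b)c + bd,\ ad + bc + \tilde b c]\\
&= [(b + \tilde a)c + bd,\ ad + bc + \tilde b c] \overset{\text{(B.10)}}{=} [\tilde a \tilde d + \tilde a c + bd,\ \tilde a \tilde d + ad + \tilde b c]\\
&= [\tilde a(\tilde d + c) + bd,\ \tilde a \tilde d + ad + \tilde b c] = [\tilde a(d + \tilde c) + bd,\ \tilde a \tilde d + ad + \tilde b c]\\
&= [\tilde a \tilde c + (\tilde a + b)d,\ \tilde a \tilde d + ad + \tilde b c] = [\tilde a \tilde c + (a + \tilde b)d,\ \tilde a \tilde d + ad + \tilde b c]\\
&\overset{\text{(B.10)}}{=} [\tilde a \tilde c + \tilde b d + \tilde b \tilde c,\ \tilde a \tilde d + \tilde b c + \tilde b \tilde c] = [\tilde a \tilde c + \tilde b(d + \tilde c),\ \tilde a \tilde d + \tilde b c + \tilde b \tilde c]\\
&= [\tilde a \tilde c + \tilde b(\tilde d + c),\ \tilde a \tilde d + \tilde b c + \tilde b \tilde c] \overset{\text{(B.10)}}{=} [\tilde a \tilde c + \tilde b \tilde d,\ \tilde a \tilde d + \tilde b \tilde c],
\end{aligned}
\tag{B.22}
\]
completing the proof.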
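The statement (B.21) can also be confirmed by brute force on a small range of representatives. The sketch below again represents an integer by a pair of nonnegative Python integers; the names `equivalent`, `mul`, and `well_defined` are illustrative, and the exhaustive check over pairs with entries in $\{0, \dots, 4\}$ is only a sanity test, not a proof.

```python
from itertools import product


def equivalent(p: tuple, q: tuple) -> bool:
    """(a, b) ~ (c, d)  iff  a + d = b + c, cf. (B.10)."""
    return p[0] + q[1] == p[1] + q[0]


def mul(p: tuple, q: tuple) -> tuple:
    """[a, b] * [c, d] := [ac + bd, ad + bc], cf. (B.20)."""
    (a, b), (c, d) = p, q
    return (a * c + b * d, a * d + b * c)


pairs = list(product(range(5), repeat=2))
# Equivalent inputs must always produce equivalent products.
well_defined = all(
    equivalent(mul(p, q), mul(p2, q2))
    for p, p2, q, q2 in product(pairs, repeat=4)
    if equivalent(p, p2) and equivalent(q, q2)
)
print(well_defined)
```

For example, `mul((2, 5), (3, 1))` yields a representative of $(-3) \cdot 2 = -6$.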


Theorem B.17. The set of integers $\mathbb{Z}$ is associative and commutative with respect to
the multiplication defined in Def. B.15. Moreover, distributivity, i.e. Def. 4.4(iii) is
satisfied, $[1, 0]$ is the neutral element of multiplication, and there are no zero divisors,
i.e.
\[
\forall_{a,b,c,d \in \mathbb{N}_0}\ \Big( [a, b] \cdot [c, d] = [ac + bd, ad + bc] = [0, 0] \ \Rightarrow\ [a, b] = [0, 0] \ \vee\ [c, d] = [0, 0] \Big).
\tag{B.23}
\]
Algebraically, the theorem can be summarized by saying that $(\mathbb{Z}, +, \cdot)$ constitutes a principal
ideal domain.

Proof. Associativity and commutativity of multiplication as well as distributivity are
easily verified, while $[a, b] \cdot [1, 0] = [a \cdot 1 + b \cdot 0, a \cdot 0 + b \cdot 1] = [a, b]$ proves neutrality of
$[1, 0]$. It remains to prove (B.23). Note that, due to (B.10), the conclusion is equivalent
to $a = b$ or $c = d$. We assume $0 \le a < b$ and have to prove $c = d$. According to Def.
B.5, $a < b$ means $b = a + k$ for some $k \in \mathbb{N}$. Thus, $[ac + bd, ad + bc] = [0, 0]$ implies
\[ ac + (a + k)d = ac + bd = ad + bc = ad + (a + k)c \ \Rightarrow\ kd = kc \ \overset{k > 0}{\Rightarrow}\ c = d, \tag{B.24} \]
establishing the case.


Definition B.18. For each $k, l \in \mathbb{Z}$, let
\[ l \le k \ :\Leftrightarrow\ k - l \in \mathbb{N}_0. \tag{B.25} \]

Theorem B.19. (a) The relation $\le$ defined in (B.25) constitutes a total order on $\mathbb{Z}$ that
is compatible with addition and multiplication, i.e. it satisfies Def. 4.4(iv).

(b) The map $\iota$ from (B.17) is strictly increasing.

Proof. (a) follows from (B.25), (B.19), and Th. B.7 since $\mathbb{N}_0$ is closed under addition
and multiplication.
(b): According to Def. B.5, if $m, n \in \mathbb{N}$ with $n < m$, then $m = n + k$ for some $k \in \mathbb{N}$.
In consequence $\iota(m) = \iota(n) + \iota(k)$ by (B.18), i.e. $\iota(m) - \iota(n) = \iota(k) \in \mathbb{N}$, proving
$\iota(n) < \iota(m)$.

B.4 Rational Numbers


The remaining two deficiencies of the set of integers Z (as compared with R) are the
lack of inverse elements for multiplication and that the order lacks completeness.
We proceed to the construction of the rational numbers, which will provide the inverse
elements for multiplication. The completion of the order will then be achieved in the
last step in the next section.

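Definition and Remark B.20. The relation $\sim$ on $\mathbb{Z} \times (\mathbb{Z} \setminus \{0\})$ defined by
\[ (a, b) \sim (c, d) \ :\Leftrightarrow\ ad = bc, \tag{B.26} \]
constitutes an equivalence relation on $\mathbb{Z} \times (\mathbb{Z} \setminus \{0\})$ (cf. Def. 2.20).

Definition B.21. (a) Define the set of rational numbers $\mathbb{Q}$ as the set of equivalence
classes of the equivalence relation $\sim$ defined in (B.26), i.e.
\[ \mathbb{Q} := \big( \mathbb{Z} \times (\mathbb{Z} \setminus \{0\}) \big)/\!\sim\ = \big\{ [(a, b)] : (a, b) \in \mathbb{Z} \times (\mathbb{Z} \setminus \{0\}) \big\} \tag{B.27} \]
is the quotient set of $\mathbb{Z} \times (\mathbb{Z} \setminus \{0\})$ with respect to $\sim$ (cf. Ex. 2.21(c)). As is common,
we will write
\[ \frac{a}{b} := a/b := [(a, b)] \tag{B.28} \]
for the equivalence class of $(a, b)$ with respect to $\sim$.

(b) Addition on $\mathbb{Q}$ is defined by
\[ + : \mathbb{Q} \times \mathbb{Q} \to \mathbb{Q}, \qquad \Big( \frac{a}{b}, \frac{c}{d} \Big) \mapsto \frac{a}{b} + \frac{c}{d} := \frac{ad + bc}{bd}. \tag{B.29} \]
Multiplication on $\mathbb{Q}$ is defined by
\[ \cdot : \mathbb{Q} \times \mathbb{Q} \to \mathbb{Q}, \qquad \Big( \frac{a}{b}, \frac{c}{d} \Big) \mapsto \frac{a}{b} \cdot \frac{c}{d} := \frac{ac}{bd}. \tag{B.30} \]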

For the definitions in Def. B.21(b) to make sense, one needs to check that they do not
depend on the chosen representatives of the equivalence classes, and that the results of
both addition and multiplication are always elements of Q. All this is provided by the
following lemma.
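Lemma B.22. The definitions in Def. B.21(b) do not depend on the chosen representatives,
i.e.
\[
\forall_{\substack{a,c,\tilde a,\tilde c \in \mathbb{Z},\\ b,d,\tilde b,\tilde d \in \mathbb{Z} \setminus \{0\}}}\ \Big( \frac{a}{b} = \frac{\tilde a}{\tilde b} \ \wedge\ \frac{c}{d} = \frac{\tilde c}{\tilde d} \ \Rightarrow\ \frac{ad + bc}{bd} = \frac{\tilde a \tilde d + \tilde b \tilde c}{\tilde b \tilde d} \Big)
\tag{B.31}
\]
and
\[
\forall_{\substack{a,c,\tilde a,\tilde c \in \mathbb{Z},\\ b,d,\tilde b,\tilde d \in \mathbb{Z} \setminus \{0\}}}\ \Big( \frac{a}{b} = \frac{\tilde a}{\tilde b} \ \wedge\ \frac{c}{d} = \frac{\tilde c}{\tilde d} \ \Rightarrow\ \frac{ac}{bd} = \frac{\tilde a \tilde c}{\tilde b \tilde d} \Big).
\tag{B.32}
\]
Furthermore, the results of both addition and multiplication are always elements of $\mathbb{Q}$.

Proof. (B.31): $a/b = \tilde a/\tilde b$ means $a \tilde b = \tilde a b$, $c/d = \tilde c/\tilde d$ means $c \tilde d = \tilde c d$, implying
\[ (ad + bc)\,\tilde b \tilde d = bd\,(\tilde a \tilde d + \tilde b \tilde c), \quad \text{i.e.} \quad \frac{ad + bc}{bd} = \frac{\tilde a \tilde d + \tilde b \tilde c}{\tilde b \tilde d}, \tag{B.33} \]
and
\[ ac\,\tilde b \tilde d = bd\,\tilde a \tilde c, \quad \text{i.e.} \quad \frac{ac}{bd} = \frac{\tilde a \tilde c}{\tilde b \tilde d}. \tag{B.34} \]
That the results of both addition and multiplication are always elements of $\mathbb{Q}$ follows
from (B.23), i.e. from the fact that $\mathbb{Z}$ has no zero divisors. In particular, if $b, d \ne 0$, then
$bd \ne 0$, showing $(ad + bc)/(bd) \in \mathbb{Q}$ and $(ac)/(bd) \in \mathbb{Q}$.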
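As for the integers, the independence of representatives can be observed on concrete fractions. The sketch below represents a rational number by a pair `(a, b)` of Python integers with `b != 0`, without reducing fractions; the function names are illustrative.

```python
def equivalent(p: tuple, q: tuple) -> bool:
    """(a, b) ~ (c, d)  iff  ad = bc, cf. (B.26)."""
    (a, b), (c, d) = p, q
    return a * d == b * c


def add(p: tuple, q: tuple) -> tuple:
    """a/b + c/d := (ad + bc)/(bd), cf. (B.29)."""
    (a, b), (c, d) = p, q
    return (a * d + b * c, b * d)


def mul(p: tuple, q: tuple) -> tuple:
    """a/b * c/d := (ac)/(bd), cf. (B.30)."""
    (a, b), (c, d) = p, q
    return (a * c, b * d)


p, p2 = (1, 2), (-3, -6)    # two representatives of 1/2
q, q2 = (2, 3), (4, 6)      # two representatives of 2/3
print(add(p, q), add(p2, q2))
print(mul(p, q), mul(p2, q2))
```

The printed pairs differ as pairs, but they are equivalent in the sense of (B.26), i.e. they represent the same rational numbers $7/6$ and $1/3$.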
Theorem B.23. (a) The set of rational numbers $\mathbb{Q}$ with addition and multiplication as
defined in Def. B.21 forms a field, where $0/1$ and $1/1$ are the neutral elements with
respect to addition and multiplication, respectively, $(-a)/b$ is the additive inverse
to $a/b$, whereas $b/a$ is the multiplicative inverse to $a/b$ with $a \ne 0$.
(b) Defining subtraction and division in the usual way, for each $r, s \in \mathbb{Q}$, by $s - r := s + (-r)$
and $s/r := s r^{-1}$, respectively, with $-r$ denoting the additive inverse of $r$
and $r^{-1}$ denoting the multiplicative inverse of $r \ne 0$, all the rules stated in Th. 4.6
are valid in $\mathbb{Q}$.
(c) The map
\[ \iota : \mathbb{Z} \to \mathbb{Q}, \qquad \iota(k) := \frac{k}{1}, \tag{B.35} \]
is a monomorphism, i.e. it is injective and satisfies
\[ \forall_{k,l \in \mathbb{Z}}\ \ \iota(k + l) = \iota(k) + \iota(l), \tag{B.36a} \]
\[ \forall_{k,l \in \mathbb{Z}}\ \ \iota(kl) = \iota(k) \cdot \iota(l). \tag{B.36b} \]
It is customary to identify $\mathbb{Z}$ with $\iota(\mathbb{Z})$, as it usually does not cause any confusion.
One then just writes $k$ instead of $\frac{k}{1}$.

Proof. A detailed proof of (a) is provided in [Lan65, Ch. 2, 34]. Let us check the
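claims regarding neutral and inverse elements:
\[ \frac{a}{b} + \frac{0}{1} = \frac{a \cdot 1 + b \cdot 0}{b \cdot 1} = \frac{a}{b}, \tag{B.37a} \]
\[ \frac{a}{b} + \frac{-a}{b} = \frac{ab + b(-a)}{b^2} \overset{\text{Def. 4.4(iii) for } \mathbb{Z}}{=} \frac{(a - a)b}{b^2} = \frac{0}{b^2} \overset{\text{(B.26)}}{=} \frac{0}{1}, \tag{B.37b} \]
\[ \frac{a}{b} \cdot \frac{1}{1} = \frac{a \cdot 1}{b \cdot 1} = \frac{a}{b}, \tag{B.37c} \]
\[ \frac{a}{b} \cdot \frac{b}{a} = \frac{ab}{ba} \overset{\text{(B.26)}}{=} \frac{1}{1}. \tag{B.37d} \]

(b) is a consequence of (a), since Th. 4.6 and its proof are valid in every field.
(c): The map $\iota$ is injective, as $\iota(k) = k/1 = \iota(l) = l/1$ implies $k \cdot 1 = l \cdot 1$, i.e. $k = l$.
Moreover,
\[ \iota(k) + \iota(l) = \frac{k}{1} + \frac{l}{1} = \frac{k \cdot 1 + 1 \cdot l}{1 \cdot 1} = \frac{k + l}{1} = \iota(k + l), \tag{B.38a} \]
\[ \iota(k) \cdot \iota(l) = \frac{k}{1} \cdot \frac{l}{1} = \frac{kl}{1} = \iota(kl), \tag{B.38b} \]
completing the proof.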

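Definition and Remark B.24. Define
\[ \mathbb{Q}^+ := \Big\{ r \in \mathbb{Q} : \exists_{a,b \in \mathbb{N}}\ \ r = \frac{a}{b} \Big\}. \tag{B.39} \]
We then have the decomposition
\[ \mathbb{Q} = \mathbb{Q}^+ \,\dot\cup\, \{0\} \,\dot\cup\, \mathbb{Q}^-, \qquad \mathbb{Q}^- := -\mathbb{Q}^+ = \{ r \in \mathbb{Q} : -r \in \mathbb{Q}^+ \}, \tag{B.40} \]
since
\[ a/b \in \mathbb{Q}^+ \ \Leftrightarrow\ (a > 0 \wedge b > 0) \vee (a < 0 \wedge b < 0), \tag{B.41a} \]
\[ a/b = 0 \ \Leftrightarrow\ a = 0, \tag{B.41b} \]
\[ a/b \in \mathbb{Q}^- \ \Leftrightarrow\ (a > 0 \wedge b < 0) \vee (a < 0 \wedge b > 0). \tag{B.41c} \]

Definition B.25. For each $r, s \in \mathbb{Q}$, let
\[ s \le r \ :\Leftrightarrow\ r - s \in \mathbb{Q}^+_0 := \mathbb{Q}^+ \cup \{0\}. \tag{B.42} \]

Theorem B.26. (a) The relation $\le$ defined in (B.42) constitutes a total order on $\mathbb{Q}$ that
is compatible with addition and multiplication, i.e. it satisfies Def. 4.4(iv); in other
words $(\mathbb{Q}, +, \cdot, \le)$ constitutes a totally ordered field.

(b) All the rules stated in Th. 4.7 are valid in $\mathbb{Q}$.

(c) The map $\iota$ from (B.35) is strictly increasing.

Proof. (a) follows from (B.42), (B.40), and Th. B.7, since it is immediate from (B.29)
and (B.30) that $\mathbb{Q}^+$ is closed under addition and multiplication.
(b) is a consequence of (a), since Th. 4.7 and its proof are valid in every totally ordered
field.
(c): According to Def. B.18, if $k, l \in \mathbb{Z}$ with $l < k$, then $n := k - l \in \mathbb{N}$. In consequence
$\iota(k) = \iota(l) + \iota(n)$ by (B.36a), i.e. $\iota(k) - \iota(l) = \iota(n) = n/1 \in \mathbb{Q}^+$, proving $\iota(l) < \iota(k)$.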

B.5 Real Numbers


In the previous section, the construction of the rational numbers $\mathbb{Q}$ yielded a totally
ordered field. However, the order on $\mathbb{Q}$ is not complete: for example, Rem. and Def.
7.62 shows that the set $M := \{ r \in \mathbb{Q} : r^2 < 2 \}$, which is bounded from above (for
example by 2), has no supremum in $\mathbb{Q}$ (otherwise, we would have a rational number $q = \sup M$
with $q^2 = 2$). Finally, in the present section, we will start out from $\mathbb{Q}$ to construct
the set of real numbers $\mathbb{R}$ such that it becomes a complete totally ordered field. There
are several different important constructions to obtain $\mathbb{R}$ from $\mathbb{Q}$. We will describe
the construction that defines real numbers as equivalence classes of rational Cauchy
sequences (see [EHH+ 95, Ch. 2.3]). The construction using so-called Dedekind cuts
can be found in [EHH+ 95, Ch. 2.2], the construction via nested intervals in [EHH+ 95,
Ch. 2.4].

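Definition B.27. (a) Let $S$ denote the set of all Cauchy sequences in $\mathbb{Q}$, where we call
a sequence $(r_n)_{n \in \mathbb{N}}$ in $\mathbb{Q}$ a Cauchy sequence if, and only if,
\[ \forall_{\epsilon \in \mathbb{Q}^+}\ \exists_{N \in \mathbb{N}}\ \forall_{n,m > N}\ \ |r_n - r_m| < \epsilon, \tag{B.43} \]
which differs from (7.25) in that $\epsilon$ has to be from $\mathbb{Q}^+$ rather than from $\mathbb{R}^+$.

(b) Addition on $S$ is defined by
\[ + : S \times S \to S, \qquad \big( (r_n)_{n \in \mathbb{N}}, (s_n)_{n \in \mathbb{N}} \big) \mapsto (r_n)_{n \in \mathbb{N}} + (s_n)_{n \in \mathbb{N}} := (r_n + s_n)_{n \in \mathbb{N}}. \tag{B.44} \]
Multiplication on $S$ is defined by
\[ \cdot : S \times S \to S, \qquad \big( (r_n)_{n \in \mathbb{N}}, (s_n)_{n \in \mathbb{N}} \big) \mapsto (r_n)_{n \in \mathbb{N}} \cdot (s_n)_{n \in \mathbb{N}} := (r_n s_n)_{n \in \mathbb{N}}. \tag{B.45} \]

As a consequence of the following Lem. B.28, addition and multiplication are well-defined
on $S$.

Lemma B.28. If $(r_n)_{n \in \mathbb{N}}$ and $(s_n)_{n \in \mathbb{N}}$ are Cauchy sequences in $\mathbb{Q}$, so are $(r_n + s_n)_{n \in \mathbb{N}}$
and $(r_n s_n)_{n \in \mathbb{N}}$.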

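Proof. The proofs are analogous to the proofs of Th. 7.13(7.11b),(7.11c):

Given $\epsilon \in \mathbb{Q}^+$, there exists $N \in \mathbb{N}$ such that, for each $n, m > N$, $|r_n - r_m| < \epsilon/2$ and
$|s_n - s_m| < \epsilon/2$, implying
\[ \forall_{n,m > N}\ \ |r_n + s_n - (r_m + s_m)| \le |r_n - r_m| + |s_n - s_m| < \epsilon/2 + \epsilon/2 = \epsilon, \tag{B.46} \]
proving $(r_n + s_n)_{n \in \mathbb{N}}$ is Cauchy.

The proof of Th. 7.29 shows both $(r_n)_{n \in \mathbb{N}}$ and $(s_n)_{n \in \mathbb{N}}$ are bounded, i.e. there exists $M \in \mathbb{Q}^+$
that is an upper bound for the sets $\{|r_n| : n \in \mathbb{N}\}$ and $\{|s_n| : n \in \mathbb{N}\}$. Moreover,
given $\epsilon \in \mathbb{Q}^+$, there exists $N \in \mathbb{N}$ such that, for each $n, m > N$, $|r_n - r_m| < \epsilon/(2M)$ and
$|s_n - s_m| < \epsilon/(2M)$, implying
\[
\forall_{n,m > N}\ \ |r_n s_n - r_m s_m| = \big| (r_n - r_m) s_n + r_m (s_n - s_m) \big| \le |s_n|\,|r_n - r_m| + |r_m|\,|s_n - s_m| < \frac{M \epsilon}{2M} + \frac{M \epsilon}{2M} = \epsilon,
\tag{B.47}
\]
completing the proof of the lemma.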
Theorem B.29. $(S, +)$ is a group and, in addition, $S$ is associative and commutative
with respect to multiplication. Moreover, distributivity also holds in $S$. In algebraic
terms, this can be summarized as the statement that $(S, +, \cdot)$ constitutes a commutative
ring.

Proof. Note that, since the rational sequence $(r_n)_{n \in \mathbb{N}}$ is nothing but the function $f : \mathbb{N} \to \mathbb{Q}$,
$f(n) = r_n$, addition and multiplication as defined in Def. B.27(b) is analogous
to the definition of addition and multiplication of real-valued functions in (6.1a), (6.1c),
respectively. It is an easy exercise to verify that these function operations always inherit
associativity, commutativity, and distributivity if these rules hold for the operations
defined on the function range (i.e. for $+$ and $\cdot$ on $\mathbb{Q}$ in our present situation of rational
sequences). The constant sequence $(0, 0, \dots)$ is the neutral element of addition on $S$
and $-(r_n)_{n \in \mathbb{N}} = (-r_n)_{n \in \mathbb{N}}$ is the additive inverse of $(r_n)_{n \in \mathbb{N}}$.

The reason that we need another step in our construction of $\mathbb{R}$ is the fact that $S$ is not
a field: As soon as 0 occurs, even just once, in the sequence $(r_n)_{n \in \mathbb{N}} \in S$, the sequence
does not have a multiplicative inverse (where the neutral element of multiplication is
obviously the constant sequence $(1, 1, \dots)$). The solution to this problem consists of
factoring out all sequences converging to 0.
Definition and Remark B.30. Let
\[ \mathcal{N} := \Big\{ (r_n)_{n \in \mathbb{N}} \in S : \lim_{n \to \infty} r_n = 0 \Big\} \tag{B.48} \]
be the set of rational sequences converging to zero. The relation $\sim$ on $S$ defined by
\[ (r_n)_{n \in \mathbb{N}} \sim (s_n)_{n \in \mathbb{N}} \ :\Leftrightarrow\ (r_n)_{n \in \mathbb{N}} - (s_n)_{n \in \mathbb{N}} \in \mathcal{N}, \tag{B.49} \]
constitutes an equivalence relation on $S$ (cf. Def. 2.20).

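Definition B.31. (a) Define the set of real numbers $\mathbb{R}$ as the set of equivalence classes
of the equivalence relation $\sim$ defined in (B.49), i.e.
\[ \mathbb{R} := S/\!\sim\ = \big\{ [(r_n)_{n \in \mathbb{N}}] : (r_n)_{n \in \mathbb{N}} \in S \big\} \tag{B.50} \]
is the quotient set of $S$ with respect to $\sim$ (cf. Ex. 2.21(c)).
(b) Addition on $\mathbb{R}$ is defined by
\[ + : \mathbb{R} \times \mathbb{R} \to \mathbb{R}, \qquad \big( [f], [g] \big) \mapsto [f] + [g] := [f + g]. \tag{B.51} \]
Multiplication on $\mathbb{R}$ is defined by
\[ \cdot : \mathbb{R} \times \mathbb{R} \to \mathbb{R}, \qquad \big( [f], [g] \big) \mapsto [f] \cdot [g] := [f g]. \tag{B.52} \]

Once again, for the definitions in Def. B.31(b) to make sense, one needs to check that
they do not depend on the chosen representatives of the equivalence classes, and once
again, we provide a lemma providing this check:
Lemma B.32. The definitions in Def. B.31(b) do not depend on the chosen representatives,
i.e.
\[ \forall_{f,g,\tilde f,\tilde g \in S}\ \Big( f - \tilde f \in \mathcal{N} \ \wedge\ g - \tilde g \in \mathcal{N} \ \Rightarrow\ f + g - (\tilde f + \tilde g) \in \mathcal{N} \Big) \tag{B.53} \]
and
\[ \forall_{f,g,\tilde f,\tilde g \in S}\ \Big( f - \tilde f \in \mathcal{N} \ \wedge\ g - \tilde g \in \mathcal{N} \ \Rightarrow\ f g - \tilde f \tilde g \in \mathcal{N} \Big). \tag{B.54} \]

Proof. Let $f = (r_n)_{n \in \mathbb{N}}$, $g = (s_n)_{n \in \mathbb{N}}$, $\tilde f = (\tilde r_n)_{n \in \mathbb{N}}$, $\tilde g = (\tilde s_n)_{n \in \mathbb{N}}$ be elements of $S$ such
that $f - \tilde f \in \mathcal{N}$ and $g - \tilde g \in \mathcal{N}$, i.e. $\lim_{n \to \infty} (r_n - \tilde r_n) = \lim_{n \to \infty} (s_n - \tilde s_n) = 0$.
Then (7.11b) implies $0 = \lim_{n \to \infty} \big( r_n + s_n - (\tilde r_n + \tilde s_n) \big)$, proving (B.53).
To prove (B.54), one computes
\[ \lim_{n \to \infty} \big( r_n s_n - \tilde r_n \tilde s_n \big) = \lim_{n \to \infty} \big( r_n (s_n - \tilde s_n) + \tilde s_n (r_n - \tilde r_n) \big) = 0, \tag{B.55} \]
where the last equality follows from the boundedness of $(r_n)_{n \in \mathbb{N}}$ and $(\tilde s_n)_{n \in \mathbb{N}}$ together
with Prop. 7.11(b).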
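The termwise operations on rational Cauchy sequences can be tried out with exact rational arithmetic. In the following sketch, a sequence is modelled as a Python function from indices to `fractions.Fraction`; the Newton iteration `sqrt2_seq` is an illustrative choice of a rational Cauchy sequence whose squares approach 2 and is not taken from the notes.

```python
from fractions import Fraction


def sqrt2_seq(n: int) -> Fraction:
    """n-th term (n >= 1) of the rational Newton iteration r_1 = 1, r_{k+1} = (r_k + 2/r_k)/2."""
    r = Fraction(1)
    for _ in range(n - 1):
        r = (r + 2 / r) / 2
    return r


def add(f, g):
    """Termwise sum of two sequences, cf. (B.44)."""
    return lambda n: f(n) + g(n)


def mul(f, g):
    """Termwise product of two sequences, cf. (B.45)."""
    return lambda n: f(n) * g(n)


def const(q):
    """Constant sequence (q, q, ...)."""
    return lambda n: Fraction(q)


square = mul(sqrt2_seq, sqrt2_seq)
gap = abs(square(7) - const(2)(7))
print(float(gap))
```

The difference between `square` and the constant sequence 2 becomes very small but never vanishes for any index, so the two sequences are different elements of $S$ that differ by a sequence tending to 0, which is the situation the equivalence relation (B.49) is designed for.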

We will also use the following auxiliary result:


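Proposition B.33. If $(r_n)_{n \in \mathbb{N}} \in S$, then precisely one of the following statements is
correct:
\[ (r_n)_{n \in \mathbb{N}} \in \mathcal{N}, \tag{B.56a} \]
\[ \exists_{\epsilon \in \mathbb{Q}^+}\ \ \#\{ n \in \mathbb{N} : r_n \le \epsilon \} \in \mathbb{N}_0, \tag{B.56b} \]
\[ \exists_{\epsilon \in \mathbb{Q}^+}\ \ \#\{ n \in \mathbb{N} : r_n \ge -\epsilon \} \in \mathbb{N}_0. \tag{B.56c} \]

Proof. Let us first verify that the three statements in (B.56) are mutually exclusive. If
(B.56a) holds, then, for every $\epsilon \in \mathbb{Q}^+$, $-\epsilon < r_n < \epsilon$ holds for almost all (in particular,
for infinitely many) $n \in \mathbb{N}$, i.e. (B.56b) and (B.56c) are both false. If (B.56b) holds,
then (B.56a) must be false as we have just seen. Moreover, if $r_n \le \epsilon$ holds for at most
finitely many $n \in \mathbb{N}$, then $r_n > \epsilon > 0$ must hold for infinitely many $n \in \mathbb{N}$, i.e. (B.56c)
is false.
Now suppose (B.56a) and (B.56b) are false. We have to show that (B.56c) is true. Since
(B.56a) is false, there exists $\epsilon > 0$ and an increasing sequence of indices $(n_k)_{k \in \mathbb{N}}$ with
$|r_{n_k}| > \epsilon$ for each $k \in \mathbb{N}$. Since (B.56b) is false, there is an increasing sequence of indices
$(m_k)_{k \in \mathbb{N}}$ with $r_{m_k} < 1/k$. Thus, since $(r_n)_{n \in \mathbb{N}}$ is a Cauchy sequence, only finitely many
$r_{n_k} > \epsilon$ and infinitely many $r_{n_k} < -\epsilon$. Now, if $N \in \mathbb{N}$ is such that $|r_n - r_m| < \epsilon/2$ for
all $n, m > N$ and $k_0 \in \mathbb{N}$ such that $n_{k_0} > N$ and $r_{n_{k_0}} < -\epsilon$, then $r_n < -\epsilon/2$ for each $n > N$ (since
$|r_n - r_{n_{k_0}}| < \epsilon/2$). Thus, (B.56c) holds with $\epsilon/2$ in place of $\epsilon$.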
Theorem B.34. (a) The set of real numbers $\mathbb{R}$ with addition and multiplication as
defined in Def. B.31 forms a field, where $[(0, 0, \dots)]$ and $[(1, 1, \dots)]$ are the neutral
elements with respect to addition and multiplication, respectively.
(b) The map
\[ \iota : \mathbb{Q} \to \mathbb{R}, \qquad \iota(r) := \big[ (r, r, \dots) \big], \tag{B.57} \]
is a monomorphism, i.e. it is injective and satisfies
\[ \forall_{r,s \in \mathbb{Q}}\ \ \iota(r + s) = \iota(r) + \iota(s), \tag{B.58a} \]
\[ \forall_{r,s \in \mathbb{Q}}\ \ \iota(rs) = \iota(r) \cdot \iota(s). \tag{B.58b} \]
It is customary to identify $\mathbb{Q}$ with $\iota(\mathbb{Q})$, as it usually does not cause any confusion.
One then just writes $r$ instead of $\big[ (r, r, \dots) \big]$.

Proof. (a): Clearly, Def. B.31(b) ensures the laws of associativity and commutativity
of addition and multiplication valid in $S$ are preserved in $\mathbb{R}$, and, likewise, the law of
distributivity. It is also immediate from (B.51) and (B.52), respectively, that $[(0, 0, \dots)]$
and $[(1, 1, \dots)]$ are the respective neutral elements of addition and multiplication. Moreover,
if $-f$ is the additive inverse of $f \in S$, then $[-f]$ is the additive inverse of $[f] \in \mathbb{R}$.
It remains to show that each $x = [(r_n)_{n \in \mathbb{N}}] \ne [(0, 0, \dots)]$ has a multiplicative inverse $x^{-1}$
in $\mathbb{R}$. We claim $x^{-1} = [(s_n)_{n \in \mathbb{N}}]$, where
\[
\forall_{n \in \mathbb{N}}\ \ s_n :=
\begin{cases}
r_n^{-1} & \text{for } r_n \ne 0,\\
1 & \text{for } r_n = 0.
\end{cases}
\tag{B.59}
\]
We need to verify $[(s_n)_{n \in \mathbb{N}}] \in \mathbb{R}$, i.e. $(s_n)_{n \in \mathbb{N}}$ is a Cauchy sequence. We know $(r_n)_{n \in \mathbb{N}}$ is
a Cauchy sequence that does not converge to 0. Thus, according to Prop. B.33, there
exists $\delta > 0$ and $M \in \mathbb{N}$ such that, for each $n > M$, we have $|r_n| > \delta$ (in particular,
$r_n \ne 0$). Let $\epsilon > 0$. As $(r_n)_{n \in \mathbb{N}}$ is a Cauchy sequence, there exists $N \in \mathbb{N}$ such that
$N \ge M$ and, for each $n, m > N$, $|r_n - r_m| < \epsilon \delta^2$. Thus,
\[ \forall_{n,m > N}\ \ |s_n - s_m| = \Big| \frac{1}{r_n} - \frac{1}{r_m} \Big| = \Big| \frac{r_m - r_n}{r_n r_m} \Big| < \frac{\epsilon \delta^2}{\delta^2} = \epsilon, \tag{B.60} \]
proving $(s_n)_{n \in \mathbb{N}}$ is a Cauchy sequence. Moreover,
\[ \big[ (r_n)_{n \in \mathbb{N}} \big] \cdot \big[ (s_n)_{n \in \mathbb{N}} \big] = \big[ (r_n s_n)_{n \in \mathbb{N}} \big] = \big[ (1, 1, \dots) \big], \tag{B.61} \]
since $r_n s_n = 1$ for almost all $n \in \mathbb{N}$, and the proof of (a) is complete.
(b): The map $\iota$ is injective, since $\iota(r) = [(r, r, \dots)] = \iota(s) = [(s, s, \dots)]$ implies
$\lim_{n \to \infty} (r - s) = 0$, i.e. $r = s$. Moreover,
\[ \iota(r) + \iota(s) = \big[ (r, r, \dots) \big] + \big[ (s, s, \dots) \big] = \big[ (r + s, r + s, \dots) \big] = \iota(r + s), \tag{B.62a} \]
\[ \iota(r) \cdot \iota(s) = \big[ (r, r, \dots) \big] \cdot \big[ (s, s, \dots) \big] = \big[ (rs, rs, \dots) \big] = \iota(rs), \tag{B.62b} \]
completing the proof.
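Definition B.35. We define $\mathbb{R}^+$ to consist of all real numbers represented by sequences
$(r_n)_{n \in \mathbb{N}}$ such that there exists $\epsilon \in \mathbb{Q}^+$ satisfying $r_n > \epsilon$ for almost all $n \in \mathbb{N}$, i.e.
\[ \mathbb{R}^+ := \Big\{ \big[ (r_n)_{n \in \mathbb{N}} \big] \in \mathbb{R} : \exists_{\epsilon \in \mathbb{Q}^+}\ \ \#\{ n \in \mathbb{N} : r_n \le \epsilon \} \in \mathbb{N}_0 \Big\}. \tag{B.63} \]

Proposition B.36. (a) The definition in (B.63) does not depend on the chosen representatives
$(r_n)_{n \in \mathbb{N}}$.
(b) We have the decomposition
\[ \mathbb{R} = \mathbb{R}^+ \,\dot\cup\, \{0\} \,\dot\cup\, \mathbb{R}^-, \qquad \mathbb{R}^- := -\mathbb{R}^+ = \{ x \in \mathbb{R} : -x \in \mathbb{R}^+ \}. \tag{B.64} \]

Proof. (a): If $(s_n)_{n \in \mathbb{N}} \in S$ with $\lim_{n \to \infty} (r_n - s_n) = 0$, then $|r_n - s_n| < \epsilon/2$ for almost all
$n \in \mathbb{N}$. Thus, since $|s_n| \ge |r_n| - |r_n - s_n|$, we obtain $s_n > \epsilon/2$ for almost all $n \in \mathbb{N}$, i.e.
$\#\{ n \in \mathbb{N} : s_n \le \frac{\epsilon}{2} \} \in \mathbb{N}_0$.
(b) is an immediate consequence of Prop. B.33.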
Definition B.37. For each $x, y \in \mathbb{R}$, let
\[ y \le x \ :\Leftrightarrow\ x - y \in \mathbb{R}^+_0 := \mathbb{R}^+ \cup \{0\}. \tag{B.65} \]
Theorem B.38. (a) The relation $\le$ defined in (B.65) constitutes a total order on $\mathbb{R}$ that
is compatible with addition and multiplication, i.e. it satisfies Def. 4.4(iv); in other
words $(\mathbb{R}, +, \cdot, \le)$ constitutes a totally ordered field.
(b) The map $\iota$ from (B.57) is strictly increasing.

Proof. (a) follows from (B.65), (B.64), and Th. B.7, once we have shown that $\mathbb{R}^+$ is
closed under addition and multiplication. Let $(r_n)_{n \in \mathbb{N}} \in S$, $(s_n)_{n \in \mathbb{N}} \in S$. If $r_n > \epsilon_1 \in \mathbb{Q}^+$
for almost all $n \in \mathbb{N}$ and $s_n > \epsilon_2 \in \mathbb{Q}^+$ for almost all $n \in \mathbb{N}$, then $r_n + s_n > \epsilon_1 + \epsilon_2$,
showing $\mathbb{R}^+$ is closed under addition. Moreover, $r_n s_n > \epsilon_1 \epsilon_2$, showing $\mathbb{R}^+$ is closed under
multiplication.
(b): According to Def. B.25, if $r, s \in \mathbb{Q}$ with $s < r$, then $q := r - s \in \mathbb{Q}^+$. In
consequence $\iota(r) = \iota(s) + \iota(q)$ by (B.58a), i.e. $\iota(r) - \iota(s) = \iota(q) = [(q, q, \dots)] \in \mathbb{R}^+$,
proving $\iota(s) < \iota(r)$.

Finally, we will show in Th. B.40 below that the order on R is complete. However,
we first need some additional auxiliary results.
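Proposition B.39. (a) For each $x \in \mathbb{R}$, there is $(r_n)_{n \in \mathbb{N}} \in S$ satisfying $\lim_{n \to \infty} r_n = x$.

(b) Every $(r_n)_{n \in \mathbb{N}} \in S$ converges in $\mathbb{R}$; more precisely, $\lim_{n \to \infty} r_n = [(r_n)_{n \in \mathbb{N}}]$.

(c) Every Cauchy sequence in $\mathbb{R}$ converges in $\mathbb{R}$.

Proof. (a) and (b): If $x = [(r_n)_{n \in \mathbb{N}}]$ with $(r_n)_{n \in \mathbb{N}} \in S$, then, given $\epsilon > 0$, choose $N \in \mathbb{N}$
such that, for each $m, n > N$, one has $|r_n - r_m| < \epsilon/2$. Then, for each $k > N$, one has
$|x - r_k| = |[(r_n - r_k)_{n \in \mathbb{N}}]| < \epsilon$, since $|r_n - r_k| < \epsilon/2$ for all $n \ge k$, showing $\lim_{n \to \infty} r_n = x$.
(c): Let $(x_n)_{n \in \mathbb{N}}$ be a Cauchy sequence in $\mathbb{R}$. According to (a), for each $n \in \mathbb{N}$, there
exists $r_n \in \mathbb{Q}$ such that $|x_n - r_n| < \frac{1}{n}$. Then $(r_n)_{n \in \mathbb{N}}$ is a Cauchy sequence: Given $\epsilon > 0$,
choose $k \in \mathbb{N}$ such that $\frac{1}{k} < \frac{\epsilon}{3}$ and $|x_n - x_m| < \frac{\epsilon}{3}$ for each $n, m > k$. Then
\[ \forall_{n,m > k}\ \ |r_n - r_m| \le |r_n - x_n| + |x_n - x_m| + |x_m - r_m| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon, \tag{B.66} \]
showing $(r_n)_{n \in \mathbb{N}}$ is Cauchy. Thus, from (b), we obtain $x \in \mathbb{R}$ with $\lim_{n \to \infty} r_n = x$. We
can now show $\lim_{n \to \infty} x_n = x$ as well: Given $\epsilon > 0$, choose $N \in \mathbb{N}$ such that $\frac{1}{N} < \frac{\epsilon}{2}$ and
$|x - r_n| < \frac{\epsilon}{2}$ for each $n > N$. Then
\[ \forall_{n > N}\ \ |x - x_n| \le |x - r_n| + |r_n - x_n| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon, \tag{B.67} \]
showing $\lim_{n \to \infty} x_n = x$ and completing the proof.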
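Theorem B.40. The order $\le$ on $\mathbb{R}$ is complete, i.e. $(\mathbb{R}, +, \cdot, \le)$ constitutes a complete
totally ordered field.

Proof. Let $\emptyset \ne A \subseteq \mathbb{R}$ and let $M \in \mathbb{R}$ be an upper bound for $A$. We have to show that
$A$ has a supremum in $\mathbb{R}$. To this end, we recursively construct two Cauchy sequences
$(x_n)_{n \in \mathbb{N}}$ and $(y_n)_{n \in \mathbb{N}}$ in $\mathbb{R}$ such that $(x_n)_{n \in \mathbb{N}}$ is increasing, $(y_n)_{n \in \mathbb{N}}$ is decreasing, $x_n \le y_n$,
and $\lim_{n \to \infty} (y_n - x_n) = 0$. Let $x_1 \in A$ be arbitrary and $y_1 := M$. Define
\[
\forall_{n \in \mathbb{N}}\ \
\begin{aligned}
x_{n+1} &:=
\begin{cases}
(x_n + y_n)/2 & \text{if } (x_n + y_n)/2 \text{ is not an upper bound for } A,\\
x_n & \text{otherwise},
\end{cases}\\
y_{n+1} &:=
\begin{cases}
(x_n + y_n)/2 & \text{if } (x_n + y_n)/2 \text{ is an upper bound for } A,\\
y_n & \text{otherwise}.
\end{cases}
\end{aligned}
\tag{B.68}
\]
Then, clearly, the $x_n$ are increasing, the $y_n$ are decreasing, and $x_n \le y_n$ holds for each
$n \in \mathbb{N}$. Moreover, letting $d := M - x_1 \ge 0$, a simple induction shows $y_n - x_n = d/2^{n-1}$
and $\lim_{n \to \infty} (y_n - x_n) = 0$. Also, for $m > n$,
\[
x_m - x_n = \sum_{i=n}^{m-1} (x_{i+1} - x_i) \le d \sum_{i=n}^{m-1} 2^{-i} = \frac{d}{2^n} \sum_{i=n}^{m-1} 2^{-i+n} = \frac{d}{2^n} \sum_{i=0}^{m-1-n} 2^{-i} \le \frac{2d}{2^n},
\tag{B.69}
\]
showing $(x_n)_{n \in \mathbb{N}}$ is a Cauchy sequence. Analogously, one sees that $(y_n)_{n \in \mathbb{N}}$ is a Cauchy
sequence. By Prop. B.39(c), we obtain $s \in \mathbb{R}$ such that $s = \lim_{n \to \infty} x_n = \lim_{n \to \infty} (y_n - x_n + x_n) = \lim_{n \to \infty} y_n$.
We claim $s = \sup A$. If $s < y$, then there is $n \in \mathbb{N}$ with
$s \le y_n < y$, showing $y \notin A$, i.e. $s$ is an upper bound for $A$. If $y < s$, then there is $n \in \mathbb{N}$
with $y < x_n \le s$, showing $y$ is not an upper bound for $A$. Thus, $s$ is the smallest upper
bound for $A$, i.e. $s = \sup A$.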
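The bisection (B.68) is easy to run on a concrete set. The following sketch uses exact rationals as a stand-in for real numbers and the set $A := \{ r \in \mathbb{Q} : r^2 < 2 \}$ as an assumed example; the predicate `is_upper_bound` encodes that a rational $t$ is an upper bound for this particular $A$ if, and only if, $t \ge 0$ and $t^2 \ge 2$.

```python
from fractions import Fraction


def is_upper_bound(t: Fraction) -> bool:
    """Upper-bound test for the example set A = {r in Q : r*r < 2}."""
    return t >= 0 and t * t >= 2


def bisect(x: Fraction, y: Fraction, steps: int):
    """Perform `steps` iterations of (B.68), starting from x_1 = x in A and y_1 = y = M."""
    for _ in range(steps):
        mid = (x + y) / 2
        if is_upper_bound(mid):
            y = mid      # the midpoint is an upper bound: shrink from above
        else:
            x = mid      # the midpoint is not an upper bound: shrink from below
    return x, y


x, y = bisect(Fraction(1), Fraction(2), 40)
print(float(x), float(y), y - x)
```

After 40 steps the enclosing interval has length $d/2^{40}$ with $d = 1$, matching the formula $y_n - x_n = d/2^{n-1}$ from the proof for $n = 41$; both endpoints approximate the supremum of $A$ in $\mathbb{R}$.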

C Series: Additional Material

C.1 Riemann Rearrangement Theorem


In Th. C.2 below, we provide the striking Riemann rearrangement theorem, holding that,
for each real series that is convergent, but not absolutely convergent, one can choose an
arbitrary number $S \in \mathbb{R} \cup \{-\infty, \infty\}$ and reorder the summands such that the new series
converges to $S$, and that, even more, one can prescribe an entire interval of cluster points
for the rearranged series.

Proposition C.1. Let $\sum_{j=1}^{\infty} a_j$ be a series in $\mathbb{R}$. Defining
\[ \forall_{j \in \mathbb{N}}\ \ \Big( a_j^+ := \max\{a_j, 0\}, \quad a_j^- := \max\{-a_j, 0\} \Big), \tag{C.1} \]
the following assertions (a) and (b) hold:

(a) $\sum_{j=1}^{\infty} a_j$ is absolutely convergent if, and only if, both series $\sum_{j=1}^{\infty} a_j^+$ and $\sum_{j=1}^{\infty} a_j^-$
are convergent.

(b) If $\sum_{j=1}^{\infty} a_j$ is convergent, but not absolutely convergent, then
\[ \sum_{j=1}^{\infty} a_j^+ = \sum_{j=1}^{\infty} a_j^- = \infty. \tag{C.2} \]

Proof. The key observation is that (C.1) implies, for each $j \in \mathbb{N}$,
\begin{align*}
a_j^+ + a_j^- &= |a_j|, \tag{C.3a}\\
a_j^+ - a_j^- &= a_j, \tag{C.3b}\\
0 \le a_j^+,\, a_j^- &\le |a_j|. \tag{C.3c}
\end{align*}
(a): If $\sum_{j=1}^{\infty} a_j^+$ and $\sum_{j=1}^{\infty} a_j^-$ are convergent, then
\[
\sum_{j=1}^{\infty} |a_j| \overset{\text{(C.3a),(7.75)}}{=} \sum_{j=1}^{\infty} a_j^+ + \sum_{j=1}^{\infty} a_j^-,
\tag{C.4}
\]
and, in particular, $\sum_{j=1}^{\infty} a_j$ is absolutely convergent. Conversely, if $\sum_{j=1}^{\infty} a_j$ is absolutely convergent, then $\sum_{j=1}^{\infty} a_j^+$ and $\sum_{j=1}^{\infty} a_j^-$ are convergent by (C.3c) and Th. 7.83(a).

(b): If $\sum_{j=1}^{\infty} a_j$ and $\sum_{j=1}^{\infty} a_j^+$ are convergent, then (C.3b) implies that $\sum_{j=1}^{\infty} a_j^-$ is also convergent and, thus, $\sum_{j=1}^{\infty} a_j$ is absolutely convergent by (a). Likewise, if $\sum_{j=1}^{\infty} a_j$ and $\sum_{j=1}^{\infty} a_j^-$ are convergent, then (C.3b) implies that $\sum_{j=1}^{\infty} a_j^+$ is also convergent and, once again, $\sum_{j=1}^{\infty} a_j$ is absolutely convergent by (a). Therefore, if $\sum_{j=1}^{\infty} a_j$ is convergent, but not absolutely convergent, then (C.2) must hold by (7.79). $\blacksquare$
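
The identities (C.3a) and (C.3b) and the behavior described in Prop. C.1(b) can be observed on a concrete example. The sketch below uses the alternating harmonic series $a_j = (-1)^{j+1}/j$ as an assumed illustration (it is not taken from the text above); the helper name `parts` is likewise hypothetical.

```python
from fractions import Fraction

def parts(a):
    """Return (a_plus, a_minus) as defined in (C.1)."""
    return max(a, 0), max(-a, 0)

# First 500 summands of the alternating harmonic series (illustration only).
terms = [Fraction((-1) ** (j + 1), j) for j in range(1, 501)]

plus_sum = sum(parts(a)[0] for a in terms)    # partial sum of the a_j^+
minus_sum = sum(parts(a)[1] for a in terms)   # partial sum of the a_j^-
total = sum(terms)                            # partial sum of the a_j

print(float(plus_sum), float(minus_sum), float(total))
```

The partial sums of the positive and negative parts both keep growing as more summands are included, while their difference, the partial sum of the original series, stays close to $\ln 2$.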

Theorem C.2 (Riemann Rearrangement Theorem a.k.a. Riemann Series Theorem). Let $\sum_{j=1}^{\infty} a_j$ be a series in $\mathbb{R}$. If $\sum_{j=1}^{\infty} a_j$ is convergent, but not absolutely convergent, then, given $x, y \in \mathbb{R} \cup \{-\infty, \infty\}$ with $x \le y$, there exists a rearrangement $\sum_{j=1}^{\infty} b_j$ of the series (i.e. a reordering $(b_j)_{j\in\mathbb{N}}$ of $(a_j)_{j\in\mathbb{N}}$) such that $\sum_{j=1}^{\infty} b_j$ has precisely all elements of $[x, y]$ as cluster points (where we call $-\infty$ (resp. $\infty$) a cluster point of the real sequence $(t_n)_{n\in\mathbb{N}}$ if, and only if, $\#\{n \in \mathbb{N} : t_n < -N\} = \infty$ (resp. $\#\{n \in \mathbb{N} : t_n > N\} = \infty$) for each $N \in \mathbb{N}$). In particular, choosing $S := x = y \in \mathbb{R} \cup \{-\infty, \infty\}$, one can prescribe an arbitrary limit $S$ such that $\sum_{j=1}^{\infty} b_j = S$.

Proof. We first give a sketch of the proof to convey its fairly simple idea: According to Prop. C.1(b), (C.2) must hold, where the $a_j^+$ and $a_j^-$ are as defined in (C.1). Thus, we can define
\[
\forall_{k\in\mathbb{N}} \quad
x_k := \begin{cases} -k & \text{for } x = -\infty,\\ x & \text{for } x \in \mathbb{R},\\ k & \text{for } x = \infty, \end{cases}
\qquad
y_k := \begin{cases} -k & \text{for } y = -\infty,\\ y & \text{for } y \in \mathbb{R},\\ k & \text{for } y = \infty, \end{cases}
\tag{C.5}
\]
and, noting $x_k \le y_k$ for almost all $k \in \mathbb{N}$, alternate between adding summands $a_j^+$ until the partial sum exceeds $y_k$ and subtracting summands $a_j^-$ until the partial sum falls below $x_k$. If $k$ is sufficiently large such that $x_k \le y_k$, then, at each switching point (from adding to subtracting or vice versa), the absolute value of the difference between the last partial sum and $x_k$ or $y_k$, respectively, is less than the value of the last contributing nonzero summand. Since
\[
\lim_{j\to\infty} a_j^+ = \lim_{j\to\infty} a_j^- = 0,
\tag{C.6}
\]
the partial sums corresponding to the switching points converge to the respective endpoints $x$ or $y$, respectively, and precisely all points between $x$ and $y$ are cluster points.

We will now carry out the proof in detail. Note that we have
\begin{align*}
\mathbb{N} &= I^+ \,\dot\cup\, I^-, \quad \text{where} \tag{C.7a}\\
I^+ &:= \{j \in \mathbb{N} : a_j \ge 0\}, \tag{C.7b}\\
I^- &:= \{j \in \mathbb{N} : a_j < 0\}. \tag{C.7c}
\end{align*}

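We have to define a suitable bijective map $\phi : \mathbb{N} \to \mathbb{N}$, where we use the notation
\begin{align*}
\forall_{j\in\mathbb{N}} \quad b_j &:= a_{\phi(j)}, \tag{C.8a}\\
\forall_{n\in\mathbb{N}} \quad t_n &:= \sum_{j=1}^{n} b_j. \tag{C.8b}
\end{align*}
The definition of $\phi$ will be recursive, and we will also need to recursively define an auxiliary sequence $(\sigma_j)_{j\in\mathbb{N}}$ taking values in $\{1, -1\}$, serving as an accounting tool to keep track of whether we are in the process of moving right (i.e. adding $a_j^+$) or moving left (i.e. subtracting $a_j^-$). Moreover, we need a recursively defined auxiliary function $\kappa : \mathbb{N} \to \mathbb{N}$ to update the left and right boundaries $x_k$ and $y_k$, respectively, to handle the first and third case of (C.5) if need be. The recursion is initialized by
\begin{align*}
\phi(1) &:= 1, \tag{C.9a}\\
\sigma_1 &:= \begin{cases} 1 & \text{if } t_1 \le y_1,\\ -1 & \text{if } t_1 > y_1, \end{cases} \tag{C.9b}\\
\kappa(1) &:= \begin{cases} 1 & \text{if } t_1 \le y_1,\\ 2 & \text{if } t_1 > y_1, \end{cases} \tag{C.9c}
\end{align*}
and completed by
\begin{align*}
\forall_{j>1} \quad \phi(j) &:= \begin{cases} \min\big(I^+ \setminus \{\phi(1), \dots, \phi(j-1)\}\big) & \text{if } \sigma_{j-1} = 1,\\ \min\big(I^- \setminus \{\phi(1), \dots, \phi(j-1)\}\big) & \text{if } \sigma_{j-1} = -1, \end{cases} \tag{C.10a}\\
\forall_{j>1} \quad \sigma_j &:= \begin{cases} 1 & \text{if } \sigma_{j-1} = 1 \text{ and } t_j \le y_{\kappa(j-1)},\\ -1 & \text{if } \sigma_{j-1} = 1 \text{ and } t_j > y_{\kappa(j-1)},\\ -1 & \text{if } \sigma_{j-1} = -1 \text{ and } t_j \ge x_{\kappa(j-1)},\\ 1 & \text{if } \sigma_{j-1} = -1 \text{ and } t_j < x_{\kappa(j-1)}, \end{cases} \tag{C.10b}\\
\forall_{j>1} \quad \kappa(j) &:= \begin{cases} \kappa(j-1) & \text{if } \sigma_{j-1} = 1 \text{ and } t_j \le y_{\kappa(j-1)},\\ 1 + \kappa(j-1) & \text{if } \sigma_{j-1} = 1 \text{ and } t_j > y_{\kappa(j-1)},\\ \kappa(j-1) & \text{if } \sigma_{j-1} = -1 \text{ and } t_j \ge x_{\kappa(j-1)},\\ 1 + \kappa(j-1) & \text{if } \sigma_{j-1} = -1 \text{ and } t_j < x_{\kappa(j-1)}. \end{cases} \tag{C.10c}
\end{align*}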

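We note that $\phi$ is well-defined, since, according to (C.2), both $I^+$ and $I^-$ must have infinitely many elements. Moreover, $\phi$ is injective, since, for $j_1 < j_2$, $\phi(j_2) \neq \phi(j_1)$ is immediate from (C.10a). Finally, $\phi$ is also surjective: Otherwise, there is a smallest $n \in \mathbb{N} \setminus \{1\}$ such that $n \notin \phi(\mathbb{N})$. Suppose $n \in I^+$. Then, according to (C.10a), there must be $j_0 \in \mathbb{N}$ such that $\sigma_j = -1$ for every $j > j_0$, i.e., according to (C.10b) and (C.10c), $t_j \ge x_{\kappa(j_0)} \in \mathbb{R}$ for each $j > j_0$, which is in contradiction to the $\sum_{j=1}^{\infty} a_j^- = \infty$ part of (C.2). Analogously, $n \in I^-$ leads to a contradiction to the $\sum_{j=1}^{\infty} a_j^+ = \infty$ part of (C.2), completing the proof of surjectivity of $\phi$. So we have shown that $\sum_{j=1}^{\infty} b_j$ is a rearrangement of $\sum_{j=1}^{\infty} a_j$ as desired. We still need to verify that $\sum_{j=1}^{\infty} b_j$ (i.e. $(t_n)_{n\in\mathbb{N}}$) has precisely all elements of $[x, y]$ as cluster points. To this end, first note that, due to (C.2) and (C.5), $\lim_{j\to\infty} x_{\kappa(j)} = -\infty$ holds if, and only if, $x = -\infty$; and $\lim_{j\to\infty} x_{\kappa(j)} = \infty$ holds if, and only if, $x = \infty$; and likewise for the $y_{\kappa(j)}$ and $y$. If $x = -\infty$, then $\lim_{j\to\infty} x_{\kappa(j)} = -\infty$ and the bijectivity of $\phi$ together with (C.10b) and (C.10c) implies
\[
\forall_{N\in\mathbb{N}} \ \exists_{j\in\mathbb{N}} \quad t_j < x_{\kappa(j-1)} \le -N,
\tag{C.11}
\]
showing $-\infty$ is a cluster point of $(t_n)_{n\in\mathbb{N}}$. Analogously, if $y = \infty$, then $\lim_{j\to\infty} y_{\kappa(j)} = \infty$ and the bijectivity of $\phi$ together with (C.10b) and (C.10c) implies
\[
\forall_{N\in\mathbb{N}} \ \exists_{j\in\mathbb{N}} \quad t_j > y_{\kappa(j-1)} \ge N,
\tag{C.12}
\]
showing $\infty$ is a cluster point of $(t_n)_{n\in\mathbb{N}}$. Now let $\xi \in [x, y] \cap \mathbb{R}$ and $\epsilon > 0$. Due to (C.6),
\[
\exists_{N\in\mathbb{N}} \ \forall_{j>N} \quad |t_j - t_{j-1}| < \epsilon.
\tag{C.13}
\]
Due to the bijectivity of $\phi$ together with (C.10b) and (C.10c), for each $j_0 \in \mathbb{N}$, there exists $j > \max\{j_0, N\}$ such that $t_{j-1} \le \xi \le t_j$, showing $\xi$ is a cluster point of $(t_n)_{n\in\mathbb{N}}$. On the other hand, if $\xi \in \,]-\infty, x[$, then $x \neq -\infty$. If $x = \infty$, then $\lim_{j\to\infty} t_j = \infty$ and $\xi$ is not a cluster point of $(t_n)_{n\in\mathbb{N}}$. If $-\infty < x < \infty$, then let $\epsilon := (x - \xi)/2$ and choose $N$ as in (C.13). Then, by (C.10b) and (C.10c), for each $j > N$, $t_j > x - \epsilon = \xi + \epsilon$, showing $\xi$ is not a cluster point of $(t_n)_{n\in\mathbb{N}}$. Analogously, one sees that $\xi \in \,]y, \infty[$ cannot be a cluster point of $(t_n)_{n\in\mathbb{N}}$. $\blacksquare$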
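
The alternating strategy of the proof can be tried on a concrete series. The following sketch is a simplified variant for the special case $x = y = S \in \mathbb{R}$, applied to the alternating harmonic series $a_j = (-1)^{j+1}/j$ (an assumed example): it adds the next unused nonnegative summand whenever the current partial sum is at most $S$, and the next unused negative summand otherwise. It does not reproduce the bookkeeping via (C.9) and (C.10) literally, and the function name is hypothetical.

```python
from itertools import count

def rearranged_partial_sums(target, n_terms):
    """Partial sums t_1, ..., t_n of a greedy rearrangement aiming at `target`."""
    positives = (1.0 / j for j in count(1, 2))    # a_1, a_3, a_5, ...
    negatives = (-1.0 / j for j in count(2, 2))   # a_2, a_4, a_6, ...
    t, sums = 0.0, []
    for _ in range(n_terms):
        # Move right while at or below the target, move left while above it.
        t += next(positives) if t <= target else next(negatives)
        sums.append(t)
    return sums

sums = rearranged_partial_sums(1.5, 100_000)
print(sums[-1])
```

Every summand is used exactly once and in its original relative order within $I^+$ and $I^-$, so the result is a rearrangement; since the summands tend to $0$, the partial sums settle near the prescribed value instead of near $\ln 2$.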

C.2 Absolute Convergence and Rearrangements


The present section provides the proof of Th. 7.93, which, in (a), states that arbitrary
rearrangements of absolutely convergent series do not change the value of the series,
and, in (b), states three characterizations of absolute convergence.

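Proof of Th. 7.93. (a): First note that Th. 7.92 implies that, for absolutely convergent $\sum_{j\in I} a_j$, the limit $\sum_{j\in I} a_j = \sum_{j=1}^{\infty} a_{\phi(j)}$ does not depend on the bijective map $\phi : \mathbb{N} \to I$: For each bijective map $\psi : \mathbb{N} \to I$, $(a_{\psi(j)})_{j\in\mathbb{N}}$ is a reordering of $(a_{\phi(j)})_{j\in\mathbb{N}}$ and, thus, $\sum_{j=1}^{\infty} a_{\psi(j)} = \sum_{j=1}^{\infty} a_{\phi(j)}$.

Analogously, the sums $\sum_{\alpha\in I_n} a_\alpha$ do not depend on the order of the indices in $I_n$.

Claim 3. If $M \subseteq I$, then $S(I) = S(M) + S(I \setminus M)$, where $S(J) := \sum_{j\in J} a_j$ for each $J \subseteq I$.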

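Proof. If $M = \emptyset$, then there is nothing to prove. If $\#M = n \in \mathbb{N}$, then let $\phi_1 : \{1, \dots, n\} \to M$ and $\phi_2 : \{n+1, n+2, \dots\} \to I \setminus M$ be bijective maps. Then
\[
\phi : \mathbb{N} \to I, \quad \phi(j) := \begin{cases} \phi_1(j) & \text{for } j \le n,\\ \phi_2(j) & \text{for } j > n, \end{cases}
\]
is a bijective map. Moreover,
\[
S(I) = \sum_{j=1}^{\infty} a_{\phi(j)} \overset{(7.78)}{=} \sum_{j=1}^{n} a_{\phi(j)} + \sum_{j=n+1}^{\infty} a_{\phi(j)} = S(M) + S(I \setminus M),
\]
establishing the case.

If $\#M = \#(I \setminus M) = \#\mathbb{N}$, then let $\phi_1 : \{1, 3, 5, \dots\} \to M$ and $\phi_2 : \{2, 4, 6, \dots\} \to I \setminus M$ be bijective maps. Then
\[
\phi : \mathbb{N} \to I, \quad \phi(j) := \begin{cases} \phi_1(j) & \text{for $j$ odd},\\ \phi_2(j) & \text{for $j$ even}, \end{cases}
\]
is a bijective map. Define
\[
b_{\phi(j)} := \begin{cases} a_{\phi(j)} & \text{for $j$ odd},\\ 0 & \text{for $j$ even}, \end{cases}
\qquad
c_{\phi(j)} := \begin{cases} a_{\phi(j)} & \text{for $j$ even},\\ 0 & \text{for $j$ odd}. \end{cases}
\]
One then obtains
\[
S(I) = \sum_{j=1}^{\infty} a_{\phi(j)} \overset{(7.75)}{=} \sum_{j=1}^{\infty} b_{\phi(j)} + \sum_{j=1}^{\infty} c_{\phi(j)} = S(M) + S(I \setminus M),
\]
establishing the case. $\blacktriangle$

Claim 4. If $I = \dot\bigcup_{n=1}^{k} M_n$ with $k \in \mathbb{N}$ is a decomposition of $I$, then, using the notation introduced in Cl. 3, $S(I) = \sum_{n=1}^{k} S(M_n)$.

Proof. Follows by an induction from Cl. 3. $\blacktriangle$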

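Coming back to (7.87), Cl. 4 implies
\[
\forall_{k\in\mathbb{N}} \quad S(I) = S(I_1) + S(I_2) + \dots + S(I_k) + S(M_k), \quad \text{where } M_k := I \setminus \Big( \bigcup_{j=1}^{k} I_j \Big).
\]
To prove the equality in (7.88), fix a bijective $\phi : \mathbb{N} \to I$, and let $\epsilon > 0$. Due to Cor. 7.81(d), the sums $r_n := \sum_{j=n+1}^{\infty} |a_{\phi(j)}|$ of the remainder series converge to $0$, i.e. there exists $N \in \mathbb{N}$ such that $r_n < \epsilon$ for each $n > N$. More generally, for each (empty, finite, or infinite) subset $J \subseteq \{N+2, N+3, \dots\}$,
\[
\sum_{j\in J} |a_{\phi(j)}| \le \sum_{j=N+2}^{\infty} |a_{\phi(j)}| = r_{N+1} < \epsilon.
\]
Next, we choose $M \in \mathbb{N}$ sufficiently large such that $\{\phi(1), \dots, \phi(N+1)\} \subseteq I_1 \cup \dots \cup I_M$. Then, for each $k > M$,
\[
|S(M_k)| = \Big| \sum_{j\in M_k} a_j \Big| \overset{(7.83)}{\le} \sum_{j\in M_k} |a_j| \le \sum_{j=N+2}^{\infty} |a_{\phi(j)}| = r_{N+1} < \epsilon,
\]
proving
\[
\sum_{j\in I} a_j = S(I) = \lim_{k\to\infty} \sum_{n=1}^{k} S(I_n) = \sum_{n=1}^{\infty} \sum_{\alpha\in I_n} a_\alpha,
\]
which is (7.88).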
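(b): (i) implies (ii) with $C := \sum_{j\in I} |a_j|$ using Cl. 3 (with $a_j$ replaced by $|a_j|$). (i) implies (iii) using (7.88) (with $a_j$ replaced by $|a_j|$). (ii) implies (i) via (7.79), as $C$ is an upper bound for $\big( \sum_{j=1}^{n} |a_{\phi(j)}| \big)_{n\in\mathbb{N}}$ for each bijection $\phi : \mathbb{N} \to I$. Finally, (iii) implies (ii) with $C := \sum_{n=1}^{\infty} \sum_{\alpha\in I_n} |a_\alpha|$, since, given a finite $J \subseteq I$, there exists $k \in \mathbb{N}$ such that $J \subseteq I_1 \cup \dots \cup I_k$, i.e.
\[
\sum_{j\in J} |a_j| \le \sum_{n=1}^{k} \sum_{\alpha\in I_n} |a_\alpha| \le \sum_{n=1}^{\infty} \sum_{\alpha\in I_n} |a_\alpha| = C,
\]
thereby completing the proof. $\blacksquare$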

C.3 b-Adic Representations of Real Numbers


The main goal of this section is to provide a proof of Th. 7.97. We begin with some
preparatory lemmas.
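Lemma C.3. Given a natural number $b \ge 2$, consider the $b$-adic series given by (7.94). Then
\[
\sum_{\nu=0}^{\infty} d_{N-\nu}\, b^{N-\nu} \le b^{N+1},
\tag{C.14}
\]
and, in particular, the $b$-adic series converges to some $x \in \mathbb{R}_0^+$. Moreover, equality in (C.14) holds if, and only if, $d_n = b - 1$ for every $n \in \{N, N-1, N-2, \dots\}$.

Proof. One estimates, using the formula for the value of a geometric series:
\[
\sum_{\nu=0}^{\infty} d_{N-\nu}\, b^{N-\nu} \le \sum_{\nu=0}^{\infty} (b-1)\, b^{N-\nu} = (b-1)\, b^N \sum_{\nu=0}^{\infty} b^{-\nu} = (b-1)\, b^N\, \frac{1}{1 - \frac{1}{b}} = b^{N+1}.
\tag{C.15}
\]
Note that (C.15) also shows that equality is achieved if all $d_n$ are equal to $b - 1$. Conversely, if there is $n \in \{N, N-1, N-2, \dots\}$ such that $d_n < b - 1$, then there is $n \in \mathbb{N}_0$ such that $d_{N-n} < b - 1$ and one estimates
\[
\sum_{\nu=0}^{\infty} d_{N-\nu}\, b^{N-\nu} < \sum_{\nu=0}^{n-1} d_{N-\nu}\, b^{N-\nu} + (b-1)\, b^{N-n} + \sum_{\nu=n+1}^{\infty} d_{N-\nu}\, b^{N-\nu} \le b^{N+1},
\tag{C.16}
\]
showing that the inequality in (C.14) is strict. $\blacksquare$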


Lemma C.4. Given a natural number $b \ge 2$, consider two $b$-adic series
\[
x := \sum_{\nu=0}^{\infty} d_{N-\nu}\, b^{N-\nu} = \sum_{\nu=0}^{\infty} e_{N-\nu}\, b^{N-\nu},
\tag{C.17}
\]
$N \in \mathbb{Z}$ and $d_n, e_n \in \{0, \dots, b-1\}$ for each $n \in \{N, N-1, N-2, \dots\}$. If $d_N < e_N$, then $e_N = d_N + 1$, $d_n = b - 1$ for each $n < N$ and $e_n = 0$ for each $n < N$.

Proof. By subtracting $d_N b^N$ from both series, one can assume $d_N = 0$ without loss of generality. From Lem. C.3, we know
\[
x = \sum_{\nu=0}^{\infty} d_{N-\nu}\, b^{N-\nu} = \sum_{\nu=0}^{\infty} d_{N-1-\nu}\, b^{N-1-\nu} \le b^N.
\tag{C.18a}
\]
On the other hand:
\[
x = \sum_{\nu=0}^{\infty} e_{N-\nu}\, b^{N-\nu} \ge b^N.
\tag{C.18b}
\]
Combining (C.18a) and (C.18b) yields $x = b^N$. Once again employing Lem. C.3, (C.18a) also shows that $d_n = b - 1$ for each $n \le N - 1$ as claimed. Since $e_N > 0$ and $e_n \ge 0$ for each $n$, equality in (C.18b) can only occur for $e_N = 1$ and $e_n = 0$ for each $n < N$, thereby completing the proof of the lemma. $\blacksquare$

Notation C.5. For each $x \in \mathbb{R}$, we let
\[
\lfloor x \rfloor := \max\{k \in \mathbb{Z} : k \le x\}
\tag{C.19}
\]
denote the integral part of $x$ (also called floor of $x$ or $x$ rounded down).

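Proof of Th. 7.97. We start by constructing numbers $N$ and $d_n$, $n \in \{N, N-1, N-2, \dots\}$, such that (7.95) holds. For $x = 0$, one chooses an arbitrary $N \in \mathbb{Z}$ and $d_n = 0$ for each $n \in \{N, N-1, N-2, \dots\}$. Thus, for the remainder of the proof, fix $x > 0$. Let
\[
N := \max\{n \in \mathbb{Z} : b^n \le x\}.
\tag{C.20}
\]
The numbers $d_{N-n} \in \{0, \dots, b-1\}$ and $x_n \in \mathbb{R}^+$, $n \in \mathbb{N}_0$, are defined inductively by letting
\begin{align*}
d_N &:= \Big\lfloor \frac{x}{b^N} \Big\rfloor, & x_0 &:= d_N\, b^N, \tag{C.21a}\\
d_{N-n} &:= \Big\lfloor \frac{x - x_{n-1}}{b^{N-n}} \Big\rfloor, & x_n &:= x_{n-1} + d_{N-n}\, b^{N-n} \quad \text{for } n \ge 1. \tag{C.21b}
\end{align*}
Claim 5. One can verify by induction on $n$ that the numbers $d_{N-n}$ and $x_n$ enjoy the following properties for each $n \in \mathbb{N}_0$:
\begin{align*}
& d_{N-n} \in \{0, \dots, b-1\}, \tag{C.22a}\\
& 0 < x_n = \sum_{\nu=0}^{n} d_{N-\nu}\, b^{N-\nu} \le x, \tag{C.22b}\\
& x - x_n < b^{N-n}. \tag{C.22c}
\end{align*}
Proof. The induction is carried out for all three statements of (C.22) simultaneously. From (C.20), we know $b^N \le x < b^{N+1}$, i.e. $1 \le \frac{x}{b^N} < b$. Using (C.21a), this yields $d_N \in \{1, \dots, b-1\}$ and $0 < x_0 = d_N b^N \le b^N \frac{x}{b^N} = x$ as well as $x - x_0 = x - d_N b^N = b^N \big( \frac{x}{b^N} - d_N \big) < b^N$. For $n \ge 1$, by induction, one obtains $0 \le x - x_{n-1} < b^{1+N-n}$, i.e. $0 \le \frac{x - x_{n-1}}{b^{N-n}} < b$. Using (C.21b), this yields $d_{N-n} \in \{0, \dots, b-1\}$ and $x_n = x_{n-1} + d_{N-n} b^{N-n} \le x_{n-1} + b^{N-n}\, \frac{x - x_{n-1}}{b^{N-n}} = x$. Moreover, by induction, $0 < x_{n-1} = \sum_{\nu=0}^{n-1} d_{N-\nu} b^{N-\nu}$, such that (C.21b) implies $x_n = x_{n-1} + d_{N-n} b^{N-n} \ge x_{n-1} > 0$ and $x_n = x_{n-1} + d_{N-n} b^{N-n} = d_{N-n} b^{N-n} + \sum_{\nu=0}^{n-1} d_{N-\nu} b^{N-\nu} = \sum_{\nu=0}^{n} d_{N-\nu} b^{N-\nu}$. Finally, $x - x_n = x - x_{n-1} - d_{N-n} b^{N-n} = b^{N-n} \big( \frac{x - x_{n-1}}{b^{N-n}} - d_{N-n} \big) < b^{N-n}$, completing the proof of the claim. $\blacktriangle$

Since, for each $n \in \mathbb{N}_0$,
\[
0 \overset{\text{(C.22b)}}{\le} x - x_n = b^{N-n-1}\, \frac{x - x_n}{b^{N-n-1}} \overset{\text{(C.21b)}}{\le} b^{N-n-1}\, (d_{N-n-1} + 1) \le b^{N-n},
\tag{C.23}
\]
and $\lim_{n\to\infty} b^{N-n} = 0$, we have $\lim_{n\to\infty} x_n = x$, thereby establishing (7.95).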
It remains to verify the equivalence of (i) to (iv).

(ii) $\Rightarrow$ (i) is trivial.

(iii) $\Rightarrow$ (i): Assume (iii) holds. Without loss of generality, we can assume that $n_0$ is the largest index such that $d_n = 0$ for each $n \le n_0$. We distinguish two cases. If $n_0 < N - 1$ or $d_N \neq 1$, then
\[
\sum_{\nu=0}^{N-n_0-2} d_{N-\nu}\, b^{N-\nu} + (d_{n_0+1} - 1)\, b^{n_0+1} + (b-1) \sum_{\nu=N-n_0}^{\infty} b^{N-\nu}
\]
is a different $b$-adic representation of $x$ and its first coefficient is nonzero. If $n_0 = N - 1$ and $d_N = 1$, then
\[
(b-1) \sum_{\nu=1}^{\infty} b^{N-\nu} = (b-1) \sum_{\nu=0}^{\infty} b^{N-1-\nu}
\]
is a different $b$-adic representation of $x$ and its first coefficient is nonzero.

(iv) $\Rightarrow$ (i): Assume (iv) holds. Without loss of generality, we can assume that $n_0$ is the largest index such that $d_n = b - 1$ for each $n \le n_0$. Then
\[
\sum_{\nu=0}^{N-n_0-2} d_{N-\nu}\, b^{N-\nu} + (d_{n_0+1} + 1)\, b^{n_0+1} + 0 \cdot \sum_{\nu=N-n_0}^{\infty} b^{N-\nu}
\]
is a different $b$-adic representation of $x$ and its first coefficient is nonzero.


We will now show that, conversely, (i) implies (ii), (iii), and (iv). To that end, let $x > 0$ and suppose that $x$ has two different $b$-adic representations
\[
x = \sum_{\nu=0}^{\infty} d_{N_1-\nu}\, b^{N_1-\nu} = \sum_{\nu=0}^{\infty} e_{N_2-\nu}\, b^{N_2-\nu}
\tag{C.24}
\]
with $N_1, N_2 \in \mathbb{Z}$; $d_n, e_n \in \{0, \dots, b-1\}$; and $d_{N_1}, e_{N_2} > 0$. This implies
\[
x \ge b^{N_1}, \quad x \ge b^{N_2}.
\tag{C.25a}
\]
Moreover, Lem. C.3 yields
\[
x \le b^{N_1+1}, \quad x \le b^{N_2+1}.
\tag{C.25b}
\]
If $N_2 > N_1$, then (C.25) imply $N_2 = N_1 + 1$ and $b^{N_2} \le x \le b^{N_1+1} = b^{N_2}$, i.e. $x = b^{N_2} = b^{N_1+1}$. Since $e_{N_2} > 0$, one must have $e_{N_2} = 1$, and, in turn, $e_n = 0$ for each $n < N_2$. Moreover, $x = b^{N_1+1}$ and Lem. C.3 imply that $d_n = b - 1$ for each $n \in \{N_1, N_1 - 1, \dots\}$. Thus, for $N_2 > N_1$, the value of $N_1$ is determined by $N_2$ and the values of all $d_n$ and $e_n$ are also completely determined, showing that there are precisely two $b$-adic representations of $x$. Moreover, the $d_n$ have the property required in (iv) and the $e_n$ have the property required in (iii). The argument also shows that, for $N_1 > N_2$, one must have $N_1 = N_2 + 1$ with the $e_n$ taking the values of the $d_n$ and vice versa. Once again, there are precisely two $b$-adic representations of $x$; now the $d_n$ have the property required in (iii) and the $e_n$ have the property required in (iv).

It remains to consider the case $N := N_1 = N_2$. Since, by hypothesis, the two $b$-adic representations of $x$ in (C.24) are not identical, there must be a largest index $n \le N$ such that $d_n \neq e_n$. Thus, (C.24) implies
\[
y := \sum_{\nu=0}^{\infty} d_{n-\nu}\, b^{n-\nu} = \sum_{\nu=0}^{\infty} e_{n-\nu}\, b^{n-\nu}.
\tag{C.26}
\]
Now Lem. C.4 shows that there are precisely two $b$-adic representations of $x$, one having the property required in (iii) and the other having the property required in (iv).

Thus, in each case ($N_2 > N_1$, $N_1 > N_2$, and $N_1 = N_2$), we find that (i) implies (ii), (iii), and (iv), thereby concluding the proof of the theorem. $\blacksquare$
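
The construction (C.20), (C.21) is effectively an algorithm for computing digits. The sketch below implements it with exact rational arithmetic; the function name `b_adic_digits` and the sample value $x = 395/3 = 131.666\ldots$ are assumptions made for illustration.

```python
from fractions import Fraction

def b_adic_digits(x, b, count):
    """Return (N, [d_N, d_{N-1}, ...]) with `count` digits, following (C.20), (C.21)."""
    x = Fraction(x)
    if x <= 0 or b < 2:
        raise ValueError("need x > 0 and b >= 2")
    # (C.20): N is the largest integer with b**N <= x.
    N = 0
    while Fraction(b) ** (N + 1) <= x:
        N += 1
    while Fraction(b) ** N > x:
        N -= 1
    digits, rest = [], x          # rest = x - x_{n-1}
    for n in range(count):
        place = Fraction(b) ** (N - n)
        d = rest // place         # floor, as in (C.21)
        digits.append(d)
        rest -= d * place
    return N, digits

x = Fraction(395, 3)              # 131.666... (hypothetical sample value)
print(b_adic_digits(x, 10, 6))
print(b_adic_digits(x, 2, 12))
print(b_adic_digits(x, 16, 4))
```

The leading digit is always nonzero, and the remainder after $n$ steps stays below $b^{N-n}$, as stated in (C.22).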

In most cases, it is understood that we work only with decimal representations such that there is no confusion about the meaning of symbol strings like 101.01. However, in general, 101.01 could also be meant with respect to any other base, and the number represented by the same string of symbols obviously depends on the base used. Thus, when working with different representations, one needs some notation to keep track of the base.

Notation C.6. Given a natural number $b \ge 2$ and finite sequences
\begin{align*}
(d_{N_1}, d_{N_1-1}, \dots, d_0) &\in \{0, \dots, b-1\}^{N_1+1}, \tag{C.27a}\\
(e_1, e_2, \dots, e_{N_2}) &\in \{0, \dots, b-1\}^{N_2}, \tag{C.27b}\\
(p_1, p_2, \dots, p_{N_3}) &\in \{0, \dots, b-1\}^{N_3}, \tag{C.27c}
\end{align*}
$N_1, N_2, N_3 \in \mathbb{N}_0$ (where $N_2 = 0$ or $N_3 = 0$ is supposed to mean that the corresponding sequence is empty), the respective string
\[
\begin{cases}
(d_{N_1} d_{N_1-1} \dots d_0)_b & \text{for } N_2 = N_3 = 0,\\
(d_{N_1} d_{N_1-1} \dots d_0 \,.\, e_1 \dots e_{N_2}\, \overline{p_1 \dots p_{N_3}})_b & \text{for } N_2 + N_3 > 0
\end{cases}
\tag{C.28}
\]
represents the number
\[
\sum_{\nu=0}^{N_1} d_\nu\, b^{\nu} + \sum_{\nu=1}^{N_2} e_\nu\, b^{-\nu} + \sum_{\alpha=0}^{\infty} \sum_{\nu=1}^{N_3} p_\nu\, b^{-N_2 - \alpha N_3 - \nu}.
\tag{C.29}
\]
Example C.7. For the number from (7.93), we get
\[
x = (131.\overline{6})_{10} = (10000011.\overline{10})_2 = (83.\overline{A})_{16}
\tag{C.30}
\]
(for the hexadecimal system, it is customary to use the symbols 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F).

One frequently needs to convert representations with respect to one base into represen-
tations with respect to another base. When working with digital computers, conversions
between bases 10 and 2 and vice versa are the most obvious ones that come up. Con-
verting representations is related to the following elementary remainder theorem and
the well-known long division algorithm.
Theorem C.8. For each pair of numbers $(a, b) \in \mathbb{N}^2$, there exists a unique pair of numbers $(q, r) \in \mathbb{N}_0^2$ satisfying the two conditions $a = qb + r$ and $0 \le r < b$.

Proof. Existence: Define
\begin{align*}
q &:= \max\{n \in \mathbb{N}_0 : nb \le a\}, \tag{C.31a}\\
r &:= a - qb. \tag{C.31b}
\end{align*}
Then $q \in \mathbb{N}_0$ by definition and (C.31b) immediately yields $a = qb + r$ as well as $r \in \mathbb{Z}$. Moreover, from (C.31a), $qb \le a = qb + r$, i.e. $0 \le r$, in particular, $r \in \mathbb{N}_0$. Since (C.31a) also implies $(q+1)b > a = qb + r$, we also have $b > r$ as required.

Uniqueness: Suppose $(q_1, r_1) \in \mathbb{N}_0^2$ satisfies the two conditions $a = q_1 b + r_1$ and $0 \le r_1 < b$. Then $q_1 b = a - r_1 \le a$ and $(q_1 + 1) b = a - r_1 + b > a$, showing $q_1 = \max\{n \in \mathbb{N}_0 : nb \le a\} = q$. This, in turn, implies $r_1 = a - q_1 b = a - qb = r$, thereby establishing the case. $\blacksquare$
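
Repeated division with remainder is the usual way to convert the integer part of a number into another base. The following is a minimal sketch using Python's built-in `divmod`; the function name `to_base` is an assumption for illustration.

```python
def to_base(a, b):
    """Digits of the integer a >= 0 in base b >= 2, most significant digit first."""
    if a < 0 or b < 2:
        raise ValueError("need a >= 0 and b >= 2")
    digits = []
    while True:
        a, r = divmod(a, b)   # a = q*b + r with 0 <= r < b
        digits.append(r)      # remainders are the digits, least significant first
        if a == 0:
            break
    return digits[::-1]

print(to_base(131, 2))
print(to_base(131, 16))
```

Each pass of the loop applies the decomposition $a = qb + r$ once and continues with the quotient $q$, so the number of passes equals the number of digits.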

D Trigonometric Functions

D.1 Additional Trigonometric Formulas


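Proposition D.1. We have the following identities:
\begin{align*}
\forall_{z\in\mathbb{C}} \quad & \sin(2z) = 2 \sin z \cos z, \tag{D.1a}\\
\forall_{z\in\mathbb{C}} \quad & \cos(2z) = (\cos z)^2 - (\sin z)^2, \tag{D.1b}\\
\forall_{z\in\mathbb{C}} \quad & \frac{1 - \cos z}{2} = \Big( \sin \frac{z}{2} \Big)^2, \tag{D.1c}\\
\forall_{z\in\mathbb{C}\setminus\{(2k+1)\pi :\, k\in\mathbb{Z}\}} \quad & \tan \frac{z}{2} = \frac{\sin z}{\cos z + 1}, \tag{D.1d}\\
\forall_{z\in\mathbb{C}\setminus\{(2k+1)\pi :\, k\in\mathbb{Z}\}} \quad & \cos z = \frac{1 - (\tan \frac{z}{2})^2}{1 + (\tan \frac{z}{2})^2}. \tag{D.1e}
\end{align*}
Proof. (D.1a) is immediate from (8.44c), (D.1b) is immediate from (8.44d).

(D.1c): For each $z \in \mathbb{C}$, one computes
\[
\frac{1 - \cos z}{2} \overset{\text{(D.1b)}}{=} \frac{1 - (\cos \frac{z}{2})^2 + (\sin \frac{z}{2})^2}{2} \overset{\text{(8.44e)}}{=} \frac{2 (\sin \frac{z}{2})^2}{2} = \Big( \sin \frac{z}{2} \Big)^2,
\tag{D.2}
\]
thereby establishing the case.

(D.1d): Note that, according to (8.47d), one has
\[
\cos \frac{z}{2} = 0 \quad \Leftrightarrow \quad \exists_{k\in\mathbb{Z}} \ z = (2k+1)\pi.
\tag{D.3}
\]
Thus, for each $z \in \mathbb{C} \setminus \{(2k+1)\pi : k \in \mathbb{Z}\}$, one computes
\[
\tan \frac{z}{2} = \frac{2 \sin \frac{z}{2} \cos \frac{z}{2}}{2 (\cos \frac{z}{2})^2} \overset{\text{(D.1a),(8.44e)}}{=} \frac{\sin z}{(\cos \frac{z}{2})^2 - (\sin \frac{z}{2})^2 + 1} = \frac{\sin z}{\cos z + 1},
\tag{D.4}
\]
thereby establishing the case.

(D.1e): Once again, using (D.3), one computes for each $z \in \mathbb{C} \setminus \{(2k+1)\pi : k \in \mathbb{Z}\}$:
\[
\cos z \overset{\text{(D.1b),(8.44e)}}{=} \frac{(\cos \frac{z}{2})^2 - (\sin \frac{z}{2})^2}{(\cos \frac{z}{2})^2 + (\sin \frac{z}{2})^2} = \frac{1 - (\tan \frac{z}{2})^2}{1 + (\tan \frac{z}{2})^2},
\tag{D.5}
\]
as claimed. $\blacksquare$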
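
A numerical spot check of the five identities at complex arguments can serve as a sanity test. This is only a floating-point sketch with an assumed tolerance, not a proof; the function name is hypothetical.

```python
import cmath

def check_identities(z, tol=1e-12):
    """Compare both sides of (D.1a)-(D.1e) at the complex point z."""
    s, c = cmath.sin(z), cmath.cos(z)
    t = cmath.tan(z / 2)          # requires z not of the form (2k+1)*pi
    pairs = [
        (cmath.sin(2 * z), 2 * s * c),            # (D.1a)
        (cmath.cos(2 * z), c ** 2 - s ** 2),      # (D.1b)
        ((1 - c) / 2, cmath.sin(z / 2) ** 2),     # (D.1c)
        (t, s / (c + 1)),                         # (D.1d)
        (c, (1 - t ** 2) / (1 + t ** 2)),         # (D.1e)
    ]
    return all(abs(lhs - rhs) < tol for lhs, rhs in pairs)

print(check_identities(0.7 + 0.3j))
```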

E Cardinality of R and Some Related Sets


Theorem E.1. (a) The set of natural numbers N is countable.
(b) The set of integers Z is countable: #Z = #N.
(c) The set of rational numbers Q is countable: #Q = #N.

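Proof. (a): The identity $\operatorname{Id} : \mathbb{N} \to \mathbb{N}$ shows $\mathbb{N}$ is countable.

(b): Using (B.19), the map
\[
\phi : \mathbb{N} \to \mathbb{Z}, \quad \phi(n) := \begin{cases} n/2 & \text{if $n$ is even},\\ 0 & \text{if } n = 1,\\ -(n-1)/2 & \text{if $n$ is odd}, \end{cases}
\tag{E.1}
\]
is clearly bijective, proving $\#\mathbb{Z} = \#\mathbb{N}$.

(c): According to (b), $\mathbb{Z}$ and $\mathbb{Z} \setminus \{0\}$ are countable. Then Th. 3.24 implies that $A := \mathbb{Z} \times (\mathbb{Z} \setminus \{0\})$ is countable and there is a bijective map $f : \mathbb{N} \to A$. It is then immediate from Def. B.21(a) that the map
\[
\phi : \mathbb{N} \to \mathbb{Q}, \quad \phi(n) := [f(n)],
\tag{E.2}
\]
where $[f(n)]$ denotes the equivalence class of $f(n)$ with respect to $\sim$ from (B.26), is surjective. Thus, $\mathbb{Q}$ is countable by Prop. 3.23. $\blacksquare$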
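
The map of (E.1) is easy to enumerate. The sketch below implements it together with a candidate inverse; the inverse formula and both function names are assumptions added for illustration, checked here only on a finite range.

```python
def phi(n):
    """The map N -> Z of (E.1): 1, 2, 3, 4, 5, ... -> 0, 1, -1, 2, -2, ..."""
    if n < 1:
        raise ValueError("n must be a natural number")
    return n // 2 if n % 2 == 0 else -((n - 1) // 2)

def phi_inv(k):
    """Candidate inverse Z -> N (an assumption, not stated in the text)."""
    return 2 * k if k > 0 else 1 - 2 * k

print([phi(n) for n in range(1, 8)])
```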

In the following theorem and its two corollaries, we will see that the set R of real numbers
is not countable, but has the same cardinality as the power set of N. Moreover, the same
is true for every nontrivial interval of real numbers.

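Theorem E.2. Let $a, b \in \mathbb{R}$ with $a < b$. Recalling the notation $\mathcal{F}(\mathbb{N}, \{0, 1\}) = \{0, 1\}^{\mathbb{N}}$ for the set of sequences in $\{0, 1\}$, we obtain the following equalities of cardinalities:
\[
\#\mathbb{R} = \#]a, b[ \,= \#\{0, 1\}^{\mathbb{N}} = \#\mathcal{P}(\mathbb{N}).
\tag{E.3}
\]
Proof. We divide the proof into the following steps:

(i) $\#\{0, 1\}^{\mathbb{N}} = \#\mathcal{P}(\mathbb{N})$.

(ii) $\#]0, 1[ \,= \#\{0, 1\}^{\mathbb{N}}$.

(iii) $\#]-1, 1[ \,= \#\mathbb{R}$.

(iv) $\#]a, b[ \,= \#]0, 1[$.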

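(i): To prove $\#\{0, 1\}^{\mathbb{N}} = \#\mathcal{P}(\mathbb{N})$, we have to show the existence of a bijective map $f : \{0, 1\}^{\mathbb{N}} \to \mathcal{P}(\mathbb{N})$. Given $\phi \in \{0, 1\}^{\mathbb{N}}$, i.e. $\phi$ is a function $\phi : \mathbb{N} \to \{0, 1\}$, define
\[
f(\phi) := \phi^{-1}\{1\} = \{n \in \mathbb{N} : \phi(n) = 1\}.
\tag{E.4}
\]
Then, indeed, $f : \{0, 1\}^{\mathbb{N}} \to \mathcal{P}(\mathbb{N})$. It remains to show $f$ is bijective. To verify $f$ is injective, consider $\phi, \psi \in \{0, 1\}^{\mathbb{N}}$. If $\phi \neq \psi$, then there exists $n \in \mathbb{N}$ with $\phi(n) \neq \psi(n)$. If $\phi(n) = 1$, then $\psi(n) = 0$, i.e. $n \in f(\phi)$, but $n \notin f(\psi)$, showing $f(\phi) \neq f(\psi)$. Analogously, if $\phi(n) = 0$, then $\psi(n) = 1$, i.e. $n \in f(\psi)$, but $n \notin f(\phi)$, again showing $f(\phi) \neq f(\psi)$, concluding the proof that $f$ is injective. To verify $f$ is surjective, for each $A \in \mathcal{P}(\mathbb{N})$, define
\[
\chi_A : \mathbb{N} \to \{0, 1\}, \quad \chi_A(n) := \begin{cases} 1 & \text{if } n \in A,\\ 0 & \text{if } n \notin A. \end{cases}
\tag{E.5}
\]
Then $\chi_A \in \{0, 1\}^{\mathbb{N}}$ and $f(\chi_A) = \chi_A^{-1}\{1\} = A$, proving $f$ is surjective.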
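(ii): To prove $\#\{0, 1\}^{\mathbb{N}} = \#]0, 1[$, we have to show the existence of a bijective map $f : \{0, 1\}^{\mathbb{N}} \to \,]0, 1[$. The map
\[
g : \{0, 1\}^{\mathbb{N}} \to [0, 1], \quad g\big( (x_i)_{i\in\mathbb{N}} \big) := \sum_{i=1}^{\infty} x_i\, 2^{-i},
\tag{E.6}
\]
is well-defined by Lem. C.3 (i.e. $0 \le g \le 1$). Moreover, according to Th. 7.97, $g$ is surjective, but not injective, as there are numbers $x \in \,]0, 1[$ that have two different dual (i.e. 2-adic) representations. However, as there are only countably many such numbers, we can use a modification to obtain our desired $f$. In preparation, we define, for each $n \in \mathbb{N}$, the sequences $e^n := (e_i^n)_{i\in\mathbb{N}}$ and $f^n := (f_i^n)_{i\in\mathbb{N}}$, where
\begin{align*}
\forall_{n,i\in\mathbb{N}} \quad e_i^n &:= \begin{cases} 1 & \text{for } i = n,\\ 0 & \text{for } i \neq n, \end{cases} \tag{E.7a}\\
\forall_{n,i\in\mathbb{N}} \quad f_i^n &:= \begin{cases} 1 & \text{for } i > n,\\ 0 & \text{for } i \le n, \end{cases} \tag{E.7b}
\end{align*}
and we note
\begin{align*}
g(0, 0, \dots) &= 0, \tag{E.8a}\\
g(1, 1, \dots) &= 1, \tag{E.8b}\\
g(e^n) = g(f^n) &= 2^{-n} \quad \text{for each } n \in \mathbb{N}. \tag{E.8c}
\end{align*}
We are now in a position to define
\[
f : \{0, 1\}^{\mathbb{N}} \to \,]0, 1[, \quad f\big( (x_i)_{i\in\mathbb{N}} \big) := \begin{cases}
2^{-1} & \text{if } (x_i)_{i\in\mathbb{N}} = (0, 0, \dots),\\
2^{-2} & \text{if } (x_i)_{i\in\mathbb{N}} = (1, 1, \dots),\\
2^{-(2n+1)} & \text{if } x_i = e_i^n \text{ for each } i \in \mathbb{N},\\
2^{-(2n+2)} & \text{if } x_i = f_i^n \text{ for each } i \in \mathbb{N},\\
\sum_{i=1}^{\infty} x_i\, 2^{-i} & \text{otherwise}.
\end{cases}
\tag{E.9}
\]
Introducing the auxiliary sets
\begin{align*}
A &:= \{(0, 0, \dots), (1, 1, \dots)\} \cup \{e^n : n \in \mathbb{N}\} \cup \{f^n : n \in \mathbb{N}\}, \tag{E.10a}\\
B &:= \{2^{-n} : n \in \mathbb{N}\}, \tag{E.10b}
\end{align*}
it follows from Th. 7.97 that (the following restrictions of $f$ which, to simplify notation, we also denote by $f$)
\[
f : \{0, 1\}^{\mathbb{N}} \setminus A \to \,]0, 1[ \,\setminus B,
\tag{E.11a}
\]
and
\[
f : A \to B
\tag{E.11b}
\]
are bijective, i.e. the full $f$ of (E.9) is itself bijective, completing the proof of (ii).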
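(iii): To prove $\#]-1, 1[ \,= \#\mathbb{R}$, we have to show the existence of a bijective map $f : \mathbb{R} \to \,]-1, 1[$. Since we know from Def. and Rem. 8.27 that $\arctan : \mathbb{R} \to \,]-\pi/2, \pi/2[$ is bijective, we can define
\[
f : \mathbb{R} \to \,]-1, 1[, \quad f(x) := \frac{2 \arctan x}{\pi}.
\tag{E.12}
\]
However, even though this provides a valid proof, $\arctan$ is a somewhat complicated function (as it is defined via $\sin$ and $\cos$, which are defined via power series). Thus, it might be desirable to see an alternative proof, using a more elementary $f$. We claim that
\[
f : \mathbb{R} \to \,]-1, 1[, \quad f(x) := \frac{x}{|x| + 1},
\tag{E.13}
\]
is also bijective. Since $f$ is clearly continuous, according to the intermediate value Th. 7.57, surjectivity follows once we show
\[
\forall_{\epsilon\in]0,1[} \ \exists_{x_1, x_2\in\mathbb{R}} \quad f(x_1) < -1 + \epsilon < 1 - \epsilon < f(x_2).
\tag{E.14}
\]
Indeed, for each $\epsilon \in \,]0, 1[$,
\begin{align*}
x_1 < \frac{-1 + \epsilon}{\epsilon} = -\frac{1}{\epsilon} + 1 \quad &\Rightarrow \quad x_1 < x_1 - 1 - \epsilon x_1 + \epsilon \quad \Rightarrow \quad f(x_1) = \frac{x_1}{-x_1 + 1} < -1 + \epsilon,\\
x_2 > \frac{1 - \epsilon}{\epsilon} = \frac{1}{\epsilon} - 1 \quad &\Rightarrow \quad x_2 > 1 + x_2 - \epsilon - \epsilon x_2 \quad \Rightarrow \quad f(x_2) = \frac{x_2}{x_2 + 1} > 1 - \epsilon,
\end{align*}
proving (E.14) and the surjectivity of $f$. To verify $f$ is injective, it suffices to show that $f$ is strictly increasing. This follows from the three implications
\begin{align*}
x_1 \le 0 \le x_2 \ \wedge\ x_1 < x_2 \quad &\Rightarrow \quad f(x_1) = \frac{x_1}{-x_1 + 1} \le 0 \le \frac{x_2}{x_2 + 1} = f(x_2) \quad \Rightarrow \quad f(x_1) < f(x_2),\\
x_1 < x_2 \le 0 \quad &\Rightarrow \quad -x_1 x_2 + x_1 < -x_1 x_2 + x_2 \quad \Rightarrow \quad f(x_1) = \frac{x_1}{-x_1 + 1} < \frac{x_2}{-x_2 + 1} = f(x_2),\\
0 \le x_1 < x_2 \quad &\Rightarrow \quad x_1 x_2 + x_1 < x_1 x_2 + x_2 \quad \Rightarrow \quad f(x_1) = \frac{x_1}{x_1 + 1} < \frac{x_2}{x_2 + 1} = f(x_2),
\end{align*}
where, in the first implication, at least one of the two inequalities is strict, since $x_1$ and $x_2$ are not both $0$. Thus, $f$ is strictly increasing and, hence, injective.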
(iv): To prove $\#]a, b[ \,= \#]0, 1[$, we have to show the existence of a bijective map $f : \,]a, b[ \,\to\, ]0, 1[$. Such a bijective map is given by the (restriction of an) affine map
\[
f : \,]a, b[ \,\to\, ]0, 1[, \quad f(x) := \frac{x - a}{b - a}.
\tag{E.15}
\]
The proof that $f$ is bijective can be conducted analogously to (but much more simply than) the proof in (iii), or one can use (for example, from Linear Algebra) that every nonconstant affine map from $\mathbb{R}$ into $\mathbb{R}$ is bijective. $\blacksquare$
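
For the elementary map $x \mapsto x/(|x|+1)$ used in step (iii), bijectivity can also be made tangible by writing down an explicit inverse. The formula $y \mapsto y/(1-|y|)$ below is an assumption added for illustration (it is not stated in the proof above); the sketch checks it on exact rational sample points.

```python
from fractions import Fraction

def f(x):
    """The map R -> ]-1, 1[ given by x / (|x| + 1)."""
    return x / (abs(x) + 1)

def f_inv(y):
    """Candidate inverse ]-1, 1[ -> R (assumed formula, verified on samples below)."""
    if not -1 < y < 1:
        raise ValueError("y must lie in ]-1, 1[")
    return y / (1 - abs(y))

samples = [Fraction(k, 7) for k in range(-50, 51)]
values = [f(x) for x in samples]
print(float(values[0]), float(values[-1]))
```

On the sample points, the values stay strictly inside $]-1, 1[$ and increase strictly with $x$, in agreement with the monotonicity argument.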

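Corollary E.3. $\#\mathbb{R} = \#\mathcal{P}(\mathbb{N})$; in particular, $\mathbb{R}$ is not countable.

Proof. $\#\mathbb{R} = \#\mathcal{P}(\mathbb{N})$ was proved in Th. E.2 and $\mathcal{P}(\mathbb{N})$ is uncountable by Th. 3.20. $\blacksquare$

Corollary E.4. If $a, b \in \mathbb{R}$ with $a < b$, then $\#(\mathbb{Q} \,\cap\, ]a, b[) = \#\mathbb{N}$ and $\#(]a, b[ \,\setminus \mathbb{Q}) = \#\mathbb{R}$, i.e. $]a, b[$ contains countably many rational and uncountably many irrational numbers.

Proof. Since $\mathbb{Q} \,\cap\, ]a, b[ \,\subseteq \mathbb{Q}$, the claim $\#(\mathbb{Q} \,\cap\, ]a, b[) = \#\mathbb{N}$ follows from Th. E.1(c), Prop. 3.22, and Th. 7.68(a).

To prove $\#(]a, b[ \,\setminus \mathbb{Q}) = \#\mathbb{R}$, a bijection between $]a, b[ \,\setminus \mathbb{Q}$ and $\mathbb{R}$ can be constructed analogously to the construction of $f$ in step (ii) of the proof of Th. E.2, making use of the fact that $\#]a, b[ \,= \#\mathbb{R}$ and $\#\mathbb{Q} = \#\mathbb{N}$. $\blacksquare$

Theorem E.5. The set of complex numbers $\mathbb{C} = \mathbb{R} \times \mathbb{R}$ has the same cardinality as $\mathbb{R}$: $\#(\mathbb{R} \times \mathbb{R}) = \#\mathbb{R} = \#\mathcal{P}(\mathbb{N})$.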

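Proof. Let
\[
A := \{0, 1\}^{\mathbb{N}}.
\tag{E.16}
\]
By an application of Th. E.2, it suffices to prove $\#A = \#(A \times A)$, which is accomplished by showing the existence of a bijective map $f : A \to A \times A$. We define
\[
f : A \to A \times A, \quad f\big( (x_j)_{j\in\mathbb{N}} \big) := \big( (y_j)_{j\in\mathbb{N}}, (z_j)_{j\in\mathbb{N}} \big),
\tag{E.17a}
\]
where
\begin{align*}
\forall_{j\in\mathbb{N}} \quad y_j &:= x_{2j-1}, \tag{E.17b}\\
\forall_{j\in\mathbb{N}} \quad z_j &:= x_{2j}, \tag{E.17c}
\end{align*}
and
\[
g : A \times A \to A, \quad g\big( (y_j)_{j\in\mathbb{N}}, (z_j)_{j\in\mathbb{N}} \big) := (x_j)_{j\in\mathbb{N}},
\tag{E.18a}
\]
where
\[
\forall_{j\in\mathbb{N}} \quad x_j := \begin{cases} y_{(j+1)/2} & \text{for $j$ odd},\\ z_{j/2} & \text{for $j$ even}. \end{cases}
\tag{E.18b}
\]
Clearly, $g = f^{-1}$, proving that $f$ is bijective as desired. $\blacksquare$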
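
The two maps of (E.17) and (E.18) simply deinterleave and interleave sequences. In the sketch below, a sequence in $\{0,1\}^{\mathbb{N}}$ is modeled as a Python function on the positive integers; the names `split` and `merge` and the sample sequences are assumptions for illustration, and the round trips are only checked on finitely many indices.

```python
def split(x):
    """The map f of (E.17): x -> (y, z) with y_j = x_{2j-1} and z_j = x_{2j}."""
    return (lambda j: x(2 * j - 1)), (lambda j: x(2 * j))

def merge(y, z):
    """The map g of (E.18): x_j = y_{(j+1)/2} for j odd, z_{j/2} for j even."""
    return lambda j: y((j + 1) // 2) if j % 2 == 1 else z(j // 2)

# A hypothetical sample sequence in {0, 1}, indexed from 1.
x = lambda j: (j * j + j // 3) % 2
y, z = split(x)
x_again = merge(y, z)
print([x(j) for j in range(1, 11)], [x_again(j) for j in range(1, 11)])
```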

F Irrationality of e and π

F.1 Irrationality of e
The following Prop. F.1, which will then be used to prove the irrationality of e in Th. F.2,
shows, in particular, that the series (8.26) can be used to efficiently compute accurate
approximations of e.
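Proposition F.1. Defining
\[
\forall_{n\in\mathbb{N}} \ \forall_{z\in\mathbb{C}} \quad R_n(z) := e^z - \sum_{j=0}^{n} \frac{z^j}{j!},
\tag{F.1}
\]
we have
\[
\forall_{n\in\mathbb{N}} \quad \Big( |z| \le 1 \ \Rightarrow\ |R_n(z)| \le \frac{2\, |z|^{n+1}}{(n+1)!} \Big),
\tag{F.2}
\]
i.e. the error made when approximating $e^z$ by the partial sum (for $|z| \le 1$) is at most as large as twice the modulus of the first missing summand.

Proof. One estimates, for each $n \in \mathbb{N}$ and each $z \in \mathbb{C}$ with $|z| \le 1$,
\begin{align*}
|R_n(z)| &\overset{\text{(8.24),(7.83)}}{\le} \sum_{j=n+1}^{\infty} \frac{|z|^j}{j!} \overset{(7.75)}{=} \frac{|z|^{n+1}}{(n+1)!} \Big( 1 + \frac{|z|}{n+2} + \frac{|z|^2}{(n+2)(n+3)} + \dots \Big)\\
&\overset{|z|\le 1}{\le} \frac{|z|^{n+1}}{(n+1)!} \Big( 1 + \frac{1}{2} + \frac{1}{2^2} + \dots \Big) \overset{(7.73)}{=} \frac{2\, |z|^{n+1}}{(n+1)!},
\tag{F.3}
\end{align*}
which establishes the case. $\blacksquare$

Theorem F.2. Euler's number $e$ is irrational.

Proof. Seeking a contradiction, we assume $e$ to be rational. Then there exist $m, n \in \mathbb{N}$ with $n \ge 2$ such that $e = \frac{m}{n}$. Then $n!\, e \in \mathbb{N}$ and, thus,
\[
n!\, R_n(1) \overset{\text{(F.1)}}{=} n!\, e - n! \sum_{j=0}^{n} \frac{1}{j!} \in \mathbb{Z},
\tag{F.4}
\]
in contradiction to $0 < n!\, |R_n(1)| \le \frac{2}{n+1} < 1$, which holds according to (F.2) (recalling $n \ge 2$). $\blacksquare$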
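
The remainder estimate at $z = 1$ can be checked numerically in the form $0 < e - \sum_{j=0}^{n} 1/j! \le 2/(n+1)!$. The sketch below compares the exact rational partial sums with the floating-point constant `math.e`; the function name and the range of $n$ are assumptions, and the range is kept small enough that rounding errors are far below the bound.

```python
from fractions import Fraction
from math import e, factorial

def partial_sum(n):
    """Exact value of the partial sum 1/0! + 1/1! + ... + 1/n!."""
    return sum(Fraction(1, factorial(j)) for j in range(n + 1))

for n in range(1, 11):
    remainder = e - float(partial_sum(n))
    bound = 2 / factorial(n + 1)
    print(n, remainder, bound)
```

Each additional summand shrinks the bound by a factor of $n + 2$, which is why only a handful of terms already give many correct digits of $e$.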

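F.2 Irrationality of π

Theorem F.3. $\pi^2$ is irrational (then, in particular, $\pi$ must be irrational as well).

Proof. Seeking a contradiction, we assume $\pi^2$ to be rational. Then
\[
\exists_{a, b\in\mathbb{N}} \quad \pi^2 = \frac{a}{b}.
\tag{F.5}
\]
We can then choose some $n \in \mathbb{N}$ satisfying
\[
0 < \frac{\pi\, a^n}{n!} < 1.
\tag{F.6}
\]
We now consider the function
\[
f : \mathbb{R} \to \mathbb{R}, \quad f(x) := \frac{x^n (1 - x)^n}{n!} \overset{(*)}{=} \frac{1}{n!} \sum_{k=n}^{2n} (-1)^{k-n} \binom{n}{k-n} x^k,
\tag{F.7}
\]
where the equality at $(*)$ is proved by
\[
\frac{x^n (1 - x)^n}{n!} \overset{(5.23)}{=} \frac{x^n}{n!} \sum_{k=0}^{n} (-1)^k \binom{n}{k} x^k = \frac{1}{n!} \sum_{k=n}^{2n} (-1)^{k-n} \binom{n}{k-n} x^k.
\tag{F.8}
\]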
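Thus, for the polynomial $f$, we obtain the derivatives
\[
f^{(j)}(0) = \begin{cases} 0 & \text{for } 0 \le j < n,\\ \frac{j!}{n!}\, (-1)^{j-n} \binom{n}{j-n} & \text{for } n \le j \le 2n,\\ 0 & \text{for } 2n < j. \end{cases}
\tag{F.9}
\]
In consequence, since, for $n \le j \le 2n$, $\frac{j!}{n!} \in \mathbb{N}$ and $\binom{n}{j-n} \in \mathbb{N}$,
\[
\forall_{j\in\mathbb{N}_0} \quad f^{(j)}(0) \in \mathbb{Z}.
\tag{F.10}
\]
Moreover, since $f(1 - x) = f(x)$ for each $x \in \mathbb{R}$, and, thus, $f^{(j)}(1 - x) = (-1)^j f^{(j)}(x)$ for each $x \in \mathbb{R}$, we also have
\[
\forall_{j\in\mathbb{N}_0} \quad f^{(j)}(1) \in \mathbb{Z}.
\tag{F.11}
\]
Next, we consider another polynomial, namely
\[
g : \mathbb{R} \to \mathbb{R}, \quad g(x) := b^n \sum_{k=0}^{n} (-1)^k\, \pi^{2(n-k)}\, f^{(2k)}(x).
\tag{F.12}
\]
Due to (F.5), (F.10), (F.11), and (F.12), we have
\[
g(0) \in \mathbb{Z} \ \wedge\ g(1) \in \mathbb{Z}.
\tag{F.13}
\]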

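For each $x \in \mathbb{R}$, one calculates
\begin{align*}
g''(x) + \pi^2 g(x) &= b^n \sum_{k=0}^{n} (-1)^k\, \pi^{2(n-k)}\, f^{(2(k+1))}(x) + b^n \sum_{k=0}^{n} (-1)^k\, \pi^{2(n-(k-1))}\, f^{(2k)}(x)\\
&= b^n \sum_{k=1}^{n+1} (-1)^{k-1}\, \pi^{2(n-(k-1))}\, f^{(2k)}(x) + b^n \sum_{k=0}^{n} (-1)^k\, \pi^{2(n-(k-1))}\, f^{(2k)}(x)\\
&= b^n (-1)^n f^{(2n+2)}(x) + b^n \pi^{2n+2} f(x) = b^n \pi^{2n+2} f(x),
\tag{F.14}
\end{align*}
where the last equality holds, since $f$ is a polynomial of degree $2n$, and, thus, for
\[
h : \mathbb{R} \to \mathbb{R}, \quad h(x) := g'(x) \sin(\pi x) - \pi\, g(x) \cos(\pi x),
\tag{F.15}
\]
one obtains, for each $x \in \mathbb{R}$,
\begin{align*}
h'(x) &= g''(x) \sin(\pi x) + \pi\, g'(x) \cos(\pi x) - \pi\, g'(x) \cos(\pi x) + \pi^2 g(x) \sin(\pi x)\\
&= \big( g''(x) + \pi^2 g(x) \big) \sin(\pi x) \overset{\text{(F.14)}}{=} b^n \pi^{2n+2} f(x) \sin(\pi x)\\
&\overset{\text{(F.5)}}{=} \pi^2 a^n f(x) \sin(\pi x),
\tag{F.16}
\end{align*}
implying the function $h$ is an antiderivative of the function $x \mapsto \pi^2 a^n f(x) \sin(\pi x)$. This, together with the fundamental theorem of calculus in the form Th. 10.19(b), implies
\[
I := \frac{\pi^2 a^n}{\pi} \int_0^1 f(x) \sin(\pi x)\, dx = \frac{h(1) - h(0)}{\pi} = \frac{\pi\, g(1) + \pi\, g(0)}{\pi} = g(1) + g(0) \in \mathbb{Z}.
\tag{F.17}
\]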
On the other hand, the definition of $f$ in (F.7) yields
\[
\forall_{0<x<1} \quad 0 < f(x) < \frac{1}{n!},
\tag{F.18}
\]
and, thus, by (10.29) (i.e. by the monotonicity of the integral),
\[
0 < I < \frac{\pi\, a^n}{n!} \overset{\text{(F.6)}}{<} 1.
\tag{F.19}
\]
The contradiction between (F.19) and (F.17) establishes the case. $\blacksquare$

G Riemann Integral for C-Valued Functions

G.1 Riemann Integrability


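Notation G.1. Let $a, b \in \mathbb{R}$, $I := [a, b]$. By $\mathcal{R}(I, \mathbb{R}) := \mathcal{R}(I)$ we denote the set of all Riemann integrable functions $f : I \to \mathbb{R}$ (cf. Def. 10.5(b)).

Definition G.2. Let $a, b \in \mathbb{R}$, $I := [a, b]$. We call a function $f : I \to \mathbb{C}$ Riemann integrable if, and only if, both $\operatorname{Re} f$ and $\operatorname{Im} f$ are Riemann integrable. The set of all Riemann integrable functions $f : I \to \mathbb{C}$ is denoted by $\mathcal{R}(I, \mathbb{C})$. If $f \in \mathcal{R}(I, \mathbb{C})$, then
\[
\int_I f := \Big( \int_I \operatorname{Re} f,\ \int_I \operatorname{Im} f \Big) = \int_I \operatorname{Re} f + i \int_I \operatorname{Im} f \in \mathbb{C}
\tag{G.1}
\]
is called the Riemann integral of $f$ over $I$.

Theorem G.3. Let $I = [a, b] \subseteq \mathbb{R}$, $f : I \to \mathbb{C}$. If $f$ is continuous, then $f$ is Riemann integrable over $I$.

Proof. If $f$ is continuous, then $\operatorname{Re} f$ and $\operatorname{Im} f$ are both continuous, and, thus, the statement follows from the real-valued case of Th. 10.15(a). $\blacksquare$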
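Theorem G.4. Let $a, b \in \mathbb{R}$, $a \le b$, $I := [a, b]$.

(a) If $f, g \in \mathcal{R}(I, \mathbb{C})$, then $\bar{f}, fg \in \mathcal{R}(I, \mathbb{C})$. If, in addition, there exists $\delta > 0$ such that $|g(x)| \ge \delta$ for each $x \in I$, then $f/g \in \mathcal{R}(I, \mathbb{C})$.

(b) If $f \in \mathcal{R}(I, \mathbb{R})$ and $\phi : f(I) \to \mathbb{C}$ is Lipschitz continuous, then $\phi \circ f \in \mathcal{R}(I, \mathbb{C})$.

(c) If $f \in \mathcal{R}(I, \mathbb{C})$ and $\phi : f(I) \to \mathbb{R}$ is Lipschitz continuous, then $\phi \circ f \in \mathcal{R}(I, \mathbb{R})$.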

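Proof. (a): Since
\begin{align*}
\bar{f} &= (\operatorname{Re} f,\ -\operatorname{Im} f), \tag{G.2a}\\
fg &= (\operatorname{Re} f \operatorname{Re} g - \operatorname{Im} f \operatorname{Im} g,\ \operatorname{Re} f \operatorname{Im} g + \operatorname{Im} f \operatorname{Re} g), \tag{G.2b}\\
1/g &= (\operatorname{Re} g / |g|^2,\ -\operatorname{Im} g / |g|^2), \tag{G.2c}
\end{align*}
everything follows from the real-valued case of Th. 10.11(a) and of Th. 10.17(b),(c) (where $|g| \ge \delta > 0$ guarantees $|g|^2 \ge \delta^2 > 0$).

(b): Assume $\phi$ to be $L$-Lipschitz, $L \ge 0$. For each $x, y \in f(I)$, one has
\begin{align*}
|\operatorname{Re} \phi(x) - \operatorname{Re} \phi(y)| &\overset{\text{Th. 5.11(d)}}{\le} |\phi(x) - \phi(y)| \le L |x - y|, \tag{G.3a}\\
|\operatorname{Im} \phi(x) - \operatorname{Im} \phi(y)| &\overset{\text{Th. 5.11(d)}}{\le} |\phi(x) - \phi(y)| \le L |x - y|, \tag{G.3b}
\end{align*}
showing $\operatorname{Re} \phi$ and $\operatorname{Im} \phi$ are $L$-Lipschitz, such that $\operatorname{Re}(\phi \circ f)$ and $\operatorname{Im}(\phi \circ f)$ are Riemann integrable by Th. 10.17(a).

(c): Assume $\phi$ to be $L$-Lipschitz, $L \ge 0$. If $f \in \mathcal{R}(I, \mathbb{C})$, then $\operatorname{Re} f, \operatorname{Im} f \in \mathcal{R}(I, \mathbb{R})$, and, given $\epsilon > 0$, Riemann's integrability criterion of Th. 10.12 provides partitions $\Delta_1, \Delta_2$ of $I$ such that $R(\Delta_1, \operatorname{Re} f) - r(\Delta_1, \operatorname{Re} f) < \epsilon/(2L)$, $R(\Delta_2, \operatorname{Im} f) - r(\Delta_2, \operatorname{Im} f) < \epsilon/(2L)$, where $R$ and $r$ denote upper and lower Riemann sums, respectively (cf. (10.7)). Letting $\Delta$ be a joint refinement of $\Delta_1$ and $\Delta_2$, we have (cf. Def. 10.8(a),(b) and Th. 10.10(a))
\[
R(\Delta, \operatorname{Re} f) - r(\Delta, \operatorname{Re} f) < \epsilon/(2L), \quad R(\Delta, \operatorname{Im} f) - r(\Delta, \operatorname{Im} f) < \epsilon/(2L).
\tag{G.4}
\]
Recalling that, for each $g : I \to \mathbb{R}$ and $\Delta = (x_0, \dots, x_N) \in \mathbb{R}^{N+1}$, $N \in \mathbb{N}$, $a = x_0 < x_1 < \dots < x_N = b$, $I_j := [x_{j-1}, x_j]$, one has
\begin{align*}
r(\Delta, g) &= \sum_{j=1}^{N} m_j\, |I_j| = \sum_{j=1}^{N} m_j(g)\, (x_j - x_{j-1}), \tag{G.5a}\\
R(\Delta, g) &= \sum_{j=1}^{N} M_j\, |I_j| = \sum_{j=1}^{N} M_j(g)\, (x_j - x_{j-1}), \tag{G.5b}
\end{align*}
where
\[
m_j(g) := \inf\{g(x) : x \in I_j\}, \quad M_j(g) := \sup\{g(x) : x \in I_j\},
\tag{G.5c}
\]
we obtain, for each $\xi_j, \eta_j \in I_j$,
\begin{align*}
(\phi \circ f)(\xi_j) - (\phi \circ f)(\eta_j) &\le L\, |f(\xi_j) - f(\eta_j)| \overset{\text{Th. 5.11(d)}}{\le} L\, |\operatorname{Re} f(\xi_j) - \operatorname{Re} f(\eta_j)| + L\, |\operatorname{Im} f(\xi_j) - \operatorname{Im} f(\eta_j)|\\
&\le L \big( M_j(\operatorname{Re} f) - m_j(\operatorname{Re} f) \big) + L \big( M_j(\operatorname{Im} f) - m_j(\operatorname{Im} f) \big),
\tag{G.6}
\end{align*}
and, thus,
\begin{align*}
R(\Delta, \phi \circ f) - r(\Delta, \phi \circ f) &= \sum_{j=1}^{N} \big( M_j(\phi \circ f) - m_j(\phi \circ f) \big) |I_j|\\
&\overset{\text{(G.6)}}{\le} L \sum_{j=1}^{N} \big( M_j(\operatorname{Re} f) - m_j(\operatorname{Re} f) \big) |I_j| + L \sum_{j=1}^{N} \big( M_j(\operatorname{Im} f) - m_j(\operatorname{Im} f) \big) |I_j|\\
&= L \big( R(\Delta, \operatorname{Re} f) - r(\Delta, \operatorname{Re} f) \big) + L \big( R(\Delta, \operatorname{Im} f) - r(\Delta, \operatorname{Im} f) \big) \overset{\text{(G.4)}}{<} \epsilon.
\tag{G.7}
\end{align*}
Thus, $\phi \circ f \in \mathcal{R}(I, \mathbb{R})$ by Th. 10.12. $\blacksquare$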
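
Theorem G.5. Let $a, b \in \mathbb{R}$, $a \le b$, $I := [a, b]$.

(a) The integral is linear: More precisely, if $f, g \in \mathcal{R}(I, \mathbb{C})$ and $\lambda, \mu \in \mathbb{C}$, then $\lambda f + \mu g \in \mathcal{R}(I, \mathbb{C})$ and
\[
\int_I (\lambda f + \mu g) = \lambda \int_I f + \mu \int_I g.
\tag{G.8}
\]
(b) Let $\Delta = (y_0, \dots, y_M)$, $a = y_0 < \dots < y_M = b$, $M \in \mathbb{N}$, be a partition of $I$, $J_k := [y_{k-1}, y_k]$. Then $f \in \mathcal{R}(I, \mathbb{C})$ if, and only if, $f \in \mathcal{R}(J_k, \mathbb{C})$ for each $k \in \{1, \dots, M\}$. If $f \in \mathcal{R}(I, \mathbb{C})$, then
\[
\int_a^b f = \int_I f = \sum_{k=1}^{M} \int_{J_k} f = \sum_{k=1}^{M} \int_{y_{k-1}}^{y_k} f.
\tag{G.9}
\]
(c) For each $f \in \mathcal{R}(I, \mathbb{C})$, one has $|f| \in \mathcal{R}(I, \mathbb{R})$ and
\[
\Big| \int_I f \Big| \le \int_I |f|.
\tag{G.10}
\]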

Proof. (a): One computes, using the real-valued case of Th. 10.11(a),
Z Z Z 
(f ) = (Re Re f Im Im f ), (Re Im f + Im Re f )
I I I
 Z Z Z Z 
= Re Re f Im Im f, Re Im f + Im Re f
I I I I
Z
= f (G.11a)
I

and
Z Z Z  Z Z Z Z 
(f + g) = Re(f + g), Im(f + g) = Re f + Re g, Im g + Im g
I I I I I I I
Z Z  Z Z  Z Z
= Re f, Im f + Re g, Im g = f + g. (G.11b)
I I I I I I

(b): One computes, using the real-valued case of Th. 10.11(b),


Z Z Z  M Z M Z
! M Z
X X X
f= Re f, Im f = Re f, Im f = f. (G.12)
I I I k=1 Jk k=1 Jk k=1 Jk

(c): As the modulus is 1-Lipschitz by the inverse triangle inequality, $|f| \in \mathcal{R}(I, \mathbb{R})$ by
Th. G.4(c). Let $\Delta$ be an arbitrary partition of $I$. Then, using the notation from the
proof of Th. G.4(c) above,


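\begin{align*}
\Bigl| \bigl( \rho(\Delta, \operatorname{Re} f),\ \rho(\Delta, \operatorname{Im} f) \bigr) \Bigr|
&:= \left| \left( \sum_{j=1}^{N} \operatorname{Re} f(\xi_j)\,|I_j|,\ \sum_{j=1}^{N} \operatorname{Im} f(\xi_j)\,|I_j| \right) \right| \\
&\le \sum_{j=1}^{N} \Bigl| \bigl( \operatorname{Re} f(\xi_j),\ \operatorname{Im} f(\xi_j) \bigr) \Bigr|\,|I_j| \\
&= \sum_{j=1}^{N} |f(\xi_j)|\,|I_j| =: \rho(\Delta, |f|). \tag{G.13}
\end{align*}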

Since the intermediate Riemann sums in (G.13) converge to the respective integrals by
(10.24b), one obtains
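\[ \left| \int_I f \right| = \lim_{|\Delta| \to 0} \Bigl| \bigl( \rho(\Delta, \operatorname{Re} f),\ \rho(\Delta, \operatorname{Im} f) \bigr) \Bigr| \overset{\text{(G.13)}}{\le} \lim_{|\Delta| \to 0} \rho(\Delta, |f|) = \int_I |f|, \tag{G.14} \]

proving (G.10). ∎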
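
As an informal numerical illustration of (G.8) and (G.10), the following Python sketch approximates integrals of $\mathbb{C}$-valued functions by intermediate Riemann sums on a uniform partition, tagged at the midpoints. The helper `riemann_sum`, the example functions, and the constants `lam`, `mu` are assumptions chosen for illustration; they are not part of the notes.

```python
import cmath


def riemann_sum(f, a, b, n=2000):
    """Intermediate Riemann sum of f on the uniform partition of [a, b],
    tagged at the midpoints; works for real- and complex-valued f."""
    h = (b - a) / n
    return sum(f(a + (j + 0.5) * h) for j in range(n)) * h


def f(t):
    return cmath.exp(1j * t)   # hypothetical example f(t) = e^{it}


def g(t):
    return complex(t, t * t)   # hypothetical example g(t) = t + i t^2


lam, mu = 2 - 1j, 0.5 + 3j     # arbitrary complex coefficients
a, b = 0.0, 2.0

# Linearity (G.8): both sides agree up to rounding errors.
lhs = riemann_sum(lambda t: lam * f(t) + mu * g(t), a, b)
rhs = lam * riemann_sum(f, a, b) + mu * riemann_sum(g, a, b)
print(abs(lhs - rhs))

# Estimate (G.10): the modulus of the integral is at most the integral of the modulus.
int_f = riemann_sum(f, a, b)
int_abs_f = riemann_sum(lambda t: abs(f(t)), a, b)
print(abs(int_f), "<=", int_abs_f)
```

For this choice of $f$, the exact values are $\bigl|\int_0^2 e^{it}\,dt\bigr| = 2 \sin 1$ and $\int_0^2 |e^{it}|\,dt = 2$, so the inequality in (G.10) is strict here.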

G.2 Fundamental Theorem of Calculus


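Theorem G.6. Let $a, b \in \mathbb{R}$, $a < b$, $I := [a, b]$.

(a) If $f \in \mathcal{R}(I, \mathbb{K})$ is continuous in $\xi \in I$, then, for each $c \in I$, the function
\[ F_c : I \to \mathbb{K}, \quad F_c(x) := \int_c^x f(t)\,dt, \tag{G.15} \]
is differentiable in $\xi$ with $F_c'(\xi) = f(\xi)$. In particular, if $f \in C(I, \mathbb{K})$, then
$F_c \in C^1(I, \mathbb{K})$ and $F_c'(x) = f(x)$ for each $x \in I$.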

(b) If $F \in C^1(I, \mathbb{K})$ or, alternatively, $F : I \to \mathbb{K}$ is differentiable with integrable
derivative $F' \in \mathcal{R}(I, \mathbb{K})$, then
\[ F(b) - F(a) = [F(t)]_a^b = \int_a^b F'(t)\,dt, \tag{G.16a} \]
and
\[ F(x) = F(c) + \int_c^x F'(t)\,dt \quad \text{for each } c, x \in I. \tag{G.16b} \]

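Proof. The case $\mathbb{K} = \mathbb{R}$ was proved in Th. 10.19 and the case $\mathbb{K} = \mathbb{C}$ then follows
by applying the case $\mathbb{K} = \mathbb{R}$ to $\operatorname{Re} F_c$ and $\operatorname{Im} F_c$ (for (a)) and to $\operatorname{Re} F$ and $\operatorname{Im} F$ (for
(b)). ∎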
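
Both parts of Th. G.6 can be illustrated numerically for $\mathbb{K} = \mathbb{C}$. The Python sketch below uses the hypothetical example $F(t) = e^{it}$ with $F'(t) = i e^{it}$; the midpoint-rule helper `integrate`, the interval, and the step size of the difference quotient are assumptions made for illustration, and the comparison only holds up to discretization error.

```python
import cmath


def integrate(f, a, b, n=2000):
    """Midpoint-rule approximation of the integral of a complex-valued f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (j + 0.5) * h) for j in range(n)) * h


def F(t):
    return cmath.exp(1j * t)        # hypothetical example F(t) = e^{it}


def dF(t):
    return 1j * cmath.exp(1j * t)   # its derivative F'(t) = i e^{it}


a, b = 0.0, cmath.pi

# (G.16a): F(b) - F(a) agrees with the integral of F' over [a, b]; both are close to -2.
print(F(b) - F(a), integrate(dF, a, b))

# (G.15): a symmetric difference quotient of F_c(x) = int_c^x F'(t) dt approximates F'(x).
c, x, h = 0.0, 1.0, 1e-4
quotient = (integrate(dF, c, x + h) - integrate(dF, c, x - h)) / (2 * h)
print(quotient, dF(x))
```

The computation works on complex numbers directly, which amounts to treating real and imaginary parts separately, exactly as in the proof.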

G.3 Integration by Parts


Theorem G.7. Let $a, b \in \mathbb{R}$, $a < b$, $I := [a, b]$. If $f, g \in C^1(I, \mathbb{K})$, then the following
integration by parts formula holds:
\[ \int_a^b f g' = [f g]_a^b - \int_a^b f' g. \tag{G.17} \]

Proof. If $f, g \in C^1(I, \mathbb{K})$, then, according to the product rule, $f g \in C^1(I, \mathbb{K})$ with
$(f g)' = f' g + f g'$. Applying (G.16a), we obtain
\[ [f g]_a^b = \int_a^b (f g)' = \int_a^b f' g + \int_a^b f g', \tag{G.18} \]
which is precisely (G.17). ∎
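
The formula (G.17) can be checked numerically on a $\mathbb{C}$-valued example. The following Python sketch uses the hypothetical choices $f(t) = t$ and $g(t) = e^{it}$ on $[0, 1]$ together with a midpoint-rule helper `integrate`; all names are assumptions for illustration, and the two sides agree only up to discretization error.

```python
import cmath


def integrate(f, a, b, n=2000):
    """Midpoint-rule approximation of the integral of a complex-valued f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (j + 0.5) * h) for j in range(n)) * h


def f(t):
    return t                        # hypothetical example f(t) = t


def df(t):
    return 1.0                      # f'(t) = 1


def g(t):
    return cmath.exp(1j * t)        # hypothetical example g(t) = e^{it}


def dg(t):
    return 1j * cmath.exp(1j * t)   # g'(t) = i e^{it}


a, b = 0.0, 1.0

# Left-hand side of (G.17): integral of f g'.
lhs = integrate(lambda t: f(t) * dg(t), a, b)
# Right-hand side of (G.17): boundary term minus integral of f' g.
rhs = f(b) * g(b) - f(a) * g(a) - integrate(lambda t: df(t) * g(t), a, b)

print(lhs, rhs)
```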

G.4 Change of Variables


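Theorem G.8. Let $I, J \subseteq \mathbb{R}$ be intervals, $\phi \in C^1(I)$ and $f \in C(J, \mathbb{K})$. If $\phi(I) \subseteq J$,
then the following change of variables formula holds for each $a, b \in I$:
\[ \int_{\phi(a)}^{\phi(b)} f = \int_{\phi(a)}^{\phi(b)} f(x)\,dx = \int_a^b f(\phi(t))\,\phi'(t)\,dt = \int_a^b (f \circ \phi)\,\phi'. \tag{G.19} \]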

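Proof. The case $\mathbb{K} = \mathbb{R}$ was proved in Th. 10.24 and then the computation
\begin{align*}
\int_{\phi(a)}^{\phi(b)} f
&= \left( \int_{\phi(a)}^{\phi(b)} \operatorname{Re} f,\ \int_{\phi(a)}^{\phi(b)} \operatorname{Im} f \right) \\
&= \left( \int_a^b \bigl((\operatorname{Re} f) \circ \phi\bigr)\,\phi',\ \int_a^b \bigl((\operatorname{Im} f) \circ \phi\bigr)\,\phi' \right) = \int_a^b (f \circ \phi)\,\phi' \tag{G.20}
\end{align*}
establishes the case $\mathbb{K} = \mathbb{C}$. ∎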
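
A quick numerical check of (G.19) for a $\mathbb{C}$-valued integrand is sketched below in Python. The substitution $\phi(t) = t^2$, the integrand $f(x) = e^{ix}$, the limits $a = 0$, $b = 2$, and the midpoint-rule helper `integrate` are hypothetical choices made for illustration; agreement is only expected up to discretization error.

```python
import cmath


def integrate(f, a, b, n=2000):
    """Midpoint-rule approximation of the integral of a complex-valued f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (j + 0.5) * h) for j in range(n)) * h


def f(x):
    return cmath.exp(1j * x)   # hypothetical integrand f(x) = e^{ix}


def phi(t):
    return t * t               # hypothetical substitution phi(t) = t^2


def dphi(t):
    return 2.0 * t             # phi'(t) = 2t


a, b = 0.0, 2.0

# Left-hand side of (G.19): integral of f from phi(a) to phi(b).
lhs = integrate(f, phi(a), phi(b))
# Right-hand side of (G.19): integral of (f o phi) * phi' from a to b.
rhs = integrate(lambda t: f(phi(t)) * dphi(t), a, b)

# Exact value from the antiderivative -i e^{ix} of f, for comparison.
exact = (cmath.exp(4j) - 1) / 1j

print(lhs, rhs, exact)
```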

References
[EFT07] H.-D. Ebbinghaus, J. Flum, and W. Thomas. Einführung in die
mathematische Logik, 5th ed. Spektrum Akademischer Verlag, Heidelberg, 2007
(German).

[EHH+95] H.-D. Ebbinghaus, H. Hermes, F. Hirzebruch, M. Koecher,
K. Mainzer, J. Neukirch, A. Prestel, and R. Remmert. Numbers.
Graduate Texts in Mathematics, Vol. 123, Springer-Verlag, New York, 1995,
corrected 3rd printing.

[Kun80] Kenneth Kunen. Set Theory. Studies in Logic and the Foundations of
Mathematics, Vol. 102, North-Holland, Amsterdam, 1980.

[Lan65] Edmund Landau. Grundlagen der Analysis, 4th ed. American
Mathematical Society, New York, 1965.

[Wal02] Wolfgang Walter. Analysis 2, 5th ed. Springer-Verlag, Berlin, 2002
(German).
