
Lecture notes analysis

Jim Portegies

January 8, 2024
Contents

1 Introduction 12

2 Sets, spaces and functions 13


2.1 What analysis is about . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Functions between sets . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Set-theoretic definition of functions . . . . . . . . . . . . . . . 15
2.4 Metric spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.5 Normed vector spaces . . . . . . . . . . . . . . . . . . . . . . 17
2.6 The reverse triangle inequality . . . . . . . . . . . . . . . . . 22
2.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.7.1 Blue exercises . . . . . . . . . . . . . . . . . . . . . . . 23
2.7.2 Orange exercises . . . . . . . . . . . . . . . . . . . . . 24

3 Proofs in analysis 25
3.1 What is a proof? . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Expectations on proofs . . . . . . . . . . . . . . . . . . . . . . 26
3.3 Prove statements block by block . . . . . . . . . . . . . . . . 27
3.4 Directly proving “for all” statements . . . . . . . . . . . . . . 27
3.5 Directly proving “there exists” statements . . . . . . . . . . . 29
3.6 Trying to finish the proof . . . . . . . . . . . . . . . . . . . . . 29


3.7 Let’s try again . . . . . . . . . . . . . . . . . . . . . . . . . . . 31


3.8 (Natural) induction . . . . . . . . . . . . . . . . . . . . . . . . 33
3.9 Negations and quantifiers . . . . . . . . . . . . . . . . . . . . 33
3.10 Proofs by contradiction . . . . . . . . . . . . . . . . . . . . . . 34
3.11 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.11.1 Blue exercises . . . . . . . . . . . . . . . . . . . . . . . 35
3.11.2 Orange exercises . . . . . . . . . . . . . . . . . . . . . 35

4 Real numbers 36
4.1 What are the real numbers? . . . . . . . . . . . . . . . . . . . 36
4.2 The completeness axiom . . . . . . . . . . . . . . . . . . . . . 37
4.3 Alternative characterizations of suprema and infima . . . . . 41
4.4 Maxima and minima . . . . . . . . . . . . . . . . . . . . . . . 43
4.5 The Archimedean property . . . . . . . . . . . . . . . . . . . 44
4.6 Sets can be complicated . . . . . . . . . . . . . . . . . . . . . 49
4.7 Computation rules for suprema . . . . . . . . . . . . . . . . . 49
4.8 Bernoulli’s inequality . . . . . . . . . . . . . . . . . . . . . . . 51
4.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.9.1 Blue exercises . . . . . . . . . . . . . . . . . . . . . . . 52
4.9.2 Orange exercises . . . . . . . . . . . . . . . . . . . . . 53

5 Sequences 54
5.1 A sequence is a function from the natural numbers . . . . . . 54
5.2 Terminology around sequences . . . . . . . . . . . . . . . . . 55
5.3 Convergence of sequences . . . . . . . . . . . . . . . . . . . . 57
5.4 Examples and limits of simple sequences . . . . . . . . . . . 58
5.5 Uniqueness of limits . . . . . . . . . . . . . . . . . . . . . . . 59
5.6 More properties of convergent sequences . . . . . . . . . . . 60

5.7 Limit theorems for sequences taking values in a normed vector space . . . . . 65
5.8 Index shift . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.9.1 Blue exercises . . . . . . . . . . . . . . . . . . . . . . . 67
5.9.2 Orange exercises . . . . . . . . . . . . . . . . . . . . . 67

6 Real-valued sequences 68
6.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.2 Monotone, bounded sequences are convergent . . . . . . . . 69
6.3 Limit theorems . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.4 The squeeze theorem . . . . . . . . . . . . . . . . . . . . . . . 75
6.5 Divergence to ∞ and −∞ . . . . . . . . . . . . . . . . . . . . . 77
6.6 Limit theorems for improper limits . . . . . . . . . . . . . . . 78
6.7 Standard sequences . . . . . . . . . . . . . . . . . . . . . . . . 79
6.7.1 Geometric sequence . . . . . . . . . . . . . . . . . . . 79
6.7.2 The nth root of n . . . . . . . . . . . . . . . . . . . . . 80
6.7.3 The number e . . . . . . . . . . . . . . . . . . . . . . . 81
6.7.4 Exponentials beat powers . . . . . . . . . . . . . . . . 83
6.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.8.1 Blue exercises . . . . . . . . . . . . . . . . . . . . . . . 86
6.8.2 Orange exercises . . . . . . . . . . . . . . . . . . . . . 86

7 Series 87
7.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7.2 Geometric series . . . . . . . . . . . . . . . . . . . . . . . . . . 88
7.3 The harmonic series . . . . . . . . . . . . . . . . . . . . . . . . 89
7.4 The hyperharmonic series . . . . . . . . . . . . . . . . . . . . 90

7.5 Only the tail matters for convergence . . . . . . . . . . . . . . 91


7.6 Divergence test . . . . . . . . . . . . . . . . . . . . . . . . . . 94
7.7 Limit laws for series . . . . . . . . . . . . . . . . . . . . . . . 95
7.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7.8.1 Blue exercises . . . . . . . . . . . . . . . . . . . . . . . 96
7.8.2 Orange exercises . . . . . . . . . . . . . . . . . . . . . 96

8 Series with positive terms 98


8.1 Comparison test . . . . . . . . . . . . . . . . . . . . . . . . . . 98
8.2 Limit comparison test . . . . . . . . . . . . . . . . . . . . . . . 101
8.3 Ratio test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
8.4 Root test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
8.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
8.5.1 Blue Exercises . . . . . . . . . . . . . . . . . . . . . . . 109
8.5.2 Orange Exercises . . . . . . . . . . . . . . . . . . . . . 109

9 Series with general terms 110


9.1 Series with real terms: the Leibniz test . . . . . . . . . . . . . 111
9.2 Series characterization of completeness in normed vector spaces . . . . . 113
9.3 The Cauchy product . . . . . . . . . . . . . . . . . . . . . . . 115
9.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
9.4.1 Blue exercises . . . . . . . . . . . . . . . . . . . . . . . 119
9.4.2 Orange exercises . . . . . . . . . . . . . . . . . . . . . 119

10 Subsequences, lim sup and lim inf 120


10.1 Index sequences and subsequences . . . . . . . . . . . . . . . 120
10.2 (Sequential) accumulation points . . . . . . . . . . . . . . . . 122

10.3 Subsequences of a converging sequence . . . . . . . . . . . . 122


10.4 lim sup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
10.5 lim inf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
10.6 Relations between lim, lim inf and lim sup . . . . . . . . . . 131
10.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
10.7.1 Blue exercises . . . . . . . . . . . . . . . . . . . . . . . 132
10.7.2 Orange exercises . . . . . . . . . . . . . . . . . . . . . 133

11 Point-set topology of metric spaces 134


11.1 Open sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
11.2 Closed sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
11.3 Cauchy sequences . . . . . . . . . . . . . . . . . . . . . . . . . 145
11.4 Completeness . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
11.5 Series characterization of completeness in normed vector spaces . . . . . 150
11.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
11.6.1 Blue exercises . . . . . . . . . . . . . . . . . . . . . . . 153
11.6.2 Orange exercises . . . . . . . . . . . . . . . . . . . . . 153

12 Compactness 154
12.1 Definition of (sequential) compactness . . . . . . . . . . . . . 154
12.2 Boundedness and total boundedness . . . . . . . . . . . . . . 155
12.3 Alternative characterization of compactness . . . . . . . . . . 158
12.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
12.4.1 Blue exercises . . . . . . . . . . . . . . . . . . . . . . . 163
12.4.2 Orange exercises . . . . . . . . . . . . . . . . . . . . . 163

13 Limits and continuity 165



13.1 Accumulation points . . . . . . . . . . . . . . . . . . . . . . . 166


13.2 Limit in an accumulation point . . . . . . . . . . . . . . . . . 166
13.3 Uniqueness of limits . . . . . . . . . . . . . . . . . . . . . . . 167
13.4 Sequence characterization of limits . . . . . . . . . . . . . . . 168
13.5 Limit laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
13.6 Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
13.7 Sequence characterization of continuity . . . . . . . . . . . . 171
13.8 Rules for continuous functions . . . . . . . . . . . . . . . . . 172
13.9 Images of compact sets under continuous functions are compact . . . . . 173
13.10 Uniform continuity . . . . . . . . . . . . . . . . . . . . . . 173
13.11 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 175
13.11.1 Blue exercises . . . . . . . . . . . . . . . . . . . . . . . 175
13.11.2 Orange exercises . . . . . . . . . . . . . . . . . . . . . 176

14 Real-valued functions 177


14.1 More limit laws . . . . . . . . . . . . . . . . . . . . . . . . . . 177
14.2 Building new continuous functions . . . . . . . . . . . . . . . 178
14.3 Continuity of standard functions . . . . . . . . . . . . . . . . 178
14.4 Limits from the left and from the right . . . . . . . . . . . . . 179
14.5 The extended real line . . . . . . . . . . . . . . . . . . . . . . 180
14.6 Limits to ∞ or −∞ . . . . . . . . . . . . . . . . . . . . . . . . 181
14.7 Limits at ∞ and −∞ . . . . . . . . . . . . . . . . . . . . . . . . 182
14.8 The Intermediate Value Theorem . . . . . . . . . . . . . . . . 188
14.9 The Extreme Value Theorem . . . . . . . . . . . . . . . . . . . 188
14.10 Equivalence of norms . . . . . . . . . . . . . . . . . . . . . 189
14.11 Bounded linear maps and operator norms . . . . . . . . . . . 193
14.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 198

14.12.1 Blue exercises . . . . . . . . . . . . . . . . . . . . . . . 198


14.12.2 Orange exercises . . . . . . . . . . . . . . . . . . . . . 199

15 Differentiability 201
15.1 Definition of differentiability . . . . . . . . . . . . . . . . . . 202
15.2 The derivative as a function . . . . . . . . . . . . . . . . . . . 204
15.3 Constant and linear maps are differentiable . . . . . . . . . . 205
15.4 Bases and coordinates . . . . . . . . . . . . . . . . . . . . . . 205
15.5 The matrix representation . . . . . . . . . . . . . . . . . . . . 208
15.6 The chain rule . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
15.7 Sum, product and quotient rules . . . . . . . . . . . . . . . . 213
15.8 Differentiability of components . . . . . . . . . . . . . . . . . 214
15.9 Differentiability implies continuity . . . . . . . . . . . . . . . 215
15.10 Derivative vanishes in local maxima and minima . . . . . . . 216
15.11 The mean-value theorem . . . . . . . . . . . . . . . . . . . 218
15.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 219
15.12.1 Blue exercises . . . . . . . . . . . . . . . . . . . . . . . 219
15.12.2 Orange exercises . . . . . . . . . . . . . . . . . . . . . 220

16 Differentiability of standard functions 221


16.1 Global context . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
16.2 Polynomials and rational functions are differentiable . . . . 222
16.3 Differentiability of other standard functions . . . . . . . . . . 223
16.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

17 Directional and partial derivatives 226


17.1 A recurring and very important construction . . . . . . . . . 226
17.2 Directional derivative . . . . . . . . . . . . . . . . . . . . . . . 227

17.3 Partial derivatives . . . . . . . . . . . . . . . . . . . . . . . . . 230


17.4 The Jacobian of a map . . . . . . . . . . . . . . . . . . . . . . 233
17.5 Linearization and tangent planes . . . . . . . . . . . . . . . . 234
17.6 The gradient of a function . . . . . . . . . . . . . . . . . . . . 235
17.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

18 The Mean-Value Inequality 238


18.1 The mean-value inequality for functions defined on an interval . . . . . 238
18.2 The mean-value inequality for functions on general domains 241
18.3 Continuous partial derivatives implies differentiability . . . 243
18.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246

19 Higher order derivatives 248


19.1 Definition of higher order derivatives . . . . . . . . . . . . . 248
19.2 Multi-linear maps . . . . . . . . . . . . . . . . . . . . . . . . . 249
19.3 Relation to n-fold directional derivatives . . . . . . . . . . . . 252
19.4 A criterion for higher differentiability . . . . . . . . . . . . . 253
19.5 Symmetry of second order derivatives . . . . . . . . . . . . . 254
19.6 Symmetry of higher-order derivatives . . . . . . . . . . . . . 256
19.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257

20 Polynomials and approximation by polynomials 259


20.1 Homogeneous polynomials . . . . . . . . . . . . . . . . . . . 259
20.2 Taylor’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . 262
20.3 Taylor approximations of standard functions . . . . . . . . . 268
20.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269

21 Banach fixed point theorem 271



21.1 The Banach fixed point theorem . . . . . . . . . . . . . . . . . 271


21.2 An example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
21.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277

22 Implicit function theorem 279


22.1 The objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
22.2 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
22.3 The implicit function theorem . . . . . . . . . . . . . . . . . . 282
22.4 The inverse function theorem . . . . . . . . . . . . . . . . . . 290
22.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291

23 Function sequences 292


23.1 Pointwise convergence . . . . . . . . . . . . . . . . . . . . . . 292
23.2 Uniform convergence . . . . . . . . . . . . . . . . . . . . . . . 293
23.3 Preservation of continuity under uniform convergence . . . 294
23.4 Differentiability theorem . . . . . . . . . . . . . . . . . . . . . 296
23.5 The normed vector space of bounded functions . . . . . . . . 298
23.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299

24 Function series 300


24.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
24.2 The Weierstrass M-test . . . . . . . . . . . . . . . . . . . . . . 300
24.3 Conditions for differentiation of function series . . . . . . . . 304
24.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305

25 Power series 308


25.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
25.2 Convergence of power series . . . . . . . . . . . . . . . . . . 308

25.3 Standard functions defined as power series . . . . . . . . . . 312


25.4 Operations with power series . . . . . . . . . . . . . . . . . . 313
25.5 Differentiation of power series . . . . . . . . . . . . . . . . . 315
25.6 Taylor series . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
25.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317

26 Riemann integration in one dimension 319


26.1 Riemann integrable functions and the Riemann integral . . . 319
26.2 Sums, products of Riemann integrable functions . . . . . . . 324
26.3 Continuous functions are Riemann integrable . . . . . . . . . 325
26.4 Fundamental theorem of calculus . . . . . . . . . . . . . . . . 327
26.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330

27 Riemann integration in multiple dimensions 332


27.1 Partitions in multiple dimensions . . . . . . . . . . . . . . . . 332
27.2 Riemann integral on rectangles in Rd . . . . . . . . . . . . . . 333
27.3 Properties of the multi-dimensional Riemann integral . . . . 335
27.4 Continuous functions are Riemann integrable . . . . . . . . . 336
27.5 Fubini’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . 336
27.6 The (topological) boundary of a set . . . . . . . . . . . . . . . 337
27.7 Jordan content . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
27.8 Integration over general domains . . . . . . . . . . . . . . . . 339
27.9 The volume of bounded sets . . . . . . . . . . . . . . . . . . . 339
27.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 340

28 Change-of-variables Theorem 341


28.1 The Change-of-variables Theorem . . . . . . . . . . . . . . . 341
28.2 Polar coordinates . . . . . . . . . . . . . . . . . . . . . . . . . 342

28.3 Cylindrical coordinates . . . . . . . . . . . . . . . . . . . . . . 342


28.4 Spherical coordinates . . . . . . . . . . . . . . . . . . . . . . . 343
28.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343

A Best practices 345


Chapter 1

Introduction

This course aims to help you develop an understanding of mathematical analysis, and help you learn how to prove statements in analysis.
We will use a certain amount of abstraction in this course. For instance, we will set up Analysis 1 around the concept of a metric space. The reason is that I believe this level of abstraction will make things simpler.
The lecture notes are still under construction, which means that they will keep getting updated as the course progresses. Any feedback is more than welcome.

Chapter 2

Sets, spaces and functions

2.1 What analysis is about


Mathematical analysis is, for a large part, the rigorous study of the approximate behavior of functions. Consider for instance the function b : N → Q given by

b(n) := 1/(n + 1).
As n becomes very large, the values b(n) get very close to zero. Another example is the function f : Q → Q given by

f(x) = x² + 2.
As x becomes very close to 1, the values f(x) get very close to 3. But the terms that I used, such as “very large” and “very close”, do not have a precise meaning. In analysis, we introduce precise concepts, such as limits, that allow you to make precise statements and give rigorous proofs.
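To see this approximate behavior concretely, here is a small Python sketch of my own (purely illustrative, not part of the formal development) that evaluates b and f at a few points:

```python
from fractions import Fraction

# b(n) = 1/(n + 1): as n becomes very large, b(n) gets very close to zero.
def b(n):
    return Fraction(1, n + 1)

# f(x) = x^2 + 2: as x becomes very close to 1, f(x) gets very close to 3.
def f(x):
    return x * x + 2

print([b(n) for n in (0, 9, 99, 999)])                 # 1, 1/10, 1/100, 1/1000
print([f(1 + Fraction(1, 10**k)) for k in (1, 2, 3)])  # values approach 3
```

Making statements like “b(n) gets very close to zero” precise is exactly what the definition of a limit, introduced later in these notes, accomplishes.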

2.2 Functions between sets


The main characters in analysis are therefore functions. We use the notation f : X → Y to indicate that f is a function from a set X to a set Y. I like to think of a function f as a machine, with input from a set X and output

in a set Y. If you have ever written a computer program before, then it can also help to see the analogy with functions in computer science. They take in data of some type (such as the type of integers) and produce data of another type (such as strings).
The most important example of a set Y in this course is the set R of the real numbers. In the next chapter, we will be more precise about what the real numbers are, but for now we assume that you have a working knowledge of them.
Other examples of sets are the set of natural numbers N, the empty set ∅ (which has no elements at all), the set of integers Z, the set of rational numbers Q, and of course any subsets of those, such as the following subset of Z,

{2k | k ∈ Z},

which is the set of all even integers. More complicated examples of sets also exist: we can look at the set of all polynomials of degree at most 5, or the set of all possible grammatically correct sentences in the English language, or the set of all movies on Netflix. Whenever you see the word “set”, you can imagine any of these options. I also encourage you to think of examples yourself; it can be helpful to be creative.
Here are some examples of functions in mathematics.

Example 2.2.1. The function b : N → R defined by

b(n) := 1/(1 + n²), for n ∈ N,

is an example of what we call a real-valued sequence, or a sequence of real numbers. In general, a real-valued sequence is a function from the natural numbers N to the real numbers R.

Example 2.2.2. The function b : N → N defined by

b(n) := n³, for n ∈ N,

is an example of a sequence of natural numbers. In general, a sequence of natural numbers is a function from the natural numbers N to the natural numbers N.

Example 2.2.3. If M is an (n × m) matrix, with n rows and m columns, then L : Rm → Rn defined by

L(x) := Mx

is a function from Rm to Rn .
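As a computational analogy (an illustrative sketch of mine, not part of the notes themselves), the function L can be mimicked in plain Python, storing the matrix as a list of rows:

```python
# L(x) := Mx, where M is an (n x m) matrix given as a list of n rows,
# each of length m, and x is a vector of length m.
def L(M, x):
    return [sum(row[j] * x[j] for j in range(len(x))) for row in M]

M = [[1, 2, 3],
     [4, 5, 6]]          # a (2 x 3) matrix, so L maps R^3 to R^2
print(L(M, [1, 0, -1]))  # [1 - 3, 4 - 6] = [-2, -2]
```

Note that the shapes match the statement: a (2 × 3) matrix takes vectors of length 3 to vectors of length 2.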

2.3 Set-theoretic definition of functions


So that you have seen it at least once, let us also mention the strict mathematical definition of a function in set theory. Within set theory, a function f : X → Y between two sets X and Y is defined as a subset (let’s call it F) of the Cartesian product X × Y, such that for every x ∈ X, there is exactly one y ∈ Y such that ( x, y) ∈ F. We usually write this unique value y as f ( x ).
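The “exactly one” requirement can be checked mechanically on finite sets. The following Python sketch (illustrative only; the names are mine) represents a candidate function as a set of pairs and tests the condition:

```python
# A subset F of X x Y is a function on X precisely when every x in X
# occurs as the first component of exactly one pair in F.
def is_function(F, X):
    return all(sum(1 for (a, _) in F if a == x) == 1 for x in X)

X = {0, 1, 2}
F_good = {(0, 0), (1, 1), (2, 4)}   # the squaring function on X
F_bad = {(0, 0), (0, 1), (2, 4)}    # 0 gets two values and 1 gets none
print(is_function(F_good, X), is_function(F_bad, X))  # True False
```

F_bad fails on both counts: the element 0 is paired with two different values, and the element 1 is paired with none.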

2.4 Metric spaces


In analysis, we don’t just work with arbitrary sets. It is crucial that we can make sense of what it means for two points in a set to be close, or far away. We make this precise through the concept of a distance. A distance is really nothing more than a function that takes a pair of points in a set X as input and produces a real number as output, satisfying the properties listed in the definition below.

Definition 2.4.1 (distance). Let X be a set. A function d : X × X → R is called a distance on X if

i. (positivity) For all a, b ∈ X, it holds that d( a, b) ≥ 0.

ii. (non-degeneracy) For all a, b ∈ X, if d( a, b) = 0 then a = b.

iii. (symmetry) For all a, b ∈ X, d( a, b) = d(b, a).



iv. (triangle inequality) For all a, b, c ∈ X,

d( a, c) ≤ d( a, b) + d(b, c).

v. (reflexivity) For all a ∈ X, d( a, a) = 0.

Usually, conditions (ii) and (v) are combined into the single condition that for all a, b ∈ X, d( a, b) = 0 if and only if a = b.

Definition 2.4.2 (metric space). A metric space ( X, dist) is a pair of a set X and a distance function dist : X × X → R on X.

Example 2.4.3 (example of a metric space). Let X := {red, yellow, blue} be the set of (traditional) primary colors. The following table describes a function d : X × X → R as follows: the entry in the row corresponding to a color x and the column corresponding to a color y is the value d( x, y). So for instance d(yellow, blue) = 3.

         red   yellow   blue
red       0      1       2
yellow    1      0       3
blue      2      3       0

We claim that d : X × X → R is indeed a distance on X. We can loosely verify the properties in Definition 2.4.1 quite easily by inspecting the table. For instance, because all entries in the table are nonnegative, the positivity property holds for the function d.ᵃ Similarly, we can verify the non-degeneracy, the symmetry, the triangle inequality and the reflexivity. We conclude that d is a distance on X, and the pair ( X, d) is a metric space.
ᵃ This isn’t quite a rigorous proof. A full proof is a bit cumbersome because we really need to go through all possible cases, i.e. through all possible combinations of colors.
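The case-by-case verification that the footnote alludes to can be delegated to a computer. Here is a Python sketch of mine (illustrative, not a substitute for a written proof) that checks all five axioms over every combination of colors:

```python
# Brute-force check of the distance axioms for the primary-color table,
# running through all combinations of colors.
X = ["red", "yellow", "blue"]
d = {("red", "red"): 0, ("red", "yellow"): 1, ("red", "blue"): 2,
     ("yellow", "red"): 1, ("yellow", "yellow"): 0, ("yellow", "blue"): 3,
     ("blue", "red"): 2, ("blue", "yellow"): 3, ("blue", "blue"): 0}

positivity = all(d[a, b] >= 0 for a in X for b in X)
non_degeneracy = all(a == b for a in X for b in X if d[a, b] == 0)
symmetry = all(d[a, b] == d[b, a] for a in X for b in X)
triangle = all(d[a, c] <= d[a, b] + d[b, c]
               for a in X for b in X for c in X)
reflexivity = all(d[a, a] == 0 for a in X)
print(all([positivity, non_degeneracy, symmetry, triangle, reflexivity]))  # True
```

The tightest case is the triangle inequality with d(yellow, blue) = 3 ≤ d(yellow, red) + d(red, blue) = 1 + 2, which holds with equality.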

It will be very useful to give a name to the set of all points within a certain distance of a given point in a metric space.

Definition 2.4.4 ((open) ball). Let ( X, dist) be a metric space. The (open)
ball around a point p ∈ X with radius r > 0 is denoted by B( p, r ) and
is defined as the set

B( p, r ) := {q ∈ X | dist(q, p) < r }.

Example 2.4.5. In Example 2.4.3, the open ball B(yellow, 3/2) is the set {yellow, red}, because d(yellow, yellow) = 0, which is strictly less than 3/2, and d(yellow, red) = 1 < 3/2, but d(yellow, blue) = 3 > 3/2.
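Computed directly from the color table, the ball comes out the same way. A small self-contained Python sketch (illustrative only):

```python
# B(p, r) = { q in X : dist(q, p) < r } for the three-color metric
# of Example 2.4.3.
X = ["red", "yellow", "blue"]
dist = {("red", "red"): 0, ("red", "yellow"): 1, ("red", "blue"): 2,
        ("yellow", "red"): 1, ("yellow", "yellow"): 0, ("yellow", "blue"): 3,
        ("blue", "red"): 2, ("blue", "yellow"): 3, ("blue", "blue"): 0}

def ball(p, r):
    return {q for q in X if dist[q, p] < r}

print(sorted(ball("yellow", 1.5)))  # ['red', 'yellow']
```

Shrinking the radius shrinks the ball: for instance B(yellow, 1/2) contains only yellow itself.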

2.5 Normed vector spaces


Great examples of metric spaces are those constructed from normed vector spaces. If you have had a course in linear algebra, you will most likely have seen the following definition of a vector space over a field K. In these lecture notes, we will mostly consider vector spaces over R, unless we explicitly indicate otherwise. It is not necessary to learn this definition by heart. What is important to take away is that a vector space over a field is a set, together with operations called scalar multiplication and addition, that together satisfy some properties.

Definition 2.5.1 (vector space). Let K be a field (or if you don’t know
what a field is, let K be the real or complex numbers or the rational
numbers). A vector space (V, ·, +, 0) over the field K is a set V, to-
gether with functions · : K × V → V and + : V × V → V called scalar
multiplication and addition, and a particular element 0 ∈ V such that
the following properties are satisfied.

i. (commutativity of addition) for all v, w ∈ V,

v+w = w+v

ii. (associativity of addition) for all u, v, w ∈ V,

u + (v + w) = (u + v) + w

iii. (0 is the unit for addition) for all v ∈ V,

v+0 = 0+v = v

iv. every vector v has an opposite (−v) ∈ V, such that

v + (−v) = 0.

v. (1 ∈ K is the unit for scalar multiplication) for all v ∈ V,

1 · v = v.

vi. (associativity of scalar multiplication) for all λ, µ ∈ K and all


v ∈ V,
λ · (µ · v) = (λµ) · v.

vii. for all v, w ∈ V and λ ∈ K,

λ · (v + w) = λ · v + λ · w.

viii. for all λ, µ ∈ K and all v ∈ V,

(λ + µ) · v = λ · v + µ · v.

One of the reasons that vector spaces are so important in analysis is that
for functions defined on vector spaces, we can introduce the concept of
differentiation, while this is in general problematic for functions defined on
metric spaces.

Definition 2.5.2 (norm). Let V be a vector space. We say that a function k · k : V → R is a norm if

i. (positivity) For all v ∈ V, it holds that kvk ≥ 0.



ii. (non-degeneracy) For all v ∈ V, if kvk = 0 then v = 0.

iii. (absolute homogeneity) For all λ ∈ R and all v ∈ V it holds that


kλvk = |λ|kvk.
iv. (triangle inequality) For all v, w ∈ V it holds that

k v + w k ≤ k v k + k w k.

Example 2.5.3. Denote by Rd the vector space of column vectors of length d with values in R. For a vector x ∈ Rd , denote its components by x1 , . . . , xd . The Euclidean norm is defined as

k x k2 := √( ∑_{i=1}^{d} xi² ) = √( x1² + x2² + · · · + xd² ).

In fact, the norm k · k2 comes from an inner product: if we define the standard inner product

( x, y) := ∑_{i=1}^{d} xi yi ,

then k x k2² = ( x, x ). The triangle inequality follows from the Cauchy-Schwarz inequality in linear algebra. Let a, b ∈ Rd . Then

k a + b k2² = k a k2² + 2( a, b) + k b k2²
            ≤ k a k2² + 2 k a k2 k b k2 + k b k2²
            = ( k a k2 + k b k2 )².
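The inequality derived above can also be sanity-checked numerically. The following Python sketch (an experiment of mine, not part of the proof) tests the triangle inequality for the Euclidean norm on random vectors in R⁵:

```python
import math
import random

# Check || a + b ||_2 <= || a ||_2 + || b ||_2 on random vectors in R^5.
def norm2(x):
    return math.sqrt(sum(t * t for t in x))

random.seed(0)
for _ in range(1000):
    a = [random.uniform(-10, 10) for _ in range(5)]
    b = [random.uniform(-10, 10) for _ in range(5)]
    s = [ai + bi for ai, bi in zip(a, b)]
    assert norm2(s) <= norm2(a) + norm2(b) + 1e-9  # tolerance for rounding
print("triangle inequality held in all 1000 trials")
```

Of course, no number of trials replaces the Cauchy-Schwarz argument; the experiment only illustrates the statement.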

Example 2.5.4. We now specialize the above example to the case d = 1. Then the vector space is R itself, and the norm is given by


k x k2 = √(x²) = { x, if x ≥ 0; −x, if x < 0 },

which is the absolute value of x. We will denote this norm by | · | : R → R.

The following proposition tells us that if we have a normed vector space (V, k · k), we can also interpret it as a metric space.

Proposition 2.5.5. Let (V, k · k) be a normed vector space. Define the function dk·k : V × V → R by

dk·k ( x, y) := k x − yk.

Then dk·k is a distance on V.

Proof. We need to show that dk·k satisfies all properties of a distance on V. We will check these properties one by one.
First we check positivity. We need to show that

for all v, w ∈ V
dk·k (v, w) ≥ 0.

Since the statement we need to show starts with “for all v, w ∈ V”, we
start by writing:
Let v, w ∈ V. Then

dk·k (v, w) = kv − wk ≥ 0

by positivity of the norm.


We will now show non-degeneracy. We need to show that

for all v, w ∈ V
if dk·k (v, w) = 0 then v = w.

Let v, w ∈ V and suppose that dk·k (v, w) = 0. Then

kv − wk = dk·k (v, w) = 0.

It follows by non-degeneracy of the norm that v − w = 0. We conclude that v = w.
We will show symmetry. We need to show that

for all v, w ∈ V
dk·k (v, w) = dk·k (w, v).

Let v, w ∈ V. Then we use absolute homogeneity of the norm to find

dk·k (v, w) = kv − wk
= k(−1) · (w − v)k
= | − 1|kw − vk
= kw − vk = dk·k (w, v).

We now show the triangle inequality. We need to show that

for all u, v, w ∈ V
dk·k (u, w) ≤ dk·k (u, v) + dk·k (v, w).

Let u, v, w ∈ V. Then

dk·k (u, w) = ku − wk
= k(u − v) + (v − w)k
≤ ku − vk + kv − wk
= dk·k (u, v) + dk·k (v, w).

We finally show reflexivity. We need to show that

for all u ∈ V,
dk·k (u, u) = 0.

Let u ∈ V. Then by the absolute homogeneity of the norm,

dk·k (u, u) = ku − uk
= k0 · u k
= |0| · k u k
= 0 · kuk
= 0.

Although, strictly speaking, (V, dk·k ) and (V, k · k) are different objects (one is a metric space and the other is a normed vector space), we will usually be a bit sloppy about this difference.

Remark 2.5.6 (Notation for Euclidean distance on Rd and R). We will usually write distRd instead of distk·k2 for the standard (Euclidean) distance on Rd . In particular, if d ≥ 2, we have
distRd (v, w) = k v − w k2 = √( ∑_{i=1}^{d} (vi − wi)² ),

and if d = 1 we simply have

distR (v, w) = |v − w|.

And if there is no room for confusion, we will just leave out the subscript altogether.

2.6 The reverse triangle inequality

Lemma 2.6.1 (Reverse triangle inequality). Let (V, k · k) be a normed vector space. Then for all v, w ∈ V,

|kvk − kwk| ≤ kv − wk.

Proof. Let v, w ∈ V. Define a := v − w and b := w. It follows by the triangle inequality that

k a + bk ≤ k ak + kbk

so that
kvk ≤ kv − wk + kwk
and
k v k − k w k ≤ k v − w k.
Similarly,
k w k − k v k ≤ k w − v k = k v − w k.
We conclude that
|kvk − kwk| ≤ kv − wk.
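As with the ordinary triangle inequality, a quick numerical experiment (illustrative only, not part of the proof) is a good way to convince yourself of the statement:

```python
import math
import random

# Check | ||v|| - ||w|| | <= || v - w || for the Euclidean norm on R^3.
def norm2(x):
    return math.sqrt(sum(t * t for t in x))

random.seed(1)
for _ in range(1000):
    v = [random.uniform(-5, 5) for _ in range(3)]
    w = [random.uniform(-5, 5) for _ in range(3)]
    diff = [vi - wi for vi, wi in zip(v, w)]
    assert abs(norm2(v) - norm2(w)) <= norm2(diff) + 1e-9
print("reverse triangle inequality held in all 1000 trials")
```

Taking w = 0 shows the inequality is sharp: both sides then equal kvk.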

2.7 Exercises
Please also read the following chapter and the best practices in Appendix A before attempting the next exercises. In particular, please follow the best practices when writing down your solutions.

2.7.1 Blue exercises

The following exercises are also available in Waterproof.


Exercise 2.7.1. Let (Y, distY ) be a metric space. Let X be a set. Let f : X → Y
be injective. Define d : X × X → R by
d( x, z) := distY ( f ( x ), f (z)), for all x, z ∈ X.
Show that the function d is a distance on X.

Exercise 2.7.2. Let ( X, dist) be a metric space. Let A ⊂ X be a subset. Define the function dist| A : A × A → R by

dist| A ( x, y) := dist( x, y), for all x, y ∈ A.

Then dist| A is called the restriction of dist to A. Show that dist| A is a dis-
tance on A.

Exercise 2.7.3. Consider the function d : Z × Z → R defined by

d(a, b) = 0, if a = b,
d(a, b) = 3, if a ≠ b.

Show that d is a distance function on Z.
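Before writing the proof, the three distance axioms can be sanity-checked by brute force on a finite range of integers. This snippet is our own addition and of course does not replace the proof.

```python
def d(a, b):
    # the candidate distance from Exercise 2.7.3
    return 0 if a == b else 3

R = range(-5, 6)
for a in R:
    for b in R:
        assert d(a, b) == d(b, a)                 # symmetry
        assert (d(a, b) == 0) == (a == b)         # d(a, b) = 0 exactly when a = b
        for c in R:
            assert d(a, c) <= d(a, b) + d(b, c)   # triangle inequality
```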

2.7.2 Orange exercises

Exercise 2.7.4. Let (X, dist) be a metric space. Define d : X × X → R by

d(x, y) = √(dist(x, y)).

Show that d is a distance on X.

Exercise 2.7.5. Let (V, k · k) be a normed vector space. We say a subset


U ⊂ V is convex if
for all x, y ∈ U,
for all λ ∈ (0, 1),
λx + (1 − λ)y ∈ U.

Let z ∈ V and r > 0. Recall Definition 2.4.4 of the open ball B(z, r ). Show
that B(z, r ) is convex.
Chapter 3

Proofs in analysis

3.1 What is a proof?


I would like to give two perspectives on what a mathematical proof is.
The first perspective is that a proof is a rigorous, precise argument for
why a certain mathematical statement is true. The argument uses
statements that are simply assumed to be true (these are called axioms), uses
statements that have already been proven (lemmas, propositions, theorems),
and then follows very specific rules to deduce new statements, rules
such as: if A ⟹ B and B ⟹ C, then A ⟹ C. These rules are the rules
of logic, and to a certain extent they agree with common sense. However,
some of the rules of logic actually take some getting used to, especially if
they involve many quantifiers.
My personal experience is that, especially in the initial stages of learning
how to prove mathematical statements, it helps not to think in terms of
truth (that is, not to rely on common sense to figure out why an argument
shows that a statement is true), but rather to learn by heart the
expectations of what a proof is, and to try to satisfy those expectations
when you write proofs yourself.
This brings me to the second perspective on mathematical proofs. From
this perspective, a proof as written down in mathematical books can be
seen as pseudocode that explains how one could construct a formal proof.
A formal proof is in itself a mathematical object; in fact, it can be stored as
(i.e. represented by) a computer program. For every mathematical statement,
there are extremely precise rules that a formal proof should satisfy. These
are again the rules of logic. As a consequence, the same rules dictate the
structure of the pseudocode. One of your main tasks in this course is therefore
to learn what the expectations on the pseudocode are, i.e. given a
certain mathematical statement, to learn what kind of pseudocode passes as a
proof of the statement.
In the rest of this chapter, I will try to briefly explain the expectations on
proofs, the expectations on pseudocode. I will do so alongside an example.
A more structured guide for proving mathematical statements is given as
a list of “best practices” in Appendix A.

Best practices. Just like big software companies have best practices for
writing code, with the aim of having well-maintainable code with a min-
imal amount of bugs, in this course we have a set of best practices for
writing proofs. These best practices are recorded in Appendix A. If we all
try to adhere to the best practices, authors are helped in structuring their
proofs, and readers, reviewers and authors are all helped in understand-
ing the proofs.

3.2 Expectations on proofs


I believe that one of the main difficulties of Analysis 1 and 2 is learning
to deal with a large number of quantifiers (that is, larger than 1). Imagine
that we need to show the following statement:

∀e > 0, ∃n0 ∈ N, ∀n ≥ n0 , |1/n − 0| < e.

To clarify the nested structure of one mathematical statement inside the
other, I like to use indentation, and I prefer to use words over quantifier
symbols:

for all e > 0,


there exists n0 ∈ N,
for all n ≥ n0 ,
|1/n − 0| < e.

Such a statement looks complicated, and you may not know how to start
a proof or how to continue.
But one of the main messages of this chapter is that the statement itself
tells you how to start, and how to continue. In fact, the statement gives
you a template, where there are only a few things left for you to fill in.

3.3 Prove statements block by block


The key to proving a mathematical statement involving many quantifiers
is to prove it block by block. I will explain how to directly prove “for all”
statements and “there exists” statements (that is, without using a contra-
diction argument). As a running example, we will prove that

for all e > 0,


there exists n0 ∈ N,
for all n ≥ n0 ,
|1/n − 0| < e

by peeling off layer after layer of the statement.

3.4 Directly proving “for all” statements


If you want to directly (i.e. without a contradiction argument) show a
statement such as

for all a ∈ A,
...

you need to do the following. You first need to introduce (i.e. define) the
variable a ∈ A by writing something like

Let a ∈ A.

or

Take a ∈ A arbitrary.

Next, you continue to prove the indented block on the next line.
In our example, we encounter this situation twice: in

for all e > 0,


...

and in
for all n ≥ n0 ,
...

For the statement


for all e > 0,
...

this comes down to writing

Let e > 0.

3.5 Directly proving “there exists” statements


If you want to directly (i.e. without using a contradiction argument) show
a statement of the form
there exists b ∈ B,
...

you need to do the following. You need to make a choice for b, e.g. by
writing

Choose b := . . .

after which you continue to show the indented statement (. . . ) with that
choice of b.
In our example we encounter this situation with

there exists n0 ∈ N such that {. . . }

We could for instance write:

Choose n0 := 10.

and then continue with the proof of the block {. . . }, with n0 now fixed as
10. Making choices is hard. With a bad choice, you won’t be able to prove
whatever is inside the block {. . . }. In many of the proofs that you will
write, this is probably the step that requires the most thinking, the most
creativity.

3.6 Trying to finish the proof


Where are we now? Our proof so far consists of

Let e > 0.
Choose n0 := 10.

and now we need to prove the statement

for all n ≥ n0 ,
|1/n − 0| < e.

with the only knowledge about e that it is a (real) number larger than 0,
and that n0 = 10.
Let us stick to the recipe. We need to show a statement of the form

for all n ≥ n0

so we define n by writing

Let n ≥ n0

and continue proving the block {|1/n − 0| < e}.


Of course, now we are in big trouble. Because we chose n0 = 10, we can
merely guarantee that

|1/n − 0| = 1/n ≤ 1/n0 = 1/10.

But we do not know whether e is larger than 1/10 or not! All we know is
that it is a positive real number.
We cannot prove that

|1/n − 0| < e

because we made a bad choice for n0. This happens. It is almost impossible
to figure out the proof ‘linearly’, in a prearranged order of steps. To know
how to choose n0, you need to know your endgame: you already need to
know how you will finish the proof. This means you need to do a lot of
scratchwork.
Let us do some of that scratchwork. We can figure out what choice of n0
would lead to a proof. For instance, if we instead choose

n0 := ⌈1/e⌉ + 1

then n0 is a natural number strictly larger than 1/e (here ⌈1/e⌉ is 1/e
rounded up to an integer). In that case

|1/n − 0| = 1/n ≤ 1/n0 < 1/(1/e) = e.

It works!
But you need to present the proof following the above steps. You just make
better choices.

3.7 Let’s try again


We need to show
for all e > 0,
there exists n0 ∈ N,
for all n ≥ n0 ,
|1/n − 0| < e

so we write

Let e > 0.

and continue with the proof of the indented statement



there exists an n0 ∈ N such that


for all n ≥ n0
|1/n − 0| < e.

We are wiser now, and write

Choose n0 := ⌈1/e⌉ + 1

and continue with the proof of

for all n ≥ n0 ,
|1/n − 0| < e.

Next, we write

Let n ≥ n0 .

and we are ready to, once again, try to prove that

|1/n − 0| < e.

Indeed, now it is time to insert our calculation

|1/n − 0| = 1/n ≤ 1/n0 < 1/(1/e) = e.

This finishes the proof. If we write everything in one go, it reads as follows.
Let e > 0.
Choose n0 := ⌈1/e⌉ + 1.
Let n ≥ n0 .
Then |1/n − 0| = 1/n ≤ 1/n0 < 1/(1/e) = e.

The statement we proved already suggested to us the template

Let e > 0.
Choose n0 := . . .
Let n ≥ n0 .
Then show desired estimate.

We filled it in by choosing n0 appropriately, and making a correct estimate.
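The choice n0 := ⌈1/e⌉ + 1 can also be checked numerically for a few sample values of e. This little experiment is our own addition (and no substitute for the proof); the helper name `n0_for` is ours.

```python
import math

def n0_for(e):
    # the choice made in the proof: ceil(1/e) + 1
    return math.ceil(1 / e) + 1

for e in [0.5, 0.1, 0.003, 2.0, 1000.0]:
    n0 = n0_for(e)
    assert n0 > 1 / e                 # n0 is strictly larger than 1/e
    for n in range(n0, n0 + 100):     # check the estimate for the first 100 admissible n
        assert abs(1 / n - 0) < e
```

Note that the check can only ever test finitely many n; the proof above handles all n ≥ n0 at once.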

3.8 (Natural) induction


To show a statement
for all n ∈ N,
P(n)

you can use natural induction. Here, P(n) is a statement depending on n.


The template is as follows:

We use induction on n ∈ N.
We first show the base case, i.e. that P(0) holds.
... insert here a proof of P(0) ...
We now show the induction step.
Let k ∈ N and assume that P(k) holds.
We need to show that P(k + 1) holds.
... insert here a proof of P(k + 1) ...

3.9 Negations and quantifiers


We write the negation of a mathematical statement (. . . ) as ¬(. . . ). The so-
called De Morgan’s laws specify how quantifiers behave under negation.
The statement

¬ (for all a ∈ A, (. . . ))

is equivalent to

there exists a ∈ A, (¬(. . . )) .

Similarly, the statement

¬ (there exists a ∈ A, (. . . ))

is equivalent to

for all a ∈ A, (¬(. . . )).
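On a finite domain, these equivalences can be checked directly with Python’s `all` and `any`, which play the roles of the two quantifiers. The example set and predicate below are our own.

```python
A = [-3, 0, 2, 7]

def P(a):
    return a > 0

# ¬(for all a ∈ A, P(a))  is equivalent to  (there exists a ∈ A, ¬P(a))
assert (not all(P(a) for a in A)) == any(not P(a) for a in A)
# ¬(there exists a ∈ A, P(a))  is equivalent to  (for all a ∈ A, ¬P(a))
assert (not any(P(a) for a in A)) == all(not P(a) for a in A)
```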

3.10 Proofs by contradiction


When you are stuck proving something directly, it is a good idea to try
to give a proof by contradiction: You assume that whatever you want to
show is not true, and derive a contradiction from there. Some statements
in analysis are (almost?) impossible to show without using a contradiction
argument somewhere.
You can use the following template for a contradiction argument.

We argue by contradiction. Suppose ¬(. . . ).


. . . derivation that leads to a contradiction . . .
Contradiction. We conclude that (. . . ) holds.

3.11 Exercises

3.11.1 Blue exercises

Exercise 3.11.1. Show that


there exists M ∈ R,
for all x ∈ [0, 5],
x ≤ M.

Exercise 3.11.2. Show that


for all x ∈ R,
there exists y ∈ R,
for all u ∈ R,
if u > 0 then
there exists v ∈ R,
v > 0 and x + u < y + v.

3.11.2 Orange exercises

Exercise 3.11.3. Prove that


for all x ∈ R,
if (for all e > 0, x > 1 − e)
then x ≥ 1.
Chapter 4

Real numbers

The functions that we will study will most often have the real numbers as
a target space. And if the target space is not the space of real numbers,
then it will most often be a normed vector space or a metric space with
properties very similar to the real numbers.
In this chapter we will therefore take a careful look at the properties of the
real numbers. Although most of the properties may seem obvious, there
is one property that you usually don’t encounter so explicitly. The name
of this property is the completeness axiom.
The absolute main message of this chapter is the importance of the com-
pleteness axiom: at the innermost core of almost every proof in these notes
is the completeness axiom. It is the motor of analysis.

4.1 What are the real numbers?


The real numbers are a set, together with two operations called addition
(+) and multiplication (×) and an order (≤), that satisfy a whole list of
properties. These properties are called axioms.
The axioms can be summarized as follows: the real numbers are a com-
plete, totally ordered field. You may remember the concept of a field from
Set Theory and Algebra. A totally ordered field is a field together with a to-
tal order such that the order and the field operations are compatible. The


rational numbers with the standard addition, multiplication and order are
an example of a totally ordered field. We will come to the exact meaning
of ‘complete’ in the next section, but let’s already mention that a totally
ordered field is called complete if every non-empty, bounded-from-above
subset of the totally ordered field has a least upper bound.
At this stage we assume that complete, totally ordered fields exist. We
choose one such complete ordered field and call it the real numbers and
denote it by R. Are there other possibilities then? Yes and not really. Yes,
there are different complete totally ordered fields satisfying all these ax-
ioms, but they are all essentially the same in the sense that they are isomor-
phic as totally ordered fields. We will not show these statements in these
notes. For the purpose of the notes it’s enough to just make a choice of a
complete ordered field and call it R.
Between various books there may be slight differences in the choice of
axioms. Although the lists are different, they do specify (essentially) the
same complete totally ordered field. In Abbott’s book [A+ 15], you can find
the axioms of the real numbers in Section 8.6.
In proof assistants such as Waterproof, the axioms for the real numbers are
also introduced and they are used as the fundamental building blocks on
which the rest of the theory is formally built up. You can find a full list
of possible axioms for the real numbers as they are used in Waterproof on
the website:

coq.inria.fr/library/Coq.Reals.Raxioms.html.

4.2 The completeness axiom


In this section, we will one by one introduce the concepts that occur in
the completeness axiom. All these concepts make sense for any totally
ordered field. Throughout this section, we will denote by R an arbitrary
totally ordered field. We use the slight typographic difference between ‘R’
and ‘ℝ’ to allow you to think of R as the real line, but in principle it could
be another totally ordered field as well, such as the rational numbers Q.
However, as soon as you assume that a totally ordered field R satisfies the
completeness axiom, it is essentially the same as the real numbers ℝ.

We first define what we mean by upper and lower bounds of subsets of a


totally ordered field.

Definition 4.2.1 (Upper bound and lower bound). We say a number


M ∈ R is an upper bound for a subset A ⊂ R if

for all a ∈ A,
a ≤ M.

We say a number m ∈ R is a lower bound for a subset A ⊂ R if

for all a ∈ A,
m ≤ a.

Example 4.2.2. Let A := (0, 2) ∪ (3, 6) be a subset of R. Then 8 is an


upper bound for A. Let us prove this according to best practices.
We need to show that for all a ∈ A, a ≤ 8.
Let a ∈ A. We need to show that a ≤ 8. Since a ∈ (0, 2) ∪ (3, 6), it
indeed holds that a ≤ 8.

Given the definition of upper and lower bounds, we define what it means
for a set to be bounded from above, bounded from below and just bounded.

Definition 4.2.3 (bounded from above, bounded from below, bounded).


We say that a subset A ⊂ R is bounded from above if there exists an
M ∈ R such that M is an upper bound for A.
We say that a subset A ⊂ R is bounded from below if there exists an
m ∈ R such that m is a lower bound for A.
We say that a subset A ⊂ R is bounded if A is both bounded from above
and bounded from below.

A least upper bound is an upper bound that is smaller than or equal to any
other upper bound.

Definition 4.2.4 (least upper bound a.k.a. supremum). Precisely, M is


a least upper bound of a subset A if both

i. M is an upper bound

ii. For every upper bound L ∈ R of A, it holds that M ≤ L.

Proposition 4.2.5. Suppose both M and W are a least upper bound of


a subset A ⊂ R. Then M = W. In other words, the least upper bound
is unique.

Proof. Since M is a least upper bound, and W is an upper bound, we


know by item (ii) in the characterization of the least upper bound that
M ≤ W. Similarly, W is a least upper bound, and M is an upper bound,
so W ≤ M. It follows that M = W.

Definition 4.2.6 (The supremum). A different name for the least upper
bound of a set A ⊂ R is the supremum of A, which we also write as
sup A.

Given all the terminology defined above, we can now state the complete-
ness axiom.

Axiom 4.2.7 (Completeness axiom). We say that a totally ordered field


R satisfies the completeness axiom if every non-empty subset of R that
is bounded from above has a least upper bound.

As soon as the completeness axiom is satisfied, we can also derive that


every nonempty subset that is bounded from below has a largest lower
bound.

Lemma 4.2.8. Every non-empty subset of the real line that is bounded
from below has a largest lower bound.

Proof. Let A be a non-empty subset of the real line that is bounded


from below. We need to show that there exists an m ∈ R such that

i. m is a lower bound of A.

ii. for every lower bound l of A, it holds that l ≤ m.

We define B := − A, where by − A we mean

− A := {− a | a ∈ A}.

Then B is nonempty, and B is bounded from above: if l is a lower bound
for A, then −l is an upper bound for −A = B. Therefore, B has


a smallest upper bound M = sup B. We choose m := − M. We will
show (i) and (ii).
We first show (i), in other words we need to show that

for all a ∈ A
m ≤ a.

Let a ∈ A. Because M is the supremum of B, we know that for all


b ∈ B, it holds that b ≤ M. In particular, since − a ∈ − A = B, we
know that − a ≤ M = −m. Therefore indeed m ≤ a.
We now show (ii). Let l be a lower bound for A. Then −l is an upper
bound for − A = B. Because M is the supremum of B, we know that
M ≤ −l. Recall that M = −m. Therefore −m ≤ −l and l ≤ m.
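For a finite set the supremum is simply the maximum, so the construction in the proof (m := −sup(−A)) can be illustrated numerically; the set A below is an arbitrary example of our own.

```python
A = [2.5, -1.0, 7.0, 3.0]
neg_A = [-a for a in A]          # the set B = −A from the proof

M = max(neg_A)                   # for a finite set, sup B = max B
m = -M                           # the candidate largest lower bound of A
assert all(m <= a for a in A)    # m is indeed a lower bound of A
assert m == min(A)               # and it is the largest one: inf A = min A here
```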

Definition 4.2.9. We usually call the largest lower bound of a non-


empty set A ⊂ R that is bounded from below the infimum of A, and
we denote it by inf A.

4.3 Alternative characterizations of suprema and infima
In this section we provide alternative characterizations of suprema and in-
fima. These alternative characterizations are usually easier to use in proofs
than the definitions. We will give an example at the end of this section.

Proposition 4.3.1 (alternative characterization of supremum). Let A ⊂


R be non-empty and bounded from above. Let M ∈ R. Then M is the
supremum of A if and only if

i. M is an upper bound for A, and

ii.
for all e > 0,
there exists a ∈ A,
a > M − e.

The proof of Proposition 4.3.1 is the content of Exercise 4.9.4.

Proposition 4.3.2 (alternative characterization of infimum). Let A ⊂ R


be non-empty and bounded from below. Let m ∈ R. Then m is the
infimum of A if and only if

i. m is a lower bound for A, and

ii.
for all e > 0,
there exists a ∈ A,
a < m + e.

These alternative characterizations of the supremum and infimum really


provide a standard way of determining the supremum and infimum of
subsets of the real line. Sometimes there may be a more creative argument

for determining the supremum or infimum, but the alternative
characterization usually provides a decent and solid route to the argument.

Example 4.3.3. Consider the following set

A := (1, 4) ∪ (5, 7) ∪ (8, 9)

We claim that inf A = 1.


We will use the alternative characterization of the infimum. Hence we
need to show that

i. 1 is a lower bound for A

ii.
for all e > 0
there exists a ∈ A
a < 1 + e.

We first show (i). We need to show that for all a ∈ A, 1 ≤ a. Because


we need to show a for-all statement, we start with:
Let a ∈ A. Then a ∈ (1, 4) or a ∈ (5, 7) or a ∈ (8, 9), and in all cases,
1 ≤ a.
We now show (ii). Because we need to show a for-all statement, we
start with: Let e > 0. Now we need to show that there exists a ∈ A
such that
a < 1+e
Therefore the next step is to choose an a ∈ A, in the hope that we can
then show that a < 1 + e. This is often the trickiest part in the proof.
It is very important that the choice for a that we make, is actually an
element of A.
Here’s the danger: We often think of e as small, and if e is strictly less
than 6, the number
1 + e/2

is indeed in the interval (1, 4), and therefore it is an element of A.
However, e could also be very large, for instance e = 1000, and then

1 + e/2 = 501 ∉ A.

Therefore, in general,

1 + e/2 ∉ A.

This happens a lot: there is an initial idea which is almost right, but
it doesn’t quite work. In that case we can adapt. One way is as follows:
Choose
a := min (1 + e/2, 2)
In that case, we know that a is always between 1 and 2, and therefore
a ∈ A. What we now need to do is show that a < 1 + e.
For this we write a small chain of inequalities. Indeed, it holds that

a = min (1 + e/2, 2) ≤ 1 + e/2 < 1 + e.
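The danger discussed above, and the repaired choice a := min(1 + e/2, 2), can be checked numerically for both small and large e. This is a quick sanity check of ours, not a substitute for the argument.

```python
for e in [0.001, 0.5, 6.0, 1000.0]:
    a = min(1 + e / 2, 2)
    # the chosen witness always lands in the component (1, 4) of A
    assert 1 < a < 4
    # and it satisfies the required strict inequality
    assert a < 1 + e
    # whereas the naive choice 1 + e/2 escapes A for large e:
    if e >= 6:
        assert not (1 < 1 + e / 2 < 4)
```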

4.4 Maxima and minima


In this section we say something about the relationship between the supre-
mum and the maximum, and between the infimum and the minimum.

Definition 4.4.1 (maximum and minimum). Let A ⊂ R be a subset of


the real numbers. We say that y ∈ A is the maximum of A, and write
y = max A, if

for all a ∈ A,
a ≤ y.

We say that x ∈ A is the minimum of A, and write x = min A, if

for all a ∈ A,
x ≤ a.

One of the very important aspects of the above definition is that minima
and maxima are always elements of the set itself.


Warning: even if a set A ⊂ R is non-empty and bounded from above,
a maximum may not always exist. Similarly, even if a set A ⊂ R
is non-empty and bounded from below, a minimum may not always
exist.

Proposition 4.4.2. Let A be a subset of R. If A has a maximum, then


A is non-empty and bounded from above, and sup A = max A. If A
has a minimum, then A is non-empty and bounded from below, and
inf A = min A.

Proposition 4.4.3. Let A be a subset of R. Assume that A is non-empty


and bounded from above. If sup A ∈ A then A has a maximum and
max A = sup A.

Proposition 4.4.4. Let A be a subset of R. Assume that A is non-empty


and bounded from below. If inf A ∈ A then A has a minimum and
min A = inf A.

4.5 The Archimedean property


We also state another property of the real numbers. It sounds very
plausible, but it is sometimes explicitly needed in mathematical proofs.

Proposition 4.5.1 (Archimedean property). For every real number x ∈


R, there exists a natural number m ∈ N such that x < m.

Proof. We argue by contradiction. Suppose therefore that there exists


an x ∈ R such that for all m ∈ N, it holds that m ≤ x. That means that
N is bounded from above. Since N is also nonempty, we know that
the supremum sup N would exist. By the alternative characterization of
the supremum (Proposition 4.3.1), there exists a natural number m ∈ N
such that

m > sup N − 1/2.

Now m + 1 is a natural number as well, and

m + 1 > sup N − 1/2 + 1 > sup N

which is a contradiction.

Given the previous proposition, we can define the ceiling function (that
we actually have already used in the running example in the previous
chapter).

Definition 4.5.2 (ceiling function). The ceiling function ⌈·⌉ : R → Z is
defined as follows. For x ∈ R, ⌈x⌉ denotes the smallest integer z ∈ Z
such that x ≤ z.

The Archimedean property implies that between every two real numbers
you can find a rational number. That is the content of the following propo-
sition.

Proposition 4.5.3. For every two real numbers a, b ∈ R with a < b,


there exists a q ∈ Q with a < q < b.

Proof. Let a, b ∈ R such that a < b. Define

M := ⌈3/(b − a)⌉ ∈ N.

Then

Mb − Ma = M(b − a) = ⌈3/(b − a)⌉ (b − a) ≥ (3/(b − a)) (b − a) = 3.

Therefore

Ma < ⌈Ma⌉ + 1 < Ma + 2 < Mb.    (4.5.1)

Choose q := (⌈Ma⌉ + 1)/M, which is indeed an element of Q. After dividing
(4.5.1) by M, we conclude that

a < (⌈Ma⌉ + 1)/M < b.
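The construction in the proof can be turned into a small computation; `rational_between` is our own name for it, and floating-point arithmetic stands in for exact real arithmetic.

```python
import math

def rational_between(a, b):
    # the construction from the proof: M := ceil(3/(b−a)), q := (ceil(M·a)+1)/M
    M = math.ceil(3 / (b - a))
    p = math.ceil(M * a) + 1
    return p, M                     # the rational q = p/M

for a, b in [(0.1, 0.2), (-5.0, -4.99), (math.pi, 4.0)]:
    p, M = rational_between(a, b)
    assert a < p / M < b
```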


The next proposition loosely speaking says that √2 is irrational. If we
were very precise, though, at this stage we wouldn’t even know how
to define √2. Sure, we could define it as a real number x ∈ R such that
x² = 2, but who says such a real number exists?

Proposition 4.5.4. There does not exist a rational number r ∈ Q such


that r2 = 2.

Proof. We argue by contradiction. Suppose there exists a rational number
r ∈ Q such that r² = 2. We can therefore choose such an r, and we
can even choose it such that r > 0. Then r = p/q with p and q nonzero
natural numbers such that their greatest common divisor is one. We
then have

p²/q² = 2,

so that

p² = 2q².

Since the right-hand side is divisible by 2 (i.e. it is even), the left-hand
side is divisible by 2 as well. Recall that if a product ab of positive natural
numbers is divisible by 2, then at least one of a or b is divisible
by 2. Therefore p is divisible by 2 and p² is divisible by 4. However,
because the greatest common divisor of p and q is 1, we have that q
is not divisible by 2. Therefore 2q² is not divisible by 4, which gives a
contradiction.


The next proposition really defines √2.

Proposition 4.5.5. Consider the set

A := { a ∈ Q | a² ≤ 2 and a > 0}.

Then A is non-empty, bounded above, and (sup A)² = 2. In other
words, √2 exists as a real number and equals sup A.

Proof. We first show that A is non-empty. This holds because 1 ∈ A.


We will now show that A is bounded above. We need to show that
there exists an M ∈ R such that M is an upper bound for A.
We choose M := 2.
We now have to show that 2 is an upper bound for A. We need to show
that for all a ∈ A, a ≤ 2. Let a ∈ A. We argue by contradiction. Suppose
¬(a ≤ 2). Then a > 2, and therefore a² > 4 > 2. But a ∈ A implies a² ≤ 2,
which is a contradiction.
Therefore we can conclude that sup A exists and that 1 ≤ sup A ≤ 2.
We will now show that (sup A)² = 2. We again argue by contradiction.
Suppose (sup A)² ≠ 2. Then either (sup A)² < 2 or (sup A)² > 2.
We first consider the case (sup A)² < 2. We also know that sup A ≥ 1.
By Proposition 4.5.3 we may choose a q ∈ Q such that

sup A < q < sup A + (2 − (sup A)²)/(4 sup A).

We define e := q − sup A, and note that

0 < e < (2 − (sup A)²)/(4 sup A) < 1.

Therefore

q² = (sup A + e)²
   = (sup A)² + 2e sup A + e²
   < (sup A)² + 2e sup A + e
   < (sup A)² + 2e sup A + 2e sup A
   = (sup A)² + 4e sup A < 2,

where in the first inequality we used that e < 1. In other words, we
have found a q ∈ A such that q > sup A. This is a contradiction since
sup A is an upper bound for A.

We now consider the case that (sup A)² > 2. Define

δ := ((sup A)² − 2)/(2 sup A).

By the alternative characterization of the supremum, there exists an
r ∈ A such that

r > sup A − ((sup A)² − 2)/(2 sup A) = sup A − δ.

Choose such an r. Then

r² > (sup A − δ)²
   = (sup A)² − 2δ sup A + δ²
   > (sup A)² − 2δ sup A
   = (sup A)² − 2 sup A · ((sup A)² − 2)/(2 sup A) = 2,

which is also a contradiction.
We conclude that (sup A)² = 2.
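As a numerical illustration of our own (not from the notes), the supremum of A can be approximated from below by the largest fraction k/N in A with a fixed denominator N; its square is then close to 2.

```python
import math

N = 10**6
k = math.isqrt(2 * N * N)       # largest integer k with k² ≤ 2N²
assert k * k <= 2 * N * N < (k + 1) * (k + 1)
# k/N is then the largest fraction with denominator N lying in A
approx_sup = k / N
assert abs(approx_sup**2 - 2) < 1e-5
```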

Corollary 4.5.6. For every two real numbers a, b ∈ R with a < b, there
exists an irrational number r ∈ R \ Q such that a < r < b.

Proof. Let a, b ∈ R with a < b. By Proposition 4.5.3 there exists a q ∈ Q
such that a < q < b. Choose such a q. Set

N := ⌈1/(b − q)⌉ + 1.

Choose

r := q + √2/(2N).

Then r is irrational and

a < r = q + √2/(2N) ≤ q + 1/N < q + (b − q) = b.
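The proof of the corollary is likewise easy to check numerically; the helper below follows the construction step by step, with `irrational_between` as our own naming and floats standing in for exact reals.

```python
import math

def irrational_between(a, b):
    # follow the proof: first a rational q in (a, b), then r := q + sqrt(2)/(2N)
    M = math.ceil(3 / (b - a))
    q = (math.ceil(M * a) + 1) / M          # rational with a < q < b (Prop. 4.5.3)
    N = math.ceil(1 / (b - q)) + 1
    return q + math.sqrt(2) / (2 * N)

for a, b in [(1.0, 1.001), (-2.0, -1.5), (0.0, 10.0)]:
    r = irrational_between(a, b)
    assert a < r < b
```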

4.6 Sets can be complicated


Subsets of the real line can be incredibly complicated monstrous objects.
What do I mean by this and why is it relevant?
If you think of examples of sets, you might think of an interval such as
(2, 4], or a subset that consists of a single point {5}, or if you go crazy an
example may be (2, 4] ∪ {37} ∪ [40, ∞). But none of these examples are
very representative: they are much simpler than an ’arbitrary’ set.
Why is this relevant? In this course, the aim is to prove statements such
as for all nonempty, bounded subsets A, B ⊂ R, it holds that sup( A + B) =
sup A + sup B. If you need to prove such a statement, you really need
to show it for every possible pair of subsets A and B, and you can easily
fool yourself by considering examples that are too simple.
So I encourage you to think, every once in a while, about whether the
examples you come up with are representative. What is the most complicated
subset of the real line you can think of?

4.7 Computation rules for suprema


In the proposition below, we use the definitions

A + B = { a + b | a ∈ A, b ∈ B}

and
λA = {λa | a ∈ A}
for subsets A, B ⊂ R and a scalar λ ∈ R.

Proposition 4.7.1. Let A, B, C, D be nonempty subsets of R. Assume


that A and B are bounded from above and C and D are bounded from
below. Then

i. sup( A + B) = sup A + sup B

ii. inf(C + D ) = inf C + inf D



iii. for all λ ≥ 0, sup(λA) = λ sup A

iv. for all λ ≥ 0, inf(λC ) = λ inf C

v. sup(−C ) = − inf C

vi. inf(− A) = − sup A

Proof. We first show (i) in the list above. We set M := sup A + sup B
and will show that M is indeed the supremum of the set A + B by
showing items (i) and (ii) of the alternative characterization of the
supremum in Proposition 4.3.1.
We first show item (i) of Proposition 4.3.1, namely that M is an upper
bound. We need to show that
for all c ∈ A + B,
c ≤ M.

Let c ∈ A + B. Then there exists an a ∈ A and a b ∈ B such that


c = a + b. We also know that a ≤ sup A and b ≤ sup B. Therefore

c = a + b ≤ sup A + sup B = M

which was what we wanted to show.


We will now show item (ii) of Proposition 4.3.1 for M, namely that

for all e > 0,


there exists c ∈ A + B,
c > M − e.

Let e > 0. By item (ii) of the characterization of sup A in Proposition


4.3.1,

we know that
for all e1 > 0,
there exists a ∈ A, (4.7.1)
a > sup A − e1 .

Choose e1 := e/2 in (4.7.1). Then

we find that there exists an a ∈ A such that a > sup A − e/2. Similarly,
there exists a b ∈ B such that b > sup B − e/2.
Choose c := a + b. Then

c = a + b > sup A − e/2 + sup B − e/2 = sup A + sup B − e = M − e.

Item (v) was shown in the proof of Lemma 4.2.8.


The other items are left as exercises.

A remark about the presentation of the proof above: The text in the lighter
gray inset around statement (4.7.1) is optional. Once we get more skilled
in proving, we will usually omit it. The omission makes the proof a bit
shorter and perhaps easier to read, while if you have seen this type of
argument a few times, you will know what lines to insert to make the
proof more detailed.
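For finite sets, suprema and infima are simply maxima and minima, so the computation rules of Proposition 4.7.1 can be spot-checked in a few lines. The sets below are an arbitrary example of ours.

```python
A = [0.5, 2.0, -1.0]
B = [3.0, 0.0]

A_plus_B = [a + b for a in A for b in B]        # the set A + B
assert max(A_plus_B) == max(A) + max(B)         # item (i)
assert min(A_plus_B) == min(A) + min(B)         # item (ii)
assert max(2 * x for x in A) == 2 * max(A)      # item (iii) with λ = 2
assert max(-x for x in A) == -min(A)            # item (v)
```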

4.8 Bernoulli’s inequality


You may recall that for all a, b ∈ R, the power (a + b)^n can be expanded
using Newton’s binomial coefficients:

(a + b)^n = ∑_{k=0}^{n} (n choose k) a^k b^{n−k}.

If a and b are positive, we can get some useful inequalities by just leaving
out some terms on the right-hand side. We will use this technique
repeatedly in the lecture notes.

We can even get some inequalities if b = 1 and a ≥ −1. One such inequal-
ity is Bernoulli’s inequality.

Proposition 4.8.1 (Bernoulli’s inequality). For all a ≥ −1 and all n ∈ N,

(1 + a)^n ≥ 1 + na.

Proof. Let a ≥ −1. We prove Bernoulli’s inequality by induction on n.
For n = 0, we have

(1 + a)^0 = 1 ≥ 1 = 1 + 0 · a,

so the inequality holds.
Suppose the inequality holds for n = k for some k ∈ N. Then we
would like to show the inequality for n = k + 1. Since 1 + a ≥ 0, we find

(1 + a)^{k+1} = (1 + a)^k (1 + a)
             ≥ (1 + ka)(1 + a)
             = 1 + (k + 1)a + ka²
             ≥ 1 + (k + 1)a.
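Bernoulli’s inequality is easy to spot-check over a grid of values; the grid below is arbitrary and the check is our addition, not part of the proof.

```python
# check (1 + a)^n >= 1 + n*a for several a >= -1 and n in N
for n in range(20):
    for a in [-1.0, -0.5, 0.0, 0.1, 2.0, 10.0]:
        assert (1 + a) ** n >= 1 + n * a
```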

4.9 Exercises

4.9.1 Blue exercises

Exercise 4.9.1. Show that for all a, b ∈ R, if a < b then

inf[ a, b) = a.

Exercise 4.9.2. Prove Proposition 4.4.3.


Exercise 4.9.3. Show that
sup[0, 4) = 4

4.9.2 Orange exercises

Exercise 4.9.4. Prove Proposition 4.3.1.

Exercise 4.9.5. Prove item (iii) of Proposition 4.7.1.

Exercise 4.9.6. Prove item (vi) of Proposition 4.7.1.


Chapter 5

Sequences

This chapter will introduce sequences. Sequences are especially important
since they can be used to determine whether metric spaces, or functions
between metric spaces, satisfy certain properties.

5.1 A sequence is a function from the natural numbers
Let X be a set, for instance X = R. A sequence in X is just a function from
the natural numbers N to X. We will use the convention that the natural
numbers N include 0.

Definition 5.1.1. Let X be a set. A sequence a : N → X in X is a function


from the natural numbers N to X.

Example 5.1.2. Consider the set X := {red, yellow, blue} of primary
colors. The function a : N → X defined by

a(n) := blue, if n is odd,
a(n) := red, if n is even,

is an example of a sequence in X.
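In Python, the view of a sequence as a function from N is literal: the sequence of Example 5.1.2 is just a function of n (this illustration is ours; strings stand in for the colors).

```python
def a(n):
    # the sequence from Example 5.1.2, as a function N -> X
    return "blue" if n % 2 == 1 else "red"

first_terms = [a(n) for n in range(5)]
assert first_terms == ["red", "blue", "red", "blue", "red"]
```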

We really want to stress the point of view here that a sequence, for instance
a sequence a : N → R of real numbers, is really a function. In practice, we
often write (a_n)_{n∈N}, (a_n), (a(n)), or

a_0, a_1, a_2, a_3, . . .

For k ∈ N, the term a_k is called an element of the sequence. It is also
referred to as the k-th element. Moreover, k is called the index of the element
a_k.

5.2 Terminology around sequences

Definition 5.2.1 (bounded sequences). Let ( X, dist) be a metric space.


We say a sequence a : N → X is bounded if

there exists q ∈ X,
there exists M > 0,
for all n ∈ N,
dist( an , q) ≤ M.

In normed linear spaces, we can use a simpler criterion to check whether


a sequence is bounded. That is the content of the following proposition.

Proposition 5.2.2. Let (V, ‖ · ‖) be a normed vector space. Let a : N →
V be a sequence. The sequence a is bounded if and only if

there exists M > 0,
for all n ∈ N,
‖an‖ ≤ M.

The proof is not difficult, but it is an excellent opportunity to go through it


slowly, carefully following the proof expectations formulated in Chapter
3.

Proof. We first show the “if” part of the statement. So we assume that

there exists M1 > 0,
for all n ∈ N,
‖an‖ ≤ M1

and we need to show that

there exists q ∈ V,
there exists M > 0,
for all n ∈ N,
‖an − q‖ ≤ M.

Because the statement we need to show is of the form “there exists a
q ∈ V such that . . . ”, we need to choose an appropriate q ∈ V.
We choose q := 0.
Next, we need to show “there exists an M > 0 such that . . . ”, so we
need to identify an appropriate M.
For this, we first conclude by our assumption that there exists an M1 >
0 such that for all n ∈ N, it holds that ‖an‖ ≤ M1.
We choose M := M1.
We need to show “for all n ∈ N, . . . ”. Therefore the next line of the
proof is:
Let n ∈ N.
Finally we need to show that ‖an − q‖ ≤ M.
But ‖an − q‖ = ‖an − 0‖ = ‖an‖ and we already know that ‖an‖ ≤ M1
and we chose M := M1 so that indeed ‖an − q‖ ≤ M.

Now we will show the “only if” part of the statement. We assume that

there exists q ∈ V,
there exists M > 0,
for all n ∈ N,
‖an − q‖ ≤ M

and we need to show that

there exists M1 > 0,
for all n ∈ N,
‖an‖ ≤ M1.

Can you see the template we should be using?
Choose M1 := M + ‖q‖.
Let n ∈ N.
Then by the triangle inequality

‖an‖ = ‖an − q + q‖
     ≤ ‖an − q‖ + ‖q‖
     ≤ M + ‖q‖ = M1.

5.3 Convergence of sequences


Remember that analysis is for a large part about making rigorous state-
ments about the approximate behavior of functions. The following defini-
tion is a rigorous statement about the behavior of a sequence a : N → X,
where ( X, dist) is a metric space. The definition is a precise version of the
following approximate statement: for large n, the distance between an and
p is small.

Definition 5.3.1. Let ( X, dist) be a metric space. We say that a sequence
a : N → X converges to a point p ∈ X if

for all e > 0,
there exists N ∈ N,
for all n ≥ N,
dist( an , p) < e.

We sometimes write
limn→∞ an = p

to express that the sequence ( an ) converges to p.

Example 5.3.2. Let’s see what this definition looks like when the metric
space ( X, dist) is (R, distR ), where by distR : R × R → R we always
mean the standard distance on R given by

distR ( x, y) = | x − y|.

A sequence a : N → R then converges to L ∈ R if and only if

for all e > 0,


there exists N ∈ N,
for all n ≥ N,
| an − L| < e.

Definition 5.3.3. Let ( X, dist) be a metric space. A sequence a : N → X


is called divergent if it is not convergent.

5.4 Examples and limits of simple sequences

Proposition 5.4.1 (The constant sequence). Let ( X, dist) be a metric


space. Let p ∈ X and assume that the sequence ( an ) is given by an = p

for every n ∈ N. We also say that ( an ) is a constant sequence. Then


limn→∞ an = p.

The proof of Proposition 5.4.1 is the content of Blue Exercise 5.9.1.

Example 5.4.2 (a standard limit). Let a : N → R be a real-valued


sequence such that an = 1/n for n ≥ 1. Then a : N → R converges to
0.

Proof. Let e > 0. Choose N := ⌈1/e⌉ + 1. Take n ≥ N. Then

distR ( an , 0) = | an − 0| = |1/n| = 1/n ≤ 1/N < e.
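To make the ε–N bookkeeping of this proof concrete, the following sketch (ours, not part of the notes) computes the witness N = ⌈1/e⌉ + 1 for a few tolerances and verifies the defining inequality on a window of indices n ≥ N.

```python
import math

# For a tolerance eps > 0 the proof chooses N := ceil(1/eps) + 1;
# then 1/n <= 1/N < eps for every n >= N.
def witness_N(eps: float) -> int:
    return math.ceil(1 / eps) + 1

for eps in [0.5, 0.1, 1e-3, 1e-6]:
    N = witness_N(eps)
    # spot-check the inequality |1/n - 0| < eps on a window of n >= N
    assert all(abs(1 / n - 0) < eps for n in range(N, N + 1000))
```

Any larger N would work equally well; the definition only asks for the existence of some threshold.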

5.5 Uniqueness of limits

Proposition 5.5.1 (uniqueness of limits). Let ( X, dist) be a metric space


and let a : N → X be a sequence in X. Assume that p, q ∈ X and
assume that
limn→∞ an = p and limn→∞ an = q.
Then p = q.

Proof. We argue by contradiction. Suppose p ≠ q.


Set e := dist( p, q)/2 > 0. Since ( an ) converges to p,

we know by the definition of convergence that

for all e1 > 0,
there exists N1 ∈ N,          (5.5.1)
for all n ≥ N1 ,
dist( an , p) < e1 .

Choose e1 := e. Then we know that

there exists an N1 ∈ N such that for every n ≥ N1 ,

dist( an , p) < e.

Since ( an ) converges to q, there exists an N2 ∈ N such that for every


n ≥ N2 ,
dist( an , q) < e.
Choose N := max( N1 , N2 ). Then

dist( p, q) ≤ dist( p, a N ) + dist( a N , q) < 2e = dist( p, q)

which is a contradiction.

Again, the lighter inset around statement (5.5.1) above denotes optional
text. As we progress in these notes, we will more and more often omit it,
but in the early stages it shows the argumentation a bit more clearly.

5.6 More properties of convergent sequences

Proposition 5.6.1. Let ( X, dist) be a metric space and suppose that a :


N → X is a sequence. Let p ∈ X. Then the sequence a : N → X
converges to p if and only if the real-valued sequence

n ↦ dist( an , p)

converges to 0 in R.

Proof. We define the real-valued sequence (bn ) by

bn := dist( an , p).

We need to show that ( an ) converges to p if and only if (bn ) converges


to 0.
We first show “only if”. So assume that the sequence a : N → X
converges to p.
Let e > 0. Since ( an ) converges to p, there exists an N0 ∈ N such that
for all n ≥ N0 ,
dist( an , p) < e.
Choose N := N0 . Let n ≥ N. Then,

0 ≤ dist( an , p) < e

so that indeed

distR (dist( an , p), 0) = |dist( an , p)| < e.

We now show the “if” part of the statement. We assume that (bn ) con-
verges to 0 and we need to show that ( an ) converges to p.
Let e > 0.
Since (bn ) converges to zero, there exists an N1 such that for all n ≥ N1 ,

bn = distR (bn , 0) < e

where the first equality holds because bn ≥ 0 for all n ∈ N.


Choose N0 := N1 .
Let n ≥ N0 . Then
dist( an , p) = bn < e.

Proposition 5.6.2 (Convergent sequences are bounded). Let ( X, dist)


be a metric space. Let a : N → X be a sequence in X converging to

p ∈ X. Then the sequence a : N → X is bounded.

Proof. We need to show that

there exists q ∈ X,
there exists M > 0,
for all n ∈ N,
dist( an , q) ≤ M.

Choose q := p. Because the sequence ( an ) converges to p,

we know that
for all e1 > 0,
there exists N ∈ N,          (5.6.1)
for all n ≥ N,
dist( an , p) < e1 .

Choose e1 := 1 in (5.6.1). Then

there exists an N ∈ N such that for every n ≥ N,

dist( an , p) < 1.

Choose
M := max(dist( a0 , p), . . . , dist( a N −1 , p), 1).
Let n ∈ N. We need to show that dist( an , p) ≤ M. We make a case
distinction.
In the case n ≤ N − 1, we have

dist( an , p) ≤ max(dist( a0 , p), . . . , dist( a N −1 , p), 1) = M.



In the case n ≥ N, we have

dist( an , p) < 1 ≤ M.

The following proposition is one of the strongest statements in this chap-


ter, and if you look carefully you can show a few other propositions in this
chapter by just appealing to the next proposition.

Proposition 5.6.3. Let ( X, dist) be a metric space and let a : N → X and


b : N → X be two sequences. Let p ∈ X and suppose that limn→∞ an =
p. Then limn→∞ bn = p if and only if

limn→∞ dist( an , bn ) = 0.

Proof. We first show the “only if” direction. Assume limn→∞ bn = p.


We need to show that

limn→∞ dist( an , bn ) = 0.

Let e > 0.
Because limn→∞ an = p, there exists an N0 ∈ N such that for all n ≥
N0 ,
dist( an , p) < e/2.
Because limn→∞ bn = p, there exists an N1 ∈ N such that for all n ≥
N1 ,
dist(bn , p) < e/2.
Choose N := max( N0 , N1 ).
Let n ≥ N. Because then n ≥ N0 and n ≥ N1 , we know
dist( an , bn ) ≤ dist( an , p) + dist( p, bn ) < e/2 + e/2 = e.

We now show the “if” direction. Assume limn→∞ dist( an , bn ) = 0. We


need to show that limn→∞ dist(bn , p) = 0.
Let e > 0.
Because limn→∞ an = p, there exists an N0 ∈ N such that for all n ≥
N0 ,
dist( an , p) < e/2.
Because limn→∞ dist( an , bn ) = 0, there exists an N2 ∈ N such that for
all n ≥ N2 ,
dist( an , bn ) = distR (dist( an , bn ), 0) < e/2.
Choose N := max( N0 , N2 ).
Let n ≥ N.
Because n ≥ N0 and n ≥ N2 , we find

dist(bn , p) ≤ dist(bn , an ) + dist( an , p)
            = dist( an , bn ) + dist( an , p)
            < e/2 + e/2 = e.
2 2

I’ve added the next corollary later in the year 2021–2022 for your
convenience, to highlight a consequence of the previous proposi-
tion.

Proposition 5.6.4 (Eventually equal sequences have the same limit).


Let ( X, dist) be a metric space and let a : N → X and b : N → X be
two sequences such that there exists an N ∈ N such that for all n ≥ N,

an = bn .

Then the sequence a : N → X converges if and only if the sequence



b : N → X converges. If the sequences converge, they have the same


limit.

5.7 Limit theorems for sequences taking values in a normed vector space
If we want to show that a sequence converges, or if we want to compute
its limit, we don’t always want to go back to the formal definition of a
limit. Instead, we can use a whole collection of theorems such as the next.
The first part of the next theorem says that the sum of two convergent
sequences (taking values in a normed vector space; after all, we need to
be able to add elements) is itself convergent. Theorems like these are
called limit theorems or limit laws.

Theorem 5.7.1. Let (V, ‖ · ‖) be a normed vector space. Let a : N → V


and b : N → V be two sequences. Assume that the limit limn→∞ an
exists and is equal to p ∈ V and that the limit limn→∞ bn exists and is
equal to q ∈ V. Let λ : N → R be a real-valued sequence. Let µ ∈ R.
Assume that limn→∞ λn = µ. Then

i. The limit limn→∞ ( an + bn ) exists and is equal to p + q.

ii. The limit limn→∞ (λn an ) exists and is equal to µp.

Proof. We leave the proof of (i) as an exercise and prove (ii), which is a
bit more difficult.
We need to show that limn→∞ (λn an ) = µp, i.e. we need to show that

for all e > 0,
there exists N ∈ N,
for all n ≥ N,
dist‖·‖ (λn an , µp) < e.

Let e > 0.

Since the sequence λ : N → R is convergent, it is bounded. Therefore
there exists an M > 0 such that for all n ∈ N,

|λn | ≤ M.

Since limn→∞ λn = µ, there exists an N0 ∈ N such that for all n ≥ N0 ,

distR (λn , µ) = |λn − µ| < e / (2(‖ p‖ + 1)).

We have divided here by (‖ p‖ + 1) rather than ‖ p‖ to not run into
trouble (i.e. to not divide by zero) when ‖ p‖ = 0.
Since limn→∞ an = p, there exists an N1 ∈ N such that for all n ≥ N1 ,

dist‖·‖ ( an , p) = ‖ an − p‖ < e / (2M).

Choose N := max( N0 , N1 ).
Let n ≥ N. Then

‖λn an − µp‖ = ‖λn an − λn p + λn p − µp‖
             = ‖λn ( an − p) + (λn − µ) p‖
             ≤ ‖λn ( an − p)‖ + ‖(λn − µ) p‖
             = |λn | ‖ an − p‖ + |λn − µ| ‖ p‖
             ≤ M ‖ an − p‖ + |λn − µ| ‖ p‖
             < M · e/(2M) + (e / (2(‖ p‖ + 1))) · ‖ p‖
             < e.
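As an informal numerical illustration of item (ii) (our own sketch, with V = R and arbitrarily chosen sequences), one can watch the products λn · an settle toward µ · p:

```python
# lambda_n = 2 + 1/n converges to mu = 2, a_n = 3 - 1/n converges to p = 3,
# so by item (ii) the products lambda_n * a_n should converge to mu*p = 6.
def lam(n: int) -> float:
    return 2 + 1 / n

def a(n: int) -> float:
    return 3 - 1 / n

def product(n: int) -> float:
    return lam(n) * a(n)

# errors along n = 10, 100, ..., 10^6 shrink toward 0
errors = [abs(product(10**k) - 6) for k in range(1, 7)]
```

The shrinking errors only illustrate the theorem; they are not a substitute for the ε–N argument above.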

5.8 Index shift


The next proposition is another example of a theorem that allows you to
conclude the existence of a certain limit without going back to the formal
definition. You can use it when you know that a sequence converges, to

conclude that the same sequence but with index shifted is also convergent.

Proposition 5.8.1 (Index shift). Let ( X, dist) be a metric space and let
a : N → X be a sequence in X. Let k ∈ N and p ∈ X. Then the
sequence ( an ) converges to p if and only if the sequence ( an+k )n (i.e.
the sequence n 7→ an+k ) converges to p.

The proof of Proposition 5.8.1 is the topic of Blue Exercise 5.9.2.

5.9 Exercises

5.9.1 Blue exercises

Exercise 5.9.1. Prove Proposition 5.4.1.


Exercise 5.9.2. Prove Proposition 5.8.1.

5.9.2 Orange exercises

Exercise 5.9.3. Prove item (i) of Theorem 5.7.1.

Exercise 5.9.4. Let ( X, dist) be a metric space and let a : N → X be a


bounded sequence in X. Let p ∈ X. Define also the sequence s : N → R
by
sk := sup{dist( al , p) | l ∈ N, l ≥ k }.

Show that limn→∞ an = p if and only if

infk∈N sk = 0.

Here, infk∈N sk is shorthand for

infk∈N sk := inf {sk | k ∈ N} .
Hint: Due to notation, this exercise may look intimidating. However, if
you let yourself be guided by the best practices and if you use the alterna-
tive characterization of the infimum, it becomes quite a bit easier than it
looks.
Chapter 6

Real-valued sequences

In this chapter, we specify to sequences a : N → R that take values in R.


The main additional aspect with respect to the previous chapter is the fact
that R has an order (≤). The most important result of the chapter is that
monotone bounded sequences are always convergent.

6.1 Terminology
Because the real numbers come with an order (≤), we can define increas-
ing, decreasing and monotone sequences.

Definition 6.1.1 (increasing, decreasing and monotone sequences). We


say a sequence ( an ) is increasing if for every n ∈ N, an+1 ≥ an . We say
it is strictly increasing if for every n ∈ N, an+1 > an . Similarly, we
say a sequence ( an ) is decreasing if for every n ∈ N, an+1 ≤ an and we
say it is strictly decreasing if for every n ∈ N, an+1 < an . We finally
say a sequence is (strictly) monotone if it is either (strictly) increasing or
(strictly) decreasing.

The main result of this chapter is that monotone, bounded sequences are
convergent. In order to introduce what it means for a sequence to be
bounded, we first introduce upper and lower bounds.


Definition 6.1.2 (upper bound and lower bound for a sequence). We


say that a number M ∈ R is an upper bound for a sequence a : N → R
if
for all n ∈ N,
an ≤ M.

We say a number m ∈ R is a lower bound for a sequence a : N → R if

for all n ∈ N,
m ≤ an .

Definition 6.1.3. We say a sequence a : N → R is bounded above if there


exists an M ∈ R such that M is an upper bound for a.
We say a sequence a : N → R is bounded below if there exists an m ∈ R
such that m is a lower bound for a.

In the previous chapter, we have already defined what it means for a se-
quence to be bounded. The next proposition relates the two definitions to
each other.

Proposition 6.1.4. Let a : N → R be a sequence. Then a : N → R


is bounded (in the sense of Definition 5.2.1) if and only if it is both
bounded above and bounded below (according to Definition 6.1.3).

6.2 Monotone, bounded sequences are convergent

Theorem 6.2.1. Let ( an ) be an increasing sequence that is bounded


from above. Then ( an ) is convergent and
 
limn→∞ an = supn∈N an = sup { an | n ∈ N} .

Proof. Because the sequence ( an ) is bounded from above, we know
that the supremum
supn∈N an
exists. To not get too lengthy expressions, we write

L := supn∈N an .

We need to show that for all e > 0, there exists an N ∈ N such that for
all n ≥ N,
| an − L| < e.
Let e > 0. Then, by the definition of the supremum, there exists a
k ∈ N such that
L − e < ak .
Choose N := k. Let n ≥ N. Because the sequence ( an ) is increasing,
we find that
an ≥ a N = ak > L − e.
Because of the definition of L, we also know that an ≤ L < L + e.
Summarizing,
| an − L| < e.
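A small numerical sketch (ours; the sequence is an arbitrary example, not from the notes) of what the theorem says: the increasing, bounded sequence an = 1 − 1/(n + 1) creeps up to its supremum 1, which is also its limit.

```python
# The increasing, bounded sequence a_n = 1 - 1/(n+1) illustrates
# Theorem 6.2.1: its limit is sup { a_n } = 1.
terms = [1 - 1 / (n + 1) for n in range(10_000)]

is_increasing = all(x < y for x, y in zip(terms, terms[1:]))
below_sup = all(t < 1 for t in terms)       # 1 is an upper bound
close_to_sup = 1 - terms[-1] < 1e-3         # terms approach sup = 1
```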

Theorem 6.2.2. Let ( an ) be a decreasing sequence that is bounded from


below. Then ( an ) is convergent and

limn→∞ an = infn∈N an .

6.3 Limit theorems


If you want to show that more complicated limits exist, and if you want
to compute their value, you wouldn’t want to have to use the definition
all the time. Instead there are much more efficient methods to show that
limits exist. They are called limit theorems.

Theorem 6.3.1 (Limit theorems for real-valued sequences). Let a : N →


R and b : N → R be two converging sequences, and let c, d ∈ R be
real numbers such that

lim an = c and lim bn = d.


n→∞ n→∞

Then

i. The limit limn→∞ ( an + bn ) exists and is equal to c + d.

ii. The limit limn→∞ ( an bn ) exists and is equal to c · d.

iii. If d ≠ 0, then the limit limn→∞ ( an /bn ) exists and is equal to c/d.

iv. For every nonnegative integer m ∈ N, the limit limn→∞ ( an )^m
exists and is equal to c^m .

v. If for every n ∈ N, the number an is nonnegative, then for every
positive integer k ∈ N \ {0}, the limit limn→∞ ( an )^(1/k) exists and
is equal to c^(1/k) .

Proof. Let us aim to prove item (v). We first show the statement for
c = 1: then for every k ∈ N \ {0}, the limit limn→∞ ( an )^(1/k) exists and
is equal to 1.
Let e > 0. Define e0 := min(e, 1/2). (We will prefer to work with
e0 over e because we know that e0 ≤ 1/2, which will be convenient
below when we want to take the kth root of (1 − e0 ).) Since an → c by
assumption, there exists an n0 ∈ N such that for every n ≥ n0 ,

| a n − 1 | < e0 .

Let n ≥ n0 . Then
1 − e0 < an < 1 + e0
and therefore

1 − e0 ≤ (1 − e0 )^(1/k) < ( an )^(1/k) < (1 + e0 )^(1/k) ≤ 1 + e0 .



Hence,
|( an )^(1/k) − 1| < e0 ≤ e.

Now suppose c > 0. Then we define a new sequence ã : N → R by
ãn = an /c. By item (iii) it holds that the sequence ã : N → R converges
to 1. By the previous part of the proof, we find that

limn→∞ ( ãn )^(1/k) = 1.

Note that an = ãn · c, so that item (ii) implies that

limn→∞ ( an )^(1/k) = limn→∞ ( ( ãn )^(1/k) c^(1/k) ) = ( limn→∞ ( ãn )^(1/k) ) · ( limn→∞ c^(1/k) ) = c^(1/k) .

Finally, we consider the case c = 0. Let e > 0. Then we may choose an
N ∈ N such that for all n ≥ N,

| an | < e^k .

Let n ≥ N. Then,

|( an )^(1/k) − 0^(1/k) | = | an |^(1/k) < e.

Example 6.3.2. Consider the sequence a : N → R defined (for n ≥ 1)
by
an := 3 + 1/n^2 .
We claim that the sequence a : N → R converges and that the limit
equals 3.
We will use limit theorems to prove this claim.
We know that the limit of the sequence n ↦ 1/n exists as this is a
standard limit (see Example 5.4.2).

The text here is optional, as on the one hand it is really required for
a rigorous proof but on the other hand the amount to write down
would be way too much for more involved limits.
By the limit theorem for powers, Theorem 6.3.1, item (iv), it follows
that the sequence n ↦ (1/n)^2 also converges.
We also know that the sequence n ↦ 3 converges, as this is a constant
sequence, see Proposition 5.4.1.

By the limit theorem for the sum and the power, we conclude that the
sequence a : N → R also converges and

limn→∞ an = limn→∞ ( 3 + 1/n^2 )
          = limn→∞ 3 + limn→∞ (1/n)^2
          = 3 + ( limn→∞ 1/n )^2
          = 3 + 0^2 = 3.

(What one really needs to do, when leaving out the optional text, is to
read the above chain of equalities from back to front, making sure
that all steps are justified. In particular, it is extremely important to
verify that all involved limits exist.)

Example 6.3.3. Consider the sequence a : N → R defined by

an := (3n^2 + 5n + 9) / (2n^2 + 3n + 7).

We claim that the sequence a : N → R converges, and that the limit
equals 3/2.
When confronted with a sequence that is given as a fraction of two
terms, the first thing to do is to divide numerator and denominator by
the fastest growing term in n. In this case, we need to divide by n^2 . We
get

an = (3 + 5 · (1/n) + 9 · (1/n^2)) / (2 + 3 · (1/n) + 7 · (1/n^2)).

We would like to use the limit theorem for quotients, namely Theorem
6.3.1, item (iii). However, to apply this limit theorem, we should really
make sure that the limit of numerator and denominator exist, and that
the limit of the denominator is not equal to 0. Whereas the previous
example had optional text to justify all the steps, here we leave it out.
We will use the strategy described at the end of the previous example,
to read chains of equalities backwards while making sure all involved
limits exist.
Note that the limit
limn→∞ 1/n
exists, and equals 0, as this is the standard limit from Example 5.4.2.
By the limit theorems it follows that

limn→∞ ( 2 + 3 · (1/n) + 7 · (1/n)^2 ) = limn→∞ 2 + 3 limn→∞ (1/n) + 7 ( limn→∞ (1/n) )^2
                                       = 2 + 0 + 0 = 2.

(Here, we read the chain of equalities backwards to make sure every
step is justified.) Because 2 ≠ 0, we may now apply the limit theorem

for the quotient, and find

limn→∞ an = limn→∞ (3 + 5 · (1/n) + 9 · (1/n^2)) / (2 + 3 · (1/n) + 7 · (1/n^2))
          = ( limn→∞ ( 3 + 5 · (1/n) + 9 · (1/n^2) ) ) / ( limn→∞ ( 2 + 3 · (1/n) + 7 · (1/n^2) ) )
          = ( limn→∞ 3 + 5 limn→∞ (1/n) + 9 ( limn→∞ (1/n) )^2 ) / 2
          = (3 + 0 + 0) / 2 = 3/2.
(Again we read the chain of equalities backwards to make sure every
step is justified.)
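A quick numerical sanity check of this example (our sketch, not part of the notes): evaluating the sequence at large n should give values near 3/2.

```python
# a_n = (3n^2 + 5n + 9) / (2n^2 + 3n + 7) should approach 3/2.
def a(n: int) -> float:
    return (3 * n**2 + 5 * n + 9) / (2 * n**2 + 3 * n + 7)

gap = abs(a(10**6) - 1.5)   # distance to the claimed limit
```

Such a check cannot prove convergence, but it is a cheap way to catch an arithmetic slip in the limit computation.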

6.4 The squeeze theorem

Theorem 6.4.1 (Squeeze theorem). Let a, b, c : N → R be three se-


quences. Suppose that there exists an N ∈ N such that for all n ≥ N,

an ≤ bn ≤ cn

and assume limn→∞ an = limn→∞ cn = L for some L ∈ R. Then the


limit limn→∞ bn exists and is equal to L.

Proof. Take three arbitrary sequences a, b, c : N → R, and assume that


there exists an N ∈ N such that for all n ≥ N,

an ≤ bn ≤ cn

and that limn→∞ an = limn→∞ cn = L for some L ∈ R. We need to



show that
for all e > 0,
there exists N0 ∈ N,
for all n ≥ N0 ,
|bn − L| < e.

Take e > 0 arbitrary. Since limn→∞ an = L, there exists an N1 ∈ N


such that for all n ≥ N1 , | an − L| < e. Since limn→∞ cn = L, there
exists an N2 ∈ N such that for all n ≥ N2 , |cn − L| < e. Now define
N0 := max( N, N1 , N2 ). Let n ≥ N0 . Then

L − e < an ≤ bn ≤ cn < L + e

so that indeed, |bn − L| < e.

The squeeze theorem is a great tool to show the existence of limits and to
compute limits for sequences that can easily be compared to other
sequences, as in the next example.

Example 6.4.2. Consider the sequence b : N → R defined by

bn := sin(n) / (n + 1).

We can use the squeeze theorem to show that

limn→∞ bn = 0.

Because for every n ∈ N, it holds that

−1 ≤ sin(n) ≤ 1,

we know that

−1/(n + 1) ≤ sin(n)/(n + 1) ≤ 1/(n + 1).

We know by the standard limit in Example 5.4.2 and by index shift
(Proposition 5.8.1) that

limn→∞ 1/(n + 1) = 0

and we know by the limit theorems that then also

limn→∞ −1/(n + 1) = 0.

It follows by the squeeze theorem that

limn→∞ bn = 0

as well.
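The squeezing bounds from this example can be checked numerically; the sketch below (ours) verifies the sandwich inequality on a range of indices and evaluates the sequence at a large n.

```python
import math

# b_n = sin(n)/(n+1) is squeezed between -1/(n+1) and 1/(n+1).
def b(n: int) -> float:
    return math.sin(n) / (n + 1)

squeezed = all(-1 / (n + 1) <= b(n) <= 1 / (n + 1) for n in range(10_000))
```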

6.5 Divergence to ∞ and −∞

Definition 6.5.1. We say a sequence ( an ) diverges to ∞, and write

limn→∞ an = ∞

if
for all M ∈ R,
there exists N ∈ N,
for all n ≥ N,
an > M.

Similarly, we say a sequence ( an ) diverges to −∞, and write

limn→∞ an = −∞

if
for all M ∈ R,
there exists N ∈ N,
for all n ≥ N,
an < M.

Proposition 6.5.2. Let a : N → R be a sequence such that

limn→∞ an = ∞.

Then the sequence ( an ) is bounded from below.
Similarly, let b : N → R be a sequence such that

limn→∞ bn = −∞.

Then the sequence (bn ) is bounded from above.

6.6 Limit theorems for improper limits

Theorem 6.6.1. Let a, b, c, d : N → R be four sequences such that

limn→∞ an = ∞ and limn→∞ cn = −∞,

the sequence (bn ) is bounded from below and the sequence (dn ) is
bounded from above. Let λ : N → R be a sequence bounded below
by some µ > 0. Then

i. limn→∞ ( an + bn ) = ∞.

ii. limn→∞ (cn + dn ) = −∞.

iii. limn→∞ (λn an ) = ∞.



iv. limn→∞ (λn cn ) = −∞.

Proposition 6.6.2. Let a : N → R be a real-valued sequence. Let b :
N → (0, ∞) be a real-valued sequence taking on only strictly positive
values. Then

i. limn→∞ an = ∞ if and only if limn→∞ (− an ) = −∞.

ii. limn→∞ bn = ∞ if and only if limn→∞ 1/bn = 0.

6.7 Standard sequences

6.7.1 Geometric sequence

Proposition 6.7.1 (Standard limit of geometric sequence). Let q ∈ R.
The sequence ( an ) defined by an := q^n for n ∈ N

• converges to 0 if q ∈ (−1, 1)

• converges to 1 if q = 1

• diverges to ∞ if q > 1

• diverges, but not to ∞ or −∞ if q ≤ −1.

Proof. If q = 0 then it is clear that the sequence n ↦ q^n converges to 0.
If q = 1 then it is clear that the sequence n ↦ q^n converges to 1.
If q ∈ (0, 1), then the sequence ( an ) is decreasing and bounded from
below by 0. Therefore, the sequence ( an ) is convergent; call its limit s.
By the index shift and the limit theorems,

s = limn→∞ an+1 = q limn→∞ an = qs

so s = 0.

If q ∈ (−1, 0), then

−|q|^n ≤ q^n ≤ |q|^n

and it follows from the squeeze theorem and the previous part of the
proof that limn→∞ q^n = 0.
Now assume q > 1. We will show that the sequence n ↦ q^n diverges
to ∞. Let M ∈ R. We may assume that M > 0 (otherwise we may
replace M by 1). We can write q = 1 + b for some b > 0.
Choose N := ⌈M/b⌉. Let n ≥ N. By the Bernoulli inequality, it
follows that

q^n = (1 + b)^n ≥ 1 + nb ≥ 1 + Nb ≥ 1 + (M/b) · b = 1 + M > M.
Finally, we consider the case q ≤ −1. Suppose the sequence n ↦ q^n
converges to some r ≥ 0. Then we may choose an N ∈ N such
that for all n ≥ N, |q^n − r | < 1/2. Therefore,

r − q^(2N+1) < 1/2,

which is a contradiction because q^(2N+1) ≤ −1, so that r − q^(2N+1) ≥ r + 1 ≥ 1.
In a similar way, we can rule out that n ↦ q^n converges to some
r < 0, or diverges to ∞ or diverges to −∞.
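The three regimes of the proposition can be seen numerically; the sketch below (ours, with arbitrary sample values of q) evaluates q^n in each case.

```python
# q in (-1, 1): q^n -> 0; q > 1: q^n grows past any bound;
# q = -1: the values oscillate between 1 and -1, so no limit exists.
small = abs(0.9 ** 1000)
big = 1.01 ** 10_000
osc = [(-1.0) ** n for n in range(6)]
```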

6.7.2 The nth root of n

Proposition 6.7.2 (Standard limit of ( n^(1/n) )). The sequence (bn ) defined
by bn := n^(1/n) converges to 1.

Proof. We write bn = 1 + dn , where dn ≥ 0. By the limit theorems, it
suffices to show that limn→∞ dn = 0. Note that bn^n = (1 + dn )^n = n.
Let n ≥ 2. Then

1 + (n choose 2) dn^2 ≤ ∑_{k=0}^{n} (n choose k) dn^k = (1 + dn )^n = n.

Therefore,

0 < dn^2 ≤ 2/n

and

0 < dn ≤ √(2/n).

The limit of the left-hand side is zero, and by the limit theorems, we
know that the limit of the right-hand side is 0 as well. It follows by the
squeeze theorem that limn→∞ dn = 0.
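The proof's explicit bound dn ≤ √(2/n) is easy to test numerically; the sketch below (ours) checks it for a range of n and evaluates n^(1/n) at a large index.

```python
# d_n = n^(1/n) - 1 should satisfy 0 < d_n <= sqrt(2/n) for n >= 2.
def d(n: int) -> float:
    return n ** (1 / n) - 1

bounded = all(0 < d(n) <= (2 / n) ** 0.5 for n in range(2, 5_000))
```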

Corollary 6.7.3. Let a > 0. Then the sequence (bn ) defined by bn := a^(1/n)
converges to 1.

6.7.3 The number e

In this chapter we are going to introduce the number e as follows. We will
first define the sequence

an := (1 + 1/n)^n .

We will show that this sequence is bounded and increasing. It therefore
has a limit value, and that limit value is called e.

Lemma 6.7.4. The sequence ( an ) defined by an := (1 + 1/n)^n for n ∈
N \ {0} and a0 = 1 is strictly increasing.

Proof. We need to show that for all n ∈ N, an < an+1 . Let n ∈ N
be larger than or equal to 1. We can just write out

an = ∑_{k=0}^{n} (n choose k) (1/n)^k = ∑_{k=0}^{n} ( n! / (k! (n − k)!) ) (1/n)^k

whereas

an+1 = ∑_{k=0}^{n+1} ( (n + 1)! / (k! (n + 1 − k)!) ) (1/(n + 1))^k .

How to compare these and show that an < an+1 ? First, because all
terms in the sum are positive, we can estimate an+1 from below by
forgetting the last term:

an+1 > ∑_{k=0}^{n} ( (n + 1)! / (k! (n + 1 − k)!) ) (1/(n + 1))^k .

Next, we will show that each term in this sum is at least as large as the
corresponding term for an . We can see this better if we rewrite

an+1 > ∑_{k=0}^{n} (1/k!) · ((n + 1)/(n + 1)) · (((n + 1) − 1)/(n + 1)) · · · (((n + 1) − (k − 1))/(n + 1))
     ≥ ∑_{k=0}^{n} (1/k!) · (n/n) · ((n − 1)/n) · · · ((n − (k − 1))/n)
     = an .

Lemma 6.7.5. The sequence ( an ) defined by an = (1 + 1/n)^n for n ∈
N \ {0} (and a0 = 1) is bounded from above by 3.

Proof. Again we write

an = ∑_{k=0}^{n} (n choose k) (1/n)^k
   = ∑_{k=0}^{n} (1/k!) · (n/n) · ((n − 1)/n) · · · ((n − (k − 1))/n)
   ≤ ∑_{k=0}^{n} 1/k! = 1 + ∑_{k=1}^{n} 1/k! ≤ 1 + ∑_{k=1}^{n} 1/2^(k−1)
   ≤ 1 + 2 = 3.

By the previous lemmas, the sequence

n ↦ (1 + 1/n)^n

converges. Let’s record in the next definition that we call the limit e.

Definition 6.7.6 (Standard limit corresponding to the number e). We
define
e := limn→∞ (1 + 1/n)^n .

6.7.4 Exponentials beat powers

Proposition 6.7.7. Let a ∈ (1, ∞) and let p ∈ (0, ∞). Then

limn→∞ n^p / a^n = 0.

Proof. Define b := a − 1 > 0, so that a = 1 + b. By the Archimedean
property there exists an M ∈ N such that M > p + 1. Define N := 2M.
We now claim that for all n ≥ N,

a^n ≥ ( n^M / (2^M M!) ) b^M .

Indeed, let n ≥ N. First note that because n ≥ 2M, we know

n − M ≥ n/2.          (6.7.1)

We now compute

a^n = (1 + b)^n = ∑_{k=0}^{n} (n choose k) b^k
    ≥ (n choose M) b^M
    = (1/M!) n (n − 1) · · · (n − M + 1) b^M
    ≥ (n/2)^M (1/M!) b^M

where for the last inequality we used (6.7.1). This proves our claim.
Since M > p + 1 we find that for all n ≥ N,

0 < n^p / a^n < 2^M M! (1/b^M ) (1/n).

We know that

limn→∞ 2^M M! (1/b^M ) (1/n) = 0

by limit theorems and the standard limit limn→∞ 1/n = 0. Therefore, it
holds by the squeeze theorem (Theorem 6.4.1) that

limn→∞ n^p / a^n = 0.

Sequences with values in Rd

(Note: this topic originally occurred further down the lecture notes, but
I have moved it forward so we may get more concrete examples of se-
quences.)

Proposition 6.7.8. Consider the metric space (Rd , ‖ · ‖2 ). Let z ∈ Rd
and let x : N → Rd be a sequence (we are going to denote this se-
quence also as ( x^(n) )). Denote by yi the ith component of a vector
y ∈ Rd . Then the sequence ( x^(n) ) converges to z if and only if for all
i ∈ {1, . . . , d}, the sequence ( xi^(n) ) converges to zi .

Example 6.7.9. Consider the sequence x : N → R2 taking values in
the normed vector space (R2 , ‖ · ‖2 ), defined by

x^(n) := ( 1/n , (1/2)^n )

for n ≥ 1. We use a superscript for the index (n) of the sequence, so
that we can use subscripts for the components of the sequence, i.e. the
first component sequence ( x1^(n) ) is given by

x1^(n) = 1/n

and the second component sequence ( x2^(n) ) is given by

x2^(n) = (1/2)^n .

By standard limits, we know that both

limn→∞ x1^(n) = limn→∞ 1/n = 0

and
limn→∞ x2^(n) = limn→∞ (1/2)^n = 0.

By Proposition 6.7.8 it follows that the sequence x : N → R2 converges
to
( limn→∞ 1/n , limn→∞ (1/2)^n ) = (0, 0) = 0.

Note how in the last term we use the notation 0 for the 0-vector in the
vector space R2 .

6.8 Exercises

6.8.1 Blue exercises

Exercise 6.8.1. Prove Proposition 6.1.4.

Exercise 6.8.2. Prove item (i) of Proposition 6.6.2.

6.8.2 Orange exercises

Exercise 6.8.3. Prove item (ii) of Proposition 6.6.2.

Exercise 6.8.4. Prove the statement about the sequence (bn ) in Proposition
6.5.2.

Exercise 6.8.5. Define the sequence x : N → R recursively by

xn+1 := (2 + xn^2) / (2xn)

for n ∈ N while x0 = 2. Prove that the sequence x : N → R converges
and determine its limit.

Exercise 6.8.6. Determine whether the following sequences converge, di-
verge to ∞, diverge to −∞ or diverge in a different way. In case the se-
quence converges, determine the limit.

an := 1/n^3 − 3        bn := (5n^5 + 2n^2) / (3n^5 + 7n^3 + 4)        cn := n − √n

dn := 2^n / n^100      en := √(n^2 + n) − n                          fn := (3n^2)^(1/n)

gn := (2^n + 5n^200) / (3^n + n^10)        hn := (−1)^n 3^n          in := (5^n + n^2)^(1/n)
Chapter 7

Series

7.1 Definitions

Definition 7.1.1. Let (V, ‖ · ‖) be a normed vector space and let a :
N → V be a sequence in V. Let K ∈ N. We say that a series

∑_{n=K}^{∞} an

is convergent if the associated sequence of partial sums SK : N → V, i.e.
the sequence (SK^n )n∈N , converges. The term SK^n is, for n ∈ N, defined
as

SK^n := ∑_{k=K}^{n} ak .

If K = 0, we usually just write S^n or even Sn instead of S0^n .
If the series ∑_{n=K}^{∞} an is convergent, the value of the series is by defini-
tion equal to the limit of the sequence of partial sums, i.e.

∑_{k=K}^{∞} ak := limn→∞ SK^n = limn→∞ ∑_{k=K}^{n} ak .


7.2 Geometric series

In this and the next section we will give some examples of sums and series
taking values in the real line (or, to be specific, the normed vector space
(R, | · |)).

Proposition 7.2.1. Let a ≠ 1 and n ∈ N. Then

∑_{k=0}^{n} a^k = (1 − a^(n+1)) / (1 − a).

Proof. We consider

(1 − a) ∑_{k=0}^{n} a^k = ∑_{k=0}^{n} a^k − a ∑_{k=0}^{n} a^k
                        = ∑_{k=0}^{n} a^k − ∑_{k=0}^{n} a^(k+1)
                        = ∑_{k=0}^{n} a^k − ∑_{k=1}^{n+1} a^k
                        = 1 − a^(n+1) .

Proposition 7.2.2. Let a ∈ (−1, 1). Then the series

∑_{k=0}^{∞} a^k

is convergent and has the value

∑_{k=0}^{∞} a^k = 1/(1 − a).

Proof. By Proposition 7.2.1 it follows for the partial sums that

Sn := ∑_{k=0}^{n} a^k = (1 − a^(n+1)) / (1 − a).

Because
limn→∞ a^(n+1) = 0
by index shift and Proposition 6.7.1, we find with the limit laws that
limn→∞ Sn exists as well and equals

∑_{k=0}^{∞} a^k := limn→∞ Sn = limn→∞ (1 − a^(n+1)) / (1 − a) = 1/(1 − a).
k =0

7.3 The harmonic series


Example 7.3.1 (Harmonic series). The series

∑_{k=1}^∞ 1/k

diverges.

Proof. Consider for every ℓ ∈ N the partial sum

S_{2^ℓ} = ∑_{k=1}^{2^ℓ} 1/k.

Note that for k ∈ {2^ℓ + 1, . . . , 2^{ℓ+1}} we have that

1/k ≥ 1/2^{ℓ+1}.

We can conclude that

S_{2^{ℓ+1}} − S_{2^ℓ} ≥ 2^ℓ · (1/2^{ℓ+1}) = 1/2.

We can show by induction that

S_{2^ℓ} ≥ ℓ/2.

Note also that the sequence of partial sums (S_n) is increasing. Therefore, the sequence of partial sums (S_n) diverges to infinity.
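The inequality S_{2^ℓ} ≥ ℓ/2 driving the proof can be observed numerically (a Python sketch, not a proof — it only checks finitely many ℓ):

```python
# Check the key inequality S_{2^l} >= l/2 from the divergence proof.
def harmonic_partial_sum(n):
    """Return S_n = 1/1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

for l in range(1, 12):
    assert harmonic_partial_sum(2**l) >= l / 2
print(harmonic_partial_sum(2**12))  # keeps growing, but only logarithmically
```

The growth is very slow: doubling the number of terms adds at least 1/2 to the sum, which is exactly what the proof exploits.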

7.4 The hyperharmonic series

Example 7.4.1 (Hyperharmonic series). Let p > 1. Then the series

∑_{k=1}^∞ 1/k^p

converges.

Proof. For ℓ ∈ N \ {0} we now consider the partial sums

S_{2^ℓ − 1} = ∑_{k=1}^{2^ℓ − 1} 1/k^p.

For every m ∈ N \ {0} and for k ∈ {2^{m−1}, . . . , 2^m − 1} we have that

1/k^p ≤ (1/2^{m−1})^p = 1/2^{p(m−1)}.

Since there are 2^{m−1} such terms, we find that

S_{2^m − 1} − S_{2^{m−1} − 1} ≤ 2^{m−1} · (1/2^{p(m−1)}) = (1/2^{p−1})^{m−1}.

Therefore

S_{2^ℓ − 1} = ∑_{m=1}^ℓ ∑_{k=2^{m−1}}^{2^m − 1} 1/k^p ≤ ∑_{m=1}^ℓ (1/2^{p−1})^{m−1}.

We recognize the last sum as a geometric sum, and conclude that

S_{2^ℓ − 1} ≤ 1/(1 − 1/2^{p−1}).

The last bound is independent of ℓ. We then know that the sequence (S_n) is increasing and bounded from above, and therefore convergent.
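The uniform bound 1/(1 − 1/2^{p−1}) from the proof can be checked numerically for p = 2, where it equals 2 (a Python sketch, not part of the notes):

```python
# For p > 1 the proof bounds every partial sum by 1 / (1 - 1/2^(p-1)).
def p_series_partial_sum(p, n):
    """Return sum_{k=1}^{n} 1/k^p."""
    return sum(1.0 / k**p for k in range(1, n + 1))

p = 2
bound = 1 / (1 - 1 / 2**(p - 1))  # equals 2 for p = 2
for n in (10, 100, 1000, 10000):
    assert p_series_partial_sum(p, n) <= bound
```

The partial sums stay below the bound while increasing, which is exactly the monotone-and-bounded argument of the proof.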

Example 7.4.2. Here is an example of a series taking values in the normed vector space (R², ‖·‖₂):

∑_{k=1}^∞ (1/k², (1/2)^k).

7.5 Only the tail matters for convergence

Lemma 7.5.1. Let (V, ‖·‖) be a normed vector space and let a : N → V be a sequence taking values in V. Let K, L ∈ N. The series

∑_{n=K}^∞ a_n

is convergent if and only if the series

∑_{n=L}^∞ a_n

is convergent. Moreover, if either of the series converges, and K < L, then

∑_{n=K}^∞ a_n = ∑_{n=K}^{L−1} a_n + ∑_{n=L}^∞ a_n.        (7.5.1)

Proof. Without loss of generality, we may assume that K < L. We then know that for all n ≥ L,

S_K^n = ∑_{k=K}^{L−1} a_k + S_L^n.

Suppose that the series

∑_{n=L}^∞ a_n

is convergent. By definition, this means that the sequence (S_L^n)_n converges. By limit theorems, it follows that (S_K^n)_n converges as well, and

∑_{k=K}^∞ a_k = lim_{n→∞} ∑_{k=K}^n a_k
             = ∑_{k=K}^{L−1} a_k + lim_{n→∞} S_L^n
             = ∑_{k=K}^{L−1} a_k + ∑_{k=L}^∞ a_k,

which shows the equality (7.5.1).
Conversely, suppose that the series

∑_{n=K}^∞ a_n

is convergent. By definition, this means that the sequence (S_K^n)_n converges. Since

S_L^n = S_K^n − ∑_{k=K}^{L−1} a_k,

it follows again by limit theorems that the sequence (S_L^n)_n converges.

Proposition 7.5.2. Let a : N → V be a sequence, let M ∈ N and assume that the series

∑_{k=M}^∞ a_k

is convergent. Then

lim_{m→∞} ∑_{k=m}^∞ a_k = 0.

Proof. The sequence of partial sums n ↦ S_M^n is convergent, with limit

L := ∑_{k=M}^∞ a_k.

We know by Lemma 7.5.1 that for m > M,

∑_{k=M}^∞ a_k = ∑_{k=M}^{m−1} a_k + ∑_{k=m}^∞ a_k.

Rearranging terms, we find

∑_{k=m}^∞ a_k = ∑_{k=M}^∞ a_k − ∑_{k=M}^{m−1} a_k.

By using limit theorems and index shift we find that the right-hand side converges to 0 as m → ∞.
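For the geometric series the tails can even be written in closed form, which makes the proposition concrete (a Python sketch; the closed form for the tail follows from Lemma 7.5.1 and Proposition 7.2.2):

```python
# Illustration of Proposition 7.5.2 with the geometric series (a = 1/2):
# the tail sum_{k=m}^{infinity} a^k equals a^m / (1 - a) and tends to 0.
a = 0.5

def tail(m):
    """Closed form of the tail starting at index m."""
    return a**m / (1 - a)

tails = [tail(m) for m in range(20)]
assert all(s > t for s, t in zip(tails, tails[1:]))  # strictly decreasing
assert tail(50) < 1e-14                              # the tails tend to 0
```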

Proposition 7.5.3 (Index shift for series). Let a : N → V be a sequence, let M ∈ N and let ℓ ∈ N. Then the series

∑_{k=M}^∞ a_k

converges if and only if

∑_{k=M}^∞ a_{k+ℓ}

converges. Moreover, if either series converges,

∑_{k=M}^∞ a_{k+ℓ} = ∑_{k=M+ℓ}^∞ a_k.

7.6 Divergence test

Proposition 7.6.1. Let (V, ‖·‖) be a normed vector space, and let a : N → V be a sequence in V. Suppose the series ∑_{n=0}^∞ a_n is convergent. Then

lim_{n→∞} a_n = 0.

Proof. Suppose ∑_{n=0}^∞ a_n converges to L ∈ V. Then, for n ≥ 1,

a_n = S_n − S_{n−1},

where S_n denotes the partial sum ∑_{k=0}^n a_k. Because (S_n) and (S_{n−1}) both converge to L, the sequence (a_n) is convergent as well and converges to L − L = 0.

The following is a very simple, but often useful test for divergence.

Theorem 7.6.2 (Divergence test). Let (V, ‖·‖) be a normed vector space, and let a : N → V be a sequence in V. Suppose the limit lim_{n→∞} a_n does not exist or is not equal to 0. Then the series

∑_{n=0}^∞ a_n

is divergent.
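Keep in mind that the test only works in one direction: terms tending to 0 do not guarantee convergence. A small numerical illustration with the harmonic series (a Python sketch; the threshold 10 is chosen arbitrarily):

```python
# Terms of the harmonic series tend to 0 ...
assert 1 / 10**6 < 1e-5
# ... yet the partial sums still pass any bound; here we watch them pass 10.
s, n = 0.0, 0
while s <= 10:
    n += 1
    s += 1 / n
print(n)  # on the order of twelve thousand terms are needed
```

So the divergence test can only ever prove divergence, never convergence.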

7.7 Limit laws for series


Theorem 7.7.1 (Limit laws for series). Let (V, ‖·‖) be a normed vector space. Let a : N → V and b : N → V be two sequences. Suppose that the series

∑_{n=0}^∞ a_n  and  ∑_{n=0}^∞ b_n

are convergent. Suppose λ ∈ R. Then

i. The series

∑_{n=0}^∞ (a_n + b_n)

is convergent and converges to

∑_{n=0}^∞ a_n + ∑_{n=0}^∞ b_n.

ii. The series

∑_{n=0}^∞ (λ a_n)

is convergent and converges to

λ ∑_{n=0}^∞ a_n.

7.8 Exercises

7.8.1 Blue exercises

Exercise 7.8.1. Let a : N → R be a real-valued sequence. Define the sequence b : N → R by

b_n := a_{n+1} − a_n,  for n ∈ N.

i. Show that the series

∑_{n=0}^∞ b_n

converges if and only if the sequence a converges.

ii. Show that if the sequence a : N → R converges, then

lim_{n→∞} a_n = a_0 + ∑_{n=0}^∞ b_n.

Exercise 7.8.2. Show part (i) of Theorem 7.7.1.

7.8.2 Orange exercises

Exercise 7.8.3. We consider in this exercise sequences taking values in the normed vector space (R², ‖·‖₂) (recall that this is R² with the standard Euclidean norm). Give an example of a sequence a : N → R² such that

i. lim_{n→∞} a_n = 0,

ii. ∑_{n=1}^∞ a_n diverges.

(As always, prove that your example satisfies these properties.)

Exercise 7.8.4. Determine whether the following series converge or diverge. As always, give a proof of your statement.

(a) ∑_{k=3}^∞ 2/k³        (b) ∑_{k=1}^∞ k        (c) ∑_{k=1}^∞ (1 + 1/k)^k

(d) ∑_{k=1}^∞ (−1)^k (1/3)^{2k}        (e) ∑_{k=0}^∞ (2k + 3)/((k + 1)²(k + 2)²)        (f) ∑_{k=1}^∞ √(2^{−k} + 3^{−k})
Chapter 8

Series with positive terms

In this chapter, we will consider a very special, but very important type of series: series with real, positive terms.
The chapter gives tools for answering the question: does a series of positive terms converge or does it diverge? So far, we only know this for very specific series: we have seen that the harmonic series diverges, the hyperharmonic series converges, and the geometric series ∑_{k=0}^∞ q^k converges if and only if q ∈ (−1, 1). With the tools in this chapter, however, we can conclude for many more series that they converge or diverge.
As an example, consider the series

∑_{k=2}^∞ k/(k² − 1).

For large k, the terms in this series, namely k/(k² − 1), are very close to 1/k. We may therefore expect that the series diverges, just like the harmonic series does. In this chapter, we will see various theorems that allow you to rigorously derive this conclusion.

8.1 Comparison test


Theorem 8.1.1 (Comparison test). Let a : N → [0, ∞) and b : N → [0, ∞) be two sequences. Assume that there exists an N ∈ N such that for all n ≥ N, a_n ≤ b_n.

i. Suppose the series ∑ b_n converges; then the series ∑ a_n converges as well.

ii. Suppose the series ∑ a_n diverges; then the series ∑ b_n diverges as well.

Proof. We first show (i). Suppose the series ∑ b_n converges. Denote

S_n := ∑_{k=N}^n a_k,    T_n := ∑_{k=N}^n b_k.

Then we know for every n ≥ N that

S_n ≤ T_n ≤ ∑_{k=N}^∞ b_k.

The sequence (S_n) is therefore bounded and increasing, thus convergent.
We now show (ii). Suppose the series ∑ a_n diverges. Since its terms are nonnegative, its partial sums are increasing, so they diverge to ∞. Let M ∈ N. Then there exists an n_0 ∈ N such that for all n ≥ n_0,

∑_{k=N}^n a_k > M.

Then also

∑_{k=N}^n b_k > M.

Therefore the series ∑_{k=N}^∞ b_k diverges. Then also the series ∑_{k=0}^∞ b_k diverges.

Example 8.1.2. Consider the series

∑_{k=2}^∞ k/(k² − 1).

We would like to determine whether this series diverges or converges. There are usually two stages to reaching a conclusion: the first is building intuition, the second is setting up a rigorous argument.
Let us discuss the intuition first. For that, it is helpful to squint your eyes and get a feeling for the approximate behavior of the terms when k is large. Since the terms

k/(k² − 1)

are very close to 1/k for large k, we may expect that this series diverges, as the harmonic series ∑ 1/k diverges as well.
The previous theorem allows us to turn this intuition into a precise argument, which looks as follows. We first observe that for all k ≥ 2,

k/(k² − 1) ≥ k/k² = 1/k.

Because the series

∑_{k=2}^∞ 1/k

diverges, the series

∑_{k=2}^∞ k/(k² − 1)

diverges as well by the comparison test, Theorem 8.1.1.
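The pointwise inequality that the argument rests on is easy to confirm numerically for many values of k (a Python sketch, not a substitute for the algebraic proof):

```python
# Check the inequality from Example 8.1.2: k/(k^2 - 1) >= 1/k for k >= 2.
for k in range(2, 10_000):
    assert k / (k**2 - 1) >= 1 / k

# The gap is 1/(k(k^2 - 1)), which shrinks quickly:
print(2 / (2**2 - 1) - 1 / 2)  # largest gap, at k = 2
```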

Warning: Whenever you want to apply the Comparison Test, as with all theorems, you first need to check its conditions. If you want to show that a series ∑_{k=N}^∞ a_k converges by comparing it to a series ∑_{k=N}^∞ b_k, you need to show that there exists some N ∈ N such that for all n ≥ N, a_n ≤ b_n, and you need to show that the series ∑_{k=N}^∞ b_k indeed converges.
In particular, do not write

∑_{k=N}^∞ a_k ≤ ∑_{k=N}^∞ b_k

before you have applied the Comparison Test, because before you have concluded the convergence of the left-hand side, the statement does not make sense.

8.2 Limit comparison test


As a motivation for the next theorem, consider the series

∑_{k=2}^∞ k/(k² + 1).

Just as in the previous example, the terms

k/(k² + 1)

are very close to 1/k for large k, so we might still expect that the series diverges, because the standard harmonic series diverges as well. However, in contrast to the previous example, the terms

1/k

are larger than the terms k/(k² + 1). That means we cannot apply the Comparison Test directly. There is, however, a very convenient way around this, formalized by the following theorem.

Theorem 8.2.1 (Limit comparison test). Let a : N → [0, ∞) and b : N → (0, ∞) be two sequences.

i. Assume the series ∑ b_k converges and assume the limit

lim_{n→∞} a_n/b_n

exists (i.e. the sequence n ↦ a_n/b_n converges). Then the series ∑ a_k converges as well.

ii. Assume the series ∑ b_k diverges and assume that either the limit

lim_{n→∞} a_n/b_n

exists and is strictly larger than zero, or that

lim_{n→∞} a_n/b_n = ∞.

Then the series ∑ a_k diverges as well.

Proof. We show item (i). Assume ∑ b_k converges and that the limit

lim_{n→∞} a_n/b_n

exists. Let's call the limit L ∈ [0, ∞). Since

lim_{n→∞} a_n/b_n = L,

we have that

for all ε > 0,
there exists N ∈ N,
for all n ≥ N,        (8.2.1)
|a_n/b_n − L| < ε.

Choose ε := 1 in (8.2.1). Then there exists an N ∈ N such that for all n ≥ N,

|a_n/b_n − L| < 1.

Choose such an N. We claim that for all n ≥ N,

a_n ≤ b_n (L + 1).

Indeed, let n ≥ N. Then

a_n/b_n − L < 1,

so that

a_n < b_n (L + 1).

Since the series ∑ b_k converges, by the limit laws for series, Theorem 7.7.1, the series

∑_{k=N}^∞ b_k (L + 1)

converges as well. Therefore, by the comparison test, Theorem 8.1.1, we find that the series

∑_{k=N}^∞ a_k

converges as well.

Example 8.2.2. Let us see how we can use the limit comparison test to conclude that the series

∑_{k=2}^∞ k/(k² + 1)

diverges. For this, we will apply part (ii) of the Limit Comparison Test, Theorem 8.2.1. We use sequences a : N → (0, ∞) and b : N → (0, ∞) defined for k ≥ 2 by

a_k := k/(k² + 1)

and

b_k := 1/k.

(In general, for the comparison sequence b_k it is good to try a sequence for which you understand well whether the corresponding series diverges or converges, while at the same time you believe, have the intuition, the inkling or the guess that a_k and b_k are close for k large.)
Then

a_k/b_k = (k/(k² + 1))/(1/k) = 1/(1 + 1/k²).

By limit laws, we find that the limit of the denominator is 1, i.e.

lim_{k→∞} (1 + 1/k²) = lim_{k→∞} 1 + lim_{k→∞} 1/k² = 1 + 0 = 1.

Therefore, we may apply the limit law for the quotient and conclude that

lim_{k→∞} a_k/b_k = 1/lim_{k→∞}(1 + 1/k²) = 1/1 = 1.

The series ∑_{k=2}^∞ 1/k diverges and therefore it follows from the Limit Comparison Test that the series

∑_{k=2}^∞ a_k = ∑_{k=2}^∞ k/(k² + 1)

diverges as well.
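One can watch the ratio a_k/b_k approach 1 numerically (a Python sketch illustrating the computation in the example):

```python
# The ratio a_k / b_k from Example 8.2.2 indeed approaches 1.
a = lambda k: k / (k**2 + 1)
b = lambda k: 1 / k

for k in (10, 1000, 100000):
    print(k, a(k) / b(k))  # equals 1/(1 + 1/k^2), ever closer to 1

assert abs(a(10**5) / b(10**5) - 1) < 1e-9
```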

Let me also say a word about a crucial technique we used in Theorem 8.2.1: we used that because the sequence a_n/b_n converges to L, there exists an N ∈ N such that for all n ≥ N,

L − 1 < a_n/b_n < L + 1.

This expresses that we have some pretty good control on the terms a_n/b_n when n is larger than or equal to N.
Similarly, it is maybe good to ponder the fact that if a sequence c : N → R converges to some L ∈ R, then there exists an N_1 ∈ N such that for all n ≥ N_1,

L − 1/58249104762 < c_n < L + 1/58249104762.

The number 1/58249104762 was a random result of fingers hitting the keyboard. It is true that the existence of such an N_1 is a fairly direct consequence of the definition of convergence of a sequence, yet sometimes it takes some time getting used to what such a definition can actually do for you.

8.3 Ratio test


The next test, called the ratio test, is very convenient for determining that a series such as

∑_{k=0}^∞ 2^k/k!

converges. Interestingly, such series occur very often ‘in the wild’. We are not ready to show this yet, but at some point we will see that the value of the series is actually equal to e², where e was introduced in Section 6.7.3 as the limit

e := lim_{n→∞} (1 + 1/n)^n.

Theorem 8.3.1 (Ratio Test). Let a : N → (0, ∞) be a sequence only taking on strictly positive values.

i. If there exists an N ∈ N and a q ∈ (0, 1) such that for all n ≥ N, it holds that

a_{n+1}/a_n ≤ q,

then the series ∑ a_k converges.

ii. If there exists an N ∈ N such that for all n ≥ N, it holds that

a_{n+1}/a_n ≥ 1,

then the series ∑ a_k diverges.

Proof. We first show (i). So assume there exists an N ∈ N and a q ∈ (0, 1) such that for all n ≥ N, it holds that

a_{n+1}/a_n ≤ q.

Then it holds for all k ∈ N that

0 < a_{N+k} ≤ q^k a_N.

Note that the series

∑_{k=0}^∞ q^k

is convergent, as it is a standard geometric series with |q| < 1. By Theorem 7.7.1, part (ii), the series

∑_{k=0}^∞ q^k a_N

is convergent as well. Therefore, we find by the Comparison Test that the series

∑_{k=N}^∞ a_k

is convergent as well.
We now show (ii). Assume there exists an N ∈ N such that for all n ≥ N, it holds that

a_{n+1}/a_n ≥ 1.

Then for all n ≥ N, a_n ≥ a_N > 0, so that a_n does not converge to zero. By the divergence test (Theorem 7.6.2), we find that the series

∑_{k=0}^∞ a_k

is divergent.

Corollary 8.3.2 (Ratio Test, limit version). Let (a_n) be a sequence of strictly positive real numbers.

• If lim_{n→∞} a_{n+1}/a_n = q with q ∈ [0, 1), then the series ∑_k a_k converges.

• If lim_{n→∞} a_{n+1}/a_n = q with q ∈ (1, ∞), or if lim_{n→∞} a_{n+1}/a_n = ∞, then the series ∑_k a_k diverges.

Warning: We cannot conclude anything about the convergence of a series ∑_k a_k when

lim_{n→∞} a_{n+1}/a_n = 1.

8.4 Root test


Theorem 8.4.1 (Root Test). Let (a_n) be a sequence of nonnegative real numbers.

i. If there exists an N ∈ N and a q ∈ (0, 1) such that for all n ≥ N, (a_n)^{1/n} ≤ q, then the series ∑ a_n converges.

ii. If there exists an N ∈ N such that for all n ≥ N, (a_n)^{1/n} ≥ 1, then the series ∑ a_n diverges.

Proof. Suppose there exists an N ∈ N and a q ∈ (0, 1) such that for all n ≥ N, it holds that

(a_n)^{1/n} ≤ q.

Then for all n ≥ N, it holds that

0 ≤ a_n ≤ q^n.

The series

∑_{n=N}^∞ q^n

converges, as it is a standard geometric series and q ∈ (0, 1). Therefore, the series

∑_{n=N}^∞ a_n

converges by the comparison test, Theorem 8.1.1. Finally, the series

∑_{n=0}^∞ a_n

converges as well by Lemma 7.5.1.

Corollary 8.4.2 (Root Test, limit version). Let (a_n) be a sequence of nonnegative real numbers.

• If lim_{n→∞} (a_n)^{1/n} = q with q ∈ [0, 1), then the series ∑_k a_k converges.

• If lim_{n→∞} (a_n)^{1/n} = q with q ∈ (1, ∞) or if lim_{n→∞} (a_n)^{1/n} = ∞, then the series ∑_k a_k diverges.

Warning: We cannot conclude anything about the convergence of a series ∑_k a_k if

lim_{n→∞} (a_n)^{1/n} = 1.
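A quick numerical illustration of the limit version of the root test (a Python sketch; the example series is not from the notes):

```python
# Root test applied to a_n = (n / (2n + 1))^n: the n-th roots equal
# n/(2n + 1), which tends to 1/2 < 1, so the series converges.
def a(n):
    return (n / (2 * n + 1))**n

for n in (10, 50, 100):
    root = a(n)**(1.0 / n)
    assert root < 0.51  # eventually below any q with 1/2 < q < 1
```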

8.5 Exercises

8.5.1 Blue Exercises

Exercise 8.5.1. Determine whether the following series converge or diverge.

(a) ∑_{k=1}^∞ 3^k k/(2k + 1)!        (b) ∑_{k=10}^∞ (k + 2)/(k³ − 6)

(c) ∑_{k=2}^∞ 1/√(k + 1)        (d) ∑_{k=3}^∞ k^100/3^k

8.5.2 Orange Exercises

Exercise 8.5.2. Let c : N → (0, ∞) be a sequence taking on only strictly positive values, and assume that c : N → (0, ∞) converges to 3/2. Determine whether the following series diverges or converges:

∑_{k=1}^∞ 1/(c_k)^k.
Chapter 9

Series with general terms

Whereas in the previous chapter we considered techniques for concluding the convergence or divergence of a very special type of series (series with positive, real terms), we will in this chapter go back to general series. How can we conclude convergence or divergence of those?
In the next section, we will first consider alternating series of real terms. There is a nice convergence test for such series, called the Leibniz Test.
In addition, we will borrow a theorem from a later chapter that works as follows for sequences taking values in the real numbers. Let a : N → R be a sequence of real numbers. We can now form the following series of positive terms (which brings us back to the realm of the previous chapter):

∑_{k=0}^∞ |a_k|.

Suppose this series of absolute values converges (we will say that the series ∑_{k=0}^∞ a_k converges absolutely). Then the theorem will allow us to conclude that the series

∑_{k=0}^∞ a_k

converges as well.

9.1 Series with real terms: the Leibniz test


Theorem 9.1.1 (Leibniz Test, a.k.a. Alternating Series Test). Let a, b : N → R be two real-valued sequences such that for all k ∈ N, b_k = (−1)^k a_k. Assume that there exists a K ∈ N such that

i. a_k ≥ 0 for every k ≥ K,

ii. a_k ≥ a_{k+1} for every k ≥ K,

iii. lim_{k→∞} a_k = 0.

Then the series

∑_{k=K}^∞ b_k = ∑_{k=K}^∞ (−1)^k a_k

is convergent. In addition, the following estimate holds for every N ≥ K:

|S_N − ∑_{k=K}^∞ b_k| ≤ a_{N+1},

where for all n ∈ N, S_n := ∑_{k=K}^n b_k.

Proof. We only prove the case in which K = 0.
We note that (S_{2n}) is a decreasing sequence, because for all n ∈ N,

S_{2n+2} = S_{2n} − a_{2n+1} + a_{2n+2} ≤ S_{2n}.

Similarly, (S_{2n+1}) is an increasing sequence, because for all n ∈ N,

S_{2n+3} = S_{2n+1} + a_{2n+2} − a_{2n+3} ≥ S_{2n+1}.

Finally, we note that for all n ∈ N, it holds that

S_1 ≤ S_{2n+1} = S_{2n} − a_{2n+1} ≤ S_{2n} ≤ S_0.

As a consequence, the sequence (S_{2n}) is bounded from below by S_1, and the sequence (S_{2n+1}) is bounded from above by S_0. Therefore, both sequences are convergent.
Because the sequence (a_n) converges to zero, we can show that the sequence (a_{2n+1}) converges to zero as well. By the limit laws, we find that

lim_{n→∞} S_{2n+1} = lim_{n→∞} S_{2n} − lim_{n→∞} a_{2n+1} = lim_{n→∞} S_{2n}.

In words, the sequences (S_{2n+1}) and (S_{2n}) converge to the same limit. Let's call this limit s.
Finally, since (S_{2n}) decreases to s and (S_{2n+1}) increases to s,

S_{2n} ≥ s ≥ S_{2n+1} = S_{2n} − a_{2n+1},

so that |S_{2n} − s| ≤ a_{2n+1}. Similarly,

S_{2n+1} ≤ s ≤ S_{2n+2} = S_{2n+1} + a_{2n+2},

so that |S_{2n+1} − s| ≤ a_{2n+2}. In conclusion, for such an alternating series we have the estimate, for all n ∈ N,

|S_n − s| ≤ a_{n+1}.

Therefore the sequence (S_n) converges to s.

Example 9.1.2. We claim that the series

∑_{k=1}^∞ (−1)^k (1/k)

converges.
We would like to apply the Alternating Series Test. To do so, we need to check its conditions. We define the sequence a : N → R by

a_k := 1/k

for k ≥ 1 (and a_0 := 1).
We now check the conditions of the Alternating Series Test.

i. We need to show that a_k ≥ 0 for all k ≥ 1. Let k ≥ 1. Then

a_k = 1/k ≥ 0.

ii. We need to show that a_k ≥ a_{k+1} for all k ≥ 1. Let k ≥ 1. Then

a_k = 1/k ≥ 1/(k + 1) = a_{k+1}.

iii. We need to show that

lim_{k→∞} a_k = 0.

This follows as it is a standard limit.

It follows from the Alternating Series Test that the series

∑_{k=1}^∞ (−1)^k a_k = ∑_{k=1}^∞ (−1)^k (1/k)

converges.
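The error estimate |S_N − s| ≤ a_{N+1} from the Leibniz Test can be checked numerically for this series (a Python sketch; that the value of the series is −ln 2 is a known fact, not something proved in these notes so far):

```python
from math import log

# Partial sums of sum_{k=1}^{infinity} (-1)^k / k, whose value is -ln 2.
def S(N):
    return sum((-1)**k / k for k in range(1, N + 1))

value = -log(2)
# The error estimate from the Leibniz Test: |S_N - value| <= a_{N+1}.
for N in range(1, 100):
    assert abs(S(N) - value) <= 1 / (N + 1)
```

The bound is remarkably sharp: the next term of the series always controls the remaining error.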

9.2 Series characterization of completeness in normed vector spaces

Definition 9.2.1. Let (V, ‖·‖) be a normed vector space. Let a : N → V be a sequence in V. We say the series

∑_{k=0}^∞ a_k

converges absolutely if

∑_{k=0}^∞ ‖a_k‖

converges.

Definition 9.2.2 (Series characterization of completeness). We say a normed vector space (V, ‖·‖) satisfies the series characterization of completeness if every series in V that is absolutely convergent is also convergent.

In a later chapter, we will prove the following proposition.

Proposition 9.2.3. Every finite-dimensional normed vector space satisfies the series characterization of completeness.

In particular, R^d endowed with the standard Euclidean norm satisfies the series characterization of completeness, and so does (R, |·|).

Example 9.2.4. Consider the series

∑_{k=1}^∞ sin(k)/k².

Since this is not an alternating series, we cannot apply the Leibniz test. However, for every k ∈ N \ {0}, we have

|sin(k)/k²| ≤ 1/k².

The series

∑_{k=1}^∞ 1/k²

is a standard hyperharmonic series, of which we know that it converges. By the Comparison Test, we conclude that the series

∑_{k=1}^∞ |sin(k)/k²|

converges as well. In other words, the series

∑_{k=1}^∞ sin(k)/k²

converges absolutely. Because (R, |·|) satisfies the series characterization of completeness, we find that

∑_{k=1}^∞ sin(k)/k²

converges.

Definition 9.2.5. Let (V, ‖·‖) be a normed vector space and let a : N → V be a sequence. We say that a series

∑_{k=0}^∞ a_k

converges conditionally if it converges, but does not converge absolutely.

9.3 The Cauchy product


Intuitively, this section is about the multiplication of two real-valued series. The precise statement is covered in the next theorem.

Theorem 9.3.1 (Cauchy product). Let (A_k) and (B_k) be two real-valued sequences, and assume that the series

∑_{k=0}^∞ A_k  and  ∑_{k=0}^∞ B_k

both converge absolutely. Then the series

∑_{k=0}^∞ C_k

converges absolutely as well, where C_k := ∑_{ℓ=0}^k A_ℓ B_{k−ℓ}, and

∑_{k=0}^∞ C_k = (∑_{k=0}^∞ A_k)(∑_{k=0}^∞ B_k).

Proof. We will first show that the series

∑_{k=0}^∞ C_k

converges absolutely. Note that

∑_{k=0}^n C_k = ∑_{k=0}^n ∑_{ℓ=0}^k A_ℓ B_{k−ℓ}
             = ∑_{ℓ=0}^n ∑_{k=ℓ}^n A_ℓ B_{k−ℓ}
             = ∑_{ℓ=0}^n ∑_{m=0}^{n−ℓ} A_ℓ B_m.

Therefore

∑_{k=0}^n |C_k| ≤ ∑_{ℓ=0}^n |A_ℓ| ∑_{m=0}^{n−ℓ} |B_m|
               ≤ (∑_{ℓ=0}^n |A_ℓ|)(∑_{m=0}^n |B_m|)
               ≤ (∑_{ℓ=0}^∞ |A_ℓ|)(∑_{m=0}^∞ |B_m|).

It follows that the sequence of partial sums

n ↦ ∑_{k=0}^n |C_k|

is bounded from above. It is also increasing, and therefore it converges.
Now let n ∈ N. Then

∑_{k=0}^{2n} C_k − ∑_{ℓ=0}^{2n} A_ℓ ∑_{m=0}^{2n} B_m = −∑_{ℓ=0}^{2n} ∑_{m=2n−ℓ+1}^{2n} A_ℓ B_m
    = −∑_{ℓ=0}^{2n} A_ℓ ∑_{m=2n−ℓ+1}^{2n} B_m
    = −∑_{ℓ=0}^{n} A_ℓ ∑_{m=2n−ℓ+1}^{2n} B_m − ∑_{ℓ=n+1}^{2n} A_ℓ ∑_{m=2n−ℓ+1}^{2n} B_m.

In the first group of terms, ℓ ≤ n, so every index m that occurs satisfies m ≥ n + 1; in the second group, every index ℓ satisfies ℓ ≥ n + 1. It follows that

|∑_{k=0}^{2n} C_k − ∑_{ℓ=0}^{2n} A_ℓ ∑_{m=0}^{2n} B_m| ≤ ∑_{ℓ=0}^{2n} |A_ℓ| ∑_{m=n}^{∞} |B_m| + ∑_{ℓ=0}^{2n} |B_ℓ| ∑_{m=n}^{∞} |A_m|.        (9.3.1)

Now note that because the series

∑_{ℓ=0}^∞ A_ℓ

is absolutely convergent, the series

∑_{ℓ=0}^∞ |A_ℓ|

is convergent, which exactly means that

lim_{n→∞} ∑_{ℓ=0}^n |A_ℓ|

exists. Similarly,

lim_{n→∞} ∑_{ℓ=0}^n |B_ℓ|

exists. Moreover, because the series

∑_{m=0}^∞ |B_m|

converges, it follows by Proposition 7.5.2 that

lim_{n→∞} ∑_{m=n}^∞ |B_m| = 0.

Similarly,

lim_{n→∞} ∑_{m=n}^∞ |A_m| = 0.

It follows by the limit laws that the right-hand side of (9.3.1) converges to 0 as n → ∞.
It follows that the sequence

n ↦ ∑_{k=0}^n C_k

converges to

(∑_{ℓ=0}^∞ A_ℓ)(∑_{m=0}^∞ B_m).

Indeed, the series ∑_{k=0}^∞ C_k converges (it even converges absolutely), and by the above its subsequence of even-indexed partial sums converges to the product, so the full sequence of partial sums has the same limit.
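The theorem can be checked numerically with two geometric series, whose values we know in closed form (a Python sketch, not part of the formal development):

```python
# Numerical check of the Cauchy product for two geometric series:
# A_k = (1/2)^k with value 2, and B_k = (1/3)^k with value 3/2.
N = 60
A = [(1 / 2)**k for k in range(N)]
B = [(1 / 3)**k for k in range(N)]
# C_k = sum_{l=0}^{k} A_l * B_{k-l}, the Cauchy product terms:
C = [sum(A[l] * B[k - l] for l in range(k + 1)) for k in range(N)]

lhs = sum(C)
rhs = sum(A) * sum(B)
assert abs(lhs - rhs) < 1e-9      # product formula of Theorem 9.3.1
assert abs(rhs - 2 * 1.5) < 1e-9  # known values of the two series
```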

9.4 Exercises

9.4.1 Blue exercises

Exercise 9.4.1. Determine whether the following series converge or diverge.

(a) ∑_{k=0}^∞ (−1)^{k+1} (1/√(k + 1))        (b) ∑_{k=1}^∞ sin(kπ/6) · (1/k⁴)

9.4.2 Orange exercises

Exercise 9.4.2. Give an example of a sequence a : N → R such that

• for all k ∈ N, it holds that a_k > 0,

• lim_{k→∞} a_k = 0,

yet the series

∑_{k=0}^∞ (−1)^k a_k

diverges.
Chapter 10

Subsequences, lim sup and lim inf

Why are subsequences useful?


For many students, the topic of subsequences is initially difficult to grasp.
I believe that what makes it easier, is to keep reminding yourself that se-
quences (and subsequences too!), are functions from the natural numbers.
This is one of the main reasons we spent so much time on this aspect be-
fore.

10.1 Index sequences and subsequences


Subsequences are made by precomposing a sequence with a very special type of sequence: an index sequence.

Definition 10.1.1 (index sequence). We say a sequence n : N → N is


an index sequence if n is strictly increasing.

There are two important elements to this definition: first of all, index sequences are sequences taking values in the natural numbers (as opposed to just an arbitrary space). Secondly, an index sequence is strictly increasing, so for every k ∈ N, n_{k+1} > n_k.


We often write (n_k)_{k∈N} or just (n_k) to denote an index sequence.

Example 10.1.2. The sequence n : N → N defined by

n_k := 2k

is a strictly increasing sequence of natural numbers. In other words, it is an index sequence.

The next definition describes how to make a subsequence by precomposing a sequence with an index sequence.

Definition 10.1.3 (subsequence). Let a : N → X be a sequence. A


sequence b : N → X is called a subsequence of a if there exists an index
sequence n : N → N such that b = a ◦ n.

Just as we often write (a_n)_{n∈N} for a sequence called a, we often write (a_{n_k})_{k∈N} for the subsequence a ◦ n.

Example 10.1.4. Let's see what happens in an example. Let a : N → R be defined by a_ℓ := 1/(ℓ + 1) for ℓ ∈ N. Let n : N → N be the index sequence defined by n_ℓ := 2ℓ + 1.
Here is a table that indicates for this example the first terms of the sequences (a_ℓ)_ℓ, (n_k)_k and (a_{n_k})_k.

ℓ        0    1    2    3    4    5    6    7    ...
a_ℓ      1/1  1/2  1/3  1/4  1/5  1/6  1/7  1/8  ...

k        0    1    2    3    4    5    6    ...
n_k      1    3    5    7    9    11   13   ...
a_{n_k}  1/2  1/4  1/6  1/8  1/10 1/12 1/14 ...

It could really help to think of an example sequence yourself and an example index sequence, and create a similar table for your own example.
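Since sequences are functions on N, the definition of a subsequence is literally a function composition, which can be written out in one line (a Python sketch using the data of Example 10.1.4):

```python
# A subsequence is the composition a ∘ n of a sequence with an index
# sequence, as in Example 10.1.4.
a = lambda l: 1 / (l + 1)   # the sequence a_l = 1/(l + 1)
n = lambda k: 2 * k + 1     # the index sequence n_k = 2k + 1
sub = lambda k: a(n(k))     # the subsequence (a ∘ n)_k = a_{n_k}

assert [sub(k) for k in range(4)] == [1/2, 1/4, 1/6, 1/8]
```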

Warning: the following subtle point requires some careful thought. The subsequences (a_{n_k})_{k∈N} and (a_{n_ℓ})_{ℓ∈N} are exactly the same! Both notations represent the subsequence a ◦ n.

10.2 (Sequential) accumulation points


As a motivation for the following definition, let us consider the sequence a_ℓ = (−1)^ℓ. A term a_ℓ of this sequence equals 1 if its index ℓ is even, and equals −1 if its index ℓ is odd. This sequence does not converge, but if we just consider the subsequence (a_{n_k})_k with n_k := 2k, then this subsequence is a constant sequence: every term equals 1, and therefore it does converge.
In general, then, we will find it interesting to know whether a subsequence of a sequence converges. And the limit of such a subsequence is special too: we call a limit of a subsequence a (sequential) accumulation point.

Definition 10.2.1 ((Sequential) accumulation points). Let ( X, dist) be


a metric space. A point p ∈ X is called an accumulation point of a
sequence a : N → X if there is a subsequence a ◦ n of a such that a ◦ n
converges to p.

10.3 Subsequences of a converging sequence

Proposition 10.3.1. Let ( X, dist) be a metric space. Let ( an ) be a se-


quence in X converging to p ∈ X. Then every subsequence of ( an ) is
convergent to p.

Proof. Let (a_n) be a sequence converging to p ∈ X and let (a_{n_k}) be a subsequence of a. We need to show that for all ε > 0 there exists a k_0 ∈ N such that for all k ≥ k_0,

dist(a_{n_k}, p) < ε.

Let ε > 0. Because (a_m) converges to p, there exists an m_0 ∈ N such that for all m ≥ m_0,

dist(a_m, p) < ε.        (10.3.1)

Choose k_0 := m_0. Let k ≥ k_0. Then

n_k ≥ n_{k_0} ≥ k_0 = m_0,

where in the last inequality we made use of the fact that the index sequence n : N → N is strictly increasing (so that n_j ≥ j for all j ∈ N). Because n_k ≥ m_0, it follows by (10.3.1) that

dist(a_{n_k}, p) < ε.

10.4 lim sup


In this section we are going to define a function (called the lim sup) that takes in a real-valued sequence and outputs either

• the symbol “∞” if the sequence is not bounded from above,

• the symbol “−∞” if the sequence diverges to −∞,

• a real number otherwise.

Consider a real-valued sequence (a_n) that is bounded from above and does not diverge to −∞. We can then define a new sequence

k ↦ sup_{n≥k} a_n.

Note that this sequence is decreasing, because for larger k the supremum is taken over a smaller set. We will show in the lemma below that the sequence k ↦ sup_{n≥k} a_n is also bounded from below.
Therefore, the sequence k ↦ sup_{n≥k} a_n has a limit, and the limit is in fact equal to the infimum of the sequence. This limit is called the lim sup:

lim sup_{n→∞} a_n := inf_{k∈N} sup_{n≥k} a_n = lim_{k→∞} (sup_{n≥k} a_n).

We still need to show the announced lemma.

Lemma 10.4.1. Let a : N → R be a sequence that is bounded from above and does not diverge to −∞. Then the sequence k ↦ sup_{n≥k} a_n is bounded from below.

Proof. We argue by contradiction. Suppose therefore that the sequence k ↦ sup_{n≥k} a_n is not bounded from below. We are going to show that the sequence (a_n) diverges to −∞, which would indeed be a contradiction.
We will show that

for all M ∈ R,
there exists N ∈ N,
for all n ≥ N,
a_n < M.

Let M ∈ R. Since the sequence k ↦ sup_{n≥k} a_n is not bounded from below, there exists an m ∈ N such that

sup_{n≥m} a_n < M.

Choose N := m. Let n ≥ N. Then

a_n ≤ sup_{ℓ≥m} a_ℓ < M.

This finishes our proof that (a_n) diverges to −∞, and we have derived a contradiction. Hence the sequence k ↦ sup_{n≥k} a_n is in fact bounded from below.
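The decreasing sequence k ↦ sup_{n≥k} a_n can be approximated numerically by cutting the supremum off at a large finite horizon (a Python sketch; the cut-off is an approximation, not the actual supremum over an infinite tail):

```python
# Approximate lim sup of a_n = (-1)^n (1 + 1/n) via k -> sup_{n >= k} a_n.
def a(n):
    return (-1)**n * (1 + 1 / n)

def sup_tail(k, horizon=20000):
    """Approximate sup_{n >= k} a_n by a maximum over a finite window."""
    return max(a(n) for n in range(k, horizon))

vals = [sup_tail(k) for k in range(1, 100)]
assert all(x >= y for x, y in zip(vals, vals[1:]))  # decreasing in k
assert abs(sup_tail(10000) - 1) < 1e-3              # the lim sup equals 1
```

Here the even-indexed terms 1 + 1/n dominate each tail, so the suprema decrease toward 1 even though the sequence itself does not converge.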

Proposition 10.4.2 (Alternative characterization of lim sup). Let (a_n) be a real-valued sequence. Let M ∈ R. Then M equals lim sup_{ℓ→∞} a_ℓ if and only if the following two conditions hold:

i. for all ε > 0,
   there exists N ∈ N,
   for all ℓ ≥ N,
   a_ℓ < M + ε.

ii. for all ε > 0,
    for all k ∈ N,
    there exists m ≥ k,
    a_m > M − ε.

Proof. Let M ∈ R. We first show that if M = lim sup_{n→∞} a_n, then conditions (i) and (ii) hold. Assume M = lim sup_{n→∞} a_n. Then by definition of the lim sup, it follows that a : N → R is bounded from above and does not diverge to −∞.
We will now show (i). Let ε > 0. We need to show that there exists an N ∈ N such that for all ℓ ≥ N, it holds that a_ℓ < M + ε.
By the definition of lim sup_{ℓ→∞} a_ℓ as inf_{ℓ∈N} sup_{k≥ℓ} a_k, there exists an ℓ_0 ∈ N such that

sup_{k≥ℓ_0} a_k < M + ε.

Choose N := ℓ_0. Let ℓ ≥ N. Then by the previous inequality we conclude that

a_ℓ < M + ε.

We will now show (ii). Let ε > 0. Let k ∈ N. We need to show that there exists an m ≥ k such that a_m > M − ε. By the definition of lim sup_{ℓ→∞} a_ℓ as inf_{ℓ∈N} sup_{k≥ℓ} a_k, we know that

sup_{n≥k} a_n ≥ M.

Therefore, there exists an m ≥ k such that

a_m > sup_{n≥k} a_n − ε ≥ M − ε.

We will now show that if M ∈ R satisfies conditions (i) and (ii), then M equals lim sup_{ℓ→∞} a_ℓ. Assume M satisfies conditions (i) and (ii).
We first need to settle that a : N → R is bounded from above and does not diverge to −∞.
We will show that a : N → R is bounded from above. By (i), we may obtain an N ∈ N such that for all ℓ ≥ N, a_ℓ < M + 1. Choose L := max(a_0, a_1, . . . , a_{N−1}, M + 1). Then for all ℓ ∈ N, it holds that a_ℓ ≤ L. Hence L is an upper bound for a : N → R and a : N → R is indeed bounded from above.
We will now show that a : N → R does not diverge to −∞. We argue by contradiction. Suppose a : N → R diverges to −∞. Then there exists an N_0 ∈ N such that for all ℓ ≥ N_0, a_ℓ < M − 1. However, by (ii) there exists an m ≥ N_0 such that a_m > M − 1. This is a contradiction.
It now follows that lim sup_{ℓ→∞} a_ℓ = inf_{k∈N} sup_{n≥k} a_n. We therefore need to show that

M = inf_{k∈N} sup_{n≥k} a_n.

We first show that for every k ∈ N,

M ≤ sup_{n≥k} a_n.

Let k ∈ N. Note that it suffices to show that for every ε > 0,

M − ε < sup_{n≥k} a_n.

Let ε > 0. By (ii) we know that there exists an m ≥ k such that

M − ε < a_m.

Therefore also

M − ε < sup_{n≥k} a_n.

Finally, we show that for every ε > 0, there exists a k ∈ N such that

sup_{n≥k} a_n < M + ε.

Let ε > 0. By condition (i) we know that there exists an n_0 ∈ N such that for all n ≥ n_0,

a_n < M + ε/2.

Choose k := n_0. Then

sup_{n≥k} a_n ≤ M + ε/2 < M + ε.

Theorem 10.4.3. Let a : N → R be a real-valued sequence that is bounded from above and does not diverge to −∞. Then lim sup_{ℓ→∞} a_ℓ is a (sequential) accumulation point of the sequence a, i.e. there exists a subsequence of a that converges to lim sup_{ℓ→∞} a_ℓ.

Proof. We denote lim sup_{k→∞} a_k by M.
We need to find an index sequence n : N → N such that

    lim_{k→∞} a_{n_k} = M.

We do this inductively. We first know that there exists an m_0 such that for all m ≥ m_0,

    a_m ≤ M + 1/1.

Then we know there exists an n_0 such that n_0 > m_0 and

    M − 1/1 < a_{n_0}.

Because n_0 > m_0, we also know that

    a_{n_0} < M + 1/1.

Suppose now that n_{ℓ−1} is defined for some ℓ ∈ N \ {0}. We are going to define n_ℓ. We know that there exists an m_ℓ ∈ N such that for every m ≥ m_ℓ,

    a_m ≤ M + 1/(ℓ+1).

Now, there exists an n_ℓ ≥ max(n_{ℓ−1}, m_ℓ) + 1 such that

    M − 1/(ℓ+1) < a_{n_ℓ},

and because n_ℓ > m_ℓ, we also know that

    a_{n_ℓ} < M + 1/(ℓ+1).

By construction, we know that n : N → N is strictly increasing. In other words, we know that it is an index sequence. Also by construction, we know that for all ℓ ∈ N,

    M − 1/(ℓ+1) < a_{n_ℓ} < M + 1/(ℓ+1).

By the squeeze theorem, we know that the limit lim_{ℓ→∞} a_{n_ℓ} exists and equals M.

The previous theorem has the following consequence, which is a key fact
in analysis.
Corollary 10.4.4 (Bolzano-Weierstrass). Every bounded, real-valued sequence has a subsequence that converges in (R, dist_R).

Theorem 10.4.3 shows that if a sequence a : N → R is bounded from above and does not diverge to −∞, the number lim sup_{ℓ→∞} a_ℓ is a sequential accumulation point. However, we can derive more: in fact it is the maximum of the set of accumulation points.

Theorem 10.4.5. Suppose a sequence a : N → R is bounded from above and does not diverge to −∞. Then

    lim sup_{ℓ→∞} a_ℓ

is the maximum of the set of sequential accumulation points of the sequence a.
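The formula lim sup_{ℓ→∞} a_ℓ = inf_{ℓ∈N} sup_{k≥ℓ} a_k can also be explored numerically. The following Python sketch is an illustration added here (it truncates the infinite tails at a finite horizon, so it only approximates the true suprema): for the bounded sequence a_n = (−1)^n (1 + 1/(n+1)), the tail suprema decrease towards lim sup = 1, even though the sequence itself oscillates and does not converge.

```python
def a(n):
    # a bounded sequence that oscillates: a_n = (-1)^n * (1 + 1/(n+1))
    return (-1) ** n * (1 + 1 / (n + 1))

TERMS = 10_000  # horizon: we approximate a sup over an infinite tail by a long finite tail

def tail_sup(k):
    # approximates sup_{n >= k} a_n
    return max(a(n) for n in range(k, TERMS))

# The tail sups form a non-increasing sequence in k; their infimum is lim sup a_n = 1.
print([round(tail_sup(k), 6) for k in (0, 10, 100, 1000)])

assert tail_sup(0) == 2.0                # the supremum of the whole sequence is attained at n = 0
assert tail_sup(1000) < tail_sup(0)      # tail suprema are non-increasing
assert abs(tail_sup(1000) - 1) < 1e-2    # and they approach the lim sup, 1
```

Increasing TERMS sharpens the approximation; of course, no finite computation replaces the proofs above.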

10.5 lim inf


Similarly to the lim sup, we can also define the lim inf. In some sense,

    lim inf_{ℓ→∞} a_ℓ = − lim sup_{ℓ→∞} (−a_ℓ).

More precisely, the lim inf is a function that takes in a real-valued sequence a : N → R and outputs

• the symbol "−∞" if the sequence is not bounded from below,

• the symbol "∞" if the sequence diverges to ∞, and otherwise

• the real number

    lim inf_{ℓ→∞} a_ℓ := sup_{ℓ∈N} inf_{k≥ℓ} a_k = lim_{ℓ→∞} ( inf_{k≥ℓ} a_k ).

Let us also record an alternative characterization of the lim inf.

Proposition 10.5.1 (Alternative characterization of the lim inf). Let a : N → R be a real-valued sequence and let M ∈ R. Then M equals lim inf_{ℓ→∞} a_ℓ if and only if the following two conditions hold:

i.
for all e > 0,
there exists N ∈ N,
for all ℓ ≥ N,
    a_ℓ > M − e.

ii.
for all e > 0,
for all K ∈ N,
there exists m ≥ K,
    a_m < M + e.

Theorem 10.5.2. Let a : N → R be a real-valued sequence that is bounded below and does not diverge to ∞. Then lim inf_{ℓ→∞} a_ℓ is a sequential accumulation point of the sequence a, i.e. there is a subsequence of a that converges to lim inf_{ℓ→∞} a_ℓ.

We can in fact show a bit more, namely that if a : N → R is bounded below and does not diverge to ∞, the number lim inf_{ℓ→∞} a_ℓ is the smallest sequential accumulation point of the sequence a : N → R.

Theorem 10.5.3. Let a : N → R be a real-valued sequence that is bounded below and does not diverge to ∞. Then lim inf_{ℓ→∞} a_ℓ is the minimum of the set of sequential accumulation points of the sequence a : N → R.

10.6 Relations between lim, lim inf and lim sup


A bounded sequence may not always converge. In other words, its limit
may not always exist. But the lim sup and lim inf always do exist for
bounded sequences. The following proposition tells us moreover that the
sequence converges if and only if the lim sup and the lim inf are the same.

Proposition 10.6.1. Let a : N → R be a real-valued sequence and let L ∈ R. Then a : N → R converges to L if and only if

    lim inf_{ℓ→∞} a_ℓ = lim sup_{ℓ→∞} a_ℓ = L.

So far we haven't seen a statement saying that if two convergent sequences a : N → R and b : N → R are ordered as in a_ℓ ≤ b_ℓ for all ℓ, then lim_{ℓ→∞} a_ℓ ≤ lim_{ℓ→∞} b_ℓ. Part of the reason is that the assumption of convergence of both sequences is a bit unsatisfactory. The next proposition is a generalization that can be much more useful, especially when it is combined with the previous proposition.

Proposition 10.6.2. Let a : N → R and b : N → R be two real-valued sequences, such that there exists an N ∈ N such that for all ℓ ≥ N, a_ℓ ≤ b_ℓ. Then

    lim sup_{ℓ→∞} a_ℓ ≤ lim sup_{ℓ→∞} b_ℓ

and

    lim inf_{ℓ→∞} a_ℓ ≤ lim inf_{ℓ→∞} b_ℓ.
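Proposition 10.6.1 can be illustrated on finite tails (an added numerical sketch, not part of the original argument; finite windows only approximate the infinite-tail infima and suprema): for a convergent sequence, the tail infimum and tail supremum pinch together, while for (−1)^n they stay apart and the sequence diverges.

```python
TERMS = 10_000

def tail_bounds(seq, k):
    # finite-window stand-ins for inf_{n >= k} a_n and sup_{n >= k} a_n
    tail = seq[k:]
    return min(tail), max(tail)

a = [1 / (n + 1) for n in range(TERMS)]   # converges to 0
b = [(-1) ** n for n in range(TERMS)]     # diverges

lo_a, hi_a = tail_bounds(a, 5000)
lo_b, hi_b = tail_bounds(b, 5000)

assert hi_a - lo_a < 1e-3          # lim inf = lim sup (= 0): the sequence converges
assert (lo_b, hi_b) == (-1, 1)     # lim inf = -1 < 1 = lim sup: no convergence
```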

10.7 Exercises

10.7.1 Blue exercises

Exercise 10.7.1. Consider the set A := {a, b, · · · , z} of letters in the English alphabet and let α : N → A be a sequence of which the first terms are (in the order in which you would normally read)

r j z b a g w q o r
x o l b d x s l e e
u h g c e c k v n i
v c n i l t j n h c
e e i u s u m e c t
c r y o b v n d f d
b d h z f a z s l i
h f k s o x c f n o
a x c n d i a d c l
e u y i j s c v i k
m s g n o c d n f g

Let ν : N → N be the index sequence defined by ν_κ := κ + 5 and let µ : N → N be the index sequence defined by µ_κ := 3κ.
Write down the first 33 terms of the sub-subsequence (α_{ν_{µ_κ}})_κ of the sequence (α_κ)_κ.
Exercise 10.7.2. Let (X, dist) be a metric space. Let a : N → X be a sequence, and let n : N → N be an index sequence. Suppose that the subsequence a ◦ n converges. Show that every subsequence of a ◦ n is convergent.
Exercise 10.7.3. Let (X, dist) be a metric space and let a : N → X and b : N → X be two sequences, such that a : N → X converges to some p ∈ X.
Now consider the following sequence c : N → X, defined by

    c_k := a_k if k is even, and c_k := b_k if k is odd.

Show that p is an accumulation point of c : N → X. (See Definition 10.2.1.)

10.7.2 Orange exercises

Exercise 10.7.4. Let (X, dist) be a metric space and let a : N → X be a sequence with values in X. Let p ∈ X. Suppose that every subsequence of a : N → X itself has a subsequence that converges to p. Show that a : N → X itself converges to p as well.
Hint: Argue by contradiction, and use a similar proof technique as in Theorem 10.4.3.
Exercise 10.7.5. Prove Proposition 10.6.1.
Exercise 10.7.6. Let P : N → {blue, orange} be a sequence taking values in the set with exactly the two elements blue and orange. Assume that

for all k ∈ N,
there exists m ≥ k,
    P_m = blue.

Show that there is a subsequence of P : N → {blue, orange} for which every term equals blue by going through the following steps:

i. Inductively define an index sequence n : N → N such that for all k ∈ N, P_{n_k} = blue, following the template in the Best Practices item (xiii):

(a) First define n_0 ∈ N appropriately and prove that P_{n_0} = blue.

(b) For k ∈ N, with n_0, . . . , n_k defined, define n_{k+1} appropriately, and prove that n_{k+1} > n_k and

    P_{n_{k+1}} = blue.

ii. Conclude your proof by saying that the sequence P ◦ n is a subsequence of P and that by construction, for all k ∈ N,

    P_{n_k} = blue.
Exercise 10.7.7. Let a : N → R be a sequence with (at least) two sequential accumulation points p, q ∈ R (with p ≠ q). Prove that the sequence a : N → R does not converge.
Chapter 11

Point-set topology of metric spaces

The main purpose of the current section and the next is to introduce three stronger and stronger properties for subsets of a metric space: closedness, completeness and compactness. Here 'stronger and stronger' means that every compact set is complete, and every complete set is closed. However, not every closed set is complete, and not every complete set is compact. If we know that a subset K of a metric space is compact, we get a lot of amazing properties for free.

11.1 Open sets


Before we can define closed sets in the next section (according to the standard definition), we first need to introduce open sets. First let us recall the definition of an (open) ball B(p, r) around a point p with radius r from Definition 2.4.4:

B( p, r ) := { x ∈ X | dist( x, p) < r } .

The reason for the parentheses around 'open' is that, yes, we will soon prove that this set is indeed open; however, so far we have not defined what 'open' really is!


Before we define what it means for a set to be open, we define when a


point in a subset is an interior point.

Definition 11.1.1. Let ( X, dist) be a metric space and let A be a subset


of X. A point a ∈ A is called an interior point of A if

there exists r > 0,


B( a, r ) ⊂ A.

Open sets are subsets for which every point in the subset is an interior
point.

Definition 11.1.2. Let ( X, dist) be a metric space. We say that a subset


O ⊂ X is open if every x ∈ O is an interior point of O.

Having defined what it means for a set to be open, we can now prove that
the (open) ball is indeed open.

Proposition 11.1.3. Let ( X, dist) be a metric space. The ball

B( p, r ) := { x ∈ X | dist( x, p) < r }

is indeed open.

Before giving the proof of the proposition, I'd like to say the following. If, by this point, you have the blue exercises and the best practices down, the proof of the proposition may come to you very easily. This is one of the reasons that I stress the best practices so much: whereas without them it may be difficult to even see where to start, with them the proof can be written down almost mechanically.
If you would still have difficulties giving such a proof yourself, don't worry: it takes time to get used to proving mathematical statements. If you're still struggling with following the best practices, the proofs in this chapter may help you get further in your understanding. It is especially helpful to see if you can recognize the various components of the best practices in the proofs that are given.

Proof. We need to show that every x ∈ B( p, r ) is an interior point.


Let x ∈ B( p, r ). We need to show that x is an interior point, i.e. we need
to show that there exists a ρ > 0 such that B( x, ρ) ⊂ B( p, r ). Judging
from what we need to show, we now need to prepare ourselves for
choosing such a ρ > 0.
Since x ∈ B( p, r ), we know that dist( x, p) < r. Then

r − dist( x, p) > 0.

Choose ρ := r − dist( x, p) > 0.


We need to show that B( x, ρ) ⊂ B( p, r ). The standard proof of such
a set inclusion is by showing that for all z ∈ B( x, ρ) it holds that z ∈
B( p, r ). So let z ∈ B( x, ρ). We need to show that z ∈ B( p, r ), i.e. that
dist(z, p) < r.
Because z ∈ B( x, ρ), it holds that dist(z, x ) < ρ. It now follows by the
triangle inequality that

    dist(z, p) ≤ dist(z, x) + dist(x, p)
               < ρ + dist(x, p)
               = r − dist(x, p) + dist(x, p) = r,

which was what we needed to show.
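The radius choice ρ := r − dist(x, p) from the proof can be checked on a concrete example. The sketch below is an illustration added here, in the metric space (R², ‖·‖_2) with arbitrarily chosen points: random sample points of B(x, ρ) all land inside B(p, r), exactly as the triangle-inequality computation predicts.

```python
import math
import random

def dist(u, v):
    # Euclidean distance in R^2
    return math.hypot(u[0] - v[0], u[1] - v[1])

p, r = (0.0, 0.0), 1.0
x = (0.6, 0.3)                 # a point with dist(x, p) < r
rho = r - dist(x, p)           # the radius chosen in the proof
assert rho > 0

random.seed(0)
for _ in range(1000):
    # sample a point z in B(x, rho): distance s < rho in a random direction
    ang = random.uniform(0.0, 2 * math.pi)
    s = rho * random.random()
    z = (x[0] + s * math.cos(ang), x[1] + s * math.sin(ang))
    assert dist(z, p) < r      # z lies in B(p, r), as the proof guarantees
```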

The following proposition characterizes which intervals are open.

Proposition 11.1.4 (‘Open’ intervals are open). Let a, b ∈ R with a < b.


Then the intervals ( a, b), (−∞, b) and ( a, ∞) are all open subsets of R
(i.e. of the normed vector space (R, | · |)).

The second part of the proof gives another good example of a proof that
shows that a set is open.

Proof. Note that the interval (a, b) is exactly equal to the (open) ball

    B( (a+b)/2 , (b−a)/2 ) = { x ∈ R | | x − (a+b)/2 | < (b−a)/2 }.

We therefore know that it is open by Proposition 11.1.3.


Let us now prove that (−∞, b) is open. Let x ∈ (−∞, b). We need
to show that x is an interior point of (−∞, b). That is, we need to
show that there exists an r > 0 such that B( x, r ) ⊂ (−∞, b). Choose
r := b − x, which is indeed strictly positive (r > 0) because b > x.
We now need to show that B( x, r ) ⊂ (−∞, b). Let y ∈ B( x, r ). Then
|y − x | < r. In particular

y < x + r ≤ x + (b − x ) = b

so indeed y ∈ (−∞, b).


In a similar way, we can prove that ( a, ∞) is open.

Proposition 11.1.5. Let ( X, dist) be a metric space. Then both the empty
set ∅ and the set X itself (both of these are subsets of X) are open.

Proof. We will first show that the empty set is open. The argument is a bit silly (yet logically correct). We argue by contradiction. Suppose the empty set is not open. Then there exists a point x ∈ ∅ such that x is not an interior point of ∅. This is a contradiction, because the empty set has no elements.
We will now show that X is open. Let x ∈ X. We will show that x is
an interior point, i.e. we will show that there exists an r > 0 such that
B( x, r ) ⊂ X.
Choose r := 1. Then B( x, r ) = B( x, 1) ⊂ X.

The set of all interior points of a subset A ⊂ X is called the interior of the
set A.

Definition 11.1.6 (The interior of a set). Let (X, dist) be a metric space and let A ⊂ X be a subset of X. Then the interior of the set A, denoted by int A, is the set of all interior points of A, i.e. int A is defined as

    int A := { x ∈ A | x is an interior point of A }.

Example 11.1.7. The interior of the interval [2, 5) (viewed as a subset of (R, | · |)) is the interval (2, 5).

Proof. We already know that (2, 5) is an open subset of R. Therefore,


for all x ∈ (2, 5), there exists an r > 0 such that B( x, r ) ⊂ (2, 5). Since
(2, 5) ⊂ [2, 5), we can easily show that for all x ∈ (2, 5) there exists an
r > 0 such that B( x, r ) ⊂ [2, 5). We conclude that the interval (2, 5) is
at least contained in int[2, 5).
Since the interior of a set is by definition a subset of the set, the only
other possible interior point is 2.
Now we will show that 2 is not an interior point of the interval [2, 5). We argue by contradiction. Suppose 2 is an interior point. Then there exists an r > 0 such that B(2, r) ⊂ [2, 5). Choose such an r. Then y := 2 − r/2 ∈ B(2, r) but y ∉ [2, 5). This is a contradiction. Therefore 2 is not an interior point of the interval [2, 5).

The interior of a set is always open.

Proposition 11.1.8. Let ( X, dist) be a metric space and let A ⊂ X. Then


int A is open.

At the end of this section we provide a few ways to create new open sets
out of sets about which you already know that they are open.

The union of open sets is always open

Unions of open sets are always open. You may recall that if I is some set, and if for every α ∈ I we have a subset A_α ⊂ X, then the union

    ∪_{α∈I} A_α ⊂ X

is defined as

    ∪_{α∈I} A_α := { x ∈ X | there exists α ∈ I such that x ∈ A_α }.

Proposition 11.1.9. Let (X, dist) be a metric space, let I be some set and assume that for every α ∈ I, we have a subset O_α ⊂ X. Suppose moreover that for all α ∈ I, the set O_α is open. Then also the union

    ∪_{α∈I} O_α

is open.

Example 11.1.10. We already know that for every n ∈ N, the interval (2n, 2n + 1) is an open subset of (R, | · |). Therefore (choosing I = N and O_α = (2α, 2α + 1) in the previous proposition), we also know that the set

    ∪_{n∈N} (2n, 2n + 1)

is an open subset of (R, | · |) as well.

Finite intersections of open sets are open

Proposition 11.1.11. Let (X, dist) be a metric space and let O_1, . . . , O_N be open subsets of X. Then the intersection

    O_1 ∩ · · · ∩ O_N

is also open.

Cartesian products of open sets

Proposition 11.1.12. Let O_1, · · · , O_d be open subsets of R. Then

    O_1 × · · · × O_d ( = {(o_1, · · · , o_d) | o_i ∈ O_i} )

is an open subset of (R^d, ‖·‖_2).

11.2 Closed sets


We are now ready to give a definition of a closed set.

Definition 11.2.1. Let ( X, dist) be a metric space. We say a set C ⊂ X


is closed if its complement X \ C is open.

Both the empty set and the full set are closed.

Proposition 11.2.2. Let ( X, dist) be a metric space. Then both the empty
set ∅ and the set X itself are closed.

Proof. The empty set is closed because its complement, X \ ∅ = X is


open by Proposition 11.1.5.
The set X is closed because its complement, X \ X = ∅ is open by
Proposition 11.1.5.

Warning: The following two facts may conflict with your expectations when you use intuition for the meaning of 'open' and 'closed' from daily life:

i. Combining Propositions 11.1.5 and 11.2.2, we see that the empty set and the full set X are both open and closed.

ii. In Exercise 11.6.2 we will see that there is a set (and there are many more) that is neither open nor closed.

What does it mean in practice? If you want to show that a set is closed, it is not enough to show that the set is not open.

Proposition 11.2.3 (Sequence characterization of closedness). Let (X, dist) be a metric space. A set C ⊂ X is closed if and only if for every sequence (c_n) in C converging to some x ∈ X, it holds that x ∈ C.

Proof. Assume C is closed. Let (c_n) be a sequence in C converging to some x ∈ X. We need to show that x ∈ C, and we will argue by contradiction. So suppose that x ∈ O := X \ C. Since C is closed, we have (by definition of 'closed') that O is open. Since x ∈ O, and O is open, x is (by the definition of 'open') an interior point of O. Therefore, there exists an r > 0 such that B(x, r) ⊂ O.
Since the sequence (c_n) converges to x, we may obtain an N ∈ N such that for all n ≥ N, dist(c_n, x) < r. In particular, dist(c_N, x) < r, which means that c_N ∈ B(x, r) ⊂ O, which is a contradiction because c_N ∈ C.
Now assume that for every sequence (c_n) in C converging to some x ∈ X, it holds that x ∈ C. We want to show that O := X \ C is open, i.e. that every p ∈ O is an interior point. Let p ∈ O. We need to show that p is an interior point. We argue by contradiction. Suppose p ∈ O is not an interior point. Then

    for all r > 0,
    B(p, r) ⊄ O.        (11.2.1)

In other words,

    for all r > 0,
    there exists a ∈ B(p, r),
    a ∉ O.              (11.2.2)

We claim that there exists a sequence y : N → C with values in C converging to p. For n ∈ N we choose r := 2^{−n} in (11.2.2). Then there exists an a_n ∈ B(p, 2^{−n}) such that a_n ∉ O. Choose such an a_n and define y_n := a_n. Note that y_n ∈ C and

    0 ≤ dist(y_n, p) < 2^{−n}.

It now follows by the squeeze theorem (Theorem 6.4.1) that

    lim_{n→∞} dist(y_n, p) = 0

and therefore (by Proposition 5.6.1) that

    lim_{n→∞} y_n = p.

Since (y_n) converges to p, we know by assumption that p ∈ C, which is a contradiction.

Here is a typical example of how you can show that a set is closed.

Example 11.2.4. Consider the subset A of the metric space (R², ‖·‖_2) defined by

    A := {(x_1, x_2) ∈ R² | x_1 ≤ (x_2)²}.

Then A is closed.

Proof. By the sequence characterization of closedness, it suffices to show that for all sequences y : N → A, if the sequence y converges to some point z ∈ R², then actually z ∈ A.
Let therefore y : N → A be a sequence in A. Assume that the sequence y : N → A converges to some point z ∈ R². We need to show that actually z ∈ A.
By Proposition 6.7.8, we know that the component sequences of the sequence y converge as well to the components of z ∈ R², namely

    lim_{n→∞} y_1^{(n)} = z_1

and

    lim_{n→∞} y_2^{(n)} = z_2.

By limit theorems, we know that the limit of the sequence n ↦ (y_2^{(n)})² also exists and

    lim_{n→∞} (y_2^{(n)})² = (z_2)².

Since for all n ∈ N, y^{(n)} ∈ A, we also know that for all n ∈ N, y_1^{(n)} ≤ (y_2^{(n)})². Therefore,

    z_1 = lim_{n→∞} y_1^{(n)} ≤ lim_{n→∞} (y_2^{(n)})² = (z_2)².

We conclude that indeed z ∈ A.
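The argument can be traced on a concrete sequence in A (an added numerical illustration, not part of the original example): the points y^(n) = (1 − 1/(n+1), 1) all satisfy the non-strict inequality defining A, and so does their limit (1, 1). A strict inequality would not survive the limit, which is one reason sets defined by strict inequalities tend to be open rather than closed.

```python
def in_A(x1, x2):
    # membership test for A = {(x1, x2) in R^2 : x1 <= x2**2}
    return x1 <= x2 ** 2

# the sequence y^(n) = (1 - 1/(n+1), 1) lies in A ...
assert all(in_A(1 - 1 / (n + 1), 1.0) for n in range(1000))

# ... and its limit z = (1, 1) still lies in A: the non-strict inequality passes to the limit
assert in_A(1.0, 1.0)

# a point just outside A, for contrast
assert not in_A(1.1, 1.0)
```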

Proposition 11.2.5. Let a, b ∈ R with a < b. Then the intervals [ a, b],


(−∞, b] and [ a, ∞) are all closed.

We now provide a few ways to create new closed sets out of sets about
which you already know that they are closed.

Intersections of closed sets are always closed

Let (X, dist) be a metric space. If I is a set, and for every α ∈ I, we have a subset A_α of X, then the intersection

    ∩_{α∈I} A_α

is defined as

    ∩_{α∈I} A_α := { x ∈ X | for all α ∈ I, x ∈ A_α }.

Proposition 11.2.6. Let (X, dist) be a metric space. Let I be a set and suppose for every α ∈ I we have a subset C_α ⊂ X. Assume that for every α ∈ I the set C_α is closed. Then the intersection

    ∩_{α∈I} C_α

is closed as well.

Finite unions of closed sets are closed


Proposition 11.2.7. Let (X, dist) be a metric space. Let C_1, . . . , C_N be closed subsets of X. Then the finite union

    C_1 ∪ · · · ∪ C_N

is also closed.

Products of closed sets


Proposition 11.2.8. Let C_1, . . . , C_d be closed subsets of R. Then the Cartesian product

    C_1 × · · · × C_d ( = {(c_1, · · · , c_d) | c_i ∈ C_i} )

is a closed subset of (R^d, ‖·‖_2).

The topological boundary of a set

We now give the definition of the topological boundary of a subset A of a metric space (X, dist).

Although for some sets the topological boundary may coincide with what you intuitively think of as a 'boundary' of a set, for many sets the topological boundary is a very counter-intuitive set!

Definition 11.2.9 (The topological boundary). Let (X, dist) be a metric space and let A ⊂ X. The topological boundary of a set A is denoted by ∂A and defined as

    ∂A := X \ ( (int A) ∪ (int(X \ A)) ).

Example 11.2.10. The topological boundary of the interval [2, 5) (viewed


as a subset of the normed vector space (R, | · |) ) is the set {2, 5} that
exactly consists of the points 2 and 5.

Proof. In a previous example we have already shown that

int[2, 5) = (2, 5).

Moreover, R \ [2, 5) = (−∞, 2) ∪ [5, ∞).


With a similar argument, we can show that

    int[5, ∞) = (5, ∞)

and

    int(R \ [2, 5)) = (−∞, 2) ∪ (5, ∞).

Therefore

    ∂([2, 5)) = R \ ( (2, 5) ∪ (−∞, 2) ∪ (5, ∞) ) = {2, 5}.

11.3 Cauchy sequences


Recall that the aim of this chapter and the next is to define successively stronger properties for subsets of a metric space: closedness, completeness and compactness. In the previous section, we have defined what it means for a subset of a metric space to be closed, and now we are slowly going to make our way towards the definition of complete subsets. For that, we need the concept of Cauchy sequences.

Definition 11.3.1 (Cauchy sequence). Let ( X, dist) be a metric space.


We say that a sequence a : N → X is a Cauchy sequence if

for all e > 0,


there exists N ∈ N,
for all m, n ≥ N,
dist(a_m, a_n) < e.

Proposition 11.3.2. Every Cauchy sequence is bounded.

Proof. We need to show that every Cauchy sequence is bounded. So let


a : N → X be a Cauchy sequence. We need to show that a is bounded,
i.e. we need to show that
there exists p ∈ X,
there exists M > 0,
for all n ∈ N,
dist( an , p) ≤ M.

Because (a_n) is a Cauchy sequence, we know that there exists an N ∈ N such that for every m, n ≥ N, it holds that dist(a_m, a_n) < 1. Choose p := a_N and choose

    M := max( dist(a_0, p), dist(a_1, p), . . . , dist(a_{N−1}, p), 1 ).

Let n ∈ N. Then

    dist(a_n, p) ≤ M.

Indeed, if n < N, then dist(a_n, p) is one of the terms in the maximum, and if n ≥ N, then dist(a_n, p) = dist(a_n, a_N) < 1 ≤ M.

Proposition 11.3.3. Let (X, dist) be a metric space, let a : N → X be a Cauchy sequence and assume that a has a subsequence converging to p ∈ X. Then the sequence a itself converges to p.

Proof. Let a : N → X be a Cauchy sequence and assume that a has a subsequence a ◦ n converging to p ∈ X, where n : N → N is an index sequence. We need to show that a converges to p, i.e. we need to show that

for all e > 0,
there exists N ∈ N,
for all m ≥ N,
    dist(a_m, p) < e.

Let e > 0. Because a is a Cauchy sequence, there exists an ℓ_0 such that for all ℓ, m ≥ ℓ_0,

    dist(a_ℓ, a_m) < e/2.

Choose N := ℓ_0. Let m ≥ N. Because a ◦ n converges to p, there exists a k_0 ∈ N such that for all k ≥ k_0,

    dist(a_{n_k}, p) < e/2.

Because n is an index sequence, there exists a k ≥ k_0 such that n_k ≥ ℓ_0. We find by the triangle inequality that indeed

    dist(a_m, p) ≤ dist(a_m, a_{n_k}) + dist(a_{n_k}, p) < e/2 + e/2 = e.

Proposition 11.3.4. Let (X, dist) be a metric space. Let (x_n) be a converging sequence in X. Then (x_n) is a Cauchy sequence.

Proof. Assume that (x_n) is a converging sequence, converging to a point p ∈ X, say. Let e > 0. Because (x_n) converges to p, there exists an N_0 ∈ N such that for all m ≥ N_0,

    dist(x_m, p) < e/2.

Choose such an N_0 and choose N := N_0. Let m, n ≥ N. Then

    dist(x_m, x_n) ≤ dist(x_m, p) + dist(p, x_n) < e/2 + e/2 = e.
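To build intuition for the definition, one can measure how much a sequence of partial sums still oscillates along a tail window (an added numerical sketch; a finite window can suggest, but never prove, the Cauchy property). The partial sums of ∑ 2^{−k} converge and are therefore Cauchy by Proposition 11.3.4; the harmonic partial sums are unbounded, hence not Cauchy by Proposition 11.3.2, and their window oscillation stays near log 2.

```python
def partial_sums(term, count):
    # returns [S_0, S_1, ...] with S_n = term(0) + ... + term(n)
    s, out = 0.0, []
    for k in range(count):
        s += term(k)
        out.append(s)
    return out

def window_osc(seq, N):
    # max |a_m - a_n| over the window m, n in [N, 2N)
    window = seq[N:2 * N]
    return max(window) - min(window)

geo = partial_sums(lambda k: 0.5 ** k, 4000)       # converges (to 2)
har = partial_sums(lambda k: 1.0 / (k + 1), 4000)  # diverges to infinity

assert window_osc(geo, 2000) < 1e-9   # oscillation dies out: consistent with Cauchy
assert window_osc(har, 2000) > 0.5    # stays near log 2 ~ 0.693: not Cauchy
```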

11.4 Completeness

We now give a definition of completeness for a metric space.

Definition 11.4.1. Let ( X, dist) be a metric space. We say that a sub-


set A ⊂ X is complete (in ( X, dist)) if every Cauchy sequence in A is
convergent, with limit in A.
We also say the metric space ( X, dist) itself is complete if X is a com-
plete subset of X in ( X, dist).

Note that we have used the term ‘complete’ various times in the lecture
notes: completeness of a totally ordered field, the series characterization
of completeness in normed vector spaces and now completeness of a metric
space. In the next section we will see that a normed vector space satisfies
the series characterization of completeness if and only if the correspond-
ing metric space is complete. What we will do next is show that the metric
space (R, distR ) is complete (as a metric space). Under the hood, we re-
ally use the Completeness Axiom 4.2.7 for this: that axiom is really what
makes everything work.

Theorem 11.4.2. The metric space (R, dist_R) is complete.

Proof. Let a : N → R be a Cauchy sequence. Because a is a Cauchy sequence, it is in particular bounded. As a consequence, by Theorem 10.4.3, there is a subsequence a ◦ n such that a ◦ n converges to

    lim sup_{k→∞} a_k.

Finally, we know from Proposition 11.3.3 that if a subsequence of a Cauchy sequence converges, then the whole sequence converges. Therefore, the sequence a : N → R is convergent.

Proposition 11.4.3. The metric space (R^d, dist_{‖·‖_2}) is complete, where ‖·‖_2 is the Euclidean norm.

The proof of Proposition 11.4.3 is the topic of Orange Exercise 11.6.3.

Proposition 11.4.4. Let ( X, dist) be a metric space. Suppose A ⊂ X is


complete. Then A is closed.

Proof. Let (x_n) be a sequence in A, converging to a point x* ∈ X. Then (x_n) is a Cauchy sequence by Proposition 11.3.4. Since A is complete, the sequence (x_n) converges to a point p ∈ A. By uniqueness of limits, we know that x* = p. We conclude that x* ∈ A, so A is closed by the sequence characterization of closedness (Proposition 11.2.3).

The following proposition says that a subset of a complete set is complete


if and only if it is closed.

Proposition 11.4.5. Let ( X, dist) be a metric space and let C ⊂ X be a


complete subset. Let A ⊂ C be a subset of C. Then, A is complete if
and only if A is closed.

Proof. The "only if" side of this proposition follows from Proposition 11.4.4.
We will now show the "if" part of the proposition. Suppose A is closed. Let (x_n) be a Cauchy sequence in A. Then (x_n) is also a sequence in C. Since C is complete, there exists a point p ∈ C such that (x_n) converges to p. Because A is closed, in fact p ∈ A by the sequence characterization of closedness. Hence every Cauchy sequence in A converges with limit in A, i.e. A is complete.

11.5 Series characterization of completeness in normed vector spaces

We will now show that a normed vector space satisfies the series characterization of completeness if and only if the corresponding metric space is complete. We already know that (R^d, ‖·‖_2) is complete; therefore, after we have proved the theorem, we know that if a series in (R^d, ‖·‖_2) converges absolutely then it also converges.

Theorem 11.5.1. Let (V, ‖·‖) be a normed vector space. Then (V, ‖·‖) is complete if and only if every absolutely converging series is convergent.

Proof. We first show the 'only if' direction. Suppose (V, ‖·‖) is complete. Let a : N → V be a sequence and suppose the series

    ∑_{k=0}^∞ ‖a_k‖

is convergent. Consider also the sequence of partial sums

    S_n := ∑_{k=0}^n a_k.

We are going to show that (S_n) is a Cauchy sequence. Let e > 0. Since the series

    ∑_{k=0}^∞ ‖a_k‖

converges, we know by Proposition 7.5.2 that

    lim_{N→∞} ∑_{k=N}^∞ ‖a_k‖ = 0.

Choose N ∈ N such that

    ∑_{k=N}^∞ ‖a_k‖ < e.

Let m, n ≥ N. Assume without loss of generality that n > m. Then

    ‖S_n − S_m‖ = ‖ ∑_{k=m+1}^n a_k ‖ ≤ ∑_{k=m+1}^n ‖a_k‖ ≤ ∑_{k=m+1}^∞ ‖a_k‖ < e.

We have shown that (S_n) is a Cauchy sequence. Since (V, ‖·‖) was assumed to be complete, the sequence (S_n) is convergent.
We now show the 'if' direction. Suppose that every absolutely converging series in V is convergent. Let a : N → V be a Cauchy sequence. We need to show that a : N → V is convergent.
We can construct a subsequence such that

    ‖a_{n_{k+1}} − a_{n_k}‖ ≤ 2^{−k}.

Define the sequence b : N → V by

    b_k := a_{n_{k+1}} − a_{n_k}.

Then the series

    ∑_{k=0}^∞ ‖b_k‖

is convergent by comparison with the converging series

    ∑_{k=0}^∞ 2^{−k}.

It follows by our assumption that the series

    ∑_{k=0}^∞ b_k

is convergent. Note that the partial sums corresponding to this series telescope:

    S_ℓ = ∑_{k=0}^ℓ b_k = a_{n_{ℓ+1}} − a_{n_0}.

It follows by limit theorems that the sequence (a_{n_ℓ}) is convergent. Then it follows by Proposition 11.3.3 that the sequence (a_m) converges as well.
Corollary 11.5.2. Let a : N → R be a real-valued sequence. Suppose the series

    ∑_{n=0}^∞ a_n

converges absolutely, i.e. the series

    ∑_{n=0}^∞ |a_n|

converges. Then also the series

    ∑_{n=0}^∞ a_n

converges.

Example 11.5.3. The series

    ∑_{k=1}^∞ (−1)^k / k²

converges, because it converges absolutely. Indeed,

    ∑_{k=1}^∞ | (−1)^k / k² | = ∑_{k=1}^∞ 1/k²

is a standard converging hyperharmonic series.
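As a numerical companion to this example (an added sketch; the closed-form value −π²/12 of this particular alternating series is a standard fact, used here only as a sanity check), the partial sums indeed settle near a limit:

```python
import math

# partial sum of sum_{k=1}^{200000} (-1)^k / k^2
s = sum((-1) ** k / k ** 2 for k in range(1, 200001))

# for an alternating series with decreasing terms, the truncation error is at most
# the first omitted term, here 1/200001^2, far below the tolerance used
assert abs(s - (-math.pi ** 2 / 12)) < 1e-4
```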

11.6 Exercises

11.6.1 Blue exercises

Exercise 11.6.1. Let (V, ‖·‖) be a normed linear space and let A be the closed ball of radius 1 around the origin, i.e.

    A := { v ∈ V | ‖v‖ ≤ 1 }.

Show that the set A is closed.

Exercise 11.6.2. Show that the interval [0, 1) is neither open nor closed
(seen as a subset of the normed linear space (R, | · |) ).

Note the moral of the previous exercise: there are sets that are neither open
nor closed.

11.6.2 Orange exercises

Exercise 11.6.3. Prove Proposition 11.4.3.

Exercise 11.6.4. Consider the following line in R²:

    L := {(x, y) ∈ R² | x + 2y = 1}.

Show that L is a closed subset of R² and that L is complete.

Exercise 11.6.5. Give an example of a metric space ( X, dist) that is not com-
plete (as always, actually prove that ( X, dist) is indeed not complete).

Exercise 11.6.6. Consider the following subset A of R²:

    A := {(x_1, x_2) ∈ R² | 4(x_1)² + (x_2)² ≤ 25}.

Prove that the set A is a closed and bounded subset of (R², ‖·‖_2).


Chapter 12

Compactness

In this chapter, we are going to define what it means for a subset of a metric space to be compact. Compactness is a strong property: every compact subset is complete, and every complete subset is closed. We will in this chapter also give an alternative characterization of compactness: we will define what it means for a subset to be totally bounded and will use this concept to show that a subset is compact if and only if it is complete and totally bounded. In (R^d, ‖·‖_2), however, we will see that a subset is compact if and only if it is closed and bounded.

12.1 Definition of (sequential) compactness


In this short section, we define what it means for a subset of a metric space
to be (sequentially) compact. We usually leave out the word ‘sequentially’.

Definition 12.1.1 ((sequential) compactness). Let ( X, dist) be a metric


space. We say a subset K ⊂ X is (sequentially) compact if every sequence
x : N → K in K has a converging subsequence x ◦ n, converging to a
point z ∈ K.

The rest of the chapter will be devoted to deriving alternative characterizations of compactness. Especially in (R^d, ‖·‖_2), these alternative characterizations are a bit easier to deal with.


12.2 Boundedness and total boundedness


We first define what it means for a subset of a metric space to be bounded.
This definition has many similarities with the definition of boundedness
for sequences.

Definition 12.2.1 (bounded sets). Let ( X, dist) be a metric space. We


say that a subset A ⊂ X is bounded if

there exists q ∈ X,
there exists M > 0,
for all p ∈ A,
dist( p, q) ≤ M.

Just as with the concept of boundedness for sequences, in normed vector


spaces boundedness has a somewhat easier alternative characterization.

Proposition 12.2.2. Let (V, k · k) be a normed linear space. A subset


A ⊂ V is bounded if and only if

there exists M > 0,


for all v ∈ A,
kvk ≤ M.

We will now define what it means for a subset to be totally bounded. In-
tuitively, it means that for every radius r > 0 (which could be extremely
small) the subset can be covered with only a finite number of balls with
radius r.

Definition 12.2.3 (totally bounded sets). Let ( X, dist) be a metric space.



We say that a subset A ⊂ X is totally bounded if

for all r > 0,
there exists N ∈ N,
there exist p_1 , . . . , p_N ∈ X,

A ⊂ ⋃_{i=1}^{N} B( p_i , r ).
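As a concrete illustration (our own, not part of the notes), the set A = [0, 1] in (R, | · |) is totally bounded: for each radius r > 0 the finitely many balls centered at the multiples of r cover A. The sketch below checks this on sample points of A.

```python
# A sketch of Definition 12.2.3 for the set A = [0, 1] in (R, |.|):
# for each radius r > 0 we exhibit finitely many centers p_1, ..., p_N
# such that the balls B(p_i, r) cover A. That finitely many balls
# suffice for every r is exactly total boundedness.
import math

def covering_centers(r):
    # finitely many centers: the multiples of r from 0 up to (at least) 1
    return [i * r for i in range(math.ceil(1 / r) + 1)]

def is_covered(a, centers, r):
    # "a lies in B(p_i, r) for some i"
    return any(abs(a - p) < r for p in centers)

for r in [0.5, 0.1, 0.01]:
    centers = covering_centers(r)
    samples = [k / 1000 for k in range(1001)]   # sample points of A = [0, 1]
    assert all(is_covered(a, centers, r) for a in samples)
```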

In the next proposition we will see that “total boundedness” is a stronger


property than just “boundedness”.

Proposition 12.2.4. Let ( X, dist) be a metric space and let A be a subset


of X. If A is totally bounded, it is bounded.

Proof. Assume A is totally bounded. We need to show that A is bounded,


i.e. we need to show that
there exists q ∈ X,
there exists M > 0,
for all p ∈ A,
dist( p, q) ≤ M.

Because A is totally bounded, we may obtain an N ∈ N and points p_1 , . . . , p_N ∈ X such that

A ⊂ ⋃_{i=1}^{N} B( p_i , 1).

Choose q := p1 .
Choose M := max(dist( p2 , p1 ), . . . , dist( p N , p1 )) + 1.
Let p ∈ A. Then there exists an i ∈ {1, . . . , N } such that p ∈ B( pi , 1).

It follows that
dist( p, q) = dist( p, p1 )
≤ dist( p, pi ) + dist( pi , p1 )
≤ 1 + max(dist( p2 , p1 ), . . . , dist( p N , p1 ))
= M.

In the special case of the normed vector space (Rd , k · k2 ), however, a sub-
set is totally bounded if and only if it is bounded.

Proposition 12.2.5. Consider now the normed vector space (Rd , k · k2 ).


A subset A ⊂ Rd is bounded in (Rd , k · k2 ) if and only if it is totally
bounded.

Proof. The “if” direction follows from the previous proposition (Proposition 12.2.4).


Let us prove the “only if” direction. Suppose A ⊂ Rd is bounded.
Then we may obtain an M > 0 such that for all x ∈ A, k x k2 ≤ M. We
need to show that
for all r > 0,
there exists N ∈ N,
there exist p_1 , . . . , p_N ∈ Rd,

A ⊂ ⋃_{i=1}^{N} B( p_i , r ).

Let r > 0. Define δ := r/(2√d). To choose N ∈ N and the points
p1 , . . . , p N we are going to make a large grid. We define the set

G := [− M, M]d ∩ (δZ)d .

These are all points in Rd whose coordinates are integer multiples of δ and lie between − M and M. Note that G is a finite set, so we

can choose an N ∈ N and points p1 , . . . , p N such that

G = { p1 , . . . , p N }.

Now let a ∈ A. It suffices to show that there exists a point g ∈ G such


that
dist( a, g) < r.
We can express a in its components in Rd as a = ( a1 , . . . , ad ). We can
then define the point

g := ( δ⌈a_1/δ⌉ , . . . , δ⌈a_d/δ⌉ )

in G. Note that for all i ∈ {1, . . . , d}, | gi − ai | ≤ δ. Therefore


‖g − a‖₂ = √( ∑_{i=1}^{d} (g_i − a_i)² ) ≤ √( ∑_{i=1}^{d} δ² ) = √d δ ≤ r/2 < r.
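The grid construction in this proof can be checked numerically. The sketch below (our own illustration) takes d = 2 and verifies, for randomly drawn points a, that rounding each coordinate up to the next multiple of δ = r/(2√d) produces a grid point g with ‖g − a‖₂ < r.

```python
# A sketch of the grid construction from the proof of Proposition 12.2.5,
# for d = 2: with delta = r / (2 sqrt(d)), rounding each coordinate of a
# point a up to the next multiple of delta gives |g_i - a_i| <= delta in
# every coordinate, hence ||g - a||_2 <= sqrt(d) * delta = r/2 < r.
import math
import random

d = 2
r = 0.3
delta = r / (2 * math.sqrt(d))

random.seed(0)
for _ in range(1000):
    a = [random.uniform(-5, 5) for _ in range(d)]        # a point of a bounded set
    g = [delta * math.ceil(ai / delta) for ai in a]      # coordinates rounded up to the grid
    dist = math.sqrt(sum((gi - ai) ** 2 for gi, ai in zip(g, a)))
    assert dist < r
```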

12.3 Alternative characterization of compactness


We are now ready to show that a subset K ⊂ X is compact if and only if
it is complete and totally bounded. The proof is one of the most beautiful,
but also one of the most complicated in these lecture notes. You may want
to skip it on first reading.

Theorem 12.3.1. Let ( X, dist) be a metric space. A subset K ⊂ X is compact if and only if it is complete and totally bounded.

Proof. We first show the “only if” direction. Suppose K ⊂ X is compact.
We are going to show that K is totally bounded. We argue by contra-

diction. So suppose the set K is not totally bounded. Then

there exists r > 0,
for all N ∈ N,
for all p_1 , . . . , p_N ∈ X,

K ⊄ ⋃_{i=1}^{N} B( p_i , r ).

Choose such an r > 0. We can now inductively construct a sequence


of points p0 , p1 , p2 , . . . in K such that

dist( pi , p j ) ≥ r

for every i ≠ j. Note that we can also phrase this last property as: for all k ∈ N and all i < k, dist( p_i , p_k ) ≥ r.
We first just take some point p0 ∈ K. Now let k ∈ N and assume the
points p0 , . . . , pk have already been defined, and dist( pi , pk ) ≥ r for
i ∈ {0, . . . , k − 1}. Then we know that
K ⊄ ⋃_{i=0}^{k} B( p_i , r ).

In other words, there exists a point

q ∈ K \ ( ⋃_{i=0}^{k} B( p_i , r ) ).

Now define pk+1 := q. Then indeed for all i = 0, . . . , k, it holds that


dist( pk+1 , pi ) ≥ r.
Because K is compact, there is a converging subsequence ( p_{n_i} ) in K. In particular, the sequence ( p_{n_i} ) is a Cauchy sequence, so that for k, ℓ large enough,

dist( p_{n_k} , p_{n_ℓ} ) < r.

This contradicts the fact that any two distinct terms of the sequence ( p_n ) are at distance at least r.

We will now show that K is complete. Let ( xn ) be a Cauchy sequence


in K. Since K is compact, there is a converging subsequence ( xnk ) of
( xn ), converging to a point z ∈ K. But then the original sequence ( xn )
converges to z as well by Proposition 11.3.3.
We will now show the “if” direction. Assume that K ⊂ X is complete
and totally bounded. We are going to show that K is compact.
Let ( xn ) be an arbitrary sequence in K. We are going to construct a
limit point x ∗ ∈ K by what is called a diagonal argument. The prepa-
ration for this is as follows. We are going to define subsequences of
( xn ) inductively. Precisely, we will use induction to, for every k ∈ N,
construct an index sequence n(k) : N → N. Said differently, we are
going to construct a sequence of index sequences. Moreover, we will
construct this sequence of index sequences in such a way that for ev-
ery k̃, k ∈ N, if k̃ ≥ k then the index sequence n(k̃) is a subsequence of
the index sequence n^{(k)}, and such that for all k ∈ N and all ℓ, j ∈ N,

dist( x_{n_ℓ^{(k)}} , x_{n_j^{(k)}} ) < 2/k.    (12.3.1)

We will now tackle the base case of the inductive definition. For this,
we let the index sequence n(0) : N → N be just the identity function,
i.e.

n_ℓ^{(0)} := ℓ.
We now continue with the inductive step of the inductive definition.
Let k ∈ N and assume that the index sequence n(k−1) : N → N is de-
fined for some k ∈ N \ {0}. We are going to define the index sequence
n^{(k)} : N → N as a subsequence of n^{(k−1)}. Note that, because K is totally bounded, there exist an N_k ∈ N and points p_1^{(k)} , . . . , p_{N_k}^{(k)} ∈ K such that

K ⊂ ⋃_{i=1}^{N_k} B( p_i^{(k)} , 1/k ).

Said differently, the set K is covered by only finitely many balls of radius 1/k. Hence, there exists a point p_{i_k}^{(k)} such that x_{n_ℓ^{(k−1)}} ∈ B( p_{i_k}^{(k)} , 1/k ) for infinitely many ℓ ∈ N. Therefore, there exists an index sequence n^{(k)} : N → N, itself a subsequence of n^{(k−1)}, such that

x_{n_ℓ^{(k)}} ∈ B( p_{i_k}^{(k)} , 1/k )

for all ℓ ∈ N. In particular, for all ℓ, j ∈ N,

dist( x_{n_ℓ^{(k)}} , x_{n_j^{(k)}} ) ≤ dist( x_{n_ℓ^{(k)}} , p_{i_k}^{(k)} ) + dist( p_{i_k}^{(k)} , x_{n_j^{(k)}} ) < 2/k.

Now define the sequence m : N → N by

m_ℓ := n_ℓ^{(ℓ)}.

We claim that m : N → N is an index sequence. To prove the claim,


we need to show that m : N → N is strictly increasing. This follows
because for every ℓ ∈ N,

m_ℓ = n_ℓ^{(ℓ)} < n_{ℓ+1}^{(ℓ)} ≤ n_{ℓ+1}^{(ℓ+1)} = m_{ℓ+1},

where for the strict inequality we used that n^{(ℓ)} is an index sequence and therefore strictly increasing, while the second inequality follows because for every two index sequences a, b : N → N and every i ∈ N, a_{b_i} ≥ a_i (this can be applied to the case where a ∘ b equals n^{(ℓ+1)} and a equals n^{(ℓ)}). This finishes the proof of the claim that m : N → N is an index sequence.
We now claim that the sequence ( x_{m_ℓ} )_ℓ is a Cauchy sequence. Let e > 0. Choose M := ⌈2/e⌉ + 1. Let ℓ̃, j̃ ≥ M. Then, because n^{(ℓ̃)} is a subsequence of n^{(M)}, we can find an ℓ ∈ N such that

m_{ℓ̃} = n_{ℓ̃}^{(ℓ̃)} = n_ℓ^{(M)}.

For the same reason, we may find a j ∈ N such that m_{j̃} = n_j^{(M)}. It follows by (12.3.1) that

dist( x_{m_{ℓ̃}} , x_{m_{j̃}} ) = dist( x_{n_ℓ^{(M)}} , x_{n_j^{(M)}} ) < 2/M < e.

Since K is complete and ( x_{m_ℓ} )_ℓ is a Cauchy sequence in K, the sequence ( x_{m_ℓ} )_ℓ is convergent with a limit x∗ ∈ K. This was what we needed to show.

In the special case of (Rd , k · k2 ) we have an easier alternative characteri-


zation of compactness.

Theorem 12.3.2 (Heine-Borel Theorem). A subset of (Rd , k · k2 ) is com-


pact if and only if it is closed and bounded.

Proof. We first show the “only if” direction. Suppose A ⊂ Rd is com-


pact. We will first show that A is closed. By the alternative character-
ization of compactness in Theorem 12.3.1, we know that the set A is
complete. By Proposition 11.4.4, which says that every complete set is
closed, it follows that the set A is closed.
We will now show that A is bounded. By Theorem 12.3.1 we know
that A is totally bounded. By Proposition 12.2.4, which says that every
totally bounded set is bounded, we know that A is bounded.
Suppose now that A ⊂ Rd is closed and bounded. Because Rd is com-
plete, the closed set A is a subset of a complete set, and (by Proposi-
tion 11.4.5) therefore itself complete as well. Moreover, by Proposition
12.2.5, we know that every bounded subset of (Rd , k · k2 ) is also to-
tally bounded. In particular, the set A is totally bounded. We conclude
that A is complete and totally bounded, and therefore compact by the
alternative characterization of compactness in Theorem 12.3.1.

12.4 Exercises

12.4.1 Blue exercises

Exercise 12.4.1. Let a, b ∈ R be two real numbers such that a < b. Prove
that the interval [ a, b] is a compact subset of the normed vector space (R, | ·
|).
Exercise 12.4.2. Let ( X, dist) be a metric space and let a : N → X be a
sequence in X. Show that the sequence a : N → X is bounded (according
to Definition 5.2.1) if and only if the set

A := { a_n | n ∈ N }

is bounded (according to Definition 12.2.1).

12.4.2 Orange exercises

Exercise 12.4.3. Consider the metric space ((0, 1), dist), where (0, 1) de-
notes the interval from 0 to 1 and dist( x, y) = | x − y|. Prove that (0, 1) is
a closed and bounded subset of the metric space ((0, 1), dist). Also prove
that (0, 1) is not a compact subset of the metric space ((0, 1), dist).

The moral of this exercise is that the Heine-Borel theorem (Theorem 12.3.2)
is an alternative characterization of compactness in (Rd , k · k2 ): for other
metric spaces or normed vector spaces it does not hold that subsets are
compact if and only if they are closed and bounded.
Exercise 12.4.4. Let ( X, dist) be a metric space and let K ⊂ X be a compact
subset. Let a : N → X be a sequence with values in X, such that

for all N ∈ N,
there exists ` ≥ N, (12.4.1)
a` ∈ K.
(this is a formal way of saying that there are infinitely many ℓ ∈ N such that a_ℓ ∈ K).
The exercise consists of two parts:

i. Use (12.4.1) to inductively define an index sequence n : N → N such that for every k ∈ N, a_{n_k} ∈ K. See the best practices on how to set up such an inductive definition.

ii. Use the fact that K is compact to show that there is a point p ∈ K and
a subsequence of a : N → X converging to p.

Exercise 12.4.5. Consider the sets

A := {( x1 , x2 ) ∈ R2 | x1 − x2 = 1}

and
B := {( x1 , x2 ) ∈ R2 | ( x1 )2 + ( x2 )2 ≤ 1}
Prove that the set A ∩ B is compact (as a subset of the normed vector space
(R2 , k · k2 )).
Chapter 13

Limits and continuity

We are finally ready to treat perhaps the most important target of Analysis 1: the concepts of limits and continuity. These concepts embody the adage of this course: to make rigorous statements about the approximate behavior of functions.
The setting is as follows: we will consider functions f : D → Y mapping from a subset D ⊂ X of a metric space ( X, distX ) to a metric space (Y, distY ). There are quite a few actors here: an input metric space ( X, distX ), a subset D of that metric space, and an output metric space (Y, distY ), and the concepts of limit and continuity depend on all of these actors. That makes it a bit tricky.
On the coarsest level, if p ∈ X and q ∈ Y then the statement that

lim f ( x ) = q
x→ p

will mean that if the distance between x and p is small, but not zero, the
distance between f ( x ) and q will be small. Using a similar approach as
we took with sequences, we will make this vague statement completely
rigorous.
There is, however, one critical, tricky point. This concept of limits only
behaves nicely if p satisfies a special property with respect to the set D on
which f is defined. We will discuss this property in the next section.


13.1 Accumulation points


To get a useful concept of a limit in a point p ∈ X, the point p needs to be
an accumulation point of the domain D of the function.

Definition 13.1.1 (Accumulation points). Let ( X, distX ) be a metric space


and let D ⊂ X be a subset of X. We say a point p ∈ X is an accumulation
point of the set D if

for all e > 0,


there exists x ∈ D,
0 < distX ( x, p) < e.

We denote the set of accumulation points of a set D by D′.

Note that accumulation points of a set D do not have to lie in the set D
themselves. If a point does lie in D, but is not an accumulation point, then
we call it an isolated point.

Definition 13.1.2 (Isolated points). Let ( X, dist) be a metric space and


let D ⊂ X be a subset of X. We say a point a ∈ D is an isolated point if
it is not an accumulation point, i.e. if a ∈ D \ D′.
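These two notions can be illustrated numerically. The sketch below (our own; it can of course only test finitely many points and radii) treats D = {1/n | n ∈ N, n ≥ 1} as a subset of (R, | · |): the point 0 is an accumulation point of D that does not lie in D, while the point 1/2 ∈ D is isolated.

```python
# A numerical illustration of Definitions 13.1.1 and 13.1.2 for
# D = {1/n : n = 1, 2, ...} in (R, |.|): the point 0 is an accumulation
# point of D (even though 0 is not in D), while the point 1/2 in D is
# isolated. Only finitely many n and e can be tested, of course.

def has_point_within(D, p, eps):
    # "there exists x in D with 0 < |x - p| < eps"
    return any(0 < abs(x - p) < eps for x in D)

D = [1 / n for n in range(1, 200001)]

# 0 behaves like an accumulation point: each tested eps-ball meets D \ {0}
assert all(has_point_within(D, 0.0, eps) for eps in [0.1, 1e-3, 1e-5])

# 1/2 is isolated: the ball of radius 1/2 - 1/3 around it misses D \ {1/2}
assert not has_point_within(D, 0.5, 0.5 - 1 / 3)
```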

13.2 Limit in an accumulation point


We can now define limits in accumulation points of D.

Definition 13.2.1 (Limit in an accumulation point). Let ( X, distX ) and


(Y, distY ) be two metric spaces and let D ⊂ X be a subset of X. Let
f : D → Y be a function and let q ∈ Y be a point in Y. Let a ∈ D′ be an
accumulation point of D. Then we say f converges to q as x goes to a,
and write
lim f ( x ) = q
x→a

if
for all e > 0,
there exists δ > 0,
for all x ∈ D,
if 0 < distX ( x, a) < δ, then distY ( f ( x ), q) < e.
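The quantifier structure of this definition can be probed numerically. In the sketch below (our own illustration; the helper names are ours), we verify the claim lim_{x→3} x² = 9 in (R, | · |) for a few values of e, using the explicit choice δ = min(e/7, 1): if 0 < |x − 3| < δ ≤ 1, then |x + 3| < 7, hence |x² − 9| = |x − 3| · |x + 3| < 7δ ≤ e.

```python
# A numerical check of Definition 13.2.1 for f(x) = x^2 with D = X = R,
# a = 3 and q = 9 (so the claim is lim_{x -> 3} x^2 = 9). The choice
# delta = min(e/7, 1) works: if 0 < |x - 3| < delta <= 1, then
# |x + 3| < 7, so |x^2 - 9| = |x - 3| |x + 3| < 7 delta <= e.

def f(x):
    return x * x

def delta_for(e):
    return min(e / 7, 1.0)

for e in [1.0, 0.1, 0.001]:
    delta = delta_for(e)
    # sample some points x with 0 < |x - 3| < delta
    xs = [3 + delta * t for t in (-0.99, -0.5, -1e-6, 1e-6, 0.5, 0.99)]
    assert all(0 < abs(x - 3) < delta for x in xs)
    assert all(abs(f(x) - 9) < e for x in xs)
```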

13.3 Uniqueness of limits

Proposition 13.3.1. Let ( X, distX ) and (Y, distY ) be metric spaces and
let D ⊂ X be a subset of X. Let f : D → Y be a function on D. Let
a ∈ D′ and assume

lim f ( x ) = p and lim f ( x ) = q


x→a x→a

for points p, q ∈ Y. Then p = q.

Proof. Suppose p ≠ q. Choose e := distY ( p, q)/2. Then there exists a δ1 > 0 such that for all x ∈ D, if 0 < distX ( x, a) < δ1 then

distY ( f ( x ), p) < e = distY ( p, q)/2

and there exists a δ2 > 0 such that for all x ∈ D, if 0 < distX ( x, a) < δ2 then

distY ( f ( x ), q) < e = distY ( p, q)/2.

Choose such δ1 > 0 and δ2 > 0.
Now define δ := min(δ1 , δ2 ) > 0. Because a is an accumulation point of D, there exists a point b ∈ D with 0 < distX (b, a) < δ. Then

distY ( f (b), p) < distY ( p, q)/2

and

distY ( f (b), q) < distY ( p, q)/2.

Therefore, by the triangle inequality,

distY ( p, q) ≤ distY ( p, f (b)) + distY ( f (b), q) < distY ( p, q)/2 + distY ( p, q)/2 = distY ( p, q),

which gives a contradiction.

13.4 Sequence characterization of limits

Theorem 13.4.1 (Sequence characterization of limits). Let ( X, distX )


and (Y, distY ) be two metric spaces. Let D ⊂ X. Let f : D → Y
and let a ∈ D′. Let q ∈ Y. Then

lim f ( x ) = q
x→a

if and only if

for all sequences ( x_n ) in D \ { a} converging to a,

lim_{n→∞} f ( x_n ) = q.

Proof. We will first show the “only if” direction.


So assume that limx→a f ( x ) = q. We need to show that

for all sequences ( x_n ) in D \ { a} such that lim_{n→∞} x_n = a,

lim_{n→∞} f ( x_n ) = q.

Take therefore a sequence ( x n ) in D \ { a} such that limn→∞ x n = a.


We now need to show that limn→∞ f ( x n ) = q, i.e. we need to show

that
for all e > 0,
there exists N ∈ N,
for all n ≥ N,
distY ( f ( x n ), q) < e.

Let e > 0. Because limx→ a f ( x ) = q, there exists a δ > 0 such that for
all x ∈ D, if 0 < distX ( x, a) < δ then

distY ( f ( x ), q) < e.

Choose such a δ > 0.


Because limn→∞ x n = a, we know that there exists an N ∈ N such that
for all n ≥ N,
distX ( x n , a) < δ.
Choose such an N ∈ N.
Let n ≥ N. Then because distX ( x n , a) < δ indeed

distY ( f ( x n ), q) < e.

This finishes the proof of the “only if” direction.


We will now show the “if” direction. Assume that
for all sequences ( x_n ) in D \ { a} such that lim_{n→∞} x_n = a,

lim_{n→∞} f ( x_n ) = q.

We will show that


lim f ( x ) = q.
x→a
We argue by contradiction. So assume there exists an e > 0 such that
for all δ > 0 there exists a point x ∗ ∈ D such that 0 < distX ( x ∗ , a) < δ

but distY ( f ( x ∗ ), q) ≥ e. Choose such an e > 0. Then

for all δ > 0,


there exists x ∗ ∈ D, (13.4.1)
(0 < distX ( x ∗ , a) < δ) and distY ( f ( x ∗ ), q) ≥ e.

We are now going to define a sequence ( x n ) in D \ { a}, converging to


a. Let n ∈ N. Choose δ := 2^{−n} in (13.4.1). Then we may obtain an x∗ ∈ D such that 0 < distX ( x∗ , a) < 2^{−n} while distY ( f ( x∗ ), q) ≥ e. Define x_n := x∗ .
Then the sequence ( x_n ) is a sequence in D \ { a} converging to a; however, it does not hold that lim_{n→∞} f ( x_n ) = q, which is a contradiction.

13.5 Limit laws


Just as with limits of sequences, if we need to show that a certain limit
exists or if we need to determine its value, we usually want to avoid going
back to the formal definition. Instead, we would like to rely on limit laws
like the following.

Theorem 13.5.1. Let ( X, distX ) be a metric space and let (V, k · k) be a


normed vector space. Let D ⊂ X and let f : D → V and g : D → V be
two functions. Let a ∈ D′. Moreover, assume that the limit limx→ a f ( x )
exists and equals p ∈ V and the limit limx→ a g( x ) exists and equals
q ∈ V. Let λ ∈ R. Then

i. The limit limx→a ( f ( x ) + g( x )) exists and equals p + q.

ii. The limit limx→a (λ f ( x )) exists and equals λp.

13.6 Continuity

Definition 13.6.1 (Continuity in a point). Let ( X, distX ) and (Y, distY )


be two metric spaces and let D ⊂ X be a subset of X. We say a function
f : D → Y is continuous in a point a ∈ D ∩ D′ if

lim f ( x ) = f ( a).
x→a

If a ∈ D is an isolated point, i.e. if a ∈ D \ D′, then we also say that f


is continuous in a.

We say a function is continuous if it is continuous in every point in its


domain.

Definition 13.6.2 (Continuity on the domain). Let ( X, distX ) and (Y, distY )
be two metric spaces and let D ⊂ X be a subset of X. We say a function
f : D → Y is continuous on D if f is continuous in a for every a ∈ D.

Sometimes it is a bit cumbersome to make the distinction between isolated


points and accumulation points. The following alternative characteriza-
tion of continuity in a point circumvents this issue.

Proposition 13.6.3 (Alternative e–δ characterization of continuity in a point). Let ( X, distX ) and (Y, distY ) be two metric spaces and let
D ⊂ X be a subset of X. Let a ∈ D. Then the function f is continuous
in a if and only if

for all e > 0,


there exists δ > 0,
for all x ∈ D,
if 0 < distX ( x, a) < δ, then distY ( f ( x ), f ( a)) < e.

13.7 Sequence characterization of continuity


As with many concepts in the course, continuity is conveniently probed
with sequences.

Theorem 13.7.1 (Sequence characterization of continuity). Let ( X, distX )


and (Y, distY ) be metric spaces. Let D ⊂ X and let f : D → Y be a func-
tion. Let a ∈ D. The function f is continuous in a if and only if

for all sequences ( x_n ) in D converging to a,

lim_{n→∞} f ( x_n ) = f ( a).

13.8 Rules for continuous functions


The following proposition implies that the composition of two continuous
functions is also continuous.

Proposition 13.8.1. Let ( X, distX ), (Y, distY ) and ( Z, distZ ) be metric


spaces, let D ⊂ X and E ⊂ Y. Let f : D → Y and g : E → Z be
two functions, and assume that f ( D ) ⊂ E. Let a ∈ D. If f is continu-
ous in a and g is continuous in f ( a) then g ◦ f is continuous in a.

Proof. We use the sequence characterization of continuity. We need to


show that for every sequence ( x n ) in D converging to a, in fact

lim ( g ◦ f )( x n ) = ( g ◦ f )( a)
n→∞

or written differently that

lim g( f ( x n )) = g( f ( a)).
n→∞

Let ( x n ) be a sequence in D converging to a. Since f is continuous, we


know that
lim f ( x n ) = f ( a)
n→∞

or in words the sequence ( f ( x n )) converges to f ( a). Because g is con-


tinuous in f ( a), it follows that

lim g( f ( x n )) = g( f ( a))
n→∞

which was what we needed to show.

13.9 Images of compact sets under continuous functions are compact

Proposition 13.9.1. Let ( X, distX ) and (Y, distY ) be two metric spaces
and let K ⊂ X be a compact subset of X. Let f : K → Y be continuous
on K. Then f (K ) is a compact subset of Y.

Proof. It suffices to show that for every sequence (yn ) in f (K ) there is


a subsequence (ynk ) and a point p ∈ f (K ) such that ynk → p as k → ∞.
Let (yn ) be a sequence in f (K ). Then there exists a sequence ( xn ) such
that yn = f ( xn ) for every n ∈ N. Because K is compact, there ex-
ists a point z ∈ K and a subsequence ( xnk ) such that the subsequence
converges to z. Because f is continuous, we find

lim_{k→∞} y_{n_k} = lim_{k→∞} f ( x_{n_k} ) = f (z) ∈ f (K ).

Therefore the subsequence (ynk ) of yn converges to the point f (z) in


f (K ), which shows that f (K ) is compact.

13.10 Uniform continuity

Definition 13.10.1. Let ( X, distX ) and (Y, distY ) be metric spaces and
let D ⊂ X be a non-empty subset. We say that f : D → Y is uniformly
continuous on D if
for all e > 0,
there exists δ > 0,
for all p, q ∈ D,
0 < distX ( p, q) < δ =⇒ distY ( f ( p), f (q)) < e.
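The difference with ordinary continuity is that one δ must work simultaneously for all pairs p, q ∈ D. A standard example of what can then fail is f ( x ) = 1/x on D = (0, 1) (this example is our own illustration, not taken from the notes): f is continuous, but for e = 1 no single δ > 0 works, as the sketch below demonstrates numerically.

```python
# A numerical illustration of Definition 13.10.1: f(x) = 1/x on
# D = (0, 1) is continuous, but not uniformly continuous. For e = 1
# no delta works: the pair p = t, q = t/2 has distance t/2, which is
# < delta for t small, yet |f(p) - f(q)| = 1/t >= 1.

def f(x):
    return 1 / x

for delta in [0.1, 0.01, 0.001]:
    t = min(delta, 0.5)            # a point deep inside (0, 1)
    p, q = t, t / 2
    assert 0 < abs(p - q) < delta  # the two inputs are delta-close
    assert abs(f(p) - f(q)) >= 1   # but the outputs stay at least e = 1 apart
```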

The following proposition shows that uniform continuity is a stronger prop-


erty than continuity.

Proposition 13.10.2. Let ( X, distX ) and (Y, distY ) be metric spaces and
let D ⊂ X be a non-empty subset. Let f : D → Y be uniformly contin-
uous on D. Then f is continuous on D.

Proof. We need to show that for all a ∈ D the function f is continuous


in a.
If a ∈ D \ D′, then f is continuous in a by definition. Let a ∈ D′. We
need to show that
lim f ( x ) = f ( a).
x→a
Let e > 0. Since f is uniformly continuous, there exists a δ > 0 such
that for all p, q ∈ D,

0 < distX ( p, q) < δ =⇒ distY ( f ( p), f (q)) < e.

Choose such a δ > 0.


Let p ∈ D. Assume that 0 < distX ( p, a) < δ. Then

distY ( f ( p), f ( a)) < e.

Therefore, f is continuous in a.

Although uniform continuity is stronger than continuity, if a function is


continuous on a compact set, it is even uniformly continuous.

Theorem 13.10.3. Let ( X, distX ) and (Y, distY ) be metric spaces, let K ⊂
X be compact and let f : K → Y be continuous on K. Then f is uni-
formly continuous on K.

Proof. We argue by contradiction. Suppose f is not uniformly contin-


uous. Then there exists an e > 0 such that for every δ > 0, there exist
points x, y ∈ K with 0 < distX ( x, y) < δ yet distY ( f ( x ), f (y)) ≥ e. We
will now use this to construct sequences ( pn ) and (qn ) in K.

We define these as follows. For n ∈ N, we know that there exist points p_n , q_n ∈ K with

0 < distX ( p_n , q_n ) < 1/n

yet distY ( f ( p_n ), f (q_n )) ≥ e.
Because K is compact, there exists a subsequence ( p_{n_k} ) of ( p_n ) converging to some point a ∈ K. Since 0 < distX ( p_n , q_n ) < 1/n, it follows by the triangle inequality that

0 ≤ distX ( a, q_{n_k} ) ≤ distX ( a, p_{n_k} ) + distX ( p_{n_k} , q_{n_k} ) < distX ( a, p_{n_k} ) + 1/n_k

so that by the squeeze theorem and Proposition 5.6.1 we conclude that


(qnk ) converges to a as well. Since f is continuous in a, we know that

lim_{k→∞} distY ( f ( p_{n_k} ), f ( a)) = lim_{k→∞} distY ( f (q_{n_k} ), f ( a)) = 0.

As a consequence, by the triangle inequality, there exists a k such that

distY ( f ( p_{n_k} ), f (q_{n_k} )) < e,

which contradicts the fact that distY ( f ( p_n ), f (q_n )) ≥ e for all n.

13.11 Exercises

13.11.1 Blue exercises

Exercise 13.11.1. Let ( X, distX ) be (R2 , distk·k2 ) (i.e. the metric space associ-
ated to the normed vector space (R2 , k · k2 )) and let (Y, distY ) be (R, distR ).
Let D = B(0, 1) ⊂ R2 . Let f : D → R be defined as
f ( x ) := ( x1 )² + ( x2 )² if x ≠ (0, 0), and f ( x ) := 185 if x = (0, 0).

Show that

lim_{x→(0,0)} f ( x ) = lim_{x→(0,0)} ( ( x1 )² + ( x2 )² ) = 0.

Exercise 13.11.2. Consider the function f : D → R defined by

f (x) = x for x ∈ R

where D = R. Prove that for every a ∈ D, the function f is continuous in


a (when viewed as a function from (R, distR ) to (R, distR )).

Exercise 13.11.3. Let ( X, distX ) := (R, distR ) and set D := N ⊂ R. Let


(Y, distY ) be a metric space and let a : N → Y be a function. Show that
a : N → Y is continuous (when viewed as a function defined on D :=
N as a subset of the metric space ( X, distX ) mapping to the metric space
(Y, distY )).

13.11.2 Orange exercises

Exercise 13.11.4. Let ( X, distX ) and (Y, distY ) be two metric spaces, and let
D ⊂ X be a subset of X. Let f : D → Y be a bounded function (i.e. f ( D )
is a bounded subset of (Y, distY )) and let a ∈ D ∩ D′. Define ω : (0, ∞) →
[0, ∞) by

ω (r ) := sup{distY ( f ( x ), f ( a)) | x ∈ D and distX ( x, a) < r }.

Suppose
inf{ω (r ) | r ∈ (0, ∞)} = 0.
Show that f : D → Y is continuous in a ∈ D ∩ D′.

Exercise 13.11.5. Let ( X, distX ) and (Y, distY ) be metric spaces, let D ⊂ X
and let f : D → Y. Assume that f : D → Y is Lipschitz continuous, that
means that there exists a constant M > 0 such that for all x, z ∈ D,

distY ( f ( x ), f (z)) ≤ M distX ( x, z).

Show that f : D → Y is uniformly continuous on D.


Chapter 14

Real-valued functions

14.1 More limit laws


If we need to show that a limit exists, or if we need to compute its value, we
usually try to avoid going back to the formal definition of a limit. Instead,
just as we did when we were working with sequences, we prefer to rely
on limit laws.
The following theorem presents some limit laws for real-valued functions.

Theorem 14.1.1 (Limit laws for real-valued functions). Let ( X, dist) be


a metric space, let D be a subset of X and assume that a ∈ D′. Let
f : D → R and g : D → R be two real-valued functions and assume
that limx→ a f ( x ) exists and equals M ∈ R and limx→ a g( x ) exists and
equals L ∈ R. Then

i. For every m ∈ N, the limit lim_{x→a} ( f ( x ))^m exists and equals M^m.

ii. The limit lim_{x→a} ( f ( x ) g( x )) exists and equals ML.

iii. If L ≠ 0, the limit lim_{x→a} f ( x )/g( x ) exists and equals M/L.


iv. If for all x ∈ D, f ( x ) ≥ 0, then for every k ∈ N \ {0},

lim_{x→a} ( f ( x ) )^{1/k} = M^{1/k}.

The proof of this theorem follows from the sequence characterization of


limits (Theorem 13.4.1), and from limit laws for sequences (Theorem 6.3.1).
The proof of part (ii) will be the aim of Exercise 14.12.1.

14.2 Building new continuous functions


The following theorem translates the limit laws of Section 14.1 into state-
ments about continuity.

Theorem 14.2.1. Let ( X, dist) be a metric space, let D be a subset of X


and assume a ∈ D. Let f : D → R and g : D → R be two real-valued
functions that are continuous in a ∈ D.

i. For every m ∈ N, the function f^m is continuous in a.

ii. The function f + g is continuous in a.

iii. The function f · g is continuous in a.

iv. If g( a) ≠ 0, the function f /g is continuous in a.

v. If for all x ∈ D, f ( x ) ≥ 0, then for every k ∈ N \ {0}, the function f^{1/k} is continuous in a.

14.3 Continuity of standard functions

Proposition 14.3.1. Every (possibly multivariate) polynomial is con-


tinuous as a function from (Rd , k · k2 ) to (R, | · |).

Proposition 14.3.2. Every (possibly multivariate) rational function is


continuous on its domain of definition (viewed as a function from a
domain in (Rd , k · k2 ) to (R, | · |)).

In some sense, we’re not ready for the next proposition: not only do we not yet have the tools to prove it; worse, we are not even ready to define the functions involved. Most likely, however, you have seen these functions
in high school or in Calculus. I think it’s good to mention the proposition
now anyway, as it plays a central role when you want to show that some
more complicated functions are continuous.

Proposition 14.3.3 (Continuity of some standard functions). The func-


tions

exp : R → R ln : (0, ∞) → R
sin : R → R arcsin : [−1, 1] → R
cos : R → R arccos : [−1, 1] → R
tan : (−π/2, π/2) → R arctan : R → R

are all continuous.

14.4 Limits from the left and from the right

Definition 14.4.1 (Limit from the left). Let (Y, distY ) be a metric space,
and let D ⊂ R be a subset of R. Let f : D → Y be a function. Let a ∈ R
be such that a ∈ ((−∞, a) ∩ D )0 , i.e. such that a is an accumulation
point of the set (−∞, a) ∩ D in the metric space (R, distR ). Let q ∈ Y.
We say that f ( x ) converges to q as x approaches a from the left (or from
below), and write
 
lim_{x↑a} f ( x ) = q  (or sometimes lim_{x→a−} f ( x ) = q)

if
for all e > 0,
there exists δ > 0,
for all x ∈ D ∩ (−∞, a),
0 < distR ( x, a) < δ =⇒ distY ( f ( x ), q) < e.

Definition 14.4.2 (Limit from the right). Let (Y, distY ) be a metric space,
and let D ⊂ R be a subset of R. Let f : D → Y be a function. Let a ∈ R
be such that a ∈ (( a, ∞) ∩ D )0 , i.e. such that a is an accumulation point
of the set ( a, ∞) ∩ D in the metric space (R, distR ). Let q ∈ Y. We say
that f ( x ) converges to q as x approaches a from the right (or from above),
and write
 
lim_{x↓a} f ( x ) = q  (or sometimes lim_{x→a+} f ( x ) = q)

if
for all e > 0,
there exists δ > 0,
for all x ∈ D ∩ ( a, ∞),
0 < distR ( x, a) < δ =⇒ distY ( f ( x ), q) < e.

14.5 The extended real line


In the next few sections, we will discuss several limits involving infin-
ity. There are many combinations of such limits possible, all of them with
their own types of limit theorems and alternative characterizations, and
because of this it tends to get a bit out of hand. One remedy is to organize
the information, definitions and arguments by introducing the extended
real line. With that tool, we can use previous limit laws and sequence char-
acterizations for the analysis of limits involving infinity.
We first discuss the extended real line as a set.

Definition 14.5.1 (The extended real line). The extended real line Rext
is the union of the set R and two symbols, ”∞” and ”−∞”. That is
Rext = R ∪ {∞} ∪ {−∞}.

We now want to turn the extended real line into a metric space. For that,
we need to define a distance on the extended real line. We do this by first
defining the map ι : Rext → [−1, 1] by

ι( x ) := −1 if x = −∞,
ι( x ) := x/(1 + x) if x ∈ R and x ≥ 0,
ι( x ) := x/(1 − x) if x ∈ R and x < 0,
ι( x ) := 1 if x = ∞.

Because this function is injective, we can now use Exercise 2.7.1 to build a
distance on Rext .

Definition 14.5.2 (Distance on extended real line). Given the defini-


tion of the injective function ι : Rext → [−1, 1] above, we define the
distance on Rext by

distRext ( x, y) := distR (ι( x ), ι(y)), for x, y ∈ Rext ,

where distR denotes the standard Euclidean distance on R.
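The map ι and the induced distance can be modelled directly. In the sketch below (our own; we use Python's math.inf as a stand-in for the symbols ∞ and −∞), the real numbers n indeed approach the point ∞ in the distance distRext, even though they diverge in (R, distR).

```python
# A sketch of the map iota from Definition 14.5.2 and the induced
# distance on R_ext, using math.inf as a stand-in for the symbols
# "infinity" and "-infinity". This is our own model, not part of
# the lecture notes themselves.
import math

def iota(x):
    if x == -math.inf:
        return -1.0
    if x == math.inf:
        return 1.0
    return x / (1 + x) if x >= 0 else x / (1 - x)

def dist_ext(x, y):
    # dist_Rext(x, y) := dist_R(iota(x), iota(y))
    return abs(iota(x) - iota(y))

# the real numbers n get closer and closer to infinity in this distance:
assert dist_ext(10 ** 6, math.inf) < 1e-5
# iota is order-preserving, hence injective, on the sample points below:
samples = [-math.inf, -2.0, 0.0, 3.0, math.inf]
values = [iota(x) for x in samples]
assert values == sorted(values) and len(set(values)) == len(values)
```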

14.6 Limits to ∞ or −∞
Definition 14.6.1 (divergence to ∞). Let ( X, distX ) be a metric space,
let D be a subset of X and assume a ∈ D′. Let f : D → R. We say that

f diverges to ∞ in a if

for all M ∈ R,
there exists δ > 0,
for all x ∈ D,
0 < distX ( x, a) < δ =⇒ f ( x ) > M.

Definition 14.6.2 (divergence to −∞). Let ( X, distX ) be a metric space,


let D be a subset of X and assume a ∈ D′. Let f : D → R. We say that
f diverges to −∞ in a if

for all M ∈ R,
there exists δ > 0,
for all x ∈ D,
0 < distX ( x, a) < δ =⇒ f ( x ) < M.

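As a concrete instance of Definition 14.6.1, the function f(x) = 1/x² on D = R \ {0} diverges to ∞ in a = 0, and a witness δ for each M can be written down explicitly. A small sketch (the helper name delta_for is ours; taking the maximum with 1 handles M ≤ 0):

```python
import math

def delta_for(M: float) -> float:
    """A witness delta for the statement that f(x) = 1/x**2 diverges
    to infinity at a = 0: whenever 0 < |x| < delta, f(x) > M."""
    return 1.0 / math.sqrt(max(M, 1.0))
```

Indeed, if 0 < |x| < δ then x² < 1/max(M, 1), hence 1/x² > max(M, 1) ≥ M.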
Using the extended real line, we can give an alternative characterization of divergence to ∞. This alternative characterization brings us back to the 'usual' limits of functions between metric spaces as introduced in Definition 13.2.1.

Proposition 14.6.3 (Alternative characterization of divergence to ∞). Let (X, dist_X) be a metric space, let D be a subset of X and assume a ∈ D′. Let f : D → R. Then f diverges to ∞ in a (as described by Definition 14.6.1) if and only if f converges in a to the element ∞ ∈ R_ext when viewed as a function mapping from D as a subset of (X, dist_X) to the extended real line (R_ext, dist_Rext).
14.7 Limits at ∞ and −∞

Definition 14.7.1. Let (Y, dist_Y) be a metric space and let D be a subset of R that is unbounded from above. Let q ∈ Y. Let f : D → Y be a function. We say that f(x) converges to q as x → ∞, and write

lim_{x→∞} f(x) = q,

if

for all ε > 0,
there exists z ∈ R,
for all x ∈ D,
x > z =⇒ dist_Y(f(x), q) < ε.

Definition 14.7.2. Let (Y, dist_Y) be a metric space and let D be a subset of R that is unbounded from below. Let q ∈ Y. Let f : D → Y be a function. We say that f(x) converges to q as x → −∞, and write

lim_{x→−∞} f(x) = q,

if

for all ε > 0,
there exists z ∈ R,
for all x ∈ D,
x < z =⇒ dist_Y(f(x), q) < ε.
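For example, for f(x) = 1/x on D = (0, ∞) we have lim_{x→∞} f(x) = 0, and the threshold z in Definition 14.7.1 can be given explicitly: z = 1/ε works. A sketch (the helper name threshold_for is our own):

```python
def threshold_for(eps: float) -> float:
    """A witness z for lim_{x -> infinity} 1/x = 0:
    whenever x > z, |1/x - 0| < eps."""
    return 1.0 / eps
```

Indeed, x > 1/ε and x > 0 together give 0 < 1/x < ε.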
We can also combine divergence to and at infinity.

Definition 14.7.3. Let D ⊂ R be a subset of R that is unbounded from above. Let f : D → R be a function. We say that f(x) diverges to ∞ as x approaches ∞, and write

lim_{x→∞} f(x) = ∞,

if

for all M ∈ R,
there exists z ∈ R,
for all x ∈ D,
x > z =⇒ f(x) > M.

Definition 14.7.4. Let D ⊂ R be a subset of R that is unbounded from above. Let f : D → R be a function. We say that f(x) diverges to −∞ as x approaches ∞, and write

lim_{x→∞} f(x) = −∞,

if

for all M ∈ R,
there exists z ∈ R,
for all x ∈ D,
x > z =⇒ f(x) < M.
Overview of limit statements

To help get an overview of all the possible formal limit statements, we have created an overview in Tables 14.1 and 14.2. The first column in each table indicates the general form of the limit statement, whereas the second column indicates which lines in the formal limit definition correspond to this form. The last column indicates whether the domain or target actually needs to be the real line for the limit statement to make sense.

In total, the tables give rise to 15 possible limit statements (five patterns for the domain combined with three patterns for the target). The following example gives an indication of how to use them.
Limit statement of form:     Lines in formal definition:       Needs X = R?

lim_{x→a} f(x) = …           …,                                No
                             there exists δ > 0,
                             for all x ∈ D,
                             if 0 < dist_X(x, a) < δ,
                             …

lim_{x↑a} f(x) = …           …,                                Yes
                             there exists δ > 0,
                             for all x ∈ D ∩ (−∞, a),
                             if 0 < dist_X(x, a) < δ,
                             …

lim_{x↓a} f(x) = …           …,                                Yes
                             there exists δ > 0,
                             for all x ∈ D ∩ (a, ∞),
                             if 0 < dist_X(x, a) < δ,
                             …

lim_{x→−∞} f(x) = …          …,                                Yes
                             there exists z ∈ R,
                             for all x ∈ D,
                             if x < z,
                             …

lim_{x→∞} f(x) = …           …,                                Yes
                             there exists z ∈ R,
                             for all x ∈ D,
                             if x > z,
                             …

Table 14.1: Possible patterns for limit statements regarding the domain, and the lines that correspond to them in the formal definition.
Limit statement of form:     Lines in formal definition:       Needs Y = R?

lim f(x) = q                 for all ε > 0,                    No
                             …,
                             …,
                             …,
                             dist_Y(f(x), q) < ε

lim f(x) = −∞                for all M ∈ R,                    Yes
                             …,
                             …,
                             …,
                             f(x) < M

lim f(x) = ∞                 for all M ∈ R,                    Yes
                             …,
                             …,
                             …,
                             f(x) > M

Table 14.2: Possible patterns for limit statements regarding the target space, and the lines that correspond to them in the formal definition.
Example 14.7.5. If we are interested in the formal limit definition for

lim_{x↑a} f(x) = ∞,

we can look in Table 14.1 to find that the pattern

lim_{x↑a} f(x) = …

corresponds to the lines

…,
there exists δ > 0,
for all x ∈ D ∩ (−∞, a),
if 0 < dist_X(x, a) < δ,
…

and the pattern

lim f(x) = ∞

corresponds to the lines

for all M ∈ R,
…,
…,
…,
f(x) > M.

Combining these, we get as the formal definition:

for all M ∈ R,
there exists δ > 0,
for all x ∈ D ∩ (−∞, a),
if 0 < dist_X(x, a) < δ,
f(x) > M.
14.8 The Intermediate Value Theorem

Theorem 14.8.1 (Intermediate Value Theorem). Let f : [a, b] → R be continuous, and let c ∈ R be a value between f(a) and f(b). Then there exists an x ∈ [a, b] such that f(x) = c.

Proof. Let f : [a, b] → R be continuous, and let c ∈ R be a value between f(a) and f(b). We need to find an x ∈ [a, b] such that f(x) = c. Without loss of generality, we may assume that f(a) < c: if f(a) = c we are already done, and if f(a) > c we can instead consider g = −f, for which g(a) < −c. Now define

x := sup{y ∈ [a, b] | f(y) < c}.

We will show that f(x) = c. First, we note that by the properties of the supremum, there exists a sequence (x_n) in [a, x] such that x_n → x and f(x_n) < c for every n ∈ N. By continuity of f, it holds that

f(x) = lim_{n→∞} f(x_n) ≤ c.

Similarly, there is a sequence (y_n) in [x, b] converging to x such that f(y_n) ≥ c for every n ∈ N. By continuity of f, it holds that

f(x) = lim_{n→∞} f(y_n) ≥ c.

In conclusion, f(x) = c.
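Although the proof above is non-constructive in flavor, the theorem underpins a very practical algorithm: bisection. The sketch below treats the special case c = 0 with f(a) < 0 < f(b) (the tolerance parameter is our arbitrary choice); at every step the interval keeps a sign change, so the point it closes in on must be a zero of f, which the theorem guarantees exists:

```python
def bisect(f, a: float, b: float, tol: float = 1e-12) -> float:
    """Approximate a point x in [a, b] with f(x) = 0, assuming f is
    continuous and f(a) < 0 < f(b)."""
    assert f(a) < 0 < f(b)
    while b - a > tol:
        m = (a + b) / 2
        # Keep the half of the interval on which f still changes sign.
        if f(m) < 0:
            a = m
        else:
            b = m
    return (a + b) / 2
```

For instance, applying it to f(x) = x³ − 2 on [0, 2] approximates the cube root of 2.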
14.9 The Extreme Value Theorem

The Extreme Value Theorem states that a continuous, real-valued function defined on a non-empty, compact domain K always attains both a maximum and a minimum on K. This is quite special, because the concepts of maxima and minima come with large warning signs: maxima and minima may not always exist, and one usually needs to be very careful about this fact. And indeed, there are discontinuous functions defined on compact domains that attain neither a maximum nor a minimum. And there are continuous functions defined on non-compact domains that attain neither a maximum nor a minimum. However, continuous functions that are defined on a compact set always attain a maximum and a minimum.
Theorem 14.9.1 (The Extreme Value Theorem). Let (X, dist) be a metric space, let K ⊂ X be a nonempty, compact subset and let f : K → R be continuous. Then f attains a maximum and a minimum on K.

Proof. Since f is continuous and K is compact, the image f(K) is compact as well. Therefore, the set f(K) is bounded, and M := sup f(K) is well-defined. We can construct a sequence (x_n) in K such that

lim_{n→∞} f(x_n) = M.

Because K is compact, the sequence (x_n) has a subsequence (x_{n_k}) converging to some y ∈ K. Because f is continuous,

lim_{k→∞} f(x_{n_k}) = f(y).

On the other hand, the sequence (f(x_{n_k})) is a subsequence of the sequence (f(x_n)), so that

M = lim_{n→∞} f(x_n) = lim_{k→∞} f(x_{n_k}) = f(y).

In conclusion, f attains its maximum in y.

The fact that f attains a minimum on K now follows from the fact that the (continuous) function −f : K → R attains a maximum on K.
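Numerically, the Extreme Value Theorem is what justifies approximating the maximum of a continuous function on a compact interval by brute-force grid search: the quantity being approximated actually exists. A sketch (the grid resolution n is an arbitrary choice, and the grid value only approximates the true maximum):

```python
def grid_max(fun, a: float, b: float, n: int = 100001) -> float:
    """Approximate the maximum of a continuous fun on the compact
    interval [a, b] by evaluating it on a fine grid of n points."""
    return max(fun(a + (b - a) * k / (n - 1)) for k in range(n))
```

For instance, for f(x) = x(1 − x) on [0, 1] this recovers the maximum value 1/4, attained at x = 1/2.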
14.10 Equivalence of norms

Using the Extreme Value Theorem, we can finally prove a beautiful statement that implies that for many purposes, on finite-dimensional vector spaces, the choice of norm is not so important. Precisely, we will show that any two norms are equivalent, a concept specified precisely in the following definition.

Definition 14.10.1 (Equivalent norms). Let V be a vector space and let ‖·‖_A and ‖·‖_B be two different norms on V. We say that the norms ‖·‖_A and ‖·‖_B are equivalent if there exist constants c_1 > 0 and c_2 > 0 such that for all x ∈ V,

c_1 ‖x‖_A ≤ ‖x‖_B ≤ c_2 ‖x‖_A.
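A concrete pair to keep in mind: on R^d, the norms ‖·‖_1 and ‖·‖_2 are equivalent with explicit constants c_1 = 1 and c_2 = √d, since ‖x‖_2 ≤ ‖x‖_1 ≤ √d ‖x‖_2 (the second inequality is Cauchy–Schwarz applied to (|x_1|, …, |x_d|) and (1, …, 1)). A quick numerical check of both inequalities (the function names are our own):

```python
import math

def norm1(x):
    """The 1-norm on R^d: the sum of the absolute values of the entries."""
    return sum(abs(t) for t in x)

def norm2(x):
    """The Euclidean norm on R^d."""
    return math.sqrt(sum(t * t for t in x))
```

Sampling a few vectors in R^3 and checking norm2(v) ≤ norm1(v) ≤ √3 · norm2(v) confirms the claimed constants on those samples.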
We will now show that any two norms on a finite-dimensional vector space are equivalent.
Theorem 14.10.2 (Equivalence of norms on finite-dimensional vector spaces). Let V be a finite-dimensional vector space and let ‖·‖_A and ‖·‖_B be two norms on V. Then the norms ‖·‖_A and ‖·‖_B are equivalent.

Proof. Because V is assumed to be finite-dimensional, there exists a basis v_1, …, v_d of V, where d ∈ N is the dimension of V. Define the linear map L : R^d → V by

L(x) := x_1 v_1 + ⋯ + x_d v_d.

We claim that L is continuous. To show this, we will use Exercise 13.11.5 and show that L is even Lipschitz continuous, therefore uniformly continuous and therefore (by Proposition 13.10.2) continuous. Indeed, let x, y ∈ R^d. Then

‖L(y − x)‖_A = ‖(y_1 − x_1)v_1 + ⋯ + (y_d − x_d)v_d‖_A
             ≤ Σ_{i=1}^{d} |y_i − x_i| ‖v_i‖_A
             ≤ (max_{i=1,…,d} ‖v_i‖_A) Σ_{i=1}^{d} 1 · |y_i − x_i|
             ≤ (max_{i=1,…,d} ‖v_i‖_A) ‖(1, …, 1)‖_2 ‖y − x‖_2
             = (max_{i=1,…,d} ‖v_i‖_A) √d ‖y − x‖_2,

where in the last inequality we used the Cauchy–Schwarz inequality. We have shown that L is Lipschitz continuous.
Note that the unit sphere

S = {(x_1, …, x_d)^T ∈ R^d | x_1² + ⋯ + x_d² = 1}

in R^d is compact by the Heine–Borel theorem, because it is a bounded and closed subset of R^d.

The function f : R^d → R given by

f(x) = (‖·‖_A ∘ L)(x) = ‖L(x)‖_A

is continuous, as it is a composition of two continuous functions. By the Extreme Value Theorem, the function f attains a maximum and a minimum on S.

We claim that min_{x∈S} f(x) is strictly positive. It is clear that it is larger than or equal to zero. Suppose it is equal to zero. Then there exists an x ∈ S such that

‖x_1 v_1 + ⋯ + x_d v_d‖_A = 0.

By the properties of the norm, it follows that

x_1 v_1 + ⋯ + x_d v_d = 0.

But since v_1, …, v_d is a basis, it follows that

x_1 = ⋯ = x_d = 0.

This is a contradiction, because x was supposed to be a point on the unit sphere S. We conclude that for all s ∈ S,

0 < min_{x∈S} f(x) ≤ ‖L(s)‖_A ≤ max_{x∈S} f(x).
Now let y ∈ R^d \ {0}. Then the point y/‖y‖_2 is in S. Therefore

min_{x∈S} f(x) ≤ ‖L(y/‖y‖_2)‖_A ≤ max_{x∈S} f(x).

By multiplying these inequalities by ‖y‖_2 and using the homogeneity of the norm, we find that

(min_{x∈S} f(x)) ‖y‖_2 ≤ ‖L(y)‖_A ≤ (max_{x∈S} f(x)) ‖y‖_2.

Because v_1, …, v_d is a basis, the map L is bijective. It follows that there exist constants c_3 > 0 and c_4 > 0 such that for all v ∈ V,

c_3 ‖L^{−1}(v)‖_2 ≤ ‖v‖_A ≤ c_4 ‖L^{−1}(v)‖_2.

Similarly, there exist constants c_5, c_6 > 0 such that for all v ∈ V,

c_5 ‖L^{−1}(v)‖_2 ≤ ‖v‖_B ≤ c_6 ‖L^{−1}(v)‖_2.

Combining these two chains of inequalities, we conclude that there exist constants c_1, c_2 > 0 such that for all v ∈ V,

c_1 ‖v‖_A ≤ ‖v‖_B ≤ c_2 ‖v‖_A;

for instance, c_1 = c_5/c_4 and c_2 = c_6/c_3 work.
The equivalence of norms on finite-dimensional vector spaces has many important consequences. Let us mention a few. The first is that every finite-dimensional normed vector space is complete.

Theorem 14.10.3. Let (V, ‖·‖) be a finite-dimensional normed vector space. Then (V, ‖·‖) is complete.

From this theorem it also follows (by the series characterization of completeness in Theorem 11.5.1) that every absolutely converging series in a finite-dimensional normed vector space is converging: a statement that we announced as Proposition 9.2.3.

Another consequence of the equivalence of norms on finite-dimensional vector spaces is that in a finite-dimensional normed vector space, a subset is compact if and only if it is closed and bounded.

Theorem 14.10.4 (Heine–Borel Theorem for finite-dimensional normed vector spaces). Let (V, ‖·‖) be a finite-dimensional normed vector space. Then a subset A ⊂ V is compact if and only if it is closed and bounded.
14.11 Bounded linear maps and operator norms

We close this chapter with a section about linear maps from one normed vector space to another. We will show that linear maps are continuous if and only if they are bounded. Moreover, we will see that a linear map defined on a finite-dimensional vector space is always bounded, and therefore always continuous.

Let us first give the definition of a linear map.

Definition 14.11.1 (Linear map). Let V and W be two vector spaces. A function L : V → W is called a linear map if both

i. for all a, b ∈ V,

L(a + b) = L(a) + L(b),

ii. for all λ ∈ R and a ∈ V,

L(λa) = λL(a).

Now we will define what it means for a linear map to be bounded.

Definition 14.11.2 (Bounded linear map). Let (V, ‖·‖_V) and (W, ‖·‖_W) be two normed vector spaces. We say that a linear map L : V → W is bounded if the image under L of the closed unit ball

B̄_V(0, 1) = {v ∈ V | ‖v‖_V ≤ 1}

is a bounded subset of (W, ‖·‖_W), i.e. if

L(B̄_V(0, 1))

is a bounded subset of (W, ‖·‖_W).
The following is an alternative characterization of boundedness of linear maps. It is usually a bit easier to work with.

Proposition 14.11.3. Let (V, ‖·‖_V) and (W, ‖·‖_W) be two normed vector spaces. A linear map L : V → W is bounded if and only if there exists an M > 0 such that for all v ∈ V,

‖L(v)‖_W ≤ M ‖v‖_V.
Proposition 14.11.4. The space of bounded linear maps from one normed vector space to another is itself again a vector space, which we denote by BLin(V, W). Addition and scalar multiplication are defined pointwise; that means that if L : V → W and K : V → W are two linear maps and λ ∈ R is a scalar, then the linear map L + K : V → W is defined by

(L + K)(v) = L(v) + K(v)

and the map λL : V → W is defined by

(λL)(v) = λ(L(v)).

The zero element in this vector space BLin(V, W) is the map that maps every vector to the zero element of W.
We would like to be able to talk about the norm of such a linear map. We now introduce one such norm, called the operator norm.

Proposition 14.11.5. Let (V, ‖·‖_V) and (W, ‖·‖_W) be two normed vector spaces. Consider the vector space BLin(V, W) of bounded linear maps L : V → W. Then the function ‖·‖_{V→W} : BLin(V, W) → R defined by

‖L‖_{V→W} := sup_{x ∈ B̄_V(0,1)} ‖L(x)‖_W

is a norm on BLin(V, W).

The proof of Proposition 14.11.5 is the topic of Exercise 14.12.8.

Definition 14.11.6. The norm ‖·‖_{V→W} on the vector space BLin(V, W) is called the operator norm.
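The supremum in the definition of the operator norm can be approximated numerically by sampling points on the unit sphere. The sketch below does this for a linear map on (R², ‖·‖_2) given by a 2 × 2 matrix; it is a crude estimate, not an exact computation, and the function name and sample count are our own choices:

```python
import math

def op_norm_estimate(a: float, b: float, c: float, d: float,
                     samples: int = 10000) -> float:
    """Estimate the operator norm of the linear map given by the matrix
    [[a, b], [c, d]] from (R^2, ||.||_2) to itself, by maximizing
    ||L(x)||_2 over sample points x on the unit circle."""
    best = 0.0
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        x1, x2 = math.cos(t), math.sin(t)
        best = max(best, math.hypot(a * x1 + b * x2, c * x1 + d * x2))
    return best
```

For the diagonal map (x1, x2) ↦ (3x1, −2x2) the operator norm is 3 (attained at x = (1, 0)), and the estimate recovers this.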
Proposition 14.11.7. Let (V, ‖·‖_V) and (W, ‖·‖_W) be two normed vector spaces. Let L : V → W be a bounded linear map. Then for all v ∈ V,

‖L(v)‖_W ≤ ‖L‖_{V→W} ‖v‖_V,

and in fact

‖L‖_{V→W} = min{C ≥ 0 | for all v ∈ V, ‖L(v)‖_W ≤ C ‖v‖_V}.   (14.11.1)

Proof. Let v ∈ V. If v = 0, then indeed

‖L(v)‖_W = ‖0‖_W = 0 ≤ ‖L‖_{V→W} ‖0‖_V = ‖L‖_{V→W} ‖v‖_V.

If v ≠ 0, then define x := v/‖v‖_V. Then x ∈ B̄_V(0, 1) because

‖x‖_V = ‖v/‖v‖_V‖_V = (1/‖v‖_V) ‖v‖_V = 1.

It follows that

‖L(x)‖_W ≤ sup_{z ∈ B̄_V(0,1)} ‖L(z)‖_W = ‖L‖_{V→W}.

Therefore

‖L(v)‖_W = ‖L(‖v‖_V x)‖_W = ‖v‖_V ‖L(x)‖_W ≤ ‖L‖_{V→W} ‖v‖_V.

To show (14.11.1), let C ≥ 0 be a constant such that for all v ∈ V, ‖L(v)‖_W ≤ C ‖v‖_V. Then for all x ∈ B̄_V(0, 1), since ‖x‖_V ≤ 1, it holds that ‖L(x)‖_W ≤ C ‖x‖_V ≤ C. Therefore C is an upper bound for the set

{‖L(x)‖_W | x ∈ B̄_V(0, 1)},

and since ‖L‖_{V→W} is the smallest upper bound of this set, we conclude that ‖L‖_{V→W} ≤ C. Since by the first part ‖L‖_{V→W} is itself such a constant, the minimum in (14.11.1) is attained.
Theorem 14.11.8. Let (V, ‖·‖_V) and (W, ‖·‖_W) be two normed vector spaces and assume that V is finite-dimensional. Let L : V → W be a linear map. Then L is bounded.

Proof. Since V is finite-dimensional, we may select a basis v_1, …, v_d of V, where d is the dimension of V. Define the map ι : R^d → V by

ι(x) = x_1 v_1 + ⋯ + x_d v_d.

The map ι is a bijective linear map, and therefore its inverse ι^{−1} is a bijective linear map as well: it is the map that assigns to a vector v its components x_1, …, x_d with respect to the basis v_1, …, v_d. Because ι^{−1} is injective, the function

‖·‖_2 ∘ ι^{−1}

is a norm on V. By the equivalence of norms, there exists a constant C > 0 such that for every v ∈ V,

‖ι^{−1}(v)‖_2 ≤ C ‖v‖_V.

Let v ∈ B̄(0, 1) and define x := ι^{−1}(v). Then

‖L(v)‖_W = ‖L(x_1 v_1 + ⋯ + x_d v_d)‖_W
          ≤ |x_1| ‖L(v_1)‖_W + ⋯ + |x_d| ‖L(v_d)‖_W
          ≤ (Σ_{i=1}^{d} |x_i|) max_{j=1,…,d} ‖L(v_j)‖_W
          ≤ (max_{j=1,…,d} ‖L(v_j)‖_W) √d ‖x‖_2
          ≤ (max_{j=1,…,d} ‖L(v_j)‖_W) √d C ‖v‖_V
          ≤ (max_{j=1,…,d} ‖L(v_j)‖_W) √d C,

where we used the Cauchy–Schwarz inequality and, in the last step, ‖v‖_V ≤ 1. Hence the image L(B̄(0, 1)) is bounded, i.e. L is bounded.
Theorem 14.11.9. Let (V, ‖·‖_V) and (W, ‖·‖_W) be two normed vector spaces. Let L : V → W be a linear map. The function L is continuous if and only if it is bounded.

Proof. We first show the "only if" direction. Assume therefore that L : V → W is continuous. Then L is in particular continuous in 0 ∈ V. Therefore, there exists a δ > 0 such that for all v ∈ V, if 0 < ‖v‖_V < δ, then ‖Lv‖_W < 1. Choose such a δ > 0, and choose M := 2/δ. Let v ∈ B̄(0, 1). If v = 0, then ‖Lv‖_W = 0 < M. Suppose now v ≠ 0. We also know that ‖v‖_V < 2, so that 0 < ‖δv/2‖_V < δ. It follows that

‖Lv‖_W = (2/δ) ‖L(δv/2)‖_W < (2/δ) · 1 = M.

Hence the image of the closed unit ball under L is bounded, i.e. L is bounded.

We now show the "if" direction. Because L is bounded, there exists an M > 0 such that for all v ∈ V,

‖L(v)‖_W ≤ M ‖v‖_V.

Now let v ∈ V and let u : N → V be a sequence in V converging to v. Then

0 ≤ ‖L(u_n) − L(v)‖_W = ‖L(u_n − v)‖_W ≤ M ‖u_n − v‖_V.

Since the sequence u : N → V converges to v, it follows that

lim_{n→∞} ‖u_n − v‖_V = 0,

and by a limit law it holds that also

lim_{n→∞} M ‖u_n − v‖_V = 0.

Since we also know that lim_{n→∞} 0 = 0, it follows by the squeeze theorem that

lim_{n→∞} ‖L(u_n) − L(v)‖_W = 0.

Therefore, the sequence (Lu_n) converges to Lv. We have shown that L is continuous in v.
14.12 Exercises

14.12.1 Blue exercises

Exercise 14.12.1. Prove Part (ii) of Theorem 14.1.1.

Exercise 14.12.2. Consider the function f : R² \ {0} → R defined by

f(x) = exp(x_1² − 3x_2) / (x_1² + x_2²).

Prove that f : R² \ {0} → R is continuous, considered as a function mapping from the domain R² \ {0} in the normed vector space (R², ‖·‖_2) to (R, |·|). Hint: Argue very precisely, using the results in this chapter, but avoid going back to the definition.

Exercise 14.12.3. Consider the function f : R → R defined by f(x) = x². Prove that

lim_{x→∞} f(x) = ∞.
14.12.2 Orange exercises

Exercise 14.12.4. Show that the function f : R² → R defined by

f(x) = (x_1⁴ + 2x_2⁴)/(x_1² + x_2²)   if (x_1, x_2) ≠ (0, 0),
f(x) = 0                              if (x_1, x_2) = (0, 0),

is continuous as a function from the normed vector space (R², ‖·‖_2) to the normed vector space (R, |·|).

Exercise 14.12.5. Let f : (0, ∞) → R be a continuous function (viewed as a function from the domain (0, ∞) in the normed vector space (R, |·|)) such that

lim_{x↓0} f(x) = 1

and

lim_{x→∞} f(x) = −∞.

Show that there exists a c ∈ (0, ∞) such that f(c) = 0.

Exercise 14.12.6. Let f : R → R be a function and let a ∈ R. Let L ∈ R. Show that

lim_{x→a} f(x) = L

if and only if

lim_{x↓a} f(x) = lim_{x↑a} f(x) = L.

Exercise 14.12.7. Let (V, ‖·‖_A) be a finite-dimensional normed vector space, let D ⊂ V be a subset of V and consider a function f : D → R. Let a ∈ D and suppose f is continuous in a in the normed vector space (V, ‖·‖_A). Let ‖·‖_B : V → R be another norm on V. Show that f is also continuous in a in the normed vector space (V, ‖·‖_B).

Exercise 14.12.8. Prove Proposition 14.11.5.
Exercise 14.12.9. Let f : (−∞, 3) → R be a continuous function (viewed as a function from the domain (−∞, 3) in the normed vector space (R, |·|)). Assume that

lim_{x→−∞} f(x) = ∞

and

lim_{x↑3} f(x) = ∞.

Show that f attains a minimum on the interval (−∞, 3). I.e., show that there exists a c ∈ (−∞, 3) such that for all x ∈ (−∞, 3),

f(c) ≤ f(x).

Exercise 14.12.10. Let f : R² → R be given by

f(x) := (x_1⁴ − 2x_2²)/(x_1⁴ + x_2⁴)   if (x_1, x_2) ≠ (0, 0),
f(x) := 0                              if (x_1, x_2) = (0, 0).

Either prove that the function f is continuous or prove that it is not continuous (where f is viewed as a function from the domain R² in the normed vector space (R², ‖·‖_2) to the normed vector space (R, |·|)).
Chapter 15

Differentiability
If a function f : Ω → Y is continuous in a point a ∈ Ω, then close to a, the function f is reasonably well approximated by the constant function x ↦ f(a). In some sense, the constant function x ↦ f(a) is a basic, a zeroth, approximation of the function f around a. The good thing about this approximation is that it is very simple. The bad thing is that it is maybe too simple, and therefore the approximation may not be very good. Can we do better?

Differentiability is all about approximating functions by affine functions, i.e. functions that are the sum of a constant and a linear map. The good thing is that affine functions are still rather simple, and the approximation with an affine function will usually be better than the approximation with just a constant function. For all this to make sense, though, we will need to start restricting the context to functions mapping from (a domain in) one normed vector space to another normed vector space.

The approach that I follow in these chapters on differentiability is very close to the approach followed by Rodney Coleman in his book Calculus on normed vector spaces [Col12].
15.1 Definition of differentiability

The following is the definition of differentiability in a point. This definition most likely differs from what you have seen in Calculus. The reason for this deviation is that we really want a concept that works for maps between normed vector spaces. After we give the definition, we will provide a first indication of how the definition relates to the one you are more familiar with.

Definition 15.1.1 (Differentiability in a point). Let (V, ‖·‖_V) and (W, ‖·‖_W) be two (not necessarily finite-dimensional) normed vector spaces. Let Ω ⊂ V be an open subset of V. Let f : Ω → W be a function and let a ∈ Ω. We say that f is differentiable in a if there exists a bounded linear map L_a : V → W such that, if we define the error function Err_a : Ω → W by

Err_a(x) := f(x) − f(a) − L_a(x − a),

it holds that

lim_{x→a} ‖Err_a(x)‖_W / ‖x − a‖_V = 0.

We call L_a the derivative of f in a, and instead of L_a we often write (Df)_a.

Definition 15.1.2 (Differentiability on an open set). Let (V, ‖·‖_V) and (W, ‖·‖_W) be two normed vector spaces. Let Ω ⊂ V be open. We say that a function f : Ω → W is differentiable on Ω if for every a ∈ Ω, the function f is differentiable in a.
The following proposition relates the derivative to the derivative you are used to from Calculus.

Proposition 15.1.3. Let Ω ⊂ R be an open subset of R and consider a function f : Ω → R, interpreted as a function from the subset Ω of the normed vector space (R, |·|) to the normed vector space (R, |·|). Let a ∈ Ω. Then f is differentiable in a if and only if the limit

lim_{x→a} (f(x) − f(a))/(x − a)

exists. Moreover, if this limit exists, we call it f′(a), and then for all h ∈ R,

f′(a) · h = (Df)_a(h).   (15.1.1)

Warning: It is good to take a moment and internalize the difference between the left-hand side and the right-hand side of (15.1.1). The left-hand side is a product of two real numbers, the number f′(a) ∈ R and the number h ∈ R. The right-hand side is a linear map (Df)_a : R → R applied to the real number h ∈ R.

Example 15.1.4. Consider the function f : R → R given by

f(x) = x.

You have probably learned that f′(a) = 1 for every a ∈ R, with the same limit definition of f′(a) as given above, and indeed

lim_{x→a} (f(x) − f(a))/(x − a) = lim_{x→a} (x − a)/(x − a) = 1.

We can then also write down the derivative (Df)_a : R → R, which is a linear map from R to R. To describe (Df)_a, we need to specify what it does to an element h ∈ R, and again by the previous proposition we know that

(Df)_a(h) = f′(a) · h = h.
The previous proposition can be generalized to the case in which the target is an arbitrary normed vector space.

Proposition 15.1.5. Let Ω ⊂ R be open and consider a function f : Ω → W, interpreted as a function from the subset Ω of the normed vector space (R, |·|) to a normed vector space (W, ‖·‖_W). Let a ∈ Ω. Then f is differentiable in a if and only if the limit

lim_{x→a} (f(x) − f(a))/(x − a)   (15.1.2)

exists. Moreover, if this limit exists we denote it by f′(a), and then for all h ∈ R,

f′(a) · h = (Df)_a(h).

The limit in (15.1.2) exists if and only if the limit

lim_{h→0} (f(a + h) − f(a))/h

exists, and then they have the same value.

As an alternative notation, we will sometimes write

(d/dt) f(t) |_{t=a}

instead of f′(a).
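To connect Definition 15.1.1 with this familiar picture, take f(x) = x² at a point a, with candidate derivative (Df)_a(h) = 2a·h. The error function is then Err_a(x) = x² − a² − 2a(x − a) = (x − a)², so ‖Err_a(x)‖/|x − a| = |x − a| → 0 as x → a, exactly as the definition demands. A small numerical sketch of this (the function name err and the sample point are our own choices):

```python
def err(a: float, x: float) -> float:
    """Error of the affine approximation of f(x) = x**2 at a, with
    candidate derivative L_a(h) = 2*a*h; algebraically err = (x - a)**2."""
    return x**2 - a**2 - 2.0 * a * (x - a)
```

Evaluating |err(a, a + h)| / |h| for shrinking h shows the ratio itself shrinking like |h|, which is the hallmark of differentiability.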
15.2 The derivative as a function

Definition 15.2.1 (The derivative as a function). Let f : Ω → W be a function from an open domain Ω in a finite-dimensional normed vector space (V, ‖·‖_V) to a finite-dimensional normed vector space (W, ‖·‖_W). Suppose that f is differentiable on Ω (i.e. suppose that for every a ∈ Ω, the function f is differentiable in a). Then we define the derivative of f as the function

Df : Ω → Lin(V, W)

that maps every a ∈ Ω to the derivative of f in a, i.e. to (Df)_a ∈ Lin(V, W).
15.3 Constant and linear maps are differentiable

Proposition 15.3.1 (Constant maps are differentiable). Let (V, ‖·‖_V) and (W, ‖·‖_W) be two normed vector spaces. Let b ∈ W and consider the constant function f : V → W given by f(v) = b for all v ∈ V. Then f is differentiable and for all a ∈ V, (Df)_a = 0, i.e. it is the (linear) function that maps every element to 0.

We now give a first example of differentiable functions: linear functions are always differentiable.

Proposition 15.3.2 (Linear maps are differentiable). Let A : V → W be a linear map between the finite-dimensional normed vector spaces (V, ‖·‖_V) and (W, ‖·‖_W). Then the function A : V → W is differentiable on V and for every a ∈ V the derivative (DA)_a ∈ Lin(V, W) is just equal to A. Hence, the derivative of A is the constant function DA : V → Lin(V, W) given by

a ↦ A.

The proof of this proposition is the topic of Exercise 15.12.1.
15.4 Bases and coordinates

In this section we will give many examples of linear maps that are at the same time related to the choice of coordinates in the spaces V and W. We first consider standard coordinate projections in R^m.

Definition 15.4.1 (Coordinate projections). Let i ∈ {1, …, m}, and consider the map P_i : R^m → R given by

P_i(x) = x_i.

The map P_i is called the projection to the ith coordinate.

Proposition 15.4.2. The coordinate projections P_i in the above definition are linear.

Proof. Indeed, for all λ ∈ R and x ∈ R^m,

P_i(λx) = λx_i = λP_i(x),

and for all x, y ∈ R^m,

P_i(x + y) = x_i + y_i = P_i(x) + P_i(y).

Since P_i is linear, it follows that the map P_i is differentiable and DP_i : R^m → Lin(R^m, R) is the constant map a ↦ P_i.
We will now discuss coordinates in arbitrary finite-dimensional vector spaces. Recall that w_1, …, w_m forms a basis of a vector space W if and only if for every v ∈ W, there are unique constants x_1, …, x_m such that

v = x_1 w_1 + ⋯ + x_m w_m.

The numbers x_i are called the coordinates of the vector v with respect to the basis w_1, …, w_m. The map that assigns to every element v in W the coordinate vector (x_1, …, x_m) with respect to the basis w_1, …, w_m is in fact a linear map and is called the coordinate map.

Definition 15.4.3 (Coordinate map). Let W be a finite-dimensional vector space and assume that w_1, …, w_m is a basis of W. The map Ψ : W → R^m that assigns to every v ∈ W its coordinates with respect to the basis w_1, …, w_m is called the coordinate map with respect to the basis w_1, …, w_m.

Proposition 15.4.4. The coordinate map Ψ : W → R^m with respect to a basis w_1, …, w_m is linear.

As a consequence, the derivative DΨ : W → Lin(W, R^m) is given by

(DΨ)_a = Ψ

for all a ∈ W.

The component functions Ψ_1, …, Ψ_m of Ψ are together sometimes called the dual basis of w_1, …, w_m. Here, by component functions we mean the functions Ψ_i : W → R that are defined by Ψ_i := P_i ∘ Ψ, i.e.

Ψ = (Ψ_1, …, Ψ_m).
Proposition 15.4.5 (Dual basis). If W is a finite-dimensional normed vector space, and w_1, …, w_m is a basis of W, then there exist linear maps Ψ_i : W → R for i = 1, …, m such that for all v ∈ W,

v = Ψ_1(v) w_1 + ⋯ + Ψ_m(v) w_m = Σ_{i=1}^{m} Ψ_i(v) w_i.

Together, the functions Ψ_1, …, Ψ_m form a basis of the vector space Lin(W, R) and they are called the dual basis of w_1, …, w_m. Every Ψ_i is a linear map from W to R, and

Ψ_i(w_j) = 1 if i = j,   Ψ_i(w_j) = 0 if i ≠ j.

Proof. See e.g. Theorem 1.6.7 in the Linear Algebra 2 lecture notes.

Since Ψ_i is linear, it is differentiable and DΨ_i : W → Lin(W, R) is the constant map a ↦ Ψ_i.
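As a small concrete case (our own example, not from the notes): in W = R², take the basis w_1 = (1, 1), w_2 = (0, 1). Solving v = x_1 w_1 + x_2 w_2 = (x_1, x_1 + x_2) gives the dual basis Ψ_1(v) = v_1 and Ψ_2(v) = v_2 − v_1, and the relation Ψ_i(w_j) = 1 if i = j, 0 otherwise, can be checked directly:

```python
def coords(v):
    """Coordinates of v in R^2 with respect to the basis
    w1 = (1, 1), w2 = (0, 1); so Psi1(v) = v[0], Psi2(v) = v[1] - v[0]."""
    return (v[0], v[1] - v[0])
```

Recombining the coordinates with the basis vectors recovers v, which is exactly the display in Proposition 15.4.5.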

15.5 The matrix representation


We now briefly review matrix representations, a concept from linear algebra.
Let L : V → W be a linear map between two finite-dimensional vector
spaces with bases v1 , · · · , vd and w1 , . . . wm respectively. Let Ψ : W →
Rm denote the coordinate map for the basis w1 , . . . , wm . Then the matrix
(representation) of L is the m × d matrix M such that for all i = 1, . . . , m and
j = 1, . . . d, the element of M in the ith row and jth column is

( M)ij = (Ψ( Lv j ))i = Ψi ( Lv j ),

which in words means that ( M )ij is the ith coordinate of the vector L(v j )
expressed in the basis w1 , . . . , wm .
The matrix M is precisely that matrix such that for all x ∈ Rd , with y = Mx
it holds that

L ( x 1 v 1 + · · · + x d v d ) = y 1 w1 + · · · + y m w m .

In other words, for all x ∈ Rd ,

Ψ ◦ L( x1 v1 + · · · + xd vd ). = Mx

Definition 15.5.1 (Jacobian with respect to bases). We will sometimes


call the matrix representation of a derivative ( D f ) a : V → W the Ja-
cobian of f (with respect to the bases v1 , . . . , vd and w1 , . . . wm ) in the
point a, and we will denote it by [ D f ] a .

As a preview, I'd already like to mention that if a function f : Ω → W is differentiable in a point a ∈ Ω, then you can easily find [Df]_a with the rules of calculus: to determine ([Df]_a)_{ij}, i.e. the element in the ith row and jth column, you first compute the coordinate representation of f, namely

f̄ : Φ(Ω) → R^m

defined by

f̄(x) = Ψ ∘ f ∘ Φ^{−1}(x).

Then you view f̄ as a function of x_j only, keeping the other x_k for k ≠ j fixed, and take the derivative of the ith component of f̄ with respect to x_j, which is denoted by

∂f̄_i/∂x_j

and is called the partial derivative of f̄_i with respect to x_j. Then

([Df]_a)_{ij} = (∂f̄_i/∂x_j)(Φ(a)).

Only later can we give a proper definition of the partial derivative. For instance, if

f̄(x_1, x_2) = (x_1² + x_2³, x_1⁴ + x_2⁵),

then

[Df]_a = ( 2b_1      3(b_2)² )
         ( 4(b_1)³   5(b_2)⁴ )

where b := Φ(a).

But this was just a preview. Before we can show this, we need computation rules such as the chain rule and the sum, product and quotient rules.
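The Jacobian in the preview can be checked numerically: approximate each partial derivative by a central difference and compare with the analytic matrix above. A sketch (the step size h is a pragmatic, not principled, choice):

```python
def f(x1: float, x2: float):
    """The map from the preview: (x1, x2) -> (x1^2 + x2^3, x1^4 + x2^5)."""
    return (x1**2 + x2**3, x1**4 + x2**5)

def jacobian_fd(x1: float, x2: float, h: float = 1e-6):
    """Central-difference approximation of the 2x2 Jacobian of f at (x1, x2)."""
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j, (d1, d2) in enumerate([(h, 0.0), (0.0, h)]):
        fp = f(x1 + d1, x2 + d2)
        fm = f(x1 - d1, x2 - d2)
        for i in range(2):
            J[i][j] = (fp[i] - fm[i]) / (2.0 * h)
    return J
```

At the point (b_1, b_2) = (1, 2) the analytic Jacobian is [[2, 12], [4, 80]], and the finite-difference estimate agrees to several digits.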
15.6 The chain rule

Theorem 15.6.1 (Chain rule). Let (U, ‖·‖_U), (V, ‖·‖_V), and (W, ‖·‖_W) be normed vector spaces. Let Ω ⊂ U and E ⊂ V both be open. Let f : Ω → V be such that f(Ω) ⊂ E. Let g : E → W. If f is differentiable in a point a ∈ Ω, and g is differentiable in the point f(a), then the function g ∘ f is differentiable in the point a. Moreover,

(D(g ∘ f))_a = (Dg)_{f(a)} ∘ (Df)_a.
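In the one-dimensional case the formula reduces to the familiar (g ∘ f)′(a) = g′(f(a)) · f′(a), which is easy to sanity-check with finite differences before diving into the proof (the particular f, g and step size below are our arbitrary choices):

```python
def d(fun, a: float, h: float = 1e-6) -> float:
    """Central finite-difference approximation of fun'(a)."""
    return (fun(a + h) - fun(a - h)) / (2.0 * h)

f = lambda x: x**3          # f'(x) = 3x^2
g = lambda y: y**2 + 1.0    # g'(y) = 2y
```

Comparing a direct difference quotient of g ∘ f with the product of the two individual difference quotients shows the two sides of the chain rule agreeing up to discretization error.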
Proof. We need to show that there exists a linear map A : U → W such


g◦ f
that, if we define the error function Err a : D → W by
g◦ f
Err a ( x ) := ( g ◦ f )( x ) − ( g ◦ f )( a) − A( x − a)

that then
g◦ f
kErr a ( x )kW
lim = 0.
x→a k x − a kU

We choose A := ( Dg) f (a) ◦ ( D f ) a . This is indeed a bounded linear


operator from U to W.
We need to show that
for all e > 0,
there exists δ > 0,
for all x ∈ Ω,
if 0 < k x − akU < δ,
then kErr^{g◦f}_a ( x )kW / k x − a kU < e.

Let e > 0.
According to the template, the next step would be to find a δ > 0, but
for this step we need quite some preparation. First of all it is helpful to
define the error functions
Err^f_a ( x ) := f ( x ) − f ( a) − ( D f ) a ( x − a)

and

Err^g_{f(a)} (y) := g(y) − g( f ( a)) − ( Dg) f (a) (y − f ( a))
We can then make for x ∈ Ω the following computation


Err^{g◦f}_a ( x ) = g( f ( x )) − g( f ( a)) − (( Dg) f (a) ◦ ( D f ) a )( x − a)
= ( Dg) f (a) ( f ( x ) − f ( a)) + Err^g_{f(a)} ( f ( x )) − ( Dg) f (a) (( D f ) a ( x − a))
= Err^g_{f(a)} ( f ( x )) + ( Dg) f (a) ( f ( x ) − f ( a) − ( D f ) a ( x − a) )
= Err^g_{f(a)} ( f ( x )) + ( Dg) f (a) ( Err^f_a ( x ) ).

Using the triangle inequality, we can then estimate


kErr^{g◦f}_a ( x )kW = kErr^g_{f(a)} ( f ( x )) + ( Dg) f (a) ( Err^f_a ( x ) )kW
≤ kErr^g_{f(a)} ( f ( x ))kW + k( Dg) f (a) ( Err^f_a ( x ) )kW
≤ kErr^g_{f(a)} ( f ( x ))kW + k( Dg) f (a) kV →W kErr^f_a ( x )kV     (15.6.1)

Our strategy will be to find a δ > 0 such that for all x ∈ Ω, if 0 <
k x − akU < δ, the right-hand-side of (15.6.1), and therefore also the
left-hand-side, is less than e. Let’s see how we can find such a δ > 0.
Because g is differentiable in f ( a) with derivative ( Dg) f (a) , it holds
that

lim_{y→f(a)} kErr^g_{f(a)} (y)kW / k y − f ( a)kV = 0.

Therefore, there exists a ρ > 0 such that for all y ∈ E, if 0 < ky −


f ( a)kV < ρ, then
kErr^g_{f(a)} (y)kW / k y − f ( a)kV < e / ( 2( k( D f ) a kU →V + 1 ) ).     (15.6.2)

Choose such a ρ > 0.


Now define
e2 := min( 1, e / ( 2 k( Dg) f (a) kV →W + 1 ) ).

Because f is differentiable in a with derivative ( D f ) a , it holds that


lim_{x→a} kErr^f_a ( x )kV / k x − a kU = 0.

Therefore there exists a δ1 > 0 such that for all x ∈ Ω, if 0 < k x −


akU < δ1 then
kErr^f_a ( x )kV < e2 k x − akU .
Choose such a δ1 > 0.
Then, for x ∈ Ω, if 0 < k x − akU < δ1 also
k f ( x ) − f ( a)kV = k( D f ) a ( x − a) + Err^f_a ( x )kV
≤ k( D f ) a ( x − a)kV + kErr^f_a ( x )kV     (15.6.3)
≤ k( D f ) a kU →V k x − akU + e2 k x − akU .

Define
δ2 := ρ / ( 1 + k( D f ) a kU →V ).

Choose δ := min(δ1 , δ2 ). Let x ∈ Ω. Assume 0 < k x − akU < δ. Then,


it follows by (15.6.3) and the fact that e2 ≤ 1 that

k f ( x ) − f ( a)kV ≤ k( D f ) a kU →V k x − akU + e2 k x − akU


≤ (1 + k( D f ) a kU →V )k x − akU
< (1 + k( D f ) a kU →V )δ ≤ ρ
so that by estimate (15.6.2) it follows that


kErr^g_{f(a)} ( f ( x ))kW < ( e / ( 2( k( D f ) a kU →V + 1 ) ) ) k f ( x ) − f ( a)kV
< ( e/2 ) k x − a kU .
Therefore, using the result of our earlier computation (15.6.1)
kErr^{g◦f}_a ( x )kW ≤ kErr^g_{f(a)} ( f ( x ))kW + k( Dg) f (a) kV →W kErr^f_a ( x )kV
< ( e/2 ) k x − akU + e2 k( Dg) f (a) kV →W k x − akU
≤ ( e/2 ) k x − a kU + ( e / ( 2 k( Dg) f (a) kV →W + 1 ) ) k( Dg) f (a) kV →W k x − akU
< e k x − a kU .
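The chain rule can be checked numerically in the simplest one-dimensional setting. In the Python sketch below, the functions f and g are illustrative choices of mine (not from the text); the difference quotient of g ◦ f is compared with the product g′( f ( a)) · f ′( a), which is what the composition of the linear maps ( Dg) f (a) and ( D f ) a amounts to when U = V = W = R:

```python
import math

# Illustrative functions with known derivatives.
def f(x):  return math.sin(x)
def fp(x): return math.cos(x)
def g(y):  return y**3
def gp(y): return 3 * y**2

def diff(fun, a, h=1e-6):
    # central difference quotient approximating fun'(a)
    return (fun(a + h) - fun(a - h)) / (2 * h)

a = 0.7
lhs = diff(lambda x: g(f(x)), a)   # numerical derivative of the composition
rhs = gp(f(a)) * fp(a)             # chain rule: (Dg)_{f(a)} composed with (Df)_a
assert abs(lhs - rhs) < 1e-8
```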

15.7 Sum, product and quotient rules


The following sum, product and quotient rules will be familiar from Cal-
culus, but now they are formulated in the language of linear maps.

Theorem 15.7.1. Let (V, k · kV ) and (W, k · kW ) be two normed vector


spaces. Let Ω ⊂ V be open and let f : Ω → W and g : Ω → W be two
functions, that are both differentiable in a point a ∈ Ω, with derivative
( D f ) a : V → W and ( Dg) a : V → W respectively.
Then the function f + g : Ω → W is also differentiable in a with derivative
( D ( f + g)) a = ( D f ) a + ( Dg) a

Theorem 15.7.2. Let (V, k · kV ) and (W, k · kW ) be normed vector spaces.


Let Ω ⊂ V be open, let f : Ω → W be a function, and let g : Ω → R be a
function mapping to the normed vector space (R, | · |). Assume both f and g
are differentiable in the point a ∈ Ω, with derivatives ( D f ) a : V → W
and ( Dg) a : V → R respectively. Then

i. (product rule) The function f · g is differentiable in a as well, with


derivative given by

( D ( f · g)) a (h) = f ( a)( Dg) a (h) + g( a)( D f ) a (h)

for all h ∈ V.

ii. (quotient rule) If g( a) ≠ 0, the function f /g is differentiable in a


as well, with derivative given by

( D ( f /g)) a (h) = ( 1 / ( g( a))^2 ) ( g( a)( D f ) a (h) − f ( a)( Dg) a (h) )

for all h ∈ V.
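For real-valued functions of one real variable (V = W = R), the product and quotient rules reduce to the familiar calculus formulas, which can be checked numerically as follows (the concrete functions below are illustrative choices of mine, not from the text):

```python
import math

f, fp = math.exp, math.exp                      # exp is its own derivative
g, gp = math.cos, lambda t: -math.sin(t)

def diff(fun, a, h=1e-6):
    # central difference quotient approximating fun'(a)
    return (fun(a + h) - fun(a - h)) / (2 * h)

a = 0.3
prod = diff(lambda t: f(t) * g(t), a)
quot = diff(lambda t: f(t) / g(t), a)
assert abs(prod - (f(a) * gp(a) + g(a) * fp(a))) < 1e-7            # product rule
assert abs(quot - (g(a) * fp(a) - f(a) * gp(a)) / g(a)**2) < 1e-7  # quotient rule
```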

15.8 Differentiability of components

Proposition 15.8.1. Let w1 , . . . , wm be a basis of W and let Ψ1 , . . . , Ψm


be the dual basis.
Then a function f : Ω → W is differentiable in a point a ∈ Ω if and
only if for every i ∈ {1, . . . , m}, the function

Ψi ◦ f

is differentiable in a ∈ Ω. Moreover, if the function f is differentiable


in a ∈ Ω, then for every v ∈ V,
( D f ) a (v) = ∑_{i=1}^{m} wi ( D ( Ψi ◦ f )) a ( v ).

The proof of this proposition is the topic of Exercise 16.4.2.


Corollary 15.8.2. A function f : Ω → Rm is differentiable in a point a ∈ Ω


if and only if for i = 1, . . . , m the component function f i : Ω → R given by
f i = Pi ◦ f is differentiable. Moreover, if f is differentiable in a, then for all
v ∈ V,
( D f ) a (v) = ∑_{i=1}^{m} ei ( D f i ) a ( v ) = ( ( D f 1 ) a ( v ), · · · , ( D f m ) a ( v ) )

where ei denote the standard unit vectors.


If in fact Ω is a subset of R, then

f 0 ( a) = ( f 1 0 ( a), · · · , f m 0 ( a) ).


15.9 Differentiability implies continuity


The next theorem tells us that differentiability in a point is a stronger con-
dition than continuity in a point: whenever a function is differentiable in
a point, it is also continuous in that point.

Theorem 15.9.1. Let Ω ⊂ V be open and suppose a function f : Ω →


W is differentiable in a point a ∈ Ω. Then f is continuous in a.

Proof. Suppose f : Ω → W is differentiable in a point a ∈ Ω, write
L a := ( D f ) a for its derivative, and let Err a ( x ) := f ( x ) − f ( a) − L a ( x − a)
be the corresponding error function. Then there exists a δ > 0 such that
for all x ∈ Ω, if 0 < k x − akV < δ, then

kErr a ( x )kW / k x − a kV < 1.
Now let y : N → Ω be a sequence in Ω converging to a. Then, there
exists an N ∈ N such that for all n ≥ N,

kyn − akV < δ.

Choose such an N ∈ N and let n ≥ N. Then

k f (yn ) − f ( a) − L a (yn − a)kW = kErr a (yn )kW < kyn − akV .


By the reverse triangle inequality (see Lemma 2.6.1), and by Proposi-


tion 14.11.7 we find
k f (yn ) − f ( a)kW ≤ kyn − akV + k L a (yn − a)kW
≤ kyn − akV + k L a kV →W k(yn − a)kV
= (1 + k L a kV →W )kyn − akV .

It follows by Proposition 5.6.1 and the squeeze theorem that the se-
quence n 7→ k f (yn ) − f ( a)kW converges to zero, and we conclude by
Proposition 5.6.1 that the sequence ( f (yn )) converges to f ( a).

15.10 Derivative vanishes in local maxima and


minima
Theorem 15.10.1. Let Ω be an open subset of a normed vector space
V. Suppose f : Ω → R is differentiable in a ∈ Ω. Suppose that f ( a) is
a local maximum or minimum, i.e. suppose there exists an r > 0 such
that either
for all x ∈ B( a, r ),
f ( x ) ≤ f ( a)
or
for all x ∈ B( a, r ),
f ( x ) ≥ f ( a ).

Then ( D f ) a = 0.

Proof. We will show the statement for the case in which f attains a local
maximum in a. In that case, there exists an r > 0 such that f ( x ) ≤ f ( a)
for all x ∈ B( a, r ). Because f is differentiable in a,
f ( x ) = f ( a) + ( D f ) a ( x − a) + Err^f_a ( x )
where
lim_{x→a} |Err^f_a ( x )| / k x − a kV = 0.     (15.10.1)

We argue by contradiction, so suppose ( D f ) a ≠ 0, i.e. suppose ( D f ) a


is not the zero map. Then there exists a vector u ∈ V such that

( D f ) a (u) ≠ 0.

We choose such a u and define v to be either equal to u/kukV or to


−u/kukV , in such a way that

( D f ) a (v) > 0.

Note also that by the homogeneity of the norm,

k v kV = k u / kukV kV = ( 1 / kukV ) kukV = 1.

The intuition behind the rest of the proof is that f , evaluated at points
close enough to a in the direction of v, takes values larger than f ( a),
which contradicts the local maximality.
By (15.10.1) there exists a δ > 0 such that for all x ∈ Ω, if 0 < k x −
akV < δ, then

|Err^f_a ( x )| / k x − a kV < (1/2) |( D f ) a (v)|     (15.10.2)

Now choose ρ := (1/2) min(r, δ) and define

y := a + ρv.

Then by positive homogeneity of the norm and the fact that kvkV = 1,

0 < ky − akV = kρvkV = ρ kvkV = ρ = (1/2) min(r, δ).
Therefore on the one hand ky − akV < r and thus f (y) ≤ f ( a), but on
the other hand ky − akV < δ and thus
f (y) = f ( a) + ( D f ) a (y − a) + Err^f_a (y)
= f ( a) + ( D f ) a (ρv) + Err^f_a (y)
= f ( a) + ρ( D f ) a (v) + Err^f_a (y)
≥ f ( a) + ρ( D f ) a (v) − |Err^f_a (y)|.

We now use (15.10.2) to find


f (y) ≥ f ( a) + ρ( D f ) a (v) − (1/2) |( D f ) a (v)| ky − akV
= f ( a) + ρ( D f ) a (v) − (ρ/2) |( D f ) a (v)|
= f ( a) + (ρ/2) ( D f ) a (v) > f ( a)
which is a contradiction.

15.11 The mean-value theorem


Theorem 15.11.1 (Rolle’s theorem). Let f : [ a, b] → R be continuous,
assume that f is differentiable on ( a, b) and that f ( a) = f (b). Then
there exists a c ∈ ( a, b) such that f 0 (c) = 0.

Proof. Since f is continuous, it achieves both a maximum and a minimum
on [ a, b] by the Extreme Value Theorem. If f is constant, every c ∈ ( a, b)
satisfies f 0 (c) = 0. Otherwise, since f ( a) = f (b), either the minimum or
the maximum is achieved in some c ∈ ( a, b), and by Theorem 15.10.1 it
holds that f 0 (c) = 0.

Theorem 15.11.2 (Mean-value theorem). Let f : [ a, b] → R be contin-


uous, and assume that f is differentiable on ( a, b). Then there exists a
c ∈ ( a, b) such that
f 0 (c) = ( f (b) − f ( a) ) / ( b − a ).

Proof. Define the function g : [ a, b] → R by

g( x ) = f ( x ) − (( x − a)/(b − a)) f (b) − ((b − x )/(b − a)) f ( a).
By the sum and product rules, the function g is differentiable on ( a, b).
By the rules for continuous functions, the function g is also continuous
on [ a, b]. Moreover,
g( a) = g(b) = 0.
It follows by Rolle’s theorem that there exists a c ∈ ( a, b) such that

g0 (c) = 0.

Then
0 = g0 (c) = f 0 (c) − (1/(b − a)) f (b) + (1/(b − a)) f ( a)
so that indeed
f 0 (c) = ( f (b) − f ( a) ) / ( b − a ).
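As a small worked example (my own choice, not from the notes): for f ( x ) = x^3 on [ a, b] = [0, 2], the mean-value theorem guarantees a c ∈ (0, 2) with f 0 (c) = ( f (2) − f (0))/2 = 4, and here the equation 3 c^2 = 4 can be solved explicitly:

```python
import math

a, b = 0.0, 2.0
f = lambda x: x**3
fp = lambda x: 3 * x**2

slope = (f(b) - f(a)) / (b - a)   # the mean slope over [a, b], here 4.0
c = math.sqrt(slope / 3)          # solve f'(c) = 3 c^2 = slope for c > 0
assert a < c < b                  # c lies in the open interval, as the theorem claims
assert abs(fp(c) - slope) < 1e-12
```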

15.12 Exercises

15.12.1 Blue exercises

Exercise 15.12.1. Let A : V → W be a linear map from a finite-dimensional


normed vector space (V, k · kV ) to a normed vector space (W, k · kW ).
Show that A is differentiable on V.
15.12.2 Orange exercises

Exercise 15.12.2. Let f : V → R be a differentiable function from a finite-


dimensional normed vector space (V, k · kV ) to the normed vector space
(R, | · |). Assume that for all a ∈ V, ( D f ) a = 0. Let v ∈ V. Show that
f (v) = f (0). (This would essentially show that f is constant on V).

i. Define the function `v : R → V by `v (t) = tv. Show that `v is differ-


entiable.

ii. Show that the function g := f ◦ `v is differentiable and compute its


derivative.

iii. Conclude that f (v) = g(1) = g(0) = f (0).

Exercise 15.12.3. The function ln : (0, ∞) → R is the unique, differentiable


function such that ln(1) = 0 and ln0 ( x ) = 1/x. Show that for all x ∈
(−1, ∞), it holds that
ln(1 + x ) ≤ x
with equality if and only if x = 0.

Exercise 15.12.4. Prove Proposition 15.1.5. You may assume that W is finite-
dimensional.

Exercise 15.12.5. Let (V, k · kV ) and (W, k · kW ) be two two-dimensional


vector spaces with bases v1 , v2 and w1 , w2 respectively. Assume that a
function f : V → W is differentiable in 0 with

( D f ) 0 ( v 1 + v 2 ) = w1

and
( D f )0 (v1 − 2v2 ) = w1 − w2 .
Give the matrix representation of the linear map ( D f )0 : V → W with
respect to the bases v1 , v2 and w1 , w2 .
Chapter 16

Differentiability of standard
functions

Which functions are differentiable? We would like to give examples of


large classes of functions that are differentiable. How can we find such
classes? For now, we can combine the following observations:

• The constant function is differentiable

• Linear functions between finite dimensional normed vector spaces


are always differentiable

• Sums, products, compositions and at times quotients of differen-


tiable functions are differentiable. The precise statements are given
by the sum rule, the product rule, the chain rule and the quotient
rule in the previous chapter.

With these observations we can get pretty far, and conclude that polyno-
mials and rational functions are differentiable.

16.1 Global context


Before going on with the lecture notes, I’d like to introduce the global con-
text that I will use most often, so that we don’t have to introduce it again


for every definition, lemma etc. I will usually not reintroduce these vari-
ables, but only indicate deviations from it.
We will consider two normed vector spaces (V, k · kV ) and (W, k · kW )
and a function f : Ω → W where Ω ⊂ V is an open subset of V. We
will from now assume that V and W are finite-dimensional, and we will
denote by v1 , . . . , vd a basis in V, with corresponding coordinate map Φ,
and by w1 , . . . , wm a basis in W with corresponding coordinate map Ψ.

16.2 Polynomials and rational functions are dif-


ferentiable
Proposition 16.2.1 (Differentiability of polynomials in one variable).
For every n ∈ N, it holds that the function f : R → R given by

f ( x ) = x^n

is differentiable with

f 0 ( x ) = n x^(n−1) .

In other words, the derivative of f , i.e. ( D f ) : R → Lin(R, R), is given by

x 7→ ( h 7→ n x^(n−1) h ).
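A quick numerical check of this proposition (the exponent and the point are arbitrary choices of mine):

```python
def diff(fun, x, h=1e-7):
    # central difference quotient approximating fun'(x)
    return (fun(x + h) - fun(x - h)) / (2 * h)

n, x = 5, 1.3
approx = diff(lambda t: t**n, x)
exact = n * x**(n - 1)            # the derivative n x^(n-1) from the proposition
assert abs(approx - exact) < 1e-5
```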

Proposition 16.2.2 (Every polynomial is differentiable). Every polyno-


mial on Rd is differentiable.

Proposition 16.2.3 (Every rational function is differentiable on its do-


main). Let p : Rd → R and q : Rd → R be two polynomials. Let

D := { x ∈ Rd | q( x ) ≠ 0 }.
Then the function f : D → R given by

p( x )
f (x) =
q( x )

is differentiable.
In other words, every rational function is differentiable on its domain
of definition.

16.3 Differentiability of other standard functions


The following functions, that you may know from Calculus, are also dif-
ferentiable. Just like when we introduced Proposition 14.3.3, we are not
even ready to define these functions, but I think it’s useful to mention the
result here anyways.

Proposition 16.3.1. The functions

exp : R → R ln : (0, ∞) → R
sin : R → R cos : R → R
tan : (−π/2, π/2) → R arctan : R → R

are all differentiable on their domain, while the functions

arcsin : [−1, 1] → R arccos : [−1, 1] → R

are both differentiable on the interval (−1, 1).


The derivatives are given by:

exp0 (t) = exp(t)                  ln0 (t) = 1/t
sin0 (t) = cos(t)                  cos0 (t) = − sin(t)
tan0 (t) = 1/cos^2 (t)             arctan0 (t) = 1/(1 + t^2 )
arcsin0 (t) = 1/√(1 − t^2 )        arccos0 (t) = −1/√(1 − t^2 )
Example 16.3.2. Consider the function f : R → R2 given by

f (t) = ( t^2 , sin(t) )


The component functions f 1 : R → R and f 2 : R → R are given by

f 1 ( t ) = t^2

and
f 2 (t) = sin(t)
Since these component functions are differentiable standard functions,
we find by Corollary 15.8.2 that f is differentiable as well and

f 0 (t) = ( f 1 0 (t), f 2 0 (t) ) = ( 2t, cos(t) )


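The component-wise computation in Example 16.3.2 can be verified numerically; the sketch below compares a central difference quotient of each component with ( 2t, cos(t) ):

```python
import math

def f(t):
    # the function from Example 16.3.2
    return (t**2, math.sin(t))

t, h = 0.4, 1e-6
# component-wise central difference quotient of f at t
fd = tuple((p - m) / (2 * h) for p, m in zip(f(t + h), f(t - h)))
exact = (2 * t, math.cos(t))
assert all(abs(u - v) < 1e-8 for u, v in zip(fd, exact))
```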
 

16.4 Exercises
Exercise 16.4.1. Consider the polynomial f : R2 → R given by
f ( x1 , x2 ) = 3 x1^m x2^n + x1^k
for some nonnegative integers m, n and k. Since f is a polynomial, it is dif-
ferentiable on R2 . Give ( D f ) : R2 → Lin(R2 , R) and justify your answer.
Exercise 16.4.2. Prove Proposition 15.8.1.
Exercise 16.4.3. Consider the function f : R2 → R2 given by

f ( x1 , x2 ) = ( x1 x2^3 / ( x1^2 + x2^2 ) , 5 x2 )   if ( x1 , x2 ) ≠ (0, 0),
f ( x1 , x2 ) = (0, 0)                                 if ( x1 , x2 ) = (0, 0).

Prove that f is differentiable on R2 .


Exercise 16.4.4. i. Consider the function f : R → R3 given by

f (t) := ( cos(t), sin(t), arctan(t) ).
Show that f is differentiable and give an expression for the function
f 0 : R → R3 and for the derivative ( D f ) : R → Lin(R, R3 ).
ii. Let w1 and w2 be two vectors in a finite-dimensional normed vector


space (W, k · kW ). Consider the function g : R → W given by

g(t) = cosh(t)w1 + sinh(t)w2 .

Show that g is differentiable and give an expression for the function


g0 : R → W and for the derivative ( Dg) : R → Lin(R, W ).
Chapter 17

Directional and partial derivatives

17.1 A recurring and very important construction


The following construction is so important, that it really pays off to go
through the following text very slowly and/or several times, making sure
you understand every step.
To analyze the behavior of a function f : Ω → W, where Ω is some subset
of V, we will often study how f behaves on lines. Let’s make this precise.
We will often select a point a ∈ Ω, a direction v ∈ V, and consider the
composition f ◦ ` a,v , where the function ` a,v maps from a small interval
(−δ, δ) around 0 ∈ R to Ω and is given by

` a,v (t) := a + tv.

Note that ` a,v is an affine map, i.e. it is the sum of a constant and a linear
map. It is therefore differentiable on (−δ, δ), and for every t ∈ (−δ, δ), its
derivative ( D ` a,v )t in t is a linear map from R to V. In fact, for all h ∈ R,

( D ` a,v )t (h) = hv.

If f is differentiable in a, then it follows by the chain rule that f ◦ ` a,v is


differentiable in 0, and,

( D ( f ◦ ` a,v ))0 = ( D f ) a ◦ ( D ` a,v )0 .


To figure out what this means, we realize that ( D ( f ◦ ` a,v ))0 is a linear map
from R to W. So let’s see its output when the input is h ∈ R:

( D ( f ◦ ` a,v ))0 (h) = (( D f ) a ◦ ( D ` a,v )0 )(h)
= ( D f ) a ( ( D ` a,v )0 (h) )
= ( D f ) a ( hv )
= h ( D f ) a (v).

17.2 Directional derivative


We will now introduce the concept of the directional derivative, which can
be viewed as the rate of change of a function when varying the input in a
certain direction.

Definition 17.2.1 (Directional derivative). Let f : Ω → W be a function


from an open domain Ω in a finite-dimensional normed vector space V
to a finite-dimensional normed vector space W. Let a ∈ Ω and v ∈ V.
Then we say the directional derivative in the direction of v of f exists in
the point a ∈ Ω if there exists a δ > 0 such that the function

g := f ◦ ` a,v : (−δ, δ) → W

is differentiable in 0, where the function ` a,v : (−δ, δ) → V is defined


by
` a,v (t) := a + tv.
Moreover, if it exists, we define the directional derivative in the direc-
tion of v of f in the point a as

( Dv f ) a := g0 (0) = lim_{h→0} ( f ( a + hv) − f ( a) ) / h .

How does the directional derivative relate to the derivative of a function? The
answer is subtle. If the derivative exists in a point a, then for all v ∈ V,
the directional derivative in the direction of v of f in the point a exists as
well and
( Dv f ) a = ( D f ) a ( v ) .
The last equality tells us that in this case the directional derivative in the
direction of v at a point a, namely ( Dv f ) a , is just the derivative of f in the
point a (which is a linear map!) applied to the vector v, namely ( D f ) a (v).
The precise statement is given by the next proposition. After the proposi-
tion, we will give a warning about the reverse direction: existence of direc-
tional derivatives does not say anything about existence of the derivative.

Proposition 17.2.2. Suppose f : Ω → W is differentiable in a point


a ∈ Ω. Then for all v ∈ V, the directional derivative of f at a in the
direction of v
( Dv f ) a
exists and is equal to the derivative of f at the point a (which is a linear
map) applied to the vector v

( D f ) a ( v ).

Proof. This proposition follows from the Chain Rule, Theorem 15.6.1.
Indeed, let v ∈ V.
Because Ω is open, there exists a δ1 > 0 such that B( a, δ1 ) ⊂ Ω.
Consider now the function g := f ◦ ` a,v , which is a function from
(−δ, δ) → W where δ := δ1 /kvkV .
The function ` a,v is an affine function, and therefore it is differentiable
in 0 with derivative

( D ` a,v )0 = ( h 7→ hv ).

By the chain rule, Theorem 15.6.1, g is differentiable and the derivative


of g, which is a linear map from R to W, is given by

( Dg)0 = ( D ( f ◦ ` a,v ))0 = ( D f )`a,v (0) ◦ ( D ` a,v )0 (17.2.1)

Since g is differentiable, by definition of the directional derivative, the


directional derivative in the direction of v of f in a exists.
To find the value of the derivative, we now apply the left hand side
and the right-hand side in (17.2.1) to the vector 1 ∈ R:

( Dg)0 (1) = ( D f )`a,v (0) ( ( D ` a,v )0 (1) ) = ( D f ) a (v)

On the other hand, by Proposition 15.1.5,

( Dv f ) a = g0 (0) · 1 = ( Dg)0 (1) = ( D f ) a (v)

which is what we wanted to show.

There are functions f : Ω → W that are not differentiable in a point a


even though for every v ∈ V, the directional derivative ( Dv f ) a exists.
See for instance the next example.

Example 17.2.3. Consider the following function f : R2 → R:


f ( x1 , x2 ) := x1 if x2 ≠ 0, and f ( x1 , x2 ) := 0 if x2 = 0.

Let us verify that for all v ∈ R2 , the directional derivative ( Dv f )0 ex-


ists. Let v ∈ R2 . If v2 ≠ 0, then

( Dv f )0 = lim_{t→0} ( f (0 + tv) − f (0) ) / t
= lim_{t→0} ( t v1 − 0 ) / t
= lim_{t→0} v1
= v1 .

while if v2 = 0 then

( Dv f )0 = lim_{t→0} ( f (0 + tv) − f (0) ) / t = lim_{t→0} ( 0 − 0 ) / t = 0.
In both cases, the directional derivative exists.


We now claim that f is not differentiable in 0. Indeed, if f were
differentiable in 0, then the derivative ( D f )0 would be a linear map
from R2 to R. Since ( D f )0 (e1 ) = 0 and ( D f )0 (e2 ) = 0, in fact ( D f )0
maps every vector to zero. In particular ( D(1,1) f )0 = 0. However, our
computation above shows that ( D(1,1) f )0 = 1. This is a contradiction.
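The failure of linearity in Example 17.2.3 can also be observed numerically: the sketch below evaluates difference quotients of f at 0 in the directions e1, e2, and (1, 1), and shows that the value in direction (1, 1) is not the sum of the other two, so v 7→ ( Dv f )0 cannot come from a linear map.

```python
def f(x1, x2):
    # the function from Example 17.2.3
    return x1 if x2 != 0 else 0.0

def dir_quotient(v, t=1e-8):
    # difference quotient (f(0 + t v) - f(0)) / t for a small t > 0
    return (f(t * v[0], t * v[1]) - f(0.0, 0.0)) / t

assert abs(dir_quotient((1.0, 0.0))) < 1e-9        # direction e1: quotient 0
assert abs(dir_quotient((0.0, 1.0))) < 1e-9        # direction e2: quotient 0
assert abs(dir_quotient((1.0, 1.0)) - 1.0) < 1e-9  # direction (1,1): quotient 1, not 0 + 0
```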

17.3 Partial derivatives


Partial derivatives are special types of directional derivatives, for func-
tions that are defined on the vector space Rd .

Definition 17.3.1. Let f : Ω → W be a function defined on an open


domain Ω ⊂ Rd . The ith partial derivative in a point a ∈ Ω, denoted
by
∂f/∂xi ( a ),
is the directional derivative in the direction of the ith unit vector ei

∂f/∂xi ( a ) := ( Dei f ) a = (d/dt)|_{t=0} f ( a + t ei ) = lim_{h→0} ( f ( a + h ei ) − f ( a) ) / h .

Here

ei := (0, . . . , 0, 1, 0, . . . , 0),

with the 1 in the ith position.

Example 17.3.2. Consider the function f : R2 → R given by

f ( x1 , x2 ) = x1^2 + 2 x1 x2 + 3 x2^4 .

Let us determine whether the partial derivative

∂f/∂x2
exists in the point a := ( a1 , a2 ).


To do so, by definition, we need to see if the directional derivative of f
in the direction of e2 in the point a, namely

( De2 f ) a

exists.
By Definition 17.2.1, we need to verify whether the derivative of the
function g : R → R defined by

g(t) := ( f ◦ ` a,e2 )(t) = f ( a + t e2 ) = f ( a1 , a2 + t) = a1^2 + 2 a1 ( a2 + t) + 3( a2 + t)^4 .


exists in the point t = 0. Since g is a polynomial in one variable, it is


indeed differentiable, and the derivative in the point t = 0 exists and

g0 (0) = 2 a1 + 12 a2^3 .

Therefore, according to Definition 17.2.1, the partial derivative of f in


the point ( a1 , a2 ) exists and equals

∂f/∂x2 ( a ) = ( De2 f ) a = 2 a1 + 12 a2^3 .

In general, there are many different expressions for the partial derivative
of a function in some point a. Here are a few of them

∂f/∂xi ( a ) := ( Dei f ) a = (d/dt)|_{t=0} f ( a + t ei )
= (d/dt)|_{t=0} f ( a1 , . . . , ai−1 , ai + t, ai+1 , . . . , ad )
= (d/ds)|_{s=ai} f ( a1 , . . . , ai−1 , s, ai+1 , . . . , ad ).

The moral of the last expression is very nice: to determine the partial
derivative of f in a point a, you keep all coordinates fixed except for the
ith coordinate, and you then view the function as a function of only that
ith coordinate. It then is a function of only one variable, and you can dif-
ferentiate according to the one-variable definition in calculus.
Let us record the statement in a proposition.

Proposition 17.3.3. Let f : Ω → W be a function from an open domain


Ω in Rd to a (finite-dimensional) normed vector space (W, k · kW ). Let
a ∈ Ω.
The ith partial derivative of f in the point a exists if and only if the
function
t 7→ f ( a1 , . . . , ai−1 , t, ai+1 , . . . , ad )
is differentiable in the point ai , and in this case

∂f/∂xi ( a ) = (d/dt)|_{t=ai} f ( a1 , . . . , ai−1 , t, ai+1 , . . . , ad )

Example 17.3.4. Consider again the function f : R2 → R given by

f ( x1 , x2 ) = x1^2 + 2 x1 x2 + 3 x2^4 .

By the previous proposition, to determine whether the partial deriva-


tive
∂f/∂x2 ( x1 , x2 )
exists in a point ( x1 , x2 ) and to determine its value, we just verify that

(d/dt)|_{t=x2} f ( x1 , t ) = (d/dt)|_{t=x2} ( x1^2 + 2 x1 t + 3 t^4 ) = 2 x1 + 12 x2^3 .

We conclude as before that


∂f/∂x2 ( x1 , x2 ) = 2 x1 + 12 x2^3 .
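Examples 17.3.2 and 17.3.4 can be checked with finite differences: vary one coordinate at a time and compare the difference quotients with 2 x1 + 2 x2 and 2 x1 + 12 x2^3 (the latter as computed in the text; the former follows from the same recipe applied to x1):

```python
def f(x1, x2):
    # the function from Examples 17.3.2 / 17.3.4
    return x1**2 + 2 * x1 * x2 + 3 * x2**4

x1, x2, h = 1.2, -0.7, 1e-6
d1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)   # vary x1 only
d2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)   # vary x2 only
assert abs(d1 - (2 * x1 + 2 * x2)) < 1e-5
assert abs(d2 - (2 * x1 + 12 * x2**3)) < 1e-5
```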
17.4 The Jacobian of a map


In Section 15.5 and in particular in Definition 15.5.1 we introduced the
Jacobian of a map (with respect to some bases). The Jacobian [ D f ] a of f
in a is the matrix (representation) of the linear map ( D f ) a with respect to
the bases v1 , . . . , vd and w1 , . . . , wm . We will now give a way to compute
the Jacobian, that was already announced in Section 15.5 but back then we
didn’t have the means to prove our statements.
First we start with the particular case when f : Ω → Rm with Ω ⊂ Rd ,
and we choose the standard bases of unit vectors in Rd and Rm .

Proposition 17.4.1. Suppose f : Ω → Rm is a function defined on


an open domain Ω ⊂ Rd , and suppose f is differentiable in a point
a ∈ Ω. Then the Jacobian matrix of f (with respect to the standard
bases) is given by

[ D f ]a :=

    [ ∂f1/∂x1 (a)   ∂f1/∂x2 (a)   ···   ∂f1/∂xd (a) ]
    [ ∂f2/∂x1 (a)   ∂f2/∂x2 (a)   ···   ∂f2/∂xd (a) ]
    [     ···            ···       ···       ···    ]
    [ ∂fm/∂x1 (a)   ∂fm/∂x2 (a)   ···   ∂fm/∂xd (a) ]

In other words, for all x ∈ Rd , it holds that
( D f )a ( x ) =

    [ ∂f1/∂x1 (a)   ∂f1/∂x2 (a)   ···   ∂f1/∂xd (a) ]  [ x1 ]
    [ ∂f2/∂x1 (a)   ∂f2/∂x2 (a)   ···   ∂f2/∂xd (a) ]  [ x2 ]
    [     ···            ···       ···       ···    ]  [ ··· ]
    [ ∂fm/∂x1 (a)   ∂fm/∂x2 (a)   ···   ∂fm/∂xd (a) ]  [ xd ]
Proof. Since ( D f ) a is a linear map, it follows from linear algebra that


we just need to check that for every j = 1, . . . , d, the jth column of
the matrix corresponds to the image of the standard unit vector e j , i.e.
to ( D f ) a (e j ). However, since f is differentiable, this last expression
corresponds to the jth partial derivative of f :
( D f ) a (e j ) = ∂f/∂x j ( a ) = ( ∂f1/∂x j ( a ), ∂f2/∂x j ( a ), . . . , ∂fm/∂x j ( a ) )

so indeed this corresponds with the jth column of the matrix.
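A numerical sanity check of this proposition, for an arbitrarily chosen map f ( x, y) = ( x^2 y, x + y^3 ) (my own example, not from the text), whose partial derivatives give the Jacobian [[2xy, x^2], [1, 3y^2]]:

```python
def f(x, y):
    return (x**2 * y, x + y**3)

def jacobian_fd(g, a, h=1e-6):
    """Approximate the Jacobian at a; column j holds the partial derivatives in x_j."""
    d, m = len(a), len(g(*a))
    J = [[0.0] * d for _ in range(m)]
    for j in range(d):
        ap, am = list(a), list(a)
        ap[j] += h
        am[j] -= h
        gp, gm = g(*ap), g(*am)
        for i in range(m):
            J[i][j] = (gp[i] - gm[i]) / (2 * h)
    return J

x, y = 1.1, 0.6
J = jacobian_fd(f, (x, y))
exact = [[2 * x * y, x**2],
         [1.0,       3 * y**2]]
assert all(abs(J[i][j] - exact[i][j]) < 1e-5 for i in range(2) for j in range(2))
```

Applying the derivative ( D f ) a to a vector then amounts to multiplying this matrix with the coordinate vector, exactly as stated above.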

In the more general case of a map f : Ω → W from a subset Ω in a finite-


dimensional vector space V to a finite dimensional vector space W with
basis v1 , . . . , vd and w1 , . . . , wm and coordinate maps Φ and Ψ respectively,
we can compute [ D f ] a from the coordinate representation f¯ = Ψ ◦ f ◦ Φ−1
of f .

Proposition 17.4.2. Let f : Ω → W with Ω ⊂ V open, and let v1 , . . . , vd


be a basis of V with coordinate map Φ and let w1 , . . . , wm be a basis of
W with coordinate map Ψ. Let a ∈ Ω. Then the Jacobian of f with
respect to these bases is given by

[ D f ] a = [ D f¯]Φ(a)

where f¯ := Ψ ◦ f ◦ Φ−1 is the coordinate representation of f .

17.5 Linearization and tangent planes


If a function f : Ω → W is differentiable in a ∈ Ω, then by definition it can
be well approximated by an affine function. This affine function is also
called the linearization of f .
Definition 17.5.1 (Linearization). Let f : Ω → W be differentiable in


a point a ∈ Ω. Then the linearization of f is the function L a : V → W
given by
L a ( x ) = f ( a) + ( D f )a ( x − a)

Recall that the graph of a function f : Ω → R is the following subset of


Ω × R:
Graph( f ) := {( x, f ( x )) | x ∈ Ω}.

Definition 17.5.2. Let f : Ω → R, where Ω is a subset of a normed


vector space V. Assume f is differentiable in a ∈ Ω. Then the tangent
plane to the graph of f at a is the graph of the linearization L a of f , i.e.

Ta := {(v, L a (v)) | v ∈ V }

Definition 17.5.3. Let f : Ω → R where Ω is a subset of a normed


vector space V. Let a ∈ Ω, and set c := f ( a). Assume f is differentiable
in a with ( D f ) a ≠ 0. Then the tangent plane to the level set

f −1 ( c ) = { x ∈ V | f ( x ) = c }

at a is given by
{ x ∈ V | L a ( x ) = c }.

17.6 The gradient of a function

Definition 17.6.1. Let f : Ω → R be a function from an open domain


Ω in (Rd , k · k2 ) to (R, | · |) and suppose f is differentiable in the point
a ∈ Ω. Then we call the vector

∇ f ( a) := ( ∂f/∂x1 ( a ), . . . , ∂f/∂xd ( a ) )

the gradient of f in the point a.

If a function f : Ω → R is differentiable in a point a, then the derivative


( D f ) a relates to the gradient ∇ f ( a) as follows.

Proposition 17.6.2. Let f : Ω → R be a function from an open domain


Ω in (Rd , k · k2 ) to (R, | · |) and suppose f is differentiable in the point
a. Then for all v ∈ Rd ,
( D f ) a (v) = ( ∇ f ( a), v ) = ( ∇ f ( a) )^T v = ∑_{i=1}^{d} ∂f/∂xi ( a ) vi

where (·, ·) denotes the standard inner product on Rd .
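This relation can be illustrated numerically; the function below is an arbitrary choice of mine (not from the text), with gradient ∇ f ( x1 , x2 ) = ( x2^2 , 2 x1 x2 ):

```python
def f(x1, x2):
    return x1 * x2**2

a = (2.0, 3.0)
grad = (a[1]**2, 2 * a[0] * a[1])    # gradient of f at a, computed by hand
v = (0.5, -1.0)
t = 1e-6
# difference quotient of f at a in the direction of v
dq = (f(a[0] + t * v[0], a[1] + t * v[1])
      - f(a[0] - t * v[0], a[1] - t * v[1])) / (2 * t)
inner = grad[0] * v[0] + grad[1] * v[1]   # standard inner product (grad f(a), v)
assert abs(dq - inner) < 1e-5
```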

Proposition 17.6.3. Let f : Ω → R be a function from an open domain Ω
in (Rd , k · k2 ). Assume f is differentiable in a point a ∈ Ω with ( D f ) a ≠ 0. Set
c := f ( a). Then the tangent plane to the level set f −1 (c) at a is given
by
a + { x ∈ Rd | (∇ f ( a), x ) = 0}.

17.7 Exercises
Exercise 17.7.1. Consider the function f : R2 → R given by
f (( x1 , x2 )) := ( x2 )^2 / x1 if x1 ≠ 0, and f (( x1 , x2 )) := 0 if x1 = 0.
(a). Show that for all v ∈ R2 , the directional derivative ( Dv f )0 (i.e. the
directional derivative at 0 in the direction of v) exists and compute
its value.

(b). Show that f is not continuous in 0 ∈ R2 .

Exercise 17.7.2. Consider the map g : (0, ∞) × R → R2 defined by

g((r, φ)) := (r cos φ, r sin φ).

(a). Show that g is differentiable.

(b). Compute for every (r, φ) ∈ (0, ∞) × R the Jacobian [ Dg](r,φ) .

Exercise 17.7.3. Let f : Ω → W be differentiable in a point a ∈ Ω and let


v1 , . . . , vd and w1 , . . . , wm be bases of V and W respectively. Show that the
matrix (representation) [ D f ] a of ( D f ) a with respect to the bases {v j } and
{wi } is a matrix [bij ] of which the element in the ith row and jth column
equals
bij = ( Dv j (Ψi ◦ f )) a
where Ψ1 , . . . , Ψm is the dual basis to w1 , . . . , wm .
Hint: It may help to re-read Section 15.5 and to apply the Chain rule.

Exercise 17.7.4. Consider the function f : R3 → R given by

f (( x1 , x2 , x3 )) = sin( ( x1 )^2 + ( x2 )^3 + (cos( x3 ))^4 )
(a). Prove that for all a ∈ R3 the partial derivatives

∂f/∂x1 ( a ), ∂f/∂x2 ( a ), ∂f/∂x3 ( a )
exist and compute their values.

(b). Compute for every a ∈ R3 , the gradient ∇ f ( a).


Chapter 18

The Mean-Value Inequality

In this chapter, we prove what is perhaps the most important inequality


of Analysis. It is a statement about functions f : [ a, b] → W, that are con-
tinuous on [ a, b] and differentiable on ( a, b). In one version, the inequality
says that
k f (b) − f ( a)kW ≤ sup_{t∈( a,b)} k f 0 (t)kW · (b − a).

To get the most out of this inequality, you often don’t apply it to a function
directly, but rather to the difference of two functions. This difference could
for instance be a difference of the function you are interested in, and a
linear function. By picking a good difference of functions, the right-hand
side in the inequality actually becomes small, so that you can conclude
that the left-hand side becomes small too.
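A concrete illustration (my own example, anticipating the statement below): for the curve f (t) = (cos t, sin t) one has k f 0 (t)k = 1 for every t, so the inequality predicts that the chord length k f (b) − f ( a)k never exceeds b − a. Numerically:

```python
import math

def f(t):
    return (math.cos(t), math.sin(t))

def norm2(w):
    # Euclidean norm on R^2
    return math.hypot(w[0], w[1])

for a, b in [(0.0, 0.1), (0.0, 1.0), (1.0, 3.0)]:
    chord = norm2((f(b)[0] - f(a)[0], f(b)[1] - f(a)[1]))
    # sup of ||f'(t)|| over (a, b) is 1, so the bound is 1 * (b - a)
    assert chord <= (b - a) + 1e-12
```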

18.1 The mean-value inequality for functions de-


fined on an interval
Lemma 18.1.1 (Mean-value inequality (v0)). Let f : [ a, b] → W be continuous on [ a, b] and differentiable on ( a, b). Then

k f (b) − f ( a)kW ≤ sup_{t∈( a,b)} k f 0 (t)kW · (b − a).

Proof. Denote
K := sup_{t∈( a,b)} k f 0 (t)kW .

In the first part of the proof, we will use an “it suffices to show that”
construction a few times, to reduce what we need to show to an easier
statement.
We first claim that it suffices to show that for all ā ∈ ( a, b)

k f (b) − f ( ā)kW ≤ K (b − ā). (18.1.1)

To see why this suffices, note that the left-hand side and right-hand
side can be viewed as continuous functions of ā. Therefore, if we know
that (18.1.1) holds, we can take the limit as ā → a on both sides, and
conclude that also

k f (b) − f ( a)kW ≤ K (b − a).

Let therefore ā ∈ ( a, b). We aim to show (18.1.1).


We now claim that it suffices to show that for all e > 0, all s ∈ [ ā, b],

k f (s) − f ( ā)kW ≤ (K + e)(s − ā). (18.1.2)

To prove the claim, we aim to show (18.1.1) from (18.1.2). First note
that if (18.1.2) holds for all e > 0 and all s ∈ [ ā, b], then it also holds for
all e > 0 and s = b. We now argue by contradiction. Suppose

k f (b) − f ( ā)kW > K (b − ā)


Then we may define

e1 := (1/2) ( k f (b) − f ( ā)kW / (b − ā) − K ) > 0

so that

k f (b) − f ( ā)kW > (K + e1 )(b − ā).
Yet when we choose e = e1 in (18.1.2), we obtain

k f (b) − f ( ā)kW ≤ (K + e1 )(b − ā).

which is a contradiction. We conclude that (18.1.1) holds.


After all these reductions, we are left with showing that for all e > 0
and all s ∈ [ ā, b], indeed (18.1.2) holds.
Let e > 0.
We will now show three claims.
Our first claim is that inequality (18.1.2) holds for s = ā. This holds
because
k f ( ā) − f ( ā)kW = 0 ≤ 0 = (K + e)( ā − ā).

Our second claim is that whenever inequality (18.1.2) holds for all s ∈ [ ā, c) for some c ∈ [ ā, b], it also holds for s = c. This follows since the left-hand side and the right-hand side of the inequality (18.1.2) are continuous when interpreted as functions of s.
Our third claim is that whenever the inequality holds for all s ∈ [ ā, c]
for some c ∈ [ ā, b), there exists a δ > 0 such that the inequality holds
for all s ∈ [ ā, c + δ).
To prove this third claim, let c ∈ [ ā, b) and assume the inequality holds for all s ∈ [ ā, c]. Since f is differentiable in c, there exists a δ > 0 such that for all s ∈ [c, c + δ),

k f (s) − f (c) − f 0 (c)(s − c)kW = kErr^f_c (s)kW ≤ e|s − c| = e(s − c).

Therefore, by the triangle inequality, for all s ∈ [c, c + δ),

k f (s) − f (c)kW = kErr^f_c (s) + f 0 (c)(s − c)kW ≤ k f 0 (c)(s − c)kW + e(s − c) ≤ (K + e)(s − c).

As a consequence,

k f (s) − f ( ā)kW ≤ k f (s) − f (c)kW + k f (c) − f ( ā)kW


≤ (K + e)(s − c) + (K + e)(c − ā) = (K + e)(s − ā).

which shows that indeed inequality (18.1.2) is also satisfied for all s ∈
[ ā, c + δ). Hence we have proved the third claim.
We now define the set S as those s ∈ [ ā, b] such that for all σ ∈ [ ā, s],
inequality (18.1.2) is satisfied. In other words,

S := {s ∈ [ ā, b] | k f (σ ) − f ( ā)kW ≤ (K + e)(σ − ā) for all σ ∈ [ ā, s]}.

Note that just from its definition, it follows that S is either the empty
set, or just the point { ā} or it is an interval that is closed on the left with
ā as the left endpoint. From the first claim, we know that ā ∈ S, so S is
not empty. The second claim tells us that S is closed, so it is either { ā}
or it is a closed interval of the form [ ā, c] with c ∈ ( ā, b]. The third claim
gives a contradiction when S = { ā} or S = [ ā, c] with c < b. Therefore
S = [ ā, b] and inequality (18.1.2) is satisfied in s = b.
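The notes contain no code, but the inequality is easy to probe numerically. The following Python sketch is my own illustration, not part of the notes: it takes the curve f(t) = (cos t, sin t) with W = R^2, whose speed satisfies k f 0 (t)k2 = 1 for every t, and checks k f (b) − f ( a)k2 ≤ 1 · (b − a) on randomly chosen intervals.

```python
import math
import random

def f(t):
    # Example curve f : [a, b] -> R^2 with ||f'(t)||_2 = 1 for every t.
    return (math.cos(t), math.sin(t))

def norm2(w):
    return math.sqrt(sum(c * c for c in w))

def mvi_holds(a, b):
    # Mean-value inequality: ||f(b) - f(a)|| <= sup ||f'(t)|| * (b - a), and
    # the supremum equals 1 for this particular curve.
    lhs = norm2(tuple(p - q for p, q in zip(f(b), f(a))))
    return lhs <= (b - a) + 1e-12

random.seed(0)
checks = []
for _ in range(100):
    a = random.uniform(-5.0, 5.0)
    b = a + random.uniform(0.0, 5.0)
    checks.append(mvi_holds(a, b))
```

For this curve the left-hand side is the chord length 2|sin((b − a)/2)|, so the inequality is in fact strict for b > a.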

18.2 The mean-value inequality for functions on general domains
Before we can state the mean-value inequality for functions defined on
general domains, let’s have a small recap about the operator norm
k · k V →W
defined on the space of linear maps Lin(V, W ). What we need to remember
about this norm is the following. Given a linear map L : V → W, the norm
k LkV →W is the smallest constant K ∈ R such that for all v ∈ V,
k L(v)kW ≤ K kvkV .

In other words,

i. for all v ∈ V, and every L ∈ Lin(V, W ),

k L(v)kW ≤ k LkV →W kvkV

ii. for every L ∈ Lin(V, W ) we have the following. If K ∈ R is a constant


such that for all v ∈ V

k L(v)kW ≤ K kvkV ,

then
k LkV →W ≤ K.
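Properties i and ii can be checked numerically for a concrete matrix. The sketch below is my own illustration (the 2×2 matrix A and the closed-form singular-value computation are not taken from the notes): it computes the operator norm of a 2×2 matrix as its largest singular value, then verifies the bound on random vectors and its minimality over unit vectors.

```python
import math
import random

A = [[1.0, 2.0], [3.0, 4.0]]

def apply_mat(M, v):
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])

def norm2(v):
    return math.hypot(v[0], v[1])

def op_norm_2x2(M):
    # Largest singular value: sqrt of the largest eigenvalue of M^T M, read off
    # from the characteristic polynomial of the 2x2 matrix M^T M.
    t = sum(M[i][j] ** 2 for i in range(2) for j in range(2))   # trace(M^T M)
    d = (M[0][0] * M[1][1] - M[0][1] * M[1][0]) ** 2            # det(M^T M)
    return math.sqrt((t + math.sqrt(t * t - 4.0 * d)) / 2.0)

K = op_norm_2x2(A)

# Property i: ||A v|| <= K ||v|| for every v (tested on random vectors).
random.seed(1)
vectors = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(1000)]
property_i = all(norm2(apply_mat(A, v)) <= K * norm2(v) + 1e-9 for v in vectors)

# Property ii (minimality): the supremum of ||A v|| over unit vectors approaches K.
best = max(norm2(apply_mat(A, (math.cos(th), math.sin(th))))
           for th in (2.0 * math.pi * k / 10000 for k in range(10000)))
```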

If we combine the mean-value inequality from the previous section with


the chain rule, we obtain the following version of the mean value-inequality.

Corollary 18.2.1 (Mean-value inequality). Let f : Ω → W be differentiable


on an open domain Ω ⊂ V. Then, for all a, b ∈ Ω, if for every τ ∈ (0, 1),
also
(1 − τ ) a + τb ∈ Ω
then

k f (b) − f ( a)kW ≤ sup_{τ∈(0,1)} k( D f )_{(1−τ)a+τb} kV →W kb − akV .

The derivation of Corollary 18.2.1 from Lemma 18.1.1 is the topic of Exer-
cise 18.4.1.

Lemma 18.2.2. Suppose f : Ω → W is differentiable on Ω, and sup-


pose its derivative function D f : Ω → Lin(V, W ) is bounded. Let
a ∈ Ω and assume r > 0 is such that B( a, r ) ⊂ Ω. Then for all x ∈ B( a, r ),

kErr^f_a ( x )kW ≤ sup_{z∈ B( a,r )} k( D f )z − ( D f ) a kV →W k x − akV .

18.3 Continuous partial derivatives imply differentiability
As a first consequence of the Mean-Value Inequality, let us show that func-
tions with continuous partial derivatives are continuously differentiable.
This is quite useful, because this way, in order to conclude that a function
is differentiable, it suffices to show that the partial derivatives exist and
that they are continuous.
Let’s first define what it means for a function to be continuously differen-
tiable.

Definition 18.3.1. We say a function f : Ω → W is continuously differ-


entiable if it is differentiable and its derivative ( D f ) : Ω → Lin(V, W )
is a continuous function on Ω.

We are now ready to state the proposition.

Proposition 18.3.2. Let f : Ω → W be a function defined on some open


set Ω ⊂ Rd and let a ∈ Ω. Assume that there exists a radius r > 0 such
that for all x ∈ B( a, r ), and for all i ∈ {1, . . . , d}, the partial derivative

∂f
(x)
∂xi

exists and the function


∂f
:Ω→W
∂xi
is continuous on B( a, r ).
Then the function f is continuously differentiable on B( a, r ).

Proof. We need to show that for all b ∈ B( a, r ), the function f is dif-


ferentiable in b. Define ρ := r − kb − ak2 . Then B(b, ρ) ⊂ B( a, r ) and
therefore the partial derivatives exist and are continuous on B(b, ρ).
As a possible candidate for the derivative, we define the linear map
Lb : Rd → W by

Lb (v) = ∑_{i=1}^{d} (∂ f /∂xi )(b) vi .

Now define the error function Err^f_b : Ω → W by

Err^f_b ( x ) = f ( x ) − ( f (b) + Lb ( x − b)).

According to the definition of differentiability, in order to show that


the linear map Lb is the derivative of f in b ∈ Ω, we need to show that
lim_{x→b} kErr^f_b ( x )kW / k x − bk2 = 0.

Let e > 0.
By assumption, for every i ∈ {1, . . . , d}, the partial derivative

∂f
:Ω→W
∂xi

is continuous. Therefore, by the definition of continuity, for every i ∈


{1, . . . , d} there exists a δi > 0 such that for all z ∈ B(b, δi ),

k (∂ f /∂xi )(z) − (∂ f /∂xi )(b) kW < e/d.

Choose such δi and choose

δ := min(δ1 , . . . , δd , ρ).
Now let x ∈ B(b, δ). To show that Err^f_b ( x ) is small, we are going to apply the Mean-Value Inequality (a few times), on paths that are parallel to the axes in Rd .

We define the points

y0 := (b1 , . . . , bd )
y_j := ( x1 , . . . , x j , b j+1 , b j+2 , . . . , bd )    for j = 1, . . . , d − 1
yd := ( x1 , . . . , xd )

and write

Err^f_b ( x ) = f ( x ) − f (b) − ∑_{i=1}^{d} (∂ f /∂xi )(b)( xi − bi )
            = ∑_{i=1}^{d} ( f ( yi ) − f ( yi−1 )) − ∑_{i=1}^{d} (∂ f /∂xi )(b)( xi − bi ).

We now apply the Mean-Value Inequality to the functions gi given by

gi (t) = f ( x1 , . . . , xi−1 , t, bi+1 , . . . , bd ) − (∂ f /∂xi )(b)(t − bi )

and find that


kErr^f_b ( x )kW ≤ ∑_{i=1}^{d} sup_{z∈ B(b,δ)} k (∂ f /∂xi )(z) − (∂ f /∂xi )(b) kW | xi − bi |
              < (e/d) ∑_{i=1}^{d} | xi − bi |
              ≤ (e/d) · d · k x − bk2
              = e k x − bk2 .

Now that we know that the function f is differentiable, we also know


that

( D f )b ( x1 , . . . , xd ) = (∂ f /∂x1 )(b) x1 + · · · + (∂ f /∂xd )(b) xd .
and therefore, for c ∈ B( a, r ), using first the triangle inequality and

then the Cauchy-Schwarz inequality

k (( D f )c − ( D f )b )( x1 , . . . , xd )kW ≤ ∑_{i=1}^{d} k (∂ f /∂xi )(c) − (∂ f /∂xi )(b) kW | xi |
  ≤ ( ∑_{i=1}^{d} k (∂ f /∂xi )(c) − (∂ f /∂xi )(b) kW^2 )^{1/2} k x k2

It follows that

k( D f )c − ( D f )b kV →W ≤ ( ∑_{i=1}^{d} k (∂ f /∂xi )(c) − (∂ f /∂xi )(b) kW^2 )^{1/2}

so that the continuity of ( D f ) follows from the continuity of the partial


derivatives.
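The proof can be mirrored numerically: assemble the candidate derivative Lb from the partial derivatives and watch kErr^f_b ( x )kW / k x − bk2 shrink as x → b. A sketch (my own illustration; the choice f ( x1 , x2 ) = sin( x1 ) e^{x2}, the base point b and the direction are assumptions, not from the notes):

```python
import math

def f(x1, x2):
    return math.sin(x1) * math.exp(x2)

def df_dx1(x1, x2):
    return math.cos(x1) * math.exp(x2)

def df_dx2(x1, x2):
    return math.sin(x1) * math.exp(x2)

b = (0.5, 0.3)

def err_ratio(h, direction):
    # ||Err_b(x)|| / ||x - b|| for x = b + h * direction, with the candidate
    # derivative L_b assembled from the partial derivatives as in the proof.
    u1, u2 = direction
    x = (b[0] + h * u1, b[1] + h * u2)
    L = df_dx1(*b) * (x[0] - b[0]) + df_dx2(*b) * (x[1] - b[1])
    return abs(f(*x) - f(*b) - L) / (h * math.hypot(u1, u2))

ratios = [err_ratio(10.0 ** (-k), (0.6, 0.8)) for k in range(1, 7)]
```

The ratios shrink roughly proportionally to h, as expected for a function with continuous second partials.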

18.4 Exercises
Exercise 18.4.1. Prove Corollary 18.2.1.

Exercise 18.4.2. Give a proof of Lemma 18.2.2.

Exercise 18.4.3. Define the subset Ω ⊂ R2 as follows

Ω := {( x1 , x2 ) ∈ R2 | k( x1 , x2 )k2 > 1.9}.

Let f : Ω → W be a differentiable function and assume that for all a ∈ Ω,

k( D f ) a kR2 →W ≤ 5.

Prove that
k f ((2, 0)) − f ((−2, 0))kW ≤ 10π.
Exercise 18.4.4. Consider the function f : R2 → R given by

f (( x1 , x2 )) := ( x1 )^2 ( x2 )^7 / (( x1 )^2 + ( x2 )^2 )  if ( x1 , x2 ) ≠ (0, 0),   and   f ((0, 0)) := 0.

a. Show that f is differentiable on R2 by showing that the partial deriva-


tives exist and are continuous.

b. For a ∈ R2 , compute ∇ f ( a).


Chapter 19

Higher order derivatives

The second order derivative is the derivative of the derivative, the third
order derivative is the derivative of the second order derivative, the fourth
order derivative is the derivative of the third order derivative, etc. This
way, we create quite complicated objects, and I would therefore like to
encourage you to, at least at first, study what the statements in this chapter
are, and what the mathematical objects are, rather than the proofs of the
statements.
As a bit of help, here’s a list of most important messages for this chapter:

• That the (n + 1)st derivative is the derivative of the nth derivative


• The interpretation of the nth order derivative in terms of iterated
directional derivatives.
• Concluding higher-order differentiability from continuity of higher
order derivatives
• The symmetry of nth order derivatives

19.1 Definition of higher order derivatives


Higher order derivatives are defined inductively. If f : Ω → W is differ-
entiable on a domain Ω ⊂ V, then the derivative itself can be interpreted


as a function
( D f ) : Ω → Lin(V, W )
i.e. it is a function from Ω to Lin(V, W ). Because Lin(V, W ) is a finite-
dimensional vector space again, we can use the definition of differentia-
bility to check whether the function ( D f ) : Ω → Lin(V, W ) is differen-
tiable in a point a. If so, we denote the derivative of ( D f ) in the point a by
( D ( D f )) a .
If ( D f ) is differentiable in every point a in Ω, then we say f is twice differentiable, and the second derivative is a function
( D ( D f )) : Ω → Lin(V, Lin(V, W )).
Similarly, the third derivative is a function
( D ( D ( D f ))) : Ω → Lin(V, Lin(V, Lin(V, W ))).

The general definition can be given by induction. We first define the space
Linn (V, W ) inductively.

Definition 19.1.1. We set Lin1 (V, W ) := Lin(V, W ) and for every n ∈


N \ {0}, we define Linn+1 (V, W ) := Lin(V, Linn (V, W )).

Definition 19.1.2 (Higher-order derivatives). Let n ∈ N \ {0, 1}. Sup-


pose f : Ω → W is n times differentiable on a ball B( a, r ) ⊂ Ω. We then
say that f is (n + 1) times differentiable in the point a if the function

D n f : B( a, r ) → Linn (V, W )

is differentiable in a. The (n + 1)th derivative in the point a is then


defined as

( D n+1 f ) a := ( D ( D n f )) a ∈ Linn+1 (V, W ).

19.2 Multi-linear maps


The space Linn (V, W ) is a bit cumbersome to work with, but we may
equivalently interpret elements from Linn (V, W ) as so-called multi-linear

maps from the n-fold Cartesian product

V ×n = V × · · · × V   (n times)

to W.

Definition 19.2.1 (Multi-linear maps). A map L : V ×n → W is called


multi-linear, or n-linear, if for every i ∈ {1, . . . , n} and every v1 , . . . , vi−1 , vi+1 , . . . , vn ∈
V, the map
u 7→ L(v1 , . . . , vi−1 , u, vi+1 , . . . , vn )
(which is a map from the vector space V to the vector space W) is
linear. We will denote the vector space of n-linear maps from V ×n to
W by MLin(V ×n , W ).

The statement that we may equivalently interpret elements from Linn (V, W )
as multi-linear maps, precisely means that there is an invertible linear map
from Linn (V, W ) to MLin(V ×n , W ) that preserves norm (with the choice of
norm on MLin(V ×n , W ) that we will give later). Intuitively, this has as a consequence that for all intents and purposes these two spaces are the same.
The linear map Jn that brings elements in Linn (V, W ) to multilinear maps
in MLin(V ×n , W ) is given by

(Jn A)(v1 , . . . , vn ) = A(v1 )(v2 ) · · · (vn ).

More precisely, we define the map Jn inductively, namely

J1 A : = A

while for n ∈ N \ {0},

(Jn+1 A)(v1 , · · · , vn+1 ) := Jn ( A(v1 ))(v2 , · · · , vn+1 )

We define the following norm on the space MLin(V ×n , W ) of multi-linear


maps from V ×n to W:

k LkMLin(V ×n ,W ) = sup_{kv1 kV ≤1,...,kvn kV ≤1} k L(v1 , . . . , vn )kW    (19.2.1)

We also define inductively the maps Kn : MLin(V ×n , W ) → Linn (V, W ),


which will turn out to be the inverses of the map Jn . We define

K1 B = B

and
(Kn+1 B)(v1 ) = Kn ( B(v1 , · · · )).

Proposition 19.2.2. The map Jn is invertible with inverse Kn , and it preserves the norm.

Proof. We are going to prove this by induction.


For n = 1, the maps Jn and Kn are both just the identity map. Therefore it is clear that they are each other's inverses. They also preserve the norm, because the norms k · kV →W and k · kMLin(V ×1 ,W ) are actually the same.
Now let n ∈ N \ {0} and assume the statement is proven for Jn . We
will first show that Kn+1 ◦ Jn+1 is the identity.
Let A ∈ Linn+1 (V, W ) and let v1 ∈ V. Then
 
(Kn+1 ◦ Jn+1 ( A))(v1 ) = Kn+1 (Jn+1 ( A)) (v1 )
 
= Kn Jn+1 ( A)(v1 , · · · )
= Kn (Jn ( A(v1 )))
= A ( v1 ).

We will now show that Jn+1 ◦ Kn+1 is the identity. Let v1 ∈ V and let
B ∈ MLin(V ×(n+1) , W ). Then

(Jn+1 (Kn+1 ( B)))(v1 , · · · ) = Jn ((Kn+1 B)(v1 ))


= Jn (Kn ( B(v1 , · · · )))
= B ( v1 , · · · ).

Finally, Jn+1 preserves the norm since

k AkLinn+1 (V,W ) = sup k A(v1 )kLinn (V,W )


k v1 kV ≤1
= sup kJn ( A(v1 ))kMLin(V ×n ,W )
k v1 kV ≤1
= sup sup kJn ( A(v1 ))(v2 , . . . , vn+1 )kW
kv1 kV ≤1 kv2 kV ≤1,...,kvn+1 kV ≤1
= sup sup kJn+1 A(v1 , v2 , . . . , vn+1 )kW
kv1 kV ≤1 kv2 kV ≤1,...,kvn+1 kV ≤1
= kJn+1 AkMLin(V ×(n+1) ,W ) .

As a consequence of the previous proposition, it doesn’t really matter


whether we view a map as an element of Linn (V, W ) or as an element of
MLin(V ×n , W ).
We will therefore just leave out the explicit application of Jn , i.e. we write
A instead of Jn ( A) and use notation
A ( v1 , . . . , v n )
and
A ( v1 ) · · · ( v n )
interchangeably.
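The maps Jn and Kn are nothing but uncurrying and currying. A minimal sketch in Python for n = 2 (my own modelling choice, not part of the notes: linear maps are represented as plain functions, V = R^2, W = R, and the concrete A below is an arbitrary example):

```python
# V = R^2, W = R. An element of Lin(V, Lin(V, W)) is modelled as a function
# v -> (function u -> number); a bilinear map as a function (v, u) -> number.

def J(A):
    # Uncurrying: Lin(V, Lin(V, W)) -> bilinear maps, (J A)(v, u) = A(v)(u).
    return lambda v, u: A(v)(u)

def K(B):
    # Currying: bilinear maps -> Lin(V, Lin(V, W)), (K B)(v) = B(v, .).
    return lambda v: (lambda u: B(v, u))

# A concrete element of Lin(R^2, Lin(R^2, R)):
A = lambda v: (lambda u: v[0] * u[0] + 2.0 * v[0] * u[1] + 3.0 * v[1] * u[1])

B = J(A)
A_back = K(B)

samples = [((1.0, 0.0), (0.0, 1.0)), ((2.0, -1.0), (0.5, 3.0)), ((1.0, 1.0), (1.0, 1.0))]
round_trip_ok = all(A(v)(u) == A_back(v)(u) == B(v, u) for v, u in samples)
```

The round trip Kn ◦ Jn = id is visible here as the fact that currying after uncurrying reproduces the original nested function, value for value.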

19.3 Relation to n-fold directional derivatives


It is a bit difficult sometimes to interpret nth order derivatives, but it gets
easier if we relate them to directional derivatives. If f : Ω → W is a
function, defined on an open subset Ω of V, if v1 ∈ V and the directional
derivative ( Dv1 f ) a exists in every point a ∈ Ω, then we can build the func-
tion
( Dv1 f ) : Ω → W, a 7 → ( Dv 1 f ) a
that maps a ∈ Ω to ( Dv1 f ) a ∈ W. Now we can continue and take a new
direction v2 ∈ V and check if the directional derivatives of the function

( Dv1 f ) in the direction of v2 exists in the point a. In notation, we can check


whether the directional derivative
( Dv2 ( Dv1 f )) a
exists. We call this a two-fold directional derivative. If it exists for every
a ∈ Ω, then we can consider the function
( Dv2 ( Dv1 f )) : Ω → W, a 7→ ( Dv2 ( Dv1 f )) a
and see if directional derivatives of this function exist to get three-fold
directional derivatives. Continuing in this way, we generally obtain n-fold
directional derivatives.
The following proposition states that if a function f is n times differen-
tiable, then also all n-fold directional derivatives exist and the nth deriva-
tive and the n-fold directional derivatives have a simple relationship to
each other.

Proposition 19.3.1. Suppose a function f : Ω → W is n times differen-


tiable in a point a ∈ Ω. Then all directional n-fold derivatives exist in
a and for all v1 , . . . , vn ∈ V,

( D n f ) a (vn , vn−1 , · · · , v2 , v1 ) = ( Dvn ( Dvn−1 · · · ( Dv2 ( Dv1 f )) · · · )) a .

For functions defined on a subset of Rd , we can relate nth order derivatives


to partial derivatives of partial derivatives of partial derivatives (n times).
In particular if a function f : Ω → W, with Ω ⊂ Rd , is n times differen-
tiable, then all partial derivatives up to order n exist. For instance, if f is 3
times differentiable in a point a ∈ Ω, then
 
( D^3 f ) a ( e1 , e5 , e2 ) = (∂/∂x1 )(∂/∂x5 )(∂/∂x2 ) f ( a).
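Iterated directional derivatives can be approximated by nesting central difference quotients. A sketch (my own illustration; the polynomial f ( x, y) = x^3 y^2, the point a and the hand-computed Hessian are assumptions, not from the notes):

```python
def f(x, y):
    return x ** 3 * y ** 2

def directional(g, a, u, h=1e-3):
    # Central-difference approximation of the directional derivative (D_u g)_a.
    return (g(a[0] + h * u[0], a[1] + h * u[1])
            - g(a[0] - h * u[0], a[1] - h * u[1])) / (2.0 * h)

def twofold(a, u, v, h=1e-3):
    # (D_v (D_u f))_a, approximated by nesting difference quotients.
    du_f = lambda x, y: directional(f, (x, y), u, h)
    return directional(du_f, a, v, h)

a = (1.0, 1.0)
u, v = (1.0, 2.0), (3.0, 1.0)
# Hessian of f at (1, 1), computed by hand: f_xx = 6xy^2, f_xy = 6x^2 y, f_yy = 2x^3.
H = [[6.0, 6.0], [6.0, 2.0]]
exact = sum(v[i] * H[i][j] * u[j] for i in range(2) for j in range(2))
approx = twofold(a, u, v)
```

Here `exact` is ( D^2 f ) a (v, u) written out via the Hessian, matching the relation between second derivatives and two-fold directional derivatives.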

19.4 A criterion for higher differentiability


The following theorem is often useful in practice to verify that a function
is n times differentiable.

Theorem 19.4.1. Let f : Ω → W where Ω is an open subset of Rd . If


all partial derivatives of f of order less than or equal to n exist, and if
all partial derivatives of order n are continuous on Ω, then f is n times
differentiable on Ω.

19.5 Symmetry of second order derivatives

Lemma 19.5.1. Let f : Ω → W be a function defined on an open do-


main Ω ⊂ V. Let a ∈ Ω and assume that f is twice differentiable in a.
Then for all u, v ∈ V,

( D2 f ) a (u, v) = ( D2 f ) a (v, u).

Proof. We will show that for all u, v ∈ V, the limit

lim_{t→0} (1/t^2) ( f ( a + tu + tv) − f ( a + tu) − f ( a + tv) + f ( a))    (19.5.1)

exists and is equal to ( D2 f ) a (u, v).
Note that the expression in (19.5.1) remains the same when u and v are
interchanged, so that the limit is also equal to

( D2 f ) a (v, u).

From there, we conclude that ( D2 f ) a (u, v) = ( D2 f ) a (v, u).


Consider now for s, t real numbers (that are small enough so that a +
su + tv ∈ Ω) the expression

k f ( a + su + tv) − f ( a + su) − f ( a + tv) + f ( a) − st( D2 f ) a (u, v)kW


= k f ( a + su + tv) − st( D2 f ) a (u, v) − f ( a + su) − f ( a + tv) + f ( a)kW

By the Mean-Value Inequality, applied to the function

g(t) := f ( a + su + tv) − st( D2 f ) a (u, v) − f ( a + tv)

we find that

k f ( a + su + tv) − st( D2 f ) a (u, v) − f ( a + su) − f ( a + tv) + f ( a)kW


= k g(t) − g(0)kW
≤ sup k( D f ) a+su+τv (v) − s( D2 f ) a (u, v) − ( D f ) a+τv (v)kW |t|
τ ∈(−|t|,|t|)
(19.5.2)

We now use the differentiability of the function ( D f ) : Ω → Lin(V, W )


in the point a so that
( D f ) a+su+τv = ( D f ) a + ( D ( D f )) a (su + τv) + Err^{D f}_a ( a + su + τv)

and

( D f ) a+τv = ( D f ) a + ( D ( D f )) a (τv) + Err^{D f}_a ( a + τv).
Note that the left-hand side and right hand side of these equations are
linear maps. We apply these linear maps to the vector v ∈ V and find

( D f ) a+su+τv (v) = ( D f ) a (v) + ( D ( D f )) a (su + τv)(v) + Err^{D f}_a ( a + su + τv)(v)
                  = ( D f ) a (v) + s( D2 f ) a (u, v) + τ ( D2 f ) a (v, v) + Err^{D f}_a ( a + su + τv)(v)

and

( D f ) a+τv (v) = ( D f ) a (v) + ( D ( D f )) a (τv)(v) + Err^{D f}_a ( a + τv)(v)
              = ( D f ) a (v) + τ ( D2 f ) a (v, v) + Err^{D f}_a ( a + τv)(v).

We now continue from the final line in the computation (19.5.2):

sup_{τ∈(−|t|,|t|)} k( D f ) a+su+τv (v) − s( D2 f ) a (u, v) − ( D f ) a+τv (v)kW |t|
= sup_{τ∈(−|t|,|t|)} kErr^{D f}_a ( a + su + τv)(v) − Err^{D f}_a ( a + τv)(v)kW |t|
≤ sup_{τ∈(−|t|,|t|)} ( kErr^{D f}_a ( a + su + τv)kV →W + kErr^{D f}_a ( a + τv)kV →W ) |t| kvkV .

It follows that for t ≠ 0,

k (1/t^2) ( f ( a + tu + tv) − f ( a + tu) − f ( a + tv) + f ( a)) − ( D2 f ) a (u, v) kW
≤ (1/|t|) sup_{τ∈(−|t|,|t|)} ( kErr^{D f}_a ( a + tu + τv)kV →W + kErr^{D f}_a ( a + τv)kV →W ) kvkV ,

and observe that the limit of the right-hand side as t → 0 is indeed


0.
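The difference quotient (19.5.1) can be evaluated numerically. The sketch below is my own illustration (the polynomial f, the point a and the hand-computed Hessian are assumptions, not from the notes): it checks that the quotient is symmetric in u and v and lies close to u^T H v = ( D2 f ) a (u, v).

```python
def f(x, y):
    return x ** 3 * y + x * y ** 2

a = (1.0, 2.0)
# Hessian of f at a, computed by hand: f_xx = 6xy, f_xy = 3x^2 + 2y, f_yy = 2x.
H = [[12.0, 7.0], [7.0, 2.0]]

def d2f(u, v):
    # (D^2 f)_a(u, v) = u^T H v, the bilinear form given by the Hessian.
    return sum(u[i] * H[i][j] * v[j] for i in range(2) for j in range(2))

def quotient(u, v, t):
    # The second-difference quotient from (19.5.1); visibly symmetric in u and v.
    def at(w, s):
        return f(a[0] + s * w[0], a[1] + s * w[1])
    uv = (u[0] + v[0], u[1] + v[1])
    return (at(uv, t) - at(u, t) - at(v, t) + f(*a)) / t ** 2

u, v = (1.0, 2.0), (3.0, -1.0)
t = 1e-4
q = quotient(u, v, t)
```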

19.6 Symmetry of higher-order derivatives


The previous section also has an immediate consequence for the symmetry
of higher-order derivatives. In fact, if f is n times differentiable in a point
a ∈ Ω, then for every permutation σ : {1, . . . , n} → {1, . . . , n} we have

( D n f ) a ( v 1 , v 2 , · · · , v n ) = ( D n f ) a ( v σ (1) , v σ (2) , · · · , v σ ( n ) )

With the following definition, we can express this by saying that if f is n times differentiable in a point a, then D n f can be viewed as an element of the space Symn (V, W ) of symmetric n-linear maps, where the space Symn (V, W ) is defined as follows.

Definition 19.6.1. We denote by Symn (V, W ) the collection of symmet-


ric, n-linear maps from V ×n to W. That is, a map S : V ×n → W is in
Symn (V, W ) if and only if it is linear in every argument and if for every

permutation σ : {1, . . . , n} → {1, . . . , n} it holds that

S(v1 , · · · , vn ) = S(vσ(1) , · · · , vσ(n) ).

In conclusion, we may as well write

D n f : Ω → Symn (V, W )

19.7 Exercises
The following exercises are mostly meant as practice to get familiar with
the concepts and theorems in this chapter.
Exercise 19.7.1. Consider the function f : Ω → R where Ω = R2 given by

f (( x1 , x2 )) = exp(( x1 )2 − ( x2 )3 )

Show that f is twice differentiable on R2 by going through the following


steps:

a. Show that the partial derivative functions of f , namely


∂ f /∂x1 : Ω → R   and   ∂ f /∂x2 : Ω → R
exist and compute them.

b. Show that the second order partial derivative functions

∂^2 f /∂x1 ∂x1 : Ω → R,   ∂^2 f /∂x2 ∂x1 : Ω → R,
∂^2 f /∂x1 ∂x2 : Ω → R,   ∂^2 f /∂x2 ∂x2 : Ω → R
exist and compute them.

c. Now show that the second order partial derivatives are continuous
and conclude, by quoting the right theorem, that f is twice differen-
tiable.

Exercise 19.7.2. Consider the function f : R2 → R given by

f (( x1 , x2 )) = ( x1 )5 ( x2 )8

a. For arbitrary u ∈ R2 , give the function

Du f : R2 → R

b. For arbitrary v ∈ R2 , give the function

( Dv ( Du f )) : R2 → R

c. Define the vectors u := (1, 3) and v := (7, 2). Let a = (1, 1) ∈ R2 .


Give
( D2 f ) a (v, u)

d. For the same choice of u, v and a, also give

( D2 f ) a (u, v)

Exercise 19.7.3. About a certain function f : R5 → R2 the following is


known in a point a ∈ R5 . The function is 4 times differentiable in a, and

( D4 f ) a (e2 , e3 , e3 , e5 ) = (5, 0)
( D4 f ) a (e2 , e3 , e5 , e5 ) = (2, 3)
(∂^4 f /∂x5 ∂x3 ∂x3 ∂x3 )( a) = (0, 1)
(∂^4 f /∂x5 ∂x3 ∂x5 ∂x3 )( a) = (1, 2)

Give
( D4 f ) a (e3 − 2e2 , 6e5 , e3 + e5 , e3 ).
Chapter 20

Polynomials and approximation


by polynomials

20.1 Homogeneous polynomials


A homogeneous polynomial in d variables of degree k is a polynomial
of which every term (i.e. every monomial) has degree precisely k. For
instance, the function

( x1 , x2 ) 7→ 3( x1 )^4 ( x2 ) + 2( x1 )^3 ( x2 )^2 + ( x2 )^5

is a homogeneous polynomial in two variables of degree 5.

Definition 20.1.1 (multi-index). A d-dimensional multi-index α of or-


der k ∈ N is a map {1, . . . , d} → N such that

α1 + · · · + α d = k

We write |α| for the order of a multi-index α.

If α is a multi-index, we use the notation


x^α = x1^{α1} x2^{α2} · · · xd^{αd}


Similarly, for a function f : Ω → W where Ω is a subset of Rd , we will use the notation

∂^{|α|} f /∂x^α := (∂/∂x1 )^{α1} · · · (∂/∂xd )^{αd} f .
We also define
α! := α1 !α2 ! · · · αd !

Note that

(∂^{|α|}/∂x^α ) x^α = α! .

This is maybe easier to appreciate in an example:

(∂^{14}/((∂x1 )^3 (∂x2 )^7 (∂x3 )^4 )) ( x1 )^3 ( x2 )^7 ( x3 )^4 = 3! 7! 4!

This observation brings us to the following proposition.

Proposition 20.1.2. Every homogeneous polynomial f : Rd → R of


degree n can be written as

f ( x ) = ∑_{|α|=n} (1/α!) sα x^α

for some coefficients sα ∈ R. Moreover, the coefficients sα are precisely determined by

sα = (∂^{|α|} f /∂x^α )(0).

Lemma 20.1.3. Given a basis v1 , . . . , vd of the vector space V, there is


a one-to-one correspondence between homogeneous polynomials of
degree n and Symn (V, R). More precisely there is an invertible linear
map F from Symn (V, R) to the vector space of homogeneous polynomials in d variables of degree n. With the linear map ι : Rd → V defined
as
ι ( x ) = x1 v1 + · · · + x d v d ,
the n-linear symmetric map S ∈ Symn (V, R) gets mapped to the homogeneous polynomial F := F (S) : Rd → R given by

F (S)( x ) = (1/n!) S(ι( x ), · · · , ι( x )).

Then the following equality holds for all x ∈ Rd :

F (S)( x ) = (1/n!) S(ι( x ), · · · , ι( x )) = ∑_{|α|=n} (1/α!) x^α S^{(α)} ,

where
S^{(α)} = S( vi1 , vi2 , . . . , vin )
and i1 , · · · , in ∈ {1, . . . , d} are such that v1 appears α1 times, v2 appears α2 times, etc. In particular,

S^{(α)} = (∂^{|α|} F/∂x^α )(0).

In particular, an element of Symn (V, R) is completely determined by


the values on the diagonal, i.e. if S , T ∈ Symn (V, R), then S = T if
and only if for all v ∈ V,

S(v, · · · , v) = T (v, · · · , v).

Proof. Let v1 , . . . , vd be a basis of V. By n-linearity of S , the map S is


completely determined by how S evaluates on basis vectors, i.e. by
the values of  
S v i1 , . . . , v i n

where i1 , . . . , in are indices in {1, . . . , d}. Moreover, because S is sym-


metric, it does not matter in which order the basis vectors appear.
Therefore we may introduce, for α a multi-index of order n, the notation

S^{(α)} := S( vi1 , · · · , vin )


where i1 , . . . , in are such that v1 appears α1 times, v2 appears α2 times, etc.
If we now compute S(u, · · · , u) where u := ∑_{j=1}^{d} x j v j , we find

(1/n!) S(u, · · · , u) = (1/n!) S( ∑_{j1 =1}^{d} x j1 v j1 , · · · , ∑_{jn =1}^{d} x jn v jn )
                    = (1/n!) ∑_{j1 =1}^{d} · · · ∑_{jn =1}^{d} x j1 · · · x jn S( v j1 , · · · , v jn )
                    = (1/n!) ∑_{|α|=n} (n!/α!) x^α S^{(α)}
                    = ∑_{|α|=n} (1/α!) x^α S^{(α)} .
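The multinomial-coefficient step in this computation can be checked in code: summing (n!/α!) x^α over all multi-indices of order n reproduces ( x1 + · · · + xd )^n, which is the special case S ≡ 1. A sketch (my own illustration; d = 3 and n = 4 are arbitrary choices):

```python
import math

def multi_indices(d, n):
    # All d-dimensional multi-indices alpha with |alpha| = n.
    if d == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in multi_indices(d - 1, n - first):
            yield (first,) + rest

def alpha_factorial(alpha):
    return math.prod(math.factorial(k) for k in alpha)

def x_pow(x, alpha):
    return math.prod(xi ** k for xi, k in zip(x, alpha))

def multinomial_sum(x, n):
    # sum over |alpha| = n of (n! / alpha!) * x^alpha, as in the computation above.
    return sum(math.factorial(n) // alpha_factorial(a) * x_pow(x, a)
               for a in multi_indices(len(x), n))

x, n = (1.0, 2.0, 3.0), 4
lhs = multinomial_sum(x, n)
rhs = sum(x) ** n  # (x1 + x2 + x3)^n
```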

Therefore the coefficients S^{(α)} can be read off by inspecting the polynomial F : Rd → R given by

F ( x ) := (1/n!) S(ι( x ), · · · , ι( x )) = ∑_{|α|=n} (1/α!) x^α S^{(α)} ,

or alternatively,

S^{(α)} = (∂^{|α|}/∂x^α ) F (0).

20.2 Taylor’s theorem


When a function f is n times differentiable in a point, it can be approxi-
mated well by its Taylor expansion.

Definition 20.2.1 (Taylor expansion). Let f : Ω → W be n times differentiable in a point a ∈ Ω. Then the function Ta,n : V → W given by

Ta,n ( x ) := f ( a) + ∑_{k=1}^{n} (1/k!) ( D^k f ) a ( x − a, · · · , x − a)

is called the Taylor expansion of f around a.

Taylor’s theorem says that the Taylor expansion provides a good approxi-
mation.

Theorem 20.2.2. Let Ω ⊂ V be open, let a ∈ Ω and suppose f : Ω → W


is n times differentiable in a point a ∈ Ω. Then there exists a function
Err a,n : Ω → W such that
f (v) = f ( a) + ∑_{k=1}^{n} (1/k!) ( D^k f ) a (v − a, · · · , v − a) + Err a,n (v)

and such that

lim_{v→a} kErr a,n (v)kW / kv − akV^n = 0.
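In one variable the theorem is easy to observe numerically: for f = exp and a = 0, the ratio |Err_{0,n}( x )|/| x |^n tends to 0 as x → 0. A sketch (my own illustration; the choice of exp, of n and of the sample points is an assumption, not from the notes):

```python
import math

def taylor_exp(x, n):
    # Taylor expansion of exp around a = 0, truncated at order n.
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

def error_ratio(x, n):
    # |Err_{0,n}(x)| / |x|^n, which Taylor's theorem says tends to 0 as x -> 0.
    return abs(math.exp(x) - taylor_exp(x, n)) / abs(x) ** n

n = 3
ratios = [error_ratio(10.0 ** (-k), n) for k in range(4)]
```

The ratios shrink roughly by a factor 10 per step, consistent with the error being of order | x |^{n+1}.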

Before we prove Taylor’s theorem, let’s record the following proposition.


It allows us to determine the derivatives of the Taylor expansion of f
around a, and in particular it allows us to conclude that the derivatives
up to order n of the Taylor expansion Ta,n of f of order n in the point a are
exactly the same as those of f in a, while the derivatives of order higher
than n all vanish.

Proposition 20.2.3. Let a ∈ V, let k ∈ N \ {0}, let S ∈ Symk (V, W ) and


consider the function f : V → W defined by

f ( x ) := (1/k!) S( x − a, . . . , x − a)
Then

i. for all b ∈ V,
( D k f )b = S

ii. for all b ∈ V and all j > k,

( D j f )b = 0,

iii. for all b ∈ V and all j < k and all u1 , . . . , u j ∈ V,

( D^j f )b ( u1 , · · · , u j ) = (1/(k − j)!) S(u1 , · · · , u j , b − a, · · · , b − a).

We can prove the above proposition either by a direct computation, or with help of the correspondence between homogeneous polynomials and symmetric multilinear forms of Lemma 20.1.3. The first approach is illustrated by Exercise 20.4.4, while the latter approach is illustrated by Exercise 20.4.2.
We will now give a sketch of the proof of Taylor’s theorem.

Proof of Theorem 20.2.2. We give a sketch of the proof. First of all, by us-
ing the previous proposition we may without loss of generality assume
that f ( a) = 0 and that all derivatives of f up to and including order n
vanish (because otherwise we just consider the function g := f − Ta,n ).
By repeatedly applying the Mean-Value Inequality we then find that

k f (v) − f ( a)kW ≤ sup_{τ∈(0,1)} k( D f )_{(1−τ)a+τv} kV →W kv − akV
≤ sup_{τ∈(0,1)} k( D2 f )_{(1−τ)a+τv} kSym2 (V,W ) kv − akV^2
≤ · · ·
≤ sup_{τ∈(0,1)} k( D^{n−1} f )_{(1−τ)a+τv} kSym^{n−1} (V,W ) kv − akV^{n−1}

and therefore

k f (v) − f ( a)kW ≤ sup_{τ∈(0,1)} kErr^{D^{n−1} f}_a ((1 − τ ) a + τv)kSym^{n−1} (V,W ) kv − akV^{n−1} .

The following proposition is in some sense a uniqueness statement about


the Taylor expansion.

Proposition 20.2.4. Suppose f : Ω → W and g : Ω → W are both n


times differentiable in a ∈ Ω and
lim_{x→a} k f ( x ) − g( x )kW / k x − akV^n = 0.

Then for all k = 0, . . . , n,

( Dk f )a = ( Dk g)a .

We would now like to give a version of Taylor’s theorem in coordinates.


For that, we first need the following proposition.

Proposition 20.2.5. Let Ω ⊂ Rd be open, let a ∈ Ω and suppose f :


Ω → Rm is n times differentiable in a ∈ Ω. Then for all k = 1, . . . , n and all x ∈ Rd ,

(1/k!) ( D^k f ) a ( x, · · · , x ) = ∑_{|α|=k} (1/α!) (∂^{|α|} f /∂x^α )( a) x^α .

The previous proposition also implies that when q is a homogeneous polynomial of degree k, then

q( x ) = (1/k!) ( D^k q)_0 ( x, · · · , x ).

Example 20.2.6. Let a ∈ R2 and suppose we want to find a function f : R2 → R such that for all u ∈ R2 ,

( D3 f ) a (u, u, u) = 2(u1 )^2 (u2 ).

Note that it is necessary for the right-hand side to be a homogeneous polynomial of degree three, otherwise such an f cannot be found. We call this homogeneous polynomial q : R2 → R.

By an earlier proposition, we know that

q(u) = ∑_{|α|=3} (1/α!) sα u^α

where

sα = (∂^3 q/∂x^α )(0).
We know by the previous proposition that if such a function f exists, then for all u ∈ R2 ,

∑_{|α|=3} (1/α!) (∂^{|α|} f /∂x^α )( a) u^α = (1/3!) ( D3 f ) a (u, u, u) = (2/3!) (u1 )^2 (u2 ).
|α|=3

If we compare the left-hand side and the right-hand side, this suggests looking for a function f such that

(1/2!) (∂^3 f /((∂x1 )^2 ∂x2 ))( a) = 2/3!

(here 1/2! = 1/α! is the coefficient belonging to the multi-index α = (2, 1))

and all other third order partial derivatives in a vanish. Now the polynomial f : R2 → R given by

f ( x ) = (1/3) ( x1 − a1 )^2 ( x2 − a2 )

is such a polynomial.
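The construction can be verified numerically: along the line t 7→ a + tu, the function reduces to a cubic in t, and for cubics a third central difference quotient recovers the third derivative exactly. A sketch (my own illustration; the concrete point a = (0.5, −1.0) is an arbitrary choice):

```python
A_PT = (0.5, -1.0)  # the point "a" of the example (my own concrete choice)

def f(x1, x2):
    # The polynomial constructed in the example: f(x) = (1/3)(x1 - a1)^2 (x2 - a2).
    return (x1 - A_PT[0]) ** 2 * (x2 - A_PT[1]) / 3.0

def g(t, u):
    return f(A_PT[0] + t * u[0], A_PT[1] + t * u[1])

def third_derivative_at_zero(u, h=0.1):
    # Third central difference of t -> f(a + t u); exact for cubic polynomials,
    # since its error term involves the fifth derivative, which vanishes here.
    return (g(2 * h, u) - 2.0 * g(h, u) + 2.0 * g(-h, u) - g(-2 * h, u)) / (2.0 * h ** 3)

u = (2.0, 3.0)
value = third_derivative_at_zero(u)   # approximates (D^3 f)_a(u, u, u)
expected = 2.0 * u[0] ** 2 * u[1]     # the target 2 (u1)^2 (u2)
```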

Theorem 20.2.7 (Taylor’s theorem in coordinates). Let Ω ⊂ Rd be


open, let a ∈ Ω and suppose f : Ω → Rm is n times differentiable
in the point a ∈ Ω. Then, defining the function Err a,n : Ω → Rm by

Err a,n ( x ) := f ( x ) − ( f ( a) + ∑_{1≤|α|≤n} (1/α!) (∂^{|α|} f /∂x^α )( a)( x − a)^α )

we have that

lim_{x→a} kErr a,n ( x )k2 / k x − ak2^n = 0.

Definition 20.2.8. Let Ω ⊂ Rd be open, let a ∈ Ω and suppose f : Ω →


R is n times differentiable in the point a ∈ Ω. Then the polynomial

1 ∂|α| f
f ( a) + ∑ α
( a)( x − a)α
1≤|α|≤n
α! ∂x

is called the nth order Taylor polynomial of f around the point a.

In the context of Proposition 20.2.4, if g is an at most nth degree polynomial such that

lim_{x→a} | f ( x ) − g( x )| / k x − ak2^n = 0,

then it is necessarily the Taylor polynomial of f .
The approximation formula simplifies a bit if f is just a function of one
variable.
Corollary 20.2.9 (Taylor’s theorem for functions of one variable). Let Ω ⊂
R be open and let f : Ω → R be a function such that f is n times differen-
tiable in a point a ∈ Ω. Then there exists a function Err a,n : Ω → R such
that
f ( x ) = f ( a) + ∑_{k=1}^{n} (1/k!) f^{(k)} ( a) · ( x − a)^k + Err a,n ( x )

and such that

lim_{x→a} |Err a,n ( x )| / | x − a|^n = 0.

Here f^{(1)} ( a) is notation for f 0 ( a), f^{(2)} ( a) is notation for f 00 ( a), etc.


Finally, we note that we can give an explicit expression for the error term if
f maps to R and f has higher differentiability than the order of the Taylor
polynomial.

Theorem 20.2.10 (Taylor's theorem with Lagrange remainder). Let Ω ⊂ R^d be open and let f : Ω → R be (n + 1) times differentiable on Ω. Let a ∈ Ω. Then for every x ∈ Ω such that the line segment from a to x lies in Ω, there exists a θ ∈ (0, 1) such that

f(x) = f(a) + ∑_{1≤|α|≤n} (1/α!) (∂^{|α|}f/∂x^α)(a) (x − a)^α + (1/(n+1)!) (D^{n+1}f)_{a+θ(x−a)}(x − a, …, x − a).

20.3 Taylor approximations of standard functions


When we apply the theorems of the previous sections to the standard func-
tions, we get the following approximations.

Corollary 20.3.1. For every n ∈ N, it holds that


exp(x) = ∑_{k=0}^{n} x^k / k! + O(|x|^{n+1}),

sin(x) = ∑_{k=0}^{n} (−1)^k x^{2k+1} / (2k+1)! + O(|x|^{2n+3}),

cos(x) = ∑_{k=0}^{n} (−1)^k x^{2k} / (2k)! + O(|x|^{2n+2}),

ln(1 + x) = ∑_{k=1}^{n} (−1)^{k+1} x^k / k + O(|x|^{n+1}).

The notation

f(x) = g(x) + O(|x|^N)

should be read as: there exists a C ≥ 0 and a δ > 0 such that for all x ∈ (−δ, δ),

|f(x) − g(x)| ≤ C |x|^N.
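These big-O statements are easy to probe numerically. The following Python sketch (our own illustration, not part of the notes; the helper name taylor_exp is ours) computes the degree-n Taylor polynomial of exp and checks that the error behaves like |x|^{n+1}: dividing the error by x^{n+1} yields an almost constant ratio as x shrinks.

```python
import math

def taylor_exp(x, n):
    """Degree-n Taylor polynomial of exp around 0: sum_{k=0}^{n} x^k / k!."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

n = 3
for x in [0.1, 0.05, 0.025]:
    err = abs(math.exp(x) - taylor_exp(x, n))
    # err / x^(n+1) should stay bounded (it tends to 1/(n+1)! = 1/24 here)
    print(x, err, err / x ** (n + 1))
```

The same experiment works for sin, cos and ln(1 + x) with the partial sums from Corollary 20.3.1.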

20.4 Exercises
Exercise 20.4.1. Consider the function f : R² → R given by

f((x₁, x₂)) := sin( (π/2) ((x₁)² + (x₂)) ).

a. Show that f is differentiable on R2 and compute the partial deriva-


tive functions
∂f/∂x₁ : R² → R   and   ∂f/∂x₂ : R² → R.

b. Show that all second order partial derivatives of f exist, compute


them, and show that they are continuous.

c. Give the second-order Taylor polynomial T2 : R2 → R of f around


the point (1, 2).

d. Show that

lim_{(x₁,x₂)→(1,2)} |T₂(x) − f(x)| / ‖(x₁, x₂) − (1, 2)‖₂² = 0.

Exercise 20.4.2. The aim of this exercise is essentially to provide a proof of


Proposition 20.2.3 in case k = 2. You can therefore not use this proposition
in this exercise.
Let S be a symmetric 2-linear map from V^{×2} to W. Now consider the map f : V → W given by

f (v) := S(v, v).

a. Show that f is differentiable on V, and that for all a ∈ V, and all


u ∈ V,
( D f ) a (u) = 2S(u, a)

b. Show that f is twice differentiable on V, and that for all a ∈ V, and


all u ∈ V,
( D2 f ) a (u, u) = 2S(u, u).

Note: You can use that there exists a constant K ≥ 0 such that for all
u, v ∈ V it holds that

‖S(u, v)‖_W ≤ K ‖u‖_V ‖v‖_V.

Some extra information: Such a constant exists because we assume that V


and W are finite-dimensional, and the smallest such constant is actually the norm ‖S‖_{MLin(V^{×2},W)} of S that was introduced in (19.2.1).

Exercise 20.4.3. Determine whether the following limit exists, and if so,
determine its value:
lim_{(x₁,x₂)→(0,0)} ( exp((x₁)² + (x₂)²) − 1 ) / sin((x₁)² + (x₂)²).

Exercise 20.4.4. Let α and β be two d-dimensional multi-indices.

a. Suppose that for all i ∈ {1, . . . , d}, it holds that αi ≤ β i . Give an


expression for
(∂^{|α|}/∂x^α) x^β
and prove that your expression is correct. Hint: Make an induction
argument on the order of α.

b. Suppose that there is an i ∈ {1, . . . , d} such that αi > β i . Show that

(∂^{|α|}/∂x^α) x^β = 0.
Chapter 21

Banach fixed point theorem

21.1 The Banach fixed point theorem

Definition 21.1.1 (Contraction). Let ( X, distX ) and (Y, distY ) be metric


spaces. Let D ⊂ X be a subset of X and let f : D → Y be a function.
We say that f is a contraction if there exists a κ ∈ [0, 1) such that for all
x, z ∈ D, it holds that

distY ( f ( x ), f (z)) ≤ κdistX ( x, z).

If f satisfies this inequality for all x, z ∈ D and some constant κ ∈ [0, 1), then
we will also sometimes say that f is a κ-contraction.

We first formulate the theorem for arbitrary metric spaces. Afterwards, we


give a version for Rd .

Theorem 21.1.2 (Banach fixed point theorem, metric space version).
Let (X, dist_X) be a metric space. Let D ⊂ X be a non-empty and complete subset of X. Let f : D → D be a function, let κ ∈ [0, 1) and assume that for all x, z ∈ D, it holds that

dist_X(f(x), f(z)) ≤ κ dist_X(x, z).

Then there exists a unique point p ∈ D such that f(p) = p.
Moreover, for all q ∈ D, if we define the sequence (x^(n))ₙ by

x^(0) := q,
x^(n+1) := f(x^(n))   for n ∈ N,        (21.1.1)

then the sequence (x^(n))ₙ converges to p, and for all n ∈ N,

dist_X(x^(n), p) ≤ (κⁿ/(1−κ)) dist_X(x^(1), x^(0)).        (21.1.2)

Before giving the proof of the theorem, we first formulate a version for the case where X is just Rᵈ. Because every closed subset of Rᵈ is complete, we get the following theorem.

Theorem 21.1.3 (Banach fixed point theorem, Rᵈ version). Let D ⊂ Rᵈ be closed and non-empty. Let f : D → D be a function, let κ ∈ [0, 1) and assume that for all x, z ∈ D, it holds that

‖f(x) − f(z)‖₂ ≤ κ ‖x − z‖₂.

Then there exists a unique point p ∈ D such that f(p) = p.
Moreover, for all q ∈ D, if we define the sequence (x^(n))ₙ by

x^(0) := q,
x^(n+1) := f(x^(n))   for n ∈ N,        (21.1.3)

then the sequence (x^(n))ₙ converges to p, and for all n ∈ N,

‖x^(n) − p‖₂ ≤ (κⁿ/(1−κ)) ‖x^(1) − x^(0)‖₂.        (21.1.4)

Proof of the metric space version of the Banach Fixed Point theorem. We will
first show that f has at most one fixed point. To this end, let z ∈ X be
a fixed point of f , i.e. f (z) = z, and let p ∈ X be another fixed point of

f . We need to show that z = p. Because z and p are fixed points, we


find that

distX ( p, z) = distX ( f ( p), f (z)) ≤ κdistX ( p, z)

so that
(1 − κ )distX ( p, z) = 0
and therefore distX ( p, z) = 0, from which it indeed follows that p = z.
We will now show that for all q ∈ D, the sequence ( x (n) )n defined
inductively by (21.1.1) converges to a fixed point p ∈ D. From this, of
course, it follows immediately that such a fixed point exists.
Let q ∈ D.
Now define the sequence ( x (n) )n inductively according to (21.1.1), i.e.

x (0) : = q
x ( n +1) : = f ( x ( n ) ) for n ∈ N.

Then for all n ∈ N,

dist_X(x^(n+2), x^(n+1)) = dist_X(f(x^(n+1)), f(x^(n))) ≤ κ dist_X(x^(n+1), x^(n)).        (21.1.5)

It follows by induction that for all n ∈ N it holds that

dist_X(x^(n+1), x^(n)) ≤ κⁿ dist_X(x^(1), x^(0)).        (21.1.6)



By the triangle inequality, for all m, n ∈ N with m ≤ n,

dist_X(x^(n), x^(m)) ≤ dist_X(x^(n), x^(n−1)) + · · · + dist_X(x^(m+1), x^(m))
 = ∑_{ℓ=m}^{n−1} dist_X(x^(ℓ+1), x^(ℓ))
 ≤ ∑_{ℓ=m}^{n−1} κ^ℓ dist_X(x^(1), x^(0))
 = ((κ^m − κ^n)/(1−κ)) dist_X(x^(1), x^(0))
 ≤ (κ^m/(1−κ)) dist_X(x^(1), x^(0)).

Therefore, the sequence ( x (n) ) is a Cauchy sequence. Since D is com-


plete, it follows that ( x (n) ) is convergent, to some p ∈ D say.
We will now show that f ( p) = p. For this, we first write inequality
(21.1.6) as
distX ( f ( x (n) ), x (n) ) ≤ κ n distX ( x (1) , x (0) ).
Since the sequence ( x (n) ) converges to p, it follows for instance from
Proposition 5.6.3 that also the sequence ( f ( x (n) )) converges to p. On
the other hand, f is Lipschitz continuous and therefore continuous (see
Exercise 13.11.5) so that by the sequence characterization of continuity
we know that
p = lim f ( x (n) ) = f ( p).
n→∞

Note that for all m, n ∈ N with m > n,

dist_X(p, x^(n)) ≤ dist_X(p, x^(m)) + dist_X(x^(m), x^(n))
 ≤ dist_X(p, x^(m)) + (κⁿ/(1−κ)) dist_X(x^(1), x^(0)).

By taking the limit m → ∞, we find that

dist_X(p, x^(n)) ≤ (κⁿ/(1−κ)) dist_X(x^(1), x^(0)).
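The iteration in the proof is directly usable as an algorithm. Here is a small Python sketch (our own illustration; the helper banach_iterate is not from the notes): cos maps [0, 1] into itself and |cos′(t)| = sin(t) ≤ sin(1) < 1 there, so the theorem applies with κ = sin(1), and the iterates also obey the a priori estimate (21.1.2).

```python
import math

def banach_iterate(f, q, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = f(x_n) from x_0 = q until successive iterates
    differ by at most tol; return the approximate fixed point."""
    x = q
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) <= tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")

p = banach_iterate(math.cos, 1.0)
print(p)                         # the unique fixed point of cos on [0, 1]
assert abs(math.cos(p) - p) < 1e-10

# check the a priori error bound (21.1.2) with kappa = sin(1)
kappa = math.sin(1.0)
x = 1.0
for n in range(50):
    bound = kappa ** n / (1 - kappa) * abs(math.cos(1.0) - 1.0)
    assert abs(x - p) <= bound + 1e-12
    x = math.cos(x)
```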

21.2 An example

Example 21.2.1. Consider the function

F : [0, 1]2 → R2

given by

F((x₁, x₂)) := ( (1/6)(x₂)² + (1/3)x₁ , (1/π) arctan(x₁) + 1/2 ).

We would like to show that there is a unique fixed point q of the func-
tion F in the set [0, 1]2 (i.e. F (q) = q). To do this, we would like to
apply Banach’s fixed point theorem, so we need to check the condi-
tions of the theorem.
First note that [0, 1]2 is closed.
We will now show that the range of F is contained in [0, 1]2 ⊂ R2 .
Let (x₁, x₂) ∈ [0, 1]². Then

0 ≤ F₁((x₁, x₂)) = (1/6)(x₂)² + (1/3)x₁ ≤ 1/6 + 1/3 < 1,

and because −π/2 < arctan(z) < π/2 for all z ∈ R,

F₂((x₁, x₂)) = (1/π) arctan(x₁) + 1/2 < (1/π)(π/2) + 1/2 = 1

and

F₂((x₁, x₂)) = (1/π) arctan(x₁) + 1/2 > −(1/π)(π/2) + 1/2 = 0.

So indeed F maps into [0, 1]².

We will now show that F is a contraction. In fact we will show that for all x, y ∈ [0, 1]²,

‖F(x) − F(y)‖₂ ≤ κ ‖x − y‖₂

with κ = (1/3)√3 ≈ 0.577, so indeed κ is strictly smaller than 1.
In proving such an inequality, the Mean-Value Inequality will often
play an important role.

|F₁((x₁, x₂)) − F₁((y₁, y₂))| = |(1/6)(x₂)² + (1/3)x₁ − (1/6)(y₂)² − (1/3)y₁|
 ≤ (1/6)|(x₂)² − (y₂)²| + (1/3)|x₁ − y₁|
 = (1/6)|x₂ − y₂||x₂ + y₂| + (1/3)|x₁ − y₁|
 ≤ (1/6)|x₂ − y₂|(|x₂| + |y₂|) + (1/3)|x₁ − y₁|
 ≤ (1/3)|x₂ − y₂| + (1/3)|x₁ − y₁|.
We now use the inequality that for all a, b ∈ R,

(a + b)² ≤ 2a² + 2b²

(which follows from the Cauchy-Schwarz inequality). It follows that

|F₁((x₁, x₂)) − F₁((y₁, y₂))|² ≤ (2/9)((x₁ − y₁)² + (x₂ − y₂)²).

Next,

|F₂((x₁, x₂)) − F₂((y₁, y₂))| = |(1/π) arctan(x₁) − (1/π) arctan(y₁)|.

To estimate this, we can use the mean-value inequality. Since for all t ∈ R,

0 ≤ arctan′(t) = 1/(1 + t²) ≤ 1,

the Mean-Value inequality yields that for all a, b ∈ R, if a < b then

|arctan(a) − arctan(b)| ≤ sup_{t∈(a,b)} |arctan′(t)| · (b − a) ≤ 1 · |b − a|.

Because this inequality is symmetric in a and b, we actually don't need the assumption that a < b, and we have that for all a, b ∈ R,

|arctan(a) − arctan(b)| ≤ |b − a|.

It follows that

|F₂((x₁, x₂)) − F₂((y₁, y₂))| = (1/π)|arctan(x₁) − arctan(y₁)| ≤ (1/π)|x₁ − y₁|.

Therefore

‖F((x₁, x₂)) − F((y₁, y₂))‖₂² = |F₁((x₁, x₂)) − F₁((y₁, y₂))|² + |F₂((x₁, x₂)) − F₂((y₁, y₂))|²
 ≤ (2/9)((x₁ − y₁)² + (x₂ − y₂)²) + (1/π²)(x₁ − y₁)²
 ≤ (2/9)((x₁ − y₁)² + (x₂ − y₂)²) + (1/9)(x₁ − y₁)²
 ≤ (1/3) ‖(x₁, x₂) − (y₁, y₂)‖₂².

21.3 Exercises
Exercise 21.3.1. Consider the function F : [−1, 1]² → R² given by

F((x₁, x₂)) := ( (1/2) sin(x₂) + (1/3)x₁ + 1/6 , (1/4)(x₁)³ − 1/6 ).

Show that the function F has a fixed point.


Chapter 22

Implicit function theorem

22.1 The objective


Before we state the implicit function theorem, it would be good to explain
some notation.
We will be considering continuously differentiable functions f : Ω ⊂
Rd+m → Rm . It is good to think of the vector space Rd+m as the vector
space Rd ⊕ Rm , i.e. as the vector space of pairs ( x, y) of vectors x ∈ Rd
and y ∈ Rm .
The implicit function theorem comes to the rescue in the following situation: when we want to know that there exists a function g that satisfies, for (some) x ∈ R^d,

f(x, g(x)) = 0        (22.1.1)

and when we want to know that g has nice properties, i.e. that g itself is continuously differentiable. Rather than being given by an explicit functional description, the function g is what is called implicitly defined by the equation (22.1.1).

Example 22.1.1. A standard example is when f : R1+1 → R is the


function f((x, y)) = x² + y² − 1, and we would like to write 'y in terms of x'. More precisely, we would like to find a function g such that

f (( x, g( x ))) = x2 + ( g( x ))2 − 1 = 0.


We immediately see two issues here:

• Such a function g cannot be defined for all x (the equation has no


solutions if | x | > 1),

• and for | x | < 1, there are always two possible solutions.

The first issue will be addressed by assumptions in the implicit function


theorem: the theorem will need a good starting position, i.e. a point ( a, b)
such that f ( a, b) = 0, but it will also need a condition on the derivative
of f in the point ( a, b). This condition will prohibit us from applying the
theorem in the problematic points (1, 0) and (−1, 0) in the example above.
The second issue will be addressed by the conclusions in the implicit func-
tion theorem: it only makes statements about points close to ( a, b).

22.2 Notation
Before we describe the theorem, we will need to introduce more notation.
Since we will assume the function f : Ω → Rm to be continuously dif-
ferentiable (with Ω an open subset of Rd+m ), we will have that in a point
( a, b) ∈ Ω, the derivative ( D f )(a,b) exists and is a linear map from Rd+m to
Rm . To get a feeling for this, let’s see what it looks like in an example.

Example 22.2.1. We could for instance be considering the function F :


R3+2 → R2 defined by

F (( x1 , x2 , x3 ), (y1 , y2 )) = (( x1 )2 y2 + y1 − 2, sin( x2 y2 ) + ( x3 )4 − 3)

The function F is indeed differentiable and

(DF)_(a,b)(((h₁, h₂, h₃), (k₁, k₂))) = [DF]_(a,b) (h₁, h₂, h₃, k₁, k₂)ᵀ

where the Jacobian [DF]_(a,b) of F in the point (a, b) ∈ R^{3+2} is given by

[DF]_(a,b) = ( ∂F₁/∂x₁((a,b))  ∂F₁/∂x₂((a,b))  ∂F₁/∂x₃((a,b))  ∂F₁/∂y₁((a,b))  ∂F₁/∂y₂((a,b))
               ∂F₂/∂x₁((a,b))  ∂F₂/∂x₂((a,b))  ∂F₂/∂x₃((a,b))  ∂F₂/∂y₁((a,b))  ∂F₂/∂y₂((a,b)) )

           = ( 2a₁b₂   0             0        1   (a₁)²
               0       cos(a₂b₂)b₂   4(a₃)³   0   cos(a₂b₂)a₂ ).

We will denote by
( D1 f )(a,b) : Rd → Rm
the restriction of the derivative ( D f )(a,b) : Rd+m → Rm to the subspace
Rd ⊂ Rd+m . In other words, for all h ∈ Rd ,

( D1 f )(a,b) (h) = ( D f )(a,b) ((h, 0)).


Similarly, we will denote by

( D2 f )(a,b) : Rm → Rm

the restriction of the derivative ( D f )(a,b) : Rd+m → Rm to the subspace


Rm ⊂ Rd+m . In other words, for all k ∈ Rm ,

( D2 f )(a,b) (k) = ( D f )(a,b) ((0, k)).

By linearity of ( D f )(a,b) , we have the following relationship

( D f )(a,b) (h, k) = ( D1 f )(a,b) (h) + ( D2 f )(a,b) (k).


We will denote the matrix representations (with respect to the standard
bases) of the maps ( D1 f )(a,b) and ( D2 f )(a,b) by [ D1 f ](a,b) and [ D2 f ](a,b)
respectively.
Then

(Df)_(a,b)((h, k)) = (D₁f)_(a,b)(h) + (D₂f)_(a,b)(k) = [D₁f]_(a,b) (h₁, h₂, h₃)ᵀ + [D₂f]_(a,b) (k₁, k₂)ᵀ.

Example 22.2.2. In our previous example, the matrix [D₁F]_(a,b) is given by

[D₁F]_(a,b) = ( ∂F₁/∂x₁((a,b))  ∂F₁/∂x₂((a,b))  ∂F₁/∂x₃((a,b))
                ∂F₂/∂x₁((a,b))  ∂F₂/∂x₂((a,b))  ∂F₂/∂x₃((a,b)) )

            = ( 2a₁b₂   0             0
                0       cos(a₂b₂)b₂   4(a₃)³ )

and the matrix [D₂F]_(a,b) is given by

[D₂F]_(a,b) = ( ∂F₁/∂y₁((a,b))  ∂F₁/∂y₂((a,b))
                ∂F₂/∂y₁((a,b))  ∂F₂/∂y₂((a,b)) )

            = ( 1   (a₁)²
                0   cos(a₂b₂)a₂ ).

22.3 The implicit function theorem


We are now ready to formulate the implicit function theorem.

Theorem 22.3.1 (Implicit function theorem). Let Ω ⊂ R^{d+m} be open and let
f : Ω → R^m be a function which is continuously differentiable. Let
a ∈ Rd and b ∈ Rm and assume that ( a, b) ∈ Ω and f (( a, b)) = 0.
Suppose that ( D2 f )(a,b) is invertible (or equivalently, that the matrix
[ D2 f ](a,b) is non-singular). Then there exists an r1 > 0 and an r2 > 0,
and a continuously differentiable function g : B( a, r1 ) → Rm such that
for all x ∈ B( a, r1 ) and all y ∈ B(b, r2 ),

f ( x, y) = 0 if and only if y = g ( x ).

Moreover, for all x ∈ B(a, r₁),

(Dg)_x = −(D₂f)⁻¹_(x,g(x)) ∘ (D₁f)_(x,g(x)).        (22.3.1)

The expression for the derivative of g is actually easy to derive when you
already know that g is differentiable. Because in that case, we can start
from the equality
f ( x, g( x )) = 0,
and use the chain rule to compute that
( D1 f )(x,g(x)) + ( D2 f )(x,g(x)) ◦ ( Dg) x = 0.

We then use that ( D2 f )( x,g( x)) is non-singular, multiply by its inverse, and
conclude the expression (22.3.1).
The proof of this theorem is very technical, but the underlying idea is sim-
ple and very beautiful. Given an x ∈ Rd close to a, how do we find y such
that f ( x, y) = 0? It won’t be possible to find such a y immediately, but
let’s first see what the guess y(0) := b will give us. We will then make an
error, because f ( x, y(0) ) will not be equal to 0 in general. Therefore we will
make a new guess, y(1) that aims to correct for the error. We will choose
the difference y(1) − y(0) in the way that would give us the exact solution
if f were in fact affine:

− f ( x, y(0) ) ≈ ( D2 f )(a,b) (y(1) − y(0) ).


Thus, we define

y^(1) := y^(0) − (D₂f)⁻¹_(a,b) f(x, y^(0)),

where you can see the important assumption that ( D2 f )(a,b) is non-singular
coming in.
Of course, in general y(1) will also not be the value we are looking for, i.e.
in general f ( x, y(1) ) 6= 0, but we can make a new guess y(2) , solving for
the update y(2) − y(1) again acting as if f were affine:

− f ( x, y(1) ) ≈ ( D2 f )(a,b) (y(2) − y(1) ).


Continuing in this fashion, we arrive at the following recursive scheme:

y^(n+1) := y^(n) − (D₂f)⁻¹_(a,b) f(x, y^(n)).

And that’s basically it! We just need to show that this works (that this re-
cursive scheme converges, and that the resulting solutions depend nicely

on x). But this is exactly the virtue of the Banach fixed point theorem. In
other words, we need to check that we can apply the Banach fixed point
theorem, but if we can apply it then we indeed find a y such that

y = y − ( D2 f ) − 1
( a,b)
f ( x, y)

from which it follows that indeed f ( x, y) = 0.


This finishes the intuition for the approach. To show that the involved map
is indeed a contraction, we will need to use the Mean-Value Inequality.
Now follows the full proof, but it may be good to skip the proof on first
reading.
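The recursive scheme can be tried out concretely before reading the proof. Below is a Python sketch (our own illustration; the names f and g_approx are hypothetical) for the circle example f(x, y) = x² + y² − 1 of Example 22.1.1, around the point (a, b) = (0.6, 0.8), where (D₂f)_(a,b) = 2b = 1.6 ≠ 0. The iterates converge to the implicitly defined g(x) = √(1 − x²).

```python
def f(x, y):
    return x ** 2 + y ** 2 - 1.0

a, b = 0.6, 0.8               # f(a, b) = 0 and (D2 f)_(a,b) = 2b = 1.6
inv_d2 = 1.0 / (2.0 * b)      # the fixed linear map (D2 f)_(a,b)^(-1)

def g_approx(x, steps=60):
    """Fixed-point scheme y_{k+1} = y_k - (D2 f)_(a,b)^{-1} f(x, y_k), y_0 = b."""
    y = b
    for _ in range(steps):
        y -= inv_d2 * f(x, y)
    return y

for x in [0.5, 0.6, 0.7]:
    # the second column should (approximately) match sqrt(1 - x^2)
    print(x, g_approx(x), (1.0 - x * x) ** 0.5)
```

Note that the same scheme started near (1, 0), where (D₂f) vanishes, would fail, matching the discussion of the problematic points above.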

Proof. Right at the beginning of this proof, we choose two radii r1 and
r2 according to the following criteria:

i. The Cartesian product B( a, r1 ) × B(b, r2 ) is contained in Ω,

ii. for all x ∈ B(a, r₁) and all y ∈ B(b, r₂) we have

max(‖(D₂f)⁻¹_(a,b)‖_{R^m→R^m}, 1) ‖Err^f_(a,b)((x, y))‖₂ < r₂/3,

iii. for all x ∈ B(a, r₁) and all y ∈ B(b, r₂) we have

‖(D₂f)⁻¹_(a,b)‖_{R^m→R^m} ‖(D₁f)_(a,b)‖_{R^d→R^m} ‖x − a‖₂ < r₂/3,

iv. for all x ∈ B(a, r₁) and all z ∈ B(b, r₂) we have

‖(D₂f)⁻¹_(a,b)‖_{R^m→R^m} ‖(D₂f)_(x,z) − (D₂f)_(a,b)‖_{R^m→R^m} < 1/2,

v. for all x ∈ B(a, r₁) and all y ∈ B(b, r₂),

‖(D₁f)_(x,y)‖_{R^d→R^m} < 2 ‖(D₁f)_(a,b)‖_{R^d→R^m}.



For x ∈ B(a, r₁) we define F^(x) : B(b, r₂) → R^m by

F^(x)(y) := y − (D₂f)⁻¹_(a,b) f(x, y).

We want to apply the Banach fixed point theorem to the function F^(x), and therefore we need to show that F^(x) maps every element of B(b, r₂) back into B(b, r₂), and we need to check that F^(x) is a contraction.
We first check that F ( x) maps B(b, r2 ) back to B(b, r2 ). We show a
slightly stronger property, namely that F ( x) maps B(b, r2 ) to B(b, 2r2 /3).
Indeed, by the criteria above, if x ∈ B(a, r₁) and y ∈ B(b, r₂) then

‖F^(x)(y) − b‖₂ = ‖y − b − (D₂f)⁻¹_(a,b) f(x, y)‖₂
 = ‖y − b − (D₂f)⁻¹_(a,b) ( f(a, b) + (D₁f)_(a,b)(x − a) + (D₂f)_(a,b)(y − b) + Err^f_(a,b)((x, y)) )‖₂
 = ‖−(D₂f)⁻¹_(a,b) ( (D₁f)_(a,b)(x − a) + Err^f_(a,b)((x, y)) )‖₂
 < 2r₂/3,        (22.3.2)

where we used the assumption that f(a, b) = 0.


Next, we check that F ( x) : B(b, r2 ) → B(b, r2 ) is a contraction.
We note that F^(x) is continuously differentiable and

(DF^(x))_z = I − (D₂f)⁻¹_(a,b) ∘ (D₂f)_(x,z).

We compute

‖(DF^(x))_z‖_{R^m→R^m} = ‖I − (D₂f)⁻¹_(a,b) ∘ (D₂f)_(x,z)‖_{R^m→R^m}
 = ‖I − (D₂f)⁻¹_(a,b) ∘ ((D₂f)_(a,b) + (D₂f)_(x,z) − (D₂f)_(a,b))‖_{R^m→R^m}
 = ‖(D₂f)⁻¹_(a,b) ∘ ((D₂f)_(x,z) − (D₂f)_(a,b))‖_{R^m→R^m}
 ≤ ‖(D₂f)⁻¹_(a,b)‖_{R^m→R^m} ‖(D₂f)_(x,z) − (D₂f)_(a,b)‖_{R^m→R^m}
 < 1/2,
where we used criterion (iv) on the choice of r1 and r2 .
It follows by the Mean-Value Inequality that for every x ∈ B( a, r1 ), the
function F ( x) : B(b, r2 ) → B(b, r2 ) is a (1/2)-contraction.
By the Banach fixed point theorem, the function F ( x) has a unique fixed
point. We define g : B(a, r₁) → R^m to be the function that assigns to
x ∈ B(a, r₁) the fixed point of F^(x). Then for every x ∈ B(a, r₁),

f ( x, g( x )) = 0.

We will now show that the function g : B( a, r1 ) → Rm is differentiable.


Unfortunately and fortunately we need to do this in two steps. First
we will show that g is Lipschitz, and only then will we be able show
that g is differentiable.

proof that the function g : B(a, r₁) → B(b, r₂) is Lipschitz. To derive that g is Lipschitz continuous, we take u ∈ B(a, r₁) and simply use g(u) as the initial condition for the fixed-point iteration of F^(x). We would like to see how large the difference

F^(x)(g(u)) − g(u)

is.

Therefore we compute

F^(x)(g(u)) − g(u) = −(D₂f)⁻¹_(a,b) ( f(x, g(u)) )
 = −(D₂f)⁻¹_(a,b) ( f(x, g(u)) − f(u, g(u)) ),

where we used that f(u, g(u)) = 0.


Define

M := 2 ‖(D₂f)⁻¹_(a,b)‖_{R^m→R^m} sup_{(σ,τ)∈B(a,r₁)×B(b,r₂)} ‖(D₁f)_(σ,τ)‖_{R^d→R^m},

which exists for instance by criterion v above.


It follows by the Mean-Value Inequality that

‖F^(x)(g(u)) − g(u)‖_{R^m} ≤ (1/2) M ‖x − u‖_{R^d}.

By the estimate from the Banach fixed-point theorem, also

‖g(x) − g(u)‖_{R^m} ≤ M ‖x − u‖_{R^d},

so that g is indeed M-Lipschitz.

proof that the function g : B(a, r₁) → R^m is differentiable. Let
u ∈ B(a, r₁) and define v := g(u). Note that in fact, v ∈ B(b, 2r₂/3) by
the estimate in (22.3.2).
We expect that the derivative of g in the point u would be

(Dg)_u = −(D₂f)⁻¹_(u,v) ∘ (D₁f)_(u,v)

(as this expression can be derived from the chain rule assuming that g is indeed differentiable). Therefore, we consider the error function

Err^g_u(x) := g(x) − g(u) + (D₂f)⁻¹_(u,v) ∘ (D₁f)_(u,v)(x − u)

and we need to show that

lim_{x→u} ‖Err^g_u(x)‖_{R^m} / ‖x − u‖_{R^d} = 0.

We are now going to use

g(u) − (D₂f)⁻¹_(u,v) ∘ (D₁f)_(u,v)(x − u)

as the initial condition in the fixed-point iteration at x. Note that this is possible for x close enough to u, since then this point is in B(b, r₂). We would like to see how large the difference

F^(x)( g(u) − (D₂f)⁻¹_(u,v) ∘ (D₁f)_(u,v)(x − u) ) − ( g(u) − (D₂f)⁻¹_(u,v) ∘ (D₁f)_(u,v)(x − u) )

is, as by the estimate from the Banach fixed-point theorem, this would immediately give us control over

g(x) − ( g(u) − (D₂f)⁻¹_(u,v) ∘ (D₁f)_(u,v)(x − u) )

as well.
Therefore, we compute

F^(x)( g(u) − (D₂f)⁻¹_(u,v) ∘ (D₁f)_(u,v)(x − u) ) − ( g(u) − (D₂f)⁻¹_(u,v) ∘ (D₁f)_(u,v)(x − u) )
 = −(D₂f)⁻¹_(a,b) f( x, g(u) − (D₂f)⁻¹_(u,v) ∘ (D₁f)_(u,v)(x − u) )
 = −(D₂f)⁻¹_(a,b) ( f( x, g(u) − (D₂f)⁻¹_(u,v) ∘ (D₁f)_(u,v)(x − u) ) − f(u, g(u)) ),

where we used in the last line that f(u, g(u)) = 0. Because f is differentiable, we find

F^(x)( g(u) − (D₂f)⁻¹_(u,v) ∘ (D₁f)_(u,v)(x − u) ) − ( g(u) − (D₂f)⁻¹_(u,v) ∘ (D₁f)_(u,v)(x − u) )
 = −(D₂f)⁻¹_(a,b) ( (D₁f)_(u,v)(x − u) + (D₂f)_(u,v)( −(D₂f)⁻¹_(u,v) ∘ (D₁f)_(u,v)(x − u) )
   + Err^f_(u,v)(( x, g(u) − (D₂f)⁻¹_(u,v) ∘ (D₁f)_(u,v)(x − u) )) )
 = −(D₂f)⁻¹_(a,b) Err^f_(u,v)(( x, g(u) − (D₂f)⁻¹_(u,v) ∘ (D₁f)_(u,v)(x − u) )),

since the first two terms inside the brackets cancel. It follows that

‖F^(x)( g(u) − (D₂f)⁻¹_(u,v) ∘ (D₁f)_(u,v)(x − u) ) − ( g(u) − (D₂f)⁻¹_(u,v) ∘ (D₁f)_(u,v)(x − u) )‖_{R^m}
 ≤ ‖(D₂f)⁻¹_(a,b)‖_{R^m→R^m} ‖Err^f_(u,v)(( x, g(u) − (D₂f)⁻¹_(u,v) ∘ (D₁f)_(u,v)(x − u) ))‖_{R^m}.

By the estimate in the Banach fixed-point theorem, we find

‖Err^g_u(x)‖_{R^m} = ‖g(x) − ( g(u) − (D₂f)⁻¹_(u,v) ∘ (D₁f)_(u,v)(x − u) )‖
 ≤ 2 ‖(D₂f)⁻¹_(a,b)‖_{R^m→R^m} ‖Err^f_(u,v)(( x, g(u) − (D₂f)⁻¹_(u,v) ∘ (D₁f)_(u,v)(x − u) ))‖_{R^m},

so that, using the Lipschitz continuity of g, it follows by the squeeze theorem that indeed

lim_{x→u} ‖Err^g_u(x)‖_{R^m} / ‖x − u‖_{R^d} = 0.

22.4 The inverse function theorem


Theorem 22.4.1 (Inverse function theorem). Let Ω ⊂ Rm be open and
let h : Ω → Rm be a function which is continuously differentiable.
Suppose b ∈ Ω and suppose that ( Dh)b is non-singular.
Then there exists an r1 > 0 and an r2 > 0, and a continuously differ-
entiable function g : B(h(b), r1 ) → Rm such that for all x ∈ B(h(b), r1 )
and all y ∈ B(b, r2 ),

x = h(y) if and only if y = g ( x ).

Moreover, for all x ∈ B(h(b), r₁),

(Dg)_x = (Dh)⁻¹_{g(x)}.

In particular, for r3 > 0 small enough, the function h restricted to


B(b, r3 ) mapping to h( B(b, r3 )) is invertible with continuously differ-
entiable inverse g.

Proof. The proof of the theorem follows from applying the Implicit
Function Theorem to the function F : Rm × Ω → Rm given by

F ( x, y) := x − h(y)

The inverse function theorem is useful to conclude for instance that for k ∈ N, the function x ↦ x^{1/k} is differentiable on the domain (0, ∞). The implicit function theorem would also allow us to conclude that the function ln : (0, ∞) → R is differentiable on its domain, given that the exponential function exp : R → R is differentiable. However, we still have not given a proper definition of the exponential function. The next chapters will allow us to provide such a definition.
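A numerical illustration (ours, not from the notes): take h(y) = y³ + y, which is strictly increasing with h′(y) = 3y² + 1 > 0 everywhere, so a global inverse g exists. The formula (Dg)_x = (Dh)⁻¹_{g(x)} here reads g′(x) = 1/(3g(x)² + 1), which we compare against a finite-difference derivative of a bisection-based inverse.

```python
def h(y):
    return y ** 3 + y

def g(x, lo=-10.0, hi=10.0):
    """Invert the strictly increasing h on [lo, hi] by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x, eps = 2.0, 1e-6
numeric = (g(x + eps) - g(x - eps)) / (2 * eps)      # finite differences
predicted = 1.0 / (3.0 * g(x) ** 2 + 1.0)            # inverse function theorem
print(numeric, predicted)
assert abs(numeric - predicted) < 1e-4
```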

22.5 Exercises
Exercise 22.5.1. Consider the function F : R2 → R given by
F ( x, y) = x2 − y3 + 3y

a. Specify precisely the points ( a, b) on the curve


Γ := {( x, y) ∈ R2 | F ( x, y) = 1}
such that there does not exist r1 , r2 > 0 and a continuously differ-
entiable function g : B( a, r1 ) → R such that for all x ∈ B( a, r1 ) and
y ∈ B(b, r2 )
F ( x, y) = 1 if and only if y = g ( x ).

b. For a continuously differentiable function g : B( a, r1 ) → R such that


F ( x, g( x )) = 1
compute g0 ( x ) for every x ∈ B( a, r1 ) in terms of x and g( x ).
Exercise 22.5.2. Let ρ : R → R be a continuously differentiable function,
and consider the map F : R2 → R2 given by
F ( x, t) := ( x + ρ( x )t, t)
Show that there exist radii r1 > 0 and r2 > 0 and a function G : B(0, r2 ) →
B(0, r1 ) such that for all (y, s) in B(0, r2 ) and ( x, t) in B(0, r1 )
(y, s) = F (( x, t)) if and only if ( x, t) = G ((y, s)).
and for every (y, s) give an expression for
[ DG ](y,s)

Exercise 22.5.3. Consider the function F : R3 × R2 → R2 given by


F ( x1 , x2 , x3 , y1 , y2 ) = (( x2 )2 y1 + y2 − 3, cos( x3 y2 ) + y2 + ( x1 )2 )
Prove that there exist r1 , r2 > 0 and a continuously differentiable function
g : B((1, 3, 2), r1 ) → R2 such that for all x ∈ B((1, 3, 2), r1 ) and all y ∈
B((1, 0), r2 ),
F ( x1 , x2 , x3 , y1 , y2 ) = (6, 2) if and only if y = g ( x ).
Moreover, compute the Jacobian [ Dg](1,3,2) .
Chapter 23

Function sequences

23.1 Pointwise convergence

Definition 23.1.1. Let ( X, distX ) and (Y, distY ) be two metric spaces
and let D ⊂ X. We say that a sequence of functions f : N → ( D → Y )
converges pointwise to a function f ∗ : D → Y if

for all x ∈ D,
lim f n ( x ) = f ∗ ( x ).
n→∞

Example 23.1.2. We consider a case in which ( X, distX ) = (R, distR )


and D is the interval [0, 1] ⊂ R. We consider the sequence of functions
f : N → ([0, 1] → R) defined by

f n (x) = xn .

Then the sequence (fₙ) converges pointwise to the function f* : [0, 1] → R defined by

f*(x) = 0 if x ∈ [0, 1),   f*(x) = 1 if x = 1.

To show this, let x ∈ [0, 1]. Then we consider two cases. In case x ∈


[0, 1), then


lim f n ( x ) = lim x n = 0 = f ∗ ( x )
n→∞ n→∞
as this is a standard limit. In case x = 1, then

lim f n ( x ) = lim 1n = 1 = f ∗ ( x ).
n→∞ n→∞

23.2 Uniform convergence

Definition 23.2.1. Let ( X, distX ) and (Y, distY ) be two metric spaces
and let D ⊂ X. We say that a sequence of functions f : N → ( D → Y )
converges uniformly to a function f ∗ : D → Y if

for all e > 0,


there exists N ∈ N,
for all n ≥ N,
for all x ∈ D,
distY ( f n ( x ), f ∗ ( x )) < e.

Proposition 23.2.2. Let ( X, distX ) and (Y, distY ) be two metric spaces,
let D ⊂ X, and assume that a sequence of functions f : N → ( D → Y )
converges uniformly to a function f ∗ : D → Y. Then ( f n ) converges to
f ∗ pointwise on D.

Proof. We need to show that ( f n ) converges pointwise to f ∗ . That


means that we need to show that
for all x ∈ D,
lim f n ( x ) = f ∗ ( x ).
n→∞

Let x ∈ D. We need to show that for every e > 0, there exists an


N1 ∈ N such that for all n ≥ N1 ,

distY ( f n ( x ), f ∗ ( x )) < e.

Let e > 0. Since ( f n ) converges to f ∗ uniformly, there exists an N ∈ N


such that for all n ≥ N and for all p ∈ D,

distY ( f n ( p), f ∗ ( p)) < e.

Choose N1 := N. Let n ≥ N1 . Then, since n ≥ N, also

distY ( f n ( x ), f ∗ ( x )) < e.

The previous proposition has a simple consequence that is very useful in


practice. Suppose for instance that you need to check whether a sequence
of functions converges uniformly, and you already know that the sequence
converges pointwise to a function f ∗ . Then you only need to check whether
the sequence of functions converges uniformly to f ∗ .

Corollary 23.2.3. Suppose a sequence of functions f : N → ( D → Y ) con-


verges pointwise to a function f ∗ : D → Y. Then ( f n ) converges uniformly
on D if and only if ( f n ) converges uniformly to f ∗ on D.

23.3 Preservation of continuity under uniform con-


vergence

Theorem 23.3.1. Let ( f n ) be a sequence of continuous functions from


a domain D in the metric space ( X, distX ) to the metric space (Y, distY )
that converges uniformly to a function g : D → Y. Then the function
g is also continuous on D.

Proof. We need to show that g : D → Y is continuous, so we need to


show that for all a ∈ D, the function g is continuous in a.

Let a ∈ D. We need to show that


for all e > 0,
there exists δ > 0,
for all x ∈ D,
if 0 < distX ( x, a) < δ
then distY ( g( x ), g( a)) < e.

Let e > 0. Since the function sequence ( f n ) converges to g uniformly,


there exists an N ∈ N such that for all n ≥ N, and all x ∈ D
e
distY ( f n ( x ), g( x )) < .
3
Choose such an N ∈ N. Because the function f N is continuous, there
exists a δ0 > 0 such that for all x ∈ D, if 0 < distX ( x, a) < δ0 , then
e
distY ( f N ( x ), f N ( a)) < .
3

Choose δ := δ0 .
Let x ∈ D. Assume that 0 < distX ( x, a) < δ. Then by the triangle
inequality

distY ( g( x ), g( a)) ≤ distY ( g( x ), f N ( x )) + distY ( f N ( x ), f N ( a))


+ dist( f N ( a), g( a))
e e e
< + +
3 3 3
= e.
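Here is a sequence where the theorem does apply (a sketch of ours, not from the notes): fₙ(x) = √(x² + 1/n) is continuous and converges uniformly on R to |x|, since 0 ≤ fₙ(x) − |x| ≤ 1/√n, with the supremum attained at x = 0. The theorem then confirms the (obvious) continuity of the limit |x|.

```python
import math

def f(n, x):
    return math.sqrt(x * x + 1.0 / n)

def sup_distance(n):
    """Grid approximation of sup_x |f_n(x) - |x||; the exact sup is
    1/sqrt(n), attained at x = 0 (which lies on the grid)."""
    grid = [k / 100.0 for k in range(-500, 501)]
    return max(abs(f(n, x) - abs(x)) for x in grid)

for n in [1, 10, 100, 10000]:
    print(n, sup_distance(n), 1.0 / math.sqrt(n))   # the columns agree
```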

The previous theorem is sometimes very useful to rule out that a sequence
of functions converges uniformly. If the functions in the sequence are all
continuous, but the pointwise limit is not continuous, then the sequence
does not converge uniformly.

Example 23.3.2. Consider the sequence of functions ( f n ) from [0, 1] to


R defined by
f n (x) = xn .
We have seen that the pointwise limit is g : [0, 1] → R given by
g(x) := 0 if x ∈ [0, 1),   g(x) := 1 if x = 1.

Because the function g is not continuous, but for every n ∈ N the


function f n is continuous (as it is a polynomial), we conclude that the
sequence ( f n ) does not converge to g uniformly.
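This failure of uniform convergence can also be seen numerically (our sketch, not part of the notes): a grid approximation of sup_{x∈[0,1]} |fₙ(x) − g(x)| stays close to 1 for every n, because just below x = 1 the value xⁿ is still close to 1.

```python
def sup_distance(n, grid_size=10_000):
    """Grid approximation of sup |f_n - g| on [0, 1] for f_n(x) = x^n."""
    return max(
        abs((k / grid_size) ** n - (1.0 if k == grid_size else 0.0))
        for k in range(grid_size + 1)
    )

for n in [1, 10, 100, 1000]:
    print(n, sup_distance(n))   # does not tend to 0 (the true sup is 1)
```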

23.4 Differentiability theorem

Theorem 23.4.1. Let ( f n ) be a sequence of functions from an open


domain Ω in a vector space V to R and suppose the sequence con-
verges pointwise to a function g : Ω → R. Suppose moreover that
the functions f n are continuously differentiable on Ω and suppose the
sequence of functions D f n : Ω → Lin(V, R) converges uniformly to a
function ∆ : Ω → Lin(V, R). Then the function g is differentiable on Ω
as well and
Dg = ∆.

Proof. To show that g : Ω → R is differentiable on Ω, we need to show


that for all a ∈ Ω, the function g is differentiable in a.
Let a ∈ Ω.
Define the error function

Err^g_a(x) := g(x) − g(a) − ∆_a(x − a).

We need to show that

lim_{x→a} Err^g_a(x) / ‖x − a‖_V = 0.

Let e > 0.
Because the functions D f n converge to ∆ uniformly, the function ∆ is
continuous by Theorem 23.3.1. Therefore, there exists a δ0 > 0 such
that for all z ∈ Ω, if 0 < kz − akV < δ0 then

k∆z − ∆ a kV→R < e/3.

Choose such a δ0 .
Choose δ := δ0 .
Let x ∈ Ω and assume that 0 < k x − akV < δ.
Moreover, since the functions D f n converge to ∆ uniformly on Ω, there
exists an N0 ∈ N such that for all z ∈ Ω and all n ≥ N0 ,

k( D f n )z − ∆z kV→R < e/3.

Now because the function sequence ( f n ) converges to g pointwise,
there exists an N1 ∈ N such that for all n ≥ N1 ,

| f n ( x ) − g( x )| < (e/6) k x − akV .

Similarly, there exists an N2 ∈ N such that for all n ≥ N2 ,

| f n ( a) − g( a)| < (e/6) k x − akV .

Choose N := max( N0 , N1 , N2 ). Then

|Err_a^g ( x )| = | g( x ) − g( a) − ∆ a ( x − a)|
≤ | f N ( x ) − f N ( a) − ∆ a ( x − a)| + | f N ( x ) − g( x )| + | f N ( a) − g( a)|
< | f N ( x ) − f N ( a) − ∆ a ( x − a)| + (e/3) k x − akV .

By the Mean-Value Theorem, there exists a point y on the line segment
from a to x such that

f N ( x ) − f N ( a ) = ( D f N )y ( x − a ).

Therefore

|Err_a^g ( x )| < | f N ( x ) − f N ( a) − ∆ a ( x − a)| + (e/3) k x − akV
= |( D f N )y ( x − a) − ∆ a ( x − a)| + (e/3) k x − akV
= |(( D f N )y − ∆ a )( x − a)| + (e/3) k x − akV
≤ k( D f N )y − ∆ a kV→R k x − akV + (e/3) k x − akV
≤ (k( D f N )y − ∆y kV→R + k∆y − ∆ a kV→R ) k x − akV + (e/3) k x − akV
< e k x − akV ,

which is what we needed to show.

23.5 The normed vector space of bounded functions

Definition 23.5.1. Let D be a set. The normed vector space (B( D ), k · k∞ )
is defined as the vector space of bounded functions from D to R
with norm k · k∞ given by

k f k∞ = sup_{x ∈ D} | f ( x )|.

The vector space B( D ) is infinite-dimensional if D has infinitely many
elements.

Proposition 23.5.2. Let ( f n ) be a sequence of functions from D to R


and let f be a function. Then the sequence ( f n ) converges uniformly
to f if and only if there exists an N ∈ N such that for every n ≥ N,
the function ( f n − f ) is bounded, and such that the sequence n 7→
( f N +n − f ) converges to 0 in B( D ).

Proposition 23.5.3. Let ( f n ) be a sequence of functions from a domain


D in the metric space ( X, distX ) to the metric space (Y, distY ) and let
g : D → Y be a function. Then ( f n ) converges to g uniformly if and
only if there exists an N ∈ N such that for every n ≥ N, the function
hn given by
hn ( x ) := distY ( f n ( x ), g( x ))
is bounded and the sequence n 7→ h N +n converges to 0 in B( D ).

23.6 Exercises
Exercise 23.6.1. Determine whether the following functions converge pointwise
and whether they converge uniformly on the indicated domain. If
they converge pointwise, give the pointwise limit.

a. an ( x ) = nx exp(−nx ) on the domain R.

b. bn ( x ) = sin(nx ) on the domain R.

c. cn ( x ) = x n cos(nx ) on the domain [0, 1).

d. dn ( x ) = tan( x/n) on the domain [−1, 1].

e. en ( x ) = exp( x − n) on the domain (−∞, 5].

f. f n ( x ) = arctan(n( x − 2)) on the domain R.


Chapter 24

Function series

24.1 Definitions
Let ( X, distX ) be a metric space. Let ( f n ) be a sequence of functions from
Ω ⊂ X to R.
We say that the function series

∑_{k=0}^∞ f k

converges pointwise to a function s : Ω → R if the function sequence of
partial sums

S n ( x ) := ∑_{k=0}^n f k ( x )

converges pointwise to the function s : Ω → R.
We say that the function series ∑_{k=0}^∞ f k converges to s uniformly if the
sequence of partial sums converges to the function s uniformly.

24.2 The Weierstrass M-test


Theorem 24.2.1 (Weierstrass M-test). Let ( f n ) be a sequence of functions
from Ω to R, and suppose there exists a sequence of real numbers
( Mn ) such that for all n ∈ N and all x ∈ Ω it holds that

| f n ( x )| ≤ Mn

and suppose the series ∑_{k=0}^∞ Mk converges.
Then the function series ∑_{k=0}^∞ f k converges absolutely and uniformly on Ω.

Proof. We first show that the function series ∑_{k=0}^∞ f k converges
absolutely. For that we need to show that for every x ∈ Ω, the series

∑_{k=0}^∞ f k ( x )

converges absolutely.
Let x ∈ Ω. By assumption, for every k ∈ N, we have | f k ( x )| ≤ Mk .
Since the series ∑_{k=0}^∞ Mk converges, it follows by the comparison test
that also the series

∑_{k=0}^∞ | f k ( x )|

converges. Hence, the series ∑_{k=0}^∞ f k ( x ) converges absolutely.
In particular, the series ∑_{k=0}^∞ f k converges pointwise to a function
s : Ω → R.
We will now show that the series ∑_{k=0}^∞ f k converges to s uniformly.
Since the series ∑_{k=0}^∞ Mk converges, we know that

lim_{ℓ→∞} ∑_{k=ℓ+1}^∞ Mk = 0.

Let x ∈ Ω. Then

| ∑_{k=0}^ℓ f k ( x ) − s( x )| = | ∑_{k=ℓ+1}^∞ f k ( x )|
≤ ∑_{k=ℓ+1}^∞ | f k ( x )|
≤ ∑_{k=ℓ+1}^∞ Mk

from which it follows that

0 ≤ k ∑_{k=0}^ℓ f k − s k∞ ≤ ∑_{k=ℓ+1}^∞ Mk

and from the squeeze theorem we get that

lim_{ℓ→∞} k ∑_{k=0}^ℓ f k − s k∞ = 0.

In other words, the series ∑_{k=0}^∞ f k converges to s uniformly.

The statement of the above theorem is practically equivalent to the statement
that the normed vector space of bounded functions on Ω, denoted
by (B(Ω), k · k∞ ), is complete.

Example 24.2.2. Consider the function series

∑_{k=0}^∞ x^k / k! .

We claim that for every r > 0, this series converges uniformly on the
interval [−r, r ].

Proof. Let r > 0. To verify the claim, we will apply the Weierstrass
M-test. The functions f k appearing in the theorem correspond to the
functions

f k ( x ) = x^k / k! .

We need to check all conditions in the Weierstrass M-test, so we need
to define a sequence ( Mk ) and verify for that choice that for all x ∈
[−r, r ] and all k ∈ N,

| f k ( x )| ≤ Mk ,

and in addition we would need to verify that the series ∑_{k=0}^∞ Mk
converges.
First we note that for all k ∈ N and for all x ∈ [−r, r ],

| f k ( x )| = | x |^k / k! ≤ r^k / k! .

Therefore, we choose for ( Mk ) the sequence

Mk := r^k / k! .

We then indeed find that for all k ∈ N and for all x ∈ [−r, r ],

| f k ( x )| ≤ Mk .

We can verify by the ratio test that the series

∑_{k=0}^∞ Mk = ∑_{k=0}^∞ r^k / k!

converges. It follows from the Weierstrass M-test that the function series

∑_{k=0}^∞ x^k / k!

converges uniformly on the interval [−r, r ].

24.3 Conditions for differentiation of function series
The following theorem is a direct consequence of Theorem 23.4.1 on the
preservation of differentiability.

Theorem 24.3.1. Let ( f n ) be a sequence of functions from Ω ⊂ R to R.
Suppose

i. for every n ∈ N, the function f n is continuously differentiable on Ω.

ii. the function series ∑_{k=0}^∞ f k converges pointwise to a function
g : Ω → R.

iii. the function series ∑_{k=0}^∞ f k′ converges uniformly.

Then the function g is differentiable on Ω as well and for all x ∈ Ω,

g′ ( x ) = ∑_{k=0}^∞ f k′ ( x ).

24.4 Exercises
Exercise 24.4.1. Let ( f k ) be a sequence of continuously differentiable, bounded
functions R → R. Suppose that for all k ∈ N,

k f k k∞ ≤ 1 and k f k′ k∞ ≤ 5.

a. Show that the function series

∑_{k=0}^∞ 2^{−k} f k

converges pointwise to some function s : R → R.

b. Show that the function s : R → R is differentiable.


Exercise 24.4.2. Let D be a subset of R, and let ( f k ) be a sequence of bounded
functions from D to R, i.e. ( f k ) is a sequence in the normed vector space
(B( D ), k · k∞ ). Assume that the series ∑_{k=0}^∞ f k converges uniformly.
Show that

lim_{k→∞} k f k k∞ = 0.

Exercise 24.4.3. Consider the sequence of functions ( f k ) from (0, ∞) to R
given by

f k ( x ) = k^2 x^2 exp(−k^2 x ).

a. Show that the series

∑_{k=0}^∞ f k

converges uniformly to some function s : (0, ∞) → R.
Hint: Note that f k can be written as

f k ( x ) = (1/k^2 ) g( k^2 x )

for some function g : (0, ∞) → (0, ∞). What is g?

b. Show that the function s : (0, ∞) → R is continuous.

c. Show that the series

∑_{k=0}^∞ f k′

does not converge uniformly on (0, ∞).

d. Show that for every a ∈ (0, ∞), the series

∑_{k=0}^∞ f k′

does converge uniformly on the interval ( a, ∞).
Hint: Note that there exists a z ∈ R such that for b ≥ z,

b^2 exp(−b) ≤ exp(−b/2).

Therefore, for k large enough, it holds for all x ∈ ( a, ∞) that

(k^2 x )^2 exp(−k^2 x ) < exp(−k^2 x/2).

e. Show that the function s : (0, ∞) → R is differentiable on (0, ∞).


Chapter 25

Power series

25.1 Definition
Definition 25.1.1. A power series at a point c ∈ R is a function series of
the form

∑_{k=0}^∞ ak ( x − c)^k

where a : N → R is a real-valued sequence.

25.2 Convergence of power series

Lemma 25.2.1. Suppose a power series

∑_{k=0}^∞ ak ( x − c)^k

converges at a point z ∈ R. Let δ > 0 be such that δ < |z − c|. Then
the power series converges absolutely and uniformly on the interval
[c − δ, c + δ].


Proof. We will apply the Weierstrass M-test.
Since the series

∑_{k=0}^∞ ak (z − c)^k

converges, the sequence k 7→ ak (z − c)^k converges (to zero) and therefore
it is bounded. In other words, there exists a C > 0 such that for all
k ∈ N,

| ak (z − c)^k | ≤ C.

Now note that for all k ∈ N and all x ∈ [c − δ, c + δ],

| ak ( x − c)^k | = | ak | |z − c|^k |( x − c)/(z − c)|^k ≤ C (δ/|z − c|)^k =: Mk . (25.2.1)

Now note that since δ < |z − c|, it follows that δ/|z − c| < 1, so that

∑_{k=0}^∞ (δ/|z − c|)^k

is a standard convergent geometric series. By limit laws for series, the
series

∑_{k=0}^∞ Mk = ∑_{k=0}^∞ C (δ/|z − c|)^k

is convergent as well.
By the Weierstrass M-test, the function series

∑_{k=0}^∞ ak ( x − c)^k

converges absolutely and uniformly on the interval [c − δ, c + δ].

Corollary 25.2.2. For every power series

∑_{k=0}^∞ ak ( x − c)^k

around a point c ∈ R exactly one of the following occurs:

i. The series converges for x = c and diverges for x ≠ c. In this case
we say that the radius of convergence of the power series is 0.

ii. There exists an R > 0 such that for all x ∈ (c − R, c + R) the power
series converges and for all x ∈ R\[c − R, c + R] the series diverges.
In this case we say the radius of convergence equals R.

iii. The series converges for all x ∈ R. In this case we say the radius of
convergence is ∞.

The following proposition gives a way to determine the radius of convergence
of a power series.

Proposition 25.2.3. Let

∑_{k=0}^∞ ak ( x − c)^k

be a power series around c and define the (extended real) number

L := lim sup_{k→∞} | ak |^{1/k} .

Then

i. if L = ∞, then the radius of convergence of the power series is 0,

ii. if L ∈ (0, ∞), then the radius of convergence of the power series is 1/L,

iii. if L = 0, then the radius of convergence of the power series is ∞.
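As a rough numerical sketch of the quantity L (finite values of k only, so this merely suggests the limit; the function names are ours): for ak = 3^k the k-th roots sit at 3, matching a radius of convergence of 1/3, while for ak = 1/k! they decay toward 0, matching an infinite radius.

```python
import math

# k-th roots |a_k|**(1/k) for two coefficient sequences.

def kth_root(a_k, k):
    return abs(a_k) ** (1.0 / k)

def kth_root_inv_factorial(k):
    # (1/k!)**(1/k) via logarithms, avoiding huge intermediate numbers
    return math.exp(-math.lgamma(k + 1) / k)

for k in (10, 100, 500):
    print(k, kth_root(3 ** k, k), kth_root_inv_factorial(k))
```

The first column of roots stays pinned at 3; the second keeps decreasing, suggesting L = 0 for the exponential-type coefficients.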

For the proof of the proposition, let us first give an alternative version of
the root test, Theorem 8.4.1.

Theorem 25.2.4 (Root test, lim sup version). Let (bk ) be a sequence of
nonnegative real numbers.

i. If lim sup_{k→∞} bk^{1/k} < 1, then the series ∑k bk converges.

ii. If lim sup_{k→∞} bk^{1/k} > 1, then the series ∑k bk diverges.

Proof. First suppose that

lim sup_{k→∞} bk^{1/k} < 1.

Denote M := lim sup_{k→∞} bk^{1/k} . Then by the alternative characterization
of the lim sup, there exists an N ∈ N such that for all k ≥ N,

bk^{1/k} < M + (1 − M)/2 = (1 + M)/2 < 1.

It follows by Theorem 8.4.1 that the series ∑k bk converges.
Now suppose that

lim sup_{k→∞} bk^{1/k} = ∞.

Then the sequence (bk^{1/k}) is not bounded from above, and as a consequence
the sequence (bk ) is not bounded from above. Therefore, the
series ∑k bk diverges.
If

M := lim sup_{k→∞} bk^{1/k} ∈ (1, ∞),

then by the alternative characterization of the lim sup, for every K ∈ N,
there exists an ℓ ≥ K such that

bℓ^{1/ℓ} > 1.

For such ℓ, also bℓ > 1. It follows that the sequence (bk ) does not
converge to zero, and therefore the series ∑k bk diverges.

With this version of the root test, we can now prove Proposition 25.2.3.

Proof of Proposition 25.2.3. We would like to apply the root test. We
therefore consider

lim sup_{k→∞} | ak ( x − c)^k |^{1/k} = lim sup_{k→∞} ( | ak |^{1/k} | x − c| ) .

It is clear that the power series always converges for x = c.
If L = ∞ and x ≠ c, then

lim sup_{k→∞} ( | ak |^{1/k} | x − c| ) = ∞

and in particular the terms do not converge to zero. Therefore, if L = ∞,
the power series only converges for x = c.
If L ∈ (0, ∞) and | x − c| < 1/L, then

lim sup_{k→∞} ( | ak |^{1/k} | x − c| ) < 1,

so that by the root test the series converges. On the other hand, if
L ∈ (0, ∞) and | x − c| > 1/L, then

lim sup_{k→∞} ( | ak |^{1/k} | x − c| ) > 1,

and in particular the terms do not converge to zero. Therefore, the
power series diverges.
If L = 0, it follows from the root test that the series converges for all
x ∈ R.

25.3 Standard functions defined as power series



Proposition 25.3.1. The power series

∑_{k=0}^∞ (1/k!) x^k

has radius of convergence R = ∞.

Definition 25.3.2. The function exp : R → R is defined as the power
series

exp( x ) := ∑_{k=0}^∞ (1/k!) x^k .

It has radius of convergence R = ∞.

Definition 25.3.3. The function sin : R → R is defined as the power
series

sin( x ) := ∑_{k=0}^∞ (−1)^k (1/(2k + 1)!) x^{2k+1} .

It has radius of convergence R = ∞.

Definition 25.3.4. The function cos : R → R is defined as the power
series

cos( x ) := ∑_{k=0}^∞ (−1)^k (1/(2k)!) x^{2k} .

It has radius of convergence R = ∞.
It has radius of convergence R = ∞.

25.4 Operations with power series



Proposition 25.4.1 (Sums of power series). Let

∑_{k=0}^∞ ak ( x − c)^k and ∑_{k=0}^∞ bk ( x − c)^k

be two power series around c, with radii of convergence R1 and R2
respectively. The sum of these functions is the power series

∑_{k=0}^∞ ( ak + bk )( x − c)^k

and the radius of convergence R for this new power series satisfies

R ≥ min( R1 , R2 ).

Proposition 25.4.2 (Products of power series). Let

∑_{k=0}^∞ ak ( x − z)^k and ∑_{k=0}^∞ bk ( x − z)^k

be two power series around a point z ∈ R with radii of convergence
R1 and R2 respectively. Then the product of the two power series is
again a power series

∑_{k=0}^∞ ck ( x − z)^k

where

ck := ∑_{ℓ=0}^k aℓ bk−ℓ .

The radius of convergence R of this new power series satisfies

R ≥ min( R1 , R2 ).

Proof. It suffices to show that for all x ∈ R such that | x − z| < min( R1 , R2 ),
the power series

∑_{k=0}^∞ ck ( x − z)^k

converges and

∑_{k=0}^∞ ck ( x − z)^k = ( ∑_{k=0}^∞ ak ( x − z)^k ) ( ∑_{k=0}^∞ bk ( x − z)^k ) .

Let therefore x ∈ R be such that | x − z| < min( R1 , R2 ). We can now
choose Ak := ak ( x − z)^k and Bk := bk ( x − z)^k in Theorem 9.3.1. It
follows from the theorem that with

Ck := ∑_{ℓ=0}^k Aℓ Bk−ℓ = ∑_{ℓ=0}^k aℓ bk−ℓ ( x − z)^ℓ ( x − z)^{k−ℓ} = ck ( x − z)^k

we have that the series

∑_{k=0}^∞ Ck = ∑_{k=0}^∞ ck ( x − z)^k

converges and indeed

∑_{k=0}^∞ ck ( x − z)^k = ( ∑_{k=0}^∞ ak ( x − z)^k ) ( ∑_{k=0}^∞ bk ( x − z)^k ) .
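A small numerical sketch of the product formula ck = ∑ aℓ bk−ℓ (names are ours): with ak = bk = 1/k!, the product coefficients should equal 2^k/k!, the coefficients of exp(2x), by the binomial theorem.

```python
import math

# Cauchy product c_k = sum_{l=0}^{k} a_l * b_{k-l} for a_k = b_k = 1/k!.
# By the binomial theorem c_k = 2**k / k!, the coefficients of exp(2x).

def cauchy_product(a, b):
    n = min(len(a), len(b))
    return [sum(a[l] * b[k - l] for l in range(k + 1)) for k in range(n)]

n = 12
a = [1 / math.factorial(k) for k in range(n)]
c = cauchy_product(a, a)
expected = [2 ** k / math.factorial(k) for k in range(n)]
print(max(abs(x - y) for x, y in zip(c, expected)))
```

The printed maximal deviation is at the level of floating-point round-off, as the identity holds exactly.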

25.5 Differentiation of power series


The following proposition follows directly from the differentiability theorem.

Proposition 25.5.1. Let

∑_{k=0}^∞ ak ( x − c)^k

be a power series with radius of convergence R. Then the power series
is differentiable on the interval (c − R, c + R) and on this interval, its
derivative equals the power series

∑_{ℓ=0}^∞ (ℓ + 1) aℓ+1 ( x − c)^ℓ ,

which has the same radius of convergence.
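A hedged numerical look at term-by-term differentiation (names are ours), using the geometric series, whose radius of convergence is 1: differentiating ∑ x^k term by term should reproduce the derivative 1/(1 − x)² of 1/(1 − x) inside (−1, 1).

```python
# Partial sums of the differentiated geometric series sum_l (l+1) x**l,
# compared with d/dx [1/(1-x)] = 1/(1-x)**2 on (-1, 1).

def diff_geometric_partial(x, terms=200):
    return sum((l + 1) * x ** l for l in range(terms))

for x in (-0.5, 0.0, 0.5, 0.9):
    print(x, diff_geometric_partial(x), 1.0 / (1.0 - x) ** 2)
```

Agreement is excellent well inside the interval and degrades only as x approaches the boundary of the interval of convergence, where more terms are needed.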

Corollary 25.5.2. Let

∑_{k=0}^∞ ak ( x − c)^k

be a power series with radius of convergence R. Then the power series is
infinitely many times differentiable on the interval (c − R, c + R), and for
every ℓ ∈ N, the ℓth derivative has the same radius of convergence.

Theorem 25.5.3 (Identification of coefficients). Let R > 0 and let f :
(c − R, c + R) → R be given by a power series

f ( x ) := ∑_{k=0}^∞ ak ( x − c)^k .

Then for all k ∈ N,

ak = f^{(k)} ( c ) / k! .

Theorem 25.5.4 (Identity theorem for power series). Let R > 0 and let
f , g : (c − R, c + R) → R be given by power series

f ( x ) = ∑_{k=0}^∞ ak ( x − c)^k and g( x ) = ∑_{k=0}^∞ bk ( x − c)^k

and assume for all x ∈ (c − R, c + R),

f ( x ) = g( x ).

Then for all k ∈ N,

ak = bk .

25.6 Taylor series

Definition 25.6.1. Let f : Ω → R be a function that is infinitely many
times differentiable on some open set Ω. Assume c ∈ Ω.
Then the function series

∑_{k=0}^∞ ( f^{(k)} ( c ) / k! ) ( x − c)^k

is called the Taylor series of f around c.

25.7 Exercises
Exercise 25.7.1. Give the Taylor series of the function

f ( x ) := 1/(3x^2 )

around the point c = 1 and give its radius of convergence.
Exercise 25.7.2. Prove Proposition 25.4.1.
Exercise 25.7.3. Now that the definitions of cos and sin are provided:

i. Prove using power series (i.e. without using Proposition 16.3.1) that
for all x ∈ R,

cos′ ( x ) = − sin( x ) and sin′ ( x ) = cos( x ).

ii. Show that for all x ∈ R,

sin^2 ( x ) + cos^2 ( x ) = 1.

Exercise 25.7.4. Determine, as a power series around 0, the solution to the
differential equation

x^2 f ′′( x ) − 2x f ′( x ) + (2 − x^2 ) f ( x ) = 0

with

f (0) = 0, f ′(0) = 1, f ′′(0) = 0,

and determine the radius of convergence of the power series. Hint: use
the Ansatz

f ( x ) := ∑_{k=0}^∞ ak x^k

and use the theorems about sums of power series, products of power
series and the identity theorem to determine the coefficients ak .
Chapter 26

Riemann integration in one


dimension

In this chapter, we will introduce a method to integrate functions. We will
define the Riemann integral, but we cannot define the Riemann integral of
all functions.
The main messages for this chapter are:

• Every continuous function f : [ a, b] → R is Riemann integrable.

• The fundamental theorem of calculus, mainly the part that when F :
[ a, b] → R and f : [ a, b] → R satisfy F ′ = f and f is bounded and
Riemann integrable, then

∫_a^b f ( x )dx = F (b) − F ( a).

The last statement gives a way to compute integrals in practice.

26.1 Riemann integrable functions and the Riemann integral


Definition 26.1.1. A partition P of an interval [ a, b] (with n intervals) is


a subset { x0 , x1 , . . . , xn } ⊂ [ a, b] such that a = x0 < x1 < · · · < xn = b.

Definition 26.1.2. Let f : [ a, b] → R be a bounded function and let
P = { x0 , x1 , . . . , xn } be a partition of [ a, b]. Then the upper sum of f
with respect to P is defined as

U ( P, f ) := ∑_{k=1}^n Mk ∆xk

where ∆xk := ( xk − xk−1 ) and

Mk := sup_{x ∈ [ xk−1 ,xk ]} f ( x ).

Similarly, we define the lower sum of f with respect to P as

L( P, f ) := ∑_{k=1}^n mk ∆xk

where

mk := inf_{x ∈ [ xk−1 ,xk ]} f ( x ).

Definition 26.1.3. Let P̃ be a partition of [ a, b] ⊂ R. A partition P is
called a refinement of P̃ if P̃ ⊂ P.
If P̃ and Q̃ are two partitions of [ a, b], then a partition P is called
a common refinement of P̃ and Q̃ if P is both a refinement of P̃ and a
refinement of Q̃.

Note that two partitions P̃ and Q̃ always have a common refinement P:


For P one could just take P̃ ∪ Q̃.

Proposition 26.1.4. For every bounded f : [ a, b] → R and every partition
P of [ a, b], we have

L( P, f ) ≤ U ( P, f ).

Proof. Let P := { x0 , x1 , . . . , xn } be a partition of [ a, b]. Then

L( P, f ) = ∑_{k=1}^n mk ∆xk ≤ ∑_{k=1}^n Mk ∆xk = U ( P, f )

because

mk = inf_{x ∈ [ xk−1 ,xk ]} f ( x ) ≤ sup_{x ∈ [ xk−1 ,xk ]} f ( x ) = Mk .

Definition 26.1.5. Let f : [ a, b] → R be a bounded function. We define
the upper Darboux integral of f as

∫̄_a^b f dx := inf{U ( P, f ) | P partition of [ a, b]}

and the lower Darboux integral of f as

∫̲_a^b f dx := sup{ L( P, f ) | P partition of [ a, b]}.

Proposition 26.1.6. Let f : [ a, b] → R be a bounded function. Then

∫̲_a^b f dx ≤ ∫̄_a^b f dx.

We are most interested in those functions for which the upper Darboux in-
tegral agrees with the lower Darboux integral. We will call those functions
Riemann integrable.

Definition 26.1.7 (Riemann integrability and the Riemann integral).
Let f : [ a, b] → R be a bounded function. We say f is Riemann integrable
if

∫̲_a^b f dx = ∫̄_a^b f dx.

In this case we say that the Riemann integral of f equals this common
value, i.e.

∫_a^b f dx := ∫̲_a^b f dx = ∫̄_a^b f dx.

Proposition 26.1.8 (Alternative characterization of Riemann integrability).
Let f : [ a, b] → R be bounded. Then f is Riemann integrable if
and only if

for all e > 0,
there exists a partition P of [ a, b],
U ( P, f ) − L( P, f ) < e.

Proof. First suppose that f is Riemann integrable. We need to show
that

for all e > 0,
there exists a partition P of [ a, b],
U ( P, f ) − L( P, f ) < e.

Let e > 0. There exists a partition P̃ of [ a, b] such that

U ( P̃, f ) < ∫̄_a^b f ( x )dx + e/2.

Moreover, there exists a partition Q̃ of [ a, b] such that

L( Q̃, f ) > ∫̲_a^b f ( x )dx − e/2.

Define the partition P of [ a, b] as P := P̃ ∪ Q̃. Then P is a common
refinement of P̃ and Q̃, i.e. it is both a refinement of P̃ and it is a
refinement of Q̃. Therefore,

∫̲_a^b f ( x )dx − e/2 < L( Q̃, f ) ≤ L( P, f ) ≤ U ( P, f ) ≤ U ( P̃, f ) < ∫̄_a^b f ( x )dx + e/2.

Since f is Riemann integrable, the upper and lower Darboux integrals
coincide, and it follows that

U ( P, f ) − L( P, f ) < e.

Now suppose that

for all e > 0,
there exists a partition P of [ a, b],
U ( P, f ) − L( P, f ) < e.

We need to show that f is Riemann integrable. Suppose not; then we
may set

e := ∫̄_a^b f ( x )dx − ∫̲_a^b f ( x )dx > 0.

Then by assumption there exists a partition P such that

U ( P, f ) − L( P, f ) < e.

But then

e = ∫̄_a^b f ( x )dx − ∫̲_a^b f ( x )dx ≤ U ( P, f ) − L( P, f ) < e,

a contradiction.

Definition 26.1.9. We denote the set of bounded, Riemann-integrable


functions f : [ a, b] → R by R[ a, b].
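To make the definitions above concrete, here is a small numerical sketch (function names are ours): for the increasing function f(x) = x² on [0, 1], the supremum and infimum on each cell of a uniform partition are attained at the endpoints, and U(P, f) − L(P, f) = 1/n shrinks as the partition refines.

```python
# Upper and lower Darboux sums of f(x) = x**2 on [0, 1] for uniform
# partitions.  Taking endpoint values for sup/inf is valid here because
# f is increasing on [0, 1].

def darboux_sums(f, a, b, n):
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    lower = sum(min(f(xs[i]), f(xs[i + 1])) * (xs[i + 1] - xs[i])
                for i in range(n))
    upper = sum(max(f(xs[i]), f(xs[i + 1])) * (xs[i + 1] - xs[i])
                for i in range(n))
    return lower, upper

for n in (10, 100, 1000):
    low, up = darboux_sums(lambda x: x * x, 0.0, 1.0, n)
    print(n, low, up, up - low)
```

Both sums bracket the integral 1/3, and the gap between them behaves like 1/n, in line with Proposition 26.1.8.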

26.2 Sums, products of Riemann integrable functions
Proposition 26.2.1 (R[ a, b] is a vector space). Let f , g : [ a, b] → R be
two bounded and Riemann-integrable functions. Then

i. The function f + g is Riemann integrable and

∫_a^b ( f ( x ) + g( x ))dx = ∫_a^b f ( x )dx + ∫_a^b g( x )dx.

ii. For every λ ∈ R, the function λ f is Riemann integrable and

∫_a^b (λ f ( x ))dx = λ ∫_a^b f ( x )dx.

Definition 26.2.2. If f : [ a, b] → R is bounded and Riemann integrable
on [ a, b], then we define

∫_b^a f ( x )dx := − ∫_a^b f ( x )dx.

Proposition 26.2.3 (Further properties of the Riemann integral). Let
f , g : [ a, b] → R be two bounded and Riemann integrable functions.

i. We have

∫_a^b 1 dx = b − a.

ii. Monotonicity: if for all x ∈ [ a, b] it holds that f ( x ) ≤ g( x ), then

∫_a^b f ( x )dx ≤ ∫_a^b g( x )dx.

iii. Restriction: If z ∈ ( a, b), then f is Riemann integrable on [ a, z]
and on [z, b] and

∫_a^b f ( x )dx = ∫_a^z f ( x )dx + ∫_z^b f ( x )dx.

iv. Triangle inequality: The function | f | is Riemann integrable on
[ a, b] and we have the following version of the triangle inequality:

| ∫_a^b f ( x )dx | ≤ ∫_a^b | f ( x )|dx.

v. The function f g is Riemann integrable on [ a, b].

26.3 Continuous functions are Riemann integrable

Proposition 26.3.1. Let f : [ a, b] → R be continuous. Then f is Riemann
integrable.

Proof. We will use the alternative characterization of Riemann integrability,
namely Proposition 26.1.8. We therefore need to show that

for all e > 0,
there exists a partition P of [ a, b],
U ( P, f ) − L( P, f ) < e.

Let e > 0.
Because the closed interval [ a, b] is compact, the function f is uniformly
continuous. Therefore, there exists a δ > 0 such that for all
x, y ∈ [ a, b], if 0 < | x − y| < δ then

| f ( x ) − f (y)| < e / (2(b − a)).

Now choose an N ∈ N such that N > (b − a)/δ. We define the partition

P := { x0 , x1 , . . . , x N }

by

xi := a + i (b − a)/N.

Then every ∆xk = (b − a)/N < δ, and

U ( P, f ) − L( P, f ) = ∑_{k=1}^N Mk ∆xk − ∑_{k=1}^N mk ∆xk = ∑_{k=1}^N ( Mk − mk )∆xk

with

Mk = sup_{x ∈ [ xk−1 ,xk ]} f ( x ) and mk = inf_{x ∈ [ xk−1 ,xk ]} f ( x ).

Therefore, for all k ∈ {1, . . . , N },

0 ≤ Mk − mk ≤ e / (2(b − a)).

We find that

U ( P, f ) − L( P, f ) ≤ ∑_{k=1}^N | Mk − mk | ∆xk
≤ (e / (2(b − a))) ∑_{k=1}^N ∆xk
= (e / (2(b − a))) (b − a)
< e.

26.4 Fundamental theorem of calculus


Theorem 26.4.1 (Fundamental theorem of calculus).

i. Let f : [ a, b] → R be continuous. Then the function F : [ a, b] → R
given by

F ( x ) := ∫_a^x f (s)ds

is differentiable on ( a, b) and for all x ∈ ( a, b),

F ′ ( x ) = f ( x ).

ii. Let F : [ a, b] → R be an anti-derivative of a function f : [ a, b] → R,
i.e. for all x ∈ ( a, b),

F ′ ( x ) = f ( x ),

and suppose that f is bounded and Riemann integrable on [ a, b].
Then

∫_a^b f ( x )dx = F (b) − F ( a).

Proof. We will first show part (i).
We will show that the function F is differentiable on ( a, b) and satisfies
for all c ∈ ( a, b),

F ′ ( c ) = f ( c ).

To do so, we define Errc : ( a, b) → R by

Errc ( x ) := F ( x ) − ( F (c) + f (c)( x − c))

and we need to show that

lim_{x→c} Errc ( x ) / | x − c | = 0.

Let e > 0. Because f is continuous in c, there exists a δ > 0 such that
for all x ∈ ( a, b), it holds that if 0 < | x − c| < δ then

| f ( x ) − f (c)| < e/2.

Choose such a δ.
Let x ∈ ( a, b) and assume 0 < | x − c| < δ. We have that

|Errc ( x )| = | F ( x ) − F (c) − f (c)( x − c)|
= | ∫_c^x f (s)ds − f (c)( x − c) |
= | ∫_c^x ( f (s) − f (c))ds |
≤ | ∫_c^x | f (s) − f (c)|ds |
≤ (e/2) | x − c| .

Therefore

|Errc ( x )| / | x − c| < e.

We will now show part (ii) of the theorem.
Let F : [ a, b] → R be an anti-derivative of a function f : [ a, b] → R and
assume that f is bounded and Riemann integrable.
We need to show that

∫_a^b f ( x )dx = F (b) − F ( a).

Let P := { x0 , x1 , . . . , xn } be a partition of [ a, b]. We can write F (b) − F ( a)
as a telescoping sum

F (b) − F ( a) = ∑_{k=1}^n ( F ( xk ) − F ( xk−1 ) )

and then use the Mean-Value Theorem to conclude that there are points
ck ∈ ( xk−1 , xk ) such that

F (b) − F ( a) = ∑_{k=1}^n ( F ( xk ) − F ( xk−1 ) ) = ∑_{k=1}^n f (ck )( xk − xk−1 ).

We now use the definition of

Mk := sup_{x ∈ [ xk−1 ,xk ]} f ( x )

to estimate

F (b) − F ( a) = ∑_{k=1}^n f (ck )( xk − xk−1 ) ≤ ∑_{k=1}^n Mk ∆xk = U ( P, f ).

Since this holds for every partition P, it follows that

F (b) − F ( a) ≤ ∫̄_a^b f ( x )dx.

Similarly, we can prove that

∫̲_a^b f ( x )dx ≤ F (b) − F ( a).

Therefore,

∫̲_a^b f ( x )dx ≤ F (b) − F ( a) ≤ ∫̄_a^b f ( x )dx.

Because f is integrable, the left- and right-hand side of this inequality
have the same value, so that we can conclude that

F (b) − F ( a) = ∫_a^b f ( x )dx.
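As a hedged numerical cross-check of part (ii) (names are ours; the midpoint rule is used purely as an approximation device, not as part of the theory above): Riemann sums for f = cos over [0, b] should approach F(b) − F(0) = sin(b).

```python
import math

# Midpoint Riemann sums of cos over [0, b]; by the fundamental theorem
# of calculus the integral equals sin(b) - sin(0) = sin(b).

def midpoint_sum(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

b = 1.2
approx = midpoint_sum(math.cos, 0.0, b, 10_000)
print(approx, math.sin(b))
```

With 10,000 subintervals the two printed values agree to well below 1e-6.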

26.5 Exercises
Exercise 26.5.1. Let a, b and c be three real numbers, with a < b < c and let
g : [ a, b] → R and h : [b, c] → R be two bounded and Riemann-integrable
functions. Let f : [ a, c] → R be such that for all x ∈ ( a, b)

f ( x ) = g( x )

and for all x ∈ (b, c),


f ( x ) = h ( x ).
Show that f is Riemann integrable. (Note that the values of f in the points
a, b and c are not specified.)

Exercise 26.5.2. Consider the function f : [0, 1] → R defined by

f ( x ) := 2 + x^2 if x ∈ R \ Q, and f ( x ) := −3 if x ∈ Q.

i. Let P = { x0 , x1 , . . . , xn } be a partition of [0, 1]. Show that

U ( P, f ) − L( P, f ) ≥ 5.

ii. Show that f is not Riemann integrable.

Exercise 26.5.3. Compute the following integral, carefully quoting the theorems
that you use in your computation:

∫_{−1}^{√3} 1/(1 + x^2 ) dx.

Exercise 26.5.4. Let a, b and c be three real numbers, with a < b < c. As-
sume f : [ a, c] → R is bounded, and assume f is Riemann integrable on
[ a, b] and Riemann integrable on [b, c].
Prove that f is Riemann integrable on [ a, c].

Exercise 26.5.5. Suppose f : [ a, b] → R and g : [ a, b] → R are bounded.
Assume there exists an n ∈ N and that there are points y1 , . . . , yn in [ a, b]
such that for all x ∈ [ a, b] \ {y1 , . . . , yn },

f ( x ) = g ( x ).

Show that f is Riemann integrable if and only if g is Riemann integrable,
and that in that case

∫_a^b f ( x )dx = ∫_a^b g( x )dx.

Hint: First show that for all y ∈ [ a, b], the function

1y : [ a, b] → R

defined as

1y ( x ) := 1 if x = y, and 1y ( x ) := 0 if x ≠ y,

is Riemann integrable.
Chapter 27

Riemann integration in multiple


dimensions

In this chapter we define Riemann integration for functions defined on Rd .

27.1 Partitions in multiple dimensions

Definition 27.1.1. By a closed rectangle in Rd we mean a set R of the


form
R = [ a1 , b1 ] × · · · × [ ad , bd ]

Definition 27.1.2 (Partition of a rectangle). Let

R = [ a1 , b1 ] × · · · × [ ad , bd ]

be a closed rectangle. By a partition Q of R we mean a Cartesian product

Q = P1 × · · · × Pd

where for i ∈ {1, . . . , d}, the partition Pi = { x0^i , x1^i , . . . , xni^i } is a partition
of [ ai , bi ].


27.2 Riemann integral on rectangles in Rd

Riemann integrability of functions defined on a rectangle R ⊂ Rd is defined
completely analogously to the Riemann integral for functions defined
on an interval [ a, b] ⊂ R.

Definition 27.2.1. Let R ⊂ Rd be a closed rectangle,

R = [ a1 , b1 ] × [ a2 , b2 ] × · · · × [ ad , bd ],

let f : R → R be a bounded function and let

Q = P1 × · · · × Pd

be a partition of R, where for every j ∈ {1, · · · , d},

Pj = { x0^j , x1^j , . . . , xnj^j }

is a partition of [ aj , bj ].
Then the upper sum of f with respect to Q is defined as

U ( Q, f ) := ∑_{k1=1}^{n1} · · · ∑_{kd=1}^{nd} Mk1 ,··· ,kd ∆xk1^1 · · · ∆xkd^d

where ∆xk^j := ( xk^j − xk−1^j ) and

Mk1 ,...,kd := sup_{x ∈ [ xk1−1^1 ,xk1^1 ]×···×[ xkd−1^d ,xkd^d ]} f ( x ).

Similarly, we define the lower sum of f with respect to Q as

L( Q, f ) := ∑_{k1=1}^{n1} · · · ∑_{kd=1}^{nd} mk1 ,··· ,kd ∆xk1^1 · · · ∆xkd^d

where

mk1 ,...,kd := inf_{x ∈ [ xk1−1^1 ,xk1^1 ]×···×[ xkd−1^d ,xkd^d ]} f ( x ).

Definition 27.2.2. Let R ⊂ Rd be a closed rectangle, and let f : R → R
be a bounded function. We define the upper Darboux integral of f as

∫̄_R f dx := inf{U ( P, f ) | P partition of R}

and the lower Darboux integral of f as

∫̲_R f dx := sup{ L( P, f ) | P partition of R}.

Definition 27.2.3. Let R ⊂ Rd be a closed rectangle and let f : R → R
be a bounded function. We say f is Riemann integrable if

∫̲_R f dx = ∫̄_R f dx.

In this case we say that the Riemann integral of f equals this common
value, i.e.

∫_R f dx := ∫̲_R f dx = ∫̄_R f dx.

Proposition 27.2.4 (Alternative characterization of Riemann integrability).
Let R ⊂ Rd be a closed rectangle and let f : R → R be
bounded. Then f is Riemann integrable if and only if

for all e > 0,
there exists a partition P of R,
U ( P, f ) − L( P, f ) < e.

27.3 Properties of the multi-dimensional Riemann integral

Just as in the one-dimensional case, the set of all Riemann integrable functions
on a rectangle R forms a vector space, and the integral is linear.

Proposition 27.3.1. Let R be a closed rectangle and let f , g : R → R be
bounded and Riemann integrable on R. Then

i. the function f + g is Riemann integrable on R and

∫_R ( f ( x ) + g( x ))dx = ∫_R f ( x )dx + ∫_R g( x )dx;

ii. for all λ ∈ R, the function λ f is Riemann integrable on R and

∫_R λ f ( x )dx = λ ∫_R f ( x )dx.

Proposition 27.3.2. Let R = [ a1 , b1 ] × · · · × [ ad , bd ] be a closed rectangle


in Rd and let f , g : R → R be bounded and Riemann integrable on R.
Then

i. (volume)
Z
1dx = (b1 − a1 )(b2 − a2 ) · · · (bd − ad ) =: Vol( R)
R

ii. (monotonicity) if for all x ∈ R, f ( x ) ≤ g( x ) then


Z Z
f ( x )dx ≤ g( x )dx
R R

iii. (triangle inequality) the function | f | is Riemann integrable on R


and Z Z
f ( x )dx ≤ | f ( x )|dx.
R R
CHAPTER 27. RIEMANN INTEGRATION IN MULTIPLE DIMENSIONS336

iv. (additivity of domain) if Q is a closed rectangle contained in R, then f is integrable on Q. Moreover, if Q1 , . . . , Q N are finitely many closed rectangles such that

• their interiors are disjoint, i.e. int Qi ∩ int Q j = ∅ if i ≠ j, and
• the union of the Qi ’s equals R, i.e.
\[
\bigcup_{i=1}^{N} Q_i = R ,
\]

then
\[
\int_R f(x) \, dx = \sum_{i=1}^{N} \int_{Q_i} f(x) \, dx .
\]
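Additivity of domain can be checked numerically in a small example of my own choosing (not from the notes): split R = [0, 1]² into Q1 = [0, 1/2] × [0, 1] and Q2 = [1/2, 1] × [0, 1] and integrate the affine function f(x, y) = x + y, for which midpoint Riemann sums are exact.

```python
# Illustrative check of additivity of domain (Proposition 27.3.2 iv).
# Midpoint Riemann sums are exact for the affine function f(x, y) = x + y.

def midpoint_integral(f, a1, b1, a2, b2, n=100):
    """Midpoint Riemann sum of f over [a1, b1] x [a2, b2] on an n-by-n grid."""
    h1 = (b1 - a1) / n
    h2 = (b2 - a2) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = a1 + (i + 0.5) * h1
            y = a2 + (j + 0.5) * h2
            total += f(x, y) * h1 * h2
    return total

f = lambda x, y: x + y
whole = midpoint_integral(f, 0.0, 1.0, 0.0, 1.0)       # integral over R
parts = (midpoint_integral(f, 0.0, 0.5, 0.0, 1.0)      # integral over Q1
         + midpoint_integral(f, 0.5, 1.0, 0.0, 1.0))   # integral over Q2
```

The integral over R equals 1, and the two pieces contribute 3/8 and 5/8 respectively, in accordance with the proposition.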

27.4 Continuous functions are Riemann integrable


Just as with integration in one dimension, if the function f is continuous
on R, then it is Riemann integrable on R.

Theorem 27.4.1. Let R ⊂ Rd be a closed rectangle, and let f : R → R


be a continuous function. Then f is bounded and Riemann integrable
on R.

The proof is practically identical to the proof in the one-dimensional case.

27.5 Fubini’s theorem


To effectively compute the value of integrals, we need some more tools. In
fact, if possible, we would like to use the fundamental theorem of calculus
in the computation. That theorem is however a statement about integrals
in one dimension. We therefore need a way to use one-dimensional in-
tegrals in the computation of multi-dimensional integrals. The following
theorem, called Fubini’s theorem, provides such a way.

Theorem 27.5.1 (Fubini). Let R = A × B be a closed rectangle in Rd+m , where A ⊂ Rd and B ⊂ Rm are closed rectangles. Let f : R → R be bounded and Riemann integrable on R, and suppose for every x ∈ A the function h x : B → R defined by
\[
h_x(y) := f(x, y)
\]
is Riemann integrable. Then the function F : A → R given by
\[
F(x) := \int_B f(x, y) \, dy
\]
is Riemann integrable and
\[
\int_R f(z) \, dz = \int_A \Big( \int_B f(x, y) \, dy \Big) \, dx .
\]
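The mechanism of Fubini's theorem can be seen in a small numerical sketch (my own example, not from the notes): for f(x, y) = x·y on [0, 1]², the iterated integrals in either order agree, and both equal 1/4.

```python
# Illustrative check of Fubini's theorem for f(x, y) = x * y on R = [0, 1]^2:
# integrating first over y and then over x gives the same value as the
# other order. Midpoint sums are used; for this f they give exactly 1/4.

def midpoint_1d(g, a, b, n=200):
    """Midpoint Riemann sum of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x, y: x * y

# F(x) = integral of f(x, y) dy over B = [0, 1], then integrate F over A.
F = lambda x: midpoint_1d(lambda y: f(x, y), 0.0, 1.0)
iterated_xy = midpoint_1d(F, 0.0, 1.0)

# The other order, integrating over x first.
G = lambda y: midpoint_1d(lambda x: f(x, y), 0.0, 1.0)
iterated_yx = midpoint_1d(G, 0.0, 1.0)
```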

27.6 The (topological) boundary of a set


The topological boundary of a set in a metric space is defined as those
points that are neither in the interior of the set, nor in the interior of the
complement of the set. For subsets of (Rd , ‖ · ‖2 ), this comes down to
the following definition.

Definition 27.6.1 (Topological boundary). Let E be a subset of the normed vector space (Rd , ‖ · ‖). The boundary of E is defined as
\[
\partial E := \mathbb{R}^d \setminus \big( (\operatorname{int} E) \cup \operatorname{int}(\mathbb{R}^d \setminus E) \big) .
\]

27.7 Jordan content

Definition 27.7.1 (Volume of a rectangle). Let
\[
R = [a_1, b_1] \times \cdots \times [a_d, b_d] \subset \mathbb{R}^d
\]
be a rectangle. Then the volume of R is defined as
\[
\mathrm{Vol}(R) := (b_1 - a_1)(b_2 - a_2) \cdots (b_d - a_d) .
\]

Definition 27.7.2. We say that a closed rectangle R ⊂ Rd is a cube if all


sides have the same length, i.e. if

R = [ a1 , b1 ] × · · · × [ ad , bd ]

then for all i, j ∈ {1, . . . , d},

bi − a i = b j − a j .

Definition 27.7.3. We say that a subset S ⊂ Rd has Jordan content zero if

for all e > 0,
there exists N ∈ N,
there exist rectangles R1 , . . . , R N ,
\[
S \subset \bigcup_{i=1}^{N} R_i \quad \text{and} \quad \sum_{i=1}^{N} \mathrm{Vol}\, R_i < e .
\]
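A standard example of a set with Jordan content zero is the graph of a nice function. The sketch below (my own illustration, not from the notes) covers the graph of x ↦ x² over [0, 1] by N rectangles, the i-th having width 1/N and height equal to the oscillation of x² on [i/N, (i+1)/N]; the total volume telescopes to exactly 1/N, which can be made smaller than any e > 0.

```python
# Illustrative check that the graph of x -> x^2 over [0, 1] has Jordan
# content zero. Since x^2 is increasing on [0, 1], its oscillation on
# [i/N, (i+1)/N] is ((i+1)^2 - i^2) / N^2, and the cover volume is 1/N.

def cover_volume(N):
    """Total volume of the N-rectangle cover of the graph of x^2."""
    total = 0.0
    for i in range(N):
        width = 1.0 / N
        height = ((i + 1) ** 2 - i ** 2) / N ** 2  # oscillation on piece i
        total += width * height
    return total

volumes = [cover_volume(N) for N in (10, 100, 1000)]
```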

Lemma 27.7.4. Suppose a set S ⊂ Rd has Jordan content zero. Then for all e > 0, there exist an M ∈ N and cubes Q1 , . . . , Q M such that
\[
S \subset \bigcup_{i=1}^{M} Q_i \quad \text{and} \quad \sum_{i=1}^{M} \mathrm{Vol}\, Q_i < e .
\]

Proposition 27.7.5. Let S ⊂ Rd be a subset with Jordan content zero


and let F : S → Rd be Lipschitz. Then F (S) has Jordan content zero.

Proposition 27.7.6. Let E be a bounded subset of Rd , and let F : E → Rd+m be Lipschitz, where m ≥ 1. Then F ( E), as a subset of Rd+m , has Jordan content zero.

27.8 Integration over general domains

Definition 27.8.1 (Integration over bounded subsets). Let E be a bounded subset of Rd . We say that a function f : E → R is integrable on E if, with R some rectangle in Rd containing E, the function f E : R → R defined by
\[
f_E(x) :=
\begin{cases}
f(x) & \text{if } x \in E, \\
0 & \text{if } x \notin E
\end{cases}
\]
is integrable on R. Moreover, we define
\[
\int_E f(x) \, dx := \int_R f_E(x) \, dx .
\]

The above definition is actually independent of the choice of rectangle R that contains E.
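The extension by zero can be illustrated numerically (the triangle and the grid are my own choices, not part of the notes): integrating f = 1 over the triangle E = {(x, y) ∈ [0, 1]² : x + y ≤ 1} by summing f_E over a uniform grid on R = [0, 1]² approaches Vol(E) = 1/2, with the error coming only from the grid cells that meet the boundary of E.

```python
# Illustrative computation for Definition 27.8.1: integrate f = 1 over the
# triangle E = {(x, y) in [0, 1]^2 : x + y <= 1} by extending f by zero to
# the rectangle R = [0, 1]^2 and summing over a uniform grid.

def integral_over_E(n):
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = (i + 0.5) * h
            y = (j + 0.5) * h
            f_E = 1.0 if x + y <= 1.0 else 0.0  # the extension f_E of f by zero
            total += f_E * h * h
    return total

approx = integral_over_E(200)
```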

Definition 27.8.2. Let E ⊂ Rd . We say that E is a Jordan set if the


topological boundary ∂E of E has Jordan content zero.

Proposition 27.8.3. Let R ⊂ Rd be a closed rectangle, let E ⊂ R, and assume that E is a Jordan set. Let f be a bounded and Riemann integrable function on R. Then f is integrable on E.

27.9 The volume of bounded sets



Definition 27.9.1 (Characteristic function of a set). Let E ⊂ Rd . The characteristic function of E is the function 1 E : Rd → R given by
\[
1_E(x) :=
\begin{cases}
1 & \text{if } x \in E, \\
0 & \text{if } x \notin E.
\end{cases}
\]

Definition 27.9.2. Let E be a bounded set such that the characteristic function 1 E : Rd → R is Riemann integrable. Then the volume of E is defined as
\[
\mathrm{Vol}(E) := \int_E 1 \, dx .
\]
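The sketch below (my own illustration; the grid construction is not from the notes) makes this definition concrete: it approximates the area of the unit disk by summing its characteristic function over a uniform grid on [−1, 1]². Since the boundary circle has Jordan content zero, the Riemann sums converge to Vol(E) = π.

```python
import math

# Illustrative use of Definitions 27.9.1 and 27.9.2: approximate the area of
# the unit disk E = {x^2 + y^2 <= 1} by summing 1_E over a uniform grid on
# the rectangle R = [-1, 1]^2.

def disk_area(n):
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = -1.0 + (i + 0.5) * h
            y = -1.0 + (j + 0.5) * h
            if x * x + y * y <= 1.0:  # 1_E evaluated at the cell midpoint
                total += h * h        # each cell contributes 1_E * Vol(cell)
    return total

approx = disk_area(500)
```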

27.10 Exercises
Exercise 27.10.1. Let S1 , . . . , Sm ⊂ Rd have Jordan content zero. Show that the union
\[
\bigcup_{i=1}^{m} S_i
\]
also has Jordan content zero.

Exercise 27.10.2. Show that the unit ball B(0, 1) in R2 is a Jordan set. Hint: make a parametrization of the boundary of the unit ball.

Exercise 27.10.3. Let E ⊂ R2 be the subset of [0, 1]2 above the curve y = x4 ,
and to the right of the curve x = sin(yπ )3 . Show that E is a Jordan set.

Exercise 27.10.4. Let E ⊂ R2 be the subset of [0, 1]2 above the curve y = x4 , and to the right of the curve y = x1/3 . Show that E is a Jordan set.
Chapter 28

Change-of-variables Theorem

28.1 The Change-of-variables Theorem

Theorem 28.1.1 (Change-of-variables theorem). Let Ω ⊂ Rd be open. Let E ⊂ Ω be a Jordan set such that also its closure Ē ⊂ Ω. Let Φ : Ω → Rd be continuously differentiable and injective. Assume that also the inverse function Φ−1 is differentiable.
Assume that f : Φ( E) → R is integrable on Φ( E) and assume that f ◦ Φ is integrable on E.
Then
\[
\int_{\Phi(E)} f(x) \, dx = \int_E f(\Phi(y)) \, \big| \det( [D\Phi]_y ) \big| \, dy .
\]
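The theorem can be verified in a small linear example of my own choosing (not from the notes): Φ(u, v) = (2u + v, v) on E = (0, 1)², with constant Jacobian determinant det [DΦ] = 2, and f(x₁, x₂) = x₁. The image Φ(E) is the parallelogram with vertices (0,0), (2,0), (3,1), (1,1), over which the integral of x₁ equals area times x₁-centroid, i.e. 2 · (3/2) = 3.

```python
# Illustrative numerical check of the change-of-variables theorem for the
# linear map Phi(u, v) = (2u + v, v) and f(x1, x2) = x1. The right-hand side
# below should match the direct integral of x1 over the parallelogram
# Phi((0, 1)^2), which equals 3.

def transformed_integral(n=100):
    """Midpoint sum of f(Phi(u, v)) * |det [DPhi]| over E = (0, 1)^2."""
    h = 1.0 / n
    det = 2.0  # |det [DPhi]|, constant because Phi is linear
    total = 0.0
    for i in range(n):
        for j in range(n):
            u = (i + 0.5) * h
            v = (j + 0.5) * h
            f_of_phi = 2.0 * u + v  # first coordinate of Phi(u, v)
            total += f_of_phi * det * h * h
    return total

value = transformed_integral()
```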

We will review in this chapter a few standard transformations, and therefore standard applications of the change-of-variables theorem. These involve transformations from and to polar, cylindrical and spherical coordinates. It is important to know these transformations, and the corresponding determinants of the Jacobians, by heart.


28.2 Polar coordinates


The transformation is given by the function

Φpol : (0, ∞) × (0, 2π ) → R2

defined by
Φpol (r, φ) := (r cos φ, r sin φ)

Here,
det [ DΦpol ](r,φ) = r

In many situations in which one would like to change to polar coordinates to compute an integral, the Change-of-variables Theorem does not directly apply. The transformation from polar to Cartesian coordinates is so nice, however, that one can obtain a statement that can be applied more conveniently.
A subset E ⊂ (0, ∞) × (0, 2π ) is a Jordan set if and only if Φpol ( E) is a Jordan set. Moreover, if one of these holds, a function f : R2 → R is Riemann integrable on Φpol ( E) if and only if f ◦ Φpol is Riemann integrable on E and
\[
\int_{\Phi_{\mathrm{pol}}(E)} f(x) \, dx = \int_E f(\Phi_{\mathrm{pol}}(r, \varphi)) \, r \, dr \, d\varphi .
\]
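As an illustrative application (my own example, not from the notes), the area of the unit disk can be computed on the polar side: taking E = (0, 1) × (0, 2π) and f = 1, the right-hand side is the integral of r dr dφ over E, which equals π.

```python
import math

# Illustrative application of the polar-coordinates formula: the area of the
# unit disk as the integral of r dr dphi over E = (0, 1) x (0, 2*pi).

def disk_area_polar(n=200):
    hr = 1.0 / n
    hphi = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * hr
        for j in range(n):
            total += r * hr * hphi  # integrand r times the cell volume
    return total

area = disk_area_polar()
```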

28.3 Cylindrical coordinates


The transformation is given by the function

Φcyl : (0, ∞) × (0, 2π ) × R → R3

defined by
Φcyl (r, φ, z) := (r cos φ, r sin φ, z)

Here,
det [ DΦcyl ](r,φ,z) = r

A subset E ⊂ (0, ∞) × (0, 2π ) × R is a Jordan set if and only if Φcyl ( E) is a Jordan set. Moreover, if one of these holds, a function f : R3 → R is Riemann integrable on Φcyl ( E) if and only if f ◦ Φcyl is Riemann integrable on E and
\[
\int_{\Phi_{\mathrm{cyl}}(E)} f(x) \, dx = \int_E f(\Phi_{\mathrm{cyl}}(r, \varphi, z)) \, r \, dr \, d\varphi \, dz .
\]
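As an illustrative application (an example of my own, not from the notes), consider the solid cone {(x, y, z) : x² + y² < z², 0 < z < 1}. Under Φcyl this region corresponds to 0 < r < z, so its volume is the integral of r dr dφ dz, which equals π/3.

```python
import math

# Illustrative application of cylindrical coordinates: the volume of the
# solid cone {x^2 + y^2 < z^2, 0 < z < 1}, computed as the integral of
# r dr dphi dz over the region 0 < r < z, 0 < phi < 2*pi, 0 < z < 1.

def cone_volume(n=200):
    hz = 1.0 / n
    total = 0.0
    for k in range(n):
        z = (k + 0.5) * hz
        hr = z / n
        # inner integral of r dr over (0, z): the midpoint sum gives z^2 / 2
        inner = sum((i + 0.5) * hr for i in range(n)) * hr
        # the phi-integral of a constant over (0, 2*pi) contributes 2*pi
        total += inner * 2.0 * math.pi * hz
    return total

volume = cone_volume()
```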

28.4 Spherical coordinates


The transformation is given by the function
Φsph : (0, ∞) × (0, 2π ) × (0, π ) → R3
given by
Φsph (ρ, φ, θ ) := (ρ cos φ sin θ, ρ sin φ sin θ, ρ cos θ )

Here,
det [ DΦsph ](ρ,φ,θ ) = ρ2 sin θ.

A subset E ⊂ (0, ∞) × (0, 2π ) × (0, π ) is a Jordan set if and only if Φsph ( E) is a Jordan set. Moreover, if one of these holds, a function f : R3 → R is Riemann integrable on Φsph ( E) if and only if f ◦ Φsph is Riemann integrable on E and
\[
\int_{\Phi_{\mathrm{sph}}(E)} f(x) \, dx = \int_E f(\Phi_{\mathrm{sph}}(\rho, \varphi, \theta)) \, \rho^2 \sin\theta \, d\rho \, d\varphi \, d\theta .
\]
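As an illustrative application (my own example, not from the notes), the volume of the unit ball follows directly: with E = (0, 1) × (0, 2π) × (0, π) and f = 1, the integrand ρ² sin θ separates into a product of one-dimensional integrals, giving 4π/3.

```python
import math

# Illustrative application of spherical coordinates: the volume of the unit
# ball as the product of the one-dimensional integrals of rho^2, 1, and
# sin(theta) over (0, 1), (0, 2*pi) and (0, pi) respectively.

def midpoint_1d(g, a, b, n=1000):
    """Midpoint Riemann sum of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

volume = (midpoint_1d(lambda rho: rho ** 2, 0.0, 1.0)         # approx 1/3
          * midpoint_1d(lambda phi: 1.0, 0.0, 2.0 * math.pi)  # exactly 2*pi
          * midpoint_1d(math.sin, 0.0, math.pi))              # approx 2
```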

28.5 Exercises
Exercise 28.5.1. Determine
\[
\int_0^{\pi/2} \int_0^{y} \frac{\sin(y)}{\sqrt{4 - \sin^2(x)}} \, dx \, dy .
\]

Exercise 28.5.2. Let K be the following subset of R3 :
\[
K := \{ (x, y, z) \in \mathbb{R}^3 \mid 0 \leq z \leq 1,\ x^2 + y^2 < z \} .
\]
Determine
\[
\int_K \exp(-(x^2 + y^2)) \, dx \, dy \, dz .
\]

Exercise 28.5.3. Let K be the subset of those points in R3 that are inside
the ball around the origin of radius 4 but outside the cylinder around the
z-axis of radius 1, i.e.

K := B(0, 4) \ {( x, y, z) ∈ R3 | x2 + y2 < 1}

Determine the volume of K.

Exercise 28.5.4. The center of mass of a Jordan set K ⊂ Rd is defined as
\[
\mathrm{c.m.}(K) := \frac{1}{\mathrm{Vol}(K)} \left( \int_K x_1 \, dx,\ \int_K x_2 \, dx,\ \cdots,\ \int_K x_d \, dx \right) .
\]

Let E ⊂ Rd be a Jordan set. Let Φ : Rd → Rd be linear and invertible.


Show that the center of mass of Φ( E) is equal to

c.m.(Φ( E)) = Φ(c.m.( E)).

Exercise 28.5.5. i. Let e1 , . . . , ed be the standard basis in Rd . Consider the simplex
\[
S := \{ x \in \mathbb{R}^d \mid x_1 + \cdots + x_d \leq 1 \text{ and for all } i \in \{1, \ldots, d\},\ 0 \leq x_i \leq 1 \} .
\]
Compute the volume of S.

ii. Now let v1 , . . . , vd be a basis of Rd . Let M denote the d × d matrix whose column vectors are v1 , . . . , vd . Let the map ι : Rd → Rd be given by
\[
\iota(x) = x_1 v_1 + \cdots + x_d v_d .
\]
Show that
\[
\mathrm{Vol}(\iota(S)) = \frac{1}{d!} \, | \det M | .
\]
Appendix A

Best practices

i. Always start by writing down what is given and what you need to
show.

ii. To directly prove a statement

for all a ∈ A,
(. . . )

you first introduce a ∈ A by writing

Let a ∈ A.

and then you continue to prove (. . . ).

iii. To directly prove a statement

there exists a ∈ A,
(. . . )

you make a choice for a and write


Choose a := . . .

and then you continue to prove (. . . ).

iv. If you need to show a statement of the form

if A then B

start with writing

Assume A.

and continue showing B.

v. At the end of a proof in analysis, you often need to show an (in)equality.


You can do so by chaining several (in)equalities together, of course
making sure that they all hold.

vi. (Backwards reasoning.) Sometimes you need to show a statement B,


but it would follow directly from another statement A. In that case
you could write

It suffices to show A.

and continue showing A.

vii. (Forward reasoning.) Perhaps more often than backwards reasoning


you would like to apply forward reasoning. You usually do so in the
middle of a proof, stating a new fact that can be derived from your
earlier conclusions.

It holds that A.

If you want to use a theorem (or lemma, proposition etc.), first ex-
plicitly check all the assumptions, then afterwards you can use the
conclusion of the theorem. A template is:

... check conditions of theorem ...


Therefore, by Theorem (insert reference to theorem)
it holds that (theorem conclusion).

viii. If you know that a for-all statement such as

for all a ∈ A,
(. . . )    (A.0.1)

holds, you can use it as follows

Choose a := . . . in ( A.0.1). Then (. . . ).

ix. If you know that a there-exists statement such as

there exists a ∈ A,
(. . . )    (A.0.2)

holds, you can use it as follows

Obtain an a ∈ A according to ( A.0.2)

or as

Obtain an a ∈ A such that (. . . ) according to ( A.0.2)

or just

Obtain such an a ∈ A.

x. To prove a statement

(. . . )

by contradiction, you can use the following template:

We argue by contradiction. Suppose ¬(. . . ).


. . . derivation that leads to a contradiction . . .
Contradiction. We conclude that (. . . ) holds.

xi. Sometimes you need to make a case distinction: you might for in-
stance want to argue differently if a real number is strictly negative
or positive. A template for a case distinction is as follows.

Case A.
. . . proof in Case A. . .
Case B.
. . . proof in Case B. . .
Case C.
. . . proof in Case C. . .
etc. . .

Make sure to really cover all possible cases.

xii. You can use natural induction to show a statement


for all n ∈ N,
P(n)

where P(n) is a statement depending on n. The template is as fol-


lows:

We use induction on n ∈ N.
We first show the base case, i.e. that P(0) holds.
... insert here a proof of P(0) ...
We now show the induction step.
Let k ∈ N and assume that P(k ) holds.
We need to show that P(k + 1) holds.
... insert here a proof of P(k + 1) ...

xiii. Especially when constructing subsequences, we will often need in-


ductive definitions of sequences. If X is a set, we might want to in-
ductively define a sequence f : N → X. We can use the following
template for this.

We will inductively define a sequence f : N → X.


. . . possible auxiliary derivations . . .
Define f (0) := . . .

Let k ∈ N and assume f (0), . . . , f (k ) are defined .


. . . possible auxiliary derivations . . .
Define f (k + 1) := . . .

xiv. Make sure that every variable that you are using is defined. In par-
ticular:

• After writing a sentence:

for all e > 0, ...

the variable e is not defined, and you cannot refer to it.


• After writing the sentence

there exists N ∈ N, ...



the variable N is not defined, and you cannot refer to it. To use
it in the rest of a proof, you can follow up with

Choose such an N ∈ N.

See also item (ix) of this best-practices list.

xv. Beware that ( A =⇒ B) means “if A then B”. Importantly, it does not mean “A holds therefore B holds”. Because this goes wrong very often, I recommend using

if . . . then . . .

in your proof rather than implication symbols ⇒ and ⇔.

xvi. If the statement that you need to show is an “if and only if” state-
ment, show the “if” and “only if” statements separately.

xvii. Indicate whether a statement that you write down is a statement


you want to show, or whether it is a statement that you assume, or
whether it is a consequence of your earlier derivations.

xviii. Care about your presentation of the proof.

xix. At several times, remind the reader (and yourself) of what you need
to show at that stage.

xx. If you hand-write your proof, make sure that you use your best
handwriting.