
Principles of Mathematics for Economics

Simone Cerreia-Vioglio
Department of Decision Sciences and IGIER, Università Bocconi

Massimo Marinacci
AXA-Bocconi Chair, Department of Decision Sciences and IGIER, Università Bocconi

Elena Vigna
Dipartimento Esomas, Università di Torino and Collegio Carlo Alberto

August 2022
To our loved ones
Contents

Preface xxiii

I Structures 1

1 Sets and numbers: an intuitive introduction (sdoganato) 3


1.1 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 Subsets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.3 Properties of the operations . . . . . . . . . . . . . . . . . . . . . . . . 8
1.1.4 A naive remark . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2 Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3 Structure of the integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.3.1 Divisors and algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.3.2 Prime numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4 Order structure of R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.4.1 Maxima and minima . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.4.2 Supremum and infimum . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.4.3 Density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.5 Powers and logarithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.5.1 Powers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.5.2 Logarithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.6 Numbers, fingers and circuits . . . . . . . . . . . . . . . . . . . . . . . 33
1.7 The extended real line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
1.8 The birth of the deductive method: an intellectual revolution . . . . . . . . . 39

2 Cartesian structure (sdoganato) 43


2.1 Cartesian products and Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.2 Operations in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.3 Order structure on Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.5 Pareto optima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.5.2 Maxima and maximals . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.5.3 Pareto frontier and Edgeworth box . . . . . . . . . . . . . . . . . . . . 56


3 Linear structure (sdoganato) 61


3.1 Vector subspaces of Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.2 Linear independence and dependence . . . . . . . . . . . . . . . . . . . . . . . 64
3.3 Linear combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.4 Generated subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.5 Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.6 Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.7 Post scriptum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

4 Euclidean structure (sdoganato) 77


4.1 Absolute value and norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.1.1 Inner product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.1.2 Absolute value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.1.3 Norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.2 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.2.1 Normalized vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.2.2 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

5 Topological structure (sdoganato) 89


5.1 Distances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.2 Neighborhoods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.3 Taxonomy of the points of Rn with respect to a set . . . . . . . . . . . . . . . 94
5.3.1 Interior, exterior and boundary points . . . . . . . . . . . . . . . . . . 94
5.3.2 Limit points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.4 Open and closed sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.5 Set stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.6 Compact sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.7 Closure and convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

6 Functions (sdoganato) 109


6.1 The concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
6.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
6.2.1 Static choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
6.2.2 Intertemporal choice . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
6.3 General properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
6.3.1 Preimages and level curves . . . . . . . . . . . . . . . . . . . . . . . . 121
6.3.2 Algebra of functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.3.3 Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
6.4 Classes of functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
6.4.1 Injective, surjective, and bijective functions . . . . . . . . . . . . . . . 130
6.4.2 Inverse functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
6.4.3 Bounded functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
6.4.4 Monotone functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.4.5 Concave and convex functions: a preview . . . . . . . . . . . . . . . . 149
6.4.6 Separable functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
6.5 Elementary functions on R . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

6.5.1 Polynomial functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152


6.5.2 Exponential and logarithmic functions . . . . . . . . . . . . . . . . . . 153
6.5.3 Trigonometric and periodic functions . . . . . . . . . . . . . . . . . . . 155
6.5.4 Rational functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
6.6 Maximizers and minimizers: a preview . . . . . . . . . . . . . . . . . . . . . . 162
6.7 Domains and restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
6.8 Grand finale: preferences and utility . . . . . . . . . . . . . . . . . . . . . 166
6.8.1 Preferences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
6.8.2 Paretian utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
6.8.3 Existence and lexicographic preference . . . . . . . . . . . . . . . . . . 170

7 Cardinality (sdoganato) 173


7.1 Actual infinite and potential infinite . . . . . . . . . . . . . . . . . . . . . 173
7.2 Bijective functions and cardinality . . . . . . . . . . . . . . . . . . . . . . . . 174
7.3 A Pandora's box . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
7.4 Coda: what is a natural number? . . . . . . . . . . . . . . . . . . . . . . . . . 185

II Discrete analysis 189

8 Sequences (sdoganato) 191


8.1 The concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
8.2 The space of sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
8.3 Application: intertemporal choices . . . . . . . . . . . . . . . . . . . . . . . . 199
8.4 Application: prices and expectations . . . . . . . . . . . . . . . . . . . . . . . 200
8.4.1 A market for a good . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
8.4.2 Delays in production . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
8.4.3 Expectation formation . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
8.5 Images and classes of sequences . . . . . . . . . . . . . . . . . . . . . . . . . . 206
8.6 Eventually: a key adverb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
8.7 Limits: introductory examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
8.8 Limits and asymptotic behavior . . . . . . . . . . . . . . . . . . . . . . . . . . 208
8.8.1 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
8.8.2 Limits from above and from below . . . . . . . . . . . . . . . . . . . . 211
8.8.3 Divergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
8.8.4 Topology of R and a general definition of limit . . . . . . . . . . . . . 213
8.9 Properties of limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
8.9.1 Monotonicity and convergence . . . . . . . . . . . . . . . . . . . . . . 217
8.9.2 Bolzano-Weierstrass' Theorem . . . . . . . . . . . . . . . . . . . . . . 218
8.10 Algebra of limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
8.10.1 The (many) certainties . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
8.10.2 Some common limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
8.10.3 Indeterminate forms for the limits . . . . . . . . . . . . . . . . . . . . 226
8.10.4 Summary tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
8.10.5 How many indeterminate forms are there? . . . . . . . . . . . . . . . . 229
8.11 Convergence criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230

8.11.1 Comparison criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230


8.11.2 Ratio criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
8.11.3 Root criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
8.12 The Cauchy condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
8.13 Napier's constant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
8.14 Orders of convergence and of divergence . . . . . . . . . . . . . . . . . . . . . 242
8.14.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
8.14.2 Little-o algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
8.14.3 Asymptotic equivalence . . . . . . . . . . . . . . . . . . . . . . . . . . 246
8.14.4 Characterization and decay . . . . . . . . . . . . . . . . . . . . . . . . 250
8.14.5 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
8.14.6 Scales of infinities . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
8.14.7 The De Moivre-Stirling formula . . . . . . . . . . . . . . . . . . . . . . 252
8.14.8 Distribution of prime numbers . . . . . . . . . . . . . . . . . . . . . . 253
8.15 Convergence rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
8.15.1 Big-O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
8.15.2 Convergence rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
8.16 Sequences in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260

9 Series (sdoganato) 261


9.1 The concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
9.1.1 Three classic series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
9.1.2 Sub specie aeternitatis: infinite horizon . . . . . . . . . . . . . . . . . 266
9.2 Basic properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
9.3 Series with positive terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
9.3.1 Comparison criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
9.3.2 Ratio criterion: prelude . . . . . . . . . . . . . . . . . . . . . . . . . . 273
9.3.3 Ratio criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
9.3.4 A first series expansion . . . . . . . . . . . . . . . . . . . . . . . . . 276
9.4 Series with terms of any sign . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
9.4.1 Absolute convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
9.4.2 Hic sunt leones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283

10 Discrete calculus (sdoganato) 285


10.1 Preamble: limit points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
10.1.1 Limit superior and inferior . . . . . . . . . . . . . . . . . . . . . . . . 285
10.1.2 Limit points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
10.2 Discrete calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
10.2.1 Finite differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
10.2.2 Newton difference formula . . . . . . . . . . . . . . . . . . . . . . . . 292
10.2.3 Asymptotic behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
10.3 Convergence in mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
10.3.1 In medio stat virtus . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
10.3.2 Creatio ex nihilo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
10.4 Convergence criteria for series . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
10.5 Multiplication of series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307

10.5.1 Convolutions of sequences . . . . . . . . . . . . . . . . . . . . . . . . . 308


10.5.2 Cauchy products of series . . . . . . . . . . . . . . . . . . . . . . . . . 310
10.6 Infinitely often: a second key adverb . . . . . . . . . . . . . . . . . . . . 312
10.6.1 Tail bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
10.6.2 A tale of two adverbs . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
10.6.3 Illustration: asymptotic partial sums . . . . . . . . . . . . . . . . . . . 316
10.6.4 Illustration: prime gaps . . . . . . . . . . . . . . . . . . . . . . . . . . 317

11 Power series 321


11.1 Power series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
11.2 Generating functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
11.2.1 Definition and properties . . . . . . . . . . . . . . . . . . . . . . . . 326
11.2.2 Solving recurrences via generating functions . . . . . . . . . . . . . . . 331
11.3 Discounted convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
11.3.1 Abel convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
11.3.2 Infinite patience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
11.4 Recursive patience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348

III Continuity 353

12 Limits of functions 355


12.1 Introductory examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
12.2 Functions of a single variable . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
12.2.1 Two-sided limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
12.2.2 One-sided limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
12.2.3 Relations between one-sided and two-sided limits . . . . . . . . . . . . 369
12.2.4 Grand finale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
12.3 Functions of several variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
12.3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
12.3.2 Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
12.3.3 Sequential characterization . . . . . . . . . . . . . . . . . . . . . . . . 375
12.4 Properties of limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
12.5 Algebra of limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
12.5.1 Indeterminacies for limits . . . . . . . . . . . . . . . . . . . . . . . . . 381
12.6 Common limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
12.7 Orders of convergence and of divergence . . . . . . . . . . . . . . . . . . . . . 386
12.7.1 Little-o algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
12.7.2 Asymptotic equivalence . . . . . . . . . . . . . . . . . . . . . . . . . . 389
12.7.3 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
12.7.4 The usual bestiary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391

13 Continuous functions 393


13.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
13.2 Discontinuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
13.3 Operations and composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405

13.4 Zeros and equilibria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407


13.4.1 Zeros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
13.4.2 Equilibria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409
13.5 Weierstrass' Theorem: a preview . . . . . . . . . . . . . . . . . . . . . . . . . 411
13.6 Intermediate Value Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
13.6.1 The theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
13.6.2 Some consequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 416
13.6.3 Multivariable version . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
13.7 Limits and continuity of operators . . . . . . . . . . . . . . . . . . . . . . . . 421
13.8 Infracoda topologica . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422
13.9 Coda continua . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426
13.10 Ultracoda continua . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
13.10.1 Stone-Weierstrass' Theorem . . . . . . . . . . . . . . . . . . . . . . . . 428
13.10.2 Bernstein polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
13.10.3 Bernstein's version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431

14 Equations and fixed points (sdoganato) 437


14.1 Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
14.1.1 Poincaré-Miranda's Theorem . . . . . . . . . . . . . . . . . . . . . . 437
14.1.2 Fixed points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440
14.1.3 Aggregate market analysis via fixed points . . . . . . . . . . . . . . . . 442
14.2 Asymptotic behavior of recurrences . . . . . . . . . . . . . . . . . . . . . . . . 445
14.2.1 A general definition for recurrences . . . . . . . . . . . . . . . . . . . 445
14.2.2 Asymptotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446
14.2.3 Price dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
14.2.4 Heron's method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454

IV Linear and nonlinear analysis 457

15 Linear functions and operators 459


15.1 Linear functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
15.1.1 Definition and first properties . . . . . . . . . . . . . . . . . . . . . . 459
15.1.2 Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
15.1.3 Monotonicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
15.2 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
15.2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
15.2.2 Operations on matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 467
15.2.3 A first taxonomy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
15.2.4 Product of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
15.3 Linear operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473
15.3.1 Definition and first properties . . . . . . . . . . . . . . . . . . . . . . 473
15.3.2 Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476
15.3.3 Matrices and operations . . . . . . . . . . . . . . . . . . . . . . . . . . 478
15.4 Rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
15.4.1 Linear operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480

15.4.2 Rank of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483


15.4.3 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486
15.4.4 Gaussian elimination procedure . . . . . . . . . . . . . . . . . . . . . . 489
15.5 Invertible operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
15.5.1 Invertibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
15.5.2 Inverse matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494
15.6 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
15.6.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
15.6.2 Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
15.6.3 Combinatorics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
15.6.4 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500
15.6.5 Laplace's Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505
15.6.6 Inverses and determinants . . . . . . . . . . . . . . . . . . . . . . . . . 509
15.6.7 Ranks and determinants: Kronecker's Algorithm . . . . . . . . . . . . 512
15.6.8 Summing up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513
15.7 Square linear systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514
15.8 General linear systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
15.8.1 Kronecker-Capelli's Theorem . . . . . . . . . . . . . . . . . . . . . . . 519
15.8.2 Uniqueness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
15.8.3 Summing up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 522
15.9 Solving systems: Cramer's method . . . . . . . . . . . . . . . . . . . . . . . . 523
15.10 Coda media . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526
15.10.1 The notion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526
15.10.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527
15.10.3 Average functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
15.10.4 Arithmetic average functions . . . . . . . . . . . . . . . . . . . . . . . 533
15.11 Ultracoda: Hahn-Banach et similia . . . . . . . . . . . . . . . . . . . . . 535

16 Convexity and affinity 543


16.1 Convex sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543
16.2 The skeleton of convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549
16.2.1 Convex envelopes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549
16.2.2 Extreme points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 550
16.3 Affine sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553
16.4 Affine independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555
16.5 Simplices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557
16.6 Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 560

17 Concave functions 565


17.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 565
17.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 573
17.2.1 Concave functions and convex sets . . . . . . . . . . . . . . . . . . . . 573
17.2.2 Jensen's inequality and continuity . . . . . . . . . . . . . . . . . . . . 577
17.3 Quasi-concave functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 581
17.3.1 De nition and basic notions . . . . . . . . . . . . . . . . . . . . . . . . 581
17.3.2 Convexity of indifference curves . . . . . . . . . . . . . . . . . . . . . 584

17.3.3 Transformations, cardinality and ordinality . . . . . . . . . . . . . . . 585


17.3.4 Multivariable transformations . . . . . . . . . . . . . . . . . . . . . . . 587
17.4 Diversification principle . . . . . . . . . . . . . . . . . . . . . . . . . . . 588
17.5 Grand finale: Cauchy's equation . . . . . . . . . . . . . . . . . . . . . . 591
17.5.1 The basic equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 591
17.5.2 Remarkable variants . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593
17.5.3 Continuous compounding . . . . . . . . . . . . . . . . . . . . . . . . . 594
17.5.4 Additive functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 596

18 Homogeneous functions 597


18.1 Preamble: cones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597
18.2 Homogeneity and returns to scale . . . . . . . . . . . . . . . . . . . . . . . . . 598
18.2.1 Homogeneous functions . . . . . . . . . . . . . . . . . . . . . . . . . . 598
18.2.2 Average functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601
18.2.3 Homogeneity and concavity: superlinear functions . . . . . . . . . . . 602
18.2.4 Homogeneity and quasi-concavity . . . . . . . . . . . . . . . . . . . . . 604
18.3 Homotheticity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607
18.3.1 Semicones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607
18.3.2 Homotheticity and utility . . . . . . . . . . . . . . . . . . . . . . . . . 608

19 Lipschitz functions 611


19.1 Global control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611
19.2 Local control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 613
19.3 Translation invariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 616

20 Supermodular functions 621


20.1 Joins and meets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621
20.2 Lattices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 624
20.3 Supermodular functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 625
20.4 Functions with increasing cross differences . . . . . . . . . . . . . . . . . 626
20.4.1 Sections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 626
20.4.2 Increasing cross differences and complementarity . . . . . . . . . . . . 627
20.5 Supermodularity and concavity . . . . . . . . . . . . . . . . . . . . . . . . . . 631
20.6 Log-convex functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 632

21 Correspondences 635
21.1 A set-theoretic notion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 635
21.2 Back to Euclidean spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 637
21.3 Hemicontinuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639
21.4 Addition and scalar multiplication of sets . . . . . . . . . . . . . . . . . . . . 642
21.5 Combining correspondences . . . . . . . . . . . . . . . . . . . . . . . . . . . . 644

V Optima 647

22 Optimization problems 649


22.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 649

22.1.1 Beginner's luck . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 654


22.1.2 Basic properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 657
22.1.3 Cogito ergo solvo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 661
22.1.4 Consumption and production . . . . . . . . . . . . . . . . . . . . . . . 666
22.1.5 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 671
22.2 Existence: Weierstrass' Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 672
22.3 Existence: Tonelli's Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 675
22.3.1 Coercivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 675
22.3.2 Tonelli . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 679
22.3.3 Supercoercivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 681
22.4 Separation theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 683
22.5 Local extremal points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 685
22.6 Concavity and quasi-concavity . . . . . . . . . . . . . . . . . . . . . . . . . . 688
22.6.1 Maxima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 688
22.6.2 Minima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 690
22.7 Affinity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 692
22.7.1 Quasi-affine objective functions . . . . . . . . . . . . . . . . . . . . . . 692
22.7.2 Linear programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . 694
22.8 Consumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 697
22.8.1 Optimal bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 697
22.8.2 Demand function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 699
22.9 Equilibrium analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 700
22.9.1 Exchange economies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 700
22.9.2 Invisible hand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 703
22.10 Least squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 704
22.10.1 Linear systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 704
22.10.2 Descriptive statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706
22.11 Operator optima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 709
22.11.1 Operator optimization problems . . . . . . . . . . . . . . . . . . . . . 709
22.11.2 Planner's problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710
22.12 Coda: cuneiform functions . . . . . . . . . . . . . . . . . . . . . . . . . . 712
22.13 Ultracoda: no illusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 713

23 Semicontinuous optimization 715


23.1 Semicontinuous functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715
23.1.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715
23.1.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 718
23.2 The (almost) ultimate Tonelli . . . . . . . . . . . . . . . . . . . . . . . . . . . 720
23.3 The ordinal Tonelli . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 721
23.4 Asymptotic analysis: beyond compactness . . . . . . . . . . . . . . . . . . . . 724
23.4.1 Recession cones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 724
23.4.2 Recession cones of functions . . . . . . . . . . . . . . . . . . . . . . . . 727
23.4.3 Maxima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 732

24 Projections and approximations 737


24.1 Projection Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 737
24.2 Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 739
24.3 The ultimate Riesz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741
24.4 Least squares and projections . . . . . . . . . . . . . . . . . . . . . . . . . . . 742
24.5 Direct sums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 744
24.6 A finance illustration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 745
24.6.1 Portfolios and contingent claims . . . . . . . . . . . . . . . . . . . . . 746
24.6.2 Market value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 747
24.6.3 Law of one price . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 748
24.6.4 Pricing rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 749
24.6.5 Pricing kernels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 750
24.6.6 Arbitrage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 751
24.7 Coda monotona . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 753
24.7.1 Ideals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 753
24.7.2 Positive projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . 755
24.7.3 The ultimate Riesz-Markov . . . . . . . . . . . . . . . . . . . . . . . . 756

25 Forms and spectra 761


25.1 Spectra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 761
25.1.1 Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 761
25.1.2 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 767
25.2 Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 773
25.2.1 Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 773
25.2.2 Quadratic forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 773
25.2.3 Ubi minor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 779

VI Differential calculus 785

26 Derivatives 787
26.1 Marginal analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 787
26.2 Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 789
26.3 Geometric interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 790
26.4 Derivative function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 794
26.5 One-sided derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 795
26.6 Derivability and continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 798
26.7 Derivatives of elementary functions . . . . . . . . . . . . . . . . . . . . . . . . 800
26.8 Algebra of derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 802
26.9 The chain rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 806
26.10 Derivative of inverse functions . . . . . . . . . . . . . . . . . . . . . . . . 808
26.11 Formulary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 811
26.12 Differentiability and linearity . . . . . . . . . . . . . . . . . . . . . . . . . 812
26.12.1 Differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 812
26.12.2 Differentiability and derivability . . . . . . . . . . . . . . . . . . . . . 814
26.12.3 Differentiability and continuity . . . . . . . . . . . . . . . . . . . . . . 816

26.12.4 A terminological turning point . . . . . . . . . . . . . . . . . . . . . . 816


26.13 Higher order derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . 817
26.13.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 817
26.13.2 Higher order chain rule . . . . . . . . . . . . . . . . . . . . . . . . . . 820
26.14 Discrete limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 822
26.15 Coda: Weierstrass' monster . . . . . . . . . . . . . . . . . . . . . . . . . . 824

27 Differential calculus in several variables 827


27.1 Partial derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 827
27.1.1 The notion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 827
27.1.2 A continuity failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 834
27.1.3 Derivative operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 835
27.1.4 Ceteris paribus: marginal analysis . . . . . . . . . . . . . . . . . . . . 835
27.2 Differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 836
27.2.1 Differentiability and partial derivability . . . . . . . . . . . . . . . . . 840
27.2.2 Total differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 843
27.2.3 Chain rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 844
27.3 Partial derivatives of higher order . . . . . . . . . . . . . . . . . . . . . . . . . 847
27.4 Taking stock: the natural domain of analysis . . . . . . . . . . . . . . . . . . 851
27.5 Incremental and approximation viewpoints . . . . . . . . . . . . . . . . . . . 851
27.5.1 Directional derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . 851
27.5.2 Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 855
27.5.3 The two viewpoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . 856
27.6 Differential of operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . 857
27.6.1 Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 857
27.6.2 Chain rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 861
27.6.3 Proof of the chain rule (Theorem 1296) . . . . . . . . . . . . . . . . . 865

28 Differential methods 867


28.1 Extremal and critical points . . . . . . . . . . . . . . . . . . . . . . . . . . . . 867
28.1.1 Preamble . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 867
28.1.2 Fermat's Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 868
28.1.3 Unconstrained optima: incipit . . . . . . . . . . . . . . . . . . . . . . . 873
28.2 Mean Value Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 874
28.3 Continuity properties of the derivative . . . . . . . . . . . . . . . . . . . . . . 878
28.4 Monotonicity and differentiability . . . . . . . . . . . . . . . . . . . . . . 879
28.5 Sufficient conditions for local extremal points . . . . . . . . . . . . . . . . 883
28.5.1 Local extremal points . . . . . . . . . . . . . . . . . . . . . . . . . . . 883
28.5.2 Searching local extremal points via first and second-order conditions . 886
28.5.3 Searching global extremal points via first and second-order conditions 888
28.5.4 A false start: global extremal points . . . . . . . . . . . . . . . . . . . 890
28.6 De l'Hospital's Theorem and rule . . . . . . . . . . . . . . . . . . . . . . . . . 891
28.6.1 Indeterminate forms 0/0 and ∞/∞ . . . . . . . . . . . . . . . . . . . . 891
28.6.2 Other hospitalized indeterminacies . . . . . . . . . . . . . . . . . . . . 894

29 Approximation 897
29.1 Taylor's polynomial approximation . . . . . . . . . . . . . . . . . . . . . . . . 897
29.1.1 Polynomial expansions . . . . . . . . . . . . . . . . . . . . . . . . . . . 897
29.1.2 Taylor and Peano . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 900
29.1.3 Taylor and Lagrange . . . . . . . . . . . . . . . . . . . . . . . . . . . . 906
29.2 Omnibus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 907
29.2.1 Omnibus proposition for local extremal points . . . . . . . . . . . . . 907
29.2.2 Omnibus procedure of search of local extremal points . . . . . . . . . 910
29.3 Multivariable Taylor expansion . . . . . . . . . . . . . . . . . . . . . . . . . . 911
29.3.1 Taylor expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 912
29.3.2 Second-order conditions . . . . . . . . . . . . . . . . . . . . . . . . . . 916
29.3.3 Multivariable unconstrained optima . . . . . . . . . . . . . . . . . . . 922

30 Analytic functions 923


30.1 A calculus paradise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 923
30.2 Asymptotic scales and expansions . . . . . . . . . . . . . . . . . . . . . . . . . 924
30.3 Analytic functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 929
30.3.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 929
30.3.2 Generating functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 932
30.3.3 Analytic failures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 934
30.3.4 Analyticity criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 935
30.3.5 Basic properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 939
30.4 Coda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 942
30.4.1 Hille-Taylor's formula . . . . . . . . . . . . . . . . . . . . . . . . . . . 942
30.4.2 Borel's Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 943

31 Concavity and differentiability 945


31.1 Scalar functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 945
31.1.1 Decreasing marginal effects . . . . . . . . . . . . . . . . . . . . . . . 945
31.1.2 Chords and tangents . . . . . . . . . . . . . . . . . . . . . . . . . . . . 952
31.1.3 Concavity criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 954
31.1.4 Degree of concavity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 959
31.2 Intermezzo: inner monotone operators . . . . . . . . . . . . . . . . . . . . . . 967
31.2.1 Definite matrices revisited . . . . . . . . . . . . . . . . . . . . . . . . 967
31.2.2 Monotone operators and the law of demand . . . . . . . . . . . . . . . 968
31.3 Multivariable case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 972
31.3.1 Derivability and differentiability . . . . . . . . . . . . . . . . . . . . . 972
31.3.2 A key inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 977
31.3.3 Concavity criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 978
31.4 Ultramodular functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 982
31.5 Global optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 984
31.5.1 Sufficiency of the first-order condition . . . . . . . . . . . . . . . . . . 984
31.5.2 A deeper result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 986
31.5.3 A fundamental logarithmic inequality . . . . . . . . . . . . . . . . . . 989
31.6 Coda: strong concavity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 991
31.7 Ultracoda: projections on convex sets . . . . . . . . . . . . . . . . . . . . . . 993

32 Convex Analysis 999


32.1 Superdifferentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 999
32.1.1 A useful surrogate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 999
32.1.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1005
32.1.3 Supercalculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1008
32.2 Ordinal superdifferentials . . . . . . . . . . . . . . . . . . . . . . . . . . . 1011
32.2.1 A quasi-concave notion . . . . . . . . . . . . . . . . . . . . . . . . . . 1011
32.2.2 Quasi-concavity criteria . . . . . . . . . . . . . . . . . . . . . . . . . . 1015
32.2.3 A normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1017
32.3 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1019
32.4 Inclusion equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1020
32.4.1 Inclusion equations and fixed points . . . . . . . . . . . . . . . . . . . 1020
32.4.2 Aggregate market analysis . . . . . . . . . . . . . . . . . . . . . . . . . 1021
32.4.3 Back to agents: exchange economy . . . . . . . . . . . . . . . . . . . . 1022
32.5 Coda: a linear algebra aggregation result . . . . . . . . . . . . . . . . . . . . 1024

33 Nonlinear Riesz's Theorems 1025


33.1 The ultimate Hahn-Banach's Theorem . . . . . . . . . . . . . . . . . . . . . . 1025
33.2 Representation of superlinear functions . . . . . . . . . . . . . . . . . . . . . . 1027
33.3 Modelling bid-ask spreads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1030
33.3.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1030
33.3.2 Market values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1030
33.3.3 Law of one price . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1032
33.3.4 Pricing kernels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1033

34 Implicit functions 1037


34.1 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1037
34.2 Implicit functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1040
34.3 A local perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1045
34.3.1 Implicit Function Theorem . . . . . . . . . . . . . . . . . . . . . . . . 1045
34.3.2 Level curves and marginal rates . . . . . . . . . . . . . . . . . . . . . . 1051
34.3.3 Quadratic expansions . . . . . . . . . . . . . . . . . . . . . . . . . . . 1055
34.3.4 Implicit functions of several variables . . . . . . . . . . . . . . . . . . . 1056
34.3.5 Implicit operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1058
34.4 A global perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1061
34.4.1 Preamble: projections and shadows . . . . . . . . . . . . . . . . . . . . 1061
34.4.2 Implicit functions and frames . . . . . . . . . . . . . . . . . . . . . . . 1063
34.4.3 Comparative statics I . . . . . . . . . . . . . . . . . . . . . . . . . . . 1067
34.4.4 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1070
34.4.5 Comparative statics II . . . . . . . . . . . . . . . . . . . . . . . . . . . 1071

35 Equations and inverse functions 1073


35.1 Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1073
35.2 Well-posed equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1075
35.3 Local analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1080
35.3.1 A closer look at diffeomorphisms . . . . . . . . . . . . . . . . . . . . . 1080

35.3.2 Inverse Function Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 1081


35.4 Global analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1084
35.4.1 Topological prelude . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1084
35.4.2 Finitely many solutions . . . . . . . . . . . . . . . . . . . . . . . . . . 1089
35.4.3 Unique solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1090
35.4.4 Global Implicit Function Theorem . . . . . . . . . . . . . . . . . . . . 1093
35.5 Monotone equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1095
35.6 Parametric equations and implicit functions . . . . . . . . . . . . . . . . . . . 1097
35.7 Coda: de consolatione topologiae . . . . . . . . . . . . . . . . . . . . . . . . . 1098
35.8 Ultracoda: equations in science . . . . . . . . . . . . . . . . . . . . . . . . . . 1099

36 Study of functions 1103


36.1 Inflection points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1103
36.2 Asymptotes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1105
36.3 Study of functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1110
36.4 Bells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1116

VII Differential optimization 1121

37 Unconstrained optimization 1123


37.1 Unconstrained problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1123
37.2 Coercive problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1123
37.3 Concave problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1126
37.4 Relationship among problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 1129
37.5 Relaxation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1131
37.6 Optimization and equations: general least squares . . . . . . . . . . . . . . . 1134
37.7 Coda: computational issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1135
37.7.1 Decision procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1135
37.7.2 Gradient descent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1136
37.7.3 Maximizing sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . 1140
37.7.4 Final remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1142

38 Equality constraints 1145


38.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1145
38.2 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1145
38.3 One constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1146
38.3.1 A key lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1146
38.3.2 Lagrange's Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1150
38.3.3 A heuristic interpretation of the multiplier . . . . . . . . . . . . . . . . 1151
38.4 The method of elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1152
38.5 The consumer problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1158
38.6 Cogito ergo solvo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1162
38.7 Several constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1162

39 Inequality constraints 1171


39.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1171
39.2 Resolution of the problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1174
39.2.1 Kuhn-Tucker's Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 1177
39.2.2 The method of elimination . . . . . . . . . . . . . . . . . . . . . . . . 1178
39.3 Cogito et solvo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1182
39.4 Concave optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1182
39.4.1 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1182
39.4.2 Kuhn-Tucker points . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1184
39.5 Appendix: proof of a key lemma . . . . . . . . . . . . . . . . . . . . . . . . . 1188

40 General constraints 1193


40.1 A general concave problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1193
40.2 Black box optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1194
40.2.1 Variational inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . 1194
40.2.2 A general first-order condition . . . . . . . . . . . . . . . . . . . . . . 1196
40.2.3 Divide et impera . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1199
40.3 Opening the black box . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1200
40.4 Dulcis in fundo: Multivariable Bolzano Theorem . . . . . . . . . . . . . . . . 1202

41 Parametric optimization problems 1205


41.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1205
41.2 An illustration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1207
41.3 Basic properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1209
41.4 Maximum Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1211
41.5 Envelope theorems I: fixed constraint . . . . . . . . . . . . . . . . . . . . . 1215
41.6 Envelope theorems II: variable constraint . . . . . . . . . . . . . . . . . . . . 1217
41.7 Marginal interpretation of multipliers . . . . . . . . . . . . . . . . . . . . . . . 1219
41.8 Monotone solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1220
41.9 Approximations: the Laplace method . . . . . . . . . . . . . . . . . . . . . . . 1223
41.9.1 Log-exponential and softmax functions . . . . . . . . . . . . . . . . . . 1223
41.9.2 The Laplace method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1225

42 Interdependent optimization 1227


42.1 Minimax Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1227
42.2 Nash equilibria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1232
42.3 Nash equilibria and saddle points . . . . . . . . . . . . . . . . . . . . . . . . . 1235
42.4 Nash equilibria on a simplex . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1236
42.5 Parametric interdependent optimization . . . . . . . . . . . . . . . . . . . . . 1237
42.6 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1239
42.6.1 Randomization in games and decisions . . . . . . . . . . . . . . . . . . 1239
42.6.2 Kuhn-Tucker's saddles . . . . . . . . . . . . . . . . . . . . . . . . . . . 1242
42.6.3 Linear programming: duality . . . . . . . . . . . . . . . . . . . . . . . 1245

43 Variational inequality problems 1249


43.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1249
43.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1250
43.3 Existence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1253
43.4 Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1257

VIII Integration 1259

44 The Riemann integral (sdoganato) 1261


44.1 The method of exhaustion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1261
44.2 Plurirectangles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1262
44.3 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1264
44.3.1 Positive functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1264
44.3.2 General functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1270
44.3.3 Everything holds together . . . . . . . . . . . . . . . . . . . . . . . . . 1272
44.4 Integrability criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1276
44.5 Classes of integrable functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 1280
44.5.1 Step functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1280
44.5.2 Analytic and geometric approaches . . . . . . . . . . . . . . . . . . . . 1284
44.5.3 Continuous functions and monotone functions . . . . . . . . . . . . . . 1284
44.6 Properties of the integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1287
44.6.1 Linearity and monotonicity . . . . . . . . . . . . . . . . . . . . . . . . 1287
44.6.2 Panini . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1293
44.7 Integral calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1296
44.7.1 Primitive functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1296
44.7.2 Formulary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1299
44.7.3 The First Fundamental Theorem of Calculus . . . . . . . . . . . . . . 1300
44.7.4 The Second Fundamental Theorem of Calculus . . . . . . . . . . . . . 1302
44.8 Properties of the indefinite integral . . . . . . . . . . . . . . . . . . . . . . 1306
44.9 Change of variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1309

45 Improper Riemann integrals (sdoganato) 1313


45.1 Integration on the positive half-line . . . . . . . . . . . . . . . . . . . . . . . . 1313
45.2 Integration on the real line . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1317
45.3 Principal values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1319
45.4 Properties and criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1322
45.4.1 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1322
45.4.2 Integrability criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1323
45.4.3 Absolute convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . 1326
45.5 Gauss integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1327
45.6 Unbounded functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1329

46 Parametric Riemann integrals (sdoganato) 1331


46.1 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1331
46.2 Variability: Leibniz's rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1334
46.3 Improper integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1336
46.4 Dirichlet integral: the tree and the forest . . . . . . . . . . . . . . . . . . . . . 1338

47 Stieltjes' integral 1341


47.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1342
47.2 Integrability criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1343
47.3 Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1344
47.4 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1349
47.5 Step integrators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1350
47.6 Integration by parts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1352
47.7 Change of variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1353
47.8 Modelling assets' gains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1354
47.9 Coda: beyond monotonicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1355
47.9.1 Functions of bounded variation . . . . . . . . . . . . . . . . . . . . . . 1355
47.9.2 A general Stieltjes integral . . . . . . . . . . . . . . . . . . . . . . . . . 1361
47.9.3 Variability and volatility . . . . . . . . . . . . . . . . . . . . . . . . . . 1363

48 Introductory probability theory 1369


48.1 Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1369
48.2 Probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1373
48.2.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1373
48.2.2 Simple probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1378
48.2.3 A continuity property . . . . . . . . . . . . . . . . . . . . . . . . . . . 1381
48.3 Random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1385
48.4 Expected values I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1387
48.5 Euclidean twist . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1389
48.6 Measures of variability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1391
48.7 Intermezzo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1396
48.8 Distribution functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1397
48.8.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1397
48.8.2 Density functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1402
48.9 Expected values II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1404
48.10 Moments and all that . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1408
48.10.1 Transformations of random variables . . . . . . . . . . . . . . . . . . . 1408
48.10.2 Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1409
48.10.3 The problem of moments . . . . . . . . . . . . . . . . . . . . . . . . . 1412
48.10.4 Moment generating function . . . . . . . . . . . . . . . . . . . . . . . . 1413
48.11 Coda oscura . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1417
48.11.1 Zero mysteries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1417
48.11.2 Probability of outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . 1421
48.12 Ultracoda: expected utility . . . . . . . . . . . . . . . . . . . . . . . . . . 1425
48.12.1 Expected utility criterion . . . . . . . . . . . . . . . . . . . . . . . . . 1425
48.12.2 Lotteries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1426

48.12.3 Expected utility of lotteries . . . . . . . . . . . . . . . . . . . . . . . . 1427

IX Appendices 1429

A Binary relations: modelling connections (sdoganato) 1431


A.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1431
A.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1433
A.3 Equivalence relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1435

B Permutations (sdoganato) 1439


B.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1439
B.2 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1440
B.3 Anagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1441
B.4 A set-theoretic angle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1443
B.5 Newton's binomial formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1444

C Notions of trigonometry (sdoganato) 1447


C.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1447
C.2 Concerto d'archi (string concert) . . . . . . . . . . . . . . . . . . . . . . . . . 1449
C.3 Perpendicularity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1454

D Elements of intuitive logic 1457


D.1 Propositions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1457
D.2 Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1457
D.3 Logical equivalence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1460
D.4 Deduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1462
D.4.1 Logical consequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1462
D.4.2 Theorems and proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1463
D.4.3 Direct proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1464
D.4.4 Reductio ad absurdum . . . . . . . . . . . . . . . . . . . . . . . . . . . 1465
D.4.5 Summing up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1467
D.5 Deductive method in mathematics . . . . . . . . . . . . . . . . . . . . . . . . 1468
D.5.1 Collections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1468
D.5.2 Deductive method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1468
D.5.3 A miniature mathematical theory . . . . . . . . . . . . . . . . . . . . . 1469
D.5.4 Interpretations and models . . . . . . . . . . . . . . . . . . . . . . . . 1470
D.6 Intermezzo: the logic of empirical scienti c theories . . . . . . . . . . . . . . . 1471
D.6.1 Empirical models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1471
D.6.2 Logical atomism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1473
D.6.3 Logic of certainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1477
D.7 Predicates and quantifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . 1479
D.7.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1479
D.7.2 Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1480
D.7.3 Example: linear dependence . . . . . . . . . . . . . . . . . . . . . . . . 1481
D.7.4 Example: negation of convergence . . . . . . . . . . . . . . . . . . . . 1481
D.7.5 Binary predicates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1482

D.7.6 A set-theoretic twist . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1483

E Mathematical induction (sdoganato) 1485


E.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1485
E.2 The harmonic Mengoli . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1487

F Cast of characters 1489


Preface

Reason emerged around the sixth century BC in the Greek world, first in Ionian and Italian colonies – from Miletus to Samos, Croton and Elea – and then in mainland Greece, chiefly
in Athens. Mathematics and the rational investigation of the empirical world were two
marvelous gifts of this momentous emergence, a turning point in mankind's intellectual
history that marked the beginning of Western thought.
Centuries later, the works of Galileo and Newton – inspired also by a rediscovered Archimedes – set the foundations of modern science by combining these two gifts into the mathematization of physics. Mathematics is the best support for any rational empirical investigation, be it dealing with natural or social phenomena. It empowers theoretical reasoning by pursuing the logical implications of scholars' original insights, implications that often go well beyond what was first envisioned (even by the subtlest minds). Indeed, the logical transparency of mathematics favors scholarly communication and the incremental accumulation of knowledge within and across generations of scholars. As a result, disciplines that have embraced mathematics have built theoretical structures that are far more refined (and
Mathematics also permits the development of the quantitative features of a theoretical
model that make it empirically relevant. A purely literary argument, however subtle it may
be, can at best have qualitative empirical implications.
As empirical disciplines benefit from mathematics, so mathematics draws inspiration
and intellectual discipline from applications. A virtuous circle results. Since the time of
Galileo and Newton, the importance of mathematics in empirical disciplines has been steadily
growing and now goes well beyond physics. In particular, economics has been a major source
of motivation for the development of new mathematics, from game theory to optimization and
probability theory. In turn, mathematics gave economics a rigorous and powerful language
for articulating the theoretical and empirical implications of its models.
This book provides an introduction to the mathematics of economics and is primarily
addressed to undergraduate students in economics. We confine ourselves to R^n and leave the more abstract structures that characterize higher mathematics – such as vector and metric spaces – to more advanced books (for instance, the excellent Ok, 2007). Within these boundaries, however, we take a rigorous approach by proving and motivating results, not shying away from the more subtle issues (often covered in "coda" sections that can be skipped when reading the book for the first time). Our assumption is that students are
intellectually curious and should be given a chance to fully understand a topic, even the
toughest one. This approach also has an educational value by helping students to master
analytical reasoning as well as to articulate and support arguments by relentlessly exploring
their (pleasant or unpleasant) implications.


In the book there are no formal exercises. Yet, we have left the proofs of some results to the reader and, in addition, some proofs have gaps, highlighted by a "why?". These can be seen as useful exercises to test the reader's understanding of the material presented.
During the journey of learning upon which we embarked to write this book, we collected several debts of gratitude to colleagues and students. In particular, we thank Gabriella Chiomio and Claudio Mattalia, who thoroughly translated a first version of the manuscript, as well as Alexandra Fotiou, Giacomo Lanzani, Paolo Leonetti and Kelly Gail Strada for excellent research assistance. We are grateful to Margherita Cigola, Satoshi Fukuda, Fabrizio Iozzi, Guido Osimo, Lorenzo Peccati and Alberto Zaffaroni for their very useful comments that helped us to improve the manuscript. We are much indebted to Massimiliano Amarante, Pierpaolo Battigalli, Maristella Botticini, Erio Castagnoli (with whom this project started), Pierre-André Chiappori, Larry Epstein, Paolo Ghirardato, Itzhak Gilboa, Lars Peter Hansen, Peter Klibanoff, Fabio Maccheroni, Aldo Montesano, Luigi Montrucchio, Sujoy Mukerji, Aldo Rustichini, Tom Sargent and David Schmeidler for the discussions that over the years shaped our views on economics and mathematics. Needless to say, any error is ours but, hopefully, se non è vero, è ben trovato (if it is not true, it is well conceived).
Part I

Structures

Chapter 1

Sets and numbers: an intuitive introduction (sdoganato)

1.1 Sets
A set is a collection of distinguishable objects. There are two ways to describe a set: by
listing directly its elements or by specifying a property that its elements have in common.
The second way is more common: for instance,

{11, 13, 17, 19, 23, 29}   (1.1)

can be described as the set of the prime numbers between 10 and 30. The chairs of your
kitchen form a set of objects, the chairs, that have in common the property of being part
of your kitchen. The chairs of your bedroom form another set, as the letters of the Latin
alphabet form a set, distinct from the set of the letters of the Greek alphabet (and from the
set of chairs or from the set of numbers considered above).

Sets are usually denoted by capital letters: A, B, C and so on; their elements are denoted by small letters: a, b, c and so on. To denote that an element a belongs to the set A we write

a ∈ A

where ∈ is the symbol of belonging. Instead, to denote that an element a does not belong to the set A we write a ∉ A.

Off the record remark (O.R.) The concept of set, apparently introduced in 1847 by Bernhard Bolzano, is for us a primitive concept, not defined through other notions, as in Euclidean geometry, where points and lines are primitive concepts (with an intuitive geometric meaning that readers may give them). H

1.1.1 Subsets
The chairs of your bedroom are a subset of the chairs of your home: a chair that belongs to your bedroom also belongs to your home. In general, a set A is a subset of a set B when all the elements of A are also elements of B. In this case we write A ⊆ B. Formally:


Definition 1 Given two sets A and B, we say that A is a subset of B, in symbols A ⊆ B, if all the elements of A are also elements of B; that is, if x ∈ A implies x ∈ B.

For instance, denote by A the set (1.1), that is,

A = {11, 13, 17, 19, 23, 29}

and let

B = {11, 13, 15, 17, 19, 21, 23, 25, 27, 29}   (1.2)

be the set of the odd numbers between 10 and 30. We have A ⊆ B.

Graphically, the relation A ⊆ B can be illustrated as

[Venn diagram: a region A drawn inside a region B]

by using the so-called Venn diagrams to represent graphically the sets A and B (a simple, yet effective, way to visualize sets).

When we have both A ⊆ B and B ⊆ A – that is, when x ∈ A if and only if x ∈ B – the two sets A and B are said to be equal; in symbols A = B. For example, let A be the set of the solutions of the quadratic equation x^2 − 3x + 2 = 0 and B the set formed by the numbers 1 and 2. It is easy to see that A = B.
When A ⊆ B and A ≠ B, we write A ⊂ B and say that A is a proper subset of B.
The sets A = {a} that consist of a unique element are called singletons. They are a peculiar, but altogether legitimate, class of sets.¹

Nota Bene (N.B.) Though the two symbols ∈ and ⊆ are conceptually well distinct and must not be confused, there exists an interesting relation between them. Indeed, the set formed by a unique element a, i.e., the singleton {a}, permits to establish the relation

a ∈ A if and only if {a} ⊆ A

between ∈ and ⊆. O
¹ Note that a and {a} are not the same thing: a is an element and {a} is a set, even if formed by only one element. For instance, the set A of the Nations of the Earth with a flag of only one color had (until 2011) only one element, Libya, but it is not "the Libya": Tripoli is not the capital of A.

1.1.2 Operations
There are three basic operations among sets: union, intersection and difference. As we will
see, they take any two sets and, starting from them, form a new set.

The first operation that we consider is the intersection of two sets A and B. As the term "intersection" suggests, with this operation we select all the elements that belong simultaneously to both sets.

Definition 2 Given two sets A and B, their intersection A ∩ B is the set of all the elements that belong to both A and B; that is, x ∈ A ∩ B if x ∈ A and x ∈ B.

The operation can be illustrated graphically in the following way:

[Venn diagram: the overlapping region of A and B]

For example, let A be the set of left-handed and B the set of right-handed citizens of a
country. The intersection A ∩ B is the set of ambidextrous citizens. If, instead, A is the set of gasoline cars and B the set of methane cars, the intersection A ∩ B is the set of bi-fuel
cars that run on both gasoline and methane.

It can happen that two sets have no elements in common. For example, let

C = {10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30}   (1.3)

be the set of the even numbers between 10 and 30. It has no elements in common with the
set B in (1.2). In this case we talk of disjoint sets, with no elements in common. Such a
notion gives us the opportunity to introduce a fundamental set.

Definition 3 The empty set, denoted by ∅, is the set without elements.

As a first use of this notion, note that two sets A and B are disjoint when they have empty intersection, that is, A ∩ B = ∅. For example, for the sets B and C in (1.2) and (1.3), we have B ∩ C = ∅.
We write A ≠ ∅ when the set A is not empty, that is, when it contains at least one element. Conventionally, we regard the empty set as a subset of any set, that is, ∅ ⊆ A for every set A.

It is immediate that A ∩ B ⊆ A and A ∩ B ⊆ B. The next result is more subtle and establishes a useful property that links ⊆ and ∩.

Proposition 4 A ∩ B = A if and only if A ⊆ B.

Proof "If". Let A ⊆ B. We want to show that A ∩ B = A. To prove that two sets are equal, we always need to prove separately the two opposite inclusions: here A ∩ B ⊆ A and A ⊆ A ∩ B.
The inclusion A ∩ B ⊆ A is easily proven to be true. Indeed, let x ∈ A ∩ B.² Then, by definition, x belongs to both A and B. In particular, x ∈ A and this is enough to conclude that A ∩ B ⊆ A.
Let us prove the inclusion A ⊆ A ∩ B. Let x ∈ A. Since by hypothesis A ⊆ B, so that each element of A also belongs to B, it follows that x ∈ B. Hence, x belongs to both A and B, i.e., x ∈ A ∩ B. This proves that A ⊆ A ∩ B.
We have shown that both the inclusions A ∩ B ⊆ A and A ⊆ A ∩ B hold; we can therefore conclude that A ∩ B = A, which completes the proof of the "if" part.
"Only if". Let A ∩ B = A. Take x ∈ A. By hypothesis A ∩ B = A, so x ∈ A ∩ B. In particular, x then belongs to B, as claimed.

The next operation we consider is the union. Here again the term "union" already suggests how in this operation all the elements of both sets are collected together.

Definition 5 Given two sets A and B, their union A ∪ B is the set of all the elements that belong to A or to B; that is, x ∈ A ∪ B if x ∈ A or x ∈ B.³

Note that an element can belong to both sets (unless they are disjoint). For example, if A is again the set of the left-handed and B is the set of the right-handed citizens, the union set contains all citizens with at least one hand. There are individuals (the ambidexters) who belong to both sets.⁴
It is immediate to show that A ⊆ A ∪ B and that B ⊆ A ∪ B. It then follows that

A ∩ B ⊆ A ∪ B

Graphically the union is represented as follows:

[Venn diagram: the union A ∪ B of two overlapping sets]

² In proving an inclusion between sets, say C ⊆ D, throughout the book we will tacitly assume that C ≠ ∅ because the inclusion is trivially true when C = ∅. For this reason our inclusion proof will show that x ∈ C (i.e., C ≠ ∅) implies x ∈ D.
³ The conjunction "or" has the inclusive sense of the Latin "vel" (x belongs to A or to B or to both) and not the exclusive sense of "aut" (x belongs either to A or to B, but not to both). Indeed, Giuseppe Peano gave the symbol ∪ the meaning "vel" when he first introduced it, along with the intersection symbol ∩ and the membership symbol ε that he interpreted as the Latin "et" and "est", respectively (see the "signorum tabula" in his 1889 Arithmetices principia, a seminal work on the foundations of mathematics).
⁴ The clause "with at least one hand", though needed, may seem pedantic, even tactless. The distinction between being precise and pedantic is subtle and, ultimately, subjective. Experience may help to balance rigor and readability. In any case, in mathematics loose ends have to be handled with care and, definitely, are not for beginners.

The last operation that we consider is the difference.

Definition 6 Given two sets A and B, their difference A − B is the set of all the elements that belong to A, but not to B; that is, x ∈ A − B if both x ∈ A and x ∉ B.

The set A − B is, therefore, obtained by eliminating from A all the elements that belong (also) to B.⁵ Graphically:

[Venn diagram: the difference A − B, i.e., the part of A outside B]

For example, let us go back to the sets A and B specified in (1.1) and (1.2). Their difference

B − A = {15, 21, 25, 27}

is the set of the non-prime odd numbers between 10 and 30.


Note that: (i) when A and B are disjoint, we have A − B = A and B − A = B; (ii) the inclusion A ⊆ B is equivalent to A − B = ∅ since, by removing from A all the elements that belong also to B, the set A is deprived of all its elements, that is, we remain with the empty set.

In many applications there is a general set of reference, an all-inclusive set, of which various subsets are considered. For example, for demographers this set can be the entire

⁵ The difference A − B is often denoted by A\B.

population of a country, of which they can consider various subsets according to the demographic properties that are of interest (for instance, age is a standard demographic variable through which the population can be subdivided into subsets).
The general set of reference is called universal set or, more commonly, space. There
is no standard notation for this set (which is often clear from the context). We denote it
temporarily by S. Given any of its subsets A, the difference S − A is denoted by A^c and is called the complement set, or simply the complement, of A. The difference operation is
called complementation when it involves the universal set.

Example 7 If S is the set of all citizens of a country and A is the set of all citizens that
are at least 65 years old, the complement A^c is formed by all citizens that are (strictly) less
than 65 years old. N

It is immediate to verify that, for every set A, we have

A ∪ A^c = S and A ∩ A^c = ∅

We also have:

Proposition 8 (A^c)^c = A.

Proof As for Proposition 4, we have to verify an equality between sets. We thus have to prove separately the two inclusions (A^c)^c ⊆ A and A ⊆ (A^c)^c. If a ∈ (A^c)^c, then a ∉ A^c and therefore a ∈ A. It follows that (A^c)^c ⊆ A. Vice versa, if a ∈ A, then a ∉ A^c and therefore a ∈ (A^c)^c. We conclude that A ⊆ (A^c)^c.

Finally, we can easily prove that

A − B = A ∩ B^c

Indeed, x ∈ A − B means that x ∈ A and x ∉ B, that is, x ∈ A and x ∈ B^c.
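These operations are easy to experiment with on a computer. The following sketch is ours, not part of the original text: Python's built-in set type directly implements intersection (&), union (|) and difference (-), so the identities of this section can be checked on the sets A, B and a space of reference S = {10, 11, …, 30}.

    # Set operations on the sets of this section (illustrative sketch).
    A = {11, 13, 17, 19, 23, 29}                  # the primes in (1.1)
    B = {11, 13, 15, 17, 19, 21, 23, 25, 27, 29}  # the odd numbers in (1.2)
    S = set(range(10, 31))                        # a space of reference

    print(A & B == A)        # A ∩ B = A since A ⊆ B (Proposition 4)
    print(B - A)             # the difference B − A = {15, 21, 25, 27}
    A_c = S - A              # the complement of A relative to S
    print(B - A == B & A_c)  # the identity B − A = B ∩ A^c -> True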

1.1.3 Properties of the operations


Proposition 9 The operations of union and intersection are:

(i) commutative: for any two sets A and B, we have A ∩ B = B ∩ A and A ∪ B = B ∪ A;

(ii) associative: for any three sets A, B and C, we have A ∪ (B ∪ C) = (A ∪ B) ∪ C and A ∩ (B ∩ C) = (A ∩ B) ∩ C.

We leave to the reader the simple proof. Property (ii) permits to write A ∪ B ∪ C and A ∩ B ∩ C and, therefore, to extend without ambiguity the operations of union and intersection to an arbitrary (finite) number of sets:

⋃_{i=1}^n A_i and ⋂_{i=1}^n A_i

Example 10 In a successful police investigation in which each clue i identifies a set A_i of suspected individuals, the intersection ⋂_{i=1}^n A_i is the set of offenders, while the union ⋃_{i=1}^n A_i is the overall set of individuals that have been investigated. N

It is possible to extend these operations to infinitely many sets. If A_1, A_2, …, A_n, … is an infinite collection of sets, the union

⋃_{n=1}^∞ A_n

is the set of the elements that belong at least to one of the A_n, that is,

⋃_{n=1}^∞ A_n = {a : a ∈ A_n for at least one index n}

The intersection

⋂_{n=1}^∞ A_n

is the set of the elements that belong to every A_n, that is,

⋂_{n=1}^∞ A_n = {a : a ∈ A_n for every index n}

Example 11 Let A_n be the set of the (positive) even numbers ≤ n. For example, A_3 = {0, 2} and A_6 = {0, 2, 4, 6}. We have ⋂_{n=1}^∞ A_n = {0} because 0 is the only even number such that 0 ∈ A_n for each n ≥ 1. Moreover, ⋃_{n=1}^∞ A_n is the set of all even numbers. N

We turn to the relations between the operations of intersection and union. Note the symmetry between properties (1.4) and (1.5), in which ∩ and ∪ are exchanged.

Proposition 12 The operations of union and intersection are distributive: given any three sets A, B and C, we have

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)   (1.4)

and

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)   (1.5)

Proof We prove only (1.4) since (1.5) is similarly proved. We have to consider separately the two inclusions A ∩ (B ∪ C) ⊆ (A ∩ B) ∪ (A ∩ C) and (A ∩ B) ∪ (A ∩ C) ⊆ A ∩ (B ∪ C).
If x ∈ A ∩ (B ∪ C), then x ∈ A and x ∈ B ∪ C, i.e., x ∈ B or x ∈ C. It follows that x ∈ A ∩ B or x ∈ A ∩ C, i.e., x ∈ (A ∩ B) ∪ (A ∩ C). Therefore, A ∩ (B ∪ C) ⊆ (A ∩ B) ∪ (A ∩ C).
Vice versa, if x ∈ (A ∩ B) ∪ (A ∩ C), then x ∈ A ∩ B or x ∈ A ∩ C, that is, x belongs to A and to at least one of B and C. Therefore, x ∈ A ∩ (B ∪ C). It follows that (A ∩ B) ∪ (A ∩ C) ⊆ A ∩ (B ∪ C).

We now introduce a concept that plays an important role in many applications.



Definition 13 A family {A_1, A_2, …, A_n} of subsets of a set A is a partition of A if the subsets are pairwise disjoint, that is, A_i ∩ A_j = ∅ for every i ≠ j, and if their union coincides with A, that is, ⋃_{i=1}^n A_i = A.

Example 14 Let A be the set of all citizens of a country. Its subsets A_1, A_2 and A_3, formed, respectively, by the citizens of school or pre-school age (from 0 to 17 years old), by the citizens of working age (from 18 to 65 years old) and by the elders (from 66 years old on), form a partition of the set A. Relatedly, age cohorts, which consist of citizens who have the same age, form a partition of A. N

We conclude with the so-called De Morgan's laws for complementation: they illustrate
the relationship between the operations of intersection, union and complementation.

Proposition 15 Given two subsets A and B of a space S, we have (A ∪ B)^c = A^c ∩ B^c and (A ∩ B)^c = A^c ∪ B^c.

Proof We prove only the first law, leaving the second one to the reader. As usual, to prove an equality between sets we have to consider separately the two inclusions that compose it.
(i) (A ∪ B)^c ⊆ A^c ∩ B^c. If x ∈ (A ∪ B)^c, then x ∉ A ∪ B, that is, x belongs neither to A nor to B. It follows that x belongs simultaneously to A^c and to B^c and, therefore, to their intersection.
(ii) A^c ∩ B^c ⊆ (A ∪ B)^c. If x ∈ A^c ∩ B^c, then x ∉ A and x ∉ B; therefore, x does not belong to their union.

De Morgan's laws show that, when considering complements, the operations ∪ and ∩ are, essentially, interchangeable. Often these laws are written in the equivalent form

A ∪ B = (A^c ∩ B^c)^c and A ∩ B = (A^c ∪ B^c)^c

More importantly, they hold for any collection of sets, be it finite or not. For instance, for a finite collection the last form becomes

⋃_{i=1}^n A_i = (⋂_{i=1}^n A_i^c)^c and ⋂_{i=1}^n A_i = (⋃_{i=1}^n A_i^c)^c

as the reader can easily check.
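The check is also immediate with Python sets; the snippet below is our own illustrative sketch (the helper comp, complementation relative to the space S, is ours).

    # A numerical check of De Morgan's laws (illustrative sketch).
    S = set(range(10, 31))
    A = {11, 13, 17, 19, 23, 29}
    B = {11, 13, 15, 17, 19, 21, 23, 25, 27, 29}

    def comp(X):
        return S - X  # complementation relative to the space S

    print(comp(A | B) == comp(A) & comp(B))  # (A ∪ B)^c = A^c ∩ B^c -> True
    print(comp(A & B) == comp(A) | comp(B))  # (A ∩ B)^c = A^c ∪ B^c -> True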

1.1.4 A naive remark


In this book we will usually define sets by means of the properties of their elements, a "naive" notion of set that, however, is sufficient for our purposes. The naivete of this approach is highlighted by the classic paradoxes that, between the end of the nineteenth century and the early twentieth century, were discovered by Cesare Burali-Forti and Bertrand Russell. These paradoxes arise by considering sets of sets, that is, sets whose elements are sets themselves. As in Burali-Forti, using the naive notion of a set we define "the set of all sets", that is, the set whose elements share the property of being sets. If such a universal set "U" existed, we could also form the set {B : B ⊆ U} that consists of U and all of its subsets. Yet, as will be shown later in Cantor's Theorem 285, such a set does not belong to U, which contradicts

the supposed universality of U . Among the bizarre features of a universal set there is the
fact that it belongs to itself, i.e., U 2 U , a completely unintuitive property (as observed by
Russell, \the human race, for instance, is not a human").
As suggested by Russell, let us consider the set A formed by all sets that are not members
of themselves (e.g., the set of red oranges belongs to A because its elements are red oranges
and, obviously, none of them is the entire collection of all them). If A 2 = A, namely if A
does not belong to itself, then A 2 A because it is a set that satis es the property of not
belonging to itself. On the other hand, if A 2 A, namely if A contains itself, then A 2 = A
because, by de nition, the elements of A do not contain themselves. In conclusion, we reach
the absurdity A 2 = A if and only if A 2 A. It is the famous paradox of Russell.
These logical paradoxes, often called antinomies, can be addressed within a non-naive set theory, in particular that of Zermelo-Fraenkel. In the practice of mathematics, all the more in an introductory book, these foundational aspects can be safely ignored (their study would require an ad hoc, highly non-trivial, course). But it is important to be aware of these paradoxes because the methods that have been developed to address them have actually affected the practice of mathematics, as well as that of the empirical sciences.

1.2 Numbers
To quantify the variables of interest in economic applications – for example, the prices and quantities of goods traded in some market – we need an adequate set of numbers. This is the topic of the present section.
The natural numbers

0, 1, 2, 3, …

do not need any introduction; their set will be denoted by the symbol N.
The set N of natural numbers is closed with respect to the fundamental operations of
addition and multiplication:

(i) m + n ∈ N when m, n ∈ N;

(ii) m · n ∈ N when m, n ∈ N.

On the contrary, N is not closed with respect to the fundamental operations of subtraction and division: for example, neither 5 − 6 nor 5/6 is a natural number. It is, therefore, clear that N is inadequate as a set of numbers for economic applications: the budget of a company is an obvious example in which closure with respect to subtraction is crucial – otherwise, how can we quantify losses?⁶
The integer numbers

…, −3, −2, −1, 0, 1, 2, 3, …

form a first extension, denoted by the symbol Z, of the set N. It leads to a set that is closed with respect to addition and multiplication, as well as to subtraction. Indeed, by setting m − n = m + (−n),⁷ we have

⁶ Historically, negative numbers have often been viewed with suspicion. It is in economics, indeed, where they have a most natural interpretation in terms of losses.
⁷ The difference m − n is simply the sum of m with the negative −n of n (recall the notion of algebraic sum).

(i) m − n ∈ Z when m, n ∈ Z;

(ii) m · n ∈ Z when m, n ∈ Z.

Formally, the set Z can be written in terms of N as

Z = {m − n : m, n ∈ N}

Proposition 16 N ⊆ Z.

Proof Let m ∈ N. We have m = m − 0 ∈ Z because 0 ∈ N.

We are left with a fundamental operation with respect to which Z is not closed: division. For example, the unit fraction 1/3 is not an integer:⁸ if we want to divide 1 cake among 3 guests, how can we quantify their portions if only Z is available? To remedy this important shortcoming of the integers, we need a further enlargement to the set of the rational numbers, denoted by the symbol Q and given by

Q = {m/n : m, n ∈ Z with n ≠ 0}

In words, the set of the rational numbers consists of all the fractions with integers in both the numerator and the denominator (the latter not equal to zero). Hence the name rational, after "ratio".

Proposition 17 Z ⊆ Q.

Proof Let m ∈ Z. We have m = m/1 ∈ Q because 1 ∈ Z.

The set of rational numbers is closed with respect to all the four fundamental operations:⁹

(i) m − n ∈ Q when m, n ∈ Q;

(ii) m · n ∈ Q when m, n ∈ Q;

(iii) m/n ∈ Q when m, n ∈ Q with n ≠ 0.

O.R. Rational numbers whose decimal expansion is not periodic, so that they have a finite number of decimals, have two decimal representations. For example, 1 = 0.999… because

0.999… = 3 × 0.333… = 3 × (1/3) = 1

Similarly, 2.5 = 2.4999…, 51.2 = 51.1999… and so on. On the contrary, periodic rational numbers and irrational numbers have a unique decimal representation (which is infinite).

⁸ Unit fractions have 1 as their numerator, so they have the form 1/n. They are the simplest kind of fraction and historically played an important role because of their natural interpretation as "inverses of integers" (see, e.g., Ritter, 2000, p. 128, on their use in ancient Egyptian mathematics).
⁹ The names of the four fundamental operations are addition, subtraction, multiplication and division, while the names of their results are sum, difference, product and quotient, respectively (the addition of 3 and 4 has 7 as its sum, and so on).

This is not a simple curiosity: if 0.999… were not equal to 1, we could state that 0.999… is the number that immediately precedes 1 (without any other number in between), which would violate a notable property of real numbers that we will see shortly in Proposition 19. H

The set of rational numbers seems, therefore, to have all that we need. Some simple observations on multiplication, however, will bring us some surprising findings. If q is a rational number, the notation q^n, with n ≥ 1, means

q · q · … · q   (n times)

with q^0 = 1 for every q ≠ 0. The notation q^n, called power of base q and exponent n, per se is just shorthand notation for the repeated multiplication of the same factor. Nevertheless, given a rational q > 0, it is natural to consider the inverse path, that is, to determine the positive "number", denoted by q^{1/n} – or, equivalently, by ⁿ√q – and called the root of order n of q, such that

(q^{1/n})^n = q

For example,¹⁰ √25 = 5 because 5^2 = 25. To understand the importance of roots, we can consider the following simple geometric figure:

[Figure: a right triangle with both catheti of length 1]

By Pythagoras' Theorem, the length of the hypotenuse is √2. To quantify elementary geometric entities, we thus need square roots. Here we have a, tragic to some, surprise.¹¹
Theorem 18 √2 ∉ Q.

Proof Suppose, by contradiction, that √2 ∈ Q. Then there exist m, n ∈ Z such that m/n = √2, and therefore

(m/n)^2 = 2   (1.6)

We can assume that m/n is already reduced to its lowest terms, i.e., that m and n have no factors in common.¹² This means that m and n cannot both be even numbers (otherwise, 2 would be a common factor).
¹⁰ The square root ²√q is simply denoted by √q, omitting the index 2.
¹¹ For the Pythagorean philosophy, in which the proportions (that is, the rational numbers) were central, the discovery of the non-rationality of square roots was a traumatic event. We refer the curious reader to Fritz (1945).
¹² For example, 14/10 is not reduced to its lowest terms because the numerator and the denominator have in common the factor 2. On the contrary, 7/5 is reduced to its lowest terms.

Formula (1.6) implies

m^2 = 2n^2   (1.7)

and, therefore, m^2 is even. As the square of an odd number is odd, m is also even – otherwise, if m were odd, then m^2 would also be odd.¹³ Therefore, there exists an integer k ≠ 0 such that

m = 2k   (1.8)

From (1.7) and (1.8) it follows that

n^2 = 2k^2

Therefore n^2 is even, and so n itself is even. In conclusion, both m and n are even, but this contradicts the fact that m/n is reduced to its lowest terms. This contradiction proves that √2 ∉ Q.

This magnificent result is one of the great theorems of Greek mathematics. Proved by the Pythagorean school between the VI and the V century B.C., it is the unexpected outcome of the – prima facie innocuous – distinction between even and odd numbers that the Pythagoreans were the first to make. It represented a turning point in the early history of mathematics that showed the fundamental role of abstract reasoning, which exposed the logical inconsistency of a physical, granular, notion of line formed by material points. Indeed, however small, these points would have a dimension that makes it possible to count them and so express the ratio between the hypotenuse and the catheti as a rational number m/n. Pythagoras' Theorem thus questioned the relations between geometry and the physical world that originally motivated its study (at least under any kind of Atomism, back then advocated by the Ionian school). The resulting intellectual turmoil paved the way to the notion of a point with no dimension, a purely theoretical construct central to Euclid's Elements which, indeed, famously start by stating that "a point is that which has no part".
Leaving aside these philosophical aspects, further discussed at the end of the chapter, here Pythagoras' Theorem shows the need for a further enlargement of the set of numbers, closed under the square root operation, that permits to quantify basic geometric entities (as well as basic economic variables, as will be clear in the sequel). To introduce, at an intuitive level, this final enlargement,¹⁴ consider the real line:

[Figure: the real line]

It is easy to see how on this line we can represent the rational numbers:

[Figure: the real line with some rational numbers marked]

The rational numbers do not exhaust, however, the real line. For example, also roots like √2, or other non-rational numbers, such as π, must find their representation on the real line:

[Figure: the real line with √2 and π marked]

¹³ If the integer m is odd, we have m = 2n + 1 for some n ∈ Z. So, the integer m^2 = (2n + 1)^2 = 4n^2 + 4n + 1 is odd since the integer 4n^2 + 4n is even, as it is divisible by 2.
¹⁴ For a rigorous treatment we refer, for example, to the first chapter of Rudin (1976).

We denote by R the set of all the numbers that can be represented on the real line; they are called real numbers.¹⁵
The set R has the following properties in terms of the fundamental operations (here a, b and c are generic real numbers):

(i) a + b ∈ R and a · b ∈ R;
(ii) a + b = b + a and a · b = b · a;
(iii) (a + b) + c = a + (b + c) and (a · b) · c = a · (b · c);
(iv) a + 0 = a and b · 1 = b;
(v) a + (−a) = 0 and b · b^{−1} = 1 provided b ≠ 0;
(vi) a · (b + c) = a · b + a · c.

Clearly, Q ⊆ R. But Q ≠ R: there are many real numbers, called irrationals, that are not rational. Many roots and the numbers π and e are examples of irrational numbers. It is actually possible to prove that most real numbers are irrational. Although a rigorous treatment of this topic would take us too far, the next simple result is already a clear indication of how rich the set of the irrational numbers is.

Proposition 19 Given any two real numbers a < b, there exists an irrational number c ∈ R such that a < c < b.

Proof For each natural n ∈ N with n ≥ 1, let

c_n = a + √2/n

We have c_n > a for every n and

c_n < b ⟺ n > √2/(b − a)

Let therefore n ∈ N be any natural number such that n > √2/(b − a) > 0.¹⁶ We conclude that a < c_n, c_{n+1} < b. If c_n is not rational, then the statement follows by setting c = c_n. Since √2 = (c_n − c_{n+1}) · (n + 1) · n, if c_n is rational, then c_{n+1} cannot be rational. By setting c = c_{n+1}, the statement follows.
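The construction used in the proof can be tried numerically. The sketch below is ours (the function name irrational_between is hypothetical); note that floating-point numbers only approximate the reals, so the check concerns the inequalities a < c_n < b, not irrationality itself.

    # c_n = a + sqrt(2)/n with n > sqrt(2)/(b - a), as in the proof of
    # Proposition 19 (illustrative sketch).
    import math

    def irrational_between(a, b):
        n = math.floor(math.sqrt(2) / (b - a)) + 1  # an n beyond the threshold
        return a + math.sqrt(2) / n                 # c_n; irrational when a is rational

    c = irrational_between(0, 0.01)
    print(0 < c < 0.01)  # True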

In conclusion, R is the set of numbers that we will consider in the rest of the book. It turns out to be adequate for most economic applications.¹⁷

¹⁵ When real numbers are rigorously introduced, our intuitive approach relies upon a postulate (of continuity of the real line) that can be traced back to Descartes. It asserts the existence of a one-to-one correspondence (a bijection) between the real numbers and the points of the real line that preserves order and distance.
¹⁶ Such an n exists because of the Archimedean property of the real numbers, which we will soon see in Proposition 41.
¹⁷ An important further enlargement, which we do not consider, is the set C of complex numbers.

1.3 Structure of the integers


Let us now analyze some basic – yet not trivial – properties of integers. The main result we
will present is the Fundamental Theorem of Arithmetic, which shows the central role that
prime numbers play in the structure of the set of integers.

1.3.1 Divisors and algorithms


In this first section we present some preliminary notions that will be needed for the next section on prime numbers. In so doing, we will encounter the all-important notion of algorithm.
We begin by introducing in a rigorous fashion some notions, the essence of which the reader may have learned in elementary school. An integer n is divisible by an integer p ≠ 0 if there is a third integer q such that n = pq. In symbols we write p | n, which is read as "p divides n".

Example 20 The integer 6 is divisible by the integer 2, that is, 2 | 6, because the integer 3 is such that 6 = 2 · 3. Furthermore, 6 is divisible by 3, that is, 3 | 6, because the integer 2 is such that 6 = 2 · 3. N

In elementary school one learns how to divide two integers by using remainders and quotients. For example, if n = 7 and m = 2, we have n = 3 · 2 + 1, with 3 as the quotient and 1 as the remainder. The next simple result formalizes the above procedure and shows that it holds for any pair of integers (something that young learners take for granted).

Proposition 21 Given any two integers m and n, with m strictly positive,¹⁸ there is one and only one pair of integers q and r such that

n = qm + r

with 0 ≤ r < m.

Proof Two distinct properties are stated in the proposition: the existence of the pair (q, r) and its uniqueness. Let us start by proving its existence. We first consider the case n ≥ 0. Consider the set A = {p ∈ N : p ≤ n/m}. Since n ≥ 0, it is non-empty because it contains at least the integer 0. Let q be the largest element of A. By definition, qm ≤ n < (q + 1)m. Setting r = n − qm, we have

0 ≤ n − qm = r < (q + 1)m − qm = m

We have thus shown the existence of the desired pair (q, r) when n ≥ 0. If n < 0, then −n > 0 and so, by what has just been proved, there exist q, r ∈ Z such that −n = qm + r and 0 ≤ r < m. Since r < m, if r > 0 then m > m − r > 0. By setting q′ = −q − 1 ∈ Z and r′ = m − r ∈ Z, we have

n = (−q − 1)m − r + m = q′m + r′

with 0 ≤ r′ < m, proving the statement. If r = 0, then set q′ = −q and r′ = 0 in order to get n = q′m + r′.

¹⁸ An integer m is strictly positive when m > 0, that is, when m ≥ 1.

As to uniqueness, consider two different pairs (q′, r′) and (q″, r″) such that

n = q′m + r′ = q″m + r″   (1.9)

with 0 ≤ r′, r″ < m. We need to prove that they coincide, that is, q′ = q″ and r′ = r″. It is enough to show that r′ = r″. In fact, by (1.9) this implies (q″ − q′)m = r′ − r″ = 0, yielding that q′ = q″ because m > 0. By contradiction, assume that r′ ≠ r″. Without loss of generality, assume that r′ > r″. By (1.9), we have (q″ − q′)m = r′ − r″ > 0. Since m > 0 and q″ − q′ is an integer, this implies that q″ − q′ > 0 and r′ − r″ = (q″ − q′)m ≥ m. At the same time, since 0 ≤ r″ < r′ < m, we reach the contradiction m ≤ (q″ − q′)m = r′ − r″ < m.
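For readers who like to experiment: in Python, the built-in divmod applied with m > 0 returns precisely the pair (q, r) of Proposition 21, also for negative n. The wrapper below is our own illustrative sketch.

    # Quotient and remainder as in Proposition 21 (illustrative sketch).
    def quotient_remainder(n, m):
        assert m > 0
        q, r = divmod(n, m)  # Python guarantees 0 <= r < m when m > 0
        assert n == q * m + r and 0 <= r < m
        return q, r

    print(quotient_remainder(7, 2))   # (3, 1):   7 = 3*2 + 1
    print(quotient_remainder(-7, 2))  # (-4, 1): -7 = (-4)*2 + 1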

Given two strictly positive integers m and n, their greatest common divisor, denoted by gcd(m, n), is the largest divisor that both numbers share. The next result, which was proven by Euclid in his Elements, shows exactly what was taken for granted in elementary school, namely, that any pair of integers has a unique greatest common divisor.

Theorem 22 (Euclid) Any pair of strictly positive integers has one and only one greatest common divisor.

Proof Like Proposition 21, this is also an existence and uniqueness result. Uniqueness is obvious; let us prove existence. Let m and n be any two strictly positive integers. By Proposition 21, there is a unique pair (q_1, r_1) such that

n = q_1 m + r_1   (1.10)

with 0 ≤ r_1 < m. If r_1 = 0, then gcd(m, n) = m, and the proof is concluded. If r_1 > 0, we iterate the procedure by applying Proposition 21 to m. We thus have a unique pair (q_2, r_2) such that

m = q_2 r_1 + r_2   (1.11)

where 0 ≤ r_2 < r_1. If r_2 = 0, then gcd(m, n) = r_1. Indeed, (1.11) implies r_1 | m. Furthermore, by (1.10) and (1.11) we have

n/r_1 = (q_1 m + r_1)/r_1 = (q_1 q_2 r_1 + r_1)/r_1 = q_1 q_2 + 1

and so r_1 | n. Thus, r_1 is a divisor of both n and m. We now need to show that it is the greatest of those divisors. Suppose p is a strictly positive integer such that p | m and p | n. By definition, there are two strictly positive integers a and b such that n = ap and m = bp. We have

0 < r_1/p = (n − q_1 m)/p = a − q_1 b

Hence r_1/p is a strictly positive integer, which implies that r_1 ≥ p. To sum up, gcd(m, n) = r_1 if r_2 = 0. If this is the case, the proof is concluded.
If r_2 > 0, we iterate the procedure once more by applying Proposition 21 to r_2. We thus have a unique pair (q_3, r_3) such that

r_1 = q_3 r_2 + r_3

where 0 ≤ r_3 < r_2. If r_3 = 0, by proceeding as above we can show that gcd(m, n) = r_2, and the proof is complete. If r_3 > 0, we iterate the procedure. Iteration after iteration, a strictly decreasing sequence of positive integers r_1 > r_2 > ⋯ > r_k is generated. A strictly decreasing sequence of positive integers can only be finite: there is a k ≥ 1 such that r_k = 0. By proceeding as above we can show that gcd(m, n) = r_{k−1}, which completes the proof of the existence of gcd(m, n).

From a methodological standpoint, the above argument is a good example of a constructive proof since it is based on an algorithm, known as Euclid's Algorithm, that determines within a finite number of iterations the mathematical entity whose existence is stated – here, the greatest common divisor. The notion of algorithm is of paramount importance because, when available, it makes mathematical entities computable. In principle, an algorithm can be automated by means of an appropriate computer program (for example, Euclid's Algorithm permits to automate the search for greatest common divisors).

Euclid's Algorithm is the first algorithm we encounter and it is of such importance in number theory that it deserves to be reviewed in greater detail. Given two strictly positive integers m and n, the algorithm unfolds in the following k ≥ 1 steps:

Step 1: n = q_1 m + r_1
Step 2: m = q_2 r_1 + r_2
Step 3: r_1 = q_3 r_2 + r_3
⋮
Step k: r_{k−2} = q_k r_{k−1} (that is, r_k = 0)

The algorithm stops at step k when r_k = 0. In this case gcd(m, n) = r_{k−1}, as we saw in the previous proof.

Example 23 Let us consider the strictly positive integers 3801 and 1708. Their greatest common divisor is not apparent at first sight. Fortunately, we can calculate it by means of Euclid's Algorithm. We proceed as follows:

Step 1: 3801 = 2 · 1708 + 385
Step 2: 1708 = 4 · 385 + 168
Step 3: 385 = 2 · 168 + 49
Step 4: 168 = 3 · 49 + 21
Step 5: 49 = 2 · 21 + 7
Step 6: 21 = 3 · 7

In six steps we have found that gcd(3801, 1708) = 7. N
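The procedure is entirely mechanical, which is exactly why it can be automated. A minimal sketch in Python (the function euclid_gcd and its step counter are ours), whose output can be compared with Example 23 and, for the number of iterations, with Lamé's Theorem below:

    # Euclid's Algorithm with a step counter (illustrative sketch).
    def euclid_gcd(m, n):
        steps = 0
        while m != 0:
            n, m = m, n % m  # replace (n, m) by (m, r), r the remainder of n by m
            steps += 1
        return n, steps

    print(euclid_gcd(1708, 3801))  # (7, 6): the gcd is 7, found in six steps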

The quality of an algorithm depends on the number of steps, or iterations, that are required to reach the solution. The fewer the iterations, the more powerful the algorithm is. The following remarkable property – proven by Gabriel Lamé in 1844 – holds for Euclid's Algorithm.¹⁹

¹⁹ See Sierpinski (1988), p. 17, for a proof.

Theorem 24 (Lamé) Given two strictly positive integers m and n, the number of iterations needed by Euclid's Algorithm is no greater than five times the number of digits of min{m, n}.

For example, if we go back to the numbers 3801 and 1708, the number of relevant digits is 4. Lamé's Theorem guarantees in advance that Euclid's Algorithm would have required at most 20 iterations. It took us only 6 steps, but thanks to Lamé's Theorem we already knew, before starting, that it would not have taken too much effort (and thus it was worth giving it a shot, without running the risk of getting stuck in a grueling number of iterations).

1.3.2 Prime numbers


Among the natural numbers, a prominent position is held by prime numbers, which the reader has most likely encountered in secondary school.

Definition 25 A natural number n ≥ 2 is said to be prime if it is divisible only by 1 and by itself.

A natural number which is not prime is called composite. Let us denote the set of prime numbers by P. Obviously, P ⊆ N and N − P is the set of composite numbers. The reader can easily verify that the following naturals

{2, 3, 5, 7, 11, 13, 17, 19, 23, 29}

are the first ten prime numbers.


The importance of prime numbers becomes more apparent if we note how composite numbers (strictly greater than 1) can be written as a product of primes. For example, the composite number 12 can be written as

12 = 2 · 2 · 3 = 2^2 · 3

while the composite number 60 can be written as

60 = 2 · 2 · 3 · 5 = 2^2 · 3 · 5

In general, the prime factorization (or decomposition) of a composite number n > 1 can be written as

n = p_1^{n_1} p_2^{n_2} ⋯ p_k^{n_k}   (1.12)

where p_i ∈ P and 0 ≠ n_i ∈ N for each i = 1, …, k, with p_1 < p_2 < … < p_k.

Example 26 (i) For n = 12 we have p_1 = n_1 = 2, p_2 = 3 and n_2 = 1; in this case k = 2. (ii) For n = 60 we have p_1 = n_1 = 2, p_2 = 3, n_2 = 1, p_3 = 5 and n_3 = 1; in this case k = 3. (iii) For n = 200 we have

200 = 2^3 · 5^2

and so p_1 = 2, n_1 = 3, p_2 = 5 and n_2 = 2; in this case k = 2. (iv) For n = 522 we have

522 = 2 · 3^2 · 29

and so p_1 = 2, n_1 = 1, p_2 = 3, n_2 = 2, p_3 = 29 and n_3 = 1; in this case k = 3. N
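For small numbers, factorizations like those of Example 26 can be computed by trial division. The sketch below is ours (the function name factorize is hypothetical); as discussed after the next theorem, this naive approach is hopeless for large numbers.

    # Prime factorization by trial division (illustrative sketch).
    def factorize(n):
        factors = {}  # prime -> exponent, as in (1.12)
        p = 2
        while p * p <= n:
            while n % p == 0:
                factors[p] = factors.get(p, 0) + 1
                n //= p
            p += 1
        if n > 1:  # whatever is left over is itself prime
            factors[n] = factors.get(n, 0) + 1
        return factors

    print(factorize(60))   # {2: 2, 3: 1, 5: 1}:  60 = 2^2 * 3 * 5
    print(factorize(522))  # {2: 1, 3: 2, 29: 1}: 522 = 2 * 3^2 * 29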



What we have just seen raises two questions: whether every natural number > 1 admits a prime factorization (we have seen only a few examples up to now) and whether such a factorization is unique. The next result, the Fundamental Theorem of Arithmetic, addresses both questions by showing that every integer admits one and only one prime factorization. In other words, every integer can be expressed uniquely as a product of prime numbers. Prime numbers are thus the "atoms" of N: they are "indivisible" – as they are divisible only by 1 and themselves – and through them any other natural number can be expressed uniquely. The importance of this result, which shows the centrality of prime numbers, can be seen in its name. Its first proof can be found in the famous Disquisitiones Arithmeticae, published in 1801 by Carl Friedrich Gauss, although Euclid was already aware of the result in its essence.

Theorem 27 (Fundamental Theorem of Arithmetic) Any natural number n > 1 admits one and only one prime factorization as in (1.12).

Proof Let us start by showing the existence of this factorization. We will proceed by contradiction. Suppose there are natural numbers > 1 that do not have a prime factorization as in (1.12). Let n > 1 be the smallest among them. Obviously, n is a composite number. There are then two natural numbers p and q such that n = pq with 1 < p, q < n. Since n is the smallest number that does not admit a prime factorization, the numbers p and q do admit such factorizations. In particular, we can write

p = p_1^{n_1} p_2^{n_2} ⋯ p_k^{n_k} and q = q_1^{n′_1} q_2^{n′_2} ⋯ q_s^{n′_s}

Thus, we have

n = pq = p_1^{n_1} p_2^{n_2} ⋯ p_k^{n_k} · q_1^{n′_1} q_2^{n′_2} ⋯ q_s^{n′_s}

By collecting the terms p_i and q_j appropriately, n can be rewritten as in (1.12). Hence, n admits a prime factorization, which contradicts our assumption on n, thus concluding the proof of existence.
Let us proceed by contradiction to prove uniqueness as well. Suppose that there are natural numbers that admit more than one factorization. Let n > 1 be the smallest among them: then n admits at least two different factorizations, so that we can write

n = p_1^{n_1} p_2^{n_2} ⋯ p_k^{n_k} = q_1^{n′_1} q_2^{n′_2} ⋯ q_s^{n′_s}

Since q_1 is a divisor of n, it must be a divisor of at least one of the factors p_1 < ⋯ < p_k.²⁰ For example, let p_1 be one such factor. Since both q_1 and p_1 are primes, we have that q_1 = p_1. Hence

p_1^{n_1 − 1} p_2^{n_2} ⋯ p_k^{n_k} = q_1^{n′_1 − 1} q_2^{n′_2} ⋯ q_s^{n′_s} < n

which contradicts the minimality of n, as the number p_1^{n_1 − 1} p_2^{n_2} ⋯ p_k^{n_k} also admits multiple factorizations. The contradiction proves the uniqueness of the prime factorization.

From a methodological viewpoint it must be noted that this proof of existence is carried
out by contradiction and, as such, cannot be constructive. Indeed, these proofs are based
²⁰ This mathematical fact, although intuitive, requires a mathematical proof. It is indeed the content of Euclid's Lemma, which we do not prove. This lemma permits to conclude that, if a prime p divides a product of strictly positive integers, then it must divide at least one of them.

on the law of excluded middle { a property is either true or false (cf. Appendix D) {
and the truth of a statement is established by showing its non-falseness. This often allows
for such proofs to be short and elegant but, although logically air-tight,21 they are almost
metaphysical as they do not provide a procedure for constructing the mathematical entities
whose existence they establish (let alone an algorithm to compute them, when relevant).22
To sum up, we invite the reader to compare this proof of existence with the constructive
one provided for Theorem 22. This comparison should clarify the di erences between the two
fundamental types of proofs of existence, constructive/direct and non-constructive/indirect.

It is not a coincidence that the proof of existence in the Fundamental Theorem of Arithmetic is not constructive. Indeed, designing algorithms which allow us to factorize a natural number n into prime numbers – the so-called factorization tests – is exceedingly complex. After all, constructing algorithms which can assess whether n is prime or composite – the so-called primality tests – is already extremely cumbersome and is to this day an active research field (so much so that an important result in this field dates to 2002).²³
To grasp the complexity of the problem it suffices to observe that, if n is composite, there are two natural numbers a, b > 1 such that n = ab. Hence, a ≤ √n or b ≤ √n (otherwise, ab > n), so there is a divisor of n among the natural numbers between 1 and √n. To verify whether n is prime or composite, we can merely divide n by all natural numbers between 1 and √n: if none of them is a divisor of n, we can safely conclude that n is a prime number or, if this is not the case, that n is composite. This procedure requires at most √n steps.
With this in mind, suppose we want to test whether the number 10^100 + 1 is prime or composite (it is a number with 101 digits, so it is big but not huge). The procedure requires at most √(10^100 + 1) operations, that is, at most 10^50 operations (approximately). Suppose we have an extremely powerful computer which is able to carry out 10^10 (ten billion) operations per second. Since there are 31,536,000 seconds in a year, that is, approximately 3 × 10^7 seconds, our computer would be able to carry out approximately 3 × 10^7 × 10^10 = 3 × 10^17 operations in one year. To carry out the operations that our procedure might require, our computer would need

10^50 / (3 × 10^17) = (1/3) × 10^33

years. We had better get started...
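For small n, however, the naive primality test just described is perfectly workable. A minimal sketch in Python (the function is_prime is ours):

    # Primality by trial division up to sqrt(n) (illustrative sketch).
    def is_prime(n):
        if n < 2:
            return False
        d = 2
        while d * d <= n:  # it suffices to try divisors d with d*d <= n
            if n % d == 0:
                return False  # a divisor was found: n is composite
            d += 1
        return True

    print([p for p in range(2, 30) if is_prime(p)])
    # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]: the first ten primes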

It should be noted that, if the prime factorizations of two natural numbers n and m are known, we can easily determine their greatest common divisor. For example, from

3801 = 3 · 7 · 181 and 1708 = 2^2 · 7 · 61

it easily follows that gcd(3801, 1708) = 7, which confirms the result of Euclid's Algorithm. Given how difficult it is to factorize natural numbers, the observation is hardly useful from a computational standpoint. Thus, it is a good idea to hold on to Euclid's Algorithm, which
²¹ Unless one rejects the law of excluded middle, as some eminent mathematicians have done (although it constitutes a minority view and a very subtle methodological issue, the analysis of which is surely premature).
²² Enriques (1919), pp. 11-13, is an authoritative discussion of this issue.
²³ One of the reasons why the study of factorization tests is an active research field is that the difficulty of factorizing natural numbers is exploited by modern cryptography to build unbreakable codes (see Section 6.4).

thanks to Lame's Theorem is able to produce the greatest common divisors with reasonable
e ciency, without having to conduct any factorization.

But how many are there?


Given the importance of prime numbers, it is natural to ask how many of them there are. The next celebrated result of Euclid shows that they are infinitely many. After Theorem 18, it is the second remarkable gem of Greek mathematics that we have the pleasure to meet in these few pages.

Theorem 28 (Euclid) There are infinitely many prime numbers.

Proof The proof is carried out by contradiction. Suppose that there are only finitely many
prime numbers and denote them by $p_1 < p_2 < \cdots < p_n$. Define
\[
q = p_1 p_2 \cdots p_n
\]
and set $m = q + 1$. The natural number m is larger than any prime number, hence it is a
composite number. By the Fundamental Theorem of Arithmetic, it is divisible by at least
one of the prime numbers $p_1, p_2, \ldots, p_n$. Let us denote this divisor by p. Both natural
numbers m and q are thus divisible by p. It follows that also their difference, that is, the
natural number $1 = m - q$, is divisible by p, which is impossible since $p > 1$. Hence, the
assumption that there are finitely many prime numbers is false.
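The construction used in the proof can be tried out numerically. The following Python fragment (our own illustration) multiplies the first six primes, adds 1, and exhibits a prime factor that cannot belong to the initial list:

def smallest_prime_factor(n):
    # Return the smallest prime factor of n (n itself if n is prime).
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

primes = [2, 3, 5, 7, 11, 13]
q = 1
for p in primes:
    q *= p          # q = 30030
m = q + 1           # m = 30031 = 59 * 509
print(smallest_prime_factor(m))  # 59, a prime not in the list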

In conclusion, we have looked at some basic notions in number theory, the branch of
mathematics which deals with the properties of integers. It is one of the most fascinating
and complex fields of mathematics, and it bears incredibly deep results, often easy to state
but hard to prove. A classic example is the famous Fermat's Last Theorem, whose statement
is quite simple: if $n \geq 3$, there cannot exist three strictly positive integers x, y and z such that
$x^n + y^n = z^n$. Thanks to Pythagoras' Theorem we know that for $n = 2$ such triplets of integers
do exist (for example, $3^2 + 4^2 = 5^2$); Fermat's Last Theorem states that $n = 2$ is indeed the
only case in which this remarkable property holds. Stated by Fermat, the theorem was first
proven in 1994 by Andrew Wiles after more than three centuries of unfruitful attempts.

1.4 Order structure of R


We now turn our attention to the set R of the real numbers. An important property of R is
the possibility of ordering its elements through the inequality $\geq$. The intuitive meaning of
this inequality is clear: given two real numbers a and b, we have $a \geq b$ when a is at least as
great as b.
Consider the following properties of the inequality $\geq$:

(i) reflexivity: $a \geq a$;

(ii) antisymmetry: if $a \geq b$ and $b \geq a$, then $a = b$;

(iii) transitivity: if $a \geq b$ and $b \geq c$, then $a \geq c$;

(iv) completeness (or totality): for every pair $a, b \in \mathbb{R}$, we have $a \geq b$ or $b \geq a$ (or both);

(v) additive independence: if $a \geq b$, then $a + c \geq b + c$ for every $c \in \mathbb{R}$;

(vi) multiplicative independence: if $a \geq b$ and $c \in \mathbb{R}$, then
\[
ac \geq bc \text{ if } c > 0, \qquad ac = bc = 0 \text{ if } c = 0, \qquad ac \leq bc \text{ if } c < 0
\]

(vii) separation: given two sets of real numbers A and B, if $a \geq b$ for every $a \in A$ and
$b \in B$, then there exists $c \in \mathbb{R}$ such that $a \geq c \geq b$ for every $a \in A$ and $b \in B$.

The first three properties have an obvious interpretation. Completeness guarantees that
any two real numbers can always be ordered. Additive independence ensures that the initial
ordering between two real numbers a and b is not altered by adding to both the same real
number c. Multiplicative independence considers, instead, the stability of such ordering with
respect to multiplication.
Finally, separation permits to separate two sets ordered by $\geq$ (that is, such that each
element of one of the two sets is greater than or equal to each element of the other one)
through a real number c, called separating element.24 Separation is a fundamental property
of "continuity" of the real numbers and it is what mainly distinguishes them from the rational
numbers (for which such property does not hold, as remarked in the last footnote) and makes
them the natural environment for mathematical analysis.

The strict form $a > b$ of the "weak" inequality $\geq$ indicates that a is strictly greater than
b, i.e., $a \geq b$ and $a \neq b$. We have $a > b$ if and only if $b \ngeq a$, that is, the strict inequality
can be defined as the negation of the weak inequality (of opposite direction). The reader
can verify that transitivity and independence (both additive and multiplicative) hold also
for the strict inequality $>$, while the other properties of the inequality $\geq$ do not hold for $>$.

The order structure, characterized by properties (i)-(vii), is fundamental in R. Before
starting its study, we introduce by means of $\geq$ and $>$ some fundamental subsets of R:

(i) the closed bounded intervals $[a, b] = \{x \in \mathbb{R} : a \leq x \leq b\}$;

(ii) the open bounded intervals $(a, b) = \{x \in \mathbb{R} : a < x < b\}$;

(iii) the half-closed (or half-open) bounded intervals $(a, b] = \{x \in \mathbb{R} : a < x \leq b\}$ and $[a, b) = \{x \in \mathbb{R} : a \leq x < b\}$.

In these bounded intervals, the points a and b are called endpoints. Other important
intervals are:
24 The property of separation holds also for N and Z, but not for Q. For example, the sets of rationals $A = \{q \in \mathbb{Q} : q < \sqrt{2}\}$ and $B = \{q \in \mathbb{Q} : q > \sqrt{2}\}$ do not have a rational separating element (as the reader can verify in light of Theorem 18 and of what we will see in Section 1.4.3).

(iv) the unbounded intervals $[a, \infty) = \{x \in \mathbb{R} : x \geq a\}$ and $(a, \infty) = \{x \in \mathbb{R} : x > a\}$, and
their analogues $(-\infty, a]$ and $(-\infty, a)$.25 In particular, the positive half-line $[0, \infty)$ is
often denoted by $\mathbb{R}_+$, while $\mathbb{R}_{++}$ denotes $(0, \infty)$, that is, the positive half-line without
the origin.

The use of the adjectives open, closed and unbounded will become clear in Chapter 5. To
ease notation, in the rest of the chapter $(a, b)$ will denote both an open bounded interval and
the unbounded ones $(a, \infty)$, $(-\infty, b)$ and $(-\infty, \infty) = \mathbb{R}$. Analogously, $(a, b]$ and $[a, b)$ will
denote both the half-closed bounded intervals and the unbounded ones $(-\infty, b]$ and $[a, \infty)$.

After all these examples, it is time to define formally what an interval is.

Definition 29 A subset I of R is an interval if, given any $x, y \in I$, all points $z \in \mathbb{R}$ such
that $x \leq z \leq y$ belong to I.

In words, a subset of the real line is an interval when, taken any two of its points, all
points between them also belong to the set. It is easy to see that, indeed, all the previous
examples of intervals have this "betweenness" property. They represent all the forms that
subsets of the real line having this property can take.
Though defined through an order property, intervals admit a simple algebraic characterization, as readers can check.

Proposition 30 A subset A of R is an interval if and only if, given any $x, y \in A$, we have
$\lambda x + (1 - \lambda) y \in A$ for all $\lambda \in [0, 1]$.

1.4.1 Maxima and minima


Definition 31 Let $A \subseteq \mathbb{R}$ be a non-empty set. A number $h \in \mathbb{R}$ is called upper bound of A
if it is greater than or equal to each element of A, that is, if26
\[
h \geq x \quad \forall x \in A
\]
while it is called lower bound of A if it is smaller than or equal to each element of A, that
is, if
\[
h \leq x \quad \forall x \in A
\]

For example, if $A = [0, 1]$, the number 3 is an upper bound and the number $-1$ is a lower
bound since $-1 \leq x \leq 3$ for every $x \in [0, 1]$. In particular, the set of upper bounds of A is
the interval $[1, \infty)$ and the set of the lower bounds is the interval $(-\infty, 0]$.
We denote by $A^*$ the set of upper bounds of A and by $A_*$ the set of lower bounds. In
the example just seen, $A^* = [1, \infty)$ and $A_* = (-\infty, 0]$.

A few simple remarks. Let A be any set.


25 When there is no danger of confusion, we will write simply $\infty$ instead of $+\infty$. The symbol $\infty$, introduced in mathematics by John Wallis in the seventeenth century, recalls a curve called the lemniscate and a kind of hat or halo (symbol of force) put on the head of some tarot card figures: in any case, it is definitely not a flattened 8.
26 The universal quantifier $\forall$ reads "for every". Therefore, "$\forall x \in A$" reads "for every element x that belongs to the set A" (see Appendix D).

(i) Upper and lower bounds do not necessarily belong to the set A: the upper bound 3
and the lower bound $-1$ for the set $[0, 1]$ are an example of this.

(ii) Upper and lower bounds might not exist. For example, for the set of positive even
numbers
\[
\{0, 2, 4, 6, \ldots\} \tag{1.13}
\]
there is no real number which is greater than all its elements: hence, this set does not
have upper bounds. Analogously, the set of negative even numbers
\[
\{0, -2, -4, -6, \ldots\} \tag{1.14}
\]
has no lower bounds, while the set of integers Z is a simple example of a set without
upper and lower bounds.

(iii) If h is an upper bound, so is any $h' > h$; analogously, if h is a lower bound, so is any $h'' < h$.
Therefore, if they exist, upper and lower bounds are not unique.

Through upper and lower bounds we can give a first classification of sets of the real line.

Definition 32 A non-empty set $A \subseteq \mathbb{R}$ is said to be:

(i) bounded (from) above if it has an upper bound, that is, $A^* \neq \emptyset$;

(ii) bounded (from) below if it has a lower bound, that is, $A_* \neq \emptyset$;

(iii) bounded if it is bounded both above and below.

For example, the closed interval $[0, 1]$ is bounded, the set (1.13) of positive even numbers
is bounded below but not above (indeed, it has no upper bounds),27 while the set (1.14) of
the negative ones is bounded above but not below.
Note that this classification of sets is not exhaustive: there exist sets that do not fall
in any of the types (i)-(iii) of the previous definition. For example, Z has neither an upper
bound nor a lower bound in R, and therefore it is not of any of the types (i)-(iii). Such sets
are called unbounded.

We now introduce a fundamental class of upper and lower bounds.

Definition 33 Given a non-empty set $A \subseteq \mathbb{R}$, an element $\hat{x}$ of A is called maximum of A
if it is the greatest element of A, that is, if
\[
\hat{x} \geq x \quad \forall x \in A
\]
while it is called minimum of A if it is the smallest element of A, that is, if
\[
\hat{x} \leq x \quad \forall x \in A
\]
27 By using Proposition 41 below, the reader can formally prove that, indeed, the set of positive even numbers is unbounded above.

The key feature of this definition is the clause "element $\hat{x}$ of A", which requires the maximum
and minimum to belong to the set A at hand. It is immediate to see that maxima
and minima are, respectively, upper bounds and lower bounds. Indeed, they are nothing
but the upper bounds and lower bounds that belong to the set A. For this reason, maxima
and minima can be seen as the "best" among upper bounds and lower bounds. Many economic
applications are, indeed, based on the search for maxima or minima of suitable sets of
alternatives.

Example 34 The closed interval $[0, 1]$ has minimum 0 and maximum 1. N

Unfortunately, maxima and minima are fragile notions: sets often do not admit them.

Example 35 The half-closed interval $[0, 1)$ has minimum 0 but has no maximum. Indeed,
suppose by contradiction that there exists a maximum $\hat{x} \in [0, 1)$, so that $\hat{x} \geq x$ for every
$x \in [0, 1)$. Set
\[
\tilde{x} = \frac{1}{2}\hat{x} + \frac{1}{2} \cdot 1
\]
Since $\hat{x} < 1$, we have $\hat{x} < \tilde{x}$. But it is obvious that $\tilde{x} \in [0, 1)$, which contradicts the fact
that $\hat{x}$ is the maximum of $[0, 1)$. N

By reasoning in a similar way, we see that:

(i) the half-closed interval $(0, 1]$ has maximum 1, but it has no minimum;
(ii) the open interval $(0, 1)$ has neither minimum nor maximum.

When they exist, maxima and minima are unique:

Proposition 36 A set $A \subseteq \mathbb{R}$ has at most one maximum and one minimum.

Proof Let $\hat{x}_1, \hat{x}_2 \in A$ be two maxima of A. We show that $\hat{x}_1 = \hat{x}_2$. Since $\hat{x}_1$ is a maximum,
we have $\hat{x}_1 \geq x$ for every $x \in A$. In particular, since $\hat{x}_2 \in A$, we have $\hat{x}_1 \geq \hat{x}_2$. Analogously,
$\hat{x}_2 \geq \hat{x}_1$ because also $\hat{x}_2$ is a maximum. Therefore, $\hat{x}_1 = \hat{x}_2$. In a similar way, we can prove
the uniqueness of the minimum.

The maximum of a set A is denoted by max A, and the minimum by min A. For example,
for $A = [0, 1]$ we have $\max A = 1$ and $\min A = 0$.

1.4.2 Supremum and infimum


Since maxima and minima are key for applications (and not only there), their fragility is a
substantial problem. To mitigate it, we look for a "surrogate": a conceptually similar, but
less fragile, notion which is available also when maxima or minima are absent.
Let us consider first maxima (which, as already mentioned, play a fundamental role in
economics). We begin by noting that the maximum, when it exists, is the smallest (least)
upper bound, that is,
\[
\max A = \min A^* \tag{1.15}
\]
Indeed, let $\hat{x} \in A$ be the maximum of A. If h is an upper bound of A, we have $h \geq \hat{x}$, since
$\hat{x} \in A$. On the other hand, $\hat{x}$ is also an upper bound, and we thus obtain (1.15).

Example 37 The set of upper bounds of $[0, 1]$ is the interval $[1, \infty)$. In this example, the
equality (1.15) takes the form $\max [0, 1] = \min [1, \infty)$. N

Thus, when it exists, the maximum is the smallest upper bound. But the smallest upper
bound, that is, $\min A^*$, might exist also when the maximum does not. For example,
consider $A = [0, 1)$: the maximum does not exist, but the smallest upper bound exists and
is 1, i.e., $\min A^* = 1$.
All of this suggests that the smallest upper bound is the surrogate for the maximum
which we are looking for. Indeed, in the example just seen, the point 1 is, in the absence of a
maximum, its closest approximation.
Reasoning in a similar way, the greatest lower bound, i.e., $\max A_*$, is the natural candidate
to be the surrogate for the minimum when the latter does not exist. Motivated by
what we have just seen, we give the following definition.

Definition 38 Given a non-empty set $A \subseteq \mathbb{R}$, the supremum of A is the least upper bound
of A, that is, $\min A^*$, while the infimum is the greatest lower bound of A, that is, $\max A_*$.

Thanks to Proposition 36, both the supremum and the infimum of A are unique, when
they exist. We denote them by sup A and inf A. For example, for $A = (0, 1)$ we have
$\inf A = 0$ and $\sup A = 1$.
As already remarked, when $\inf A \in A$, it is the minimum of A, and when $\sup A \in A$, it
is the maximum of A.

Although suprema and infima may exist when maxima and minima do not, they do not
always exist.

Example 39 Consider the set A of the even numbers in (1.13). In this case $A^* = \emptyset$ and so
A has no supremum. More generally, if A is not bounded above, we have $A^* = \emptyset$ and the
supremum does not exist. In a similar way, the sets that are not bounded below have no
infima. N

To be a useful surrogate, suprema and infima must exist for a large class of sets; otherwise,
if also their existence were problematic, they would be of little help as surrogates.28
Fortunately, the next important result shows that suprema and infima do indeed exist for a
large class of sets (with sets of the kind seen in the last example being the only troublesome
ones).

Theorem 40 (Least Upper Bound Principle) Each non-empty set $A \subseteq \mathbb{R}$ has a supremum
if it is bounded above and an infimum if it is bounded below.

An immediate consequence of this result is that bounded sets have both a supremum and
an infimum.

Proof We limit ourselves to proving the supremum part, the other part being proved similarly.
To say that A is bounded above means that it admits an upper bound, i.e., that $A^* \neq \emptyset$.
Since $a \leq h$ for every $a \in A$ and every $h \in A^*$, by the separation property there exists a
28 The utility of a surrogate depends on how well it approximates the original, as well as on its availability.

separating element $c \in \mathbb{R}$ such that $a \leq c \leq h$ for every $a \in A$ and every $h \in A^*$. Since $c \geq a$
for every $a \in A$, we have that c is an upper bound of A, so that $c \in A^*$. But, since $c \leq h$
for every $h \in A^*$, it follows that $c = \min A^*$, that is, $c = \sup A$. This proves the existence of
the supremum of A.

Except for the sets that are unbounded above, all the other sets in R admit a supremum.
Analogously, except for the sets that are unbounded below, all the other sets in R have an
infimum. Suprema and infima are thus excellent surrogates that exist, and so help us, for a
large class of subsets of R.
We close with some notation. We write $\sup A = +\infty$ (resp. $\inf A = -\infty$) if a set A has no supremum
(resp. infimum). Moreover, by convention, we set $\sup \emptyset = -\infty$ and $\inf \emptyset = +\infty$. This is motivated
by the fact that each real number can be viewed as both an upper and a lower bound of $\emptyset$,
so $\sup \emptyset = \inf \emptyset^* = \inf \mathbb{R} = -\infty$ and $\inf \emptyset = \sup \emptyset_* = \sup \mathbb{R} = +\infty$.

1.4.3 Density
The order structure is also useful to clarify the relations among the sets N, Z, Q and R. First
of all, we make rigorous a natural intuition: however great a real number is, there always
exists a greater natural number. This is the so-called Archimedean property of real numbers.

Proposition 41 For each real number $a \in \mathbb{R}$, there exists a natural number $n \in \mathbb{N}$ such that
$n \geq a$.

Proof By contradiction, assume that there exists $a \in \mathbb{R}$ such that $a \geq n$ for all $n \in \mathbb{N}$.
By the Least Upper Bound Principle, $\sup \mathbb{N}$ exists and belongs to R. Recall that, by the
definition of sup,
\[
\sup \mathbb{N} \geq n \quad \forall n \in \mathbb{N} \tag{1.16}
\]
At the same time, again by the definition of sup, we have $\sup \mathbb{N} - 1 < n$ for some $n \in \mathbb{N}$
(otherwise, $\sup \mathbb{N} - 1$ would be an upper bound of N, thus violating the fact that $\sup \mathbb{N}$ is
the least of these upper bounds). We conclude that $\sup \mathbb{N} < n + 1 \in \mathbb{N}$, which contradicts
(1.16).

There is a fundamental difference between the structures of N and Z, on the one side,
and of Q and R, on the other side. If we take an integer, we can talk in a natural way of its
predecessor and successor: if $m \in \mathbb{Z}$, its predecessor is the integer $m - 1$, while its successor
is the integer $m + 1$ (for example, the predecessor of 317 is 316 and its successor is 318). In
other words, Z has a discrete "rhythm".
In contrast, we cannot talk of predecessors and successors in Q or in R. Consider first
Q. Given a rational number $q = m/n$, let $q' = m'/n'$ be any rational such that $q' > q$. Set
\[
q'' = \frac{1}{2} q + \frac{1}{2} q'
\]
The number $q''$ is rational since
\[
q'' = \frac{1}{2}\frac{m'}{n'} + \frac{1}{2}\frac{m}{n} = \frac{1}{2}\,\frac{m'n + mn'}{nn'}
\]

and one has
\[
q < q'' < q' \tag{1.17}
\]
Therefore, there is no smallest rational number greater than q and no greatest rational
number smaller than $q'$. Rational numbers, hence, do not admit predecessors and successors.
In a similar way, given any two real numbers $a < b$ there exists a real number c such that
$a < c < b$. Indeed,
\[
a < \frac{1}{2} a + \frac{1}{2} b < b
\]
Real numbers as well, therefore, do not admit predecessors and successors. The rhythm of
both rational and real numbers is "tight", without discrete interruptions (given by intervals).
Such property of Q and R is called density. Unlike N and Z, which are discrete sets, Q and
R are dense sets.29

We conclude with an important density relationship between Q and R. We already
observed how most real numbers are not rational. Nevertheless, rational numbers are a
"dense" (therefore significant) subset of the real numbers: between any two real numbers
we can always "insert" a rational number, as we show next.

Proposition 42 Given any two real numbers $a < b$, there exists a rational number $q \in \mathbb{Q}$
such that $a < q < b$.

This property can be stated by saying that Q is dense in R. In the proof of this result
we use the notion of integer part $[a]$ of a real number $a \in \mathbb{R}$, which is the greatest integer
$n \in \mathbb{Z}$ such that $n \leq a$. For example, $[\pi] = 3$, $[5/2] = 2$, $[\sqrt{2}] = 1$, $[-\pi] = -4$, and so on.
The reader can verify that
\[
[a + 1] = [a] + 1 \tag{1.18}
\]
since, for each $n \in \mathbb{Z}$, we have $n \leq a$ if and only if $n + 1 \leq a + 1$. Moreover, $[a] < a$ when
$a \notin \mathbb{Z}$.

Proof Let $a, b \in \mathbb{R}$, with $a < b$. For simplicity, we distinguish three cases.

Case 1: Let $a + 1 = b$. If $a \in \mathbb{Q}$ the result follows from (1.17). Let $a \notin \mathbb{Q}$, and therefore
$a + 1 \notin \mathbb{Q}$. We have
\[
[a] \leq a < [a] + 1 = [a + 1] < a + 1 \tag{1.19}
\]
So, $q = [a] + 1$ is the rational number we were looking for.

Case 2: Let $b - a > 1$, i.e., $a < a + 1 < b$. From Case 1 it follows that there exists $q \in \mathbb{Q}$
such that $a < q < a + 1 < b$.

Case 3: Let $b - a < 1$. By the Archimedean property of real numbers, there exists $0 \neq n \in \mathbb{N}$
such that
\[
n \geq \frac{1}{b - a}
\]
29 In his famous argument against plurality, Zeno of Elea remarks that a "plurality" is infinite because "... there will always be other things between the things that are, and yet others between those others." (trans. Raven; cf. Vlastos, 1996, pp. 241-248). Zeno thus identifies density as the characterizing property of an infinite collection. With (twenty-five centuries of) hindsight, we can say that he is neglecting the integers. Yet, it is stunning how he was able to identify a key property of infinite sets.

So, $nb - na = n(b - a) \geq 1$. Then, by what we have just seen in Cases 1 and 2, there exists
$q \in \mathbb{Q}$ such that $na < q < nb$. Therefore $a < q/n < b$, which completes the proof because
$q/n \in \mathbb{Q}$.
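The proof is constructive and can be mimicked in code. In this minimal Python sketch of ours (rational_between is our own name, not the book's), we pick $n > 1/(b-a)$ and return $([na] + 1)/n$:

import math
from fractions import Fraction

def rational_between(a, b):
    # Scale by n > 1/(b - a), then use the integer part, as in the proof.
    n = math.floor(1 / (b - a)) + 1
    return Fraction(math.floor(n * a) + 1, n)

q = rational_between(math.sqrt(2), 1.5)
print(q, math.sqrt(2) < q < 1.5)  # 17/12 True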

1.5 Powers and logarithms


1.5.1 Powers
Given $n \in \mathbb{N}$, we have already recalled the meaning of $q^n$ with $q \in \mathbb{Q}$ and of $q^{1/n}$ with
$0 < q \in \mathbb{Q}$. In a similar way we define $a^n$ with $a \in \mathbb{R}$ and $a^{1/n}$ with $0 < a \in \mathbb{R}$. More generally,
we set
\[
a^{-n} = \frac{1}{a^n} \quad \text{and} \quad a^{m/n} = (a^m)^{1/n}
\]
for $m, n \in \mathbb{N}$ and $0 < a \in \mathbb{R}$. We have, therefore, defined the power $a^r$ with real positive
base and rational exponent. We now want to extend this notion to the case $a^x$ with $x \in \mathbb{R}$,
i.e., with real exponent. Before doing this, we make two important observations.

(i) We have defined $a^r$ only for $a > 0$ to avoid dangerous and embarrassing misunderstandings.
Think, for example, of $(-5)^{3/2}$. It could be rewritten as $\sqrt{(-5)^3} = \sqrt{-125}$ or as
$\left(\sqrt{-5}\right)^3$, which do not exist among the real numbers. But it could also be written
as $(-5)^{6/4}$ which, in turn, can be expressed as either $\sqrt[4]{(-5)^6} = \sqrt[4]{15{,}625}$, or
$\left(\sqrt[4]{-5}\right)^6$. The former exists and is approximately equal to 11.180339, but the latter
does not exist.
(ii) Let us consider the root $\sqrt{a} = a^{1/2}$. From high school we know that each positive number
has two algebraic roots, for example $\sqrt{9} = \pm 3$. The unique positive value of the root
is called, instead, the arithmetical root. For example, 3 and $-3$ are the two algebraic roots
of 9, while 3 is its unique arithmetical root. In what follows the (even order) roots will
always be in the arithmetical sense (therefore, with a unique value). It is, by the way,
the standard convention: for example, in the classic solution formula
\[
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
\]
of the quadratic equation $ax^2 + bx + c = 0$, the root is in the arithmetical sense (this
is why we need to write $\pm$).

We now extend the notion of power to the case $a^x$, with $0 < a \in \mathbb{R}$ and $x \in \mathbb{R}$. Unfortunately,
the details of this extension are tedious, so we limit ourselves to saying that, if
$a > 1$, the power $a^x$ is the supremum of the set of all the values $a^q$ when the exponent q
varies among the rational numbers such that $q \leq x$. Formally,
\[
a^x = \sup \{ a^q : q \leq x \text{ with } q \in \mathbb{Q} \} \tag{1.20}
\]
In a similar way we define $a^x$ for $0 < a < 1$. We have the following properties that, by (1.20),
follow from the analogous properties that hold when the exponent is rational.
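Definition (1.20) can be visualized numerically: truncating the decimal expansion of x gives rational exponents $q \leq x$, and the values $a^q$ grow toward $a^x$. A small Python illustration of ours:

a, x = 2.0, 2 ** 0.5  # approximate 2 raised to sqrt(2) from below

for k in range(1, 7):
    q = int(x * 10 ** k) / 10 ** k  # rational truncation q <= x with k decimals
    print(q, a ** q)
# the values a ** q increase toward a ** x = 2.665144...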

Lemma 43 Let $a, b > 0$ and $x, y \in \mathbb{R}$. We have $a^x > 0$ for every $x \in \mathbb{R}$. Moreover:

(i) $a^x a^y = a^{x+y}$ and $a^x / a^y = a^{x-y}$;

(ii) $(a^x)^y = a^{xy}$;

(iii) $a^x b^x = (ab)^x$ and $a^x / b^x = (a/b)^x$;

(iv) if $x \neq y$ and $a \neq 1$, then $a^x \neq a^y$; in particular, if $x > y$ then
\[
a^x > a^y \text{ if } a > 1, \qquad a^x < a^y \text{ if } a < 1, \qquad a^x = a^y = 1 \text{ if } a = 1
\]

The most important base a is Napier's constant e, which will be introduced in Chapter
8. As we will see, the power $e^x$ has truly remarkable, almost magical, properties.
Note that point (ii) of the lemma implies, inter alia, that
\[
a^x = b^y \implies a = b^{y/x} \tag{1.21}
\]
for all $a, b > 0$ and $x, y \in \mathbb{R}$ with $x \neq 0$. Indeed, $(b^{y/x})^x = b^{(y/x)x} = b^y$. For instance, $a^2 = b^3$ implies $a = b^{3/2}$,
while $a^3 = b^5$ implies $a = b^{5/3}$.
A final remark. Though sometimes we write $\sqrt[n]{a^m}$ instead of $a^{m/n}$, one should not forget
that it is this latter notation which is best suited to carry out operations on powers, as
Lemma 43 just showed. So much so that Newton, in a letter sent to Leibniz in June 1676,30
wrote that "instead of $\sqrt{a}$, $\sqrt[3]{a}$, $\sqrt[3]{a^5}$, etc. I write $a^{1/2}$, $a^{1/3}$, $a^{5/3}$, and instead of $1/a$, $1/a^2$, $1/a^3$,
I write $a^{-1}$, $a^{-2}$, $a^{-3}$".

1.5.2 Logarithms
The operations of addition and multiplication are commutative: $a + b = b + a$ and $ab = ba$.
Therefore, each of them has only one inverse operation, respectively subtraction and division:

(i) if $a + b = c$, then $b = c - a$ and $a = c - b$;

(ii) if $ab = c$, then $b = c/a$ and $a = c/b$, with $a, b \neq 0$.

The power operation $a^b$, with $a > 0$, is not commutative: $a^b$ might well be different from
$b^a$. Therefore, it has two distinct inverse operations.
Let $a^b = c$. The first inverse operation (given c and b, find a) is called the root with
index b of c:
\[
a = \sqrt[b]{c} = c^{1/b}
\]
The second one (given c and a, find b) is called the logarithm with base a of c:
\[
b = \log_a c
\]
30 See Struik (2014) p. 286.

Note that, together with $a > 0$ and $c > 0$, one must also have $a \neq 1$ because $1^b = c$ is
impossible except when $c = 1$.

The logarithm is a fundamental notion, introduced in 1614 by John Napier and ubiquitous in
mathematics and in its applications. As we have just seen, it is a simple notion: the number
$b = \log_a c$ is nothing but the exponent that must be given to a in order to get c, that is,
\[
a^{\log_a c} = c
\]
The properties of the logarithms derive easily from the properties of the powers established
in Lemma 43.

Lemma 44 Let $a, c, d > 0$, with $a \neq 1$. Then:

(i) $\log_a cd = \log_a c + \log_a d$;

(ii) $\log_a (c/d) = \log_a c - \log_a d$;

(iii) $\log_a c^k = k \log_a c$ for every $k \in \mathbb{R}$;

(iv) $\log_{a^k} c = k^{-1} \log_a c$ for every $0 \neq k \in \mathbb{R}$.

Proof (i) Let $a^x = c$, $a^y = d$ and $a^z = cd$. Since $a^z = cd = a^x a^y = a^{x+y}$, by Lemma 43-(iv)
it follows that $z = x + y$. (ii) The proof is similar to the previous one. (iii) Let $b = \log_a c^k$.
Then $a^b = c^k$ and so by (1.21) we have $c = a^{b/k}$, which implies $b/k = \log_a c$. We conclude
that $\log_a c^k = b = k \log_a c$.31 (iv) Let $b = \log_{a^k} c$. Then $(a^k)^b = a^{kb} = c$, so $kb = \log_a c$. In turn, this
implies $b = k^{-1} \log_a c$.

The key property of the logarithm is to transform the product of two numbers into a sum
of two other numbers, that is, property (i) above. Sums are much easier to handle than
products; hence the importance of logarithms also computationally (till the age of computers,
tables of logarithms were a most important aid to perform computations). To emphasize this
key property of logarithms, denote a strictly positive real number by a lower case letter
and its logarithm by the corresponding upper case letter; e.g., $C = \log_a c$. Then, we can
summarize property (i) as:
\[
c \cdot d \longrightarrow C + D
\]
The importance of this transformation can hardly be overestimated.32

A simple formula permits a change of base.

Lemma 45 Let $a, b, c > 0$, with $a, b \neq 1$. Then
\[
\log_a c = \frac{\log_b c}{\log_b a}
\]
31 For example, $\log_a x^2 = 2 \log_a x$ for $x > 0$. Note that $\log_a x^2$ exists for each $x \neq 0$, while $2 \log_a x$ exists only for $x > 0$.
32 Napier entitled his 1614 work Mirifici logarithmorum canonis descriptio, that is, "A description of the wonderful law of logarithms". He was not exaggerating (the importance of logarithms was very soon realized).

Proof Let $a^x = c$, $b^y = c$ and $b^z = a$. We have $a^x = (b^z)^x = b^{zx} = c = b^y$ and therefore
$zx = y$, that is, $x = y/z$.

Thanks to this change of base formula, it is possible to always use the same number as the base of
the logarithms, say 10, because
\[
\log_a c = \frac{\log_{10} c}{\log_{10} a}
\]
As for the powers $a^x$, also for the logarithms the most common base is Napier's constant
e. In this case we simply write
\[
\log x
\]
instead of $\log_e x$. Because of its importance, $\log x$ is called the natural logarithm of x, which
leads to the notation $\ln x$ sometimes used in place of $\log x$.

The next result shows the close connection between logarithms and powers, which can
actually be viewed as inverse notions.

Proposition 46 Given $a > 0$, $a \neq 1$, we have
\[
\log_a a^x = x \quad \forall x \in \mathbb{R}
\]
and
\[
a^{\log_a x} = x \quad \forall x > 0
\]

We leave the simple proof to the readers. To check their understanding of the material
of this section, they may also want to verify that
\[
b^{\log_a c} = c^{\log_a b}
\]
for all strictly positive numbers $a \neq 1$, b and c.
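These identities are easy to check numerically; the following Python lines (our own sketch, using math.log(c, a) for $\log_a c$, up to floating-point error) test Lemma 44-(i), the change of base formula, and the identity above:

import math

a, b, c, d = 2.0, 3.0, 5.0, 7.0

print(math.isclose(math.log(c * d, a), math.log(c, a) + math.log(d, a)))  # Lemma 44-(i)
print(math.isclose(math.log(c, a), math.log(c, b) / math.log(a, b)))      # change of base
print(math.isclose(b ** math.log(c, a), c ** math.log(b, a)))             # b^(log_a c) = c^(log_a b)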

1.6 Numbers, fingers and circuits


The most natural way to write numbers makes use of the "decimal notation". Ten symbols
have been chosen,
\[
0, 1, 2, 3, 4, 5, 6, 7, 8, 9 \tag{1.22}
\]
called digits. Using positional notation, any natural number can be written by means of
digits which represent, from right to left respectively, units, tens, hundreds, thousands, etc.
For example, 4357 means 4 thousands, 3 hundreds, 5 tens and 7 units. The natural
numbers are thus expressed by powers of 10, each of which causes a digit to be added:
writing 4357 is the abbreviation of
\[
4 \cdot 10^3 + 3 \cdot 10^2 + 5 \cdot 10^1 + 7 \cdot 10^0
\]
To use positional notation, it is fundamental to adopt the 0 to signal an empty slot: for
example, when writing 4057 the zero signals the absence of the hundreds, that is,
\[
4 \cdot 10^3 + 0 \cdot 10^2 + 5 \cdot 10^1 + 7 \cdot 10^0
\]



Decimals are represented in a completely analogous fashion through the powers of $1/10 = 10^{-1}$:
for example, 0.501625 is the abbreviation of
\[
5 \cdot 10^{-1} + 0 \cdot 10^{-2} + 1 \cdot 10^{-3} + 6 \cdot 10^{-4} + 2 \cdot 10^{-5} + 5 \cdot 10^{-6}
\]

The choice of decimal notation is due to the mere fact that we have ten fingers, but it is
obviously not the only possible one. Some Native American tribes used to count on their
hands using the eight spaces between their fingers rather than the ten fingers themselves.
They would have chosen only 8 digits, say
\[
0, 1, 2, 3, 4, 5, 6, 7
\]
and they would have articulated the integers along the powers of 8, that is 8, 64, 512, 4096,
... They would have written our decimal number 4357 as
\[
1 \cdot 4096 + 0 \cdot 512 + 4 \cdot 64 + 0 \cdot 8 + 5 = 1 \cdot 8^4 + 0 \cdot 8^3 + 4 \cdot 8^2 + 0 \cdot 8^1 + 5 \cdot 8^0 = 10405
\]
and the decimal 0.515625 as
\[
4 \cdot 0.125 + 1 \cdot 0.015625 = 4 \cdot 8^{-1} + 1 \cdot 8^{-2} = 0.41
\]
In general, given a base b and a set of digits
\[
C_b = \{c_0, c_1, \ldots, c_{b-1}\}
\]
used to represent the integers between 0 and $b - 1$, every natural number n is written in the
base b as
\[
d_k d_{k-1} \cdots d_1 d_0
\]
where k is an appropriate natural number and
\[
n = d_k b^k + d_{k-1} b^{k-1} + \cdots + d_1 b + d_0
\]
with $d_i \in C_b$ for each $i = 0, \ldots, k$.
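The passage between bases can be mechanized. A minimal Python sketch of ours: to_base repeatedly divides by b and collects the remainders (the digits $d_0, d_1, \ldots$), while from_base evaluates the polynomial above; we use the letters A and B in place of special digit symbols.

DIGITS = "0123456789AB"  # digit symbols for bases up to 12

def to_base(n, b):
    # The remainders of repeated division by b are the digits d0, d1, ...
    if n == 0:
        return "0"
    out = []
    while n > 0:
        out.append(DIGITS[n % b])
        n //= b
    return "".join(reversed(out))

def from_base(s, b):
    # Evaluate n = d_k b^k + ... + d_1 b + d_0
    n = 0
    for ch in s:
        n = n * b + DIGITS.index(ch)
    return n

print(to_base(4357, 8))       # 10405, as in the text
print(from_base("10405", 8))  # 4357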


For example, let us consider the duodecimal base, with digits

0; 1; 2; 3; 4; 5; 6; 7; 8; 9; |; •

We have used the symbols | and • for the two additional digits that we need compared to
the decimal notation. The duodecimal number

9|0•2 = 9 124 + | 123 + 0 122 + • 12 + 2

can be converted to decimal notation as

9|0•2 = 9 124 + | 123 + 0 122 + • 12 + 2


= 9 124 + 10 123 + 0 122 + 11 12 + 2
= 188630

using the conversion table

Duod. 0 1 2 3 4 5 6 7 8 9 | •
Dec. 0 1 2 3 4 5 6 7 8 9 10 11

One can note that the duodecimal notation 9|0•2 requires fewer digits than the decimal
204038, that is, five instead of six. On the other hand, the duodecimal notation requires 12
symbols to be used as digits, instead of 10. It is a typical trade-off one faces in choosing the
base in which to represent numbers: larger bases make it possible to represent numbers with
fewer digits, but require a larger set of digits. The solution to the trade-off, and the resulting
choice of base, depends on the characteristics of the application of interest.
For example, in electronic engineering it is important to have a set of digits which is as
simple as possible, with only two elements, as computers and electrical appliances are able to
handle only two digits (open or closed circuit, positive or negative polarity). For this reason,
the base 2 is incredibly common: it is the most efficient base in terms of the complexity of
the digit set $C_2$, which only consists of the digits 0 and 1, called bits (from binary digits).
In binary notation, the integers can be written as

Dec. 0 1 2 3 4 5 6 7 8 9 10 11 16
Bin. 0 1 10 11 100 101 110 111 1000 1001 1010 1011 10000

where, for example, in binary notation
\[
1011 = 1 \cdot 2^3 + 0 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0
\]
and in decimal notation
\[
11 = 1 \cdot 10^1 + 1 \cdot 10^0
\]
The considerable reduction in the digit set $C_2$ made possible by the base 2 has as its cost
the large number of bits required to represent numbers in binary notation. For example:
if 16 consists of two decimal digits, the corresponding binary 10000 requires five bits; if 201
requires three digits, the corresponding binary 11001001 requires eight bits; if 2171 requires
four digits, the corresponding binary 100001111011 requires twelve bits, and so on. Very
quickly, binary notation requires a number of bits that only a computer is able to process.

From a purely mathematical perspective, the choice of base is merely conventional, and
going from one base to another is easy (although tedious).33 Bases 2 and 10 are nowadays
the most important ones, but others have been used in the past, such as 20 (the number of
fingers and toes, a trace of which is still found in the French language, where "quatre-vingts",
i.e., "four-twenties", stands for eighty and "quatre-vingt-dix", i.e., "four-twenty-ten", stands for ninety), as well
33 Operations on numbers written in a non-decimal notation are not particularly difficult either. For example, 11 + 9 = 20 can be calculated in a binary way as

  1011 +
  1001 =
 10100

It is sufficient to remember that the "carrying" must be done at 2 and not at 10.
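In Python the same check is immediate, since int with an explicit base parses positional notation (a two-line illustration of ours):

print(int("1011", 2) + int("1001", 2))  # 20
print(bin(20))                          # 0b10100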

as 16 (the number of spaces between ngers and toes) and 60 (which is convenient because
it is divisible by 2, 3, 4, 5, 6, 10, 12, 15, 20 and 30; a signi cant trace of this system remains
in how we divide hours and minutes and in how we measure angles).

Positional notation has been used to perform manual calculations since the dawn of time
(just think about computations carried out with the abacus), yet it is a relatively recent
conquest in terms of writing, made possible by the fundamental innovation of the zero.
It has been exceptionally important in the development of mathematics and its countless
applications, commercial, scientific and technological. Though already used in Babylonian
mathematics, the most fruitful formulation of positional notation emerged in India, around
the fifth century AD. It was further developed during the early Middle Ages in the Arab world
(especially thanks to the works of Al-Khwarizmi), from which the name "Arabic numerals"
for the decimal digits (1.22) derives.34 It arrived in the Western world thanks to Italian
merchants between the eleventh and twelfth centuries. In particular, the son of one of
those merchants, Leonardo da Pisa (also known as Fibonacci),35 was the most important
medieval mathematician: for the first time in Western Europe after so many dark centuries,
he conducted original research in mathematics with the overt ambition of going beyond
what the great mathematicians of the classical world had established.36 Inter alia, Leonardo
authored a famous treatise in 1202, the Liber abaci, which was the most important among
the first essays that brought positional notation to Europe.37 Until then, non-positional
Roman numerals were used,
\[
I, II, III, IV, V, \ldots, X, \ldots, L, \ldots, C, \ldots, M, \ldots
\]
which made even trivial operations overly complex (try to sum up CXL and MCL, and then
140 and 1150).
Let us conclude with the incipit of the first chapter of the Liber abaci, with the extraordinary
innovation that the book brought to the Western world:

Novem figure indorum he sunt
9, 8, 7, 6, 5, 4, 3, 2, 1
Cum his itaque novem figuris, et cum hoc signo, quod arabice zephirum appellatur,
scribitur quilibet numerus, ut inferius demonstratur. [...] ut in sequenti cum
figuris numeris super notatis ostenditur.

MI      MMXXIII   MMMXXII   MMMXX   MMMMMDC   MMM
1001    2023      3022      3020    5600      3000

... Et sic in reliquis numeris est procedendum.38


34 See Neugebauer (1957).
35 Because of its trade network, Pisa at the time of Leonardo had a privileged position at the intersection of three Mediterranean cultures: Latin Christianity, Greek Christianity and the Muslim world.
36 This ambition, however, was fully fulfilled only with the works of the Renaissance algebraists, culminating in Girolamo Cardano's Ars Magna of 1545, who in the first half of the sixteenth century were able to solve the cubic equation, an important question that had eluded the mathematicians of the classical world. This epoch-making discovery marked the beginning of modern mathematics.
37 See Giusti (2002) for an authoritative introduction to the Liber abaci.
38 "The nine Indian symbols are ... With these nine symbols, and with the symbol 0, which the Arabs call

1.7 The extended real line


In the theory of limits that we will study later in the book, it is useful to consider the
extended real line. It is obtained by adding to the real line the two ideal points $+\infty$ and
$-\infty$. We get in this way the set
\[
\mathbb{R} \cup \{-\infty, +\infty\}
\]
denoted by the symbol $\overline{\mathbb{R}}$ or, sometimes, by $[-\infty, +\infty]$.
The order structure of R can be naturally extended to $\overline{\mathbb{R}}$ by setting $-\infty < a < +\infty$ for
each $a \in \mathbb{R}$. In contrast, the operations defined in R can be only partially extended to $\overline{\mathbb{R}}$. In
particular, besides the usual rules of calculation in R, on the extended real line the following
further rules hold:

(i) addition with a real number:
\[
a + \infty = +\infty, \qquad a - \infty = -\infty \qquad \forall a \in \mathbb{R} \tag{1.23}
\]

(ii) addition between infinities of the same sign:
\[
+\infty + \infty = +\infty \qquad \text{and} \qquad -\infty - \infty = -\infty
\]

(iii) multiplication with a non-zero number:
\[
a \cdot (+\infty) = +\infty \quad \text{and} \quad a \cdot (-\infty) = -\infty \qquad \forall a > 0
\]
\[
a \cdot (+\infty) = -\infty \quad \text{and} \quad a \cdot (-\infty) = +\infty \qquad \forall a < 0
\]

(iv) multiplication of infinities:
\[
+\infty \cdot (+\infty) = -\infty \cdot (-\infty) = +\infty, \qquad +\infty \cdot (-\infty) = -\infty \cdot (+\infty) = -\infty
\]
with, in particular,
\[
(+\infty)^a = +\infty \text{ if } a > 0 \qquad \text{and} \qquad (+\infty)^a = 0 \text{ if } a < 0
\]

(v) division:
\[
\frac{a}{+\infty} = \frac{a}{-\infty} = 0 \qquad \forall a \in \mathbb{R}
\]

(vi) power of a real number:
\[
a^{+\infty} = +\infty \text{ if } a > 1, \qquad a^{+\infty} = 0 \text{ if } 0 < a < 1, \qquad a^{-\infty} = 0 \text{ if } a > 1, \qquad a^{-\infty} = +\infty \text{ if } 0 < a < 1
\]
zephyr, any number can be written as shown below. [...] the above numbers are shown below in symbols ... And in this way you continue for the following numbers." Interestingly, Roman numerals continued to be used in bookkeeping for a long time because they are more difficult to manipulate (just add a 0 to an Arabic numeral in a balance sheet...).

(vii) power between infinities:
\[
(+\infty)^{+\infty} = +\infty, \qquad (+\infty)^{-\infty} = 0
\]

While the addition of infinities with the same sign is a well-defined operation (for example,
the sum of two positive infinities is again a positive infinity), the addition of infinities of
different sign is not defined. For example, the result of $+\infty - \infty$ is not defined. This is a
first example of an indeterminate operation in $\overline{\mathbb{R}}$. In general, the following operations are
indeterminate:

(i) addition of infinities with different sign:
\[
+\infty - \infty \quad \text{and} \quad -\infty + \infty \tag{1.24}
\]

(ii) multiplication between 0 and infinity:
\[
\pm\infty \cdot 0 \quad \text{and} \quad 0 \cdot (\pm\infty) \tag{1.25}
\]

(iii) divisions with denominator equal to zero or with numerator and denominator that are
both infinities:
\[
\frac{a}{0} \quad \text{and} \quad \frac{\infty}{\infty} \tag{1.26}
\]
with $a \in \mathbb{R}$;

(iv) the powers:
\[
1^{\infty}, \quad 0^0, \quad (+\infty)^0 \tag{1.27}
\]

The indeterminate operations (i)-(iv) are called forms of indetermination and will play
an important role in the theory of limits. Note that, by setting $a = 0$, formulas (1.26) include
the form of indetermination
\[
\frac{0}{0}
\]
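As an aside, IEEE floating-point arithmetic, as exposed for instance by Python, implements several of these conventions: the well-defined rules behave as above, while the forms of indetermination return the special value nan ("not a number"). A short illustration of ours:

inf = float("inf")

print(inf + 1)    # inf,  rule (i)
print(-2 * inf)   # -inf, rule (iii)
print(1 / inf)    # 0.0,  rule (v)
print(inf - inf)  # nan,  indeterminate form (1.24)
print(inf * 0)    # nan,  indeterminate form (1.25)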

O.R. As we have observed, the most natural geometric image of R is the (real) line: to each
point there corresponds a number and, vice versa, to each number there corresponds a point.
Yet, we can "transport" all the numbers from the real line to the open interval $(0, 1)$, as the
following figure shows:39
39 We refer to the proof of Proposition 276 for the analytic expression of the bijection shown here.

[Figure omitted: the graph of an increasing bijection from the real line onto the open interval (0, 1), with 0 mapped to 1/2 and horizontal asymptotes at heights 0 and 1.]

All the real numbers that found a place on the real line also find a place on the interval $(0, 1)$:
maybe packed, but they all fit. Two points are left, the endpoints of the interval, to which
it is natural to associate, respectively, $+\infty$ and $-\infty$. The closed interval $[0, 1]$ is, therefore,
a geometric image of $\overline{\mathbb{R}}$. H

1.8 The birth of the deductive method: an intellectual revolution
The deductive method, upon which mathematics is based, was born between the VI and
the V century B.C. and, in that period, came to dominate Greek mathematics. As we have
seen throughout the chapter, mathematical properties are stated in theorems, whose truth
is established by a logical argument, their proof, which is based on axioms and definitions.
It is a revolutionary innovation in the history of human thought, celebrated in several
dialogues of Plato, elaborated and codified in the Elements of Euclid. It places reason as the
sole guide for scientific (and non-scientific) investigations. A mathematical property (for
example, that the sum of the squares of the catheti is equal to the square of the hypotenuse)
is true because it can be logically proved and not because it is empirically verified in concrete
examples or because a nice drawing makes the intuition clear or because some "authority"
reveals its truth.40
40 Plato in the sixth book of The Republic writes: "And do you not know that although [students of geometry] make use of the visible forms and reason about them, they are thinking not of these, but of the ideals which they resemble; not of the figures which they draw, but of the absolute square and the absolute diameter, and so on; the forms which they draw or make, and which have shadows and reflections in water of their own, are converted by them into images, but they are really seeking to behold the things themselves,

Little is known about the birth of the deductive method; the surviving documentation
is scarce. Reason emerged in the Ionian Greek colonies, first in Miletus with Thales and
Anaximander, to guide the first scientific investigations of physical phenomena. It was,
however, in Magna Graecia that reason first tackled abstract matters. This miracle, of which
we are all intellectual children, happened within the Eleatic philosophy that flourished at
Elea in the V century B.C. and had in Parmenides and Zeno its best known exponents.41
In Parmenides' famous doctrine of the Being, a turning point in intellectual history that the
reader might have encountered in some high school philosophy course, it is logic that permits
the study of the Being, that is, of the world of truth (ἀλήθεια). This study is impossible for
the senses, which can only guide us among the appearances that characterize the world of
opinion (δόξα). In particular, only reason can dominate the arguments by contradiction,
which have no empirical substratum, but are the pure result of reason. Such arguments,
developed by the Eleatic school and at the center of its dialectics (culminating in the famous
paradoxes of Zeno),42 for example enabled the Eleatic philosopher Melissus of Samos to state
that the Being "always was what it was and always will be. For if it had come into being,
necessarily before it came into being there was nothing. But, if there was nothing, in no way
could something come into being from nothing".43
True knowledge is thus theoretic. Only the eye of the mind can see the truth through
a sustained, uncompromising, logical argumentation based on the law of excluded middle,
the fundamental principle of the Eleatic theory of knowledge (to paraphrase Parmenides,
what-is has to be, what-is-not cannot be). In contrast, an empirical analysis based on the senses
necessarily stops at appearances. The anti-empirical character of the Eleatic school could
have been decisive in the birth of the deductive method, at least in creating a favorable
intellectual environment.44 Naturally, it is not possible to exclude an opposite causality:
the deductive method could have been developed inside mathematics and could have then
influenced philosophy, in particular the Eleatics (allegedly Parmenides had Pythagorean
mentors).45 Indeed, the irrationality of $\sqrt{2}$ established by the Pythagorean school, the
other great Presocratic school of Magna Graecia, is a first decisive triumph of such a
method in mathematics: only the eye of the mind could see such a property, which is devoid
of any "empirical" intuition. It is the eye of the mind that explains the inescapable error
incurred by every empirical measurement of the hypotenuse of a right triangle with
catheti of unitary length: however accurate this measurement is, it will always be a rational

which can only be seen with the eye of the mind?" (trans. Jowett).
41 Elea was a town of Magna Graecia, around 140 kilometers south of Naples and 300 kilometers north of Crotone, the center of the Pythagorean school.
42 Cf. Vlastos (1996) p. 240.
43 Barnes (1982) calls this beautiful fragment the theorem of ungenerability (trans. Allhoff et al. in "Ancient philosophy", Blackwell, 2008). In a less transparent way (but it was part of the first logical argument ever reported) Parmenides in his poem On Nature had written "And how might what is be then? And how might it have come into being? For if it came into being, it is not, nor if it is about to be at some time" (trans. Barnes). We refer to Calogero (1977) for a classic work on Eleatic philosophy, as well as to Barnes (1982) and to the more recent Warren (2014) for general introductions to the Presocratics.
44 As advocated by Szabo (1978).
45 For instance, arguments by contradiction could have been developed within the Pythagorean school through the odd-even dichotomy for natural numbers, which is central in the proof of the irrationality of $\sqrt{2}$. This is what Timpanaro Cardini (1964) argues, contra Szabo, in her comprehensive book. See also pp. 258-259 in Vlastos (1996). Interestingly, the archaic Greek enigmas were formulated in contradictory terms (their role in the birth of dialectics is emphasized by Colli, 1975).
approximation of the true irrational distance, $\sqrt{2}$, with a consequent approximation error
(that, by the way, will probably vary from measurement to measurement).
In any case, between the VI and the V century B.C. two Presocratic schools of Magna
Graecia were the cradle of an intellectual revolution. In the III century B.C. Euclid of Alexandria
and another famous Magna Graecia scholar, Archimedes of Syracuse, led this revolution
to its maximum splendor in the classical world (and beyond).46 We close with Plato's famous
(probably fictional) description of two protagonists of this revolution, Parmenides and
Zeno.47

They came to Athens ... the former was, at the time of his visit, about 65 years
old, very white with age, but well favoured. Zeno was nearly 40 years of age,
tall and fair to look upon: in the days of his youth he was reported to have been
beloved by Parmenides.

46 Hellenistic science, with its intellectual center in Alexandria, reached an impressive level that remained unmatched till the time of Galileo and Newton (cf. Russo, 1996).
47 In Plato's dialogue Parmenides (trans. Jowett reported in Barnes ibid.). A caveat: over the centuries, indeed over the millennia, the strict Eleatic anti-empirical stance (understandable, back then, in the excitement of a new approach) has inspired a great deal of metaphysical thinking. Yet, reason without empirical motivation and discipline becomes, at best, sterile. Already Aristotle lamented, in his treatise De generatione et corruptione, the "devotion to abstract discussions ... unobservant of the facts" and famously noted that "opinions appear to follow logically in a dialectical discussion, yet to believe them seems next door to madness when one considers the facts" (trans. Joachim).
Chapter 2

Cartesian structure (sdoganato)

2.1 Cartesian products and $\mathbb{R}^n$


Suppose we want to classify a wine according to two characteristics, aging and alcoholic
content. For example, suppose one reads on a label: 2 years of aging and 12 degrees. We
can write
\[
(2, 12)
\]
On another label one reads: 1 year of aging and 10 degrees. In this case we can write
\[
(1, 10)
\]
The pairs (2, 12) and (1, 10) are called ordered pairs. In them we distinguish the first element,
the aging, from the second one, the alcoholic content. In an ordered pair the position is,
therefore, crucial: a (2, 12) wine is very different from a (12, 2) wine (try the latter...).
Let $A_1$ be the set of the possible years of aging and $A_2$ the set of the possible alcoholic
contents. We can then write
\[
(2, 12) \in A_1 \times A_2, \quad (1, 10) \in A_1 \times A_2
\]
We denote by $a_1$ a generic element of $A_1$ and by $a_2$ a generic element of $A_2$. For example,
in (2, 12) we have $a_1 = 2$ and $a_2 = 12$.

Definition 47 Given two sets $A_1$ and $A_2$, the Cartesian product $A_1 \times A_2$ is the set of all
the ordered pairs $(a_1, a_2)$ with $a_1 \in A_1$ and $a_2 \in A_2$.

In the example, we have $A_1 \subseteq \mathbb{N}$ and $A_2 \subseteq \mathbb{N}$, i.e., the elements of $A_1$ and $A_2$ are natural
numbers. More generally, we can assume that $A_1 = A_2 = \mathbb{R}$, so that the elements of $A_1$
and $A_2$ are real numbers, although with a possibly different interpretation according to their
position. In this case
\[
A_1 \times A_2 = \mathbb{R} \times \mathbb{R} = \mathbb{R}^2
\]
and the pair $(a_1, a_2)$ can be represented by a point in the plane:

[Figure omitted: the point (a1, a2) represented in the Cartesian plane.]


An ordered pair of real numbers $(a_1, a_2) \in \mathbb{R}^2$ is called a vector.

Among the subsets of $\mathbb{R}^2$, of particular importance are:

(i) $\{(a_1, a_2) \in \mathbb{R}^2 : a_1 = 0\}$, that is, the set of the ordered pairs of the form $(0, a_2)$; it is
the vertical axis (or axis of the ordinates).
(ii) $\{(a_1, a_2) \in \mathbb{R}^2 : a_2 = 0\}$, that is, the set of the ordered pairs of the form $(a_1, 0)$; it is
the horizontal axis (or axis of the abscissae).
(iii) $\{(a_1, a_2) \in \mathbb{R}^2 : a_1 \geq 0 \text{ and } a_2 \geq 0\}$, that is, the set of the ordered pairs $(a_1, a_2)$ with
both components positive; it is the first quadrant of the plane (also called
positive orthant). In a similar way we can define the other quadrants:

[Figure omitted: the four quadrants I, II, III and IV of the plane.]

(iv) $\{(a_1, a_2) \in \mathbb{R}^2 : a_1^2 + a_2^2 \leq 1\}$ and $\{(a_1, a_2) \in \mathbb{R}^2 : a_1^2 + a_2^2 < 1\}$, that is, the closed unit
ball and the open unit ball, respectively (both centered at the origin and with radius one).1
1 The meaning of the adjectives "closed" and "open" will become clear in Chapter 5.

(v) $\{(a_1, a_2) \in \mathbb{R}^2 : a_1^2 + a_2^2 = 1\}$, that is, the unit circle; it is the skin of the closed unit
ball:

[Figure omitted: the unit circle centered at the origin.]

Before, we classified wines according to two characteristics, aging and alcoholic content.
We now consider the slightly more complicated example of a portfolio of assets. Suppose
that there exist four different assets that can be purchased in a financial market. A portfolio
is then described by an ordered quadruple
\[
(a_1, a_2, a_3, a_4)
\]
where $a_1$ is the amount of money invested in the first asset, $a_2$ is the amount of money
invested in the second asset, and so on. For example,
\[
(1000, 1500, 1200, 600)
\]
denotes a portfolio in which 1000 euros have been invested in the first asset, 1500 in the
second one, and so on. The position is crucial: the portfolio
\[
(1500, 1200, 1000, 600)
\]
is very different from the previous one, although the amounts of money involved are the
same.
Since amounts of money are numbers that are not necessarily integers, and possibly negative
(in case of sales), it is natural to assume $A_1 = A_2 = A_3 = A_4 = \mathbb{R}$, where $A_i$ is the set of the
possible amounts of money that can be invested in asset $i = 1, 2, 3, 4$. We have
\[
(a_1, a_2, a_3, a_4) \in A_1 \times A_2 \times A_3 \times A_4 = \mathbb{R}^4
\]
In particular,
\[
(1000, 1500, 1200, 600) \in \mathbb{R}^4
\]
In general, if we consider n sets $A_1, A_2, \ldots, A_n$ we can give the following definition.

Definition 48 Given n sets $A_1, A_2, \ldots, A_n$, the Cartesian product
\[
A_1 \times A_2 \times \cdots \times A_n
\]
denoted by $\prod_{i=1}^{n} A_i$ (or by $\times_{i=1}^{n} A_i$), is the set of all the ordered n-tuples $(a_1, a_2, \ldots, a_n)$ with
$a_1 \in A_1, a_2 \in A_2, \ldots, a_n \in A_n$.

We call $a_1, a_2, \ldots, a_n$ the components (or elements) of the n-tuple
\[
a = (a_1, a_2, \ldots, a_n) \in \prod_{i=1}^{n} A_i
\]
When $A_1 = A_2 = \cdots = A_n = A$, we write
\[
\underbrace{A \times A \times \cdots \times A}_{n \text{ times}} = A^n
\]
In particular, if $A_1 = A_2 = \cdots = A_n = \mathbb{R}$ the Cartesian product is denoted by $\mathbb{R}^n$, which
therefore is the set of all the (ordered) n-tuples of real numbers. In other words,
\[
\underbrace{\mathbb{R} \times \mathbb{R} \times \cdots \times \mathbb{R}}_{n \text{ times}} = \mathbb{R}^n
\]

An element
\[
x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n
\]
is called a vector.2 The Cartesian product $\mathbb{R}^n$ is called the (n-dimensional) Euclidean space.
For $n = 1$, R is represented by the real line and, for $n = 2$, $\mathbb{R}^2$ is represented by the plane.
As one learns in high school, it was Descartes who in 1637 understood that all points of
the plane can be identified by pairs $(a_1, a_2)$, as seen in a previous figure: a marvelous
insight that permitted the study of geometry through algebra (this is why Cartesian products
are named after him). Also the vectors $(a_1, a_2, a_3)$ in $\mathbb{R}^3$ admit a graphic representation:

[Figure omitted: a vector (a1, a2, a3) represented in three-dimensional space, with coordinates a1, a2, a3 on the x, y and z axes.]

2 For real numbers we use the letter x instead of a.

However, this is no longer possible in $\mathbb{R}^n$ when $n \geq 4$. The graphic representation may help
the intuition, but from a theoretical and computational viewpoint it has no importance: the
vectors of $\mathbb{R}^n$, with $n \geq 4$, are completely well-defined entities. They actually turn out to
be fundamental in economics, as we will see in Section 2.4 and as the portfolio example
already showed. Indeed, "the economic world is a world of n dimensions", as Irving Fisher
remarked.3

Notation We will denote the components of a vector by the same letter used for the vector
itself, along with ad hoc indexes: for example, $a_3$ is the third component of the vector a, $y_7$
the seventh component of the vector y, and so on.

2.2 Operations in $\mathbb{R}^n$

Let us consider two vectors in $\mathbb{R}^n$,
\[
x = (x_1, x_2, \ldots, x_n), \quad y = (y_1, y_2, \ldots, y_n)
\]
We define the vector sum $x + y$ by
\[
x + y = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n)
\]
For example, for the two vectors $x = (7, 8, 9)$ and $y = (2, 4, 7)$ in $\mathbb{R}^3$, we have
\[
x + y = (7 + 2, 8 + 4, 9 + 7) = (9, 12, 16)
\]

Note that $x + y \in \mathbb{R}^n$: through the operation of addition we constructed a new element of
$\mathbb{R}^n$.
Now, let $\lambda \in \mathbb{R}$ and $x \in \mathbb{R}^n$. We define the product $\lambda x$ by
\[
\lambda x = (\lambda x_1, \lambda x_2, \ldots, \lambda x_n)
\]
For example, for $\lambda = 2$ and $x = (7, 8, 9) \in \mathbb{R}^3$, we have
\[
2x = (2 \cdot 7, 2 \cdot 8, 2 \cdot 9) = (14, 16, 18)
\]
Also in this case we have $\lambda x \in \mathbb{R}^n$. In other words, also through the operation of scalar
multiplication we constructed a new element of $\mathbb{R}^n$.4

Notation We set $-x = (-1)x = (-x_1, -x_2, \ldots, -x_n)$ and $x - y = x + (-1)y$. We will also
set $\mathbf{0} = (0, 0, \ldots, 0)$, where boldface distinguishes the vector $\mathbf{0}$ of zeros from the scalar 0. The
vector $\mathbf{0}$ is called the zero vector.

We have introduced in $\mathbb{R}^n$ two operations, addition and scalar multiplication, that extend
to vectors the corresponding operations for real numbers. Let us see their properties. We
start with addition.
3 See Fisher (1930) p. 237.
4 A real number is often called a scalar. Throughout the book we will use the terms "scalar" and "real number" interchangeably.

Proposition 49 Let $x, y, z \in \mathbb{R}^n$. The operation of addition satisfies the following properties:

(i) $x + y = y + x$ (commutativity);

(ii) $(x + y) + z = x + (y + z)$ (associativity);

(iii) $x + \mathbf{0} = x$ (existence of the neutral element for addition);

(iv) $x + (-x) = \mathbf{0}$ (existence of the opposite of any vector).

Proof We prove (i), leaving the other properties to the reader. We have
\[
x + y = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n) = (y_1 + x_1, y_2 + x_2, \ldots, y_n + x_n) = y + x
\]
as desired.

We now consider scalar multiplication.

Proposition 50 Let $x, y \in \mathbb{R}^n$ and $\lambda, \mu \in \mathbb{R}$. The operation of scalar multiplication satisfies
the following properties:

(i) $\lambda (x + y) = \lambda x + \lambda y$ (distributivity over the addition of vectors);

(ii) $(\lambda + \mu) x = \lambda x + \mu x$ (distributivity over the addition of scalars);

(iii) $1x = x$ (existence of the neutral element for scalar multiplication);

(iv) $\lambda (\mu x) = (\lambda \mu) x$ (associativity).

Proof We only prove (ii); the other properties are left to the reader. We have:
\[
(\lambda + \mu) x = ((\lambda + \mu) x_1, (\lambda + \mu) x_2, \ldots, (\lambda + \mu) x_n)
= (\lambda x_1 + \mu x_1, \ldots, \lambda x_n + \mu x_n)
= (\lambda x_1, \ldots, \lambda x_n) + (\mu x_1, \ldots, \mu x_n) = \lambda x + \mu x
\]
as claimed.

The last operation in $\mathbb{R}^n$ that we consider is the inner product. Given two vectors x and
y in $\mathbb{R}^n$, their inner product, denoted by $x \cdot y$, is the scalar defined by
\[
x \cdot y = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n
\]
That is, in more compact notation,5
\[
x \cdot y = \sum_{i=1}^{n} x_i y_i
\]
5 Given n real numbers $r_i$, their sum $r_1 + r_2 + \cdots + r_n$ is denoted by $\sum_{i=1}^{n} r_i$, while their product $r_1 r_2 \cdots r_n$ is denoted by $\prod_{i=1}^{n} r_i$.

Another common notation for the inner product is $\langle x, y \rangle$.

For example, for the vectors $x = (1, -1, 5, -3)$ and $y = (-2, 3, \pi, -1)$ of $\mathbb{R}^4$, we have
\[
x \cdot y = 1 \cdot (-2) + (-1) \cdot 3 + 5\pi + (-3)(-1) = 5\pi - 2
\]

The inner product is an operation that differs from addition and scalar multiplication in a
structural aspect: while the latter operations determine a new vector of $\mathbb{R}^n$, the result of the
inner product is a scalar. The next result gathers the main properties of the inner product
(we leave to the reader the simple proof).

Proposition 51 Let $x, y, z \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}$. We have:

(i) $x \cdot y = y \cdot x$ (commutativity);

(ii) $(x + y) \cdot z = (x \cdot z) + (y \cdot z)$ (distributivity);

(iii) $(\lambda x) \cdot z = \lambda (x \cdot z)$ (distributivity).

Note that the two distributive properties can be summarized in the single property
$(\lambda x + y) \cdot z = \lambda (x \cdot z) + (y \cdot z)$.
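All three operations are immediate to program. A minimal Python sketch of ours (a numerical library such as NumPy would normally be used instead):

def vsum(x, y):
    # componentwise addition
    return tuple(xi + yi for xi, yi in zip(x, y))

def smul(lam, x):
    # scalar multiplication
    return tuple(lam * xi for xi in x)

def inner(x, y):
    # inner product: the sum of the componentwise products
    return sum(xi * yi for xi, yi in zip(x, y))

x, y = (7, 8, 9), (2, 4, 7)
print(vsum(x, y))   # (9, 12, 16)
print(smul(2, x))   # (14, 16, 18)
print(inner(x, y))  # 109 = 7*2 + 8*4 + 9*7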

2.3 Order structure on $\mathbb{R}^n$

The order structure of $\mathbb{R}^n$ is based on the order structure of R, but with some important
novelties. We begin by defining the order $\geq$ on $\mathbb{R}^n$: given two vectors $x = (x_1, x_2, \ldots, x_n)$ and
$y = (y_1, y_2, \ldots, y_n)$ in $\mathbb{R}^n$, we write
\[
x \geq y
\]
when $x_i \geq y_i$ for every $i = 1, 2, \ldots, n$. In particular, we have $x = y$ if and only if we have
both $x \geq y$ and $y \geq x$.
In other words, $\geq$ orders two vectors by applying, component by component, the order
$\geq$ on R studied in Section 1.4. For example, $x = (0, 3, 4) \geq y = (0, 2, 1)$. When $n = 1$, the
order $\geq$ thus reduces to the standard one on R.

The study of the basic properties of the inequality ≥ on R^n reveals a first important
novelty: when n ≥ 2, the order ≥ does not satisfy completeness. Indeed, consider for
example x = (0, 1) and y = (1, 0) in R^2: neither x ≥ y nor y ≥ x. We say, therefore, that
≥ on R^n is a partial order (which becomes a complete order when n = 1).
It is easy to find vectors in R^n that are not comparable. In the following figure the darker
area represents the points of R^2 that are smaller than x = (1, 2), the clearer area those that
are greater than x, and the two white areas represent the points that are not comparable

with x.

[Figure: the points of R^2 comparable and not comparable with x = (1, 2)]

Apart from completeness, it is easy to verify that ≥ on R^n continues to enjoy the properties
seen for n = 1:

(i) reflexivity: x ≥ x,

(ii) transitivity: if x ≥ y and y ≥ z, then x ≥ z,

(iii) additive independence: if x ≥ y, then x + z ≥ y + z for every z ∈ R^n,

(iv) multiplicative independence:⁶ if x ≥ y, α ∈ R and z ∈ R^n, then

αx ≥ αy if α > 0        and    x · z ≥ y · z if z > 0
αx = αy = 0 if α = 0    and    x · z = y · z = 0 if z = 0
αx ≤ αy if α < 0        and    x · z ≤ y · z if z < 0

(v) separation: given two sets A and B in R^n, if a ≥ b for every a ∈ A and b ∈ B, then
there exists c ∈ R^n such that a ≥ c ≥ b for every a ∈ A and b ∈ B.

Another notion that becomes surprisingly delicate when n ≥ 2 is that of strict inequality.
Indeed, given two vectors x = (x1, x2, ..., xn) and y = (y1, y2, ..., yn) of R^n, two cases can
happen.

1. All the components of x are ≥ than the corresponding components of y, with some of
them strictly greater; i.e., xi ≥ yi for each index i = 1, 2, ..., n, with xi > yi for at least
one index i.

2. All the components of x are > than the corresponding components of y; i.e., xi > yi
for each i = 1, 2, ..., n.
⁶ In R^n multiplicative independence holds with respect to both scalar and inner products (the asymmetric
position of α and z is standard).

In the first case we have a strict inequality, in symbols x > y; in the second case a strong
inequality, in symbols x ≫ y.

Example 52 For x = (1, 3, 4) and y = (0, 1, 2) in R^3, we have x ≫ y. For x = (0, 3, 4) and
y = (0, 1, 2), we have x > y but not x ≫ y, because x has only two components out of three
strictly greater than the corresponding components of y. N

Given two vectors x, y ∈ R^n, we have

x ≫ y ⟹ x > y ⟹ x ≥ y

The three notions of inequality among vectors in R^n are, therefore, more and more
stringent. Indeed, we have:

(i) a weak notion, ≥, that permits the equality between the two vectors;

(ii) an intermediate notion, >, that requires at least one strict inequality among the com-
ponents;

(iii) a strong notion, ≫, that requires strict inequality among all the components of the
two vectors.

When n = 1, both > and ≫ reduce to the standard > on R. The "reversed" symbols ≤,
<, and ≪ are used for the converse inequalities.
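Componentwise comparisons are easy to experiment with. A minimal Python sketch of the three relations (an illustration of ours, with numpy assumed):

    import numpy as np

    def weak(x, y):     # x >= y: every component weakly larger
        return bool(np.all(x >= y))

    def strict(x, y):   # x > y: weakly larger everywhere, strictly somewhere
        return bool(np.all(x >= y) and np.any(x > y))

    def strong(x, y):   # x >> y: strictly larger in every component
        return bool(np.all(x > y))

    x, y = np.array([0, 3, 4]), np.array([0, 1, 2])
    print(weak(x, y), strict(x, y), strong(x, y))   # True True False

    # An incomparable pair, witnessing the lack of completeness
    u, v = np.array([0, 1]), np.array([1, 0])
    print(weak(u, v), weak(v, u))                   # False False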

An important comparison is that between a vector x and the zero vector 0. We say that
the vector x is:

(i) positive if x ≥ 0, i.e., if all the components of x are positive;

(ii) strictly positive if x > 0, i.e., if all the components of x are positive and at least one
of them is strictly positive;

(iii) strongly positive if x ≫ 0, i.e., if all the components of x are strictly positive.

N.B. The notation and terminology that we introduced is not the only possible one. For
example, some authors use ≧, ≥, and > in place of ≥, >, and ≫; other authors call "non-
negative" the vectors that we call positive, and so on. O

Together with the lack of completeness of ≥, the presence of the two different notions of
strict inequality is the main novelty, relative to what happens on the real line, that we have
in R^n when n ≥ 2.

We conclude this section by generalizing the intervals introduced in R (Section 1.4).
Given a, b ∈ R^n, we have:

(i) the bounded closed interval

[a, b] = {x ∈ R^n : a ≤ x ≤ b} = {x ∈ R^n : ai ≤ xi ≤ bi for every i}

(ii) the bounded open interval

(a, b) = {x ∈ R^n : a ≪ x ≪ b} = {x ∈ R^n : ai < xi < bi for every i}

(iii) the bounded half-closed (or half-open) intervals

(a, b] = {x ∈ R^n : a ≪ x ≤ b} and [a, b) = {x ∈ R^n : a ≤ x ≪ b}

(iv) the unbounded intervals [a, ∞) = {x ∈ R^n : x ≥ a} and (a, ∞) = {x ∈ R^n : x ≫ a},
and their analogues (−∞, a] and (−∞, a).

N.B. (i) The intervals

[0, ∞) = {x ∈ R^n : x ≥ 0} and (0, ∞) = {x ∈ R^n : x ≫ 0}

are often denoted by R^n_+ and R^n_{++}, respectively. The intervals R^n_− = {x ∈ R^n : x ≤ 0} and
R^n_{−−} = {x ∈ R^n : x ≪ 0} are similarly defined. (ii) The intervals in R^n can be expressed as
Cartesian products of intervals in R; for example, [a, b] = ∏_{i=1}^{n} [ai, bi]. (iii) In the intervals
just introduced we used the inequalities ≤ or ≪. By replacing them with the inequality <,
we obtain other possible intervals that, however, are not that relevant for our purposes. O
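Since [a, b] = ∏ [ai, bi], membership in an interval of R^n is a componentwise check. A small sketch (again Python with numpy, our own illustration):

    import numpy as np

    def in_closed(x, a, b):   # x in [a, b] iff a <= x <= b componentwise
        return bool(np.all(a <= x) and np.all(x <= b))

    def in_open(x, a, b):     # x in (a, b) iff a << x << b componentwise
        return bool(np.all(a < x) and np.all(x < b))

    a, b = np.zeros(2), np.ones(2)
    print(in_closed(np.array([0.0, 0.5]), a, b))   # True: the boundary is allowed
    print(in_open(np.array([0.0, 0.5]), a, b))     # False: the first component is on the edge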

2.4 Applications

Static choices Consider a consumer who has to choose how many kilograms of apples and
of potatoes to buy at the market. For convenience, we assume that these goods are infinitely
divisible, so that the consumer can buy any real positive quantity – for example, 3 kg of
apples and π kg of potatoes. In this case, R+ is the set of the possible quantities of apples or
potatoes that can be bought. Therefore, the collection of all bundles of apples and potatoes
that the consumer can buy is

R^2_+ = R+ × R+ = {(x1, x2) : x1, x2 ≥ 0}

Graphically, it is the first quadrant of the plane. In general, if a consumer chooses n goods,
the set of the bundles is represented by the Cartesian product

R^n_+ = R+ × R+ × ··· × R+ = {(x1, x2, ..., xn) : xi ≥ 0 for i = 1, 2, ..., n}

In production theory, a vector in R^n_+ may represent a possible configuration of n inputs
for a producer. In this case the vector x = (x1, x2, ..., xn) indicates that the producer has at
his disposal x1 units of the first input, x2 units of the second input, ..., and xn units of the
last input.

Intertemporal choices In consumer theory a vector x = (x1, x2, ..., xn) may thus be
interpreted as a bundle in which xi is the quantity of good i = 1, 2, ..., n. But there is another
possible interpretation in which there is a single good and x = (x1, x2, ..., xn) indicates the
quantity of such good available in different periods, with xi being the quantity of the good
available in the i-th period. For example, if the single good is apples, x1 is the quantity of

apples in period 1, x2 is the quantity of apples in period 2, and so on, until xn which is the
quantity of apples in the n-th period.
In this case, R^n_+ denotes the space of all streams of quantities of a given good, say apples,
over n periods. The more evocative notation R^T is often used, where T is the number of
periods and xt is the quantity of apples in period t, with t = 1, 2, ..., T.⁷ In a similar spirit,

x = (x1, x2, ..., xt, ..., xT) ∈ R^T

may represent amounts of money in different periods: in this case the stream x is called a cash
flow. For example, the checking account of a family records in each day the balance between
revenues (wages, incomes, etc.) and expenditures (purchases, rents, etc.). Setting T = 365,
the resulting cash flow is

x = (x1, x2, ..., x365)

So, x1 is the balance of the checking account on January 1st, x2 is the balance on January
2nd, and so on until x365, which is the balance at the end of the year.
Instead of a stream of quantities of a single good, we can consider a stream of bundles of
several goods. Similarly, in an intertemporal problem of production, we will have streams of
input vectors. Such situations are modeled by means of matrices, a simple notion that will
be studied in Chapter 15. Many economic applications focus, however, on the single good
case, so R^T is an important space in economics.

2.5 Pareto optima

2.5.1 Definition

We can extend the notion of maximum to subsets of the space R^n, with its order ≥.

Definition 53 Let A ⊆ R^n. A point x̂ ∈ A is called maximum of A if x̂ ≥ x for every
x ∈ A.

In an analogous way we can define the minimum. Moreover, Proposition 36 continues to
hold: the maximum (minimum) of a set A ⊆ R^n, if it exists, is unique (as the reader can
check).
Unfortunately, this last definition is of little interest in economic applications because
often subsets of R^n do not have maxima (or minima), since the order ≥ is not complete in
R^n when n ≥ 2, as seen in Section 2.3. The binary set {(1, 2), (2, 1)} is a trivial example of
a set of the plane without maxima and minima.
It is much more fruitful, instead, to observe that the concept of maximum of a subset of
R, given in Definition 33, can be equivalently reformulated as follows.

Lemma 54 Let A ⊆ R. A point x̂ ∈ A is maximum of A if and only if there is no x ∈ A
such that x > x̂.

⁷ The notation t = 1, 2, ..., T is equivalent to t ∈ {1, 2, ..., T}, like the notation i = 1, 2, ..., n is equivalent
to i ∈ {1, 2, ..., n}. Choosing one of them is a matter of convenience.

Indeed, since ≥ is complete on the real line, requiring that all the points of A be ≤ x̂
amounts to requiring that none of them be > x̂. A similar reformulation can be given for
minima.
Interestingly, this equivalent characterization of the concept of maximum in R becomes
more general in R^n. Indeed, ≥ is no longer complete when n ≥ 2 and so the "if" is easily seen
to fail in Lemma 54. This motivates the next definition, of great importance in economic
applications.

Definition 55 Let A ⊆ R^n. A point x̂ ∈ A is called maximal (or a Pareto optimum) of A
if there is no x ∈ A such that x > x̂.

In a similar way we can define minimals, which are also called Pareto optima (like angels,
optima have no gender).
To understand the nature of maximals,⁸ say that a point x ∈ A is dominated by another
point y ∈ A if x < y, that is, if xi ≤ yi for each index i, with xi < yi for at least one
index i. A dominated point is thus outperformed by another point available in the set. For
instance, if they represent bundles of goods, a dominated bundle x is obviously a no better
alternative than the dominant one y.⁹ In terms of dominance, we can say that a point a of A
is maximal when it is not dominated by any other point in A, so when it is not outperformed
by any other alternative available in A. Maximality is thus the natural extension of the notion
of maximum when dealing – as is often the case in applications – with alternatives that
are multi-dimensional (and so represented by vectors of R^n).
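For finite sets, maximals can be found by checking dominance directly. A minimal Python sketch (our own illustration, numpy assumed):

    import numpy as np

    def dominates(y, x):
        # y > x: y weakly larger in every component, strictly larger in at least one
        return bool(np.all(y >= x) and np.any(y > x))

    def maximals(A):
        # Pareto optima of a finite set A: the points dominated by no other point of A
        return [x for x in A if not any(dominates(y, x) for y in A)]

    A = [np.array(v) for v in [(1, 2), (2, 1), (0, 0)]]
    print(maximals(A))   # [array([1, 2]), array([2, 1])]: (0, 0) is dominated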

2.5.2 Maxima and maximals

Lemma 54 shows that the notions of maximum and maximal are equivalent in R. This is no
longer true in R^n when n > 1: the notion of maximum becomes (much) stronger than that
of maximal.

Lemma 56 The maximum of a set A ⊆ R^n is, if it exists, the unique maximal of A.

Proof Let x̂ ∈ A be the maximum of A. Clearly, x̂ is a maximal. We need to show that it
is the unique maximal. Let x ∈ A with x ≠ x̂. Since x̂ is the maximum of A, we have x̂ ≥ x.
Since x ≠ x̂, we have x̂ > x. Therefore, x is not a maximal.

The set in the next figure has a maximum, i.e., point a. Thanks to this lemma, a is
therefore also the unique maximal.

⁸ Here "maximal" is an adjective used as a noun, as it was the case for "maximum" in Definitions 33 and
53. If used as adjectives, we would have a "maximal element" and a "maximum element". Be that as it may,
in the rest of the chapter we focus on maxima and maximals, the most relevant in economic applications,
leaving to the reader the dual properties that hold for minima and minimals.
⁹ Here we are tacitly assuming the monotonicity of preferences over bundles that will be discussed in Section
6.8.

Thus:

maximum ⟹ maximal

But the converse is false: there exist maximals that are not maxima, that is,

maximal ⇏ maximum

Example 57 In the set A = {(1, 2), (2, 1), (0, 0)} of the plane, the vectors (2, 1) and (1, 2)
are maximals that are not maxima. N

A vector can be both a maximal and a minimal, as the next example shows. So, the two
properties are not mutually exclusive.¹⁰

Example 58 In the binary set A = {(1, 2), (2, 1)} of the plane, the vectors (2, 1) and (1, 2)
are both maximals and minimals. N

The set A of the next example illustrates another fundamental difference between maxima
and maximals in R^n with n > 1: the maximum of a set is, if it exists, unique, while a maximal
might well not be unique.

Example 59 The next figure shows a set A of R^2 that has no maxima, but infinitely many

¹⁰ A vector x can be both a maximum and a minimum of a set A if and only if A = {x}. So, a vector can
be simultaneously a maximum and a minimum only in the very special case of singleton sets.

maximals.

[Figure: a set A of R^2 whose maximals form the dark north-east edge; a is one of them]

It is easy to see that any point a ∈ A on the dark edge is maximal: there is no x ∈ A such
that x > a. On the other hand, a is not a maximum: we have a ≥ x only for the points
x ∈ A that are comparable with a, which are represented in the shaded part of A.
Nothing can be said, instead, for the points that are not comparable with a (the non-shaded
part of A). The lack of maxima for this set is thus due to the fact that the order ≥ is
incomplete in R^n when n > 1. N

Summing up, because of the incompleteness of the order ≥ on R^n, maxima are much less
important than maximals in R^n. That said, maximals might also not exist: the 45° straight
line is a subset of R^2 without maximals (and minimals).¹¹

2.5.3 Pareto frontier and Edgeworth box

Maximals are fundamental in economics, where they are called Pareto optima. The set of
these points is of particular importance.

¹¹ This set is the graph of the identity function f(x) = x, as we will see in Chapter 6.

Definition 60 The set of the maximals of a set A ⊆ R^n is called the Pareto (or efficient)
frontier of A.

In the last example, the dark edge is the Pareto frontier of the set A:
[Figure: the Pareto frontier of A is the dark edge]

As a first economic application, assume that the different vectors of a set A ⊆ R^n
represent the profits that n individuals can earn. So, in x = (x1, ..., xn) ∈ A the component
xi is the profit of individual i, with i = 1, ..., n. The Pareto optima represent the situations
from which it is not possible to move away without reducing the profit of at least one of the
individuals. In other words, the n individuals would not object to restricting A to the set of its
Pareto optima (nobody loses), that is, to its Pareto frontier. A conflict of interests arises
among them, instead, when a specific point on the frontier has to be selected.
The concept of Pareto optimum thus permits to narrow down, with a unanimous con-
sensus, a set A of alternatives by identifying the truly "critical" subset, the Pareto frontier,
which is often much smaller than the original set A.¹² A magnificent illustration of this key
feature of Pareto optimality is the famous Edgeworth box.¹³ Consider two agents, Albert
and Barbara, who have to divide between them unitary quantities of two infinitely divisible
goods (for example, a kilogram of flour and a liter of wine). We want to model the problem
of division (probably determined by a bargaining between them) and to see if, thanks to
Pareto optimality, we can say something non-trivial about it.
Pareto optimality, we can say something non-trivial about it.
Each pair x = (x1, x2), with x1 ∈ [0, 1] and x2 ∈ [0, 1], represents a possible allocation
of the two goods to one of the two agents. In particular, the Cartesian product [0, 1] × [0, 1]
describes them all. The two agents must agree on the allocations (a1, a2) of Albert and
¹² For Pareto optimality it is key that agents only consider their own alternatives (bundles of goods, profits,
etc.), without caring about those of their peers. In other words, they should not feel envy or similar social
emotions. To see why, think of a tribe of "envious" whose chief decides to double the food rations of half of
the members of the tribe, leaving unchanged those of the other members. The new allocation would provoke
lively protests by the "unchanged" members even though nothing changed for them.
¹³ Since we will use notions that we will introduce in Chapter 6, the reader may want to read this application
after that chapter.

(b1, b2) of Barbara. Clearly,

a1 + b1 = a2 + b2 = 1     (2.1)

To complete the description of the problem, we have to specify the desiderata of the two
agents. To this end, we suppose that they have identical utility functions ua, ub : [0, 1] ×
[0, 1] → R that, for simplicity, are of the Cobb-Douglas type

ua(x1, x2) = ub(x1, x2) = √(x1 x2)

(see Example 187). The indifference curves can be "packed" in the following way:

This is the classic Edgeworth box. By condition (2.1), we can think of a point (x1, x2) ∈
[0, 1] × [0, 1] as the allocation of Albert. We can actually identify each possible division
between the two agents with the allocation (x1, x2) of Albert. Indeed, the allocations of
Barbara, (1 − x1, 1 − x2), are uniquely determined once those of Albert are known.
Each allocation (x1, x2) has utility ua(x1, x2) for Albert and ub(1 − x1, 1 − x2) for Bar-
bara. Let

A = {(ua(x1, x2), ub(1 − x1, 1 − x2)) ∈ R^2_+ : (x1, x2) ∈ [0, 1] × [0, 1]}

be the set of all the utility profiles of the two agents determined by the division of the two
goods. We are interested in the allocations whose utility profiles belong to the Pareto frontier
of A, so are Pareto optima of the set A. Indeed, these are the allocations that cannot be
improved upon with a unanimous consensus.
By looking at the Edgeworth box, it is easy to see that the Pareto frontier P of A is
given by the values of allocations on the diagonal of the box, i.e.,

P = {(ua(d, d), ub(1 − d, 1 − d)) ∈ R^2_+ : d ∈ [0, 1]}

That is, by the locus of the tangency points of the indifference curves (called contract curve).
To prove it rigorously, we need the next simple result.

Lemma 61 Given x1, x2 ∈ [0, 1], we have

1 − √(x1 x2) ≥ √((1 − x1)(1 − x2))     (2.2)

with equality if and only if x1 = x2.

Proof Since x1, x2 ∈ [0, 1], we have:

1 − √(x1 x2) ≥ √((1 − x1)(1 − x2)) ⟺ (1 − √(x1 x2))² ≥ (1 − x1)(1 − x2)
⟺ (x1 + x2)/2 ≥ √(x1 x2) ⟺ ((x1 + x2)/2)² ≥ x1 x2 ⟺ (x1 − x2)² ≥ 0

Since the last inequality is always true, we conclude that (2.2) holds. Moreover, these
equivalences imply that

1 − √(x1 x2) = √((1 − x1)(1 − x2)) ⟺ (x1 − x2)² = 0

which holds if and only if x1 = x2.

Having established this lemma, we can now prove rigorously what the last picture sug-
gested.

Proposition 62 A utility profile (ua(x1, x2), ub(1 − x1, 1 − x2)) ∈ A is a Pareto optimum
of A if and only if x1 = x2.

Proof Let D = {(d, d) ∈ R^2_+ : d ∈ [0, 1]} be the diagonal of the box. We start by showing
that, for any division of goods (x1, x2) ∉ D – i.e., with x1 ≠ x2 – there exists (d, d) ∈ D such
that

(ua(d, d), ub(1 − d, 1 − d)) > (ua(x1, x2), ub(1 − x1, 1 − x2))     (2.3)

For Albert, we have

ua(√(x1 x2), √(x1 x2)) = √(x1 x2) = ua(x1, x2)

Therefore, (√(x1 x2), √(x1 x2)) is for him indifferent to (x1, x2). By Lemma 61, for Barbara we
have

ub(1 − √(x1 x2), 1 − √(x1 x2)) = 1 − √(x1 x2) > √((1 − x1)(1 − x2)) = ub(1 − x1, 1 − x2)

where the inequality is strict since x1 ≠ x2. Therefore, setting d = √(x1 x2), (2.3) holds.
It follows that the off-diagonal divisions (x1, x2) have utility profiles that are not Pareto
optima. It remains to show that the divisions on the diagonal are so. Let (d, d) ∈ D and
suppose, by contradiction, that there exists (x1, x2) ∈ [0, 1] × [0, 1] such that

(ua(x1, x2), ub(1 − x1, 1 − x2)) > (ua(d, d), ub(1 − d, 1 − d))     (2.4)

Suppose that¹⁴

ua(x1, x2) > ua(d, d) and ub(1 − x1, 1 − x2) ≥ ub(1 − d, 1 − d)


¹⁴ A similar argument holds when ua(x1, x2) ≥ ua(d, d) and ub(1 − x1, 1 − x2) > ub(1 − d, 1 − d).

that is,

√(x1 x2) > √(d · d) = d and √((1 − x1)(1 − x2)) ≥ √((1 − d)(1 − d)) = 1 − d

Therefore,

1 − √(x1 x2) < 1 − d ≤ √((1 − x1)(1 − x2))

which contradicts (2.2). It follows that there is no (x1, x2) ∈ [0, 1] × [0, 1] for which (2.4)
holds. This completes the proof.

In sum, if agents maximize their Cobb-Douglas utilities, the bargaining will result in
a division of the goods on the diagonal of the Edgeworth box, i.e., such that each agent
has an equal quantity of both goods. Proposition 62 does not say anything about which of
the points of the diagonal is, then, actually determined by the bargaining, that is, how the
ensuing conflict of interests among agents is then solved. Nevertheless, through the notion
of Pareto optimum we have been able to say something highly non-trivial about the problem
of division.
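The Pareto improvement used in the proof is easy to see numerically: moving from an off-diagonal division to the diagonal point d = √(x1 x2) leaves Albert indifferent and makes Barbara strictly better off. A small Python check (our own illustration):

    import numpy as np

    def u(x1, x2):                 # common Cobb-Douglas utility sqrt(x1*x2)
        return np.sqrt(x1 * x2)

    x1, x2 = 0.9, 0.1              # an off-diagonal division for Albert
    d = np.sqrt(x1 * x2)           # the diagonal point built in the proof

    print(u(d, d), u(x1, x2))                   # 0.3 0.3: Albert is indifferent
    print(u(1 - d, 1 - d), u(1 - x1, 1 - x2))   # 0.7 0.3: Barbara strictly gains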
Chapter 3

Linear structure (sdoganato)

In this chapter we study more in depth the linear structure of R^n that was introduced in
Section 2.2. The study of this fundamental structure of R^n, which we will continue with the
analysis of linear functions in Chapter 15, is part of linear algebra, an all-important topic
that is also at the heart of innumerable applications (to illustrate, a classic application of
linear algebra to finance will be seen in Section 24.6).

3.1 Vector subspaces of R^n

Propositions 49 and 50 have shown that the operations of addition and scalar multiplication
on R^n satisfy the following properties, for all vectors x, y, z ∈ R^n and all scalars α, β ∈ R:

(v1) x + y = y + x
(v2) (x + y) + z = x + (y + z)
(v3) x + 0 = x
(v4) x + (−x) = 0
(v5) α(x + y) = αx + αy
(v6) (α + β)x = αx + βx
(v7) 1x = x
(v8) α(βx) = (αβ)x

For this reason, R^n is an example of a vector space, which in general is a set endowed
with two operations of addition and scalar multiplication that satisfy properties (v1)-(v8).
For instance, in Chapter 15 we will see that the space of matrices is another example of
vector space.¹
We call vector subspaces the subsets of R^n that behave well with respect to the two
operations:

¹ The notion of vector space, first proposed by Giuseppe Peano in 1888 in his book "Calcolo geometrico"
and then developed to its full power by Stefan Banach in the 1920s, is central in mathematics, but it is
necessary to go beyond R^n to fully understand it. For this reason the reader will study vector spaces in depth
in more advanced courses.


Definition 63 A non-empty subset V of R^n is called vector subspace if it is closed under
the operations of addition and scalar multiplication, i.e.,²

(i) x + y ∈ V if x, y ∈ V;

(ii) αx ∈ V if x ∈ V and α ∈ R.

It is easy to check that the two operations satisfy in V properties (v1)-(v8). Note that, by
(ii), the origin belongs to each vector subspace V – i.e., 0 ∈ V. Indeed, 0x = 0 for every
vector x ∈ V.

The following characterization is useful when one needs to check whether a subset of R^n
is a vector subspace.

Proposition 64 A non-empty subset V of R^n is a vector subspace if and only if

αx + βy ∈ V     (3.1)

for every α, β ∈ R and every x, y ∈ V.

Proof "Only if". Let V be a vector subspace and let x, y ∈ V. As V is closed with respect
to scalar multiplication, we have αx ∈ V and βy ∈ V. It follows that αx + βy ∈ V since V
is closed with respect to addition.
"If". Putting α = β = 1 in (3.1), we get x + y ∈ V, while putting β = 0 we get αx ∈ V.
Therefore, V is closed with respect to the operations of addition and scalar multiplication
inherited from R^n.

Putting α = β = 0, (3.1) implies that 0 ∈ V. This confirms that each vector subspace
contains the origin 0.

Example 65 There are two legitimate, yet trivial, subspaces of R^n: the singleton {0} and
the space R^n itself. In particular, the reader can check that a singleton {x} is a vector
subspace of R^n if and only if x = 0. N

Example 66 Let m ≤ n and set

M = {x ∈ R^n : x1 = ··· = xm = 0}

For example, we have M = {x ∈ R^3 : x1 = x2 = 0} when n = 3 and m = 2. The subset M is
a vector subspace. Indeed, if m = n it reduces to the trivial vector subspace {0}. Otherwise,
if m < n, let x, y ∈ M and α, β ∈ R. We have:

αx + βy = (αx1 + βy1, ..., αxn + βyn)
        = (0, ..., 0, αxm+1 + βym+1, ..., αxn + βyn) ∈ M

In particular, the vertical axis in R^2, which corresponds to M = {x ∈ R^2 : x1 = 0}, is a
vector subspace of R^2. N
² Recall that a set is closed under an operation when the result of the operation still belongs to the set.

Example 67 Let M be the set of all vectors x ∈ R^4 such that

2x1 − x2 + 2x3 + 2x4 = 0
x1 − x2 − 2x3 − 4x4 = 0
x1 − 2x2 − 2x3 − 10x4 = 0

In other words, M is the set of the solutions of this system of equations. It is a vector
subspace: the reader can check that, given x, y ∈ M and α, β ∈ R, we have αx + βy ∈ M.
Performing the computations,³ we find that the vectors

(−(10/3)t, −6t, −(2/3)t, t)     (3.2)

solve the system for each t ∈ R, so that

M = {(−(10/3)t, −6t, −(2/3)t, t) : t ∈ R}

is a description of the subspace. N
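This description can be double-checked by computing the null space of the coefficient matrix of the system, for instance with sympy (our own choice of tool):

    from sympy import Matrix

    A = Matrix([[2, -1,  2,   2],
                [1, -1, -2,  -4],
                [1, -2, -2, -10]])

    # M is the null space of A; sympy returns a basis for it with exact rationals
    print(A.nullspace()[0].T)   # Matrix([[-10/3, -6, -2/3, 1]]): the vector in (3.2) with t = 1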

The intersection V1 ∩ V2 of two vector subspaces V1 and V2 is easily seen to be a vector
subspace. More generally:

Proposition 68 The intersection of any collection of vector subspaces of R^n is a vector
subspace.

Proof Let {Vi} be any collection of vector subspaces of R^n. Since 0 ∈ Vi for every i, we
have ∩i Vi ≠ ∅. Let x, y ∈ ∩i Vi and α, β ∈ R. Since x, y ∈ ∩i Vi, we have x, y ∈ Vi for every
i and, therefore, αx + βy ∈ Vi for every i since each Vi is a vector subspace of R^n. Hence,
αx + βy ∈ ∩i Vi, and so ∩i Vi is a vector subspace of R^n.

Differently from the intersection, the union of vector subspaces is not in general a vector
subspace, as the next simple example shows.⁴

Example 69 The sets V1 = {x ∈ R^2 : x1 = 0} and V2 = {x ∈ R^2 : x2 = 0} are both vector
subspaces of R^2. Their union

V1 ∪ V2 = {x ∈ R^2 : x1 = 0 or x2 = 0}

is not a vector subspace of R^2. Indeed,

(1, 0) ∈ V1 ∪ V2 and (0, 1) ∈ V1 ∪ V2

but (1, 0) + (0, 1) = (1, 1) ∉ V1 ∪ V2. N
³ The system is properly solved in Example 747. But, for completeness, at the end of the chapter (Section
3.7) we provide a simple high school argument.
⁴ Examples that show the failure of a property are often called counterexamples. In general, the simpler
they are, the better, because the failure is then starker.

A final remark. In the last proposition we intuitively used the intersection of an arbitrary
family of sets. Indeed, unions and intersections are easily defined for any family whatsoever,
finite or not, of sets: if {Ai}_{i∈I} is any such family, with a generic (finite or infinite) index set
I, their union

∪_{i∈I} Ai

is the set of the elements that belong to at least one of the Ai, while their intersection

∩_{i∈I} Ai

is the set of the elements that belong to every Ai.

3.2 Linear independence and dependence

In this chapter we will adopt the notation

x^i = (x^i_1, ..., x^i_n) ∈ R^n

in which superscripts identify different vectors and subscripts their components. We use
immediately this notation in the next important definition.

Definition 70 A finite set of vectors x^1, ..., x^m of R^n is said to be linearly independent
if, whenever

α1 x^1 + α2 x^2 + ··· + αm x^m = 0

for some set {α1, ..., αm} of scalars, then

α1 = α2 = ··· = αm = 0

A set x^1, ..., x^m is, instead, said to be linearly dependent if it is not linearly independent,
i.e.,⁵ if there exists a set {α1, ..., αm} of scalars, not all equal to zero, such that

α1 x^1 + α2 x^2 + ··· + αm x^m = 0

Example 71 Consider the vectors

e^1 = (1, 0, 0, ..., 0)
e^2 = (0, 1, 0, ..., 0)
...
e^n = (0, 0, ..., 0, 1)

called standard unit vectors or versors of R^n. The set e^1, ..., e^n is linearly independent.
Indeed,

α1 e^1 + ··· + αn e^n = (α1, ..., αn)

and so α1 e^1 + ··· + αn e^n = 0 implies α1 = ··· = αn = 0. N
⁵ See Section D.7.3 of the Appendix for a careful logical analysis of this important negation.

Example 72 All sets of vectors x^1, ..., x^m of R^n that include the zero vector 0 are linearly
dependent. Indeed, without loss of generality, set x^1 = 0. Given a set {α1, ..., αm} of scalars
with α1 ≠ 0 and αi = 0 for i = 2, ..., m, we have

α1 x^1 + α2 x^2 + ··· + αm x^m = 0

which proves the linear dependence of the set {x^i}_{i=1}^m. N

Before presenting other examples, we must clarify a terminological question. Although
linear independence and dependence are properties of a set of vectors {x^i}_{i=1}^m, often they are
referred to the single vectors. We then speak of a "set of linearly independent (dependent)
vectors" instead of a "linearly independent (dependent) set of vectors".

Example 73 In R^3, the vectors

x^1 = (1, 1, 1), x^2 = (3, 1, 5), x^3 = (9, 1, 25)

are linearly independent. Indeed,

α1 x^1 + α2 x^2 + α3 x^3 = α1 (1, 1, 1) + α2 (3, 1, 5) + α3 (9, 1, 25)
                        = (α1 + 3α2 + 9α3, α1 + α2 + α3, α1 + 5α2 + 25α3)

Therefore, α1 x^1 + α2 x^2 + α3 x^3 = 0 means

α1 + 3α2 + 9α3 = 0
α1 + α2 + α3 = 0
α1 + 5α2 + 25α3 = 0

which is a system of equations whose unique solution is (α1, α2, α3) = (0, 0, 0). More gener-
ally, to check if k vectors

x^1 = (x^1_1, ..., x^1_n), x^2 = (x^2_1, ..., x^2_n), ..., x^k = (x^k_1, ..., x^k_n)

are linearly independent in R^n, it suffices to solve the linear system

α1 x^1_1 + α2 x^2_1 + ··· + αk x^k_1 = 0
α1 x^1_2 + α2 x^2_2 + ··· + αk x^k_2 = 0
...
α1 x^1_n + α2 x^2_n + ··· + αk x^k_n = 0

If (α1, ..., αk) = (0, ..., 0) is the unique solution, then the vectors are linearly independent in
R^n. For example, consider in R^3 the two vectors x^1 = (1, 3, 4) and x^2 = (2, 5, 1). The system
to solve is

α1 + 2α2 = 0
3α1 + 5α2 = 0
4α1 + α2 = 0

It has the unique solution (α1, α2) = (0, 0), so the two vectors x^1 and x^2 are linearly inde-
pendent. N
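In matrix terms, k vectors are linearly independent exactly when the matrix that has them as columns has full column rank, i.e., when the system above has only the zero solution. A quick numerical check of the two examples (Python with numpy, our own illustration):

    import numpy as np

    X = np.column_stack([(1, 1, 1), (3, 1, 5), (9, 1, 25)])
    print(np.linalg.matrix_rank(X))   # 3: the three vectors are linearly independent

    Y = np.column_stack([(1, 3, 4), (2, 5, 1)])
    print(np.linalg.matrix_rank(Y))   # 2: full column rank, so independent as well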

Example 74 Consider the vectors

x^1 = (2, 1, 1), x^2 = (−1, −1, −2), x^3 = (2, −2, −2), x^4 = (2, −4, −10)

To check if these vectors are linearly independent in R^3, we solve the system

2α1 − α2 + 2α3 + 2α4 = 0
α1 − α2 − 2α3 − 4α4 = 0
α1 − 2α2 − 2α3 − 10α4 = 0

As we have seen previously (Example 67), it is solved by the vectors

(−(10/3)t, −6t, −(2/3)t, t)     (3.3)

for each t ∈ R. Therefore, (0, 0, 0, 0) is not the unique solution of the system, and so the
vectors x^1, x^2, x^3, and x^4 are linearly dependent. Indeed, by setting for example t = 1 in
(3.3), the set of four numbers

(α1, α2, α3, α4) = (−10/3, −6, −2/3, 1)

is a set of scalars, not all zero, such that α1 x^1 + α2 x^2 + α3 x^3 + α4 x^4 = 0. N

Example 75 Two vectors x^1 and x^2 that are linearly dependent are called collinear. This
happens when either one of the two vectors is 0 or there exists α ≠ 0 such that x^1 = αx^2. In
other words, when there exist two scalars α1 and α2, not both zero, such that α1 x^1 = α2 x^2.
Geometrically, two vectors in the plane are collinear when they belong to the same straight
line passing through the origin. N

Subsets retain linear independence.

Proposition 76 Subsets of a linearly independent set are, in turn, linearly independent.

The simple proof is left to the reader, who can also check that if we add vectors to a
linearly dependent set, the set remains linearly dependent.

3.3 Linear combinations

Definition 77 A vector x ∈ R^n is said to be a linear combination of the vectors x^1, ..., x^m
of R^n if there exist m scalars {α1, ..., αm} such that

x = α1 x^1 + ··· + αm x^m

The scalars αi are called the coefficients of the linear combination.

Example 78 Consider the two versors e^1 = (1, 0, 0) and e^2 = (0, 1, 0) in R^3. A vector of R^3
is a linear combination of e^1 and e^2 when it has the form (α1, α2, 0) for α1, α2 ∈ R. Indeed,
(α1, α2, 0) = α1 e^1 + α2 e^2. N

The notion of linear combination allows us to establish a remarkable characterization of
linear dependence.

Theorem 79 A finite set of vectors S of R^n, with at least two elements, is linearly dependent
if and only if there exists at least an element of S that is a linear combination of other
elements of S.⁶

Proof "Only if". Let S = {x^i}_{i=1}^m be a linearly dependent set of R^n, with m ≥ 2. Then,
there exists a set {αi}_{i=1}^m of scalars, not all zero, such that

α1 x^1 + α2 x^2 + ··· + αm x^m = 0

Without loss of generality, assume that α1 ≠ 0. Thus, we can write

x^1 = −(α2/α1) x^2 − (α3/α1) x^3 − ··· − (αm/α1) x^m

and so x^1 is a linear combination of the vectors x^2, ..., x^m. That is, vector x^1 of S is a linear
combination of other elements of S.
"If". Suppose that the vector x^k of a finite set S = {x^i}_{i=1}^m is a linear combination of
other elements of S. Without loss of generality, assume k = 1. Then, there exists a set
{αi}_{i=2}^m of scalars such that x^1 = α2 x^2 + ··· + αm x^m. Define the scalars {βi}_{i=1}^m as follows:

βi = −1 if i = 1 and βi = αi if i ≥ 2

By construction, {βi}_{i=1}^m is a set of scalars, not all zero, such that ∑_{i=1}^{m} βi x^i = 0. Indeed,

∑_{i=1}^{m} βi x^i = −x^1 + α2 x^2 + α3 x^3 + ··· + αm x^m = −x^1 + x^1 = 0

It follows that {x^i}_{i=1}^m is a linearly dependent set.

Example 80 Consider the vectors x^1 = (1, 3, 4), x^2 = (2, 5, 1), and x^3 = (0, 1, 7) in R^3.
Since x^3 = 2x^1 − x^2, the third vector is a linear combination of the other two. By Theorem
79, the set {x^1, x^2, x^3} is linearly dependent. It is immediate to check that also each of the
vectors in the set {x^1, x^2, x^3} is a linear combination of the other two, something that, as
the next example shows, does not hold in general for sets of linearly dependent vectors. N

Example 81 Consider the vectors x^1 = (1, 3, 4), x^2 = (2, 6, 8), and x^3 = (2, 5, 1) in R^3.
Since x^2 = 2x^1, the second vector is a multiple (so, a linear combination) of the first vector.
By Theorem 79, the set {x^1, x^2, x^3} is linearly dependent. Note how x^3 is not a linear
combination of x^1 and x^2, i.e., there are no α1, α2 ∈ R such that x^3 = α1 x^1 + α2 x^2. In
conclusion, Theorem 79 ensures that, in a set of linearly dependent vectors, some of them
are linear combinations of others, but this is not necessarily the case for all the vectors of the
set. For example, this happened for all the vectors in the previous example, but not in this
example. N
⁶ The reader can check that a set S with one element – i.e., a singleton – is linearly dependent if and only
if S = {0}.
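Whether a given vector is a linear combination of others can also be settled numerically, by minimizing the distance between the vector and such combinations: a strictly positive minimum means that no exact combination exists. A small sketch for Example 81 (numpy assumed, our own illustration):

    import numpy as np

    x1, x2 = np.array([1., 3., 4.]), np.array([2., 6., 8.])
    x3 = np.array([2., 5., 1.])

    # Solve min over (a1, a2) of || a1*x1 + a2*x2 - x3 ||
    B = np.column_stack([x1, x2])
    coef, *_ = np.linalg.lstsq(B, x3, rcond=None)
    print(np.linalg.norm(B @ coef - x3))   # about 3.61 > 0: x3 is not a combination of x1, x2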

The next result is an immediate, yet fundamental, consequence of Theorem 79.

Corollary 82 A finite set S ≠ {0} of R^n is linearly independent if and only if none of the
vectors in S is a linear combination of other vectors in S.

3.4 Generated subspaces

Let S be a non-empty set of vectors of R^n and {Vi} the collection of all vector subspaces that
contain S. The collection is non-empty because, trivially, R^n contains S and is, therefore,
an element of the collection. By Proposition 68, the intersection ∩i Vi of all such subspaces
is itself a vector subspace of R^n that contains S. So it is the smallest – under set inclusion
– vector subspace of R^n that contains S: for each such subspace V, we have ∩i Vi ⊆ V.

The vector subspace ∩i Vi is very important and is called the vector subspace generated
or spanned by S, denoted by span S. In other words, span S is the smallest "enlargement"
of S with the property of being a vector subspace.

The next result shows that span S has a "concrete" representation in terms of linear
combinations of S.

Theorem 83 Let S be a set of R^n. A vector x ∈ R^n belongs to span S if and only if it is a
linear combination of vectors of S.

Proof We need to prove that x ∈ R^n belongs to span S if and only if there exist a finite
set {x^i}_{i∈I} of vectors in S and a finite set {αi}_{i∈I} of scalars such that x = ∑_{i∈I} αi x^i. "If".
Let x ∈ R^n be a linear combination of a finite set {x^i}_{i∈I} of vectors of S. For simplicity,
set {x^i}_{i∈I} = {x^1, ..., x^k}. There exists, therefore, a set {αi}_{i=1}^k of real numbers such that
x = ∑_{i=1}^{k} αi x^i. By the definition of vector subspace, we have α1 x^1 + α2 x^2 ∈ span S since
x^1, x^2 ∈ span S. In turn, α1 x^1 + α2 x^2 ∈ span S implies α1 x^1 + α2 x^2 + α3 x^3 ∈ span S,
and by proceeding in this way we get that x = ∑_{i=1}^{k} αi x^i ∈ span S, as claimed.
"Only if". Let V be the set of all vectors x ∈ R^n that can be expressed as linear
combinations of vectors of S, that is, x ∈ V if there exist finite sets {x^i}_{i∈I} ⊆ S and
{αi}_{i∈I} ⊆ R such that x = ∑_{i∈I} αi x^i. It is easy to see that V is a vector subspace of R^n
containing S. It follows that span S ⊆ V and so each x ∈ span S is a linear combination of
vectors of S.

Before illustrating the theorem with some examples, we state a simple consequence.

Corollary 84 Let S be a set of R^n. If x ∈ R^n is a linear combination of vectors of S, then
span S = span (S ∪ {x}).

In words, the vector subspace generated by a set does not change by adding to the set a
vector that is already a linear combination of its elements. The "generative" capability of a
set is not improved by adding to it vectors that are linear combinations of its elements.

Example 85 Let S = {x^1, ..., x^k} ⊆ R^n. By Theorem 83,

span S = {x ∈ R^n : x = ∑_{i=1}^{k} αi x^i with αi ∈ R for each i = 1, ..., k}
       = {∑_{i=1}^{k} αi x^i : αi ∈ R for each i = 1, ..., k}

N

Example 86 Let S = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} ⊆ R^3. We have

span S = {x ∈ R^3 : x = α1 (1, 0, 0) + α2 (0, 1, 0) + α3 (0, 0, 1) with each αi ∈ R}
       = {(α1, α2, α3) : αi ∈ R for every i = 1, 2, 3} = R^3

More generally, for S = {e^1, ..., e^n} ⊆ R^n we have

span S = {x ∈ R^n : x = ∑_{i=1}^{n} αi e^i with each αi ∈ R}
       = {(α1, α2, ..., αn) : αi ∈ R for every i = 1, ..., n} = R^n

N

Example 87 If S = {x}, then span S = {αx : α ∈ R}. For example, let x = (2, 3) ∈ R^2.
We have

span S = {(2α, 3α) : α ∈ R}

So, span S is the graph of the straight line y = (3/2)x that passes through the origin and
the point x. Graphically:

[Figure: the line through the origin and the point x = (2, 3), i.e., span {x}]

N

3.5 Bases

By Theorem 83, the subspace generated by a subset S of R^n is formed by all the linear
combinations of the vectors in S. Suppose that S is a linearly dependent set. By Theorem
79, some vectors in S are then linear combinations of other elements of S. By Corollary
84, such vectors are, therefore, redundant for the generation of span S. Indeed, if a vector
x ∈ S is a linear combination of other vectors of S, then by Corollary 84 we have

span S = span (S \ {x})

where S \ {x} is the set S without the vector x.

A linearly dependent set S thus contains some elements that are redundant for the
generation of span S. This does not happen if, on the contrary, S is a linearly independent
set: by Corollary 82, no vector of S can then be a linear combination of other elements of S.
In other words, when S is linearly independent, all its vectors are essential for the generation
of span S.
These observations lead us to the notion of basis.

Definition 88 A finite subset S of R^n is a basis of R^n if S is a linearly independent set
such that span S = R^n.

If S is a basis of R^n, we therefore have that:

(i) each x ∈ R^n can be represented as a linear combination of vectors in S;

(ii) all the vectors of S are essential for this representation, none of them is redundant.

The "essentiality" of a basis to represent, as linear combinations, the elements of R^n is
evident in the following result.

Theorem 89 A finite subset S of R^n is a basis of R^n if and only if each vector x ∈ R^n can
be written in only one way as a linear combination of vectors in S.
Proof "Only if". Let S = {x^i}_{i=1}^m be a basis of R^n. By Theorem 83 and since span S = R^n,
each vector x ∈ R^n can be represented as a linear combination of elements of S. Given
x ∈ R^n, suppose that there exist two sets of scalars {αi}_{i=1}^m and {βi}_{i=1}^m such that

x = ∑_{i=1}^{m} αi x^i = ∑_{i=1}^{m} βi x^i

Hence,

∑_{i=1}^{m} (αi − βi) x^i = 0

and, since the vectors in S are linearly independent, it follows that αi − βi = 0 for every
i = 1, ..., m. That is, αi = βi for every i = 1, ..., m.
"If". Let S = {x^1, ..., x^m} and suppose that each x ∈ R^n can be written in a unique way
as a linear combination of vectors in S. Clearly, by Theorem 83 we have R^n = span S. It

remains to prove that S is a linearly independent set. Suppose that the scalars {αi}_{i=1}^m are
such that

∑_{i=1}^{m} αi x^i = 0

Since we also have

∑_{i=1}^{m} 0 x^i = 0

we conclude that αi = 0 for every i = 1, ..., m because, by hypothesis, the vector 0 can be
written in only one way as a linear combination of vectors in S.

Example 90 The standard basis of R^n is given by the versors {e^1, ..., e^n}. Each vector
x ∈ R^n can be written, in a unique way, as a linear combination of these vectors. In
particular,

x = x1 e^1 + ··· + xn e^n = ∑_{i=1}^{n} xi e^i     (3.4)

That is, the coefficients of the linear combination are the components of the vector x. N

Example 91 The standard basis of R^2 is {(1, 0), (0, 1)}. But, there exist infinitely many
other bases of R^2: for example, S = {(1, 2), (0, 7)} is another basis of R^2. It is easy to prove
the linear independence of S. To show that span S = R^2, consider any vector x = (x1, x2) ∈
R^2. We need to show that there exist α1, α2 ∈ R such that

(x1, x2) = α1 (1, 2) + α2 (0, 7)

i.e., that solve the simple linear system

α1 = x1
2α1 + 7α2 = x2

Since

α1 = x1,  α2 = (x2 − 2x1)/7

solve the system, we conclude that S is indeed a basis of R^2. N
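Finding the coordinates of a vector in a given basis is, as in the example, just solving a linear system. A quick check (numpy assumed, our own illustration):

    import numpy as np

    # Coordinates of x in the basis {(1, 2), (0, 7)}: solve B a = x
    B = np.column_stack([(1, 2), (0, 7)])
    x = np.array([3., 13.])

    a = np.linalg.solve(B, x)
    print(a)        # [3. 1.]: indeed a1 = x1 and a2 = (x2 - 2*x1)/7
    print(B @ a)    # [ 3. 13.]: the combination recovers x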

Each vector of R^n can be represented ("recovered") as a linear combination of the vectors
of a basis of R^n. In a sense, a basis is therefore the "genetic code" for a vector space that
contains all the pieces of information necessary to identify its elements. Since there are
several bases of R^n, such pieces of "genetic" information can be encoded in different sets of
vectors. It is therefore important to understand what are the relations among the different
bases. They will become clear after the next theorem, whose remarkable implications make
it the deus ex machina of the chapter.

Theorem 92 For each linearly independent set {x^1, ..., x^k} of R^n with k ≤ n, there exist
n − k vectors x^{k+1}, ..., x^n such that the overall set {x^i}_{i=1}^n is a basis of R^n.

Because of its importance, we give two different proofs of the result. They both require
the following lemma.

Lemma 93 Let {b^1, ..., b^n} be a basis of R^n. If

x = c1 b^1 + ··· + cn b^n

with ci ≠ 0, then {b^1, ..., b^{i−1}, x, b^{i+1}, ..., b^n} is a basis of R^n.

Proof Without loss of generality suppose that c1 ≠ 0. We prove that {x, b^2, ..., b^n} is a
basis of R^n. As c1 ≠ 0, we can write

b^1 = (1/c1) x − (c2/c1) b^2 − ··· − (cn/c1) b^n

Therefore, for each choice of the coefficients {αi}_{i=1}^n ⊆ R we have

∑_{i=1}^{n} αi b^i = ∑_{i=2}^{n} αi b^i + α1 [(1/c1) x − ∑_{i=2}^{n} (ci/c1) b^i]
                = (α1/c1) x + ∑_{i=2}^{n} (αi − α1 ci/c1) b^i

It follows that

span {x, b^2, ..., b^n} = span {b^1, b^2, ..., b^n} = R^n

It remains to show that the set {x, b^2, ..., b^n} is linearly independent, so that we can conclude
that it is a basis of R^n. Let {αi}_{i=1}^n ⊆ R be coefficients for which

α1 x + ∑_{i=2}^{n} αi b^i = 0     (3.5)

If α1 ≠ 0, we have

x = −∑_{i=2}^{n} (αi/α1) b^i = 0 b^1 − ∑_{i=2}^{n} (αi/α1) b^i

Since x can be written in a unique way as a linear combination of the vectors of the basis
{b^i}_{i=1}^n, one gets that c1 = 0, which contradicts the hypothesis c1 ≠ 0. This means that
α1 = 0 and (3.5) simplifies to

0 b^1 + ∑_{i=2}^{n} αi b^i = 0

Since {b^1, ..., b^n} is a basis, one obtains α2 = ··· = αn = 0 = α1.

Proof 1 of Theorem 92 We proceed by (finite) induction.⁷ Initial step: the theorem holds
for k = 1. Indeed, consider a singleton {x},⁸ with x ≠ 0, and the standard basis {e^1, ..., e^n}
of R^n. As x = ∑_{i=1}^{n} xi e^i, there exists at least one index i such that xi ≠ 0. By Lemma 93,
{e^1, ..., e^{i−1}, x, e^{i+1}, ..., e^n} is a basis of R^n.
Induction step: suppose now that the theorem is true for each set of k − 1 vectors; we
want to show that it is true for each set of k vectors. Let therefore {x^1, ..., x^k} be a set of k
linearly independent vectors. The subset {x^1, ..., x^{k−1}} is linearly independent and has k − 1
elements. By the induction hypothesis, there exist n − (k − 1) vectors ỹ^k, ..., ỹ^n such that

⁷ See Appendix E for the induction principle.
⁸ A singleton {x} is linearly independent when αx = 0 implies α = 0, which amounts to require that x ≠ 0.

{x^1, ..., x^{k−1}, ỹ^k, ..., ỹ^n} is a basis of R^n. Therefore, there exist coefficients {αi}_{i=1}^n ⊆ R such
that

x^k = ∑_{i=1}^{k−1} αi x^i + ∑_{i=k}^{n} αi ỹ^i     (3.6)

As the vectors x^1, ..., x^{k−1}, x^k are linearly independent, at least one of the coefficients
{αi}_{i=k}^n is different from zero. Otherwise, x^k = ∑_{i=1}^{k−1} αi x^i and so the vector x^k would be
a linear combination of the vectors x^1, ..., x^{k−1}, something that by Corollary 82 cannot
happen. Let, without loss of generality, αk ≠ 0. By Lemma 93, {x^1, ..., x^k, ỹ^{k+1}, ..., ỹ^n} is
then a basis of R^n. This completes the induction.

Proof 2 of Theorem 92 The theorem holds for k = 1 (see the previous proof). So, let
1 < k ≤ n be the smallest integer for which the property is false. There exists a linearly
independent set {x^1, ..., x^k} such that there are no n − k vectors of R^n that, added to
{x^1, ..., x^k}, yield a basis of R^n. Given that {x^1, ..., x^{k−1}} is, in turn, linearly independent,
the minimality of k implies that there are x̄^k, ..., x̄^n such that {x^1, ..., x^{k−1}, x̄^k, ..., x̄^n} is a
basis of R^n. But then

x^k = c1 x^1 + ··· + c_{k−1} x^{k−1} + ck x̄^k + ··· + cn x̄^n

Given that {x^1, ..., x^k} is linearly independent, one cannot have ck = ··· = cn = 0. So,
cj ≠ 0 for some index j ∈ {k, ..., n}. By Lemma 93,

{x^1, ..., x^{k−1}, x̄^k, ..., x̄^{j−1}, x^k, x̄^{j+1}, ..., x̄^n}

is a basis of R^n, a contradiction.

The next result is a simple, but important, consequence of Theorem 92.

Corollary 94 (i) Each linearly independent set of R^n with n elements is a basis of R^n.

(ii) Each linearly independent set of R^n has at most n elements.

Proof (i) It is enough to set k = n in Theorem 92. (ii) Let S = {x^1, ..., x^k} be a linearly
independent set in R^n. We want to show that k ≤ n. By contradiction, suppose k > n.
Then, {x^1, ..., x^n} is in turn a linearly independent set and by point (i) is a basis of R^n.
Hence, the vectors x^{n+1}, ..., x^k are linear combinations of the vectors x^1, ..., x^n, which,
by Corollary 82, contradicts the linear independence of the vectors x^1, ..., x^k. Therefore,
k ≤ n, which completes the proof.

Example 95 By point (i), any two linearly independent vectors form a basis of R^2. Going
back to Example 91, it is therefore sufficient to verify that the vectors (1, 2) and (0, 7) are
linearly independent to conclude that S = {(1, 2), (0, 7)} is a basis of R^2. N

We can finally state the main result of this section.

Theorem 96 All bases of R^n have the same number n of elements.



Although the "genetic" information of R^n can be codified in different sets of vectors –
that is, in different bases – such sets thus have the same finite number of elements, that is,
the same "length". The number n can, therefore, be seen as the dimension of the space R^n.
Indeed, it is natural to think that the "greater" a space R^n is, the more elements its bases
have – that is, the greater is the quantity of information that the bases require in order to
represent all the elements of R^n through linear combinations.
Summing up, the number n that emerges from the last theorem indicates the "dimension"
of R^n and, in a sense, justifies its superscript n. This notion of dimension makes rigorous
the intuitive idea that R^n is a larger space than R^m when m < n.

Proof Recall that R^n has a basis of n elements (e.g., the standard basis seen in Example
90). Let S = {x^1, ..., x^k} be any other basis of R^n. By definition, S is linearly independent.
Then, by Corollary 94-(ii), it has at most n elements, that is, k ≤ n. To prove the result,
let us show that k = n. By contradiction, suppose that k < n. By Theorem 92, there exist
n − k vectors x^{k+1}, ..., x^n such that the set {x^1, ..., x^k, x^{k+1}, ..., x^n} is a basis of R^n. Yet,
since S = {x^1, ..., x^k} is a basis of R^n, any vector in {x^{k+1}, ..., x^n} is a linear combination
of vectors in S. This contradicts the linear independence of {x^1, ..., x^k, x^{k+1}, ..., x^n}. We
conclude that k = n.

3.6 Dimension

The notions introduced in the previous section for R^n extend in a natural way to its vector
subspaces.

Definition 97 Let V be a vector subspace of R^n. A finite subset S of V is a basis of V if
it is a linearly independent set such that span S = V.

In other words, a linearly independent set S is a basis of the vector subspace that it
generates. Through it we can represent – without redundancies – each vector of the subspace
as a linear combination.
The results of the previous section continue to hold and are similarly proved.⁹ We start
with Theorem 89.

Theorem 98 Let V be a vector subspace of R^n. A finite subset S of V is a basis of V if
and only if each x ∈ V can be written in a unique way as a linear combination of vectors in S.

Example 99 (i) The horizontal axis M = {x ∈ R^2 : x2 = 0} is a vector subspace of R^2
with the singleton {e^1} ⊆ M as a basis. (ii) The plane M = {x ∈ R^3 : x3 = 0} is a vector
subspace of R^3. The set {e^1, e^2} ⊆ M is a basis of M. N

Since V is a subset of R^n, it will have at most n linearly independent vectors. In partic-
ular, the following generalization of Theorem 92 holds.

Theorem 100 Let V be a vector subspace of R^n with a basis of m ≤ n elements. For
each linearly independent set of vectors x^1, ..., x^k of V, with k ≤ m, there exist m − k vectors
x^{k+1}, ..., x^m such that the set {x^i}_{i=1}^m is a basis of V.
⁹ For this reason we leave to the reader the proofs of the results of this section.

In turn, this theorem leads to the following extension of Theorem 96.

Theorem 101 All bases of a vector subspace of R^n have the same number of elements.

Although in view of Theorem 96 the result is not surprising, it remains of great elegance
because it shows how, despite their diversity, the bases share a fundamental characteristic
like the cardinality. This motivates the next important definition, which was implicit in the
discussion that followed Theorem 96.

Definition 102 The dimension of a vector subspace V of R^n is the number of elements of
any basis of V.

By Theorem 101, this number is unique. We denote it by dim V. It is the notion of dimen-
sion that, indeed, makes this (otherwise routine) section interesting, as the next examples
show.

Example 103 In the special case V = R^n we have dim R^n = n, which makes rigorous the
discussion that followed Theorem 96. N

Example 104 (i) The horizontal axis is a vector subspace of dimension one of R^2. (ii)
The plane M = {x ∈ R^3 : x3 = 0} is a vector subspace of dimension two of R^3, that is,
dim M = 2. N

Example 105 If V = {0}, that is, if V is the trivial vector subspace formed only by the
origin 0, we set dim V = 0. Indeed, V does not contain linearly independent vectors (why?)
and, therefore, it has as basis the empty set ∅. N

3.7 Post scriptum

A terminological caveat In discussing enumerated lists of vectors {x^i}_{i=1}^m we often refer
to them as "sets of vectors". Yet, enumerated lists allow for repetitions: to different indexes,
say i ≠ j, may correspond the same element, i.e., x^i = x^j. We thus have a (standard) abuse
of terminology because sets are collections of distinguishable elements, so repetitions are
forbidden – e.g., to the (linearly dependent) set of vectors {(0, 1), (0, 1), (1, 0)} corresponds
the actual set {(0, 1), (1, 0)}. Hopefully, this abuse should not cause any confusion. In math-
ematics, abuses of notation and terminology are not rare in that they often ease exposition
(they are mathematics' poetic licences).

Some high school algebra We solve the system of equations

2x1 − x2 + 2x3 + 2x4 = 0
x1 − x2 − 2x3 − 4x4 = 0
x1 − 2x2 − 2x3 − 10x4 = 0

in Example 67 through a simple high school argument. Consider x4 as a known term and
solve the system in x1 , x2 , and x3 ; clearly, we will get solutions that depend on the value of
the parameter x4 :
8 8
< 2x1 x2 + 2x3 + 2x4 = 0 < 2x1 x2 = 2x3 2x4
x1 x2 2x3 4x4 = 0 ) x1 x2 = 2x3 + 4x4 )
: :
x1 2x2 2x3 10x4 = 0 x1 2x2 2x3 10x4 = 0
8 8
< 2 (x2 + 2x3 + 4x4 ) x2 = 2x3 2x4 < x2 = 6x3 10x4
x1 + ( 2x3 2x4 2x1 ) = 2x3 + 4x4 ) x1 = 4x3 6x4 )
: :
x1 2x2 2x3 10x4 = 0 x1 2x2 2x3 10x4 = 0
8 8
< x2 = 6x3 10x4 < x2 = 6x3 10x4
x1 = 4x3 6x4 ) x1 = 4x3 6x4 )
: :
( 4x3 6x4 ) 2 ( 6x3 10x4 ) 2x3 10x4 = 0 x3 = 32 x4
8 2
8
< x2 = 6 3 x4 10x4 < x2 = 6x4
2
x1 = 4 x
3 4 6x 4 ) x1 = 10 3 x4
: 2 : 2
x3 = 3 x4 x3 = 3 x4

In conclusion, the vectors of R4 of the form (3.2) are the solutions of the system for every
t 2 R.
Chapter 4

Euclidean structure (sdoganato)

4.1 Absolute value and norm

4.1.1 Inner product

The operations of addition and scalar multiplication and their properties determine the linear
structure of R^n. The operation of inner product and its properties characterize, instead, the
Euclidean structure of R^n that will be the subject matter of this chapter.
Recall from Section 2.2 that the inner product x · y of two vectors in R^n is defined by

x · y = x1y1 + x2y2 + ··· + xnyn = ∑_{i=1}^{n} xi yi

and is commutative, x · y = y · x, and distributive, (αx + βy) · z = α(x · z) + β(y · z). Note,
moreover, that

x · x = ∑_{i=1}^{n} xi² ≥ 0

The sum of the squares of the components of a vector is thus the inner product of the vector
with itself. This simple observation will be central in this chapter because it will allow us
to define the fundamental notion of norm using the inner product. In this regard, note that
x · x = 0 if and only if x = 0: a sum of squares is zero if and only if all addends are zero.
Before studying the norm we introduce the absolute value, its scalar version which is
probably already familiar to the reader.

4.1.2 Absolute value

The absolute value |x| of a scalar x ∈ R is

|x| = x if x ≥ 0 and |x| = −x if x < 0

For example, |5| = |−5| = 5. These equalities exemplify the, readily checked, symmetry of
the absolute value:

|x| = |−x|  ∀x ∈ R     (4.1)


Geometrically, the absolute value represents the distance of a scalar from the origin. It
satisfies the following elementary properties that the reader can verify:

(i) |x| ≥ 0 for every x ∈ R;

(ii) |x| = 0 if and only if x = 0;

(iii) |xy| = |x| |y| for every x, y ∈ R;

(iv) |x + y| ≤ |x| + |y| for every x, y ∈ R.

Property (iii) includes as a special case the symmetry property (4.1) since

|−x| = |(−1)x| = |−1| |x| = |x|

Property (iv) is called triangle inequality. It is equivalent to the inequality

|x − y| ≤ |x| + |y|  ∀x, y ∈ R     (4.2)

because |x − y| = |x + (−y)| ≤ |x| + |−y| = |x| + |y| and, vice versa, |x + y| = |x − (−y)| ≤
|x| + |−y| = |x| + |y|.
Earlier in the book we agreed to consider only the positive root √x of a positive scalar
x (Section 1.5). For example, √25 = 5. Formally, this amounts to

√(x²) = |x|  ∀x ∈ R     (4.3)

as it is easy to check. This formula is a useful algebraic characterization, via the square root,
of the absolute value. For instance, it delivers an alternative, algebraic, proof of the triangle
inequality. Indeed, for every x, y ∈ R we have

|x + y| ≤ |x| + |y| ⟺ √((x + y)²) ≤ √(x²) + √(y²) ⟺ (x + y)² ≤ (√(x²) + √(y²))²
⟺ x² + y² + 2xy ≤ x² + y² + 2√(x²y²) ⟺ xy ≤ √(x²y²) ⟺ xy ≤ |xy|

Since xy ≤ |xy| trivially holds, we conclude that |x + y| ≤ |x| + |y|, thus proving the triangle
inequality.

An easily checked order characterization of the absolute value is given by

max {x, −x} = |x|  ∀x ∈ R     (4.4)

For instance, max {5, −5} = |−5| = |5| = 5. A nice dividend of this characterization is the
following important equivalence:

|x| < c ⟺ −c < x < c  ∀c > 0     (4.5)

Indeed, max {x, −x} < c if and only if x < c and −x < c, i.e., if and only if x < c and
x > −c. This equivalence also shows how absolute values may permit to write inequalities
more compactly, something that often will come in handy.

We close with a further nice and important inequality involving absolute values.

Lemma 106 Let x, y ∈ R. Then:

||x| − |y|| ≤ |x − y|     (4.6)

Proof For each x, y ∈ R it holds

|x| = |x − y + y| ≤ |x − y| + |y|

i.e., |x| − |y| ≤ |x − y|. By interchanging the roles of x and y, it also holds |y| − |x| ≤ |y − x| =
|x − y|. We conclude that

max {|x| − |y|, −(|x| − |y|)} = max {|x| − |y|, |y| − |x|} ≤ |x − y|

In view of the order characterization (4.4), the inequality (4.6) then holds.

Inequality (4.6) is easily seen to be equivalent to ||x| − |y|| ≤ |x + y| for all x, y ∈ R. So,
we can combine it with the triangle inequality in a single expression by writing

||x| − |y|| ≤ |x + y| ≤ |x| + |y|

for all x, y ∈ R.

4.1.3 Norm

The notion of norm generalizes that of absolute value to R^n. Specifically, the (Euclidean)
norm of a vector x ∈ R^n, denoted by ‖x‖, is given by

‖x‖ = (x · x)^{1/2} = √(x1² + x2² + ··· + xn²)

When n = 1, the norm reduces to the absolute value; indeed, by (4.3) we have

‖x‖ = √(x²) = |x|  ∀x ∈ R

For example, if x = −4 we have ‖x‖ = √((−4)²) = √16 = 4 = |−4| = |x|.

The norm thus extends to R^n the absolute value by levering on its square root, algebraic,
characterization (4.3).¹ Geometrically, the norm of a vector x = (x1, x2) of the plane is the
length of the segment that joins it with the origin, that is, it is the distance of the vector
from the origin. Indeed, this length is, by Pythagoras' Theorem, exactly ‖x‖ = √(x1² + x2²).

¹ Later in the book (Section 20.1) we will study the modulus, an extension of the absolute value to R^n
based on its order characterization (4.4).

A similar geometric interpretation holds for $n = 3$, but is obviously lost when $n \geq 4$.


Example 107 (i) If $x = (1, -1) \in \mathbb{R}^2$, then $\|x\| = \sqrt{1^2 + (-1)^2} = \sqrt{2}$.

(ii) If $x = (a, a^2) \in \mathbb{R}^2$, with $a \in \mathbb{R}$, then $\|x\| = \sqrt{a^2 + (a^2)^2} = \sqrt{a^2 + a^4} = |a| \sqrt{1 + a^2}$.

(iii) If $x = (a, 2a, -a) \in \mathbb{R}^3$, then $\|x\| = \sqrt{a^2 + (2a)^2 + (-a)^2} = |a| \sqrt{6}$.

(iv) If $x = (2, \pi, \sqrt{2}, 3) \in \mathbb{R}^4$, then $\|x\| = \sqrt{2^2 + \pi^2 + (\sqrt{2})^2 + 3^2} = \sqrt{15 + \pi^2}$. N
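A minimal numerical sketch (ours, not from the book): NumPy's linalg.norm computes exactly the Euclidean norm above, as does taking the square root of the inner product $x \cdot x$.

    import numpy as np

    x = np.array([3.0, 6.0, -3.0])   # item (iii) with a = 3: norm |a| sqrt(6)
    print(np.linalg.norm(x))         # ~7.348469
    print(3 * np.sqrt(6.0))          # same value
    print(np.sqrt(x @ x))            # same value via the inner product x.x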

The norm satisfies some elementary properties that extend to $\mathbb{R}^n$ those of the absolute
value. The next result gathers the simplest ones.

Proposition 108 Let $x \in \mathbb{R}^n$ and $\alpha \in \mathbb{R}$. Then:

(i) $\|x\| \geq 0$;

(ii) $\|x\| = 0$ if and only if $x = 0$;

(iii) $\|\alpha x\| = |\alpha| \|x\|$.

Proof We prove point (ii), leaving the other points to the reader. If $x = 0 = (0, 0, \dots, 0)$,
then $\|x\| = \sqrt{0 + 0 + \cdots + 0} = 0$. Vice versa, if $\|x\| = 0$ then
$$0 = \|x\|^2 = x_1^2 + x_2^2 + \cdots + x_n^2 \tag{4.7}$$
Since $x_i^2 \geq 0$ for each $i = 1, 2, \dots, n$, from (4.7) it follows that $x_i^2 = 0$ for each $i$ since a sum
of squares is zero if and only if they are all zero.

Property (iii) extends the property $|xy| = |x| |y|$ of the absolute value. The famous
Cauchy-Schwarz inequality is a different, more subtle, extension of this property.

Theorem 109 (Cauchy-Schwarz) Let $x, y \in \mathbb{R}^n$. Then:
$$|x \cdot y| \leq \|x\| \|y\| \tag{4.8}$$
Equality holds if and only if the vectors $x$ and $y$ are collinear.²

² Recall that two vectors are collinear if they are linearly dependent (Example 75).

Proof Let $x, y \in \mathbb{R}^n$ be any two vectors. If either $x = 0$ or $y = 0$, the result is trivially
true. Indeed, in this case we have $|x \cdot y| = 0 = \|x\| \|y\|$ and, moreover, the two vectors are
trivially collinear, consistently with the fact that in (4.8) we have equality.

So, let us assume that $x$ and $y$ are both different from $0$. Note that $(x + ty) \cdot (x + ty) = \|x + ty\|^2 \geq 0$ for all $t \in \mathbb{R}$. Therefore,
$$0 \leq (x + ty) \cdot (x + ty) = x \cdot x + 2t (x \cdot y) + t^2 (y \cdot y) = a t^2 + b t + c$$
where $a = y \cdot y$, $b = 2 (x \cdot y)$ and $c = x \cdot x$. From high school algebra we know that $a t^2 + b t + c \geq 0$
for all $t \in \mathbb{R}$ only if the discriminant $\Delta = b^2 - 4ac$ is less than or equal to $0$. Therefore,
$$0 \geq \Delta = b^2 - 4ac = 4 (x \cdot y)^2 - 4 (x \cdot x)(y \cdot y) = 4 \left( (x \cdot y)^2 - \|x\|^2 \|y\|^2 \right) \tag{4.9}$$
Hence
$$(x \cdot y)^2 \leq \|x\|^2 \|y\|^2$$
and, by taking square roots of both sides, we obtain the Cauchy-Schwarz inequality (4.8).

It remains to prove that equality holds if and only if the vectors $x$ and $y$ are collinear.
"Only if". Let us assume that (4.8) holds as an equality. Then, by (4.9), it follows that $\Delta = 0$.
Thus, there exists a point $\hat{t}$ where the parabola $a t^2 + b t + c$ assumes value $0$, i.e.,
$$0 = (x + \hat{t} y) \cdot (x + \hat{t} y) = \left\| x + \hat{t} y \right\|^2$$
By Proposition 108, this implies that $x + \hat{t} y = 0$, i.e., $x = -\hat{t} y$. "If". If $x$ and $y$ are collinear,
then $x = -\hat{t} y$ for some scalar $\hat{t}$. Then, $0 = 0 \cdot 0 = (x + \hat{t} y) \cdot (x + \hat{t} y)$. This implies that the
parabola $a t^2 + b t + c$, besides being always positive, assumes value $0$ at the point $\hat{t}$, and thus
the discriminant $\Delta$ must be zero. By (4.9), we deduce that (4.8) holds as an equality.

The Cauchy-Schwarz inequality allows us to prove the triangle inequality for the norm,
thereby completing the extension to the norm of the properties (i)-(iv) of the absolute value.

Corollary 110 Let $x, y \in \mathbb{R}^n$. Then:
$$\|x + y\| \leq \|x\| + \|y\| \tag{4.10}$$

Proof We proceed as in the (algebraic) proof of the triangle inequality for the scalar case.
Compared to it, we only need to resort to the Cauchy-Schwarz inequality for the very last
step that, instead, in the scalar case was trivially true. Note that
$$\|x + y\| \leq \|x\| + \|y\| \iff \|x + y\|^2 \leq \|x\|^2 + \|y\|^2 + 2 \|x\| \|y\|$$
$$\iff x \cdot x + 2 (x \cdot y) + y \cdot y \leq x \cdot x + y \cdot y + 2 \|x\| \|y\| \iff x \cdot y \leq \|x\| \|y\|$$

Since the last inequality follows from the Cauchy-Schwarz inequality, the statement follows
too.

We close by extending to the norm the inequality (4.6).

Proposition 111 Let $x, y \in \mathbb{R}^n$. Then:
$$\left| \|x\| - \|y\| \right| \leq \|x - y\| \tag{4.11}$$

Proof The proof is similar, mutatis mutandis, to that of (4.6). For each $x, y \in \mathbb{R}^n$ it holds
$$\|x\| = \|x - y + y\| \leq \|x - y\| + \|y\|$$
i.e., $\|x\| - \|y\| \leq \|x - y\|$. By interchanging the roles of $x$ and $y$, it also holds $\|y\| - \|x\| \leq \|y - x\| = \|x - y\|$. We conclude that (4.11) holds.
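The three inequalities (4.8), (4.10) and (4.11) lend themselves to a quick numerical spot check. The sketch below is ours, not part of the original text; naturally, no finite test replaces the proofs above.

    import numpy as np

    rng = np.random.default_rng(0)
    for _ in range(1000):
        n = int(rng.integers(1, 6))
        x, y = rng.normal(size=n), rng.normal(size=n)
        nx, ny = np.linalg.norm(x), np.linalg.norm(y)
        assert abs(x @ y) <= nx * ny + 1e-9                   # Cauchy-Schwarz (4.8)
        assert np.linalg.norm(x + y) <= nx + ny + 1e-9        # triangle inequality (4.10)
        assert abs(nx - ny) <= np.linalg.norm(x - y) + 1e-9   # inequality (4.11)
    print("all norm inequalities verified on random vectors")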

4.2 Orthogonality
4.2.1 Normalized vectors
Vectors with norm 1, called unit vectors, play a special role in linear algebra. In the next
figure the vectors $(\sqrt{2}/2, \sqrt{2}/2)$ and $(-\sqrt{3}/2, 1/2)$ are two unit vectors in $\mathbb{R}^2$:

[Figure: two unit vectors in $\mathbb{R}^2$.]

Note that, for any vector $x \neq 0$, the vector
$$\frac{x}{\|x\|}$$
is a unit vector: to "normalize" a vector it is enough to divide it by its own norm. Indeed, we
have
$$\left\| \frac{x}{\|x\|} \right\| = \frac{1}{\|x\|} \|x\| = 1 \tag{4.12}$$
where, being $\|x\|$ a scalar, the first equality follows from Proposition 108-(iii).
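In code, normalization is a one-liner; the sketch below (ours, hypothetical) guards against the zero vector, for which (4.12) is undefined.

    import numpy as np

    def normalize(x):
        # return x / ||x||; the zero vector cannot be normalized
        n = np.linalg.norm(x)
        if n == 0:
            raise ValueError("cannot normalize the zero vector")
        return x / n

    u = normalize(np.array([3.0, 4.0]))
    print(u, np.linalg.norm(u))   # [0.6 0.8] 1.0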

The unit vectors
$$e^1 = (1, 0, 0, \dots, 0)$$
$$e^2 = (0, 1, 0, \dots, 0)$$
$$\vdots$$
$$e^n = (0, 0, \dots, 0, 1)$$
are the versors of $\mathbb{R}^n$ introduced in Chapter 3. To see their special status, note that in $\mathbb{R}^2$
they are
$$e^1 = (1, 0) \quad \text{and} \quad e^2 = (0, 1)$$
and lie on the horizontal and on the vertical axes, respectively. In particular, $e^1, e^2$
belong to the Cartesian axes of $\mathbb{R}^2$:

[Figure: the versors $e^1$, $e^2$ and their opposites $-e^1$, $-e^2$ on the Cartesian axes of $\mathbb{R}^2$.]

In $\mathbb{R}^3$ the versors are
$$e^1 = (1, 0, 0), \quad e^2 = (0, 1, 0) \quad \text{and} \quad e^3 = (0, 0, 1)$$
Also in this case, $e^1, e^2, e^3$ belong to the Cartesian axes of $\mathbb{R}^3$.

4.2.2 Orthogonality
Through a simple trigonometric analysis, Appendix C.3 shows that two vectors $x$ and $y$ of
the plane can be regarded as perpendicular when their inner product is zero, i.e., $x \cdot y = 0$.
This suggests the following definition.

Definition 112 Two vectors $x, y \in \mathbb{R}^n$ are said to be orthogonal (or perpendicular), written
$x \perp y$, if $x \cdot y = 0$.

From the commutativity of the inner product it follows that $x \perp y$ is equivalent to $y \perp x$.

Example 113 (i) Two different versors are orthogonal. For example, for $e^1$ and $e^2$ in $\mathbb{R}^3$
we have $e^1 \cdot e^2 = (1, 0, 0) \cdot (0, 1, 0) = 0$. (ii) The vectors $(\sqrt{2}/2, \sqrt{6}/2)$ and $(-\sqrt{3}/2, 1/2)$ are
orthogonal:
$$\left( \frac{\sqrt{2}}{2}, \frac{\sqrt{6}}{2} \right) \cdot \left( -\frac{\sqrt{3}}{2}, \frac{1}{2} \right) = -\frac{\sqrt{6}}{4} + \frac{\sqrt{6}}{4} = 0$$
N

The next result clarifies the importance of orthogonality.

Theorem 114 (Pythagoras) Let $x, y \in \mathbb{R}^n$. If $x \perp y$, then $\|x + y\|^2 = \|x\|^2 + \|y\|^2$.

Proof We have
$$\|x + y\|^2 = (x + y) \cdot (x + y) = \|x\|^2 + x \cdot y + y \cdot x + \|y\|^2 = \|x\|^2 + \|y\|^2$$
as desired.

The basic Pythagoras' Theorem is the case n = 2. Thanks to the notion of orthogonality,
we established a general version for Rn of this celebrated result of Greek mathematics.

Orthogonality extends in a natural way to sets of vectors.

Definition 115 A set of vectors of $\mathbb{R}^n$ is said to be orthogonal if its elements are pairwise
orthogonal vectors.

The set $\{e^1, \dots, e^n\}$ of the versors is the most classic example of an orthogonal set. Indeed,
$e^i \cdot e^j = 0$ for every $1 \leq i \neq j \leq n$.

A remarkable property of orthogonal sets is linear independence, as we show next.³ In
view of Corollary 94-(ii), this implies inter alia that an orthogonal set of non-zero vectors has
at most $n$ elements.

³ In reading this result, recall that a set of vectors containing the zero vector is necessarily linearly dependent
(see Example 72).

Proposition 116 Any orthogonal set that does not contain the zero vector is linearly independent.

Proof Let $\{x^1, \dots, x^k\}$ be an orthogonal set of $\mathbb{R}^n$ and $\{\alpha_1, \dots, \alpha_k\}$ a set of scalars such that
$\sum_{i=1}^{k} \alpha_i x^i = 0$. We have to show that $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$. We have:
$$0 = \left( \sum_{j=1}^{k} \alpha_j x^j \right) \cdot 0 = \left( \sum_{j=1}^{k} \alpha_j x^j \right) \cdot \left( \sum_{i=1}^{k} \alpha_i x^i \right) = \sum_{j=1}^{k} \sum_{i=1}^{k} \alpha_j \alpha_i \left( x^j \cdot x^i \right) = \sum_{j=1}^{k} \alpha_j^2 \left( x^j \cdot x^j \right) = \sum_{j=1}^{k} \alpha_j^2 \left\| x^j \right\|^2$$
where the penultimate equality uses the hypothesis that the vectors are pairwise orthogonal,
i.e., $x^i \cdot x^j = 0$ for every $i \neq j$. Hence, $0 = \sum_{i=1}^{k} \alpha_i^2 \|x^i\|^2$ and so $\alpha_i^2 \|x^i\|^2 = 0$ for every
$i = 1, \dots, k$. Since none of the vectors $x^i$ is zero, we thus have $\alpha_i = 0$ for every $i = 1, 2, \dots, k$.
This yields that $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$, as desired.

An orthogonal set of unit vectors is called orthonormal. The set $\{e^1, \dots, e^n\}$ is, for
example, orthonormal. In general, given an orthogonal set $\{x^1, \dots, x^k\}$ of non-zero vectors
of $\mathbb{R}^n$, the set
$$\left\{ \frac{x^1}{\|x^1\|}, \dots, \frac{x^k}{\|x^k\|} \right\}$$
obtained by dividing each element by its norm is orthonormal. Indeed, by (4.12) each vector
$x^i / \|x^i\|$ has norm 1, so it is a unit vector, and for every $i \neq j$ we have
$$\frac{x^i}{\|x^i\|} \cdot \frac{x^j}{\|x^j\|} = \frac{1}{\|x^i\| \|x^j\|} \left( x^i \cdot x^j \right) = 0$$

Example 117 Consider the following three orthogonal vectors in $\mathbb{R}^3$:
$$x^1 = (1, 1, 1), \quad x^2 = (-2, 1, 1), \quad x^3 = (0, -1, 1)$$
Then
$$\left\| x^1 \right\| = \sqrt{3}, \quad \left\| x^2 \right\| = \sqrt{6}, \quad \left\| x^3 \right\| = \sqrt{2}$$
By dividing each vector by its norm, we get the orthonormal vectors
$$\frac{x^1}{\|x^1\|} = \left( \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}} \right), \quad \frac{x^2}{\|x^2\|} = \left( -\frac{2}{\sqrt{6}}, \frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}} \right), \quad \frac{x^3}{\|x^3\|} = \left( 0, -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right)$$
In particular, these three vectors form an orthonormal basis. N
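A quick numerical sketch (ours, not from the book) of this example: stacking the normalized vectors as the rows of a matrix $Q$, orthonormality amounts to $Q Q^{T} = I$.

    import numpy as np

    X = np.array([[1.0, 1.0, 1.0],
                  [-2.0, 1.0, 1.0],
                  [0.0, -1.0, 1.0]])
    Q = X / np.linalg.norm(X, axis=1, keepdims=True)  # divide each row by its norm
    print(np.allclose(Q @ Q.T, np.eye(3)))            # True: the rows are orthonormal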

The orthonormal bases of $\mathbb{R}^n$, in primis the standard basis $\{e^1, \dots, e^n\}$, are the most
important bases of $\mathbb{R}^n$ because for them it is easy to determine the coefficients of the linear
combinations that represent the vectors of $\mathbb{R}^n$, as the next result shows.

Proposition 118 Let $\{x^1, \dots, x^n\}$ be an orthonormal basis of $\mathbb{R}^n$. For every $y \in \mathbb{R}^n$, we
have
$$y = \sum_{i=1}^{n} (y \cdot x^i) x^i \tag{4.13}$$

The coefficients $y \cdot x^i$ are called Fourier coefficients of $y$ (with respect to the given
orthonormal basis).

Proof Since $\{x^1, \dots, x^n\}$ is a basis, there exist $n$ scalars $\alpha_1, \alpha_2, \dots, \alpha_n$ such that
$$y = \sum_{i=1}^{n} \alpha_i x^i$$
For $j = 1, 2, \dots, n$ the scalar product $y \cdot x^j$ is
$$y \cdot x^j = \sum_{i=1}^{n} \alpha_i \left( x^i \cdot x^j \right)$$
Since $\{x^1, \dots, x^n\}$ is orthonormal, we have
$$x^i \cdot x^j = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases}$$
Hence $y \cdot x^j = \alpha_j$, from which the statement follows.



With respect to the standard basis $\{e^1, \dots, e^n\}$, each vector $y = (y_1, \dots, y_n) \in \mathbb{R}^n$ has the
Fourier coefficients $y \cdot e^i = y_i$. In this case, (4.13) thus reduces to (3.4), i.e., to
$$y = \sum_{i=1}^{n} y_i e^i$$
This way of writing vectors, which plays a key role in many results, is a special case of
the general expression (4.13). In other words, the components of a vector $y$ are its Fourier
coefficients with respect to the standard basis.

For a change, the next example considers an orthonormal basis different from the standard
basis.

Example 119 Take the orthonormal basis of $\mathbb{R}^3$ of Example 117, i.e.,
$$x^1 = \left( \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}} \right), \quad x^2 = \left( -\frac{2}{\sqrt{6}}, \frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}} \right), \quad x^3 = \left( 0, -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right)$$
Consider, for example, the vector $y = (2, 3, 4)$. Since
$$x^1 \cdot y = \frac{9}{\sqrt{3}}, \quad x^2 \cdot y = \frac{3}{\sqrt{6}}, \quad x^3 \cdot y = \frac{1}{\sqrt{2}}$$
we have
$$y = (x^1 \cdot y) x^1 + (x^2 \cdot y) x^2 + (x^3 \cdot y) x^3 = \frac{9}{\sqrt{3}} \left( \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}} \right) + \frac{3}{\sqrt{6}} \left( -\frac{2}{\sqrt{6}}, \frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}} \right) + \frac{1}{\sqrt{2}} \left( 0, -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right)$$
Thus, $9/\sqrt{3}$, $3/\sqrt{6}$, $1/\sqrt{2}$ are the Fourier coefficients of $y = (2, 3, 4)$ with respect to this
orthonormal basis of $\mathbb{R}^3$. N
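The computation in (4.13) is a matrix-vector product. A minimal sketch (ours): with the basis vectors as the rows of $Q$, the Fourier coefficients are $Qy$, and $y$ is recovered as their combination of the rows.

    import numpy as np

    Q = np.array([[1.0, 1.0, 1.0],
                  [-2.0, 1.0, 1.0],
                  [0.0, -1.0, 1.0]])
    Q = Q / np.linalg.norm(Q, axis=1, keepdims=True)  # rows: x^1, x^2, x^3

    y = np.array([2.0, 3.0, 4.0])
    coeffs = Q @ y                     # Fourier coefficients y . x^i
    print(coeffs)                      # ~[5.196 1.225 0.707] = 9/sqrt(3), 3/sqrt(6), 1/sqrt(2)
    print(np.allclose(coeffs @ Q, y))  # True: y = sum_i (y . x^i) x^i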

We close by showing that Pythagoras' Theorem extends to orthogonal sets of vectors.

Proposition 120 For an orthogonal set $\{x^1, \dots, x^k\}$ of vectors of $\mathbb{R}^n$ we have
$$\left\| \sum_{i=1}^{k} x^i \right\|^2 = \sum_{i=1}^{k} \left\| x^i \right\|^2$$

Proof We proceed by induction. Initial step: by Pythagoras' Theorem, the result holds for
$k = 2$. Induction step: assume that it holds for $k - 1$ (induction hypothesis), i.e.,
$$\left\| \sum_{i=1}^{k-1} x^i \right\|^2 = \sum_{i=1}^{k-1} \left\| x^i \right\|^2 \tag{4.14}$$
We want to show that this implies that it holds for $k$. Observe that, setting $y = \sum_{i=1}^{k-1} x^i$, we
have $y \perp x^k$. Indeed,
$$y \cdot x^k = \left( \sum_{i=1}^{k-1} x^i \right) \cdot x^k = \sum_{i=1}^{k-1} x^i \cdot x^k = 0$$
By Pythagoras' Theorem and (4.14), we have
$$\left\| \sum_{i=1}^{k} x^i \right\|^2 = \left\| \sum_{i=1}^{k-1} x^i + x^k \right\|^2 = \left\| y + x^k \right\|^2 = \|y\|^2 + \left\| x^k \right\|^2 = \left\| \sum_{i=1}^{k-1} x^i \right\|^2 + \left\| x^k \right\|^2 = \sum_{i=1}^{k-1} \left\| x^i \right\|^2 + \left\| x^k \right\|^2 = \sum_{i=1}^{k} \left\| x^i \right\|^2$$
as desired.
Chapter 5

Topological structure (sdoganato)

In this chapter we introduce the fundamental notion of distance between points of $\mathbb{R}^n$ that,
by formalizing the notion of "proximity", endows $\mathbb{R}^n$ with a topological structure.

5.1 Distances
The norm, studied in Section 4.1, allows us to define a distance in $\mathbb{R}^n$. We start with $n = 1$,
when the norm is simply the absolute value $|x|$. Consider two points $x$ and $y$ on the real
line, with $x > y$:

[Figure: two points $y < x$ on the real line.]

The distance between the two points is $x - y$, which is the length of the segment that joins
them. On the other hand, if we take any two points $x$ and $y$ on the real line, without knowing
their order (i.e., whether $x \geq y$ or $x \leq y$), the distance becomes the absolute value
$$|x - y|$$
of their difference. Indeed,
$$|x - y| = \begin{cases} x - y & \text{if } x \geq y \\ y - x & \text{if } x < y \end{cases}$$
and so the absolute value of the difference represents the distance between the two points,
independently of their order. In symbols, we write
$$d(x, y) = |x - y| \qquad \forall x, y \in \mathbb{R}$$
In particular, $d(0, x) = |x|$ and therefore the absolute value of a point $x \in \mathbb{R}$ can be regarded
as its distance from the origin.

Let us now consider $n = 2$. Take two vectors $x = (x_1, x_2)$ and $y = (y_1, y_2)$ in the plane:

[Figure: two vectors $x$ and $y$ in the plane, joined by a segment.]


The distance between $x$ and $y$ is given by the length of the segment that joins them (in
boldface in the figure). By Pythagoras' Theorem, this distance is
$$d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2} \tag{5.1}$$
since it is the hypotenuse of the right triangle whose catheti are the segments that join $x_i$
and $y_i$ for $i = 1, 2$. The following figure illustrates:

[Figure: the right triangle whose hypotenuse joins $x$ and $y$.]

The distance (5.1) is nothing but the norm of the vector $x - y$ (and also of $y - x$), i.e.,
$$d(x, y) = \|x - y\|$$
The distance between two vectors in $\mathbb{R}^2$ is, therefore, given by the norm of their difference.
It is easy to see that, by applying again Pythagoras' Theorem, the distance between two
vectors $x$ and $y$ in $\mathbb{R}^3$ is given by
$$d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2}$$
Therefore, we have again
$$d(x, y) = \|x - y\|$$
At this point we can generalize the notion of distance to any dimension n.

Definition 121 The (Euclidean) distance $d(x, y)$ between two vectors $x$ and $y$ in $\mathbb{R}^n$ is the
norm of their difference, i.e., $d(x, y) = \|x - y\|$.

In particular, $d(x, 0) = \|x\|$: the norm of a vector $x \in \mathbb{R}^n$ can be regarded as its distance
from the vector $0$ (i.e., the length of the segment that joins $0$ and $x$).

The following proposition collects the basic properties of the distance.


Proposition 122 Let x; y 2 Rn . Then:
(i) d (x; y) 0;
(ii) d (x; y) = 0 if and only if x = y;
(iii) d (x; y) = d (y; x);
(iv) d (x; y) d (x; z) + d (z; y) for every z 2 Rn .
Proof By Proposition 108-(i), we trivially have d (x; y) = kx yk 0. By Proposition
108-(ii),
d (x; y) = 0 () kx yk = 0 () x y = 0 () x = y
By Proposition 108-(iii),
d (x; y) = kx yk = k( 1) (y x)k = j( 1)j ky xk = d (y; x)
Finally, x z 2 Rn and observe that
d (x; y) = kx yk = k(x z) + (z y)k kx zk + kz yk = d (x; z) + d (z; y)
where the inequality follows from Corollary 110 (applied to the two vectors x z and z y).

Properties (i)-(iv) are natural for a notion of distance. Property (i) says that a distance
is always a positive quantity, which by (ii) is zero only between vectors that are equal (so,
the distance between distinct vectors is always strictly positive). Property (iii) says that
distance is a symmetric notion: in measuring a distance between two vectors, it does not
matter from which vector we take the measurement. Finally, property (iv) is the so-called
triangle inequality: for example, the distance between cities x and y cannot exceed the sum
of the distances between x and any other city z and between z and y: detours cannot reduce
the distance one needs to cover.
Example 123 (i) If $x = 1/3$ and $y = -1/3$, then
$$d(x, y) = \left| \frac{1}{3} - \left( -\frac{1}{3} \right) \right| = \left| \frac{2}{3} \right| = \frac{2}{3}$$
(ii) if $x = a$ and $y = a^2$ with $a \in \mathbb{R}$, then $d(x, y) = d(a, a^2) = |a - a^2| = |a| |1 - a|$;

(iii) if $x = (1, -3)$ and $y = (3, -1)$, then $d(x, y) = \sqrt{(1 - 3)^2 + (-3 - (-1))^2} = 2\sqrt{2}$;

(iv) if $x = (a, b)$ and $y = (-a, b)$ with $a, b \in \mathbb{R}$, then
$$d(x, y) = \sqrt{(a - (-a))^2 + (b - b)^2} = \sqrt{(2a)^2 + 0} = \sqrt{4a^2} = 2|a|$$
(v) if $x = (0, a, 0)$ and $y = (1, 0, -a)$ with $a \in \mathbb{R}$, then
$$d(x, y) = \sqrt{(0 - 1)^2 + (a - 0)^2 + (0 - (-a))^2} = \sqrt{1 + 2a^2}$$
N
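A small code sketch (ours, not from the book) of Definition 121, checked against items (iii) and (v) of this example (taking $a = 2$ in (v)).

    import numpy as np

    def dist(x, y):
        # Euclidean distance d(x, y) = ||x - y||
        return np.linalg.norm(np.asarray(x, float) - np.asarray(y, float))

    print(dist((1, -3), (3, -1)), 2 * np.sqrt(2.0))     # both ~2.828427
    print(dist((0, 2, 0), (1, 0, -2)), np.sqrt(1 + 8))  # a = 2: sqrt(1 + 2a^2) = 3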

5.2 Neighborhoods
Definition 124 We call neighborhood of center $x_0 \in \mathbb{R}^n$ and radius $\varepsilon > 0$, denoted by
$B_\varepsilon(x_0)$, the set
$$B_\varepsilon(x_0) = \{x \in \mathbb{R}^n : d(x, x_0) < \varepsilon\}$$

The neighborhood $B_\varepsilon(x_0)$ is, therefore, the locus of the points of $\mathbb{R}^n$ that lie at distance
strictly smaller than $\varepsilon$ from $x_0$.¹
In $\mathbb{R}$ the neighborhoods are the open intervals $(x_0 - \varepsilon, x_0 + \varepsilon)$, i.e.,
$$B_\varepsilon(x_0) = (x_0 - \varepsilon, x_0 + \varepsilon)$$
Indeed,
$$\{x \in \mathbb{R} : d(x, x_0) < \varepsilon\} = \{x \in \mathbb{R} : |x - x_0| < \varepsilon\} = \{x \in \mathbb{R} : -\varepsilon < x - x_0 < \varepsilon\} = (x_0 - \varepsilon, x_0 + \varepsilon)$$
where we have used (4.5), i.e., $|x| < a \iff -a < x < a$. Graphically:

[Figure: the interval $(x_0 - \varepsilon, x_0 + \varepsilon)$ on the real line.]

Hence, in $\mathbb{R}$ the neighborhoods are open intervals. It is easily seen that in $\mathbb{R}^2$ they are open
discs (so, without circumference), in $\mathbb{R}^3$ open balls (so, without surface) and so on. Indeed,
the points that lie at a distance strictly less than $\varepsilon$ from $x_0$ form an open, so "skinless", ball
of center $x_0$. Graphically, in the plane we have:

[Figure: the neighborhood $B_\varepsilon(x_0)$ in the plane: an open disc of center $x_0$ and radius $\varepsilon$.]

Next we give some examples of neighborhoods. To ease notation we write $B_\varepsilon(x_1, \dots, x_n)$
instead of $B_\varepsilon((x_1, \dots, x_n))$.

¹ In the mathematical jargon, they are "$\varepsilon$-close" to $x_0$.

Example 125 (i) We have $B_3(-1) = (-1 - 3, -1 + 3) = (-4, 2)$, as well as
$$B_{3/2}(1) = \left( 1 - \frac{3}{2}, 1 + \frac{3}{2} \right) = \left( -\frac{1}{2}, \frac{5}{2} \right)$$
(ii) The notations $B_{-1}(0)$ and $B_0(1)$ are meaningless because we need $\varepsilon > 0$.

(iii) We have
$$B_3(0, 0) = B_3(0) = \left\{ x \in \mathbb{R}^2 : d(x, 0) < 3 \right\} = \left\{ x \in \mathbb{R}^2 : \sqrt{x_1^2 + x_2^2} < 3 \right\} = \left\{ x \in \mathbb{R}^2 : x_1^2 + x_2^2 < 9 \right\}$$
(iv) We have
$$B_1(1, 1, 1) = \left\{ x \in \mathbb{R}^3 : d(x, (1, 1, 1)) < 1 \right\} = \left\{ x \in \mathbb{R}^3 : \sqrt{(x_1 - 1)^2 + (x_2 - 1)^2 + (x_3 - 1)^2} < 1 \right\} = \left\{ x \in \mathbb{R}^3 : (x_1 - 1)^2 + (x_2 - 1)^2 + (x_3 - 1)^2 < 1 \right\}$$
For example, $(1/2, 1/2, 1/2) \in B_1(1, 1, 1)$. Indeed
$$\left( \frac{1}{2} - 1 \right)^2 + \left( \frac{1}{2} - 1 \right)^2 + \left( \frac{1}{2} - 1 \right)^2 = \frac{3}{4} < 1$$
Check that, instead, $0 = (0, 0, 0) \notin B_1(1, 1, 1)$. N
N.B. Each point $x_0$ of $\mathbb{R}^n$ has infinitely many neighborhoods $B_\varepsilon(x_0)$, one per each value of
the radius $\varepsilon > 0$. O
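Membership in a neighborhood is a one-line test. A sketch (ours, not part of the book), using item (iv) of Example 125:

    import numpy as np

    def in_ball(x, x0, eps):
        # True if x lies in B_eps(x0), i.e., d(x, x0) < eps (strict inequality)
        return np.linalg.norm(np.asarray(x, float) - np.asarray(x0, float)) < eps

    x0 = (1, 1, 1)
    print(in_ball((0.5, 0.5, 0.5), x0, 1))  # True:  distance sqrt(3)/2 < 1
    print(in_ball((0, 0, 0), x0, 1))        # False: distance sqrt(3) > 1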

In the real line, sometimes we will use "half neighborhoods" of a point $x_0$. Specifically:

Definition 126 Given $\varepsilon > 0$, the interval $[x_0, x_0 + \varepsilon)$ is called the right neighborhood of
$x_0 \in \mathbb{R}$ of radius $\varepsilon$, while the interval $(x_0 - \varepsilon, x_0]$ is called the left neighborhood of $x_0$ of
radius $\varepsilon$.

Through them we can give a useful characterization of suprema and infima of subsets of
the real line (Section 1.4.2).

Proposition 127 Let $A \subseteq \mathbb{R}$ be bounded above. We have $a = \sup A$ if and only if

(i) $a \geq x$ for every $x \in A$,

(ii) for every $\varepsilon > 0$, there exists $x \in A$ such that $x > a - \varepsilon$.

Thus, a point $a \in \mathbb{R}$ is the supremum of $A \subseteq \mathbb{R}$ if and only if (i) it is an upper bound of $A$
and (ii) in each left neighborhood of $a$ there are elements of $A$. A similar characterization
holds for infima of sets bounded below, by replacing left neighborhoods with right ones.

Proof \Only if". If a = sup A, (i) is obviously satis ed. Let " > 0. Since sup A > a ", the
point a " is not an upper bound of A. Therefore, there exists x 2 A such that x > a ".
\If". Suppose that a 2 R satis es (i) and (ii). By (i), a is an upper bound of A. By (ii),
it is also the least upper bound. Indeed, each b < a can be written as b = a ", by setting
" = a b > 0. Given b < a, by (ii) there exists x 2 A such that x > a " = b. Therefore, b
is not an upper bound of A, which implies that there is no upper bound smaller than a.

5.3 Taxonomy of the points of $\mathbb{R}^n$ with respect to a set
The notion of neighborhood permits us to classify the points of $\mathbb{R}^n$ in various categories,
according to their relations with a given set $A \subseteq \mathbb{R}^n$.

5.3.1 Interior, exterior and boundary points
The first fundamental notion is that of interior point. Intuitively, a point is interior to a set
if it is "well inside" the set, i.e., if it is surrounded by other points that belong to the set (so,
from an interior point one can always move in any direction while remaining, at least for a
while, in the set).

Definition 128 Let $A$ be a set in $\mathbb{R}^n$. A point $x_0 \in A$ is an interior point of $A$ if there
exists $\varepsilon > 0$ such that $B_\varepsilon(x_0) \subseteq A$.

In words, $x_0$ is an interior point of $A$ if there exists at least one neighborhood of $x_0$ completely contained in $A$. This motivates the adjective "interior". An interior point $x$ of $A$ is,
therefore, contained in $A$ together with an entire neighborhood $B_\varepsilon(x)$, however small. Thus,
we can say that it belongs to $A$ both in a set-theoretic sense, $x \in A$, and in a topological
sense, $B_\varepsilon(x) \subseteq A$.

In a dual way, a point $x_0 \in \mathbb{R}^n$ is called exterior to $A$ if it is interior to the complement $A^c$
of $A$, i.e., if there exists $\varepsilon > 0$ such that $B_\varepsilon(x_0)$ is contained in $A^c$ (so that $B_\varepsilon(x_0) \cap A = \emptyset$).
A point that is exterior to a set is thus "well outside" of it.

The set of the interior points of $A$ is called the interior of $A$ and is denoted by $\operatorname{int} A$. By
definition, $\operatorname{int} A \subseteq A$. The set of the exterior points of $A$ is then $\operatorname{int} A^c$.

Example 129 Let $A = (0, 1)$. Each point of $A$ is interior, that is, $\operatorname{int} A = A$. Indeed, let
$x \in (0, 1)$. Consider the smallest distance of $x$ from the two endpoints $0$ and $1$ of the interval,
i.e., $\min\{d(0, x), d(1, x)\}$. Let $\varepsilon > 0$ be such that $\varepsilon < \min\{d(0, x), d(1, x)\}$. Then
$$B_\varepsilon(x) = (x - \varepsilon, x + \varepsilon) \subseteq (0, 1)$$
Therefore, $x$ is an interior point of $A$. Since $x$ was arbitrarily chosen, it follows that $\operatorname{int} A = A$.
It is easy to check that the set of exterior points is $\operatorname{int} A^c = (-\infty, 0) \cup (1, +\infty)$. N

Example 130 Let $A = [0, 1]$. We have $\operatorname{int} A = (0, 1)$. Indeed, by proceeding as above we
see that the points in $(0, 1)$ are all interior, that is, $(0, 1) \subseteq \operatorname{int} A$. It remains to check the
endpoints $0$ and $1$. Consider $0$. Its neighborhoods have the form $(-\varepsilon, \varepsilon)$, so they contain also
points of $A^c$. It follows that $0 \notin \operatorname{int} A$. Similarly, $1 \notin \operatorname{int} A$. We conclude that $\operatorname{int} A = (0, 1)$.
The set of the exterior points is $A^c$, i.e., $\operatorname{int} A^c = A^c$ (as the reader can easily verify). N

Definition 131 Let $A$ be a set in $\mathbb{R}^n$. A point $x_0 \in \mathbb{R}^n$ is a boundary point of $A$ if it is
neither interior nor exterior, i.e., if for every $\varepsilon > 0$ both $B_\varepsilon(x_0) \cap A \neq \emptyset$ and $B_\varepsilon(x_0) \cap A^c \neq \emptyset$.

A point $x_0$ is, therefore, a boundary point for $A$ if all its neighborhoods contain both
points of $A$ (because it is not exterior) and points of $A^c$ (because it is not interior). The set
of the boundary points of a set $A$ is called the boundary or frontier of $A$ and is denoted by
$\partial A$. Intuitively, the frontier is the "border" of a set.

The definition of boundary points is residual: a point is a boundary point if it is neither
interior nor exterior. This implies that the classification into interior, exterior and boundary
points is exhaustive: given a set $A$, each point $x_0$ of $\mathbb{R}^n$ necessarily falls into one of these
three categories. The classification is also exclusive: given a set $A$, each point $x_0$ of $\mathbb{R}^n$ is
either interior or exterior or boundary.

Example 132 (i) Let $A = (0, 1)$. Given the residual nature of the definition of boundary
points, to determine $\partial A$ we need to find the interior and exterior points. From Example 129,
we know that $\operatorname{int} A = (0, 1)$ and $\operatorname{int} A^c = (-\infty, 0) \cup (1, +\infty)$. It follows that
$$\partial A = \{0, 1\}$$
i.e., the boundary of $(0, 1)$ is formed by the two endpoints $0$ and $1$. Note that $A \cap \partial A = \emptyset$:
in this example the boundary points do not belong to the set $A$.

(ii) Let $A = [0, 1]$. In Example 130 we have seen that $\operatorname{int} A = (0, 1)$ and $\operatorname{int} A^c = A^c$.
Therefore, $\partial A = \{0, 1\}$. Here $\partial A \subseteq A$: the set $A$ contains its own boundary points.

(iii) Let $A = (0, 1]$. The reader can verify that $\operatorname{int} A = (0, 1)$ and $\operatorname{int} A^c = (-\infty, 0) \cup (1, +\infty)$. Hence, $\partial A = \{0, 1\}$. In this example, the frontier is partly outside and partly inside
the set: the boundary point $1$ belongs to $A$, while the boundary point $0$ does not. N

In view of this example, the boundary points of a bounded interval are easily seen to be
its endpoints (which may or may not belong to the interval).

Example 133 Consider the closed unit ball
$$A = \left\{ (x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 \leq 1 \right\}$$
All the points such that $x_1^2 + x_2^2 < 1$ are interior, that is,
$$\operatorname{int} A = \left\{ (x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 < 1 \right\}$$
while all the points such that $x_1^2 + x_2^2 > 1$ are exterior, that is,
$$\operatorname{int} A^c = \left\{ (x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 > 1 \right\}$$
Therefore, the unit circle is the frontier of $A$:
$$\partial A = \left\{ (x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 = 1 \right\}$$
The closed unit ball thus contains all its own boundary points. N
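For this particular set, the interior/boundary/exterior trichotomy reduces to comparing $\|x\|$ with $1$, which is easy to code. A sketch (ours, not from the book); note the tolerance, needed because floating-point norms are rarely exact.

    import numpy as np

    def classify(x, tol=1e-12):
        # classify x with respect to the closed unit ball {x : ||x|| <= 1}
        n = np.linalg.norm(x)
        if n < 1 - tol:
            return "interior"
        if n <= 1 + tol:
            return "boundary"
        return "exterior"

    for p in [(0.0, 0.0), (0.6, 0.8), (2.0, 1.0)]:
        print(p, classify(np.array(p)))   # interior, boundary, exterior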

Example 134 Let $A = \mathbb{Q}$ be the set of rational numbers, so that $A^c$ is the set of the
irrational numbers. By Propositions 19 and 42, between any two rational numbers $q < q'$
there exists an irrational number $a$ such that $q < a < q'$ and between any two irrational
numbers $a < b$ there exists a rational number $q \in \mathbb{Q}$ such that $a < q < b$. The reader
can check that this implies $\operatorname{int} A = \operatorname{int} A^c = \emptyset$, and so $\partial A = \mathbb{R}$. This example shows that
the interpretation of the boundary as a "border" can be misleading in some cases. Indeed,
mathematical notions have their own life and we must be ready to follow them also when
our intuition may fall short. N

The next lemma generalizes what we saw in Example 132.

Lemma 135 Let $A \subseteq \mathbb{R}$ be a bounded set. Then $\sup A \in \partial A$ and $\inf A \in \partial A$.

Proof We prove that $\alpha = \sup A \in \partial A$ (the proof for the infimum is similar). Consider any
neighborhood $(\alpha - \varepsilon, \alpha + \varepsilon)$ of $\alpha$. We have $(\alpha, \alpha + \varepsilon) \subseteq A^c$, so $(\alpha - \varepsilon, \alpha + \varepsilon) \cap A^c \neq \emptyset$.
Moreover, by Proposition 127 for every $\varepsilon > 0$ there exists $x_0 \in A$ such that $x_0 > \alpha - \varepsilon$, so
that $(\alpha - \varepsilon, \alpha] \cap A \neq \emptyset$. Thus, $(\alpha - \varepsilon, \alpha + \varepsilon) \cap A \neq \emptyset$. We conclude that, for every $\varepsilon > 0$, we
have both $(\alpha - \varepsilon, \alpha + \varepsilon) \cap A \neq \emptyset$ and $(\alpha - \varepsilon, \alpha + \varepsilon) \cap A^c \neq \emptyset$, that is, $\alpha \in \partial A$.

Next we identify an important class of boundary points.

Definition 136 Let $A$ be a set in $\mathbb{R}^n$. A point $x_0 \in A$ is isolated if there exists a neighborhood $B_\varepsilon(x_0)$ of $x_0$ that does not contain other points of $A$ except for $x_0$ itself, i.e.,
$A \cap B_\varepsilon(x_0) = \{x_0\}$.

As the terminology suggests, isolated points are "separated" from the rest of the set.

Example 137 Let $A = [0, 1] \cup \{2\}$. It consists of the closed unit interval and, in addition,
of the point $2$. This point is isolated. Indeed, if $B_\varepsilon(2)$ is a neighborhood of $2$ with $\varepsilon < 1$,
then $A \cap B_\varepsilon(2) = \{2\}$. N

As anticipated, we have:

Lemma 138 Isolated points are boundary points.

Proof We begin with a simple observation. Let $\mathbf{1} = (1, \dots, 1)$ be the vector with components
equal to $1$. Consider $x_0 \in \mathbb{R}^n$. Note that $x_0 + t\mathbf{1} \neq x_0$ as long as $t \neq 0$. Moreover,
$$d(x_0 + t\mathbf{1}, x_0) = |t| \, \|\mathbf{1}\| = |t| \sqrt{n}$$
for all $t \in \mathbb{R}$. This implies that, if $\varepsilon > 0$ and $t \in (0, \varepsilon/\sqrt{n})$, then $x_0 + t\mathbf{1} \in B_\varepsilon(x_0)$.

That said, let now $x_0$ be an isolated point of $A$. We want to show that, for each $\varepsilon > 0$,
$B_\varepsilon(x_0) \cap A \neq \emptyset$ and $B_\varepsilon(x_0) \cap A^c \neq \emptyset$. Since $x_0$ is isolated, there exists $\bar{\varepsilon} > 0$ such that
$A \cap B_{\bar{\varepsilon}}(x_0) = \{x_0\}$. Thus, $x_0 \in A$. Fix any $\varepsilon > 0$. Trivially $x_0 \in B_\varepsilon(x_0)$, so $B_\varepsilon(x_0) \cap A \neq \emptyset$.
At the same time, if $t = \min\{\bar{\varepsilon}, \varepsilon\} / (2\sqrt{n})$, then $x_0 + t\mathbf{1} \in B_\varepsilon(x_0)$ as well as $x_0 + t\mathbf{1} \in B_{\bar{\varepsilon}}(x_0)$,
and $x_0 + t\mathbf{1} \notin A$ (otherwise, $x_0 \neq x_0 + t\mathbf{1} \in A \cap B_{\bar{\varepsilon}}(x_0)$, a contradiction with $x_0$ being
isolated). This proves that $B_\varepsilon(x_0) \cap A^c \neq \emptyset$, concluding the proof since $\varepsilon > 0$ was arbitrarily
chosen.

5.3.2 Limit points
Definition 139 Let $A$ be a set in $\mathbb{R}^n$. A point $x_0 \in \mathbb{R}^n$ is called a limit (or accumulation)
point for $A$ if each neighborhood $B_\varepsilon(x_0)$ of $x_0$ contains at least one point of $A$ distinct from
$x_0$.

Hence, $x_0$ is a limit point of $A$ if, for every $\varepsilon > 0$, there exists some $x \in A$ such that
$0 < \|x_0 - x\| < \varepsilon$.² The set of limit points of $A$ is denoted by $A'$ and is called the derived
set of $A$. Note that limit points are not required to belong to the set.

Clearly, limit points are never exterior. Moreover:

Lemma 140 Let $A$ be a set in $\mathbb{R}^n$.

(i) Each interior point of $A$ is a limit point, that is, $\operatorname{int} A \subseteq A'$.

(ii) A boundary point of $A$ is a limit point if and only if it is not isolated.

Proof (i) If $x_0 \in \operatorname{int} A$, there exists a neighborhood $B_{\varepsilon_0}(x_0)$ of $x_0$ such that $B_{\varepsilon_0}(x_0) \subseteq A$.
Let $B_\varepsilon(x_0)$ be any neighborhood of $x_0$. The intersection
$$B_{\varepsilon_0}(x_0) \cap B_\varepsilon(x_0) = B_{\min\{\varepsilon_0, \varepsilon\}}(x_0)$$
is, in turn, a neighborhood of $x_0$ of radius $\min\{\varepsilon_0, \varepsilon\} > 0$. Hence $B_{\min\{\varepsilon_0, \varepsilon\}}(x_0) \subseteq A$ and,
to complete the proof, it is sufficient to consider any $x \in B_{\min\{\varepsilon_0, \varepsilon\}}(x_0)$ such that $x \neq x_0$.
Indeed, $x$ belongs also to the neighborhood $B_\varepsilon(x_0)$ and it is distinct from $x_0$.

(ii) "If". Consider a boundary point $x_0$ which is not an isolated point. By the definition
of boundary points, for every $\varepsilon > 0$ we have $B_\varepsilon(x_0) \cap A \neq \emptyset$. Since $x_0$ is not isolated,
for every $\varepsilon > 0$ we have $B_\varepsilon(x_0) \cap A \neq \{x_0\}$. This implies that for every $\varepsilon > 0$ we have
$(B_\varepsilon(x_0) - \{x_0\}) \cap A \neq \emptyset$, i.e., that $x_0$ is a limit point of $A$.
"Only if". Take a point $x_0$ that is both a boundary point and a limit point, i.e., $x_0 \in \partial A \cap A'$. Each neighborhood $B_\varepsilon(x_0)$ contains at least a point $x \in A$ distinct from $x_0$, that
is, $B_\varepsilon(x_0) \cap A \neq \{x_0\}$. It follows that $x_0$ is not isolated.

In view of this result, we can say that the set $A'$ of the limit points consists of the interior
points of $A$ as well as of the boundary points of $A$ that are not isolated. Therefore, a point
of a set $A$ is either a limit point or an isolated point, tertium non datur.

Example 141 (i) The points of the interval $A = [0, 1)$ are all limit points since $A' = [0, 1] \supseteq A$. Note that the limit point $1$ does not belong to $A$. (ii) The points of the closed unit ball
$A = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 \leq 1\}$ are all limit points since $A' = A$. (iii) For the set $A = \mathbb{Q}$ it
holds $A' = \mathbb{R}$. In words, the real numbers are the limit points of the set of rational numbers.
Indeed, let $x \in \mathbb{R}$. For each $\varepsilon > 0$, there exists $q_\varepsilon \in \mathbb{Q}$ such that $q_\varepsilon \in B_\varepsilon(x) = (x - \varepsilon, x + \varepsilon)$
because between any two real numbers, here $x - \varepsilon$ and $x + \varepsilon$, there exists a rational number,
here $q_\varepsilon$ (cf. Example 134). N

² The inequality $0 < \|x_0 - x\|$ is equivalent to the condition $x \neq x_0$, so it is a way to require that $x$ is a
point of $A$ distinct from $x_0$.

Example 142 The set $A = \{(x_1, x_2) \in \mathbb{R}^2 : x_1 + x_2 = 1\}$ is a straight line in the plane. We
have $\operatorname{int} A = \emptyset$ and $\partial A = A' = A$. Hence, the set $A$ has no interior points (as the next figure
shows, if one draws a disc around a point of $A$, however small it can be, there is no way to
include it all in $A$), while all its points are both limit and boundary points.

[Figure: the line $x_1 + x_2 = 1$ in the plane; no disc centered at a point of the line is contained in the line.]

N
The definition of limit point requires that its neighborhoods contain at least one point
of $A$ other than itself. As we show next, they actually contain infinitely many of them.

Proposition 143 Each neighborhood of a limit point of $A$ contains infinitely many points
of $A$.
Proof Let $x$ be a limit point of $A$. Suppose, by contradiction, that there exists a neighborhood $B_\varepsilon(x)$ of $x$ containing a finite number of points $\{x_1, \dots, x_n\}$ of $A$ distinct from $x$.
Since the set $\{x_1, \dots, x_n\}$ is finite, the minimum distance $\min_{i=1,\dots,n} d(x, x_i)$ exists and is
strictly positive, i.e., $\min_{i=1,\dots,n} d(x, x_i) > 0$. Let $\delta > 0$ be such that $\delta < \min_{i=1,\dots,n} d(x, x_i)$.
Clearly, $0 < \delta < \varepsilon$ since $\min_{i=1,\dots,n} d(x, x_i) < \varepsilon$. Hence, $B_\delta(x) \subseteq B_\varepsilon(x)$. It is also clear, by
construction, that $x_i \notin B_\delta(x)$ for each $i = 1, 2, \dots, n$. So, if $x \in A$ we have $B_\delta(x) \cap A = \{x\}$.
Instead, if $x \notin A$ we have $B_\delta(x) \cap A = \emptyset$. Regardless of whether $x$ belongs to $A$ or not, we
thus have $B_\delta(x) \cap A \subseteq \{x\}$. Therefore, the unique point of $A$ that $B_\delta(x)$ may contain is $x$
itself. But this contradicts the hypothesis that $x$ is a limit point of $A$.
O.R. The concept of interior point of a set $A$ requires the existence of a neighborhood of the
point that is entirely formed by points of $A$. This means that it is possible to move away, at
least a bit, from the point while remaining inside $A$, i.e., it is possible to go for a "little walk"
in any direction without showing the passport. Retracing one's steps, it is then possible to
approach the point from any direction by remaining inside $A$.

The concept of limit point of a set $A$ does not require the point to belong to $A$ but
requires, instead, that we can get as close as we want to the point by "jumping" on points
of the set (by jumping on river stones, we can get as close as we want to our target through
stones that all belong to the set). This idea of approaching a point by remaining within a
given set will be crucial to define limits of functions. H

5.4 Open and closed sets
We now introduce the fundamental notions of open and closed sets. We begin with open
sets.

Definition 144 A set $A$ in $\mathbb{R}^n$ is called open if all its points are interior, that is, if $\operatorname{int} A = A$.

Thus, a set is open if it does not contain its borders (so it is skinless).

Example 145 The open interval $(a, b)$ is open (hence the name). Indeed, let $x \in (a, b)$. Let
$\varepsilon > 0$ be such that
$$\varepsilon < \min\{d(x, a), d(x, b)\}$$
We have $B_\varepsilon(x) \subseteq (a, b)$, so $x$ is an interior point of $(a, b)$. Since $x$ was arbitrarily chosen, it
follows that $(a, b)$ is open. N

Example 146 The set $\{x \in \mathbb{R}^2 : 0 < x_1^2 + x_2^2 < 1\}$ is open. Graphically, it is the ball
deprived of both the skin and the origin:

[Figure: the open unit disc deprived of the origin.]

The neighborhoods in $\mathbb{R}$ are all of the type $(a, b)$ and so they are all open. The next
result shows that, more generally, all neighborhoods are open in $\mathbb{R}^n$.

Lemma 147 Neighborhoods are open sets.

Proof Let B" (x0 ) be a neighborhood of a point x0 2 Rn . To show that B" (x0 ) is open, we
have to show that all its points are interior. Let x 2 B" (x0 ). To prove that x is interior to
B" (x0 ), let
0 < "0 < " d (x; x0 ) (5.2)
100 CHAPTER 5. TOPOLOGICAL STRUCTURE (SDOGANATO)

Then B"0 (x) B" (x0 ). Indeed, let y 2 B"0 (x). Then

d(y; x0 ) d(y; x) + d(x; x0 ) < "0 + d (x; x0 ) < "

where the last inequality follows from (5.2). Therefore B"0 (x) B" (x0 ), which completes
the proof.

This proof can be illustrated by the following picture:

[Figure: the smaller neighborhood $B_{\varepsilon'}(x)$ contained in $B_\varepsilon(x_0)$.]

Definition 148 The set
$$A \cup \partial A$$
formed by the points of $A$ and by its boundary points is called the closure of $A$, denoted by
$\overline{A}$.

Clearly, $A \subseteq \overline{A}$. The closure of $A$ is, thus, an "enlargement" of $A$ that includes all its
boundary points, that is, the borders. Naturally, the notion of closure becomes relevant
when the borders are not already part of $A$.

Example 149 (i) If $A = [0, 1) \subseteq \mathbb{R}$, then $\overline{A} = [0, 1]$. (ii) If $A = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 \leq 1\}$
is the closed unit ball, then $\overline{A} = A$. N

Example 150 Given a neighborhood $B_\varepsilon(x_0)$ of a point $x_0 \in \mathbb{R}^n$, we have
$$\overline{B_\varepsilon(x_0)} = \{x \in \mathbb{R}^n : d(x, x_0) \leq \varepsilon\} \tag{5.3}$$
The closure of a neighborhood features "$\leq \varepsilon$" instead of "$< \varepsilon$". N

We can now introduce closed sets.

Definition 151 A set $A$ in $\mathbb{R}^n$ is called closed if it contains all its boundary points, that is,
if $A = \overline{A}$.

Hence, a set is closed when it includes its border (so it has a skin).

Example 152 (i) The set $A = [0, 1)$ is not closed since $\overline{A} \neq A$, while the closed unit
ball $A = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 \leq 1\}$ is closed since $\overline{A} = A$. (ii) The closed interval
$[a, b]$ is closed (hence the name). The unbounded intervals $(a, +\infty)$ and $(-\infty, a)$ are open.
The unbounded intervals $[a, +\infty)$ and $(-\infty, a]$ are closed. (iii) The circumference $A = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 = 1\}$ is closed because $\overline{A} = \partial A = A' = A$. N

Open and closed sets are dual notions, as the next result shows.³

Theorem 153 A set A in Rn is open if and only if its complement is closed.

Proof \Only if". Let A be open. We show that Ac is closed. Let x be a boundary point
of Ac , that is, x 2 @Ac . By de nition, x is not an interior point of either A or Ac . Hence,
x2 = int A. But, A = int A because A is open. Therefore x 2 = A, that is, x 2 Ac . It follows
that @A c c c
A since x 2 @A . Therefore, Ac = A , which proves that Ac is closed.
c

\If". Let Ac be closed. We show that A is open. Let x be a point of A. Since x 2


= Ac = Ac ,
c
the point x is not a boundary point of A . It is, therefore, an interior point of either A or
Ac . But, since x 2
= Ac implies x 2
= int Ac , we have x 2 int A. Since x was arbitrarily chosen,
we conclude that A is open.

Example 154 The finite sets of $\mathbb{R}^n$ (so, the singletons) are closed. To verify it, let $A = \{x_1, x_2, \dots, x_n\}$ be a generic finite set. Its complement $A^c$ is open. Indeed, let $x \in A^c$. If
$\varepsilon > 0$ is such that
$$\varepsilon < d(x, x_i) \qquad \forall i = 1, \dots, n$$
then $B_\varepsilon(x) \subseteq A^c$. So, $x$ is an interior point of $A^c$. Since $x$ was arbitrarily chosen, it follows
that $A^c$ is open. As the reader can check, we also have $\operatorname{int} A = \emptyset$ and $\partial A = A$. N

Example 155 The figure

[Figure: a point, a parabola and a small closed disc in the plane.]

represents the closed set
$$\{(2, 1)\} \cup \left\{ (x_1, x_2) \in \mathbb{R}^2 : x_2 = x_1^2 \right\} \cup \left\{ (x_1, x_2) \in \mathbb{R}^2 : (x_1 + 1)^2 + (x_2 + 1)^2 \leq 1/4 \right\}$$
of $\mathbb{R}^2$. N
³ Often, a set is defined to be closed when its complement is open. It is then proved as a theorem that a
closed set contains its boundary. In other words, the definition and the theorem are switched relative to the
approach that we have chosen.

Open and closed sets are, therefore, two sides of the same coin: a set is closed (open) if
and only if its complement is open (closed). Naturally, there are many sets that are neither
open nor closed. Next we give a simple example of such a set.

Example 156 The set $A = [0, 1)$ is neither open nor closed. Indeed, $\operatorname{int} A = (0, 1) \neq A$ and
$\overline{A} = [0, 1] \neq A$. N

There is a case in which the duality of open and closed sets takes a curious form.

Example 157 The empty set $\emptyset$ and the whole space $\mathbb{R}^n$ are simultaneously open and closed.
By Theorem 153, it is sufficient to show that $\mathbb{R}^n$ is both open and closed. But this is obvious.
Indeed, $\mathbb{R}^n$ is open because, trivially, all its points are interior (all neighborhoods are included
in $\mathbb{R}^n$), and it is closed because it trivially coincides with its own closure. It is possible to
show that $\emptyset$ and $\mathbb{R}^n$ are the unique sets with such a double personality. N

Let us go back to the notion of closure $\overline{A}$. The next result shows that it can be equivalently
seen as the addition to the set $A$ of its limit points $A'$. In other terms, adding the borders
turns out to be equivalent to adding the limit points.

Proposition 158 We have $\overline{A} = A \cup A'$.

Proof We need to prove that $A \cup A' = A \cup \partial A$. We first prove that $A \cup A' \subseteq A \cup \partial A$. Since
$A \subseteq A \cup \partial A$, we have to prove that $A' \subseteq A \cup \partial A$. Let $x \in A'$. In view of what we observed
after the proof of Lemma 140, $x$ is either an interior or a boundary point, so $x \in A \cup \partial A$.
We conclude that $A \cup A' \subseteq A \cup \partial A$.

It remains to show that $A \cup \partial A \subseteq A \cup A'$. Since $A \subseteq A \cup A'$, we have to prove that
$\partial A \subseteq A \cup A'$. Let $x \in \partial A$. If $x$ is an isolated point, then by definition $x \in A$. Otherwise,
by Lemma 140 $x$ is a limit point for $A$, that is, $x \in A'$. Hence, $x \in A \cup A'$. This proves
$A \cup \partial A \subseteq A \cup A'$, and so the result.

A corollary of this result is that a set is closed if and only if it contains all its limit points.
This sheds further light on the nature of closed sets.

Corollary 159 A set A in Rn is closed if and only if it contains all its limit points.

Proof Let $A$ be closed. By definition, $A = \overline{A}$ and hence, by Proposition 158, $A \cup A' = A$,
that is, $A' \subseteq A$. Vice versa, if $A' \subseteq A$, then obviously $A \cup A' = A$. By Proposition 158,
$\overline{A} = A \cup A' = A$.

Example 160 The inclusion $A' \subseteq A$ in this corollary can be strict, in which case the set
$A - A'$ consists of the isolated points of $A$. For example, let $A = [0, 1] \cup \{-1, 4\}$. Then $A$
is closed and $A' = [0, 1]$. Hence, $A'$ is strictly included in $A$ and the set $A - A' = \{-1, 4\}$
consists of the isolated points of $A$. N

As already remarked, we have
$$\operatorname{int} A \subseteq A \subseteq \overline{A} \tag{5.4}$$
The next result shows the importance of these inclusions. In so doing, it shows that the closure
$\overline{A}$ of a set $A$ is, indeed, closed and that its interior $\operatorname{int} A$ is open.

Proposition 161 Given a set $A$ in $\mathbb{R}^n$, we have that:

(i) $\operatorname{int} A$ is the largest open set contained in $A$;

(ii) $\overline{A}$ is the smallest closed set that contains $A$.

The proof relies on a duality, via complements, between the closure and the interior of a
set established in the next lemma.

Lemma 162 Given a set $A$ in $\mathbb{R}^n$, we have $\left( \overline{A} \right)^c = \operatorname{int} A^c$.

In words, the complement of the closure is the interior of the complement.

Proof Given a set $A$, recall that a point $x$ can be, with respect to this set, either interior
or boundary or exterior (three mutually exclusive and exhaustive options). Moreover, a point
is exterior if and only if it belongs to $\operatorname{int} A^c$. Since $\operatorname{int} A^c \subseteq A^c$, if $x \in \operatorname{int} A^c$ then both
$x \notin A$ and $x \notin \partial A$, that is, $x \notin \overline{A}$, proving that $\operatorname{int} A^c \subseteq \left( \overline{A} \right)^c$. Similarly, since
$\operatorname{int} A \subseteq A$, if $x \notin \overline{A}$, then $x \notin \operatorname{int} A$ and $x \notin \partial A$, yielding that $x \in \operatorname{int} A^c$ and proving
that $\left( \overline{A} \right)^c \subseteq \operatorname{int} A^c$. We conclude that $\left( \overline{A} \right)^c = \operatorname{int} A^c$.

Proof of Proposition 161 (i) We first show that the set $\operatorname{int} A$ is open. If $\operatorname{int} A$ is empty,
we are done. Otherwise, we need to show that every point of $\operatorname{int} A$ is an interior point of
$\operatorname{int} A$, i.e., for each $x \in \operatorname{int} A$ there exists $\varepsilon > 0$ such that $B_\varepsilon(x) \subseteq \operatorname{int} A$. Since $x$ belongs to
$\operatorname{int} A$, by definition it is an interior point of $A$, i.e., there exists $\varepsilon > 0$ such that $B_\varepsilon(x) \subseteq A$.
Let $y \in B_\varepsilon(x)$. We show that $y$ is also an interior point of $A$, that is, $y \in \operatorname{int} A$. In turn,
this proves that $B_\varepsilon(x) \subseteq \operatorname{int} A$ and so that $\operatorname{int} A$ is open. Set $\delta = \varepsilon - d(x, y) > 0$. If
$z \in B_\delta(y)$, then $d(x, z) \leq d(x, y) + d(y, z) < d(x, y) + \delta = \varepsilon$, proving that $z \in B_\varepsilon(x)$. Thus,
$B_\delta(y) \subseteq B_\varepsilon(x) \subseteq A$ and so $y \in \operatorname{int} A$.

We next show that, if $G$ is an open subset of $A$, then $G \subseteq \operatorname{int} A$, proving that $\operatorname{int} A$ is the
largest open set contained in $A$. Let $x \in G$. Since $G$ is open, there exists an $\varepsilon > 0$ such that
$B_\varepsilon(x) \subseteq G$. Since $G \subseteq A$, we conclude that $B_\varepsilon(x) \subseteq A$, that is, $x \in \operatorname{int} A$ and $G \subseteq \operatorname{int} A$.

(ii) By what has just been proved, $\operatorname{int} A^c$ is open. In view of Lemma 162, $\overline{A}$ is then closed
because it is the complement of an open set (Theorem 153). To complete the proof, let $F$ be
a closed superset of $A$. We want to show that $F \supseteq \overline{A}$, proving that $\overline{A}$ is the smallest closed
set that contains $A$. If $F \supseteq A$, then $F^c \subseteq A^c$. By Theorem 153 and by point (i), $F^c$ is open
and so $F^c \subseteq \operatorname{int} A^c = \left( \overline{A} \right)^c$, that is, $F = (F^c)^c \supseteq \overline{A}$.

The set of interior points $\operatorname{int} A$ is, therefore, the largest open set that approximates $A$
"from inside", while the closure $\overline{A}$ is the smallest closed set that approximates $A$ "from
outside". The relation (5.4) is, therefore, the best topological sandwich, with a lower open
slice and an upper closed slice, that we can have for the set $A$.⁴

It is now easy to prove an interesting and intuitive property of the boundary of a set.

Corollary 163 The boundary of a set in Rn is a closed set.


⁴ Clearly, there are also sandwiches with a lower closed slice and an upper open slice, as the reader will see
in more advanced courses.

Proof Let $A$ be any set in $\mathbb{R}^n$. Since the points exterior to $A$ are interior to its complement,
we have $(\partial A)^c = \operatorname{int} A \cup \operatorname{int} A^c$. So, $\partial A$ is closed because $\operatorname{int} A$ and $\operatorname{int} A^c$ are open and, as
we will momentarily see in Theorem 165, a union of open sets is open.

The next result, whose proof is left to the reader, shows that the difference between the
closure and the interior of a set is given by its boundary points.

Proposition 164 For each set $A$ in $\mathbb{R}^n$, we have $\partial A = \overline{A} - \operatorname{int} A$.

This result makes rigorous the intuition that open sets are sets without borders (or
skinless). Indeed, it implies that $A$ is open if and only if $\partial A \cap A = \emptyset$. On the other hand, by
definition, a set is closed if and only if $\partial A \subseteq A$, that is, when it includes the borders (it has
a skin).

5.5 Set stability


We saw in Theorem 153 that the set operation of complementation plays a crucial role for
open and closed sets. It is then natural to ask what are the stability properties of these sets
with respect to the other basic set operations of intersection and union.

We start by considering this issue for neighborhoods, the simplest open sets. The intersection of two neighborhoods of $x_0$ is still a neighborhood of $x_0$: indeed $B_{\varepsilon_1}(x_0) \cap B_{\varepsilon_2}(x_0)$
is nothing but the smallest of the two, i.e.,
$$B_{\varepsilon_1}(x_0) \cap B_{\varepsilon_2}(x_0) = B_{\min\{\varepsilon_1, \varepsilon_2\}}(x_0)$$
The same is true for intersections of a finite number of neighborhoods:
$$B_{\varepsilon_1}(x_0) \cap \cdots \cap B_{\varepsilon_n}(x_0) = B_{\min\{\varepsilon_1, \dots, \varepsilon_n\}}(x_0)$$

It is, however, no longer true for intersections of infinitely many neighborhoods. For example,
in $\mathbb{R}$ we have
$$\bigcap_{n=1}^{\infty} B_{1/n}(x_0) = \bigcap_{n=1}^{\infty} \left( x_0 - \frac{1}{n}, x_0 + \frac{1}{n} \right) = \{x_0\} \tag{5.5}$$
i.e., this intersection reduces to the singleton $\{x_0\}$, which is closed (Example 154). Therefore,
the intersection of infinitely many neighborhoods might well not be open.
To check (5.5), note that a point belongs to the intersection $\bigcap_{n=1}^{\infty} B_{1/n}(x_0)$ if and only if
it belongs to each neighborhood $B_{1/n}(x_0)$. This is true for $x_0$, so $x_0 \in \bigcap_{n=1}^{\infty} B_{1/n}(x_0)$. This
is, however, the unique point that satisfies this property. Indeed, suppose by contradiction
that $y \neq x_0$ is such that $y \in \bigcap_{n=1}^{\infty} B_{1/n}(x_0)$. Since $y \neq x_0$, we have $d(x_0, y) > 0$. If we take
$n$ sufficiently large, in particular if
$$n > \frac{1}{d(x_0, y)}$$
then its reciprocal $1/n$ will be sufficiently small so that
$$0 < \frac{1}{n} < d(x_0, y)$$
Therefore, $y \notin B_{1/n}(x_0)$, which contradicts the assumption that $y \in \bigcap_{n=1}^{\infty} B_{1/n}(x_0)$. It
follows that $x_0$ is the only point in the intersection $\bigcap_{n=1}^{\infty} B_{1/n}(x_0)$, i.e., (5.5) holds.

A union of neighborhoods of $x_0$ is, instead, always a neighborhood of $x_0$, even if the
union is infinite. The union of two neighborhoods is nothing but the largest of the two:
$$B_{\varepsilon_1}(x_0) \cup B_{\varepsilon_2}(x_0) = B_{\max\{\varepsilon_1, \varepsilon_2\}}(x_0)$$
More generally, in the case of infinitely many neighborhoods $B_{\varepsilon_i}(x_0)$, if $\sup_i \varepsilon_i < +\infty$ we
set $\varepsilon = \sup_i \varepsilon_i$, so that
$$\bigcup_{i=1}^{\infty} B_{\varepsilon_i}(x_0) = B_\varepsilon(x_0)$$
For example, in $\mathbb{R}$ we have
$$\bigcup_{n=1}^{\infty} B_{1/n}(x_0) = \bigcup_{n=1}^{\infty} \left( x_0 - \frac{1}{n}, x_0 + \frac{1}{n} \right) = B_1(x_0)$$
When, instead, $\sup_i \varepsilon_i = +\infty$, we have
$$\bigcup_{i=1}^{\infty} B_{\varepsilon_i}(x_0) = \mathbb{R}^n$$
For example, in $\mathbb{R}$ we have
$$\bigcup_{n=1}^{\infty} B_n(x_0) = \bigcup_{n=1}^{\infty} (x_0 - n, x_0 + n) = \mathbb{R}$$
In any case, we always get an open set.

Summing up, finite intersections of neighborhoods are open sets and so are their arbitrary
unions. The next result shows that these stability properties continue to hold for all open
sets.

Theorem 165 The intersection of a finite family of open sets is open, while the union of
any family (finite or not) of open sets is open.
Proof Let $A = \bigcap_{i=1}^{n} A_i$ with each $A_i$ open. Each point $x \in A$ belongs to all sets $A_i$ and is
interior to all of them (because they are open), i.e., there exist neighborhoods $B_{\varepsilon_i}(x)$ of $x$
such that $B_{\varepsilon_i}(x) \subseteq A_i$. Put $B = \bigcap_{i=1}^{n} B_{\varepsilon_i}(x)$. The set $B$ is still a neighborhood of $x$, with
radius $\varepsilon = \min\{\varepsilon_1, \dots, \varepsilon_n\}$, and $B \subseteq A_i$ for each $i$. So, $B$ is a neighborhood of $x$ contained
in $A$. Therefore, $A$ is open.

Let $A = \bigcup_{i \in I} A_i$, where $i$ runs over a finite or infinite index set $I$. Each $x \in A$ belongs to
at least one of the sets $A_i$, say to $A_{\bar{\imath}}$. Since all sets $A_i$ are open, there exists a neighborhood
of $x$ contained in $A_{\bar{\imath}}$, and so in $A$. Therefore, $x$ is interior to $A$ and, given the arbitrariness
of $x$, $A$ is open.

By Theorem 153 and by the De Morgan laws, it is easy to prove that dual properties
hold for closed sets.

Corollary 166 The union of a finite family of closed sets is closed, while the intersection
of any family (finite or not) of closed sets is closed.

In general, infinite unions of closed sets are not closed: for example, for the closed sets
$A_n = [-1 + 1/n, 1 - 1/n]$ we have $\bigcup_{n=1}^{\infty} A_n = (-1, 1)$.

5.6 Compact sets


This section is short, yet important. We first introduce bounded sets. On the real line, they
have already been introduced: a set $A$ in $\mathbb{R}$ is bounded when it is bounded both below and
above (Definition 32). As the reader can easily verify, this is equivalent to the existence of
a scalar $K > 0$ such that $-K < x < K$ for every $x \in A$, that is,
$$|x| < K \qquad \forall x \in A$$
The next definition is the natural extension of this idea to $\mathbb{R}^n$, where the absolute value is
replaced by the more general notion of norm.

Definition 167 A set $A$ in $\mathbb{R}^n$ is bounded if there exists $K > 0$ such that
$$\|x\| < K \qquad \forall x \in A$$

By recalling that $\|x\|$ is the distance $d(x, 0)$ of $x$ from the origin, it is easily seen that a
set $A$ is bounded if, for every $x \in A$, we have $d(x, 0) < K$, i.e., all its points have distance
from the origin smaller than $K$.⁵ So, a set $A$ is bounded if it is contained in a neighborhood
$B_K(0)$ of the origin, geometrically if it can be inscribed in a large enough open ball.

The neighborhoods of the origin can thus be seen as the prototypical bounded sets, used
as benchmarks to test, via set inclusion, the boundedness of any set. This brings to mind a
definition at the beginning of Spinoza's Ethica: "A thing is said to be finite in its kind if it
can be limited by another thing of the same nature."⁶

Example 168 (i) Neighborhoods and their closures (5.3) are bounded sets: it is sufficient
to take $K > \varepsilon$. In contrast, $(a, +\infty)$ is a simple example of an unbounded set (for this reason,
it is called an unbounded open interval). (ii) Subsets of bounded sets are, in turn, easily seen
to be bounded. N

A set is bounded when its elements are componentwise bounded.


⁵ Throughout this discussion of boundedness, we can replace "$< K$" with "$\leq K$" without any consequence
(why?).
⁶ For the curious reader, in his definition 2 Spinoza writes (trans. Silverthorne and Kisner) that "A thing
is said to be finite in its kind if it can be limited by another thing of the same nature. For example, a body is
said to be finite because we always conceive bodies that are greater. Similarly a thought is limited by another
thought. But a body is not limited by a thought nor a thought by a body." Remarkably, Spinoza's book is
written more geometrico through definitions, axioms and theorems, a rare choice in the history of philosophy
(see Solere, 2003).

Proposition 169 A set $A$ is bounded if and only if there exists $K > 0$ such that, for every
$x = (x_1, \dots, x_n) \in A$, we have
$$|x_i| < K \qquad \forall i = 1, \dots, n$$

Proof We prove the "if" and leave the converse to the reader. Let $x \in A$. If $|x_i| < K$ for
all $i = 1, \dots, n$, then $x_i^2 < K^2$ for all $i = 1, \dots, n$. So, $\sum_{i=1}^{n} x_i^2 < n K^2$. In turn, this implies
$\|x\| = \sqrt{\sum_{i=1}^{n} x_i^2} < \sqrt{n} K$. Since $x$ was arbitrarily chosen in $A$, by setting $K' = \sqrt{n} K$ it
follows that $\|x\| < K'$ for each $x \in A$, so $A$ is bounded.
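A quick numerical sketch (ours, not from the book) of the bound just derived: vectors whose components are all at most $K$ in absolute value have norm below $\sqrt{n} K$.

    import numpy as np

    rng = np.random.default_rng(1)
    K, n = 2.0, 5
    for _ in range(1000):
        x = rng.uniform(-K, K, size=n)              # componentwise bounded by K
        assert np.all(np.abs(x) <= K)
        assert np.linalg.norm(x) < np.sqrt(n) * K   # the bound ||x|| < sqrt(n) K
    print("norm bound verified on random vectors")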

Using boundedness, we can define an all-important class of closed sets.

De nition 170 A set A in Rn is called compact if it is both closed and bounded.

For example, all intervals $[a, b]$ that are closed and bounded in $\mathbb{R}$ are compact.⁷ More
generally, the closure $\overline{B_\varepsilon(x_0)}$ of a neighborhood in $\mathbb{R}^n$ is compact. For example,
$$\overline{B_1(0)} = \left\{ (x_1, \dots, x_n) \in \mathbb{R}^n : x_1^2 + \cdots + x_n^2 \leq 1 \right\} = \{x \in \mathbb{R}^n : \|x\| \leq 1\}$$
is a classic compact set in $\mathbb{R}^n$, called the closed unit ball. It generalizes to $\mathbb{R}^n$ the notion of
closed unit ball that in Section 2.1 we presented in $\mathbb{R}^2$ (if the inequality is strict we have the
open unit ball, which instead is an open set).

Like closedness, compactness is stable under finite unions and arbitrary intersections, as
the reader can check.⁸

Example 171 Finite sets, so in particular the singletons, are compact. Indeed, in Example 154
we showed that they are closed sets. Since they are obviously bounded, they are then compact.
N

Example 172 Provided there are no free goods, budget sets are a fundamental example of
compact sets in consumer theory, as Proposition 992 will show. N

Finally, compactness can be inherited.

Proposition 173 A closed subset of a compact set is compact.

Proof It is enough to observe that a subset of a bounded set is bounded.

For instance, the boundary $\partial A$ of a compact set $A$ is a closed subset of $A$ (cf. Corollary
163) and so is a compact set. The boundary of the closed unit ball
$$\partial \overline{B_1(0)} = \{x \in \mathbb{R}^n : \|x\| = 1\}$$
is another classic compact set of $\mathbb{R}^n$, called the unit sphere. It generalizes to $\mathbb{R}^n$ the unit
circle in $\mathbb{R}^2$.
⁷ The empty set $\emptyset$ is considered a compact set.
⁸ Since the empty set is compact, the intersection of two disjoint compact sets is the empty (so, compact)
set.

5.7 Closure and convergence


In this final section we characterize closed sets by means of sequences.⁹

Theorem 174 A set $C$ in $\mathbb{R}^n$ is closed if and only if it contains the limit of every convergent
sequence of its points. That is, $C$ is closed if and only if
$$\{x_n\} \subseteq C, \ x_n \to x \implies x \in C \tag{5.6}$$

Proof \Only if". Let C be closed. Let fxn g C be a sequence such that xn ! x. We want
to show that x 2 C. Suppose, by contradiction, that x 2 = C. Since xn ! x, for every " > 0
there exists n" 1 such that xn 2 B" (x) for every n n" . Therefore, x is a limit point for
C, which contradicts x 2= C because C is closed and so contains all its limit points.
\If". Let C be a set for which property (5.6) holds. By contradiction, suppose C is not
closed. Then, there exists at least a boundary point x of C that does not belong to C. Since
it cannot be an isolated point (otherwise it would belong to C), by Lemma 140 x is a limit
point for C. Each neighborhood B1=n (x) does contain a point of C, call it xn . The sequence
of such points xn converges to x 2 = C, contradicting (5.6). Hence, C is closed.

This characterization is important: a set is closed if and only if "it is closed with respect
to the limit operation", that is, if we never leave the set by taking limits of sequences. This
is a main reason why in applications sets are often assumed to be closed: otherwise, one
could get arbitrarily close to a point $x$ without being able to reach it, a "discontinuity" that
applications typically do not feature (it would be like licking the windows of a pastry shop
without being able to reach the, close yet unreachable, pastries).

Example 175 Let us show that the interval $C = [a, b]$ is closed using Theorem 174. Let
$\{x_n\} \subseteq C$ be such that $x_n \to x \in \mathbb{R}$. By Theorem 174, to show that $C$ is closed it is sufficient
to show that $x \in C$. Since $a \leq x_n \leq b$, a simple application of the comparison criterion shows
that $a \leq x \leq b$, that is, $x \in C$. N

Example 176 Consider the rectangle $C = [a, b] \times [c, d]$ in $\mathbb{R}^2$. Let $\{x^k\} \subseteq C$ be such
that $x^k \to x \in \mathbb{R}^2$. By Theorem 174, to show that $C$ is closed it is sufficient to show that
$x = (x_1, x_2) \in C$. By (8.68), $x^k \to x$ implies $x_1^k \to x_1$ and $x_2^k \to x_2$. Since $x_1^k \in [a, b]$ and
$x_2^k \in [c, d]$ for every $k$, again a simple application of the comparison criterion shows that
$x_1 \in [a, b]$ and $x_2 \in [c, d]$, that is, $x \in C$. N
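A numerical illustration (ours, not from the book) of why $[0, 1)$ fails the sequential test (5.6) while $[0, 1]$ passes it: the sequence $x_n = 1 - 1/n$ lives in $[0, 1)$ but its limit $1$ does not.

    xs = [1 - 1 / n for n in (10, 100, 1000, 10**6)]
    print(all(0 <= x < 1 for x in xs))   # True: every term belongs to [0, 1)
    limit = 1.0
    print(0 <= limit < 1)                # False: the limit escapes [0, 1)
    print(0 <= limit <= 1)               # True: the closed set [0, 1] retains it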

⁹ This section can be skipped at a first reading, and be read only after having studied sequences in Chapter 8.
Chapter 6

Functions (sdoganato)

6.1 The concept


Consider a shopkeeper who, at a wholesale market, faces the following table that lists the
unit price of a kilogram of walnuts in correspondence to various quantities of walnuts that
can be purchased from his dealer:

Quantity Price per kg


10 kg 4 euros
20 kg 3:9 euros
30 kg 3:8 euros
40 kg 3:7 euros

In other words, if the shopkeeper buys 10 kg of walnuts he will pay them 4 euros per kg,
if he buys 20 kg he will pay them 3:9 euros per kg, and so on. As it is often the case, the
dealer o ers quantity discounts: the higher the quantity purchased, the lower the unit price.
The table is an example of a supply function that associates to each quantity the
corresponding selling price, where A = f10; 20; 30; 40g is the set of the quantities and
B = f4; 3:9; 3:8; 3:7g is the set of their unit prices. The supply function is a rule that,
to each element of the set A, associates an element of the set B.
In general, we have:

Definition 177 Given any two sets $A$ and $B$, a function defined on $A$ and with values in
$B$, denoted by $f : A \to B$, is a rule that associates to each element of the set $A$ one, and
only one, element of the set $B$.

We write
$$b = f(a)$$
to indicate that, to the element $a \in A$, the function $f$ associates the element $b \in B$. Graphically:

[Figure: an arrow diagram sending each element $a$ of $A$ to one element $b = f(a)$ of $B$.]


The rule can be completely arbitrary; what matters is that it associates to each element
$a$ of $A$ only one element $b$ of $B$.¹ The arbitrariness of the rule is the key feature of the notion
of function. It is one of the fundamental ideas of mathematics, key for applications, which
was fully understood not so long ago: the notion of function that we just presented was
introduced in 1829 by Dirichlet after about 150 years of discussions (the first ideas on the
subject go back at least to Leibniz at the end of the seventeenth century).

Note that it is perfectly legitimate that the same element of $B$ is associated to two (or
more) different elements of $A$, that is,

[Figure (legitimate): two different elements of $A$ associated to the same element of $B$.]

In contrast, it cannot happen that different elements of $B$ are associated to an element of $A$,

¹ We have emphasized in italics the most important words: the rule must hold for each element of $A$ and,
to each of them, it must associate only one element of $B$.

that is,

[Figure (illegitimate): one element of $A$ associated to two different elements of $B$.]

In terms of the supply function in the initial example, different quantities of walnuts might
well have the same unit price (e.g., if there are no quantity discounts), but the same quantity
cannot have different unit prices!
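A dictionary in code makes the "one, and only one" requirement tangible. A sketch (ours, hypothetical, not from the book): each key (quantity) maps to exactly one value (unit price); distinct keys may share a value, but no key can carry two values.

    supply = {10: 4.0, 20: 3.9, 30: 3.8, 40: 3.7}  # quantity (kg) -> price per kg (euros)

    def f(a):
        # the supply function: one, and only one, unit price for each quantity
        return supply[a]

    print(f(20))  # 3.9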

Before considering some examples, we introduce a bit of terminology. The two variables $a$
and $b$ are called the independent variable and the dependent variable, respectively. Moreover,
the set $A$ is called the domain of the function, while the set $B$ is its codomain.

The codomain is the set in which the function takes on its values, but it does not necessarily
contain only such values: it might well be larger. In this respect, the next notion is important:
given $a \in A$, the element $f(a) \in B$ is called the image of $a$. Given any subset $C$ of the
domain $A$, the set
$$f(C) = \{f(a) : a \in C\} \subseteq B \tag{6.1}$$
of the images of the points in $C$ is called the image of $C$. In particular, the set $f(A)$ of
all the images of points of the domain is called the image (or range) of the function $f$, denoted
$\operatorname{Im} f$. Therefore, $\operatorname{Im} f$ is the subset of the codomain formed by the elements that are actually
the image of some element of the domain:
$$\operatorname{Im} f = f(A) = \{f(a) : a \in A\} \subseteq B$$
Note that any set that contains $\operatorname{Im} f$ is a possible codomain for the function: if $\operatorname{Im} f \subseteq B$
and $\operatorname{Im} f \subseteq C$, then writing both $f : A \to B$ and $f : A \to C$ is fine. The choice of codomain
is, ultimately, a matter of convenience. For example, throughout this book we will often
consider functions that take on real values, that is, $f(a) \in \mathbb{R}$ for each $a$ in the domain of $f$.
In this case, the most convenient choice for the codomain is the entire real line, so we will
usually write $f : A \to \mathbb{R}$.

Example 178 (i) Let A be the set of all countries in the world and B a set containing some
colors. If the function f : A ! B associates on a geographic map to each country one of
these colors, then Im f is the set of the colors used at least once on the map.
(ii) The rule that associates to each living human being his date of birth is a function
f : A ! B, where A is the set of the human beings and, for example, B is the set of the dates
of the last 150 years (a codomain su ciently large to contain all the possible birthdates). N

Let us see an example of a rule that does not define a function.

Example 179 Consider the rule that associates to each positive scalar x both its positive and its negative square roots, that is, the set {−√x, √x}. For example, it associates to 4 the elements {−2, 2}. This rule does not describe a function f : [0, ∞) → R because, to each element of the domain different from 0, two different elements of the codomain are associated. N

The main classes of functions that we will consider are:

(i) f : A ⊆ R → R, real-valued functions of a real variable, called functions of a single variable or scalar functions;²

(ii) f : A ⊆ Rⁿ → R, real-valued functions of n real variables, called functions of several variables or vector (or multivariable) functions;

(iii) f : A ⊆ R → Rᵐ, vector-valued functions of a real variable, called curves;³

(iv) f : A ⊆ Rⁿ → Rᵐ, vector-valued functions of n real variables, called operators.

We present now some classic examples of functions of a single variable.

Example 180 The cubic function f : R → R defined by f(x) = x³ is a rule that associates to each scalar its cube. Since each scalar has a unique cube, this rule defines a function. Graphically:

[Figure: graph of the cubic function f(x) = x³]

In particular, we have Im f = f(R) = R. N


² The terminology "scalar function" is convenient, but not standard (and it can have different meanings in different books). So, the reader must use it with some care. The same is true for the terminology "vector function".
³ We will rarely consider functions f : A ⊆ R → Rᵐ (we mention them here for the sake of completeness), so this specific meaning of the word "curve" will not be relevant for us in the book.

Example 181 The quadratic function f : R → R defined by f(x) = x² is a rule that associates to each scalar its square. Since each scalar has a unique square, this rule defines a function. In particular, Im f = f(R) = [0, ∞). Graphically:

[Figure: graph of the quadratic function f(x) = x²]

In this case, two different elements of the domain may have the same image: for example, f(1) = f(−1) = 1. N

The clause "is a rule that" is usually omitted, and we will do so from now on.

Example 182 The square root function f : [0, ∞) → R defined by f(x) = √x associates to each positive scalar its (arithmetic) square root. The domain is the positive half-line and Im f = [0, ∞). Graphically:

[Figure: graph of the square root function]

N

Example 183 The logarithmic function f : (0, ∞) → R defined by f(x) = log_a x, with a > 0 and a ≠ 1, associates to each strictly positive scalar its logarithm. Its domain is (0, ∞), while Im f = R. Graphically, for a > 1:

[Figure: graph of the logarithmic function for a > 1]
Example 184 The absolute value function f : R → R defined by f(x) = |x| associates to each scalar its absolute value. This function has domain R, with Im f = [0, ∞). Graphically:

[Figure: graph of the absolute value function]

Example 185 Let f : R \ {0} → R be defined by f(x) = 1/|x| for every scalar x ≠ 0. Graphically:

[Figure: graph of f(x) = 1/|x|]

Here the domain is A = R \ {0}, the real line without the origin. Moreover, Im f = (0, ∞). N

Functions of several variables f : A ⊆ Rⁿ → R play a key role in economics. Let us provide some examples.

Example 186 (i) The function f : R² → R defined by

    f(x₁, x₂) = x₁ + x₂    (6.2)

associates to each vector x = (x₁, x₂) ∈ R² the sum of its components.⁴ For every x ∈ R², such a sum is unique, so the rule defines a function with Im f = f(R²) = R.
(ii) The function f : Rⁿ → R defined by

    f(x₁, x₂, …, xₙ) = ∑ᵢ₌₁ⁿ xᵢ

generalizes to Rⁿ the function of two variables (6.2). N

Example 187 (i) The function f : R²₊ → R defined by

    f(x₁, x₂) = √(x₁x₂)    (6.3)

associates to each vector x = (x₁, x₂) ∈ R²₊ the square root of the product of its components. For each x ∈ R²₊, this root is unique, so the rule defines a function with Im f = R₊.
(ii) The function f : Rⁿ₊ → R defined by

    f(x₁, x₂, …, xₙ) = ∏ᵢ₌₁ⁿ xᵢ^αᵢ

with exponents αᵢ > 0 such that ∑ᵢ₌₁ⁿ αᵢ = 1, generalizes to Rⁿ the function of two variables (6.3), which is the special case with n = 2 and α₁ = α₂ = 1/2. It is widely used in economics under the name of Cobb-Douglas function. N

⁴ To be consistent with the notation adopted for vectors, we should write f((x₁, x₂)). But, to ease notation, we write f(x₁, x₂).
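As a numerical illustration, the following Python sketch (names are ours) evaluates the Cobb-Douglas function of point (ii) for given exponents summing to one:

    import math

    def cobb_douglas(x, alpha):
        # f(x) = prod_i x_i^alpha_i, with alpha_i > 0 and sum_i alpha_i = 1
        assert len(x) == len(alpha) and abs(sum(alpha) - 1) < 1e-12
        return math.prod(xi ** ai for xi, ai in zip(x, alpha))

    # special case n = 2, alpha = (1/2, 1/2): the square root of x1 * x2
    print(cobb_douglas((4, 9), (0.5, 0.5)))  # 6.0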

In economics the operators f : A ⊆ Rⁿ → Rᵐ, too, are important. Next we present a few examples.

Example 188 (i) Define f : R² → R² by

    f(x₁, x₂) = (x₁, x₁x₂)

For example, if (x₁, x₂) = (2, 5), then f(x₁, x₂) = (2, 2 · 5) = (2, 10) ∈ R².

(ii) Define f : R³ → R² by

    f(x₁, x₂, x₃) = (2x₁² + x₂ − x₃, x₁ − x₂⁴)

For example, if x = (2, 5, 3), then

    f(x₁, x₂, x₃) = (2 · 2² + 5 − 3, 2 − 5⁴) = (10, −623)
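A quick check of the arithmetic of part (ii), written as a Python sketch (the minus signs in the formula are reconstructed so as to agree with the stated value (10, −623)):

    def f(x1, x2, x3):
        # the operator f : R^3 -> R^2 of Example 188 (ii)
        return (2 * x1 ** 2 + x2 - x3, x1 - x2 ** 4)

    print(f(2, 5, 3))  # (10, -623)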

O.R. A function f : A → B is a kind of machine that transforms each element a ∈ A into an element b = f(a) ∈ B.

[Figure: a "machine" with input a and output b = f(a)]

If we insert in it any element a ∈ A, it "spits out" f(a) ∈ B. If we insert an element a ∉ A, the machine will jam and will not produce anything. The image Im f = f(A) ⊆ B is simply the "list" of all the elements that can come out of the machine.
In particular, for scalar functions the machine transforms real numbers into real numbers, for vector functions it transforms vectors of Rⁿ into real numbers, for curves it transforms real numbers into vectors of Rᵐ, and for operators it transforms vectors of Rⁿ into vectors of Rᵐ.

The names of the variables are altogether irrelevant: we can indifferently write a = f(b), or y = f(x), or s = f(t), or even use Greek letters, say β = f(α): these names are just placeholders; what matters is only the sequence of operations (almost always numerical) that lead from a to b = f(a). Writing b = a² + 2a + 1 is exactly the same as writing y = x² + 2x + 1, or s = t² + 2t + 1, or β = α² + 2α + 1. This function is identified by the operations "square + double + 1" that allow us to move from the independent variable to the dependent one. H

We close this introductory section by making rigorous the notion of the graph of a function, until now used intuitively. For the quadratic function f(x) = x² the graph is the parabola

[Figure: parabola through the points (−1, 1), (0, 0) and (1, 1)]

that is, the locus of the points (x, x²) of the plane, as x varies over the real line, which is the domain of the function. For example, the points (−1, 1), (0, 0) and (1, 1) belong to the parabola.

Definition 189 The graph of a function f : A → B, denoted by Gr f, is the set

    Gr f = {(x, f(x)) : x ∈ A} ⊆ A × B

The graph is, therefore, a subset of the Cartesian product A × B. In particular:

(i) When A, B ⊆ R, the graph is a subset of the plane R². Geometrically, it is a curved line (without thickness) in R² because, to each x ∈ A, there corresponds a unique f(x). Graphically:

[Figure: graph of a scalar function as a curve in the plane]

(ii) When A ⊆ R² and B ⊆ R, the graph is a subset of the three-dimensional space R³, i.e., a surface (without thickness). Graphically:

[Figure: graph of a function of two variables as a surface in R³]
6.2 Applications
6.2.1 Static choices
Let us interpret the vectors in Rⁿ₊ as bundles of goods (Section 2.4). It is natural to assume that the consumer will prefer some bundles to others. For example, it is reasonable to assume that, if x ≥ y (bundle x is "richer" than y), then x is preferred to y. In symbols, we then write x ≿ y, where the symbol ≿ represents the (binary) preference relation of the consumer over the bundles.
In general, we assume that the preference ≿ over the available bundles of goods can be represented by a function u : Rⁿ₊ → R, called utility function, such that

    x ≿ y ⟺ u(x) ≥ u(y)    (6.4)

That is, bundle x is preferred to y if and only if it gets a higher "utility". The image Im u represents all the levels of utility that can be attained by the consumer.
Originally, around 1870, the first marginalists (in particular, Jevons and Walras) interpreted u(x) as the level of physical satisfaction caused by the bundle x. They gave, therefore, a physiological interpretation of utility functions, which quantified the emotions that consumers felt in owning different bundles. It is the so-called cardinalist interpretation of utility functions, which goes back to Jeremy Bentham and to his "pain and pleasure calculus".⁵ Utility functions, besides representing the preference ≿, are inherently interesting because they quantify an emotional state of the consumer, i.e., the degree of pleasure determined by the bundles. In addition to the comparison u(x) ≥ u(y), it is also meaningful to compare the differences

    u(x) − u(y) ≥ u(z) − u(w)    (6.5)

which indicate that bundle x is more intensely preferred to bundle y than bundle z is to bundle w. Moreover, since u(x) measures the degree of pleasure that the consumer gets from the bundle x, in the cardinalist interpretation it is also legitimate to compare these measures among different consumers, i.e., to make interpersonal comparisons of utility. Such interpersonal comparisons can then be used, for example, to assess the impact of different economic policies on the welfare of economic agents. For instance, we can ask whether a given policy, though making some agents worse off, still increases the overall utility across agents.

The cardinalist interpretation came into question at the end of the nineteenth century because of the impossibility of measuring experimentally the physiological aspects that were assumed to underlie utility functions.⁶ For this reason, with the works of Vilfredo Pareto at the beginning of the twentieth century, developed first by Eugen Slutsky in 1915 and then by John Hicks in the 1930s,⁷ the ordinalist interpretation of utility functions prevailed: more modestly, it is assumed that they are only a mere numerical representation of the preference ≿ of the consumer. According to this less demanding interpretation, what matters is only that the ordering u(x) ≥ u(y) represents the preference for bundle x over bundle y, that is, x ≿ y. It is no longer of interest to know whether it also represents the, more or less intense, consumers' emotions over the bundles. In other terms, in the ordinalist approach the fundamental notion is the preference ≿, while the utility function becomes just a numerical representation of it. The comparisons of intensity (6.5), as well as the interpersonal comparisons of utility, no longer have meaning.
⁵ See his Introduction to the Principles of Morals and Legislation, published in 1789.
⁶ Around 1901, the famous mathematician Henri Poincaré wrote to Léon Walras: "I can say that one satisfaction is greater than another, since I prefer one to the other, but I cannot say that the first satisfaction is two or three times greater than the other." Poincaré, with great sensibility, understood a key issue.
⁷ We refer interested readers to Stigler (1950).

At the empirical level, the consumers' preferences ≿ are revealed through their choices among bundles, which are much simpler to observe than emotions or other mental states.

The ordinalist interpretation became the mainstream one because, besides the superior empirical content just mentioned, the works of Pareto showed that it is sufficient for developing a powerful consumer theory (cf. Section 22.1.4). So, Occam's razor was a further reason to abandon the earlier cardinalist interpretation. Nevertheless, economists often use, at an intuitive level, cardinalist categories because of their introspective plausibility.

Be that as it may, through utility functions we can address the problem of a consumer who has to choose a bundle within a given subset A of Rⁿ₊. The consumer will be guided in such a choice by his utility function u : A ⊆ Rⁿ₊ → R; namely, u(x) ≥ u(y) indicates that the consumer prefers the bundle of goods x to the bundle y or that he is indifferent between the two.
For example,

    u(x) = ∑ᵢ₌₁ⁿ xᵢ

is the utility function of a consumer that orders the bundles simply according to the sum of the quantities of the different goods that they contain. The classic Cobb-Douglas utility function is

    u(x) = ∏ᵢ₌₁ⁿ xᵢ^αᵢ

with exponents αᵢ > 0 such that ∑ᵢ₌₁ⁿ αᵢ = 1 (see Example 187). When αᵢ = 1/n for each i, we have

    u(x) = ∏ᵢ₌₁ⁿ xᵢ^(1/n) = (∏ᵢ₌₁ⁿ xᵢ)^(1/n)

with bundles being ordered according to the n-th root of the product of the quantities of the different goods that they contain.⁸

We close by considering a producer that has to decide how much output to produce (Section 2.4). In such a decision the production function f : A ⊆ Rⁿ₊ → R plays a crucial role in that it describes how much output f(x) is obtained starting from a vector x ∈ Rⁿ₊ of inputs. For example,

    f(x) = (∏ᵢ₌₁ⁿ xᵢ)^(1/n)

is the Cobb-Douglas production function, in which the output is equal to the n-th root of the product of the input components.

⁸ Because of its multiplicative form, bundles with at least one zero component xᵢ have zero utility according to the Cobb-Douglas utility function. Since it is not that plausible that the presence of a zero component has such drastic consequences, this utility function is often defined only on Rⁿ₊₊ (as we will also often do).

6.2.2 Intertemporal choice

Assume that the consumer has, over the possible consumption streams x = (x₁, x₂, …, x_T) of some good, preferences quantified by an intertemporal utility function U : Rᵀ₊ → R (Section 2.4). For example, assume that he has a utility function u : R₊ → R, called instantaneous, for the consumption level xₜ of each period. In this case a possible form of the intertemporal utility function is

    U(x) = u(x₁) + βu(x₂) + ⋯ + β^(T−1)u(x_T) = ∑ₜ₌₁ᵀ β^(t−1)u(xₜ)    (6.6)

where β ∈ (0, 1) is a subjective discount factor that depends on how "patient" the consumer is. The more patient the consumer is, i.e., the more he is willing to postpone his consumption of a given quantity of the good, the higher the value of β. In particular, the closer β gets to 1, the closer we approach the form

    U(x) = u(x₁) + u(x₂) + ⋯ + u(x_T) = ∑ₜ₌₁ᵀ u(xₜ)

in which consumption in each period is evaluated in the same way. In contrast, the closer β gets to 0, the closer U(x) gets to u(x₁), that is, the consumer becomes extremely impatient and does not give any importance to future consumption.
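A minimal Python sketch of formula (6.6) may help; the logarithmic instantaneous utility used here is just an illustrative assumption:

    import math

    def U(x, beta, u=math.log):
        # U(x) = sum over t of beta^(t-1) * u(x_t), with beta in (0, 1)
        return sum(beta ** (t - 1) * u(xt) for t, xt in enumerate(x, start=1))

    stream = (2.0, 2.0, 2.0)
    print(U(stream, beta=0.9))  # patient consumer: future periods weigh almost fully
    print(U(stream, beta=0.1))  # impatient consumer: U(x) close to u(x_1)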

6.3 General properties

6.3.1 Preimages and level curves

The notion of preimage is dual to that of image. Specifically, let f : A → B. Given a point y ∈ B, its preimage (or inverse image or counter-image), denoted by f⁻¹(y), is the set

    f⁻¹(y) = {x ∈ A : f(x) = y}

of the elements of the domain whose image is y. More generally, given any subset D of the codomain B, its preimage f⁻¹(D) is the set

    f⁻¹(D) = {x ∈ A : f(x) ∈ D}

of the elements of the domain whose images belong to D.

The next examples illustrate these notions.⁹

Example 190 Consider the function f : A → B that to each (living) person associates the year of birth. If y ∈ B is a possible such year, f⁻¹(y) is the set of the persons that have y as their year of birth; in other words, all the persons in f⁻¹(y) have the same age (they form a cohort, in the demography terminology). N

⁹ For the sake of brevity, we will consider as sets D only intervals and singletons, but similar considerations hold for other types of sets.

Example 191 Let f : R → R be the cubic function f(x) = x³. We have Im f = R. For each y ∈ R,

    f⁻¹(y) = {y^(1/3)}

For example, f⁻¹(27) = {3}. The preimage of a closed interval [a, b] is

    f⁻¹([a, b]) = [a^(1/3), b^(1/3)]

For example, f⁻¹([−8, 27]) = [−2, 3]. N

Example 192 Let f : R → R be the quadratic function f(x) = x². We have Im f = R₊. The preimage of each y ≥ 0 is

    f⁻¹(y) = {−√y, √y}

while that of each y < 0 is f⁻¹(y) = ∅.¹⁰ So,

    f⁻¹(a, b) = (−√b, −√a) ∪ (√a, √b)   if a ≥ 0
                ∅                       if b < 0
                (−√b, √b)               if a < 0 < b

Note that f⁻¹(a, b) = f⁻¹([0, b)) when a < 0. Indeed, the elements between a and 0 have no preimage. For example, if D = (−1, 2), then f⁻¹(D) = (−√2, √2). Since

    f⁻¹(D) = f⁻¹([0, 2)) = f⁻¹(−1, 2)

the negative elements of D are irrelevant (as they do not belong to the image of the function). N

¹⁰ To ease notation, we denote the preimage of an open interval (a, b) by f⁻¹(a, b) instead of f⁻¹((a, b)).
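For a finite domain, preimages can be computed by direct enumeration, as in the following Python sketch (our notation):

    def preimage(f, A, D):
        # f^{-1}(D) = {x in A : f(x) in D}, for a finite domain A
        return {x for x in A if f(x) in D}

    A = range(-3, 4)             # the finite domain {-3, ..., 3}
    f = lambda x: x ** 2
    print(preimage(f, A, {4}))   # the preimage of y = 4 is {-2, 2}
    print(preimage(f, A, {-1}))  # set(): a negative value has empty preimage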

By resorting to an appropriate topographic term, the preimage

    f⁻¹(k) = {x ∈ A : f(x) = k}

of a function f : A → R is often called the level curve (or level set) of f of level k ∈ R. This terminology nicely expresses the idea that the set f⁻¹(k) is formed by the points of the domain at which the function attains the "level" k. It is particularly fitting in economic applications, as we will see shortly.
The level curves of functions of two variables have a geometric representation that may prove illuminating, as we show next.

Example 193 Let f : R² → R be given by f(x₁, x₂) = x₁² + x₂². For every k ≥ 0, the level curve f⁻¹(k) is the locus in R² of equation

    x₁² + x₂² = k

That is, it is the circumference with center at the origin and radius √k. Graphically, the level curves can be represented as:

[Figure: concentric circumferences centered at the origin]

while the graph of the function is:

[Figure: the paraboloid y = x₁² + x₂² in R³]

Two different level curves of the same function cannot have any point in common, that is,

    k₁ ≠ k₂ ⟹ f⁻¹(k₁) ∩ f⁻¹(k₂) = ∅    (6.7)

Indeed, if there were a point x ∈ A belonging to both the curves of levels k₁ and k₂, we would have f(x) = k₁ and f(x) = k₂ with k₁ ≠ k₂, but this is impossible because, by definition, a function may assume only one value at each point.
Example 194 Let f : A ⊆ R² → R be given by f(x₁, x₂) = √(7x₁² − x₂), where A consists of the points x = (x₁, x₂) in the plane such that 7x₁² − x₂ ≥ 0. For every k ≥ 0, the level curve f⁻¹(k) is the locus in R² of equation √(7x₁² − x₂) = k, that is, x₂ = −k² + 7x₁². It is a parabola that intersects the vertical axis at −k². Graphically:

[Figure: the level curves for k = 0, 1, 2, parabolas x₂ = 7x₁² − k²]

Example 195 The function f : R₊₊ × R → R given by

    f(x₁, x₂) = √((x₁² + x₂²)/x₁)

is defined only for x₁ > 0. Its level curves f⁻¹(k) are the loci of equation

    √((x₁² + x₂²)/x₁) = k

that is, x₁² + x₂² − k²x₁ = 0. Therefore, they are circumferences passing through the origin and with centers (k²/2, 0), all on the horizontal axis (why?). Graphically:

[Figure: circumferences through the origin with centers on the horizontal axis]

Although all such circumferences have the origin as a common point, the "true" level curves are the circumferences without the origin because at (0, 0) the function is not defined. So, they do not actually have any point in common. N

O.R. The equation f(x₁, x₂) = k of a generic level curve of a function f of two variables can be rewritten, in an apparently more complicated form, as the system

    y = f(x₁, x₂)
    y = k

This rewriting clarifies its geometric meaning:

(i) the equation y = f(x₁, x₂) represents a surface in R³ (the graph of f);

(ii) the equation y = k represents a horizontal plane (it contains the points (x₁, x₂, k) ∈ R³, i.e., all the points of "height" k);

(iii) the brace "{" geometrically means intersection between the sets defined by the two previous equations.

The curve of level k is, therefore, viewed as the intersection between the surface that represents f and a horizontal plane.

[Figure: the surface cut by a horizontal plane]

Hence, the different level curves are obtained by cutting the surface with horizontal planes at different levels. They represent the edges of the "slices" obtained in this way, viewed on the plane (x₁, x₂). H

Indifference curves

We now turn to a classic economic application of level curves. Given a utility function u : A ⊆ Rⁿ₊ → R, its level curves

    u⁻¹(k) = {x ∈ A : u(x) = k}

are called indifference curves. So, an indifference curve is formed by all the bundles x ∈ A that have the same utility k, and which are therefore indifferent for the consumer. The collection {u⁻¹(k) : k ∈ R} of all the indifference curves is sometimes called the indifference map.

Example 196 Consider the Cobb-Douglas utility function u : R²₊ → R given by u(x) = √(x₁x₂). We have

    u⁻¹(0) = {x ∈ R²₊ : x₁ = 0 or x₂ = 0}

that is, this indifference curve is the union of the axes of the positive orthant. On the other hand, for every k > 0 we have

    u⁻¹(k) = {x ∈ R²₊ : √(x₁x₂) = k} = {x ∈ R²₊ : x₁x₂ = k²} = {x ∈ R²₊ : x₂ = k²/x₁}

Therefore, the indifference curve of level k > 0 is the hyperbola of equation

    x₂ = k²/x₁

By varying k ≥ 0, we get the indifference map {u⁻¹(k) : k ≥ 0}. Graphically:
[Figure: hyperbolic indifference curves for k = 1, 2, 3]

Introductory economics courses emphasize that indifference curves "do not cross", i.e., they are disjoint: k₁ ≠ k₂ implies u⁻¹(k₁) ∩ u⁻¹(k₂) = ∅. Clearly, this is just a special case of the more general property (6.7), which holds for any family of level curves.
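The defining property of an indifference curve, namely that all its bundles have the same utility, is easy to check numerically; a small Python sketch for the Cobb-Douglas utility of Example 196 (ours):

    def u(x1, x2):
        # Cobb-Douglas utility u(x) = sqrt(x1 * x2) of Example 196
        return (x1 * x2) ** 0.5

    k = 2.0
    # bundles on the hyperbola x2 = k^2 / x1, i.e., on the curve u^{-1}(k)
    bundles = [(x1, k ** 2 / x1) for x1 in (0.5, 1.0, 2.0, 4.0)]
    print([u(x1, x2) for x1, x2 in bundles])  # [2.0, 2.0, 2.0, 2.0]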

The level curves

    f⁻¹(k) = {x ∈ A : f(x) = k}

of a production function f : A ⊆ Rⁿ₊ → R are called isoquants. An isoquant is, thus, the set of all the input vectors x ∈ A that produce the same output. The set {f⁻¹(k) : k ∈ R} of all the isoquants is sometimes called the isoquant map.
Finally, the level curves

    c⁻¹(k) = {x ∈ A : c(x) = k}

of a cost function c : A ⊆ Rⁿ₊ → R are called isocosts. So, an isocost is the set of all the levels of output x ∈ A that have the same cost. The set {c⁻¹(k) : k ∈ R} of all the isocosts is sometimes called the isocost map.

In sum, indifference curves, isoquants and isocosts are all examples of level curves, whose general properties they inherit. For example, the fact that two level curves have no points in common (property (6.7)) implies the analogous classic property of the indifference curves, as already noted, as well as the property that isoquants and isocosts never intersect.

6.3.2 Algebra of functions

Given any two sets A and B, we denote by B^A the set of all functions f : A → B.¹¹ In particular, R^A is the set of all real-valued functions f : A → R defined on a set A whatsoever. In R^A we can define in a natural way some operations that associate to two functions in R^A a new function still in R^A.

¹¹ Sometimes a different notation is used instead of B^A (the context should clarify).

Definition 197 Given any two functions f and g in R^A, the sum function f + g is the element of R^A such that

    (f + g)(x) = f(x) + g(x)   ∀x ∈ A

The sum function f + g : A → R is thus constructed by adding, for each element x of the domain A, the images f(x) and g(x) of x under the two functions.

Example 198 Let R^R be the set of all the functions f : R → R. Consider f(x) = x and g(x) = x². The sum function f + g is defined by (f + g)(x) = x + x². N

In a similar way we define:

(i) the difference function (f − g)(x) = f(x) − g(x) for every x ∈ A;

(ii) the product function (fg)(x) = f(x)g(x) for every x ∈ A;

(iii) the quotient function (f/g)(x) = f(x)/g(x) for every x ∈ A, provided g(x) ≠ 0.

We have thus introduced four operations in the set R^A, based on the four basic operations on the real numbers. It is easy to see that these operations inherit the properties of the basic operations. For example, addition is commutative, f + g = g + f, and associative, (f + g) + h = f + (g + h).
In particular, the negative function −f of f, defined by

    (−f)(x) = −f(x)   ∀x ∈ A

can be seen as the difference between the function constantly equal to 0 and f.

N.B. (i) These operations require the functions to have the same domain A. For example, if f(x) = x² and g(x) = √x, the sum f + g is meaningful only when A = [0, ∞) for both functions, that is, when f is restricted to the positive half-line. Indeed, for x < 0 the function g is not defined. (ii) The domain A can be any set: numbers, chairs, or other. Instead, it is key that the codomain is R because it is among real numbers that we are able to perform the four basic operations. O
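In a language with first-class functions, these pointwise operations can be sketched directly; a minimal Python illustration (ours):

    def fsum(f, g):
        # the sum function: (f + g)(x) = f(x) + g(x)
        return lambda x: f(x) + g(x)

    def fprod(f, g):
        # the product function: (fg)(x) = f(x) * g(x)
        return lambda x: f(x) * g(x)

    f = lambda x: x
    g = lambda x: x ** 2
    h = fsum(f, g)          # h(x) = x + x^2, as in Example 198
    print(h(3))             # 12
    print(fprod(f, g)(3))   # 27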

6.3.3 Composition

Consider two functions f : A → B and g : C → D, with Im f ⊆ C. Take any point x ∈ A. Since Im f ⊆ C, the image f(x) belongs to the domain C of the function g. We can apply the function g to the image f(x), obtaining in such a way the element g(f(x)) of D. Indeed, the function g has as its argument the image f(x) of x. Graphically:

[Figure: composition diagram x ↦ f(x) ↦ g(f(x)) across the sets A, Im f ⊆ C, and D]

We have, therefore, associated to each element x of the set A the element g(f(x)) of the set D. This rule, called composition, starts from the functions f and g and defines a new function from A to D, denoted by g ∘ f. Formally:

Definition 199 Let A, B, C and D be four sets and f : A → B and g : C → D two functions. If Im f ⊆ C, the composite (or compound) function g ∘ f : A → D is defined by

    (g ∘ f)(x) = g(f(x))   ∀x ∈ A

Note that the inclusion condition Im f ⊆ C is key in making the composition possible.
Let us give some examples.

Example 200 Let f, g : R → R be given by f(x) = x² and g(x) = x + 1. In this case A = B = C = D = R, so the inclusion condition is trivially satisfied. Consider g ∘ f. Given x ∈ R, we have f(x) = x². The function g therefore has x² as its argument, so

    g(f(x)) = g(x²) = x² + 1

Hence, the composite function g ∘ f : R → R is given by (g ∘ f)(x) = x² + 1.
Consider instead f ∘ g. Given x ∈ R, one has g(x) = x + 1. The function f therefore has x + 1 as its argument. Hence,

    f(g(x)) = f(x + 1) = (x + 1)²

The composite function f ∘ g : R → R is thus given by (f ∘ g)(x) = (x + 1)². N

Example 201 Consider f : R₊ → R given by f(x) = √x and g : R → R given by g(x) = x − 1. In this case B = C = D = R and A = R₊. The inclusion condition is satisfied for g ∘ f because Im f = R₊ ⊆ R, but not for f ∘ g because Im g = R is not included in R₊, which is the domain of f.
Consider g ∘ f. Given x ≥ 0, we have f(x) = √x. The function g therefore has √x as its argument, so

    g(f(x)) = g(√x) = √x − 1

The composite function g ∘ f : R₊ → R is given by (g ∘ f)(x) = √x − 1. N

Example 202 If in the previous example we consider g̃ : [1, ∞) → R given by g̃(x) = x − 1, the inclusion condition is satisfied for f ∘ g̃ because Im g̃ = [0, ∞) = R₊. In particular, f ∘ g̃ : [1, ∞) → R is given by √(x − 1). As we will see soon in Section 6.7, the function g̃ is the restriction of g to [1, ∞). N

Example 203 Let A be the set of all citizens of a country, f : A → R the function that to each of them associates his income for this year, and g : R → R the function that to each possible income associates the tax that must be paid. The composite function g ∘ f : A → R establishes the correspondence between each citizen and the tax that he has to pay. For the revenue service (and also for the citizens) such a composite function is of great interest. N

Example 204 Consider any function g : R₊ → R and the function f : R² → R given by f(x₁, x₂) = x₁² + x₂². The composite function g ∘ f : R² → R, given by (g ∘ f)(x) = g(x₁² + x₂²), takes on the same values on all circles centered at the origin. For instance, if g(x) = √x then (g ∘ f)(x) = √(x₁² + x₂²) is the norm of x. N
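Composition itself is a higher-order operation and can be sketched in one line of Python (our naming); note how the two orders of composition differ, as in Example 200:

    def compose(g, f):
        # (g o f)(x) = g(f(x)); meaningful when Im f lies in the domain of g
        return lambda x: g(f(x))

    f = lambda x: x ** 2     # f(x) = x^2
    g = lambda x: x + 1      # g(x) = x + 1
    print(compose(g, f)(3))  # g(f(3)) = 10
    print(compose(f, g)(3))  # f(g(3)) = 16: composition is not commutative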

6.4 Classes of functions

In this section we introduce some important classes of functions.

6.4.1 Injective, surjective, and bijective functions

Given any two sets A and B, a function f : A → B is called injective (or one-to-one) if

    x ≠ y ⟹ f(x) ≠ f(y)   ∀x, y ∈ A    (6.8)

To different elements of the domain, an injective f thus associates different elements of the codomain. Graphically:

[Figure: arrow diagram of an injective function from A to B]

Example 205 A simple example of an injective function is the cubic f(x) = x³. Indeed, two distinct scalars always have distinct cubes, so x ≠ y implies x³ ≠ y³ for all x, y ∈ R. A classic example of a non-injective function is the quadratic f(x) = x²: for instance, to the two distinct points 2 and −2 of R there corresponds the same square, that is, f(2) = f(−2) = 4. N

Note that (6.8) is equivalent to its contrapositive:

    f(x) = f(y) ⟹ x = y   ∀x, y ∈ A

which requires that two elements of the domain that have the same image be equal.¹²

Given any two sets A and B, a function f : A → B is called surjective (or onto) if

    Im f = B

that is, if for each element y of B there exists at least one element x of A such that f(x) = y. In other words, a function is surjective if each element of the codomain is the image of at least one point of the domain.

Example 206 The cubic function f : R → R given by f(x) = x³ is surjective because each y ∈ R is the image of y^(1/3) ∈ R, that is, f(y^(1/3)) = y. On the other hand, the quadratic function f : R → R given by f(x) = x² is not surjective, because no y < 0 is the image of a point of the domain. N

A function f : A → B can always be written as f : A → Im f, that is, it can be made surjective by taking B = Im f. For example, if we write the quadratic function as f : R → R₊, it becomes surjective. Therefore, by suitably choosing the codomain, each function becomes surjective. This, however, does not mean that surjectivity is a notion without interest: as we will see, the set B is often fixed a priori (for various reasons) and it is then important to distinguish the functions that have B as image, that is, the surjective ones, from those whose image is only contained in B.

Finally, given any two sets A and B, a function f : A → B is called bijective if it is both injective and surjective. In this case, we can go "back and forth" between the sets A and B by using f: from any x ∈ A we arrive at a unique y = f(x) ∈ B, while from any y ∈ B we go back to a unique x ∈ A such that y = f(x). Graphically:

[Figure: arrow diagram of a bijective function between A and B]

¹² Given two properties p and q, we have p ⟹ q if and only if ¬q ⟹ ¬p (¬ stands for "not"). The implication ¬q ⟹ ¬p is the contrapositive of the original implication p ⟹ q. See Appendix D.

For example, the cubic function f : R → R given by f(x) = x³ is bijective.

Through bijective functions we can establish a simple, but interesting, result about finite sets. Here |A| denotes the cardinality of a finite set A, that is, the number of its elements.

Proposition 207 Let A and B be any two non-empty finite sets. There exists a bijection f : A → B if and only if |A| = |B|.

As we will see in Chapter 7, paraphrasing a famous sentence of David Hilbert, we can say that this result is the door to the paradise of Cantor.

Proof Since A and B are finite, let A = {a₁, a₂, …, aₙ} and B = {b₁, b₂, …, bₘ} with m, n ≥ 1. "If". Let n = |A| = |B| = m. Then define the bijection f : A → B by f(aᵢ) = bᵢ for i = 1, 2, …, n. "Only if". Let f : A → B be a bijection. Since f is injective, the elements f(aᵢ) of B are distinct. Since f is surjective, B = Im f = {f(a₁), …, f(aₙ)}, and so the two sets have the same number of elements: |B| = |{f(a₁), …, f(aₙ)}| = n = |A|.

6.4.2 Inverse functions

Given any two sets A and B, let f : A → B be an injective function. Then, to each element y of the image Im f there corresponds a unique element x ∈ A such that f(x) = y. The function so determined, called the inverse function of f, therefore associates to each element of the image of f its unique preimage. Formally:

Definition 208 Let f : A → B be an injective function. The function f⁻¹ : Im f → A defined by f⁻¹(y) = x if and only if f(x) = y is called the inverse function of f.

We have both

    f⁻¹(f(x)) = x   ∀x ∈ A    (6.9)

and

    f(f⁻¹(y)) = y   ∀y ∈ Im f    (6.10)

Inverse functions go in the opposite direction to the original ones; they retrace their steps back to the domain: from x ∈ A we arrive at f(x) ∈ B, and we go back with f⁻¹(f(x)) = x. Graphically:

[Figure: f from A to B and f⁻¹ back from B to A]

It makes sense to talk about the inverse function only for injective functions, which are then called invertible. Indeed, if f were not injective, there would be at least two elements of the domain x₁ ≠ x₂ with the same image y = f(x₁) = f(x₂). So, the set of the preimages of y would not be a singleton (because it would contain at least the two elements x₁ and x₂) and the relation f⁻¹ would not be a function.
We actually have f⁻¹ : B → A when the function f is also surjective, and so bijective. In such a case the domain of the inverse is the entire codomain of f. In particular, when A = B the relations (6.9) and (6.10) can be summarized as

    f⁻¹ ∘ f = f ∘ f⁻¹

In this important case (think of A = B = Rⁿ) the function and its inverse properly commute.
Example 209 (i) Let f : R → R be the bijective function f(x) = x³. From y = x³ it follows that x = y^(1/3). The inverse f⁻¹ : R → R is given by f⁻¹(y) = y^(1/3), that is, because of the irrelevance of the label of the independent variable, f⁻¹(x) = x^(1/3).
(ii) Let f : R → R be the injective function f(x) = 3^x. From y = 3^x it follows that x = log₃ y. The inverse f⁻¹ : (0, ∞) → R is given by f⁻¹(y) = log₃ y, that is, f⁻¹(x) = log₃ x. N
Example 210 Let f : R → R be defined by

    f(x) = x/2   if x < 0
           3x    if x ≥ 0

From y = x/2 it follows x = 2y, while from y = 3x it follows x = y/3. Therefore,

    f⁻¹(y) = 2y    if y < 0
             y/3   if y ≥ 0

N

Example 211 Let f : R \ {0} → R be defined by f(x) = 1/x. From y = 1/x it follows that x = 1/y, so f⁻¹ : R \ {0} → R is given by f⁻¹(y) = 1/y. In this case f = f⁻¹. Note that R \ {0} is both the domain of f⁻¹ and the image of f. N

Example 212 The curious function f : R → R defined by

    f(x) = x    if x ∈ Q
           −x   if x ∉ Q

is bijective, so invertible, with f⁻¹ : R → R. Also in this case we have f = f⁻¹, as the reader can check. N

Example 213 On the open unit ball B₁(0) = {x ∈ Rⁿ : ‖x‖ < 1}, define the map f : B₁(0) → Rⁿ by

    f(x) = x / (1 − ‖x‖)

For instance, when n = 2 we have

    f(x₁, x₂) = ( x₁ / (1 − √(x₁² + x₂²)), x₂ / (1 − √(x₁² + x₂²)) )

This map is injective. For, suppose per contra that there exist x, z ∈ B₁(0), with x ≠ z, such that f(x) = f(z). Without loss of generality, let x ≠ 0. Then,

    x / (1 − ‖x‖) = z / (1 − ‖z‖) ⟹ x = λz

where λ = (1 − ‖x‖)/(1 − ‖z‖). Thus, x and z are collinear (Example 75). As x, z ∈ B₁(0), we actually have λ > 0. Thus, f(λz) = f(z) and so

    λz / (1 − λ‖z‖) = z / (1 − ‖z‖)

that is, λ(1 − ‖z‖)z = (1 − λ‖z‖)z. By taking the norm on both sides of this equality, we get

    λ(1 − ‖z‖)‖z‖ = (1 − λ‖z‖)‖z‖

As x ≠ 0, also z = x/λ ≠ 0, so ‖z‖ > 0; this implies λ = 1. Being x = λz, we thus reach the contradiction x = z. We conclude that f is injective. This map is also surjective. For, let y ∈ Rⁿ. Set x = y/(1 + ‖y‖). We have

    ‖x‖ = ‖y/(1 + ‖y‖)‖ = ‖y‖/(1 + ‖y‖) < 1

as well as

    f(x) = (y/(1 + ‖y‖)) / (1 − ‖y‖/(1 + ‖y‖)) = (y/(1 + ‖y‖)) / (1/(1 + ‖y‖)) = y

Thus, x ∈ B₁(0) and f(x) = y. We conclude that y ∈ Im f.

Summing up, f is a bijective function. Its inverse f⁻¹ : Rⁿ → B₁(0) is given by

    f⁻¹(y) = y / (1 + ‖y‖)

Indeed, for each x ∈ B₁(0) it holds that

    f⁻¹(f(x)) = f(x) / (1 + ‖f(x)‖) = (x/(1 − ‖x‖)) / (1 + ‖x‖/(1 − ‖x‖)) = (x/(1 − ‖x‖)) / (1/(1 − ‖x‖)) = x

as desired. N
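A numerical spot-check of the two formulas of this example, as a Python sketch (ours):

    import math

    def f(x):
        # f(x) = x / (1 - ||x||) on the open unit ball of R^2
        n = math.hypot(*x)
        return tuple(xi / (1 - n) for xi in x)

    def f_inv(y):
        # f^{-1}(y) = y / (1 + ||y||)
        n = math.hypot(*y)
        return tuple(yi / (1 + n) for yi in y)

    x = (0.3, -0.4)     # ||x|| = 0.5 < 1
    print(f(x))         # (0.6, -0.8)
    print(f_inv(f(x)))  # (0.3, -0.4), up to rounding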

Example 214 Let g : R → R be a strictly increasing function. For each m ≠ 0, define gₘ : R → R by gₘ(x) = g(mx). For instance, for the cubic function g(x) = x³ we have

    g₂(x) = 8x³   and   g_{1/4}(x) = x³/64

The function gₘ is strictly monotone (increasing if m > 0, decreasing if m < 0) and Im gₘ = Im g (why?). To find its inverse gₘ⁻¹ : Im g → R, note that, for all x ∈ R and y ∈ Im g,

    x = gₘ⁻¹(y) ⟺ gₘ(x) = y ⟺ g(mx) = y ⟺ mx = g⁻¹(y) ⟺ x = (1/m)g⁻¹(y)

Thus,

    gₘ⁻¹ = (1/m)g⁻¹    (6.11)

For instance, for the cubic function the inverses g₂⁻¹ : R → R and g_{1/4}⁻¹ : R → R are given by

    g₂⁻¹(x) = (1/2)x^(1/3)   and   g_{1/4}⁻¹(x) = 4x^(1/3)

Indeed,

    g₂⁻¹(g₂(x)) = (1/2)(8x³)^(1/3) = x   and   g_{1/4}⁻¹(g_{1/4}(x)) = 4(x³/64)^(1/3) = x

Formula (6.11) continues to hold when g is strictly monotone, be it increasing or decreasing, on the real line, as the reader can check. N
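Formula (6.11) can also be verified numerically; a small Python sketch for the cubic case (ours):

    import math

    def g_m(x, m):
        # g_m(x) = g(mx) for the cubic g(x) = x^3
        return (m * x) ** 3

    def g_m_inv(y, m):
        # formula (6.11): g_m^{-1}(y) = (1/m) g^{-1}(y), with g^{-1} the cube root
        cbrt = math.copysign(abs(y) ** (1 / 3), y)
        return cbrt / m

    print(round(g_m_inv(g_m(5.0, 2), 2), 10))        # 5.0
    print(round(g_m_inv(g_m(5.0, 0.25), 0.25), 10))  # 5.0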

The last example, which involves the composition g ∘ f with f(x) = mx, suggests a rule that connects composition and inversion. It is easy to see that, when it exists, the inverse (g ∘ f)⁻¹ of the composite function g ∘ f is

    f⁻¹ ∘ g⁻¹    (6.12)

That is, it is the composition of the inverse functions, with their places exchanged. Indeed, from y = g(f(x)) we get g⁻¹(y) = f(x) and finally (f⁻¹ ∘ g⁻¹)(y) = x. On the other hand, in dressing, we first put on the underpants, f, and then the pants, g; in undressing, we first take off the pants, g⁻¹, and then the underpants, f⁻¹.¹³

¹³ A caveat: formula (6.12) presupposes that both f⁻¹ and g⁻¹ exist. This is the case when Im f is equal to the domain C of g, as the reader can check.

Reflection Given a bijective function f : A → B, the graph of its inverse function f⁻¹ : B → A is the mirror image of the graph of the function f with respect to the 45-degree line:

[Figure: graphs of f and f⁻¹, symmetric about the 45-degree line]

Indeed, the reader can check that, for all (x, y) ∈ A × B, we have

    (x, y) ∈ Gr f ⟺ (y, x) ∈ Gr f⁻¹

Inverses and cryptography The computation of the cube x³ of any scalar x is much easier than the computation of the cube root ∛x: it is much easier to compute 80³ = 512,000 (two multiplications suffice) than ∛512,000 = 80. In other words, the computation of the cubic function f(x) = x³ is much easier than the computation of its inverse f⁻¹(x) = ∛x. This computational difference increases significantly as we take higher and higher odd powers (for example f(x) = x⁵, f(x) = x⁷ and so on).
Similarly, while the computation of eˣ is fairly easy, that of log x is much harder (before electronic calculators became available, logarithmic tables were used to aid such computations). From a computational viewpoint (in the theoretical world everything works smoothly), the inverse function f⁻¹ may be very difficult to deal with. Injective functions for which the computation of f is easy, while that of the inverse f⁻¹ is complex, are called one-way.¹⁴
For example, let A = {(p, q) ∈ P × P : p < q} and consider the function f : A ⊆ P × P → N defined by f(p, q) = pq, which associates to each pair of prime numbers p, q ∈ P, with p < q, their product pq. For example, f(2, 3) = 6 and f(11, 13) = 143. By the Fundamental Theorem of Arithmetic, it is an injective function.¹⁵ Given two prime numbers p and q, the computation of their product is a trivial multiplication. Instead, given any natural number n, it is quite complex, and may require a long time even for a powerful computer, to determine whether it is the product of two prime numbers. In this regard, the reader may recall the discussion of factorization and primality tests in Section 1.3.2 (to experience the difficulty first-hand, the reader may try to check whether the number 4343 is the product of two prime numbers). This makes the computation of the inverse function f⁻¹ very complex, as opposed to the very simple computation of f. For this reason, f intuitively qualifies as a one-way function.

¹⁴ The notions of "simple" and "complex", here used qualitatively, can be made more rigorous (as the curious reader may discover in cryptography texts).

Let us now look at a simple application of one-way functions to cryptography. Consider a user who handles sensitive data with an information system accessible by means of a password. Suppose the password is numerical and that, for the sake of simplicity, it is made up of any pair of natural numbers. The system has a specific data storage unit in which it saves the password chosen by the user. When the user inputs this password, the system verifies whether it coincides with the one stored in its memory.
This scheme has an obvious Achilles' heel: the system manager can access the data storage and reveal the password to any third party interested in accessing the user's personal data. One-way functions help to mitigate this problem. Indeed, let f : A ⊆ N × N → N be a one-way function that associates a natural number f(n, m) to any pair of natural numbers (n, m) ∈ A. Instead of memorizing the chosen password, say (n̄, m̄), the system now memorizes its image f(n̄, m̄). When the user inserts a password (n, m) the system computes f(n, m) and compares it with f(n̄, m̄). If f(n, m) = f(n̄, m̄), the password is correct, that is, (n, m) = (n̄, m̄), and the system allows the user to log in.
Since the function is one-way, the computation of f(n, m) is simple and requires a level of effort only slightly higher than that needed to compare passwords directly. The memory will no longer store the password (n̄, m̄), but its image f(n̄, m̄). This image will be the only piece of information that the manager will be able to access. Even if he (or the third party to whom he gives the information) knows the function f, the fact that the computation of the inverse f⁻¹ is very complex (and requires a good deal of time) makes it computationally, so practically, very difficult to recover the password (n̄, m̄) from the knowledge of f(n̄, m̄). But, without the knowledge of (n̄, m̄) it is impossible to access the sensitive data.
For example, if instead of any pair of natural numbers we require the password to consist of a pair (p, q) of prime numbers, we can use f(p, q) = pq as a one-way function. The manager has access to the product pq, for example the number 4343, and it will not be easy for him to recover the pair of prime numbers (p, q) that generated the product, and so the password, in a reasonably short amount of time.
To sum up, one-way functions make it possible to significantly strengthen the protection of restricted-access systems. The design of better and better one-way functions, which combine the ease of computation of f(x) with increasingly complex inverses f⁻¹(x), is an important field of research in cryptography.

¹⁵ But not surjective: for example, 4 ∉ Im f because there are no two different prime numbers whose product is 4.
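A toy version of this password scheme, as a Python sketch (names are ours; real systems use standardized cryptographic hash functions rather than the bare multiplication of two primes):

    def one_way(p, q):
        # the one-way function f(p, q) = pq on pairs of primes p < q
        return p * q

    stored = one_way(101, 103)  # the system stores only the image f(p, q) = 10403

    def login(p, q):
        # the system recomputes f on the entered pair and compares images
        return one_way(p, q) == stored

    print(login(101, 103))  # True: correct password
    print(login(101, 107))  # False: wrong password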

6.4.3 Bounded functions

Let f : A → R be a function with domain any set A and codomain the real line. We say that f is:

(i) bounded (from) above if its image Im f is a set bounded above in R, i.e., if there exists M ∈ R such that f(x) ≤ M for all x ∈ A;

(ii) bounded (from) below if its image Im f is a set bounded below in R, i.e., if there exists m ∈ R such that f(x) ≥ m for all x ∈ A;

(iii) bounded if it is bounded both above and below.

For example, the function f : R \ {0} → R given by

    f(x) = 1/|x|

is bounded below, but not above, since f(x) ≥ 0 for every 0 ≠ x ∈ R. Instead, the function f : R → R given by f(x) = −x² is bounded above, but not below, since f(x) ≤ 0 for every x ∈ R.
The next lemma establishes a simple, but useful, condition for boundedness.

Lemma 215 A function f : A → R is bounded if and only if there exists k > 0 such that

    |f(x)| ≤ k   ∀x ∈ A    (6.13)

Proof If f is bounded, there exist m, M ∈ R such that m ≤ f(x) ≤ M. Let k > 0 be such that −k ≤ m ≤ M ≤ k. Then (6.13) holds. Vice versa, suppose that (6.13) holds. By (4.5), which holds also for ≤, we have −k ≤ f(x) ≤ k, so f is bounded both above and below.

The function f : R → R defined by

    f(x) = 1    if x ≥ 1
           0    if 0 < x < 1    (6.14)
           −2   if x ≤ 0

is bounded since |f(x)| ≤ 2 for every x ∈ R.

Thus, we have a first taxonomy of the real-valued functions f : A → R, that is, of the elements of the space R^A.¹⁶ This taxonomy is not exhaustive: there exist functions that do not satisfy any of the conditions (i)-(iii). This is the case, for example, of the identity function f(x) = x. Such "unclassified" functions are called unbounded (their image being an unbounded set).

We denote by sup_{x∈A} f(x) the supremum of the image of a function f : A → R bounded above, that is,

    sup_{x∈A} f(x) = sup(Im f)

¹⁶ Note the use of the term "space" to denote a set of reference (in this case R^A).

By the definition of the supremum, for a scalar M we have f(x) ≤ M for all x ∈ A if and only if sup_{x∈A} f(x) ≤ M.
Similarly, we denote by inf_{x∈A} f(x) the infimum of the image of a function f : A → R bounded below, that is,

    inf_{x∈A} f(x) = inf(Im f)

By the definition of the infimum, for a scalar m we have f(x) ≥ m for all x ∈ A if and only if inf_{x∈A} f(x) ≥ m.
Clearly, a bounded function f : A → R has both extrema, with

    inf_{x∈A} f(x) ≤ f(x) ≤ sup_{x∈A} f(x)   ∀x ∈ A

In particular, for two scalars m and M we have m ≤ f(x) ≤ M for all x ∈ A if and only if m ≤ inf_{x∈A} f(x) ≤ sup_{x∈A} f(x) ≤ M.

Example 216 For the function (6.14) we have sup_{x∈R} f(x) = 1 and inf_{x∈R} f(x) = −2. For the function f : R \ {0} → R given by f(x) = 1/|x|, which is bounded below but not above, one has inf_{x∈R\{0}} f(x) = 0. N

6.4.4 Monotone functions

We now introduce monotone functions, an all-important class of real-valued functions f : A ⊆ Rⁿ → R defined in terms of the underlying order structure of Rⁿ. We begin by studying scalar functions and then turn to the multivariable case.

Definition 217 A function f : A ⊆ R → R is said to be:

(i) increasing if

    x > y ⟹ f(x) ≥ f(y)   ∀x, y ∈ A    (6.15)

and strictly increasing if

    x > y ⟹ f(x) > f(y)   ∀x, y ∈ A    (6.16)

(ii) decreasing if

    x > y ⟹ f(x) ≤ f(y)   ∀x, y ∈ A    (6.17)

and strictly decreasing if

    x > y ⟹ f(x) < f(y)   ∀x, y ∈ A

(iii) constant if there exists k ∈ R such that

    f(x) = k   ∀x ∈ A

Note that a function is constant if and only if it is both increasing and decreasing. In other words, constancy is equivalent to having both monotonicity properties. This is why we have introduced constancy among the forms of monotonicity. Soon we will see that in the multivariable case the relation between constancy and monotonicity is more subtle.

Increasing or decreasing functions are called, generically, monotone (or monotonic). They are called strictly monotone when they are either strictly increasing or strictly decreasing (two mutually exclusive properties: no function can be both strictly increasing and strictly decreasing). The next result shows that strict monotonicity excludes the possibility that the function is constant on some region of its domain. Formally:

Proposition 218 An increasing function f : A ⊆ R → R is strictly increasing if and only if

    f(x) = f(y) ⟹ x = y   ∀x, y ∈ A    (6.18)

that is, if and only if it is injective.

A similar result holds for strictly decreasing functions since f is, clearly, strictly decreasing if and only if −f is strictly increasing. Strictly monotone functions are therefore injective, and so invertible.¹⁷

Proof "Only if". Let f be strictly increasing and let f(x) = f(y). Suppose, by contradiction, that x ≠ y, say x > y. By (6.16), we have f(x) ≠ f(y), which contradicts f(x) = f(y). It follows that x = y, as desired.
"If". Suppose that (6.18) holds. Let f be increasing. We prove that it is also strictly increasing. Let x > y. By increasing monotonicity, we have f(x) ≥ f(y), but we cannot have f(x) = f(y) because (6.18) would imply x = y. Thus f(x) > f(y), as claimed.

Example 219 The functions f : R → R given by f(x) = x and f(x) = x³ are strictly increasing, while the function

    f(x) = x   if x ≥ 0
           0   if x < 0

is increasing, but not strictly increasing, because it is constant for every x < 0. The same is true for the function

    f(x) = x − 1   if x ≥ 1
           0       if −1 < x < 1    (6.19)
           x + 1   if x ≤ −1

because it is constant on [−1, 1]. N

Note that in (6.15) we can replace x > y by x ≥ y without any consequence because we have f(x) = f(y) if x = y. Hence, increasing monotonicity is equivalently stated as

    x ≥ y ⟹ f(x) ≥ f(y)    (6.20)

¹⁷ Later in the book we will see a partial converse of this result (Proposition 576).

The converse implication

    f(x) ≥ f(y) ⟹ x ≥ y    (6.21)

is the contrapositive of

    y > x ⟹ f(y) > f(x)    (6.22)

because "y > x" and "f(y) > f(x)" are the negations of "x ≥ y" and "f(x) ≥ f(y)", respectively. So, conditions (6.21) and (6.22) are equivalent. But, up to an immaterial interchange of the roles of x and y, the latter condition amounts to f being strictly increasing. We have thus proved the following result.

Proposition 220 A function f : A ⊆ R → R is strictly increasing if and only if

    f(x) ≥ f(y) ⟹ x ≥ y   ∀x, y ∈ A

In words, a function is strictly increasing if and only if, to larger values of the image, there correspond larger values of the argument. Next we report a simple variation on this theme, with ⟺ in place of ⟹, that plays an important role in the ordinalist approach to utility theory, as we will see later in this section.

Proposition 221 A function f : A ⊆ R → R is strictly increasing if and only if

    x ≥ y ⟺ f(x) ≥ f(y)   ∀x, y ∈ A    (6.23)

To see why we can replace ⟹ with ⟺ it is enough to observe that, for a strictly increasing (so, in particular, increasing) function f, we have

    x ≥ y ⟹ f(x) ≥ f(y)   ∀x, y ∈ A

Summing up, Propositions 218-221 show that, for a function f : A ⊆ R → R, the following conditions are equivalent:

(i) f is strictly increasing;

(ii) f is increasing and injective;

(iii) f satisfies condition (6.21);

(iv) f satisfies condition (6.23).

A dual version holds, of course, for strictly decreasing functions. That said, we close with a noteworthy mirror effect on monotonicity (see the second figure of Section 6.4.2).

Proposition 222 A scalar function is strictly increasing (decreasing) if and only if its inverse is strictly increasing (decreasing).

Proof We prove the "only if" as the converse is similarly established. Let f : A ⊆ R → R be a strictly increasing scalar function, with inverse f⁻¹ : Im f → R. Let z₁, z₂ ∈ Im f with z₂ > z₁. We want to show that f⁻¹(z₂) > f⁻¹(z₁). By definition, there exist x₁, x₂ ∈ A such that x₁ = f⁻¹(z₁) and x₂ = f⁻¹(z₂). Suppose, by contradiction, that x₁ ≥ x₂. Then, since f is strictly increasing, z₁ = f(x₁) ≥ f(x₂) = z₂, a contradiction. We conclude that f⁻¹(z₂) > f⁻¹(z₁).
A dual argument shows that if f is strictly decreasing, so is its inverse, as the reader can check.

The monotonicity notions seen in the case n = 1 generalize in a natural way to the case of arbitrary n, though some subtle issues arise because of the two peculiarities of the case n ≥ 2, that is, the incompleteness of ≥ and the presence of the two inequality notions > and ≫. Basic monotonicity is easily generalized: a function f : A ⊆ Rⁿ → R is said to be:

(i) increasing if

    x ≥ y ⟹ f(x) ≥ f(y)   ∀x, y ∈ A    (6.24)

(ii) decreasing if

    x ≥ y ⟹ f(x) ≤ f(y)   ∀x, y ∈ A

(iii) constant if there exists k ∈ R such that

    f(x) = k   ∀x ∈ A

This notion of increasing and decreasing function has bite only on vectors x and y that can be compared, while vectors x and y that cannot be compared, such as for example (1, 2) and (2, 1) in R², are ignored. As a result, while constant functions are both increasing and decreasing, the converse is no longer true when n ≥ 2, as the next example shows.

Example 223 Let A = {a, a′, b, b′} be a subset of the plane with four elements. Assume that a′ ≥ a and b′ ≥ b are the only comparisons that can be made in A. For instance, a = (−1, 0), a′ = (0, 1), b = (1, −1/2) and b′ = (2, −1/2). The function f : A ⊆ R² → R defined by f(a) = f(a′) = 0 and f(b) = f(b′) = 1 is both increasing and decreasing, but it is not constant. N

Fortunately, for Cartesian domains the converse holds; next we consider the basic rectangular case, leaving to the reader more general Cartesian domains.

Proposition 224 Let A = ∏ᵢ₌₁ⁿ [aᵢ, bᵢ] be a rectangle in Rⁿ. A function f : A → R is constant if and only if it is both decreasing and increasing.

Proof We prove the "if" as the converse is simple. Define x and y to be the vectors such that xᵢ = bᵢ and yᵢ = aᵢ for all i = 1, …, n. Set k = f(x). Clearly, we have x ≥ z ≥ y for all z ∈ A. Since f is increasing, this implies that k = f(x) ≥ f(z) ≥ f(y) for all z ∈ A and, in particular, f(y) ≤ f(z) ≤ k for all z ∈ A. Since f is decreasing, we can also conclude that k = f(x) ≤ f(z) ≤ f(y) ≤ k for all z ∈ A. Combining these two facts, we obtain that f(z) = k for all z ∈ A.

More delicate is the generalization to Rⁿ of strict monotonicity, because of the two distinct inequalities > and ≫. We say that a function f : A ⊆ Rⁿ → R is:

(iv) strictly increasing if

    x > y ⟹ f(x) > f(y)   ∀x, y ∈ A

(v) strongly increasing if it is increasing and

    x ≫ y ⟹ f(x) > f(y)   ∀x, y ∈ A    (6.25)

We have a simple hierarchy among these notions:

Proposition 225 Let f : A ⊆ Rⁿ → R. We have:

    strictly increasing ⟹ strongly increasing ⟹ increasing    (6.26)

They are, therefore, increasingly stringent notions of monotonicity. In applications we have to choose the most relevant form for the problem at hand.

Proof A strongly increasing function is, by definition, increasing. It remains to prove that strictly increasing implies strongly increasing. Thus, let f be strictly increasing. We need to prove that f is increasing and satisfies (6.25). If x ≥ y, we have x = y or x > y. In the first case f(x) = f(y). In the second case f(x) > f(y), so f(x) ≥ f(y). Thus, f is increasing. Moreover, if x ≫ y then a fortiori x > y, and therefore f(x) > f(y). We conclude that f is strongly increasing.

The converses of the previous implications do not hold. An increasing function that, like (6.19), has constant parts is an example of an increasing, but not strongly increasing, function (so, not strictly increasing either).¹⁸ Therefore,

    increasing ⇏ strongly increasing

Moreover, the next example shows that there exist functions that are strongly but not strictly increasing, so

    strongly increasing ⇏ strictly increasing

Example 226 The Leontief function f : R² → R given by

    f(x) = min{x₁, x₂}

is strongly increasing, but not strictly increasing. For example, x = (1, 2) > y = (1, 1) yet f(x) = f(y) = 1. N
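A quick check of the two claims about the Leontief function (a sketch):

    f = lambda x1, x2: min(x1, x2)

    # x = (1, 2) > y = (1, 1), yet f(x) = f(y): f is not strictly increasing
    print(f(1, 2), f(1, 1))   # 1 1

    # if x >> y (all components strictly larger), the minimum strictly increases
    print(f(2, 3) > f(1, 1))  # True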

In defining a strongly increasing function f : A ⊆ Rⁿ → R, in item (v) above, we required f to be increasing. This requirement becomes superfluous for continuous functions, under a mild comparability condition on the domain.¹⁹

¹⁸ By the contrapositive of (6.26), functions that are not strongly increasing are not strictly increasing either.
¹⁹ The all-important continuous functions will be introduced later in the book (so this result can be skipped at a first reading).

Proposition 227 Let A be a subset of Rⁿ containing two vectors x′ and x′′ such that x′ ≫ x′′. A continuous function f : A → R is strongly increasing if and only if (6.25) holds, i.e.,

    x ≫ y ⟹ f(x) > f(y)   ∀x, y ∈ A

Proof We prove the "if" part as the converse is trivially true. So, assume that (6.25) holds. To show that f is strongly increasing it then suffices to show that f is increasing. Let x ≥ y. We want to show that f(x) ≥ f(y). If x = y, we clearly have f(x) = f(y). So, let x > y. By hypothesis, there exist x′ and x′′ in A such that x′ ≫ x′′. For each n ≥ 1, we then have

    (1/n)x′ + (1 − 1/n)x ≫ (1/n)x′′ + (1 − 1/n)y

By (6.25),

    f((1/n)x′ + (1 − 1/n)x) > f((1/n)x′′ + (1 − 1/n)y)

Hence, by the continuity of f we have

    f(x) = lim_{n→∞} f((1/n)x′ + (1 − 1/n)x) ≥ lim_{n→∞} f((1/n)x′′ + (1 − 1/n)y) = f(y)

We conclude that f(x) ≥ f(y), as desired.

Dual decreasing notions can be introduced in the obvious way. In particular, increasing or decreasing functions are called, generically, monotone, while they are called strictly (strongly) monotone when they are either strictly (strongly) increasing or strictly (strongly) decreasing.

The nice characterizations of scalar strict monotonicity established in Propositions 218-220 altogether fail in the multivariable case. Observe that, when n ≥ 2, strictly monotone functions might well not be injective: the function f : ℝ²₊₊ → ℝ defined by

    f(x₁, x₂) = √(x₁x₂)

is a simple instance of a strictly increasing function which is not injective.

In the multivariable case, strict monotonicity and injectivity thus become independent properties.20 Yet, as in the scalar case, they are both implied by condition (6.21).
Proposition 228 A function f : A ⊆ ℝⁿ → ℝ is injective and strictly increasing if

    f(x) ≥ f(y)  ⟹  x ≥ y   ∀x, y ∈ A                    (6.27)

Proof By interchanging the roles of x and y, from (6.27) it follows that

    f(x) = f(y)  ⟹  x = y   ∀x, y ∈ A

The contrapositive is

    x ≠ y  ⟹  f(x) ≠ f(y)   ∀x, y ∈ A

Thus, f is injective. Let x > y. We want to show that f(x) > f(y). Suppose, per contra, that f(x) ≤ f(y). By (6.27), this implies x ≤ y, a contradiction. We conclude that f is strictly increasing.

The converse of this result fails, as the next example shows.


20 It is because of this independence that the study of indifference curves in utility analysis is meaningful.

Example 229 In the plane, the segment

    B = {(b, 1 − b) : b ∈ (0, 1)}

consists of pairwise non-comparable points. If we add the origin to this segment, we get the set

    A = {(0, 0)} ∪ B                                       (6.28)

Define f : A → ℝ by

    f(x₁, x₂) = 0 if x₁ = x₂ = 0, and f(x₁, x₂) = x₁ otherwise

This function is easily seen to be strictly increasing and injective. We have

    f(3/4, 1/4) = 3/4 > f(1/4, 3/4) = 1/4

but not (3/4, 1/4) ≥ (1/4, 3/4), because these two vectors are not comparable. Thus, f does not satisfy condition (6.27). Note that the inverse f⁻¹ : [0, 1) → A, defined by f⁻¹(0) = (0, 0) and f⁻¹(x) = (x, 1 − x) for all x ∈ (0, 1), is not monotone (why?). So, the mirror property established in Proposition 222 for scalar functions fails in the multivariable case.    N
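The example is also easy to check numerically. The brief sketch below (our code; the comparability helper is ours) verifies the pair violating condition (6.27) and exhibits the non-comparable preimages that make f⁻¹ non-monotone:

```python
# Sketch (our illustration) of Example 229: f is strictly increasing and
# injective on A = {(0,0)} U {(b, 1-b) : 0 < b < 1}, yet f does not satisfy
# (6.27) and its inverse is not monotone.

def f(x):
    return 0.0 if x == (0.0, 0.0) else x[0]

def f_inv(t):
    return (0.0, 0.0) if t == 0.0 else (t, 1.0 - t)

def geq(x, y):                            # componentwise x >= y
    return all(a >= b for a, b in zip(x, y))

x, y = (0.75, 0.25), (0.25, 0.75)
assert f(x) > f(y)                        # f(x) = 3/4 > 1/4 = f(y) ...
assert not geq(x, y) and not geq(y, x)    # ... yet x, y are not comparable
assert f_inv(f(x)) == x and f_inv(f(y)) == y   # the preimages of 3/4 and 1/4
```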

For an operator

    f : A ⊆ ℝⁿ → ℝᵐ

the notions of monotonicity can be defined in the, by now, obvious way. For instance, this operator is strictly increasing if

    x > y  ⟹  f(x) > f(y)   ∀x, y ∈ A

Yet, when m > 1 the notions of monotonicity studied for the case m = 1 have less bite, as they confront non-comparabilities also on the codomain: the images f(x) and f(y) may happen not to be comparable, that is, neither f(x) ≥ f(y) nor f(y) ≥ f(x) may hold. For example, if f : ℝ² → ℝ² is such that f(0, 1) = (1, 2) and f(3, 4) = (2, 1), the images (1, 2) and (2, 1) are not comparable. Yet, an interesting result can be proved about the monotonicity of operators.

Theorem 230 (Collatz) An operator f : A ⊆ ℝⁿ → ℝᵐ is injective and has a strictly increasing inverse f⁻¹ : Im f → ℝⁿ if and only if

    f(x) ≥ f(y)  ⟹  x ≥ y   ∀x, y ∈ A                    (6.29)

Proof "If". Assume that condition (6.29) holds. It is easy to check that f is injective (see the proof of Proposition 228). It remains to prove that f⁻¹ is strictly increasing. Let x, y ∈ Im f with x > y. We want to show that f⁻¹(x) > f⁻¹(y). Let x̃, ỹ ∈ A be such that f(x̃) = x and f(ỹ) = y. As f is injective, x̃ ≠ ỹ. By (6.29),

    x ≥ y  ⟺  f(x̃) ≥ f(ỹ)  ⟹  f⁻¹(x) = x̃ ≥ ỹ = f⁻¹(y)

As x̃ ≠ ỹ, we conclude that f⁻¹(x) > f⁻¹(y).

"Only if". Assume that f is injective and has a strictly increasing inverse f⁻¹. Let x, y ∈ A be such that f(x) ≥ f(y). If f(x) = f(y), then x = y by the injectivity of f. So, let f(x) > f(y). As f⁻¹ is strictly increasing, x = f⁻¹(f(x)) > f⁻¹(f(y)) = y. Thus, (6.29) holds.

This result is the correct multivariable version of Proposition 220, which is the special case when n = m = 1. Indeed, in the scalar case f is strictly increasing if and only if its inverse f⁻¹ is strictly increasing (Proposition 222). This mirror property fails, in general, for injective operators, as Example 229 showed. In particular, that example shows that the verbatim multivariable version of Proposition 220, involving the strict monotonicity of f, fails. The real protagonist is the strict monotonicity of the inverse: a silent companion in the scalar case that takes center stage in the general case.

A dual version of Collatz's Theorem says that an operator f : A ⊆ ℝⁿ → ℝᵐ is injective and strictly increasing if and only if its inverse f⁻¹ : Im f → ℝⁿ satisfies condition (6.29), i.e.,

    f⁻¹(x) ≥ f⁻¹(y)  ⟹  x ≥ y   ∀x, y ∈ Im f            (6.30)
Summing up, for an operator f : A ⊆ ℝⁿ → ℝᵐ consider the following properties:

(i) f is injective and strictly increasing;

(ii) f⁻¹ satisfies condition (6.30);

(iii) f is injective and f⁻¹ is strictly increasing;

(iv) f satisfies condition (6.29).

By Collatz's Theorem,

    (i) ⟺ (ii)   and   (iii) ⟺ (iv)

By Proposition 228 and by Example 229, when m = 1 we have

    (iii) ⟹ (i)   but   (i) ⇏ (iii)

Next we give a further example of this failure, with m = n = 2.

Example 231 Let A be the set (6.28) and let D = {(d, d) : d ∈ [0, 1)}. Define f : A → ℝ² by

    f(x₁, x₂) = (x₁, x₁)

For instance, f(1/4, 3/4) = (1/4, 1/4). This function is easily seen to be strictly increasing and injective. Its image is the "diagonal" set D and its inverse f⁻¹ : D → A is given by f⁻¹(0, 0) = (0, 0) and f⁻¹(d, d) = (d, 1 − d) for all d ∈ (0, 1). For instance, f⁻¹(3/4, 3/4) = (3/4, 1/4). Let x = (3/4, 3/4) and y = (1/4, 1/4) be two points of D. We have x > y, but their preimages

    f⁻¹(x) = (3/4, 1/4)   and   f⁻¹(y) = (1/4, 3/4)

are not comparable. Thus, f⁻¹ is not monotone.    N



Collatz's Theorem is especially interesting when m = n, that is, for operators f : A ⊆ ℝⁿ → ℝⁿ. However, for this class of operators there is an alternative notion of monotonicity, which will be studied later in the book (Section 31.2.2).

The monotonicity notions so far introduced play a key role in utility theory. Specifically, let u : A → ℝ be a utility function defined on a set A ⊆ ℝⁿ₊ of bundles of goods. A transformation f ∘ u : A → ℝ of u, where f : Im u ⊆ ℝ → ℝ, defines a mathematically different but conceptually equivalent utility function provided

    u(x) ≥ u(y)  ⟺  (f ∘ u)(x) ≥ (f ∘ u)(y)   ∀x, y ∈ A   (6.31)

Indeed, under this condition the function f ∘ u orders the bundles in the same way as the original utility function u, that is,

    x ≿ y  ⟺  (f ∘ u)(x) ≥ (f ∘ u)(y)   ∀x, y ∈ A

The utility functions u and f ∘ u are thus equivalent because they represent the same underlying preference ≿.

By Proposition 221, the function f satisfies (6.31) if and only if it is strictly increasing. Therefore, f ∘ u is an equivalent utility function if and only if f is strictly increasing. To describe such a fundamental invariance property of utility functions, we say that they are ordinal, that is, unique up to monotone (strictly increasing) transformations. This property lies at the heart of the ordinalist approach, in which utility functions are regarded as mere numerical representations of the underlying preference ≿, which is the fundamental notion (recall the discussion in Section 6.2.1).

Example 232 Consider the Cobb-Douglas utility function on ℝⁿ₊₊ given by

    u(x₁, x₂, …, xₙ) = ∏_{i=1}^n xᵢ^{aᵢ}                   (6.32)

with each aᵢ > 0 and ∑_{i=1}^n aᵢ = 1. Taking f(x) = log x, its monotone transformation

    f ∘ u = ∑_{i=1}^n aᵢ log xᵢ

is a utility function equivalent to u on ℝⁿ₊₊. It is the logarithmic version of the Cobb-Douglas function, often called the log-linear utility function.21    N
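As a quick sanity check of this ordinal equivalence, the following sketch (our code; the weights and bundles are arbitrary choices) verifies numerically that u and its logarithmic transformation rank random positive bundles identically, as (6.31) requires:

```python
# Sketch (our illustration): the Cobb-Douglas utility and its log-linear
# version induce the same ranking of strictly positive bundles.
import math, random

a = (0.5, 0.3, 0.2)                               # arbitrary weights, sum 1

def u(x):                                         # Cobb-Douglas (6.32)
    return math.prod(xi ** ai for xi, ai in zip(x, a))

def log_u(x):                                     # log-linear version
    return sum(ai * math.log(xi) for xi, ai in zip(x, a))

random.seed(0)
for _ in range(1000):
    x = tuple(random.uniform(0.1, 10) for _ in a)
    y = tuple(random.uniform(0.1, 10) for _ in a)
    assert (u(x) >= u(y)) == (log_u(x) >= log_u(y))   # same order, as in (6.31)
```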

The three notions of monotonicity on ℝⁿ (increasing, strongly increasing, and strictly increasing) are key for utility functions u : A → ℝ. Since their argument x ∈ A is a bundle of "goods", it is natural to assume that the consumer prefers vectors with larger amounts of the different goods, that is, "the more, the better". According to how we state this motto, one of the three forms of monotonicity becomes the appropriate one.

21 Recall that, even if mathematically it can be defined on the entire positive orthant ℝⁿ₊, from the economic viewpoint it is on ℝⁿ₊₊ that the Cobb-Douglas utility function is interesting (cf. Example 233). The fact that the log-linear utility function can only be defined on ℝⁿ₊₊ is a further sign that this is, indeed, the proper economic domain of the Cobb-Douglas utility function.

If in a vector x ∈ A each component, i.e., each type of good, is deemed important by the consumer, it is natural to assume that u is strictly increasing:

    x > y  ⟹  u(x) > u(y)   ∀x, y ∈ A

In this case it is sufficient to increase the amount of any one of the goods to attain a greater utility: "the more of any good is always better".

If, instead, we want to contemplate the possibility that some goods may actually be useless for the consumer, we only require u to be increasing:

    x ≥ y  ⟹  u(x) ≥ u(y)   ∀x, y ∈ A                    (6.33)

Indeed, if a good in the bundles is "useless" for the consumer (as wine is for a teetotaler, or for a drunk one who has already had too much of it), the inequality x > y might be caused by a larger amount of such a good, with all other goods unchanged. It is then reasonable that u(x) = u(y), because the consumer does not get any benefit in passing from y to x. In this case "the more of any good can be better or indifferent".

Finally, the "the more of any good is always better" motto that motivates strict monotonicity can be weakened in the sense of strong monotonicity by assuming "the more of all the goods is always better", that is,

    x ≫ y  ⟹  u(x) > u(y)   ∀x, y ∈ A

In this case, there is an increase in utility only when the amounts of all goods increase; it is no longer enough to increase the amount of only some good. Strong monotonicity may reflect a form of complementarity among goods, so that an increase in the amounts of only some of them can be irrelevant for the consumer if the quantities of the other goods remain unchanged. Perfect complementarity a la Leontief is the extreme case, a classic example being pairs of shoes, right and left: it is useless to increase the number of right shoes without increasing, in the same quantity, that of left shoes (and vice versa).
Example 233 (i) The Cobb-Douglas utility function on ℝⁿ₊₊ given by (6.32) is strictly increasing. By (6.26), it is also strongly increasing.
(ii) The Leontief utility function on ℝⁿ₊₊ given by

    u(x₁, x₂, …, xₙ) = min_{i=1,…,n} xᵢ

in which the goods are perfect complements, is strongly increasing. As we saw in Example 226, it is not strictly increasing.
(iii) The reader can check which monotonicity properties hold if we consider the two previous utility functions on the entire positive orthant ℝⁿ₊ rather than just on ℝⁿ₊₊.    N
Consumers with strictly or strongly monotone utility functions are "insatiable" because, by suitably increasing their bundles, their utility also increases. This property of utility functions is sometimes called insatiability, and it is thus shared by both strict and strong monotonicity. The only form of monotonicity compatible with satiety is increasing monotonicity (6.33): as observed for the drunk consumer, this weaker form of monotonicity allows for the possibility that a given good, when it exceeds a certain level, does not result in a further increase of utility. However, it cannot happen that utility decreases: if (6.33) holds, utility either increases or remains constant, but it never (strictly or strongly) decreases. Therefore, if an extra glass of wine results in a decrease of the drunk's utility, this cannot be modelled by any form of increasing monotonicity, no matter how weak.

6.4.5 Concave and convex functions: a preview

The class of concave and convex functions is of great importance in economics. The concept,
which will be fully developed in Chapter 17, is anticipated here in the scalar case.

Definition 234 A function f : I → ℝ, defined on an interval I of ℝ, is said to be concave if

    f(λx + (1 − λ)y) ≥ λf(x) + (1 − λ)f(y)

for every x, y ∈ I and every λ ∈ [0, 1], while it is said to be convex if

    f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y)

for every x, y ∈ I and every λ ∈ [0, 1].

Geometrically, a function is concave if the segment (called chord) that joins any two points (x, f(x)) and (y, f(y)) of its graph lies below the graph of the function, while it is convex if the opposite happens, that is, if such a chord lies above the graph of the function.

[Figure: a chord lying below the graph of a concave function and above the graph of a convex function.]

Note that the domain of concave and convex functions is an interval, so the points λx + (1 − λ)y belong to it and the expression f(λx + (1 − λ)y) is meaningful.

Example 235 The functions f, g : ℝ → ℝ defined by f(x) = x² and g(x) = eˣ are convex, while the function f : ℝ₊₊ → ℝ defined by f(x) = log x is concave. The function f : ℝ → ℝ given by f(x) = x³ is neither concave nor convex. All this can be checked analytically through the last definition, but it is best seen graphically:

[Figures: the convex functions x² and eˣ, the concave function log x, and the non-concave, non-convex function x³.]    N
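A computational double-check is also possible: the sketch below (our code) tests the defining chord inequalities of Definition 234 on a grid of points and weights, confirming the classifications just stated.

```python
# Sketch (our illustration): numerically test the chord inequalities of
# Definition 234 for x**2, exp, log and x**3.
import math

def is_convex_on(f, points, weights):
    return all(f(l*x + (1-l)*y) <= l*f(x) + (1-l)*f(y) + 1e-9
               for x in points for y in points for l in weights)

def is_concave_on(f, points, weights):
    return all(f(l*x + (1-l)*y) >= l*f(x) + (1-l)*f(y) - 1e-9
               for x in points for y in points for l in weights)

ws = [i / 10 for i in range(11)]                  # weights in [0, 1]
reals = [-3 + 0.5 * i for i in range(13)]         # grid on [-3, 3]
pos = [0.1 + 0.3 * i for i in range(10)]          # grid in (0, 3)

assert is_convex_on(lambda t: t ** 2, reals, ws)
assert is_convex_on(math.exp, reals, ws)
assert is_concave_on(math.log, pos, ws)
cube = lambda t: t ** 3
assert not is_convex_on(cube, reals, ws) and not is_concave_on(cube, reals, ws)
```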

6.4.6 Separable functions


In economics an important role is played by functions of several variables that are sums of scalar functions.

Definition 236 Let A = ×_{i=1}^n Aᵢ ⊆ ℝⁿ. A function f : A → ℝ is said to be separable if there exist n scalar functions gᵢ : Aᵢ ⊆ ℝ → ℝ such that

    f(x) = ∑_{i=1}^n gᵢ(xᵢ)   ∀x = (x₁, …, xₙ) ∈ A

The importance of this class of functions is due to their great tractability. The simplest example is f(x) = ∑_{i=1}^n xᵢ, for which the functions gᵢ are the identity, i.e., gᵢ(x) = x for each i. Let us give some more examples.

Example 237 The function f : ℝ² → ℝ defined by

    f(x) = x₁² + 4x₂   ∀x = (x₁, x₂) ∈ ℝ²

is separable with g₁(x₁) = x₁² and g₂(x₂) = 4x₂.    N

Example 238 The function f : ℝ × ℝ₊₊ → ℝ defined by

    f(x) = x₁ + log x₂   ∀x = (x₁, x₂) ∈ ℝ × ℝ₊₊

is separable with g₁(x₁) = x₁ and g₂(x₂) = log x₂.    N

Example 239 The function f : ℝⁿ₊₊ → ℝ, called entropy, defined by

    f(x) = −∑_{i=1}^n xᵢ log xᵢ   ∀x = (x₁, …, xₙ) ∈ ℝⁿ₊₊

is separable with gᵢ(xᵢ) = −xᵢ log xᵢ.    N

Example 240 The intertemporal utility function (6.6), that is,

    U(x) = ∑_{t=1}^T β^{t−1} u(xₜ)

is separable with gₜ(xₜ) = β^{t−1} u(xₜ) for each t.


Separable utility functions are important in the static case as well. The utility functions used by the first marginalists were indeed of the form

    u(x) = ∑_{i=1}^n uᵢ(xᵢ)                                (6.34)

In other words, they assumed that the utility of a bundle x is decomposable into the utility of the quantities xᵢ of the various goods that compose it. It is a restrictive assumption that ignores any possible interdependence, for example of complementarity or substitutability, among the different goods of a bundle. Due to its remarkable tractability, however, (6.34) remained for a long time the standard form of utility functions until, at the end of the nineteenth century, the works of Edgeworth and Pareto showed how to develop consumer theory for utility functions that are not necessarily separable.    N

Example 241 If in (6.34) we set uᵢ(xᵢ) = xᵢ for all i, we obtain the important special case

    u(x) = ∑_{i=1}^n xᵢ

where the goods are perfect substitutes. The utility of a bundle x depends only on the sum of the amounts of the different goods, regardless of the specific amounts of the individual goods. For example, think of x as a bundle of different types of oranges, which differ in origin and taste, but are identical in terms of nutritional values. In this case, if the consumer only cares about such values, then these different types of oranges are perfect substitutes. This case is opposite to that of the perfect complementarity that characterizes the Leontief utility function.

More generally, if in (6.34) we set uᵢ(xᵢ) = αᵢxᵢ for all i, with αᵢ > 0, we have

    u(x) = ∑_{i=1}^n αᵢxᵢ

In this case, the goods in the bundle are no longer perfect substitutes; rather, their relevance depends on their weights αᵢ. Therefore, to keep utility constant each good can be replaced with another according to a linear trade-off. Intuitively, one unit of good i is equivalent to αᵢ/αⱼ units of good j. The notion of marginal rate of substitution formalizes this idea (Section 34.3.2).    N

Example 242 The log-linear utility function

    log u(x) = ∑_{i=1}^n aᵢ log xᵢ

studied in Example 232 is separable. It is the logarithmic transformation of the Cobb-Douglas utility function, which is not separable. Thus, sometimes it is possible to obtain separable versions of utility functions via their strictly monotone transformations. Usually, the separable versions are the most convenient from the analytical point of view; the log-linear utility is, indeed, more tractable than the Cobb-Douglas (6.32).    N

6.5 Elementary functions on ℝ

This section introduces the so-called "elementary" functions, which include most of the scalar functions of interest in applications.

6.5.1 Polynomial functions


A polynomial function, or polynomial, f : ℝ → ℝ of degree n ≥ 0 is defined by

    f(x) = a₀ + a₁x + ⋯ + aₙxⁿ

with aᵢ ∈ ℝ for every 0 ≤ i ≤ n and aₙ ≠ 0. The two coefficients a₀ and aₙ are called the constant and leading coefficients, respectively. The leading coefficient determines the degree of the polynomial, while the constant coefficient is equal to f(0), i.e., to the value of the function at the origin.

Let Pₙ be the set of all polynomials of degree lower than or equal to n. Clearly,

    P₀ ⊆ P₁ ⊆ P₂ ⊆ ⋯ ⊆ Pₙ ⊆ ⋯

Example 243 (i) We have f(x) = x + x² ∈ P₂ and f(x) = 3x − 10x⁴ ∈ P₄. (ii) A polynomial f has degree zero when there exists a ∈ ℝ such that f(x) = a for every x. Constant functions can, therefore, be regarded as polynomials of degree zero.    N

The set of all polynomials, of any degree, is denoted by P. That is, P = ⋃_{n≥0} Pₙ.

6.5.2 Exponential and logarithmic functions


Given a > 0, the function f : ℝ → ℝ defined by

    f(x) = aˣ

is called the exponential function of base a. By Lemma 43-(iv), the exponential function is:

(i) strictly increasing if a > 1 (e.g., e > 1);

(ii) constant if a = 1;

(iii) strictly decreasing if 0 < a < 1.

Provided a ≠ 1, the exponential function aˣ is strictly monotone, and therefore injective. Its inverse has as domain the image (0, ∞) and, by Proposition 46, it is the function f : (0, ∞) → ℝ defined by

    f(x) = log_a x

called the logarithmic function of base a > 0. Note that, by what was just observed, a ≠ 1.

The properties established in Proposition 46, i.e.,

    log_a aˣ = x   ∀x ∈ ℝ

and

    a^{log_a x} = x   ∀x ∈ (0, ∞)

are therefore nothing but the relations (6.9) and (6.10) for inverse functions, i.e., the relations f⁻¹(f(x)) = x and f(f⁻¹(y)) = y, in the special case of the exponential and logarithmic functions.
The next result summarizes the monotonicity properties of these elementary functions.

Lemma 244 Both the exponential function aˣ and the logarithmic function log_a x are increasing if a > 1 and decreasing if 0 < a < 1.

Proof For the exponential function, observe that, when a > 1, also aʰ > 1 for every h > 0. Therefore aˣ⁺ʰ = aˣaʰ > aˣ for every h > 0. For the logarithmic function, after observing that log_a k > 0 if a > 1 and k > 1, we have

    log_a(x + h) = log_a(x(1 + h/x)) = log_a x + log_a(1 + h/x) > log_a x

for every h > 0, as desired.

That said, in the sequel we will mostly use Napier's constant e as base, and so we will refer to f(x) = eˣ as the exponential function, without further specification (sometimes it is denoted by f(x) = exp x). Thanks to the remarkable properties of the power eˣ (Section 1.5), the exponential function plays a fundamental role in mathematics and in its applications. Its image is (0, ∞) and its graph is:

[Figure: graph of the exponential function eˣ.]

The negative exponential function f(x) = −eˣ is also important. Its graph is:

[Figure: graph of the negative exponential function −eˣ.]

In a similar vein, in view of the special importance of the natural logarithm (Section 1.5), we refer to f(x) = log x as the logarithmic function, without further specification. Like the exponential function f(x) = eˣ, which is its inverse, the logarithmic function f(x) = log x is widely used in applications. Its image is ℝ and its graph is:

[Figure: graph of the logarithmic function log x.]

The functions eˣ and log x, being each the inverse of the other, have graphs that are mirror images of each other:

[Figure: the graphs of eˣ and log x, mirror images of each other.]

To test their understanding of the material of this section, readers may want to check, analytically and graphically, that the inverse of the negative exponential is the logarithmic function f : (−∞, 0) → ℝ defined by f(x) = log(−x).

6.5.3 Trigonometric and periodic functions

Trigonometric functions, and more generally periodic functions, are also important in many applications.22

22 We refer readers to Appendix C for some basic notions of trigonometry.

Trigonometric functions

The sine function f : ℝ → ℝ defined by f(x) = sin x is the first example of a trigonometric function. For each x ∈ ℝ we have

    sin(x + 2kπ) = sin x   ∀k ∈ ℤ

The graph of the sine function is:

[Figure: graph of the sine function.]

The function f : ℝ → ℝ defined by f(x) = cos x is the cosine function. For each x ∈ ℝ we have

    cos(x + 2kπ) = cos x   ∀k ∈ ℤ

Its graph is:

[Figure: graph of the cosine function.]
6.5. ELEMENTARY FUNCTIONS ON R 157

Finally, the function f : ℝ − {π/2 + kπ : k ∈ ℤ} → ℝ defined by f(x) = tan x is the tangent function. By (C.3),

    tan(x + kπ) = tan x   ∀k ∈ ℤ

The graph is:

[Figure: graph of the tangent function.]

It is immediate to see that, for x ∈ (0, π/2), we have the sandwich 0 < sin x < x < tan x.

The functions sin x, cos x and tan x are monotone (so invertible) on, respectively, the intervals [−π/2, π/2], [0, π], and (−π/2, π/2). Their inverse functions are denoted by arcsin x (or sin⁻¹ x), arccos x (or cos⁻¹ x), and arctan x (or tan⁻¹ x), respectively.

Specifically, by restricting ourselves to an interval [−π/2, π/2] of strict monotonicity of the function sin x, we have

    sin x : [−π/2, π/2] → [−1, 1]

Hence, the inverse function of sin x is

    arcsin x : [−1, 1] → [−π/2, π/2]

with graph:

[Figure: graph of arcsin x.]

Restricting ourselves to an interval [0, π] of strict monotonicity of cos x, we have:

    cos x : [0, π] → [−1, 1]

Therefore, the inverse function of cos x is

    arccos x : [−1, 1] → [0, π]

with graph:

[Figure: graph of arccos x.]

Finally, restricting ourselves to an interval (−π/2, π/2) of strict monotonicity of tan x, we have:

    tan x : (−π/2, π/2) → ℝ

so that the inverse function of tan x is

    arctan x : ℝ → (−π/2, π/2)

with graph:

[Figure: graph of arctan x.]

Note that (2/π) arctan x is a bijective function between the real line and the open interval (−1, 1). As we will learn in the next chapter, this means that the open interval (−1, 1) has the same cardinality as the real line.23
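A numerical sanity check of this bijection (our code; the inverse y ↦ tan(πy/2) is the standard one):

```python
# Sketch (our illustration): g(x) = (2/pi) * arctan(x) maps R into (-1, 1)
# and is inverted by y -> tan(pi * y / 2).
import math

def g(x):
    return (2 / math.pi) * math.atan(x)

def g_inv(y):
    return math.tan(math.pi * y / 2)

for x in [-1e6, -3.0, 0.0, 0.5, 42.0, 1e6]:
    y = g(x)
    assert -1 < y < 1
    assert abs(g_inv(y) - x) <= 1e-6 * max(1.0, abs(x))   # round trip
```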

Periodic functions

Trigonometric functions are the most important class of periodic functions.

Definition 245 A function f : ℝ → ℝ is said to be periodic if there exists p ∈ ℝ such that, for each x ∈ ℝ, we have

    f(x + kp) = f(x)   ∀k ∈ ℤ                              (6.35)

The smallest (if it exists) among such p > 0 is called the period of f. In particular, the periodic functions sin x and cos x have period 2π, while the periodic function tan x has period π. Their graphs well illustrate the property that characterizes periodic functions, that is, that of repeating themselves identically on each interval of width p.

Example 246 The functions sin² x and log tan x are periodic of period π.    N

Let us see an example of a periodic function which is not trigonometric.


23 The more readers are puzzled by this remark, the higher the chance that they are actually understanding it.

Example 247 The function f : ℝ → ℝ given by f(x) = x − [x] is called the mantissa.24 The mantissa of x > 0 is its decimal part; for example, f(2.37) = 0.37. The mantissa function is periodic with period 1. Indeed, by (1.18) we have [x + 1] = [x] + 1 for every x ∈ ℝ. So,

    f(x + 1) = x + 1 − [x + 1] = x + 1 − ([x] + 1) = x − [x] = f(x)

Its graph
[Figure: the sawtooth graph of the mantissa function.]

well illustrates the periodicity.    N
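A one-line implementation makes the periodicity easy to check numerically (our sketch, not from the book; math.floor plays the role of the integer part [x]):

```python
# Sketch (our illustration): the mantissa x - [x] and its period-1 behavior,
# f(x + k) = f(x) for every integer k.
import math

def mantissa(x):
    return x - math.floor(x)          # math.floor(x) is the integer part [x]

assert abs(mantissa(2.37) - 0.37) < 1e-12
for x in [-2.75, -0.5, 0.0, 0.25, 3.9]:
    for k in range(-3, 4):
        assert abs(mantissa(x + k) - mantissa(x)) < 1e-12
```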

Finally, readers can verify that periodicity is preserved by the fundamental operations among functions: if f and g are two periodic functions of the same period p, the functions f(x) + g(x), f(x)g(x) and f(x)/g(x) are also periodic (of period at most p).

6.5.4 Rational functions


A scalar function f is rational if it is the ratio of two polynomials p and q:

    f(x) = p(x)/q(x) = (b₀ + b₁x + ⋯ + bₘxᵐ) / (a₀ + a₁x + ⋯ + aₙxⁿ)

Its domain consists of all points of the real line except the real solutions of the equation a₀ + a₁x + ⋯ + aₙxⁿ = 0.

A rational function is proper if the degree of the polynomial at the numerator is lower than that of the polynomial at the denominator, i.e., m < n. Proper rational functions admit a simple representation, called partial fraction expansion, that often simplifies their analysis. We focus on the case of distinct real roots, leaving to readers the case of multiple roots.

Proposition 248 Let f(x) = p(x)/q(x) be a proper rational function such that q has k distinct real roots r₁, r₂, …, r_k, so q(x) = ∏_{i=1}^k (x − rᵢ). Then

    f(x) = c₁/(x − r₁) + c₂/(x − r₂) + ⋯ + c_k/(x − r_k)   (6.36)

where, for all i = 1, …, k,

    cᵢ = p(rᵢ)/q′(rᵢ)                                      (6.37)

24 Recall from Proposition 42 that the integer part [x] of a scalar x ∈ ℝ is the greatest integer ≤ x.

Proof We first establish that there exist k coefficients c₁, c₂, …, c_k such that (6.36) holds. For simplicity, we only consider the case

    f(x) = (b₀ + b₁x) / (a₀ + a₁x + a₂x²)

leaving to readers the general case. Since the denominator is (x − r₁)(x − r₂), we look for coefficients c₁ and c₂ such that

    (b₀ + b₁x)/q(x) = c₁/(x − r₁) + c₂/(x − r₂)

Since

    c₁/(x − r₁) + c₂/(x − r₂) = [c₁(x − r₂) + c₂(x − r₁)]/q(x) = [(c₁ + c₂)x − (c₁r₂ + c₂r₁)]/q(x)

by equating the coefficients of the two numerators we obtain the simple linear system

    c₁ + c₂ = b₁
    c₁r₂ + c₂r₁ = −b₀

Since r₁ ≠ r₂, the system is easily seen to have a unique solution (c₁, c₂) that provides the sought-after coefficients.

It remains to show that the coefficients of (6.36) satisfy (6.37). We have

    lim_{x→rᵢ} (x − rᵢ)f(x) = lim_{x→rᵢ} (x − rᵢ)[c₁/(x − r₁) + c₂/(x − r₂) + ⋯ + c_k/(x − r_k)]
                            = lim_{x→rᵢ} [c₁(x − rᵢ)/(x − r₁) + ⋯ + cᵢ + ⋯ + c_k(x − rᵢ)/(x − r_k)] = cᵢ

as well as, by de l'Hospital's rule,

    lim_{x→rᵢ} (x − rᵢ)f(x) = lim_{x→rᵢ} p(x)(x − rᵢ)/q(x) = p(rᵢ) lim_{x→rᵢ} (x − rᵢ)/q(x) = p(rᵢ) · (1/q′(rᵢ))

Putting the two limits together, we conclude that cᵢ = p(rᵢ)/q′(rᵢ) for all i = 1, …, k, as desired.

Example 249 Consider the proper rational function

    f(x) = (x − 1)/(x² + 3x + 2)

The roots of the polynomial at the denominator are −1 and −2, so by (6.37) we have c₁ = p(−1)/q′(−1) = −2 and c₂ = p(−2)/q′(−2) = 3. So, the partial fraction expansion of f is

    f(x) = −2/(x + 1) + 3/(x + 2)

This can also be checked directly. Indeed, since the denominator is (x + 1)(x + 2), let us look for c₁ and c₂ such that

    c₁/(x + 1) + c₂/(x + 2) = (x − 1)/(x² + 3x + 2)        (6.38)

The first term in (6.38) is equal to

    [c₁(x + 2) + c₂(x + 1)] / [(x + 1)(x + 2)] = [x(c₁ + c₂) + (2c₁ + c₂)] / [(x + 1)(x + 2)]   (6.39)

Expressions (6.38) and (6.39) are equal if and only if c₁ and c₂ satisfy the system:

    c₁ + c₂ = 1
    2c₁ + c₂ = −1

Therefore, c₁ = −2 and c₂ = 3. This confirms what was established via formula (6.37).    N
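For routine computations, a computer algebra system can both expand the fraction and confirm formula (6.37); here is a brief sketch using the sympy library (our code, not part of the book):

```python
# Sketch (our illustration): partial fraction expansion of Example 249 with
# sympy, plus a check of the coefficient formula c_i = p(r_i)/q'(r_i).
from sympy import symbols, apart, diff, together, simplify

x = symbols('x')
p = x - 1
q = x**2 + 3*x + 2                      # roots -1 and -2

print(apart(p / q, x))                  # -> 3/(x + 2) - 2/(x + 1)

dq = diff(q, x)
c = {r: p.subs(x, r) / dq.subs(x, r) for r in (-1, -2)}
print(c)                                # {-1: -2, -2: 3}, as in (6.37)
assert simplify(together(sum(ci / (x - r) for r, ci in c.items())) - p / q) == 0
```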

6.6 Maximizers and minimizers: a preview


At this point, it is useful to introduce the concepts of maximizer and minimizer of a scalar function. We will then discuss them in full generality in Chapter 22.

Definition 250 Let f : A ⊆ ℝ → ℝ be a real-valued function. An element x̂ ∈ A is called a (global) maximizer (or maximum point) of f on A if

    f(x̂) ≥ f(x)   ∀x ∈ A

The value f(x̂) of the function at x̂ is called the (global) maximum value of f on A.

Maximizers thus attain the highest values of the function f on its domain: they outperform all other elements of the domain. Note that the maximum value of f on A is nothing but the maximum of the set Im f, which is a subset of ℝ. That is,

    f(x̂) = max f(A) = max Im f

By Proposition 36, the maximum value is unique. We denote this unique value by

    max_{x∈A} f(x)

Example 251 Consider the function f : ℝ → ℝ given by f(x) = 1 − x², with graph:

[Figure: graph of 1 − x², a downward parabola with vertex (0, 1).]

The maximizer of f is 0 and the maximum value is 1. Indeed, 1 = f(0) ≥ f(x) for every x ∈ ℝ. On the other hand, since Im f = (−∞, 1], we have 1 = max(−∞, 1].    N

Similar definitions hold for the minimum value of f on A and for the minimizers of f on A.

Example 252 Consider the quadratic function f(x) = x², whose graph is the parabola

[Figure: graph of x², an upward parabola with vertex at the origin.]

The minimizer of f is 0 and the minimum value is 0. Indeed, 0 = f(0) ≤ f(x) for every x ∈ ℝ. On the other hand, since Im f = [0, ∞), we have 0 = min[0, ∞).    N

While the maximum (minimum) value is unique, maximizers and minimizers might well
not be unique, as the next example shows.

Example 253 Let f : ℝ → ℝ be the sine function f(x) = sin x. Since Im f = [−1, 1], the unique maximum value of f on ℝ is 1 and the unique minimum value of f on ℝ is −1. Nevertheless, there are both infinitely many maximizers, namely all the points x = π/2 + 2kπ with k ∈ ℤ, and infinitely many minimizers, namely all the points x = −π/2 + 2kπ with k ∈ ℤ. The next graph should clarify.

[Figure: the sine function, attaining its maximum 1 and minimum −1 infinitely often.]    N

6.7 Domains and restrictions


In the first paragraph of the chapter we defined the domain of a function as the set on which the function is defined: the set A is the domain of a function f : A → B. In the various examples of real-valued functions presented until now we have identified as domain the greatest set A ⊆ ℝ where the function f could be defined. For example, for f(x) = x² the domain is ℝ, for f(x) = √x the domain is ℝ₊, for f(x) = log x the domain is ℝ₊₊, and so on. For a function f of one or several variables we call the natural domain the largest set on which f can be defined. For example, ℝ is the natural domain of x², ℝ₊ is that of √x, ℝ₊₊ is that of log x, and so on.

But there is nothing special, except for maximality, about the natural domain: a function can be regarded as defined on any subset of the natural domain. For example, we can consider x² only for positive values of x, so as to have a quadratic function f : ℝ₊ → ℝ, or we can consider log x only for values of x greater than 1, so as to have a logarithmic function f : [1, ∞) → ℝ, and so on.

In general, given a function f : A → B, it is sometimes important to consider restrictions to subsets of A.

Definition 254 Let f : A → B be a function and C ⊆ A. The function g : C → B defined by

    g(x) = f(x)   ∀x ∈ C

is called the restriction of f to C and is denoted by f|C.

The restriction f|C can, therefore, be seen as f restricted to the subset C of A. Thanks to the smaller domain, the function f|C can satisfy properties different from those of the original function f.

Example 255 (i) Let g : [0, 1] → ℝ be defined by g(x) = x². The function g can be seen as the restriction to the interval [0, 1] of the quadratic function f : ℝ → ℝ given by f(x) = x²; that is, g = f|[0,1]. Thanks to its restricted domain, the function g has better properties than the function f. For example: g is strictly increasing, while f is not; g is injective (so, invertible), while f is not; g is bounded, while f is only bounded below; g has both a maximizer and a minimizer, while f does not have a maximizer.
(ii) Let g : (−∞, 0] → ℝ be defined by g(x) = −x. The function g can be seen as the restriction to (−∞, 0] of both f : ℝ → ℝ given by f(x) = |x| and h : ℝ → ℝ given by h(x) = −x. Indeed, a function may be the restriction of several functions (rather, of infinitely many functions), and it is the specific application at hand that may suggest which is the most relevant. In any case, let us analyze the differences between g and f and those between g and h. The function g is injective, while f is not; g is monotone decreasing, while f is not. The function g is bounded below, while h is not; g has a global minimizer, while h does not.    N
Example 256 The function f(x₁, x₂) = √(x₁x₂) has as natural domain ℝ²₊ ∪ ℝ²₋, i.e., the first and third quadrants of the plane. Nevertheless, when we regard it as a utility function of Cobb-Douglas type, its domain is restricted to the first quadrant, ℝ²₊, because bundles of goods always have positive components. Moreover, since f(x₁, x₂) = 0 even when just one component is zero, something not that plausible from an economic viewpoint, this utility function is often considered only on ℝ²₊₊. Therefore, purely economic considerations determine the domain on which to study f when interpreted as a utility function.    N

Example 257 (i) Let g : [0, +∞) → ℝ be defined by g(x) = x³. The function g can be seen as the restriction to the interval [0, +∞) of the cubic function f : ℝ → ℝ given by f(x) = x³, that is, g = f|[0,∞). We observe that g is convex, while f is not; g is bounded below, while f is not; g has a minimizer, while f does not.
(ii) Let g : (−∞, 0] → ℝ be defined by g(x) = x³. The function g can be seen as the restriction to the interval (−∞, 0] of the function f : ℝ → ℝ given by f(x) = x³, that is, g = f|(−∞,0]. We observe that g is concave, while f is not; g is bounded above, while f is not; g has a maximizer, while f does not.
(iii) Sometimes smaller domains may actually deprive functions of some of their properties. For instance, the restriction of the sine function to the interval [0, π/2] is no longer periodic, while the restriction of the quadratic function to the open unbounded interval (0, ∞) has no minimizers.    N

We now introduce the concept of the extension of a function to a larger domain, which is dual to that of restriction.

Definition 258 Let f : A → B be a function and let A ⊆ C. A function g : C → B such that

    g(x) = f(x)   ∀x ∈ A

is called an extension of f to C.

Restriction and extension are thus two sides of the same coin: g is an extension of f if and only if f is a restriction of g. In particular, a function defined on its natural domain extends all its restrictions. Moreover, if a function has an extension, it has infinitely many of them.

Example 259 (i) The function g : ℝ → ℝ defined by

    g(x) = 1/x if x ≠ 0, and g(0) = 0

is an extension of the function f(x) = 1/x, which has as natural domain ℝ − {0}.
(ii) The function g : ℝ → ℝ defined by

    g(x) = x for x ≤ 0, and g(x) = log x for x > 0

is an extension of the function f(x) = log x, which has natural domain (0, ∞).    N

6.8 Grand finale: preferences and utility


6.8.1 Preferences
We close the chapter by studying in more depth the notions of preference and utility introduced in Section 6.2.1. Consider a preference (binary) relation ≿ defined on a subset A of ℝⁿ₊, called the consumption set, whose elements are interpreted as the bundles of goods relevant for the choices of the consumer.25

The preference relation represents the tastes of the consumer over the bundles. In particular, x ≿ y means that the consumer prefers bundle x over bundle y.26 It is a basic relation that economists take as a given (leaving to psychologists the study of the psychological motivations that underlie it). From it, the following two important notions are derived:

(i) we write x ≻ y if the bundle x is strictly preferred to y, that is, if x ≿ y but not y ≿ x;

(ii) we write x ∼ y if the bundle x is indifferent to the bundle y, that is, if both x ≿ y and y ≿ x.

The relations ≻ and ∼ are, obviously, mutually exclusive: between two indifferent bundles there cannot be strict preference, and vice versa. The next simple result further clarifies the different nature of the two relations.

Lemma 260 The strict preference relation ≻ is asymmetric (i.e., x ≻ y implies not y ≻ x), while the indifference relation ∼ is symmetric (i.e., x ∼ y implies y ∼ x).

25 The preference relation is an important example of a binary relation (see Appendix A).
26 In the weak sense of "prefers or is indifferent to".

Proof Suppose x ≻ y. By definition, x ≿ y but not y ≿ x, so we cannot have y ≻ x. This proves the asymmetry of ≻. As to the symmetry of ∼, suppose x ∼ y. By definition, both x ≿ y and y ≿ x. This trivially yields that y ≿ x and x ≿ y. So, y ∼ x.

On the preference ≿ we consider some axioms.

Reflexivity: x ≿ x for every x ∈ A.

This first axiom reflects the "weakness" of ≿: each bundle is preferred to itself. The next axiom is more interesting.

Transitivity: x ≿ y and y ≿ z implies x ≿ z for every x, y, z ∈ A.

It is a rationality axiom that requires that the preferences of the decision maker have no cycles:

    x ≿ y ≿ z ≻ x

Strict preference and indifference inherit these first two properties (with the obvious exception of reflexivity for the strict preference).

Lemma 261 Let ≿ be reflexive and transitive. Then:

(i) ∼ is reflexive and transitive;

(ii) ≻ is transitive.

Proof (i) Consider x and set y = x. By definition of ∼ and since ≿ is reflexive, we have that x ≿ y and y ≿ x, yielding that x ∼ y, that is, x ∼ x. Hence, the relation ∼ is reflexive. To prove transitivity, suppose that x ∼ y and y ∼ z. We show that this implies x ∼ z. By definition, x ∼ y means that x ≿ y and y ≿ x, while y ∼ z means that y ≿ z and z ≿ y. Thanks to the transitivity of ≿, from x ≿ y and y ≿ z it follows that x ≿ z, while from y ≿ x and z ≿ y it follows that z ≿ x. We therefore have both x ≿ z and z ≿ x, i.e., x ∼ z.
(ii) Suppose that x ≻ y and y ≻ z. We show that this implies x ≻ z. By definition of ≻ and transitivity of ≿, this implies that x ≿ y and y ≿ z, which in turn yields x ≿ z. By contradiction, suppose that x ≻ z does not hold. Since x ≿ z, this implies that z ≿ x. By transitivity of ≿ and since, as we have seen, x ≿ y, this yields z ≿ y, a contradiction with y ≻ z.

The last two lemmas together show that, if ≿ is reflexive and transitive, the indifference relation ∼ is reflexive, symmetric, and transitive (so, it is an equivalence relation; cf. Appendix A). For each bundle x ∈ A, denote by

    [x] = {y ∈ A : x ∼ y}

the collection of the bundles indifferent to it. This set is the indifference class of ≿ determined by the bundle x.

Lemma 262 If ≿ is reflexive and transitive, we have

    x ∼ y  ⟺  [x] = [y]                                    (6.40)

and

    x ≁ y  ⟺  [x] ∩ [y] = ∅                                (6.41)

Relations (6.40) and (6.41) express two fundamental properties of the indifference classes. By (6.40), the indifference class [x] does not depend on the choice of the bundle x: each indifferent bundle determines the same indifference class. By (6.41), different indifference classes have no elements in common: they do not intersect.

Proof By the previous lemmas, ∼ is reflexive, symmetric, and transitive. We first prove (6.40). Suppose that x ∼ y. We show that this implies [x] ⊆ [y]. Let z ∈ [x], that is, x ∼ z. Since ∼ is transitive and symmetric, x ∼ y and x ∼ z imply y ∼ z, that is, z ∈ [y], which shows that [x] ⊆ [y]. By symmetry, x ∼ y implies y ∼ x. Then, the previous argument shows that [y] ⊆ [x]. So, we conclude that x ∼ y implies [x] = [y]. Since ∼ is reflexive, the converse is then obvious and (6.40) is proved.
We move now to (6.41) and suppose that x ≁ y. We claim that [x] ∩ [y] = ∅. Suppose, by contradiction, that this is not the case and there exists z ∈ [x] ∩ [y]. By definition, we have both x ∼ z and y ∼ z. By the transitivity and symmetry of ∼, we then have x ∼ y, which contradicts x ≁ y. The contradiction shows that x ≁ y implies [x] ∩ [y] = ∅. In light of (6.40), and since x ∈ [x], the converse is obvious and the proof is complete.

The collection {[x] : x ∈ A} of all the indifference classes is denoted by A/∼ and is sometimes called the indifference map. Thanks to the last lemma, A/∼ forms a partition of A.

Let us continue the study of ≿. The next axiom does not concern the rationality, but rather the information, of the consumer.

Completeness: x ≿ y or y ≿ x for every x, y ∈ A.

Completeness requires the consumer to be able to compare any two bundles of goods, even very different ones. Naturally, to do so the consumer must, at least, have sufficient information about the two alternatives: it is easy to think of examples where this assumption is unrealistic. So, completeness is a non-trivial assumption on preferences.
In any case, note that completeness requires, inter alia, that each bundle be comparable to itself, that is, x ≿ x. Thus, it implies reflexivity.

Given the completeness assumption, the relations ≻ and ∼ are both exclusive (as seen above) and exhaustive.

Lemma 263 Let ≿ be complete. Given any two bundles x and y, we always have either x ≻ y or y ≻ x or x ∼ y.27

27 These "or" are intended as the Latin "aut".

Proof By completeness, we have x ≿ y or y ≿ x.28 Suppose, without loss of generality, that x ≿ y. One has y ≿ x if and only if x ∼ y, while one does not have y ≿ x if and only if x ≻ y.

Since we are considering bundles of economic goods (and not of "bads"), it is natural to assume monotonicity, i.e., that "more is better". The triad ≥, >, and ≫ leads to three possible incarnations of this simple principle of rationality:

Monotonicity: x ≥ y implies x ≿ y for every x, y ∈ A.

Strict monotonicity: x > y implies x ≻ y for every x, y ∈ A.

Strong monotonicity: ≿ is monotone and x ≫ y implies x ≻ y for every x, y ∈ A.

The relationships among the three notions are similar to those seen for the analogous notions of monotonicity studied (also for utility functions) in Section 6.4.4. For example, strict monotonicity means that, given a bundle, an increase in the quantity of any good of the bundle determines a strictly preferred bundle.
Similar considerations hold for the other notions. In particular, (6.26) takes the form:

    strict monotonicity  ⟹  strong monotonicity  ⟹  monotonicity

when ≿ is reflexive.

6.8.2 Paretian utility


Although the preference ≿ is the fundamental notion, it is analytically convenient to find a numerical representation of ≿, that is, a function u : A → ℝ such that, for each pair of bundles x, y ∈ A, we have

    x ≿ y  ⟺  u(x) ≥ u(y)                                  (6.42)

The function u is called a (Paretian) utility function. It represents also the strict preference and indifference:

Lemma 264 We have

    x ∼ y  ⟺  u(x) = u(y)                                  (6.43)

and

    x ≻ y  ⟺  u(x) > u(y)                                  (6.44)

Proof We have

    x ∼ y ⟺ x ≿ y and y ≿ x ⟺ u(x) ≥ u(y) and u(y) ≥ u(x) ⟺ u(x) = u(y)

which proves (6.43). Now consider (6.44). By definition of ≻, we have

    x ≻ y ⟺ x ≿ y and not y ≿ x ⟺ u(x) ≥ u(y) and not u(y) ≥ u(x) ⟺ u(x) > u(y)

This proves (6.44).

28 Here "or" is intended as the Latin "vel".

The equivalence (6.43) allows us to represent the indifference classes as indifference curves of the utility function:

    [x] = {y ∈ A : u(x) = u(y)}

Thus, when a preference admits a utility representation, (6.41) reduces to the standard property that indifference curves are disjoint (Section 6.3.1).

As already observed, in the ordinalist approach the utility function is a mere representation of the preference relation, without any special psychological meaning. Indeed, we already noted that each strictly increasing function f : Im u → ℝ defines an equivalent utility function f ∘ u, for which it still holds that

    x ≿ y  ⟺  (f ∘ u)(x) ≥ (f ∘ u)(y)

More is actually true: two functions represent the same underlying preference ≿, so they are utility functions for it, if and only if they are strictly increasing transformations of one another. This easily follows from the following result.

Proposition 265 Let g, h : A ⊆ ℝⁿ → ℝ. We have

    g(x) ≥ g(y)  ⟺  h(x) ≥ h(y)   ∀x, y ∈ A              (6.45)

if and only if there exists a strictly increasing f : Im g → ℝ such that h = f ∘ g.

Proof We first prove the "if" part. By Proposition 221 and since f is strictly increasing, we have

    g(x) ≥ g(y)  ⟺  f(g(x)) ≥ f(g(y))  ⟺  h(x) ≥ h(y)

proving (6.45). We next prove the "only if" part. Define f : Im g → ℝ to be such that, for each t ∈ Im g,

    f(t) = h(x)

where x ∈ g⁻¹(t). Note that f is well defined. Indeed, since t ∈ Im g there always exists an x ∈ A such that g(x) = t, i.e., g⁻¹(t) ≠ ∅. Moreover, by (6.45), if x and y are such that g(x) = g(y) = t, then h(x) = h(y). So, f is well defined since the rule defining f assigns to each t in Im g a unique value in ℝ (that is, h(x)) which does not depend on the specific x chosen in g⁻¹(t), but only on t. Note also that if x ∈ A, by defining t = g(x) we have f(g(x)) = f(t) = h(x). Since x was arbitrarily chosen, we conclude that h = f ∘ g. We are left to show that f is strictly increasing. Consider t, s ∈ Im g such that t > s. Then there exist x, y ∈ A such that g(x) = t > s = g(y). By (6.45), this implies that h(x) > h(y). By the definition of f, we conclude that f(t) = f(g(x)) = h(x) > h(y) = f(g(y)) = f(s), proving that f is strictly increasing.

6.8.3 Existence and lexicographic preference


In view of all this, a key theoretical problem is to establish under which conditions a preference relation ≿ admits a utility function. Things are easy when the consumption set is finite.

Theorem 266 Let ≿ be a preference defined on a finite set A. The following conditions are equivalent:

(i) ≿ is transitive and complete;

(ii) there exists a utility function u : A → ℝ.

Proof (i) implies (ii). Suppose ≿ is transitive and complete. Define u : A → ℝ by u(x) = |{y ∈ A : x ≿ y}|. As the reader can check, we have x ≿ y if and only if u(x) ≥ u(y), as desired.
(ii) implies (i). Assume that there exists u : A → ℝ such that u(x) ≥ u(y) if and only if x ≿ y. The preference ≿ is transitive. Indeed, let x, y, z ∈ A be such that x ≿ y and y ≿ z. By hypothesis, we have u(x) ≥ u(y) and u(y) ≥ u(z). Since the order on ℝ is transitive, we obtain u(x) ≥ u(z), which in turn yields x ≿ z, as desired. The preference ≿ is complete. Indeed, let x, y ∈ A. Since u(x) and u(y) are scalars, we either have u(x) ≥ u(y) or u(y) ≥ u(x) or both, because the order on ℝ is complete. Therefore, either x ≿ y or y ≿ x or both, as desired.
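The counting construction used in the proof is easy to run. Below is a small sketch (our code, not from the book) that builds u(x) = |{y : x ≿ y}| from a complete and transitive preference given as a set of ordered pairs, and checks the representation (6.42):

```python
# Sketch (our illustration): on a finite set, u(x) = |{y : x weakly preferred
# to y}| represents a complete and transitive preference (Theorem 266).
A = ['a', 'b', 'c']
weak = {('a', 'a'), ('a', 'b'), ('a', 'c'),     # a is weakly preferred to all
        ('b', 'b'), ('b', 'c'),
        ('c', 'c'), ('c', 'b')}                 # b and c are indifferent

def u(x):
    return sum(1 for y in A if (x, y) in weak)

print({x: u(x) for x in A})                     # {'a': 3, 'b': 2, 'c': 2}

for x in A:
    for y in A:
        assert ((x, y) in weak) == (u(x) >= u(y))   # representation (6.42)
```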

Thus, if there is a finite number of alternatives, transitivity and completeness are necessary and sufficient conditions for the existence of a utility function. Matters become more complicated when A is infinite: later we will present the famous lexicographic preference on ℝ²₊, which does not admit any numerical representation. The next theorem solves the existence problem on the key infinite set ℝⁿ₊. To this end we need a final axiom, which is reminiscent of the Archimedean property of the real numbers seen in Section 1.4.3.29

Archimedean: given any three bundles x, y, z ∈ ℝⁿ₊ with x ≻ y ≻ z, there exist weights α, β ∈ (0, 1) such that

    αx + (1 − α)z ≻ y ≻ βx + (1 − β)z

The axiom implies that there exist no infinitely preferred and no infinitely "unpreferred" bundles. Given the preferences x ≻ y and y ≻ z, for the consumer the bundle x cannot be infinitely better than y, nor can the bundle z be infinitely worse than y. Indeed, by suitably combining the bundles x and z we get both a bundle better than y, that is, αx + (1 − α)z, and a bundle worse than y, that is, βx + (1 − β)z. This would be impossible if x were infinitely better than y, or if z were infinitely worse than y.
In this respect, recall the analogous property of real numbers: if x, y, z ∈ ℝ are three scalars with x > y > z, there exist α, β ∈ (0, 1) such that

    αx + (1 − α)z > y > βx + (1 − β)z                      (6.46)

The property does not hold if we consider −∞ and +∞, that is, the extended real line ℝ̄ = [−∞, +∞]. In this case, if y ∈ ℝ but x = +∞ and/or z = −∞, the scalar x is infinitely greater than y, and z is infinitely smaller than y, and there are no α, β ∈ (0, 1) that satisfy the inequality (6.46). Indeed, α(+∞) = +∞ and β(−∞) = −∞ for every α, β ∈ (0, 1), as seen in Section 1.7.

29 For simplicity, we will assume that the consumption set A is the entire ℝⁿ₊. The axiom can be stated more generally for convex sets, an important notion that we will study in Chapter 17.

In conclusion, the Archimedean axiom makes the bundles qualitatively homogeneous: however different, they belong to the same league. Thanks to this axiom, we can now state the existence theorem (its proof, which is not simple, is omitted).

Theorem 267 Let ≿ be a preference defined on A = ℝⁿ₊. The following conditions are equivalent:

(i) ≿ is transitive, complete, strictly monotone and Archimedean;

(ii) there exists a strictly monotone and continuous30 utility function u : A → ℝ.

This is a remarkable result: most economic applications use utility functions, and the theorem shows which conditions on preferences justify such use.31
To appreciate the importance of Theorem 267, we close the chapter with a famous example of a preference that does not admit a utility function. Let A = ℝ²₊ and, given two bundles x and y, write x ≿ y if either x₁ > y₁, or x₁ = y₁ and x₂ ≥ y₂. The consumer starts by considering the first coordinate: if x₁ > y₁, then x ≿ y. If, on the other hand, x₁ = y₁, then he turns his attention to the second coordinate: if x₂ ≥ y₂, then x ≿ y.

This preference is inspired by how dictionaries order words; for this reason, it is called the lexicographic preference. In particular, we have x ≻ y if x₁ > y₁, or x₁ = y₁ and x₂ > y₂, while we have x ∼ y if and only if x = y. The indifference classes are therefore singletons, a first remarkable feature of this preference.
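The definition translates directly into code; the following sketch (ours, not from the book) implements the lexicographic weak preference and checks that mutual weak preference, i.e., indifference, forces equality:

```python
# Sketch (our illustration): the lexicographic weak preference on R^2_+.
def lex_weak(x, y):
    return x[0] > y[0] or (x[0] == y[0] and x[1] >= y[1])

assert lex_weak((1.0, 0.0), (0.0, 100.0))    # the first coordinate dominates
assert lex_weak((2.0, 3.0), (2.0, 1.0))      # ties broken by the second

# indifference (both directions) holds only for identical bundles
for x, y in [((1.0, 2.0), (1.0, 2.0)), ((1.0, 2.0), (1.0, 3.0))]:
    assert (lex_weak(x, y) and lex_weak(y, x)) == (x == y)
```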
The lexicographic preference is complete, transitive and strictly monotone, as the reader can easily verify. It is not Archimedean, however. Indeed, consider for example x = (1, 0), y = (0, 1), and z = (0, 0). We have x ≻ y ≻ z and

    βx + (1 − β)z = (β, 0) ≻ y ≻ z   ∀β ∈ (0, 1)

which shows that the Archimedean axiom does not hold.


For this reason, Theorem 267 does not apply to the lexicographic preference, which
therefore cannot be represented by a strictly monotone and continuous utility function.
Actually, this preference does not admit any utility function at all.

Proposition 268 The lexicographic preference does not admit any utility function.

Proof Suppose, by contradiction, that there exists u : ℝ²₊ → ℝ that represents the lexicographic preference. Let a < b be any two positive scalars. For each x ≥ 0 we have (x, b) ≻ (x, a) and, therefore, u(x, a) < u(x, b). By Proposition 42, there exists a rational number q(x) such that u(x, a) < q(x) < u(x, b). The rule x ↦ q(x) defines, therefore, a function q : ℝ₊ → ℚ. It is injective. If x ≠ y, say y < x, then:

    u(y, a) < q(y) < u(y, b) < u(x, a) < q(x) < u(x, b)

and so q(x) ≠ q(y). But, since ℝ₊ has the same cardinality as ℝ, the injectivity of the function q : ℝ₊ → ℚ implies |ℝ| ≤ |ℚ|, contradicting Cantor's Theorem 277. This proves that the lexicographic preference does not admit any utility function.

30 Continuity is an important property, to which Chapter 13 is devoted.
31 There exist other results on the existence of utility functions, mostly proved in the 1940s and in the 1950s.
Chapter 7

Cardinality (sdoganato)

7.1 Actual infinite and potential infinite


Ideally, a quantity can always be made larger by a unit increase, a set can always become larger by adding an extra element, a segment can be subdivided into smaller and smaller parts (of positive length) by continuing to cut it in half. Therefore, potentially, we have arbitrarily large quantities and sets, as well as arbitrarily small segments. In these cases, we talk of the potential infinite. It is a notion that has been playing a decisive role in mathematics since the dawn of Greek mathematics. The ε-δ arguments upon which the study of limits is based are a brilliant example of this, as is the method of exhaustion upon which integration relies.1
When the potential infinite realizes itself and becomes actual, we have an actual infinite. In set theory, our main interest here, the actual infinite corresponds to sets formed by infinitely many elements. Not in potentia (in power) but in act: a set with a finite number of grains of sand to which we add more and more new grains is infinite in potentia, but not in act, because, however large, the number of grains remains finite.2 Instead, a set that consists of infinitely many grains of sand is infinite in the actual sense. It is, of course, a metaphysical notion that only the eye of the mind can see: (sensible) reality is necessarily finite. Thus, the actual infinite, starting from Aristotle, to whom the distinction between the two notions of infinite dates back, was considered with great suspicion, as summarized by the Latin saying infinitum actu non datur.3 On the other hand, the dangers of a naive approach, based purely on intuition, to the actual infinite had been masterfully highlighted already in Presocratic times by some of the celebrated paradoxes of Zeno of Elea.
1 The ε-δ arguments will be seen in Chapters 8 and 12. The potential infinite will come into play when, for example, we will consider ε > 0 arbitrarily small (but always non-zero) or n arbitrarily large (yet finite). In Chapter 44 we will study in detail the role of the method of exhaustion in integration.
2 Archimedes, who masterfully used the method of exhaustion to compute some remarkable areas, in his work Arenarius argued that about 8 × 10⁶³ grains of sand are enough to fill the universe. It is a huge, but finite, number.
3 In a conference held in 1925, David Hilbert described these notions of infinite with the following words: "Someone who wished to characterize briefly the new conception of the infinite which Cantor introduced might say that in analysis we deal with the infinitely large and the infinitely small only as limit concepts, as something becoming, happening, i.e., with the potential infinite. But this is not the true infinite. We meet the true infinite when we regard the totality of numbers 1, 2, 3, 4, ... itself as a completed unity, or when we regard the points of an interval as a totality of things which exists all at once. This kind of infinity is known as actual infinity." (Trans. in P. Benacerraf and H. Putnam, "Philosophy of mathematics", Cambridge University Press, 1964).



All this did change, after more than twenty centuries, with the epoch-making work of
Georg Cantor. Approximately between 1875 and 1885, Cantor revolutionized mathematics
by nding the key concept (bijective functions) that allows for a rigorous study of sets, nite
and in nite, thus putting the notion of set at the foundations of mathematics. It is not by
chance that our book starts with such a notion. The rest of the chapter is devoted to the
Cantorian study of in nite sets, in particular of their cardinality.

7.2 Bijective functions and cardinality


Bijective functions, introduced in the last chapter, are fundamental in mathematics because criteria of similarity between mathematical entities are often based on them. Cantor's study of the cardinality of infinite sets is, indeed, a magnificent example of this role of bijective functions.
We start by considering a finite set A, that is, a set with a finite number of elements. We call the number of elements of the set A the cardinality (or power) of A. We usually denote it by |A|.

Example 269 The set A = {11, 13, 15, 17, 19} of the odd integers between 10 and 20 is finite, with |A| = 5.    N

Thanks to Proposition 207, two finite sets have the same cardinality if and only if their elements can be put in a bijective correspondence. For example, if we have seven seats and seven students, we can assign one (and only one) seat to each student, say by putting a name tag on it. All this motivates the following definition, which elaborates on what we mentioned at the end of Section 6.4.1.

Definition 270 A set A is finite if it can be put in a bijective correspondence with a subset of the form {1, 2, ..., n} of ℕ. In this case, we write |A| = n.

In other words, A is finite if there exist a set {1, 2, ..., n} of natural numbers and a bijective function f : {1, 2, ..., n} → A. The set {1, 2, ..., n} can be seen as the "prototypical" set of cardinality n, a benchmark that permits us to "calibrate" all the other finite sets of the same cardinality via bijective functions.

This definition provides a functional angle on the cardinality of finite sets, based on bijective functions and on the identification of a prototypical set. For finite sets, however, this angle is not much more than a curiosity. Yet, it becomes fundamental when we want to extend the notion of cardinality to infinite sets. This was the key insight of Georg Cantor that, by finding the right angle, led to the birth of the theory of infinite sets. Indeed, the possibility of establishing a bijective correspondence among infinite sets allows for a classification of these sets by "size" and leads to the discovery of deep and surprising properties.

Definition 271 A set A is said to be countable if it can be put in a bijective correspondence with the set ℕ of the natural numbers. In this case, we write |A| = |ℕ|.

In other words, A is countable if there exists a bijective function f : N ! A, that is, if


the elements of the set A can be ordered in a sequence a0 ; a1 ; :::; an ; ::: (i.e., 0 corresponds
to f (0) = a0 , 1 to f (1) = a1 , and so on). The set N is, therefore, the \prototype" for
countable sets: any other set is countable if it is possible to pair in a bijective fashion (as the
aforementioned seats and students) its elements with those of N. This is the rst category
of in nite sets that we encounter.

Relative to finite sets, countable sets immediately exhibit a remarkable, possibly puzzling, property: it is always possible to put a countable set into a bijective correspondence with an infinite proper subset of it. In other words, losing elements might not affect cardinality when dealing with countable sets.

Theorem 272 Each infinite subset of a countable set is also countable.

Proof Let X be a countable set and let A ⊆ X be an infinite proper subset of X, i.e., A ≠ X. Since X is countable, its elements can be listed as a sequence of distinct elements X = {x0, x1, ..., xn, ...} = {x_i}_{i∈N}. Let us denote by n_0 the smallest integer larger than or equal to 0 such that x_{n_0} ∈ A (if, for example, x0 ∈ A, we have n_0 = 0; if x0 ∉ A and x1 ∈ A, we have n_0 = 1, and so on). Analogously, let us denote by n_1 the smallest integer (strictly) larger than n_0 such that x_{n_1} ∈ A. Given n_0, n_1, ..., n_j, with j ≥ 1, let us define n_{j+1} as the smallest integer larger than n_j such that x_{n_{j+1}} ∈ A. Consider now the function f : N → A defined by f(i) = x_{n_i}, with i = 0, 1, ..., n, .... It is easy to check that f is a bijective function between N and A, so A is countable.

The following example should clarify the scope of the previous theorem. The set E of even numbers is, clearly, a proper subset of N that we may think contains only "half" of its elements. Nevertheless, it is possible to establish a bijective correspondence with N by putting in correspondence each even number 2n with its half n, that is,

2n ∈ E ←→ n ∈ N

Therefore, |E| = |N|. Already Galileo realized this remarkable peculiarity of infinite sets, which clearly distinguishes them from finite sets, whose proper subsets always have smaller cardinality.4 In a famous passage of the Discorsi e dimostrazioni matematiche intorno a due nuove scienze,5 published in 1638, he observed that the natural numbers can be put in a bijective correspondence with their squares by setting n² ↔ n. The squares, which prima facie seem to form a rather small subset of N, are thus in equal number with the natural numbers: "in an infinite number, if one could conceive of such a thing, he would be forced

4 The mathematical fact considered here is at the basis of several little stories. For example, The Paradise Hotel has countably infinite rooms, progressively numbered 1, 2, 3, .... At a certain moment, they are all occupied when a new guest checks in. At this point, the hotel manager faces a conundrum: how to find a room for the new guest? Well, after some thought, he realizes that it is easier than he imagined! It is enough to ask every other guest to move to the room coming after the one they are currently occupying (1 → 2, 2 → 3, 3 → 4, etc.). In this way, room number 1 will become free. He also realizes that it is possible to improve upon this new arrangement! It is enough to ask everyone to move to the room with a number which is twice the one of the room currently occupied (1 → 2, 2 → 4, 3 → 6, etc.). In this way, infinitely many rooms will become available: all the odd ones.
5 The passage is in a dialogue between Sagredo, Salviati, and Simplicio, during the first day.

to admit that there are as many squares as there are numbers all taken together". The clarity with which Galileo exposes the problem is worthy of his genius. Unfortunately, the mathematical notions available to him were completely insufficient for further developing his intuitions. For example, the notion of function, fundamental for the ideas of Cantor, emerged (in a primitive form) only at the end of the seventeenth century in the works of Leibniz.

Clearly, the union of a finite number of countable sets is also countable. Much more is actually true.

Theorem 273 The union of a countable collection of countable sets is also countable.

Before providing a proof of this theorem, we give a heuristic argument. Denote by {A_n}_{n=1}^∞ the countable collection of the countable sets. The result claims that their union ⋃_{n=1}^∞ A_n is a countable set. Since each set A_n is countable, we can list their elements as follows:

A_1 = {a_{11}, a_{12}, ..., a_{1n}, ...} ;  A_2 = {a_{21}, a_{22}, ..., a_{2n}, ...} ;  ... ;  A_n = {a_{n1}, a_{n2}, ..., a_{nn}, ...} ; ...

We can then construct an infinite matrix A in which the elements of the set A_n form the n-th row:

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & \cdots \\ a_{21} & a_{22} & a_{23} & a_{24} & a_{25} & \cdots \\ a_{31} & a_{32} & a_{33} & a_{34} & a_{35} & \cdots \\ a_{41} & a_{42} & a_{43} & a_{44} & a_{45} & \cdots \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix} \tag{7.1}$$

The matrix A contains at least as many elements as the union ⋃_{n=1}^∞ A_n. Indeed, it may contain more elements because some elements can be repeated more than once in the matrix, while they would only appear once in the union (net of such repetitions, the two sets have the same number of elements).
We now introduce another infinite matrix, denoted by N, which contains all the natural numbers except 0:

$$N = \begin{bmatrix} 1 & 3 & 6 & 10 & 15 & \cdots \\ 2 & 5 & 9 & 14 & \cdots & \\ 4 & 8 & 13 & \cdots & & \\ 7 & 12 & \cdots & & & \\ 11 & \cdots & & & & \\ \vdots & & & & & \end{bmatrix} \tag{7.2}$$
Observe that:

1. The first diagonal of A (moving from SW to NE) consists of one element: a_{11}. We map this element into the natural number 1, which is the corresponding element in the first diagonal of N. Note that the sum of the indexes of a_{11} is 1 + 1 = 2.

2. The second diagonal of A consists of two elements: a_{21} and a_{12}. We map these elements, respectively, into the natural numbers 2 and 3, which are the corresponding elements in the second diagonal of N. Note that the sum of the indexes of a_{21} and a_{12} is 3.

3. The third diagonal of A consists of three elements: a_{31}, a_{22}, and a_{13}. We map these elements, respectively, into the natural numbers 4, 5, and 6, which are the corresponding elements in the third diagonal of N. Note that the sum of the indexes of a_{31}, a_{22}, and a_{13} is 4.

4. The fourth diagonal of A consists of four elements: a_{41}, a_{32}, a_{23}, and a_{14}. We map these elements, respectively, into the natural numbers 7, 8, 9, and 10, which are the corresponding elements in the fourth diagonal of N. Note that the sum of the indexes of a_{41}, a_{32}, a_{23}, and a_{14} is 5.

These four steps can be illustrated as follows:

[Figure: the entries a_{ij} arranged in an infinite grid, with arrows sweeping the SW–NE diagonals: the first arrow hits a_{11}; the second hits a_{21} and a_{12}; the third hits a_{31}, a_{22}, a_{13}; and so on.]

At each step we have an arrow, indexed by the sum of the indexes of the entries that it hits, minus 1. So, arrow 1 hits entry a_{11}, arrow 2 hits entries a_{21} and a_{12}, arrow 3 hits entries a_{31}, a_{22}, and a_{13}, and arrow 4 hits entries a_{41}, a_{32}, a_{23}, and a_{14}. Each arrow hits one more entry than the previous one.
Intuitively, by proceeding in this way we cover the entire matrix A with countably many arrows, each hitting a finite number of entries. So, matrix A has countably many entries. The union ⋃_{n=1}^∞ A_n is then a countable set.
That said, next we give a rigorous proof.
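Before turning to the rigorous proof, note that the diagonal enumeration is itself an algorithm. The following minimal Python sketch (ours, an illustration and not part of the proof) generates the index pairs of the matrix A in the order in which the arrows hit them:

    from itertools import count

    def diagonal_pairs():
        # Enumerate the pairs of indexes along the SW-NE diagonals:
        # (1,1), then (2,1), (1,2), then (3,1), (2,2), (1,3), ...
        for s in count(2):          # s = sum of the two indexes on a diagonal
            for j in range(1, s):   # walk the diagonal upwards
                yield (s - j, j)    # (row, column)

    pairs = diagonal_pairs()
    for k in range(1, 11):
        print(k, next(pairs))       # k is the corresponding entry of N in (7.2)

Running it reproduces the first ten entries of the matrix N: the pair (1, 1) is numbered 1, the pairs (2, 1) and (1, 2) are numbered 2 and 3, and so on.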

Proof of Theorem 273 We first prove two auxiliary claims.

Claim 1 N × N is countable.

Proof of Claim 1 Consider the function f1 : N × N → N given by f1(m, n) = 2^{n+1} 3^{m+1}. Note that f1(m, n) = f1(m′, n′) means that 2^{n+1} 3^{m+1} = 2^{n′+1} 3^{m′+1}. By the Fundamental Theorem of Arithmetic, this implies that n + 1 = n′ + 1 and m + 1 = m′ + 1, proving that (m, n) = (m′, n′). Thus, f1 is injective and f1 : N × N → Im f1 is bijective. At the same time, by Theorem 272 and since Im f1 is infinite (indeed, it contains the set {2 · 3, 2² · 3, ..., 2ⁿ · 3, ...}), it follows that Im f1 is countable, that is, there exists a bijection f2 : N → Im f1. The reader can easily verify that the map f = f1⁻¹ ∘ f2 is a bijection from N to N × N, proving that N × N is countable.

Claim 2 If g : N → B is surjective and B is infinite, then B is countable.

Proof of Claim 2 Define h1 : B → N by h1(b) = min {n ∈ N : g(n) = b} for all b ∈ B. Since g is surjective, {n ∈ N : g(n) = b} is non-empty for all b ∈ B, thus h1 is well-defined. Note that b ≠ b′ implies that h1(b) ≠ h1(b′), thus h1 is injective. It follows that h1 : B → Im h1 is bijective. At the same time, by Theorem 272 and since Im h1 is infinite (B is infinite), there exists a bijection h2 : N → Im h1. The reader can easily verify that the map h = h1⁻¹ ∘ h2 is a bijection from N to B, thus proving that B is countable.

We are ready to prove the result. Consider the countable collection

A_0, A_1, ..., A_m, ...   (7.3)

and define B = ⋃_{m=0}^{+∞} A_m. Since each A_m is countable, B is clearly infinite and there exists a bijection g_m : N → A_m. Define the map ĝ : N × N → B by the rule ĝ(m, n) = g_m(n). In other words, the first natural number m chooses the set, while the second natural number n chooses the n-th element of that set. The map ĝ is surjective for, given an element b ∈ B, it belongs to A_m for some m and it is paired to a natural number n by the map g_m, that is, ĝ(m, n) = g_m(n) = b. Unfortunately, ĝ might not be injective, since the sets in (7.3) might have elements in common. If we consider g = ĝ ∘ f, where f is as in Claim 1, this function is from N to B and it is surjective. By Claim 2, it follows that B is countable, thus proving the result.

With a similar argument it is possible to prove that the Cartesian product of a finite number of countable sets is also countable. Moreover, the previous result yields that the set of rational numbers is countable.

Corollary 274 Z and Q are countable.

Proof We first prove that Z is countable. Define f : N → Z by the rule

$$f(n) = \begin{cases} \dfrac{n}{2} & \text{if } n \text{ is even} \\[1ex] -\dfrac{n+1}{2} & \text{if } n \text{ is odd} \end{cases}$$

The reader can verify that f is bijective, thus proving that Z is countable. On the other hand, the set

Q = {m/n : m ∈ Z and 0 ≠ n ∈ N}

of rational numbers can be written as a union of infinitely many countable sets:

Q = ⋃_{n=1}^{+∞} A_n

where

A_n = {0/n, 1/n, −1/n, 2/n, −2/n, ..., m/n, −m/n, ...}

Each A_n is countable because it is in a bijective correspondence with Z, which, in turn, is countable. By Theorem 273, it follows that Q is countable.
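The bijection used above for Z is easy to check numerically. A small Python sketch of ours, for illustration only:

    def f(n):
        # the function of the proof: even n go to n/2, odd n to -(n+1)/2
        return n // 2 if n % 2 == 0 else -(n + 1) // 2

    print([f(n) for n in range(9)])        # 0, -1, 1, -2, 2, -3, 3, -4, 4
    image = {f(n) for n in range(200)}     # injectivity + surjectivity on a window
    assert image == set(range(-100, 100))

The printout shows how f lists the integers as 0, −1, 1, −2, 2, ...: every integer appears exactly once.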

This corollary is quite surprising: though the rational numbers seem much more numerous than the natural numbers, there exists a way to put these two classes of numbers into a bijective correspondence. The cardinality of N, and so of any countable set, is usually denoted by ℵ0, that is, |N| = ℵ0. We can then write as

|Q| = ℵ0

the remarkable property that Q is countable.6

6 ℵ (aleph) is the first letter of the Hebrew alphabet. In the next section we will formalize also for infinite sets the notions of same or greater cardinality. For the time being, we treat these notions intuitively.


At this point, we might suspect that all in nite sets are countable. The next result of
Cantor shows that this is not the case: the set R of real numbers is in nite but not countable,
its cardinality being higher than @0 . To establish this fundamental result, we need a new
de nition and an interesting result.

Definition 275 A set A has the cardinality of the continuum if it can be put in a bijective correspondence with the set R of the real numbers. In this case, we write |A| = |R|.

The cardinality of the continuum is often denoted by c, that is, |R| = c. Also in this case there exist subsets that are, prima facie, much smaller than R but turn out to have the same cardinality. Let us see an example which will be useful in proving that R is uncountable.

Proposition 276 The interval (0, 1) has the cardinality of the continuum.7

Proof We want to show that |(0, 1)| = |R|. To do this we have to show that the numbers of (0, 1) can be put in a bijective correspondence with those of R. The bijection f : R → (0, 1) defined by

$$f(x) = \begin{cases} 1 - \dfrac{1}{2}e^{x} & \text{if } x < 0 \\[1ex] \dfrac{1}{2}e^{-x} & \text{if } x \geq 0 \end{cases}$$

7 At the end of Section 6.5.3 we noted that the trigonometric function f : R → (−1, 1) defined by (2/π) arctan x is a bijection. In view of what we learned so far, this shows that (−1, 1) has the cardinality of the continuum.
with graph

[Figure: the graph of f, a strictly decreasing curve with horizontal asymptotes y = 1 and y = 0, passing through the point (0, 1/2).]

shows that, indeed, this is the case (as the reader can also formally verify).

Theorem 277 (Cantor) R is uncountable, that is, |R| > ℵ0.8

Proof Assume, by contradiction, that R is countable. Hence, there exists a bijective function g : N → R. By Proposition 276, it follows that there exists a bijective function f : R → (0, 1). The reader can easily prove that f ∘ g is a bijective function from N to (0, 1), yielding that (0, 1) is countable. We will next reach a contradiction, showing that (0, 1) cannot be countable. To this end, we write all the numbers in (0, 1) using their decimal representation: each x ∈ (0, 1) will be written as

x = 0.c0 c1 ⋯ cn ⋯

with ci ∈ {0, 1, ..., 9}, always using infinitely many digits (for example, 0.354 will be written 0.354000000...). Since until now we obtained that (0, 1) is countable, there exists a way to list its elements as a sequence:

x0 = 0.c00 c01 c02 c03 ⋯ c0n ⋯
x1 = 0.c10 c11 c12 c13 ⋯ c1n ⋯
x2 = 0.c20 c21 c22 c23 ⋯ c2n ⋯

and so on. Let us then take the number x̄ = 0.d0 d1 d2 d3 ⋯ dn ⋯ such that its generic decimal digit dn is different from cnn (but without choosing 9 infinitely many times, so as to avoid a periodic 9 which, as we know, does not exist on its own). The number x̄ belongs to (0, 1), but, sadly, does not belong to the list written above since dn ≠ cnn (and therefore it is different from x0 since d0 ≠ c00, from x1 since d1 ≠ c11, etc.). We conclude that the list written above cannot be complete and hence the numbers of (0, 1) cannot be put in a bijective correspondence with N. So, the interval (0, 1) is not countable, a contradiction.

8 In a later section we formally define the inequality |R| > ℵ0, whose intuitive meaning should be nonetheless clear.
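Cantor's diagonal construction is itself an algorithm: given any (finite portion of a) list of decimal expansions, it produces a number that the list misses. A Python sketch of ours, where the choice of the digits 4 and 5 is one convenient way to avoid the periodic 9 mentioned in the proof:

    def missing_number(listed):
        # listed: decimal digits (after "0.") of the first numbers of the list
        digits = []
        for n, x in enumerate(listed):
            c = x[n]                                   # n-th digit of the n-th number
            digits.append('5' if c != '5' else '4')    # pick a different digit
        return '0.' + ''.join(digits)

    listed = ['1415926535', '7182818284', '4142135623', '5772156649']
    print(missing_number(listed))   # differs from the n-th number at digit n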

The set R of real numbers is, therefore, much richer than N and Q. The rational numbers (which have, as we remarked, a "quick rhythm") are comparatively very few with respect to the real numbers: they form a kind of fine dust that overlaps with the real numbers without covering them all. At the same time, it is dust so fine that between any two real numbers, no matter how close they are, there are particles of it.
Summing up, the real line is a new prototype of infinite set.

It is possible to prove that both the union and the Cartesian product of a finite or countable collection of sets that have the cardinality of the continuum have, in turn, the cardinality of the continuum. This has the next consequence.

Theorem 278 Rⁿ has the power of the continuum for each n ≥ 1.

This is another remarkable finding, which is surprising already in the special case of the plane R² that, intuitively, may appear to contain many more points than the real line. It is in front of results of this type, so surprising for our "finitary" intuition, that Cantor wrote in a letter to Dedekind "I see it, but I do not believe it". His key intuition on the use of bijective functions to study the cardinality of infinite sets opened a new and fundamental area of mathematics, which is also rich in terms of philosophical implications (mentioned at the beginning of the chapter).

7.3 A Pandora's box

The symbols ℵ0 and c are called infinite cardinal numbers. The role played by the natural numbers in representing the cardinality of finite sets is now played by the cardinal numbers ℵ0 and c for the infinite sets N and R. For this reason, the natural numbers are also called finite cardinal numbers. The cardinal numbers

0, 1, 2, ..., n, ..., ℵ0, and c   (7.4)

represent, therefore, the cardinality of the prototype sets

∅, {1}, {1, 2}, ..., {1, 2, ..., n}, ..., N, and R

respectively. Looking at (7.4), it is natural to wonder whether ℵ0 and c are the only infinite cardinal numbers. As we will see shortly, this is far from being true. Indeed, we are about to uncover a genuine Pandora's box (from which, however, no evil will emerge but only wonders). To do this, we first need to generalize to any pair of sets the comparative notion of size we considered in Definitions 271 and 275.

Definition 279 Two sets A and B have the same cardinality if there exists a bijective function f : A → B. In this case, we write |A| = |B|.

In particular, when A is finite we have |A| = |{1, ..., n}| = n, when A is countable we have |A| = |N| = ℵ0, and when A has the cardinality of the continuum we have |A| = |R| = c.
We denote by 2^A the power set of the set A, that is, the collection

2^A = {B : B ⊆ A}

of all its subsets. The notation 2^A is justified by the cardinality of the power set in the finite case, as we next show.

Proposition 280 If |A| = n, then |2^A| = 2ⁿ.

Proof Combinatorial analysis shows immediately that 2^A contains $1 = \binom{n}{0}$ empty set, $\binom{n}{1}$ sets with one element, $\binom{n}{2}$ sets with two elements, ..., $\binom{n}{n-1}$ sets with n − 1 elements, and $\binom{n}{n} = 1$ set with all the n elements. Therefore,

$$\left|2^A\right| = 1 + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n-1} + \binom{n}{n} = \sum_{k=0}^{n} \binom{n}{k} 1^k 1^{n-k} = (1+1)^n = 2^n$$

where the penultimate equality follows from Newton's binomial formula (B.7).

Example 281 More compactly, we can write |2^A| = 2^{|A|}. For instance, when A has three elements, say A = {a1, a2, a3}, its power set

2^A = {∅, {a1}, {a2}, {a3}, {a1, a2}, {a1, a3}, {a2, a3}, A}

has cardinality 2³ = 8. N
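The counting in Proposition 280 can be replicated mechanically. A minimal Python sketch of ours that enumerates the power set of a three-element set:

    from itertools import chain, combinations

    def power_set(A):
        # all subsets of A, grouped by their number of elements
        A = list(A)
        return list(chain.from_iterable(combinations(A, k) for k in range(len(A) + 1)))

    A = ['a1', 'a2', 'a3']
    P = power_set(A)
    print(len(P))                  # 8 = 2**3, as in the example
    assert len(P) == 2 ** len(A)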

Sets can have the same size, but also different sizes. This motivates the following definition:

Definition 282 Given any two sets A and B, we say that:

(i) A has cardinality less than or equal to B, written |A| ≤ |B|, if there exists an injective function f : A → B;

(ii) A has cardinality strictly less than B, written |A| < |B|, if |A| ≤ |B| and |A| ≠ |B|.

Next we list a few properties of these comparative notions of cardinality.

Proposition 283 Let A, B, and C be any three sets. Then:

(i) |A| ≤ |A|;

(ii) |A| ≤ |B| and |B| ≤ |C| imply that |A| ≤ |C|;

(iii) |A| ≤ |B| and |B| ≤ |A| if and only if |A| = |B|;

(iv) A ⊆ B implies that |A| ≤ |B|.


Example 284 We have |N| < |R|. Indeed, by Theorem 277, |N| ≠ |R| and, by point (iv), N ⊆ R implies |N| ≤ |R|. N

Properties (i) and (ii) say that the order ≤ is reflexive and transitive (cf. Appendix A). As for property (iii), it tells us that ≤ and = are related in a natural way. Finally, (iv) confirms the intuitive idea that smaller sets have a smaller cardinality. Remarkably, this intuition does not carry over to < (i.e., A ⊊ B does not imply |A| < |B|) because, as already noted, a proper subset of an infinite set may have the same cardinality as the original set (as Galileo had envisioned).

Proof We start by proving an auxiliary fact: the composition of injective functions is injective, that is, if f : A → B and g : B → C are injective, so is g ∘ f. For, set h = g ∘ f. Assume that h(a) = h(a′). Denote b = f(a) and b′ = f(a′). By the definition of h, we have g(b) = g(b′). Since g is injective, this implies b = b′, that is, f(a) = f(a′). Since f is injective, we conclude that a = a′, proving that h is injective.
(i) Let f : A → A be the identity, that is, f(a) = a for all a ∈ A. The function f is trivially injective and the statement follows.
(ii) Since |A| ≤ |B|, there exists an injective function f : A → B. Since |B| ≤ |C|, there exists an injective function g : B → C. Next, note that h = g ∘ f : A → C is well-defined and, by the initial part of the proof, we also know that it is injective, thus proving that |A| ≤ |C|.
(iii) We only prove the "if" part.9 By definition and since |A| = |B|, there exists a bijection f : A → B. Since f is bijective, it follows that f⁻¹ : B → A is well-defined and bijective. Thus, both f : A → B and f⁻¹ : B → A are injective, yielding that |A| ≤ |B| and |B| ≤ |A|.
(iv) Define f : A → B by the rule f(a) = a. Since A ⊆ B, the function f is well-defined and, clearly, injective, thus proving the statement.

9 The "only if" part is the content of the Schroeder–Bernstein Theorem, which we leave to more advanced courses.

When a set A is finite and non-empty, we clearly have |A| < |2^A|. Remarkably, the inequality continues to hold for infinite sets.

Theorem 285 (Cantor) For each set A, finite or infinite, we have |A| < |2^A|.

Proof Consider a set A and the collection of all singletons C = {{a}}_{a∈A}. It is immediate to see that there is a bijective mapping between A and C, that is, |A| = |C|, and C ⊆ 2^A. Since |C| ≤ |2^A|, we conclude that |A| ≤ |2^A|. Next, by contradiction, assume that |A| = |2^A|. Then there exists a bijection between A and 2^A which associates to each element a ∈ A an element b = b(a) ∈ 2^A and vice versa: a ↔ b. Observe that each b(a), being an element of 2^A, is a subset of A. Consider now all the elements a ∈ A such that the corresponding subset b(a) does not contain a. Call S the subset of these elements, that is, S = {a ∈ A : a ∉ b(a)}. Since S is a subset of A, S ∈ 2^A. Since we have a bijection between A and 2^A, there must exist an element c ∈ A such that b(c) = S. We have two cases:

(i) if c ∈ S, then by the definition of S, b(c) does not contain c, so c ∉ b(c) = S;

(ii) if c ∉ S, then by the definition of S, b(c) contains c, so c ∈ b(c) = S.

In both cases, we have reached a contradiction, thus proving |A| < |2^A|.

Cantor's Theorem offers a simple way to make a "cardinality jump" starting from a given set A: it is sufficient to consider the power set 2^A. For example,

|2^R| > |R| ,  |2^{2^R}| > |2^R|

and so on. We can, therefore, construct an infinite sequence of sets of higher and higher cardinality. In this way, we enrich (7.4), which now becomes

{0, 1, 2, ..., n, ..., ℵ0, c, |2^R|, |2^{2^R}|, ...}   (7.5)

Here is the Pandora's box mentioned above, which Theorem 285 has allowed us to uncover. The breathtaking sequence (7.5) is only the incipit of the theory of infinite sets, whose study (even the introductory part) would take us too far away.
study (even the introductory part) would take us too far away.
Before moving on with the book, however, we consider a final famous aspect of the theory, the so-called continuum hypothesis (which the reader might have already heard of). By Theorem 285, we know that |2^N| > |N|. On the other hand, by Theorem 277 we also have |R| > |N|. The next result (we omit its proof) shows that these two inequalities are actually not distinct.

Theorem 286 |2^N| = |R|.

Therefore, the power set of N has the cardinality of the continuum. The continuum hypothesis states that there is no set A such that

|N| < |A| < |R|

That is, there is no infinite set of cardinality intermediate between ℵ0 and c. In other words, a set that has cardinality larger than ℵ0 must have at least the cardinality of the continuum.
The validity of the continuum hypothesis is the first among the celebrated Hilbert problems, posed by David Hilbert in 1900, and represents one of the deepest questions in mathematics. By adopting this hypothesis, it is possible to set

ℵ1 = c

and to consider the cardinality of the continuum as the second infinite cardinal number ℵ1 after the first one ℵ0 = |N|.
The continuum hypothesis can be reformulated in a suggestive way by writing

ℵ1 = 2^{ℵ0}

That is, the smallest cardinal number greater than ℵ0 is equal to the cardinality of the power set of N or, equivalently, of any set of cardinality ℵ0 (like, for example, the rational numbers).
The generalized continuum hypothesis implies that, for each n, we have

ℵ_{n+1} = 2^{ℵ_n}

All the jumps of cardinality in (7.5), not only the first one from ℵ0 to ℵ1, are thus obtained by considering power sets. Therefore,

ℵ2 = |2^R| ,  ℵ3 = |2^{2^R}|

and so on. At this point, (7.5) becomes

{0, 1, 2, ..., n, ..., ℵ0, ℵ1, ℵ2, ℵ3, ...}

The elements of this sequence are the cardinal numbers that represent all the different cardinalities (finite or infinite) that sets might have, however large they might be. According to the generalized continuum hypothesis, the power sets in (7.5) are the prototype sets of the infinite cardinal numbers (the first two being the two infinite cardinal numbers ℵ0 = |N| and ℵ1 = c with which we started this section).

Summing up, the depth of the problems that the use of bijective functions opened is incredible. As we have seen, this study started by Cantor is, at the same time, rigorous and intrepid (as is typical of the best mathematics, at the basis of its beauty). It relies on the use of bijective functions to capture the fundamental principle of similarity (in terms of numerosity) among sets.10

10 The reader who wants to learn more about set theory can consult Halmos (1960), Suppes (1960) as well as Lombardo Radice (1981).

7.4 Coda: what is a natural number?


Natural numbers are the simplest kind of number, as the "natural" terminology suggests (Leopold Kronecker allegedly said in 1886 that "God made the integers, all else is the work of man"). Yet, they are cultural constructs, with the possible exception of 1 and 2, the two numbers that account for the basic equal-distinct dichotomy: if x and y are alternatives, either x = y or x ≠ y. In some cultures only these two numbers seem to exist, with a "one-two-many" counting system: this is all they need in terms of numeracy to survive, often in non-simple environments (see Gordon, 2004). For instance, maybe they do not need to know exact ages: for their purposes it might be enough to know whether somebody is, say, young or old.11
In view of what we learned in this chapter, a possibility is to view a natural number as an equivalence class formed by all finite sets with the same cardinality,12 among which a bijection exists. For example, 2 is the equivalence class of all sets with two elements. Since a "concrete" way to identify equivalence classes is via representatives, a natural question then arises: what are the natural representatives, benchmarks, for these equivalence classes? Moreover, what about declaring directly these representatives, for "concreteness", as the natural numbers?

11 Pica et al. (2004) claim that Western children begin to become familiar with natural numbers at around the age of 3.
12 Equivalence classes are introduced in Appendix A.
Let us start with 0. This number corresponds to the equivalence class formed by all sets with no elements. But such a class is a singleton consisting of the empty set, so we define 0 as the empty set, i.e.,

0 = ∅
What about 1? To address this question, we need a set-theoretic notion. Given a finite set A, define its successor set by

A⁺ = A ∪ {A}

In words, A⁺ consists of the elements of the set A, with in addition the set A itself. For instance, if A = {Ada, Barbara}, then A⁺ = {Ada, Barbara, {Ada, Barbara}}. Clearly, |A⁺| = |A| + 1. So, successor sets increase by 1 the cardinality of a finite set.
All this suggests to define 1 as the successor set 0⁺ = ∅ ∪ {∅} of 0, that is,

1 = 0⁺ = {∅}

So, we select {∅} as the representative for the equivalence class of singletons (here identified as the class [{∅}] of sets that have the same cardinality as the set {∅}) and we define the natural number 1 to be the set {∅} (a specific representative of this equivalence class but, arguably, the most "natural" one to select).
What about 2? The same reasoning suggests to define 2 as the successor set of 1, that is,

2 = 1⁺ = {∅, {∅}}

Here we select {∅, {∅}} as the representative for the equivalence class of sets that have two elements and we define the natural number 2 to be the set {∅, {∅}}.
By iterating, the sequence of "representative" sets

∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}, ...

defines the natural numbers {0, 1, 2, 3, ..., n, ...} as follows:

0 = ∅
1 = {∅}
2 = {∅, {∅}}
3 = {∅, {∅}, {∅, {∅}}}
...
n = {∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}, ...}

It is useful to rewrite them as

0 = ∅
1 = {0}
2 = {0, 1}
3 = {0, 1, 2}
...
n = (n − 1)⁺ = {0, 1, ..., n − 1}

because this clarifies a key feature of this approach: natural numbers are defined as sets that consist of their own predecessors.
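The construction lends itself to a direct implementation. In the following Python sketch of ours (frozensets are used because sets of sets must be hashable), each natural number is literally the set of its predecessors:

    def successor(A):
        # A+ = A ∪ {A}: the elements of A, plus the set A itself
        return A | frozenset([A])

    zero = frozenset()             # 0 = the empty set
    naturals = [zero]
    for _ in range(4):
        naturals.append(successor(naturals[-1]))

    for n, s in enumerate(naturals):
        # n has exactly n elements: its predecessors 0, 1, ..., n-1
        assert len(s) == n and all(naturals[k] in s for k in range(n))
        print(n, 'has', len(s), 'elements')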
This beautiful iterative construction of the natural numbers was introduced by John von
Neumann in a 1923 article (so, when he was twenty years old). Ex nihilo nihil, but maybe
the von Neumann construction is an exception: from the empty set, the natural numbers!
Part II

Discrete analysis

Chapter 8

Sequences (sdoganato)

8.1 The concept

A numerical sequence is an infinite, endless, "list" of real numbers, for example

{2, 4, 6, 8, ...}   (8.1)

where each number occupies a place of order, a position, so it follows (except the first one) a number and precedes another one. The next definition formalizes this idea. We denote by N+ the set of the natural numbers without 0.

Definition 287 A function f : N+ → R is called a sequence of real numbers.

In other words, a sequence is a function that associates to each natural number n ≥ 1 a real number f(n). In (8.1), to each n we associate f(n) = 2n, that is,

n ↦ 2n   (8.2)

and so we have the sequence of even integers (that are strictly positive). The image f(n) is usually denoted by xn. With such notation, the sequence of even integers is xn = 2n for each n ≥ 1. The images xn are called terms (or elements) of the sequence. We will denote sequences by {xn}_{n=1}^∞ or, briefly, by {xn}.1

1 The choice of starting the sequence from n = 1 instead of n = 0 (or of any other natural number k) is a mere convention. When needed, it is perfectly legitimate to consider sequences {xn}_{n=0}^∞ or, more generally, {xn}_{n=k}^∞.

There are different ways to define a specific sequence {xn}, that is, to describe the underlying function f : N+ → R. A first possibility is to describe it in closed form through a formula: for instance, this is what we did with the sequence of the even numbers using (8.2). Other defining rules are, for example,

n ↦ 2n − 1   (8.3)
n ↦ n²   (8.4)
n ↦ 1/√(2^{n−1})   (8.5)


Rule (8.3) defines the sequence of odd integers

{1, 3, 5, 7, ...}   (8.6)

while rule (8.4) defines the sequence of the squares

{1, 4, 9, 16, ...}

and rule (8.5) defines the sequence

{1, 1/√2, 1/√4, 1/√8, ...}   (8.7)

To define a sequence in closed form thus amounts to specifying explicitly the underlying function f : N+ → R. The next example presents a couple of classic sequences defined in closed form.

Example 288 The sequence of unit fractions with xn = 1/n, that is,

{1, 1/2, 1/3, 1/4, 1/5, ...}

is called harmonic,2 while the sequence with xn = aq^{n−1}, that is,

{a, aq, aq², aq³, aq⁴, ...}

is called geometric (or geometric progression) with first term a and common ratio q. For example, if a = 1 and q = 1/2, we have {1, 1/2, 1/4, 1/8, 1/16, ...}. N

2 Indeed, 1/2, 1/3, 1/4, ... are the positions in which we have to put a finger on a vibrating string to obtain the different notes.

Another important way to define a sequence is by recurrence (or recursion). Consider the famous Fibonacci sequence

{0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...}

in which each term is the sum of the two terms that precede it, with fixed initial values 0 and 1. For example, in the fourth position we find the number 2, i.e., the sum 1 + 1 of the two terms that precede it; in the fifth position we find the number 3, i.e., the sum 1 + 2 of the two terms that precede it; and so on. The underlying function f : N+ → R is, hence,

f(1) = 0 ,  f(2) = 1
f(n) = f(n − 1) + f(n − 2)   for n ≥ 3   (8.8)

We have two initial values, f(1) = 0 and f(2) = 1, and a recursive rule that allows us to compute the term in position n once the two preceding terms are known. Differently from the sequences defined through a closed formula, such as (8.3)-(8.5), to obtain the term xn we now have to first construct, using the recursive rule, all the terms that precede it. For example, to compute the term x100 in the sequence of the odd numbers (8.6), it is sufficient to substitute n = 100 in formula (8.3), finding x100 = 199. In contrast, to compute the term

x100 in the Fibonacci sequence we rst have to construct by recurrence the rst 99 terms of
the sequence. Indeed, it is true that to determine x100 it is su cient to know the values of
x99 and x98 and then to use the rule x100 = x99 + x98 , but to determine x99 and x98 we must
rst know x97 and x96 , and so on.
Therefore, the recursive de nition of a sequence consists of one or more initial values and
of a recurrence rule that, by starting from them, allows to compute the subsequent terms of
the sequence. The initial values are arbitrary. For example, if in (8.8) we choose f (1) = 2
and f (2) = 1 we have the following Fibonacci sequence

f2; 1; 3; 4; 7; 11; 18; 29; 47; :::g
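In a program, too, the term x100 of a Fibonacci sequence must be obtained by constructing all the terms that precede it. A minimal Python sketch of ours:

    def fibonacci(n, x1=0, x2=1):
        # n-th term of the recurrence (8.8), computed term by term
        if n == 1:
            return x1
        a, b = x1, x2
        for _ in range(n - 2):
            a, b = b, a + b
        return b

    print([fibonacci(n) for n in range(1, 11)])              # 0, 1, 1, 2, 3, 5, 8, 13, 21, 34
    print(fibonacci(100))                                    # needs the 99 preceding terms
    print([fibonacci(n, x1=2, x2=1) for n in range(1, 10)])  # 2, 1, 3, 4, 7, 11, 18, 29, 47

Note how changing the initial values to f(1) = 2 and f(2) = 1 produces the second Fibonacci sequence above.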

In the next example we define by recurrence a classic sequence.3

Example 289 Given any a, b ∈ R, define f : N+ → R by

f(1) = a
f(n) = f(n − 1) + b   for n ≥ 2

Starting from the initial value f(1) = a, it is possible to construct the entire sequence through the recursive formula f(n) = f(n − 1) + b. This is the so-called arithmetic sequence (or arithmetic progression) with first term a and common difference b. For example, if a = 2 and b = 4, we have {2, 6, 10, 14, 18, 22, ...}. N

3 In this chapter we illustrate the idea of recurrence through examples. A formal analysis will be presented in Section 14.2, which is best read after an intuitive understanding of this fundamental idea has been developed.

To ease notation, the underlying function f is often omitted in recursive formulas. For instance, the arithmetic sequence is written as

x1 = a
xn = x_{n−1} + b   for n ≥ 2   (8.9)

The next examples adopt this simplified notation.

Example 290 Let P = {3k : k ∈ N+} be the collection of all multiples of 3, i.e., P = {3, 6, 9, 12, 15, ...}. Define recursively a sequence {xn} by x1 = a ∈ R and, for each n ≥ 2,

$$x_n - x_{n-1} = \begin{cases} -1 & \text{if } n \in P \\ +1 & \text{otherwise} \end{cases} \tag{8.10}$$

In words, at each position we can go either up or down by one unit: we go down if we are getting to positions that are multiples of 3, we go up otherwise. This sequence is an example of a random walk: it may describe the walk of a drunk person who, at each block, goes either North, +1, or South, −1, and who, for some (random) reason, always goes South after having gone twice North. For instance, if the initial condition is a = 0, the first terms are 0, 1, 0, 1, 2, 1, 2, 3, 2, ...

More generally, given any subset P (finite or not) of N+, the recurrence (8.10) is called a random walk. N
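The recurrence (8.10) is straightforward to simulate. A Python sketch of ours, with P the multiples of 3 and initial condition a = 0:

    def random_walk(a, steps, P):
        # go down by 1 at positions belonging to P, up by 1 otherwise
        x = [a]
        for n in range(2, steps + 1):
            x.append(x[-1] - 1 if n in P else x[-1] + 1)
        return x

    P = {3 * k for k in range(1, 100)}   # multiples of 3 (a finite window suffices)
    print(random_walk(0, 12, P))         # [0, 1, 0, 1, 2, 1, 2, 3, 2, 3, 4, 3]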

Example 291 A Star Wars jedi begins his career as a padawan apprentice under a jedi master, then becomes a knight and, once ready to train, becomes a master and takes a padawan apprentice.
Let

p_t = number of jedi padawans at time t
k_t = number of jedi knights at time t
m_t = number of jedi masters at time t

Assume that, as one (galactic) year passes, padawans become knights, knights become masters, and masters take a padawan apprentice. Formally:

k_{t+1} = p_t
m_{t+1} = m_t + k_t
p_{t+1} = m_{t+1}

The total number of jedis at time t + 2, denoted by x_{t+2}, is then:

x_{t+2} = k_{t+2} + m_{t+2} + p_{t+2} = p_{t+1} + m_{t+1} + k_{t+1} + m_{t+1} + k_{t+1}
        = x_{t+1} + m_{t+1} + k_{t+1} = x_{t+1} + m_t + k_t + p_t = x_{t+1} + x_t

So, we have a Fibonacci recursion

x_{t+2} = x_{t+1} + x_t

which says something simple but not so obvious a priori: the number of jedis at time t + 2 can be regarded as the sum of the numbers of jedis at time t + 1 and at time t. Indeed, a jedi is a master at t + 2 if and only if he was a jedi (of any kind) at t. So, x_t gives the number of all masters at t + 2, who in turn increase at t + 2 the population of jedis by taking new apprentices.
The recursion is initiated at t = 1 by a "self-taught" original padawan, who becomes a knight at t = 2 and a master with a new padawan at t = 3. So:

x1 = 1 ,  x2 = 1
x_t = x_{t−1} + x_{t−2}   for t ≥ 3

with initial values x1 = x2 = 1. We can diagram the recursion as:

p                           1 = 1
k                           1 = 1
mp                      1 + 1 = 2
mpk                     1 + 2 = 3
mpkmp                   2 + 3 = 5
mpkmpmpk                3 + 5 = 8
mpkmpmpkmpkmp           5 + 8 = 13

Note how every string is the concatenation of the previous two ones. N

Example 292 A Fibonacci recurrence is a classic instance of a linear recurrence of order k, given by

x1 = δ1 , x2 = δ2 , ... , x_k = δ_k
x_n = a1 x_{n−1} + a2 x_{n−2} + ⋯ + a_k x_{n−k}   for n ≥ k + 1   (8.11)

with k initial conditions δ_i and k coefficients a_i. Indeed, a Fibonacci recurrence is a linear recurrence of order 2 with unitary coefficients a1 = a2 = 1. For example,

x1 = 1 , x2 = 2 , x3 = 2
x_n = 2x_{n−1} − x_{n−2} + x_{n−3}   for n ≥ 4

is a linear recurrence of order 3. N

A closed form explicitly describes the underlying function f : N+ → R, while a recurrence gives a partial description of such a function that only specifies what happens next. So, a closed form definition is, in general, more informative than one by recurrence (however interesting, as a property of a sequence, a recurrence might be per se). Yet, in applications sequences are often defined by recurrence because a partial description is all one is able to say about the phenomenon under study. For instance, if in studying the walking habits of drunk people the only pattern that one is able to detect is that a drunk person always goes South after having gone twice North, then the recurrence (8.10) is all one can specify about this phenomenon.
An important topic is, then, whether it is possible to solve a recurrence (that is, to find the closed form) so as to have a complete description of the sequence. In general, solving a recurrence is not a simple endeavor. However, next we present a few examples where this is possible via a "guess and verify" method, in which we first guess a solution and then verify it by mathematical induction. Fortunately, there are more systematic methods to solve recurrences. Though we do not study them in this book (except for a few remarks in Section 11.2.2, where we solve linear recursions via generating functions), it is important to keep this issue in mind.4

4 We refer readers to courses in difference equations for a study of this topic.

Example 293 Consider the recursion

x1 = 2
xn = 2x_{n−1}   for n ≥ 2

We have

x2 = 4 , x3 = 8 , x4 = 16

and so on. This suggests that the closed form is the geometric sequence

xn = 2ⁿ   ∀n ≥ 1   (8.12)

with both first term and common ratio equal to 2. Let us verify that this guess is correct. We proceed by induction. Initial step: for n = 1 we have x1 = 2, as desired. Induction step: assume that (8.12) holds for some n ≥ 1; then

x_{n+1} = 2xn = 2(2ⁿ) = 2^{n+1}

and so (8.12) holds for n + 1. By induction, it then holds for all n ≥ 1.

In general, the geometric sequence of first term a and common ratio q solves the recursion

x1 = a
xn = qx_{n−1}   for n ≥ 2

as the reader can prove. This recursion also motivates the "first term" and "common ratio" terminology. N
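The "guess and verify" method can be complemented by a brute-force numerical check, which often helps in forming the guess in the first place. A Python sketch of ours for the last recursion:

    def by_recurrence(n):
        x = 2
        for _ in range(n - 1):     # apply x_n = 2 x_{n-1} repeatedly
            x = 2 * x
        return x

    # the guessed closed form x_n = 2**n agrees with the recursion
    assert all(by_recurrence(n) == 2 ** n for n in range(1, 30))

Of course, such a check is no substitute for the proof by induction: it only inspects finitely many terms.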

Example 294 For the arithmetic sequence (8.9), we have

x2 = a + b , x3 = a + 2b , x4 = a + 3b

and so on. This suggests the closed form

xn = a + (n − 1)b   ∀n ≥ 1   (8.13)

Let us verify that this guess is correct. We proceed by induction. Initial step: for n = 1 we have x1 = a, as desired. Induction step: assume that (8.13) holds for some n ≥ 1; then

x_{n+1} = xn + b = a + (n − 1)b + b = a + nb

and so (8.13) holds at n + 1. By induction, it then holds for all n ≥ 1. N

Example 295 An investor can at each period of time invest an amount of money x, a monetary capital, and receive at the end of the next period the original amount invested x along with an additional amount rx computed according to the interest rate r ≥ 0. Such additional amount is the fruit of his investment. For instance, if x = 100 and r = 0.1, then rx = 10 is such an amount.
Assume that the investor has an initial monetary capital c that he keeps investing at all periods. The resulting cash flow is described by the following recursion

x1 = c
x_t = (1 + r)x_{t−1}   for t ≥ 2

We have

x2 = c(1 + r) , x3 = x2(1 + r) = c(1 + r)² , x4 = x3(1 + r) = c(1 + r)³

This suggests that the solution of the recursion is

x_t = (1 + r)^{t−1} c   ∀t ≥ 1   (8.14)

To verify this guess, we can proceed by induction. Initial step: for t = 1 we have x1 = c, as desired. Induction step: assume that (8.14) holds for some t ≥ 1; then

x_{t+1} = (1 + r)x_t = (1 + r)(1 + r)^{t−1} c = (1 + r)^t c

and so (8.14) holds for t + 1. By induction, it then holds for all t ≥ 1. Formula (8.14) is the classic compound interest formula of financial mathematics. N
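Formula (8.14) is easy to check against the recursion. A Python sketch of ours, with the illustrative values c = 100 and r = 0.1:

    def capital(c, r, t):
        # value at time t of the recursion x_t = (1 + r) x_{t-1}, x_1 = c
        x = c
        for _ in range(t - 1):
            x = (1 + r) * x
        return x

    c, r = 100.0, 0.1
    print(round(capital(c, r, 4), 2))                         # 133.1
    assert abs(capital(c, r, 4) - c * (1 + r) ** 3) < 1e-9    # closed form (8.14)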

Example 296 The explosive recurrence

x1 = 1
xn = nx_{n−1}   for n ≥ 2

is solved by the sequence xn = n! of factorials (see Appendix B). N

Not all sequences can be described in closed or recursive form. In this regard, the most famous example is the sequence {pn} of prime numbers: it is infinite by Euclid's Theorem, but it does not have a (known) explicit description. In particular:

(i) Given n, we do not know any formula that tells us what pn is; in other words, the sequence {pn} cannot be defined in closed form.

(ii) Given pn (or any smaller prime), we do not know any formula that tells us what p_{n+1} is; in other words, the sequence {pn} cannot be defined by recurrence.

The situation is actually even sadder:

(iii) Given any prime number p, we do not know of any (operational) formula that gives us a prime number q greater than p; in other words, the knowledge of a prime number does not give any information on the subsequent prime numbers.

Hence, we do not have a clue about how prime numbers follow one another, that is, about the form of the function f : N+ → R that defines such a sequence. We have to consider all the natural numbers and check, one by one, whether or not they are prime numbers through the primality tests (Section 1.3.2). Having eternity at our disposal, we could then construct term by term the sequence {pn}. More modestly, in the short time that has passed between Euclid and us, tables of prime numbers have been compiled; they establish the terms of the sequence {pn} up to numbers that may seem huge to us, but that are nothing relative to the infinity of all the prime numbers.
O.R. As to (iii), for centuries mathematicians have looked for a (workable) rule that, given a prime number p, would make it possible to find a greater prime q > p, that is, a function q = f(p). A famous example of a possible such rule is given by the so-called Mersenne primes, which are the prime numbers that can be written in the form 2^p − 1 with p prime. It is possible to prove that if 2^p − 1 is prime, then so is p. For centuries, it was believed (or hoped) that the much more interesting converse was true, namely: if p is prime, so is 2^p − 1. This conjecture was definitely disproved in 1536 when Hudalricus Regius showed that

2¹¹ − 1 = 2047 = 23 · 89

thus finding the first counterexample to the conjecture. Indeed, p = 11 does not satisfy it. In any case, Mersenne primes are among the most important prime numbers. In particular, as of 2016, the greatest prime number known is

2^{74207281} − 1

which has 22338618 digits and is a Mersenne prime.5 H

5 See the Great Internet Mersenne Prime Search.

We close the section by observing that, given any function f : R+ → R, its restriction f_{|N+} to N+ is a sequence. So, functions defined on (at least) the positive half-line automatically define also a sequence.

8.2 The space of sequences

We denote by R^∞ the space of all the sequences x = {xn} of real numbers. We denote, therefore, by x a generic element of R^∞ that, written in "extended" form, reads

x = {xn} = {x1, x2, ..., xn, ...}

The operations on functions studied in Section 6.3.2 have, as a special case, the operations on sequences, that is, on elements of the space R^∞. In particular, given any two sequences x = {xn} and y = {yn} in R^∞, we have:

(i) the sequence sum (x + y)_n = xn + yn for every n ≥ 1;

(ii) the sequence difference (x − y)_n = xn − yn for every n ≥ 1;

(iii) the sequence product (xy)_n = xn yn for every n ≥ 1;

(iv) the sequence quotient (x/y)_n = xn/yn for every n ≥ 1, provided yn ≠ 0.

To ease notation, we will denote the sum directly by {xn + yn} instead of {(x + y)_n}. We will do the same for the other operations.6

6 If f, g : N+ → R are the functions underlying the sequences {xn} and {yn}, their sum is equivalently written (x + y)_n = (f + g)(n) = f(n) + g(n) for every n ≥ 1. A similar remark holds for the other operations. So, the operations on functions imply those on sequences, as claimed.
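Since a sequence is just a function on N+, computationally the operations above are operations on functions. A Python sketch of ours:

    x = lambda n: 2 * n      # the sequence of even numbers
    y = lambda n: 1 / n      # the harmonic sequence

    add = lambda n: x(n) + y(n)   # (x + y)_n = x_n + y_n
    mul = lambda n: x(n) * y(n)   # (xy)_n = x_n y_n

    print([add(n) for n in range(1, 5)])   # 3.0, 4.5, 6.33..., 8.25
    print([mul(n) for n in range(1, 5)])   # 2.0, 2.0, 2.0, 2.0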

On R^∞ we have an order structure similar to that of Rⁿ. In particular, given x, y ∈ R^∞, we write:

(i) x ≥ y if xn ≥ yn for every n ≥ 1;

(ii) x > y if x ≥ y and x ≠ y, i.e., if x ≥ y and there is at least one position n such that xn > yn;

(iii) x ≫ y if xn > yn for every n ≥ 1.

Moreover, (iii) ⟹ (ii) ⟹ (i), i.e.,

x ≫ y ⟹ x > y ⟹ x ≥ y   ∀x, y ∈ R^∞

That said, like in Rⁿ, also in R^∞ the order is not complete and sequences may well be non-comparable. For instance, the alternating sequence xn = (−1)ⁿ and the constant sequence yn = 0 cannot be compared. Indeed, they are {−1, 1, −1, 1, ...} and {0, 0, 0, 0, ...}, respectively.

The functions g : A ⊆ R^∞ → R defined on subsets of R^∞ are important. Thanks to the order structure of R^∞, we can classify these functions through monotonicity, as we did on Rⁿ (Section 6.4.4). Specifically, a function g : A ⊆ R^∞ → R is:

(i) increasing if

x ≥ y ⟹ g(x) ≥ g(y)   ∀x, y ∈ A   (8.15)

(ii) strongly increasing if it is increasing and

x ≫ y ⟹ g(x) > g(y)   ∀x, y ∈ A

(iii) strictly increasing if

x > y ⟹ g(x) > g(y)   ∀x, y ∈ A

(iv) constant if there exists k ∈ R such that

g(x) = k   ∀x ∈ A

The decreasing counterparts of these notions are similarly defined. For brevity, we do not dwell upon these notions. We just note that, as in Rⁿ, strict monotonicity implies the other two kinds of monotonicity and that constancy implies increasing and decreasing monotonicity, but not vice versa (cf. Example 223).

8.3 Application: intertemporal choices

The Euclidean space R^T can model a problem of intertemporal choice of a consumer over T periods (Section 2.4). However, in many applications it is important not to fix a priori a finite horizon T for the consumer, but to imagine that he faces an infinite horizon. In this case, in the sequence x = {x1, x2, ..., x_t, ...} the term x_t denotes the quantity of the good consumed (say, potatoes) at time t ≥ 1.
This is, of course, an idealization. But it permits us to model in a simple way the intertemporal choices of agents that ex ante, at the time of the decision, are not able to specify the last period T relevant for them (for example, the final date might be their death, which they do not know ex ante).
In analogy with what we saw in Section 6.2.2, the consumer has a preference over the consumption streams x = {x1, x2, ..., x_t, ...} that is represented by an intertemporal utility function U : R^∞_+ → R. For example, if we assume that the consumer evaluates the consumption x_t of each period through an instantaneous (bounded) utility function u : R_+ → R, then a standard form of the intertemporal utility function is

U(x) = u(x1) + δu(x2) + ⋯ + δ^{t−1}u(x_t) + ⋯

where δ ∈ (0, 1) can be interpreted as a subjective discount factor that, as we have seen, depends on the degree of patience of the consumer (Section 6.2.2).
The monotonicity properties of intertemporal utility functions U : R^∞_+ → R are, clearly, those seen in points (i)-(iv) of the previous section for a generic function g defined on subsets of R^∞.
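Although an intertemporal utility function is defined on infinite streams, in practice one can approximate it by truncating the series. A Python sketch of ours, in which the logarithmic instantaneous utility is only an illustrative assumption:

    import math

    def intertemporal_utility(stream, delta, u=lambda c: math.log(1 + c)):
        # truncated sum of delta**(t-1) * u(x_t) over a finite consumption stream
        return sum(delta ** (t - 1) * u(c) for t, c in enumerate(stream, start=1))

    stream = [10, 10, 10, 10, 10]
    print(intertemporal_utility(stream, delta=0.9))   # a more patient consumer
    print(intertemporal_utility(stream, delta=0.5))   # a less patient consumer

The more patient consumer (higher discount factor) assigns a higher utility to the same constant stream.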

8.4 Application: prices and expectations

Economic agents' decisions are often based on variables the value of which they will only learn in the future. At the moment of the decision, agents can only rely on their subjective expectations about such values. For this reason, expectations come to play a key role in economics, and the relevance of this subjective component is a key feature of economics as a social science that distinguishes it from, for instance, the natural sciences. Through sequences we can give a first illustration of their importance.

8.4.1 A market for a good

Let us consider the market, denoted by M, of some agricultural good, say potatoes. It is formed by a demand function D : [a, b] → R and by a supply function S : [a, b] → R, with 0 ≤ a < b. The image D(p) is the overall amount of potatoes demanded at price p by consumers, while the image S(p) is the overall amount of potatoes supplied at price p by producers. We assume that both such quantities respond instantaneously to changes in the market price p: in particular, producers are able to adjust in real time their production levels according to the market price p.

Definition 297 A pair (p̄, q̄) ∈ [a, b] × R_+ of prices and quantities is called an equilibrium of market M if

q̄ = D(p̄) = S(p̄)

The pair (p̄, q̄) is the equilibrium of our market of potatoes. Graphically, it corresponds to the classic intersection of supply and demand:
[Figure: the decreasing demand curve D and the increasing supply curve S, crossing at the equilibrium point.]

For simplicity, let us consider linear demand and supply functions:

D(p) = α − βp
S(p) = −γ + δp   (M)

with α, β > 0, γ ≥ 0 and δ > 0. Since consumers demand positive quantities, we set b = α/β > 0 (because D(p) ≥ 0 if and only if p ≤ α/β); similarly, since producers supply positive quantities and reasonably will produce nothing if the price is 0, we set γ = 0 and a = 0 (because, with γ = 0, S(p) ≥ 0 if and only if p ≥ 0). There can be trade only at prices that belong to the interval

[a, b] = [0, α/β]   (8.16)

where both quantities are positive. So, we consider demand and supply functions defined only on such interval even though, mathematically, they are straight lines defined on the entire real line.
For our linear economy, the equilibrium condition becomes

α − βp̄ = δp̄

So, the equilibrium price and quantity are

p̄ = α/(β + δ)   (8.17)

and

q̄ = D(p̄) = α − βp̄ = α − βα/(β + δ) = αδ/(β + δ)

Note that, equivalently, we can retrieve the equilibrium quantity via the supply function:

q̄ = S(p̄) = δp̄ = δα/(β + δ)

Thus, the pair

(α/(β + δ), αδ/(β + δ))

is the equilibrium of our market of potatoes.
8.4.2 Delays in production

Suppose that the market of potatoes opens periodically, say once a month. Denote by t, with t = 1, 2, ..., a generic month and by p_t the corresponding market price. Assume that the demand and supply functions

D(p_t) = α − βp_t
S(p_t) = δp_t   (M_t)

form the market, denoted by M_t, of potatoes at t. Besides the hypothesis of instantaneous adjustment, already made for the market M, we make two further assumptions on the markets M_t: (i) at every t the same producers and consumers trade, so the coefficients α, β and δ do not change; (ii) the good traded at each t, the potatoes, is perishable and does not last till the next month t + 1: the quantities demanded and supplied at t + 1 and at t are independent, so the markets M_t have no links among them.
Now we need to consider all markets M_t, not just a single one M, so demand and supply have to be in equilibrium at each t. In place of the pair of scalars (p̄, q̄) of the last definition, we now have a pair of sequences.7

7 Here [a, b]^∞ denotes the collection of sequences with terms that all belong to the interval [a, b].

Definition 298 A pair of sequences {p_t} ∈ [a, b]^∞ and {q_t} ∈ R^∞_+ of prices and quantities is called a uniperiodal market equilibrium of markets M_t if

q_t = D(p_t) = S(p_t)   ∀t ≥ 1

It is easy to check that the resulting sequence of equilibrium prices {p_t} is constant:

p_t = α/(β + δ)   ∀t ≥ 1   (8.18)

We thus go back to the equilibrium price (8.17) of market M. This is not surprising: because of our assumptions, the markets M_t are independent and, at each t, we have a market identical to M.
The hypothesis of instantaneous production upon which our analysis relies is, however, implausible. Let us make the more plausible hypothesis that producers can adjust their production only after one period: their production technology requires that the quantity that they supply at t has to be decided at t − 1 (to harvest potatoes at t, we need to sow at t − 1).
At the decision time t − 1, producers do not know the value of the future equilibrium price p_t; they can only have a subjective expectation about it. Denote by E_{t−1}(p_t) such expected value. In this case the market at t, denoted by MR_t, has the form

D(p_t) = α − βp_t
S(E_{t−1}(p_t)) = δE_{t−1}(p_t)   (MR_t)

where the expectation E_{t−1}(p_t) replaces the price p_t as an argument of the supply function. Indeed, producers' decisions now rely upon such expectation.
Definition 299 A triple of sequences of prices {p_t} ∈ [a, b]^∞, quantities {q_t} ∈ R^∞_+, and expectations {E_{t−1}(p_t)} ∈ [a, b]^∞ is called a uniperiodal market equilibrium of markets MR_t if

q_t = D(p_t) = S(E_{t−1}(p_t))   ∀t ≥ 1

In a uniperiodal market equilibrium, the sequences of prices and expectations have to be such that demand and supply are in equilibrium at each t. In particular, in equilibrium we have

α − βp_t = δE_{t−1}(p_t)   ∀t ≥ 1   (8.19)

Since prices are positive and belong to the interval [0, α/β], we must have

0 ≤ E_{t−1}(p_t) ≤ α/δ   ∀t ≥ 1   (8.20)

This inequality is a necessary condition for equilibrium expectations. In what follows, we consider expectations that are averages of previous prices and expectations. Condition (8.20) then requires prices to belong to [0, α/δ]. This is achieved when α/β ≤ α/δ. For this reason, in the rest of the section we assume that β ≥ δ, which means that the demand has a steeper slope than the supply. But, except for these simple inequalities, there are no restrictions on equilibrium expectations: they just have to balance with prices, nothing else.

8.4.3 Expectation formation

Let us make a few hypotheses on how expectations can be formed. An important piece of information that producers have at time t is the sequence of previous equilibrium prices {p_1, p_2, ..., p_{t−1}}. Let us assume that, a bit lazily, producers expect that the last observed price, p_{t−1}, will also be the future equilibrium price, that is,

E_{t−1}(p_t) = p_{t−1}   ∀t ≥ 2   (8.21)

with an arbitrary initial expectation E_0(p_1).8 With this process of expectation formation (the so-called classic expectations9) the market MR_t becomes

D(p_t) = α − βp_t
S(p_{t−1}) = δp_{t−1}

In view of (8.19), at a uniperiodal market equilibrium, prices then evolve according to the linear recursion

p_t = α/β − (δ/β)p_{t−1}   ∀t ≥ 2   (8.22)

with initial value

p_1 = (α − δE_0(p_1))/β   (8.23)

determined by the initial expectation E_0(p_1).10

8 Indeed, expectations on the initial price p_1 cannot rely on any previous price information.
9 In a seminal paper, Muth (1961) discusses inter alia the different kinds of expectations that we will consider here and later in the book.
10 In this section, we consider recursions that feature a constant term.
So, starting from an initial expectation, prices are determined by recurrence. Expectations no longer play an explicit role in the evolution of prices, thus dramatically simplifying the analysis. Yet, one should not forget that, though they do not appear in the recursion, expectations are key in the underlying economic process. Specifically, once a value of E_0(p_1) is fixed, from (8.23) we have the initial equilibrium price, which in turn determines both the expectation E_1(p_2) via (8.21) and the next equilibrium price p_2 via the recursion (8.22), and so on and so forth. Starting from an initial expectation, this process thus features equilibrium sequences {p_t} and {E_{t−1}(p_t)} of prices and expectations.
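The recursion (8.22)-(8.23) is immediate to simulate. A Python sketch of ours, with illustrative values of the coefficients satisfying β ≥ δ:

    def cobweb(alpha, beta, delta, E0, T):
        # equilibrium prices under classic expectations:
        # p_1 from (8.23), then p_t = alpha/beta - (delta/beta) p_{t-1} as in (8.22)
        p = [(alpha - delta * E0) / beta]
        for _ in range(T - 1):
            p.append(alpha / beta - (delta / beta) * p[-1])
        return p

    prices = cobweb(alpha=10, beta=2, delta=1, E0=5, T=12)
    print([round(q, 4) for q in prices])   # oscillates around 10/3
    print(10 / (2 + 1))                    # the equilibrium price (8.17)

In this illustration the simulated prices oscillate around, and here approach, the equilibrium price α/(β + δ) of the market M.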

Assume, instead, that producers expect the future price to be an average of the last two observed prices:

E_{t−1}(p_t) = (1/2)p_{t−1} + (1/2)p_{t−2}   ∀t ≥ 3   (8.24)

with arbitrary initial expectations E_0(p_1) and E_1(p_2). In view of (8.19), at a uniperiodal market equilibrium, prices then evolve according to the following linear recursion of order 2:

p_1 = (α − δE_0(p_1))/β ,  p_2 = (α − δE_1(p_2))/β
p_t = α/β − (δ/2β)p_{t−1} − (δ/2β)p_{t−2}   for t ≥ 3

with arbitrary initial expectations E_0(p_1) and E_1(p_2). Expectations based on (possibly weighted) averages of past prices (the so-called extrapolative expectations) make it possible to describe equilibrium prices via a linear recurrence, a very tractable form.
It is, however, a quite naive mechanism of price formation: agents might well feature more sophisticated ways to form expectations. For instance, this is the case for adaptive expectations, the most important mechanism of expectation formation: a sequence of expectations {E_{t−1}(p_t)} is said to be adaptive if there exists a coefficient λ ∈ (0, 1] such that

E_{t−1}(p_t) − E_{t−2}(p_{t−1}) = λ[p_{t−1} − E_{t−2}(p_{t−1})]   ∀t ≥ 2   (8.25)

with (arbitrary) initial expectation E_0(p_1). So, the adjustment

E_{t−1}(p_t) − E_{t−2}(p_{t−1})

in expectations is proportional to the last forecast error

p_{t−1} − E_{t−2}(p_{t−1})

Over time, expectations are updated according to the previous forecast errors. If λ = 1 we
8.4. APPLICATION: PRICES AND EXPECTATIONS 205

get back to the classic case Et 1 (pt ) = pt 1. In general, for 2 (0; 1) by iterating we get:

E1 (p2 ) = E0 (p1 ) + (p1 E0 (p1 )) = (1 ) E0 (p1 ) + p1


E2 (p3 ) = E1 (p2 ) + (p2 E1 (p2 )) = (1 ) E1 (p2 ) + p2
2
= (1 ) E0 (p1 ) + (1 ) p1 + p 2
E3 (p4 ) = E2 (p3 ) + (p3 E2 (p3 )) = (1 ) E2 (p3 ) + p3
2
= (1 ) (1 ) E0 (p1 ) + (1 ) p1 + p 2 + p 3
= p3 + (1 ) p2 + (1 )2 p1 + (1 )3 E0 (p1 )

Et 1 (pt ) = Et 2 (pt 1) + (pt 1 Et 2 (pt 1 )) = (1 ) Et 2 (pt 1 ) + pt 1


t 1
X
= (1 )i 1
pt i + (1 )t 1
E0 (p1 )
i=1

By induction, we thus have

t 1
X
Et 1 (pt ) = (1 )i 1
pt i + (1 )t 1
E0 (p1 ) (8.26)
i=1

Adaptive expectations are an average of the initial expectation E_0(p_1) and of past prices –
an average in which the more recent prices carry a higher weight than the older ones.
Moreover, since λ ∈ (0, 1), we have lim_{t→∞} (1 − λ)^{t−1} = 0: as t gets larger, the term
(1 − λ)^{t−1} gets smaller. The initial expectation E_0(p_1) thus becomes, as t increases, less
and less important in the formation of the expectation E_{t−1}(p_t); only past prices eventually
matter.
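As a sanity check, the recursive update (8.25) and the closed form (8.26) can be compared by direct computation. The Python sketch below is our illustration: the price path, the coefficient value, and the initial expectation are all arbitrary assumptions.

    # Recursive adaptive update (8.25) vs. the closed form (8.26)
    lam = 0.3                         # adaptive coefficient in (0, 1]
    E0 = 5.0                          # arbitrary initial expectation E_0(p_1)
    p = [4.0, 6.0, 5.5, 5.8, 5.6]     # arbitrary observed prices p_1, ..., p_5
    E = E0
    for t in range(2, len(p) + 2):
        E = (1 - lam) * E + lam * p[t - 2]   # E_{t-1}(p_t) via (8.25)
        closed = sum(lam * (1 - lam)**(i - 1) * p[t - 1 - i] for i in range(1, t)) \
                 + (1 - lam)**(t - 1) * E0   # formula (8.26)
        assert abs(E - closed) < 1e-12       # the two expressions agree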

Except in the classic case λ = 1, adaptive and extrapolative expectations are distinct
mechanisms. Indeed, in (8.26) all past prices matter, though the more recent ones carry a
higher weight. In contrast, extrapolative expectations rely only on the last few prices – e.g., in
(8.24), only on those of the past two periods.

Assume that producers' expectations are adaptive. In view of the equilibrium relation
(8.19), we have

E_{t−1}(p_t) = (α − γ)/δ − (β/δ) p_t  and  E_{t−2}(p_{t−1}) = (α − γ)/δ − (β/δ) p_{t−1}

By replacing them in (8.25) we get, for t ≥ 2,

(β/δ) (p_{t−1} − p_t) = λ (1 + β/δ) p_{t−1} − λ (α − γ)/δ

that is,

p_t = [1 − λ (1 + δ/β)] p_{t−1} + λ (α − γ)/β

which implies the following linear recurrence of order 1:

p_1 = (α − γ)/β − (δ/β) E_0(p_1)
p_t = [1 − λ (1 + δ/β)] p_{t−1} + λ (α − γ)/β    for t ≥ 2    (8.27)

with arbitrary initial expectation E_0(p_1). Remarkably, adaptive expectations also result in
a simple linear recurrence for equilibrium prices. If λ = 1, we get back to the "classic"
recurrence (8.22)-(8.23).
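Whatever the exact coefficients, (8.27) is a first-order linear recursion with a constant term, p_t = a p_{t−1} + b, and such recursions are easy to explore numerically. A minimal Python sketch (our illustration; a and b are arbitrary values with |a| < 1):

    # First-order linear recursion with constant term, the form taken by (8.27):
    # when |a| < 1 the price path converges to the fixed point b / (1 - a)
    a, b = 0.25, 2.25
    p = 4.5                  # arbitrary initial price p_1
    for _ in range(100):
        p = a * p + b
    print(p, b / (1 - a))    # both print 3.0

This anticipates the fixed-point perspective on recursions mentioned in footnote 14 below.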

8.5 Images and classes of sequences


In a sequence the same values can appear several times. For example, the two values −1 and
1 keep being repeated in the alternating sequence x_n = (−1)^n, i.e.,

{−1, 1, −1, 1, ...}    (8.28)

The constant sequence x_n = 2 is

{2, 2, 2, ...}    (8.29)

It is thus constituted only by the value 2 (so, the underlying f is the constant function
f(n) = 2 for every n ≥ 1).

In this respect, an important role is played by the image

Im f = {f(n) : n ≥ 1}

of the sequence, which consists exactly of the values that the sequence takes on, disregarding
repetitions. For example, the image of the alternating sequence (8.28) is {−1, 1}, while for
the constant sequence (8.29) it is the singleton {2}. The image thus gives an important
piece of information in that it indicates which values the sequence actually takes on, net of
repetitions: as we have seen, such values may be very few and just repeat themselves over and
over again along the sequence. On the other hand, the sequence of the odd numbers (8.6) does
not contain any repetition; its image consists of all its terms, that is, Im f = {2n − 1 : n ≥ 1}.

Through the image, in Section 6.4.3 we studied some notions of boundedness for functions.
In the special case of sequences – i.e., of functions f : N₊ → R – these notions take the
following form. A sequence {x_n} is:

(i) bounded (from) above if there exists k ∈ R such that x_n ≤ k for every n ≥ 1;
(ii) bounded (from) below if there exists k ∈ R such that x_n ≥ k for every n ≥ 1;
(iii) bounded if it is bounded both above and below, i.e., if there exists k > 0 such that
|x_n| ≤ k for every n ≥ 1.

For example, the alternating sequence x_n = (−1)^n is bounded, while that of the odd
numbers (8.6) is only bounded below. Note that, as usual, this classification is not exhaustive
because there exist sequences that are unbounded both above and below: for example, the
(strongly) alternating sequence x_n = (−1)^n n.¹¹ Such sequences are called unbounded.

¹¹ By "unbounded above (below)" we mean "not bounded from above (below)".

Monotone sequences are another important class of sequences. By applying to the un-
derlying function f : N₊ → R the notions of monotonicity introduced for functions (Section
6.4.4), we say that a sequence {x_n} is:

(i) increasing if
x_{n+1} ≥ x_n ∀n ≥ 1
and strictly increasing if
x_{n+1} > x_n ∀n ≥ 1

(ii) decreasing if
x_{n+1} ≤ x_n ∀n ≥ 1
and strictly decreasing if
x_{n+1} < x_n ∀n ≥ 1

(iii) constant if it is both increasing and decreasing, i.e., if there exists k ∈ R such that

x_n = k ∀n ≥ 1

A (strictly) increasing or decreasing sequence is called (strictly) monotone. For example,
the Fibonacci sequence is increasing (not strictly, though), the sequence (8.6) of the odd
numbers is strictly increasing, while the sequence (8.7) is strictly decreasing.

8.6 Eventually: a key adverb


A key feature of sequences is that properties often hold "eventually".

Definition 300 We say that a sequence satisfies a property P eventually if, starting from
a certain position n̄ = n̄_P, all the terms of the sequence satisfy P.

The position n̄ depends on the property P, as indicated by writing n̄ = n̄_P.

Example 301 (i) The sequence {2, 4, 6, 32, 57, 1, 3, 5, 7, 9, 11, ...} is eventually increas-
ing: indeed, starting from the 6th term, it is increasing.

(ii) The sequence {n} is eventually ≥ 1,000: indeed, all the terms of the sequence, starting
from the one in position 1,000, are ≥ 1,000.
(iii) The same sequence is also eventually ≥ 1,000,000,000, as well as ≥ 10^123.
(iv) The sequence {1/n} is eventually smaller than 1/1,000,000.
(v) The sequence
{27, 65, 13, 32, ..., 125, 32, 3, 3, 3, 3, 3, 3, 3, 3, ...}
is eventually constant. N

O.R. To satisfy a property eventually, a sequence in its "youth" can do whatever it wants;
what matters is that, once old enough (i.e., from a certain n̄ onward), it settles down. Youthful
blunders are forgiven as long as, sooner or later, all the terms of the sequence satisfy
the property. H

8.7 Limits: introductory examples


The purpose of the notion of limit is to formalize rigorously the idea of "how a sequence
behaves as n becomes larger and larger", that is, asymptotically. In other words, as with a
thriller, we ask ourselves "how will it end?". For sequences that represent the values that an
economic variable takes on at subsequent dates, economists talk of "long-run behavior".

We start with some examples to understand intuitively what we mean by the limit of a
sequence. Consider the sequence (8.7), i.e.,

1, 1/√2, 1/√4, 1/√8, ...

For larger and larger values of n, its terms x_n = 1/√(2^{n−1}) become closer and closer to –
they "tend to" – the value L = 0. In this case, we say that the sequence tends to 0 and write

lim_{n→∞} 1/√(2^{n−1}) = 0

For the sequence (8.6) of the odd numbers

{1, 3, 5, 7, ...}

the terms x_n = 2n − 1 become larger and larger as the values of n become larger and larger.
In this case, we say that the sequence diverges positively and write

lim_{n→∞} (2n − 1) = +∞

In a dual manner, the sequence of the negative odd numbers x_n = −2n + 1 diverges negatively,
written

lim_{n→∞} (−2n + 1) = −∞

Finally, the alternating sequence x_n = (−1)^n, i.e.,

{−1, 1, −1, 1, ...}

keeps oscillating, as n varies, between the values −1 and 1, never approaching (eventu-
ally) any particular value. In this case, the sequence is irregular (or oscillating): it does not
have any limit.

8.8 Limits and asymptotic behavior


In the introductory examples we identified three possible asymptotic behaviors of the terms
of a sequence:
(i) convergence to a value L ∈ R;
(ii) divergence to either +∞ or −∞;
(iii) oscillation.
In cases (i) and (ii) we say that the sequence is regular: it tends to (it approaches
asymptotically) a value, possibly infinite. In case (iii) we say that the sequence is irregular
(or oscillating). In the rest of the section we focus on regular sequences and formalize the
intuitive idea of "tending to a value".

8.8.1 Convergence
We start with convergence, that is, with case (i) above.

Definition 302 A sequence {x_n} converges to a point L ∈ R, in symbols x_n → L or
lim_{n→∞} x_n = L, if for every ε > 0 there exists n_ε ≥ 1 such that

n ≥ n_ε ⟹ |x_n − L| < ε    (8.30)

The number L is called the limit of the sequence.

The implication (8.30) can be rewritten as

n ≥ n_ε ⟹ d(x_n, L) < ε    (8.31)

Therefore, a sequence {x_n} converges to L when, for each quantity ε > 0, arbitrarily small
but positive, there exists a position n_ε – which depends on ε! – starting from which the
distance between the terms x_n of the sequence and the limit L is always smaller than ε.

Intuitively, the sequence's terms x_n eventually approximate the value L within any stan-
dard of approximation ε > 0 that one may posit, however small (so, however demanding)
this posited standard ε may be. A sequence {x_n} that converges to a point L ∈ R is called
convergent.

O.R. To converge to L, the sequence has to pass a highly demanding test: given any threshold
ε > 0 selected by a relentless examiner, there has to be a position n_ε far enough along that all
terms of the sequence coming after it are ε-close to L. A convergent sequence is able to pass
any such test, however tough the examiner may be – i.e., however small the posited ε > 0 is. H

We emphasized with an exclamation point that the position n_ε depends on ε, a key
feature of the previous definition. Moreover, such an n_ε is not unique: if there exists a position
n_ε such that |x_n − L| < ε for every n ≥ n_ε, the same is true of any subsequent position,
which then also qualifies as n_ε. The choice of which among these positions to call n_ε is
irrelevant for the definition, which only requires the existence of at least one of them.

That said, there is always a smallest n_ε, which is a genuine threshold. As such, its
dependence on ε takes a natural monotone form: such an n_ε becomes larger and larger as ε
becomes smaller and smaller. The smallest n_ε thus best captures, because of its threshold
nature, the spirit of the definition: for each arbitrarily small ε > 0, there exists a threshold
n_ε – the larger, the smaller (so, the more demanding) ε is – beyond which the terms x_n are
ε-close to the limit L. The two examples that we will present shortly should clarify this
discussion.

A neighborhood of a scalar L has the form

B_ε(L) = {x ∈ R : d(x, L) < ε} = (L − ε, L + ε)

So, in view of (8.31), we can rewrite the definition of convergence in the language of neigh-
borhoods. Conceptually, it is an important rewriting that deserves a separate mention.

Definition 303 A sequence {x_n} converges to a point L ∈ R if, for every neighborhood
B_ε(L) of L, there exists n_ε ≥ 1 such that

n ≥ n_ε ⟹ x_n ∈ B_ε(L)

In words, a sequence tends to a scalar L if, eventually, it belongs to each neighborhood
of L, however small that neighborhood might be (it is easy to belong to a large neighborhood,
but difficult to belong to a very small one). Although this last definition is a mere rewriting
of Definition 302, the use of neighborhoods should further clarify the nature of convergence.

Example 304 Consider the sequence x_n = 1/n. The natural candidate for its limit is 0.
Let us verify that this is the case. Let ε > 0. We have

|1/n − 0| < ε ⟺ 1/n < ε ⟺ n > 1/ε

Therefore, if we take as n_ε any integer greater than 1/ε, for example the smallest one,
n_ε = [1/ε] + 1,¹² we then have

n ≥ n_ε ⟹ 0 < 1/n < ε

Therefore, 0 is indeed the limit of the sequence. For example, if ε = 10^{−100}, we have
n_ε = 10^{100} + 1. Note that we could have chosen as n_ε any integer greater than 10^{100} + 1,
which is indeed the smallest n_ε. N
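In the spirit of "guess and verify", the threshold of Example 304 can also be checked numerically. The following minimal Python sketch (an illustration we add here, not part of the text) computes n_ε = [1/ε] + 1 and verifies that it is tight; the dyadic values of ε are chosen to avoid floating-point boundary issues.

    import math

    def n_eps(eps):
        # Smallest position from which |1/n - 0| < eps, as computed in Example 304
        return math.floor(1 / eps) + 1   # n_eps = [1/eps] + 1

    for eps in (0.5, 0.25, 2**-20):
        n = n_eps(eps)
        assert 1 / n < eps               # the property holds from n_eps on...
        assert 1 / (n - 1) >= eps        # ...and at no earlier position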
Example 305 Consider the sequence (8.7), that is, x_n = 1/√(2^{n−1}). Here too the natural
candidate for the limit is 0. Let us verify this. Let ε > 0. We have

|1/√(2^{n−1}) − 0| < ε ⟺ 2^{(n−1)/2} > 1/ε ⟺ n > 1 + 2 log_2 (1/ε)

Therefore, by taking as n_ε any integer greater than 1 + 2 log_2 ε^{−1}, we have

n ≥ n_ε ⟹ 0 < 1/√(2^{n−1}) < ε

Therefore, 0 is the limit of the sequence. When ε < 1 the smallest n_ε is 1 + [1 + 2 log_2 ε^{−1}];
for example, when ε = 10^{−100} it is 1 + [1 + 2 log_2 10^{100}] = 1 + [1 + 200 log_2 10]. N

We have seen two examples of sequences that converge to 0. Such sequences are called
infinitesimal (or null). Because of the next result, they play a particularly important role in
the computation of limits.

Proposition 306 A sequence {x_n} converges to a point L ∈ R if and only if d(x_n, L) → 0.

Proof "If". Suppose that lim_{n→∞} d(x_n, L) = 0. Let ε > 0. There exists n_ε ≥ 1 such that
d(x_n, L) < ε for every n ≥ n_ε. Therefore, x_n ∈ B_ε(L) for every n ≥ n_ε, as desired.

"Only if". Let lim_{n→∞} x_n = L. Consider the sequence of distances, whose term is
y_n = d(x_n, L). We have to prove that lim_{n→∞} y_n = 0, i.e., that for every ε > 0 there exists
n_ε ≥ 1 such that n ≥ n_ε implies |y_n| < ε. Since y_n ≥ 0, this is actually equivalent to showing
that

n ≥ n_ε ⟹ y_n < ε    (8.32)

Since x_n → L, given ε > 0 there exists n_ε ≥ 1 such that y_n = d(x_n, L) < ε for every n ≥ n_ε.
Therefore, (8.32) holds.

We can thus reduce the study of the convergence of any sequence to the convergence to
0 of the sequence of distances {d(x_n, L)}. In other words, to check whether x_n → L, it is sufficient
to check whether d(x_n, L) → 0, that is, whether the sequence of distances is infinitesimal.

¹² Recall that [·] denotes the integer part (Section 1.4.3).

Example 307 The sequence

x_n = 1 + (−1)^n (1/n)

converges to L = 1. Indeed,

d(x_n, 1) = |1 + (−1)^n/n − 1| = |(−1)^n/n| = 1/n → 0

and so, by Proposition 306, x_n → 1. N

Since d(x_n, 0) = |x_n|, a simple yet noteworthy consequence of the last proposition is that

x_n → 0 ⟺ |x_n| → 0    (8.33)

A sequence is thus infinitesimal if and only if it is "absolutely" infinitesimal, in that the
distances of its terms from the origin become smaller and smaller.

We close with an important observation: in applying Definition 302 of convergence,
we always have to posit a candidate limit L ∈ R, and then verify whether it
satisfies the definition. It is a "guess and verify" procedure.¹³ For some sequences, however,
guessing a candidate limit L might not be obvious, which makes the application
of the definition problematic. We will return to this important issue when discussing Cauchy
sequences (Section 8.12).¹⁴

8.8.2 Limits from above and from below


It may happen that x_n → L ∈ R and that, eventually, we also have L ≤ x_n. In other words,
{x_n} approaches L by remaining to its right. In such a case we say that {x_n} tends to L
from above, and we write lim_{n→∞} x_n = L⁺ or x_n → L⁺. In particular, if {x_n} is decreasing, we
write x_n ↓ L.

The notations x_n → L⁺ and x_n ↓ L are more informative than x_n → L: besides saying
that {x_n} converges to L, they also convey the information that this happens from above
(monotonically if x_n ↓ L).

Similarly, if x_n → L ∈ R and eventually x_n ≤ L, we say that {x_n} tends to L from below
and write lim_{n→∞} x_n = L⁻ or x_n → L⁻. In particular, if {x_n} is increasing we write x_n ↑ L.

¹³ The "guess" part, i.e., how to posit a candidate limit, relies on experience (so we have an "educated guess"), inspiration, revelation, or just a little bird's suggestion.
¹⁴ Section 14.2 will show that, for sequences defined by recurrences, there is an elegant way, via fixed points, to supply candidate limit points.
Example 308 (i) We have 1/n ↓ 0 and 1/√(2^{n−1}) ↓ 0, as well as {1 − 1/n} ↑ 1. (ii) We
have 1 + (−1)^n n^{−1} → 1, but neither to 1⁺ nor to 1⁻. N

Example 309 Consider the sequence x_n = n^{−1} + (−1)^n n^{−1}, i.e.,

x_n = 0 if n is odd;  x_n = 2/n if n is even

So, x_n → 0⁺ but not x_n ↓ 0, because this sequence is not monotone. N

The notions of limits from above and from below can also be stated in terms of right and
left neighborhoods of L, as readers can check.

8.8.3 Divergence
We now consider divergence, beginning with positive divergence. The spirit of the definition
is similar, mutatis mutandis, to that of convergence (as will soon be clear).

Definition 310 A sequence {x_n} diverges positively, written x_n → +∞ or lim_{n→∞} x_n =
+∞, if for every K ∈ R there exists n_K ≥ 1 such that

n ≥ n_K ⟹ x_n > K

In other words, a sequence diverges positively when it eventually becomes greater than
every scalar K. Since the constant K can be taken arbitrarily large, this can happen only
if the sequence is not bounded above (it is easy to be > K when K is small, increasingly
difficult the larger K is).

Example 311 The sequence of the even numbers x_n = 2n diverges positively. Indeed, let
K ∈ R. We have:

2n > K ⟺ n > K/2

and so we can choose as n_K any integer greater than K/2. For example, if K = 10^{100}, we
can put n_K = 10^{100}/2 + 1. Therefore, x_n = 2n diverges positively. N

O.R. For divergence there is a demanding "above the bar" test to pass: a relentless examiner
now sets an arbitrary bar K; for a sequence to diverge, there has to be a position n_K far enough
along that all terms of the sequence coming after it lie above the arbitrarily
posited bar. A divergent sequence is able to pass any such test, however tough the examiner
may be – i.e., however high K is. H

The definition of negative divergence is dual.

Definition 312 A sequence {x_n} diverges negatively, written x_n → −∞ or lim_{n→∞} x_n =
−∞, if for every K ∈ R there exists n_K ≥ 1 such that

n ≥ n_K ⟹ x_n < K

In such a case, the terms of the sequence are eventually smaller than every scalar K:
however large in absolute value a negative constant K we posit, there exists a position
beyond which all the terms of the sequence are smaller than it. This characterizes
divergence to −∞.

Intuitively, divergence is a form of "convergence to infinity". The next simple, but
important, result highlights the strong connection between convergence and divergence.

Proposition 313 A sequence {x_n}, with eventually x_n > 0, diverges positively if and only
if the sequence {1/x_n} converges to zero.

A dual result holds for negative divergence.¹⁵

Proof "If". Let 1/x_n → 0. Let K > 0. Setting ε = 1/K > 0, by Definition 302 there exists
n_{1/K} ≥ 1 such that 0 < 1/x_n < 1/K for every n ≥ n_{1/K}. Therefore, x_n > K for every
n ≥ n_{1/K}, and by Definition 310 we have x_n → +∞.

"Only if". Let x_n → +∞ and let ε > 0. Setting K = 1/ε > 0, by Definition 310 there
exists n_{1/ε} such that x_n > 1/ε for every n ≥ n_{1/ε}. Therefore, 0 < 1/x_n < ε for every n ≥ n_{1/ε},
and so 1/x_n → 0.

Adding, subtracting, or changing in any other way a finite number of terms of a sequence
does not alter its asymptotic behavior: if it is regular, i.e., convergent or (properly) divergent,
it remains so, and with the same limit; if it is irregular (oscillating), it remains so. Clearly,
this depends on the fact that the notion of limit requires that a property – either "hitting"
an arbitrarily small neighborhood, in the case of convergence, or being greater (smaller) than
an arbitrarily large (small) number, in the case of positive (negative) divergence – hold only
eventually.

8.8.4 Topology of R̄ and a general definition of limit

The topology of the real line can be extended in a natural way to the extended real line R̄
by defining the neighborhoods of the points at infinity, +∞ and −∞, in the following way.

Definition 314 A neighborhood of +∞ is a half-line (K, +∞], with K ∈ R. A neighborhood
of −∞ is a half-line [−∞, K), with K ∈ R.

Therefore, a neighborhood of +∞ is formed by all scalars greater than a scalar K, while
a neighborhood of −∞ is formed by all scalars smaller than K.

O.R. The smaller ε > 0 is, the smaller the neighborhood B_ε(x) of a point x. In contrast, the
greater K > 0 is, the smaller the neighborhood (K, +∞] of +∞. For this reason, for a
neighborhood of +∞ the value of K becomes significant when positive and arbitrarily large
(while for a neighborhood of −∞ the value of K becomes significant when negative and
arbitrarily large in absolute value). H
¹⁵ The hypothesis "eventually x_n > 0" is redundant in the "only if" part, since a sequence that diverges positively always satisfies this condition.

The neighborhoods (K, +∞] and [−∞, K) are open intervals in R̄ for every K ∈ R.¹⁶
With this, we can state a lemma that will be useful in defining limits of sequences.

Lemma 315 Let A be a set in R. Then,

(i) +∞ is a limit point of A if and only if A is unbounded above;

(ii) −∞ is a limit point of A if and only if A is unbounded below.

Proof We only prove (i), since the proof of (ii) is similar. "If". Let A be unbounded above,
i.e., A has no upper bounds. Let (K, +∞] be a neighborhood of +∞. Since A has no upper
bounds, K is not an upper bound of A. Therefore, there exists x ∈ A such that x > K,
i.e., x ∈ (K, +∞] ∩ A and x ≠ +∞. It follows that +∞ is a limit point of A: indeed, each
neighborhood of +∞ contains points of A different from +∞.

"Only if". Let +∞ be a limit point of A. We show that A does not have any upper
bound. Suppose, by contradiction, that K ∈ R is an upper bound of A. Since +∞ is a limit
point of A, the neighborhood (K, +∞] of +∞ contains a point x ∈ A such that x ≠ +∞.
Therefore K < x, contradicting the fact that K is an upper bound of A.

Example 316 The sets A such that (a, +∞) ⊆ A for some a ∈ R are an important class of
sets unbounded above. By Lemma 315, +∞ is a limit point of each such set A. Similarly, −∞
is a limit point of every set A such that (−∞, a) ⊆ A for some a ∈ R. N

Using the topology of R̄ we can give a general definition of convergence that generalizes
Definition 303 so as to include Definitions 310 and 312 of divergence as special
cases. In the next definition, which unifies all the previous definitions of limit of a sequence, we
set:

U(L) = B_ε(L)     if L ∈ R
U(L) = (K, +∞]    if L = +∞
U(L) = [−∞, K)    if L = −∞

Definition 317 A sequence {x_n} in R converges to a point L ∈ R̄ if, for every neighborhood
U(L) of L, there exists n_U ≥ 1 such that

n ≥ n_U ⟹ x_n ∈ U(L)

If L ∈ R, we get back to Definition 303. If L = ±∞, thanks to Definition 314 of neigh-
borhood, Definition 317 becomes a reformulation in terms of neighborhoods of Definitions
310 and 312.

This general definition of convergence shows the unity of the notions of convergence and
divergence studied so far, thus confirming the strong connection between convergence and
divergence that already emerged in Proposition 313.

O.R. If L ∈ R, the position n_U depends on an arbitrary radius ε > 0 (in particular, as small
as we want), so we can write n_U = n_ε. If, instead, L = +∞, then n_U depends on an arbitrary
scalar K (in particular, positive and arbitrarily large), so we can write n_U = n_K. Finally,
if L = −∞, then n_U depends on an arbitrary negative real number K (in particular, negative and
arbitrarily large in absolute value) and, without losing generality, we can set n_U = n_K.
Thus, when L is finite it is crucial that the property hold also for arbitrarily small values
of ε. When L = ±∞, it is instead key that the property hold also for K arbitrarily large
in absolute value. H

¹⁶ Each point x ∈ (K, +∞] is interior because, by taking K′ with K < K′ < x, we have x ∈ (K′, +∞] ⊆ (K, +∞]. A similar argument shows that each point x ∈ [−∞, K) is interior.

8.9 Properties of limits


In this section we study some properties of limits. The first result shows that the limit of a
sequence, if it exists, is unique.

Theorem 318 (Uniqueness of the limit) A sequence {x_n} converges to at most one limit
L ∈ R̄.

Proof Suppose, by contradiction, that there exist two distinct limits L′ and L″ in R̄.
Without loss of generality, we assume that L″ > L′. We consider the different
cases and show that in each of them we reach a contradiction; so L′ = L″, and we conclude
that the limit is unique.

We begin with the case in which both L′ and L″ are finite, i.e., L′, L″ ∈ R. Take ε > 0 so
that

ε < (L″ − L′)/2

Then

B_ε(L′) ∩ B_ε(L″) = ∅

as the reader can verify and as the figure illustrates:

[Figure: the disjoint neighborhoods (L′ − ε, L′ + ε) and (L″ − ε, L″ + ε) on the vertical axis]

By Definition 303, there exists n′_ε ≥ 1 such that x_n ∈ B_ε(L′) for every n ≥ n′_ε, and there
exists n″_ε ≥ 1 such that x_n ∈ B_ε(L″) for every n ≥ n″_ε. Setting n̄_ε = max{n′_ε, n″_ε}, we have
therefore both x_n ∈ B_ε(L′) and x_n ∈ B_ε(L″) for every n ≥ n̄_ε, i.e., x_n ∈ B_ε(L′) ∩ B_ε(L″)
for every n ≥ n̄_ε. But this contradicts B_ε(L′) ∩ B_ε(L″) = ∅. We conclude that L′ = L″, so
the limit is unique.

Turn now to the case in which L′ is finite and L″ = +∞. For every ε > 0 and every
K > 0, there exist n_ε and n_K such that

L′ − ε < x_n < L′ + ε ∀n ≥ n_ε  and  x_n > K ∀n ≥ n_K

For n ≥ max{n_ε, n_K}, we therefore have simultaneously

L′ − ε < x_n < L′ + ε  and  x_n > K

It is now sufficient to take K = L′ + ε to see that, for n ≥ max{n_ε, n_K}, the two
inequalities cannot coexist. So in this case, too, we reach a contradiction.

The remaining cases can be treated in a similar way and are thus left to the reader.

The next classic result shows that the terms of a convergent sequence eventually have the
same sign as the limit. In other words, the sign of the limit eventually determines
the sign of the terms of the sequence.

Theorem 319 (Permanence of sign) Let {x_n} be a sequence that converges to a limit
L ≠ 0. Then, eventually x_n has the same sign as L, that is, eventually x_n L > 0.

Analogously, it is easy to see that if x_n → +∞ (resp., −∞), then eventually x_n ≥ K
(resp., x_n ≤ K) for every K > 0 (resp., K < 0).

Proof Suppose L > 0 (a similar argument holds if L < 0). Let ε ∈ (0, L). By Definition
302, there exists n̄ ≥ 1 such that |x_n − L| < ε, i.e., L − ε < x_n < L + ε, for every n ≥ n̄.
Since ε ∈ (0, L), we have L − ε > 0. Therefore,

0 < L − ε < x_n ∀n ≥ n̄

We conclude that x_n > 0 for every n ≥ n̄, as desired.

This last theorem established a property of limits with respect to the order
structure of the real line. Next we give another simple result of the same kind, leaving the
proof to the reader. A piece of notation: x_n → L ∈ R̄ indicates that the sequence {x_n}
either converges to L ∈ R or diverges (positively or negatively).

Proposition 320 Let {x_n} and {y_n} be two sequences such that x_n → L ∈ R̄ and y_n →
H ∈ R̄. If eventually x_n ≤ y_n, then L ≤ H.

The scope of this proposition is noteworthy. It allows one, for example, to check the positive
or negative divergence of a sequence through a simple comparison with other divergent
sequences. Indeed, if x_n ≥ y_n and x_n diverges negatively, so does y_n; if x_n ≥ y_n and y_n
diverges positively, so does x_n.

The converse of the proposition does not hold: for example, let L = H = 0, {x_n} =
{−1/n} and {y_n} = {1/n}. We have L ≥ H, but x_n < y_n for every n. However, if we
assume L > H, then the converse holds "strictly":

Proposition 321 Let {x_n} and {y_n} be two sequences such that x_n → L ∈ R̄ and y_n →
H ∈ R̄. If L > H, then eventually x_n > y_n.

Proof We prove the statement for L, H ∈ R, leaving the other cases to the reader. Let
0 < ε < (L − H)/2. Since H + ε < L − ε, we have (H − ε, H + ε) ∩ (L − ε, L + ε) = ∅. Moreover,
there exist n′_ε, n″_ε ≥ 1 such that y_n ∈ (H − ε, H + ε) for every n ≥ n′_ε and x_n ∈ (L − ε, L + ε)
for every n ≥ n″_ε. For every n ≥ max{n′_ε, n″_ε}, we then have y_n ∈ (H − ε, H + ε) and
x_n ∈ (L − ε, L + ε), so x_n > L − ε > H + ε > y_n. We conclude that eventually x_n > y_n.

8.9.1 Monotonicity and convergence


The next result gives a simple necessary condition for convergence.

Proposition 322 Each convergent sequence is bounded.

Proof Suppose x_n → L. Setting ε = 1, there exists n_1 ≥ 1 such that x_n ∈ B_1(L) for every
n ≥ n_1. Let M > 0 be a constant such that

M > max{1, d(x_1, L), ..., d(x_{n_1−1}, L)}

We have d(x_n, L) < M for every n ≥ 1, i.e., |x_n − L| < M for every n ≥ 1. This implies
that, for all n ≥ 1,

L − M < x_n < L + M

Therefore, the sequence is bounded.

Thanks to this proposition, the convergent sequences form a subset of the bounded ones.
Therefore, if a sequence is unbounded, it cannot be convergent.

In general, the converse of Proposition 322 is false: for example, the alternating sequence
x_n = (−1)^n is bounded but does not converge. A partial converse will soon be established
by the Bolzano-Weierstrass Theorem. A full-fledged converse, however, holds for the im-
portant class of monotone sequences: for such sequences, boundedness is both a necessary
and a sufficient condition for convergence. This result is actually a corollary of the following
general theorem on the asymptotic behavior of monotone sequences.

Theorem 323 Each monotone sequence is regular. In particular,

(i) it converges if it is bounded;

(ii) it diverges positively if it is increasing and unbounded;

(iii) it diverges negatively if it is decreasing and unbounded.

Proof Let {x_n} be an increasing sequence (the proof for decreasing sequences is similar). It
can be either bounded or unbounded above (for sure, it is bounded below, because x_1 ≤ x_n
for every n ≥ 1). Suppose that {x_n} is bounded. We want to prove that it is convergent. Let
E be the image of the sequence. By hypothesis, it is a bounded subset of R. By the Least
Upper Bound Principle, sup E exists. Set L = sup E. Let us prove that x_n → L. Let ε > 0.
Since L is the supremum of E, by Proposition 127 we have: (i) L ≥ x_n for every n ≥ 1; (ii)
there exists an element x_{n_ε} of E such that x_{n_ε} > L − ε. Since {x_n} is an increasing sequence,
it then follows that

L ≥ x_n ≥ x_{n_ε} > L − ε ∀n ≥ n_ε

Hence, x_n ∈ B_ε(L) for every n ≥ n_ε, as desired.

Suppose now that {x_n} is unbounded above. Then, for every K > 0 there exists an element
x_{n_K} such that x_{n_K} > K. Since {x_n} is increasing, we then have x_n ≥ x_{n_K} > K for every
n ≥ n_K, so the sequence diverges to +∞.

Thus, monotone sequences cannot be irregular. We are now able to state and prove the
result anticipated above on the equivalence of boundedness and convergence for monotone
sequences.

Corollary 324 A monotone sequence is convergent if and only if it is bounded.

Proof Consider an increasing sequence. If it is convergent, then by Proposition 322 it is
bounded. If it is bounded, then by Theorem 323 it is convergent.

Needless to say, the results just discussed hold, more generally, for sequences that are
eventually monotone.

8.9.2 Bolzano-Weierstrass' Theorem


The famous Bolzano-Weierstrass' Theorem is a partial converse of Proposition 322. It is
the deepest result of this chapter, with far-reaching consequences. To state it, we must first
introduce subsequences. Consider a sequence {x_n}. Given a strictly increasing sequence
{n_k}_{k=1}^∞ that takes on only strictly positive integer values, i.e.,

n_1 < n_2 < n_3 < ··· < n_k < ···

the sequence

{x_{n_k}}_{k=1}^∞ = {x_{n_1}, x_{n_2}, x_{n_3}, ..., x_{n_k}, ...}

is called a subsequence of {x_n}. In words, the subsequence {x_{n_k}} is a new sequence con-
structed from the original sequence {x_n} by taking only the terms in positions n_k. A few
examples should clarify.

Example 325 Consider the sequence

1, 1/2, 1/3, 1/4, ..., 1/n, ...    (8.34)

with term x_n = 1/n. A subsequence is given by

1, 1/3, 1/5, 1/7, ..., 1/(2k + 1), ...

where {n_k}_{k≥1} is the sequence of the odd numbers {1, 3, 5, ...}. Thus, this subsequence has
been constructed by selecting the elements in odd positions in the original sequence. Another
subsequence of (8.34) is given by

1/2, 1/4, 1/8, 1/16, ..., 1/2^k, ...

where now {n_k}_{k≥1} is formed by the powers of 2, that is, 2, 2², 2³, .... This subsequence
is constructed by selecting the elements of the original sequence whose position is a power
of 2. N

Example 326 Consider the alternating sequence x_n = (−1)^n. A simple subsequence is
given by

{1, 1, 1, ..., 1, ...}    (8.35)

where {n_k}_{k≥1} is the sequence of the even numbers. This subsequence has thus been con-
structed by selecting the elements in even positions in the original sequence. If we select those
in odd positions, we construct the subsequence

{−1, −1, −1, ..., −1, ...}    (8.36)

By taking {n_k}_{k≥1} = {1000k}, i.e., by selecting only the elements in positions 1,000, 2,000,
3,000, ..., we still get the subsequence (8.35). On the other hand, (8.35) is not a subsequence
of (8.34) because the term 1 appears only in the initial position of (8.34). N

A subsequence is obtained by discarding some terms (possibly infinitely many) of the
original sequence, while still keeping an infinite number of them. So, if a sequence is regular, all
its subsequences are regular and with the same limit: ubi maior, minor cessat. More is true:

Proposition 327 A sequence is regular, with limit L ∈ R̄, if and only if all its subsequences
are regular and with the same limit L.

Proof We prove the result for L ∈ R, leaving the case L = ±∞ to the reader. "Only if".
Suppose that {x_n} converges to L. Let ε > 0. There exists n_ε ≥ 1 such that |x_n − L| < ε
for every n ≥ n_ε. Let {x_{n_k}}_{k=1}^∞ be a subsequence of {x_n}. Since n_k ≥ k for every k ≥ 1, a
fortiori we have |x_{n_k} − L| < ε for every k ≥ n_ε, so that {x_{n_k}} converges to L.

"If". Suppose that each subsequence of {x_n} converges to L. Suppose, by contradiction,
that {x_n} does not converge to L. Then, there exists ε_0 > 0 such that, for every integer
n ≥ 1, there exists a position m_n ≥ n for which x_{m_n} ∉ B_{ε_0}(L), i.e., |x_{m_n} − L| ≥ ε_0. This
implies that the set M = {m ∈ N : |x_m − L| ≥ ε_0} contains infinitely many elements. Let
n_1 = min{m ∈ M : m ≥ 1} and, for each k ≥ 1, define recursively

n_{k+1} = min{m ∈ M : m > n_k}

where the set on the right-hand side is nonempty because M is infinite. By construction,
n_{k+1} > n_k and |x_{n_k} − L| ≥ ε_0, yielding that {x_{n_k}} is a subsequence of {x_n} that does not
converge to L. This contradiction allows us to conclude that {x_n} converges to L.

In the last example we extracted, from an oscillating sequence, a constant subsequence
by selecting only the elements in even positions (or, only those in odd positions). So, it might
well happen that, by suitably selecting the elements, we can extract a convergent "trend"
out of an irregular sequence. There might be order even in chaos (and method in madness).
Bolzano-Weierstrass' Theorem shows that this is always possible, as long as the sequence is
bounded.

Theorem 328 (Bolzano-Weierstrass) Each bounded sequence has a convergent subse-
quence.

In other words, from any bounded sequence {x_n}, even a highly irregular one, it is always
possible to extract a convergent subsequence {x_{n_k}}, i.e., one for which there exists L ∈ R such
that lim_{k→∞} x_{n_k} = L. So, we can always extract convergent behavior from any bounded
sequence – a truly remarkable property.

Example 329 The alternating sequence x_n = (−1)^n is bounded because its image is the
bounded set {−1, 1}. By Bolzano-Weierstrass' Theorem, it has at least one convergent
subsequence. Indeed, such are the constant subsequences (8.35) and (8.36). N

The proof of Bolzano-Weierstrass' Theorem is based on the next lemma.

Lemma 330 Each sequence has a monotone subsequence.

Proof Let {x_n} be a sequence. Define the set M = {n ∈ N : x_n ≤ x_m for all m ≥ n}.¹⁷ We
consider two cases.

Case 1: M infinite. We can then define {n_k} recursively as follows: n_1 = min M and

n_{k+1} = min{n ∈ M : n > n_k}

for all k ≥ 1, where the set on the right-hand side is well defined because M is infinite.
By construction, n_{k+1} > n_k and {n_k} ⊆ M. By the definition of M, this implies that
x_{n_{k+1}} ≥ x_{n_k}, proving that {x_{n_k}} is increasing.

Case 2: M finite. Let n_1 = max M + 1 if M is nonempty, and n_1 = 1 otherwise. Clearly,
n_1 ∉ M; that is, there exists m > n_1 such that x_{n_1} > x_m. It follows that the set M_1 =
{m ∈ N : x_{n_1} > x_m and m > n_1} is nonempty. Set n_2 = min M_1. Define recursively n_{k+1} =
min M_k, where M_k = {m ∈ N : x_{n_k} > x_m and m > n_k}. Here M_k is nonempty because n_k ≥
n_1, so n_k ∉ M. By construction, n_{k+1} > n_k and x_{n_{k+1}} < x_{n_k}, proving that {x_{n_k}} is decreasing.

Proof of Bolzano-Weierstrass' Theorem Let {x_n} be a bounded sequence. By Lemma
330, there exists a monotone subsequence {x_{n_k}}. Since this subsequence is bounded (being
a subsequence of a bounded sequence), Theorem 323 shows that it is convergent, as desired.
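The construction in the proof of Lemma 330 is algorithmic in spirit, and the following Python sketch (our illustration, not the book's) mimics it over a finite horizon. Since "for all m ≥ n" can only be checked up to the end of a finite list, this is a heuristic illustration of the two cases, not a proof.

    def monotone_subsequence_indices(x):
        # M: positions whose term is <= every later term (finite-horizon version
        # of the set M in the proof of Lemma 330)
        n = len(x)
        M = [i for i in range(n) if all(x[i] <= x[m] for m in range(i + 1, n))]
        if len(M) > 1:                      # Case 1: M "large" -> increasing subsequence
            return M
        start = (M[-1] + 1) if M else 0     # Case 2: M exhausted -> build a decreasing one
        idx = [start]
        for m in range(start + 1, n):
            if x[m] < x[idx[-1]]:           # pick the next strictly smaller term
                idx.append(m)
        return idx

    xs = [(-1)**k * (1 + 1/k) for k in range(1, 30)]   # bounded and oscillating
    sub = monotone_subsequence_indices(xs)
    print([round(xs[i], 3) for i in sub])              # a monotone path through xs

On this bounded oscillating sequence the sketch returns the increasing subsequence −2, −4/3, −6/5, ..., in line with Bolzano-Weierstrass' Theorem.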

For unbounded sequences, it is possible to establish a quite similar property.


¹⁷ For example, if x_n = (−1)^n for all n ≥ 1, then M consists of all the odd numbers. Indeed, if n is odd we have x_n = −1 ≤ (−1)^m = x_m for all m ≥ n. Conversely, an even n does not belong to M because x_n = 1 > −1 = x_{n+1}.

Proposition 331 Each unbounded sequence has a divergent subsequence (to +∞ if un-
bounded above, to −∞ if unbounded below).¹⁸

Proof Suppose that the sequence is unbounded above (the other case is similar). Then,
for every K > 0 there exists at least one element of the sequence greater than K. We
denote by x_{n_K} the term of smallest position in {x_n} that turns out to be > K. By taking
K = 1, 2, ..., the resulting sequence {x_{n_K}} is clearly a subsequence of {x_n} (indeed, all its
terms have been taken among those of {x_n}) that diverges to +∞.

Summing up:

Proposition 332 Each sequence has a regular subsequence.

Remarkably, from any sequence, however wild it may be, we can always extract a regular
asymptotic behavior.

O.R. Bolzano-Weierstrass' Theorem says that it is not possible to take infinitely many
scalars (the elements of the sequence) in a bounded interval in such a way that they (or a
part of them) stay "well separated" from one another: necessarily, they crowd in the proximity of
(at least) one point. More generally, the last proposition says that there is no way of taking
infinitely many scalars without at least a part of them crowding somewhere (in proximity of
either a finite number or of +∞ or of −∞, i.e., of some point of R̄). H

8.10 Algebra of limits


8.10.1 The (many) certainties

In computing limits it is important to know how they behave with respect to the basic
operations on sequences of Section 8.2. Besides its theoretical interest, this is operationally
important because, through the basic operations, the computation of limits often reduces
to the computation of simpler limits, or of some common limits (that we will introduce soon),
or of both.

The next result, based on the properties of the extended real line, shows that limits
interchange nicely with the basic operations (so, the "limit of a sum" is the "sum of the limits", and
so on) – except in the forms of indetermination, i.e., except with respect to the operations
that are indeterminate in the extended real line (Section 1.7).

Proposition 333 Let x_n → L ∈ R̄ and y_n → H ∈ R̄. Then:¹⁹

(i) x_n + y_n → L + H, provided that L + H is not an indeterminate form (1.24), of the type

+∞ − ∞ or −∞ + ∞

(ii) x_n y_n → LH, provided that LH is not an indeterminate form (1.25), of the type

±∞ · 0 or 0 · (±∞)

(iii) x_n/y_n → L/H, provided that eventually y_n ≠ 0 and that L/H is not an indeterminate
form (1.26), of the type²⁰

±∞/±∞ or a/0

¹⁸ If a sequence is both unbounded above and below, it has both a subsequence diverging to +∞ and a subsequence diverging to −∞.
¹⁹ Recall that x_n → L ∈ R̄ indicates that the sequence {x_n} either converges to L ∈ R or diverges positively or negatively.
²⁰ Note that the case a/0 corresponds to H = 0.
Proof (i) Let x_n → L and y_n → H, with L, H ∈ R. This means that, for every ε > 0, there
exist n_1 and n_2 such that

L − ε < x_n < L + ε ∀n ≥ n_1  and  H − ε < y_n < H + ε ∀n ≥ n_2

By adding the inequalities member by member, for every n ≥ n_3 = max{n_1, n_2} we have

L + H − 2ε < x_n + y_n < L + H + 2ε

Since 2ε is arbitrary, it follows that x_n + y_n → L + H.

Now let x_n → L ∈ R and y_n → +∞. This means that, for every ε > 0 and for every
K > 0, there exist n_1 and n_2 such that

L − ε < x_n < L + ε ∀n ≥ n_1  and  y_n > K ∀n ≥ n_2

By adding, we have, for every n ≥ n_3 = max{n_1, n_2},

x_n + y_n > K + L − ε

Since K + L − ε is arbitrarily large, it follows that x_n + y_n → +∞. The other cases with infinite
limits are treated similarly.

(ii) Let x_n → L and y_n → H, with L, H ∈ R. This means that, for every ε > 0, there
exist n_1 and n_2 such that

L − ε < x_n < L + ε ∀n ≥ n_1  and  H − ε < y_n < H + ε ∀n ≥ n_2

Moreover, being convergent, {y_n} is bounded (recall Proposition 322): there exists b > 0
such that |y_n| ≤ b for every n. Now, for every n ≥ n_3 = max{n_1, n_2},

|x_n y_n − LH| = |y_n (x_n − L) + L (y_n − H)| ≤ |y_n| |x_n − L| + |L| |y_n − H| < ε (b + |L|)

By the arbitrariness of ε (b + |L|), we conclude that x_n y_n → LH.

If L > 0 and H = +∞, then in addition to having, for every ε ∈ (0, L),

L − ε < x_n < L + ε ∀n ≥ n_1

we also have, for every K > 0, y_n > K for every n ≥ n_2. It follows that, for every
n ≥ n_3 = max{n_1, n_2},

x_n y_n > (L − ε) K

By the arbitrariness of (L − ε) K > 0, we conclude that x_n y_n → +∞. If L < 0 and H = +∞,
we have x_n y_n < (L + ε) K, with ε ∈ (0, −L). By the arbitrariness of (L + ε) K < 0, we
conclude that x_n y_n → −∞. The other cases of infinite limits are treated in an analogous
way.

Finally, we leave point (iii) to the reader.

Example 334 (i) Let x_n = n/(n + 1) and y_n = 1 + (−1)^n/n. Since x_n → 1 and y_n → 1,
we have x_n + y_n → 1 + 1 = 2 and x_n y_n → 1.
(ii) Let x_n = 2n and y_n = 1 + (−1)^n/n. Since x_n → +∞ and y_n → 1, we have
x_n + y_n → +∞ and x_n y_n → +∞. N

The following result shows that the case a/0 of point (iii) with a ≠ 0 is actually not
indeterminate for the algebra of limits, although it is so for the extended real line (as seen
in Section 1.7).

Proposition 335 Let x_n → L ∈ R̄, with L ≠ 0, and y_n → 0. The limit of the sequence
x_n/y_n exists if and only if the sequence {y_n} eventually has constant sign.²¹ In such a case:

(i) if either L > 0 and y_n → 0⁺, or L < 0 and y_n → 0⁻, then

x_n/y_n → +∞

(ii) if either L > 0 and y_n → 0⁻, or L < 0 and y_n → 0⁺, then

x_n/y_n → −∞

This proposition does not, unfortunately, say anything about the case a = 0, that is, about the
indeterminate form 0/0.

Proof Let us prove the "only if" part (we leave the rest of the proof to the reader). Let
L > 0 (the case L < 0 is similar). Suppose that the sequence {y_n} does not eventually have
constant sign. Hence, there exist two subsequences {y_{n_k}} and {y_{n′_k}} such that y_{n_k} → 0⁺
and y_{n′_k} → 0⁻. Therefore, x_{n_k}/y_{n_k} → +∞ while x_{n′_k}/y_{n′_k} → −∞. Since two subsequences of
x_n/y_n have distinct limits, Proposition 327 shows that the sequence x_n/y_n has no limit.

Example 336 (i) Take x_n = 1/n − 2 and y_n = 1/n. We have x_n → −2 and y_n → 0.
Since {y_n} has always (and therefore also eventually) positive sign, the proposition yields
x_n/y_n → −∞.
(ii) Take x_n = 1/n + 3 and y_n = (−1)^n/n. In this case x_n → 3, but y_n → 0 with
alternating signs; that is, y_n does not eventually have constant sign. Thanks to the proposition,
the sequence {x_n/y_n} has no limit. N

²¹ That is, its terms are eventually either all positive or all negative.

Summing up, in view of the last two propositions we have the following indeterminate
forms for limits:

+∞ − ∞ or −∞ + ∞    (8.37)

which is often denoted by just writing ∞ − ∞;

±∞ · 0 or 0 · (±∞)    (8.38)

which is often denoted by just writing 0 · ∞; and

±∞/±∞ or 0/0    (8.39)

which are often denoted by just writing ∞/∞ and 0/0. Section 8.10.3 will be devoted to
these indeterminate forms.

Besides the basic operations, the next result shows that limits also interchange nicely
with the power (and the root, which is a special case), the exponential, and the logarithm.
Indeed, (13.10) of Chapter 13 will show that such nice interchange holds, more generally,
for all functions that – like the power, exponential, and logarithm functions – are continuous.
We thus omit the proof of the next result.

Proposition 337 Except in the indeterminate forms (1.27), that is,

1^∞, 0⁰, (+∞)⁰

we have:²²

(i) lim x_n^α = (lim x_n)^α, provided α ∈ R and x_n > 0;

(ii) lim α^{x_n} = α^{lim x_n}, provided α > 0;

(iii) lim log_a x_n = log_a lim x_n, provided a > 0 and a ≠ 1.

We have, therefore, also the following indeterminate forms for limits:

1^{±∞}

which is often denoted by 1^∞;

(+∞)⁰

which is often denoted by ∞⁰; and

0⁰

²² From now on, since there is no danger of confusion, we will simply write lim x_n instead of lim_{n→∞} x_n. Indeed, the limit of a sequence is defined only for n → ∞, so we can safely omit this detail.

8.10.2 Some common limits


We introduce two basic sequences (one being the reciprocal of the other). From their limit
behavior we will then deduce many other limits thanks to the algebra of limits (Propositions
333 and 337).

For the sequence x_n = n, we have

lim n = +∞

because n > K for every n ≥ [K] + 1.

For the "reciprocal" harmonic sequence x_n = 1/n, we have

lim 1/n = 0

because 0 < 1/n < ε for every n ≥ [1/ε] + 1.

As anticipated, from these two elementary limits we can infer, via the algebra of limits,
many other ones. Specifically:

(i) lim n^α = +∞ for every α > 0;

(ii) lim (1/n)^α = lim n^{−α} = 0⁺ for every α > 0; therefore,

lim n^α = +∞ if α > 0;  lim n^α = 1 if α = 0;  lim n^α = 0⁺ if α < 0

(iii) we have:

lim β^n = +∞ if β > 1;  lim β^n = 1 if β = 1;  lim β^n = 0⁺ if 0 < β < 1

lim log_β n = +∞ if β > 1;  lim log_β n = −∞ if 0 < β < 1

Many other limits follow; for example,

lim (5^n + n² + 1) = +∞ + ∞ + 1 = +∞

as well as

lim (n² − 3n + 1) = lim n² (1 − 3/n + 1/n²) = +∞ · (1 − 0 + 0) = +∞

lim (n² − 5n − 7)/(2n² + 4n + 6) = lim [n² (1 − 5/n − 7/n²)] / [n² (2 + 4/n + 6/n²)]
                                 = (1 − 0 − 0)/(2 + 0 + 0) = 1/2

lim (5 − 1/n)/(2n²) = 0 · (5 − 0) = 0

and

lim [n (n + 1) (n + 2)] / [(2n − 1) (3n − 2) (5n − 4)]
= lim [n · n (1 + 1/n) · n (1 + 2/n)] / [2n (1 − 1/(2n)) · 3n (1 − 2/(3n)) · 5n (1 − 4/(5n))]
= (1/30) lim [(1 + 1/n) (1 + 2/n)] / [(1 − 1/(2n)) (1 − 2/(3n)) (1 − 4/(5n))]
= (1/30) · (1 · 1)/(1 · 1 · 1) = 1/30

8.10.3 Indeterminate forms for the limits


In the previous section we carefully avoided the indeterminate forms of limits (8.37)-
(8.39) because in such cases nothing can be said in general. For instance, the limit of
the sum of two sequences whose limits are infinite of opposite sign can be finite, infinite, or
even nonexistent, as the examples below will show. Such a limit is thus "indeterminate" on the
basis of the information that the two summands diverge to +∞ and to −∞, respectively.

Fortunately, in many cases such indeterminacies do not arise, and the limit of a sequence
can be computed via the algebra of limits established in Propositions 333 and 337. For
instance, if x_n → 5 and y_n → −3, then x_n + y_n → 5 + (−3) = 2 and x_n y_n → 5 · (−3) = −15.
Indeed, these limits involve operations on the extended real line that are well defined, so the
algebra of limits is effective.

That said, when we come across an indeterminate form, the algebra of limits is useless:
we need to roll up our sleeves and work on the specific limit at hand. There are no shortcuts.

Indeterminate form ∞ − ∞

Consider the indeterminate form ∞ − ∞. For example, the limit of the sum x_n + y_n of the
sequences x_n = n and y_n = −n² falls under this form of indetermination, so one cannot
resort to the algebra of limits. We have, however,

x_n + y_n = n − n² = n (1 − n)

where n → +∞ and 1 − n → −∞, so that, being in the case +∞ · (−∞), it follows that
x_n + y_n → −∞. Through a very simple algebraic manipulation, we have been able to find
our way out of the indeterminacy.

Now take x_n = n² and y_n = −n. Also in this case, the limit of the sum x_n + y_n falls
under the indeterminacy ∞ − ∞. By proceeding as we just did, this time we get

lim (x_n + y_n) = lim n (n − 1) = lim n · lim (n − 1) = +∞

Next, take x_n = n and y_n = 1/n − n, still of type ∞ − ∞. Here again, a simple calculation
allows us to find a way out:

lim (x_n + y_n) = lim (n + 1/n − n) = lim 1/n = 0

Finally, take x_n = n² + (−1)^n n and y_n = −n², which is again of type ∞ − ∞, since x_n → +∞
because x_n ≥ n² − n = n (n − 1). Now, the limit

lim (x_n + y_n) = lim (−1)^n n

In sum, when we have an indeterminate form ∞ − ∞, the limit might be +∞, or
−∞, or finite, or nonexistent. In other words, everything goes. So, merely remarking that the
case at hand is of type ∞ − ∞ does not allow us to say anything about the limit of the sum.²³
We have to study carefully the two sequences and come up, each time, with a way out
of the indeterminacy (as we have seen in the simple examples just discussed). The same is
true for the other indeterminate forms, as will be seen next.
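The four ∞ − ∞ examples above can also be probed numerically at a single large n. The following Python lines are our illustration; they print, in order, a hugely negative number, a hugely positive one, a value near 0, and one term of the oscillating remainder (−1)^n n.

    # The four "infinity minus infinity" examples, evaluated at a large n
    n = 10**6
    print(n - n**2)                      # hugely negative: the sum diverges to -infinity
    print(n**2 - n)                      # hugely positive: the sum diverges to +infinity
    print((n + 1/n) - n)                 # about 1e-6: the sum converges to 0
    print(n**2 + (-1)**n * n - n**2)     # equals +n or -n depending on parity: no limit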

Indeterminate form 0 · ∞

Let, for example, x_n = 1/n and y_n = n³. The limit of their product has the indeterminate
form 0 · ∞, so we cannot use the algebra of limits. We have, however,

lim x_n y_n = lim (1/n) n³ = lim n² = +∞

If x_n = 1/n³ and y_n = n, then

lim x_n y_n = lim (1/n³) n = lim 1/n² = 0

If x_n = n³ and y_n = 7/n³, then

lim x_n y_n = lim n³ · (7/n³) = 7

If x_n = 1/n and y_n = n (cos n + 2),²⁴ then

lim x_n y_n = lim (cos n + 2)

does not exist.


Again, everything goes. Only the direct calculation of the limit at hand can determine
its value.

Indeterminate forms ∞/∞ and 0/0

Consider, for example, x_n = n and y_n = n². The limit of their ratio has the form ∞/∞, but

lim x_n/y_n = lim n/n² = lim 1/n = 0

On the other hand, by exchanging x_n with y_n, the indeterminate form ∞/∞ remains, but

lim y_n/x_n = lim n²/n = lim n = +∞

²³ In contrast, if the case were, say, of type ∞ + a with a ≠ −∞, then – even without knowing the specific form of the two sequences – the algebra of limits (specifically, Proposition 333-(i)) would allow us to conclude that the limit of their sum is ∞.
²⁴ Using the comparison criterion, which we will study soon (Theorem 338), it is easy to prove that y_n → +∞ since y_n ≥ n.

with a limit altogether different from the previous one.²⁵

Another example of type ∞/∞ is given by x_n = n² and y_n = 1 + 2n². We have

lim x_n/y_n = lim n²/(1 + 2n²) = lim 1/(1/n² + 2) = 1/2

That said, if x_n = n² (sin n + 7) and y_n = n², then

lim x_n/y_n = lim (sin n + 7)

which does not exist. Everything goes.

Naturally, the same is true for the indeterminate form 0/0. For example, let x_n = 1/n
and y_n = 1/n². We have

lim x_n/y_n = lim (1/n)/(1/n²) = lim n = +∞

whereas, by exchanging the roles of x_n and y_n, we have

lim y_n/x_n = lim (1/n²)/(1/n) = lim 1/n = 0

The indeterminate forms ∞/∞ and 0/0 are closely connected: if the limit of the ratio
of the sequences {x_n} and {y_n} falls under the indeterminate form ∞/∞, then the limit of
the ratio of the sequences {1/x_n} and {1/y_n} falls under the indeterminate form 0/0. The
vice versa requires convergence to 0 from either above or below (cf. Proposition 313). The
reader should think of x_n = (−1)^n/n, which converges to 0, while the reciprocal 1/x_n wildly
oscillates and so does not converge.

²⁵ Since x_n/y_n = 1/(y_n/x_n), Proposition 313 applies to the two limits.

8.10.4 Summary tables


We can summarize what we have learned about the algebra of limits in three tables. In them, the
first row indicates the limit of the sequence {x_n}, and the first column indicates the limit of
the sequence {y_n}.

We start with the limit of the sum: the cells report the value of lim (x_n + y_n); we write
?? in case of indeterminacy.

sum      +∞       L        −∞
+∞       +∞       +∞       ??
H        +∞       L + H    −∞
−∞       ??       −∞       −∞

We have two indeterminate cases out of nine.

Turn to the product: the cells now report the value of lim x_n y_n.

product   +∞      L > 0    0       L < 0    −∞
+∞        +∞      +∞       ??      −∞       −∞
H > 0     +∞      LH       0       LH       −∞
0         ??      0        0       0        ??
H < 0     −∞      LH       0       LH       +∞
−∞        −∞      −∞       ??      +∞       +∞

Here there are four indeterminate cases out of twenty-five.

Finally, for the ratio we have the following table, where the cells report the value of
lim (x_n/y_n).

ratio     +∞      L > 0    0       L < 0    −∞
+∞        ??      0        0       0        ??
H > 0     +∞      L/H      0       L/H      −∞
0         ±∞      ±∞       ??      ±∞       ±∞
H < 0     −∞      L/H      0       L/H      +∞
−∞        ??      0        0       0        ??

In view of Proposition 335, in the third row we assume that y_n tends to 0 from above, y_n → 0⁺,
or from below, y_n → 0⁻. In turn, this determines the sign of the infinity; for example,

lim 1/(1/n) = lim n = +∞  and  lim 1/(−1/n) = lim (−n) = −∞

For the ratio, we thus have five indeterminate cases out of twenty-five.

The tables make it clear that in the majority of cases we can rely upon the algebra
of limits (in particular, Propositions 333 and 337). Only relatively few cases are actually
indeterminate.

O.R. The case 0^∞ is not indeterminate. Clearly, this is shorthand notation for lim x_n^{y_n},
where the base is a positive sequence approaching 0 (more precisely, 0⁺) and the exponent
is a divergent sequence. We can set 0^{+∞} = 0: if we multiply 0 by itself "infinitely many
times" we still get a zero (a "zerissimo", if you wish). The form 0^{−∞} is the reciprocal, so
0^{−∞} = +∞. H

8.10.5 How many indeterminate forms are there?


We mentioned seven indeterminate forms:

∞/∞,  0/0,  0 · ∞,  ∞ − ∞,  0⁰,  ∞⁰,  1^∞

They are actually all connected. We could regard, for example, 0 · ∞ (or any other) as the
basic indeterminate form and reduce all the other ones to it. Indeed:

(i) If x_n, y_n → ∞, their ratio x_n/y_n appears in the form ∞/∞, but it is sufficient to write
the ratio as

x_n · (1/y_n)

to get the form ∞ · 0.

(ii) If x_n, y_n → 0, their ratio x_n/y_n appears in the form 0/0, but it is sufficient to write
the ratio as

x_n · (1/y_n)

to get the form 0 · ∞.

(iii) If x_n → ∞ and y_n → −∞, their sum x_n + y_n appears in the form ∞ − ∞. However,
we can write

x_n + y_n = (1 + y_n/x_n) x_n

If y_n/x_n does not tend to −1, the form is no longer indeterminate, while if y_n/x_n → −1
then the form is of the type 0 · ∞.

(iv) For the last three cases it is sufficient to consider the logarithm to end up, again, in
the case 0 · ∞. Indeed:

log 0⁰ = 0 · log 0 = 0 · (−∞);  log ∞⁰ = 0 · log ∞ = 0 · ∞;  log 1^∞ = ∞ · log 1 = ∞ · 0

The reader can try to reduce all the indeterminate forms to either 0/0 or ∞/∞.

8.11 Convergence criteria


The computation of limits can be rather tedious and, in many cases, far from easy.
In these cases, results that establish sufficient conditions for convergence – the so-called
convergence criteria – are most useful.²⁶

8.11.1 Comparison criterion


We start with the classic comparison criterion: when two sequences converge to the same
limit, the same is true for any sequence whose terms are "sandwiched" between those
of the two original sequences.

Theorem 338 (Comparison criterion) Let {x_n}, {y_n}, and {z_n} be three sequences. If,
eventually,

y_n ≤ x_n ≤ z_n    (8.40)

and

lim y_n = lim z_n = L ∈ R̄    (8.41)

then

lim x_n = L

²⁶ In this book the term "criterion" (or "test") will always be understood as "sufficient condition".

We can think of {x_n} as a convict escorted by the two policemen {y_n} and {z_n}
(one on each side), so he is forced to go wherever they go.

Proof Suppose L ∈ R (we leave the case L = ±∞ to the reader). Let ε > 0. From (8.41) it
follows, by Definition 303, that there exists n_1 such that y_n ∈ B_ε(L) for every n ≥ n_1, and
there exists n_2 such that z_n ∈ B_ε(L) for every n ≥ n_2. Finally, let n_3 be the position starting
from which one has y_n ≤ x_n ≤ z_n. Setting n̄ = max{n_1, n_2, n_3}, we then have y_n ∈ B_ε(L),
z_n ∈ B_ε(L), and y_n ≤ x_n ≤ z_n for every n ≥ n̄. So,

L − ε < y_n ≤ x_n ≤ z_n < L + ε ∀n ≥ n̄

that is, x_n ∈ B_ε(L) for every n ≥ n̄. Hence, x_n → L, as claimed.

The typical use of this result is in proving the convergence of a given sequence by showing
that it can be "trapped" between two suitable convergent sequences.

Example 339 (i) Consider the sequence x_n = n^{−2} sin² n. Since −1 ≤ sin n ≤ 1 for every
n ≥ 1, we have 0 ≤ sin² n ≤ 1 for every n ≥ 1. So,

0 ≤ sin² n / n² ≤ 1/n² ∀n ≥ 1

(ii) Consider the sequences y_n = 0 and z_n = 1/n². Conditions (8.40) and (8.41) hold with
L = 0. By the comparison criterion, we conclude that lim x_n = 0. N

Example 340 The sequence x_n = n^{−1} sin n converges to 0. Indeed,

−1/n ≤ sin n / n ≤ 1/n ∀n ≥ 1

and both sequences {1/n} and {−1/n} converge to 0. N
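A quick numerical look at the sandwich of Example 340 (a minimal Python sketch we add for illustration): the three columns visibly squeeze together as n grows.

    import math

    # The sandwich -1/n <= sin(n)/n <= 1/n of Example 340: all three columns -> 0
    for n in (10, 10**3, 10**5):
        print(-1 / n, math.sin(n) / n, 1 / n)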

The previous example suggests that, if {x_n} is a bounded sequence, say −k ≤ x_n ≤ k for
all n ≥ 1, and y_n → +∞ or y_n → −∞, then

x_n/y_n → 0

Indeed, we have

−k/|y_n| ≤ x_n/y_n ≤ k/|y_n|

and k/|y_n| → 0.

8.11.2 Ratio criterion


The ratio and root criteria are often useful to establish that a sequence is infinitesimal. They
will also be used for the convergence of series, as we will see in the next chapter. Let us begin
with the ratio criterion.

Theorem 341 (Ratio criterion) If there exists a scalar q < 1 such that, eventually,

|x_{n+1}/x_n| ≤ q    (8.42)

then lim x_n = 0.

Condition (8.42) requires the sequence of the absolute values |x_n| to be eventually
strictly decreasing, i.e., eventually |x_{n+1}| < |x_n| ≠ 0. By Corollary 324, we then have |x_n| ↓ L
for some L ≥ 0. The theorem claims that, indeed, L = 0.

Proof Suppose that the inequality holds starting from n = 1 (if it held only from a certain n̄
onward, just recall that eliminating a finite number of terms does not alter the limit). By
Proposition 306, it is enough to prove that |x_n| → 0. From (8.42), it follows that |x_{n+1}| ≤ q |x_n|.
In particular, by iterating this inequality from n = 1 we have:

|x_2| ≤ q |x_1|;  |x_3| ≤ q |x_2| ≤ q² |x_1|;  ...;  |x_n| ≤ q^{n−1} |x_1|;  ...

So,

0 ≤ |x_n| ≤ q^{n−1} |x_1| ∀n ≥ 2

Since 0 < q < 1, we have q^{n−1} → 0. So, by the comparison criterion we get |x_n| → 0.

Note that the theorem does not simply require the ratio jxn+1 =xn j to be < 1, that is,

xn+1
<1
xn

but that it be \far from it", i.e., smaller than a number q which, in turn, is itself smaller
than 1. The next example clari es this observation.

Example 342 The sequence $x_n = (-1)^n(1 + 1/n)$ does not converge: indeed, the subsequence of its terms of even positions tends to $+1$, whereas that of its terms of odd positions tends to $-1$. Yet:
$$\left|\frac{x_{n+1}}{x_n}\right| = \frac{1 + \frac{1}{n+1}}{1 + \frac{1}{n}} = \frac{n^2 + 2n}{n^2 + 2n + 1} < 1$$
for every $n \ge 1$. N

Though stated as a criterion to establish whether a sequence is infinitesimal, the ratio criterion is important for the general study of convergence because of the special status that Proposition 306 gives infinitesimal sequences. Indeed, by that proposition we have $x_n \to L$ if and only if $|x_n - L| \to 0$, so by the ratio criterion we have
$$\left|\frac{x_{n+1} - L}{x_n - L}\right| \le q \implies x_n \to L$$
The ratio criterion (and also the root criterion that we will see soon) thus applies, mutatis mutandis, to the study of any convergence $x_n \to L$.

An important case in which condition (8.42) holds is when the ratio $|x_{n+1}/x_n|$ has a limit, and such limit is $< 1$, that is,
$$\lim \left|\frac{x_{n+1}}{x_n}\right| < 1 \qquad (8.43)$$
Indeed, denote by $L$ this limit and let $\varepsilon > 0$ be such that $L + \varepsilon < 1$. By the definition of limit, eventually we have
$$\left|\,\left|\frac{x_{n+1}}{x_n}\right| - L\,\right| < \varepsilon$$
that is, $L - \varepsilon < |x_{n+1}/x_n| < L + \varepsilon$. Therefore, by setting $q = L + \varepsilon$ it follows that eventually $|x_{n+1}/x_n| < q$, which is property (8.42).
The limit form (8.43) is actually the most common form in which the ratio criterion is applied. The next common limits illustrate its use:

(i) For any $\alpha > 1$ and $k \in \mathbb{R}$, we have
$$\lim \frac{n^k}{\alpha^n} = 0 \qquad (8.44)$$
Indeed, set
$$x_n = \frac{n^k}{\alpha^n}$$
By taking the ratio of two consecutive terms (the absolute value is here irrelevant since all terms are positive), we have
$$\frac{x_{n+1}}{x_n} = \frac{(n+1)^k}{\alpha^{n+1}} \cdot \frac{\alpha^n}{n^k} = \frac{1}{\alpha}\left(\frac{n+1}{n}\right)^k = \frac{1}{\alpha}\left(1 + \frac{1}{n}\right)^k \to \frac{1}{\alpha} < 1$$

(ii) If $k \in \mathbb{R}$ and $y_n \to +\infty$, then
$$\lim \frac{\log^k y_n}{y_n} = 0$$
Indeed, by setting $y_n = e^{z_n}$ we get back to the previous case. In particular,
$$\lim \frac{\log^k n}{n} = \lim \frac{\log n}{n} = 0$$
O.R. What precedes indicates a hierarchy among the following classes of divergent sequences:
$$\alpha^n \text{ with } \alpha > 1; \qquad n^k \text{ with } k > 0; \qquad \log^k n \text{ with } k > 0 \qquad (8.45)$$
The "strongest" are the exponentials, graded according to the base $\alpha$; then the powers follow, graded according to the exponent $k$; and, finally, the logarithms, graded according to the exponent $k$. For example, we have
$$5^n - 6 \cdot 2^n - n^{123} + 7n^{87} - n^{36}\log n \to +\infty$$
since the sequence inherits the behavior of $5^n$, while we have
$$\frac{n^4 - 3n^3 + 6n^2 - 4}{5n^4 + 7n^3 + 25n^2 + 342} \to \frac{1}{5}$$
because the numerator inherits the behavior of $n^4$ and the denominator that of $5n^4$.
Soon, in Section 8.14, we will make rigorous these observations on limits based on the rate of convergence (or divergence). H
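A rough numerical sketch (ours, not the book's) of the hierarchy (8.45): powers are negligible with respect to exponentials, and logarithms with respect to powers.

```python
import math

# Both ratios should tend to 0 as n grows: n^10/2^n and log(n)/n.
for n in [10, 100, 1000]:
    print(f"n = {n:>4}:  n^10/2^n = {n**10 / 2**n:.3e}   log(n)/n = {math.log(n) / n:.3e}")
```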

8.11.3 Root criterion


Next we turn to the second convergence criterion for infinitesimal sequences.

Theorem 343 (Root criterion) If there exists a scalar $q < 1$ such that, eventually,
$$\sqrt[n]{|x_n|} \le q \qquad (8.46)$$
then $\lim x_n = 0$.

The strict inequality $q < 1$ is, again, key: the constant sequence $x_n = 1$ does not converge to 0 although $\sqrt[n]{|x_n|} \le 1$ for every $n$.

Proof As in the previous proof, suppose that (8.46) holds starting with $n = 1$. From
$$\sqrt[n]{|x_n|} \le q$$
we immediately get $|x_n| \le q^n$, i.e., $-q^n \le x_n \le q^n$. Since $0 < q < 1$, we have $q^n \to 0$, so the result follows from the comparison criterion.

For the root criterion we can make observations similar to those that we made for the ratio criterion. In particular, property (8.46) holds if the sequence $\{\sqrt[n]{|x_n|}\}$ has a limit, and such limit is $< 1$, that is,
$$\lim \sqrt[n]{|x_n|} < 1 \qquad (8.47)$$
This limit form is the most common with which the criterion is applied.

Example 344 Given $k \in \mathbb{R}$, let
$$x_n = \left(k + \frac{n^2 + 3}{n^3}\right)^n$$
Then, $\lim \sqrt[n]{|x_n|} = |k|$, so the root criterion implies $\lim x_n = 0$ as long as $|k| < 1$. N
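A numerical sketch (ours, not the book's) of Example 344 with the illustrative choice $k = 0.5$: the $n$-th roots settle near $|k| = 0.5 < 1$, so $x_n \to 0$.

```python
# Root test for x_n = (k + (n^2+3)/n^3)^n with k = 0.5 (hypothetical value).
k = 0.5
for n in [5, 20, 100, 400]:
    x = (k + (n**2 + 3) / n**3) ** n
    print(f"n = {n:>3}:  |x_n|^(1/n) = {abs(x) ** (1/n):.4f},  x_n = {x:.3e}")
```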

The next simple example shows that both the ratio and the root criteria are sufficient, but not necessary, conditions for convergence. However useful, they might turn out to be useless to establish the convergence of some sequences.

Example 345 The harmonic sequence $x_n = 1/n$ converges to 0. It is hard to think of a simpler limit. However, we have
$$\left|\frac{x_{n+1}}{x_n}\right| = \frac{n}{n+1} \to 1$$
and so the ratio criterion is not applicable. Furthermore, since
$$\log \sqrt[n]{\frac{1}{n}} = \log n^{-\frac{1}{n}} = -\frac{\log n}{n} \to 0$$
we have
$$\sqrt[n]{|x_n|} = \sqrt[n]{\frac{1}{n}} \to 1$$
Then, also the root criterion is not applicable. In sum, neither criterion is of any use for such a simple limit. N

Finally, note that both sequences $x_n = 1/n$ and $x_n = (-1)^n n$ satisfy the condition
$$\left|\frac{x_{n+1}}{x_n}\right| \to 1$$
although the first sequence converges to 0 and the second one does not converge at all. Therefore, this condition does not allow us to draw any conclusion about the asymptotic behavior of a sequence. The same is true for the condition
$$\sqrt[n]{|x_n|} \to 1$$
Indeed, it is enough to look again at the sequences $x_n = 1/n$ and $x_n = (-1)^n n$. All this confirms the key importance of the "strict" clause $< 1$ in (8.43) and (8.47). The next classic limit further illustrates this remark.
Proposition 346 For every $k > 0$, we have $\lim \sqrt[n]{k} = 1$.

This can be proved by setting $y_n = \log \sqrt[n]{k} = (\log k)/n \to 0$. By Proposition 337, we then have $\sqrt[n]{k} = e^{y_n} \to 1$. Below, we offer a proof that proceeds through more elementary methods.

Proof The result is obvious for $k = 1$. Let $k > 1$. For any $n$, let $x_n > 0$ be such that $(1 + x_n)^n = k$, so that $\sqrt[n]{k} = 1 + x_n$. From Newton's binomial formula (B.7), we have $nx_n \le k$, and so $x_n \to 0$. It follows that $\sqrt[n]{k} \to 1$.
Now, let $k < 1$. From what we have just seen, we have $\sqrt[n]{1/k} \to 1$, so the sequence $\{\sqrt[n]{1/k}\}$ is bounded (Proposition 322). This, in turn, implies that the sequence $\{\sqrt[n]{k}\}$ is bounded as well, say $0 \le \sqrt[n]{k} \le K$ for some scalar $K > 0$. By the comparison criterion, the equality
$$\sqrt[n]{k} - 1 = -\left(\sqrt[n]{\frac{1}{k}} - 1\right)\sqrt[n]{k}$$
implies
$$0 \le \left|\sqrt[n]{k} - 1\right| = \left(\sqrt[n]{\frac{1}{k}} - 1\right)\sqrt[n]{k} \le K\left(\sqrt[n]{\frac{1}{k}} - 1\right) \to 0$$
So, $\lim \sqrt[n]{k} = 1$.

8.12 The Cauchy condition


To check whether a sequence converges amounts to computing its limit, a "guess and verify" procedure in which we first posit a candidate limit and then check whether it is indeed a limit (Section 8.8.1). It is often not so easy to implement this procedure, so to check convergence.[27] Moreover, the limit is an object which is, in a sense, "extraneous" to the sequence because, in general, it is not a term of the sequence. Therefore, to establish the convergence of a sequence we have to rely on a "stranger" that, in addition, might even be difficult to identify.

[27] The role of little birds' suggestions in the "guess" part is especially troublesome.

For this reason, it is important to have an "intrinsic" criterion for convergence that only makes use of the terms of the sequence, without involving any extraneous object. To see how to do this, consider the following simple intuition: if a sequence converges, then its elements become closer and closer to the limit; but, if they become closer and closer to the limit, then as a by-product they also become closer and closer to one another. The next result formalizes this intuition.

Theorem 347 (Cauchy) A sequence $\{x_n\}$ is convergent if and only if it satisfies the Cauchy condition, that is, for each $\varepsilon > 0$ there exists an integer $n_\varepsilon \ge 1$ such that
$$|x_n - x_m| < \varepsilon \qquad \forall n, m \ge n_\varepsilon \qquad (8.48)$$

Sequences that satisfy the Cauchy condition are called Cauchy sequences. The Cauchy condition is an intrinsic condition that only involves the terms of the sequence. According to the theorem, a sequence converges if and only if it is Cauchy. Thus, to determine whether a sequence converges it is enough to check whether it is Cauchy, something that does not require considering any extraneous object and relies just on the sequence itself.
But, as usual, there is no free lunch: checking that a sequence is Cauchy informs us about its convergence, but it does not say anything about the actual limit point. To find it, we need to go back to the usual procedure that requires a candidate to be posited.

Proof "Only if". If $x_n \to L$ then, by definition, for each $\varepsilon > 0$ there exists $n_\varepsilon \ge 1$ such that $|x_n - L| < \varepsilon$ for every $n \ge n_\varepsilon$. This implies that, for every $n, m \ge n_\varepsilon$,
$$|x_n - x_m| = |x_n - L + L - x_m| \le |x_n - L| + |x_m - L| < \varepsilon + \varepsilon = 2\varepsilon$$
Since $\varepsilon$ was arbitrarily chosen, the statement follows.

"If". If $|x_n - x_m| < \varepsilon$ for every $n, m \ge n_\varepsilon$, it easily follows that $|x_n - x_{n_\varepsilon}| < \varepsilon$ for $n = n_\varepsilon + 1, n_\varepsilon + 2, \dots$, that is,
$$x_{n_\varepsilon} - \varepsilon < x_n < x_{n_\varepsilon} + \varepsilon \qquad \text{for } n = n_\varepsilon + 1, n_\varepsilon + 2, \dots$$
Set $A = \{a \in \mathbb{R} : x_n > a \text{ eventually}\}$ and $B = \{b \in \mathbb{R} : x_n < b \text{ eventually}\}$. Note that:

(i) $A$ and $B$ are not empty. Indeed, we have $x_{n_\varepsilon} - \varepsilon \in A$ and $x_{n_\varepsilon} + \varepsilon \in B$.

(ii) If $a \in A$ and $b \in B$, then $b > a$. Indeed, since $a \in A$ (respectively, $b \in B$), there exists $n_a \ge 1$ such that $x_n > a$ for every $n \ge n_a$ (resp., there exists $n_b \ge 1$ such that $b > x_n$ for every $n \ge n_b$). Define $\bar{n} = \max\{n_a, n_b\}$. It follows that $b > x_{\bar{n}} > a$.

(iii) We have $\sup A = \inf B$. Indeed, by the Least Upper Bound Principle and by the previous two points, $\sup A$ and $\inf B$ are well-defined and are such that $\sup A \le \inf B$. Since, by point (i), $x_{n_\varepsilon} - \varepsilon \in A$ and $x_{n_\varepsilon} + \varepsilon \in B$, we have $x_{n_\varepsilon} - \varepsilon \le \sup A \le \inf B \le x_{n_\varepsilon} + \varepsilon$; in particular, $|\inf B - \sup A| \le 2\varepsilon$. Since $\varepsilon$ can be chosen arbitrarily small, we then have $|\inf B - \sup A| = 0$, that is, $\inf B = \sup A$.

Call $z$ the common value of $\sup A$ and $\inf B$. We claim that $x_n \to z$. Indeed, fixing arbitrarily a number $\delta > 0$, there exist $a \in A$ and $b \in B$ such that $0 \le b - a < \delta$ and, therefore,
$$z - \delta < a < b < z + \delta$$

because $a \le z \le b$, and so $z - \delta < a$ and $b < z + \delta$. But, by the definition of $A$ and $B$, the sequence is eventually strictly larger than $a$ and strictly smaller than $b$. So, eventually,
$$z - \delta < x_n < z + \delta$$
Due to the arbitrary choice of $\delta$, this shows that $x_n \to z$, as desired.

Example 348 (i) The harmonic sequence $x_n = 1/n$ is Cauchy. Indeed, let $\varepsilon > 0$. We have to show that there exists $n_\varepsilon \ge 1$ such that for every $n, m \ge n_\varepsilon$ one has $|x_n - x_m| < \varepsilon$. Without loss of generality, assume that $n \ge m$. Note that for $n \ge m$ we have
$$0 \le |x_n - x_m| = \frac{1}{m} - \frac{1}{n} < \frac{1}{m}$$
Since $\varepsilon > 1/m$ amounts to $m > 1/\varepsilon$, by choosing $n_\varepsilon = [1/\varepsilon] + 1$ we have $|x_n - x_m| < \varepsilon$ for every $n \ge m \ge n_\varepsilon$, thus proving that $x_n = 1/n$ is a Cauchy sequence.
(ii) The sequence $x_n = \log n$ is not Cauchy. Suppose, by contradiction, that for a fixed $\varepsilon > 0$ there exists $n_\varepsilon \ge 1$ such that for every $n, m \ge n_\varepsilon$ we have $|x_n - x_m| < \varepsilon$. First, note that if $n = m + k$ with $k \in \mathbb{N}$, we have
$$|x_n - x_m| = \log \frac{m + k}{m} < \varepsilon \iff k < m(e^\varepsilon - 1)$$
Thus, by choosing $k = [m(e^\varepsilon - 1)] + 1$ and $m \ge n_\varepsilon$, we obtain $|x_n - x_m| = \log \frac{m+k}{m} \ge \varepsilon$. This contradicts $|x_n - x_m| < \varepsilon$ since $n, m \ge n_\varepsilon$. We conclude that $x_n = \log n$ is not a Cauchy sequence. N
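A numerical sketch (ours, not the book's) of the two cases above: $|x_{2n} - x_n|$ must vanish for a Cauchy sequence, as it does for $1/n$, whereas for $\log n$ it stays at $\log 2$.

```python
import math

# Compare |x_{2n} - x_n| for x_n = 1/n (Cauchy) and x_n = log n (not Cauchy).
for n in [10, 100, 1000, 10000]:
    print(f"n = {n:>5}:  |1/(2n) - 1/n| = {abs(1/(2*n) - 1/n):.2e},"
          f"  log(2n) - log(n) = {math.log(2*n) - math.log(n):.4f}")
```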

The previous theorem states a fundamental property of convergent sequences, yet its relevance is also due to the structural property of the real line that it isolates, the so-called completeness of the real line. For example, let us assume (as was the case for Pythagoras) that we only knew the rational numbers: so, the space on which we operate is $\mathbb{Q}$. Consider the sequence whose elements (all rationals) are the decimal approximations of $\pi$:
$$x_1 = 3, \quad x_2 = 3.1, \quad x_3 = 3.14, \quad x_4 = 3.141, \quad x_5 = 3.1415, \quad \dots$$
Being a decimal approximation, this sequence satisfies the Cauchy condition because the inequality
$$|x_n - x_m| < 10^{-\min\{m-1,\, n-1\}}$$
can be made arbitrarily small. The sequence, however, does not converge to any point of $\mathbb{Q}$: if we knew $\mathbb{R}$, we could say that it converges to $\pi$. Therefore, in $\mathbb{Q}$ the Cauchy condition is necessary, but not sufficient, for convergence. The reason is that $\mathbb{Q}$ does not have "enough points" to handle convergence well, unlike $\mathbb{R}$. For instance, the previous sequence converges in $\mathbb{R}$ because of the point $\pi$, which is missing in $\mathbb{Q}$. We thus say that $\mathbb{R}$ is complete (with respect to convergence), while $\mathbb{Q}$ is incomplete. Indeed, $\mathbb{R}$ can be seen as a way to complete $\mathbb{Q}$ by adding all the missing limit points, like $\pi$, as readers will learn in more advanced courses.

We close with a remark on the proof of Cauchy's Theorem. The "if" part is its more difficult one, and we proved it via the Least Upper Bound Principle. Next we report a different, arguably more illuminating, proof.

Alternative proof Assume that the sequence $\{x_n\}$ is Cauchy, i.e., for each $\varepsilon > 0$ there exists $n_\varepsilon$ such that $|x_n - x_m| < \varepsilon$ for every $n, m \ge n_\varepsilon$. We want to prove that $\{x_n\}$ converges. We start by proving that it is bounded. Setting $\varepsilon = 1$, there exists $n_1 \ge 1$ such that $|x_n - x_m| < 1$ for all $n, m \ge n_1$. Hence, for each $n \ge n_1$ we have:
$$|x_n| = |x_n - x_{n_1} + x_{n_1}| \le |x_n - x_{n_1}| + |x_{n_1}| < 1 + |x_{n_1}|$$
which implies that the sequence $\{x_n\}$ is bounded. This allows us to define two scalar sequences $\{z_n\}$ and $\{y_n\}$ by setting
$$z_n = \inf_{k \ge n} x_k \quad \text{and} \quad y_n = \sup_{k \ge n} x_k \qquad (8.49)$$
By construction, $\{z_n\}$ is an increasing sequence and $\{y_n\}$ is a decreasing sequence. Moreover,
$$\inf_n x_n \le z_n \le x_n \le y_n \le \sup_n x_n \qquad \forall n \ge 1$$
Since $\{z_n\}$ and $\{y_n\}$ are bounded and monotone, both limits $y = \lim y_n$ and $z = \lim z_n$ exist. Let $\varepsilon > 0$. Since $\{z_n\}$ and $\{y_n\}$ are convergent, there exist $n_y, n_z \ge 1$ such that $|y - y_n| < \varepsilon/5$ for all $n \ge n_y$ and $|z_n - z| < \varepsilon/5$ for all $n \ge n_z$. On the other hand, since $\{x_n\}$ is Cauchy there exists $n_x \ge 1$ such that $|x_n - x_m| < \varepsilon/5$ for all $n, m \ge n_x$. Let $\bar{n} = \max\{n_x, n_y, n_z\}$. In view of (8.49), there exist $n_{xy}, n_{xz} \ge \bar{n}$ such that $y_{\bar{n}} - x_{n_{xy}} < \varepsilon/5$ and $x_{n_{xz}} - z_{\bar{n}} < \varepsilon/5$. Since $n_{xy}, n_{xz} \ge \bar{n} \ge n_x$, we have $|x_{n_{xy}} - x_{n_{xz}}| < \varepsilon/5$. Being $\bar{n} \ge n_y, n_z$, we conclude that
$$|y - z| = |y - y_{\bar{n}} + y_{\bar{n}} - z_{\bar{n}} + z_{\bar{n}} - z| \le |y - y_{\bar{n}}| + |y_{\bar{n}} - z_{\bar{n}}| + |z_{\bar{n}} - z|$$
$$= |y - y_{\bar{n}}| + \left|y_{\bar{n}} - x_{n_{xy}} + x_{n_{xy}} - x_{n_{xz}} + x_{n_{xz}} - z_{\bar{n}}\right| + |z_{\bar{n}} - z|$$
$$\le |y - y_{\bar{n}}| + \left(y_{\bar{n}} - x_{n_{xy}}\right) + \left|x_{n_{xy}} - x_{n_{xz}}\right| + \left(x_{n_{xz}} - z_{\bar{n}}\right) + |z_{\bar{n}} - z|$$
$$< \frac{\varepsilon}{5} + \frac{\varepsilon}{5} + \frac{\varepsilon}{5} + \frac{\varepsilon}{5} + \frac{\varepsilon}{5} = \varepsilon$$
Since $\varepsilon > 0$ was arbitrarily chosen, this yields that $y = z$. That is,
$$\lim z_n = z = y = \lim y_n$$
Being $z_n \le x_n \le y_n$ for all $n \ge 1$, this implies that $\lim x_n$ exists (and is equal to $z$).

The sequences $\{z_n\}$ and $\{y_n\}$ used in the proof are instances of the two fundamental notions of limit inferior and limit superior that will be studied in Section 10.1. Specifically, as will be seen there, we have $\limsup x_n = \lim y_n$ and $\liminf x_n = \lim z_n$.

8.13 Napier's constant


The limit of the sequence
$$x_n = \left(1 + \frac{1}{n}\right)^n \qquad (8.50)$$
involves the indeterminate form $1^\infty$, so the algebra of limits is useless and we have to study it directly.
The next result proves that the limit exists and is, indeed, a fundamental number, denoted by $e$ and called Napier's constant.[28]

[28] The notation $e$ is due to Euler.

Theorem 349 The sequence (8.50) is convergent. Its limit is denoted by $e$, i.e.,
$$e = \lim \left(1 + \frac{1}{n}\right)^n \qquad (8.51)$$

Since the sequence involves powers, the root criterion is a first possibility to consider to prove the result. Unfortunately,
$$\sqrt[n]{\left(1 + \frac{1}{n}\right)^n} = 1 + \frac{1}{n} \to 1$$
and, therefore, this criterion cannot be applied. The proof is based, instead, on the following classic inequality.

Lemma 350 Let $-1 < a \ne 0$. We have, for every $n > 1$,[29]
$$(1 + a)^n > 1 + an \qquad (8.52)$$

[29] For $n = 1$, equality holds trivially.

Proof The proof is done by induction. Inequality (8.52) holds for $n = 2$. Indeed, for each $a \ne 0$ we have:
$$(1 + a)^2 = 1 + 2a + a^2 > 1 + 2a$$
Suppose now that (8.52) holds for some $n \ge 2$ (induction hypothesis), i.e.,
$$(1 + a)^n > 1 + an$$
We want to prove that (8.52) holds for $n + 1$. We have:
$$(1 + a)^{n+1} = (1 + a)(1 + a)^n > (1 + a)(1 + an) = 1 + a(n + 1) + a^2 n > 1 + a(n + 1)$$
where the first inequality, due to the induction hypothesis, holds because $a > -1$. This completes the induction step.

Proof of Theorem 349 Set, for each $n \ge 1$,
$$a_n = \left(1 + \frac{1}{n}\right)^n, \qquad b_n = \left(1 + \frac{1}{n}\right)^{n+1}$$
We proceed by steps.

Step 1: $\{b_n\}$ is decreasing. Clearly, $b_1 > b_2$. Moreover, for $n \ge 2$ we have
$$\frac{b_n}{b_{n-1}} = \frac{\left(1 + \frac{1}{n}\right)^{n+1}}{\left(1 + \frac{1}{n-1}\right)^n} = \left(1 + \frac{1}{n}\right)\left[\frac{1 + \frac{1}{n}}{1 + \frac{1}{n-1}}\right]^n = \left(1 + \frac{1}{n}\right)\left(\frac{(n+1)(n-1)}{n^2}\right)^n = \frac{1 + \frac{1}{n}}{\left(1 + \frac{1}{n^2 - 1}\right)^n}$$
and, using the inequality (8.52),[30] we see that
$$\left(1 + \frac{1}{n^2 - 1}\right)^n > 1 + \frac{n}{n^2 - 1} > 1 + \frac{n}{n^2} = 1 + \frac{1}{n}$$
So, $b_n / b_{n-1} < 1$.

[30] Note that $-1 < \frac{1}{n^2 - 1} \ne 0$ for $n \ge 2$.

Step 2: $\{a_n\}$ is increasing. Clearly, $a_1 < a_2$. Moreover, for $n \ge 2$ we have
$$\frac{a_n}{a_{n-1}} = \frac{\left(1 + \frac{1}{n}\right)^n}{\left(1 + \frac{1}{n-1}\right)^{n-1}} = \left(\frac{n+1}{n}\right)^n \left(\frac{n-1}{n}\right)^{n-1} = \left(\frac{n^2 - 1}{n^2}\right)^n \frac{n}{n-1} = \frac{\left(1 - \frac{1}{n^2}\right)^n}{1 - \frac{1}{n}}$$
and, again by the inequality (8.52), now used with $a = -1/n^2$,
$$\left(1 - \frac{1}{n^2}\right)^n > 1 - \frac{n}{n^2} = 1 - \frac{1}{n}$$
we see that $a_n / a_{n-1} > 1$.

Step 3: $b_n > a_n$ for every $n$ and, moreover, $b_n - a_n \to 0$. Indeed,
$$b_n - a_n = \left(1 + \frac{1}{n}\right)^{n+1} - \left(1 + \frac{1}{n}\right)^n = \left(1 + \frac{1}{n}\right)^{n+1}\left(1 - \frac{1}{1 + \frac{1}{n}}\right) = \left(1 + \frac{1}{n}\right)^{n+1}\frac{1}{n+1} = \frac{b_n}{n+1} > 0$$
Given that $b_n < b_1$, one gets that
$$0 < b_n - a_n = \frac{b_n}{n+1} < \frac{b_1}{n+1} \to 0$$

By Step 1, the sequence $\{b_n\}$ is decreasing and bounded below (being positive). So, $\lim b_n = \inf b_n$. By Step 2, the sequence $\{a_n\}$ is increasing and, being $a_n < b_n$ for each $n$ (Step 3), is bounded above. Hence, $\lim a_n = \sup a_n$. Since $b_n - a_n \to 0$ (Step 3), from $b_n \ge \inf b_n \ge \sup a_n \ge a_n$ it follows that $\sup a_n = \inf b_n$, so $\lim a_n = \lim b_n$.

One obtains
$$a_1 = 2^1 = 2, \qquad b_1 = 2^2 = 4$$
$$a_2 = \left(\tfrac{3}{2}\right)^2 = 2.25, \qquad b_2 = \left(\tfrac{3}{2}\right)^3 = 3.375$$
$$a_{10} = \left(\tfrac{11}{10}\right)^{10} \simeq 2.59, \qquad b_{10} = \left(\tfrac{11}{10}\right)^{11} \simeq 2.85$$
Therefore, Napier's constant lies between 2.59 and 2.85. Indeed, it is equal to 2.71828...
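A numerical sketch (ours, not the book's) of the bracketing $a_n < e < b_n$ established in the proof:

```python
# The increasing sequence a_n and decreasing sequence b_n squeeze e between them.
for n in [1, 2, 10, 100, 10000]:
    a = (1 + 1/n) ** n
    b = (1 + 1/n) ** (n + 1)
    print(f"n = {n:>5}:  a_n = {a:.6f},  b_n = {b:.6f}")
```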

Later we will prove that it is an irrational number (Theorem 400). It can be proved that it is actually a transcendental number.[31]

[31] An irrational number is called algebraic if it is a root of some polynomial equation with integer coefficients: for example, $\sqrt{2}$ is algebraic because it is a root of the equation $x^2 - 2 = 0$. Irrational numbers that are not algebraic are called transcendental.

Napier's constant is, inter alia, the most convenient base for exponential and logarithmic functions (Section 6.5.2). Later in the book we will see that it can be studied from different angles: like many important mathematical entities, Napier's constant is a multi-faceted diamond. Besides the "sequential" angle just seen in Theorem 349, a summation angle will be studied in Section 9.3.4, a functional angle (with a compelling economic interpretation in terms of compounding) will be presented in Section 17.5, and a differential angle in Section 26.7.

From the fundamental limit (8.51), we can deduce many other important limits.

(i) If $|x_n| \to +\infty$ (for example, $x_n \to +\infty$ or $x_n \to -\infty$), we have
$$\lim \left(1 + \frac{k}{x_n}\right)^{x_n} = e^k$$
For $k = 1$ the proof just requires considering the integer part of $x_n$. For $k = 0$, it is immediate. For any $k \ne 0$, it is sufficient to set $y_n = x_n / k$, so that
$$\left(1 + \frac{k}{x_n}\right)^{x_n} = \left(1 + \frac{1}{y_n}\right)^{k y_n} = \left[\left(1 + \frac{1}{y_n}\right)^{y_n}\right]^k \to e^k$$

(ii) If $a_n \to 0$ and $a_n > 0$, then
$$\lim (1 + a_n)^{\frac{1}{a_n}} = e$$
It is sufficient to set $x_n = 1/a_n$ to find again the previous case (i).

(iii) If $a_n \to 0$ and $a_n > 0$, then
$$\lim \frac{\log(1 + a_n)}{a_n} = 1$$
It is sufficient to take the logarithm in the previous limit. More generally,
$$\lim \frac{\log_b(1 + a_n)}{a_n} = \log_b e \qquad \forall\, 0 < b \ne 1$$

(iv) If $c > 0$, $y_n \to 0$, and $y_n > 0$, then
$$\lim \frac{c^{y_n} - 1}{y_n} = \log c \qquad (8.53)$$
It is sufficient to set $c^{y_n} - 1 = a_n$ (so that also $a_n \to 0$) to see that
$$\frac{c^{y_n} - 1}{y_n} = \frac{a_n}{\log_c(1 + a_n)}$$
So, we are back to the (reciprocal of the) previous case, in which the limit is $1/\log_c e = \log_e c = \log c$.

(v) If $\alpha \in \mathbb{R}$ and $z_n \to 0$, with $z_n > 0$, then
$$\lim \frac{(1 + z_n)^\alpha - 1}{z_n} = \alpha$$
The result is obvious for $\alpha = 1$. Let $\alpha \ne 1$, and set $a_n = (1 + z_n)^\alpha - 1$. That is, $\log(1 + a_n) = \alpha \log(1 + z_n)$, so that also $a_n \to 0$. We have, therefore,
$$\frac{\log(1 + a_n)}{a_n} = \frac{\alpha \log(1 + z_n)}{(1 + z_n)^\alpha - 1} = \alpha \, \frac{\log(1 + z_n)}{z_n} \, \frac{z_n}{(1 + z_n)^\alpha - 1}$$
Since
$$\lim \frac{\log(1 + a_n)}{a_n} = \lim \frac{\log(1 + z_n)}{z_n} = 1$$
the result then follows.

Let us apply what we have just learned to some simple limits. We have:
$$\left(\frac{n + 5}{n}\right)^n = \left(1 + \frac{5}{n}\right)^n \to e^5$$
as well as
$$n^2\left[\left(1 + \frac{1}{n^2}\right)^3 - 1\right] = \frac{\left(1 + 1/n^2\right)^3 - 1}{1/n^2} \to 3$$
and
$$n \log\left(1 + \frac{1}{n}\right) = \frac{\log(1 + 1/n)}{1/n} \to 1$$
and
$$n\left(2^{1/n} - 1\right) = \frac{2^{1/n} - 1}{1/n} \to \log 2$$
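A numerical sketch (ours, not the book's) of two of the limits just computed:

```python
import math

# ((n+5)/n)^n -> e^5 and n(2^(1/n) - 1) -> log 2.
for n in [10, 1000, 100000]:
    print(f"n = {n:>6}:  ((n+5)/n)^n = {((n + 5) / n) ** n:9.4f}  (e^5 = {math.exp(5):.4f}),"
          f"  n(2^(1/n)-1) = {n * (2 ** (1/n) - 1):.5f}  (log 2 = {math.log(2):.5f})")
```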

8.14 Orders of convergence and of divergence


8.14.1 Generalities
Some sequences converge to their limit "faster" than others. For instance, consider two sequences $\{x_n\}$ and $\{y_n\}$, both diverging to $+\infty$: for example, $y_n = n$ and $x_n = n^2$. Intuitively, the sequence $\{x_n\}$ diverges faster than $\{y_n\}$. If we compare them through their ratio $y_n / x_n$, we have
$$\lim \frac{y_n}{x_n} = \lim \frac{1}{n} = 0$$
Even though the numerator also tends to $+\infty$, the denominator has driven the ratio to its end, forcing it to zero. Hence, the higher rate of divergence (i.e., of convergence to $+\infty$) of the sequence $\{x_n\}$ reveals itself in the convergence to zero of the ratio $y_n / x_n$. The ratio seems, therefore, to be a natural test for the relative speed of convergence/divergence of the two sequences.

The next definition formalizes this intuition, which is important both conceptually and computationally.

Definition 351 Let $\{x_n\}$ and $\{y_n\}$ be two sequences, with the terms of the former eventually different from zero.

(i) If
$$\frac{y_n}{x_n} \to 0$$
we say that $\{y_n\}$ is negligible with respect to $\{x_n\}$, and write
$$y_n = o(x_n)$$

(ii) If
$$\frac{y_n}{x_n} \to k \ne 0 \qquad (8.54)$$
we say that $\{y_n\}$ is of the same order as (or comparable with) $\{x_n\}$, and write
$$y_n \asymp x_n$$

(iii) In particular, when $k = 1$, i.e., when
$$\frac{y_n}{x_n} \to 1$$
we say that $\{y_n\}$ and $\{x_n\}$ are asymptotic, and write
$$y_n \sim x_n$$

This classification is comparative. For example, if $\{y_n\}$ is negligible with respect to $\{x_n\}$, it does not mean that $\{y_n\}$ is negligible per se, but that it becomes so when compared to $\{x_n\}$. The sequence $y_n = n^2$ is negligible with respect to $x_n = n^5$, but it is not negligible at all per se (it tends to infinity!).
Observe that, thanks to Proposition 313, we have
$$\frac{y_n}{x_n} \to \pm\infty \iff \frac{x_n}{y_n} \to 0 \iff x_n = o(y_n)$$
Therefore, we can use the previous classification also when the ratio $y_n / x_n$ diverges; no separate analysis is needed.

Terminology The expression $y_n = o(x_n)$ reads "$\{y_n\}$ is little-o of $\{x_n\}$".

We collect a few simple properties of these notions.

Lemma 352 Let $\{x_n\}$ and $\{y_n\}$ be two sequences with terms eventually different from zero.

(i) The relation $\asymp$ of comparability (in particular, $\sim$) is both symmetric, i.e., $y_n \asymp x_n$ if and only if $x_n \asymp y_n$, and transitive, i.e., $z_n \asymp y_n$ and $y_n \asymp x_n$ imply $z_n \asymp x_n$.[32]

[32] Comparability is, indeed, an equivalence relation (cf. Appendix A).

(ii) The relation of negligibility is transitive, i.e., $z_n = o(y_n)$ and $y_n = o(x_n)$ imply $z_n = o(x_n)$.

Proof The symmetry of $\asymp$ follows from
$$\frac{y_n}{x_n} \to k \ne 0 \iff \frac{x_n}{y_n} \to \frac{1}{k} \ne 0$$
We leave to the reader the easy proof of the other properties.

Finally, observe that
$$y_n \asymp x_n \iff \frac{1}{y_n} \asymp \frac{1}{x_n}$$
and, in particular,
$$y_n \sim x_n \iff \frac{1}{y_n} \sim \frac{1}{x_n} \qquad (8.55)$$
provided that $\{x_n\}$ and $\{y_n\}$ are eventually different from zero. In other words, comparability and asymptotic equivalence are preserved when one moves to the reciprocals.

We now consider the more interesting cases in which both sequences are either infinitesimal or divergent. We start with two infinitesimal sequences $\{x_n\}$ and $\{y_n\}$, that is, $\lim x_n = \lim y_n = 0$. In this case, the negligible sequence tends faster to zero. Consider, for example, $x_n = 1/n$ and $y_n = 1/n^2$. Intuitively, $y_n$ goes to zero faster than $x_n$. Indeed,
$$\frac{1/n^2}{1/n} = \frac{1}{n} \to 0$$
that is, $y_n = o(x_n)$. On the other hand, we have
$$\frac{1/\sqrt{n+1}}{1/\sqrt{n}} = \frac{\sqrt{n}}{\sqrt{n+1}} = \sqrt{1 - \frac{1}{n+1}} \to 1$$
and so the infinitesimal sequences $x_n = 1/\sqrt{n}$ and $y_n = 1/\sqrt{n+1}$ are comparable.

Suppose now that the sequences $\{x_n\}$ and $\{y_n\}$ are both divergent, positively or negatively, that is, $\lim_{n\to\infty} x_n = \pm\infty$ and $\lim_{n\to\infty} y_n = \pm\infty$. In this case, negligible sequences tend more slowly to infinity (independently of the sign), that is, they take on values greater and greater, in absolute value, less rapidly. For example, let $x_n = n^2$ and $y_n = n$. Intuitively, $y_n$ goes to infinity more slowly than $x_n$. Indeed,
$$\frac{y_n}{x_n} = \frac{n}{n^2} = \frac{1}{n} \to 0$$
that is, $y_n = o(x_n)$. On the other hand, the same is true if $x_n = n^2$ and $y_n = -n$ because it is not the sign of the infinity that matters, but the rate of divergence.

The meaning of negligibility must, therefore, be qualified depending on whether we consider convergence to zero or to infinity (i.e., divergence). It is important to distinguish carefully between the two cases.

N.B. Setting $x_n = n$ and $y_n = n + k$, with $k > 0$, the sequences $\{x_n\}$ and $\{y_n\}$ are asymptotic. Indeed, no matter how large $k$ is, the divergence to $+\infty$ of the two sequences makes the role of $k$ negligible from the asymptotic point of view. Such a fundamental viewpoint, central to the theory of sequences, should not make us forget that two asymptotic sequences are, in general, very different (to fix ideas, set for example $k = 10^{10}$, i.e., 10 billion, and consider the asymptotic, yet very different, sequences $x_n = n$ and $y_n = n + 10^{10}$). O

8.14.2 Little-o algebra


The application of the concept of "little-o" is not always straightforward. Indeed, knowing that a sequence $\{y_n\}$ is little-o of another sequence $\{x_n\}$ does not convey much information about the form of $\{y_n\}$, apart from its being negligible with respect to $\{x_n\}$. There exists, however, an "algebra" of little-o that allows one to manipulate safely the little-o of sums and products of sequences.

Proposition 353 For every pair of sequences $\{x_n\}$ and $\{y_n\}$ and for every scalar $c \ne 0$, it holds that:

(i) $o(x_n) + o(x_n) = o(x_n)$;

(ii) $o(x_n)o(y_n) = o(x_n y_n)$;

(iii) $c \cdot o(x_n) = o(x_n)$;

(iv) $o(y_n) + o(x_n) = o(x_n)$ if $y_n = o(x_n)$.

The relation $o(x_n) + o(x_n) = o(x_n)$ in (i), bizarre at first sight, simply means that the sum of two little-o of a sequence is still a little-o of that sequence, that is, it continues to be negligible with respect to that sequence. Similar re-readings hold for the other properties in the proposition. Note that (ii) has the remarkable special case
$$o(x_n)o(x_n) = o(x_n^2)$$

Proof If $\{y_n\}$ is little-o of $\{x_n\}$, it can be written as $x_n \varepsilon_n$, where $\{\varepsilon_n\}$ is an infinitesimal sequence. Indeed, just set $\varepsilon_n = y_n / x_n$. The proof will be based on this useful remark.
(i) Let us call $x_n \varepsilon_n$ the first of the two little-o on the left-hand side of the equality and $x_n \eta_n$ the second one, with $\{\varepsilon_n\}$ and $\{\eta_n\}$ two infinitesimal sequences. Then
$$\lim \frac{x_n \varepsilon_n + x_n \eta_n}{x_n} = \lim (\varepsilon_n + \eta_n) = 0$$
which shows that $o(x_n) + o(x_n)$ is $o(x_n)$.

(ii) Let us call $x_n \varepsilon_n$ the little-o of $x_n$ and $y_n \eta_n$ the little-o of $y_n$, with $\{\varepsilon_n\}$ and $\{\eta_n\}$ two infinitesimal sequences. Then
$$\lim \frac{x_n \varepsilon_n \, y_n \eta_n}{x_n y_n} = \lim (\varepsilon_n \eta_n) = 0$$
so that $o(x_n)o(y_n)$ is $o(x_n y_n)$.

(iii) Let us call $x_n \varepsilon_n$ the little-o of $x_n$, with $\{\varepsilon_n\}$ an infinitesimal sequence. Then
$$\lim \frac{c\, x_n \varepsilon_n}{x_n} = c \lim \varepsilon_n = 0$$
which shows that $c \cdot o(x_n)$ is $o(x_n)$.
(iv) Let us write $y_n = x_n \varepsilon_n$, with $\{\varepsilon_n\}$ an infinitesimal sequence. Then, the little-o of $y_n$ can be written as $y_n \eta_n$, that is, $x_n \varepsilon_n \eta_n$, with $\{\eta_n\}$ an infinitesimal sequence. Moreover, we call $x_n \delta_n$ the little-o of $x_n$, with $\{\delta_n\}$ an infinitesimal sequence. Then
$$\lim \frac{x_n \varepsilon_n \eta_n + x_n \delta_n}{x_n} = \lim (\varepsilon_n \eta_n + \delta_n) = 0$$
so that $o(y_n) + o(x_n) = o(x_n)$.

Example 354 Consider the sequence $x_n = n^2$, as well as the sequences $y_n = n$ and $z_n = 2(\log n - n)$. It is immediate to see that $y_n = o(x_n) = o(n^2)$ and $z_n = o(x_n) = o(n^2)$.

(i) Adding up the two sequences we obtain $y_n + z_n = 2\log n - n$, which is still $o(n^2)$, in accordance with (i) proved above.

(ii) Multiplying the two sequences we obtain $y_n z_n = 2n\log n - 2n^2$, which is $o(n^2 \cdot n^2)$, i.e., $o(n^4)$, in accordance with (ii) proved above (in the special case $o(x_n)o(x_n)$). Note that $y_n z_n$ is not $o(n^2)$.

(iii) Take $c = 3$ and consider $c\, y_n = 3n$. It is immediate that $3n$ is still $o(n^2)$, in accordance with (iii) proved above.

(iv) Consider the sequence $w_n = \sqrt{n} - 1$. It is immediate that $w_n = o(y_n) = o(n)$. Consider now the sum $w_n + z_n$ (with $z_n$ defined above), which is the sum of an $o(y_n)$ and an $o(x_n)$, with $y_n = o(x_n)$. We have $w_n + z_n = \sqrt{n} - 1 + 2\log n - 2n$, which is $o(x_n) = o(n^2)$, in accordance with (iv) proved above. Note that $w_n + z_n$ is not $o(y_n)$, even if $w_n$ is $o(y_n)$. N
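A numerical sketch (ours, not the book's) of Example 354(i)-(ii): the ratios that witness $y_n + z_n = o(n^2)$ and $y_n z_n = o(n^4)$ both vanish.

```python
import math

# y_n = n and z_n = 2(log n - n); check (y+z)/n^2 -> 0 and yz/n^4 -> 0.
for n in [10, 1000, 100000]:
    y, z = n, 2 * (math.log(n) - n)
    print(f"n = {n:>6}:  (y+z)/n^2 = {(y + z) / n**2:+.2e},  yz/n^4 = {y * z / n**4:+.2e}")
```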

N.B. (i) To say that a sequence is $o(1)$ simply means that it tends to 0. Indeed, $x_n = o(1)$ means that $x_n / 1 = x_n \to 0$. Note that, by the definition of little-o, we have
$$o(x_n) = x_n \cdot o(1) \qquad (8.56)$$
a simple property that comes in handy in some cases. (ii) The fourth property in the last proposition is especially important because it highlights that, if $y_n$ is negligible with respect to $x_n$, in the sum $o(y_n) + o(x_n)$ the little-o $o(y_n)$ is subsumed in $o(x_n)$. O

8.14.3 Asymptotic equivalence


The relation $\sim$ identifies sequences that are asymptotically equivalent to one another. Indeed, it is easy to see that $y_n \sim x_n$ implies that, for $L \in \overline{\mathbb{R}}$,
$$y_n \to L \iff x_n \to L \qquad (8.57)$$
In detail:

(i) if $L \in \mathbb{R}$, we have $y_n \to L$ if and only if $x_n \to L$;

(ii) if $L = +\infty$, we have $y_n \to +\infty$ if and only if $x_n \to +\infty$;

(iii) if $L = -\infty$, we have $y_n \to -\infty$ if and only if $x_n \to -\infty$.

All this suggests that it is possible to replace $x_n$ by $y_n$ (or vice versa) in the calculation of limits. Intuitively, such a possibility is attractive because it might allow us to replace a complicated sequence by a simpler one that is asymptotic to it.
To make this intuition precise we start by observing that the asymptotic equivalence $\sim$ is preserved under the fundamental operations.

Lemma 355 Let $y_n \sim x_n$ and $z_n \sim w_n$. Then,

(i) $y_n + z_n \sim x_n + w_n$, provided there exists $k > 0$ such that, eventually,[33]
$$\left|\frac{x_n}{x_n + w_n}\right| \le k$$

(ii) $y_n z_n \sim x_n w_n$;

(iii) $y_n / z_n \sim x_n / w_n$, provided that eventually $z_n \ne 0$ and $w_n \ne 0$.

[33] For example, the condition holds if $\{x_n\}$ and $\{w_n\}$ are both eventually positive (in this case, any $k \ge 1$ works).

Note that for sums, differently from the case of products and ratios, the result does not hold in general, but only under a non-trivial ad hoc hypothesis. For this reason, points (ii) and (iii) are the most interesting ones. In the sequel we will thus focus on the asymptotic equivalence of products and ratios, leaving to the reader the study of sums.
Proof (i) We have
$$\frac{y_n + z_n}{x_n + w_n} = \frac{y_n}{x_n + w_n} + \frac{z_n}{x_n + w_n} = \frac{y_n}{x_n}\frac{x_n}{x_n + w_n} + \frac{z_n}{w_n}\frac{w_n}{x_n + w_n}$$
$$= \frac{y_n}{x_n}\frac{x_n}{x_n + w_n} + \frac{z_n}{w_n}\left(1 - \frac{x_n}{x_n + w_n}\right) = \left(\frac{y_n}{x_n} - \frac{z_n}{w_n}\right)\frac{x_n}{x_n + w_n} + \frac{z_n}{w_n}$$
Since $y_n / x_n \to 1$ and $z_n / w_n \to 1$, we have
$$\frac{y_n}{x_n} - \frac{z_n}{w_n} \to 0$$
hence
$$0 \le \left|\left(\frac{y_n}{x_n} - \frac{z_n}{w_n}\right)\frac{x_n}{x_n + w_n}\right| \le \left|\frac{y_n}{x_n} - \frac{z_n}{w_n}\right| k \to 0$$
By the comparison criterion,
$$\left(\frac{y_n}{x_n} - \frac{z_n}{w_n}\right)\frac{x_n}{x_n + w_n} \to 0$$
and hence, since $z_n / w_n \to 1$, we have
$$\frac{y_n + z_n}{x_n + w_n} \to 1$$
as desired.
(ii) and (iii) We have
$$\frac{y_n z_n}{x_n w_n} = \frac{y_n}{x_n}\frac{z_n}{w_n} \to 1$$
and
$$\frac{y_n / z_n}{x_n / w_n} = \frac{y_n}{x_n}\frac{w_n}{z_n} \to 1$$
since $y_n / x_n \to 1$ and $z_n / w_n \to 1$.

The next simple lemma is very useful: in the calculation of a limit, one should neglect what is negligible.

Lemma 356 We have
$$x_n \sim x_n + o(x_n)$$

Proof It is sufficient to observe that
$$\frac{x_n + o(x_n)}{x_n} = 1 + \frac{o(x_n)}{x_n} \to 1$$

By (8.57), we therefore have
$$x_n + o(x_n) \to L \iff x_n \to L$$
What is negligible with respect to the sequence $\{x_n\}$ (i.e., what is $o(x_n)$) is asymptotically irrelevant and one can safely ignore it. Together with Lemma 355, this implies, for products and ratios, that
$$(x_n + o(x_n))(y_n + o(y_n)) \sim x_n y_n \qquad (8.58)$$
and
$$\frac{x_n + o(x_n)}{y_n + o(y_n)} \sim \frac{x_n}{y_n} \qquad (8.59)$$
We illustrate these very useful asymptotic equivalences with some examples, which should
be read with particular attention.

Example 357 (i) Consider the limit
$$\lim \frac{n^4 - 3n^3 + 5n^2 - 7}{2n^5 + 12n^4 - 6n^3 + 4n + 1}$$
By (8.59), we have
$$\frac{n^4 - 3n^3 + 5n^2 - 7}{2n^5 + 12n^4 - 6n^3 + 4n + 1} = \frac{n^4 + o(n^4)}{2n^5 + o(n^5)} \sim \frac{n^4}{2n^5} = \frac{1}{2n} \to 0$$

(ii) Consider the limit
$$\lim \left(n^2 - 7n + 3\right)\left(2 + \frac{1}{n} - \frac{3}{n^2}\right)$$
By (8.58),[34] we have
$$\left(n^2 - 7n + 3\right)\left(2 + \frac{1}{n} - \frac{3}{n^2}\right) = \left(n^2 + o(n^2)\right)(2 + o(1)) \sim 2n^2 \to +\infty$$

(iii) Consider the limit
$$\lim \frac{n(n+1)(n+2)(n+3)}{(n-1)(n-2)(n-3)(n-4)}$$
By (8.59), we have
$$\frac{n(n+1)(n+2)(n+3)}{(n-1)(n-2)(n-3)(n-4)} = \frac{n^4 + o(n^4)}{n^4 + o(n^4)} \sim \frac{n^4}{n^4} = 1 \to 1$$

(iv) Consider the limit
$$\lim e^{-n}\left(7 + \frac{1}{n}\right)$$
By (8.58), we have
$$e^{-n}\left(7 + \frac{1}{n}\right) = e^{-n}(7 + o(1)) \sim 7e^{-n} \to 0$$
N

[34] For $0 \ne k \in \mathbb{R}$, we have $k + o(1) \sim k$. Indeed,
$$\frac{k + o(1)}{k} = 1 + \frac{o(1)}{k} \to 1$$

By (8.55), we have
$$\frac{y_n}{z_n} \sim \frac{x_n}{w_n} \iff \frac{z_n}{y_n} \sim \frac{w_n}{x_n} \qquad (8.60)$$
provided that the ratios are (eventually) well-defined and non-zero. Therefore, once we have established the asymptotic equivalence of the ratios $y_n / z_n$ and $x_n / w_n$, we "automatically" have also the asymptotic equivalence of their reciprocals $z_n / y_n$ and $w_n / x_n$.

Example 358 Consider the limit
$$\lim \frac{e^{5n} - n^7 - 4n^2 + 3n}{6^n + n^8 - n^4 + 5n^3}$$
By (8.59),
$$\frac{e^{5n} - n^7 - 4n^2 + 3n}{6^n + n^8 - n^4 + 5n^3} = \frac{e^{5n} + o(e^{5n})}{6^n + o(6^n)} \sim \left(\frac{e^5}{6}\right)^n \to +\infty$$
If, instead, we consider the reciprocal limit
$$\lim \frac{6^n + n^8 - n^4 + 5n^3}{e^{5n} - n^7 - 4n^2 + 3n}$$
then, by (8.60),
$$\frac{6^n + n^8 - n^4 + 5n^3}{e^{5n} - n^7 - 4n^2 + 3n} \sim \left(\frac{6}{e^5}\right)^n \to 0$$
N
In conclusion, a clever use of (8.58)-(8.59) often allows one to simplify substantially the calculation of limits. But, beyond calculations, they are conceptually illuminating relations.
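A numerical sketch (ours, not the book's) of Example 357(i): the full ratio tracks its asymptotic proxy $1/(2n)$.

```python
# Compare the ratio of Example 357(i) with 1/(2n).
for n in [10, 100, 1000, 10000]:
    r = (n**4 - 3*n**3 + 5*n**2 - 7) / (2*n**5 + 12*n**4 - 6*n**3 + 4*n + 1)
    print(f"n = {n:>5}:  ratio = {r:.6e},  1/(2n) = {1 / (2*n):.6e}")
```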

8.14.4 Characterization and decay


The next result establishes an enlightening characterization of asymptotic equivalence.

Proposition 359 We have
$$x_n \sim y_n \iff x_n = y_n + o(y_n)$$

In words, two sequences are asymptotic when they are equal, up to a component that is asymptotically negligible with respect to them. This result further clarifies how the relation $\sim$ can be seen as an asymptotic equality. Note that, in view of (8.56), we can equivalently write
$$x_n \sim y_n \iff x_n = y_n\left[1 + o(1)\right]$$

Proof "If." From $x_n = y_n + o(y_n)$ it follows that
$$\frac{x_n}{y_n} = \frac{y_n + o(y_n)}{y_n} = 1 + \frac{o(y_n)}{y_n} \to 1$$
"Only if." Let $x_n \sim y_n$. Setting $z_n = x_n - y_n$, one has that
$$\frac{z_n}{y_n} = \frac{x_n - y_n}{y_n} = \frac{x_n}{y_n} - 1 \to 0$$
and therefore $z_n = o(y_n)$.

The next result is a nice application of this characterization.

Proposition 360 Let $\{x_n\}$ be a sequence with terms eventually non-zero. Then
$$\frac{1}{n}\log|x_n| \to k \ne 0 \qquad (8.61)$$
if and only if $|x_n| = e^{kn + o(n)}$.

Proof "If." From $|x_n| = e^{kn + o(n)}$ it follows that
$$\frac{1}{n}\log|x_n| = \frac{1}{n}\log e^{kn + o(n)} = \frac{kn + o(n)}{n} \to k$$
"Only if." Set $z_n = \log|x_n|$. Since $k \ne 0$, from (8.61) it follows that $z_n / kn \to 1$, i.e., $z_n \sim kn$. From the previous proposition and Proposition 353-(iii) it follows that
$$|x_n| = e^{z_n} = e^{kn + o(kn)} = e^{kn + o(n)}$$
as claimed.

When $k < 0$, condition (8.61) characterizes the sequences that converge to zero at an exponential rate: in that case, we speak of exponential decay. When $k > 0$, there is instead an explosive exponential behavior.
8.14. ORDERS OF CONVERGENCE AND OF DIVERGENCE 251

8.14.5 Terminology
Due to its importance, for the comparison both of infinitesimal sequences and of divergent sequences there is a specific terminology. In particular:

(i) if two infinitesimal sequences $\{x_n\}$ and $\{y_n\}$ are such that $y_n = o(x_n)$, we say that the sequence $\{y_n\}$ is infinitesimal of higher order with respect to $\{x_n\}$;

(ii) if two divergent sequences $\{x_n\}$ and $\{y_n\}$ are such that $y_n = o(x_n)$, we say that the sequence $\{y_n\}$ is of lower order of infinity with respect to $\{x_n\}$.

In other words, a sequence is infinitesimal of higher order if it tends to zero faster, while it is of lower order of infinity if it tends to infinity more slowly. Beyond the terminology (which is not universal), it is important to keep in mind the idea of negligibility that lies at the basis of the relation $y_n = o(x_n)$.

8.14.6 Scales of infinities


Through the orders of convergence we can compare exponential sequences $\{\alpha^n\}$, power sequences $\{n^k\}$, and logarithmic sequences $\{\log^k n\}$, thus making precise the hierarchy (8.45) that we established with the ratio criterion.
First of all, observe that they are of infinite order when $\alpha > 1$ and $k > 0$, and infinitesimal when $0 < \alpha < 1$ and $k < 0$. Moreover, we have:

(i) If $\alpha > \beta > 0$, then $\beta^n = o(\alpha^n)$. Indeed, $\beta^n / \alpha^n = (\beta/\alpha)^n \to 0$.

(ii) $n^k = o(\alpha^n)$ for every $\alpha > 1$, as already proved with the ratio criterion. We have $\alpha^n = o(n^k)$ if, instead, $0 < \alpha < 1$ and $k < 0$.

(iii) If $k_1 > k_2$, then $n^{k_2} = o(n^{k_1})$. Indeed, $n^{k_2}/n^{k_1} = 1/n^{k_1 - k_2} \to 0$.

(iv) $\log^k n = o(n)$, as already proved with the ratio criterion.

(v) If $k_1 > k_2$, then $\log^{k_2} n = o(\log^{k_1} n)$. Indeed,
$$\frac{\log^{k_2} n}{\log^{k_1} n} = \frac{1}{\log^{k_1 - k_2} n} \to 0$$

The next lemma reports two important comparisons of infinities, which show that exponentials are of lower order of infinity than factorials (we omit the proof).

Lemma 361 We have $\alpha^n = o(n!)$, with $\alpha > 0$, and $n! = o(n^n)$.

Note that this implies, by Lemma 352, that $\alpha^n = o(n^n)$. Exponentials are, therefore, of lower order of infinity also compared with sequences of the type $n^n$.

The different orders of infinity and of infinitesimal are sometimes organized through scales. If we limit ourselves to the infinities (similar considerations hold for the infinitesimals), the most classic scale of infinities is the logarithmic-exponential one. Taking $x_n = n$ as the basis, we have the ascending scale
$$n,\; n^2,\; \dots,\; n^k,\; \dots,\; e^n,\; e^{2n},\; \dots,\; e^{kn},\; \dots,\; e^{n^2},\; \dots,\; e^{n^k},\; \dots,\; e^{e^n},\; \dots$$
and the descending scale
$$n,\; n^{\frac{1}{2}},\; \dots,\; n^{\frac{1}{k}},\; \dots,\; \log n,\; \sqrt{\log n},\; \dots,\; \sqrt[k]{\log n},\; \dots,\; \log\log n,\; \sqrt{\log\log n},\; \dots,\; \sqrt[k]{\log\log n},\; \dots$$
They provide "benchmarks" to calibrate the asymptotic behavior of a sequence $\{x_n\}$ that tends to infinity. For example, if $x_n \sim \log n$, the sequence $\{x_n\}$ is asymptotically logarithmic; if $x_n \sim n^2$, the sequence $\{x_n\}$ is asymptotically quadratic, and so on.
In applications one rarely considers orders of infinity higher than $e^{e^n}$ or lower than $\log\log n$. Indeed, $\log\log n$ has an almost imperceptible increase, being almost constant:

n:          10       10^2     10^3     10^4     10^5     10^6
log log n:  0.8340   1.5272   1.9326   2.2203   2.4435   2.6258          (8.62)

while $e^{e^n}$ increases explosively:

n:        3             4             5             6
e^(e^n):  5.2849 x 10^8   5.1484 x 10^23   2.8511 x 10^64   1.6103 x 10^175

The asymptotic behavior of divergent sequences that are relevant in applications usually ranges between the slowness of $\log\log n$ and the explosiveness of $e^{e^n}$. But, from a theoretical point of view, we can go well beyond them.[35] The study of the scales of infinity, pioneered by Paul du Bois-Reymond in the 1870s, is of great elegance (see Hardy, 1910).

[35] Although for brevity we omit the details, Lemma 361 shows that the logarithmic-exponential scale can be remarkably refined with orders of infinity of the type $n!$ and $n^n$.
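A small sketch (ours, not the book's) reproducing the flavor of table (8.62):

```python
import math

# The almost imperceptible growth of log log n.
for e in range(1, 7):
    n = 10 ** e
    print(f"n = 10^{e}:  log log n = {math.log(math.log(n)):.4f}")
```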

8.14.7 The De Moivre-Stirling formula


To better illustrate how little-o analysis works, we will present the De Moivre-Stirling formula. Besides being a quite surprising formula, it is also used in many theoretical and applied problems when dealing with the asymptotic behavior of $n!$.

Theorem 362 We have
$$\log n! = n\log n - n + o(n) = n\log n - n + \frac{1}{2}\log n + \log\sqrt{2\pi} + o(1)$$

Two approximations of $\log n!$ are thus established. The first one, which De Moivre came up with, is slightly less precise because it has an error term of order $o(n)$. The second approximation was given by Stirling and is more accurate (its error term is $o(1)$) but also more complex.[36]

[36] Since $o(1)/n \to 0$, a sequence which is $o(1)$ is also $o(n)$. For this reason, an error term of order $o(1)$ is better than one of order $o(n)$.

Proof We will only show the first equality. Set $x_n = n!/n^n$. We next show that
$$\lim \frac{x_{n+1}}{x_n} = \frac{1}{e}$$
Indeed, by Theorem 349,
$$\lim \frac{x_{n+1}}{x_n} = \lim \frac{(n+1)!}{(n+1)^{n+1}}\frac{n^n}{n!} = \lim \frac{n^n}{(n+1)^n} = \lim \frac{1}{\left(1 + \frac{1}{n}\right)^n} = \frac{1}{e}$$
From (10.18), we also have
$$\lim \sqrt[n]{x_n} = \lim \frac{\sqrt[n]{n!}}{n} = \frac{1}{e}$$
We thus conclude that $n/\sqrt[n]{n!} = e(1 + o(1))$, that is,
$$n! = n^n e^{-n}(1 + o(1))^{-n}$$
Hence, $\log n! = n\log n - n - n\log(1 + o(1))$. Since $\log(1 + a_n) \sim a_n$ as $a_n \to 0$, we have
$$n\log(1 + o(1)) \sim n\, o(1) = o(n)$$
as desired.

From the second equality we conclude that $n! = n^n e^{-n}\sqrt{2\pi n}\, e^{o(1)}$, and so
$$\frac{n!}{n^n e^{-n}\sqrt{2\pi n}} = e^{o(1)} \to 1$$
We thus obtain the following remarkable formula
$$n! \sim n^n e^{-n}\sqrt{2\pi n}$$
that allows us to elegantly conclude our asymptotic analysis of factorials.
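A numerical sketch (ours, not the book's): the ratio $n!/(n^n e^{-n}\sqrt{2\pi n})$ indeed tends to 1.

```python
import math

# Quality of the Stirling approximation n! ~ n^n e^{-n} sqrt(2 pi n).
for n in [5, 10, 50, 100]:
    stirling = n**n * math.exp(-n) * math.sqrt(2 * math.pi * n)
    print(f"n = {n:>3}:  n!/Stirling = {math.factorial(n) / stirling:.6f}")
```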

8.14.8 Distribution of prime numbers


The little-o notation was born, and was first used, at the end of the nineteenth century in the study of the distribution of prime numbers. We introduced prime numbers in Section 1.3, where we showed their "atomic" centrality among the other natural numbers by means of the Fundamental Theorem of Arithmetic. The existence of infinitely many prime numbers was also proven, thanks to a well-known theorem of Euclid, so that we can speak of the sequence of prime numbers $\{p_n\}$. Nevertheless, in Section 8.1 we noted that it is unfortunately not possible to describe such a sequence explicitly. This issue brought mathematicians to wonder about the distribution of prime numbers in $\mathbb{N}$. Let $\pi : \mathbb{N}_+ \to \mathbb{R}$ be the sequence whose $n$-th term $\pi(n)$ is the number of prime numbers that are less than or equal to $n$. For example:

n:      1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
π(n):   0  1  2  2  3  3  4  4  4  4   5   5   6   6   6

It is, of course, not possible to fully describe the sequence $\pi$, as this would be equivalent to describing the sequence of prime numbers, which we have argued to be hopeless (at least, operationally). Nevertheless, we can still ask ourselves whether there is a sequence $\{x_n\}$ that is described in closed form and is asymptotically equal to $\pi$. In other words, the question is whether we can find a reasonably simple sequence that asymptotically approximates $\pi$ well enough.
Around the year 1800, Gauss and Legendre noted independently that the sequence $\{n/\log n\}$ approximates $\pi$ well, as we can check by inspecting the following table:

n       π(n)                         n/log n                      π(n)/(n/log n)
10      4                            4.3                          0.921
10^2    25                           21.7                         1.151
10^3    168                          145                          1.161
10^4    1,229                        1,086                        1.132
10^5    9,592                        8,686                        1.104
10^10   455,052,511                  434,294,482                  1.048
10^15   29,844,570,422,669           28,952,965,460,217           1.031
10^20   2,220,819,602,560,918,840    2,171,472,409,516,250,000    1.023

This suggests that the ratio
$$\frac{\pi(n)}{n/\log n}$$
becomes closer and closer to 1 as $n$ increases. Gauss and Legendre conjectured that this was so because $\pi$ is asymptotically equal to $\{n/\log n\}$. Their conjecture remained open for about a century, until it was proven to be true, independently, in 1896 by two great mathematicians, Jacques Hadamard and Charles de la Vallée Poussin. The importance of such a result is testified by its name, which is as simple as it is demanding.[37]
Theorem 363 (Prime Number Theorem) It holds that
$$\pi(n) \sim \frac{n}{\log n}$$

[37] The proof of this theorem requires complex analysis methods which we do not cover in this book. The use of complex analysis in the study of prime numbers is due to a deep insight of Bernhard Riemann. Only in 1949 were two outstanding mathematicians, Paul Erdős and Atle Selberg, able to prove this result using real analysis methods. We refer readers to Ivic (1985) for a comprehensive study of this topic, with all relevant references.

Although we are not able to describe the sequence $\pi$, thanks to the Prime Number Theorem we can say that its asymptotic behavior is similar to that of the simple sequence $\{n/\log n\}$: the number $\pi(n) - \pi(m)$ of primes in any given interval of natural numbers $[m, n]$ is approximately
$$\frac{n}{\log n} - \frac{m}{\log m}$$
with increasing accuracy. In particular, by Lemma 355-(iii) we have
$$\frac{\pi(n)}{n} \sim \frac{1}{\log n} \to 0$$

Most natural numbers are thus not prime and, as $n$ gets larger, prime numbers become less and less frequent.
The Prime Number Theorem is a wonderful result which, while undoubtedly having a statistical "flavor", is incredibly elegant. Even more so if we consider its following remarkable consequence, which relies on the simple observation that $\pi(p_n) = n$.

Theorem 364 It holds that
$$p_n \sim n\log n \qquad (8.63)$$

The sequence of prime numbers $\{p_n\}$ is thus asymptotically equivalent to $\{n\log n\}$. The $n$-th prime number's value is, approximately, $n\log n$. For example, by inspecting a prime number table one can see that for $n = 100$ one has $p_n = 541$, while its "estimate" is $n\log n = 460$ (rounding down). Similarly:

n              p_n             n log n         p_n/(n log n)
100            541             460             1.1761
1,000          7,919           6,907           1.1465
10,000         104,729         92,104          1.1371
100,000        1,299,709       1,151,292       1.1289
1,000,000      15,485,863      13,815,510      1.1209
10,000,000     179,424,673     161,180,956     1.1132
100,000,000    2,038,074,743   1,842,068,074   1.1064
1,000,000,000  22,801,763,489  20,723,265,836  1.1003

One can see that the ratio between $p_n$ and its estimate $n\log n$ stays steadily around 1.
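As a small illustration (ours, not the book's), one can tabulate $\pi(n)$ with a basic sieve of Eratosthenes and compare it against $n/\log n$:

```python
import math

# Count primes up to n with a sieve and compare with n/log n.
def pi(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return sum(sieve)

for n in [10**3, 10**4, 10**5, 10**6]:
    count, approx = pi(n), n / math.log(n)
    print(f"n = {n:>7}:  pi(n) = {count:>6},  n/log n = {approx:9.1f},  ratio = {count / approx:.3f}")
```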

Proof From the Prime Number Theorem one has that
$$\pi(n)\frac{\log n}{n} \to 1$$
Hence, for any $\varepsilon > 0$, there is $\bar{n}_\varepsilon$ such that
$$\left|\pi(n)\frac{\log n}{n} - 1\right| < \varepsilon \qquad \forall n \ge \bar{n}_\varepsilon \qquad (8.64)$$
Since $p_n \to \infty$, there is $n_\varepsilon$ such that $p_n \ge \bar{n}_\varepsilon$ for every $n \ge n_\varepsilon$. Hence, (8.64) implies that
$$\left|\pi(p_n)\frac{\log p_n}{p_n} - 1\right| < \varepsilon \qquad \forall n \ge n_\varepsilon$$
At the same time, one has that $\pi(p_n) = n$, so that
$$\left|n\frac{\log p_n}{p_n} - 1\right| < \varepsilon \qquad \forall n \ge n_\varepsilon$$
that is,
$$n\frac{\log p_n}{p_n} \to 1 \qquad (8.65)$$
from which it follows that
$$\log\left(n\frac{\log p_n}{p_n}\right) \to \log 1 = 0$$
or, $\log n + \log\log p_n - \log p_n \to 0$. Since $\log p_n \to +\infty$,
$$\frac{\log n}{\log p_n} + \frac{\log\log p_n}{\log p_n} - 1 = \frac{\log n + \log\log p_n - \log p_n}{\log p_n} \to 0$$
Yet, $\log\log p_n / \log p_n \to 0$ (can you explain why?), and so
$$\frac{\log n}{\log p_n} \to 1$$
Multiplying by (8.65), we get that
$$\frac{n\log n}{p_n} = n\frac{\log p_n}{p_n}\cdot\frac{\log n}{\log p_n} \to 1$$
thus showing that (8.63) holds.

Observe that we obtained $p_n \sim n\log n$ as a consequence of the following nice relation
$$\log p_n \sim \log n \qquad (8.66)$$

O.R. Counting objects is one of the most basic activities common across cultures, arguably the most universal one: counting emerges as soon as similar, yet distinct, entities come up (cf. Section 7.4). If so, the identification of prime numbers, the atoms of numbers, can be viewed as an important step in the evolution of a civilization. Indeed, their study emerged in the Greek world, which also marked the emergence of reason (Section 1.8). The depth with which a civilization studies prime numbers is, then, a possible universal benchmark to assess its degree of evolution. On this scale, the Prime Number Theorem is some of the best evidence of its evolution that mankind can offer when going where no one has gone before (unless sure of their intentions, it is better not to meet civilizations that have found the closed form of the sequence of primes). H

8.15 Convergence rate


In this section we study the rate of convergence, a key topic in applications. To this end, we first introduce a further important asymptotic relation.

8.15.1 Big-O
Definition 365 Given two sequences $\{x_n\}$ and $\{y_n\}$, if there exists $c > 0$ such that, eventually,
$$|y_n| \le c|x_n|$$
we say that $\{y_n\}$ is asymptotically dominated by $\{x_n\}$. In symbols, $y_n = O(x_n)$.

Note that the value of the constant $c$ is left unspecified: it only matters that such a constant exists.

Example 366 (i) We have $n = O(n^2)$ and $1/n^2 = O(1/n)$. Indeed,
$$\frac{n}{n^2} = \frac{1}{n} \le 1 \quad\text{and}\quad \frac{1/n^2}{1/n} = \frac{1}{n} \le 1$$
for all $n \ge 1$. (ii) We have $3n + (-1)^n n^2 = O(n^2)$. Indeed,
$$\frac{\left|3n + (-1)^n n^2\right|}{n^2} \le \frac{3n + n^2}{n^2} = \frac{3}{n} + 1 \to 1$$
So, if we take any $c > 1$ we have eventually $\left|3n + (-1)^n n^2\right| \le cn^2$. (iii) We have $y_n = O(1)$ when there exists $c > 0$ such that, eventually, $|y_n| \le c$. So, a sequence is $O(1)$ if and only if it is bounded. N

Terminology The expression $y_n = O(x_n)$ reads "$\{y_n\}$ is big-O of $\{x_n\}$".

The relation O is easily seen to be transitive, i.e., $z_n = O(y_n)$ and $y_n = O(x_n)$ imply $z_n = O(x_n)$. It can be manipulated with an algebra similar to that of little-o.

Proposition 367 For every pair of sequences $\{x_n\}$ and $\{y_n\}$ and for every scalar $c \ne 0$, it holds that:

(i) $O(x_n) + O(x_n) = O(x_n)$;

(ii) $O(x_n)O(y_n) = O(x_n y_n)$;

(iii) $c \cdot O(x_n) = O(x_n)$;

(iv) $O(y_n) + O(x_n) = O(x_n)$ if $y_n = O(x_n)$.

Proof (i) Let $y_n = O(x_n)$ and $y'_n = O(x_n)$. By definition, there exist $c, c' > 0$ such that eventually $|y_n| \le c|x_n|$ and $|y'_n| \le c'|x_n|$. Then
$$|y_n + y'_n| \le |y_n| + |y'_n| \le c|x_n| + c'|x_n| = (c + c')|x_n|$$
So, $y_n + y'_n = O(x_n)$, as desired.
(ii) Let $z_n = O(x_n)$ and $z'_n = O(y_n)$. By definition, there exist $c, c' > 0$ such that eventually $|z_n| \le c|x_n|$ and $|z'_n| \le c'|y_n|$. Then
$$|z_n z'_n| \le cc'|x_n||y_n| = cc'|x_n y_n|$$
So, $z_n z'_n = O(x_n y_n)$, as desired.
(iii) is obvious. (iv) Let $z_n = O(x_n)$ and $z'_n = O(y_n)$, with $y_n = O(x_n)$. By definition, there exist $c, c' > 0$ such that eventually $|z_n| \le c|x_n|$ and $|z'_n| \le c'|y_n|$. So, being $y_n = O(x_n)$, there exists $c'' > 0$ such that eventually
$$|z_n + z'_n| \le |z_n| + |z'_n| \le c|x_n| + c'|y_n| \le c|x_n| + c'c''|x_n| = (c + c'c'')|x_n|$$
as desired.

The next lemma relates the relations $\sim$, o, and O.

Proposition 368 (i) $y_n \sim x_n$ implies $y_n = O(x_n)$.

(ii) $y_n = o(x_n)$ implies $y_n = O(x_n)$.

Conditions $y_n \sim x_n$ and $y_n = o(x_n)$ are thus sufficient for $y_n = O(x_n)$, but not necessary. A trivial example is $y_n = (-1)^n$ and $x_n = 1$: the limit of $y_n / x_n$ does not exist, so neither $y_n \sim x_n$ nor $y_n = o(x_n)$, but we still have $y_n = O(x_n)$.

Proof (i) If $y_n \sim x_n$ then $|y_n|/|x_n| = |y_n / x_n| \to 1$, so if we take any $c > 1$ we have eventually $|y_n| \le c|x_n|$. (ii) If $y_n = o(x_n)$ then $|y_n|/|x_n| = |y_n / x_n| \to 0$, so, if we take any $c > 0$ we have eventually $|y_n| \le c|x_n|$.

Next we give another useful result.

Proposition 369 We have both $y_n = O(x_n)$ and $x_n = O(y_n)$ if and only if there exists $c > 0$ such that, eventually,
$$\frac{1}{c}|x_n| \le |y_n| \le c|x_n| \qquad (8.67)$$

Proof If $y_n = O(x_n)$ there exists $c' > 0$ such that, eventually, $|y_n| \le c'|x_n|$, while if $x_n = O(y_n)$ there exists $c'' > 0$ such that, eventually, $|x_n| \le c''|y_n|$. By setting $c = \max\{c', c''\}$, we get (8.67). The converse is immediate.

Jointly, the relations $y_n = O(x_n)$ and $x_n = O(y_n)$ thus generalize the relation $\asymp$ of comparability between sequences, which requires the convergence of the sequence of ratios $y_n / x_n$. When such a convergence occurs, by the last result $y_n \asymp x_n$ is equivalent to having both $y_n = O(x_n)$ and $x_n = O(y_n)$. But, lacking such a convergence, the relations $y_n = O(x_n)$ and $x_n = O(y_n)$ still amount to (8.67), which is a form of comparability between two sequences, though rougher than $y_n \asymp x_n$.

Definition 370 Given two sequences $\{x_n\}$ and $\{y_n\}$, if $y_n = O(x_n)$ and $x_n = O(y_n)$, we say that $\{y_n\}$ is of the same order as (or comparable with) $\{x_n\}$. In symbols, $y_n \asymp x_n$.

This definition of comparability is fully consistent with the previous Definition 351, whose scope it enlarges because the convergence of the ratios $y_n / x_n$ is no longer required. This is why we kept the notation $\asymp$.

Example 371 (i) The sequences with terms $x_n = 1$ and $y_n = (-1)^n$ are comparable: take $c = 1$ in (8.67). (ii) The sequences with terms $x_n = n(2 + \sin n)$ and $y_n = n$ are comparable: take $c = 3$ in (8.67). N

If, with an analogy, we think of $\sim$ as an equality $=$ and of little-o as a strict inequality $>$, big-O is then a kind of weak inequality $\ge$.

8.15.2 Convergence rate


The fact that a sequence $\{x_n\}$ converges to a limit $x$ is (often) good news, yet in applications it is often important to have an evaluation of the rate at which such convergence occurs. For instance, assume that $x$ is a sought-after unknown solution of an equation that we need to solve (e.g., a first-order condition of some optimization problem) and that the sequence $\{x_n\}$ is the outcome of a solution procedure that, by converging to $x$, would allow us to find the solution. The rate of convergence tells us how fast the sequence gets close to the desired solution, so how good the underlying solution procedure is. Indeed, better solution procedures generate sequences that converge faster to a solution.
Consider the sequence $\{e_n\}$ of errors
$$e_n = x - x_n$$
By Proposition 306, the sequence $\{x_n\}$ converges to $x$ if and only if the errors vanish, i.e., $e_n \to 0$. To wonder about the rate of convergence of $x_n$ to $x$ thus amounts to wondering about the rate of convergence to 0 of the errors. This observation motivates the following definition.

Definition 372 A sequence $\{x_n\}$ converges to $x$ with rate of order $k$ if
$$e_n = O\left(\frac{1}{n^k}\right)$$

The condition $e_n = O(1/n^k)$ amounts to requiring $|x_n - x| = O(1/n^k)$. Clearly, any further piece of information on the errors is welcome, for instance if it permits saying that $e_n = o(1/n^k)$ or, even better, that $e_n \asymp 1/n^k$. Yet, often establishing that $e_n = O(1/n^k)$ is the best one can do. Note that two convergent sequences have the same convergence rate if and only if their error sequences are comparable (in the sense of Definition 370).

Example 373 The two sequences
$$x_n = \pi + \frac{5}{n} + (-1)^n\frac{7}{n} + \frac{7}{n^2} \qquad\text{and}\qquad x'_n = \pi + \frac{20}{n^2 + 10}$$
converge to $\pi$ with rates of order 1 and 2, respectively. Indeed, $e_n = O(1/n)$ and $e'_n = O(1/n^2)$. N
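A numerical sketch (ours, not the book's) of the two error decays, using the sequences of Example 373 as reconstructed above: $n \cdot e_n$ and $n^2 \cdot e'_n$ both stay bounded.

```python
# e_n = O(1/n) means n*e_n is bounded; e'_n = O(1/n^2) means n^2*e'_n is bounded.
for n in [10, 100, 1000, 10000]:
    e1 = abs(5/n + (-1)**n * 7/n + 7/n**2)
    e2 = 20 / (n**2 + 10)
    print(f"n = {n:>5}:  n*e_n = {n * e1:.3f},  n^2*e'_n = {n**2 * e2:.3f}")
```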

The higher $k$ is, the faster the error vanishes, so the better the convergence is. For instance, an $O(1/n^2)$ error is substantially better than an $O(1/n)$ error. So, a solution procedure that generates a sequence converging to the solution with rate of order 2 is substantially better than an alternative solution procedure that converges to the solution with rate of order 1.
Section 37.7 will best illustrate the contents of this section. We close by noting that, though the powers $1/n^k$ form a natural scale of infinitesimals (dual to the scales of infinities studied before), one could also use other scales, such as logarithmic and exponential ones.

8.16 Sequences in R^n

We close the chapter by considering sequences $\{x^k\}$ of vectors in $\mathbb{R}^n$. For them we give a definition of limit that follows closely the one already given for sequences in $\mathbb{R}$. The fundamental difference is that each element of the sequence is now a vector $x^k = (x_1^k, x_2^k, \dots, x_n^k) \in \mathbb{R}^n$ and not a scalar.

Definition 374 We say that the sequence $\{x^k\}$ in $\mathbb{R}^n$ tends to $L \in \mathbb{R}^n$, in symbols $x^k \to L$ or $\lim x^k = L$, if for every $\varepsilon > 0$ there exists $n_\varepsilon \ge 1$ such that
$$k \ge n_\varepsilon \implies \|x^k - L\| < \varepsilon$$

In other words, $x^k = (x_1^k, x_2^k, \dots, x_n^k) \to L = (L_1, L_2, \dots, L_n)$ if the scalar sequence of distances $\{\|x^k - L\|\}$ converges to zero (cf. Proposition 306). Since
$$\|x^k - L\| = \sqrt{\sum_{i=1}^n \left(x_i^k - L_i\right)^2}$$
we see immediately that
$$\|x^k - L\| \to 0 \iff \left|x_i^k - L_i\right| \to 0 \quad \forall i = 1, 2, \dots, n \qquad (8.68)$$
That is, $x^k \to L$ if and only if the scalar sequences $\{x_i^k\}$ of the $i$-th components converge to the components $L_i$ of the vector $L$. The convergence of a sequence of vectors, therefore, reduces to the convergence of the sequences of its single components. So, it is a componentwise notion of convergence that, as such, does not present any significant novelty relative to the scalar case.

N.B. A sequence in $\mathbb{R}^n$ may be regarded as the restriction to $\mathbb{N}_+$ of a vector function $f : \mathbb{R} \to \mathbb{R}^n$. O

Example 375 Consider the sequence
$$x^k = \left(1 + \frac{1}{k},\; \frac{1}{k^2},\; \frac{2k + 3}{5k - 7}\right)$$
in $\mathbb{R}^3$. Since
$$1 + \frac{1}{k} \to 1, \qquad \frac{1}{k^2} \to 0, \qquad \frac{2k + 3}{5k - 7} \to \frac{2}{5}$$
the sequence converges to the vector $(1, 0, 2/5)$. N
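A numerical sketch (ours, not the book's) of Example 375: the distance $\|x^k - L\|$ vanishes, as componentwise convergence requires.

```python
import math

def x(k):
    return (1 + 1/k, 1 / k**2, (2*k + 3) / (5*k - 7))

L = (1.0, 0.0, 2/5)
for k in [10, 100, 10000]:
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x(k), L)))
    print(f"k = {k:>5}:  ||x^k - L|| = {dist:.2e}")
```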

Notation Sequences of vectors are denoted with a superscript, $\{x^k\}$, instead of a subscript, $\{x_n\}$, to avoid confusion with the dimension $n$ of the space $\mathbb{R}^n$ and to be able to indicate the single components $x_i^k$ of each vector $x^k$ of the sequence.
Chapter 9

Series (sdoganato)

9.1 The concept


The idea that we want to develop here is, roughly, the possibility of summing infinitely many addends. Imagine a stick 1 meter long and cut it in half, obtaining in this way two pieces 1/2 meter long; then cut the second piece in half, obtaining two pieces 1/4 meter long; cut again the second piece, obtaining two pieces 1/8 meter long, and continue, without ever stopping. This cutting process results in infinitely many pieces of length 1/2, 1/4, 1/8, ... into which the original stick of 1 meter has been divided. It is rather natural to imagine that
$$\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots + \frac{1}{2^n} + \cdots = 1 \qquad (9.1)$$
i.e., that, by reassembling the individual pieces, we get back the original stick.
In this chapter we will give a precise meaning to equalities like (9.1). Consider, therefore, a sequence $\{x_n\}$ and suppose that we want to "sum" all its terms, i.e., to carry out the operation
$$x_1 + x_2 + \cdots + x_n + \cdots = \sum_{n=1}^{\infty} x_n$$
To make rigorous this new operation of "addition of infinitely many summands", which is different from ordinary addition (as we will realize), we will sum a finite number of terms, say $n$, then let $n$ tend to infinity and take the resulting limit, if it exists, as the value to assign to the series. We are, therefore, thinking of constructing a new sequence $\{s_n\}$ defined by
$$s_1 = x_1 \qquad (9.2)$$
$$s_2 = x_1 + x_2$$
$$s_3 = x_1 + x_2 + x_3$$
$$\vdots$$
$$s_n = x_1 + \cdots + x_n$$
and of taking the limit of $\{s_n\}$ as the sum of the series. Formally:


Definition 376 The series with terms given by a sequence $\{x_n\}$ of scalars, in symbols $\sum_{n=1}^{\infty} x_n$, is the sequence $\{s_n\}$ defined in (9.2). The terms $s_n$ of the sequence are called partial sums of the series.

The series $\sum_{n=1}^{\infty} x_n$ is therefore defined as the sequence $\{s_n\}$ of the partial sums (9.2). Its limit behavior determines its value;[1] in particular, a series $\sum_{n=1}^{\infty} x_n$ is:

(i) convergent, with sum $S$, in symbols $\sum_{n=1}^{\infty} x_n = S$, if $\lim s_n = S \in \mathbb{R}$;

(ii) positively divergent, in symbols $\sum_{n=1}^{\infty} x_n = +\infty$, if $\lim s_n = +\infty$;

(iii) negatively divergent, in symbols $\sum_{n=1}^{\infty} x_n = -\infty$, if $\lim s_n = -\infty$;

(iv) irregular (or oscillating) if the sequence $\{s_n\}$ is irregular.

We thus attribute to the series the same character (convergence, divergence, or irregularity) as that of its sequence of partial sums.[2]

Partial sums can be defined recursively by
$$\begin{cases} s_1 = x_1 \\ s_n = s_{n-1} + x_n & \text{for } n \ge 2 \end{cases} \qquad (9.3)$$
This formulation can be operationally useful to construct partial sums through a guess and verify procedure: we first posit a candidate expression for the partial sum, which we then verify by induction. At the end of Example 379 we will illustrate this procedure. However, as little birds suggesting guesses are often not around, the main interest of this recursive formulation is, ultimately, theoretical, in that it further clarifies that a series is nothing but a new sequence constructed from an existing one. Indeed, given a sequence $\{x_n\}$, the recursion (9.3) defines the sequence of partial sums $\{s_n\}$. It is this recursion that, thus, underlies the notion of series.[3]

O.R. Sometimes it is useful to start the series with the index n = 0 rather than from n = 1.
When the option exists (we will see that this is not the case for some types of series, like the
harmonic series, which cannot be defined for n = 0), the choice to start a series from either
n = 0 or n = 1 (or from another value of n) is a pure matter of convenience (as it was for
sequences). Actually, one can start the series from any k in N. The context itself typically
suggests the best choice. In any case, this choice does not alter the character of the series
and, therefore, it does not affect the problem of determining whether the series converges or
not. H
1 We thus resorted to a limit, that is, to a notion of potential infinity. On the other hand, we cannot really sum infinitely many summands: all the paper in the world would not suffice, nor would our entire life (and, by the way, we would not know where to put the line that one traditionally writes under the summands before adding them).
2 Using the terminology already employed for sequences, a series is sometimes called regular when it is not irregular, that is, when one of the cases (i)-(iii) holds. The systematic study of series began with the works of Abel, Bolzano, and Cauchy in the first part of the nineteenth century.
3 As we will see later in the book, we can nicely express this recursion in the difference form (10.4).

O.R. The variables $x_n$ and $s_n$ in recursion (9.3) can be interpreted as "flow" and "stock"
variables, respectively. If $x_n$ is the n-th amount of water that we decide to pour into our
bathtub and $s_{n-1}$ is the amount of water that we already poured there in the first $n-1$
pourings, then $s_n$ is the resulting amount of water in the bathtub after n pourings. If $x_n$
is the number of kilometers that we decide to travel on the n-th day of our trip and $s_{n-1}$ is
how many kilometers we travelled in the first $n-1$ days of the trip, then $s_n$ is the resulting
number of kilometers travelled after n days. The values of the stock variables thus keep track of the
accumulation determined by the values of the flow variables.
Flows are typically controlled by some decision maker: an individual, a government, a
company, and the like. For this reason, the flow variable $x_n$ is often called a control variable.
The stock variable $s_n$ is, instead, called a state variable: it describes the "state" of the flows'
accumulation. So, (9.3) can be viewed as a "controlled recursion", in which a sequence of
controls $\{x_n\}$ determines, via recursion (9.3), a sequence of states $\{s_n\}$. H

9.1.1 Three classic series

We illustrate the previous notions with three important series (and an Epicurus piece).

Example 377 (Mengoli series) The Mengoli series is given by:
$$\frac{1}{1\cdot 2} + \frac{1}{2\cdot 3} + \cdots + \frac{1}{n(n+1)} + \cdots = \sum_{n=1}^{\infty}\frac{1}{n(n+1)}$$
Since
$$\frac{1}{n(n+1)} = \frac{1}{n} - \frac{1}{n+1}$$
one has that
$$s_n = \frac{1}{1\cdot 2} + \frac{1}{2\cdot 3} + \cdots + \frac{1}{n(n+1)} = 1 - \frac{1}{2} + \frac{1}{2} - \frac{1}{3} + \frac{1}{3} - \frac{1}{4} + \cdots + \frac{1}{n} - \frac{1}{n+1} = 1 - \frac{1}{n+1} \to 1$$
Therefore,
$$\sum_{n=1}^{\infty}\frac{1}{n(n+1)} = 1$$
So, the Mengoli series converges and has sum 1. N
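As a quick numerical check of the telescoping argument (again a sketch of our own), the partial sum computed directly agrees with the closed form $1 - 1/(n+1)$:

    n = 1000
    direct = sum(1 / (k * (k + 1)) for k in range(1, n + 1))
    closed = 1 - 1 / (n + 1)
    print(direct, closed)  # both approximately 0.999001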

Example 378 (Harmonic series) The harmonic series adds up the unit fractions:
$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} + \cdots = \sum_{n=1}^{\infty}\frac{1}{n}$$
The recursion (9.3) for its partial sums is
$$\begin{cases} s_1 = 1 \\ s_n = s_{n-1} + \frac{1}{n} & \text{for } n \ge 2 \end{cases}$$

In particular, the partial sums $s_n$ whose index n is a power of 2, i.e., $n = 2^k$, satisfy:
$$s_1 = 1, \qquad s_2 = 1 + \frac{1}{2}$$
$$s_4 = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} > 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{4} = s_2 + \frac{1}{2} = 1 + 2\cdot\frac{1}{2}$$
$$s_8 = s_4 + \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} > s_4 + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} = s_4 + \frac{1}{2} > 1 + 3\cdot\frac{1}{2}$$
By continuing in this way we see that
$$s_{2^k} > 1 + k\cdot\frac{1}{2} \tag{9.4}$$
The sequence of partial sums is strictly increasing (since the summands are all positive) and
so it admits a limit; the inequality (9.4) guarantees that it is unbounded above and therefore
$\lim s_n = +\infty$. Hence,
$$\sum_{n=1}^{\infty}\frac{1}{n} = +\infty$$
i.e., the harmonic series diverges positively.4 N
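The bound (9.4) can be checked numerically; a small sketch of ours compares $s_{2^k}$ with $1 + k/2$ and shows how slowly the partial sums grow:

    def harmonic(n):
        return sum(1.0 / k for k in range(1, n + 1))

    for k in [1, 5, 10, 20]:
        n = 2 ** k
        print(k, harmonic(n), 1 + k / 2)  # s_{2^k} always exceeds 1 + k/2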

Example 379 (Geometric series) The geometric series with ratio q is defined by:
$$1 + q + q^2 + q^3 + \cdots + q^n + \cdots = \sum_{n=0}^{\infty} q^n$$
Its character depends on the value of q. In particular, we have that:
$$\sum_{n=0}^{\infty} q^n = \begin{cases} +\infty & \text{if } q \ge 1 \\ \dfrac{1}{1-q} & \text{if } |q| < 1 \\ \text{irregular} & \text{if } q \le -1 \end{cases}$$
To verify this, we start by observing that when q = 1 we have
$$s_n = \underbrace{1 + 1 + \cdots + 1}_{n+1 \text{ times}} = n + 1 \to +\infty$$
Let now $q \neq 1$. Since
$$s_n - q s_n = \left(1 + q + q^2 + \cdots + q^n\right) - \left(q + q^2 + \cdots + q^{n+1}\right) = 1 - q^{n+1}$$
we have
$$(1-q)\,s_n = 1 - q^{n+1}$$
and therefore, since $q \neq 1$,
$$s_n = \frac{1 - q^{n+1}}{1-q}$$
4 In Appendix E.2, we present another proof of the divergence of the harmonic series, due to Pietro Mengoli.

It follows that
$$\sum_{n=0}^{\infty} q^n = \lim_{n\to\infty}\frac{1 - q^{n+1}}{1-q}$$
The study of this limit is divided into several cases:

(i) if $-1 < q < 1$, we have $q^{n+1} \to 0$ and so
$$s_n \to \frac{1}{1-q}$$

(ii) if $q > 1$, we have $q^{n+1} \to +\infty$ and so $s_n \to +\infty$;

(iii) if $q = -1$, the partial sums of odd order are equal to zero, while those of even order
are equal to 1. The sequence formed by them is hence irregular;

(iv) if $q < -1$, the sequence $q^{n+1}$ is irregular and, therefore, so is $\{s_n\}$ as well.

To conclude the study of the geometric series, observe that we can use the recursive
definition of partial sums (9.3) to guess and verify (by induction) the partial sums
of the geometric series. The, highly inspired, guess is that
$$s_n = \frac{1 - q^{n+1}}{1-q}$$
We verify the guess by induction. For n = 0 it is trivially true. Assume it is true for n
(induction hypothesis). Then
$$s_{n+1} = s_n + q^{n+1} = \frac{1-q^{n+1}}{1-q} + q^{n+1} = \frac{1 - q^{n+1} + q^{n+1} - q^{n+2}}{1-q} = \frac{1 - q^{(n+1)+1}}{1-q}$$
as desired.5 N
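A minimal sketch of our own that verifies the guessed formula $s_n = (1-q^{n+1})/(1-q)$ against the recursion (9.3) for a sample ratio:

    q, s = 0.5, 0.0
    for n in range(20):
        s += q ** n                            # recursion: add the term q^n
        closed = (1 - q ** (n + 1)) / (1 - q)  # guessed closed form
        assert abs(s - closed) < 1e-12
    print(s, 1 / (1 - q))  # the partial sums approach 1/(1-q) = 2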

Epicurus in a letter to Herodotus wrote "Once one says that there are infinite parts in
a body or parts of any degree of smallness, it is not possible to conceive how this should
be, and indeed how could the body any longer be limited in size?" The previous examples
show that, indeed, if these "parts", these particles, have a strictly positive, but different, size
(for example either $1/n(n+1)$ or $q^n$, with $q \in (0,1)$) then the series might converge, so
the size of the "body" can be defined. Nevertheless, Epicurus was right in the sense that, if
we assume, as it seems he does too, that all the particles have the same size, no matter how
small, then the series
$$\varepsilon + \varepsilon + \varepsilon + \cdots + \varepsilon + \cdots$$
positively diverges. That is, $\sum_{n=1}^{\infty}\varepsilon = +\infty$ for every $\varepsilon > 0$. Indeed, for the partial sums
we have $s_n = n\varepsilon \to +\infty$. This simple series thus helps to clarify the import of Epicurus'
argument (properties of series have often been used, even within philosophy, to try to clarify
the nature of the potential infinite).
5 Sometimes it is convenient to start a geometric series at n = 1. In this case $\sum_{n=1}^{\infty} q^n = \sum_{n=0}^{\infty} q^n - 1 = q/(1-q)$.

9.1.2 Sub specie aeternitatis: infinite horizon

Series are important in economics. For example, let us go back to the intertemporal choices
introduced in Section 8.3. We saw that a consumption stream can be represented by a
sequence
$$x = \{x_1, x_2, \ldots, x_t, \ldots\}$$
and can be evaluated by an intertemporal utility function $U : \mathbb{R}^{\infty}_+ \to \mathbb{R}$. In particular, we
mentioned the discounted U given by
$$U(x) = u(x_1) + \beta u(x_2) + \cdots + \beta^{t-1} u(x_t) + \cdots \tag{9.5}$$
where $\beta \in (0,1)$ is the subjective discount factor. In view of what we have just seen, (9.5) is
the series
$$\sum_{t=1}^{\infty}\beta^{t-1} u(x_t) \tag{9.6}$$
Series thus give a rigorous meaning to the fundamental discounted form (9.5) of intertemporal utility functions. Naturally, we are interested in the case in which the series (9.6)
is convergent, so that the overall utility that the consumer gets from a stream is finite.
Otherwise, how could we compare, hence choose among, streams if they have infinite utility?
Using the properties of the geometric series, we will show momentarily, in Example 392,
that the series (9.6) converges if $\beta < 1$, provided that the utility function u is positive and
bounded. In such a case, the intertemporal utility function
$$U(x) = \sum_{t=1}^{\infty}\beta^{t-1} u(x_t) \tag{9.7}$$
has as domain the entire space $\mathbb{R}^{\infty}_+$, that is, $U(x) \in [0,\infty)$ for every $x \in \mathbb{R}^{\infty}_+$. We can thus
compare all possible consumption streams.
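For a concrete feel, here is a small sketch of our own that evaluates a truncated version of (9.7); the bounded utility $u(x) = x/(1+x)$ and the discount factor 0.95 are arbitrary illustrative choices, not taken from the text:

    beta = 0.95
    u = lambda x: x / (1 + x)   # positive and bounded above by 1

    def discounted_utility(stream):
        # truncated version of the series (9.7)
        return sum(beta ** (t - 1) * u(x) for t, x in enumerate(stream, start=1))

    # constant consumption x_t = 1 over 1000 periods
    print(discounted_utility([1.0] * 1000))  # close to u(1)/(1 - beta) = 10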

9.2 Basic properties

Given that the character of a series is determined by the character of the sequence of its
partial sums, it is evident that subtracting, adding, or modifying a finite number of terms of
a series does not change its character. In contrast, its sum might well change. For instance,
$\sum_{n=1}^{\infty} x_n$ has the same character, but not the same sum, as $\sum_{n=k}^{\infty} x_n$ for every integer $k > 1$.
As to the fundamental operations, we have
$$\sum_{n=1}^{\infty} c\,x_n = c\sum_{n=1}^{\infty} x_n \qquad \forall c \in \mathbb{R}$$
and
$$\sum_{n=1}^{\infty}(x_n + y_n) = \sum_{n=1}^{\infty} x_n + \sum_{n=1}^{\infty} y_n$$
when we do not fall into an indeterminate form $0\cdot\infty$ or $\infty - \infty$, respectively.
The next result is simple, yet important. If a series converges, then its terms necessarily
tend to 0: summands must eventually vanish to avoid having an exploding sum (memento
Epicurus).

Theorem 380 If the series $\sum_{n=1}^{\infty} x_n$ converges, then $x_n \to 0$.

Proof Clearly, we have $x_n = s_n - s_{n-1}$ and, given that the series converges, $s_n \to S$ as well
as $s_{n-1} \to S$. Therefore, $x_n = s_n - s_{n-1} \to S - S = 0$.

Convergence to zero of the sequence $\{x_n\}$ is, therefore, a necessary condition for the convergence of its series. This condition is only necessary: even though $1/n \to 0$, the harmonic
series $\sum_{n=1}^{\infty} 1/n$ diverges.

Example 381 The series with term
$$x_n = \frac{2n^2 - 3n + 4}{17n^2 + 4n + 5}$$
is not convergent because $x_n$ is asymptotic to $2n^2/17n^2 = 2/17$, so it does not tend to 0. N

9.3 Series with positive terms

9.3.1 Comparison criterion

We now study the important case of series $\sum_{n=1}^{\infty} x_n$ with positive terms, that is, $x_n \ge 0$ for
all $n \ge 1$.6 In such a case, the sequence $\{s_n\}$ of the partial sums is increasing and therefore
the following regularity result holds trivially.

Proposition 382 Each series with positive terms is either convergent or positively divergent. In particular, it is convergent if and only if it is bounded above.7

Series with positive terms thus inherit the remarkable regularity properties of monotone
sequences. This gives them an important status among series. In particular, for them we
now recast the convergence criteria presented in Section 8.11 for sequences.
Proposition 383 (Comparison criterion) Let $\sum_{n=1}^{\infty} x_n$ and $\sum_{n=1}^{\infty} y_n$ be two series with
positive terms, with $x_n \le y_n$ eventually.

(i) If $\sum_{n=1}^{\infty} x_n$ diverges positively, then so does $\sum_{n=1}^{\infty} y_n$.

(ii) If $\sum_{n=1}^{\infty} y_n$ converges, then so does $\sum_{n=1}^{\infty} x_n$.

Proof Let $n_0 \ge 1$ be such that $x_n \le y_n$ for all $n \ge n_0$, and set $\alpha = \sum_{n=1}^{n_0}(y_n - x_n)$. Calling $s_n$ (resp., $\sigma_n$) the partial sums of the sequence $\{x_n\}$ (resp., $\{y_n\}$), for $n > n_0$ we have
$$\sigma_n - s_n = \alpha + \sum_{k=n_0+1}^{n}(y_k - x_k) \ge \alpha$$
That is, $\sigma_n \ge s_n + \alpha$. Therefore, the result follows from Proposition 320 (which is the
sequential counterpart of this statement).
6 Nothing changes if the terms are positive only eventually. Indeed, we can always discard a finite number of terms without altering the asymptotic behavior of the series. Hence, all the results on the asymptotic behavior of series with positive terms hold, more generally, for series with terms that are eventually positive.
7 By definition, a series is bounded above when the sequence of its partial sums is so, i.e., there exists $k > 0$ such that $s_n \le k$ for every $n \ge 1$.

Note that (i) is the contrapositive of (ii), and vice versa: indeed, thanks to Proposition
382, for a series with positive terms the negation of convergence is positive divergence.8
Because of their usefulness, we stated both; but it is the same property seen in two equivalent
ways.

Example 384 The series
$$\sum_{n=1}^{\infty}\frac{10^n}{n\,5^{2n+3}}$$
converges. Indeed, since
$$\frac{10^n}{n\,5^{2n+3}} \le \frac{10^n}{5^{2n+3}} = \frac{1}{125}\,\frac{10^n}{25^n} = \frac{1}{125}\left(\frac{2}{5}\right)^n$$
the convergence of the geometric series with ratio 2/5 guarantees, via the comparison criterion, the convergence of the series. N

Example 385 The series of the reciprocals of the factorials9
$$\sum_{n=0}^{\infty}\frac{1}{n!}$$
converges. Indeed, observe that
$$\sum_{n=0}^{\infty}\frac{1}{n!} = 1 + 1 + \sum_{n=2}^{\infty}\frac{1}{n!} = 2 + \sum_{n=1}^{\infty}\frac{1}{(n+1)!}$$
But the series
$$\sum_{n=1}^{\infty}\frac{1}{(n+1)!}$$
converges because, for every $n \ge 1$,
$$\frac{1}{(n+1)!} \le \frac{1}{n(n+1)}$$
where the right-hand side is the generic term of the Mengoli series, which we know converges.
By the comparison criterion, the convergence of $\sum_{n=0}^{\infty} 1/n!$ then follows from that of the
Mengoli series. We will see later that, remarkably, its sum is Napier's constant e (Theorem
398). N

Example 386 We call generalized harmonic series the series
$$\sum_{n=1}^{\infty}\frac{1}{n^{\alpha}}$$
8 Recall that, given two properties p and q, the implication $\neg q \implies \neg p$ is the contrapositive of the original implication $p \implies q$ (see Appendix D).
9 Recall that 0! = 1. For this reason, we start the series from n = 0 (so, in Theorem 398 we will be able to write $\sum_{n=0}^{\infty} 1/n! = e$, a more elegant expression than $\sum_{n=1}^{\infty} 1/n! = e - 1$).

with 2 R. If = 1, it reduces to the harmonic series that we know diverges to +1.


If < 1, it is easy to see that, for every n > 1,

1 1
> (i.e., n < n)
n n
Therefore, by the comparison criterion,
1
X 1
= +1
n
n=1

If = 2, the generalized harmonic series converges. Indeed, let us observe that


1
X 1
X 1
X
1 1 1
2
= 1 + 2
= 1 +
n n (n + 1)2
n=1 n=2 n=1

But the series


1
X 1
(n + 1)2
n=1

converges because, for every n 1,

1 1
(n + 1)2 n (n + 1)

which is the genericPterm of the convergent Mengoli series.10 By the P comparison criterion,
the convergence of 1 n=1 1=n 2 is a consequence of the convergence of 1 2
n=1 1= (n + 1) .
If > 2, then
1 1
< 2
n n
for every n > 1 and therefore we still have convergence.
Finally, it is possible to see, but it is more delicate, that the generalized harmonic series
converges also if 2 (1; 2).
Summing up, the generalized harmonic series
1
X 1
n
n=1

converges for > 1 and diverges for 1. N
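A rough numerical illustration of this dichotomy (our own sketch; the truncation point is arbitrary):

    for alpha in [0.5, 1.0, 1.5, 2.0]:
        s = sum(1 / k ** alpha for k in range(1, 100001))
        print(alpha, s)
    # for alpha <= 1 the partial sums keep growing with the truncation point,
    # while for alpha > 1 they stabilize near the sum of the series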

For the generalized harmonic series, the case $\alpha = 1$ is thus the "last" case of divergence:
it is sufficient to increase the exponent very slightly, from 1 to $1+\varepsilon$ with $\varepsilon > 0$, and the series
will converge. This suggests that the divergence is extremely slow, as the reader can check
by calculating some of the partial sums.11 This intuition is made precise by the following
beautiful result.
10 Indeed, $\sum_{n=1}^{\infty} 1/n^2 = \pi^2/6$, but here we do not have the tools to prove this remarkable result.
11 A "cadaverous infinity", in the words of a professor.

Proposition 387 We have
$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \sim \log n \tag{9.8}$$
In words, the sequence of the partial sums of the harmonic series is asymptotic to the
logarithm. This result can be further improved: it can be shown that there is a scalar $\gamma > 0$,
the so-called Euler-Mascheroni constant, such that
$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} = \log n + \gamma + o(1) \tag{9.9}$$
This approximation, with an error term $o(1)$, is more accurate than (9.8), which by Proposition 359 can be written as
$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} = \log n + o(\log n)$$
with an error term $o(\log n)$.12 Thus, the partial sums of the harmonic series are equal to the
logarithm, up to a positive constant and a term that goes to 0. In particular, (9.9) amounts
to
$$\gamma = \lim_{n\to\infty}\left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} - \log n\right)$$
So, the Euler-Mascheroni constant is the limit of the difference between the partial sums of
the harmonic series and the logarithm. It is a remarkable number, approximately
0.5772156649, whose nature is still elusive.13

Proof The proof of this result may be skipped on a first reading since it relies on integration
notions that will be presented in Chapter 44. Define $\varphi : [0,\infty) \to \mathbb{R}$ by
$$\varphi(x) = \frac{1}{i} \qquad \forall x \in [i-1, i)$$
with $i \ge 1$. That is, $\varphi(x) = 1$ if $x \in [0,1)$, $\varphi(x) = 1/2$ if $x \in [1,2)$, and so on. It is easy to
see that
$$\frac{1}{x+1} \le \varphi(x) \le \frac{1}{x} \qquad \forall x > 0 \tag{9.10}$$
Therefore, the restriction of $\varphi$ to every closed interval is a step function. By Proposition
1856, we then have
$$\sum_{i=k}^{n}\frac{1}{i} = \sum_{i=k}^{n}\int_{i-1}^{i}\varphi(x)\,dx = \int_{k-1}^{n}\varphi(x)\,dx \qquad \forall k = 1, \ldots, n$$
for every $n \ge 1$. By (9.10),
$$\log(1+n) = \int_{0}^{n}\frac{1}{x+1}\,dx \le \sum_{i=1}^{n}\frac{1}{i} = 1 + \sum_{i=2}^{n}\frac{1}{i} \le 1 + \int_{1}^{n}\frac{1}{x}\,dx = 1 + \log n$$
12 Indeed $o(1)/\log n \to 0$, so a sequence which is $o(1)$ is also $o(\log n)$. This is why an error term of order $o(1)$ is better than one of order $o(\log n)$. Mutatis mutandis, the relation between these two approximations is similar to that between the two approximations that we saw for the De Moivre-Stirling formula.
13 It is not even known whether it is irrational, i.e., we do not have for it the counterpart of Euler's Theorem 400.

for every $n \ge 2$. Therefore,
$$\frac{\log(1+n)}{\log n} \le \frac{\sum_{i=1}^{n} 1/i}{\log n} \le \frac{1 + \log n}{\log n} \qquad \forall n \ge 2$$
By the comparison criterion, we conclude that
$$\frac{\sum_{i=1}^{n} 1/i}{\log n} \to 1$$
as desired.
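A direct numerical illustration of (9.9) (our own addition): the difference between the harmonic partial sums and the logarithm settles near the Euler-Mascheroni constant.

    import math

    s = sum(1 / k for k in range(1, 10 ** 6 + 1))
    print(s - math.log(10 ** 6))  # approximately 0.5772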

Example 388 The last example can be generalized by showing that the series14
$$\sum_{n=2}^{\infty}\frac{1}{n^{\alpha}\log^{\beta} n}$$
converges for $\alpha > 1$ and any $\beta > 0$, as well as for $\alpha = 1$ and $\beta > 1$. It diverges for $\alpha < 1$
and any $\beta \in \mathbb{R}$, as well as for $\alpha = 1$ and any $\beta \le 1$. N

The comparison criterion has a nice and useful asymptotic version, based on the asymptotic comparison of the terms of the sequences.

Proposition 389 (Asymptotic comparison criterion) Let $\sum_{n=1}^{\infty} x_n$ and $\sum_{n=1}^{\infty} y_n$ be two
series with strictly positive terms.15 If $x_n \sim y_n$, then the two series have the same character.

Therefore, the character of a series is invariant with respect to the asymptotic equivalence
relation.

Proof Since $x_n \sim y_n$, for every $\varepsilon > 0$ there exists $n_{\varepsilon} \ge 1$ such that
$$1 - \varepsilon \le \frac{x_n}{y_n} \le 1 + \varepsilon \qquad \forall n \ge n_{\varepsilon}$$
For every $n > n_{\varepsilon}$, we have
$$\sum_{k=1}^{n} x_k = \sum_{k=1}^{n_{\varepsilon}} x_k + \sum_{k=n_{\varepsilon}+1}^{n}\frac{x_k}{y_k}\,y_k \le \sum_{k=1}^{n_{\varepsilon}} x_k + (1+\varepsilon)\sum_{k=n_{\varepsilon}+1}^{n} y_k = c + (1+\varepsilon)\sum_{k=n_{\varepsilon}+1}^{n} y_k \tag{9.11}$$
and
$$\sum_{k=1}^{n} x_k = \sum_{k=1}^{n_{\varepsilon}} x_k + \sum_{k=n_{\varepsilon}+1}^{n}\frac{x_k}{y_k}\,y_k \ge c + (1-\varepsilon)\sum_{k=n_{\varepsilon}+1}^{n} y_k \tag{9.12}$$
where $c = \sum_{k=1}^{n_{\varepsilon}} x_k$. The character of the series $\sum_{n=1}^{\infty} y_n$ is the same as that of $\sum_{k=n_{\varepsilon}+1}^{\infty} y_k$
because the value assumed by a finite number of initial terms is irrelevant for the character
of a series. Therefore, if $\sum_{n=1}^{\infty} y_n$ converges, by (9.11) it follows that $\sum_{n=1}^{\infty} x_n$ converges,
whereas if $\sum_{n=1}^{\infty} y_n$ diverges to $+\infty$, from (9.12) it follows that also $\sum_{n=1}^{\infty} x_n$ diverges to
$+\infty$. Since the roles of the two series can be interchanged, the proof is complete.
14 The series starts with n = 2 because for n = 1 the term is not defined.
15 The hypothesis that the terms are strictly positive, so non-zero, is necessary to make the ratio $x_n/y_n$ well defined. This hypothesis will be used several times throughout the chapter.

Example 390 Let
$$x_n = \frac{2n^3 - 3n + 8}{5n^5 - n^4 - 4n^3 + 2n^2 - 12}$$
Since
$$x_n \sim \frac{2n^3}{5n^5} = \frac{2}{5n^2}$$
the series $\sum_{n=1}^{\infty} x_n$ converges. Let, instead,
$$x_n = \frac{n+1}{n^2 - 3n + 4}$$
Since $x_n \sim 1/n$, the series $\sum_{n=1}^{\infty} x_n$ diverges to $+\infty$. N

We can use the asymptotic comparison criterion to establish a celebrated result, proved
in 1737 by Leonhard Euler, which says that the sum of the reciprocals of the prime numbers
is infinite.

Theorem 391 (Euler) We have
$$\frac{1}{2} + \frac{1}{3} + \frac{1}{5} + \frac{1}{7} + \frac{1}{11} + \cdots = \sum_{n=1}^{\infty}\frac{1}{p_n} = +\infty$$

Proof. By Theorem 364, we have $p_n \sim n\log n$. Therefore (recall (8.55)),
$$\frac{1}{p_n} \sim \frac{1}{n\log n}$$
By the asymptotic comparison criterion, the series $\sum_{n=1}^{\infty} 1/p_n$ has the same character as
$\sum_{n=2}^{\infty} 1/(n\log n)$. In view of Example 388, we have
$$\sum_{n=2}^{\infty}\frac{1}{n\log n} = +\infty$$
It follows that $\sum_{n=1}^{\infty} 1/p_n = +\infty$, as desired.

Euler's Theorem, along with the comparison criterion, confirms the divergence to $+\infty$ of
the harmonic series. Indeed,
$$\frac{1}{p_n} \le \frac{1}{n}$$
for every $n \ge 1$.16 Euler's Theorem is, however, a truly remarkable result relative to
the divergence of the harmonic series in that it involves only the reciprocals of the prime
numbers, whereas the harmonic series considers the reciprocals of all natural numbers (be
they prime or not).

Euler's Theorem also confirms that there are infinitely many prime numbers and that they
tend to $+\infty$ more slowly than the powers $n^{\alpha}$, with $\alpha > 1$, for which we have $\sum_{n=1}^{\infty} 1/n^{\alpha} < +\infty$.

We conclude our analysis of the comparison criterion with an important economic application.
16 We have $1/p_1 = 1/2 \le 1$, $1/p_2 = 1/3 \le 1/2$, $1/p_3 = 1/5 \le 1/3$, $1/p_4 = 1/7 \le 1/4$, and so on.

Example 392 Consider the series
$$\sum_{t=1}^{\infty}\beta^{t-1} u_t(x_t) \tag{9.13}$$
It is a generalization of the series (9.6) in which the instantaneous utility function $u_t : \mathbb{R}_+ \to \mathbb{R}$ is allowed to depend on time. In particular, we assume that these instantaneous utility
functions are positive and uniformly bounded above, that is, there is a common constant
$M > 0$ such that, for all $t \ge 1$,
$$0 \le u_t(x) \le M \qquad \forall x \in \mathbb{R}_+ \tag{9.14}$$
As a result, we can write:
$$0 \le \sum_{t=1}^{\infty}\beta^{t-1} u_t(x_t) \le M\sum_{t=1}^{\infty}\beta^{t-1}$$
By the comparison criterion, it remains to check whether the geometric series $\sum_{t=1}^{\infty}\beta^{t-1}$
converges. In view of Example 379, we conclude that the series (9.13) converges if $\beta < 1$.17
We can extend the analysis to streams and utility functions that are not necessarily
positive. This becomes relevant when streams represent, for example, cash flows that at
some point in time might well feature losses, so negative amounts of money. For utility
functions $u_t : A \subseteq \mathbb{R} \to \mathbb{R}$, where A is any set in the real line (possibly the real line itself),
the uniform boundedness condition (9.14) takes the form $|u_t(x)| \le M$ for all $x \in A$, i.e.,
$$-M \le u_t(x) \le M \qquad \forall x \in A \tag{9.15}$$
The comparison criterion, via the geometric series $\sum_{t=1}^{\infty}\beta^{t-1}$, continues to ensure convergence if $\beta < 1$. N

9.3.2 Ratio criterion: prelude

The next section will be devoted to the important ratio criterion for convergence. For the
impatient reader, we first state its simplest version.

Proposition 393 (Ratio criterion, elementary limit form) Let $\sum_{n=1}^{\infty} x_n$ be a series
with, eventually, strictly positive terms. Suppose that $\lim x_{n+1}/x_n$ exists.

(i) If
$$\lim\frac{x_{n+1}}{x_n} < 1$$
the series converges.

(ii) If
$$\lim\frac{x_{n+1}}{x_n} > 1$$
the series diverges positively.
17 The asymptotic behavior as $\beta \to 1$, that is, as patience becomes infinite, will be addressed by the Hardy-Littlewood Theorem in Section 11.3.2.

The criterion is thus based on the study of the limit of the ratio
$$\frac{x_{n+1}}{x_n}$$
of the terms of the series. The condition that the limit $\lim x_{n+1}/x_n$ exists is rather demanding,
as we will see in the next section. But, when it is satisfied, the elementary limit form of
the ratio criterion is the easiest to apply.

Example 394 (i) The series
$$\sum_{n=1}^{\infty}\frac{n^2 + 5n + 1}{n2^n + 1}$$
converges. Indeed,
$$\frac{x_{n+1}}{x_n} = \frac{(n+1)^2 + 5(n+1) + 1}{(n+1)2^{n+1} + 1}\cdot\frac{n2^n + 1}{n^2 + 5n + 1} = \frac{n^2 + 7n + 7}{n^2 + 5n + 1}\cdot\frac{n2^n + 1}{(n+1)2^{n+1} + 1} \sim \frac{n^2}{n^2}\cdot\frac{n2^n}{(n+1)2^{n+1}} = \frac{n}{n+1}\cdot\frac{1}{2} \to \frac{1}{2}$$
So, by the ratio criterion the series converges.

(ii) The series
$$\sum_{n=1}^{\infty}\frac{2\cdot n!}{3^n}$$
diverges positively. Indeed,
$$\frac{x_{n+1}}{x_n} = \frac{2(n+1)!}{3^{n+1}}\cdot\frac{3^n}{2\cdot n!} = \frac{1}{3}\,\frac{(n+1)!}{n!} = \frac{1}{3}(n+1) \to +\infty$$
which, by the ratio criterion, implies the divergence of the series. N
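The two limits can be eyeballed numerically; a short sketch of ours evaluates the ratio $x_{n+1}/x_n$ for both series:

    import math

    x1 = lambda n: (n ** 2 + 5 * n + 1) / (n * 2 ** n + 1)
    x2 = lambda n: 2 * math.factorial(n) / 3 ** n
    for n in [10, 50, 100]:
        print(n, x1(n + 1) / x1(n), x2(n + 1) / x2(n))
    # the first ratio approaches 1/2 (convergence), the second
    # grows without bound (positive divergence)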

If $\lim x_{n+1}/x_n$ exists but
$$\lim\frac{x_{n+1}}{x_n} = 1$$
then nothing can be said about the character of the series. This is well illustrated by the two
series $\sum_{n=1}^{\infty} 1/n$ and $\sum_{n=1}^{\infty} 1/n^2$: although for both we have $\lim x_{n+1}/x_n = 1$, the former
diverges, while the latter converges.

9.3.3 Ratio criterion

We now study the ratio criterion in more generality by dropping the hypothesis that $\lim x_{n+1}/x_n$
exists.

Proposition 395 (Ratio criterion) Let $\sum_{n=1}^{\infty} x_n$ be a series with, eventually, strictly positive terms.

(i) If there exists a number $q < 1$ such that, eventually,
$$\frac{x_{n+1}}{x_n} \le q \tag{9.16}$$
then the series converges.

(ii) If, instead, the ratio is eventually $\ge 1$, then the series diverges positively.

The theorem requires the ratios $x_{n+1}/x_n$ to be (uniformly) smaller than a number $q < 1$
(so, the terms form a strictly decreasing sequence), not just that they are all $< 1$. Indeed,
the ratios of the harmonic series
$$\frac{\frac{1}{n+1}}{\frac{1}{n}} = \frac{n}{n+1}$$
are all $< 1$, but the series diverges (since the ratios tend to 1, there is no room to insert a
number $q < 1$ greater than all of them).
Since the convergence of a series implies that the sequence of its terms is infinitesimal
(Theorem 380), the ratio criterion for series can be seen as an extension of the homonymous
criterion for sequences. The same is true for the root criterion that we will see in the next
chapter.

Proof (i) Without loss of generality, assume that $x_n > 0$ and (9.16) holds for every n. From
$x_{n+1} \le q x_n$ we deduce, as in the analogous criterion for sequences, that $0 < x_n \le q^{n-1} x_1$,
and the first statement follows from the comparison criterion (Proposition 383) and from
the convergence of the geometric series. (ii) If we have eventually $x_{n+1}/x_n \ge 1$ and $x_n > 0$,
then eventually $x_{n+1} \ge x_n > 0$. In other words, the sequence $\{x_n\}$ is eventually increasing
and therefore it cannot tend to 0, yielding that the series must diverge positively.

Example 396 Let $\{x_n\}$ be a sequence such that $x_1 > 0$ and
$$x_{n+1} = \begin{cases} \frac{1}{2}x_n & \text{if } n \text{ even} \\ \frac{1}{3}x_n & \text{if } n \text{ odd} \end{cases}$$
For instance, if $x_1 = 1$ then $\{x_n\} = \{1, 1/3, 1/6, 1/18, \ldots\}$. Since $x_{n+1}/x_n \le 1/2$ for all $n \ge 1$, by the
ratio criterion the series $\sum_{n=1}^{\infty} x_n$ converges. Note that here $\lim x_{n+1}/x_n$ does not exist. N

It is possible to prove (see Section 10.4) that, if $\lim x_{n+1}/x_n$ exists, then the ratio
criterion assumes exactly the tripartite form that follows from Proposition 393. That is:

(i) if $\lim x_{n+1}/x_n < 1$, then the series converges;

(ii) if $\lim x_{n+1}/x_n > 1$, then it diverges positively;

(iii) if $\lim x_{n+1}/x_n = 1$, then the criterion fails and gives no indication about the character
of the series.

Operationally, this tripartite form is the standard form in which the ratio criterion is
applied. At a mechanical level, it might be sufficient to recall this tripartition and the
illustrative examples given in the prelude. But, so as not to do plumbing rather than mathematics,
it is important to keep in mind the theoretical foundations provided by Proposition 395 (the
last simple example, in which the tripartite form is useless, shows that they can even be useful operationally).
Let us see other tripartite examples.
Example 397 (i) By the ratio criterion, the series $\sum_{n=1}^{\infty} q^n/n^{\alpha}$ converges for every $\alpha \in \mathbb{R}$
and every $0 < q < 1$. Indeed,
$$\frac{n^{\alpha}q^{n+1}}{(n+1)^{\alpha}q^n} = \left(\frac{n}{n+1}\right)^{\alpha} q \to q < 1$$
Again by the ratio criterion, this series diverges positively when $q > 1$. Finally, if $q = 1$ we
are back to the generalized harmonic series $\sum_{n=1}^{\infty} 1/n^{\alpha}$ of Example 386.

(ii) By the ratio criterion, the series
$$\sum_{n=0}^{\infty}\frac{x^n}{n!}$$
converges for every $x > 0$. Indeed, for $n \ge 0$ we have
$$\frac{x^{n+1}}{(n+1)!}\,\frac{n!}{x^n} = \frac{x}{n+1} \to 0 \qquad \forall x > 0$$

(iii) By the ratio criterion, the series
$$\sum_{n=1}^{\infty}\frac{x^n}{n}$$
converges for every $0 < x < 1$. Indeed,
$$\frac{x^{n+1}}{n+1}\,\frac{n}{x^n} = \frac{n}{n+1}\,x \to x$$
which obviously is $< 1$ when $0 < x < 1$. If $x > 1$, the ratio criterion implies that the
series diverges positively. Finally, if $x = 1$ we are back to the harmonic series, which
diverges positively. N

We stop our study of convergence criteria here. Much more can be said: in Section 10.4
we will continue to investigate this topic in some more depth.

9.3.4 A first series expansion

Napier's constant was introduced in the previous chapter as the limit of the sequence
$(1 + 1/n)^n$. Surprisingly, it emerges also as the sum of the series of the reciprocals of the factorials, as Newton proved in 1665.

Theorem 398 (Newton) We have
$$\sum_{n=0}^{\infty}\frac{1}{n!} = e \tag{9.17}$$

Proof In Example 385 we showed that the series converges. Let us compute its sum. By
Newton's binomial formula (B.7), for each $n \ge 1$ we have
$$\left(1+\frac{1}{n}\right)^n = \sum_{k=0}^{n}\binom{n}{k}\frac{1}{n^k} = \sum_{k=0}^{n}\frac{1}{k!}\,\frac{n!}{(n-k)!}\,\frac{1}{n^k}$$
On the other hand,
$$\frac{n!}{(n-k)!} = \underbrace{n(n-1)\cdots(n-k+1)}_{k\text{ times}} \le \underbrace{n\cdots n}_{k\text{ times}} = n^k$$
Therefore,
$$\frac{n!}{(n-k)!}\,\frac{1}{n^k} \le 1$$
which implies
$$\left(1+\frac{1}{n}\right)^n = \sum_{k=0}^{n}\frac{1}{k!}\,\frac{n!}{(n-k)!}\,\frac{1}{n^k} \le \sum_{k=0}^{n}\frac{1}{k!}$$
By passing to the limit, it follows that
$$e \le \sum_{n=0}^{\infty}\frac{1}{n!} \tag{9.18}$$
As for the opposite inequality, we begin by observing that, for every $k \ge 0$,
$$\lim_{n\to\infty}\frac{n!}{(n-k)!}\,\frac{1}{n^k} = 1 \tag{9.19}$$
Indeed,
$$\frac{n!}{(n-k)!}\,\frac{1}{n^k} = \frac{n(n-1)\cdots(n-k+1)}{n^k} \to 1$$
Fix $m \ge 1$. For every $n > m$, we have
$$\left(1+\frac{1}{n}\right)^n = \sum_{k=0}^{m}\frac{1}{k!}\,\frac{n!}{(n-k)!}\,\frac{1}{n^k} + \sum_{k=m+1}^{n}\frac{1}{k!}\,\frac{n!}{(n-k)!}\,\frac{1}{n^k} \ge \sum_{k=0}^{m}\frac{1}{k!}\,\frac{n!}{(n-k)!}\,\frac{1}{n^k}$$
Therefore, thanks to (9.19),
$$e = \lim_{n\to\infty}\left(1+\frac{1}{n}\right)^n \ge \lim_{n\to\infty}\sum_{k=0}^{m}\frac{1}{k!}\,\frac{n!}{(n-k)!}\,\frac{1}{n^k} = \sum_{k=0}^{m}\frac{1}{k!}\,\lim_{n\to\infty}\frac{n!}{(n-k)!}\,\frac{1}{n^k} = \sum_{k=0}^{m}\frac{1}{k!}$$

Since this holds for every $m \ge 1$, we have that
$$e \ge \lim_{m\to\infty}\sum_{k=0}^{m}\frac{1}{k!} = \sum_{n=0}^{\infty}\frac{1}{n!}$$
which, along with (9.18), implies (9.17).
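Numerically, the factorial series reaches e to machine precision after only a few terms (a small sketch of ours):

    import math

    s, term = 0.0, 1.0
    for n in range(20):
        s += term          # term holds 1/n!
        term /= n + 1      # update to 1/(n+1)!
    print(s, math.e)       # both print 2.718281828459045...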


The beautiful equality (9.17) can be substantially generalized.18
Theorem 399 For every x 2 R, we have
1
X
x x n xn
e = lim 1+ = (9.20)
n!1 n n!
n=0

The equality (9.20) holds for every number x and reduces to (9.17) in the special case
x = 1. Note the remarkable series expansion of the exponential function
X1
xn x2 x3 xn
ex = =1+x+ + + + + (9.21)
n! 2 3! n!
n=0
Proof We are going to generalize some of the arguments in the proof of Theorem 398. As
in that proof, we start by applying Newton's binomial formula:
$$\left(1+\frac{x}{n}\right)^n = \sum_{k=0}^{n}\binom{n}{k}\frac{x^k}{n^k} = \sum_{k=0}^{n}\frac{x^k}{k!}\,\frac{n!}{(n-k)!}\,\frac{1}{n^k}$$
As before, note that
$$\frac{n!}{(n-k)!} = \underbrace{n(n-1)\cdots(n-k+1)}_{k\text{ times}} \le \underbrace{n\cdots n}_{k\text{ times}} = n^k$$
and
$$\frac{n!}{(n-k)!}\,\frac{1}{n^k} \le 1$$
Fix $m \ge 1$. For every $n > m$, we have
$$\left|\left(1+\frac{x}{n}\right)^n - \sum_{k=0}^{m}\frac{x^k}{k!}\,\frac{n!}{(n-k)!}\,\frac{1}{n^k}\right| = \left|\sum_{k=m+1}^{n}\frac{x^k}{k!}\,\frac{n!}{(n-k)!}\,\frac{1}{n^k}\right| \le \sum_{k=m+1}^{n}\frac{|x|^k}{k!}\,\frac{n!}{(n-k)!}\,\frac{1}{n^k} \le \sum_{k=m+1}^{n}\frac{|x|^k}{k!}$$
Therefore, thanks to (9.19),
$$\left|e^x - \sum_{k=0}^{m}\frac{x^k}{k!}\right| = \lim_{n\to\infty}\left|\left(1+\frac{x}{n}\right)^n - \sum_{k=0}^{m}\frac{x^k}{k!}\,\frac{n!}{(n-k)!}\,\frac{1}{n^k}\right| \le \lim_{n\to\infty}\sum_{k=m+1}^{n}\frac{|x|^k}{k!} = \sum_{k=m+1}^{\infty}\frac{|x|^k}{k!}$$
18 We adopt the convention $0^0 = 1$ to handle the case $x = 0$.

Since this holds for every $m \ge 1$, we have
$$0 \le \lim_{m\to\infty}\left|e^x - \sum_{k=0}^{m}\frac{x^k}{k!}\right| \le \lim_{m\to\infty}\sum_{k=m+1}^{\infty}\frac{|x|^k}{k!} = 0$$
since the series $\sum_{k=0}^{\infty}|x|^k/k!$ converges in view of Example 397 and of the convention $0^0 = 1$
(why?). This proves that $\sum_{k=0}^{m} x^k/k!$ converges to $e^x$, hence the statement.
N.B. In the proof we used a noteworthy fact: if the series $\sum_{k=0}^{\infty} x_k$ converges, then the
sequence of "forward" sums $\{\sum_{k=m}^{\infty} x_k\}$, i.e.,
$$\sum_{k=0}^{\infty} x_k,\ \sum_{k=1}^{\infty} x_k,\ \sum_{k=2}^{\infty} x_k,\ \ldots,\ \sum_{k=m}^{\infty} x_k,\ \ldots$$
converges to 0 as $m \to +\infty$. Intuitively, if from a finite sum we first remove the first
summand, then the first two summands, then the first three summands, and so on and so
forth, then what is left is going to vanish. The reader may want to make this argument
rigorous. O

Later in the book we will see that (9.20) is a power series (Chapter 11). For this reason,
the equality (9.21) is called the power series expansion of the exponential function. It is a
result, as elegant as it is important, that allows us to "decompose" the exponential function into a
sum of (infinitely many) simple functions such as the powers $x^n$.
We will study series expansions in greater generality with the tools of differential calculus,
of which series expansions are one of the most remarkable applications.
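A minimal sketch of ours illustrating the expansion (9.21): truncating the series gives a good approximation of $e^x$ even for negative x.

    import math

    def exp_series(x, m=30):
        # partial sum of (9.21): sum_{k=0}^{m} x^k / k!
        total, term = 0.0, 1.0
        for k in range(m + 1):
            total += term
            term *= x / (k + 1)
        return total

    for x in [-2.0, 0.5, 3.0]:
        print(x, exp_series(x), math.exp(x))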

We close this section by establishing the irrationality of Napier's constant, a property,
first proved by Leonhard Euler, that we have already mentioned a few times. We can now finally
prove it as a corollary of the series expansion (9.17).

Theorem 400 (Euler) Napier's constant is an irrational number.

Proof We have:
$$0 < e - \sum_{k=0}^{n}\frac{1}{k!} = \sum_{k=n+1}^{\infty}\frac{1}{k!} = \frac{1}{n!}\left(\frac{1}{n+1} + \frac{1}{(n+1)(n+2)} + \cdots + \frac{1}{(n+1)(n+2)\cdots(n+k)} + \cdots\right)$$
$$< \frac{1}{n!}\left(\frac{1}{n+1} + \frac{1}{(n+1)^2} + \cdots + \frac{1}{(n+1)^k} + \cdots\right) = \frac{1}{n!}\sum_{k=1}^{\infty}\frac{1}{(n+1)^k} = \frac{1}{n!}\,\frac{1}{n}$$
where the last equality holds because the geometric series that starts at $k = 1$ with ratio
$1/(n+1)$ has sum $1/n$. By Theorem 398, we then have the following interesting bounds:
$$0 < e - \sum_{k=0}^{n}\frac{1}{k!} < \frac{1}{n!}\,\frac{1}{n}$$

Suppose, by contradiction, that e is rational, i.e., $e = p/q$ for some natural numbers p and
q. Multiplying all sides of the last inequality by n!, we then have
$$0 < n!\,\frac{p}{q} - n!\sum_{k=0}^{n}\frac{1}{k!} < \frac{1}{n} \tag{9.22}$$
Since $q!/k!$ is an integer for all $k = 0, \ldots, q$, if $n = q$ then
$$n!\,\frac{p}{q} - n!\sum_{k=0}^{n}\frac{1}{k!} = p(q-1)! - \left(q! + q! + \frac{q!}{2!} + \cdots + 1\right)$$
is an integer, which cannot lie strictly between 0 and $1/n$ as (9.22) requires. This contradiction
proves that e is not rational.

9.4 Series with terms of any sign

9.4.1 Absolute convergence

We close the chapter by briefly considering the general case of series $\sum_{n=1}^{\infty} x_n$ with terms that
are not necessarily positive, even eventually. To study such series, we consider an auxiliary
series with positive terms.

Definition 401 The series $\sum_{n=1}^{\infty} x_n$ is said to be absolutely convergent if the series $\sum_{n=1}^{\infty}|x_n|$
of its absolute values is convergent.

The next result shows that the convergence of the series of absolute values (which can
be verified with the criteria discussed in the previous sections) guarantees the convergence
of the not necessarily positive, so possibly much wilder, original series.

Theorem 402 If a series converges absolutely, then it converges.

The condition is only sufficient, as we will soon show (Proposition 407). The class of
absolutely convergent series is, therefore, contained in that of convergent series. As the next
section will show, this subclass has key regularity properties: absolutely convergent series
are, among the series with terms of any sign, the ones that behave well.
Example 403 Let us revisit the series of Example 397 by permitting negative terms.

(i) By Theorem 402 and by the ratio criterion, the series $\sum_{n=1}^{\infty} q^n/n^{\alpha}$ converges for every
$\alpha \in \mathbb{R}$ and every $-1 < q < 1$. Indeed, from
$$\frac{|x_{n+1}|}{|x_n|} = \frac{n^{\alpha}|q|^{n+1}}{(n+1)^{\alpha}|q|^n} = \left(\frac{n}{n+1}\right)^{\alpha}|q| \to |q| < 1$$
it follows that it converges absolutely.

(ii) The series $\sum_{n=1}^{\infty} x^n/n!$ converges for every $x \in \mathbb{R}$. In fact, from
$$\frac{|x_{n+1}|}{|x_n|} = \frac{|x|^{n+1}}{(n+1)!}\,\frac{n!}{|x|^n} = \frac{|x|}{n+1} \to 0 \qquad \forall x \in \mathbb{R}$$
it follows that it converges absolutely. So, the series in Theorem 399 is, indeed, convergent.

(iii) The series $\sum_{n=1}^{\infty} x^n/n$ converges for every $-1 < x < 1$. Indeed,
$$\frac{|x_{n+1}|}{|x_n|} = \frac{|x|^{n+1}}{n+1}\,\frac{n}{|x|^n} = \frac{n}{n+1}\,|x| \to |x|$$
which obviously is $< 1$ when $-1 < x < 1$. Thus, also this series converges absolutely. N

Example 404 (i) The series
$$\sum_{n=1}^{\infty}\frac{(-1)^n}{n^2}$$
converges. Indeed, we have $\left|(-1)^n/n^2\right| = 1/n^2$, so this series converges absolutely.

(ii) The series
$$x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots = \sum_{n=0}^{\infty}(-1)^n\frac{x^{2n+1}}{(2n+1)!}$$
converges for every $x \in \mathbb{R}$. Indeed,
$$\frac{|x|^{2n+3}}{(2n+3)!}\,\frac{(2n+1)!}{|x|^{2n+1}} = \frac{x^2}{(2n+3)(2n+2)} \to 0 \qquad \forall x \in \mathbb{R}$$
and so the series converges absolutely.

(iii) The series
$$1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots = \sum_{n=0}^{\infty}(-1)^n\frac{x^{2n}}{(2n)!}$$
converges for every $x \in \mathbb{R}$. Indeed,
$$\frac{|x|^{2n+2}}{(2n+2)!}\,\frac{(2n)!}{|x|^{2n}} = \frac{x^2}{(2n+2)(2n+1)} \to 0 \qquad \forall x \in \mathbb{R}$$
and, therefore, also this last series converges absolutely. N
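The series in (ii) and (iii) are in fact the power series of sine and cosine, an identification the text does not make at this point; we add it only to motivate the following numerical sketch of ours, which compares their truncations with the library functions:

    import math

    def sin_series(x, m=20):
        # truncation of the series in (ii)
        return sum((-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
                   for n in range(m))

    def cos_series(x, m=20):
        # truncation of the series in (iii)
        return sum((-1) ** n * x ** (2 * n) / math.factorial(2 * n)
                   for n in range(m))

    x = 1.2
    print(sin_series(x), math.sin(x))
    print(cos_series(x), math.cos(x))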

The next example, due to Dirichlet (1829) p. 158, confirms the elusive nature of series
with terms of any sign. Indeed, it shows that the asymptotic comparison criterion fails when
we consider series with terms of any sign: two such series may have terms of the same order
(so, arbitrarily close as n gets larger) and, yet, have a different character.

Example 405 Consider the series $\sum_{n=1}^{\infty} x_n$ and $\sum_{n=1}^{\infty} y_n$ with terms
$$x_n = \frac{(-1)^n}{\sqrt{n}} \qquad\text{and}\qquad y_n = \frac{(-1)^n}{\sqrt{n}}\left(1 + \frac{(-1)^n}{\sqrt{n}}\right)$$
Clearly, $x_n \sim y_n$. Yet, the two series do not have the same character: $\sum_{n=1}^{\infty} x_n$ converges
(see the comment after the proof of Proposition 407), while $\sum_{n=1}^{\infty} y_n$ diverges positively:
$$\sum_{n=1}^{\infty} y_n = \sum_{n=1}^{\infty}\left(\frac{(-1)^n}{\sqrt{n}} + \frac{1}{n}\right) = \sum_{n=1}^{\infty}\frac{(-1)^n}{\sqrt{n}} + \sum_{n=1}^{\infty}\frac{1}{n} = +\infty$$
N

Theorem 402 is a consequence of the following simple lemma, which should also further
clarify its nature.

Lemma 406 Given a series $\sum_{n=1}^{\infty} x_n$, suppose there is a convergent series $\sum_{n=1}^{\infty} y_n$ with
positive terms such that, for every $n \ge 1$,

(i) $x_n + y_n \ge 0$,

(ii) $x_n \le k y_n$ for some $k > 0$.

Then, both the series $\sum_{n=1}^{\infty}(x_n + y_n)$ and $\sum_{n=1}^{\infty} x_n$ converge, with
$$\sum_{n=1}^{\infty} x_n = \sum_{n=1}^{\infty}(x_n + y_n) - \sum_{n=1}^{\infty} y_n$$

Proof Set $z_n = x_n + y_n$. Since $0 \le z_n \le (1+k)y_n$, by the comparison criterion the
convergence of $\sum_{n=1}^{\infty} y_n$ implies that of $\sum_{n=1}^{\infty} z_n$. Let $s_n^x$, $s_n^y$ and $s_n^z$ be the partial sums of
the three series involved. Both $\lim s_n^z$ and $\lim s_n^y$ exist. Clearly, $s_n^x = s_n^z - s_n^y$ for every $n \ge 1$.
By Proposition 333-(i), we then have $\lim s_n^x = \lim s_n^z - \lim s_n^y$, as desired.

The series $\sum_{n=1}^{\infty} y_n$ thus "lifts", via addition, the series of interest $\sum_{n=1}^{\infty} x_n$ and takes it
back to the familiar terrain of series with positive terms. The convergence of $\sum_{n=1}^{\infty} x_n$ can
then be established by studying two auxiliary series with positive terms, for which we have
at our disposal all the tools learned in the previous sections.
Theorem 402 follows from the lemma by considering $y_n = |x_n|$, because $|x_n| + x_n \ge 0$ and
$x_n \le |x_n|$ for every $n \ge 1$. This clarifies the "lifting" nature of absolute convergence. In
particular, it implies $\sum_{n=1}^{\infty} x_n = \sum_{n=1}^{\infty}(x_n + |x_n|) - \sum_{n=1}^{\infty}|x_n|$, so that the sum of the series
$\sum_{n=1}^{\infty} x_n$ can be expressed in terms of the sums of two series with positive terms.

Absolute convergence is only a sufficient condition for convergence. Indeed, the alternating harmonic (or Mercator) series
$$\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots \tag{9.23}$$
converges to $\log 2$, as the next elegant result will show. However, it does not converge
absolutely:
$$\sum_{n=1}^{\infty}\left|\frac{(-1)^{n+1}}{n}\right| = \sum_{n=1}^{\infty}\frac{1}{n} = +\infty$$

Proposition 407 We have
$$\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n} = \log 2$$

Proof The subsequences of the odd and even partial sums
$$s_1, s_3, s_5, \ldots \qquad\text{and}\qquad s_2, s_4, s_6, \ldots$$

are decreasing and increasing, respectively. So, they converge to two scalars Lodd and Leven ,
respectively. Since s2n+1 s2n = x2n+1 ! 0, we then have Lodd = Leven . If we call L this
common limit, we conclude that sn ! L, so the alternating harmonic series converges.
It remains to show that L = log 2 . It is enough to consider the even partial sums s2n
and show that lim s2n = log 2. We have
2n
X n 1 n n n
X1 1 n
( 1)k+1 X 1 X 1 X 1 X 1
s2n = = = + 2
k 2k + 1 2k 2k 2k + 1 2k
k=1 k=0 k=1 k=1 k=0 k=1
X2n n
X
1 1
=
k k
k=1 k=1

By (9.9),
n
X 2n
X
1 1
= + log n + o (1) and = + log 2n + o (1)
k k
k=1 k=1

where is the Euler-Mascheroni constant. Thus,


2n
X n
X
1 1
s2n = = log 2 + o (1)
k k
k=1 k=1

so that lim s2n = log 2, as desired.

It is easy to check that the argument just used to show the convergence of the alternating
series (9.23) proves, more generally, that any alternating series $\sum_{n=1}^{\infty}(-1)^{n+1}x_n$, with $x_n \ge 0$
for every $n \ge 1$, converges provided the sequence $\{x_n\}$ is decreasing and infinitesimal, i.e.,
$x_n \downarrow 0$.
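A numerical check of Proposition 407 (our own sketch); note how slowly the alternating partial sums settle around $\log 2$:

    import math

    s = 0.0
    for n in range(1, 100001):
        s += (-1) ** (n + 1) / n
    print(s, math.log(2))  # 0.693142..., 0.693147...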

9.4.2 Hic sunt leones

Series that are not absolutely convergent are, in general, not that well behaved.19 To see why
this is the case, we introduce rearrangements. Given a series $\sum_{n=1}^{\infty} x_n$, fix a permutation
$\sigma : \mathbb{N} \to \mathbb{N}$ that, to each position n, associates a unique position $\sigma(n)$ and vice versa.20 The
new series
$$\sum_{n=1}^{\infty} x_{\sigma(n)}$$
constructed via $\sigma$ is called a rearrangement of $\sum_{n=1}^{\infty} x_n$. In words, the new series has been
obtained by permuting the terms of the original series. That said, is it true that
$$\sum_{n=1}^{\infty} x_n = \sum_{n=1}^{\infty} x_{\sigma(n)}$$

19 We refer interested readers to Chapter 3 of Rudin (1976) for a more detailed analysis, which includes the proofs of the results of this section.
20 Recall that a permutation is a bijective function (see Appendix B).

for any permutation $\sigma : \mathbb{N} \to \mathbb{N}$? In other words, are series stable under permutations of
their elements?
This stability seems inherent to any proper notion of "addition", which should not be
affected by mere rearrangements of the summands. Indeed, the answer is obviously positive
for finite sums because of the classic associative and commutative properties of addition.
The next result shows that the answer continues to be positive for series that are absolutely
convergent.

Proposition 408 Let $\sum_{n=1}^{\infty} x_n$ be a series that converges absolutely. Then, $\sum_{n=1}^{\infty} x_n$ and
all its rearrangements have the same sum.

Absolutely convergent series thus exhibit the same nice behavior that characterizes finite
sums. Unfortunately, this is no longer the case if we drop absolute convergence. For instance,
consider the alternating harmonic series
$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots + \frac{(-1)^{n+1}}{n} + \cdots$$
We learned that it converges, with sum $\log 2$, but that it is not absolutely convergent.
Through a suitable permutation, we can construct the rearrangement
$$1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \cdots$$
which is still convergent, but with sum $\log 2\sqrt{2}$. So, rearrangements have, in general, different
sums. The next classic result of Riemann shows that anything goes, so the answer to the
previous question turns out to be dramatically negative.

Theorem 409 (Riemann) Let $\sum_{n=1}^{\infty} x_n$ be a series that converges but not absolutely (i.e.,
$\sum_{n=1}^{\infty}|x_n| = +\infty$). Given any $L \in \overline{\mathbb{R}}$, there is a rearrangement of $\sum_{n=1}^{\infty} x_n$ that has sum L.

Summing up, series that are absolutely convergent behave like the standard addition. But,
as soon as we drop absolute convergence, anything goes.
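The rearranged sum $\log 2\sqrt{2} = \tfrac{3}{2}\log 2$ can be experienced numerically; a small sketch of ours implements the rearrangement above (two positive terms with odd denominators, then one negative term):

    import math

    s, pos, neg = 0.0, 1, 2
    for _ in range(100000):
        s += 1 / pos + 1 / (pos + 2) - 1 / neg
        pos += 4
        neg += 2
    print(s, 1.5 * math.log(2))  # both approximately 1.0397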
Chapter 10

Discrete calculus (sdoganato)

Discrete calculus deals with problems analogous to those of differential calculus, with the
difference that sequences, that is, functions $f : \mathbb{N}\setminus\{0\} \to \mathbb{R}$ with a discrete domain, are
considered instead of functions on the real line. Despite a rougher domain, some highly
non-trivial results hold that make discrete calculus useful in applications.1 In particular, in
this chapter we will show its use in the study of series and sequences, allowing for a deeper
analysis of some issues that we have already discussed.

10.1 Preamble: limit points

10.1.1 Limit superior and inferior

Let $\{x_n\}$ be a bounded sequence of scalars, so that there exists a positive constant $M > 0$
with $-M \le x_n \le M$ for every n. Consider the ancillary sequences $\{y_n\}$ and $\{z_n\}$ defined by
$$y_n = \sup_{k\ge n} x_k \qquad\text{and}\qquad z_n = \inf_{k\ge n} x_k$$
They describe the "tail" behavior of the sequence.

Example 410 For the alternating sequence $x_n = (-1)^n$, we have $y_n = 1$ and $z_n = -1$ for
every n, whereas for the sequence $x_n = 1/n$ we have $y_n = 1/n$ and $z_n = 0$ for every n. N

It is immediate to check that
$$-M \le z_n \le x_n \le y_n \le M \qquad \forall n \ge 1 \tag{10.1}$$
Hence, the ancillary sequences are also bounded. Moreover,
$$n_1 < n_2 \implies \sup_{k\ge n_1} x_k \ge \sup_{k\ge n_2} x_k \quad\text{and}\quad \inf_{k\ge n_1} x_k \le \inf_{k\ge n_2} x_k$$
so $\{y_n\}$ is decreasing and $\{z_n\}$ is increasing. Being monotone, both $\{y_n\}$ and $\{z_n\}$ converge
(Theorem 323). If we denote their limits by y and z, that is, $y_n \to y$ and $z_n \to z$, we can
write
$$\lim_{n\to\infty}\sup_{k\ge n} x_k = y \qquad\text{and}\qquad \lim_{n\to\infty}\inf_{k\ge n} x_k = z$$

1 Some parts of this chapter require a basic knowledge of differential calculus (so it can be read seamlessly after reading Chapter 26).


The limits y and z are called, respectively, the limit superior and limit inferior of $\{x_n\}$, and are
denoted by $\limsup x_n$ and $\liminf x_n$.

Example 411 In view of the last example, for the alternating sequence $x_n = (-1)^n$ we have
$$\limsup x_n = 1 \qquad\text{and}\qquad \liminf x_n = -1$$
whereas for the convergent sequence $x_n = 1/n$ we have
$$\limsup x_n = \liminf x_n = \lim x_n = 0$$
N

This example shows two key properties of the limits inferior and superior: they always
exist, even if the original sequence has no limit, and their equality is a necessary and sufficient
condition for the convergence of the sequence $\{x_n\}$.2 Formally:

Proposition 412 Let $\{x_n\}$ be a bounded sequence. We have
$$-\infty < \liminf x_n \le \limsup x_n < +\infty \tag{10.2}$$
In particular, $x_n \to L \in \mathbb{R}$ if and only if $\liminf x_n = \limsup x_n = L$.

Proof Thanks to (10.1), Proposition 320 implies (10.2) and Theorem 338 yields the "if". As
for the "only if", we leave the easy proof to the reader (just use the definition of convergence).

Other noteworthy properties are
$$\liminf x_n = -\limsup(-x_n) \qquad\text{and}\qquad \limsup x_n = -\liminf(-x_n) \tag{10.3}$$
These are duality properties that relate the limit superior and limit inferior of a sequence $\{x_n\}$
with those of the opposite sequence $\{-x_n\}$. For instance, this simple duality allows us to easily
translate some properties of the limit superior into properties of the limit inferior, and vice
versa (this is exactly what will happen in the next proof). Another interesting consequence
of the duality is the possibility of rewriting the inequality (10.2) as $\liminf x_n \le -\liminf(-x_n)$.
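A small sketch of ours approximating the ancillary sequences $y_n$ and $z_n$ with a long finite tail, for a sequence with $\limsup x_n = 1$ and $\liminf x_n = -1$:

    # x_n = (-1)^n (1 + 1/n), for n = 1, ..., 5000
    x = [(-1) ** n * (1 + 1 / n) for n in range(1, 5001)]
    for n in [1, 10, 100, 1000]:
        tail = x[n - 1:]                # finite proxy for the tail {x_k : k >= n}
        print(n, max(tail), min(tail))  # y_n decreases to 1, z_n increases to -1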

The next result lists some basic properties of the limits superior and inferior. Thanks to
the previous result, they imply the analogous properties that we established for convergent
sequences.3

Lemma 413 Let $\{x_n\}$ and $\{y_n\}$ be two bounded sequences. We have:

(i) $\liminf x_n + \liminf y_n \le \liminf(x_n + y_n)$;

(ii) $\limsup x_n + \limsup y_n \ge \limsup(x_n + y_n)$;

(iii) $\liminf x_n \le \liminf y_n$ and $\limsup x_n \le \limsup y_n$ if eventually $x_n \le y_n$.
2 Since it is bounded, $\{x_n\}$ converges or oscillates, but does not diverge.
3 Specifically, (i) and (ii) extend Proposition 333-(i), while (iii) extends Proposition 320.

Proof We start by observing that $\{x_n + y_n\}$ is bounded. (i) For every n we have $\inf_{k\ge n}(x_k + y_k) \ge \inf_{k\ge n} x_k + \inf_{k\ge n} y_k$. Since the sequences $\{\inf_{k\ge n}(x_k + y_k)\}$, $\{\inf_{k\ge n} x_k\}$ and $\{\inf_{k\ge n} y_k\}$
converge, (i) follows from Propositions 333-(i) and 320. (ii) follows from (i) and the duality
formulas contained in (10.3):
$$\limsup(x_n + y_n) = -\liminf((-x_n) + (-y_n)) \le -\liminf(-x_n) - \liminf(-y_n) = \limsup x_n + \limsup y_n$$
Point (iii) readily follows from the definitions of lim inf and lim sup, and from Proposition
320.

10.1.2 Limit points

It is possible to give a topological characterization of the limits superior and inferior. To do so,
we introduce the notion of limit point.

Definition 414 A scalar $L \in \mathbb{R}$ is a limit point for a sequence if every neighborhood of L
contains an infinite number of elements of the sequence.

If the sequence converges, there exists a unique limit point: the limit of the sequence.
If the sequence does not converge, the limit points are the scalars that are approached by
infinitely many elements of the sequence. Indeed, it can easily be shown that L is a limit
point for a sequence if and only if there exists a subsequence that converges to L.

Example 415 (i) The interval $[-1,1]$ is the set of limit points of the sequence $x_n = \sin n$,
whereas $\{-1, 1\}$ are the limit points of the alternating sequence $x_n = (-1)^n$. (ii) The
singleton $\{0\}$ is the unique limit point of the convergent sequence $x_n = 1/n$. N

The next result shows that the limit points belong to the interval determined by the
limits superior and inferior.

Proposition 416 Let $\{x_n\}$ be a bounded sequence. If $x \in \mathbb{R}$ is a limit point for the sequence,
then $x \in [\liminf x_n, \limsup x_n]$.

Proof Consider a limit point x. By contradiction, assume that $\liminf x_n > x$. Define
$\varepsilon = \liminf x_n - x > 0$ and $z_n = \inf_{k\ge n} x_k$ for every n. On the one hand, in light of the
previous part of the chapter, we know that $z_{n+1} \ge z_n$ for every n and $z_n \to \liminf x_n$. This
implies that there exists $n_{\varepsilon} \in \mathbb{N}$ such that
$$\liminf x_n - \frac{\varepsilon}{2} < z_n < \liminf x_n + \frac{\varepsilon}{2}$$
for every $n \ge n_{\varepsilon}$. On the other hand, since x is a limit point, there exists $x_n$ such that
$x - \frac{\varepsilon}{2} < x_n < x + \frac{\varepsilon}{2}$, where n can be chosen to be strictly greater than $n_{\varepsilon}$ (recall that
each neighborhood of x must contain an infinite number of elements of the sequence). By
construction, we have that $z_n = \inf_{k\ge n} x_k \le x_n$. This yields that
$$\liminf x_n - \frac{\varepsilon}{2} < z_n \le x_n < x + \frac{\varepsilon}{2}$$

thus $\liminf x_n < x + \varepsilon$. We have reached a contradiction since, by definition, $\varepsilon = \liminf x_n - x$,
which we just proved to be strictly smaller than $\varepsilon$. An analogous argument yields that $\limsup x_n \ge x$ (why?).

Intuitively, the larger the set of limit points, the more "divergent" the sequence is; in particular, this set reduces to a singleton when the sequence converges. In light of the last result,
the difference between the superior and inferior limits, that is, the length of $[\liminf x_n, \limsup x_n]$,
is a (not that precise) indicator of the divergence of a sequence.

Thanks to the inequality $\liminf x_n \le -\liminf(-x_n)$, the interval $[\liminf x_n, \limsup x_n]$
can be rewritten as $[\liminf x_n, -\liminf(-x_n)]$. For instance, if $x_n = \sin n$ or $x_n = \cos n$, we
have that $[\liminf x_n, -\liminf(-x_n)] = [-1, 1]$.

N.B. Up to this point, we have considered only bounded sequences. Versions of the previous results, however, can be provided for generic sequences. Clearly, we need to allow the
limits superior and inferior to assume infinity as a value. For instance, if we consider the
sequence $x_n = n$, which diverges to $+\infty$, we have $\liminf x_n = \limsup x_n = +\infty$; for the
sequence $x_n = -e^n$, which diverges to $-\infty$, we have $\limsup x_n = \liminf x_n = -\infty$, whereas
for the sequence $x_n = (-1)^n n$ we have $\liminf x_n = -\infty$ and $\limsup x_n = +\infty$, so that
$[\liminf x_n, \limsup x_n] = \overline{\mathbb{R}}$. We leave to the reader the extension of the previous results to
generic sequences. O

10.2 Discrete calculus

10.2.1 Finite differences

The (finite) differences
$$\Delta x_n = x_{n+1} - x_n$$
of a sequence $\{x_n\}$ are the discrete-case counterparts of the derivatives of a function defined
on the real line.4 Indeed, the smallest discrete increment starting from n is equal to 1,
therefore
$$\Delta x_n = \frac{x_{n+1} - x_n}{1} = \frac{x_{n+1} - x_n}{(n+1) - n} = \frac{\Delta x_n}{\Delta n}$$

Definition 417 The sequence $\{\Delta x_n\} = \{x_{n+1} - x_n\}$ is called the sequence of (finite) differences
of the sequence $\{x_n\}$.

As a first use of this notion, note that it permits us to cast the recursive definition (9.3) of
a series in the succinct difference form
$$\begin{cases} s_1 = x_1 \\ \Delta s_{n-1} = x_n & \text{for } n \ge 2 \end{cases} \tag{10.4}$$

The next result lists the algebraic properties of the differences, that is, their behavior
with respect to the fundamental operations.5
4 See Section 26.14.
5 It is the discrete counterpart of the results in Section 26.8.

Proposition 418 Let $\{x_n\}$ and $\{y_n\}$ be any two sequences. For every n, we have:

(i) $\Delta(\alpha x_n + \beta y_n) = \alpha\,\Delta x_n + \beta\,\Delta y_n$ for every $\alpha, \beta \in \mathbb{R}$;

(ii) $\Delta(x_n y_n) = x_{n+1}\,\Delta y_n + y_n\,\Delta x_n$;

(iii) $\Delta\left(\dfrac{x_n}{y_n}\right) = \dfrac{y_n\,\Delta x_n - x_n\,\Delta y_n}{y_n\,y_{n+1}}$, provided $y_n \neq 0$ for every n.

On the one hand, (i) shows that the difference $\Delta$ preserves addition and subtraction;
on the other hand, (ii) and (iii) show that more complex rules hold for multiplication and
division. Properties (ii) and (iii) are called the product rule and quotient rule, respectively.

Proof (i) Obvious. (ii) It follows from
$$\Delta(x_n y_n) = x_{n+1}y_{n+1} - x_n y_n = x_{n+1}y_{n+1} - x_{n+1}y_n + x_{n+1}y_n - x_n y_n = x_{n+1}(y_{n+1} - y_n) + y_n(x_{n+1} - x_n) = x_{n+1}\,\Delta y_n + y_n\,\Delta x_n$$
(iii) It follows from
$$\Delta\left(\frac{x_n}{y_n}\right) = \frac{x_{n+1}}{y_{n+1}} - \frac{x_n}{y_n} = \frac{x_{n+1}y_n - x_n y_{n+1}}{y_n y_{n+1}} = \frac{x_{n+1}y_n - x_n y_n + x_n y_n - x_n y_{n+1}}{y_n y_{n+1}} = \frac{y_n(x_{n+1} - x_n) - x_n(y_{n+1} - y_n)}{y_n y_{n+1}} = \frac{y_n\,\Delta x_n - x_n\,\Delta y_n}{y_n y_{n+1}}$$
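A quick numerical verification of the product rule (ii) on sample sequences (our own sketch):

    x = [n ** 2 for n in range(1, 12)]
    y = [2 ** n for n in range(1, 12)]

    d = lambda s, n: s[n + 1] - s[n]   # Delta at (0-based) position n
    for n in range(10):
        lhs = x[n + 1] * y[n + 1] - x[n] * y[n]    # Delta(x_n y_n)
        rhs = x[n + 1] * d(y, n) + y[n] * d(x, n)  # product rule
        assert lhs == rhs
    print("product rule verified on the sample")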

Monotonicity of sequences is characterized through differences in a simple, yet interesting,
way.

Lemma 419 A sequence is increasing (decreasing) if and only if $\Delta x_n \ge 0$ ($\le 0$) for every
$n \ge 1$.

Therefore, the monotonicity of the original sequence is revealed by the sign of the differences.

Example 420 (i) If $x_n = c$ for all $n \ge 1$, then $\Delta x_n = 0$ for all $n \ge 1$. In words, constant
sequences (which are both increasing and decreasing) have zero differences. (ii) If $x_n = a^n$,
with $a > 0$, we have
$$\Delta x_n = a^{n+1} - a^n = (a-1)a^n = (a-1)x_n$$
Therefore, the sequence $\{a^n\}$ is increasing if and only if $a \ge 1$. N

The case $a = 2$ in this last example is noteworthy.

Proposition 421 We have $\Delta x_n = x_n$ for every $n \ge 1$ and $x_1 = 2$ if and only if $x_n = 2^n$
for every n.

The sequence $x_n = 2^n$ thus equals the sequence of its own finite differences, so it is the
discrete counterpart of the exponential function in differential calculus.

Proof "If". From the last example, if $a = 2$ then for the increasing sequence $\{2^n\}$ we have
$\Delta x_n = x_n$ for every n and $x_1 = 2$. "Only if". Suppose that $\Delta x_n = x_n$ for all $n \ge 1$, that is,
$x_{n+1} - x_n = x_n$, i.e., $x_{n+1} = 2x_n$. A simple induction argument shows that $x_n = 2^{n-1}x_1$. Since $x_1 = 2$, we
obtain $x_n = 2^n$ for every n.

The sequence of differences of $\{\Delta x_n\}$ is denoted by $\{\Delta^2 x_n\}$ and is called the sequence of
second differences; in particular:
$$\Delta^2 x_n = x_{n+2} - x_{n+1} - (x_{n+1} - x_n) = x_{n+2} - 2x_{n+1} + x_n$$
Analogously, for every $k \ge 2$, we denote by $\Delta^k x_n$ the differences of $\Delta^{k-1}x_n$, that is,
$$\Delta^k x_n = \Delta\left(\Delta^{k-1}x_n\right) = \Delta^{k-1}x_{n+1} - \Delta^{k-1}x_n = \sum_{i=0}^{k}(-1)^{k-i}\binom{k}{i}x_{n+i} \tag{10.5}$$

This formula can be proved by induction on k (a common technique in this chapter). Here,
we only outline the induction step. Assume that (10.5) holds for k. We show it holds for
$k+1$. Fix n. First, observe that (why?)
$$\binom{k+1}{i} = \binom{k}{i-1} + \binom{k}{i} \qquad \forall i = 1, \ldots, k \tag{10.6}$$
This implies that
$$\Delta^{k+1}x_n = \Delta^k x_{n+1} - \Delta^k x_n = \sum_{i=0}^{k}(-1)^{k-i}\binom{k}{i}x_{n+1+i} - \sum_{i=0}^{k}(-1)^{k-i}\binom{k}{i}x_{n+i}$$
$$= \sum_{i=0}^{k-1}(-1)^{k-i}\binom{k}{i}x_{n+1+i} + x_{n+k+1} - (-1)^k x_n - \sum_{i=1}^{k}(-1)^{k-i}\binom{k}{i}x_{n+i}$$
$$= (-1)^{k+1}x_n + \sum_{i=1}^{k}(-1)^{k+1-i}\binom{k}{i-1}x_{n+i} + \sum_{i=1}^{k}(-1)^{k+1-i}\binom{k}{i}x_{n+i} + x_{n+k+1}$$
$$= (-1)^{k+1}\binom{k+1}{0}x_n + \sum_{i=1}^{k}(-1)^{k+1-i}\binom{k+1}{i}x_{n+i} + \binom{k+1}{k+1}x_{n+k+1} = \sum_{i=0}^{k+1}(-1)^{k+1-i}\binom{k+1}{i}x_{n+i}$$
Note that the second equality is justified by the induction hypothesis.

Example 422 If $x_n = n$, we have
$$\Delta n = (n+1) - n = 1$$

and $\Delta^k n = 0$ for every $k > 1$. If $x_n = n^2$, we have
$$\Delta n^2 = (n+1)^2 - n^2 = 2n + 1$$
$$\Delta^2 n^2 = 2(n+1) + 1 - (2n+1) = 2$$
and $\Delta^k n^2 = 0$ for every $k > 2$. N
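Iterated differences are easy to compute numerically; a sketch of ours reproduces $\Delta^2 n^2 = 2$ and $\Delta^3 n^2 = 0$:

    def diff(seq):
        return [b - a for a, b in zip(seq, seq[1:])]

    seq = [n ** 2 for n in range(1, 10)]
    print(diff(seq))              # [3, 5, 7, ...], i.e., 2n + 1
    print(diff(diff(seq)))        # [2, 2, 2, ...]
    print(diff(diff(diff(seq))))  # [0, 0, 0, ...]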

Formula (10.5) permits the following beautiful generalization of the series expansion
(9.20) of the exponential function. From now on, we set $\Delta^0 x_n = x_n$ for every n. Note that
if we also set $0^0 = 1$, then (10.5) holds for $k = 0$ as well.

Theorem 423 Let $\{y_n\}$ be any bounded sequence. Then, for each $n \ge 1$,
$$\sum_{k=0}^{\infty}\frac{x^k}{k!}\Delta^k y_n = e^{-x}\sum_{j=0}^{\infty}\frac{x^j}{j!}\,y_{n+j} \qquad \forall x \in \mathbb{R} \tag{10.7}$$

The series expansion (10.7) extends the one in (9.20). Indeed, let $n = 1$, so that (10.7)
becomes
$$\sum_{k=0}^{\infty}\frac{x^k}{k!}\Delta^k y_1 = e^{-x}\sum_{j=0}^{\infty}\frac{x^j}{j!}\,y_{1+j} \tag{10.8}$$
Assume that $y_j = 1$ for every j. Then, $\Delta^0 y_1 = y_1 = 1$ and $\Delta^k y_1 = 0$ if $k \ge 1$. Hence, (10.8)
becomes
$$1 = e^{-x}\sum_{j=0}^{\infty}\frac{x^j}{j!}$$
which is the series expansion (9.20).

Proof Since $\{y_n\}$ is bounded, the two series in the formula converge. By (10.5), we have to
show that, for each n,
$$\sum_{k=0}^{\infty}\frac{x^k}{k!}\sum_{i=0}^{k}(-1)^{k-i}\binom{k}{i}y_{n+i} = e^{-x}\sum_{j=0}^{\infty}\frac{x^j}{j!}\,y_{n+j} \qquad \forall x \in \mathbb{R} \tag{10.9}$$
In reality, we are going to prove a much stronger fact. Fix an integer $j \ge 0$. We show that
the coefficients of $y_{n+j}$ on the two sides of (10.9) are equal. Clearly, on the right-hand side
this coefficient is $e^{-x}x^j/j!$. As to the left-hand side, note that $y_{n+j}$ appears as soon as $k \ge j$
and its coefficient is
$$\sum_{k=j}^{\infty}\frac{x^k}{k!}(-1)^{k-j}\binom{k}{j}$$
Therefore, it remains to prove that
$$\sum_{k=j}^{\infty}\frac{x^k}{k!}(-1)^{k-j}\binom{k}{j} = e^{-x}\frac{x^j}{j!} \tag{10.10}$$

Set $i = k - j$. Then,
$$\sum_{k=j}^{\infty}\frac{x^k}{k!}(-1)^{k-j}\binom{k}{j} = \sum_{i=0}^{\infty}(-1)^i\frac{x^{i+j}}{(i+j)!}\binom{i+j}{j} = \sum_{i=0}^{\infty}(-1)^i\frac{x^{i+j}}{(i+j)!}\,\frac{(i+j)!}{i!\,j!} = \frac{x^j}{j!}\sum_{i=0}^{\infty}\frac{(-x)^i}{i!} = e^{-x}\,\frac{x^j}{j!}$$
where the last equality follows from Theorem 399, thus proving (10.10) and the statement.

10.2.2 Newton difference formula

The next result, which generalizes Example 422, shows a further analogy between $\Delta$ in
discrete calculus and the derivative in "continuous" calculus. Indeed, in the continuous case
it is necessary to differentiate the power function $x^k$ k times in order to obtain a constant, and
$k+1$ times to get the constant 0. In the discrete case, we must apply the operator $\Delta$ k times
to the sequence $n^k$ (the restriction of the power function to $\mathbb{N}_+$) in order to obtain a
constant, and $k+1$ times to get the constant 0.

Proposition 424 Let $x_n = n^k$ with $k \ge 1$. Then, $\Delta^k n^k = k!$ and
$$\Delta^m n^k = 0 \qquad \forall m > k \tag{10.11}$$
The proof relies on the following lemma of independent interest.

Lemma 425 Let $\{x_n\}$ be a sequence. For every k and for every n, we have $\Delta^{k+1}x_n = \Delta^k(\Delta x_n) = \Delta(\Delta^k x_n)$.

We leave the proof of this lemma to the reader and move on to the proof of the last proposition.

Proof We begin by proving a version of (10.11), namely that
$$\Delta^{k+1} n^s = 0 \qquad \forall k \in \mathbb{N},\ \forall s \in \{0, 1, \ldots, k\} \tag{10.12}$$
We proceed by induction. For $k = 1$, note that s can only be either 0 or 1 and the result
holds in view of the last example. Assume now that $\Delta^{k+1} n^s = 0$ for all $s \in \{0,1,\ldots,k\}$
(induction hypothesis on k); we need to show that $\Delta^{k+2} n^s = 0$ for all $s \in \{0,1,\ldots,k+1\}$.
Let s belong to $\{1,\ldots,k+1\}$: either $s < k+1$ or $s = k+1$. In the first case, by the induction
hypothesis, we have that $\Delta^{k+2} n^s = \Delta\left(\Delta^{k+1} n^s\right) = 0$. In the second case, by using Newton's
binomial, we have
$$\Delta n^{k+1} = (n+1)^{k+1} - n^{k+1} = n^{k+1} + \binom{k+1}{1}n^k + \binom{k+1}{2}n^{k-1} + \cdots + 1 - n^{k+1} = (k+1)n^k + \binom{k+1}{2}n^{k-1} + \cdots + 1$$

Therefore, by the previous lemma we have
$$\Delta^{k+2} n^{k+1} = \Delta^{k+1}\left(\Delta n^{k+1}\right) = \Delta^{k+1}\left((k+1)n^k + \binom{k+1}{2}n^{k-1} + \cdots + 1\right) = (k+1)\,\Delta^{k+1} n^k + \binom{k+1}{2}\,\Delta^{k+1} n^{k-1} + \cdots + \Delta^{k+1}1 = 0 + 0 + \cdots + 0 = 0$$
where the zeroes follow from the induction hypothesis. We conclude that $\Delta^{k+2} n^{k+1} = 0$.
The statement in (10.12) follows. From (10.12) it is then immediate to derive, by induction
on m, equation (10.11) (why?). Next we show that $\Delta^k n^k = k!$. We proceed by induction.
Again, for $k = 1$ the result holds in view of the last example. Assume now that the statement
holds for k (induction hypothesis). We need to show that $\Delta^{k+1} n^{k+1} = (k+1)!$. We then
have
$$\Delta^{k+1} n^{k+1} = \Delta^k\left(\Delta n^{k+1}\right) = \Delta^k\left((k+1)n^k + \binom{k+1}{2}n^{k-1} + \cdots + 1\right) = (k+1)\,\Delta^k n^k + \binom{k+1}{2}\,\Delta^k n^{k-1} + \cdots + \Delta^k 1 = (k+1)\,k! + 0 + \cdots + 0 = (k+1)!$$
where the zeroes follow from (10.12). Summing up, $\Delta^k n^k = k!$, as desired.
That said, in differential calculus a key feature of the powers $x^k$ is that their derivatives
are $kx^{k-1}$. In this respect, the discrete powers $n^k$ are disappointing because their differences
do not take such a form: for instance, for the sequence $x_n = n^2$ we have $\Delta n^2 = 2n + 1 \neq 2n$
(Example 422).
To restore the formula $kx^{k-1}$, we need to introduce the falling factorial $n^{(k)}$ defined by
$$n^{(k)} = \frac{n!}{(n-k)!} = n(n-1)\cdots(n-k+1)$$
with $0 \le k \le n$. Clearly, if $k = n$ we are back to standard factorials, i.e., $n^{(n)} = n!$. Moreover,
binomial coefficients can be expressed in terms of falling factorials as follows:
$$\binom{n}{k} = \frac{n^{(k)}}{k!} \tag{10.13}$$
Proposition 426 We have $\Delta n^{(k)} = k\,n^{(k-1)}$ for every $1 \le k \le n$.

Proof We have
$$\Delta n^{(k)} = (n+1)^{(k)} - n^{(k)} = \frac{(n+1)!}{(n+1-k)!} - \frac{n!}{(n-k)!} = \frac{(n+1)\,n!}{(n+1-k)(n-k)!} - \frac{n!}{(n-k)!} = \left(\frac{n+1}{n+1-k} - 1\right)\frac{n!}{(n-k)!} = \frac{k}{n+1-k}\,n^{(k)}$$
$$= k\,\frac{n(n-1)\cdots(n-k+2)(n-k+1)}{n+1-k} = k\,n(n-1)\cdots(n-k+2) = k\,n^{(k-1)}$$
as desired.

Thus, for finite differences the sequences $x_n = n^{(k)}$ are the analog of the powers in differential
calculus.6 This analogy underlies the next classic difference formula, proved by Isaac Newton
in 1687 in the Principia.
6 Observe that, given k, the terms $x_n = n^{(k)}$ are well defined for $n \ge k$.

Recall that $\Delta^0 x_n = x_n$.

Theorem 427 (Newton) We have
$$x_{n+m} = \sum_{j=0}^{m}\frac{m^{(j)}}{j!}\,\Delta^j x_n \tag{10.14}$$

Proof Before starting, note that for every sequence $\{x_n\}$, and for $n \ge 1$ and $m \ge 1$, equality
(10.14) can be rewritten, via (10.13), as
$$x_{n+m} = \sum_{j=0}^{m}\frac{m!}{j!\,(m-j)!}\,\Delta^j x_n = \sum_{j=0}^{m}\binom{m}{j}\Delta^j x_n$$
Let $\{x_n\}$ be a generic sequence and n a generic element of $\mathbb{N}_+$. We proceed by induction on
m. For $m = 1$ the statement is true; indeed, we have that
$$x_{n+1} = x_n + (x_{n+1} - x_n) = \binom{1}{0}\Delta^0 x_n + \binom{1}{1}\Delta^1 x_n = \sum_{j=0}^{1}\binom{1}{j}\Delta^j x_n$$
Assume now the statement is true for m. We need to show it holds for $m+1$. Note that
$$x_{n+m+1} = x_{n+m} + \Delta x_{n+m} = \sum_{j=0}^{m}\binom{m}{j}\Delta^j x_n + \Delta\left(\sum_{j=0}^{m}\binom{m}{j}\Delta^j x_n\right) = \sum_{j=0}^{m}\binom{m}{j}\Delta^j x_n + \sum_{j=0}^{m}\binom{m}{j}\Delta^{j+1} x_n$$
$$= \binom{m+1}{0}\Delta^0 x_n + \sum_{j=1}^{m}\left[\binom{m}{j} + \binom{m}{j-1}\right]\Delta^j x_n + \binom{m+1}{m+1}\Delta^{m+1} x_n = \sum_{j=0}^{m+1}\binom{m+1}{j}\Delta^j x_n$$
where the second-to-last equality follows from (10.6), proving the statement.

This expansion can be written as

    x_{n+m} − x_n = m Δ x_n + (m(m − 1)/2) Δ^2 x_n + ⋯ + Δ^m x_n

So, it represents the difference between two terms of a sequence via differences of higher orders. It can be viewed as a discrete analog of the Taylor expansion.
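A numerical sketch of formula (10.14) in Python may help (indexing is 0-based below, and the cubic sequence is an arbitrary choice; any sequence works):

    import math

    def diff(seq):
        return [b - a for a, b in zip(seq, seq[1:])]

    def falling(m, j):
        out = 1
        for i in range(j):
            out *= m - i
        return out

    x = [n**3 - 2*n + 7 for n in range(30)]
    n, m = 4, 6
    total, diffs = 0, x
    for j in range(m + 1):
        total += falling(m, j) // math.factorial(j) * diffs[n]   # m^(j)/j! * Delta^j x_n
        diffs = diff(diffs)
    assert total == x[n + m]                                     # Newton's formula (10.14)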

Example 428 Let x_n = n^k with k ≥ 1. By Proposition 424, we have

    x_{n+m} − x_n = m Δ n^k + (m(m − 1)/2) Δ^2 n^k + ⋯ + (m^{(k−1)}/(k − 1)!) Δ^{k−1} n^k + m^{(k)}

provided m ≥ k. N

10.2.3 Asymptotic behavior


The limit of the ratio

    x_n / y_n

is fundamental, as we have seen in the analysis of the order of convergence. Consider the following example.

Example 429 Let x_n = n (−1)^n and y_n = n^2. We have

    x_n / y_n = (−1)^n / n → 0

If we consider their differences we get

    Δx_n / Δy_n = (x_{n+1} − x_n)/(y_{n+1} − y_n) = ((−1)^{n+1} (1 + 2n))/(1 + 2n) = (−1)^{n+1}

So, the ratio Δx_n / Δy_n does not converge. N

Therefore, even if the ratio x_n/y_n converges, the ratio Δx_n/Δy_n of the differences may not. On the other hand, the next result shows that the asymptotic behavior of the ratio Δx_n/Δy_n determines that of x_n/y_n.

Theorem 430 (Cesaro-Stolz) Let {y_n} be a strictly increasing sequence that diverges to infinity, that is, y_n ↑ +∞, and let {x_n} be any sequence. Then,

    lim inf Δx_n/Δy_n ≤ lim inf x_n/y_n ≤ lim sup x_n/y_n ≤ lim sup Δx_n/Δy_n   (10.15)

In particular, this inequality implies that, if the (finite or infinite) limit of the ratio Δx_n/Δy_n exists,^7 we have

    lim inf Δx_n/Δy_n = lim inf x_n/y_n = lim sup x_n/y_n = lim sup Δx_n/Δy_n   (10.16)

that is, x_n/y_n converges to the same limit. Therefore, as stated above, the "regularity" of the asymptotic behavior of the ratio Δx_n/Δy_n implies the "regularity" of the original ratio x_n/y_n. At the same time, if the ratio x_n/y_n presents an "irregular" asymptotic behavior, so does the difference ratio.

Proof In view of (10.3), it is enough to prove that

    lim inf Δx_n/Δy_n ≤ lim inf x_n/y_n

If lim inf Δx_n/Δy_n = −∞ the inequality trivially holds. So, let lim inf Δx_n/Δy_n = L ∈ R. It follows that, for ε > 0, there exists n_ε such that

    L − ε < Δx_n/Δy_n   ∀ n ≥ n_ε

^7 This is the case originally studied by Ernesto Cesaro and Otto Stolz in the 1880s (cf. Cesaro, 1888, p. 54).

Since, by hypothesis, y_{n+1} − y_n > 0 for every n, we have

    (L − ε)(y_{n+1} − y_n) < x_{n+1} − x_n   ∀ n ≥ n_ε

In particular, for every n > n_ε, we obtain

    (L − ε)(y_{n_ε+1} − y_{n_ε}) < x_{n_ε+1} − x_{n_ε}
    (L − ε)(y_{n_ε+2} − y_{n_ε+1}) < x_{n_ε+2} − x_{n_ε+1}
    ⋮
    (L − ε)(y_n − y_{n−1}) < x_n − x_{n−1}

Summing the previous inequalities, we get

    (L − ε)(y_n − y_{n_ε}) < x_n − x_{n_ε}   ∀ n > n_ε

that is,

    L − ε + (x_{n_ε} − (L − ε) y_{n_ε})/y_n < x_n/y_n   ∀ n > n_ε

Since n_ε is a given integer and y_n ↑ +∞ as n → ∞, it follows that

    lim_{n→∞} (x_{n_ε} − (L − ε) y_{n_ε})/y_n = 0

Therefore,

    L − ε ≤ lim inf x_n/y_n

Since ε > 0 is arbitrary, it follows that

    L ≤ lim inf x_n/y_n

as desired. We leave the case lim inf Δx_n/Δy_n = +∞ to the reader.

The previous result can be interpreted as a discrete version of de l'Hospital's Theorem. As de l'Hospital's Theorem is useful in finding limits of functions, in particular when they present indeterminate forms, its discrete Cesaro-Stolz analog proves operationally useful in finding limits of sequences that present indeterminate forms.

Example 431 The limit of the sequence

    log(1 + n)/n   (10.17)

has the indeterminate form ∞/∞. Consider the sequences x_n = log(1 + n) and y_n = n. The sequence (10.17) can then be written as x_n/y_n. We have

    Δx_n/Δy_n = (log(1 + n + 1) − log(1 + n))/1 = log(1 + 1/(1 + n)) → 0

Therefore

    lim log(1 + n)/n = 0

by the Cesaro-Stolz Theorem. N
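The following Python sketch compares the two ratios numerically:

    import math

    # x_n = log(1+n), y_n = n: the difference ratio and the original ratio both vanish
    for n in (10, 1000, 10**6):
        ratio = math.log(1 + n) / n
        diff_ratio = math.log(2 + n) - math.log(1 + n)   # Delta x_n / Delta y_n, since Delta y_n = 1
        print(n, ratio, diff_ratio)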

At a conceptual level, in the next section we will see how the Cesaro-Stolz Theorem allows for a better understanding of convergence criteria for series (see Section 10.4). To this end, the following remarkable consequence of the Cesaro-Stolz Theorem will be crucial.

Corollary 432 Let {x_n} be a sequence such that, eventually, x_n > 0. Then,

    lim inf x_{n+1}/x_n ≤ lim inf ⁿ√x_n ≤ lim sup ⁿ√x_n ≤ lim sup x_{n+1}/x_n   (10.18)

Proof Without loss of generality, let {x_n} be a strictly positive sequence. We have

    log(x_{n+1}/x_n) = log x_{n+1} − log x_n = Δ log x_n   and   log ⁿ√x_n = (1/n) log x_n

Considering the sequences log x_n and y_n = n, (10.15) takes the form

    lim inf Δ(log x_n)/Δy_n ≤ lim inf (log x_n)/y_n ≤ lim sup (log x_n)/y_n ≤ lim sup Δ(log x_n)/Δy_n

that is,

    lim inf log(x_{n+1}/x_n) ≤ lim inf log ⁿ√x_n ≤ lim sup log ⁿ√x_n ≤ lim sup log(x_{n+1}/x_n)

from which (10.18) follows since, for every strictly positive sequence {z_n}, we have

    e^{lim inf z_n} = lim inf e^{z_n}   and   e^{lim sup z_n} = lim sup e^{z_n}

as the reader can check.

We close with a straightforward but remarkable consequence of the Cesaro-Stolz Theorem.

Corollary 433 Let {y_n} be a strictly positive sequence with Σ_{n=1}^∞ y_n = +∞ and let {x_n} be any sequence. Then,

    lim inf x_n/y_n ≤ lim inf (x_1 + ⋯ + x_n)/(y_1 + ⋯ + y_n) ≤ lim sup (x_1 + ⋯ + x_n)/(y_1 + ⋯ + y_n) ≤ lim sup x_n/y_n   (10.19)

Proof Define the partial sums {s^x_n} and {s^y_n} by s^x_n = x_1 + ⋯ + x_n and s^y_n = y_1 + ⋯ + y_n. Then,

    (x_1 + ⋯ + x_n)/(y_1 + ⋯ + y_n) = s^x_n / s^y_n   and   Δs^x_{n−1}/Δs^y_{n−1} = x_n/y_n   ∀ n ≥ 2

By the Cesaro-Stolz Theorem, inequality (10.19) then holds.

This corollary confirms the asymptotic comparison criterion for two series Σ_{n=1}^∞ x_n and Σ_{n=1}^∞ y_n with positive terms that diverge. Indeed, by the inequality (10.19) we have

    x_n ∼ y_n  ⟹  s^x_n ∼ s^y_n   (10.20)

In words, asymptotic equivalence is inherited by partial sums.



10.3 Convergence in mean


10.3.1 In medio stat virtus
The next elegant result, due to Cauchy, is a deterministic version of the law of large numbers,
one of the main results in probability theory.

Theorem 434 (Cauchy) Let {x_n} be a sequence that converges to L ∈ R. Then,

    (x_1 + x_2 + ⋯ + x_n)/n → L

Proof Consider the sequences z_n = x_1 + x_2 + ⋯ + x_n and y_n = n. We have

    Δz_n/Δy_n = x_{n+1}/1 = x_{n+1}

Therefore, from the previous results, it follows that

    lim inf x_{n+1} ≤ lim inf z_n/n ≤ lim sup z_n/n ≤ lim sup x_{n+1}

and, since by hypothesis lim inf x_{n+1} = lim sup x_{n+1} = lim x_n = L, it follows that

    lim z_n/n = lim (x_1 + x_2 + ⋯ + x_n)/n = L

as desired.

The sequence

    (Σ_{i=1}^n x_i)/n

of arithmetic means thus always converges to the limit of the sequence {x_n}, when the latter exists, whereas the converse does not hold: the sequence of means may converge while the original one does not.

Example 435 The alternating sequence x_n = (−1)^n does not converge, whereas

    (Σ_{k=1}^n x_k)/n → 0

Indeed,

    (x_1 + x_2 + ⋯ + x_n)/n = 0 if n is even, and −1/n if n is odd.

N
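In Python, a minimal sketch of this smoothing effect:

    # arithmetic means of the alternating sequence x_n = (-1)^n
    means, s = [], 0
    for n in range(1, 10001):
        s += (-1) ** n
        means.append(s / n)
    print(means[-1])   # close to 0, although the sequence itself oscillates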

Therefore, the sequence of means is more "stable" than the original one: by averaging, we smooth out the behavior of a sequence. This motivates the following, more general, definition of limit of a sequence, named after Ernesto Cesaro. It is fundamental in probability theory (and in its applications).

Definition 436 We say that a sequence {x_n} converges in the sense of Cesaro (or in mean) to L, written x_n →_C L, when

    (x_1 + x_2 + ⋯ + x_n)/n → L
From the last result, it follows that ordinary convergence to a limit implies Cesaro convergence to the same limit. The converse does not hold: we may have Cesaro convergence without ordinary convergence.

Example 437 The alternating sequence x_n = (−1)^n from the last example does not converge, but it converges in the sense of Cesaro, i.e., (−1)^n →_C 0. N

It is useful to find conditions under which the converse holds, so that the convergence of the sequence of means implies that of the original sequence. Such results are called Tauberian theorems. We state one of them as an example. To this end, we say that a sequence of scalars is one-sided bounded when it is bounded below or above.

Theorem 438 (Landau) Let {x_n} be a sequence with a one-sided bounded auxiliary sequence {n Δx_n}. Given L ∈ R, we have

    x_n →_C L  ⟺  x_n → L
C
Proof By Theorem 434, we only need to prove that Cesaro convergence implies ordinary convergence. So, suppose x_n →_C L. We want to show that x_n → L. We prove this implication under the stronger condition that the sequence {n Δx_n} converges (so, it is bounded). We begin with a claim.

Claim It holds that n Δx_n → 0.

Proof of the Claim Set L′ = lim n Δx_n. We have:^8

    L′ + L = lim n Δx_n + lim x_{n+1} = lim (n Δx_n + x_{n+1}) = lim ((n + 1) x_{n+1} − n x_n)
           = lim ((n + 1) x_{n+1} − n x_n)/((n + 1) − n) = lim (n x_n)/n = L

where the penultimate equality follows from the Cesaro-Stolz Theorem. Thus, L′ + L = L and so L′ = 0.

The sequence n Δx_n thus converges to 0. Next we prove that, for each n ≥ 1,

    x_{n+1} − (x_1 + ⋯ + x_{n+1})/(n + 1) = (Σ_{k=1}^n k Δx_k)/(n + 1)   (10.21)

Let us proceed by induction. Initial step: for n = 1,

    x_2 − (x_1 + x_2)/2 = (2x_2 − x_1 − x_2)/2 = (x_2 − x_1)/2 = Δx_1/2

^8 We follow Cesaro (1894), p. 108.

Induction step: suppose (10.21) holds for n. We want to show that it holds for n + 1. Indeed,

    x_{n+2} − (x_1 + ⋯ + x_{n+2})/(n + 2)
      = ((n + 2) x_{n+2} − x_1 − ⋯ − x_{n+2})/(n + 2)
      = ((n + 1)/(n + 2)) · ((n + 1) x_{n+2} − x_1 − ⋯ − x_{n+1})/(n + 1)
      = ((n + 1)/(n + 2)) · ( ((n + 1)(x_{n+2} − x_{n+1}))/(n + 1) + x_{n+1} − (x_1 + ⋯ + x_{n+1})/(n + 1) )
      = ((n + 1)/(n + 2)) · ( ((n + 1) Δx_{n+1})/(n + 1) + (Σ_{k=1}^n k Δx_k)/(n + 1) )
      = (1/(n + 2)) ( (n + 1) Δx_{n+1} + Σ_{k=1}^n k Δx_k ) = (Σ_{k=1}^{n+1} k Δx_k)/(n + 2)

as desired. We conclude that formula (10.21) holds. As x_n →_C L, by this formula we have

    (Σ_{k=1}^n k Δx_k)/(n + 1) → 0  ⟺  x_n → L   (10.22)

By Theorem 434, we then have

    n Δx_n → 0  ⟹  (Σ_{k=1}^n k Δx_k)/(n + 1) → 0  ⟺  x_n → L

as desired.

In particular, the one-sided boundedness hypothesis of Landau's Theorem is always satisfied when the sequence {x_n} is monotone: a monotone sequence converges to L if and only if it Cesaro converges to L.
Whenever a sequence does not converge in mean, we may consider the sequence of the "means of the means", which, by the previous results, is more likely to converge than the sequence of means: this is called (C, 2) convergence. The idea can be extended to the means of the means iterated k times. We will not consider such cases.^9 However, the fundamental principle is that means tend to smooth the behavior of a sequence. In various fashions, often stochastic (an example is the law of large numbers previously mentioned), this principle is of central importance in applications. In medio stat virtus.

10.3.2 Creatio ex nihilo


The previous analysis has a particularly interesting application to the sequence of partial
sums. Indeed, if we consider the Cesaro limit of the sequence of partial sums fsn g, we can
C P C
extend the concept of summation of aP series: if sn ! S we write 1 n=1 xn = S and say
1
that S is the Cesaro sum of the series n=1 xn .
Highly divergent series become convergent according to this broader de nition. Consider
a famous example.
^9 We refer interested readers to Hardy (1949).

Example 439 The series, named after Grandi,

    1 − 1 + 1 − 1 + ⋯ = Σ_{n=1}^∞ (−1)^{n+1}

does not converge. Its partial sums

    s_1 = 1, s_2 = 0, s_3 = 1, s_4 = 0, s_5 = 1, ...

lead to the following sequence of means (of partial sums):

    y_1 = 1, y_2 = (1 + 0)/2 = 1/2, y_3 = 2/3, y_4 = 2/4 = 1/2, y_5 = 3/5, ...

In general, observe that the relation between the means at n and at n + 1 is

    y_{n+1} = (n/(n + 1)) y_n + (1/(n + 1)) s_{n+1}

An easy induction argument yields that y_n is equal to 1/2 when n is even and

    y_n = ((n − 1)/n) y_{n−1} + (1/n) s_n = ((n − 1)/n)(1/2) + 1/n = 1/2 + 1/(2n)

when n is odd and ≥ 3; when n = 1 we have y_n = 1. Therefore, y_n → 1/2, so

    Σ_{n=1}^∞ (−1)^{n+1} =_C 1/2

Grandi's series converges in the sense of Cesaro, with sum 1/2.
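A numerical sketch in Python of the computation above:

    # Cesaro means of the partial sums of Grandi's series 1 - 1 + 1 - ...
    s, running, N = 0, 0, 100001
    for n in range(1, N + 1):
        s += (-1) ** (n + 1)   # partial sum s_n, oscillating between 1 and 0
        running += s
    print(running / N)         # approximately 1/2, the Cesaro sum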

Even if this is not his main scientific contribution, the name of Guido Grandi is remembered for his treatment of this series. It is curious to note that, until the mid-nineteenth century, even the greatest mathematicians believed, like Grandi, that this series summed to 1/2. Until then, mathematics had been developing untidily: highly complex theorems were known, but the attention to well-posed definitions and to rigor that we are now used to was lacking.
The monk Guido Grandi proposed the following explanation, which contains two mistakes. First of all, he identified

    1 − 1 + 1 − 1 + 1 − 1 + ⋯

as a geometric series with common ratio q = −1 (correct) and therefore having sum

    1/(1 − q) = 1/(1 − (−1)) = 1/2

(wrong: the geometric series converges only when |q| < 1). In an unfortunate crescendo, by pairing the addends (wrong: the associative property does not generally hold for series; cf. Section 9.4.2), Grandi then derived the equality

    (1 − 1) + (1 − 1) + ⋯ = 0 + 0 + ⋯

in order to conclude that

    1/2 = 0 + 0 + ⋯

That is, the sum of infinitely many zeroes is equal to 1/2. This led him not to deny the existence of God, but to deem as irrelevant his intervention in the creation. Indeed, even without divine intervention something can come out of nothing (if you wait long enough): creatio ex nihilo.
That said, in a sense Grandi's intuition that the sum of the series is 1/2 can be vindicated through Cesaro convergence. We close with a Tauberian remark: as the reader can check, when the auxiliary sequence {n x_n} is one-sided bounded (a condition that Grandi's series fails), for S ∈ R we have

    Σ_{n=1}^∞ x_n =_C S  ⟺  Σ_{n=1}^∞ x_n = S

that is, Cesaro sums reduce to ordinary ones.^10 In particular, this condition holds when x_n = O(1/n).

10.4 Convergence criteria for series


The results of this chapter allow us to achieve a better understanding of the convergence criteria for series provided in Section 9.3.^11 We begin with a lemma of independent interest.

Lemma 440 Let {x_n} be a sequence and k ∈ R. There exists q < k such that, eventually, x_n ≤ q if and only if

    lim sup x_n < k   (10.23)

Proof "Only if". Suppose that there exists q < k such that eventually x_n ≤ q. There exists n̄ such that x_n ≤ q for every n ≥ n̄. Therefore, for any such n we have sup_{m≥n} x_m ≤ q, which implies

    lim sup x_n = lim_{n→∞} sup_{m≥n} x_m ≤ q < k

"If". Suppose that (10.23) holds. Set L = lim_{n→∞} sup_{m≥n} x_m. Since L < k, for every ε > 0 there exists n̄ such that

    | sup_{m≥n} x_m − L | < ε   ∀ n ≥ n̄

that is,

    L − ε < sup_{m≥n} x_m < L + ε   ∀ n ≥ n̄

If we choose ε sufficiently small so that L + ε < k, by setting q = L + ε we obtain the desired condition.

This lemma has the following consequence for sequences of ratios, our main object of
interest here.
^10 This is the version of Landau's Theorem originally considered in 1910 by Edmund Landau.
^11 For the sake of brevity, we only consider series. Nonetheless, similar considerations hold for sequences (Section 8.11). Example 431 is explanatory.

Lemma 441 Let {x_n} be a sequence with, eventually, x_n > 0. There exists q < 1 such that, eventually, x_{n+1}/x_n ≤ q if and only if

    lim sup x_{n+1}/x_n < 1   (10.24)

Proof In view of the last lemma, it is enough to consider the sequence with term y_n = x_{n+1}/x_n and to put k = 1.

Analogously, we can prove that eventually x_{n+1}/x_n ≥ 1 if

    lim inf x_{n+1}/x_n > 1   (10.25)

and only if

    lim inf x_{n+1}/x_n ≥ 1   (10.26)

Therefore, the condition "eventually x_{n+1}/x_n ≥ 1" implies (10.26) and is implied by (10.25). However, we cannot prove anything more. The constant sequence x_n = 1 shows that the aforementioned condition may hold even if (10.25) does not hold, whereas the sequence {1/n} shows that (10.26) may hold even if the condition is violated.

The previous analysis leads to the following corollary, useful in computations, in which the ratio criterion is expressed in terms of limits.

Corollary 442 Let Σ_{n=1}^∞ x_n be a series with, eventually, x_n > 0.

(i) If

    lim sup x_{n+1}/x_n < 1

then the series converges.

(ii) If

    lim inf x_{n+1}/x_n > 1

then the series diverges positively.

Note that, thanks to Lemma 441, point (i) is equivalent to point (i) of Proposition 395. In contrast, point (ii) is weaker than point (ii) of Proposition 395 since condition (10.25) is only sufficient, but not necessary, for x_{n+1}/x_n ≥ 1 to hold eventually.

As shown by the following examples, this specification of the ratio criterion is particularly useful when the limit

    lim x_{n+1}/x_n

exists, that is, whenever

    lim x_{n+1}/x_n = lim sup x_{n+1}/x_n = lim inf x_{n+1}/x_n

In this particular case, the ratio criterion takes the useful tripartite form of Proposition 393:

(i) if lim x_{n+1}/x_n < 1, the series converges;

(ii) if lim x_{n+1}/x_n > 1, the limit of the series is +∞;

(iii) if lim x_{n+1}/x_n = 1, the criterion fails and does not determine the character of the series.

As we have seen in Section 8.11, this form of the ratio criterion is the one usually used in applications. Examples 394 and 397 have shown cases (i) and (ii). The unfortunate case (iii) is well exemplified by Σ_{n=1}^∞ 1/n and Σ_{n=1}^∞ 1/n².

The next convergence criterion is, from a theoretical point of view, the most powerful one.

Proposition 443 (Root criterion) Let Σ_{n=1}^∞ x_n be a series with positive terms.

(i) If there exists a number q < 1 such that, eventually,

    ⁿ√x_n ≤ q

then the series converges.

(ii) If instead ⁿ√x_n ≥ 1 for infinitely many values of n, then the series diverges.

Proof From ⁿ√x_n ≤ q we immediately have that 0 ≤ x_n ≤ qⁿ and, by using the comparison criterion and the convergence of the geometric series, the statement follows. If instead ⁿ√x_n ≥ 1 for infinitely many values of n, for those n we have x_n ≥ 1, so x_n cannot tend to 0.

Let us see the limit form of this result. In view of Lemma 440, point (i) can be equivalently stated as

    lim sup ⁿ√x_n < 1

As to point (ii), it requires that ⁿ√x_n ≥ 1 for infinitely many values of n, that is, that there is a subsequence {n_k} such that the n_k-th root of x_{n_k} is ≥ 1 for every k. Such a condition holds if

    lim sup ⁿ√x_n > 1   (10.27)

and only if

    lim sup ⁿ√x_n ≥ 1   (10.28)

The constant sequence x_n = 1 exemplifies how condition (10.28) can hold even if (10.27) does not. The sequence x_n = (1 − 1/n)ⁿ, on the other hand, shows how even condition (ii) of Proposition 443 may fail although (10.28) holds. It is, therefore, clear that (10.27) implies point (ii) of Proposition 443, which in turn implies (10.28), but that the opposite implications do not hold.

All this brings us to the following limit form, in which point (i) is equivalent to that of Proposition 443, while point (ii) is weaker than its counterpart since, as we have seen above, condition (10.27) is only a sufficient condition for ⁿ√x_n ≥ 1 to hold for infinitely many values of n.

Corollary 444 (Root criterion in limit form) Let Σ_{n=1}^∞ x_n be a series with positive terms.

(i) If lim sup ⁿ√x_n < 1, the series converges.

(ii) If lim sup ⁿ√x_n > 1, the series diverges positively.

Proof If lim sup ⁿ√x_n < 1, we have that ⁿ√x_n ≤ q for some q < 1, eventually. The desideratum follows from Proposition 443. If lim sup ⁿ√x_n > 1, then ⁿ√x_n ≥ 1 for infinitely many values of n, and the result follows from Proposition 443.

As for the limit form of the ratio criterion, also that of the root criterion is particularly useful when lim ⁿ√x_n exists. Under such circumstances the criterion takes the following tripartite form:

(i) if lim ⁿ√x_n < 1, the series converges;

(ii) if lim ⁿ√x_n > 1, the series diverges positively;

(iii) if lim ⁿ√x_n = 1, the criterion fails and does not determine the character of the series.

The tripartite form of the root criterion is, like that of the ratio criterion, the most useful computationally. Nonetheless, we hope that the reader will always keep in mind the theoretical underpinnings of the criterion: "ye were not made to live like unto brutes, but for pursuit of virtue and of knowledge", as Dante's Ulysses famously remarked.^12

Example 445 (i) Let q > 0. The series

    Σ_{n=1}^∞ qⁿ/nⁿ

converges, as

    ⁿ√(qⁿ/nⁿ) = q/n → 0

^12 "fatti non foste a viver come bruti, ma per seguir virtute e canoscenza", Inferno, Canto XXVI.

(ii) Let 0 ≤ q < 1. The series Σ_{n=1}^∞ n^k qⁿ converges for every k: indeed

    ⁿ√(n^k qⁿ) = q n^{k/n} → q

because n^{k/n} → 1 (since log n^{k/n} = (k/n) log n → 0). N
The ratio and root criteria are based on the behavior of the sequences {x_{n+1}/x_n} and {ⁿ√x_n}, which are related via the important inequalities (10.18). In particular, if lim x_{n+1}/x_n exists, we have

    lim x_{n+1}/x_n = lim ⁿ√x_n   (10.29)

and so the two criteria are equivalent in their limit form. However, if lim x_{n+1}/x_n does not exist, we still obtain from (10.18) that

    lim sup x_{n+1}/x_n < 1  ⟹  lim sup ⁿ√x_n < 1

and

    lim inf x_{n+1}/x_n > 1  ⟹  lim sup ⁿ√x_n > 1

This suggests that the root criterion is more powerful than the ratio criterion in determining convergence: whenever the ratio criterion rules in favor of convergence or of divergence, we would have reached the same conclusion by using the root criterion. The opposite does not hold, as the next example shows: the ratio criterion fails while the root criterion determines that the series in question converges.

Example 446 Consider the sequence^13

    x_n = 1/2ⁿ if n is odd,  x_n = 1/2^{n−2} if n is even

that is,

    1/2 + 1 + 1/8 + 1/4 + 1/32 + 1/16 + 1/128 + 1/64 + ⋯

We have

    x_{n+1}/x_n = (1/2^{n−1})/(1/2ⁿ) = 2 if n is odd,  x_{n+1}/x_n = (1/2^{n+1})/(1/2^{n−2}) = 1/8 if n is even

and

    ⁿ√x_n = 1/2 if n is odd,  ⁿ√x_n = ⁿ√4 / 2 if n is even

so that

    lim sup x_{n+1}/x_n = 2,  lim inf x_{n+1}/x_n = 1/8

and

    lim sup ⁿ√x_n = 1/2

The ratio criterion thus fails, while the root criterion tells us that the series converges. N

^13 See Rudin (1976), p. 67.
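In Python, a minimal sketch of this computation (the function name x is ours):

    def x(n):   # n = 1, 2, 3, ...
        return 2.0 ** -n if n % 2 == 1 else 2.0 ** -(n - 2)

    ratios = [x(n + 1) / x(n) for n in range(1, 40)]
    roots = [x(n) ** (1.0 / n) for n in range(1, 40)]
    print(max(ratios), min(ratios))   # 2.0 and 0.125: the ratio oscillates
    print(roots[-2:])                 # approaching 1/2 < 1: the root test succeeds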

Even though the root criterion is more powerful, the ratio criterion can still be useful as it is generally easier to compute the limit of ratios than that of roots. The root criterion may be more powerful from a theoretical standpoint, but harder to use from a computational perspective.
In light of this, when using the criteria for solving problems, one should first check whether lim x_{n+1}/x_n exists and, if it does, compute it. In such a case, thanks to (10.29), we also know the value of lim ⁿ√x_n and thus we can use the more powerful root criterion. In the unfortunate case in which lim x_{n+1}/x_n does not exist, and we can at best compute lim sup x_{n+1}/x_n and lim inf x_{n+1}/x_n, we can either use the less powerful ratio criterion (which may fail, as we have seen in the previous example), or we may try to compute lim sup ⁿ√x_n directly, hoping the limit exists (as in the previous example) so that the root criterion can be used in its handier limit form.
Finally, note that, however powerful it may be, the root criterion (and, a fortiori, the weaker ratio criterion) only gives a sufficient condition for convergence, as the following example shows.

Example 447 The series

    Σ_{n=1}^∞ 1/n²

converges. However, recalling Example 345, we have that

    lim ⁿ√(1/n²) = lim ⁿ√(1/n) · lim ⁿ√(1/n) = 1

N
The root criterion is of no help in determining whether the simple series Σ_{n=1}^∞ n^{−2} converges. The reason behind such a "failure" is evident: if lim sup ⁿ√x_n < 1, then eventually ⁿ√x_n ≤ q, that is, eventually x_n ≤ qⁿ for some q ∈ (0, 1). In words, x_n needs to converge to zero at least as fast as qⁿ. This is not the case for series with term n^{−2} and, more generally, with term n^{−k} and k ≥ 1.

10.5 Multiplication of series


The operation of term-by-term addition of convergent series preserves limits: if we add two convergent series Σ_{n=1}^∞ x_n and Σ_{n=1}^∞ y_n with sums S^x and S^y, their sum Σ_{n=1}^∞ (x_n + y_n) converges to the limits' sum S^x + S^y.^14
This property does not hold for the analog operation of multiplication: the series' term-by-term product Σ_{n=1}^∞ x_n y_n does not converge, in general, to the limits' product S^x S^y, as the next example shows.

Example 448 Take x_n = y_n = (−1)ⁿ/√n. The series Σ_{n=1}^∞ x_n = Σ_{n=1}^∞ y_n converges (see Example 405). Yet, Σ_{n=1}^∞ x_n y_n = Σ_{n=1}^∞ 1/n = +∞. N

In this section we present a notion of product of series that preserves limits. To this end, in the next subsection we first introduce an important operation on sequences.

^14 Throughout this section, L^x denotes the limit of a sequence x = {x_n} and S^x the sum of a series Σ_{n=0}^∞ x_n.

10.5.1 Convolutions of sequences


Definition 449 The convolution of two sequences x = {x_n} and y = {y_n} is the sequence x * y with term

    (x * y)_n = x_1 y_n + x_2 y_{n−1} + ⋯ + x_n y_1

Thus, the operation of convolution associates to two sequences x = {x_n} and y = {y_n} the sequence

    x * y = {x_1 y_1, x_1 y_2 + x_2 y_1, x_1 y_3 + x_2 y_2 + x_3 y_1, ...}

Through sums we can write the n-th term of a convolution in two equivalent ways:

    (x * y)_n = Σ_{k=1}^n x_k y_{n+1−k} = Σ_{k+m=n+1} x_k y_m   (10.30)

It is easy to see that the operation of convolution is commutative and associative.
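A short Python sketch of Definition 449 (0-based lists stand in for the 1-based sequences of the text):

    def convolve(x, y):
        n = min(len(x), len(y))
        return [sum(x[k] * y[i - k] for k in range(i + 1)) for i in range(n)]

    x = [1, 2, 3, 4]
    ones = [1, 1, 1, 1]
    print(convolve(x, ones))                        # [1, 3, 6, 10]: with y constant, (x*y)_n gives partial sums
    print(convolve(x, ones) == convolve(ones, x))   # True: convolution is commutative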


Next we present an elegant generalization of Theorem 434.

Proposition 450 If two sequences x = fxn g and y = fyn g converge, then


C
(x y)n ! Lx Ly

The convolution of two convergent sequences thus converges in the sense of Cesaro to the
product of their limits. That is,
x1 yn + x2 yn 1 + + xn y1
! Lx Ly
n
Theorem 434 is the special case when yn = 1 for all n.

Proof We have

    (x * y)_n/n = (x_1 y_n + x_2 y_{n−1} + ⋯ + x_n y_1)/n   (10.31)
                = ((x_1 − L^x) y_n + (x_2 − L^x) y_{n−1} + ⋯ + (x_n − L^x) y_1)/n + L^x (Σ_{k=1}^n y_k)/n

The convergent sequence {y_n} is bounded (Proposition 322), so there exists a positive scalar M such that |y_n| ≤ M for all n. Thus,

    | ((x_1 − L^x) y_n + (x_2 − L^x) y_{n−1} + ⋯ + (x_n − L^x) y_1)/n |
      ≤ M (|x_1 − L^x| + |x_2 − L^x| + ⋯ + |x_n − L^x|)/n → 0

where the convergence to zero holds because, by Theorem 434, (1/n) Σ_{k=1}^n |x_k − L^x| → 0. Again by Theorem 434, we have (1/n) Σ_{k=1}^n y_k → L^y. From (10.31) it then follows that (x * y)_n/n → L^x L^y, as desired.

A stronger form of convergence is needed to establish ordinary convergence.


Proposition 451 If two sequences x = {x_n} and y = {y_n} converge, then

    (x * y)_n/n → L^x L^y

provided at least one of the series Σ_{n=1}^∞ |y_n − L^y| and Σ_{n=1}^∞ |x_n − L^x| converges.
P1
Proof Assume that Σ_{n=1}^∞ |y_n − L^y| < +∞. We begin with a claim.

Claim If Σ_{n=1}^∞ |y_n| < +∞ and x_n → 0, then (y * x)_n → 0.

Proof of the Claim Let ε > 0. Since x_n → 0, there exists n^x_ε ≥ 1 such that |x_n| ≤ ε/Σ_{n=1}^∞ |y_n| for all n ≥ n^x_ε. Since y_n → 0, there exists n^y_ε ≥ 1 such that |y_n| ≤ ε/Σ_{n=1}^{n^x_ε} |x_n| for all n ≥ n^y_ε. Hence, given any n > n^x_ε + n^y_ε, we have

    |y_{n+1−k}| ≤ ε/Σ_{n=1}^{n^x_ε} |x_n|   ∀ k = 1, ..., n^x_ε

because n + 1 − k ≥ n^y_ε for each k = 1, ..., n^x_ε. Thus,

    |(y * x)_n| = |y_1 x_n + y_2 x_{n−1} + ⋯ + y_n x_1| ≤ |y_1||x_n| + |y_2||x_{n−1}| + ⋯ + |y_n||x_1|
                = Σ_{k=1}^{n^x_ε} |y_{n+1−k}||x_k| + Σ_{k=n^x_ε+1}^{n} |y_{n+1−k}||x_k|
                ≤ (ε/Σ_{n=1}^{n^x_ε} |x_n|) Σ_{k=1}^{n^x_ε} |x_k| + (ε/Σ_{n=1}^∞ |y_n|) Σ_{k=n^x_ε+1}^{n} |y_{n+1−k}| ≤ ε + ε

We conclude that (y * x)_n → 0.

By the Claim,

    ((x_1 − L^x)(y_n − L^y) + (x_2 − L^x)(y_{n−1} − L^y) + ⋯ + (x_n − L^x)(y_1 − L^y))/n → 0   (10.32)

Yet,

    ((x_1 − L^x)(y_n − L^y) + (x_2 − L^x)(y_{n−1} − L^y) + ⋯ + (x_n − L^x)(y_1 − L^y))/n
      = (x_1 y_n + x_2 y_{n−1} + ⋯ + x_n y_1)/n − L^y s^x_n/n − L^x s^y_n/n + L^x L^y

By Theorem 434,

    lim L^y s^x_n/n = lim L^x s^y_n/n = L^x L^y

In view of (10.32), this implies

    (x_1 y_n + x_2 y_{n−1} + ⋯ + x_n y_1)/n → L^x L^y

as desired.

10.5.2 Cauchy products of series


Next we introduce a notion of product of series obtained by convolving their terms.

Definition 452 The Cauchy product of two series Σ_{n=1}^∞ x_n and Σ_{n=1}^∞ y_n is the series Σ_{n=1}^∞ (x * y)_n.

This apparently ad hoc notion of product, introduced by Cauchy in 1821, turns out to be the sought-after one that preserves limits, as the following classic result, proved by Franz Mertens in 1875, shows.
Theorem 453 (Mertens) Let Σ_{n=1}^∞ x_n and Σ_{n=1}^∞ y_n be two convergent series. If at least one of them converges absolutely, then their Cauchy product converges and

    Σ_{n=1}^∞ (x * y)_n = S^x S^y

Under a simple hypothesis of absolute convergence, the product of the sums of two convergent series is thus equal to the sum of their Cauchy product. We can therefore regard the map that associates to two convergent series their Cauchy product as a multiplication operation on convergent series.
The proof is a simple consequence of Proposition 451, which indeed can be seen as a version of Mertens' Theorem for convolutions, along with the following interesting lemma.
Lemma 454 It holds that

    s^{x*y}_n = s^x_1 y_n + s^x_2 y_{n−1} + ⋯ + s^x_n y_1   (10.33)

and

    s^{x*y}_1 + ⋯ + s^{x*y}_n = s^x_1 s^y_n + s^x_2 s^y_{n−1} + ⋯ + s^x_n s^y_1   (10.34)

Proof It holds that

    s^{x*y}_n = Σ_{k=1}^n (x * y)_k = x_1 y_1 + (x_1 y_2 + x_2 y_1) + (x_1 y_3 + x_2 y_2 + x_3 y_1) + ⋯ + (x_1 y_n + x_2 y_{n−1} + ⋯ + x_n y_1)
              = s^x_1 y_n + s^x_2 y_{n−1} + ⋯ + s^x_n y_1

Thus,

    s^{x*y}_1 + s^{x*y}_2 + s^{x*y}_3 + ⋯ + s^{x*y}_n
      = s^x_1 y_1 + (s^x_1 y_2 + s^x_2 y_1) + (s^x_1 y_3 + s^x_2 y_2 + s^x_3 y_1) + ⋯ + (s^x_1 y_n + s^x_2 y_{n−1} + ⋯ + s^x_n y_1)
      = s^x_1 (y_1 + ⋯ + y_n) + s^x_2 (y_1 + ⋯ + y_{n−1}) + ⋯ + s^x_n y_1
      = s^x_1 s^y_n + s^x_2 s^y_{n−1} + ⋯ + s^x_n s^y_1

as desired.
Proof of Mertens' Theorem Assume that Σ_{n=1}^∞ |y_n| < +∞. By (10.33), we have

    s^{x*y}_n = s^x_1 y_n + s^x_2 y_{n−1} + ⋯ + s^x_n y_1
              = (s^x_1 − S^x + S^x) y_n + (s^x_2 − S^x + S^x) y_{n−1} + ⋯ + (s^x_n − S^x + S^x) y_1
              = (s^x_1 − S^x) y_n + (s^x_2 − S^x) y_{n−1} + ⋯ + (s^x_n − S^x) y_1 + S^x (y_n + y_{n−1} + ⋯ + y_1)
              = (s^x_1 − S^x) y_n + (s^x_2 − S^x) y_{n−1} + ⋯ + (s^x_n − S^x) y_1 + S^x s^y_n

By Proposition 451, we have

    (s^x_1 − S^x) y_n + (s^x_2 − S^x) y_{n−1} + ⋯ + (s^x_n − S^x) y_1 → 0

Since S^x s^y_n → S^x S^y, we conclude that Σ_{n=1}^∞ (x * y)_n = lim s^{x*y}_n = S^x S^y, as desired.

Example 455 Consider the series expansion of the exponential function (9.20), i.e.,

    e^x = Σ_{n=0}^∞ xⁿ/n!

The series Σ_{n=0}^∞ xⁿ/n! converges absolutely for each x ∈ R (why?). Along with the Newton binomial formula, Mertens' Theorem then implies:^15

    e^x e^y = Σ_{n=0}^∞ ( Σ_{k=0}^n (x^k/k!)(y^{n−k}/(n − k)!) ) = Σ_{n=0}^∞ (1/n!) Σ_{k=0}^n \binom{n}{k} x^k y^{n−k}
            = Σ_{n=0}^∞ (x + y)ⁿ/n! = e^{x+y}

Thus, a Cauchy product underlies the standard multiplication rule for exponential functions. N
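A numerical sketch in Python of this Cauchy product (truncated at N terms):

    import math

    N, x, y = 25, 0.7, -0.3
    a = [x**n / math.factorial(n) for n in range(N)]   # coefficients of the series for e^x
    b = [y**n / math.factorial(n) for n in range(N)]   # coefficients of the series for e^y
    cauchy = [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(N)]
    print(sum(cauchy), math.exp(x + y))                # both approximately e^{x+y}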

The next example shows the importance of absolute convergence in Mertens' Theorem.

Example 456 As in Example 448, take x_n = y_n = (−1)ⁿ/√n. It holds that

    |(x * y)_n| = | Σ_{k=1}^n (−1)^{n+1}/(√k √(n + 1 − k)) | = Σ_{k=1}^n 1/√(k(n + 1 − k))
                ≥ Σ_{k=1}^n 1/√(kn) ≥ Σ_{k=1}^n 1/√(n²) = Σ_{k=1}^n 1/n = 1

The Cauchy product Σ_{n=1}^∞ (x * y)_n thus fails to converge, with Σ_{n=1}^∞ |(x * y)_n| diverging positively, even though both series Σ_{n=1}^∞ x_n = Σ_{n=1}^∞ y_n converge. N

In sum, the ordinary convergence of the Cauchy product does not hold under plain convergence. Yet, a remarkable convergence result, due to Cesaro (1890), still holds.^16

Theorem 457 (Cesaro) Let Σ_{n=1}^∞ x_n and Σ_{n=1}^∞ y_n be two convergent series. Their Cauchy product converges in the sense of Cesaro and

    Σ_{n=1}^∞ (x * y)_n =_C S^x S^y

^15 As previously (foot)noted, for series that start at n = 0, the convolution formula (10.30) is easily seen to become (x * y)_n = Σ_{k=0}^n x_k y_{n−k}.
^16 This result marked the beginning of the systematic study of divergent series, as summarized about sixty years later in Hardy (1949).

Thus, it is the Cesaro sum of the Cauchy product that equals the product of the sums of two convergent series. A suitable Tauberian theorem may then ensure that this Cesaro sum is actually an ordinary one.

Proof By (10.34),

    s^{x*y}_1 + ⋯ + s^{x*y}_n = s^x_1 s^y_n + s^x_2 s^y_{n−1} + ⋯ + s^x_n s^y_1

By Proposition 450, from s^x_n → S^x and s^y_n → S^y it follows that

    (s^x_1 s^y_n + s^x_2 s^y_{n−1} + ⋯ + s^x_n s^y_1)/n → S^x S^y

In turn, this implies that (s^{x*y}_1 + ⋯ + s^{x*y}_n)/n → S^x S^y, that is, s^{x*y}_n →_C S^x S^y, as desired.

Cesaro's Theorem has, as an immediate corollary, an earlier result that Niels Abel proved in 1826.

Corollary 458 (Abel) Let Σ_{n=1}^∞ x_n and Σ_{n=1}^∞ y_n be two convergent series. If their Cauchy product converges, then

    Σ_{n=1}^∞ (x * y)_n = S^x S^y

Proof In view of Cesaro's Theorem, it follows from Theorem 434.

10.6 Infinitely often: a second key adverb


10.6.1 Tail bounds
Recall that a sequence {x_n} is bounded below if there exists k ∈ R such that x_n ≥ k for every n ≥ 1, while it is bounded above if the inequality is reversed. Finally, a sequence is bounded when it is bounded both above and below.
These notions are important, yet it is natural to wonder whether there exist tail versions of them, in which only the tail behavior of the sequence matters. Here we show that the limits inferior and superior permit us to establish such tail versions. We begin with a basic case.

Proposition 459 Let {x_n} be a bounded sequence and k ∈ R. Consider the following properties:

(i) x_n ≥ k eventually;

(ii) lim inf x_n ≥ k;

(iii) x_n ≥ k − ε eventually, for each ε > 0.

We have (i) ⟹ (ii) ⟺ (iii).



So, a lower bound for the limit inferior is, up to an arbitrarily small ε > 0, also a "tail" lower bound for the sequence, one that holds eventually. To see that (ii) does not imply (i) it is enough to consider the sequence x_n = −1/n: we have lim x_n = 0 though x_n < 0 for all n ≥ 1.

Proof (i) easily implies (ii). We prove that (ii) implies (iii). Let lim inf x_n ≥ k. Fix ε > 0. By the definition of limit inferior, there exists n_ε ≥ 1 such that z_n = inf_{m≥n} x_m > k − ε for all n ≥ n_ε. Since x_n ≥ z_n for all n ≥ 1, we conclude that x_n > k − ε eventually.
(iii) implies (ii). Assume that, for each ε > 0, there exists n_ε ≥ 1 such that x_n ≥ k − ε for all n ≥ n_ε. In turn, this implies z_n = inf_{m≥n} x_m ≥ k − ε for all n ≥ n_ε, so that lim inf x_n = lim z_n ≥ k − ε. Since this inequality holds for each ε > 0, we conclude that lim inf x_n ≥ k.

If we reverse the inequality of Proposition 459 from ≥ to <, we get the following interesting result.

Proposition 460 Let {x_n} be a bounded sequence and k ∈ R. Then:

(i) if lim inf x_n < k, for each n ≥ 1 there exists n̄ ≥ n such that x_n̄ < k;

(ii) conversely, if for each n ≥ 1 there exists n̄ ≥ n such that x_n̄ < k, then lim inf x_n ≤ k.

Proof (i) Let lim inf x_n = x̄ < k. Let ε > 0 be such that x̄ + ε < k. By the definition of limit inferior, there exists n_ε ≥ 1 such that z_n = inf_{m≥n} x_m < x̄ + ε < k for all n ≥ n_ε. Thus, for all n ≥ n_ε there exists some m ≥ n such that x_m < k; for n < n_ε it suffices to take such an m ≥ n_ε ≥ n. We conclude that, for each n ≥ 1, there exists some n̄ ≥ n such that x_n̄ < k.
(ii) Assume that, for each n ≥ 1, there exists n̄ ≥ n such that x_n̄ < k. Then, z_n = inf_{m≥n} x_m < k for every n and so, {z_n} being an increasing sequence with limit lim inf x_n, we get lim inf x_n = lim z_n ≤ k. Note that the inequality may hold as an equality: for x_n = −1/n we have x_n < 0 for every n, yet lim inf x_n = 0.

In this proposition there emerges a kind of tail behavior for which the adverb "eventually" is too stringent. For this reason, we next introduce a more relaxed adverb, "infinitely often".

Definition 461 We say that a sequence satisfies a property P infinitely often if, starting from each position n = n_P, there exists at least one position n̄ ≥ n_P whose term of the sequence satisfies P.

Clearly, a sequence that satisfies a property eventually a fortiori satisfies it infinitely often. The converse is false: the sequence x_n = (−1)ⁿ takes the value 1 infinitely often, but not eventually.
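A tiny Python sketch of the distinction:

    x = [(-1) ** n for n in range(1, 101)]
    hits = [n for n, v in enumerate(x, start=1) if v == 1]
    print(hits[:4], hits[-1])   # the even positions: the property "x_n = 1" holds i.o., never eventually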
Using this new adverb, the last result says that lim inf x_n < k implies that x_n < k infinitely often, which in turn implies that lim inf x_n ≤ k. If we relax the strict inequality we get the following corollary.

Corollary 462 Let {x_n} be a bounded sequence and k ∈ R. The following conditions are equivalent:

(i) lim inf x_n ≤ k;

(ii) x_n ≤ k + ε infinitely often, for each ε > 0.

Proof (i) implies (ii). Let lim inf x_n ≤ k and fix ε > 0. We have lim inf x_n < k + ε, so by Proposition 460 we have x_n < k + ε infinitely often, as desired (since ε was arbitrarily chosen). (ii) implies (i). Assume that x_n ≤ k + ε infinitely often, for each ε > 0. That is, for each n ≥ 1, there exists some n̄ ≥ n such that x_n̄ ≤ k + ε. Then, z_n = inf_{m≥n} x_m ≤ k + ε for all n ≥ 1 and so lim inf x_n ≤ k + ε. Since this holds for each ε > 0, we conclude that lim inf x_n ≤ k.

The next corollary nicely summarizes what we have done so far in this section by characterizing limits inferior.

Corollary 463 Let {x_n} be a bounded sequence and k ∈ R. The following conditions are equivalent:

(i) lim inf x_n = k;

(ii) x_n ≥ k − ε eventually and x_n ≤ k + ε infinitely often, for each ε > 0.

The results seen so far for limits inferior and lower bounds have, of course, dual versions for limits superior and upper bounds, based on the duality properties (10.3). Next we state the dual version of Proposition 459 and leave to the reader the dual versions of the other results of this section.

Proposition 464 Let {x_n} be a bounded sequence and k ∈ R. Consider the following properties:

(i) x_n ≤ k eventually;

(ii) lim sup x_n ≤ k;

(iii) x_n ≤ k + ε eventually, for each ε > 0.

We have (i) ⟹ (ii) ⟺ (iii).

A nice consequence of the equivalence of (ii) and (iii) in this result is that a bounded sequence {x_n} converges to some x ∈ R if and only if

    lim sup |x_n − x| = 0

as the reader can check. In contrast, the condition lim inf |x_n − x| = 0 is irrelevant for the convergence of {x_n}: the sequence x_n = (−1)ⁿ does not converge even though lim inf |x_n − 1| = 0.

10.6.2 A tale of two adverbs


As the reader can check, we can diagram as follows the basic relations between the limits inferior and superior and the new adverb, abbreviated as "i.o.":

    lim inf x_n < k                      lim sup x_n > k
          ⇓                                    ⇓
    x_n < k i.o.                         x_n > k i.o.
          ⇓                                    ⇓
    x_n ≤ k i.o.                         x_n ≥ k i.o.
          ⇓                                    ⇓
    lim inf x_n ≤ k                      lim sup x_n ≥ k
          ⇕                                    ⇕
    ∀ε > 0, x_n ≤ k + ε i.o.             ∀ε > 0, x_n ≥ k − ε i.o.

As to the older adverb, we have:

    lim inf x_n > k                      lim sup x_n < k
          ⇓                                    ⇓
    x_n > k eventually                   x_n < k eventually
          ⇓                                    ⇓
    x_n ≥ k eventually                   x_n ≤ k eventually
          ⇓                                    ⇓
    lim inf x_n ≥ k                      lim sup x_n ≤ k
          ⇕                                    ⇕
    ∀ε > 0, x_n ≥ k − ε eventually       ∀ε > 0, x_n ≤ k + ε eventually

N.B. Note that we may well have x_n > k i.o. as well as x_n < k i.o., something that cannot happen eventually. For instance, if we take again x_n = (−1)ⁿ, we have both x_n > 0 i.o. and x_n < 0 i.o. O

The two adverbs are easily applied to the comparison of sequences. For instance, we write x_n ≥ y_n i.o. when x_n − y_n ≥ 0 i.o., while we write x_n ≥ y_n eventually when x_n − y_n ≥ 0 eventually. So, the last two tables are easily adapted to the comparison of sequences, with an interesting twist in view of Lemma 413.

    lim inf (x_n − y_n) < 0              lim sup (x_n − y_n) > 0
          ⇓                                    ⇓
    x_n < y_n i.o.                       x_n > y_n i.o.
          ⇓                                    ⇓
    x_n ≤ y_n i.o.                       x_n ≥ y_n i.o.
          ⇓                                    ⇓
    lim inf (x_n − y_n) ≤ 0              lim sup (x_n − y_n) ≥ 0
          ⇕                                    ⇕
    ∀ε > 0, x_n ≤ y_n + ε i.o.           ∀ε > 0, x_n + ε ≥ y_n i.o.
          ⇓                                    ⇓
    lim inf x_n ≤ lim sup y_n            lim inf y_n ≤ lim sup x_n
We close with a final remark. Say that a set A of natural numbers is dense if, for each natural number n, there is some element a of A such that a ≥ n. The set of even numbers and the set of powers 2ⁿ are instances of dense sets. A moment's reflection shows that a sequence satisfies a property i.o. when it does so for all terms x_n whose indexes n belong to a dense set of natural numbers. For example, the sequence x_n = (−1)ⁿ is > 0 i.o. (resp., < 0 i.o.) because all the terms with even (resp., odd) indexes are strictly positive (resp., negative).
In contrast, a sequence satisfies a property eventually when it does so for all terms x_n whose indexes n belong to a cofinite set of natural numbers, that is, a set whose complement is finite (a cofinite set contains all natural numbers except at most finitely many of them). Clearly, cofinite sets are dense, but the converse is obviously false. This provides another angle on the two adverbs and should further clarify why one is much stronger than the other.

10.6.3 Illustration: asymptotic partial sums


As we learned in studying mean convergence, the sequence of partial sums

    s^x_n = Σ_{k=1}^n x_k

of a sequence often behaves better than the sequence x = {x_n} itself. So, two sequences may have asymptotic partial sums even though they are not asymptotic; that is, we may have x_n ≁ y_n but s^x_n ∼ s^y_n. Our new adverb makes it possible to characterize this case.

Proposition 465 Let x = {x_n} and y = {y_n} be two sequences such that

    s^x_n ∼ s^y_n  and  s^x_n → +∞   (10.35)

Then, for each ε > 0,

    x_n ≤ (1 + ε) y_n i.o.   (10.36)

and

    x_n ≥ (1 − ε) y_n i.o.   (10.37)

In words, if two sequences have (unbounded) asymptotic partial sums, their terms are, infinitely often, arbitrarily close. Note that the hypothesis s^x_n → +∞ is needed: if x = {2, 0, 0, 0, ...} and y = {1, 1/2, 1/4, 1/8, ...}, then s^x_n ∼ s^y_n but x_n < y_n eventually.

Proof Fix ε > 0 and suppose, by contradiction, that eventually x_n > (1 + ε) y_n, i.e., that there exists n_ε ≥ 1 such that x_n > (1 + ε) y_n for all n ≥ n_ε. It follows that, for all integers n ≥ n_ε,

    s^x_n = Σ_{k=1}^{n_ε} x_k + Σ_{k=n_ε+1}^n x_k ≥ Σ_{k=1}^{n_ε} x_k + (1 + ε) Σ_{k=n_ε+1}^n y_k = Σ_{k=1}^{n_ε} (x_k − (1 + ε) y_k) + (1 + ε) s^y_n

Since lim s^x_n = lim s^y_n = +∞, we have eventually

    s^x_n ≥ (1 + ε/2) s^y_n

which contradicts s^x_n ∼ s^y_n. A similar argument proves (10.37).

The next corollary is a consequence of this proposition that will soon be useful.

Corollary 466 Let x = {x_n} be a sequence such that

    s^x_n ∼ n log n   (10.38)

Then, for each ε > 0,

    x_n ≤ (1 + ε) log n i.o.   (10.39)

and

    x_n ≥ (1 − ε) log n i.o.   (10.40)

Proof We have

    log n! = log 1 + log 2 + log 3 + ⋯ + log n ∼ n log n   (10.41)

Indeed, we have, for each n ≥ 1,

    ∫_1^n log x dx ≤ Σ_{i=1}^n log i ≤ n log n

Since

    ∫_1^n log x dx = [x (log x − 1)]_1^n = n (log n − 1) + 1 ∼ n log n,

we conclude that (10.41) holds. By Lemma 355-(iii) and by (10.38), we have

    Σ_{i=1}^n log i ∼ n log n ∼ s^x_n

By Proposition 465, both (10.39) and (10.40) hold.

10.6.4 Illustration: prime gaps


For the sequence {p_n} of prime numbers, the difference Δp_n = p_{n+1} − p_n is called a prime gap. The sequence of differences {Δp_n} is thus the sequence of prime gaps. For instance, for the first fifteen prime numbers we have:

    n      1  2  3  4  5   6   7   8   9   10  11  12  13  14  15
    p_n    2  3  5  7  11  13  17  19  23  29  31  37  41  43  47
    Δp_n   1  2  2  4  2   4   2   4   6   2   6   4   2   4
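The table can be reproduced with a few lines of Python (a sketch; trial division is the simplest primality test, not the fastest):

    def primes(count):
        ps, n = [], 2
        while len(ps) < count:
            if all(n % p != 0 for p in ps if p * p <= n):
                ps.append(n)
            n += 1
        return ps

    ps = primes(15)
    gaps = [q - p for p, q in zip(ps, ps[1:])]
    print(ps)     # 2, 3, 5, 7, 11, ..., 47
    print(gaps)   # 1, 2, 2, 4, 2, 4, 2, 4, 6, 2, 6, 4, 2, 4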

Except for the first prime gap, all other prime gaps Δp_n with n > 1 are even: all prime numbers > 2 are odd and so their differences are even numbers.^17 So, Δp_n ≥ 2 for all n > 1. The smallest prime gap is 1 and is attained only by the initial term

    Δp_1 = p_2 − p_1 = 3 − 2 = 1

Indeed, if we take any other pair of consecutive natural numbers, one of them has to be even and so it cannot be prime.

^17 Two odd numbers have the form 2n_1 + 1 and 2n_2 + 1, with n_1 > n_2. So, their difference is the even number 2(n_1 − n_2).

In contrast, there are many pairs of primes with a prime gap of 2, the so-called twin primes. For instance, among the first fifteen prime numbers we have six pairs of twin primes:

    {3, 5}, {5, 7}, {11, 13}, {17, 19}, {29, 31}, {41, 43}

This simple example seems to indicate that there are plenty of twin primes, but also that they become less frequent as we move along the sequence of primes. Indeed, it is not even known whether there exist infinitely many of them, that is, whether

    lim inf Δp_n = 2

This is one of the (many) prime mysteries, the so-called twin prime conjecture.
Be that as it may, this is only the beginning: if the prime gap 2 is already elusive, larger prime gaps are even more so. Something non-trivial about prime gaps can be said, however, thanks to the Prime Number Theorem.

Proposition 467 We have

    lim sup Δp_n = +∞

and

    Δp_n = o(p_n)

as well as

    (Δp_1 + Δp_2 + ⋯ + Δp_n)/n ∼ log n   (10.42)

Prime gaps are thus unbounded above, yet their magnitude is asymptotically negligible relative to that of the prime numbers themselves, something not surprising in view of the last table, where for example to the prime number p_n = 43 there corresponds a gap Δp_n = 4.^18 The peaceful behavior of prime gaps is confirmed by the slow logarithmic pace at which their average increases: their behavior can be erratic but it is definitely not exuberant.

Proof Suppose, by contradiction, that lim sup Δp_n < +∞. Then there exists K > 0 such that

    Δp_n ≤ K   ∀ n ≥ 1   (10.43)

Yet, the n − 1 consecutive numbers

    n! + 2, n! + 3, ..., n! + n

are not prime because n! + 2 = 2(n!/2 + 1) is divisible by 2, n! + 3 = 3(n!/3 + 1) is divisible by 3, and so on. Hence, denoting by π(·) the prime-counting function,

    Δp_{π(n!+2)} ≥ n − 1

which contradicts (10.43).
By Theorem 364, we have p_n ∼ n log n. By Lemma 355-(iii), we then have

    p_{n+1}/p_n ∼ ((n + 1) log(n + 1))/(n log n) = ((n + 1)/n) · (log(n + 1)/log n) → 1

^18 Ricci (1952) reports the prime gaps up to n = 170. The largest prime gap among them is 18, which corresponds to the consecutive primes 523 and 541 (the previous prime gap is 2 and the next one is 6).

because

    log(n + 1)/log n = (log((n + 1)/n) + log n)/log n = log((n + 1)/n)/log n + 1 → 1

So,

    Δp_n/p_n = (p_{n+1} − p_n)/p_n = p_{n+1}/p_n − 1 → 0

We conclude that Δp_n = o(p_n).
Again by Theorem 364 and Lemma 355-(iii), we have:

    (Δp_1 + Δp_2 + ⋯ + Δp_n)/n = (p_{n+1} − 2)/n ∼ p_{n+1}/n ∼ ((n + 1) log(n + 1))/n ∼ ((n + 1)/n) log(n + 1) ∼ log n

as desired.

By (8.66), we have log p_n ∼ log n. By (10.42), we have

    (Δp_1 + Δp_2 + ⋯ + Δp_n)/n ∼ log p_n

that is, by Proposition 359,

    (Δp_1 + Δp_2 + ⋯ + Δp_n)/n = [1 + o(1)] log p_n

In turn, this easily implies

    min_{1≤k≤n} Δp_k ≤ [1 + o(1)] log p_n ≤ max_{1≤k≤n} Δp_k

This suggests logarithmic lower and upper bounds for prime gaps. Indeed, in view of Corollary 466, from (10.42) and log p_n ∼ log n we infer, for every ε > 0, the upper bound

    Δp_n < (1 + ε) log p_n i.o.   (10.44)

for "small" prime gaps, as well as the lower bound

    Δp_n > (1 − ε) log p_n i.o.   (10.45)

for "large" prime gaps.


The next result, proved in 1940 by Paul Erdos, substantially improves the upper bound (10.44). Recall the question that such a bound addresses: how small can prime gaps remain as n gets larger? The twin prime conjecture claims that, infinitely often, one can meet prime gaps of size as small as 2. Though far from this conjecture, Erdos' upper bound is remarkable.

Theorem 468 (Erdos) There is a constant 0 ≤ c < 1 such that

    lim inf Δp_n / log p_n < c   (10.46)

Along the sequence of primes one can thus find, infinitely often, two consecutive primes whose gap is such that

    Δp_n < c log p_n

i.e., for each n ≥ 1 there exists n̄ ≥ n such that Δp_n̄ < c log p_n̄. The sequence c log p_n provides, infinitely often, an upper bound for prime gaps. The constant c is < 1 and so it improves the constant (1 + ε) in the upper bound (10.44).
However interesting, Erdos' logarithmic upper bound (10.46) diverges because log n → +∞ as n → +∞. So, it does not say much about the persistence of small gaps as n gets large, for instance whether one can hope to find infinitely many twin primes, as the twin prime conjecture claims.
A remarkable result in this direction has been recently proved.

Theorem 469 (Zhang) There is a constant b > 0 such that lim inf Δp_n < b.

So, along the sequence of primes one can find, infinitely often, two consecutive primes whose gap is < b, i.e., for each n ≥ 1 there exists n̄ ≥ n such that Δp_n̄ < b. This result substantially improves Erdos' (and subsequent) results. Indeed, it implies inter alia that in Erdos' result the constant c is actually equal to 0, that is, lim inf Δp_n / log n = 0.
Though it is not known whether there exist infinitely many twin primes, this result permits us to say that there are infinitely many gaps bounded above by a constant b, so infinitely many small gaps.^19 To prove the twin prime conjecture, one would need to prove that b = 3, i.e., that

    lim inf Δp_n = 2

So far, the best bound is b = 247, i.e., lim inf Δp_n ≤ 246. Still far from 2, but a truly remarkable bound nevertheless.

All that said, let us turn to the lower bound (10.45) for large prime gaps. Here we would like to understand how large prime gaps Δp_n can become as n gets larger. In the 1930s a few remarkable results appeared that shed light on the issue, summarized in Erdos (1940). Among them, maybe the easiest to report is the next one, due to Ricci (1934), p. 194.

Theorem 470 (Ricci) There is a constant c > 0 such that

    lim sup Δp_n / (log p_n log log log p_n) > c

In words, along the sequence of primes one can find, infinitely often, two consecutive primes whose gap is such that

    Δp_n > c log p_n log log log p_n

i.e., for each n ≥ 1 there exists n̄ ≥ n such that Δp_n̄ > c log p_n̄ log log log p_n̄. The sequence c log p_n log log log p_n provides, infinitely often, a lower bound for prime gaps. It improves the lower bound (1 − ε) log n in (10.45) because, for any c > 0, eventually c log log log p_n > 1 − ε.
The logarithms involved in this lower bound grow extremely slowly; recall table 8.62 on log log n. Recently, these lower bounds from the 1930s have been significantly improved. We refer interested readers to Maynard (2018) for a recent perspective on this fascinating topic.

^19 When he published this major result Yitang Zhang was 59 years old and held a teaching position (fortunately, mathematics is much more than a young man's game, unlike what Hardy famously claimed in his Apology).
Apology).
Chapter 11

Power series

In the final chapter of this part we study in some detail power series, a fundamental class of series that plays a key role in many applications (for instance, in the economic analysis of temporal choices).

11.1 Power series


Power series are an important class of series of the form

    Σ_{n=0}^∞ a_n xⁿ   (11.1)

with a_n ∈ R for every n ≥ 0. The scalars a_n are called the coefficients of the series.
The generic term of a power series is x_n = a_n xⁿ. The scalar x parameterizes the series: to different values of x there correspond different series, possibly with a different character. In particular, a power series Σ_{n=0}^∞ a_n xⁿ converges (diverges) at x_0 ∈ R if the series Σ_{n=0}^∞ a_n x_0ⁿ converges (diverges).
We set 0⁰ = 1. In this way, a power series always converges at 0: indeed, from 0⁰ = 1 it follows that Σ_{n=0}^∞ a_n 0ⁿ = a_0.

Proposition 471 If a power series Σ_{n=0}^∞ a_n xⁿ converges at x_0 ≠ 0, then it converges (absolutely) at each x ∈ R such that |x| < |x_0|. If it diverges at x_0 ≠ 0, then it diverges at each x ∈ R such that |x| > |x_0|.
Proof We first prove convergence. Since Σ_{n=0}^∞ a_n x_0ⁿ converges, we have a_n x_0ⁿ → 0 (Theorem 380). By (8.33), we then have |a_n x_0ⁿ| → 0, so the sequence {|a_n x_0ⁿ|} is bounded (Proposition 322). That is, there is M > 0 such that |a_n x_0ⁿ| ≤ M for all n ≥ 1. Let |x| < |x_0|. Set q = |x|/|x_0|. Then, for all n ≥ 1 we have:

    |a_n xⁿ| = |a_n| |x|ⁿ = |a_n| |x_0|ⁿ (|x|/|x_0|)ⁿ = |a_n x_0ⁿ| qⁿ ≤ M qⁿ

Since 0 ≤ q < 1, the geometric series Σ_{n=0}^∞ qⁿ converges. From |a_n xⁿ| ≤ M qⁿ it then follows that the series Σ_{n=0}^∞ |a_n xⁿ| converges, so the series Σ_{n=0}^∞ a_n xⁿ is absolutely convergent (Theorem 402).
Suppose now that Σ_{n=0}^∞ a_n x_0ⁿ diverges and let |x| > |x_0|. Suppose, by contradiction, that Σ_{n=0}^∞ a_n xⁿ converges. Since |x_0| < |x|, by the previous part of the proof Σ_{n=0}^∞ a_n x_0ⁿ then converges absolutely. On the other hand, since Σ_{n=0}^∞ a_n x_0ⁿ diverges, by Theorem 402 we have

    Σ_{n=0}^∞ |a_n| |x_0|ⁿ = Σ_{n=0}^∞ |a_n x_0ⁿ| = +∞

This contradicts the absolute convergence of Σ_{n=0}^∞ a_n x_0ⁿ. We conclude that Σ_{n=0}^∞ a_n xⁿ diverges.

Inspired by this result, we say that a scalar r ∈ [0, +∞] is the radius of convergence of a power series Σ_{n=0}^∞ a_n xⁿ if the series converges at each |x| < r and diverges at each |x| > r. If it exists, the radius of convergence is a watershed that separates the convergent and divergent behavior of the power series (at |x| = r the character of the series is ambiguous: it may be regular or not). In particular, if r = +∞ the power series converges at all x ∈ R, while if r = 0 it converges only at the origin.
The next important result, a simple yet remarkable consequence of the root criterion, proves the existence of such a radius and gives a formula to compute it.
Theorem 472 (Cauchy-Hadamard) The radius of convergence of a power series Σ_{n=0}^∞ a_n xⁿ is

    r = 0      if ρ = +∞
    r = 1/ρ    if 0 < ρ < +∞
    r = +∞    if ρ = 0

where

    ρ = lim sup ⁿ√|a_n| ∈ [0, +∞]

Proof Assume ρ ∈ (0, +∞). We already remarked that the power series converges at x = 0. So, let x ≠ 0. We have

    lim sup ⁿ√|a_n xⁿ| = |x| lim sup ⁿ√|a_n| = |x| ρ = |x|/r

So, by the root criterion the series converges if |x|/r < 1, namely if |x| < r, and it diverges if |x|/r > 1, namely if |x| > r. We leave the cases ρ ∈ {0, +∞} to the reader.
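A numerical sketch in Python of the formula, with the example coefficients a_n = 3ⁿ/n chosen by us (working in log space avoids overflow for large n):

    import math

    # rho = limsup |a_n|^(1/n) for a_n = 3^n / n, so the radius should be r = 1/3
    def nth_root(n):
        return math.exp((n * math.log(3) - math.log(n)) / n)

    rho = nth_root(10**6)
    print(rho, 1 / rho)   # rho close to 3, radius close to 1/3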
The interval A formed by the points at which the power series Σ_{n=0}^∞ a_n xⁿ converges is called the interval of convergence of the power series. By the Cauchy-Hadamard Theorem, we have

    (−r, r) ⊆ A ⊆ [−r, r]

where r ∈ [0, +∞] is the radius of convergence of the power series. Depending on the character of the series at x = ±r, the inclusions may become equalities. For instance, if the power series converges at both points ±r, we have A = [−r, r], while if it converges at neither point we have A = (−r, r). The next examples illustrate.

Example 473 (i) The power series

    Σ_{n=0}^∞ xⁿ/n!   (11.2)

has as coefficients the factorials' reciprocals 1/n!. Its radius of convergence is r = +∞. Indeed,

    (1/(n + 1)!)/(1/n!) = 1/(n + 1) → 0

which, thanks to the inequality (10.18), implies ρ = lim sup ⁿ√(1/n!) = 0, namely r = +∞. The power series thus converges at all x ∈ R, so the real line is its interval of convergence. Indeed, in Theorem 399 we saw that its sum is e^x for every x ∈ R.
(ii) The power series

    Σ_{n=1}^∞ xⁿ/n   (11.3)

with coefficients 1/n has radius of convergence r = 1. Indeed,

    (1/(n + 1))/(1/n) = n/(n + 1) → 1

which, thanks to the inequality (10.18), implies ρ = lim sup ⁿ√(1/n) = 1, namely r = 1. At x = 1, it becomes the harmonic series, so it diverges, while at x = −1 it becomes the alternating harmonic series, so it converges (Proposition 407). We conclude that the power series (11.3) has interval of convergence [−1, 1).
(iii) The geometric power series

    Σ_{n=0}^∞ βⁿ xⁿ

with β ∈ (0, 1], has power coefficients βⁿ. It has radius of convergence r = 1/β. Indeed, ρ = lim sup ⁿ√(βⁿ) = β. As is well known, it converges at each x ∈ (−1/β, 1/β) with sum

    Σ_{n=0}^∞ βⁿ xⁿ = 1/(1 − βx)

Its interval of convergence is (−1/β, 1/β).
(iv) The power series with factorial coefficients

    Σ_{n=1}^∞ n! xⁿ

has radius of convergence r = 0. This can be checked directly because, if x ≠ 0, we have n! |x|ⁿ → +∞, as well as via Cauchy-Hadamard's Theorem by noting that ρ = lim sup ⁿ√(n!) = +∞. N

Next we consider two important power series with "probabilistic" coefficients, coefficients that are positive and add up to 1.

Example 474 (i) The Poisson power series

    Σ_{n=0}^∞ (e^{−λ} λⁿ/n!) xⁿ

with λ > 0, has positive coefficients e^{−λ} λⁿ/n! that add up to 1:

    Σ_{n=0}^∞ e^{−λ} λⁿ/n! = e^{−λ} Σ_{n=0}^∞ λⁿ/n! = e^{−λ} e^{λ} = 1

Since

    ρ = lim sup ⁿ√(e^{−λ} λⁿ/n!) = λ lim sup ⁿ√(e^{−λ}/n!) = 0

its radius of convergence is r = +∞.
(ii) The normalized geometric power series

    Σ_{n=0}^∞ (1 − β) βⁿ xⁿ

with β ∈ (0, 1), has positive coefficients (1 − β) βⁿ that are easily seen to add up to 1. N
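A quick Python sketch for the Poisson coefficients (the value of λ is an arbitrary choice of ours):

    import math

    lam = 2.5
    coeffs = [math.exp(-lam) * lam**n / math.factorial(n) for n in range(60)]
    print(all(c > 0 for c in coeffs), sum(coeffs))   # True, and the sum is approximately 1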

In the following example we consider a power series that plays a key role in many economic models.

Example 475 Consider the power series

    Σ_{t=1}^∞ β^{t−1} u_t(x_t)   (11.4)

where the utility functions u_t : R → [0, +∞) are positive (cf. Example 392). By Cauchy-Hadamard's Theorem, this power series has radius of convergence

    r = 1 / lim sup ^{t−1}√(u_t(x_t))

Assume that all u_t are uniformly bounded by a constant M, i.e., inequality (9.14) of Example 392 holds. In this case,

    r = 1 / lim sup ^{t−1}√(u_t(x_t)) ≥ 1 / lim sup ^{t−1}√M = 1   (11.5)

where the last equality follows from Proposition 346. So, the series now converges for all |β| < 1, in particular for all β ∈ (0, 1), which are the economically meaningful values of β when it is interpreted as a subjective discount rate.
In the uniformly bounded case, the convergence of the power series can also be easily checked directly, without invoking Cauchy-Hadamard's Theorem. Indeed, in view of the properties of the geometric series, for all |β| < 1 we have

    0 ≤ Σ_{t=1}^T |β|^{t−1} |u_t(x_t)| ≤ M Σ_{t=1}^T |β|^{t−1} → M/(1 − |β|)   as T → ∞

So, the series converges absolutely for all |β| < 1. N


Note that an inequality similar to (11.5) holds for any power series with uniformly bounded coefficients. So, the radius of convergence of such a series is ≥ 1 and its interval of convergence includes the interval (−1, 1).

Next we consider the multiplication of power series.

Proposition 476 Let Σ_{n=0}^∞ a_n xⁿ and Σ_{n=0}^∞ b_n xⁿ be power series with radii of convergence r_a and r_b, respectively. Then,

    Σ_{n=0}^∞ (a * b)_n xⁿ = ( Σ_{n=0}^∞ a_n xⁿ )( Σ_{n=0}^∞ b_n xⁿ )

for all |x| < min{r_a, r_b}.


Proof Since1
Xn n
X n
X
ak xk bn+1 kx
n k
= ak bn k n k
kx x = xn ak bn+1 k = (a b)n xn
k=0 k=0 k=1

the result readily follows from the Cauchy-Hadamard and Mertens Theorems.
Example 477 The power series Σ_{n=0}^∞ (−1)ⁿ xⁿ has radius 1 and sum 1/(1 + x) (why?). Since

    Σ_{n=0}^∞ ( Σ_{k=0}^n (−1)^k (−1)^{n−k} ) xⁿ = Σ_{n=0}^∞ (−1)ⁿ (n + 1) xⁿ

by Proposition 476 we have

    Σ_{n=0}^∞ (−1)ⁿ (n + 1) xⁿ = 1/(1 + x)²

for all |x| < 1. N
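A quick numerical sketch in Python of this identity at one admissible point:

    x, N = 0.4, 200
    s = sum((-1) ** n * (n + 1) * x**n for n in range(N))
    print(s, 1 / (1 + x) ** 2)   # both approximately 0.5102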
We close by observing that, for simplicity, so far we have (tacitly) considered power series centered at x_0 = 0. Of course, everything goes through if x_0 is any scalar. In this case, the power series has the form

    Σ_{n=0}^∞ a_n (x − x_0)ⁿ

and converges at each |x − x_0| < r and diverges at each |x − x_0| > r. So, its interval of convergence A is such that (x_0 − r, x_0 + r) ⊆ A ⊆ [x_0 − r, x_0 + r].
^1 As previously (foot)noted, for series that start at n = 0, the convolution formula (10.30) is easily seen to become (x * y)_n = Σ_{k=0}^n x_k y_{n−k}.

11.2 Generating functions


11.2.1 Definition and properties
We can revisit the previous notions from a functional angle that will clarify the nature of power series. Given a sequence {a_n} of scalars, the function f : A ⊆ R → R defined by

    f(x) = Σ_{n=0}^∞ a_n xⁿ   (11.6)

is called the generating function of the sequence {a_n}. Its domain A is the interval of convergence of the power series Σ_{n=0}^∞ a_n xⁿ.

Example 478 (i) The generating function

    f(x) = Σ_{n=0}^∞ xⁿ/n!   (11.7)

of the sequence {1/n!}, so defined via the power series (11.2), has the entire real line as its domain. By Theorem 399, it is the exponential f(x) = e^x. Relatedly, the generating function

    f(x) = Σ_{n=0}^∞ (e^{−λ} λⁿ/n!) xⁿ = e^{−λ} Σ_{n=0}^∞ (λx)ⁿ/n! = e^{−λ} e^{λx} = e^{−λ+λx}

of the Poisson sequence {e^{−λ} λⁿ/n!} has the entire real line as its domain.
(ii) The generating function

    f(x) = Σ_{n=1}^∞ xⁿ/n

of the harmonic sequence {1/n}, so defined via the power series (11.3), has domain [−1, 1).
(iii) The "geometric" function

    f(x) = Σ_{n=0}^∞ βⁿ xⁿ = 1/(1 − βx)

with β ∈ (0, 1], generating the power sequence {βⁿ}, has domain (−1/β, 1/β).
(iv) The generating function

    f(x) = Σ_{n=1}^∞ n! xⁿ

of the factorials' sequence has the singleton domain {0}.
(v) The function f : (−1, 1) → R given by

    f(β) = Σ_{t=1}^∞ β^{t−1} u_t(x_t)

is the generating function of the sequence {u_t(x_t)} when all u_t are positive and uniformly bounded. N
11.2. GENERATING FUNCTIONS 327

It is always rewarding to find the analytic expression of a generating function, like the exponential function for the generating function (11.7). The next example shows that, however nice they might be, one should not be too enchanted by such expressions.

Example 479 The generating function for the sequence $\{(-1)^n\}$ is

$$f(x) = \sum_{n=0}^{\infty} (-1)^n x^n = \sum_{n=0}^{\infty} (-x)^n = \frac{1}{1+x} \qquad \forall x \in (-1, 1)$$

Note that $x = \pm 1$ do not belong to the domain of $f$, which is the open interval $(-1, 1)$. The function $1/(1+x)$ is defined for all $x \neq -1$, but outside the open interval $(-1, 1)$ it is no longer the generating function of the sequence $\{(-1)^n\}$. In other words, outside this open interval the generating function and its analytic expression part ways. N

Next we give an important differential property of generating functions. We adopt the convention $f^{(0)}(0) = f(0)$.

Proposition 480 The generating function $f$ for a sequence $\{a_n\}$ is infinitely differentiable on $(-r, r)$, with

$$a_n = \frac{f^{(n)}(0)}{n!} \qquad \forall n \ge 0 \qquad (11.8)$$
Proof Let $f : (-r, r) \to \mathbb{R}$ be the generating function for the sequence $\{a_n\}$ restricted to the open interval $(-r, r)$. By definition, $f(x) = \sum_{n=0}^{\infty} a_n x^n$ for all $x \in (-r, r)$. By the Cauchy-Hadamard Theorem, the derived series

$$\sum_{n=1}^{\infty} n a_n x^{n-1}$$

has the same radius of convergence as the original series $\sum_{n=0}^{\infty} a_n x^n$. To see why, observe that $\lim \sqrt[n]{n} = 1$. Indeed,

$$\lim \sqrt[n]{n} = \lim e^{\frac{1}{n} \log n} = 1$$

because $(\log n)/n \to 0$. Thus,

$$\limsup \sqrt[n]{|n a_n|} = \limsup \sqrt[n]{n} \sqrt[n]{|a_n|} = \lim \sqrt[n]{n} \cdot \limsup \sqrt[n]{|a_n|} = \limsup \sqrt[n]{|a_n|}$$

as desired. With this, a uniform convergence argument (see Rudin, 1976, p. 173) shows that the function $f' : (-r, r) \to \mathbb{R}$ given by $f'(x) = \sum_{n=1}^{\infty} n a_n x^{n-1}$ is the derivative function of $f$. Thus, $f$ is differentiable and

$$f'(0) = a_1$$

By applying a similar argument to $f'$, which is the generating function for the sequence $\{n a_n\}$, one proves that $f$ is twice differentiable with

$$f''(x) = \sum_{n=2}^{\infty} n (n-1) a_n x^{n-2}$$

Hence, $f''(0) = 2 a_2$, that is, $a_2 = f''(0)/2!$. By iterating, one proves that $f$ is $k$ times differentiable, for each $k \ge 1$, with

$$f^{(k)}(x) = \sum_{n=k}^{\infty} n (n-1) \cdots (n-k+1) \, a_n x^{n-k}$$

Thus, $f^{(k)}(0) = k! \, a_k$, which is (11.8).

We can thus write a generating function as

$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} x^n$$

Later in the book, we will learn more about this remarkable differential representation (Section 30.3). Here we observe that it implies that generating functions are uniquely determined.
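As an illustration, the coefficients $a_n = f^{(n)}(0)/n!$ can be recovered symbolically. A sketch, assuming the sympy library is available, using the generating function of Example 479:

```python
# Sketch: recover a_n = f^(n)(0) / n! from a generating function via sympy.
import sympy as sp

x = sp.symbols('x')
f = 1 / (1 + x)  # generating function of {(-1)^n} on (-1, 1)

for n in range(5):
    a_n = sp.diff(f, x, n).subs(x, 0) / sp.factorial(n)
    print(n, a_n)  # prints 1, -1, 1, -1, 1
```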

Corollary 481 There is a one-to-one relation between generating functions and scalar sequences.

To know a generating function thus amounts to knowing its underlying sequence.

Proof Let $f_a$ and $f_b$ be the generating functions of two sequences $a = \{a_n\}$ and $b = \{b_n\}$. We want to show that

$$a = b \iff f_a = f_b$$

The implication $\Longrightarrow$ readily follows from the definition of generating function. As to the converse implication, let $f_a = f_b$. By the last result, $a_n = b_n = f_a^{(n)}(0)/n!$ for all $n \ge 0$. Thus, $a = b$.

We established the differentiability, so the continuity, of generating functions on the open interval $(-r, r)$, i.e., on the interior of the interval of convergence $A$. When the interval $A$ is bounded, i.e., $r < +\infty$, it is natural to wonder about their continuity at the endpoints of $A$. The next classic result, proved by Niels Abel in 1826, addresses this issue.

Theorem 482 (Abel) Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be the generating function for a sequence $\{a_n\}$, with $A$ bounded.

(i) If the series $\sum_{n=0}^{\infty} a_n r^n$ converges (so, $r \in A$), then

$$\lim_{x \to r^-} f(x) = f(r) = \sum_{n=0}^{\infty} a_n r^n$$

i.e., $f$ is continuous at $r$ from the left.

(ii) If the series $\sum_{n=0}^{\infty} a_n (-r)^n$ converges (so, $-r \in A$), then

$$\lim_{x \to -r^+} f(x) = f(-r) = \sum_{n=0}^{\infty} a_n (-r)^n$$

i.e., $f$ is continuous at $-r$ from the right.

Proof To ease notation, we prove the result when the interval of convergence is $(-1, 1)$ and $r = 1$ (cf. Rudin, 1964). By hypothesis, $(-1, 1] \subseteq A$. Let $s_{-1} = 0$ and $s_n = \sum_{k=0}^{n} a_k$ for all $n \ge 0$. By the combinatorial formula (11.19), proved later in the chapter, for each $n \ge 0$ we have

$$(1-x) \sum_{k=0}^{n-1} s_k x^k + x^n s_n = \sum_{k=0}^{n} a_k x^k$$

By hypothesis, $s_n \to f(1)$ and the series $\sum_{k=0}^{\infty} a_k x^k$ converges for all $x \in (-1, 1)$. So,

$$(1-x) \sum_{k=0}^{\infty} s_k x^k = \sum_{k=0}^{\infty} a_k x^k \qquad \forall x \in (-1, 1)$$

Fix $\varepsilon > 0$. Since $s_n \to f(1)$, there is $n_\varepsilon \ge 1$ such that $|s_n - f(1)| < \varepsilon/2$ for all $n \ge n_\varepsilon$. Since $(1-x) \sum_{k=0}^{\infty} x^k = 1$ for all $x \in (0, 1)$, for all such $x$ we then have

$$|f(x) - f(1)| = \left| (1-x) \sum_{k=0}^{\infty} x^k s_k - (1-x) f(1) \sum_{k=0}^{\infty} x^k \right| = \left| (1-x) \sum_{k=0}^{\infty} x^k (s_k - f(1)) \right|$$
$$\le (1-x) \sum_{k=0}^{n_\varepsilon - 1} x^k |s_k - f(1)| + (1-x) \sum_{k=n_\varepsilon}^{\infty} x^k \frac{\varepsilon}{2} \le (1-x) \sum_{k=0}^{n_\varepsilon - 1} x^k |s_k - f(1)| + \frac{\varepsilon}{2}$$

There exists $\delta_\varepsilon \in (0, 1)$ small enough such that, for all $1 - \delta_\varepsilon < x < 1$, we have

$$(1-x) \sum_{k=0}^{n_\varepsilon - 1} x^k |s_k - f(1)| \le \frac{\varepsilon}{2}$$

Thus,

$$1 - \delta_\varepsilon < x < 1 \implies |f(x) - f(1)| \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$$

We conclude that $\lim_{x \to 1^-} f(x) = f(1)$.

Example 483 As in (ii) of the last example, consider the generating function $f : [-1, 1) \to \mathbb{R}$ given by $f(x) = \sum_{n=1}^{\infty} x^n / n$. Since $\sum_{n=1}^{\infty} (-1)^n / n$ converges, by Abel's Theorem it holds that $\lim_{x \to -1^+} f(x) = f(-1)$. N

When $r = 1$, by Abel's Theorem we have

$$\lim_{x \to 1^-} f(x) = \sum_{n=0}^{\infty} a_n \qquad (11.9)$$

provided the series $\sum_{n=0}^{\infty} a_n$ converges. Next we illustrate this observation.

Example 484 Consider the generating function

$$f(x) = \sum_{k=1}^{\infty} (-1)^{k+1} \frac{x^k}{k}$$

of the sequence $\{(-1)^{n+1}/n\}$. For the power series $\sum_{k=1}^{\infty} (-1)^{k+1} x^k / k$, we have (cf. Example 473)

$$\limsup \sqrt[n]{\left| \frac{(-1)^{n+1}}{n} \right|} = \limsup \sqrt[n]{\frac{1}{n}} = 1$$

By the Cauchy-Hadamard Theorem, its radius is $r = 1$ and so the power series converges on $(-1, 1)$. On the other hand, as proved in the first part of the proof of Proposition 407, the alternating harmonic series $\sum_{n=1}^{\infty} (-1)^{n+1}/n$ converges, so the domain of the generating function $f$ is $(-1, 1]$. By Abel's Theorem, we then have

$$\lim_{x \to 1^-} f(x) = \lim_{x \to 1^-} \sum_{k=1}^{\infty} (-1)^{k+1} \frac{x^k}{k} = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = f(1)$$

By Corollary 1369, we actually have $f(x) = \log (1 + x)$. N
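Abel's Theorem can be watched in action numerically: as $x \uparrow 1$ the sums approach the alternating harmonic sum $\log 2$. A sketch, where the truncation level is an arbitrary choice:

```python
# Sketch: for Example 484, f(x) approaches log 2 as x approaches 1 from below.
import math

def f(x, N=100000):
    return sum((-1) ** (k + 1) * x ** k / k for k in range(1, N))

for x in (0.9, 0.99, 0.999):
    print(x, f(x))
print("log 2 =", math.log(2))
```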

A final example is the occasion to introduce a beautiful formula.

Example 485 Recall from formula (10.13) that we can express binomial coefficients for two natural numbers $0 \le k \le n$ through falling factorials as

$$\binom{n}{k} = \frac{n^{(k)}}{k!}$$

If rather than a natural number $n$ we consider a scalar $\alpha \in \mathbb{R}$, we define a generalized falling factorial by

$$\alpha^{(k)} = \alpha (\alpha - 1) \cdots (\alpha - k + 1)$$

The standard notion of falling factorial is the special case when $\alpha$ is a natural number. Note that, when $\alpha$ is a natural number, $\alpha^{(k)} = 0$ for all $k \ge \alpha + 1$ and $\alpha^{(k)} \ge 0$ for all $0 \le k \le \alpha$. These generalized falling factorials allow us to define the generalized binomial coefficients

$$\binom{\alpha}{0} = 1 \ ; \qquad \binom{\alpha}{k} = \frac{\alpha^{(k)}}{k!} = \frac{\alpha (\alpha - 1) \cdots (\alpha - k + 1)}{k!}$$

When $\alpha$ is a natural number, we go back to the standard notion of binomial coefficient for natural numbers. Note that, when $\alpha$ is a natural number, for all $k \ge \alpha + 1$,

$$\binom{\alpha}{k} = 0 \qquad (11.10)$$

Armed with these notions, given any $\alpha \in \mathbb{R}$ we look for the generating function $f$ for the binomial sequence

$$\left\{ \binom{\alpha}{k} \right\}_{k=0}^{\infty} \qquad (11.11)$$

We first determine its domain. It holds that

$$\left| \frac{\binom{\alpha}{k+1}}{\binom{\alpha}{k}} \right| = \left| \frac{\frac{\alpha (\alpha-1) \cdots (\alpha-k+1)(\alpha-k)}{(k+1)!}}{\frac{\alpha (\alpha-1) \cdots (\alpha-k+1)}{k!}} \right| = \left| \frac{\alpha - k}{k+1} \right| \to 1$$

which, thanks to the inequality (10.18), implies $\limsup \sqrt[k]{\left| \binom{\alpha}{k} \right|} = 1$, namely $r = 1$. So, the domain of $f$ includes the open interval $(-1, 1)$. Later in the book (see Example 1409), we will show that

$$(1 + x)^\alpha = \sum_{k=0}^{\infty} \binom{\alpha}{k} x^k \qquad \forall x \in (-1, 1) \qquad (11.12)$$

where the power series on the right-hand side is called the binomial series. This beautiful formula allows us to conclude that $f(x) = (1 + x)^\alpha$ is the sought-after generating function on $(-1, 1)$.

Note that, in view of (11.10), for $\alpha = n$ formula (11.12) reduces to formula (B.8), i.e., to

$$(1 + x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k$$
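A numerical sketch of the binomial series (11.12) for a non-integer $\alpha$; the values of $\alpha$, $x$ and the truncation level are illustrative choices.

```python
# Sketch: check the binomial series (11.12) numerically for alpha = 1/2.
def gen_binom(alpha, k):
    """Generalized binomial coefficient alpha^(k) / k!, built term by term."""
    c = 1.0
    for i in range(k):
        c *= (alpha - i) / (i + 1)
    return c

alpha, x, N = 0.5, 0.4, 60  # illustrative values with |x| < 1
series = sum(gen_binom(alpha, k) * x ** k for k in range(N))
print(series, (1 + x) ** alpha)  # the two values should agree closely
```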

11.2.2 Solving recurrences via generating functions

Denote by $f_a$ the generating function for a sequence $a = \{a_n\}$. As remarked after the last proposition, $f_a$ is uniquely determined by $a$, so one can go back and forth between $a$ and $f_a$. We can diagram this univocal relationship as follows:

$$a \longleftrightarrow f_a$$

This observation is important because, remarkably, it turns out that a generating function $f_a$ may be constructed by just using a definition by recurrence of the sequence $a = \{a_n\}$. This makes it possible to solve the recurrence if one is able to retrieve (in closed form) the coefficients of the sequence $a = \{a_n\}$ that generates $f_a$. Indeed, such a sequence is unique and so it has to be the one defined by the recurrence at hand.² We can diagram this solution scheme as follows:

$$a \text{ (recurrence)} \longrightarrow f_a \longrightarrow a \text{ (closed form)}$$

The next classic example gives a flavor of this scheme.

²The differential formula (11.8) is of less operational interest than one might expect for finding the sequence $\{a_n\}$, because taking successively higher-order derivatives is another kind of recurrence that can be as demanding as going through the original recurrence itself. That said, it will be momentarily used in proving Proposition 488.

Example 486 Consider the classic Fibonacci recursion, started at $n = 0$,

$$a_0 = 0 \ , \ a_1 = 1 \ ; \qquad a_n = a_{n-1} + a_{n-2} \ \text{ for } n \ge 2 \qquad (11.13)$$

that is, $\{0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, \ldots\}$. We want to construct its generating function $f : A \subseteq \mathbb{R} \to \mathbb{R}$. Since the sequence is positive and increasing, clearly $\limsup \sqrt[n]{|a_n|} > 0$. By the Cauchy-Hadamard Theorem, the domain $A$ contains an open interval $(-\varepsilon, \varepsilon)$ with $0 < \varepsilon < 1$. For each scalar $x$, we have

$$\sum_{n=0}^{N} a_n x^n = a_0 + a_1 x + \sum_{n=2}^{N} (a_{n-1} + a_{n-2}) x^n = x + x \sum_{n=2}^{N} a_{n-1} x^{n-1} + x^2 \sum_{n=2}^{N} a_{n-2} x^{n-2}$$

If $x \in (-\varepsilon, \varepsilon)$, by taking limits we then get $f(x) = x + x f(x) + x^2 f(x)$, so

$$f(x) = \frac{x}{1 - x - x^2} \qquad \forall x \in (-\varepsilon, \varepsilon)$$

The solutions of the equation $1 - x - x^2 = 0$ are

$$x = \frac{-1 \pm \sqrt{5}}{2}$$

Some simple algebra then shows that

$$\frac{1}{1 - x - x^2} = \frac{\frac{1}{\sqrt{5}}}{x + \frac{1+\sqrt{5}}{2}} - \frac{\frac{1}{\sqrt{5}}}{x + \frac{1-\sqrt{5}}{2}} \qquad (11.14)$$

Write $\varphi = \frac{1+\sqrt{5}}{2}$ and $\psi = \frac{1-\sqrt{5}}{2}$, so that $\varphi \psi = -1$, i.e., $1/\varphi = -\psi$ and $1/\psi = -\varphi$. By the properties of the geometric series, for each $x \in (-\varepsilon, \varepsilon)$ we have

$$\frac{1}{x + \varphi} = \frac{1}{\varphi} \cdot \frac{1}{1 - \psi x} = -\psi \sum_{n=0}^{\infty} \psi^n x^n = -\sum_{n=0}^{\infty} \psi^{n+1} x^n$$

and, similarly, $\frac{1}{x + \psi} = -\sum_{n=0}^{\infty} \varphi^{n+1} x^n$. So,

$$f(x) = \frac{x}{\sqrt{5}} \left( \frac{1}{x + \varphi} - \frac{1}{x + \psi} \right) = \frac{x}{\sqrt{5}} \sum_{n=0}^{\infty} \left( \varphi^{n+1} - \psi^{n+1} \right) x^n = \frac{1}{\sqrt{5}} \sum_{n=0}^{\infty} \left( \varphi^n - \psi^n \right) x^n$$

where the last equality uses $\varphi^0 - \psi^0 = 0$. By equating coefficients, we conclude that $f$ is generated by the sequence with terms

$$a_n = \frac{1}{\sqrt{5}} \left[ \left( \frac{1+\sqrt{5}}{2} \right)^n - \left( \frac{1-\sqrt{5}}{2} \right)^n \right] \qquad \forall n \ge 0 \qquad (11.15)$$

So, this sequence solves the previous Fibonacci recursion. N
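A short sketch checking the closed form (11.15), often called Binet's formula, against the recursion itself:

```python
# Sketch: verify (11.15) against the Fibonacci recursion (11.13).
from math import sqrt

phi, psi = (1 + sqrt(5)) / 2, (1 - sqrt(5)) / 2

def binet(n):
    return round((phi ** n - psi ** n) / sqrt(5))

fib = [0, 1]
for n in range(2, 15):
    fib.append(fib[-1] + fib[-2])

print(fib)
print([binet(n) for n in range(15)])  # identical lists
```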

We call Fibonacci numbers the terms of the sequence (11.15). There is an elegant characterization of their asymptotic behavior.

Proposition 487 For the Fibonacci numbers $a_n$ we have

$$a_n \sim \frac{1}{\sqrt{5}} \left( \frac{1+\sqrt{5}}{2} \right)^n$$

Proof We have

$$\frac{a_n}{\frac{1}{\sqrt{5}} \left( \frac{1+\sqrt{5}}{2} \right)^n} = \frac{\left( \frac{1+\sqrt{5}}{2} \right)^n - \left( \frac{1-\sqrt{5}}{2} \right)^n}{\left( \frac{1+\sqrt{5}}{2} \right)^n} = 1 - \left( \frac{1-\sqrt{5}}{1+\sqrt{5}} \right)^n \to 1$$

where the last step follows from (8.33) since $0 < \left| \frac{1-\sqrt{5}}{1+\sqrt{5}} \right| < 1$.

In solving the Fibonacci recurrence (11.13) it was key that its generating function (11.14) is a proper rational function, which can then be studied via its partial fraction expansion. This suggests that, more generally, one can solve recurrences that have proper rational generating functions. For simplicity, we focus on the case of distinct real roots.

Proposition 488 Let $f(x) = p(x)/q(x)$ be a proper rational function such that $q$ has $k$ distinct real roots $r_1, r_2, \ldots, r_k$, so $q(x) = \prod_{i=1}^{k} (x - r_i)$. Then, $f$ is a generating function for the sequence with terms

$$a_n = b_1 \frac{1}{r_1^n} + b_2 \frac{1}{r_2^n} + \cdots + b_k \frac{1}{r_k^n}$$

where, for all $i = 1, \ldots, k$,

$$b_i = -\frac{p(r_i)}{r_i \, q'(r_i)}$$
We give two proofs of this result: the first one is direct, while the second one relies on formula (11.8).

Proof 1 By Proposition 248, the partial fraction expansion of $f$ is

$$f(x) = \frac{c_1}{x - r_1} + \frac{c_2}{x - r_2} + \cdots + \frac{c_k}{x - r_k} \qquad (11.16)$$

where $c_i = p(r_i)/q'(r_i)$ for all $i = 1, \ldots, k$. So, for $|x| < \min_i |r_i|$ we can write

$$f(x) = -\frac{c_1}{r_1} \frac{1}{1 - \frac{x}{r_1}} - \frac{c_2}{r_2} \frac{1}{1 - \frac{x}{r_2}} - \cdots - \frac{c_k}{r_k} \frac{1}{1 - \frac{x}{r_k}}$$
$$= -\sum_{n=0}^{\infty} \left( \frac{c_1}{r_1} \frac{x^n}{r_1^n} + \frac{c_2}{r_2} \frac{x^n}{r_2^n} + \cdots + \frac{c_k}{r_k} \frac{x^n}{r_k^n} \right) = \sum_{n=0}^{\infty} \left( b_1 \frac{1}{r_1^n} + b_2 \frac{1}{r_2^n} + \cdots + b_k \frac{1}{r_k^n} \right) x^n$$

where $b_i = -p(r_i) / (r_i \, q'(r_i))$ for all $i = 1, \ldots, k$. We conclude that $f$ is a generating function for the sequence with terms

$$a_n = b_1 \frac{1}{r_1^n} + b_2 \frac{1}{r_2^n} + \cdots + b_k \frac{1}{r_k^n}$$

as desired.

Proof 2 Consider the function $g(x) = 1/(x - r)$. It can be proved by induction that its derivative of order $n$ is

$$g^{(n)}(x) = -\frac{n!}{(r - x)^{n+1}}$$

In view of (11.16), we then have

$$f^{(n)}(x) = -c_1 \frac{n!}{(r_1 - x)^{n+1}} - c_2 \frac{n!}{(r_2 - x)^{n+1}} - \cdots - c_k \frac{n!}{(r_k - x)^{n+1}}$$

where $c_i = p(r_i)/q'(r_i)$ for all $i = 1, \ldots, k$. So,

$$\frac{f^{(n)}(0)}{n!} = -\frac{c_1}{r_1} \frac{1}{r_1^n} - \frac{c_2}{r_2} \frac{1}{r_2^n} - \cdots - \frac{c_k}{r_k} \frac{1}{r_k^n} = b_1 \frac{1}{r_1^n} + b_2 \frac{1}{r_2^n} + \cdots + b_k \frac{1}{r_k^n}$$

The result now follows from formula (11.8).

As a dividend of this result, we can solve linear recurrences of order $k$ given by (8.11), that is,³

$$a_0 = \alpha_0 \ , \ a_1 = \alpha_1 \ , \ \ldots \ , \ a_{k-1} = \alpha_{k-1} \ ; \qquad a_n = p_1 a_{n-1} + p_2 a_{n-2} + \cdots + p_k a_{n-k} \ \text{ for } n \ge k \qquad (11.17)$$

Some algebra, left to the reader, shows that the Fibonacci formula (11.14) here takes the general form of a proper rational function given by

$$f(x) = \frac{\alpha_0 + (\alpha_1 - p_1 \alpha_0) x + (\alpha_2 - p_1 \alpha_1 - p_2 \alpha_0) x^2 + \cdots + (\alpha_{k-1} - p_1 \alpha_{k-2} - \cdots - p_{k-1} \alpha_0) x^{k-1}}{1 - p_1 x - p_2 x^2 - \cdots - p_k x^k}$$

Assume that the polynomial at the denominator has $k$ distinct real roots $r_1, r_2, \ldots, r_k$. By the last result, $f$ is then the generating function of the sequence with terms

$$a_n = b_1 \frac{1}{r_1^n} + b_2 \frac{1}{r_2^n} + \cdots + b_k \frac{1}{r_k^n}$$

where, for all $i = 1, \ldots, k$,

$$b_i = -\frac{\alpha_0 + (\alpha_1 - p_1 \alpha_0) r_i + (\alpha_2 - p_1 \alpha_1 - p_2 \alpha_0) r_i^2 + \cdots + (\alpha_{k-1} - p_1 \alpha_{k-2} - \cdots - p_{k-1} \alpha_0) r_i^{k-1}}{r_i \left( -p_1 - 2 p_2 r_i - \cdots - k p_k r_i^{k-1} \right)}$$

This sequence thus solves the linear recurrence (11.17). The key equation

$$1 - p_1 x - p_2 x^2 - \cdots - p_k x^k = 0$$

is a version of the so-called characteristic equation of the recurrence.

³Relative to (8.11), we use the letters $a$ and $p$ in place of $x$ and $a$, respectively, because in this section the letter $x$ is the argument of a power series.

Example 489 We can solve the Fibonacci recurrence (11.13) through this method. It is a linear recurrence of order 2 where $p_1 = p_2 = 1$, $\alpha_0 = 0$, $\alpha_1 = 1$, $r_1 = \frac{-1+\sqrt{5}}{2}$, and $r_2 = \frac{-1-\sqrt{5}}{2}$. So,

$$b_1 = -\frac{r_1}{r_1 (-1 - 2 r_1)} = \frac{1}{1 + 2 r_1} = \frac{1}{1 + (-1 + \sqrt{5})} = \frac{1}{\sqrt{5}}$$
$$b_2 = -\frac{r_2}{r_2 (-1 - 2 r_2)} = \frac{1}{1 + 2 r_2} = \frac{1}{1 + (-1 - \sqrt{5})} = -\frac{1}{\sqrt{5}}$$

and, by the last proposition, the sequence with terms

$$a_n = \frac{1}{\sqrt{5}} \frac{1}{r_1^n} - \frac{1}{\sqrt{5}} \frac{1}{r_2^n} = \frac{1}{\sqrt{5}} \left( \frac{1+\sqrt{5}}{2} \right)^n - \frac{1}{\sqrt{5}} \left( \frac{1-\sqrt{5}}{2} \right)^n$$

solves the Fibonacci recurrence (11.13), in accordance with (11.15). Here we used that $1/r_1 = \frac{2}{-1+\sqrt{5}} = \frac{1+\sqrt{5}}{2}$ and $1/r_2 = \frac{2}{-1-\sqrt{5}} = \frac{1-\sqrt{5}}{2}$. N

Example 490 Consider the linear recurrence of order 3 given by

$$a_0 = 1 \ , \ a_1 = 2 \ , \ a_2 = 2 \ ; \qquad a_n = \frac{11}{6} a_{n-1} - a_{n-2} + \frac{1}{6} a_{n-3} \ \text{ for } n \ge 3$$

where $p_1 = 11/6$, $p_2 = -1$, $p_3 = 1/6$, $\alpha_0 = 1$, $\alpha_1 = 2$, and $\alpha_2 = 2$. Since the cubic equation

$$1 - \frac{11}{6} x + x^2 - \frac{1}{6} x^3 = 0$$

has solutions $r_1 = 1$, $r_2 = 2$, and $r_3 = 3$, we have

$$b_i = -\frac{6 + r_i - 4 r_i^2}{r_i \left( -11 + 12 r_i - 3 r_i^2 \right)}$$

So, $b_1 = 3/2$, $b_2 = 4$, and $b_3 = -9/2$. By the last proposition, the sequence with terms

$$a_n = \frac{3}{2} + 4 \cdot \frac{1}{2^n} - \frac{9}{2} \cdot \frac{1}{3^n}$$

solves this linear recurrence of order 3. N
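The recipe of Proposition 488 is mechanical enough to code directly. The following sketch reproduces the computation of this example; $p$, $q'$ and the roots are transcribed from above.

```python
# Sketch: solve the order-3 recurrence of Example 490 via b_i = -p(r_i)/(r_i q'(r_i)).
def p(x):
    return 1 + x / 6 - (2 / 3) * x ** 2      # numerator built from the initial data

def q_prime(x):
    return -11 / 6 + 2 * x - x ** 2 / 2      # derivative of 1 - (11/6)x + x^2 - (1/6)x^3

roots = [1, 2, 3]
b = [-p(r) / (r * q_prime(r)) for r in roots]
print(b)  # [1.5, 4.0, -4.5]

# Compare the closed form with the recursion itself.
a = [1, 2, 2]
for n in range(3, 10):
    a.append(11 / 6 * a[n - 1] - a[n - 2] + 1 / 6 * a[n - 3])
closed = [sum(bi / r ** n for bi, r in zip(b, roots)) for n in range(10)]
print(a)
print(closed)  # same values up to rounding
```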

11.3 Discounted convergence

11.3.1 Abel convergence

So far we studied two notions of convergence for sequences: the basic convergence $x_n \to L$ and the weaker Cesàro convergence $x_n \xrightarrow{C} L$. There is a further, even weaker, important "discounted" mode of convergence based on power series.

Definition 491 We say that a sequence $\{x_n\}$ converges in the sense of Abel to $L \in \mathbb{R}$, and we write $x_n \xrightarrow{A} L$, when⁴

$$\lim_{\beta \uparrow 1} (1 - \beta) \sum_{n=1}^{\infty} \beta^{n-1} x_n = L$$

The factor $1 - \beta$ is a normalization factor since

$$(1 - \beta) \sum_{n=1}^{\infty} \beta^{n-1} = 1 \qquad (11.18)$$

In particular, if $\{x_n\}$ is a constant sequence, say $x_n = k$ for all $n \ge 1$, then

$$\lim_{\beta \uparrow 1} (1 - \beta) \sum_{n=1}^{\infty} \beta^{n-1} x_n = k \lim_{\beta \uparrow 1} (1 - \beta) \sum_{n=1}^{\infty} \beta^{n-1} = k$$

So, thanks to the normalizing factor, a constant sequence indeed converges to $k$, as one would expect from any bona fide mode of convergence.
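A numerical sketch of Abel means: for $x_n = (-1)^{n+1}$, whose Cesàro limit is 0, the Abel means tend to 0 as well, anticipating the next theorem. The truncation level is an arbitrary choice.

```python
# Sketch: Abel means (1 - beta) * sum_n beta^(n-1) x_n for x_n = (-1)^(n+1).
def abel_mean(x, beta):
    return (1 - beta) * sum(beta ** (n - 1) * xn for n, xn in enumerate(x, start=1))

x = [(-1) ** (n + 1) for n in range(1, 200000)]
for beta in (0.9, 0.99, 0.999):
    print(beta, abel_mean(x, beta))  # closed form: (1 - beta)/(1 + beta) -> 0
```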

Theorem 492 (Frobenius) If a sequence $\{x_n\}$ converges in the sense of Cesàro to $L \in \mathbb{R}$, then it converges to $L$ also in the sense of Abel.

Convergence in mean thus implies the discounted one, so we have

$$\lim_{\beta \uparrow 1} (1 - \beta) \sum_{n=1}^{\infty} \beta^{n-1} x_n = \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} x_k$$

The proof relies on a couple of lemmas of independent interest. The first one connects power and ordinary partial sums.

Lemma 493 Let $s_0 = 0$ and $s_n = \sum_{k=1}^{n} x_k$ for all $n \ge 1$. Then, for each $n \ge 1$ we have:

$$\sum_{k=1}^{n} \beta^{k-1} x_k = (1 - \beta) \sum_{k=1}^{n} \beta^{k-1} s_k + \beta^n s_n \qquad (11.19)$$

⁴For the meaning of $\beta \uparrow 1$ we refer the reader to Section 8.8.2. For a comprehensive study of Abel (and Cesàro) convergence, we refer interested readers to Chapter III of Zygmund (2002).

Proof We have, for each $n \ge 1$,

$$\sum_{k=1}^{n} \beta^{k-1} x_k = \sum_{k=1}^{n} \beta^{k-1} (s_k - s_{k-1}) = \sum_{k=1}^{n} \beta^{k-1} s_k - \sum_{k=1}^{n} \beta^{k-1} s_{k-1}$$
$$= \sum_{k=1}^{n} \beta^{k-1} s_k - \sum_{k=1}^{n-1} \beta^{k} s_{k} = (1 - \beta) \sum_{k=1}^{n} \beta^{k-1} s_k + \beta^n s_n$$

as desired.

The second lemma deals with a variation of the geometric series.

Lemma 494 For all $\beta \in (-1, 1)$, we have:⁵

$$1 + 2\beta + 3\beta^2 + \cdots = \sum_{n=1}^{\infty} n \beta^{n-1} = \frac{1}{(1-\beta)^2} \qquad (11.20)$$

Interestingly, the result holds also when $\beta \in (-1, 0)$. In this case we can write $\beta = -\gamma$ with $\gamma \in (0, 1)$; from (11.20) it then follows that

$$1 - 2\gamma + 3\gamma^2 - \cdots = \sum_{n=1}^{\infty} (-1)^{n-1} n \gamma^{n-1} = \frac{1}{(1+\gamma)^2} \qquad (11.21)$$

Proof Consider the sequence of partial sums $s_m = \sum_{n=1}^{m} n \beta^{n-1}$. We next show that

$$s_m = \frac{1}{1-\beta} \sum_{n=1}^{m} \beta^{n-1} - \frac{m \beta^m}{1-\beta} \qquad \forall m \ge 1 \qquad (11.22)$$

We proceed by induction on $m$. For $m = 1$ the statement is true; indeed,

$$s_1 = 1 = \frac{1 - \beta}{1 - \beta} = \frac{1}{1-\beta} \sum_{n=1}^{1} \beta^{n-1} - \frac{\beta}{1-\beta}$$

Assume now the statement is true for $m$ (induction hypothesis). We need to show it holds for $m + 1$. By the induction hypothesis, we have that

$$s_{m+1} = \sum_{n=1}^{m+1} n \beta^{n-1} = s_m + (m+1) \beta^m = \frac{1}{1-\beta} \sum_{n=1}^{m} \beta^{n-1} - \frac{m \beta^m}{1-\beta} + (m+1) \beta^m$$
$$= \frac{1}{1-\beta} \sum_{n=1}^{m} \beta^{n-1} + \frac{\beta^m \left[ (1-\beta)(m+1) - m \right]}{1-\beta} = \frac{1}{1-\beta} \sum_{n=1}^{m} \beta^{n-1} + \frac{\beta^m - (m+1) \beta^{m+1}}{1-\beta}$$
$$= \frac{1}{1-\beta} \sum_{n=1}^{m+1} \beta^{n-1} - \frac{(m+1) \beta^{m+1}}{1-\beta}$$

By induction, we conclude that (11.22) holds. By passing to the limit, since $m \beta^m \to 0$, we have that

$$\lim_m s_m = \lim_m \frac{1}{1-\beta} \sum_{n=1}^{m} \beta^{n-1} - \lim_m \frac{m \beta^m}{1-\beta} = \frac{1}{1-\beta} \cdot \frac{1}{1-\beta} - 0 = \frac{1}{(1-\beta)^2}$$

proving the statement.

⁵Recall the convention $0^0 = 1$.

Proof of Theorem 492 We start by proving some ancillary facts. Let $\beta \in (0, 1)$. Set $m_k = s_k / k = \sum_{i=1}^{k} x_i / k$ for all $k \ge 1$. By hypothesis, $m_k \to L$. Since $m_k \to L$, there exists $M > 0$ such that $|m_k - L| \le M$ for all $k \ge 1$. In view of (11.20), in turn, this implies that the series $\sum_{k=1}^{\infty} \beta^{k-1} s_k = \sum_{k=1}^{\infty} \beta^{k-1} k m_k$ converges absolutely. For,

$$\sum_{k=1}^{\infty} \beta^{k-1} k |m_k| = \sum_{k=1}^{\infty} \beta^{k-1} k |m_k - L + L| \le \sum_{k=1}^{\infty} \beta^{k-1} k |m_k - L| + \sum_{k=1}^{\infty} \beta^{k-1} k |L|$$
$$\le \sum_{k=1}^{\infty} \beta^{k-1} k M + |L| \sum_{k=1}^{\infty} \beta^{k-1} k = (M + |L|) \sum_{k=1}^{\infty} k \beta^{k-1} = \frac{M + |L|}{(1-\beta)^2}$$

This proves that $\sum_{k=1}^{\infty} \beta^{k-1} k m_k = \sum_{k=1}^{\infty} \beta^{k-1} s_k$ converges absolutely. In particular, it converges. Next, we show that if $\beta \in (0, 1)$, then

$$(1-\beta) \sum_{k=1}^{\infty} \beta^{k-1} x_k = (1-\beta)^2 \sum_{k=1}^{\infty} \beta^{k-1} s_k \qquad (11.23)$$

By (11.19), we have that

$$(1-\beta) \sum_{k=1}^{n} \beta^{k-1} x_k = (1-\beta)^2 \sum_{k=1}^{n} \beta^{k-1} s_k + (1-\beta) \beta^n s_n \qquad \forall n \ge 1 \qquad (11.24)$$

At the same time, if $\beta \in (0, 1)$, then we have that

$$|(1-\beta) \beta^n s_n| = (1-\beta) \beta^n n |m_n| \le (1-\beta) \beta^n n |m_n - L| + (1-\beta) \beta^n n |L| \le (1-\beta) \beta^n n (M + |L|) \xrightarrow{n \to \infty} 0$$

Paired with (11.24), this yields (11.23). Define $f : (0, 1) \to \mathbb{R}$ by $f(\beta) = (1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} x_n$ for all $\beta \in (0, 1)$. By (11.23), $f$ is well-defined. We are now ready to prove the main statement. By hypothesis, $x_n \xrightarrow{C} L$. So, for each $\varepsilon > 0$ there exists $n_\varepsilon \ge 1$ such that $n \ge n_\varepsilon$ implies that

$$\left| \frac{s_n}{n} - L \right| = |m_n - L| < \varepsilon$$

Fix $\varepsilon > 0$ and let $n \ge n_{\varepsilon/2}$. Next, by (11.23) and (11.20), observe that for each $\beta \in (0, 1)$ we have:

$$|f(\beta) - L| = \left| (1-\beta)^2 \sum_{k=1}^{\infty} \beta^{k-1} s_k - L \right| = \left| (1-\beta)^2 \sum_{k=1}^{\infty} \beta^{k-1} (s_k - Lk) \right| = \left| (1-\beta)^2 \sum_{k=1}^{\infty} \beta^{k-1} k \left( \frac{s_k}{k} - L \right) \right|$$
$$\le (1-\beta)^2 \sum_{k=1}^{n} \beta^{k-1} k \left| \frac{s_k}{k} - L \right| + (1-\beta)^2 \sum_{k=n+1}^{\infty} \beta^{k-1} k \left| \frac{s_k}{k} - L \right|$$
$$\le (1-\beta)^2 \sum_{k=1}^{n} \beta^{k-1} k \left| \frac{s_k}{k} - L \right| + (1-\beta)^2 \sum_{k=n+1}^{\infty} \beta^{k-1} k \frac{\varepsilon}{2} \le (1-\beta)^2 \sum_{k=1}^{n} \beta^{k-1} k \left| \frac{s_k}{k} - L \right| + \frac{\varepsilon}{2}$$

Note that $n$ does not depend on $\beta$: indeed, it was chosen before even discussing $|f(\beta) - L|$. Hence, if we choose $\beta$ close enough to 1, we have that

$$(1-\beta)^2 \sum_{k=1}^{n} \beta^{k-1} k \left| \frac{s_k}{k} - L \right| < \frac{\varepsilon}{2}$$

So, for $\beta$ close enough to 1, we have $|f(\beta) - L| < \varepsilon$. This proves the statement.

Theorems 434 and 492 thus imply the following convergence hierarchy:

$$x_n \to L \implies x_n \xrightarrow{C} L \implies x_n \xrightarrow{A} L \qquad (11.25)$$

The converses are false. Example 437 showed that a sequence may converge in the sense of Cesàro but not in the ordinary sense. The following modification of that example shows that a sequence may converge in the sense of Abel but not in that of Cesàro.

Example 495 One can show that for the alternating sequence

$$x_n = (-1)^{n+1} n = \{1, -2, 3, -4, 5, \ldots\}$$

we have

$$\frac{x_1 + x_2 + \cdots + x_n}{n} = \begin{cases} -\frac{1}{2} & \text{if } n \text{ is even} \\ \frac{1}{2} + \frac{1}{2n} & \text{if } n \text{ is odd} \end{cases}$$

as well as

$$\sum_{n=1}^{\infty} \beta^{n-1} x_n = \frac{1}{(1+\beta)^2}$$

Indeed, by (11.21) we have

$$\sum_{n=1}^{\infty} \beta^{n-1} x_n = x_1 + x_2 \beta + x_3 \beta^2 + \cdots = 1 - 2\beta + 3\beta^2 - \cdots = \frac{1}{(1+\beta)^2}$$

So, this sequence converges to 0 in the sense of Abel, but it does not converge in the sense of Cesàro. N

Though weaker, also Abel convergence may fail.

Example 496 Consider the sequence $\{x_n\}$ given by

$$\underbrace{0, 0}_{2 \text{ elements}}, \underbrace{1, 1}_{2 \text{ elements}}, \underbrace{0, 0, 0, 0}_{4 \text{ elements}}, \underbrace{1, 1, 1, 1, 1, 1, 1, 1}_{8 \text{ elements}}, \ldots$$

where every block of 0s and 1s has length equal to the sum of the lengths of the previous blocks. One can show that $\lim_{\beta \uparrow 1} (1-\beta) \sum_{t=1}^{\infty} \beta^{t-1} x_t$ does not exist, so this sequence does not converge in the sense of Abel. N

Tauberian theorems provide sufficient conditions under which the converses in (11.25) actually hold. Landau's Theorem gave a sufficient condition under which ordinary and Cesàro convergence are equivalent. In the same vein, the next classic Tauberian theorem shows when Abel and Cesàro convergence are equivalent.

Theorem 497 (Hardy-Littlewood) A one-sided bounded sequence $\{x_n\}$ converges in the sense of Cesàro if and only if it converges in the sense of Abel. In this case,

$$\lim_{\beta \uparrow 1} (1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} x_n = \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} x_k$$

We give a remarkable proof, due to Karamata (1930), that relies on the following lemma.

Lemma 498 (Karamata) Let $\{x_n\}$ be a sequence, bounded below, that converges in the sense of Abel to $L$. Then

$$\lim_{\beta \uparrow 1} (1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} x_n f(\beta^{n-1}) = L \int_0^1 f(t) \, dt \qquad (11.26)$$

for every integrable function $f : [0, 1] \to \mathbb{R}$.

Proof First, observe that proving (11.26) for positive sequences is equivalent to proving it for sequences that are bounded from below. Clearly, on the one hand, if (11.26) holds for sequences that are bounded from below, then it holds for positive sequences (the lower bound is indeed zero). On the other hand, if $\{x_n\}$ is bounded from below but not positive, there exists $m < 0$ such that $x_n \ge m$ for all $n \ge 1$. If we set $y_n = x_n - m$ and $z_n = -m$ for all $n \ge 1$, it is easy to check that $\{y_n\}$ and $\{z_n\}$ are positive as well as

$$(1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} y_n = (1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} (x_n + z_n) = (1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} x_n - m$$

hence $y_n \xrightarrow{A} L - m$ and $z_n \xrightarrow{A} -m$. If we assume that (11.26) holds for positive sequences, then

$$\lim_{\beta \uparrow 1} (1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} y_n f(\beta^{n-1}) = (L - m) \int_0^1 f(t) \, dt$$

and

$$\lim_{\beta \uparrow 1} (1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} z_n f(\beta^{n-1}) = -m \int_0^1 f(t) \, dt$$

that is,

$$\lim_{\beta \uparrow 1} (1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} (-z_n) f(\beta^{n-1}) = m \int_0^1 f(t) \, dt$$

It follows that

$$\lim_{\beta \uparrow 1} (1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} x_n f(\beta^{n-1}) = \lim_{\beta \uparrow 1} (1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} (y_n - z_n) f(\beta^{n-1}) = L \int_0^1 f(t) \, dt$$

In view of all this, in the rest of the proof we assume without loss of generality that the sequence $\{x_n\}$ is positive. In this case, $L \ge 0$. We first prove the result when $f$ is a polynomial. Since $\sum_{n=1}^{\infty} \beta^{n-1} x_n$ converges for all $\beta \in (0, 1)$ and $\{x_n\}$ is positive, for each $k \ge 1$ and $\beta \in (0, 1)$ we have that the following series converges and

$$(1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} x_n (\beta^{n-1})^k = \frac{1-\beta}{1-\beta^{k+1}} \left( 1 - \beta^{k+1} \right) \sum_{n=1}^{\infty} (\beta^{k+1})^{n-1} x_n$$

By de l'Hospital's rule, note that

$$\lim_{\beta \to 1} \frac{1-\beta}{1-\beta^{k+1}} = \frac{1}{k+1}$$

This implies that

$$\lim_{\beta \uparrow 1} (1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} x_n (\beta^{n-1})^k = \frac{1}{k+1} L = L \int_0^1 t^k \, dt \qquad (11.27)$$

If $f(x) = a_0 + a_1 x + \cdots + a_k x^k$, we then have

$$(1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} x_n f(\beta^{n-1}) = a_0 (1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} x_n + a_1 (1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} x_n \beta^{n-1} + \cdots + a_k (1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} x_n (\beta^{n-1})^k$$

By (11.27) and by passing to the limit, this implies

$$\lim_{\beta \uparrow 1} (1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} x_n f(\beta^{n-1}) = L \left( a_0 + a_1 \int_0^1 t \, dt + \cdots + a_k \int_0^1 t^k \, dt \right) = L \int_0^1 f(t) \, dt$$

as desired.

Now, let $f : [0, 1] \to \mathbb{R}$ be an integrable function. By Proposition 1870, for each $\varepsilon > 0$ there exist two polynomials $p_\varepsilon, P_\varepsilon : [0, 1] \to \mathbb{R}$ such that $p_\varepsilon \le f \le P_\varepsilon$ and

$$\int_0^1 P_\varepsilon(x) \, dx - \varepsilon \le \int_0^1 p_\varepsilon(x) \, dx \le \int_0^1 f(x) \, dx \le \int_0^1 P_\varepsilon(x) \, dx \le \int_0^1 p_\varepsilon(x) \, dx + \varepsilon \qquad (11.28)$$

Hence, since $L \ge 0$ and the $x_n$ are positive, we have that

$$L \int_0^1 P_\varepsilon(x) \, dx - L\varepsilon \le L \int_0^1 p_\varepsilon(x) \, dx = \lim_{\beta \uparrow 1} (1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} x_n p_\varepsilon(\beta^{n-1}) \le \liminf_{\beta \uparrow 1} (1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} x_n f(\beta^{n-1})$$
$$\le \limsup_{\beta \uparrow 1} (1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} x_n f(\beta^{n-1}) \le \lim_{\beta \uparrow 1} (1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} x_n P_\varepsilon(\beta^{n-1}) = L \int_0^1 P_\varepsilon(x) \, dx$$

Since $\varepsilon > 0$ was arbitrarily chosen, we conclude that $\lim_{\beta \uparrow 1} (1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} x_n f(\beta^{n-1})$ exists. Denote this limit by $l$. We have that $L \int_0^1 P_\varepsilon(x) \, dx \ge l \ge L \int_0^1 P_\varepsilon(x) \, dx - L\varepsilon$ for all $\varepsilon > 0$. By (11.28), we have for each $\varepsilon > 0$ that

$$\left| L \int_0^1 P_\varepsilon(x) \, dx - L \int_0^1 f(x) \, dx \right| \le L\varepsilon$$

yielding that $\left| l - L \int_0^1 f(x) \, dx \right| \le 2L\varepsilon$. Again, since $\varepsilon > 0$ was arbitrarily chosen, we conclude that $l = L \int_0^1 f(x) \, dx$, as desired.
Proof of Theorem 497 The "only if" follows from Theorem 492. As to the converse, suppose that $\{x_n\}$ is a sequence, bounded below, such that $x_n \xrightarrow{A} L$ (a dual argument holds if the sequence is bounded above). We want to show $x_n \xrightarrow{C} L$.

To this end, define $f : [0, 1] \to \mathbb{R}$ by

$$f(t) = \begin{cases} 0 & \text{if } t \in [0, e^{-1}) \\ \frac{1}{t} & \text{if } t \in [e^{-1}, 1] \end{cases}$$

The function $f$ is integrable, with

$$\int_0^1 f(t) \, dt = \int_{e^{-1}}^1 \frac{1}{t} \, dt = 1$$

If $\beta = e^{-\frac{1}{k}}$, we have

$$(1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} x_n f(\beta^{n-1}) = \left( 1 - e^{-\frac{1}{k}} \right) \sum_{n=1}^{\infty} e^{-\frac{n-1}{k}} x_n f\left( e^{-\frac{n-1}{k}} \right) = \left( 1 - e^{-\frac{1}{k}} \right) \sum_{n=1}^{k+1} x_n$$

because

$$f\left( e^{-\frac{n-1}{k}} \right) = \begin{cases} 0 & \text{if } e^{-\frac{n-1}{k}} \in [0, e^{-1}) \\ e^{\frac{n-1}{k}} & \text{if } e^{-\frac{n-1}{k}} \in [e^{-1}, 1] \end{cases} = \begin{cases} 0 & \text{if } n > k+1 \\ e^{\frac{n-1}{k}} & \text{if } 1 \le n \le k+1 \end{cases}$$

By (11.26), we then have

$$\lim_{k \to \infty} \left( 1 - e^{-\frac{1}{k}} \right) (k+1) \cdot \frac{\sum_{n=1}^{k+1} x_n}{k+1} = L$$

By (8.53), we have $\lim_{k \to \infty} (1 - e^{-\frac{1}{k}})(k+1) = 1$,⁶ yielding that

$$\lim_{k \to \infty} \frac{\sum_{n=1}^{k+1} x_n}{k+1} = L$$

We conclude that $x_n \xrightarrow{C} L$, as desired.

⁶Note that

$$\lim_{k \to \infty} \left( 1 - e^{-\frac{1}{k}} \right) (k+1) = \lim_{k \to \infty} \frac{1 - e^{-\frac{1}{k}}}{\frac{1}{k}} \cdot \lim_{k \to \infty} \frac{k+1}{k} = 1$$
If we consider the sequence of partial sums, we can define the Abel sum of a series: if $s_n \xrightarrow{A} S$ we write $\sum_{n=1}^{\infty} x_n \stackrel{A}{=} S$ and say that $S$ is the Abel sum of the series $\sum_{n=1}^{\infty} x_n$. In view of Frobenius' Theorem, Cesàro convergence implies Abel convergence. The converse is false, as Example 500 will show. Thus, more divergent series become convergent according to Abel than to Cesàro.

Proposition 499 If $\beta \in (-1, 1)$, then

$$(1-\beta) \sum_{k=1}^{\infty} \beta^{k-1} s_k = \sum_{k=1}^{\infty} \beta^{k-1} x_k \qquad (11.29)$$

provided either series converges.

This interesting formula helps to best connect Abel and ordinary sums. Indeed, it shows that

$$\sum_{n=1}^{\infty} x_n \stackrel{A}{=} \lim_{\beta \uparrow 1} \sum_{k=1}^{\infty} \beta^{k-1} x_k$$

By Abel's Theorem, in particular by equality (11.9), the Abel sum thus reduces to the ordinary sum when the series $\sum_{n=1}^{\infty} x_n$ converges.

Proof 1 Before starting, by (11.19), recall that

$$\sum_{k=1}^{n} \beta^{k-1} x_k = (1-\beta) \sum_{k=1}^{n} \beta^{k-1} s_k + \beta^n s_n \qquad \forall n \ge 1 \qquad (11.30)$$

If $(1-\beta) \sum_{k=1}^{\infty} \beta^{k-1} s_k$ converges, then so does $\sum_{k=1}^{\infty} \beta^{k-1} s_k$ and, in particular, $\beta^{k-1} s_k \to 0$ as $k \to \infty$. It follows that $\beta^k s_k = \beta \cdot \beta^{k-1} s_k \to 0$. By passing to the limit in $n$ in (11.30), (11.29) follows.

As for the case when $\sum_{k=1}^{\infty} \beta^{k-1} x_k$ converges, we only prove it in the case $\sum_{k=1}^{\infty} \beta^{k-1} x_k$ converges absolutely. First, we assume that $\beta \in [0, 1)$ and $x_k \ge 0$ for all $k \ge 1$. This implies that $s_k \ge 0$ for all $k \ge 1$ and $\beta^{k-1} x_k \ge 0$ for all $k \ge 1$. In this case, the series $\sum_{k=1}^{\infty} \beta^{k-1} s_k$ either converges or diverges. In the first case, from the previous part of the proof, it follows that (11.29) holds. In the second case, the right-hand side of (11.30) becomes arbitrarily large, a contradiction with $\sum_{k=1}^{\infty} \beta^{k-1} x_k$ converging. If $\{x_k\}$ is not positive or $\beta \not\in [0, 1)$, consider the positive sequence $\{|x_k|\}$ and note that $|\beta| \in (0, 1)$. Since $\sum_{k=1}^{\infty} \beta^{k-1} x_k$ converges absolutely, $\sum_{k=1}^{\infty} |\beta|^{k-1} |x_k|$ converges. From the previous part of the proof, applied to $\{|x_k|\}$ and $|\beta|$, the series

$$\sum_{k=1}^{\infty} |\beta|^{k-1} \left( \sum_{i=1}^{k} |x_i| \right)$$

converges. By Lemma 406 and since $|\beta^{k-1} s_k| \le |\beta|^{k-1} \sum_{i=1}^{k} |x_i|$ for all $k \ge 1$, it follows that $\sum_{k=1}^{\infty} \beta^{k-1} s_k$ converges. From the initial part of the proof, we can conclude again that (11.29) holds.

Next we give a second proof of formula (11.29) based on the Cauchy product of power series. For a change, we prove this formula starting from the 0 position, i.e.,

$$(1-\beta) \sum_{k=0}^{\infty} \beta^k s_k = \sum_{k=0}^{\infty} \beta^k x_k$$

Note that an iteration of this formula gives, for each $|\beta| < 1$,

$$\sum_{k=0}^{\infty} \beta^k x_k = (1-\beta) \sum_{k=0}^{\infty} \beta^k s_k = (1-\beta)^2 \sum_{k=0}^{\infty} \beta^k \sigma_k$$

where $\sigma_k = s_0 + \cdots + s_k$.

Proof 2 To ease matters, let us assume that the terms $x_n$ are positive. Suppose first that $(1-\beta) \sum_{k=0}^{\infty} \beta^k s_k$ converges. It holds that

$$(1-\beta) \sum_{k=0}^{\infty} \beta^k s_k = (1-\beta) \sum_{k=0}^{\infty} \beta^k (x_0 + \cdots + x_k) = (1-\beta) \left[ x_0 \sum_{k=0}^{\infty} \beta^k + \cdots + x_i \sum_{k=i}^{\infty} \beta^k + \cdots \right]$$
$$= (1-\beta) \left[ x_0 \frac{1}{1-\beta} + \cdots + x_i \frac{\beta^i}{1-\beta} + \cdots \right] = x_0 + \cdots + x_i \beta^i + \cdots = \sum_{k=0}^{\infty} \beta^k x_k$$

as desired. Now, assume that the series $\sum_{k=0}^{\infty} \beta^k x_k$ converges. Set $x = \{x_k\}$ and $\mathbf{1} = (1, 1, \ldots)$. Their convolution has terms $(\mathbf{1} * x)_k = x_0 + \cdots + x_k = s_k$. By Proposition 476, we then have, for each $|\beta| < 1$,

$$\frac{1}{1-\beta} \sum_{k=0}^{\infty} \beta^k x_k = \left( \sum_{k=0}^{\infty} \beta^k \right) \left( \sum_{k=0}^{\infty} \beta^k x_k \right) = \sum_{k=0}^{\infty} (\mathbf{1} * x)_k \beta^k = \sum_{k=0}^{\infty} \beta^k s_k$$

as desired.

Next we compute an Abel sum.

Example 500 The exploding Grandi-type series

$$1 - 2 + 3 - 4 + \cdots = \sum_{n=1}^{\infty} (-1)^{n-1} n$$

does not converge in the sense of Cesàro (why?). Yet, it converges in the sense of Abel. Indeed, by (11.21) we have

$$\sum_{n=1}^{\infty} \beta^{n-1} x_n = 1 - 2\beta + 3\beta^2 - \cdots = \frac{1}{(1+\beta)^2} \qquad \forall \beta \in (0, 1)$$

In view of (11.29), we conclude that this series has Abel sum $1/4$. N
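A sketch confirming this Abel sum numerically (the truncation level is an arbitrary choice):

```python
# Sketch: the weighted sums of 1 - 2 + 3 - 4 + ... approach 1/(1+beta)^2,
# hence 1/4 as beta approaches 1, per Example 500.
def weighted_sum(beta, N=100000):
    return sum(beta ** (n - 1) * (-1) ** (n - 1) * n for n in range(1, N))

for beta in (0.9, 0.99, 0.999):
    print(beta, weighted_sum(beta), 1 / (1 + beta) ** 2)
```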

11.3.2 Infinite patience

In Example 392 we introduced, for every $\beta \in (0, 1)$, the intertemporal utility function $U : \mathbb{R}^\infty \to \mathbb{R}$ given by

$$U(x) = \sum_{t=1}^{\infty} \beta^{t-1} u_t(x_t) \qquad (11.31)$$

where all instantaneous utility functions $u_t : \mathbb{R} \to \mathbb{R}$ are uniformly bounded by a constant $M$, as in condition (9.15). Such a utility function ranks all possible consumption streams $x = (x_1, \ldots, x_t, \ldots) \in \mathbb{R}^\infty$. In particular, the higher the subjective discount factor $\beta$, the more the decision maker cares about future periods, that is, the more patient he is.⁷

The power series $\sum_{t=1}^{\infty} \beta^{t-1} u_t(x_t)$ converges for all $\beta \in (0, 1)$, as we learned in Example 475. One may wonder what happens in the limit case $\beta \uparrow 1$, that is, when the subjective discount factor tends to 1.⁸ Intuitively, we are in an "infinite patience" setting, where all periods, present and future, matter the same for the decision maker. When the horizon $T$ is finite, the answer is simple:

$$\lim_{\beta \uparrow 1} \sum_{t=1}^{T} \beta^{t-1} u_t(x_t) = \sum_{t=1}^{T} u_t(x_t) \qquad (11.32)$$

so that the limit case corresponds to the sum of the utilities of all periods, all with equal unitary weight. When the horizon is infinite the problem becomes far more complex because, for the series $\sum_{t=1}^{\infty} u_t(x_t)$ to converge (so that, by Abel's Theorem, formula (11.32) continues to hold for $T = +\infty$), it must be that $\lim_{t \to \infty} u_t(x_t) = 0$ (Theorem 380), which is hardly justifiable from an economic standpoint.

Let us consider, instead, the Abel limit

$$\lim_{\beta \uparrow 1} (1-\beta) \sum_{t=1}^{\infty} \beta^{t-1} u_t(x_t)$$

⁷In an alternative interpretation, we can regard $U$ as the objective function of a planner that has to allocate consumption across different generations (each $t$ then identifies a generation).
⁸In view of Example 478-(v), we are studying the limit $\lim_{\beta \uparrow 1} f(\beta)$ of the generating function of $\{u_t(x_t)\}$.

By Hardy-Littlewood's Theorem, the bounded sequence $\{u_t(x_t)\}$ converges in the sense of Abel if and only if it converges in the sense of Cesàro,⁹ with

$$\lim_{\beta \uparrow 1} (1-\beta) \sum_{t=1}^{\infty} \beta^{t-1} u_t(x_t) = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_t(x_t)$$

All this suggests to define the function $V : \mathbb{R}^\infty \to \mathbb{R}$ by

$$V(x) = (1-\beta) U(x) \qquad \forall x \in \mathbb{R}^\infty$$

For every $\beta \in (0, 1)$ the function $V$ is equivalent to $U$:

$$V(x) \ge V(y) \iff U(x) \ge U(y) \qquad \forall x, y \in \mathbb{R}^\infty$$

In light of (11.18), $V$ is a normalization of $U$ which assigns value 1 to the constant sequence $u_t(x_t) = 1$ for every $t$.

Thanks to Hardy-Littlewood's Theorem, we have

$$\lim_{\beta \uparrow 1} V(x) = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_t(x_t)$$

as long as the limits exist. The infinite patience case is thus captured by the limit of the average utilities

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_t(x_t) \qquad (11.33)$$

that is, by the Cesàro limit of the sequence $\{u_t(x_t)\}$. Such a criterion can thus be seen as a limit case for $\beta \uparrow 1$ of the intertemporal utility function $V$.

The role that the sum $\sum_{t=1}^{T} u_t(x_t)$ plays in case (11.32) with finite horizon is thus played in the infinite horizon case by the limit of the average utilities (11.33). This important economic application of Hardy-Littlewood's Theorem allows us to elegantly conclude this section.
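A numerical sketch of this limit behavior, with an illustrative periodic utility stream (not from the text):

```python
# Sketch: as beta -> 1, the normalized discounted utility (1 - beta) * U(x)
# approaches the long-run average utility (11.33).
u = [t % 2 for t in range(1, 100000)]  # utilities 1, 0, 1, 0, ...; Cesaro limit 1/2

def V(beta):
    return (1 - beta) * sum(beta ** (t - 1) * ut for t, ut in enumerate(u, start=1))

for beta in (0.9, 0.99, 0.999):
    print(beta, V(beta))  # tends to the average utility 0.5
```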

11.4 Recursive patience

Let us continue to study the discounted intertemporal utility function $U : \mathbb{R}^\infty \to \mathbb{R}$ over consumption streams (11.31), that is,

$$U(x) = \sum_{t=0}^{\infty} \beta^t u_t(x_t) \qquad (11.34)$$

where $\beta \in (0, 1)$ and all instantaneous utility functions $u_t : \mathbb{R} \to \mathbb{R}$ are uniformly bounded.¹⁰ This is a criterion that a consumer may use at time 0 (say, today) to rank consumption

⁹For this "if and only if" it is actually enough that the sequence $\{u_t(x_t)\}$ be positive (bounded or not).
¹⁰Unlike the previous section, to ease notation here we start from $t = 0$ rather than $t = 1$ (this choice is just a matter of convenience), and we omit the subscript $\beta$ and just write $U(x)$.

streams to select the optimal ones. The consumer, however, may face a similar decision
problem at each point of time, thus at t = 1 (tomorrow), at t = 2 , and so on. At each point
of time t he then needs to rank consumption streams to select the ones that are optimal at
t, possibly revising his earlier choices { indeed, only current consumption xt is here assumed
to be irreversibly chosen.
Be that as it may, at each point of time t the consumer features an intertemporal utility
function Ut : R1 ! R over consumption streams. So, a collection fUt gt 0 of intertemporal
utility functions now governs his decisions at di erent points of time. In view of (11.34), a
natural form for these functions is the discounted one
1
X
t
Ut (x) = u (x ) (11.35)
=t

Note that this form presupposes the irrelevance of past consumption, only the current
and future consumption levels (xt ; xt+1 ; :::) matter. It is a non-trivial economic assumption:
habit formation, for instance, may be a channel through which past consumption may matter
(e.g., earlier high consumption levels may \spoil" the consumer, who then adjust less easily
to lower future levels).
The next important result, essentially due to Koopmans (1960, 1972), characterizes the
collections fUt gt 0 that have such a discounted representation. In the statement we consider
sets Zt , one per t 0, and their Cartesian product
1
Y
Z= Zt
t=0

An element z 2 Z is a sequence of the form

z = (z0 ; z1 ; :::; zt ; :::)

with z0 2 Z0 , z1 2 Z1 and so on. For instance, when each Zt is a subset of Rn , elements z


of Z are sequences of vectors. Note that the result abstracts from any application; indeed,
it holds for any 2 ( 1; 1) though only positive values have economic meaning.

Theorem 501 (Koopmans) Let $\beta \in (-1, 1)$ and let $\{\varphi_t\}_{t \ge 0}$ be a collection of uniformly bounded real-valued functions $\varphi_t : Z_t \to \mathbb{R}$. For a collection $\{f_t\}_{t \ge 0}$ of real-valued functions $f_t : Z \to \mathbb{R}$, the following conditions are equivalent:

(i) for each $z = \{z_t\} \in Z$,

$$f_t(z) = \sum_{\tau = t}^{\infty} \beta^{\tau - t} \varphi_\tau(z_\tau) \qquad \forall t \ge 0$$

(ii) for each $z = \{z_t\} \in Z$,

$$\sup_{t \ge 0} |f_t(z)| < +\infty$$

and

$$f_t(z) = \varphi_t(z_t) + \beta f_{t+1}(z) \qquad \forall t \ge 0$$
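Condition (ii) is directly computable. The following sketch iterates the forward recursion backward from a finite horizon $T$ (an approximation justified by $|\beta| < 1$, since the tail vanishes geometrically) and compares it with the discounted sum of (i); the per-period values are illustrative:

```python
# Sketch: Koopmans' forward recursion f_t = phi_t + beta * f_{t+1} recovers
# the discounted sums.
beta = 0.9
T = 200
phi = [1.0 / (1 + t % 3) for t in range(T)]  # illustrative bounded per-period values

f = [0.0] * (T + 1)          # f_T ~ 0: the tail is negligible since |beta| < 1
for t in range(T - 1, -1, -1):
    f[t] = phi[t] + beta * f[t + 1]

direct = sum(beta ** tau * phi[tau] for tau in range(T))
print(f[0], direct)  # the recursion reproduces the discounted sum at t = 0
```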
Proof (i) implies (ii). Fix $z \in Z$. By hypothesis, for each $t \ge 0$ we have $f_t(z) = \sum_{\tau = t}^{\infty} \beta^{\tau - t} \varphi_\tau(z_\tau)$. By the uniform boundedness hypothesis, there is a constant $M > 0$ such that $\sup_{z_t \in Z_t} |\varphi_t(z_t)| \le M$ for all $t \ge 0$. So, for each $t \ge 0$ we have

$$|f_t(z)| = \left| \lim_{k \to +\infty} \sum_{\tau = t}^{t+k} \beta^{\tau - t} \varphi_\tau(z_\tau) \right| \le \lim_{k \to +\infty} \sum_{\tau = t}^{t+k} |\beta|^{\tau - t} |\varphi_\tau(z_\tau)| \le M \sum_{\tau = t}^{\infty} |\beta|^{\tau - t} = \frac{M}{1 - |\beta|}$$

In turn, this implies that $\sup_{t \ge 0} |f_t(z)| < +\infty$. At each $t \ge 0$, we have

$$f_t(z) = \sum_{\tau = t}^{\infty} \beta^{\tau - t} \varphi_\tau(z_\tau) = \varphi_t(z_t) + \sum_{\tau = t+1}^{\infty} \beta^{\tau - t} \varphi_\tau(z_\tau) = \varphi_t(z_t) + \beta \sum_{\tau = t+1}^{\infty} \beta^{\tau - (t+1)} \varphi_\tau(z_\tau) = \varphi_t(z_t) + \beta f_{t+1}(z)$$

as desired.

(ii) implies (i). Fix $z \in Z$. By hypothesis, we have $f_t(z) = \varphi_t(z_t) + \beta f_{t+1}(z)$. At each $t \ge 0$, by iterating the recursion we have

$$f_t(z) = \varphi_t(z_t) + \beta f_{t+1}(z) = \varphi_t(z_t) + \beta \varphi_{t+1}(z_{t+1}) + \beta^2 f_{t+2}(z) = \cdots = \sum_{\tau = t}^{t+k} \beta^{\tau - t} \varphi_\tau(z_\tau) + \beta^{k+1} f_{t+k+1}(z)$$

that is,

$$f_t(z) = \sum_{\tau = t}^{t+k} \beta^{\tau - t} \varphi_\tau(z_\tau) + \beta^{k+1} f_{t+k+1}(z) \qquad (11.36)$$

Set $M = \sup_{t \ge 0} |f_t(z)| < +\infty$, so that $|f_t(z)| \le M$ for all $t \ge 0$. Since $\beta \in (-1, 1)$, we then have

$$\left| \beta^{k+1} f_{t+k+1}(z) \right| \le |\beta|^{k+1} M \to 0$$

We conclude that $\lim_{k \to +\infty} \beta^{k+1} f_{t+k+1}(z) = 0$. By (11.36), we then have

$$f_t(z) = \lim_{k \to +\infty} \left( \sum_{\tau = t}^{t+k} \beta^{\tau - t} \varphi_\tau(z_\tau) + \beta^{k+1} f_{t+k+1}(z) \right) = \lim_{k \to +\infty} \sum_{\tau = t}^{t+k} \beta^{\tau - t} \varphi_\tau(z_\tau) = \sum_{\tau = t}^{\infty} \beta^{\tau - t} \varphi_\tau(z_\tau)$$

because, thanks to the uniform boundedness hypothesis, the series $\sum_{\tau = t}^{\infty} \beta^{\tau - t} \varphi_\tau(z_\tau)$ converges absolutely (cf. the end of Example 475). This completes the proof of the theorem.

Let $x = \{x_t\} \in \mathbb{R}^\infty$ be a consumption stream, evaluated in utility terms as $\{u_t(x_t)\}$ via uniformly bounded utility functions $u_t$. In this case, by Koopmans' Theorem (setting $Z = \mathbb{R}^\infty$, $\varphi_t = u_t$ and $f_t = U_t$), a collection of intertemporal utility functions $\{U_t\}_{t \ge 0}$ over consumption streams, with $U_t : \mathbb{R}^\infty \to \mathbb{R}$ at the point of time $t$, admits a discounted representation (11.35) if and only if they are uniformly bounded in time at each stream, i.e., $\sup_{t \ge 0} |U_t(x)| < +\infty$, and satisfy the forward recursion

$$U_t(x) = u_t(x_t) + \beta U_{t+1}(x)$$

A forward recursion thus underlies the classic discounted form of intertemporal utility functions. Note that for its normalized version

$$V_t(x) = (1-\beta) \sum_{\tau = t}^{\infty} \beta^{\tau - t} u_\tau(x_\tau)$$

the forward recursion becomes

$$V_t(x) = (1-\beta) u_t(x_t) + \beta V_{t+1}(x) \qquad (11.37)$$

Indeed, by setting $f_t = V_t / (1-\beta)$ and $z_t = u_t(x_t)$, we have

$$\frac{V_t(x)}{1-\beta} = f_t(z) = z_t + \beta f_{t+1}(z) = u_t(x_t) + \beta \frac{V_{t+1}(x)}{1-\beta}$$

which implies (11.37).

From a mathematical standpoint, Koopmans' Theorem further illustrates the deep connections between series and recursions, as the backward recursion (9.3) first showed. In this regard the following corollary can be useful. Its final claim follows from Abel's Theorem. Here $\ell^\infty$ denotes the subset of the space of scalar sequences $\mathbb{R}^\infty$ consisting of all bounded sequences.

Corollary 502 Let $\beta \in (-1, 1)$ and $x = \{x_t\} \in \ell^\infty$. For a sequence $y = \{y_t\} \in \mathbb{R}^\infty$, the following conditions are equivalent:

(i) $y_t = x_t + \beta y_{t+1}$ for all $t \ge 0$;

(ii) $y \in \ell^\infty$ and $y_t = \sum_{\tau = t}^{\infty} \beta^{\tau - t} x_\tau$ for all $t \ge 0$.

Moreover, if the series $\sum_{\tau = t}^{\infty} x_\tau$ exists at $t \ge 0$, then $\lim_{\beta \uparrow 1} \sum_{\tau = t}^{\infty} \beta^{\tau - t} x_\tau = \sum_{\tau = t}^{\infty} x_\tau$.

For $t = 0$, in point (ii) we have a power series $y_0 = \sum_{t=0}^{\infty} \beta^t x_t$, with coefficients $\{x_t\}$, that converges at all $\beta \in (-1, 1)$. The forward recursion in (i) thus characterizes sums of power series and, at the limit, general series. Instead, the backward recursion (9.3) characterizes partial sums of a general series.
Part III

Continuity

Chapter 12

Limits of functions

12.1 Introductory examples

The concept of limit has been introduced to formalize the idea of "how a function behaves when the independent variable approaches (tends to) a point $x_0$". To fix ideas, we start with some introductory examples in which we consider scalar functions, and then move to a rigorous formalization in the scalar case as well as in the general multivariable case.

Consider the function $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ defined by

$$f(x) = \frac{\sin x}{x}$$

and analyze its behavior for points closer and closer to $x_0 = 0$, i.e., to the origin. In the next table we find the values that the function assumes at several such points:

x      -0.1    -0.01    -0.001      0.001      0.01     0.1
f(x)   0.998   0.99998  0.9999999   0.9999999  0.99998  0.998

By inserting other points, closer and closer to the origin, we can verify that the corresponding values of $f(x)$ get closer and closer to $L = 1$. In this case we say that "the limit of $f(x)$, as $x$ tends to $x_0 = 0$, is $L = 1$". In symbols,

$$\lim_{x \to 0} f(x) = 1$$

Note that in this example the point $x_0 = 0$ where we take the limit does not belong to the domain of the function $f$.
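Such tables are easy to reproduce; a minimal sketch:

```python
# Sketch: reproduce the table for f(x) = sin(x)/x near the origin.
import math

for x in (-0.1, -0.01, -0.001, 0.001, 0.01, 0.1):
    print(x, math.sin(x) / x)  # the values approach 1 as x approaches 0
```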

Let $f : \mathbb{R} \to \mathbb{R}$ be the function defined by

$$f(x) = \begin{cases} x & \text{for } x \le 1 \\ 1 & \text{for } x > 1 \end{cases}$$

Its graph is: [figure omitted]

How does $f$ behave when $x$ approaches the point $x_0 = 1$? By taking points closer and closer to $x_0 = 1$ we have:

x      0.98   0.99   0.999   0.9999   1.0001   1.001   1.01   1.02
f(x)   0.98   0.99   0.999   0.9999   1        1       1      1

Adding other points, closer and closer to $x_0 = 1$, we can verify that, as $x$ gets closer and closer to $x_0 = 1$, $f(x)$ gets closer and closer to $L = 1$. In this case we say that "the limit of $f(x)$ as $x$ tends to $x_0 = 1$ is $L = 1$", and write

$$\lim_{x \to 1} f(x) = 1$$

Observe that the value that the function assumes at the point $x_0 = 1$ is $f(1) = 1$, so the limit $L = 1$ is equal to the value $f(1)$ of the function at $x_0 = 1$.

Let $f : \mathbb{R} \to \mathbb{R}$ be the function defined by

$$f(x) = \begin{cases} x & \text{if } x < 1 \\ 2 & \text{if } x = 1 \\ 1 & \text{if } x > 1 \end{cases}$$

Compared to the previous example we have introduced a "jump" at the point $x = 1$, so that the function jumps to the value 2; we have indeed $f(1) = 2$. The graph now is: [figure omitted]

If we study the behavior of $f$ for values of $x$ closer and closer to $x_0 = 1$, we build the same table as before (because the function, except at the point 1, is identical to the one in the previous example). Therefore, also in this case we have

$$\lim_{x \to 1} f(x) = 1$$

This time the value that the function assumes at the point 1 is $f(1) = 2$, different from the value $L = 1$ of the limit.

Until now we have approached the point $x_0$ from both the right and the left, that is, bilaterally (in a two-sided manner). Sometimes this is not possible; rather, one can approach $x_0$ from either the right or the left, that is, unilaterally (in a one-sided manner). Consider, for example, the function $f : \mathbb{R} \setminus \{2\} \to \mathbb{R}$ given by $f(x) = 1/(x-2)$ and let $x_0 = 2$. Its graph is: [figure omitted]

"To approach the point $x_0 = 2$ from the right" means to approach it by considering only values $x > 2$:

x      2.0001   2.001   2.01   2.05   2.1   2.2   2.5
f(x)   10,000   1,000   100    20     10    5     2

For values closer and closer to 2 from the right, the function assumes values that are larger and larger as well as unbounded above. In this case we say that "the function $f$ tends to $+\infty$ as $x$ tends to 2 from the right" and write

$$\lim_{x \to 2^+} f(x) = +\infty$$

Let us see now what happens by approaching $x_0 = 2$ from the left, that is, by considering values $x < 2$:

x      1.5   1.8   1.9   1.95   1.99   1.999    1.9999
f(x)   -2    -5    -10   -20    -100   -1,000   -10,000

For values closer and closer to 2 from the left, the function assumes larger and larger (in absolute value) negative values. In this case we say that "the function $f$ tends to $-\infty$ as $x$ tends to 2 from the left" and write

$$\lim_{x \to 2^-} f(x) = -\infty$$

Summing up, as also the graph suggests, we have

$$+\infty = \lim_{x \to 2^+} f(x) \neq \lim_{x \to 2^-} f(x) = -\infty \qquad (12.1)$$

The "right-hand" and the "left-hand" limits both exist but are (dramatically) different. As we will see in Proposition 521, the fact that the one-sided limits are distinct reflects the fact that the two-sided limit of $f(x)$, as $x$ tends to 2, does not exist. Indeed, the equality of the one-sided limits is equivalent to the existence of the two-sided limit. For example, if we modify the previous function by considering $f(x) = 1/|x-2|$, we have

$$\lim_{x \to 2^-} f(x) = \lim_{x \to 2^+} f(x) = \lim_{x \to 2} f(x) = +\infty \qquad (12.2)$$

Now the two one-sided limits are equal and coincide with the two-sided one, which in this case exists (even if infinite).

Considering again the function $f(x) = 1/(x-2)$, what happens if, as $x_0$, we take $+\infty$? In other terms, what happens if we consider increasingly larger values of $x$? Look at the following table:

x      100      1,000     10,000   100,000   1,000,000
f(x)   0.0102   0.001002  0.0001   0.00001   0.000001

For increasingly larger values of $x$, the function assumes values closer and closer to 0. In this case we say that "the function tends to 0 as $x$ tends to $+\infty$" and write

$$\lim_{x \to +\infty} f(x) = 0$$

Observe that the function assumes values close to 0, but always strictly positive: $f$ approaches 0 "from above". If we want to emphasize this aspect we write

$$\lim_{x \to +\infty} f(x) = 0^+$$

where $0^+$ suggests that, while converging to 0, the values of $f(x)$ remain positive.

What happens if, instead, as $x_0$ we take $-\infty$? We have the following table of values:

x      -100     -1,000     -10,000   -100,000   -1,000,000
f(x)   -0.0098  -0.000998  -0.0001   -0.00001   -0.000001

For negative and increasingly larger (in absolute value) values of $x$, the function assumes values closer and closer to 0. We say that "the function tends to 0 as $x$ tends to $-\infty$" and write

$$\lim_{x \to -\infty} f(x) = 0$$

If we want to emphasize that the function, in approaching 0, remains negative, we write

$$\lim_{x \to -\infty} f(x) = 0^-$$

Finally, after having seen various types of limits, let us consider a function that has no limit, i.e., that does not exhibit any "definite trend". Let $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ be given by

$$f(x) = \sin \frac{1}{x}$$

At the origin, i.e., at $x_0 = 0$, the function does not have a limit: for $x$ closer and closer to the origin, the function continues to oscillate with a tighter and tighter sinusoidal trend: [figure omitted: graph of $\sin(1/x)$ oscillating near the origin]

The origin is, however, the only point where this function does not have a limit: at all other points of the domain the limit exists. A much more dramatic behavior is displayed by the Dirichlet function $f : \mathbb{R} \to \mathbb{R}$ defined by

$$f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \\ 0 & \text{else} \end{cases} \qquad (12.3)$$

This remarkable function, proposed by Dirichlet in 1829, oscillates "obsessively" between the values 0 and 1 because, by the density of the rational numbers in the real numbers, for any pair $x < y$ of real numbers there exists a rational number $q$ such that $x < q < y$. As we will see, the Dirichlet function does not have a limit at any point $x_0 \in \mathbb{R}$.

12.2 Functions of a single variable

12.2.1 Two-sided limits

The introductory examples exhibit four possible cases in which the limit exists, depending on the finiteness or not of the point $x_0$ and of the value $L$ of the limit. Specifically:

(i) $\lim_{x \to x_0} f(x) = L \in \mathbb{R}$, i.e., both the point $x_0$ and the limit $L$ are finite (scalars);

(ii) $\lim_{x \to x_0} f(x) = \pm\infty$, i.e., the point $x_0$ is finite but the limit $L$ is infinite;

(iii) $\lim_{x \to +\infty} f(x) = L \in \mathbb{R}$ or $\lim_{x \to -\infty} f(x) = L \in \mathbb{R}$, i.e., the point $x_0$ is infinite but the limit $L$ is finite;

(iv) $\lim_{x \to +\infty} f(x) = \pm\infty$ or $\lim_{x \to -\infty} f(x) = \pm\infty$, i.e., both the point $x_0$ and the limit $L$ are infinite.

We formalize the notion of limit in these cases. We begin with case (i). First of all, let us observe that we can meaningfully talk of the limit at $x_0 \in \mathbb{R}$ of a function with domain $A$ only when $x_0$ is a limit point of $A$. Indeed, in this case the sentence "as $x \in A$ tends to $x_0$" is meaningful.

Definition 503 Given a function $f : A \subseteq \mathbb{R} \to \mathbb{R}$ and a limit point $x_0$ of $A$, we write

$$\lim_{x \to x_0} f(x) = L \in \mathbb{R}$$

if, for every $\varepsilon > 0$, there exists a $\delta_\varepsilon > 0$ such that, for every $x \in A$,

$$0 < |x - x_0| < \delta_\varepsilon \implies |f(x) - L| < \varepsilon \qquad (12.4)$$

The value $L$ is called the limit of the function at $x_0$.

Note that (12.4) can be written as

$$0 < d(x, x_0) < \delta_\varepsilon \implies d(f(x), L) < \varepsilon \qquad (12.5)$$

The definition requires that, for any fixed quantity $\varepsilon > 0$, arbitrarily small, there exists a value $\delta_\varepsilon$ such that all the points $x \in A$ that are $\delta_\varepsilon$-close to the point $x_0$ have images $f(x)$ that are $\varepsilon$-close to the value $L$ of the limit. Note that the condition $d(x, x_0) > 0$ amounts to requiring $x \neq x_0$.

Example 504 Let us show that $\lim_{x \to 2} (3x - 5) = 1$. We have to verify that, for every $\varepsilon > 0$, there exists $\delta_\varepsilon > 0$ such that

$$|x - 2| < \delta_\varepsilon \implies |(3x - 5) - 1| < \varepsilon \qquad (12.6)$$

We have $|(3x - 5) - 1| < \varepsilon$ if and only if $|x - 2| < \varepsilon/3$. Therefore, setting $\delta_\varepsilon = \varepsilon/3$ yields (12.6). N
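The verification can also be mimicked numerically; a sketch that samples points within the threshold $\delta_\varepsilon = \varepsilon/3$ found in the example, on both sides of $x_0 = 2$:

```python
# Sketch: the epsilon-delta test of Example 504 for lim_{x->2} (3x - 5) = 1.
def passes(eps, trials=1000):
    delta = eps / 3
    for i in range(1, trials + 1):
        for x in (2 - delta * i / (trials + 1), 2 + delta * i / (trials + 1)):
            if abs((3 * x - 5) - 1) >= eps:
                return False
    return True

print(all(passes(eps) for eps in (1.0, 0.1, 0.001)))  # True
```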

Intuitively, the smaller (so the more demanding) the value of $\varepsilon$ is, the smaller $\delta_\varepsilon$ is. To make this intuition more precise, note that the relationship between $\varepsilon$ and $\delta_\varepsilon$ is similar, mutatis mutandis, to that between $\varepsilon$ and $n_\varepsilon$ in the definition of convergence of sequences discussed at length after Definition 302. Now a function $f$ has limit $L$ at $x_0$ when it passes the following, still highly demanding, test: given any threshold $\varepsilon > 0$ selected by a relentless examiner, there has to be a small enough $\delta_\varepsilon$ so that all points that are $\delta_\varepsilon$-close to $x_0$ have images that are $\varepsilon$-close to $L$.

Note that $\delta_\varepsilon$ depends on $\varepsilon$ and is not unique: when we find a value of $\delta_\varepsilon$, all smaller values also work fine. For instance, in the last example we can choose as $\delta_\varepsilon$ any (positive) value lower than $\varepsilon/3$. But one typically focuses on the largest such $\delta_\varepsilon$ (if it exists), which is a genuine threshold value. It is in terms of such a "threshold" $\delta_\varepsilon$ that we can, indeed, say: the smaller (so the more demanding) the value of $\varepsilon$ is, the smaller $\delta_\varepsilon$ is.

N.B. The value of $\delta_\varepsilon$, besides depending on $\varepsilon$, clearly depends also on $x_0$. This dependence is, however, so obvious that it can be safely omitted in the notation. O

O.R. It is hard to overestimate the importance of the previous "test" in making limit notions in mathematics rigorous. Its origin traces back to Eudoxus' method of exhaustion that underlies integration theory (Chapter 44). Perhaps the best classic description of a form of such a test is Proposition 1 in Euclid's Book X: "Two unequal magnitudes being set out, if from the greater there is subtracted a magnitude greater than its half, and from that which is left a magnitude greater than its half, and if this process is repeated continually, then there will be left some magnitude less than the lesser magnitude set out" (trans. Heath; we put in italics the words where the test emerges). Yet, it was only in the XIX century that, through the works of Cauchy and Weierstrass, the test took the form that we presented in Definitions 302 and 503. H

We now provide an example in which the limit does not exist.

Example 505 For the Dirichlet function (12.3), $\lim_{x \to x_0} f(x)$ does not exist for any $x_0 \in \mathbb{R}$. Indeed, given $x_0 \in \mathbb{R}$, let us suppose, by contradiction, that $\lim_{x \to x_0} f(x)$ exists and is equal to $L \in \mathbb{R}$. Let $0 < \varepsilon < 1/2$. By definition, there exists $\delta = \delta_\varepsilon$ such that¹

$$x_0 \neq x \in (x_0 - \delta, x_0 + \delta) \implies |f(x) - L| < \varepsilon < \frac{1}{2}$$

In each neighborhood $(x_0 - \delta, x_0 + \delta)$ there exist both rational points and irrational points distinct from $x_0$ (see Proposition 42), so points $x', x'' \in (x_0 - \delta, x_0 + \delta)$ for which $f(x') = 1$ and $f(x'') = 0$. We thus reach the contradiction

$$1 = |1 - 0| = \left| f(x') - f(x'') \right| \le \left| f(x') - L \right| + \left| L - f(x'') \right| < \frac{1}{2} + \frac{1}{2} = 1$$

Therefore, $\lim_{x \to x_0} f(x)$ does not exist for any point $x_0 \in \mathbb{R}$. N

¹The expression "$x_0 \neq x \in (x_0 - \delta, x_0 + \delta)$" means "$x \in (x_0 - \delta, x_0 + \delta)$ and $x \neq x_0$". In words, $x$ belongs to the interval $(x_0 - \delta, x_0 + \delta)$ but is distinct from $x_0$. To ease notation, similar expressions are used throughout the chapter.

Though we often consider functions with interval domains, the next example shows that the scope of the notion of limit goes well beyond this class, however important it might be.

Example 506 Consider the identity function $f : \mathbb{Q} \to \mathbb{R}$ on the rationals given by $f(x) = x$ for all $x \in \mathbb{Q}$. The set of limit points of $\mathbb{Q}$ is $\mathbb{R}$ (Example 141). For each $x_0 \in \mathbb{R}$ it holds that

$$\lim_{x \to x_0} f(x) = x_0$$

Indeed, set $\delta_\varepsilon = \varepsilon$ for each $\varepsilon > 0$. We have, for each $x \in \mathbb{Q}$,

$$0 < |x - x_0| < \delta_\varepsilon \implies |f(x) - x_0| < \varepsilon$$

as (12.4) requires. N

Definition 503, in which the distances are made explicit, is of the "$\varepsilon$-$\delta$" type. In view of (12.5), it is immediate to rewrite it in the language of neighborhoods. To make notation more expressive, we denote by $U_\delta(x_0)$ a neighborhood of $x_0$ of radius $\delta$ and by $V_\varepsilon(L)$ a neighborhood of $L$ of radius $\varepsilon$. Graphically, the former is a neighborhood on the horizontal axis, while the latter is a neighborhood on the vertical axis.

Definition 507 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}$ a limit point of $A$. We write

$$\lim_{x \to x_0} f(x) = L \in \mathbb{R}$$

if, for every neighborhood $V_\varepsilon(L)$ of $L$, there exists a neighborhood $U_{\delta_\varepsilon}(x_0)$ of $x_0$ such that

$$x_0 \neq x \in U_{\delta_\varepsilon}(x_0) \cap A \implies f(x) \in V_\varepsilon(L) \qquad (12.7)$$

As for convergence of sequences, the rewriting in the language of neighborhoods is very evocative.² In particular, via the topology of the extended real line (Section 8.8.4), we can immediately generalize the definition so as to include also the cases (ii), (iii) and (iv), in analogy with what we did with Definition 317 for limits of sequences.

Definition 508 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \overline{\mathbb{R}}$ a limit point of $A$. We write

$$\lim_{x \to x_0} f(x) = L \in \overline{\mathbb{R}}$$

if, for every neighborhood $V_\varepsilon(L)$ of $L$, there exists a neighborhood $U_{\delta_\varepsilon}(x_0)$ of $x_0$ such that

$$x_0 \neq x \in U_{\delta_\varepsilon}(x_0) \cap A \implies f(x) \in V_\varepsilon(L) \qquad (12.8)$$

²In a nutshell, we can say that "there exists a neighborhood" takes the place of the adverb "eventually" used for sequences.

The difference between Definitions 507 and 508 is apparently minor: in the former definition we have $\mathbb{R}$, in the latter $\overline{\mathbb{R}}$. This simple modification allows us, however, to consider also the cases (ii), (iii) and (iv). In particular:

case (ii) is obtained by setting $x_0 \in \mathbb{R}$ and $L = \pm\infty$;

case (iii) is obtained by setting $x_0 = \pm\infty$ and $L \in \mathbb{R}$;

case (iv) is obtained by setting $x_0 = \pm\infty$ and $L = \pm\infty$.

To exemplify, we consider explicitly a few subcases, leaving the others to the reader. We start with the subcase $x_0 \in \mathbb{R}$ and $L = +\infty$ of (ii). In this case Definition 508 reduces to the following form with distances made explicit.

Definition 509 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}$ a limit point of $A$. We write

$$\lim_{x \to x_0} f(x) = +\infty$$

if, for every $M > 0$, there exists $\delta_M > 0$ such that, for every $x \in A$, we have

$$0 < |x - x_0| < \delta_M \implies f(x) > M \qquad (12.9)$$

In other words, for each constant $M$, no matter how large, there exists $\delta_M > 0$ such that all the points $x_0 \neq x \in A$ that are $\delta_M$-close to $x_0$ have images $f(x)$ larger than $M$.

Example 510 Let $f : \mathbb{R} \setminus \{2\} \to \mathbb{R}$ be given by $f(x) = 1/|x - 2|$. Graphically: [figure omitted]

The point $x_0 = 2$ is a limit point of $\mathbb{R} \setminus \{2\}$, so we can consider $\lim_{x \to 2} f(x)$. Let $M > 0$. Setting $\delta_M = 1/M$, we have

$$0 < |x - x_0| < \delta_M \iff 0 < |x - 2| < \frac{1}{M} \implies \frac{1}{|x - 2|} > M$$

and therefore

$$0 < |x - 2| < \delta_M \implies f(x) > M$$

That is, $\lim_{x \to 2} f(x) = +\infty$. N

Let us now consider case (iii) with $x_0 = +\infty$ and $L \in \mathbb{R}$. Here Definition 508 reduces to the following "$\varepsilon$-$M$" one.

Definition 511 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$, with $A$ unbounded above.³ We write

$$\lim_{x \to +\infty} f(x) = L \in \mathbb{R}$$

if, for every $\varepsilon > 0$, there exists $M_\varepsilon > 0$ such that, for every $x \in A$, we have

$$x > M_\varepsilon \implies |f(x) - L| < \varepsilon \qquad (12.10)$$

In this case, for each choice of $\varepsilon > 0$ arbitrarily small, there exists a value $M_\varepsilon$ such that the images of points $x$ greater than $M_\varepsilon$ are $\varepsilon$-close to $L$.

Example 512 Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = 1 + e^{-x}$. By Lemma 315, $+\infty$ is a limit point of $\mathbb{R}$. We can, therefore, consider the limit $\lim_{x \to +\infty} f(x)$. Let us verify that $\lim_{x \to +\infty} f(x) = 1$. Let $\varepsilon > 0$. We have

$$|f(x) - L| = \left| 1 + e^{-x} - 1 \right| = e^{-x} < \varepsilon \iff -x < \log \varepsilon \iff x > -\log \varepsilon$$

Therefore, setting $M_\varepsilon = -\log \varepsilon$, we have

$$x > M_\varepsilon \implies |f(x) - L| < \varepsilon$$

That is, $\lim_{x \to +\infty} f(x) = 1$. N

Finally, we consider case (iv) with $x_0 = L = +\infty$. Here Definition 508 reduces to the following one:

Definition 513 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$, with $A$ unbounded above. We write

$$\lim_{x \to +\infty} f(x) = +\infty$$

if, for every $M > 0$, there exists $N$ such that, for every $x \in A$, we have

$$x > N \implies f(x) > M \qquad (12.11)$$

Example 514 Let $f : \mathbb{R}_+ \to \mathbb{R}$ be given by $f(x) = \sqrt{x}$. By Lemma 315, $+\infty$ is a limit point of $\mathbb{R}_+$, so we can consider $\lim_{x \to +\infty} f(x)$. Let us verify that $\lim_{x \to +\infty} f(x) = +\infty$. For every $M > 0$ we have

$$f(x) > M \iff \sqrt{x} > M \iff x > M^2$$

Setting $N = M^2$ yields

$$x > N \implies f(x) > M$$

That is, $\lim_{x \to +\infty} f(x) = +\infty$. N

³By Lemma 315, the fact that $A$ is unbounded above guarantees that $+\infty$ is a limit point of $A$. For example, this is the case when $(a, +\infty) \subseteq A$.

N.B. If $A = \mathbb{N}_+$, that is, $f : \mathbb{N}_+ \to \mathbb{R}$ is a sequence, with the last two definitions we recover the notions of convergence and of (positive) divergence for sequences. The theory of limits of functions extends, therefore, the theory of limits of sequences of Chapter 8. In this respect, note that the set $\mathbb{N}_+$ has only one limit point: $+\infty$. This is why the only meaningful limit for sequences is $\lim_{n \to \infty}$. O

O.R. It may be useful to see the concept of limit "in three stages" (as a rocket):

(i) for every neighborhood $V$ of $L$ (on the ordinate axis)

(ii) there exists a neighborhood $U$ of $x_0$ (on the abscissa axis) such that

(iii) all the values of $f$ at $x \in U$, $x \neq x_0$, belong to $V$, i.e., all the images (excluding at most $f(x_0)$) of $f$ in $U \cap A$ belong to $V$: $f(U \cap A \setminus \{x_0\}) \subseteq V$.

[figure omitted: neighborhoods $U(x_0)$ and $V(L)$ on the two axes]

We are often tempted to simplify to two stages: "the values of $x$ close to $x_0$ have images $f(x)$ close to $L$", that is,

for every $U$ there exists $V$ such that $f(U \cap A \setminus \{x_0\}) \subseteq V$

Unfortunately, this is an empty statement that is always (vacuously) true, as the figure shows:

[Figure: for any neighborhood $U(x_0)$, a large enough neighborhood $V(L)$ always contains $f(U \setminus \{x_0\})$]

In the figure, for every neighborhood $U(x_0)$ of $x_0$, however small, there always exists a neighborhood $V(L)$ of $L$ (possibly quite big) inside which all the values of $f(x)$ with $x \in U \setminus \{x_0\}$ fall. Such a $V$ can always be taken to be an open interval that contains $f(U \setminus \{x_0\})$. H

12.2.2 One-sided limits

We cannot always talk of two-sided (or bilateral) limits. For example, consider the simple function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} 2 & \text{if } x \geq 1 \\ x & \text{if } x < 1 \end{cases}$$
It is easy to see that $\lim_{x \to 1} f(x)$ does not exist. In these cases one can resort to the weaker notion of one-sided (or unilateral) limit, which we already met in an intuitive way in the introductory examples of this chapter. These examples, indeed, suggest two possible cases when the right limit exists:

(i) $\lim_{x \to x_0^+} f(x) \in \mathbb{R}$;

(ii) $\lim_{x \to x_0^+} f(x) = \pm\infty$.

Similarly, we also have two "left" cases. Note that in both (i) and (ii) the point $x_0$ is in $\mathbb{R}$, while the value of the limit is in $\overline{\mathbb{R}}$.
The next "right" definition includes both cases.

Definition 515 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}$ a limit point of $A$. We write
$$\lim_{x \to x_0^+} f(x) = L \in \overline{\mathbb{R}}$$
if, for every neighborhood $V_\varepsilon(L)$ of $L$, there exists a right neighborhood $U^+_{\delta_\varepsilon}(x_0) = [x_0, x_0 + \delta_\varepsilon)$ of $x_0$ such that
$$x_0 \neq x \in U^+_{\delta_\varepsilon}(x_0) \cap A \implies f(x) \in V_\varepsilon(L) \qquad (12.12)$$
The value $L$ is called the right limit of the function at $x_0$.

In a similar way we can define left limits, denoted by $\lim_{x \to x_0^-} f(x)$, as readers can check.

By excluding $x_0$, the neighborhood $U^+_{\delta_\varepsilon}(x_0)$ reduces to $(x_0, x_0 + \delta_\varepsilon)$, so (12.12) can be written more simply as
$$x \in (x_0, x_0 + \delta_\varepsilon) \cap A \implies f(x) \in V_\varepsilon(L)$$
But it is important to keep track of neighborhoods.

This definition includes both cases:

case (i) is obtained by setting $L \in \mathbb{R}$;

case (ii) is obtained by setting $L = \pm\infty$.

In case (i), Definition 515 reduces to the following "ε–δ" one.

Definition 516 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}$ a limit point of $A$. We write
$$\lim_{x \to x_0^+} f(x) = L \in \mathbb{R}$$
if, for every $\varepsilon > 0$, there exists $\delta = \delta_\varepsilon > 0$ such that, for every $x \in A$,
$$x_0 < x < x_0 + \delta \implies |f(x) - L| < \varepsilon \qquad (12.13)$$


Example 517 Consider $f : \mathbb{R}_+ \to \mathbb{R}$ given by $f(x) = \sqrt{x}$. We claim that $\lim_{x \to 0^+} \sqrt{x} = 0$. Let $\varepsilon > 0$. Then,
$$|f(x) - L| = \sqrt{x} < \varepsilon \iff x < \varepsilon^2$$
Setting $\delta_\varepsilon = \varepsilon^2$, we have
$$0 < x < \delta_\varepsilon \implies |f(x) - L| < \varepsilon$$
That is, $\lim_{x \to 0^+} \sqrt{x} = 0$. N

Example 518 Consider the function $f : \mathbb{Q} \to \mathbb{R}$ on rationals given by
$$f(x) = \begin{cases} x + 1 & \text{if } x > \sqrt{2} \\ x & \text{if } x < \sqrt{2} \end{cases}$$
It is easy to see that $\lim_{x \to \sqrt{2}^+} f(x) = \sqrt{2} + 1$ and $\lim_{x \to \sqrt{2}^-} f(x) = \sqrt{2}$, so the one-sided limits exist but are different at $\sqrt{2}$. In contrast, they are equal at all other points of the real line, with
$$\lim_{x \to x_0^+} f(x) = \lim_{x \to x_0^-} f(x) = \begin{cases} x_0 + 1 & \text{if } x_0 > \sqrt{2} \\ x_0 & \text{if } x_0 < \sqrt{2} \end{cases}$$
N

Let us consider the subcase $L = +\infty$ of (ii), leaving to the reader the subcase $L = -\infty$. For this case, Definition 515 reduces to the following "ε–δ" one.

Definition 519 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}$ a limit point of $A$. We write
$$\lim_{x \to x_0^+} f(x) = +\infty$$
if, for every $M > 0$, there exists $\delta_M > 0$ such that, for every $x \in A$,
$$x_0 < x < x_0 + \delta_M \implies f(x) > M \qquad (12.14)$$

We close this section with an example, from the introduction, in which both one-sided limits (right and left) exist, but are different.

Example 520 Let $f : \mathbb{R} \setminus \{2\} \to \mathbb{R}$ be given by $f(x) = 1/(x - 2)$. The point $x_0 = 2$ is a limit point of $\mathbb{R} \setminus \{2\}$, so we can consider the two one-sided limits $\lim_{x \to 2^+} f(x)$ and $\lim_{x \to 2^-} f(x)$. Let $M > 0$. Setting $\delta_M = 1/M$, for every $x > 2$ we have
$$x - x_0 < \delta_M \iff x - 2 < \frac{1}{M} \implies \frac{1}{x - 2} > M$$
Therefore
$$0 < x - 2 < \delta_M \implies f(x) > M$$
that is, $\lim_{x \to 2^+} f(x) = +\infty$. On the other hand, for every $x < 2$ we have
$$x_0 - x < \delta_M \iff 2 - x < \frac{1}{M} \implies \frac{1}{x - 2} < -M$$
Therefore
$$0 < 2 - x < \delta_M \implies f(x) < -M$$
That is, $\lim_{x \to 2^-} f(x) = -\infty$. We conclude that the two one-sided limits exist but are dramatically different. This formally proves (12.1), which was intuitively discussed in the introduction. N
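The opposite behavior of the two one-sided limits can also be observed numerically. A minimal Python sketch, illustration only, with arbitrary sampling steps:

```python
# Illustration only: the two one-sided limits of f(x) = 1/(x - 2) at x0 = 2.
def f(x):
    return 1 / (x - 2)

for n in (1, 2, 4, 8):
    h = 10.0 ** (-n)
    print(f"f(2 + {h:.0e}) = {f(2 + h):>14.1f}   f(2 - {h:.0e}) = {f(2 - h):>14.1f}")
# The right-hand values blow up towards +infinity, the left-hand ones towards -infinity.
```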

12.2.3 Relations between one-sided and two-sided limits

Next we show that two-sided limits (at finite points) exist if and only if the corresponding one-sided limits exist and are equal. In other words, a two-sided limit can be regarded as the case in which the two one-sided limits coincide. When they differ (or at least one of them does not exist), the two-sided limit no longer exists.
Proposition 521 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}$ a point for which there exists a neighborhood $B_\varepsilon(x_0)$ such that $B_\varepsilon(x_0) \setminus \{x_0\} \subseteq A$. Then, $\lim_{x \to x_0} f(x) = L \in \overline{\mathbb{R}}$ if and only if
$$\lim_{x \to x_0^+} f(x) = \lim_{x \to x_0^-} f(x) = L \in \overline{\mathbb{R}}$$

Note that $B_\varepsilon(x_0) \setminus \{x_0\}$ is a neighborhood of $x_0$ deprived of $x_0$ itself, so "with a hole" in the middle. The result requires that there exist at least one such neighborhood in $A$. Clearly, if $x_0 \in A$ this amounts to requiring that $x_0$ be an interior point of $A$. But the hole permits $x_0$ to be outside $A$. For instance, this is the case if we consider (again) the function $f(x) = 1/|x - 2|$ and the point $x_0 = 2$, which is outside the domain of $f$. We have $\lim_{x \to 2} f(x) = +\infty$ and hence, by Proposition 521,
$$\lim_{x \to 2} f(x) = \lim_{x \to 2^+} f(x) = \lim_{x \to 2^-} f(x) = +\infty$$
which confirms (12.2). For $f(x) = 1/(x - 2)$ we have, instead,
$$+\infty = \lim_{x \to 2^+} f(x) \neq \lim_{x \to 2^-} f(x) = -\infty$$
So, by Proposition 521 the two-sided limit $\lim_{x \to 2} f(x)$ does not exist.

Proof We prove the proposition for $L \in \mathbb{R}$, leaving to the reader the case $L = \pm\infty$. Moreover, for simplicity we suppose that $x_0$ is an interior point of $A$.
"If". We show that $\lim_{x \to x_0^-} f(x) = \lim_{x \to x_0^+} f(x) = L$ implies $\lim_{x \to x_0} f(x) = L$. Let $\varepsilon > 0$. Since $\lim_{x \to x_0^+} f(x) = L$, there exists $\delta_\varepsilon' > 0$ such that, for every $x \in (x_0, x_0 + \delta_\varepsilon') \cap A$, we have $|f(x) - L| < \varepsilon$. On the other hand, since $\lim_{x \to x_0^-} f(x) = L$, there exists $\delta_\varepsilon'' > 0$ such that, for every $x \in (x_0 - \delta_\varepsilon'', x_0) \cap A$, we have $|f(x) - L| < \varepsilon$. Let $\delta_\varepsilon = \min\{\delta_\varepsilon', \delta_\varepsilon''\}$. Then
$$x \in (x_0, x_0 + \delta_\varepsilon) \cap A \implies |f(x) - L| < \varepsilon \qquad (12.15)$$
and
$$x \in (x_0 - \delta_\varepsilon, x_0) \cap A \implies |f(x) - L| < \varepsilon \qquad (12.16)$$
that is,
$$x_0 \neq x \in (x_0 - \delta_\varepsilon, x_0 + \delta_\varepsilon) \cap A \implies |f(x) - L| < \varepsilon$$
Therefore, $\lim_{x \to x_0} f(x) = L$.
"Only if". We show that $\lim_{x \to x_0} f(x) = L$ implies $\lim_{x \to x_0^-} f(x) = \lim_{x \to x_0^+} f(x) = L$. Let $\varepsilon > 0$. Since $\lim_{x \to x_0} f(x) = L$, there exists $\delta_\varepsilon > 0$ such that
$$x_0 \neq x \in (x_0 - \delta_\varepsilon, x_0 + \delta_\varepsilon) \cap A \implies |f(x) - L| < \varepsilon \qquad (12.17)$$
Since $x_0$ is not a boundary point, both intersections $(x_0 - \delta_\varepsilon, x_0) \cap A$ and $(x_0, x_0 + \delta_\varepsilon) \cap A$ are non-empty. Therefore, (12.17) implies both (12.15) and (12.16), so $\lim_{x \to x_0^+} f(x) = \lim_{x \to x_0^-} f(x) = L$.

Example 522 For the function on rationals in Example 518, by Proposition 521 we have that
$$\lim_{x \to x_0} f(x) = \begin{cases} x_0 + 1 & \text{if } x_0 > \sqrt{2} \\ x_0 & \text{if } x_0 < \sqrt{2} \end{cases}$$
and that the two-sided limit $\lim_{x \to \sqrt{2}} f(x)$ does not exist. N

As the reader may have noted, when $A$ is an interval the hypothesis $B_\varepsilon(x_0) \setminus \{x_0\} \subseteq A$ of Proposition 521 forbids $x_0$ to be a boundary point. Indeed, to fix ideas, assume that $A$ is an interval of the real line with endpoints $a < b$.⁴ When $x_0 = a = \inf A$, it does not make sense to talk of the one-sided limit $\lim_{x \to a^-} f(x)$, while when $x_0 = b = \sup A$ it does not make sense to talk of the one-sided limit $\lim_{x \to b^+} f(x)$. So, at the endpoints one of the one-sided limits becomes meaningless.
Interestingly, at the endpoints we instead have
$$\lim_{x \to a} f(x) = \lim_{x \to a^+} f(x) \quad \text{and} \quad \lim_{x \to b} f(x) = \lim_{x \to b^-} f(x) \qquad (12.18)$$
Indeed, the definition of two-sided limit is perfectly satisfied: for each neighborhood $V$ of $L$ there exists a neighborhood — necessarily one-sided because $x_0$ is an endpoint — such that the images of $f$, except perhaps $f(x_0)$, fall in $V$.
A similar observation can be made, more generally, at each boundary point $x_0$ of $A$. For instance, if $A$ is a half-line $[x_0, +\infty)$, the left limit at $x_0$ is meaningless: for $f(x) = \sqrt{x}$ and $x_0 = 0$, the left limit $\lim_{x \to 0^-} \sqrt{x}$ is meaningless.

Example 523 Let $f : [0, +\infty) \to \mathbb{R}$ be given by $f(x) = \sqrt{x}$. We just remarked that $\lim_{x \to 0^-} f(x)$ is meaningless, while in Example 517 we saw that $\lim_{x \to 0^+} f(x) = 0$. By what we just noted, we can also write $\lim_{x \to 0} f(x) = 0$. It is instructive to compute this two-sided limit directly, through Definition 503. Let $\varepsilon > 0$. As we saw in Example 517, we have
$$|f(x) - L| = \sqrt{x} < \varepsilon \iff x < \varepsilon^2$$
Setting $\delta_\varepsilon = \varepsilon^2$, for every $x \in A$, that is, for every $x \geq 0$, we have
$$0 < |x - x_0| < \delta_\varepsilon \iff 0 < x < \delta_\varepsilon \implies |f(x) - L| < \varepsilon$$
Therefore, $\lim_{x \to 0} \sqrt{x} = 0$. N

12.2.4 Grand finale

We conclude by observing that in the general Definition 508 of the two-sided limit — which includes all the cases of finite or infinite points and finite or infinite limits — the mention of $\varepsilon$ and $\delta_\varepsilon$ is actually superfluous. We can therefore rewrite that definition in the following neater way.

⁴ In other words, one of the following four cases holds: (i) $A = (a, b)$; (ii) $A = [a, b)$; (iii) $A = (a, b]$; (iv) $A = [a, b]$.

Definition 524 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \overline{\mathbb{R}}$ a limit point of $A$. We write
$$\lim_{x \to x_0} f(x) = L \in \overline{\mathbb{R}}$$
if, for every neighborhood $V$ of $L$, there exists a neighborhood $U$ of $x_0$ such that
$$f((U \cap A) \setminus \{x_0\}) \subseteq V$$

It is this version of the two-sided limit that the reader will find generalized to topological spaces in more advanced courses. A similar general version holds for one-sided limits, as the reader can check.

12.3 Functions of several variables

12.3.1 Definition

The extension to functions of several variables $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ of the definition of limit, $\lim_{x \to x_0} f(x) = L$, is altogether natural, almost effortless. Indeed, once we consider neighborhoods in $\mathbb{R}^n$ defined through the general distance $d(x, x_0) = \|x - x_0\|$, the sentence "to approach $x_0$" continues to mean "the distance between $x$ and $x_0$ becomes smaller and smaller". Formally:⁵

Definition 525 Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}^n$ a limit point of $A$. We write
$$\lim_{x \to x_0} f(x) = L \in \overline{\mathbb{R}} \qquad (12.19)$$
if, for every neighborhood $V_\varepsilon(L)$ of $L$, there exists a neighborhood $U_{\delta_\varepsilon}(x_0)$ of $x_0$ such that
$$x_0 \neq x \in U_{\delta_\varepsilon}(x_0) \cap A \implies f(x) \in V_\varepsilon(L) \qquad (12.20)$$
The value $L$ is called the limit of the function at $x_0$.

Definition 508 is the special case with $n = 1$. In the "ε–δ" version, we have (12.19) if, for every $\varepsilon > 0$, there exists $\delta_\varepsilon > 0$ such that, for every $x \in A$,
$$0 < d(x, x_0) = \|x - x_0\| < \delta_\varepsilon \implies d(f(x), L) = |f(x) - L| < \varepsilon \qquad (12.21)$$
Clearly, (12.21) reduces to (12.5) when $n = 1$, i.e., when $\|x - x_0\|$ reduces to $|x - x_0|$.


Example 526 Let $f : \mathbb{R}^n \to \mathbb{R}$ be given by $f(x) = 1 + \sum_{i=1}^n x_i$. We verify that $\lim_{x \to 0} f(x) = 1$. Let $\varepsilon > 0$. We have
$$d(f(x), 1) = \left|1 + \sum_{i=1}^n x_i - 1\right| < \varepsilon \iff \left|\sum_{i=1}^n x_i\right| < \varepsilon$$
Set $\delta_\varepsilon = \varepsilon / n$. Since $\left|\sum_{i=1}^n x_i\right| \leq \sum_{i=1}^n |x_i|$, we have
$$d(x, x_0) < \delta_\varepsilon \iff \sqrt{\sum_{i=1}^n x_i^2} < \frac{\varepsilon}{n} \implies \sum_{i=1}^n x_i^2 < \frac{\varepsilon^2}{n^2} \implies x_i^2 < \frac{\varepsilon^2}{n^2} \quad \forall i = 1, 2, \ldots, n$$
$$\implies |x_i| = \sqrt{x_i^2} < \frac{\varepsilon}{n} \quad \forall i = 1, 2, \ldots, n \implies \sum_{i=1}^n |x_i| < \varepsilon \implies d(f(x), 1) = \left|\sum_{i=1}^n x_i\right| \leq \sum_{i=1}^n |x_i| < \varepsilon$$
That is, $\lim_{x \to 0} f(x) = 1$. N

⁵ For brevity, we consider only the two-sided case $x_0 \in \mathbb{R}^n$, leaving the other cases to readers.

As the reader can check, we can easily extend to functions of several variables the limits from above and from below (indeed, the limit $L$ remains a scalar, not a vector). Moreover, the notion of limit can easily be extended to operators. But we postpone this to Chapter 13 (Definition 587), when we will study the continuity of operators, a topic that will motivate this further extension.

12.3.2 Directions

So far, so good. Too good, in a sense, because the multivariable extension of the notion of limit seems just a matter of upgrading the distance, from the absolute value $|x - x_0|$ between scalars to the more general norm $\|x - x_0\|$ between vectors. Formally, this is true, but one should not forget that, when $n > 1$, the condition $\|x - x_0\| < \delta_\varepsilon$ controls many more ways to approach a point. Indeed, on the real line there are only two ways to approach a point $x_0$: from the left and from the right, identified with $-$ and $+$ respectively. Instead, in the plane — a fortiori, in a general space $\mathbb{R}^n$ — there are infinitely many directions along which to approach a point $x_0$.

Intuitively, condition (12.21) requires that, as $x$ approaches $x_0$ along all such directions, the function $f$ tends to the same value $L$. In other words, the behavior of $f$ is consistent across all such directions. If, therefore, there are two directions along which $f$ does not tend to the same limit value, the function does not have a limit as $x \to x_0$. The following example should clarify the issue.

Example 527 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be given by
$$f(x_1, x_2) = \frac{\log(1 + x_1 x_2)}{x_1^2}$$
Let us verify that $\lim_{(x_1, x_2) \to (0,0)} f(x)$ does not exist. Consider two possible directions along which we can approach the origin: along the parabola $x_2 = x_1^2$, and along the straight line $x_2 = x_1$.
Along the parabola we have
$$\lim_{(x_1, x_2) \to (0,0)} f(x_1, x_2) = \lim_{x_1 \to 0} f(x_1, x_1^2) = \lim_{x_1 \to 0} \frac{\log(1 + x_1^3)}{x_1^2} = \lim_{x_1 \to 0} x_1 \frac{\log(1 + x_1^3)}{x_1^3} = 0$$
Along the straight line, we instead have
$$\lim_{(x_1, x_2) \to (0,0)} f(x_1, x_2) = \lim_{x_1 \to 0} f(x_1, x_1) = \lim_{x_1 \to 0} \frac{\log(1 + x_1^2)}{x_1^2} = 1$$

Since $f$ tends to two different limit values along the two directions, we conclude that $\lim_{(x_1, x_2) \to (0,0)} f(x)$ does not exist.
We can prove this failure rigorously using Definition 525. Suppose, by contradiction, that the limit exists, that is,
$$\lim_{(x_1, x_2) \to (0,0)} f(x_1, x_2) = L$$
Set $\varepsilon = 1/4$. By definition of limit, there exists $\delta_1 > 0$ such that, for $(0,0) \neq (x_1, x_2) \in B_{\delta_1}(0,0)$, we have
$$d(f(x_1, x_2), L) < \frac{1}{4} \qquad (12.22)$$

From the limit along the parabola, by setting
$$g(x) = \frac{\log(1 + x^3)}{x^2}$$
one gets $\lim_{x_1 \to 0} g(x_1) = 0$. Therefore, by setting again $\varepsilon = 1/4$, there exists $\delta_2 > 0$ such that, for $0 \neq x_1 \in B_{\delta_2}(0) \subseteq \mathbb{R}$, we have
$$g(x_1) \in (-\varepsilon, \varepsilon) = \left(-\frac{1}{4}, \frac{1}{4}\right)$$
Now consider the neighborhood $B_{\delta_2}(0,0) \subseteq \mathbb{R}^2$ of $(0,0)$. Take a point on the parabola $x_2 = x_1^2$ that belongs to this neighborhood, that is, a point $(0,0) \neq (\hat{x}_1, \hat{x}_1^2) \in B_{\delta_2}(0,0)$. We have $\hat{x}_1 \in B_{\delta_2}(0)$,⁶ so
$$f(\hat{x}_1, \hat{x}_1^2) = g(\hat{x}_1) \in \left(-\frac{1}{4}, \frac{1}{4}\right) \qquad (12.23)$$
Similarly, from the limit along the straight line, by setting
$$h(x) = \frac{\log(1 + x^2)}{x^2}$$
one gets $\lim_{x_1 \to 0} h(x_1) = 1$. Therefore, setting again $\varepsilon = 1/4$, there exists $\delta_3 > 0$ such that, for $0 \neq x_1 \in B_{\delta_3}(0) \subseteq \mathbb{R}$, we have
$$h(x_1) \in (1 - \varepsilon, 1 + \varepsilon) = \left(\frac{3}{4}, \frac{5}{4}\right)$$
Now consider the neighborhood $B_{\delta_3}(0,0) \subseteq \mathbb{R}^2$ of $(0,0)$ and take a point of the straight line $x_2 = x_1$ that belongs to it, that is, a point $(0,0) \neq (\tilde{x}_1, \tilde{x}_1) \in B_{\delta_3}(0,0)$. We have $\tilde{x}_1 \in B_{\delta_3}(0)$, so that
$$f(\tilde{x}_1, \tilde{x}_1) = h(\tilde{x}_1) \in \left(\frac{3}{4}, \frac{5}{4}\right) \qquad (12.24)$$

Let $\delta = \min\{\delta_1, \delta_2, \delta_3\}$ and consider two points $(\hat{x}_1, \hat{x}_1^2)$ and $(\tilde{x}_1, \tilde{x}_1)$, on the parabola and on the straight line respectively, that belong to $B_\delta(0,0)$ and are different from the origin $(0,0)$. By (12.23) and (12.24), we have
$$d\left(f(\hat{x}_1, \hat{x}_1^2), f(\tilde{x}_1, \tilde{x}_1)\right) > \frac{1}{2}$$
On the other hand, from (12.22) we have
$$d\left(f(\hat{x}_1, \hat{x}_1^2), f(\tilde{x}_1, \tilde{x}_1)\right) \leq d\left(f(\hat{x}_1, \hat{x}_1^2), L\right) + d\left(L, f(\tilde{x}_1, \tilde{x}_1)\right) < \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$$
This contradiction shows that the limit $\lim_{(x_1, x_2) \to (0,0)} f(x_1, x_2)$ does not exist. N
⁶ Indeed, $d((\hat{x}_1, \hat{x}_1^2), (0,0)) < \delta_2$, that is, $\hat{x}_1^2 + \hat{x}_1^4 < \delta_2^2$, implies $\hat{x}_1^2 < \delta_2^2$, whence $d(\hat{x}_1, 0) < \delta_2$.

12.3.3 Sequential characterization

Limits of functions admit a key characterization through limits of approaching sequences.

Proposition 528 Given a function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ and a limit point $x_0 \in \mathbb{R}^n$ of $A$, we have $\lim_{x \to x_0} f(x) = L \in \overline{\mathbb{R}}$ if and only if
$$x_n \to x_0 \implies f(x_n) \to L$$
for every sequence $\{x_n\}$ in $A$ with terms distinct from $x_0$.⁷
Proof We consider $L \in \mathbb{R}$, leaving to the reader the case $L = \pm\infty$. "If". Suppose $f(x_n) \to L$ for every sequence $\{x_n\}$ of points of $A$, with $x_n \neq x_0$ for every $n$, such that $x_n \to x_0$. Suppose, by contradiction, that $\lim_{x \to x_0} f(x) = L$ is false. Then, there is $\varepsilon > 0$ such that, for every $\delta > 0$, there exists $x_\delta \in A$ such that $0 < d(x_\delta, x_0) < \delta$ and $d(f(x_\delta), L) \geq \varepsilon$. For every $n$, set $\delta = 1/n$ and let $x_n$ be the corresponding point of $A$ just denoted by $x_\delta$. For the sequence $\{x_n\}$ of points of $A$ so constructed, we have $d(x_0, x_n) < 1/n$ for every $n$, so $\lim_{n \to \infty} d(x_0, x_n) = 0$. By Proposition 306, $x_n \to x_0$. But, by construction, $d(f(x_n), L) \geq \varepsilon$ for every $n$, so the sequence $f(x_n)$ does not converge to $L$. Having contradicted the hypothesis, we conclude that $\lim_{x \to x_0} f(x) = L$.
"Only if". Suppose $\lim_{x \to x_0} f(x) = L \in \mathbb{R}$. Let $\{x_n\}$ be a sequence of points of $A$, with $x_n \neq x_0$ for every $n$, such that $x_n \to x_0$. Let $\varepsilon > 0$. There exists $\delta_\varepsilon > 0$ such that, for every $x \in A$, $0 < d(x, x_0) < \delta_\varepsilon$ implies $d(f(x), L) < \varepsilon$. Since $x_n \to x_0$ and $x_n \neq x_0$, there exists $n_\varepsilon \geq 1$ such that $0 < d(x_n, x_0) < \delta_\varepsilon$ for every $n \geq n_\varepsilon$. For every $n \geq n_\varepsilon$ we thus have $d(f(x_n), L) < \varepsilon$, which implies $f(x_n) \to L$.
Example 529 Let us go back to $\lim_{x \to 2} (3x - 5)$ of Example 504. Since $A = \mathbb{R}$, let $\{x_n\}$ be any sequence of scalars, with $x_n \neq 2$ for every $n$, such that $x_n \to 2$. For example, $x_n = 2 + 1/n$ or $x_n = 2 - 1/n^2$. By the algebra of limits of sequences, we have
$$\lim_{n \to \infty} (3x_n - 5) = 3 \lim_{n \to \infty} x_n - 5 = 3 \cdot 2 - 5 = 1$$
For example, in the special case $x_n = 2 + 1/n$ we have
$$\lim_{n \to \infty} \left(3\left(2 + \frac{1}{n}\right) - 5\right) = 3 \lim_{n \to \infty} \left(2 + \frac{1}{n}\right) - 5 = 3 \cdot 2 - 5 = 1$$
By Proposition 528, this confirms that $\lim_{x \to 2} (3x - 5) = 1$. N
Example 530 Consider the function $f : (0, +\infty) \to \mathbb{R}$ given by
$$f(x) = \frac{\sqrt{x}}{x}$$
and the limit $\lim_{x \to 0} f(x)$. Since $A = (0, +\infty)$ and $x_0 = 0$, let $\{x_n\}$ be a sequence of strictly positive scalars such that $x_n \to 0$. For example, $x_n = 1/n$ or $x_n = 1/n^2$. By the algebra of limits of sequences, we have
$$\lim_{n \to \infty} \frac{\sqrt{x_n}}{x_n} = \lim_{n \to \infty} \frac{1}{\sqrt{x_n}} = +\infty$$
and so Proposition 528 allows us to conclude that $\lim_{x \to 0} f(x) = +\infty$. N

⁷ That is, $x_n \neq x_0$ for all $n \geq 1$.
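The sequential characterization lends itself to quick numerical experiments. A minimal sketch for Example 529; the two test sequences are arbitrary choices:

```python
# Illustration only: the sequential characterization at work for lim_{x->2} (3x - 5) = 1.
def f(x):
    return 3 * x - 5

for n in (10, 100, 1000):
    xn_right = 2 + 1 / n       # a sequence approaching 2 from the right
    xn_left = 2 - 1 / n ** 2   # a sequence approaching 2 from the left
    print(f"n = {n:>4}: f(2 + 1/n) = {f(xn_right):.6f}   f(2 - 1/n^2) = {f(xn_left):.6f}")
# Both image sequences approach 1, in line with Proposition 528.
```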

The characterization of limits through sequences is important both operationally, because the calculation of limits of functions reduces to the simpler calculation of limits of sequences (as the last examples just showed), and theoretically, because in this way many of the properties established for limits of sequences easily extend to the more general case of limits of functions. In this chapter we will mostly focus on the second, more theoretical, aspect of the sequential characterization in order to obtain basic properties of limits.
In sum, though limits of sequences are a special case of limits of functions, they have a special status because of the sequential characterization established in the last proposition.

12.4 Properties of limits

In this section we present some basic properties of limits. To ease matters and state the properties directly in terms of functions of several variables, we will consider only limit points $x_0$ that belong to $\mathbb{R}^n$. However, as the reader can check, for scalar functions these properties also hold for the case $x \to \pm\infty$.⁸ In the book we will also use such versions when needed.
We start with the uniqueness of the limit.

Theorem 531 (Uniqueness of the limit) Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}^n$ a limit point of $A$. There exists at most one $L \in \overline{\mathbb{R}}$ such that $\lim_{x \to x_0} f(x) = L$.

Proof Let us suppose, by contradiction, that there exist two different limits $L' \neq L''$. Let $\{x_n\}$ be a sequence in $A$, with $x_n \neq x_0$ for every $n$, such that $x_n \to x_0$. By Proposition 528, $f(x_n) \to L'$ and $f(x_n) \to L''$, which contradicts the uniqueness of the limit for sequences. It follows that $L' = L''$.

Here is an alternative proof, which does not use limits of sequences.

Alternative proof By contradiction, let us suppose that there exist two different limits $L_1$ and $L_2$, that is, $L_1 \neq L_2$. We assume therefore that
$$\lim_{x \to x_0} f(x) = L_1 \quad \text{and} \quad \lim_{x \to x_0} f(x) = L_2$$
with $L_1 \neq L_2$. Without loss of generality, suppose that $L_1 > L_2$. There exists a number $K$ such that $L_1 > K > L_2$. Setting $0 < \varepsilon_1 < L_1 - K$ and $0 < \varepsilon_2 < K - L_2$, the neighborhoods $B_{\varepsilon_1}(L_1) = (L_1 - \varepsilon_1, L_1 + \varepsilon_1)$ and $B_{\varepsilon_2}(L_2) = (L_2 - \varepsilon_2, L_2 + \varepsilon_2)$ are disjoint.

⁸ That is, for the case $x_0 \in \overline{\mathbb{R}}$ which, indeed, includes $x \to \pm\infty$ as the special cases $x \to x_0 = \pm\infty$.

[Figure: disjoint neighborhoods $(L_1 - \varepsilon_1, L_1 + \varepsilon_1)$ and $(L_2 - \varepsilon_2, L_2 + \varepsilon_2)$ on the $y$-axis]

Since by hypothesis $\lim_{x \to x_0} f(x) = L_1$, given $\varepsilon_1 > 0$ one can find $\delta_1 > 0$ such that
$$x_0 \neq x \in (x_0 - \delta_1, x_0 + \delta_1) \cap A \implies f(x) \in (L_1 - \varepsilon_1, L_1 + \varepsilon_1) \qquad (12.25)$$
Analogously, since by hypothesis $\lim_{x \to x_0} f(x) = L_2$, given $\varepsilon_2 > 0$ one can find $\delta_2 > 0$ such that
$$x_0 \neq x \in (x_0 - \delta_2, x_0 + \delta_2) \cap A \implies f(x) \in (L_2 - \varepsilon_2, L_2 + \varepsilon_2) \qquad (12.26)$$
Taking $\delta = \min\{\delta_1, \delta_2\}$, the neighborhood $(x_0 - \delta, x_0 + \delta)$ of $x_0$ with radius $\delta$ is contained in the two previous neighborhoods, i.e., in $(x_0 - \delta, x_0 + \delta)$ both (12.25) and (12.26) hold:
$$x_0 \neq x \in (x_0 - \delta, x_0 + \delta) \implies f(x) \in (L_1 - \varepsilon_1, L_1 + \varepsilon_1) \text{ and } f(x) \in (L_2 - \varepsilon_2, L_2 + \varepsilon_2)$$
Hence,
$$x_0 \neq x \in (x_0 - \delta, x_0 + \delta) \implies f(x) \in (L_1 - \varepsilon_1, L_1 + \varepsilon_1) \cap (L_2 - \varepsilon_2, L_2 + \varepsilon_2)$$
which is a contradiction, since we assumed that
$$(L_1 - \varepsilon_1, L_1 + \varepsilon_1) \cap (L_2 - \varepsilon_2, L_2 + \varepsilon_2) = \emptyset$$
The limit is therefore unique.

We continue with a version for functions of the theorem on the permanence of sign
(Theorem 319).

Theorem 532 (Permanence of sign) Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}^n$ a limit point of $A$. If $\lim_{x \to x_0} f(x) = L \neq 0$, then there exists a neighborhood $B_\varepsilon(x_0)$ of $x_0$ on which $f(x)$ and $L$ have the same sign, i.e.,
$$f(x) \cdot L > 0 \qquad \forall x_0 \neq x \in B_\varepsilon(x_0) \cap A$$

In words, if $L \neq 0$, it is always possible to choose a neighborhood of $x_0$ small enough so that the function takes on, at all its points (distinct from $x_0$), a value that has the same sign as $L$ — i.e., such that $f(x) \cdot L > 0$.

We leave to the reader the easy "sequential" proof based on Theorem 319 and on Proposition 528. We give, instead, a proof that directly uses the definition of limit.

Alternative proof Let $L \neq 0$, say $L > 0$. Since $\lim_{x \to x_0} f(x) = L$, by taking $\varepsilon = L/2 > 0$ there exists a neighborhood $B_\varepsilon(x_0)$ of $x_0$ such that
$$x_0 \neq x \in B_\varepsilon(x_0) \cap A \implies f(x) \in \left(L - \frac{L}{2}, L + \frac{L}{2}\right) = \left(\frac{L}{2}, \frac{3L}{2}\right)$$
Since $L/2 > 0$, we are done. For $L < 0$, the proof is similar.

The comparison criterion takes the following form for functions.


Theorem 533 (Comparison criterion) Let $f, g, h : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be three functions and $x_0 \in \mathbb{R}^n$ a limit point of $A$. If
$$g(x) \leq f(x) \leq h(x) \qquad \forall x \in A \qquad (12.27)$$
and
$$\lim_{x \to x_0} g(x) = \lim_{x \to x_0} h(x) = L \in \overline{\mathbb{R}} \qquad (12.28)$$
then
$$\lim_{x \to x_0} f(x) = L$$

Again we leave to the reader the easy "sequential" proof based on Theorem 338 and on Proposition 528, and give a proof based on the definition of limit.

Alternative proof Let $\varepsilon > 0$. We have to show that there exists $\delta > 0$ such that $f(x) \in (L - \varepsilon, L + \varepsilon)$ for every $x_0 \neq x \in (x_0 - \delta, x_0 + \delta) \cap A$. Since $\lim_{x \to x_0} g(x) = L$, there exists $\delta_1 > 0$ such that
$$x_0 \neq x \in (x_0 - \delta_1, x_0 + \delta_1) \cap A \implies L - \varepsilon < g(x) < L + \varepsilon \qquad (12.29)$$
Since $\lim_{x \to x_0} h(x) = L$, there exists $\delta_2 > 0$ such that
$$x_0 \neq x \in (x_0 - \delta_2, x_0 + \delta_2) \cap A \implies L - \varepsilon < h(x) < L + \varepsilon \qquad (12.30)$$
By taking $\delta = \min\{\delta_1, \delta_2\}$, both (12.29) and (12.30) hold on $(x_0 - \delta, x_0 + \delta) \cap A$. By (12.27), we then have
$$L - \varepsilon < g(x) \leq f(x) \leq h(x) < L + \varepsilon \qquad \forall x_0 \neq x \in (x_0 - \delta, x_0 + \delta) \cap A$$
that is,
$$f(x) \in (L - \varepsilon, L + \varepsilon) \qquad \forall x_0 \neq x \in (x_0 - \delta, x_0 + \delta) \cap A$$
Since $\varepsilon$ was arbitrary, we conclude that $\lim_{x \to x_0} f(x) = L$.

The comparison criterion for functions has the same interpretation as the original version for sequences (Theorem 338). The next simple application of this criterion is similar, mutatis mutandis, to that seen in Example 339.

Example 534 Let $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ be given by $f(x) = e^{x \cos^2 \frac{1}{x}}$ and let $x_0 = 0$. Since
$$0 \leq \cos^2 \frac{1}{x} \leq 1 \qquad \forall x \neq 0$$
by the monotonicity of the exponential function we have
$$1 = e^{0 \cdot x} \leq e^{x \cos^2 \frac{1}{x}} \leq e^{1 \cdot x} = e^x \qquad \forall x > 0$$
Setting $g(x) = 1$ and $h(x) = e^x$, conditions (12.27) and (12.28) are satisfied with $L = 1$. Therefore, $\lim_{x \to 0} f(x) = 1$. The proof for $x < 0$ is analogous. N
As was the case for sequences, more generally also for functions the last two results establish properties of limits with respect to the underlying order structure. The next proposition, which extends Propositions 320 and 321 to functions, is yet another simple result of this kind.

Proposition 535 Let $f, g : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be two functions, $x_0 \in \mathbb{R}^n$ a limit point of $A$, and $\lim_{x \to x_0} f(x) = L \in \overline{\mathbb{R}}$ and $\lim_{x \to x_0} g(x) = H \in \overline{\mathbb{R}}$.

(i) If $f(x) \geq g(x)$ in a neighborhood of $x_0$, then $L \geq H$.

(ii) If $L > H$, then there exists a neighborhood of $x_0$ in which $f(x) > g(x)$.
Observe that in (i) we can only say $L \geq H$ even when we have the strict inequality $f(x) > g(x)$. For example, for the functions $f, g : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} 1 & \text{if } x = 0 \\ x^2 & \text{if } x \neq 0 \end{cases}$$
and $g(x) = 0$ we have, for $x \to 0$, $L = H = 0$ although $f(x) > g(x)$ for every $x \in \mathbb{R}$. Similarly, if $f(x) = 1/x$ and $g(x) = 0$, for $x \to +\infty$ we have $L = H = 0$ although $f(x) > g(x)$ for every $x > 0$.

As we did so far in this section, we leave the sequential proof — based on Propositions 320 and 321 — to readers and give, instead, a proof based on the definition of limit.

Alternative proof (i) By contradiction, assume that $L < H$. Set $\varepsilon = H - L$, so that $\varepsilon > 0$. The neighborhoods $(L - \varepsilon/4, L + \varepsilon/4)$ and $(H - \varepsilon/4, H + \varepsilon/4)$ are disjoint since $L + \varepsilon/4 < H - \varepsilon/4$. Since $\lim_{x \to x_0} f(x) = L$, there exists $\delta_1 > 0$ such that
$$x_0 \neq x \in (x_0 - \delta_1, x_0 + \delta_1) \implies f(x) \in \left(L - \frac{\varepsilon}{4}, L + \frac{\varepsilon}{4}\right)$$
Analogously, since $\lim_{x \to x_0} g(x) = H$, there exists $\delta_2 > 0$ such that
$$x_0 \neq x \in (x_0 - \delta_2, x_0 + \delta_2) \implies g(x) \in \left(H - \frac{\varepsilon}{4}, H + \frac{\varepsilon}{4}\right)$$
By setting $\delta = \min\{\delta_1, \delta_2\}$, we have
$$x_0 \neq x \in (x_0 - \delta, x_0 + \delta) \implies L - \frac{\varepsilon}{4} < f(x) < L + \frac{\varepsilon}{4} < H - \frac{\varepsilon}{4} < g(x) < H + \frac{\varepsilon}{4}$$
That is, $f(x) < g(x)$ for every $x \in B_\delta(x_0)$. This contradicts the hypothesis that $f(x) \geq g(x)$ in a neighborhood of $x_0$.
(ii) We prove the contrapositive. It is enough to note that, if $f(x) \leq g(x)$ in every neighborhood of $x_0$, then (i) implies $L \leq H$.

12.5 Algebra of limits

The next result extends the algebra of limits established for sequences (Propositions 333 and 337) to the general case of functions.⁹

Proposition 536 Given two functions $f, g : A \subseteq \mathbb{R}^n \to \mathbb{R}$ and a limit point $x_0 \in \mathbb{R}^n$ of $A$, suppose that $\lim_{x \to x_0} f(x) = L \in \overline{\mathbb{R}}$ and $\lim_{x \to x_0} g(x) = M \in \overline{\mathbb{R}}$. Then:

(i) $\lim_{x \to x_0} (f + g)(x) = L + M$, provided that $L + M$ is not an indeterminate form (1.24), of the type
$$+\infty - \infty \quad \text{or} \quad -\infty + \infty$$

(ii) $\lim_{x \to x_0} (fg)(x) = LM$, provided that $LM$ is not an indeterminate form (1.25), of the type
$$\pm\infty \cdot 0 \quad \text{or} \quad 0 \cdot (\pm\infty)$$

(iii) $\lim_{x \to x_0} (f/g)(x) = L/M$, provided that $g(x) \neq 0$ in a neighborhood of $x_0$, with $x \neq x_0$, and $L/M$ is not an indeterminate form (1.26), of the type¹⁰
$$\frac{\pm\infty}{\pm\infty} \quad \text{or} \quad \frac{a}{0}$$

Proof We prove only (i), leaving to the reader the analogous proofs of (ii) and (iii). Let $\{x_n\}$ be a sequence in $A$, with $x_n \neq x_0$ for every $n \geq 1$, such that $x_n \to x_0$. By Proposition 528, $f(x_n) \to L$ and $g(x_n) \to M$. Suppose that $L + M$ is not an indeterminate form. By Proposition 333, $(f + g)(x_n) \to L + M$, and therefore, by Proposition 528, it follows that $\lim_{x \to x_0} (f + g)(x) = L + M$.

Example 537 Let $f, g : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ be given by $f(x) = \sin x / x$ and $g(x) = 1/|x|$. We have $\lim_{x \to 0} \sin x / x = 1$ and $\lim_{x \to 0} 1/|x| = +\infty$. Therefore,
$$\lim_{x \to 0} \left(\frac{\sin x}{x} + \frac{1}{|x|}\right) = 1 + \infty = +\infty$$
If, instead, $g(x) = e^x$, we have $\lim_{x \to 0} (\sin x / x + e^x) = 1 + 1 = 2$. N

As for sequences, when $a \neq 0$ the case $a/0$ of point (iii) is actually not an indeterminate form for the algebra of limits, as the following version for functions of Proposition 335 shows.

Proposition 538 Let $\lim_{x \to x_0} f(x) = L \in \overline{\mathbb{R}}$, with $L \neq 0$, and $\lim_{x \to x_0} g(x) = 0$. The limit $\lim_{x \to x_0} (f/g)(x)$ exists if and only if there is a neighborhood $U(x_0)$ of $x_0 \in \mathbb{R}^n$ where the function $g$ has constant sign, except at most at $x_0$. In this case:¹¹

⁹ For brevity, we focus on Proposition 333 and leave to the reader the analogous extension of Proposition 337.
¹⁰ As for sequences, to exclude the indeterminacy $a/0$ amounts to requiring $M \neq 0$.
¹¹ Here $g \to 0^+$ and $g \to 0^-$ indicate that $\lim_{x \to x_0} g(x) = 0$ with, respectively, $g(x) \geq 0$ and $g(x) \leq 0$ for every $x_0 \neq x \in U(x_0)$.

(i) if $L > 0$ and $g \to 0^+$, or if $L < 0$ and $g \to 0^-$, then
$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = +\infty$$

(ii) if $L > 0$ and $g \to 0^-$, or if $L < 0$ and $g \to 0^+$, then
$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = -\infty$$

Example 539 Consider $f(x) = x + 5$ and $g(x) = x$. As $x \to 0$, we have $f \to 5$, but in every neighborhood of $0$ the function $g$ takes both signs, that is, there is no neighborhood of $0$ where $g$ has constant sign. By Proposition 538, the limit of $(f/g)(x)$ as $x \to 0$ does not exist. N

As in the previous section, we considered only limits at points $x_0 \in \mathbb{R}^n$. The reader can verify that for scalar functions the results of this section extend to the case $x \to \pm\infty$.

Example 540 Take $f(x) = 1/x - 1$ and $g(x) = 1/x$. As $x \to +\infty$ we have $f \to -1$ and $g \to 0$. Since $g(x) > 0$ for every $x > 0$, so also in any neighborhood of $+\infty$, we have $g \to 0^+$. Thanks to the version for $x \to \pm\infty$ of Proposition 538, we have $\lim_{x \to +\infty} (f/g)(x) = -\infty$. N

12.5.1 Indeterminacies for limits

The algebra of limits presents indeterminacies similar to those of sequences (Section 8.10.3). Here we briefly review them.

Indeterminate form ∞ − ∞

For example, the limit $\lim_{x \to 0} (f + g)(x)$ of the sum of the functions $f, g : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ given by $f(x) = 1/x^2$ and $g(x) = -1/x^4$ falls under the indeterminate form $\infty - \infty$. We have
$$(f + g)(x) = \frac{1}{x^2} - \frac{1}{x^4} = \frac{1}{x^2}\left(1 - \frac{1}{x^2}\right)$$
and, therefore,
$$\lim_{x \to 0} (f + g)(x) = \lim_{x \to 0} \frac{1}{x^2} \cdot \lim_{x \to 0} \left(1 - \frac{1}{x^2}\right) = -\infty$$
since $(+\infty) \cdot (-\infty)$ is not an indeterminate form. Exchanging the signs between these two functions, that is, setting $f(x) = -1/x^2$ and $g(x) = 1/x^4$, we have again the indeterminate form $\infty - \infty$ at $x_0 = 0$, but this time $\lim_{x \to 0} (f + g)(x) = +\infty$. Thus, also for functions the indeterminate forms can give completely different results: anything goes. So, they must be solved case by case.
Finally, note that these functions $f$ and $g$ give rise to an indeterminacy at $x_0 = 0$, but not at $x_0 \neq 0$. Therefore, for functions it is crucial to specify the point $x_0$ that we are considering. This is, indeed, the only novelty that the study of indeterminate forms of functions features relative to that of sequences (for which we only have the case $n \to +\infty$).

Indeterminate form 0 · ∞

For example, consider the functions $f, g : \mathbb{R} \setminus \{3\} \to \mathbb{R}$ given by $f(x) = (x - 3)^2$ and $g(x) = 1/(x - 3)^4$. The limit $\lim_{x \to 3} (fg)(x)$ falls under the indeterminate form $0 \cdot \infty$. But we have
$$\lim_{x \to 3} (fg)(x) = \lim_{x \to 3} (x - 3)^2 \frac{1}{(x - 3)^4} = \lim_{x \to 3} \frac{1}{(x - 3)^2} = +\infty$$
On the other hand, by considering $f(x) = 1/(x - 3)^2$ and $g(x) = (x - 3)^4$, we have
$$\lim_{x \to 3} (fg)(x) = \lim_{x \to 3} (x - 3)^4 \frac{1}{(x - 3)^2} = \lim_{x \to 3} (x - 3)^2 = 0$$
Again, only the direct calculation of the limit can determine its value.

Indeterminate forms ∞/∞ and 0/0

For example, let $f, g : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = 5 - x$ and $g(x) = x^2 - 25$. The limit of their ratio as $x \to 5$ has the form $0/0$, but
$$\lim_{x \to 5} \frac{f}{g}(x) = \lim_{x \to 5} \frac{5 - x}{x^2 - 25} = \lim_{x \to 5} \frac{5 - x}{(x - 5)(x + 5)} = \lim_{x \to 5} \frac{-1}{x + 5} = -\frac{1}{10}$$
On the other hand, by taking $f, g : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^2$ and $g(x) = x$, as $x \to +\infty$ we have an indeterminate form of the type $\infty/\infty$ and
$$\lim_{x \to +\infty} \frac{f}{g}(x) = \lim_{x \to +\infty} \frac{x^2}{x} = \lim_{x \to +\infty} x = +\infty$$
while, as $x \to -\infty$, we still have a form of the type $\infty/\infty$ but
$$\lim_{x \to -\infty} \frac{f}{g}(x) = \lim_{x \to -\infty} \frac{x^2}{x} = \lim_{x \to -\infty} x = -\infty$$
In the two cases the limits are infinities of opposite sign: again, one cannot avoid the direct calculation of the limit.
For the functions $f$ and $g$ just seen, at the point $x_0 = 0$ we have the indeterminate form $0/0$, but
$$\lim_{x \to 0} \frac{f}{g}(x) = \lim_{x \to 0} \frac{x^2}{x} = \lim_{x \to 0} x = 0$$
while, setting $g(x) = x^4$, we still have an indeterminate form of the type $0/0$ and
$$\lim_{x \to 0} \frac{f}{g}(x) = \lim_{x \to 0} \frac{x^2}{x^4} = \lim_{x \to 0} \frac{1}{x^2} = +\infty$$
On the other hand, by taking $f : \mathbb{R}_+ \to \mathbb{R}$ given by $f(x) = x + \sqrt{x} - 2$ and $g : \mathbb{R} \setminus \{1\} \to \mathbb{R}$ given by $g(x) = x - 1$, we have
$$\lim_{x \to 1} \frac{f}{g}(x) = \lim_{x \to 1} \frac{x + \sqrt{x} - 2}{x - 1} = \lim_{x \to 1} \frac{x - 1 + \sqrt{x} - 1}{x - 1} = \lim_{x \to 1} \left(1 + \frac{\sqrt{x} - 1}{x - 1}\right)$$
$$= 1 + \lim_{x \to 1} \frac{\sqrt{x} - 1}{(\sqrt{x} - 1)(\sqrt{x} + 1)} = 1 + \lim_{x \to 1} \frac{1}{\sqrt{x} + 1} = 1 + \frac{1}{2} = \frac{3}{2}$$

Summing up, anything goes.

We close with two observations: (i) as for sequences (Section 8.10.5), for functions the various indeterminate forms can be reduced to one another; (ii) also in the case of functions we can summarize what we have seen so far in tables similar to those in Section 8.10.4, as readers can check.

12.6 Common limits

Using what we have studied so far, we now calculate some more or less elementary common limits. We begin with a few examples of limits of elementary functions.
We begin with a few examples of limits of elementary functions.

Example 541 (i) Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x^n$ with $n \geq 1$. For every $x_0 \in \mathbb{R}$, by the basic properties of limits we have
$$\lim_{x \to x_0} x^n = x_0^n$$
Moreover, $\lim_{x \to \pm\infty} x^n = +\infty$ if $n$ is even, while $\lim_{x \to +\infty} x^n = +\infty$ and $\lim_{x \to -\infty} x^n = -\infty$ if $n$ is odd.

(ii) Let $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ be given by $f(x) = 1/x^n$ for $n \geq 1$. For every $0 \neq x_0 \in \mathbb{R}$, we have
$$\lim_{x \to x_0} f(x) = \frac{1}{x_0^n}$$
Moreover, $\lim_{x \to \pm\infty} 1/x^n = 0^+$ if $n$ is even, while $\lim_{x \to +\infty} 1/x^n = 0^+$ and $\lim_{x \to -\infty} 1/x^n = 0^-$ if $n$ is odd. Finally, $\lim_{x \to 0^+} 1/x^n = +\infty$ and $\lim_{x \to 0^-} 1/x^n = -\infty$ if $n$ is odd, while $\lim_{x \to 0^+} 1/x^n = \lim_{x \to 0^-} 1/x^n = +\infty$ if $n$ is even.

(iii) Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = \alpha^x$, with $\alpha > 0$. For every $x_0 \in \mathbb{R}$, we have $\lim_{x \to x_0} \alpha^x = \alpha^{x_0}$. Moreover,
$$\lim_{x \to -\infty} \alpha^x = \begin{cases} 0 & \text{if } \alpha > 1 \\ 1 & \text{if } \alpha = 1 \\ +\infty & \text{if } \alpha < 1 \end{cases} \quad \text{and} \quad \lim_{x \to +\infty} \alpha^x = \begin{cases} +\infty & \text{if } \alpha > 1 \\ 1 & \text{if } \alpha = 1 \\ 0 & \text{if } \alpha < 1 \end{cases}$$

(iv) Let $f : (0, +\infty) \to \mathbb{R}$ be given by $f(x) = \log_a x$, with $a > 0$, $a \neq 1$. For every $x_0 > 0$, we have $\lim_{x \to x_0} \log_a x = \log_a x_0$. Moreover,
$$\lim_{x \to 0^+} \log_a x = \begin{cases} -\infty & \text{if } a > 1 \\ +\infty & \text{if } a < 1 \end{cases} \quad \text{and} \quad \lim_{x \to +\infty} \log_a x = \begin{cases} +\infty & \text{if } a > 1 \\ -\infty & \text{if } a < 1 \end{cases}$$

(v) Let $f, g : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = \sin x$ and $g(x) = \cos x$. For every $x_0 \in \mathbb{R}$, we have $\lim_{x \to x_0} \sin x = \sin x_0$ and $\lim_{x \to x_0} \cos x = \cos x_0$. The limits $\lim_{x \to \pm\infty} \sin x$ and $\lim_{x \to \pm\infty} \cos x$ do not exist. N

Next we prove some classic limits for trigonometric functions (we already met the first one in the introduction of this chapter).

Proposition 542 Let $f, g : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ be defined by $f(x) = \sin x / x$ and $g(x) = (\cos x - 1)/x$. Then
$$\lim_{x \to 0} \frac{\sin x}{x} = 1 \qquad (12.31)$$
and
$$\lim_{x \to 0} \frac{1 - \cos x}{x} = 0, \qquad \lim_{x \to 0} \frac{1 - \cos x}{x^2} = \frac{1}{2} \qquad (12.32)$$

Proof It is easy to see graphically that $0 < \sin x < x < \tan x$ for $x \in (0, \pi/2)$ and that $\tan x < x < \sin x < 0$ for $x \in (-\pi/2, 0)$. Therefore, by dividing all the terms by $\sin x$ and by observing that $\sin x > 0$ when $x \in (0, \pi/2)$ and $\sin x < 0$ when $x \in (-\pi/2, 0)$, we have in all the cases
$$1 < \frac{x}{\sin x} < \frac{1}{\cos x}$$
The first limit then follows from the comparison criterion. For the third one, it suffices to observe that
$$\frac{1 - \cos x}{x^2} = \frac{1 - \cos x}{x^2} \cdot \frac{1 + \cos x}{1 + \cos x} = \frac{1 - \cos^2 x}{x^2 (1 + \cos x)} = \frac{\sin^2 x}{x^2} \cdot \frac{1}{1 + \cos x}$$
and that, as $x \to 0$, the first factor tends to $1$ while the second one tends to $1/2$. Finally, the second limit follows immediately from the third one:
$$\frac{1 - \cos x}{x} = x \cdot \frac{1 - \cos x}{x^2} \to 0 \cdot \frac{1}{2} = 0$$

Finally, from the analogous propositions that we proved for sequences, we easily deduce (the proofs are essentially identical) the following limits:

(i) If $f(x) \to \pm\infty$ as $x \to x_0$, then
$$\lim_{x \to x_0} \left(1 + \frac{k}{f(x)}\right)^{f(x)} = e^k$$
In particular,
$$\lim_{x \to x_0} \left(1 + \frac{1}{f(x)}\right)^{f(x)} = e, \qquad \lim_{x \to \pm\infty} \left(1 + \frac{1}{x}\right)^x = e$$

(ii) Let $a > 0$ and $f(x) \to 0$ as $x \to x_0$. Then
$$\lim_{x \to x_0} \frac{a^{f(x)} - 1}{f(x)} = \log a \qquad (12.33)$$
In particular,
$$\lim_{x \to 0} \frac{a^x - 1}{x} = \log a \qquad (12.34)$$
which, when $a = e$, becomes
$$\lim_{x \to 0} \frac{e^x - 1}{x} = 1 \qquad (12.35)$$

(iii) Let $0 < a \neq 1$ and $f(x) \to 0$ as $x \to x_0$. Then
$$\lim_{x \to x_0} \frac{\log_a(1 + f(x))}{f(x)} = \frac{1}{\log a}$$
In particular,
$$\lim_{x \to 0} \frac{\log_a(1 + x)}{x} = \frac{1}{\log a}$$
which, when $a = e$, becomes
$$\lim_{x \to 0} \frac{\log(1 + x)}{x} = 1$$

(iv) If $f(x) \to 0$ as $x \to x_0$, we have
$$\lim_{x \to x_0} \frac{(1 + f(x))^\alpha - 1}{f(x)} = \alpha$$
In particular,
$$\lim_{x \to 0} \frac{(1 + x)^\alpha - 1}{x} = \alpha$$
N.B. The function $u_\gamma : (0, \infty) \to \mathbb{R}$ defined by
$$u_\gamma(x) = \begin{cases} \dfrac{x^{1-\gamma} - 1}{1 - \gamma} & \text{if } \gamma \neq 1 \\ \log x & \text{if } \gamma = 1 \end{cases}$$
is the classic CRRA (constant relative risk aversion) utility function, where the scalar $\gamma$ is interpreted as a coefficient of relative risk aversion (see Pratt, 1964, p. 134). In view of the limit (12.34), we have¹²
$$\lim_{\gamma \to 1} u_\gamma(x) = \lim_{\gamma \to 1} \frac{x^{1-\gamma} - 1}{1 - \gamma} = \log x = u_1(x)$$
In a similar vein, define $u_\lambda : \mathbb{R} \to \mathbb{R}$ by
$$u_\lambda(x) = \begin{cases} \dfrac{1 - e^{-\lambda x}}{\lambda} & \text{if } \lambda > 0 \\ x & \text{if } \lambda = 0 \end{cases}$$
the classic CARA (constant absolute risk aversion) utility function, where the scalar $\lambda$ is interpreted as a coefficient of absolute risk aversion (see Pratt, 1964, p. 130). In view of the limit (12.35), we have
$$\lim_{\lambda \to 0} u_\lambda(x) = \lim_{\lambda \to 0} \frac{1 - e^{-\lambda x}}{\lambda} = \lim_{\lambda \to 0} \frac{e^{-\lambda x} - 1}{-\lambda x}\, x = x = u_0(x)$$
Note that the scalars $\gamma$ and $\lambda$ index functions; through them we are actually studying limit properties of functions. O

¹² Here $1 - \gamma$ plays the role of $x$ in (12.34).
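The two convergence statements can be checked numerically. A minimal Python sketch, illustration only; the evaluation point $x = 2$ and the step sizes are arbitrary choices:

```python
# Illustration only: CRRA utilities u_gamma converge pointwise to log as gamma -> 1,
# and CARA utilities u_lambda converge to the identity as lambda -> 0.
import math

def u_crra(x, gamma):
    return math.log(x) if gamma == 1 else (x ** (1 - gamma) - 1) / (1 - gamma)

def u_cara(x, lam):
    return x if lam == 0 else (1 - math.exp(-lam * x)) / lam

x = 2.0
for eps in (0.1, 0.01, 0.001):
    print(f"gamma = {1 + eps}: u_gamma({x}) = {u_crra(x, 1 + eps):.6f}  (log {x} = {math.log(x):.6f})")
    print(f"lambda = {eps}:   u_lambda({x}) = {u_cara(x, eps):.6f}  (x = {x})")
```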

12.7 Orders of convergence and of divergence

As for sequences, also for functions it may happen that some of them approach their limit "faster" than others.
For simplicity we limit ourselves to scalar functions. We first extend to them the key Definition 351. Note the importance of the clause "as $x \to x_0$", which (as already remarked) is the unique true novelty with respect to the case of sequences, where this clause could only take the form "$n \to +\infty$".

Definition 543 Given two functions $f, g : A \subseteq \mathbb{R} \to \mathbb{R}$, let $x_0 \in \overline{\mathbb{R}}$ be a limit point of $A$ for which there exists a neighborhood $B_\varepsilon(x_0)$ such that $g(x) \neq 0$ for every $x \in A \cap B_\varepsilon(x_0)$.

(i) If
$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = 0$$
we say that $f$ is negligible with respect to $g$ as $x \to x_0$; in symbols,
$$f = o(g) \quad \text{as } x \to x_0$$

(ii) If
$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = k \neq 0 \qquad (12.36)$$
we say that $f$ is comparable with $g$ as $x \to x_0$; in symbols,
$$f \asymp g \quad \text{as } x \to x_0$$

(iii) In particular, if
$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = 1$$
we say that $f$ and $g$ are asymptotic (or asymptotically equivalent) to one another as $x \to x_0$, and we write
$$f(x) \sim g(x) \quad \text{as } x \to x_0$$

Terminology For functions, too, the expression $f = o(g)$ as $x \to x_0$ reads "$f$ is little-o of $g$, as $x \to x_0$".

It is easy to see that also for functions the relations $\asymp$ and $\sim$ continue to satisfy the properties seen in Section 8.14 for sequences, i.e.,

(i) the relations of comparability and of asymptotic equivalence are symmetric and transitive;

(ii) the relation of negligibility is transitive;

(iii) if $\lim_{x \to x_0} f(x)$ and $\lim_{x \to x_0} g(x)$ are both finite and non-zero, then $f \asymp g$ as $x \to x_0$;

(iv) if $\lim_{x \to x_0} f(x) = 0$ and $0 \neq \lim_{x \to x_0} g(x) \in \mathbb{R}$, then $f = o(g)$ as $x \to x_0$.



We now consider the cases, which also for functions continue to be the most interesting ones, in which both functions either converge to zero or diverge to $\pm\infty$. We start with convergence to zero: $\lim_{x \to x_0} f(x) = \lim_{x \to x_0} g(x) = 0$. In this case, intuitively, $f$ is negligible with respect to $g$ as $x \to x_0$ if it tends to zero faster. Let, for example, $x_0 = 1$, $f(x) = (x - 1)^2$ and $g(x) = x - 1$. We have
$$\lim_{x \to 1} \frac{(x - 1)^2}{x - 1} = \lim_{x \to 1} (x - 1) = 0$$
that is, $f = o(g)$ as $x \to 1$. On the other hand, as $x \to +\infty$, we have
$$\lim_{x \to +\infty} \frac{\sqrt{x}}{\sqrt{x + 1}} = \lim_{x \to +\infty} \sqrt{1 - \frac{1}{x + 1}} = 1$$
Therefore, the functions $f(x) = \sqrt{x}$ and $g(x) = \sqrt{x + 1}$ are comparable (even better, they are asymptotic to one another) as $x \to +\infty$.

Let us now consider two functions both tending to $\pm\infty$ as $x \to x_0$. In this case, intuitively, $f$ is negligible with respect to $g$ when it tends to infinity more slowly, that is, when it assumes larger and larger values (in absolute value) less rapidly. For example, if $f(x) = x$ and $g(x) = x^2$, for $x_0 = +\infty$ we have
$$\lim_{x \to +\infty} \frac{x}{x^2} = \lim_{x \to +\infty} \frac{1}{x} = 0$$
and so $f = o(g)$ as $x \to +\infty$. When $x \to -\infty$, too, we have
$$\lim_{x \to -\infty} \frac{x}{x^2} = \lim_{x \to -\infty} \frac{1}{x} = 0$$
So, $f = o(g)$ also as $x \to -\infty$: in both cases $x$ tends to infinity more slowly than $x^2$. Note that, as $x \to 0$, we have instead $\lim_{x \to 0} x^2 = \lim_{x \to 0} x = 0$ and
$$\lim_{x \to 0} \frac{x^2}{x} = \lim_{x \to 0} x = 0$$
so that $g = o(f)$ as $x \to 0$.
In sum, also for functions the meaning of negligibility must be specified according to whether we consider convergence to zero or divergence to infinity. Moreover, the point $x_0$ where we take the limit is key, as already remarked several times (repetita iuvant, hopefully).

12.7.1 Little-o algebra

As for sequences, also for functions the application of the concept of "little-o" is not always straightforward. Indeed, knowing that a function $f$ is little-o of another function $g$ as $x \to x_0$ does not give much information on the form of $f$, apart from its being negligible with respect to $g$. Fortunately, there exists an "algebra" of little-o that extends the one seen for sequences (Proposition 353) and allows us to manipulate safely the little-o of sums and products of functions. To ease notation, in what follows we will always assume that the negligibility of the various functions is as $x$ approaches the same point $x_0$, so we will always omit the clause "as $x \to x_0$".¹³

¹³ In any case, it would be meaningless to consider sums or products of little-o at different $x_0$.

Proposition 544 For every pair of functions $f$ and $g$ and for every scalar $c \neq 0$, we have:

(i) $o(f) + o(f) = o(f)$;

(ii) $o(f)o(g) = o(fg)$;

(iii) $c \cdot o(f) = o(f)$;

(iv) $o(g) + o(f) = o(f)$ if $g = o(f)$.

We omit the proof because it is similar, mutatis mutandis, to that of Proposition 353. Also the comments we made about that proposition still apply — in particular, about the important special case $o(f)o(f) = o(f^2)$ of point (ii).

Example 545 Let $f(x) = x^n$, with $n > 2$. Consider the two functions $g(x) = x^{n-1}$ and $h(x) = e^{-x} - 3x^{n-1}$. It is easy to check that $g = o(f) = o(x^n)$ and $h = o(f) = o(x^n)$ as $x \to +\infty$.

(i) Summing the two functions we obtain $g + h = e^{-x} - 2x^{n-1}$, which is still $o(x^n)$ as $x \to +\infty$, in accordance with Proposition 544-(i).

(ii) Multiplying the two functions we obtain $g \cdot h = x^{n-1} e^{-x} - 3x^{2n-2}$, which is $o(x^n \cdot x^n) = o(x^{2n})$ as $x \to +\infty$, in accordance with Proposition 544-(ii) in the special case $o(f)o(f)$. Note that $g \cdot h$ is not $o(x^n)$.

(iii) Set $c = -3$ and consider $c \cdot g = -3x^{n-1}$. It is easy to check that $-3x^{n-1}$ is still $o(x^n)$ as $x \to +\infty$, in accordance with Proposition 544-(iii).

(iv) Consider the function $l(x) = x + 1$. It is easy to check that $l = o(g) = o(x^{n-1})$ as $x \to +\infty$. Consider now the sum $l + h$, which is the sum of an $o(g)$ and of an $o(f)$, with $g = o(f)$. We have $l + h = x + 1 + e^{-x} - 3x^{n-1}$, which is $o(x^n)$ as $x \to +\infty$, i.e., $o(f)$, in accordance with Proposition 544-(iv). Note that $l + h$ is not $o(g)$, even if $l = o(g)$. N
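The claims of Example 545 can be probed numerically. A minimal sketch with the arbitrary choice $n = 4$; the sample points are arbitrary too:

```python
# Illustration only (with n = 4): the first two ratios vanish, confirming the little-o
# claims of Example 545; the third ratio diverges, confirming g*h is not o(x^n).
import math

n = 4
g = lambda x: x ** (n - 1)
h = lambda x: math.exp(-x) - 3 * x ** (n - 1)

for x in (10.0, 100.0, 1000.0):
    print(f"x = {x:>6}: (g+h)(x)/x^n = {(g(x) + h(x)) / x**n:.2e}   "
          f"(g*h)(x)/x^(2n) = {g(x) * h(x) / x**(2 * n):.2e}   "
          f"(g*h)(x)/x^n = {g(x) * h(x) / x**n:.2e}")
```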

The next proposition presents some classic instances of functions with different rates of divergence.

Proposition 546 Let $k, h > 0$, $\alpha > 1$ and $a > 1$. Then,

(i) $x^k = o(\alpha^x)$ as $x \to +\infty$, that is,
$$\lim_{x \to +\infty} \frac{x^k}{\alpha^x} = 0$$

(ii) $x^h = o(x^k)$ as $x \to +\infty$ if $h < k$;

(iii) $\log_a x = o(x^k)$ as $x \to +\infty$, that is,
$$\lim_{x \to +\infty} \frac{\log_a x}{x^k} = 0$$

By the transitivity property of the negligibility relation, from (i) and (iii) it follows that
$$\log_a x = o(\alpha^x) \quad \text{as } x \to +\infty$$

Proof For each of the three functions $\alpha^x$, $x^k$, $\log_a x$ — all increasing — one has $f(n) \leq f(x) \leq f(n + 1)$, where $n = [x]$ is the integer part of $x$. It then suffices to use the sequential characterization of the limit of a function together with the comparison criterion.
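A minimal numerical illustration of the hierarchy in Proposition 546, with the arbitrary choices $k = 2$, $\alpha = 1.5$ and natural logarithms (i.e., $a = e$):

```python
# Illustration only: log x = o(x^k) and x^k = o(alpha^x) as x -> +infinity.
import math

for x in (10.0, 50.0, 200.0):
    print(f"x = {x:>5}: log(x)/x^2 = {math.log(x) / x**2:.2e}   x^2/1.5^x = {x**2 / 1.5**x:.2e}")
# Both ratios tend to 0 as x grows.
```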

N.B. A function is $o(1)$ as $x \to x_0$ if it tends to $0$. Indeed, $f(x) = o(1)$ means that $f(x)/1 = f(x) \to 0$. O

12.7.2 Asymptotic equivalence

The asymptotic equivalence for functions is analogous to that for sequences. In particular, we will see that in the calculation of limits it is possible to replace a function by an asymptotically equivalent one, which often substantially simplifies the calculation.
The development of this argument parallels that seen for sequences in Section 8.14.3. Such parallelism, and the unavoidable repetitiveness that it implies, should not make us lose sight of the importance of what we will see now. To minimize repetitions, we will omit some details and comments, as well as the proofs (referring the reader to Section 8.14.3).
Let us start by observing that $f(x) \sim g(x)$ as $x \to x_0$ implies, for given $L \in \overline{\mathbb{R}}$,
$$\lim_{x \to x_0} f(x) = L \iff \lim_{x \to x_0} g(x) = L$$
That is, two functions asymptotic to one another as $x \to x_0$ have the same limit as $x \to x_0$. In particular, we have the following version for functions of Lemma 355.¹⁴

Lemma 547 Let $f(x) \sim g(x)$ and $h(x) \sim l(x)$ as $x \to x_0$. Then:

(i) $f(x) h(x) \sim g(x) l(x)$ as $x \to x_0$;

(ii) $f(x)/h(x) \sim g(x)/l(x)$ as $x \to x_0$, provided that $h(x) \neq 0$ and $l(x) \neq 0$ at every point $x \neq x_0$ of a neighborhood $B_\varepsilon(x_0)$.

We now give the analog of the important Lemma 356.

Lemma 548 We have
$$f(x) \sim f(x) + o(f(x)) \quad \text{as } x \to x_0 \qquad (12.37)$$
Therefore,
$$\lim_{x \to x_0} f(x) = L \iff \lim_{x \to x_0} (f(x) + o(f(x))) = L$$

What is negligible with respect to $f$ as $x \to x_0$ — which is what $o(f(x))$ is as $x \to x_0$ — is asymptotically irrelevant and can be neglected. Thanks to Lemma 547, we therefore have:
$$(f(x) + o(f(x))) \cdot (g(x) + o(g(x))) \sim f(x)\, g(x) \quad \text{as } x \to x_0 \qquad (12.38)$$
and
$$\frac{f(x) + o(f(x))}{g(x) + o(g(x))} \sim \frac{f(x)}{g(x)} \quad \text{as } x \to x_0 \qquad (12.39)$$

¹⁴ Relative to that lemma, for brevity here we limit ourselves to products and quotients (which are, in any case, the more interesting cases).

Example 549 (i) Consider the limit
$$\lim_{x \to +\infty} \frac{2\sqrt{x} + 5\sqrt[3]{x^2} + x}{3 + \sqrt{x^3} + 3x} = \lim_{x \to +\infty} \frac{2x^{1/2} + 5x^{2/3} + x}{3 + x^{3/2} + 3x}$$
and let us set $f(x) = x$ and $g(x) = x^{3/2}$. As $x \to +\infty$, we have
$$2x^{1/2} + 5x^{2/3} = o(f) \quad \text{and} \quad 3 + 3x = o(g)$$
By (12.39), we then have
$$\frac{2x^{1/2} + 5x^{2/3} + x}{3 + x^{3/2} + 3x} \sim \frac{x}{x^{3/2}} = \frac{1}{\sqrt{x}} \to 0 \quad \text{as } x \to +\infty$$

(ii) Consider the limit
$$\lim_{x \to +\infty} \frac{\frac{1}{x^2} + \frac{2}{x^4} + \frac{1}{e^x}}{\frac{1}{x^4} + \frac{1}{x^8} + \frac{3}{x^{10}}} = \lim_{x \to +\infty} \frac{x^{-2} + 2x^{-4} + e^{-x}}{x^{-4} + x^{-8} + 3x^{-10}}$$
As $x \to +\infty$, we have $x^{-8} + 3x^{-10} = o(x^{-4})$ and, by Proposition 546-(i), $e^{-x} + 2x^{-4} = o(x^{-2})$. By (12.39), we then have
$$\frac{x^{-2} + 2x^{-4} + e^{-x}}{x^{-4} + x^{-8} + 3x^{-10}} \sim \frac{x^{-2}}{x^{-4}} = x^2 \to +\infty \quad \text{as } x \to +\infty$$

(iii) Consider the limit
$$\lim_{x \to 0} \frac{1 - \cos x}{\sin^2 x + x^3}$$
By applying first (12.39) and then Lemma 547-(ii), we get
$$\frac{1 - \cos x}{\sin^2 x + x^3} \sim \frac{1 - \cos x}{\sin^2 x} \sim \frac{1 - \cos x}{x^2} \to \frac{1}{2} \quad \text{as } x \to 0$$
N
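Point (i) of Example 549 can be checked numerically: the full ratio and its asymptotic simplification $1/\sqrt{x}$ move together. A minimal sketch, illustration only, with arbitrary sample points:

```python
# Illustration only: the full ratio of Example 549-(i) versus its asymptotic
# simplification 1/sqrt(x).
import math

def full(x):
    return (2 * math.sqrt(x) + 5 * x ** (2 / 3) + x) / (3 + x ** 1.5 + 3 * x)

for x in (1e2, 1e4, 1e6):
    print(f"x = {x:.0e}: full ratio = {full(x):.6f}   1/sqrt(x) = {1 / math.sqrt(x):.6f}")
# The ratio of the two columns approaches 1, and both tend to 0 as x grows.
```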

12.7.3 Terminology

Here too, for the comparison of two functions that both either converge to $0$ or diverge to $\pm\infty$, there is a specific terminology. In particular,

(i) a function $f$ such that $\lim_{x \to x_0} f(x) = 0$ is called infinitesimal as $x \to x_0$;

(ii) a function $f$ such that $\lim_{x \to x_0} f(x) = \pm\infty$ is called infinite as $x \to x_0$;

(iii) if two functions $f$ and $g$ are infinitesimal at $x_0$ and such that $f = o(g)$ as $x \to x_0$, then $f$ is said to be infinitesimal of higher order at $x_0$ with respect to $g$;

(iv) if two functions $f$ and $g$ are infinite at $x_0$ and such that $f = o(g)$ as $x \to x_0$, then $f$ is said to be infinite of lower order with respect to $g$.

A function is, therefore, infinitesimal of higher order than another one if it tends to zero faster, while it is infinite of lower order if it tends to infinity more slowly.

Example 550 (i) The functions defined by $(x - x_0)^a$ are infinitesimal as $x \to x_0^+$ when $a > 0$ and infinite when $a < 0$. (ii) The functions defined by $\alpha^x$ are infinite as $x \to +\infty$ and infinitesimal as $x \to -\infty$ when $\alpha > 1$, and vice versa when $0 < \alpha < 1$. N

12.7.4 The usual bestiary

We recast the results, already provided for sequences, concerning the comparison among exponential functions $\alpha^x$, power functions $x^k$, and logarithmic functions $\log^h x$. As $x \to +\infty$, they are infinite when $\alpha > 1$, $k > 0$ and $h > 0$, and infinitesimal when $0 < \alpha < 1$, $k < 0$ and $h < 0$.

(i) If $\alpha > \beta > 0$, then $\beta^x = o(\alpha^x)$; indeed, $\beta^x / \alpha^x = (\beta/\alpha)^x \to 0$.

(ii) $x^k = o(\alpha^x)$ for every $\alpha > 1$ and $k > 0$, as already proved with the ratio criterion. If instead $0 < \alpha < 1$ and $k > 0$, then $\alpha^x = o(x^k)$.

(iii) If $k_1 > k_2 > 0$, then $x^{k_2} = o(x^{k_1})$; indeed, $x^{k_2} / x^{k_1} = x^{k_2 - k_1} \to 0$.

(iv) If $k > 0$, then $\log^h x = o(x^k)$.

(v) If $h_1 > h_2$, then $\log^{h_2} x = o(\log^{h_1} x)$; indeed, $\log^{h_2} x / \log^{h_1} x = \log^{h_2 - h_1} x \to 0$.

We can still add:

(vi) $\alpha^x = o(x^x)$ for every $\alpha > 0$; indeed, $\alpha^x / x^x = (\alpha/x)^x \to 0$.

The previous results can be organized in scales of infinities and infinitesimals, in analogy with what we saw for sequences. For brevity we omit the details.
Chapter 13

Continuous functions

Ibis redibis, non morieris in bello (you will go, you will return, you will not die in war). So the oracle muttered to the inquiring king, who had to decide whether to go to war. Or, maybe, the oracle actually said: ibis redibis non, morieris in bello (you will go, you will not return, you will die in war). A small change in a comma, a dramatic difference in meaning.
When small changes have large effects, instability may result: a small change may, suddenly, dramatically alter matters. In contrast, stability prevails when small changes can only have small effects, so nothing dramatic can happen because of small alterations. Continuity is the mathematical translation of this general principle of stability for the relations between dependent and independent variables that functions represent.

13.1 Generalities

Intuitively, a function is continuous when the relation between the independent variable $x$ and the dependent variable $y$ is "regular", without breaks. The graph of a continuous function can be drawn without ever lifting the pencil.
This means that a function is continuous at a point $x_0$ of the domain if the behavior of the function towards $x_0$ is consistent with the value $f(x_0)$ that it actually assumes at $x_0$, that is, if the limit $\lim_{x \to x_0} f(x)$ is equal to the image $f(x_0)$.

Definition 551 A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is said to be continuous at a limit point $x_0 \in A$ if
$$\lim_{x \to x_0} f(x) = f(x_0) \qquad (13.1)$$
By convention, $f$ is continuous at each isolated point of $A$.

Note that we required $x_0$ to belong to the domain $A$. Indeed, continuity is a consistency property of the function at the points of its domain, so it loses meaning at points where the function is not defined.
The definition distinguishes between the points of $A$ that are limit points, for which it makes sense to talk of limits, and the points of $A$ that are isolated.¹ For the latter points the notion of continuity is, conceptually, vacuous: being isolated, they cannot be approached by other points of $A$ and, therefore, there is no limit behavior for which to require consistency.

¹ Recall that a point of $A$ is either a limit point or an isolated point, tertium non datur (Section 5.3.2).


Nevertheless, it is convenient to assume that a function be continuous at the isolated points of its domain. As an example, consider the function $f : \mathbb{R}_+ \cup \{-1\} \to \mathbb{R}$ defined by
$$f(x) = \begin{cases} \sqrt{x} & \text{for } x \geq 0 \\ 1 & \text{for } x = -1 \end{cases}$$
Here $x_0 = -1$ is an isolated point of the domain. Hence, we can (conveniently) say that $f$ is continuous at every point of its domain.

[Figure: graph of $f$, with the isolated point $(-1, 1)$]

In sum, as a matter of convenience, we assume by convention that functions are automatically continuous at isolated points. That said, this is essentially a punctilio, as the important case is, clearly, when $x_0$ is a limit point of $A$. In such a case, condition (13.1) requires consistency between the limit behavior of the function towards $x_0$ and the value $f(x_0)$ that it assumes at $x_0$. As we have seen in the previous chapter, such consistency might well not hold. For example, we considered the function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} x & \text{for } x < 1 \\ 2 & \text{for } x = 1 \\ 1 & \text{for } x > 1 \end{cases} \qquad (13.2)$$

For this function $\lim_{x \to 1} f(x) = 1 \neq f(1)$ because at $x_0 = 1$ there is a jump:

[Figure: graph of (13.2), with a jump at $x = 1$]

The function $f$ is, thus, not continuous at the point $x_0 = 1$ because there is no consistency between its behavior at the limit and its value at $x_0$. On the other hand, $f$ is continuous at all the other points of its domain: indeed, it is immediate to verify that $\lim_{x \to x_0} f(x) = f(x_0)$ for every $x_0 \neq 1$, so $f$ does not exhibit other jumps besides the one at $x_0 = 1$.

The distinction between limit points and isolated points becomes superfluous in the important case of functions $f : I \to \mathbb{R}$ defined on an interval $I$ of the real line. Indeed, the points of any such interval — be it bounded or unbounded, closed, open or semi-closed — are always limit points, so that $f$ is continuous at any $x_0 \in I$ if $\lim_{x \to x_0} f(x) = f(x_0)$.
A function continuous at all the points of a subset $E$ of the domain $A$ is said to be continuous on $E$. The set of all continuous functions on a set $E$ is denoted by $C(E)$. For example, function (13.2) is not continuous on $\mathbb{R}$, but it is continuous on $\mathbb{R} \setminus \{1\}$. A function continuous at all the points of its domain is called continuous, without further specification. For example, the function $\sin x$ is continuous.

We now provide an important characterization of continuity through sequences, based on Proposition 528.² Note that it does not distinguish between isolated and limit points $x_0$.

Proposition 552 A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is continuous at a point $x_0$ of $A$ if and only if $f(x_n) \to f(x_0)$ for every sequence $\{x_n\}$ of points of $A$ such that $x_n \to x_0$.

We can summarily express this sequential characterization of the continuity of $f$ at $x_0$ as follows:
$$x_n \to x_0 \implies f(x_n) \to f(x_0)$$

Proof The result follows immediately from Proposition 528 once we observe that, when $x_0$ is an isolated point of $A$, the unique sequence contained in $A$ that tends to $x_0$ is the constant one $\{x_0, x_0, \ldots\}$.

Let us give some examples. We start by observing that elementary functions are continuous.

² The condition $x_n \neq x_0$ of Proposition 528 does not appear here because $x_0$ belongs to $A$.

Example 553 (i) Let $f : (0, +\infty) \to \mathbb{R}$ be given by $f(x) = \log x$. Since $\lim_{x \to x_0} \log x = \log x_0$ for every $x_0 > 0$, the function is continuous.

(ii) Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = a^x$, with $a > 0$. Since $\lim_{x \to x_0} a^x = a^{x_0}$ for every $x_0 \in \mathbb{R}$, the function is continuous.

(iii) Let $f, g : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = \sin x$ and $g(x) = \cos x$. Since $\lim_{x \to x_0} \sin x = \sin x_0$ and $\lim_{x \to x_0} \cos x = \cos x_0$, both functions are continuous.

(iv) The function $f : \mathbb{Q} \to \mathbb{R}$ on rationals given by
$$f(x) = \begin{cases} x + 1 & \text{if } x > \sqrt{2} \\ x & \text{if } x < \sqrt{2} \end{cases}$$
is continuous because $\lim_{x \to x_0} f(x) = f(x_0)$ holds at each $x_0 \in \mathbb{Q}$ (cf. Example 522). In a similar vein, the function $f : \mathbb{Q} \to \mathbb{R}$ given by
$$f(x) = \frac{1}{x^2 - 2}$$
is continuous. N

Let us now see some examples of discontinuity.

Example 554 The function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} \dfrac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases} \qquad (13.3)$$
is not continuous at $x_0 = 0$, so it is not continuous on its domain $\mathbb{R}$, but it is continuous on $\mathbb{R} \setminus \{0\}$, as its graph shows:

[Figure: graph of (13.3), with a hyperbolic branch on each side of the origin]

The same is true for the function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} \dfrac{1}{x^2} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases} \qquad (13.4)$$
Its graph vividly shows the discontinuity at the origin:

[Figure: graph of (13.4), diverging to $+\infty$ on both sides of the origin] N

Example 555 The function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} 2 & \text{if } x > 1 \\ x & \text{if } x \leq 1 \end{cases} \qquad (13.5)$$
is not continuous at $x_0 = 1$, so it is not continuous on its domain $\mathbb{R}$, but it is continuous on the open set $(-\infty, 1) \cup (1, +\infty)$, as is clear from its graph. N

Example 556 (i) The Dirichlet function is not continuous at any point of its domain because $\lim_{x \to x_0} f(x)$ does not exist for any $x_0 \in \mathbb{R}$ (Example 505).
(ii) The function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} \sin \dfrac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases} \qquad (13.6)$$
is continuous everywhere except at the origin, where it does not have a limit:

[Figure: graph of (13.6), oscillating ever faster near the origin]
N
Example 557 Define $f : \mathbb{R} \to \mathbb{R}$ by
$$f(x) = \begin{cases} 2x + b & \text{if } x \leq 2 \\ 4 - x^2 & \text{if } x > 2 \end{cases} \qquad (13.7)$$
For which values of $b$ is $f$ continuous at $x_0 = 2$ (so, on its domain)? To answer this question, it is necessary to find the value of $b$ such that
$$\lim_{x \to 2^-} f(x) = \lim_{x \to 2^+} f(x) = f(2)$$
We have $\lim_{x \to 2^-} f(x) = 4 + b = f(2)$ and $\lim_{x \to 2^+} f(x) = 0$, so that $f$ is continuous at $x_0 = 2$ if and only if $4 + b = 0$, i.e., when $b = -4$. Summing up, for $b = -4$ the function (13.7) is continuous on $\mathbb{R}$, while for $b \neq -4$ it is continuous on $\mathbb{R} \setminus \{2\}$. N
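A short numerical check of Example 557; the step size $10^{-9}$ is an arbitrary choice:

```python
# Illustration only: the one-sided limits of (13.7) at x0 = 2 match f(2) exactly
# when b = -4, and only then.
def f(x, b):
    return 2 * x + b if x <= 2 else 4 - x ** 2

for b in (-4, 0):
    left = f(2 - 1e-9, b)    # approximates the left limit 4 + b
    right = f(2 + 1e-9, b)   # approximates the right limit 0
    print(f"b = {b:>2}: left ~ {left:.6f}, right ~ {right:.6f}, f(2) = {f(2, b)}")
```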
Note that when $f$ is continuous at $x_0$, we can write
$$\lim_{x \to x_0} f(x) = f(x_0) = f\left(\lim_{x \to x_0} x\right)$$
so that $f$ and $\lim$ become exchangeable; in other words, limits can be taken inside arguments. This exchangeability is the essence of the concept of continuity.

Let us now consider some functions of several variables.



Example 558 (i) Let $f : \mathbb{R}^n \to \mathbb{R}$ be given by
$$f(x) = 1 + \sum_{i=1}^n x_i$$
By proceeding as in Example 526, we can verify that $\lim_{x \to x_0} f(x) = f(x_0)$ for every $x_0 \in \mathbb{R}^n$. The function is, therefore, continuous.
(ii) The function
$$f(x_1, x_2) = x_1^2 + \frac{1}{x_2}$$
is continuous at each point of its natural domain
$$A = \{x = (x_1, x_2) \in \mathbb{R}^2 : x_2 \neq 0\}$$
(iii) A separable function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ given by
$$f(x) = \sum_{i=1}^n g_i(x_i)$$
is continuous at $x_0 \in A$ if each function $g_i$ is continuous, as the reader can check by using the sequential characterization of continuity (Proposition 552) along with the equivalence (8.68). N

In reading Proposition 552 one should not forget that, as emphasized in Section 12.3.2, in the multivariable case there are infinitely many directions along which to approach a point $x_0$. The next example is a stark illustration of this remark.

Example 559 Following Peano (1884),³ define $f : \mathbb{R}^2 \to \mathbb{R}$ by
$$f(x_1, x_2) = \begin{cases} \dfrac{x_1 x_2^2}{x_1^2 + x_2^4} & \text{if } (x_1, x_2) \neq (0, 0) \\ 0 & \text{if } (x_1, x_2) = (0, 0) \end{cases}$$
This function is separately continuous, that is, it is continuous in each variable once the other is fixed — so that the function becomes a scalar function of the "free" variable.⁴ Indeed, by setting $x_2 = k$ we have
$$\lim_{x \to x_0} f(x, k) = \lim_{x \to x_0} \frac{x k^2}{x^2 + k^4} = \frac{x_0 k^2}{x_0^2 + k^4} = f(x_0, k) \qquad \forall k \in \mathbb{R}$$
while by setting $x_1 = k$ we have
$$\lim_{x \to x_0} f(k, x) = \lim_{x \to x_0} \frac{k x^2}{k^2 + x^4} = \frac{k x_0^2}{k^2 + x_0^4} = f(k, x_0) \qquad \forall k \in \mathbb{R}$$

³ With his sharp mind, Giuseppe Peano was a master of counterexamples (a few of them entered calculus folklore).
⁴ See Section 20.4.1 for more on this.
400 CHAPTER 13. CONTINUOUS FUNCTIONS

So, the function behaves according to continuity as we approach any point $(x_1, x_2)$ of the plane along the vertical and horizontal directions:

[Figure: sections of Peano's function along the horizontal and vertical directions]

Yet, the function is not continuous at the origin $\mathbf{0} = (0, 0)$. Indeed, let us approach the origin along the parabola $(x_1, x_2) = (t^2, t)$, with $t \in \mathbb{R}$. We have
$$\lim_{t \to 0} f(t^2, t) = \lim_{t \to 0} \frac{t^4}{t^4 + t^4} = \frac{1}{2} \neq 0 = f(\mathbf{0})$$
So, if we take the sequence $x^n = (1/n^2, 1/n)$, we have $x^n \to \mathbf{0}$ but not $f(x^n) \to f(\mathbf{0})$. By Proposition 552, the function is not continuous at the origin. N

Surprisingly, a function can thus be separately continuous in its arguments, so along the directions identified by the axes, but not jointly continuous in them, so not along all directions.⁵ Single-variable intuition about continuity may prove wrong when moving to the multivariable case.
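Peano's example is easy to probe numerically; the sketch below (our own illustration, not part of the text) shows the function vanishing along the axes but approaching $1/2$ along the parabola $(t^2, t)$:

```python
# Along the axes the function tends to f(0,0) = 0, but along the parabola
# (t^2, t) it tends to 1/2, so the limit at the origin does not exist.

def f(x1, x2):
    return 0.0 if (x1, x2) == (0.0, 0.0) else x1 * x2**2 / (x1**2 + x2**4)

for t in (0.1, 0.01, 0.001):
    print(f"t={t}: axis x1: {f(t, 0.0):.6f}, axis x2: {f(0.0, t):.6f}, "
          f"parabola: {f(t**2, t):.6f}")

# The three columns approach 0, 0, and 0.5 as t -> 0.
```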

Our final example of a continuous function of several variables is noteworthy.

Proposition 560 The norm $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ is a continuous function.

Proof Consider a sequence $\{x^n\}$ in $\mathbb{R}^n$ such that $x^n \to x \in \mathbb{R}^n$. By the norm inequality (4.11),
$$\big| \|x^n\| - \|x\| \big| \leq \|x^n - x\| \to 0$$
and so $\|x^n\| \to \|x\|$. We conclude that
$$x^n \to x \implies \|x^n\| \to \|x\| \tag{13.8}$$
This proves that the norm $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ is continuous (Proposition 552).

O.R. Naively, we could claim that a function such as $f(x) = 1/x$ has a (huge) discontinuity at $x = 0$. After all, it makes a "big jump" by passing from $-\infty$ to $+\infty$.

⁵ Later in the book, Example 1769 will present another example of this kind.

[Figure: graph of $f(x) = 1/x$]

In contrast, the function $g(x) = \log x$ does not suffer from any such problem, so it seems "more continuous":

[Figure: graph of $g(x) = \log x$]

If we pay close attention to these two functions, however, we realize that $1/x$ commits the little sin of not being defined at $x = 0$, an "original" sin, while $\log x$ commits the much more serious sin of being defined neither at $x = 0$ nor at any $x < 0$.
The truth is that, at the points at which a function is not defined, it is meaningless to wonder about its continuity,⁶ a property of a function that can be considered only at points where the function is defined. At the points where they are defined, the functions $1/x$ and $\log x$ are both continuous. H

⁶ It would be like asking whether green pigs are able to fly: they do not exist, so the question is meaningless.

13.2 Discontinuity
As the previous examples indicate, for functions of a single variable $f : I \to \mathbb{R}$ there are different types of discontinuity.⁷ Specifically, $f$ may fail to be continuous at $x_0$ because:

(i) the two-sided limit at $x_0$ exists finite but is different from $f(x_0)$, i.e.,
$$\lim_{x \to x_0} f(x) \neq f(x_0)$$

(ii) the one-sided limits at $x_0$ exist finite but are different, i.e.,
$$\lim_{x \to x_0^-} f(x) \neq \lim_{x \to x_0^+} f(x)$$
(so, $\lim_{x \to x_0} f(x)$ does not exist);

(iii) at least one of the one-sided limits at $x_0$ is either $\pm\infty$ or does not exist.

The discontinuity at $x_0 = 1$ of the function (13.2) is of type (i) because
$$\lim_{x \to 1} f(x) = 1 \neq 2 = f(1)$$
The discontinuity at $x_0 = 1$ of the function (13.5) is of type (ii) because
$$\lim_{x \to 1^-} f(x) = 1 \neq \lim_{x \to 1^+} f(x) = 2$$
The discontinuity at $x_0 = 0$ of the function (13.3) is of type (iii) because
$$\lim_{x \to 0^-} f(x) = -\infty \neq \lim_{x \to 0^+} f(x) = +\infty$$
Similarly, the discontinuity at $x_0 = 0$ of the function (13.4) is of type (iii) because
$$\lim_{x \to 0^-} f(x) = \lim_{x \to 0^+} f(x) = \lim_{x \to 0} f(x) = +\infty$$
(the two-sided limit here exists, but it is infinite). Finally, the discontinuity at each point $x_0 \in \mathbb{R}$ of the Dirichlet function is also of type (iii) because it is easy to see that its one-sided limits do not exist; the same is true for the discontinuity at the origin of the function (13.6). For convenience, next we line up these discontinuity examples:

[Figure: the graphs of the four discontinuity examples, lined up]

⁷ Recall that $f(x_0) \in \mathbb{R}$: we cannot have $f(x_0) = \pm\infty$. To ease matters, in this section we focus on functions $f : I \to \mathbb{R}$ with interval domains.

When the discontinuity at a point x0 is of type (i) it is called removable, otherwise it is


called non-removable. In particular, (ii) is called (non-removable) jump discontinuity, while
(iii) is called (non-removable) essential discontinuity.

A removable discontinuity can be "fixed", as the terminology suggests, by modifying the function $f$ at $x_0$ as follows:
$$\tilde{f}(x) = \begin{cases} f(x) & \text{if } x \neq x_0 \\ \lim_{x \to x_0} f(x) & \text{if } x = x_0 \end{cases} \tag{13.9}$$
The function $\tilde{f}$ is the "fixed" version of the function $f$ that restores continuity at $x_0$.

Example 561 (i) The fixed version of the function (13.2) is
$$\tilde{f}(x) = \begin{cases} f(x) & \text{if } x \neq 1 \\ \lim_{x \to 1} f(x) & \text{if } x = 1 \end{cases} = \begin{cases} x & \text{if } x \leq 1 \\ 1 & \text{if } x > 1 \end{cases}$$
(ii) Define $f : \mathbb{R} \to \mathbb{R}$ by
$$f(x) = \begin{cases} \dfrac{x^2 - 1}{x - 1} & \text{if } x \neq 1 \\ 0 & \text{if } x = 1 \end{cases}$$
This function has a removable discontinuity at $x_0 = 1$. Its fixed version is
$$\tilde{f}(x) = \begin{cases} \dfrac{x^2 - 1}{x - 1} & \text{if } x \neq 1 \\ 2 & \text{if } x = 1 \end{cases}$$
where the break at $1$ is repaired. N
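In code, the "fix" (13.9) amounts to replacing the value at the problematic point with the limit; a minimal Python sketch for part (ii) (our own illustration):

```python
# (x^2 - 1)/(x - 1) equals x + 1 for x != 1, so the limit at 1 is 2.

def f(x):
    return (x**2 - 1) / (x - 1) if x != 1 else 0.0   # removable break at 1

def f_fixed(x):
    return (x**2 - 1) / (x - 1) if x != 1 else 2.0   # limit value plugged in

print(f(0.999), f(1.001))   # both close to 2, yet f(1) = 0
print(f_fixed(1))           # 2.0: continuity restored
```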

Such a fixing is, in general, no longer possible for non-removable discontinuities, in particular for jump discontinuities. To see it, let $f : I \to \mathbb{R}$ be a function with a jump discontinuity at an interior point $x_0$ of $I$. The jump (or saltus) of $f$ at $x_0$ is given by the difference
$$\lim_{x \to x_0^+} f(x) - \lim_{x \to x_0^-} f(x)$$
For example, the function (13.5) has at $x_0 = 1$ a jump equal to
$$\lim_{x \to x_0^+} f(x) - \lim_{x \to x_0^-} f(x) = 2 - 1 = 1$$
There is no easy fix for this non-removable discontinuity. To restore continuity one has to play with the domain by considering suitable restrictions that get rid of the problematic points, as we did in Example 555 by getting rid of the discontinuity point $x_0 = 1$.
Thus, jumps at interior points have no easy fix. Matters are, instead, simpler at the endpoints of the interval $I$ that belong to it. In this case, the jump would be
$$\lim_{x \to x_0^+} f(x) - f(x_0)$$
at the lower endpoint $x_0 = \min I$ and
$$f(x_0) - \lim_{x \to x_0^-} f(x)$$
at the upper endpoint $x_0 = \max I$. This kind of jump discontinuity is actually removable. Formally, this is the case because at endpoints one-sided and two-sided limits coincide.⁸ Intuitively, it is enough to look at the graph of the function, as the next example illustrates.

Example 562 When $I = [a, b]$, the possible jumps at the endpoints are
$$\lim_{x \to a^+} f(x) - f(a) \quad \text{and} \quad f(b) - \lim_{x \to b^-} f(x)$$
For instance, define $f : [0, 1] \to \mathbb{R}$ by
$$f(x) = \begin{cases} x & \text{if } x \in (0, 1) \\ \dfrac{1}{2} & \text{if } x \in \{0, 1\} \end{cases}$$
with graph

[Figure: graph of $f$ on $[0,1]$, with isolated value $1/2$ at each endpoint]

This function is continuous on $(0, 1)$ but not at the endpoints, where it has the jumps
$$\lim_{x \to 0^+} f(x) - f(0) = -\frac{1}{2} \quad \text{and} \quad f(1) - \lim_{x \to 1^-} f(x) = -\frac{1}{2}$$
Clearly, the fixed version $\tilde{f} : [0, 1] \to \mathbb{R}$ of the function $f$ is the identity function given by $\tilde{f}(x) = x$. N

Summing up, the distinction between removable and jump discontinuities is sharp for functions defined on open intervals, with jump discontinuity being a significantly more severe type of discontinuity. For functions defined on general intervals, the distinction is relevant only at the interior points because a jump discontinuity at an endpoint is actually removable.

⁸ Recall (12.18) of Section 12.2.3.

A monotone function $f : I \to \mathbb{R}$ cannot have discontinuities of type (iii). Indeed, suppose that $f$ is increasing (similar considerations hold in the decreasing case). Increasing monotonicity guarantees that the right and the left limits exist, with
$$\lim_{x \to x_0^-} f(x) \leq \lim_{x \to x_0^+} f(x) \leq \lim_{x \to y_0^-} f(x) \leq \lim_{x \to y_0^+} f(x)$$
for each pair of distinct points $x_0 < y_0$ of the domain of $f$. Therefore, these limits exist finite, which excludes discontinuities of type (iii).
Moreover, $f$ cannot have removable discontinuities at interior points of $I$ because they would violate monotonicity. Therefore, a monotone function can only have jump discontinuities. Indeed, the next result shows that a monotone function can have at most countably many jump discontinuities. The proof of this useful result is based on the following lemma, which is of independent interest.

Lemma 563 A collection of disjoint intervals of $\mathbb{R}$ is at most countable.

Proof Let $\{I_j\}_{j \in J}$ be a collection of disjoint intervals of $\mathbb{R}$. By the density of the rational numbers, each interval $I_j$ contains a rational number $q_j$. Since these intervals are disjoint, we have $q_j \neq q_{j'}$ for $j \neq j'$. Then, the set of rational numbers $\{q_j\}_{j \in J}$ is a subset of $\mathbb{Q}$ and is, therefore, at most countable. In turn, this implies that the index set $J$ is, at most, countable.

The disjointness hypothesis cannot be removed: for instance, the collection of overlapping intervals $\{(-r, r) : r > 0\}$ is clearly uncountable.

Proposition 564 A monotone function $f : I \to \mathbb{R}$ can have at most countably many jump discontinuities.

Proof Consider first an open interval $I = (a, b)$ with $a, b \in \overline{\mathbb{R}}$ (i.e., bounded or not). A jump discontinuity of the function $f$ at a point $x_0$ determines a bounded interval in its codomain, i.e., in the real line, with endpoints given by $\lim_{x \to x_0^-} f(x)$ and $\lim_{x \to x_0^+} f(x)$. By the monotonicity of $f$, these intervals determined by jumps are disjoint. By Lemma 563, the intervals, and therefore the jumps of $f$, are at most countable. This proves the result when $I = (a, b)$.
Now, let $I$ be any interval. Its interior has the form $(a, b)$, with $a, b \in \overline{\mathbb{R}}$. By what was just proved, $f$ has at most countably many jump discontinuities on $(a, b)$. This observation readily implies the result (why?).

In the proof the monotonicity hypothesis is key for having countably many discontinuities: it guarantees that the intervals defined by the jumps of the function do not overlap.

13.3 Operations and composition


The next result illustrates the behavior of continuity with respect to the algebra of functions.

Proposition 565 Let $f, g : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be continuous at $x_0 \in A$. Then:

(i) the function $f + g$ is continuous at $x_0$;

(ii) the function $fg$ is continuous at $x_0$;

(iii) the function $f/g$ is continuous at $x_0$, provided that $g(x_0) \neq 0$.

Proof We prove (i), leaving the other points to the reader. Since $\lim_{x \to x_0} f(x) = f(x_0) \in \mathbb{R}$ and $\lim_{x \to x_0} g(x) = g(x_0) \in \mathbb{R}$, Proposition 536-(i) yields
$$\lim_{x \to x_0} (f + g)(x) = \lim_{x \to x_0} f(x) + \lim_{x \to x_0} g(x) = f(x_0) + g(x_0) = (f + g)(x_0)$$
Therefore, $f + g$ is continuous at $x_0$.

For example, each polynomial $f(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \cdots + \alpha_n x^n$ is continuous. Indeed, for each $x_0 \in \mathbb{R}$ we have
$$\begin{aligned} \lim_{x \to x_0} f(x) &= \lim_{x \to x_0} \left( \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \cdots + \alpha_n x^n \right) \\ &= \lim_{x \to x_0} \alpha_0 + \lim_{x \to x_0} \alpha_1 x + \lim_{x \to x_0} \alpha_2 x^2 + \cdots + \lim_{x \to x_0} \alpha_n x^n \\ &= \alpha_0 + \alpha_1 x_0 + \alpha_2 x_0^2 + \cdots + \alpha_n x_0^n = f(x_0) \end{aligned}$$

Continuity is preserved by the composition of functions:

Proposition 566 Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ and $g : B \subseteq \mathbb{R} \to \mathbb{R}$ be such that $\operatorname{Im} f \subseteq B$. If $f$ is continuous at $x_0 \in A$ and $g$ is continuous at $f(x_0)$, then $g \circ f$ is continuous at $x_0$.

Proof Let $\{x^n\} \subseteq A$ be such that $x^n \to x_0$. By Proposition 552, $f(x^n) \to f(x_0)$. Since $g$ is continuous at $f(x_0)$, another application of Proposition 552 shows that $g(f(x^n)) \to g(f(x_0))$. Therefore, $g \circ f$ is continuous at $x_0$.

As the next example shows, the result can be useful also in the computation of limits since, when its hypotheses hold, we can write
$$\lim_{x \to x_0} (g \circ f)(x) = (g \circ f)(x_0) = g(f(x_0)) = g\Big( \lim_{x \to x_0} f(x) \Big) \tag{13.10}$$
If a limit involves a composition of continuous functions, (13.10) makes its computation immediate.

Example 567 Let $f : \mathbb{R} \setminus \{-\pi\} \to \mathbb{R}$ be given by $f(x) = x^2/(x + \pi)$ and $g : \mathbb{R} \to \mathbb{R}$ be given by $g(x) = \sin x$. Since $g$ is continuous, by Proposition 566 $g \circ f$ is continuous at each $x \in \mathbb{R} \setminus \{-\pi\}$. The observation is useful, for example, to compute the limit
$$\lim_{x \to \pi} \sin \frac{x^2}{x + \pi}$$
Indeed, once we observe that it can be written in terms of $g \circ f$, then by (13.10) we have
$$\lim_{x \to \pi} \sin \frac{x^2}{x + \pi} = \lim_{x \to \pi} (g \circ f)(x) = (g \circ f)(\pi) = \sin \frac{\pi^2}{2\pi} = \sin \frac{\pi}{2} = 1$$
Therefore, continuity permits us to calculate limits by substitution. N
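Substitution can also be checked numerically; in the Python sketch below (our own illustration) the values of the composition approach its value at $\pi$, as continuity prescribes:

```python
# As x -> pi, sin(x^2/(x + pi)) -> sin(pi/2) = 1 (Example 567).
import math

def g_of_f(x):
    return math.sin(x**2 / (x + math.pi))

for eps in (1e-2, 1e-4, 1e-6):
    print(g_of_f(math.pi - eps), g_of_f(math.pi + eps))  # both tend to 1.0

print(g_of_f(math.pi))  # continuity: the limit equals the value, 1.0
```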

13.4 Zeros and equilibria


Continuous functions have remarkable properties that assign them a key role in applications. In this section we study some of these properties, including a short preview of the all-important Weierstrass Theorem (whose detailed study is postponed to Chapter 22).

13.4.1 Zeros
The first result, Bolzano's Theorem,⁹ is very intuitive. Yet its proof, although simple, is not trivial, showing how statements that are intuitive may happen to be difficult to prove. Intuition is a fundamental guide in the search for new results, but it may be misleading. Sometimes, properties that appeared to be intuitively true turned out to be false.¹⁰ For this reason, proof is the unique way of establishing the validity of a result; intuition, even the most refined one, must at a certain point give way to the rigor of the mathematical argument.

Theorem 568 (Bolzano) Let $f : [a, b] \to \mathbb{R}$ be a continuous function. If $f(a) \cdot f(b) \leq 0$, then there exists $c \in [a, b]$ such that $f(c) = 0$. Moreover, if $f$ is strictly monotone, such $c$ is unique.

The condition $f(a) \cdot f(b) \leq 0$ is equivalent to asking that
$$f(a) \leq 0 \leq f(b) \quad \text{or} \quad f(b) \leq 0 \leq f(a)$$
That is, the values $f(a)$ and $f(b)$ cannot have the same (strict) sign. With this, the theorem should have a clear intuitive meaning, revealed by the next figure:

[Figure: a continuous function crossing zero between $a$ and $b$]

Proof If $f(a) \cdot f(b) = 0$, we have $f(a) = 0$ or $f(b) = 0$. In the first case, the result holds by setting $c = a$; in the second case, by setting $c = b$. If instead $f(a) \cdot f(b) < 0$, we have either $f(a) < 0 < f(b)$ or $f(a) > 0 > f(b)$. We first study the case $f(a) < 0 < f(b)$. Let
$$C = \{x \in [a, b] : f(x) < 0\}$$

⁹ The result is named after Bernard Bolzano, who gave a first proof in 1817.
¹⁰ Recall Guidi's crescendo in Section 10.3.2.

In words, $C$ is the set of points where $f$ is strictly negative. This set is not empty because $a \in C$. Let $c = \sup C$. By Proposition 127,
$$c \geq x \qquad \forall x \in C \tag{13.11}$$
and
$$\forall \varepsilon > 0, \; \exists x' \in C, \; x' > c - \varepsilon \tag{13.12}$$
We next prove that $f(c) = 0$. By contradiction, assume that $f(c) \neq 0$, i.e., that either $f(c) < 0$ or $f(c) > 0$. Let us study the two possibilities separately and show that in both of them we reach a contradiction.

Case 1: $f(c) < 0$. Hence, $c < b$ because $f(b) > 0$. Since $f$ is continuous, by Theorem 532 (permanence of sign) there exists $\delta > 0$ small enough so that:¹¹
$$f(x) < 0 \qquad \forall x \in (c - \delta, c + \delta) \cap [a, b]$$
Since $c < b$, we can actually take $\delta$ small enough so that also $\delta \leq b - c$. Thus, $c + \delta \leq b$ and so $(c, c + \delta) \subseteq [a, b]$. With this, take any $x \in (c, c + \delta)$. By the definition of $C$, we have $x \in C$. As $x > c$, this contradicts property (13.11), so the fact that $c$ is a supremum. We thus reached a contradiction.

Case 2: $f(c) > 0$. Again by Theorem 532 there exists $\delta > 0$ small enough so that
$$f(x) > 0 \qquad \forall x \in (c - \delta, c + \delta) \cap [a, b]$$
By the definition of $C$, we have $(c - \delta, c + \delta) \cap C = \emptyset$. Then, for each $0 < \varepsilon \leq \delta$ there is no $x' \in C$ such that $x' > c - \varepsilon$. But this contradicts property (13.12), so the fact that $c$ is a supremum. Thus, also in this case we reached a contradiction.

Summing up, in both cases we reached a contradiction. We conclude that $f(c) = 0$. This completes the proof when $f(a) < 0 < f(b)$. Next, assume that $f(a) > 0 > f(b)$. Consider the continuous function $-f$. Clearly, $(-f)(a) < 0 < (-f)(b)$. So, by what has just been proved, there exists $c \in [a, b]$ such that $(-f)(c) = 0$. Hence, $f(c) = 0$ as desired.
Finally, if $f$ is strictly monotone, it is injective (Proposition 218) and therefore there exists a unique point $c \in [a, b]$ such that $f(c) = 0$.
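Bolzano's Theorem is also the principle behind the bisection method for locating zeros numerically: one repeatedly halves a sign-changing interval. A minimal Python sketch of this idea (our own illustration, not part of the text):

```python
def bisect(f, a, b, tol=1e-10):
    """Find a zero of a continuous f on [a, b], assuming f(a)*f(b) <= 0."""
    fa, fb = f(a), f(b)
    assert fa * fb <= 0, "Bolzano's hypothesis: f(a), f(b) of opposite sign"
    while b - a > tol:
        c = (a + b) / 2
        fc = f(c)
        if fa * fc <= 0:      # the sign change lies in [a, c]
            b, fb = c, fc
        else:                 # otherwise it lies in [c, b]
            a, fa = c, fc
    return (a + b) / 2

print(bisect(lambda x: x**2 - 2, 0.0, 2.0))  # ~1.41421356..., i.e. sqrt(2)
```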

Bolzano's Theorem is a central result in the study of equations, as will be seen momentarily in Chapter 14. To preview this role, here we show a simple application to the existence of real solutions of a polynomial equation. To this end, let $f : \mathbb{R} \to \mathbb{R}$ be a polynomial
$$f(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \cdots + \alpha_n x^n \tag{13.13}$$
with $\alpha_n \neq 0$, so of degree $n$, that defines a polynomial (or algebraic) equation
$$f(x) = 0 \tag{13.14}$$

¹¹ For a function $f : [a, b] \to \mathbb{R}$ continuous at a point $x_0 \in [a, b]$, Theorem 532 ensures the existence of a neighborhood $B_\delta(x_0) = (x_0 - \delta, x_0 + \delta)$ such that $f(x) f(x_0) > 0$ for all $x \in B_\delta(x_0) \cap [a, b]$. Indeed, by the continuity of $f$ at $x_0$ we can take $L = f(x_0)$ in the statement of Theorem 532 (the use of $\varepsilon$ or $\delta$ is, instead, just an inconsequential choice of convenient notation).

of degree $n$. This equation does not always have real solutions: for example, when $f(x) = x^2 + 1$ the polynomial equation (13.14) becomes the second-degree equation
$$x^2 + 1 = 0$$
that has no real solutions. By Bolzano's Theorem, we have the following result, which guarantees that each polynomial equation of odd degree, e.g., a cubic equation, always has at least one real solution.

Corollary 569 If $f : \mathbb{R} \to \mathbb{R}$ is a polynomial of odd degree, there exists at least one $\hat{x} \in \mathbb{R}$ such that $f(\hat{x}) = 0$.

Proof Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \cdots + \alpha_n x^n$, with $\alpha_n \neq 0$ for some odd natural number $n$. Let us suppose $\alpha_n > 0$ (otherwise, we consider $-f$) and let $g : \mathbb{R} \to \mathbb{R}$ be given by $g(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \cdots + \alpha_{n-1} x^{n-1}$. We have $g(x) = o(x^n)$ both as $x \to +\infty$ and as $x \to -\infty$. We can therefore write $f(x) = \alpha_n x^n + o(x^n)$ both as $x \to +\infty$ and as $x \to -\infty$, which implies $\lim_{x \to +\infty} f(x) = +\infty$ and $\lim_{x \to -\infty} f(x) = -\infty$. So, there exist $x_1 < x_2$ such that $f(x_1) < 0 < f(x_2)$. The function $f$ is continuous on the interval $[x_1, x_2]$. Therefore, by Bolzano's Theorem there exists $\hat{x} \in (x_1, x_2)$ such that $f(\hat{x}) = 0$.
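As a concrete check of the corollary, the `bisect` sketch given after Bolzano's Theorem finds a real root of an odd-degree polynomial once the sign change guaranteed by the limit argument is bracketed (the cubic below is our own example):

```python
# x^3 - 2x - 5 is of odd degree, so it has a real root (Corollary 569).
p = lambda x: x**3 - 2*x - 5
print(p(2), p(3))            # -1 and 16: opposite signs bracket a root
print(bisect(p, 2.0, 3.0))   # ~2.0945514815..., the real root
```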

O.R. In presenting Bolzano's Theorem, we remarked on the limits of intuition. A nice example in this regard is the following. Imagine you put a rope around the Earth at the equator (about $40{,}000$ km) such that it perfectly adheres to the equator at each point. Now, imagine that you add one meter to the rope and you lift it by keeping its distance from the ground uniform. What is the measure of this uniform distance? We are all tempted to say "very, very small: one meter out of forty thousand km is nothing!" Instead, no: the distance is about $16$ cm. Indeed, if $c$ denotes the equatorial Earth circumference (in meters), the Earth radius is $r = c/2\pi$; if we add one meter, the new radius is $r' = (c + 1)/2\pi$ and the difference between the two is $r' - r = 1/2\pi \simeq 0.1592$ meters. This proves another remarkable result: the distance of about $16$ centimeters is independent of $c$: no matter whether it is the Earth, or the Sun, or a tennis ball, the addition of one meter to the length of the rope always causes a lift of about $16$ cm! As the manifesto of the Vienna Circle remarked, "Intuition ... is especially emphasized by metaphysicians as a source of knowledge.... However, rational justification has to pursue all intuitive knowledge step by step. The seeker is allowed any method; but what has been found must stand up to testing." H

13.4.2 Equilibria
The next result is a further consequence of Bolzano's Theorem, with a remarkable economic
application: the existence and the uniqueness of the market equilibrium price.

Proposition 570 Let $f, g : [a, b] \to \mathbb{R}$ be continuous. If $f(a) \geq g(a)$ and $f(b) \leq g(b)$, there exists $c \in [a, b]$ such that
$$f(c) = g(c)$$
If $f$ is strictly decreasing and $g$ is strictly increasing, such $c$ is unique.

Proof Let $h : [a, b] \to \mathbb{R}$ be defined by $h(x) = f(x) - g(x)$. Then
$$h(a) = f(a) - g(a) \geq 0 \quad \text{and} \quad h(b) = f(b) - g(b) \leq 0$$
Since $h$ is continuous, by Bolzano's Theorem there exists $c \in [a, b]$ such that $h(c) = 0$, that is, $f(c) = g(c)$.
If $f$ is strictly decreasing and $g$ is strictly increasing, then $h$ is strictly decreasing. Therefore, again by Bolzano's Theorem, $c$ is unique.

We now apply the result to establish the existence and uniqueness of the market equilibrium price. Let $D : [a, b] \to \mathbb{R}$ and $S : [a, b] \to \mathbb{R}$ be the demand and supply functions of some good, where $[a, b] \subseteq \mathbb{R}_+$ is the set of prices at which the good can be traded (see Section 8.4). A pair $(p, q) \in [a, b] \times \mathbb{R}_+$ of prices and quantities is called a market equilibrium if
$$q = D(p) = S(p)$$
A fundamental problem is the existence, and the possible uniqueness, of such an equilibrium. By Proposition 570, so ultimately by Bolzano's Theorem, we can solve the problem in a very general way. Let us assume that the functions $D$ and $S$ are both continuous, so that neither side of the market abruptly responds to price changes, with
$$S(a) \leq D(a) \quad \text{and} \quad S(b) \geq D(b)$$
At the smallest possible price $a$, the demand for the good is greater than its supply, while the opposite is true at the highest possible price $b$. These natural hypotheses ensure, by Proposition 570, the existence of an equilibrium price $p \in [a, b]$, i.e., such that $D(p) = S(p)$. The equilibrium quantity is $q = D(p) = S(p)$. Therefore, the pair of prices and quantities $(p, q)$ is a market equilibrium.
Moreover, again by Proposition 570, the market has a unique market equilibrium $(p, q)$ when the demand function $D$ is strictly decreasing, i.e., at greater prices, smaller quantities are demanded, and the supply function $S$ is strictly increasing, i.e., at greater prices, greater quantities are offered.
Because of its importance, we state this market equilibrium result formally.

Proposition 571 Let $D : [a, b] \to \mathbb{R}$ and $S : [a, b] \to \mathbb{R}$ be continuous. If $D(a) \geq S(a)$ and $D(b) \leq S(b)$, there exists a market equilibrium $(p, q) \in [a, b] \times \mathbb{R}_+$. If, in addition, $D$ is strictly decreasing and $S$ is strictly increasing, such an equilibrium is unique.

The next figure illustrates the result graphically; it corresponds to the classic "intersection" of demand and supply:
[Figure: the demand curve $D$ and supply curve $S$ crossing at the equilibrium]

In equilibrium analysis, Bolzano's Theorem is often directly applied via the excess demand function $E : [a, b] \to \mathbb{R}$ defined by
$$E(p) = D(p) - S(p)$$
We have $E(p) \geq 0$ when at the price $p$ the demand exceeds the supply; otherwise, we have $E(p) \leq 0$. Therefore, $p \in [a, b]$ is an equilibrium price if and only if it solves the market equation
$$E(p) = 0 \tag{13.15}$$
i.e., if and only if $p$ equalizes demand and supply. The equilibrium price $p$ is a zero of the excess demand function; the conditions on the functions $D$ and $S$ assumed in Proposition 571 guarantee the existence and uniqueness of such a zero.
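Numerically, the equilibrium price is then just a zero of $E$, found for instance with the `bisect` sketch given after Bolzano's Theorem; the demand and supply schedules below are our own toy specification, not from the text:

```python
# Toy market: D strictly decreasing, S strictly increasing on [0, 5].
D = lambda p: 10 - 2*p        # demand
S = lambda p: 1 + p           # supply
E = lambda p: D(p) - S(p)     # excess demand; E(p) = 0 at equilibrium

p_star = bisect(E, 0.0, 5.0)  # E(0) = 9 >= 0 and E(5) = -6 <= 0
print(p_star, D(p_star))      # p = 3.0, q = 4.0
```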

A final observation: the reader can easily verify that the uniqueness part of Proposition 570 holds as long as (i) the monotonicities of $f$ and $g$ are opposite, one increasing and the other decreasing, and (ii) at least one of them is strict. In the statement we assumed $f$ to be strictly decreasing and $g$ to be strictly increasing both for simplicity and in view of the application to market equilibrium.

13.5 Weierstrass' Theorem: a preview


A continuous function defined on a compact (i.e., closed and bounded) domain enjoys a fundamental property: it attains both its maximum and minimum values, that is, it has a maximizer and a minimizer. This result is contained in the celebrated Weierstrass' Theorem (sometimes called the Extreme Value Theorem). Here we state the theorem for functions of a single variable defined on a compact interval $[a, b]$. In Chapter 22 we will state and prove its general multivariable form.

Theorem 572 A continuous function $f : [a, b] \to \mathbb{R}$ has (at least) a minimizer and (at least) a maximizer in $[a, b]$, that is, there exist $x_1, x_2 \in [a, b]$ such that
$$f(x_1) = \max_{x \in [a, b]} f(x) \quad \text{and} \quad f(x_2) = \min_{x \in [a, b]} f(x)$$

The hypotheses of continuity of $f$ and of compactness (closedness and boundedness) of its domain are both indispensable. In the absence of either hypothesis, the existence of a maximizer or of a minimizer is no longer guaranteed, as the next simple examples show.
Example 573 (i) Let $f : [0, 1] \to \mathbb{R}$ be given by
$$f(x) = \begin{cases} x & \text{if } x \in (0, 1) \\ \dfrac{1}{2} & \text{if } x \in \{0, 1\} \end{cases}$$
Then $f$ is defined on the compact interval $[0, 1]$ but is not continuous (cf. Example 562).

[Figure: graph of $f$]

It is easy to see that $f$ has no maximizers. Indeed, first observe that the endpoints are not maximizers, as $f(x) > 1/2$ for each $x \in (1/2, 1)$. Now, suppose that $x \in (1/2, 1)$ is a candidate maximizer. As there exists $y$ with $x < y < 1$, we have $f(y) = y > x = f(x)$ and so the candidate fails. We conclude that there are no maximizers. A similar argument shows that there are no minimizers as well.
(ii) Let $f : (0, 1) \to \mathbb{R}$ be the identity function $f(x) = x$ defined over the open unit interval. Here $f$ is continuous but the interval $(0, 1)$ is not compact (it is open). In this case, too, the function has no maximizers and no minimizers.

[Figure: graph of the identity function on $(0, 1)$]

(iii) Let $f : [0, +\infty) \to \mathbb{R}$ be again the identity function $f(x) = x$, now defined on the positive half-line. This function is continuous but the interval $[0, +\infty)$ is not compact (it is closed but not bounded). It has no maximizers, and the only minimizer is the origin.

[Figure: graph of the identity function on $[0, +\infty)$]

(iv) Let $f : \mathbb{R} \to \mathbb{R}$ be given by (see Proposition 276)
$$f(x) = \begin{cases} 1 - \dfrac{1}{2} e^x & \text{if } x < 0 \\ \dfrac{1}{2} e^{-x} & \text{if } x \geq 0 \end{cases}$$
with graph

[Figure: graph of $f$, with horizontal asymptotes at heights $0$ and $1$]

The function $f$ is continuous (and bounded), but $\mathbb{R}$ is not compact (it is closed but not bounded). This function has no maximizers and no minimizers. N

Summing up, Weierstrass' Theorem shows that continuity and compactness are general conditions jointly ensuring the existence of maximizers and minimizers, a most important fact. Of course, they can exist even without these conditions: it is immediate to construct examples of discontinuous functions defined on non-compact sets that have maximizers and minimizers (for instance, take the Dirichlet function over the real line: the rationals are maximizers, the irrationals are minimizers). Yet, without continuity and compactness one needs to proceed painfully, example by example, to find maximizers and minimizers explicitly, as we no longer have a general result like the Weierstrass Theorem ensuring their existence.
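In computational practice, the existence guaranteed by Weierstrass' Theorem is what justifies approximating maximizers and minimizers by searching a fine grid of the compact domain; a minimal Python sketch (our own illustration, with an arbitrarily chosen function):

```python
# On a compact [a, b] a continuous f attains max and min (Theorem 572),
# so a fine grid search approximates the extremizers.
import math
f = lambda x: x * math.exp(-x)           # continuous on the compact [0, 4]
grid = [4 * i / 10**5 for i in range(10**5 + 1)]
x_max = max(grid, key=f)
x_min = min(grid, key=f)
print(x_max, f(x_max))                   # ~1.0 and 1/e: the maximizer
print(x_min, f(x_min))                   # 0.0: the minimizer
```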

13.6 Intermediate Value Theorem


13.6.1 The theorem
An important extension of Bolzano's Theorem is the Intermediate Value Theorem, to which we devote this section. The next lemma establishes a first remarkable property.

Lemma 574 Let $f : [a, b] \to \mathbb{R}$ be continuous, with $f(a) \leq f(b)$. If
$$f(a) \leq z \leq f(b)$$
then there exists $c \in [a, b]$ such that $f(c) = z$. If $f$ is strictly increasing, such $c$ is unique.

Proof If $f(a) = f(b)$, it is sufficient to set $c = a$ or $c = b$. Let $f(a) < f(b)$ and let $h : [a, b] \to \mathbb{R}$ be defined by $h(x) = f(x) - z$. We have
$$h(a) = f(a) - z \leq 0 \quad \text{and} \quad h(b) = f(b) - z \geq 0$$
Since $f$ is continuous, by Bolzano's Theorem there exists $c \in [a, b]$ such that $h(c) = 0$, that is, $f(c) = z$.
The function $h$ is strictly monotone if and only if $f$ is so. Therefore, by Bolzano's Theorem such $c$ is unique whenever $f$ is strictly monotone.

The function assumes, therefore, all the values between $f(a)$ and $f(b)$, without any "break". The lemma formalizes the intuition given at the beginning of the chapter that the graph of a continuous function can be drawn without ever lifting the pencil.

Together with Weierstrass' Theorem, the last lemma implies the following classic result.

Theorem 575 (Intermediate Value Theorem) Let $f : [a, b] \to \mathbb{R}$ be continuous. Set
$$m = \min_{x \in [a, b]} f(x) \quad \text{and} \quad M = \max_{x \in [a, b]} f(x)$$
Then, for any $z$ with
$$m \leq z \leq M$$
there exists $c \in [a, b]$ such that $f(c) = z$. If $f$ is strictly monotone, such $c$ is unique.

Proof Let $z \in [m, M]$. By Weierstrass' Theorem, $m$ and $M$ are well defined and so there exist $x_1, x_2 \in [a, b]$ such that $m = f(x_1)$ and $M = f(x_2)$. Assume first that $x_1 \leq x_2$ and consider the compact interval $[x_1, x_2] \subseteq [a, b]$. The function $f$ is continuous on $[x_1, x_2]$. Since $f(x_1) \leq z \leq f(x_2)$, by Lemma 574 there exists $c \in [x_1, x_2]$ such that $f(c) = z$.
If $x_1 > x_2$, consider the continuous function $-f$ on the compact interval $[x_2, x_1] \subseteq [a, b]$. Clearly, $-M = \min_{x \in [a, b]} (-f)(x)$ and $-m = \max_{x \in [a, b]} (-f)(x)$. As $-M \leq -z \leq -m$, by what has just been proved, there exists $c \in [x_2, x_1]$ such that $(-f)(c) = -z$. Hence, $f(c) = z$.
Finally, a strictly monotone $f$ is injective (Proposition 218) and, therefore, the point $c \in [a, b]$ such that $f(c) = z$ is unique.

The image of $f$ is thus the compact interval
$$\operatorname{Im} f = [m, M] \tag{13.16}$$
as illustrated by the following figure:

[Figure: a continuous $f$ on $[a, b]$ attaining every value $z = f(c)$ between $m$ and $M$]

By the Intermediate Value Theorem, a continuous $f$ maps the compact interval $[a, b]$ onto the compact interval $[m, M]$. We can express this property by saying that the continuous image of a compact interval is a compact interval. Momentarily, we will extend this property to any interval, compact or not (Proposition 580).
Bolzano's Theorem is, via Lemma 574, behind the Intermediate Value Theorem. Its statement is a special case. Indeed, observe that its hypothesis $f(a) \cdot f(b) \leq 0$ can be equivalently stated as
$$\min \{f(a), f(b)\} \leq 0 \leq \max \{f(a), f(b)\}$$
Clearly,
$$m \leq \min \{f(a), f(b)\} \leq 0 \leq \max \{f(a), f(b)\} \leq M$$
and so the Intermediate Value Theorem ensures the existence of $c \in [a, b]$ such that $f(c) = 0$.

The continuity of $f$ on $[a, b]$ is crucial for the Intermediate Value Theorem. To see this, consider, for example, the so-called signum function $\operatorname{sgn} : \mathbb{R} \to \mathbb{R}$ defined by
$$\operatorname{sgn} x = \begin{cases} \dfrac{x}{|x|} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases} = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0 \end{cases} \tag{13.17}$$
Its restriction $\operatorname{sgn} : [-1, 1] \to \mathbb{R}$ to the interval $[-1, 1]$ is continuous at all the points of this interval except at the origin, where it has a non-removable jump discontinuity. So, the continuity hypothesis of the Intermediate Value Theorem does not hold. The image of $\operatorname{sgn}$ consists of only three points $\{-1, 0, 1\}$, with $m = -1$ and $M = 1$. For every $z \in [-1, 1]$ with $z \neq -1, 0, 1$ there is no $x \in [-1, 1]$ such that $\operatorname{sgn} x = z$.

13.6.2 Some consequences


A nice consequence of the Intermediate Value Theorem is a characterization of scalar continuous injective functions that completes what we established in Proposition 218. In the rest of the section $I$ denotes any interval, bounded or not, of the real line.

Proposition 576 A continuous function $f : I \to \mathbb{R}$ is injective if and only if it is strictly monotone.

The proof of this result relies on a lemma that presents a useful characterization of strict monotonicity on open intervals.

Lemma 577 A function $f : (a, b) \to \mathbb{R}$, with $a, b \in \overline{\mathbb{R}}$, is strictly monotone if and only if
$$x < z < y \implies \min \{f(x), f(y)\} < f(z) < \max \{f(x), f(y)\} \tag{13.18}$$
for all $x, y, z \in (a, b)$.

Strict monotonicity thus holds if and only if $x < z < y$ implies either $f(x) < f(z) < f(y)$ or $f(y) < f(z) < f(x)$.

Proof \If". Let x; y; z 2 (a; b) be such that x < z < y. If f is strictly monotone, then
f (x) < f (z) < f (y), and so min ff (x) ; f (y)g = f (x) < f (z) < f (y) = max ff (x) ; f (y)g.
\Only if". Suppose, per contra, that f is not strictly monotone. Then, there exist
x; y; w; z 2 (a; b) such that

x<y and f (x) f (y) ; w<z and f (w) f (z)

As (a; b) is an open interval, there exist ; 2 (a; b) such that <x<y< and <w<
z < . As < w < z, by (13.18) we have

min ff ( ) ; f (y)g < f (x) < max ff ( ) ; f (y)g

As f (x) f (y), this implies that max ff ( ) ; f (y)g = f ( ) > f (x). As x < y < , by
(13.18) we have
min ff (x) ; f ( )g < f (y) < max ff (x) ; f ( )g
As f (x) f (y), this implies that min ff (x) ; f ( )g = f ( ) < f (y). We conclude that

f ( ) < f (y) f (x) < f ( ) (13.19)

As < w < z, by (13.18) we have

min ff ( ) ; f (z)g < f (w) < max ff ( ) ; f (z)g


13.6. INTERMEDIATE VALUE THEOREM 417

As f (w) f (z), this implies min ff ( ) ; f (z)g = f ( ) < f (w). As w < z < , by (13.18)
we have
min ff (w) ; f ( )g < f (z) < max ff (w) ; f ( )g
As f (w) f (z), this implies max ff (w) ; f ( )g = f ( ) > f (z). We conclude that

f ( ) < f (w) f (z) < f ( )

which contradicts (13.19).

Proof of Proposition 576 The "if" follows from Proposition 218. As to the converse, assume that $f$ is injective. We want to show that $f$ is strictly monotone. It is enough to show that $f$ is strictly monotone on $\operatorname{int} I$, since the strict monotonicity on $I$ then follows from the continuity of $f$ (why?). So, let us assume that $I$ is an open interval. Suppose, by contradiction, that $f$ is not strictly monotone. By the last lemma, there exist $x < z < y$ such that either $f(z) \geq \max \{f(x), f(y)\}$ or $f(z) \leq \min \{f(x), f(y)\}$. As $f$ is injective, these inequalities are actually strict. Suppose that $f(z) > \max \{f(x), f(y)\}$, the other case being handled similarly. Let $f(z) > k > \max \{f(x), f(y)\}$. That is, we have both $f(z) > k > f(x)$ and $f(z) > k > f(y)$. By the Intermediate Value Theorem, there exist $t'_k \in [x, z]$ and $t''_k \in [z, y]$ such that $f(t'_k) = f(t''_k) = k$, thus contradicting the injectivity of $f$. We conclude that $f$ is strictly monotone.

Without continuity the "only if" fails: consider the discontinuous function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} x & \text{if } x \in \mathbb{Q} \\ -x & \text{else} \end{cases}$$
It is not strictly monotone: if $x = 3$, $z = \pi$ and $y = 4$, we have $x < z < y$ and $f(z) < \min \{f(x), f(y)\}$. Yet, $f$ is injective. Indeed, let $x \neq y$. We have $f(x) = x \neq y = f(y)$ if $x, y \in \mathbb{Q}$ and $f(x) = -x \neq -y = f(y)$ if $x, y \notin \mathbb{Q}$. If $x \in \mathbb{Q}$ and $y \notin \mathbb{Q}$, then $f(x) = x \in \mathbb{Q}$ and $f(y) = -y \notin \mathbb{Q}$, and so again $f(x) \neq f(y)$. We conclude that $f$ is injective.

Thanks to the last result, we can prove that inverses inherit continuity.

Proposition 578 Let $f : I \to \mathbb{R}$ be an injective function. If $f$ is continuous, its inverse $f^{-1}$ is continuous.

Proof Let $f : I \to \mathbb{R}$ be an injective and continuous function. By Proposition 576, $f$ is strictly monotone, say strictly increasing (the argument is similar if $f$ is strictly decreasing). Let $\{y_n\} \subseteq \operatorname{Im} f$ be such that $y_n \to y_0 \in \operatorname{Im} f$. We want to show that $f^{-1}(y_n) \to f^{-1}(y_0)$. Set $x_n = f^{-1}(y_n)$ for $n \geq 0$. Assume that $x_0$ is an interior point of $I$. Fix $\varepsilon > 0$ small enough so that $x_0 - \varepsilon, x_0 + \varepsilon \in I$. Set $y_\varepsilon^+ = f(x_0 + \varepsilon)$ and $y_\varepsilon^- = f(x_0 - \varepsilon)$. Since $f$ is strictly increasing, we have $y_\varepsilon^- < y_0 < y_\varepsilon^+$. Since $y_n \to y_0$, there is $n_\varepsilon \geq 1$ such that
$$y_\varepsilon^- \leq y_n \leq y_\varepsilon^+ \qquad \forall n \geq n_\varepsilon$$
$f$ being strictly increasing, we then have
$$x_0 - \varepsilon = f^{-1}(y_\varepsilon^-) \leq f^{-1}(y_n) \leq f^{-1}(y_\varepsilon^+) = x_0 + \varepsilon \qquad \forall n \geq n_\varepsilon$$
So,
$$x_0 - \varepsilon \leq \liminf f^{-1}(y_n) \leq \limsup f^{-1}(y_n) \leq x_0 + \varepsilon$$
Since these inequalities hold for each $\varepsilon > 0$ small enough, we conclude that
$$\lim f^{-1}(y_n) = x_0 = f^{-1}(y_0)$$
A similar argument holds if $x_0$ is an endpoint of $I$.

The next simple example shows the importance of an interval domain.

Example 579 In the real line, take $A = [0, 1) \cup \{3\}$. Define the function $f : A \to \mathbb{R}$ by
$$f(x) = \begin{cases} x & \text{if } x \in [0, 1) \\ 1 & \text{if } x = 3 \end{cases}$$
This function is injective. As $3$ is an isolated point of $A$, it is also continuous. Its image is easily seen to be the closed unit interval $[0, 1]$. Its inverse $f^{-1} : [0, 1] \to \mathbb{R}$ is given by
$$f^{-1}(y) = \begin{cases} y & \text{if } y \in [0, 1) \\ 3 & \text{if } y = 1 \end{cases}$$
It is discontinuous at $1$. N

Next we show that the continuous image of an interval is an interval, thus extending to any interval what the Intermediate Value Theorem ensured for compact intervals.

Proposition 580 If a function $f : I \to \mathbb{R}$ is continuous, its image $\operatorname{Im} f$ is an interval.

Proof Let $t_1, t_2 \in \operatorname{Im} f$, say $t_1 < t_2$, and $\lambda \in [0, 1]$. We want to show that $\lambda t_1 + (1 - \lambda) t_2 \in \operatorname{Im} f$. There exist $x_1, x_2 \in I$ such that $f(x_1) = t_1$ and $f(x_2) = t_2$. Let $m = \min_{x \in [x_1, x_2]} f(x)$ and $M = \max_{x \in [x_1, x_2]} f(x)$. In view of (13.16), by the Intermediate Value Theorem we have $f([x_1, x_2]) = [m, M]$. Since $t_1, t_2 \in f([x_1, x_2])$, we thus have $\lambda t_1 + (1 - \lambda) t_2 \in f([x_1, x_2])$, and so $\lambda t_1 + (1 - \lambda) t_2 \in \operatorname{Im} f$. This proves that $\operatorname{Im} f$ is an interval.

This "interval image" property actually characterizes continuity for monotone functions defined on intervals. Intuitively, monotone functions may have only jump discontinuities (Proposition 564), a possibility that an interval image precludes.

Proposition 581 Let $f : I \to \mathbb{R}$ be a monotone function. Then, $f$ is continuous if and only if its image $\operatorname{Im} f$ is an interval.

Without monotonicity this elegant result fails: the image of the discontinuous and non-monotone function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} -x & \text{if } x < 0 \\ 1 & \text{if } x = 0 \\ 0 & \text{if } x \in (0, 1) \\ x + 1 & \text{if } x \geq 1 \end{cases}$$
is the unbounded interval $\mathbb{R}_+$.

Proof The "only if" is a special case of the last result. As to the "if", suppose per contra that $f$ is not continuous at $x_0 \in I$, with $f$ increasing, say. If $x_0 \in \operatorname{int} I$, there exists $\varepsilon > 0$ small enough so that $x_0 \pm \varepsilon \in I$. By Proposition 564, there is a jump at $x_0$, so an open gap $(a, b) \subseteq [\lim_{x \to x_0^-} f(x), \lim_{x \to x_0^+} f(x)]$ in the image of $f$, i.e., $(a, b) \cap \operatorname{Im} f = \emptyset$. On the other hand, it holds that $f(x_0 - \varepsilon) < a < b < f(x_0 + \varepsilon)$ and so, $\operatorname{Im} f$ being an interval,
$$(a, b) \subseteq [f(x_0 - \varepsilon), f(x_0 + \varepsilon)] \subseteq \operatorname{Im} f$$
This contradiction proves that $f$ is continuous at $x_0 \in \operatorname{int} I$. We leave to the reader the similar argument for the lower endpoint $x_0 = \min I$ of the interval $I$ (when relevant, i.e., when $\min I \in I$) and for its upper endpoint $x_0 = \max I$ (when relevant, i.e., when $\max I \in I$).

So, if $f : I \to \mathbb{R}$ is strictly increasing and continuous, the image $\operatorname{Im} f$ is an interval and its inverse $f^{-1} : \operatorname{Im} f \to I$ is strictly increasing and continuous. The next result builds upon this important property of the image.

Proposition 582 If a continuous function $f : I \to \mathbb{R}$ is strictly increasing (decreasing) and concave, its inverse $f^{-1} : \operatorname{Im} f \to I$ is strictly increasing (decreasing) and convex (concave).

Depending on the kind of monotonicity that $f$ features, its inverse is thus either convex or concave. The continuity of $f$ becomes superfluous when $I$ is open because concave and convex functions are automatically continuous on open convex sets, as will be seen later in the book (Theorem 833).

Proof Let $f$ be continuous, strictly increasing and concave. By Proposition 581, the domain $\operatorname{Im} f$ of $f^{-1}$ is an interval. By Proposition 222, $f^{-1}$ is strictly increasing. Suppose, by contradiction, that $f^{-1}$ is not convex. Then there exist two points $x_1 \neq x_2$ in $\operatorname{Im} f$ and a scalar $\lambda \in (0, 1)$ such that
$$f^{-1}(\lambda x_1 + (1 - \lambda) x_2) > \lambda f^{-1}(x_1) + (1 - \lambda) f^{-1}(x_2)$$
By setting $f^{-1}(x_1) = t_1$ and $f^{-1}(x_2) = t_2$, and applying the strictly increasing $f$ to both sides, we have
$$\lambda x_1 + (1 - \lambda) x_2 > f(\lambda t_1 + (1 - \lambda) t_2)$$
As $x_1 = f(t_1)$ and $x_2 = f(t_2)$, it follows that
$$\lambda f(t_1) + (1 - \lambda) f(t_2) > f(\lambda t_1 + (1 - \lambda) t_2)$$
But this contradicts the concavity of $f$.
Now, let $f$ be strictly decreasing and concave. It is enough to consider the function $g(x) = f(-x)$, which is strictly increasing and concave when $f$ is strictly decreasing and concave.

13.6.3 Multivariable version


Using convex sets, which generalize intervals to $\mathbb{R}^n$, we have the following extension of Proposition 580 to operators.¹²

Proposition 583 Let $C$ be a convex set in $\mathbb{R}^n$. If an operator $f : C \to \mathbb{R}^m$ is continuous, its image $\operatorname{Im} f$ is a convex set in $\mathbb{R}^m$.

In words, the continuous image of a convex set is convex.

Proof Let $t_1, t_2 \in \operatorname{Im} f$ and $\lambda \in [0, 1]$. We want to show that $\lambda t_1 + (1 - \lambda) t_2 \in \operatorname{Im} f$. There exist $x_1, x_2 \in C$ such that $f(x_1) = t_1$ and $f(x_2) = t_2$. Define the auxiliary function $\varphi : [0, 1] \to \mathbb{R}$ by $\varphi(\lambda) = f(\lambda x_1 + (1 - \lambda) x_2)$. This auxiliary function is easily seen to be continuous, with $\varphi(0) = t_2$ and $\varphi(1) = t_1$. So, by Proposition 580 its image $\operatorname{Im} \varphi$ is an interval. Thus, $\lambda t_1 + (1 - \lambda) t_2 \in \operatorname{Im} \varphi$, so there exists $\mu \in [0, 1]$ such that $f(\mu x_1 + (1 - \mu) x_2) = \varphi(\mu) = \lambda t_1 + (1 - \lambda) t_2$. This implies $\lambda t_1 + (1 - \lambda) t_2 \in \operatorname{Im} f$, as desired.

When $m = 1$, a simple consequence of this result is the following multivariable extension of the Intermediate Value Theorem.

Corollary 584 Let $f : C \to \mathbb{R}$ be a continuous function defined on a compact and convex set $C$ in $\mathbb{R}^n$. Set
$$m = \min_{x \in C} f(x) \quad \text{and} \quad M = \max_{x \in C} f(x)$$
Then, for any $z$ with
$$m \leq z \leq M$$
there exists $c \in C$ such that $f(c) = z$.

Proof By the Weierstrass Theorem, $m = \min_{x \in C} f(x)$ and $M = \max_{x \in C} f(x)$ are well defined. Clearly, $\operatorname{Im} f \subseteq [m, M]$. By Proposition 583, $\operatorname{Im} f$ is a convex set. Since it contains both $m$ and $M$, we conclude that $\operatorname{Im} f = [m, M]$. So, if $z \in [m, M]$ then there exists $c \in C$ such that $f(c) = z$.

We close with an example that vividly shows the importance of convexity for this result.

Example 585 The monotone function $f : \mathbb{Q} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} 1 & \text{if } x > \sqrt{2} \\ -1 & \text{if } x < \sqrt{2} \end{cases}$$
is continuous but its domain is not convex, so the last corollary does not apply. It actually fails completely because $\operatorname{Im} f = \{-1, 1\}$. N

¹² Convex sets will be studied in Chapter 16.

13.7 Limits and continuity of operators


The notion of continuity extends in a natural way to operators $f : A \subseteq \mathbb{R}^n \to \mathbb{R}^m$. First of all, note that they can be seen as an $m$-tuple $(f_1, \dots, f_m)$ of functions of several variables
$$f_i : A \subseteq \mathbb{R}^n \to \mathbb{R} \qquad \forall i = 1, 2, \dots, m$$
defined by
$$\begin{aligned} y_1 &= f_1(x_1, \dots, x_n) \\ y_2 &= f_2(x_1, \dots, x_n) \\ &\;\;\vdots \\ y_m &= f_m(x_1, \dots, x_n) \end{aligned}$$
The functions $f_i$ are the component functions of the operator $f$. To illustrate, let us go back to the operators of Example 188.

Example 586 (i) If $f : \mathbb{R}^2 \to \mathbb{R}^2$ is defined by $f(x_1, x_2) = (x_1, x_1 x_2)$, then
$$f_1(x_1, x_2) = x_1 \quad \text{and} \quad f_2(x_1, x_2) = x_1 x_2$$
(ii) If $f : \mathbb{R}^3 \to \mathbb{R}^2$ is defined by
$$f(x_1, x_2, x_3) = \left( 2x_1^2 + x_2 + x_3, \; x_1 x_2^4 \right)$$
then
$$f_1(x_1, x_2, x_3) = 2x_1^2 + x_2 + x_3 \quad \text{and} \quad f_2(x_1, x_2, x_3) = x_1 x_2^4$$
N

The notion of limit extends in a natural way to operators.

Definition 587 Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}^m$ be an operator and $x_0 \in \mathbb{R}^n$ a limit point of $A$. We write
$$\lim_{x \to x_0} f(x) = L \in \mathbb{R}^m$$
if, for every neighborhood $V_\varepsilon(L)$ of $L$, there exists a neighborhood $U_{\delta_\varepsilon}(x_0)$ of $x_0$ such that
$$x_0 \neq x \in U_{\delta_\varepsilon}(x_0) \cap A \implies f(x) \in V_\varepsilon(L)$$
The value $L$ is called the limit of the operator $f$ at $x_0$.

For $m = 1$ it reduces, verbatim, to Definition 525 of the limit of functions of several variables. Note that here $L$ is a vector of $\mathbb{R}^m$.¹³

Definition 588 An operator $f : A \subseteq \mathbb{R}^n \to \mathbb{R}^m$ is said to be continuous at a limit point $x_0 \in A$ if
$$\lim_{x \to x_0} f(x) = f(x_0)$$
By convention, $f$ is continuous at each isolated point of $A$.

¹³ For simplicity, we do not consider possible "extended values", that is, a vector $L$ with one or more coordinates that are $\pm\infty$.

Here, too, an operator that is continuous at all the points of a subset E of the domain
A is called continuous on E, while an operator that is continuous at all the points of its
domain is called continuous. It is easy to see that the two operators of the last example are
continuous.

By writing $f = (f_1, \dots, f_m)$ one obtains the following componentwise characterization of continuity, whose proof is left to the reader.

Proposition 589 An operator $f = (f_1, \dots, f_m) : A \subseteq \mathbb{R}^n \to \mathbb{R}^m$ is continuous at a point $x_0 \in A$ if and only if all its component functions $f_i : A \subseteq \mathbb{R}^n \to \mathbb{R}$ are continuous at $x_0$.

The continuity of an operator is thus brought back to the continuity of its component functions, a componentwise notion of continuity.
In Section 8.16 we saw that the convergence of vectors is equivalent to that of their components. This will allow the reader to prove the next sequential characterization of continuity, which extends Proposition 552 to operators.

Proposition 590 An operator $f : A \subseteq \mathbb{R}^n \to \mathbb{R}^m$ is continuous at a point $x_0$ of $A$ if and only if $f(x^n) \to f(x_0)$ for every sequence $\{x^n\}$ of points of $A$ such that $x^n \to x_0$.

The statement is formally identical to that of Proposition 552, but here $f(x^n) \to f(x_0)$ indicates convergence of vectors in $\mathbb{R}^m$.

Proposition 590 permits us to extend to operators the continuity results established for functions of several variables, except the ones that use in an essential way the order structure of their codomain $\mathbb{R}$, like the Bolzano and Weierstrass Theorems. We leave such extensions to the reader.

13.8 Infracoda topologica


In view of the sequential characterization of limits (Proposition 528), the notion of continuity can be easily reformulated using the concept of limit (Definition 524) as follows.

Proposition 591 A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is continuous at $x_0 \in A$ if and only if, for every $\varepsilon > 0$, there exists $\delta_\varepsilon > 0$ such that
$$\|x - x_0\| < \delta_\varepsilon \implies |f(x) - f(x_0)| < \varepsilon \qquad \forall x \in A \tag{13.20}$$

This characterization is identical to the definition of $\lim_{x \to x_0} f(x) = f(x_0)$ for a point $x_0$ that belongs to the domain of the function, except for the elimination of the condition $0 < \|x - x_0\|$, i.e., of the requirement that $x \neq x_0$, so as to include points $x_0$ that are isolated points of $A$.

Proof If $x_0$ is an accumulation point of $A$, condition (13.20) amounts to $\lim_{x \to x_0} f(x) = f(x_0)$. If $x_0$ is an isolated point of $A$, condition (13.20) always holds. Indeed, by the definition of isolated point, there exists a neighborhood $B_\delta(x_0)$ of small enough radius $\delta > 0$ so that $B_\delta(x_0) \cap A = \{x_0\}$. Thus, for each $x \in A$ we have $\|x - x_0\| < \delta$ if and only if $x = x_0$. It follows that, for each $\varepsilon > 0$, there exists $\delta > 0$ such that $\|x - x_0\| < \delta$ implies $x = x_0$ for all $x \in A$, so that $|f(x) - f(x_0)| = 0 < \varepsilon$.

In the language of neighborhoods, this characterization reads as follows: $f$ is continuous at $x_0 \in A$ if and only if, for every neighborhood $V_\varepsilon(f(x_0))$ there exists a neighborhood $U_{\delta_\varepsilon}(x_0)$ such that
$$x \in U_{\delta_\varepsilon}(x_0) \cap A \implies f(x) \in V_\varepsilon(f(x_0)) \tag{13.21}$$
that is,
$$f(U_{\delta_\varepsilon}(x_0) \cap A) \subseteq V_\varepsilon(f(x_0))$$
Equivalently (why?), for each open set $V$ containing $f(x_0)$, there exists an open set $U$ containing $x_0$ such that $f(U \cap A) \subseteq V$. This topological characterization leads, in turn, to a topological characterization of continuity in terms of preimages of open and closed sets.¹⁴ We consider operators defined, for simplicity, on the whole space.

Proposition 592 An operator $f : \mathbb{R}^n \to \mathbb{R}^m$ is continuous if and only if the preimages of closed sets are, in turn, closed.

In symbols, the preimage $f^{-1}(C)$ of each closed set $C$ of $\mathbb{R}^m$ is itself a closed set of $\mathbb{R}^n$. For instance, level sets $f^{-1}(y) = \{x \in \mathbb{R}^n : f(x) = y\}$ of continuous operators are closed sets, since singletons $\{y\}$ are closed sets in $\mathbb{R}^m$.
The proof of this proposition relies on some basic set-theoretic properties of images and preimages, whose proofs are left to the reader.

Lemma 593 Let $f : X \to Y$ be a function between any two sets $X$ and $Y$. We have:

(i) $f(f^{-1}(E)) \subseteq E$ for each $E \subseteq Y$;

(ii) $f^{-1}(E^c) = (f^{-1}(E))^c$ for each $E \subseteq Y$;

(iii) $f^{-1}(f(A)) \supseteq A$ for each $A \subseteq X$.

With this, we can turn to the proof of the last result.

Proof of Proposition 592 "Only if". Suppose that $f$ is continuous. Let $C$ be a closed set of $\mathbb{R}^m$. Let $\{x^n\} \subseteq f^{-1}(C)$ be such that $x^n \to x_0 \in \mathbb{R}^n$. We want to show that $x_0 \in f^{-1}(C)$. Set $y^n = f(x^n)$. Since $f$ is continuous, we have $f(x^n) \to f(x_0)$. Then $f(x_0) \in C$ because $C$ is closed. In turn, this implies $x_0 \in f^{-1}(C)$, as desired.
"If". Suppose that, for each closed set $C$ of $\mathbb{R}^m$, the set $f^{-1}(C)$ is closed in $\mathbb{R}^n$. So, if $V$ is an open set in $\mathbb{R}^m$ containing $f(x_0)$, the set $f^{-1}(V)$ is open in $\mathbb{R}^n$ because $(f^{-1}(V))^c = f^{-1}(V^c)$. Thus, $x_0$ being in $f^{-1}(V)$, there exists a neighborhood $B_\delta(x_0)$ such that $B_\delta(x_0) \subseteq f^{-1}(V)$. So, $f(B_\delta(x_0)) \subseteq f(f^{-1}(V)) \subseteq V$. In view of (13.21), we conclude that $f$ is continuous at $x_0$.

In view of Lemma 593-(ii), there is a dual version of the last proposition for open sets. Because of its importance, we report it formally.¹⁵

¹⁴ We use the term "topological" because it is a property purely in terms of open and closed sets.
¹⁵ This is, indeed, the characterization of continuity used to generalize this fundamental notion well beyond Euclidean spaces, as readers will learn in more advanced courses.

Corollary 594 An operator $f : \mathbb{R}^n \to \mathbb{R}^m$ is continuous if and only if the preimages of open sets are, in turn, open.

The next example illustrates the last two results.

Example 595 By Proposition 592, the upper and lower contour sets¹⁶
$$(f \geq t) = f^{-1}([t, +\infty)) \quad \text{and} \quad (f \leq t) = f^{-1}((-\infty, t]) \qquad \forall t \in \mathbb{R}$$
of a continuous function $f : \mathbb{R}^n \to \mathbb{R}$ are closed because the intervals $[t, +\infty)$ and $(-\infty, t]$ are closed. By Corollary 594, their strict versions
$$(f > t) = f^{-1}((t, +\infty)) \quad \text{and} \quad (f < t) = f^{-1}((-\infty, t)) \qquad \forall t \in \mathbb{R}$$
are open because the intervals $(t, +\infty)$ and $(-\infty, t)$ are open. N

¹⁶ They will be introduced in Section 17.2.1.

There is no counterpart of the last two results for images: given a continuous function,
in general the image of an open set is not open and the image of a closed set is not closed. In
other words, the continuous image of an open (closed) set is not necessarily open (closed).

Example 596 (i) Let $f : \mathbb{R} \to \mathbb{R}$ be the quadratic function $f(x) = x^2$. For the open interval $I = (-1, 1)$ we have $f(I) = [0, 1)$, which is not open. (ii) Let $f : \mathbb{R} \to \mathbb{R}$ be the exponential function $f(x) = e^x$. The real line $\mathbb{R}$ is a closed set (also open, but here this is not of interest), but $f(\mathbb{R}) = (0, +\infty)$ is not closed. N

The next result clarifies why in (ii) we have a closed but unbounded (so not compact) set like $\mathbb{R}$.

Proposition 597 Let $K$ be a compact set in $\mathbb{R}^n$. If $f : K \to \mathbb{R}^m$ is continuous, its image $\operatorname{Im} f$ is a compact set in $\mathbb{R}^m$.

In words, the continuous image of a compact set is compact.

Proof With the notions of topology at our disposal we are able to prove the result only in the case $n = m = 1$ (the general case, however, does not present substantial differences). So, let $n = m = 1$. By Definition 32, to show that the set $\operatorname{Im} f$ is bounded in $\mathbb{R}$ it is necessary to show that it is bounded both above and below in $\mathbb{R}$. Suppose, by contradiction, that $\operatorname{Im} f$ is unbounded above. Then there exists a sequence $\{y_n\} \subseteq \operatorname{Im} f$ such that $\lim_{n \to \infty} y_n = +\infty$. Let $\{x_n\} \subseteq K$ be a corresponding sequence such that $f(x_n) = y_n$ for every $n$. The sequence $\{x_n\}$ is bounded since it is contained in the bounded set $K$. By the Bolzano-Weierstrass Theorem, there exist a subsequence $\{x_{n_k}\}$ and a point $\tilde{x} \in \mathbb{R}$ such that $\lim_{k \to \infty} x_{n_k} = \tilde{x}$. Since $K$ is closed, we have $\tilde{x} \in K$. Moreover, the continuity of $f$ implies $\lim_{k \to \infty} y_{n_k} = \lim_{k \to \infty} f(x_{n_k}) = f(\tilde{x}) \in \mathbb{R}$. This contradicts $\lim_{k \to \infty} y_{n_k} = \lim_{n \to \infty} y_n = +\infty$. It follows that the set $\operatorname{Im} f$ is bounded above in the real line. In a similar way, one shows that it is also bounded below. Thus, the set $\operatorname{Im} f$ is bounded in the real line.
To complete the proof that $\operatorname{Im} f$ is compact, it remains to show that it is closed. Consider a sequence $\{y_n\} \subseteq \operatorname{Im} f$ that converges to $y \in \mathbb{R}$. By Theorem 174, we must show that $y \in \operatorname{Im} f$. Since $\{y_n\} \subseteq \operatorname{Im} f$, by definition there exists a sequence $\{x_n\} \subseteq K$ such that $f(x_n) = y_n$. As seen above, the sequence $\{x_n\}$ is bounded. The Bolzano-Weierstrass Theorem yields a subsequence $\{x_{n_k}\}$ and a point $\tilde{x} \in \mathbb{R}$ such that $\lim_{k \to \infty} x_{n_k} = \tilde{x}$. Since $K$ is closed, $\tilde{x} \in K$. Moreover, the continuity of $f$ implies that
$$y = \lim_{k \to \infty} y_{n_k} = \lim_{k \to \infty} f(x_{n_k}) = f(\tilde{x})$$
Therefore, $y \in \operatorname{Im} f$, as desired.

The fact that continuity preserves compactness is quite remarkable. It is another charac-
teristic that distinguishes compact sets among closed sets, for which in general this property
does not hold, as Example 596-(ii) shows.
A nice dividend of this proposition is an operator version of Proposition 578 about the
continuity of inverses.

Corollary 598 A continuous bijective function $f : K \to \mathbb{R}^m$ defined on a compact subset $K$ of $\mathbb{R}^n$ has a continuous inverse $f^{-1} : \operatorname{Im} f \to \mathbb{R}^n$.

Proof Also here we consider the scalar case $m = n = 1$. By Proposition 597, the set $\operatorname{Im} f$ is compact. Let $C$ be a closed subset of $\operatorname{Im} f$. We want to show that $f^{-1}(C)$ is closed. Since $\operatorname{Im} f$ is compact, $C$ is actually compact. Let $\{x_n\} \subseteq f^{-1}(C)$ be a sequence that converges to $x \in \mathbb{R}^n$. We want to show that $x \in f^{-1}(C)$. By definition of $f^{-1}(C)$, there exists a sequence $\{y_n\} \subseteq C$ such that $y_n = f(x_n)$ for each $n \geq 1$. Since $C$ is compact, by the Bolzano-Weierstrass Theorem there exists a subsequence $\{y_{n_k}\}$ that converges to some $y \in C$. By the continuity of $f$, we conclude that $x = f^{-1}(y) \in f^{-1}(C)$, thus proving that $f^{-1}(C)$ is closed. So, $f^{-1}$ maps closed sets into closed sets. In view of Proposition 600 (momentarily established), this proves that $f^{-1}$ is continuous.

We let readers contrast this corollary with Proposition 578. Such meditations will be helped by the next example, which, along with the earlier Example 579, shows the importance of a compact domain for the last result and of an interval domain for that proposition.

Example 599 The function $f : (-\infty, 0] \cup (1, +\infty) \to \mathbb{R}$ defined by
$$f(x) = \begin{cases} x & \text{if } x \leq 0 \\ x - 1 & \text{if } x > 1 \end{cases}$$
is easily seen to be strictly increasing and continuous. Its inverse $f^{-1} : \mathbb{R} \to \mathbb{R}$ is given by
$$f^{-1}(x) = \begin{cases} x & \text{if } x \leq 0 \\ x + 1 & \text{if } x > 0 \end{cases}$$
and is discontinuous at the origin. N

We close by generalizing the topological characterization of continuity to general operators, not necessarily defined on the whole space.

Proposition 600 An operator $f : C \to \mathbb{R}^m$ defined on a closed subset $C$ of $\mathbb{R}^n$ is continuous if and only if the preimages of closed sets are, in turn, closed.

The function $f : (0, 1] \to \mathbb{R}$ given by $f(x) = x$ readily shows that this proposition is false without the topological hypothesis on the domain.

Proof "Only if". Suppose that $f$ is continuous. Let $F$ be a closed set of $\mathbb{R}^m$. Let $\{x^n\} \subseteq f^{-1}(F) \cap C$ be such that $x^n \to x_0 \in \mathbb{R}^n$. We want to show that $x_0 \in f^{-1}(F) \cap C$. Since $C$ is closed, we have $x_0 \in C$. Set $y^n = f(x^n)$. Since $f$ is continuous on $C$, we have $f(x^n) \to f(x_0)$. Then $f(x_0) \in F$ because $F$ is closed. In turn, this implies $x_0 \in f^{-1}(F)$. We conclude that $x_0 \in f^{-1}(F) \cap C$, as desired.
"If". Suppose that, for each closed set $F$ of $\mathbb{R}^m$, the set $f^{-1}(F) \cap C$ is closed in $\mathbb{R}^n$. So, if $V$ is an open set in $\mathbb{R}^m$ containing $f(x_0)$, the set $f^{-1}(V) \cup C^c$ is open in $\mathbb{R}^n$ because $(f^{-1}(V) \cup C^c)^c = f^{-1}(V^c) \cap C$. Thus, $x_0$ being in $f^{-1}(V)$, there exists a neighborhood $B_\delta(x_0)$ such that $B_\delta(x_0) \subseteq f^{-1}(V) \cup C^c$, i.e., $B_\delta(x_0) \cap C \subseteq f^{-1}(V) \cap C$.¹⁷ So, $f(B_\delta(x_0) \cap C) \subseteq f(f^{-1}(V) \cap C) \subseteq f(f^{-1}(V)) \subseteq V$. In view of (13.21), we conclude that $f$ is continuous at $x_0$.

In a dual way, an operator defined on an open set is continuous if and only if the preimages of open sets are open. For later reference, we state this dual version formally.

Corollary 601 An operator $f : U \to \mathbb{R}^m$ defined on an open subset $U$ of $\mathbb{R}^n$ is continuous if and only if the preimages of open sets are, in turn, open.

13.9 Coda continua


Besides on $\varepsilon$, the value of $\delta_\varepsilon$ in the epsilon-delta condition (13.21) depends also on the point $x_0$ at hand. If it happens that, given $\varepsilon > 0$, we can choose the same $\delta_\varepsilon$ for every $x_0 \in A$ (i.e., once $\varepsilon$ is fixed, the same $\delta_\varepsilon$ would work at all the points of the domain of $f$), we have a stronger notion of continuity, called uniform continuity. It is a remarkable property of uniformity that allows us to "control" the distance $|f(x) - f(y)|$ between images just through the distance $|x - y|$ between each pair of points $x$ and $y$ of the domain of $f$.

Definition 602 A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is said to be uniformly continuous on $A$ if, for every $\varepsilon > 0$, there exists $\delta_\varepsilon > 0$ such that
$$\|x - y\| < \delta_\varepsilon \implies |f(x) - f(y)| < \varepsilon \qquad \forall x, y \in A$$

Here the value of $\delta_\varepsilon$ thus depends only on $\varepsilon$, no longer on a point $x_0$. Indeed, no specific points $x_0$ are mentioned in this definition, which considers only the domain per se.
Uniform continuity implies continuity, but the converse does not hold. For example, we will soon see that the quadratic function is continuous on $\mathbb{R}$, but not uniformly. Yet, the two notions of continuity become equivalent on compact sets (Section 5.6).

Theorem 603 A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is continuous on a compact set $K \subseteq A$ if and only if it is uniformly continuous on $K$.

¹⁷ Recall property (1.4).

Proof The "if" is obvious because uniform continuity implies continuity. We prove the "only if". For simplicity, consider the scalar case $n = 1$ with $K = [a, b]$. So, let $f : [a, b] \to \mathbb{R}$ be continuous. We need to show that it is also uniformly continuous. Suppose per contra that there exist $\varepsilon > 0$ and two sequences $\{x_n\}$ and $\{y_n\}$ in $[a, b]$ with $|x_n - y_n| \to 0$ and
$$|f(x_n) - f(y_n)| \geq \varepsilon \qquad \forall n \geq 1 \tag{13.22}$$
Since the sequences $\{x_n\}$ and $\{y_n\}$ are bounded, the Bolzano-Weierstrass Theorem yields two convergent subsequences $\{x_{n_k}\}$ and $\{y_{n_k}\}$, i.e., there exist $x, y \in [a, b]$ such that $x_{n_k} \to x$ and $y_{n_k} \to y$. Since $x_n - y_n \to 0$, we have $x_{n_k} - y_{n_k} \to 0$ and, therefore, $x - y = 0$ because of the uniqueness of the limit. Since $f$ is continuous, we have $f(x_{n_k}) \to f(x)$ and $f(y_{n_k}) \to f(y)$. Hence, $f(x_{n_k}) - f(y_{n_k}) \to f(x) - f(y) = 0$, which contradicts (13.22). We conclude that $f$ is uniformly continuous.

Theorem 603 does not hold without assuming the compactness of $K$, as the next counterexamples show. The first one considers a closed but unbounded set, the real line, while the other ones consider bounded sets which are not closed.

Example 604 The quadratic function $f : \mathbb{R} \to \mathbb{R}$ is continuous but not uniformly continuous. Suppose, by contradiction, that $f(x) = x^2$ is uniformly continuous on $\mathbb{R}$. By setting $\varepsilon = 1$, there exists $\delta_\varepsilon > 0$ such that
$$|x - y| < \delta_\varepsilon \implies |x^2 - y^2| < 1 \qquad \forall x, y \in \mathbb{R} \tag{13.23}$$
If we take $x_n = n$ and $y_n = n + \delta_\varepsilon/2$, we have $|x_n - y_n| < \delta_\varepsilon$ for every $n \geq 1$, but $\lim_n |x_n^2 - y_n^2| = +\infty$, which contradicts (13.23). Therefore, the quadratic function $x^2$ is not uniformly continuous on $\mathbb{R}$. Yet, its restriction to any compact interval $[a, b]$ is uniformly continuous thanks to the last theorem. We can check this directly, for instance, for the restriction $f : [0, 1] \to \mathbb{R}$ to the closed unit interval. We have
$$|x^2 - y^2| = |x - y| \, |x + y| \leq 2 |x - y| \qquad \forall x, y \in [0, 1]$$
and so, by setting $\delta_\varepsilon = \varepsilon/2$, we have
$$|x - y| < \delta_\varepsilon \implies |f(x) - f(y)| < \varepsilon \qquad \forall x, y \in [0, 1]$$
as desired. N
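The failure of uniformity is easy to see numerically: for a fixed increment $h$, the variation of $x^2$ over $[x, x + h]$ grows without bound as $x$ grows, so no single $\delta$ can serve every point (a Python sketch; the numbers are our own):

```python
# For fixed h, |f(x+h) - f(x)| = 2xh + h^2 grows with x, so x^2 is not
# uniformly continuous on R: no delta works at all points simultaneously.
f = lambda x: x**2
h = 0.01                              # a fixed small increment
for x in (1.0, 10.0, 100.0, 1000.0):
    print(x, abs(f(x + h) - f(x)))    # 0.0201, 0.2001, 2.0001, 20.0001
```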

Example 605 The hyperbola $f : (0, 1) \to \mathbb{R}$ given by $f(x) = 1/x$ is continuous but not uniformly continuous. Indeed, suppose per contra that $f(x) = 1/x$ is uniformly continuous on $(0, 1)$. By setting $\varepsilon = 1$, there exists $\delta_\varepsilon > 0$ such that
$$|x - y| < \delta_\varepsilon \implies \left| \frac{1}{x} - \frac{1}{y} \right| < 1 \qquad \forall x, y \in (0, 1) \tag{13.24}$$
Let $y = \min \{\delta_\varepsilon/2, 1/2\}$ and $x = y/2$. It is immediate that $0 < x < y < 1$ and $|x - y| < \delta_\varepsilon$. By (13.24), we thus have
$$\left| \frac{1}{x} - \frac{1}{y} \right| = \frac{1}{x} - \frac{1}{y} < 1 \tag{13.25}$$
On the other hand,
$$\frac{1}{x} - \frac{1}{y} = \frac{1}{y} \geq 2$$
which contradicts (13.25). We conclude that the function $1/x$ is not uniformly continuous on $(0, 1)$. Nevertheless, by Theorem 603 its restriction to any compact interval $[a, b] \subseteq (0, 1)$ is uniformly continuous. N

Example 606 (i) The function f : Q ∩ [0, 2] → R given by

    f(x) = 1   if √2 < x ≤ 2
    f(x) = −1  if 0 ≤ x < √2

is continuous but not uniformly continuous (why?). The bounded set Q ∩ [0, 2] is not closed, so not compact (its closure is the interval [0, 2]).
(ii) The function f : [0, 1) ∪ (1, 2] → R given by

    f(x) = 1   if 0 ≤ x < 1
    f(x) = −1  if 1 < x ≤ 2

is continuous but not uniformly continuous (why?). The bounded set [0, 1) ∪ (1, 2] is not closed, so not compact (its closure is the interval [0, 2]). N

13.10 Ultracoda continua


13.10.1 Stone-Weierstrass' Theorem
Polynomials are the simplest continuous functions on the real line. Remarkably, any continuous function defined on a compact interval [a, b] of the real line may be approximated, arbitrarily well, by polynomials.

Theorem 607 (Stone-Weierstrass) Let f : [a, b] → R be a continuous function. For each ε > 0 there exists a polynomial p : [a, b] → R such that

    |f(x) − p(x)| ≤ ε   ∀x ∈ [a, b]

This important result was proven by Karl Weierstrass in 1885, with a significant extension due to Marshall Stone in 1937.¹⁸ A third protagonist of this result is Sergei Bernstein, who in 1913 gave a beautiful proof of this theorem in which the approximating polynomials, aptly called Bernstein polynomials, are explicitly constructed when [a, b] is the unit interval [0, 1]. Here we will follow his lead by proving the Stone-Weierstrass Theorem via Bernstein's result.
Before doing that, however, we give a sandwich version of the Stone-Weierstrass Theorem that is sometimes useful.

¹⁸ Weierstrass proved this result when he was about 70 years old. We consider only his original result since Stone's version is beyond the scope of this book. Yet, we name this theorem after both Stone and Weierstrass also to distinguish it from Weierstrass' Theorem on extremals.

Corollary 608 Let f : [a, b] → R be a continuous function. For each ε > 0 there exist two polynomials p, P : [a, b] → R such that p ≤ f ≤ P and

    P(x) − p(x) ≤ ε   ∀x ∈ [a, b]

In turn, this is easily seen to imply that

    P(x) − ε ≤ p(x) ≤ f(x) ≤ P(x) ≤ p(x) + ε   ∀x ∈ [a, b]

Proof Let ε > 0. Define fᵉ, f_ε : [a, b] → R by fᵉ = f + ε/4 and f_ε = f − ε/4. Since the functions f_ε and fᵉ are continuous, by the Stone-Weierstrass Theorem there are two polynomials p, P : [a, b] → R such that

    |f_ε(x) − p(x)| ≤ ε/4  and  |fᵉ(x) − P(x)| ≤ ε/4   ∀x ∈ [a, b]

So, for all x ∈ [a, b] we have

    f(x) − ε/2 = f_ε(x) − ε/4 ≤ p(x) ≤ f_ε(x) + ε/4 = f(x)
    f(x) = fᵉ(x) − ε/4 ≤ P(x) ≤ fᵉ(x) + ε/4 = f(x) + ε/2

We conclude that p ≤ f ≤ P and

    P(x) − p(x) ≤ f(x) + ε/2 − (f(x) − ε/2) = ε

as desired.

13.10.2 Bernstein polynomials

Given a function f : [0, 1] → R and a positive integer n ≥ 1, define its Bernstein polynomial Bₙf : [0, 1] → R of degree n by

    Bₙf(x) = Σ_{k=0}^{n} f(k/n) C(n, k) xᵏ (1 − x)^{n−k}   ∀x ∈ [0, 1]

where C(n, k) denotes the binomial coefficient. The first-degree Bernstein polynomial B₁f : [0, 1] → R of f is given by

    B₁f(x) = Σ_{k=0}^{1} f(k) C(1, k) xᵏ (1 − x)^{1−k} = f(0)(1 − x) + f(1)x

So, to define B₁f we only need to know the values of f at the endpoints {0, 1}. The second-degree Bernstein polynomial B₂f : [0, 1] → R of f is given by

    B₂f(x) = Σ_{k=0}^{2} f(k/2) C(2, k) xᵏ (1 − x)^{2−k}
           = f(0)(1 − x)² + 2 f(1/2) x(1 − x) + f(1) x²

So, to define B₂f we only need to know the values of f at the three points {0, 1/2, 1}. In general, to define the Bernstein polynomial of degree n we only need to know the values of f at the n + 1 points

    0, 1/n, 2/n, ..., (n − 1)/n, 1

of the unit interval.

Bernstein polynomials have simple, yet important, properties.

Proposition 609 Let f, g : [0, 1] → R.

(i) Monotonicity: f ≤ g implies Bₙf ≤ Bₙg;

(ii) Normalization: Bₙc = c for all c ∈ R;

(iii) Linearity: Bₙ(αf + βg) = αBₙf + βBₙg for all α, β ∈ R.

In particular, by the monotonicity property (i) we have that f ≥ 0 implies Bₙf ≥ 0. In words, Bernstein polynomials of positive functions are positive.

Proof (i) Suppose that f(x) ≤ g(x) for all x ∈ [0, 1]. Then, f(k/n) ≤ g(k/n) for all 0 ≤ k ≤ n, so Bₙf(x) ≤ Bₙg(x) for all x ∈ [0, 1]. (ii) Let c ∈ R. We have, for all x ∈ [0, 1],

    Bₙc(x) = Σ_{k=0}^{n} c C(n, k) xᵏ (1 − x)^{n−k} = c Σ_{k=0}^{n} C(n, k) xᵏ (1 − x)^{n−k} = c [x + (1 − x)]ⁿ = c

where the penultimate equality follows from Newton's binomial formula (B.7).
(iii) We have, for all x ∈ [0, 1],

    Bₙ(αf + βg)(x) = Σ_{k=0}^{n} (αf + βg)(k/n) C(n, k) xᵏ (1 − x)^{n−k}
                   = α Σ_{k=0}^{n} f(k/n) C(n, k) xᵏ (1 − x)^{n−k} + β Σ_{k=0}^{n} g(k/n) C(n, k) xᵏ (1 − x)^{n−k}
                   = αBₙf(x) + βBₙg(x)

as desired.

It is important to observe that Bernstein polynomials have a probabilistic nature. Indeed, if we toss n times a coin that with probability 0 ≤ x ≤ 1 lands on heads, the probability pₖ that heads comes up k out of n times is

    pₖ = C(n, k) xᵏ (1 − x)^{n−k}

In particular, by Newton's binomial formula we have Σ_{k=0}^{n} pₖ = 1 (see point (ii) of the last proof). So, the function p_{n,x} : {0, 1, ..., n} → [0, 1] defined by p_{n,x}(k) = pₖ is a probability distribution, the so-called binomial distribution. The function f induces a function ψ : {0, 1, ..., n} → R defined by ψ(k) = f(k/n). The Bernstein polynomial Bₙf(x) is thus the expectation of the induced function ψ under this binomial distribution, that is,

    Bₙf(x) = Σ_{k=0}^{n} ψ(k) p_{n,x}(k)

It is because of this probabilistic nature that Bernstein polynomials are naturally defined on the unit interval.
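
The expectation formula also makes Bernstein polynomials easy to compute. The following minimal Python sketch (ours; the function names are not from the text) evaluates Bₙf directly from the definition and checks that the maximal gap from f shrinks as n grows, as Bernstein's Theorem below guarantees.

    from math import comb, sin, pi

    def bernstein(f, n, x):
        # B_n f(x) = sum_k f(k/n) C(n,k) x^k (1-x)^(n-k)
        return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
                   for k in range(n + 1))

    f = lambda x: sin(pi * x)          # any continuous function on [0, 1]
    grid = [i / 100 for i in range(101)]

    for n in (5, 20, 80):
        err = max(abs(f(x) - bernstein(f, n, x)) for x in grid)
        print(n, round(err, 4))        # the maximal gap shrinks as n grows
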

13.10.3 Bernstein's version

We now state and prove Bernstein's constructive version of the Stone-Weierstrass Theorem on the unit interval.

Theorem 610 (Bernstein) Let f : [0, 1] → R be a continuous function. For each ε > 0 there exists n_ε ≥ 1 such that, for all n ≥ n_ε, we have

    |f(x) − Bₙf(x)| ≤ ε   ∀x ∈ [0, 1]

For a continuous function f : [0, 1] → R we thus have the remarkable formula:

    f(x) = lim_{n→∞} Bₙf(x) = lim_{n→∞} Σ_{k=0}^{n} f(k/n) C(n, k) xᵏ (1 − x)^{n−k}   ∀x ∈ [0, 1]

The theorem relies on the next lemma, in which the Bernstein polynomials of the first two powers are computed. This lemma is of independent interest in that it shows, inter alia, that the Bernstein polynomial Bₙf of a polynomial f may be different from the polynomial itself, i.e., Bₙf ≠ f.

Lemma 611 (i) For the identity function f(x) = x,

    Bₙf(x) = x   ∀n ≥ 1

(ii) For the quadratic function f(x) = x²,

    Bₙf(x) = ((n − 1)/n) x² + x/n   ∀n ≥ 1

Proof We prove only (i). By (B.5) and (B.7), we have, for each x, y ∈ [0, 1],

    Σ_{k=0}^{n} (k/n) C(n, k) xᵏ y^{n−k} = Σ_{k=1}^{n} (k/n) C(n, k) xᵏ y^{n−k} = Σ_{k=1}^{n} C(n−1, k−1) xᵏ y^{n−k}
      = x Σ_{k=1}^{n} C(n−1, k−1) x^{k−1} y^{n−k} = x Σ_{k=0}^{n−1} C(n−1, k) xᵏ y^{n−1−k} = x (x + y)^{n−1}

So, by setting y = 1 − x,

    Bₙx = Σ_{k=0}^{n} (k/n) C(n, k) xᵏ (1 − x)^{n−k} = x (x + 1 − x)^{n−1} = x

as desired.

Proof of Bernstein's Theorem Let ε > 0. Since [0, 1] is compact, the function f is uniformly continuous (Theorem 603). So, there exists δ > 0 such that

    |x − y| < δ  ⟹  |f(x) − f(y)| < ε/2   ∀x, y ∈ [0, 1]

Fix any x₀ ∈ [0, 1]. We have

    |x − x₀| < δ  ⟹  |f(x) − f(x₀)| < ε/2   ∀x ∈ [0, 1]    (13.26)

By Weierstrass' Theorem, the function |f| has a maximizer x̂ ∈ [0, 1]. Set M = |f(x̂)|, so M is the maximum value of |f| on [0, 1]. We thus have |f(x)| ≤ M for all x ∈ [0, 1]. In particular, for each x ∈ [0, 1] we have

    |x − x₀| ≥ δ  ⟹  |f(x) − f(x₀)| ≤ |f(x)| + |f(x₀)| ≤ 2M ≤ 2M (x − x₀)²/δ²    (13.27)

Together, (13.26) and (13.27) imply

    |f(x) − f(x₀)| ≤ (2M/δ²)(x − x₀)² + ε/2   ∀x ∈ [0, 1]

Define f₁, f₂, g : [0, 1] → R by f₁(x) = x, f₂(x) = x², and g(x) = (x − x₀)² for all x ∈ [0, 1]. By Proposition 609 and Lemma 611, for all x ∈ [0, 1] we have

    |Bₙf(x) − f(x₀)| = |Bₙ(f − f(x₀))(x)| ≤ (2M/δ²) Bₙg(x) + ε/2
      = (2M/δ²) Bₙ(f₂ − 2x₀f₁ + x₀²)(x) + ε/2 = (2M/δ²) [Bₙf₂(x) − 2x₀Bₙf₁(x) + x₀²] + ε/2
      = (2M/δ²) [((n − 1)/n) x² + x/n − 2x₀x + x₀²] + ε/2
      = (2M/δ²) [x² − x²/n + x/n − 2x₀x + x₀²] + ε/2
      = (2M/δ²) [(x − x₀)² + (x − x²)/n] + ε/2 ≤ (2M/δ²) [(x − x₀)² + 1/(4n)] + ε/2

where the last inequality holds because max_{x∈[0,1]} (x − x²) = 1/4. In particular, by taking x = x₀ we have

    |Bₙf(x₀) − f(x₀)| ≤ (2M/δ²)(1/(4n)) + ε/2

Since x₀ was arbitrarily chosen, in turn this implies that

    |Bₙf(x) − f(x)| ≤ (2M/δ²)(1/(4n)) + ε/2   ∀x ∈ [0, 1]

We have

    (2M/δ²)(1/(4n)) ≤ ε/2  ⟺  n ≥ M/(2δ²ε)

So,

    |Bₙf(x) − f(x)| ≤ ε   ∀x ∈ [0, 1]

for all n ≥ M/(2δ²ε), as desired.

We are now ready to prove the Stone-Weierstrass' Theorem via Bernstein's Theorem.

Proof of Stone-Weierstrass' Theorem Define the function g : [0, 1] → [a, b] by g(t) = t(b − a) + a for all t ∈ [0, 1]. Since b > a, it is immediate to see that g is continuous and strictly increasing, with g(0) = a and g(1) = b. So, g is bijective with Im g = [a, b]. It is easy to check that its inverse g⁻¹ : [a, b] → [0, 1], given by g⁻¹(x) = (x − a)/(b − a) for all x ∈ [a, b], is continuous too.
That said, let f : [a, b] → R be a continuous function and ε > 0. Since g is continuous, the composition f ∘ g is a continuous function from [0, 1] to R. By Bernstein's Theorem, there exists a polynomial p : [0, 1] → R such that

    |(f ∘ g)(t) − p(t)| ≤ ε   ∀t ∈ [0, 1]    (13.28)

Since p is a polynomial, observe that p̂ = p ∘ g⁻¹ is a polynomial from [a, b] to R.¹⁹ Fix x ∈ [a, b] and define t = g⁻¹(x). We have:

    |f(x) − p̂(x)| = |f(g(g⁻¹(x))) − p(g⁻¹(x))| = |(f ∘ g)(t) − p(t)| ≤ ε    (13.29)

Since x was arbitrarily chosen in [a, b], this inequality actually holds for all x ∈ [a, b], thus proving the statement.

¹⁹ The composition of two polynomials is still a polynomial (as readers can check). Since both p and g⁻¹ are polynomials, so is their composition p̂.

Concavity and increasing monotonicity are two important properties that, on the one hand, Bernstein polynomials Bₙf inherit from the function f and, on the other hand, via Bernstein's Theorem they transmit to it, as we show next.

Proposition 612 Let f : [0, 1] → R be a continuous function. The following conditions are equivalent:

(i) f is increasing (resp., concave);

(ii) Bₙf is increasing (resp., concave) for all n ≥ 1.

Proof We first prove the monotonicity part and then move to concavity. We begin with a useful fact: for each n ≥ 1 and each x ∈ (0, 1),

    Bₙ′f(x) = n Σ_{k=0}^{n−1} [f((k+1)/n) − f(k/n)] C(n−1, k) xᵏ (1 − x)^{n−k−1}    (13.30)

Since B₁f(x) = f(0)(1 − x) + f(1)x for all x ∈ [0, 1], it is easy to check that (13.30) holds for n = 1. Since Bₙf is a polynomial for all n ≥ 2, it follows that Bₙf is derivable at each x ∈ (0, 1). In particular, by (B.4) and (B.5), we have that for each n ≥ 2 and each x ∈ (0, 1),

    Bₙ′f(x) = −f(0/n) n (1 − x)^{n−1}
            + Σ_{k=1}^{n−1} f(k/n) C(n, k) [k x^{k−1} (1 − x)^{n−k} − (n − k) xᵏ (1 − x)^{n−k−1}]
            + f(n/n) n x^{n−1}
          = −f(0/n) n (1 − x)^{n−1} + Σ_{k=1}^{n−1} f(k/n) n C(n−1, k−1) x^{k−1} (1 − x)^{n−k}
            − Σ_{k=1}^{n−1} f(k/n) n C(n−1, k) xᵏ (1 − x)^{n−k−1} + f(n/n) n x^{n−1}
          = n Σ_{k=0}^{n−2} f((k+1)/n) C(n−1, k) xᵏ (1 − x)^{n−k−1} + f(n/n) n x^{n−1}
            − n Σ_{k=0}^{n−1} f(k/n) C(n−1, k) xᵏ (1 − x)^{n−k−1}
          = n Σ_{k=0}^{n−1} [f((k+1)/n) − f(k/n)] C(n−1, k) xᵏ (1 − x)^{n−k−1}

where the second equality uses kC(n, k) = nC(n−1, k−1) and (n − k)C(n, k) = nC(n−1, k), proving (13.30). We can now prove the equivalence of (i) and (ii) in the monotone case.

(i) implies (ii). Fix n ≥ 1. Since f is increasing, it follows that

    f((k+1)/n) − f(k/n) ≥ 0   ∀k ∈ {0, ..., n−1}

Since n C(n−1, k) xᵏ (1 − x)^{n−k−1} ≥ 0 for all x ∈ [0, 1] and for all k ∈ {0, ..., n−1}, this implies that Bₙ′f(x) ≥ 0 for all x ∈ (0, 1). By Proposition 1322, we have that Bₙf is increasing on (0, 1). Since Bₙf is continuous, we conclude that Bₙf is increasing on [0, 1].

(ii) implies (i). By Bernstein's Theorem, we have that Bₙf(x) → f(x) for all x ∈ [0, 1]. If x ≤ y, then, since each Bₙf is increasing, this implies that

    f(x) = limₙ Bₙf(x) ≤ limₙ Bₙf(y) = f(y)

proving the implication.

We next move to concavity. By differentiating (13.30), tedious computations yield that for each n ≥ 2 and each x ∈ (0, 1)

    Bₙ″f(x) = n(n − 1) Σ_{k=0}^{n−2} Δ²f(k/n) C(n−2, k) xᵏ (1 − x)^{n−k−2}

where

    Δ²f(k/n) = f((k+2)/n) − 2f((k+1)/n) + f(k/n)   ∀k ∈ {0, ..., n−2}

We can now prove the equivalence of (i) and (ii) in the concave case.
(i) implies (ii). Fix n ≥ 1. If n = 1, then Bₙf(x) = f(0)(1 − x) + f(1)x = f(0) + (f(1) − f(0))x for all x ∈ [0, 1], so Bₙf is affine and, in particular, concave. If n ≥ 2, since f is concave, Δ²f(k/n) ≤ 0 for all k ∈ {0, ..., n−2}. Since n(n − 1) C(n−2, k) xᵏ (1 − x)^{n−k−2} ≥ 0 for all x ∈ [0, 1] and for all k ∈ {0, ..., n−2}, this implies that Bₙ″f(x) ≤ 0 for all x ∈ (0, 1). By Corollary 1438, we have that Bₙf is concave on [0, 1].
(ii) implies (i). By Bernstein's Theorem, we have that Bₙf(x) → f(x) for all x ∈ [0, 1]. If x, y ∈ [0, 1] and λ ∈ [0, 1], then, since each Bₙf is concave, this implies that

    f(λx + (1 − λ)y) = limₙ Bₙf(λx + (1 − λ)y) ≥ limₙ [λBₙf(x) + (1 − λ)Bₙf(y)]
      = λ limₙ Bₙf(x) + (1 − λ) limₙ Bₙf(y) = λf(x) + (1 − λ)f(y)

proving the implication.

This result implies, inter alia, that increasing (resp., concave) continuous functions on compact intervals can be approximated by increasing (resp., concave) polynomials.

Corollary 613 Let f : [a, b] → R be an increasing (resp., concave) and continuous function. For each ε > 0 there exists an increasing (resp., concave) polynomial p : [a, b] → R such that

    |f(x) − p(x)| ≤ ε   ∀x ∈ [a, b]

We leave the proof of this result to the reader (by now, it should be easy). Finally, the reader may also want to establish a dual version of this corollary for decreasing (resp., convex) functions.
Chapter 14

Equations and fixed points (sdoganato)

14.1 Equations

14.1.1 Poincaré-Miranda's Theorem

An operator f = (f₁, ..., fₙ) : A ⊆ Rⁿ → Rⁿ defines an (operator) equation

    f(x) = 0    (14.1)

that is,

    f₁(x₁, ..., xₙ) = 0
    f₂(x₁, ..., xₙ) = 0
      ⋮
    fₙ(x₁, ..., xₙ) = 0    (14.2)

The vector x is the unknown of the equation. The solutions of equation (14.1) are all x ∈ A such that f(x) = 0.¹

Example 614 (i) The polynomial equation

    α₀ + α₁x + α₂x² + ⋯ + αₙxⁿ = 0

can be written as

    f(x) = 0    (14.3)

where f : R → R is the polynomial f(x) = α₀ + α₁x + α₂x² + ⋯ + αₙxⁿ. Its solutions are all x ∈ R that satisfy (14.3).
(ii) The market equation (13.15), i.e., E(p) = 0, is defined via the excess demand function E : [a, b] → R of a single good. Its solutions are the equilibrium prices of the one-good market.
(iii) Later in the book (Section 15.7) we will study systems of linear equations that can be written as f(x) = 0 through the affine operator f : Rⁿ → Rⁿ defined by f(x) = Ax − b. N

¹ Often (14.2) is referred to as a "system of equations", each fᵢ(x) = 0 being an equation. We will also use this terminology when dealing with systems of linear equations (Section 15.7). In view of (14.1), however, one should use this terminology cum grano salis.

A fundamental issue in dealing with equations is the existence of solutions, that is, whether there exist vectors x ∈ A such that f(x) = 0. As is well known from (at least) high school, this might well not be the case: consider f : R → R given by f(x) = x² + 1; there is no x ∈ R such that x² + 1 = 0.
Bolzano's Theorem is a powerful result to establish the existence of solutions in the scalar case. Indeed, if f : A ⊆ R → R is a continuous function and A is an interval, then the equation

    f(x) = 0    (14.4)

has a solution provided there exist x′, x″ ∈ A such that f(x′) < 0 < f(x″). For instance, in this way Corollary 569 was able to establish the existence of solutions of some polynomial equations.
Bolzano's Theorem admits a generalization to Rⁿ that, surprisingly, turns out to be a quite difficult result, known as Poincaré-Miranda's Theorem.² A piece of notation: given a vector x ∈ Rⁿ, we write

    (xᵢ, x₋ᵢ)

to emphasize the component i of vector x. For instance, if x = (4, 7, 11) then x₁ = 4 and x₋₁ = (7, 11), while x₃ = 11 and x₋₃ = (4, 7).

Theorem 615 (Poincaré-Miranda) Let f = (f₁, ..., fₙ) : [a, b] → Rⁿ be a continuous operator defined on an interval of Rⁿ. If, for each i = 1, ..., n, we have

    fᵢ(aᵢ, x₋ᵢ) · fᵢ(bᵢ, x₋ᵢ) ≤ 0   ∀x₋ᵢ ∈ [a₋ᵢ, b₋ᵢ]    (14.5)

then there exists c ∈ [a, b] such that f(c) = 0.³

If n = 1, we are back to Bolzano's Theorem. If n = 2, condition (14.5) becomes:

    f₁(a₁, x₂) · f₁(b₁, x₂) ≤ 0   ∀x₂ ∈ [a₂, b₂]    (14.6)
    f₂(x₁, a₂) · f₂(x₁, b₂) ≤ 0   ∀x₁ ∈ [a₁, b₁]

Under this condition, Poincaré-Miranda's Theorem ensures that for a continuous operator f = (f₁, f₂) : [a, b] → R² there exists a point x ∈ [a, b] such that

    f₁(x) = f₂(x) = 0

In general, if there exist vectors x′, x″ ∈ A such that condition (14.5) holds on the interval [x′, x″] ⊆ A, then the equation (14.1) induced by a continuous function f : A ⊆ Rⁿ → Rⁿ has a solution. The next example illustrates.
² It was stated in 1883 by Henri Poincaré and proved by Carlo Miranda in 1940 (unaware of Poincaré's earlier work). For a proof, we refer interested readers to Kulpa (1997).
³ For instance, if a, b ∈ R³, then [a₋₁, b₋₁] = [a₂, b₂] × [a₃, b₃], [a₋₂, b₋₂] = [a₁, b₁] × [a₃, b₃] and [a₋₃, b₋₃] = [a₁, b₁] × [a₂, b₂].

Example 616 Define f : R² → R² by f(x₁, x₂) = (x₁⁵ + x₂², e^{−x₁²} + x₂³). Consider the equation

    x₁⁵ + x₂² = 0
    e^{−x₁²} + x₂³ = 0    (14.7)

Set x′ = (−100, −1) and x″ = (100, 1). The interval [x′, x″] satisfies condition (14.6) in the form

    f₁(−100, x₂) ≤ 0 ≤ f₁(100, x₂)   ∀x₂ ∈ [−1, 1]
    f₂(x₁, −1) ≤ 0 ≤ f₂(x₁, 1)   ∀x₁ ∈ [−100, 100]

By Poincaré-Miranda's Theorem, the equation has a solution x̄ ∈ [x′, x″] ⊆ R², with f₁(x̄) = f₂(x̄) = 0. N
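
The sign conditions can be verified mechanically. A quick Python check (our own sketch; the grids are ours, the bounds are those of the example):

    from math import exp

    f1 = lambda x1, x2: x1**5 + x2**2
    f2 = lambda x1, x2: exp(-x1**2) + x2**3

    xs = [-100 + 2 * k for k in range(101)]     # grid on [-100, 100]
    ys = [-1 + 0.02 * k for k in range(101)]    # grid on [-1, 1]

    assert all(f1(-100, x2) <= 0 <= f1(100, x2) for x2 in ys)
    assert all(f2(x1, -1) <= 0 <= f2(x1, 1) for x1 in xs)
    print("sign conditions (14.6) hold on the grid")
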

Thanks to Poincaré-Miranda's Theorem, we can establish an operator version of Proposition 570.

Proposition 617 Let f = (f₁, ..., fₙ), g = (g₁, ..., gₙ) : [a, b] → Rⁿ be continuous operators defined on an interval of Rⁿ. If, for each i = 1, ..., n, we have

    fᵢ(aᵢ, x₋ᵢ) ≥ gᵢ(aᵢ, x₋ᵢ)  and  fᵢ(bᵢ, x₋ᵢ) ≤ gᵢ(bᵢ, x₋ᵢ)   ∀x₋ᵢ ∈ [a₋ᵢ, b₋ᵢ]

then there exists c ∈ [a, b] such that f(c) = g(c).

Proof Let h : [a, b] → Rⁿ be defined by h(x) = f(x) − g(x). Then, for each i = 1, ..., n, we have

    hᵢ(aᵢ, x₋ᵢ) = fᵢ(aᵢ, x₋ᵢ) − gᵢ(aᵢ, x₋ᵢ) ≥ 0  and  hᵢ(bᵢ, x₋ᵢ) = fᵢ(bᵢ, x₋ᵢ) − gᵢ(bᵢ, x₋ᵢ) ≤ 0

for each x₋ᵢ ∈ [a₋ᵢ, b₋ᵢ], so hᵢ(aᵢ, x₋ᵢ) · hᵢ(bᵢ, x₋ᵢ) ≤ 0. Since h is continuous, by Poincaré-Miranda's Theorem there exists c ∈ [a, b] such that h(c) = 0, that is, f(c) = g(c).

Through this result we can generalize the equilibrium analysis that we carried out earlier in the book for the market of a single good (Proposition 571). Consider now a market where n goods are traded. Let

    D = (D₁, ..., Dₙ) : [a, b] → Rⁿ₊  and  S = (S₁, ..., Sₙ) : [a, b] → Rⁿ₊

be, respectively, the aggregate demand and supply functions, that is, at price p ∈ [a, b] ⊆ Rⁿ₊ the market demands a quantity Dᵢ(p) ≥ 0 and offers a quantity Sᵢ(p) ≥ 0 of each good i = 1, ..., n.
A pair (p̄, q̄) ∈ [a, b] × Rⁿ₊ of prices and quantities is a market equilibrium if

    q̄ = D(p̄) = S(p̄)    (14.8)

The last result permits us to establish the existence of such an equilibrium, thus generalizing Proposition 571 to the general case of n goods. Besides continuity, existence requires that, for each good i, we have

    Dᵢ(aᵢ, p₋ᵢ) ≥ Sᵢ(aᵢ, p₋ᵢ)  and  Dᵢ(bᵢ, p₋ᵢ) ≤ Sᵢ(bᵢ, p₋ᵢ)   ∀p₋ᵢ ∈ [a₋ᵢ, b₋ᵢ]

At its smallest possible price aᵢ, the demand of good i is greater than its supply regardless of the prices of the other goods, while the opposite is true at its highest possible price bᵢ. To fix ideas, assume that a = 0. Then, the condition Dᵢ(0, p₋ᵢ) ≥ Sᵢ(0, p₋ᵢ) just means that the demand of a free good always exceeds its supply, regardless of which are the prices of the other goods (a reasonable assumption). In contrast, the opposite happens at the highest price bᵢ, at which the supply of good i exceeds its demand regardless of the prices of the other goods (a reasonable assumption as long as bᵢ is "high enough").
Via the excess demand function E : [a, b] → Rⁿ defined by

    E(p) = D(p) − S(p)

we can formulate the equilibrium condition (14.8) as a market equation

    E(p) = 0    (14.9)

For n = 1, it reduces to the earlier one-good market equation (13.15). A pair (p̄, q̄) of prices and quantities is a market equilibrium if and only if the price p̄ solves this equation and q̄ = D(p̄). There is excess demand at price p for good i if Eᵢ(p) > 0 and excess supply if Eᵢ(p) < 0. In equilibrium, there is neither excess demand nor excess supply. Next we state the general equilibrium existence result in excess demand terms.

Proposition 618 Let the excess demand function E : [a, b] → Rⁿ be continuous and such that, for each good i = 1, ..., n,

    Eᵢ(bᵢ, p₋ᵢ) ≤ 0 ≤ Eᵢ(aᵢ, p₋ᵢ)   ∀p₋ᵢ ∈ [a₋ᵢ, b₋ᵢ]

Then, there exists a market equilibrium (p̄, q̄) ∈ [a, b] × Rⁿ₊.
This result thus establishes the existence of equilibria under the reasonable assumptions previously discussed. It can be easily extended to the standard case when demand and supply functions are defined on Rⁿ₊, so have the form D : Rⁿ₊ → Rⁿ₊ and S : Rⁿ₊ → Rⁿ₊, by requiring the existence of prices p′ < p″ that play the roles of the vectors a and b, respectively. These prices are here, mutatis mutandis, the analogs of the vectors x′ and x″ considered for equation (14.7).

14.1.2 Fixed points

We can look at the scalar equation f(x) = 0 from a different angle. Define the auxiliary function g : A ⊆ R → R by g(x) = λf(x) + x, with λ ≠ 0. A scalar x̄ ∈ A solves the scalar equation if and only if g(x̄) = x̄. The scalar x̄ is then said to be a fixed point of the function g. So, a scalar is a solution of the equation defined by the function f if and only if it is a fixed point of the function g. Solving an equation thus amounts to finding a fixed point.
In the scalar case, this remark is just a bit more than a curiosum. In contrast, it becomes important in the general vector case because sometimes the best way to solve the general equation (14.1) is to consider an associated fixed point problem, so as to reduce the solution of an equation to the search for the fixed points of suitable operators. For this reason, in this section we study fixed points.

An operator f : A ⊆ Rⁿ → Rⁿ is said to be a self-map if f(A) ⊆ A, that is, if f(x) ∈ A for all x ∈ A. In words, self-maps associate to elements of A elements of A: they never escape A. To emphasize this key feature, we often write f : A → A.

Example 619 (i) All operators f : Rⁿ → Rⁿ are, trivially, self-maps. (ii) The function f : [0, 1] → R given by f(x) = x² is a self-map because x² ∈ [0, 1] for all x ∈ [0, 1]. In contrast, the function f : [0, 1] → R given by f(x) = x + 1 is not a self-map because, for instance, f(1) = 2 ∉ [0, 1]. N

Self-maps are important here because they may admit fixed points.

Definition 620 Given a self-map f : A → A, a vector x̄ ∈ A is said to be a fixed point of f if f(x̄) = x̄.

For instance, for the quadratic self-map f : [0, 1] → [0, 1] given by f(x) = x², the endpoints 0 and 1 of the unit interval are fixed points. For the self-map f : R² → R² given by f(x₁, x₂) = (x₁, x₁x₂), the origin is a fixed point in that f(0) = 0.

We now turn to the key question of the existence of fixed points. In the scalar case, it is an immediate consequence of Bolzano's Theorem.

Lemma 621 A continuous self-map f : [0, 1] → [0, 1] has a fixed point.

Proof The result is obviously true if either f(0) = 0 or f(1) = 1. Suppose f(0) > 0 and f(1) < 1. Define the auxiliary function g : [0, 1] → R by g(x) = x − f(x). Then, g(0) < 0 and g(1) > 0. Since g is continuous, by Bolzano's Theorem there exists x̄ ∈ (0, 1) such that g(x̄) = 0. Hence, f(x̄) = x̄, and so x̄ is a fixed point.
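
The proof is constructive in spirit: since g(0) < 0 < g(1), bisecting on g(x) = x − f(x) locates a fixed point. A minimal Python sketch (ours), applicable to any continuous self-map of [0, 1]:

    def fixed_point(f, tol=1e-10):
        g = lambda x: x - f(x)
        lo, hi = 0.0, 1.0
        if g(lo) == 0:
            return lo
        if g(hi) == 0:
            return hi
        while hi - lo > tol:          # invariant: g(lo) < 0 <= g(hi)
            mid = (lo + hi) / 2
            if g(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    from math import cos
    print(fixed_point(cos))           # about 0.7390851, the fixed point of cos on [0, 1]
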

In the general case, the existence of fixed points is ensured by the famous Brouwer Fixed Point Theorem.⁴ In analogy with the scalar case, it can be viewed as a consequence of Poincaré-Miranda's Theorem.

Theorem 622 (Brouwer) A continuous self-map f : K → K defined on a convex compact subset K of Rⁿ has a fixed point.

Proof We first consider the interval case K = [0, 1]ⁿ. Let I : [0, 1]ⁿ → [0, 1]ⁿ be the identity function I(x) = x. We have Iᵢ(0ᵢ, x₋ᵢ) ≤ fᵢ(0ᵢ, x₋ᵢ) and Iᵢ(1ᵢ, x₋ᵢ) ≥ fᵢ(1ᵢ, x₋ᵢ) for all x ∈ [0, 1]ⁿ, where 1 = (1, ..., 1). So, we can apply Poincaré-Miranda's Theorem to the function I − f, which ensures the existence of a vector x̄ ∈ [0, 1]ⁿ such that (I − f)(x̄) = 0. Hence, f(x̄) = x̄. The interval case is thus an immediate consequence of Poincaré-Miranda's Theorem. To consider a general K we need a claim involving its dimension m = dim K ≤ n (see Section 16.6 below).

Claim There exists a continuous bijection h from K to the closed unit ball B₁(0) of Rᵐ.

Proof of the Claim We refer to Stoer and Witzgall (1970), p. 124, for a proof. Here we just remark that when m = n, i.e., when K has a nonempty interior (Proposition 812 below), and when 0 ∈ int K, the function h : K → B₁(0) defined by

    h(x) = inf{λ ≥ 0 : x ∈ λK} · x/‖x‖   if x ≠ 0,   h(0) = 0

is the sought-after continuous bijection from K to B₁(0).

⁴ It is named after Luitzen Brouwer, who proved it in 1912.

By Corollary 598, the inverse function h⁻¹ : B₁(0) → K is also continuous. Clearly, a similar bijection exists between B₁(0) and [0, 1]ᵐ. Using compositions, it is then easy to see that there exists a continuous bijection g : [0, 1]ᵐ → K with continuous inverse g⁻¹ : K → [0, 1]ᵐ. The map

    g⁻¹ ∘ f ∘ g : [0, 1]ᵐ → [0, 1]ᵐ

is continuous and so, by what we previously proved, it has a fixed point x̄. Since

    (g⁻¹ ∘ f ∘ g)(x̄) = x̄  ⟺  f(g(x̄)) = g(x̄)

we conclude that g(x̄) is a fixed point of f, as desired.

Brouwer's Theorem is a powerful result that only requires the self-map to be continuous. However, it is demanding on the domain, which has to be a compact and convex set, and it is a non-constructive existence result: it ensures the existence of a fixed point, but gives no information on how to find it.⁵
We close by observing that, as proved by Carlo Miranda in his seminal 1940 piece, Brouwer's Theorem can in turn be used to prove Poincaré-Miranda's Theorem. The two results thus imply each other and, in this sense, are equivalent.⁶

14.1.3 Aggregate market analysis via fixed points

Let us go back to equation (14.1), i.e.,

    f(x) = 0

In view of Brouwer's Theorem, we may solve this equation by finding a self-map g : K → K defined on a convex compact subset K of Rⁿ such that f(x) = 0 if and only if g(x) = x. In this way, we reduce the solution of the equation to the search for the fixed points of a self-map.
Nice on paper, but in practice it might well not be an easy task to carry out. Remarkably, however, this approach works very well to establish the existence of market equilibria. So, let D : Rⁿ₊ → Rⁿ₊ and S : Rⁿ₊ → Rⁿ₊ be, respectively, the aggregate demand and supply functions of bundles of n goods. Through the excess demand operator E = D − S we can define the market equation (14.9), i.e., E(p) = 0. A pair (p̄, q̄) ∈ Rⁿ₊ × Rⁿ₊ of prices and quantities is a market equilibrium if (14.8) holds, i.e., q̄ = D(p̄) = S(p̄). Thus, a market equilibrium exists if and only if there exists a price vector p̄, called an equilibrium price, that solves the market equation.
A weaker notion is often considered, however, that only requires goods' demand not to exceed their supply: a pair (p̄, q̄) ∈ Rⁿ₊ × Rⁿ₊ is a weak market equilibrium if

    q̄ = D(p̄) ≤ S(p̄)    (14.10)

⁵ Recall the discussion in Section 1.3.2 on existence results.
⁶ In Section 40.4 we will see a third equivalent result, a further multivariable version of Bolzano's Theorem.

To define the corresponding operator equation, define the positive part E⁺ : Rⁿ₊ → Rⁿ₊ of E by

    Eᵢ⁺(p) = max{Eᵢ(p), 0}   ∀i = 1, ..., n

That is, Eᵢ⁺(p) = Eᵢ(p) if Eᵢ(p) > 0 and Eᵢ⁺(p) = 0 otherwise. As a result, given any price vector p, it holds that E(p) ≤ 0 if and only if E⁺(p) = 0. So, a weak market equilibrium exists if and only if there exists a price vector p̄, called a weak equilibrium price, that solves the following equation

    E⁺(p) = 0    (14.11)

and q̄ = D(p̄). Note that the domain and range of E⁺ are Rⁿ₊, not Rⁿ.
A remarkable application of Brouwer's Fixed Point Theorem is the resolution of this more general market equation. We assume that:

A.1 D and S are continuous at each p > 0;

A.2 D(λp) = D(p) and S(λp) = S(p) for each λ > 0: nominal changes in prices do not matter;

A.3 Dᵢ(p) > Sᵢ(p) for some i with pᵢ > 0 implies Sⱼ(p) > Dⱼ(p) for some j: if some goods are in excess demand at a positive price, other ones must be in excess supply;

A.4 Dᵢ(p) ≥ Sᵢ(p) if pᵢ = 0: free goods are in excess demand.

These conditions ensure the existence of weak market equilibria.

Theorem 623 Under conditions A.1-A.4, a weak market equilibrium exists.

Proof A useful consequence of the positive homogeneity condition A.2 is that, without loss of generality, we can consider the operator E restricted to the simplex

    Δₙ₋₁ = { p ∈ Rⁿ₊ : Σ_{i=1}^{n} pᵢ = 1 }

of Rⁿ.⁷ So, let E : Δₙ₋₁ → Rⁿ be the restriction of E to the simplex. We want to show that there is some p̄ ∈ Δₙ₋₁ such that E⁺(p̄) = 0, i.e., E(p̄) ≤ 0. Define g : Δₙ₋₁ → Δₙ₋₁ by

    g(p) = [1 / (1 + Σ_{i=1}^{n} Eᵢ⁺(p))] [p + E⁺(p)]   ∀p ∈ Δₙ₋₁

By A.1, the function g is continuous (why?). Since Δₙ₋₁ is convex and compact, by Brouwer's Theorem there is some p̄ ∈ Δₙ₋₁ such that g(p̄) = p̄, that is,

    [1 / (1 + Σ_{i=1}^{n} Eᵢ⁺(p̄))] [p̄ + E⁺(p̄)] = p̄

Hence, E⁺(p̄) = (Σ_{i=1}^{n} Eᵢ⁺(p̄)) p̄. That is,

    Eₖ⁺(p̄) = p̄ₖ Σ_{i=1}^{n} Eᵢ⁺(p̄)   ∀k = 1, ..., n    (14.12)

We want to prove that E⁺(p̄) = 0. Suppose, by contradiction, that there exists a good k for which Eₖ⁺(p̄) = Eₖ(p̄) > 0. By (14.12), it follows that p̄ₖ > 0. Hence, by A.3 there exists a good j for which Sⱼ(p̄) > Dⱼ(p̄). Hence, Eⱼ⁺(p̄) = 0. Moreover, A.4 implies that its price is strictly positive, i.e., p̄ⱼ > 0. In view of (14.12) we can write

    0 = Eⱼ⁺(p̄) = p̄ⱼ Σ_{i=1}^{n} Eᵢ⁺(p̄)

This yields Σ_{i=1}^{n} Eᵢ⁺(p̄) = 0, which contradicts Eₖ⁺(p̄) > 0. We conclude that E⁺(p̄) = 0, so p̄ is a weak equilibrium price.

⁷ Simplexes will be studied in Chapter 17 (see Example 774).
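
To see the statement at work, here is a toy numerical illustration (entirely our own construction, not from the text): a two-good exchange economy with Cobb-Douglas demand Dᵢ(p) = 0.5(p₁ + p₂)/pᵢ and constant supply Sᵢ(p) = 1, for which A.1-A.4 hold; a grid search over the simplex locates the equilibrium price p̄ = (1/2, 1/2).

    def excess(p):
        w = p[0] + p[1]                       # value of the endowment (1, 1)
        return [0.5 * w / p[0] - 1, 0.5 * w / p[1] - 1]

    def excess_plus_norm(p):
        # squared norm of E+(p); zero exactly at a weak equilibrium price
        return sum(max(e, 0.0) ** 2 for e in excess(p))

    grid = [(k / 1000, 1 - k / 1000) for k in range(1, 1000)]  # interior of the simplex
    best = min(grid, key=excess_plus_norm)
    print(best)                               # approximately (0.5, 0.5)
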

Consider the following additional condition, which complements assumption A.3:

A.5 Dᵢ(p) < Sᵢ(p) for some i with pᵢ > 0 implies Sⱼ(p) < Dⱼ(p) for some j: if some goods are in excess supply at a positive price, other ones must be in excess demand.

Proposition 624 Under conditions A.1-A.5, a market equilibrium exists.

This result shares with our earlier equilibrium existence result, Proposition 618, conditions A.1 and A.4 (the latter being, essentially, the condition Eᵢ(aᵢ, p₋ᵢ) ≥ 0). Conditions A.2, A.3 and A.5 are, instead, new and replace the highest price condition Eᵢ(bᵢ, p₋ᵢ) ≤ 0. In particular, condition A.2 will be given a compelling foundation in Section 22.9.

Proof By the previous result there exists p* ∈ Δₙ₋₁ such that E(p*) ≤ 0. We want to show that E(p*) = 0. Suppose, by contradiction, that Eᵢ(p*) < 0 for some good i. By A.4, pᵢ* > 0. By A.5, there exists some good j such that Eⱼ(p*) > 0, which contradicts E(p*) ≤ 0. We conclude that E(p*) = 0, so p* is an equilibrium price.

In Section 22.9, we will present a simple exchange economy that provides a foundation in terms of individual behavior for the aggregate market analysis of this section. In that section we will see that it is natural to expect that the excess demand satisfies the following property:

W.1 p · E(p) ≤ 0 for all p ∈ Rⁿ₊.

This condition is a weak version of the following (aggregate) Walras' law:

W.2 p · E(p) = 0 for all p ∈ Rⁿ₊.

As will be seen in Section 22.9, condition W.1 only requires agents to buy affordable bundles, while Walras' law requires them to exhaust their budgets, a reasonable but non-trivial assumption.
Condition W.1 implies condition A.3. So, in the existence Theorem 623 we can replace A.3 with the weak Walras' law, which has a compelling economic foundation. The stronger condition W.2 implies both A.3 and A.5, so in the last result Walras' law can replace these two conditions. A bit more is actually true, as the next simplified version of classic results, due to Kenneth Arrow and Gerard Debreu, shows.⁸

⁸ The classic work on this topic is Debreu (1959).

Theorem 625 (Arrow-Debreu) Under conditions A.1, A.2 and W.1, a weak market equilibrium exists. If, in addition, A.4 and W.2 hold, then a market equilibrium exists.

Proof As in the previous proof, using just A.1 and A.2 we can prove the existence of a price vector p̄ ∈ Δₙ₋₁ such that E⁺(p̄) = (Σ_{i=1}^{n} Eᵢ⁺(p̄)) p̄. Multiply this equation by the vector E(p̄) and use W.1 to get

    E⁺(p̄) · E(p̄) = [Σ_{i=1}^{n} Eᵢ⁺(p̄)] p̄ · E(p̄) ≤ 0

So, since Σ_{i=1}^{n} Eᵢ⁺(p̄) ≥ 0, we have Σ_{i=1}^{n} Eᵢ⁺(p̄) Eᵢ(p̄) ≤ 0. But every addendum is positive because Eᵢ⁺(p̄) Eᵢ(p̄) is either 0 or Eᵢ²(p̄). So,

    Σ_{i=1}^{n} Eᵢ⁺(p̄) Eᵢ(p̄) = 0

and, in particular, Eᵢ⁺(p̄) Eᵢ(p̄) = 0 for each i. By the definition of Eᵢ⁺(p̄), we obtain that Eᵢ(p̄) ≤ 0 for each i. Therefore, p̄ is a weak equilibrium price.
It remains to show that, if also A.4 and W.2 hold, then p̄ is an equilibrium price. Since W.2 implies A.5, we can proceed as in the proof of Proposition 624.

14.2 Asymptotic behavior of recurrences

14.2.1 A general definition for recurrences

The notions introduced so far, in this and in the last chapter, permit us to study the convergence of sequences defined by recurrences, a most important class of sequences.
We first give a general definition of a recurrence that properly formalizes the informal analysis of recurrences of Section 8.1. Throughout this section, A denotes a subset of the real line.⁹

Definition 626 A function φ : Aᵏ = A × ⋯ × A → A defines a recurrence of order k if

    x₀ = δ₀, x₁ = δ₁, ..., x_{k−1} = δ_{k−1}
    xₙ = φ(xₙ₋₁, xₙ₋₂, ..., xₙ₋ₖ)   for n ≥ k

with k given initial conditions δᵢ ∈ A.

A closed form sequence f : N → R solves the recurrence if

    f(0) = δ₀, f(1) = δ₁, ..., f(k−1) = δ_{k−1}
    f(n) = φ(f(n−1), f(n−2), ..., f(n−k))   for n ≥ k    (14.13)

If φ is linear and A is the real line, by Riesz's Theorem there exists a vector a = (a₁, ..., aₖ) ∈ Rᵏ such that φ(x) = a · x, so we get back to the linear recurrence (8.11). Solutions of this important class of recurrences have been studied in Section 11.2.2.

⁹ Most of the analysis of this section continues to hold if A is a subset of Rⁿ, as readers can check.

If k = 1, the scalar function φ : A → A is a self-map that defines a recurrence of order 1 given by

    x₀ = δ₀
    xₙ = φ(xₙ₋₁)   for n ≥ 1    (14.14)

with initial condition δ₀ ∈ A. If the self-map φ : A → A is linear, it reduces to the geometric recurrence

    x₀ = δ₀
    xₙ = a xₙ₋₁   for n ≥ 1    (14.15)

We close this introductory part with a simple, yet important, uniqueness property of solutions.

Proposition 627 The recurrence (14.13) has at most one solution.

Proof Let f, f̃ : N → R be two solutions of this recurrence. We want to show that f = f̃. We proceed by induction. A preliminary observation: by construction and since the terms δᵢ are given, we have f(n) = f̃(n) for all n < k.
Initial step: just note that f(0) = δ₀ = f̃(0). Induction step: assume that f(n−1) = f̃(n−1). By the preliminary observation, if n − 1 < k − 1, then n < k and f(n) = f̃(n) for all n < k. By (14.13), if n − 1 ≥ k − 1, then n ≥ k and

    f(n) = φ(f(n−1), f(n−2), ..., f(n−k)) = φ(f̃(n−1), f̃(n−2), ..., f̃(n−k)) = f̃(n)

as desired. We conclude that f = f̃.

14.2.2 Asymptotics

From now on, we focus on the recurrence (14.14). We need some notation. Given any self-map φ : A → A, its second iterate φ ∘ φ : A → A is denoted by φ². More generally, φⁿ : A → A denotes the n-th iterate φⁿ = φⁿ⁻¹ ∘ φ, i.e.,

    φⁿ(x) = φ(φⁿ⁻¹(x)) = (φ ∘ φ ∘ ⋯ ∘ φ)(x)   (n times)   ∀x ∈ A

We adopt the convention that φ⁰ is the identity map: φ⁰(x) = x for all x ∈ A.

Example 628 (i) Consider the self-map φ : [0, ∞) → [0, ∞) defined by φ(x) = x/(1 + x). Then,

    φ²(x) = φ(φ(x)) = (x/(1+x)) / (1 + x/(1+x)) = x/(1 + 2x)
    φ³(x) = φ(φ²(x)) = (x/(1+2x)) / (1 + x/(1+2x)) = x/(1 + 3x)

This suggests that

    φⁿ(x) = x/(1 + nx)   ∀n ≥ 1    (14.16)

Let us verify this guess by induction. Initial step: the guess clearly holds for n = 1. Induction step: assume it holds for n. Then,

    φⁿ⁺¹(x) = φ(φⁿ(x)) = (x/(1+nx)) / (1 + x/(1+nx)) = x/(1 + (n+1)x)

as desired.
(ii) Consider the self-map φ : [0, ∞) → [0, ∞) defined by φ(x) = ax². Then,

    φ²(x) = φ(φ(x)) = a(ax²)² = a³x⁴
    φ³(x) = φ(φ²(x)) = a(a³x⁴)² = a⁷x⁸

With the help of a little bird, this suggests that

    φⁿ(x) = a^(2ⁿ−1) x^(2ⁿ)   ∀n ≥ 1    (14.17)

Let us verify this guess by induction. Initial step: the guess clearly holds for n = 1. Induction step: assume it holds for n. Then,

    φⁿ⁺¹(x) = φ(φⁿ(x)) = a (a^(2ⁿ−1) x^(2ⁿ))² = a^(2ⁿ⁺¹−1) x^(2ⁿ⁺¹)

as desired. N

We can represent the sequence {xₙ} defined via the recurrence (14.14) using the iterates φⁿ of the self-map φ : A → A. Indeed, we have

    xₙ = φⁿ(x₀)   ∀n ≥ 0    (14.18)

A sequence of iterates {φⁿ(x₀)} of points in A that starts from an initial point x₀ of A is called the orbit of x₀ under φ. The collection

    {{φⁿ(x₀)} : x₀ ∈ A}

of all the orbits determined by possible initial conditions is called the phase portrait of φ. In view of (14.18), the orbits that form the phase portrait of φ describe how the sequence defined by the recurrence (14.14) may evolve according to how it is initialized.

Example 629 (i) For the geometric recurrence, the relation (14.18) takes the familiar form

    xₙ = φⁿ(x₀) = aⁿx₀   ∀n ≥ 0

So, the phase portrait of φ(x) = ax is {{aⁿx₀} : x₀ ∈ R}.
(ii) For the nonlinear recurrence defined by the self-map φ : [0, ∞) → [0, ∞) given by φ(x) = x/(1 + x), we have

    xₙ = φⁿ(x₀) = x₀/(1 + nx₀)   ∀n ≥ 1

Here the phase portrait is {{x₀/(1 + nx₀)} : x₀ ≥ 0}. N
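
Orbits are straightforward to generate numerically. The short Python sketch below (ours) produces the orbit of a point and checks it against the closed form (14.16):

    def orbit(phi, x0, n):
        xs = [x0]
        for _ in range(n):
            xs.append(phi(xs[-1]))
        return xs                    # [x0, phi(x0), phi^2(x0), ..., phi^n(x0)]

    phi = lambda x: x / (1 + x)
    x0 = 2.0
    xs = orbit(phi, x0, 10)

    closed = [x0 / (1 + n * x0) for n in range(11)]
    print(max(abs(a - b) for a, b in zip(xs, closed)))   # ~ 0: the formulas agree
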



Orbits solve the recurrence (14.14) if they can be described in closed form, as is the case for the recurrences of the last two examples. Unfortunately, this is often not possible, and so the main interest of (14.18) is theoretical. Yet, operationally it may still suggest a qualitative analysis of the recurrence. A main issue in this regard is the asymptotic behavior of orbits: where do they end up eventually? For instance, do they converge?
The next simple, yet important, result shows that fixed points play a key role in studying the convergence of orbits.

Theorem 630 Let φ : A → A be a continuous self-map and x₀ a point of A. If the orbit {φⁿ(x₀)} converges to x̄ ∈ A, then x̄ is a fixed point of φ.

Proof Assume that xₙ = φⁿ(x₀) → x̄ ∈ A. Since φ is continuous, we have φ(x̄) = lim φ(φⁿ(x₀)). So,

    φ(x̄) = lim φ(φⁿ(x₀)) = lim φⁿ⁺¹(x₀) = lim xₙ₊₁ = lim xₙ = lim φⁿ(x₀) = x̄

where the equality lim xₙ₊₁ = lim xₙ holds because, as is easily checked, if xₙ → x̄ then xₙ₊ₖ → x̄ for every given k ≥ 1. We conclude that x̄ is a fixed point, as desired.

So, a necessary condition for a point to be the limit of a sequence defined by a recurrence of order 1 is that it be a fixed point of the underlying self-map. If there are no fixed points, convergence is hopeless. If they exist (e.g., by Brouwer's Theorem), we have some hope. Yet, this is only a necessary condition: as will become clear later in the section, there are fixed points of φ that are not limit points of the recurrence (14.14).¹⁰

Fixed points thus provide the candidate limit points. We have the following procedure to study limits of sequences defined by a recurrence (14.14):

1. Find the collection {x ∈ A : φ(x) = x} of the fixed points of the self-map φ.

2. Check whether they are limits of the orbits {φⁿ(x₀)}, that is, whether φⁿ(x₀) → x̄.

This procedure is especially effective when the fixed point is unique. Indeed, in this case there is a unique candidate limit point for all possible initial conditions x₀ ∈ A, so if orbits converge (e.g., they form a monotone sequence, so Theorem 323 applies), then they have to converge to the fixed point. Remarkably, in this case iterations swamp the initial condition, which asymptotically plays no role in the behavior of the recursion. Regardless of how it starts, the recursion eventually behaves the same.
In view of this discussion, the next result is especially interesting.¹¹

Proposition 631 If the self-map φ : A → A is a contraction, then it has at most one fixed point.

¹⁰ See the oscillating case in Section 14.2.3.
¹¹ Contractions are introduced in Section 19.1. For this section, it is enough to recall the simple definition for the case at hand: φ is a contraction if there exists a constant k ∈ (0, 1) such that |φ(x₁) − φ(x₂)| ≤ k |x₁ − x₂| for all x₁, x₂ ∈ A.

Proof Suppose that x₁, x₂ ∈ A are fixed points. By the definition of a contraction, for some k ∈ (0, 1) we have

    0 ≤ |x₁ − x₂| = |φ(x₁) − φ(x₂)| ≤ k |x₁ − x₂|

and so |x₁ − x₂| = 0. This implies x₁ = x₂, as desired.

So, recursions defined by self-maps that are contractions have at most a single candidate limit point. It is then enough to check whether it is actually a limit point.

Example 632 A continuously differentiable function φ : [a, b] → R is a contraction if 0 < k = max_{x∈[a,b]} |φ′(x)| < 1 (cf. Example 896). Take the contraction φ : [0, 1] → [0, 1] given by φ(x) = x²/4. Its unique fixed point is the origin x = 0. By (14.17), we have

    φⁿ(x₀) = (1/4)^(2ⁿ−1) x₀^(2ⁿ) → 0   ∀x₀ ∈ [0, 1]

So, the orbits converge to the fixed point for all initial conditions x₀ ∈ [0, 1]. N
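
The two-step procedure can be run mechanically when orbits converge. A small Python sketch (ours) for the contraction φ(x) = x²/4: iterate until successive terms essentially coincide and observe that every orbit settles on the unique fixed point 0.

    def iterate_to_limit(phi, x0, tol=1e-12, max_steps=10_000):
        x = x0
        for _ in range(max_steps):
            y = phi(x)
            if abs(y - x) < tol:     # successive iterates essentially coincide
                return y
            x = y
        return x

    phi = lambda x: x * x / 4
    for x0 in (0.1, 0.5, 1.0):
        print(x0, iterate_to_limit(phi, x0))   # the limit is 0 for every start
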

The next example shows, inter alia, that being a contraction is a sufficient but not necessary condition for the uniqueness of fixed points.

Example 633 Consider the self-map φ : [0, ∞) → [0, ∞) defined by φ(x) = x/(1 + x). We have, for all x, y ≥ 0,

    |φ(x) − φ(y)| = |x − y| / ((1 + x)(1 + y))

So, φ is not a contraction: the factor 1/((1 + x)(1 + y)) gets arbitrarily close to 1 as x and y approach 0, so no single constant k ∈ (0, 1) works. Nevertheless, it is easy to check that it has a unique fixed point, namely the origin x = 0. By (14.16), we have

    φⁿ(x₀) = x₀/(1 + nx₀) → 0   ∀x₀ ≥ 0

So, the orbits converge to the fixed point for all initial conditions x₀ ≥ 0. N

In the rest of the section we illustrate our asymptotic analysis through some important
applications.

14.2.3 Price dynamics

We consider the equilibrium price recurrences of Section 8.4, an important class of linear recurrences. Expectations play a key role here, so we articulate the analysis according to them.

Classic expectations Let us go back to the recurrence, with initial expectation E₀(p₁),

    p₁ = (α − σE₀(p₁))/β
    pₜ = (α − σpₜ₋₁)/β   for t ≥ 2    (14.19)

of the equilibrium prices of markets with production delays and classic expectations, that is, extrapolative expectations of the simplest form Eₜ₋₁(pₜ) = pₜ₋₁ (cf. Section 8.4.3).

We now study lim pt to understand the asymptotic behavior of such equilibrium prices.
To this end, consider the map ' : [a; b] ! R de ned by

' (x) = x

where [a; b] = [0; = ] with > 0 and 0 < .

Lemma 634 The function ' : [a; b] ! R is a self-map.

Proof Since > 0, the function ' is strictly decreasing and = 2 (0; 1]. Moreover, we
have
' (0) = 0= and ' = 1 0

We can conclude that 0 '( = ) ' (x) ' (0) = = for all x 2 [a; b], that is, ' is a
self-map.

We can thus write ' : [a; b] ! [a; b]. This self-map de nes the price recurrence (14.19).
Its unique xed point of ' is easily seen to be

p=
+

Thus, the unique candidate limit price is the equilibrium price (8.17) of the market without
delays in production.

Let us check whether or not p̄ is indeed the limit point. The following formula is key.

Lemma 635 We have

    pₜ − p̄ = (−1)^(t−1) (σ/β)^(t−1) (p₁ − p̄)   ∀t ≥ 1    (14.20)

Proof Consider t ≥ 2. We have

    pₜ − p̄ = (α − σpₜ₋₁)/β − α/(β + σ) = α/β − (σ/β)pₜ₋₁ − α/(β + σ)
           = (σ/β) · α/(β + σ) − (σ/β)pₜ₋₁ = −(σ/β)(pₜ₋₁ − p̄)

that is,

    pₜ − p̄ = −(σ/β)(pₜ₋₁ − p̄)   ∀t ≥ 2    (14.21)

For each t ≥ 1 set εₜ = pₜ − p̄. We have ε₁ = p₁ − p̄ and εₜ = −(σ/β)εₜ₋₁ for all t ≥ 2. This is the geometric recursion of Example 293, which gives as a result the geometric sequence with first term p₁ − p̄ and common ratio −σ/β. This implies that for each t ≥ 1

    pₜ − p̄ = εₜ = (−σ/β)^(t−1) ε₁ = (−1)^(t−1) (σ/β)^(t−1) (p₁ − p̄)

as desired.

Since

    (−1)^(t−1) = 1 if t is odd, and −1 if t is even

from formula (14.20) it follows that

    |pₜ − p̄| = (σ/β)^(t−1) |p₁ − p̄|   ∀t ≥ 1    (14.22)

The value of lim pₜ thus depends on the ratio σ/β of the slopes of the supply and demand functions. We need to distinguish two cases, according to whether such ratio is lower than or equal to 1, that is, according to whether σ < β or σ = β.

Case 1: σ < β The supply function has a lower slope than the demand function. We have

    lim |pₜ − p̄| = |p₁ − p̄| lim (σ/β)^(t−1) = 0

So,

    lim pₜ = p̄    (14.23)

as well as

    lim Eₜ₋₁(pₜ) = p̄    (14.24)

When σ < β, the fixed point p̄ is indeed a limit point. Equilibrium prices of markets with delays and classic expectations thus converge to the equilibrium price of the market without delays in production. This holds for any possible initial expectation E₀(p₁), which in the long run turns out to be immaterial.
Note that the (one-step-ahead) forecast error vanishes asymptotically:

    eₜ = pₜ − Eₜ₋₁(pₜ) → 0

Classic expectations, though lazy, are nevertheless asymptotically correct provided σ < β.

Case 2: σ = β The demand and supply functions have the same slope. Formula (14.20) implies

    pₜ − p̄ = (−1)^(t−1) (p₁ − p̄)   ∀t ≥ 1

The initial price p₁ is equal to p̄ if and only if the initial expectation is correct:

    E₀(p₁) = p₁  ⟺  E₀(p₁) = (α − σE₀(p₁))/β  ⟺  E₀(p₁) = p̄

So, if the initial expectation is correct, then pₜ = p̄ for all t ≥ 1. Otherwise, the initial error E₀(p₁) ≠ p₁ determines an oscillating sequence of equilibrium prices

    pₜ = p̄ + (−1)^(t−1)(p₁ − p̄) = 2p̄ − p₁ if t is even,  p₁ if t is odd

for all t ≥ 1. The forecast error

    eₜ = pₜ − Eₜ₋₁(pₜ) = pₜ − pₜ₋₁ = ((−1)^(t−1) − (−1)^(t−2))(p₁ − p̄) = 2(−1)^(t−1)(p₁ − p̄)

also keeps oscillating.
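
Both regimes are easy to simulate. A short Python sketch (ours; it uses the recurrence pₜ = (α − σpₜ₋₁)/β written above, and the parameter values are our own):

    def cobweb(alpha, beta, sigma, e0, T):
        p = [(alpha - sigma * e0) / beta]        # p1 from the initial expectation
        for _ in range(T - 1):
            p.append((alpha - sigma * p[-1]) / beta)
        return p

    print(cobweb(10, 2, 1, 0, 8))   # sigma < beta: converges to pbar = 10/3
    print(cobweb(10, 2, 2, 0, 8))   # sigma = beta: oscillates around pbar = 2.5
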

Rational expectations As we already remarked, given a sequence of equilibrium prices {pₜ} and of price expectations {Eₜ₋₁(pₜ)}, the forecast error eₜ at each t is given by

    eₜ = pₜ − Eₜ₋₁(pₜ)

The expectation underestimates the price pₜ if eₜ > 0 and overestimates it if eₜ < 0. Instead, if eₜ = 0 the expectation is correct.
It is plausible that rational producers do not err systematically: errare humanum est, perseverare diabolicum. An extreme form of this principle requires expectations to be always correct:

    Eₜ₋₁(pₜ) = pₜ   ∀t ≥ 1

This is the so-called hypothesis of rational expectations (or perfect foresight). Though extreme, it is a clear-cut hypothesis that is important to fix ideas.

In view of the equilibrium relation (8.19), in the market of potatoes with production delays the producers' forecast error eₜ at time t is

    eₜ = pₜ − Eₜ₋₁(pₜ) = α/β − (σ/β + 1) Eₜ₋₁(pₜ)

In particular, at each t ≥ 1 one has

    eₜ = 0  ⟺  α/β − (σ/β + 1) Eₜ₋₁(pₜ) = 0  ⟺  Eₜ₋₁(pₜ) = α/(β + σ)

So, expectations are rational if and only if

    Eₜ₋₁(pₜ) = pₜ = p̄ = α/(β + σ)   ∀t ≥ 1

We have thus proved the following result.

Proposition 636 A uniperiodal market equilibrium of markets MRₜ features rational expectations if and only if the sequence of equilibrium prices is constant, with

    pₜ = Eₜ₋₁(pₜ) = p̄

for all t ≥ 1.

The constancy of equilibrium prices is thus equivalent to the correctness of expectations. A non-trivial price dynamics is, thus, the outcome of forecast errors. This result holds for any kind of expectations, extrapolative or not. Indeed, the rationality of expectations is a property of expectations, not a hypothesis on how they are formed: once a possible expectation formation mechanism is specified, a theoretical issue is to understand when the resulting expectations are correct. For instance, in the previous case σ = β, we saw that classic expectations are rational if and only if the initial expectation is correct, that is, E₀(p₁) = p₁.

The uniperiodal price equilibrium under rational expectations of markets MRₜ with production delays is equal to the equilibrium price (8.17) of market M. Remarkably, rational expectations have neutralized, in equilibrium, any effect of differences in production technologies. In terms of potatoes' equilibrium prices, it is immaterial to have a traditional technology, with sowing in t − 1 and harvest in t, rather than a Star Trek one with instantaneous production.

Adaptive expectations We close by considering the equilibrium price dynamics when producers' expectations are adaptive. By Proposition 636, adaptive expectations are rational if and only if

    pₜ = p̄ = α/(β + σ)   ∀t ≥ 1    (14.25)

This can also be checked directly: adaptive expectations are rational if and only if Eₜ₋₁(pₜ) − Eₜ₋₂(pₜ₋₁) = λ[pₜ₋₁ − Eₜ₋₂(pₜ₋₁)] = 0 for all t ≥ 2, which in turn easily implies (14.25). A natural question is whether under adaptive expectations equilibrium prices converge in the long run to the rational expectations equilibrium price, as happens for instance under classic expectations when σ < β. To answer this key question, we need the next lemma.

Lemma 637 We have

    pₜ − p̄ = [((1 − λ)β − λσ)/β]^(t−1) (p₁ − p̄)   ∀t ≥ 1

Proof Recall the linear recurrence (8.27), that is,

    p₁ = (α − σE₀(p₁))/β
    pₜ = (1 − λ)pₜ₋₁ + λ(α − σpₜ₋₁)/β   for t ≥ 2

Consider t ≥ 2. We have

    pₜ − p̄ = (1 − λ)pₜ₋₁ + λ(α − σpₜ₋₁)/β − α/(β + σ)
           = [((1 − λ)β − λσ)/β] pₜ₋₁ + λα/β − α/(β + σ)
           = [((1 − λ)β − λσ)/β] pₜ₋₁ − [((1 − λ)β − λσ)/β] p̄
           = [((1 − λ)β − λσ)/β] (pₜ₋₁ − p̄)

where the third equality uses α = (β + σ)p̄. For each t ≥ 1 set εₜ = pₜ − p̄. We have ε₁ = p₁ − p̄ and εₜ = [((1 − λ)β − λσ)/β] εₜ₋₁ for all t ≥ 2. This is the geometric recursion of Example 293, which gives as a result the geometric sequence with first term p₁ − p̄ and common ratio ((1 − λ)β − λσ)/β. This implies that for each t ≥ 1

    pₜ − p̄ = εₜ = [((1 − λ)β − λσ)/β]^(t−1) ε₁ = [((1 − λ)β − λσ)/β]^(t−1) (p₁ − p̄)

as desired.

By taking the absolute value, we then get

    |pₜ − p̄| = |((1 − λ)β − λσ)/β|^(t−1) |p₁ − p̄|   ∀t ≥ 1

which reduces to (14.22) in the classic case λ = 1. The limit behavior of the prices pₜ thus depends on the ratio ((1 − λ)β − λσ)/β. To understand its value, observe that

    ((1 − λ)β − λσ)/β = (1 − λ) · 1 + λ · (−σ/β)    (14.26)

a convex combination of 1 and −σ/β. Since λ, σ/β ∈ (0, 1], there are two possible cases to consider:

    λσ/β < 1  and  λσ/β = 1

In the first case, observe that

    −1 < (1 − λ) + λ(−σ/β) < 1

By (14.26), we conclude that |((1 − λ)β − λσ)/β| < 1. By proceeding as we did earlier in this section for classic expectations, it is then easy to see that in this first case prices converge to the rational expectations price. In the second case, since necessarily λ = σ/β = 1 (why?), we go back to the classic case with oscillating prices. In sum, our earlier "classic" findings nicely generalize to adaptive expectations.

14.2.4 Heron's method

While computing the square a² of a number a is quite simple, the procedure required to compute the square root √a of a positive number a is significantly harder. Fortunately, we can count on Heron's method, a powerful algorithm also known as the "Babylonian method". Given 0 < a ≠ 1, Heron's sequence {xₙ} is defined by recurrence by setting x₁ = a and

    xₙ₊₁ = (1/2)(xₙ + a/xₙ)   ∀n ≥ 1    (14.27)

Theorem 638 (Heron) Let 0 < a ≠ 1. Then xₙ → √a.

Thus, Heron's sequence converges to the square root of a. On top of that, the rate of convergence is quite fast, as we will see in a few examples.

Proof By induction, it is immediate to show that xₙ > 0 for all n ≥ 1. Heron's sequence converges because it is (strictly) decreasing, at least from n = 2 on. To prove this, we first observe that

    xₙ > √a  ⟹  xₙ > xₙ₊₁ > √a    (14.28)

Indeed, let xₙ > √a. It follows that xₙ² > a, i.e., xₙ > a/xₙ. So,

    xₙ₊₁ = (1/2)(xₙ + a/xₙ) < (1/2)(xₙ + xₙ) = xₙ

Next, we show that

    xₙ ≠ √a  ⟹  xₙ₊₁ > √a

Since (xₙ² − a)² > 0 when xₙ ≠ √a, we have

    xₙ⁴ − 2xₙ²a + a² > 0  ⟹  (xₙ⁴ + a²)/xₙ² > 2a  ⟹  xₙ² + a²/xₙ² > 2a  ⟹
    xₙ² + a²/xₙ² + 2a > 4a  ⟹  (xₙ + a/xₙ)² > 4a

that is,

    xₙ₊₁² = (1/4)(xₙ + a/xₙ)² > a

proving that xₙ₊₁ > √a. This completes the proof of (14.28).
If a > 1, we have x₁ = a > √a. By (14.28), x₂ > √a. If, instead, 0 < a < 1, then x₂ = (a + 1)/2 > √a. Indeed, by squaring we obtain

    (a + 1)² > 4a  ⟺  a² + 2a + 1 > 4a  ⟺  a² − 2a + 1 > 0  ⟺  (a − 1)² > 0

In sum, for all 0 < a ≠ 1 we have x₂ > √a. From (14.28) it then follows that √a < x₃ < x₂, which in turn implies √a < x₄ < x₃, and so on. The elements of the sequence, starting from the second one, are thus decreasing and greater than √a.
We conclude that Heron's sequence is decreasing, at least for n ≥ 2, with lower bound √a. So, it is bounded and, by Theorem 323-(i), it has a finite limit L ≥ √a > 0. The recurrence (14.27) is defined by the self-map φ : (0, ∞) → (0, ∞) given by

    φ(x) = (1/2)(x + a/x)

Since

    (1/2)(x + a/x) = x  ⟺  2x = x + a/x  ⟺  x = a/x  ⟺  x² = a

the unique fixed point of φ is √a. By Theorem 630, we conclude that L = √a, as desired.
p
Example 639 (i) Let us compute √2, which we know to be approximately 1.4142135. Heron's sequence is:

    x₁ = 2,  x₂ = (1/2)(2 + 2/2) = 3/2 = 1.5
    x₃ = (1/2)(3/2 + 2/(3/2)) = 17/12 ≈ 1.4166667
    x₄ = (1/2)(17/12 + 2/(17/12)) = 577/408 ≈ 1.4142156
    x₅ = (1/2)(577/408 + 2/(577/408)) = 665857/470832 ≈ 1.4142135

The quality of the approximation after only five steps is remarkable.

(ii) Let us compute √428356 ≈ 654.48911. Heron's sequence is:

    x₁ = 428356,  x₂ ≈ 214178.5,  x₃ ≈ 107090.24
    x₄ ≈ 53547.115,  x₅ ≈ 26777.619,  x₆ ≈ 13396.807
    x₇ ≈ 6714.3905,  x₈ ≈ 3389.0936,  x₉ ≈ 1757.743
    x₁₀ ≈ 1000.7198,  x₁₁ ≈ 714.3838,  x₁₂ ≈ 656.9999
    x₁₃ ≈ 654.4939,  x₁₄ ≈ 654.4891

Here fourteen steps delivered a sharp approximation.

(iii) For √0.13 ≈ 0.3605551, Heron's sequence is:

    x₁ = 0.13,  x₂ ≈ 0.565,  x₃ ≈ 0.3975442
    x₄ ≈ 0.3622759,  x₅ ≈ 0.3605592,  x₆ ≈ 0.360555

The sequence is decreasing starting from the second element. N
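
Heron's method takes only a few lines of code. A Python sketch (ours) that reproduces the sequence of example (i):

    def heron(a, steps):
        x = a                        # x1 = a
        seq = [x]
        for _ in range(steps - 1):
            x = (x + a / x) / 2      # the averaging step (14.27)
            seq.append(x)
        return seq

    for x in heron(2, 5):
        print(x)                     # 2, 1.5, 1.41666..., 1.4142156..., 1.4142135...
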

The geometric intuition behind Heron's method is elegant. It is based on a sequence of rectangles, each of area a, that converge to a square with sides of length √a (thus with area a). The n-th rectangle's longer side is equal to xₙ and its shorter side is equal to a/xₙ (given that the area must equal a); at step n + 1 the longer side shrinks to

    xₙ₊₁ = (1/2)(xₙ + a/xₙ) < xₙ

By iterating the algorithm, xₙ and a/xₙ become closer and closer, until they reach their common value √a. The following figure illustrates:

[Figure: rectangles of area a, with sides xₙ and a/xₙ, shrinking toward the square of side √a.]
Part IV

Linear and nonlinear analysis
Chapter 15

Linear functions and operators

In Chapter 3 we studied at length the linear structure of Rⁿ. In this chapter we consider linear functions, which form an important family of functions defined on Rⁿ that preserve its linear structure.

15.1 Linear functions

15.1.1 Definition and first properties

We begin by introducing the all-important class of linear functions.

Definition 640 A function f : Rⁿ → R is said to be linear if

    f(αx + βy) = αf(x) + βf(y)    (15.1)

for every x, y ∈ Rⁿ and every α, β ∈ R.¹

Example 641 The scalar functions f : R → R defined by f(x) = mx for some m ∈ R are linear. Geometrically, they are straight lines passing through the origin with slope m. N

Example 642 Through inner products (Section 4.1.1), it is easy to define linear functions. Indeed, given a vector α ∈ Rⁿ, define f : Rⁿ → R by

    f(x) = α · x   ∀x ∈ Rⁿ    (15.2)

This function f is linear:

    f(βx + γy) = α · (βx + γy) = Σ_{i=1}^{n} αᵢ(βxᵢ + γyᵢ)
               = β Σ_{i=1}^{n} αᵢxᵢ + γ Σ_{i=1}^{n} αᵢyᵢ = β(α · x) + γ(α · y)
               = βf(x) + γf(y)

for every x, y ∈ Rⁿ and every β, γ ∈ R. When n = 1, we go back to the last example: f is then a straight line passing through the origin with slope α ∈ R. N

¹ These functions are sometimes called linear functionals.


Example 643 Production functions may take the linear form (15.2), with α = (α₁, α₂, ..., αₙ) ∈ Rⁿ interpreted as the vector of constant production coefficients. Indeed, we have

    f(e¹) = α · e¹ = α₁
    f(e²) = α · e² = α₂
      ⋮
    f(eⁿ) = α · eⁿ = αₙ

which means that α₁ is the quantity of output determined by one unit of the first input, α₂ is the quantity of output determined by one unit of the second input, and so on. These coefficients are constant because they do not depend on the quantity of input. This implies that the returns to scale of these production functions are constant. N

Next, we give a simple but important characterization: a function is linear if and only if it preserves the operations of addition and scalar multiplication. Linear functions are, thus, the functions that preserve the linear structure of Rⁿ. This clarifies their nature.

Proposition 644 A function f : Rⁿ → R is linear if and only if

(i) f(x + y) = f(x) + f(y) for all x, y ∈ Rⁿ;

(ii) f(αx) = αf(x) for all x ∈ Rⁿ and α ∈ R.

Proof "If". Suppose that (i) and (ii) hold. Then

    f(αx + βy) = f(αx) + f(βy) = αf(x) + βf(y)

so f is a linear function. "Only if". Let f be a linear function. If in (15.1) we set α = β = 1 we get (i); if in (15.1) we instead set β = 0, we get (ii).

Next we show that, more generally, linear combinations are preserved by linear functions.
When k = 2 we are back to the de nition, but the result goes well beyond that, as it holds
for every k 2.

Proposition 645 Let f : Rn ! R be a linear function. We have

f (0) = 0

and !
k
X k
X
i
f ix = if xi (15.3)
i=1 i=1
k
for every set of vectors xi i=1
in Rn and every set of scalars f i gki=1 .

Proof Let us show that f(0) = 0. Since f is linear, we have f(α0) = αf(0) for all α ∈ R.
So, f(0) = αf(0) for all α ∈ R, which can happen if and only if f(0) = 0. The proof of
(15.3) is left to the reader.

A more general version of (15.3), called Jensen's inequality, will be proved in Chapter
17. Property (15.3) has an important consequence: once we know the values taken by a
linear function on the elements of a basis, we can determine its value for any vector of R^n
whatsoever. Indeed, let S be a basis of R^n. Each vector x ∈ R^n can be written as a linear
combination of elements of S, so there exist a finite set of vectors {x^i}_{i=1}^n in S and a set of
scalars {α_i}_{i=1}^n such that x = Σ_{i=1}^n α_i x^i. By (15.3), we then have

    f(x) = Σ_{i=1}^n α_i f(x^i)

Thus, by exploiting the linearity of f, if we know the values {f(x^i) : x^i ∈ S}, we can
determine the value f(x) for each vector x ∈ R^n.

Linearity is a purely algebraic property that requires functions to have a consistent
behavior with respect to the operations of addition and scalar multiplication. Thus, prima
facie, linearity has no topological consequences. It is, therefore, remarkable that linear
functions turn out to be continuous.

Theorem 646 Linear functions are continuous.

This elegant result is important because continuity is, as we learned in the last chapter,
a highly desirable property. We omit, however, the proof because it is a special case of a
result, Theorem 833, that will be proved later in the book.

15.1.2 Representation
The operations of addition and scalar multiplication for functions f, g : R^n → R, additive or
not, and a scalar α ∈ R are defined in the usual pointwise way (cf. Section 6.3.2), that is,

    (f + g)(x) = f(x) + g(x)    ∀x ∈ R^n                          (15.4)

and

    (αf)(x) = αf(x)    ∀x ∈ R^n                          (15.5)

In particular, linearity is preserved by these operations.

Proposition 647 Let f, g : R^n → R be two linear functions and α ∈ R. The functions f + g
and αf are linear.

Proof We prove only that f + g is linear. Let x, y ∈ R^n and α, β ∈ R. It holds

    (f + g)(αx + βy) = f(αx + βy) + g(αx + βy) = αf(x) + βf(y) + αg(x) + βg(y)
                     = α(f(x) + g(x)) + β(f(y) + g(y)) = α(f + g)(x) + β(f + g)(y)

We conclude that the sum function is linear.

Next we introduce the natural ambient space in which to carry out these operations on
linear functions.

Definition 648 The set of all linear functions f : R^n → R is called the dual space of R^n
and is denoted by (R^n)′.

By Proposition 647, the dual space (R^n)′ is closed under addition and scalar multiplica-
tion:

(i) if f, g ∈ (R^n)′, then f + g ∈ (R^n)′;

(ii) if f ∈ (R^n)′ and α ∈ R, then αf ∈ (R^n)′.

The two operations satisfy the properties (v1)-(v8) that, in Chapter 3, we discussed for
R^n. Hence, intuitively, (R^n)′ is an example of a vector space. In particular, the neutral
element for the addition is the zero function f such that f(x) = 0 for every x ∈ R^n, while
the opposite element of f ∈ (R^n)′ is the function g = (−1)f = −f such that g(x) = −f(x)
for every x ∈ R^n.

The next important result, an elementary version of the celebrated Riesz's Theorem,
describes the dual space (R^n)′. We saw that every vector α ∈ R^n induces a linear function
f : R^n → R defined by f(x) = α · x (Example 642). The following result shows that the
converse holds: all linear functions defined on R^n have this form, i.e., the dual space (R^n)′
consists of the linear functions of the type f(x) = α · x for some α ∈ R^n. In particular, the
straight lines passing through the origin are the unique linear functions defined on the real
line (Example 641).

Theorem 649 (Riesz) A function f : R^n → R is linear if and only if there exists a unique
vector α ∈ R^n such that

    f(x) = α · x    ∀x ∈ R^n

Proof We have already seen the "if" part in Example 642. It remains to prove the "only if"
part. So, let f : R^n → R be a linear function and consider the standard basis e^1, ..., e^n of
R^n. Set

    α = (f(e^1), ..., f(e^n)) ∈ R^n

We can write each vector x ∈ R^n as x = Σ_{i=1}^n x_i e^i. Thus, by the linearity of f we have:

    f(x) = f( Σ_{i=1}^n x_i e^i ) = Σ_{i=1}^n x_i f(e^i) = Σ_{i=1}^n α_i x_i = α · x    ∀x ∈ R^n

As to the uniqueness of α, let α′ ∈ R^n be a vector such that f(x) = α′ · x for every
x ∈ R^n. Then, for every i = 1, ..., n we have

    α_i = α · e^i = f(e^i) = α′ · e^i = α′_i

and so α′ = α.

A linear function f : R^n → R is, therefore, identified by a unique vector α ∈ R^n.
Accordingly, we can identify the dual space (R^n)′ with the Euclidean space R^n. In a slightly
improper way, we can say that the dual space of R^n is R^n itself.
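The proof of Riesz's Theorem is constructive: the representing vector is read off by
evaluating f on the standard basis. A small Python sketch of this recipe (ours; the linear
function f below is just an arbitrary example):

    import numpy as np

    # Recover the representing vector alpha of a linear f by evaluating f
    # on the standard basis e^1, ..., e^n, as in the proof of Theorem 649.
    def riesz_vector(f, n):
        return np.array([f(e) for e in np.eye(n)])

    f = lambda x: x[0] + 2 * x[1] + 5 * x[2]    # a linear function on R^3
    alpha = riesz_vector(f, 3)
    print(alpha)                                # [1. 2. 5.]
    x = np.array([2.0, -1.0, 3.0])
    print(np.isclose(f(x), alpha @ x))          # True: f(x) = alpha . x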

15.1.3 Monotonicity
Turn now to the order structure of R^n. A function f : R^n → R is said to be:

(i) positive if f(x) ≥ 0 for each x ≥ 0;²

(ii) strictly positive if f(x) > 0 for each x > 0;

(iii) strongly positive if positive and f(x) > 0 for each x ≫ 0.

In words, a (strictly, strongly) positive function f assigns (strictly, strongly) positive
values f(x) to (strictly, strongly) positive vectors x.
In general, positivity and increasing monotonicity are altogether independent properties.
For instance, f : R → R given by f(x) = 1 + sin x is positive but not increasing, while
g : R → R given by g(x) = −e^{−x} is increasing but not positive. Remarkably, for linear
functions these two properties become equivalent.

Proposition 650 A linear function f : R^n → R is (strictly, strongly) increasing if and only
if it is (strictly, strongly) positive.

To prove that a linear function is (strictly, strongly) increasing, it is thus enough to show
that it is (strictly, strongly) positive.

Proof "Only if". Let f : R^n → R be linear and increasing. Let x ≥ 0. We want to show
that f(x) ≥ 0. As f is increasing, we have

    f(x) ≥ f(0) = 0

where the equality follows from Proposition 645. We conclude that f(x) ≥ 0, as desired.
"If". Let f : R^n → R be linear and positive. Let x, y ∈ R^n be such that x ≥ y. We want
to show that f(x) ≥ f(y). Set z = x − y ∈ R^n. Since x ≥ y, we have z ≥ 0. Positivity and
linearity then imply

    f(x) − f(y) = f(x − y) = f(z) ≥ 0

yielding that f(x) ≥ f(y), as desired.
Finally, the proof of the strict and strong versions is similar.

Positivity emerges also in the monotone version of Riesz's Theorem. This result is of
great importance in applications as, for example, we will see in Section 24.6.³

Theorem 651 (Riesz-Markov) A function f : R^n → R is linear and increasing if and
only if there exists a unique positive vector α ∈ R^n_+ such that

    f(x) = α · x    ∀x ∈ R^n

In particular,
² Positivity with respect to the order structure is weaker than positivity of the image of a function
f : A ⊆ R^n → R, a stronger notion requiring f(x) ≥ 0 for all x ∈ A. In what follows, it should be clear
from the context which notion of positivity we are referring to.
³ Co-named after Andrej Markov because his 1938 piece is an early systematic study of monotone linear
functions in general spaces. A more general version of the Riesz-Markov Theorem will be given in Theorem
765.

(i) α > 0 if and only if f is strongly increasing;

(ii) α ≫ 0 if and only if f is strictly increasing.

Proof "Only if". Let f : R^n → R be linear and increasing. As f is linear, by the Riesz
Theorem there exists α ∈ R^n such that f(x) = α · x for all x ∈ R^n. As f is increasing, we
have

    α_i = α · e^i = f(e^i) ≥ 0

for each i = 1, ..., n. This proves that α ≥ 0, that is, α ∈ R^n_+.
"If". Let f : R^n → R be such that f(x) = α · x for all x ∈ R^n, with α ∈ R^n_+. Clearly,
f is linear. To show that f is increasing, take x, y ∈ R^n with x ≥ y. We want to show that
f(x) ≥ f(y). Set z = x − y. Since x ≥ y, we have z ≥ 0. As α ≥ 0, we then have

    α · z ≥ 0

that is, α · (x − y) ≥ 0. In turn, this implies f(x) = α · x ≥ α · y = f(y).

(i) "Only if". Let f : R^n → R be linear and strongly increasing. As f is increasing, by
what proved before there exists α ∈ R^n_+ such that f(x) = α · x for all x ∈ R^n. We want
to show that α > 0. Let 1 = (1, ..., 1) be the vector with components equal to 1. Clearly,
1 ≫ 0. Thus,

    Σ_{i=1}^n α_i = α · 1 = f(1) > 0

As α ≥ 0, from Σ_{i=1}^n α_i > 0 it follows that α > 0.
"If". Let f : R^n → R be such that f(x) = α · x for all x ∈ R^n, with α > 0. Clearly, f is
linear. To show that f is strongly increasing, take x, y ∈ R^n with x ≫ y. We want to show
that f(x) > f(y). Set z = x − y. Since x ≫ y, we have z ≫ 0. As α > 0, we then have

    α · z > 0

that is, α · (x − y) > 0. In turn, this implies f(x) = α · x > α · y = f(y).

(ii) "Only if". Let f : R^n → R be linear and strictly increasing. As f is increasing, by
what proved before there exists α ∈ R^n_+ such that f(x) = α · x for all x ∈ R^n. We want to
show that α ≫ 0. As f is strictly increasing, we have

    α_i = α · e^i = f(e^i) > 0

for each i = 1, ..., n. This proves that α ≫ 0.
"If". Let f : R^n → R be such that f(x) = α · x for all x ∈ R^n, with α ≫ 0. Clearly, f
is linear. To show that f is strictly increasing, take x, y ∈ R^n with x > y. We want to show
that f(x) > f(y). Set z = x − y. Since x > y, we have z > 0. As α ≫ 0, we then have

    α · z > 0

that is, α · (x − y) > 0. In turn, this implies f(x) = α · x > α · y = f(y).

(Strictly, strongly) increasing linear functions are thus characterized by (strongly, strictly)
positive representing vectors α. Let us see an instance of this result.

Example 652 Consider the linear functions f, g : R^3 → R defined by

    f(x) = x_1 + 2x_2 + 5x_3    and    g(x) = x_1 + 3x_2

Denote by α_f and α_g their representing vectors. By the Riesz-Markov Theorem, f is strictly
increasing because α_f = (1, 2, 5) ≫ 0 and g is strongly increasing because α_g = (1, 3, 0) > 0.
N

As the reader can easily verify, dual results hold for decreasing and negative linear func-
tions. Finally, let us denote by (R^n)′_+ the subset of the dual space (R^n)′ consisting of all
positive linear functions. If we identify each f ∈ (R^n)′_+ with its unique positive representing
vector α, we can identify (R^n)′_+ with R^n_+.

15.2 Matrices
15.2.1 Definition
Matrices play a key role in the study of linear operators. Specifically, an m × n matrix is
simply a table, with m rows and n columns, of scalars

    [ a_11  a_12  ...  a_1j  ...  a_1n ]
    [ a_21  a_22  ...  a_2j  ...  a_2n ]
    [  ...   ...        ...        ... ]
    [ a_m1  a_m2  ...  a_mj  ...  a_mn ]

For example,

    [  1   5   7   9 ]
    [  3   2   1   4 ]
    [ 12  15  11   9 ]

is a 3 × 4 matrix, where

    a_11 = 1    a_12 = 5     a_13 = 7     a_14 = 9
    a_21 = 3    a_22 = 2     a_23 = 1     a_24 = 4
    a_31 = 12   a_32 = 15    a_33 = 11    a_34 = 9

Notation The elements (or components or entries) of a matrix are denoted by a_ij and the
matrix itself is also denoted by (a_ij). A matrix with m rows and n columns will often be
denoted by A_{m×n}.

In a matrix (a_ij) we have n column vectors:

    [ a_11 ]   [ a_12 ]         [ a_1n ]
    [  ... ] , [  ... ] , ... , [  ... ]
    [ a_m1 ]   [ a_m2 ]         [ a_mn ]

and m row vectors:

    (a_11, ..., a_1n)
    (a_21, ..., a_2n)
    ⋮
    (a_m1, ..., a_mn)

A matrix is called square (of order n) when m = n and is called rectangular when m ≠ n.

Example 653 The 3 × 4 matrix

    [  1   5   7   9 ]
    [  3   2   1   4 ]
    [ 12  15  11   9 ]

is rectangular, with three row vectors

    (1, 5, 7, 9),  (3, 2, 1, 4),  (12, 15, 11, 9)

and four column vectors

    [  1 ]   [  5 ]   [  7 ]   [ 9 ]
    [  3 ] , [  2 ] , [  1 ] , [ 4 ]
    [ 12 ]   [ 15 ]   [ 11 ]   [ 9 ]

The 3 × 3 matrix

    [ 1  5  1 ]
    [ 3  4  2 ]
    [ 1  7  9 ]

is square, with three row vectors

    (1, 5, 1),  (3, 4, 2),  (1, 7, 9)

and three column vectors

    [ 1 ]   [ 5 ]   [ 1 ]
    [ 3 ] , [ 4 ] , [ 2 ]
    [ 1 ]   [ 7 ]   [ 9 ]

N

Example 654 (i) The square matrix of order n obtained by writing, one next to the other,
the versors e^i of R^n is called the identity (or unit) matrix and is denoted by I_n or, when
there is no danger of confusion, simply by I:

        [ 1  0  ...  0 ]
    I = [ 0  1  ...  0 ]
        [ ...       ... ]
        [ 0  0  ...  1 ]

(ii) The m × n matrix with all zero elements is called null and is denoted by O_mn or, when
there is no danger of confusion, simply by O:

        [ 0  0  ...  0 ]
    O = [ 0  0  ...  0 ]
        [ ...       ... ]
        [ 0  0  ...  0 ]

N

15.2.2 Operations on matrices
Let M(m, n) be the set of all the m × n matrices. On M(m, n) we can define in a natural
way the operations of addition and scalar multiplication:

(i) given two matrices (a_ij) and (b_ij) in M(m, n), the addition (a_ij) + (b_ij) is defined by

    [ a_11  ...  a_1n ]   [ b_11  ...  b_1n ]   [ a_11 + b_11  ...  a_1n + b_1n ]
    [  ...        ... ] + [  ...        ... ] = [     ...               ...     ]
    [ a_m1  ...  a_mn ]   [ b_m1  ...  b_mn ]   [ a_m1 + b_m1  ...  a_mn + b_mn ]

that is, (a_ij) + (b_ij) = (a_ij + b_ij);

(ii) given α ∈ R and (a_ij) ∈ M(m, n), the scalar multiplication α(a_ij) is defined by

      [ a_11  ...  a_1n ]   [ αa_11  ...  αa_1n ]
    α [  ...        ... ] = [   ...          ... ]
      [ a_m1  ...  a_mn ]   [ αa_m1  ...  αa_mn ]

that is, α(a_ij) = (αa_ij).

Example 655 We have

    [  1   5   7   9 ]   [  0   2   1   4 ]   [  1   7   8  13 ]
    [  3   2   1   4 ] + [ −1  −3  −1  −4 ] = [  2  −1   0   0 ]
    [ 12  15  11   9 ]   [  5   8   1   2 ]   [ 17  23  12  11 ]

and

      [  1   5   7   9 ]   [  4  20  28  36 ]
    4 [  3   2   1   4 ] = [ 12   8   4  16 ]
      [ 12  15  11   9 ]   [ 48  60  44  36 ]

N
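Both operations are entrywise, so they are easy to check mechanically. A NumPy sketch of
the two computations above (our illustration):

    import numpy as np

    A = np.array([[1, 5, 7, 9], [3, 2, 1, 4], [12, 15, 11, 9]])
    B = np.array([[0, 2, 1, 4], [-1, -3, -1, -4], [5, 8, 1, 2]])
    print(A + B)    # entrywise addition
    print(4 * A)    # entrywise scalar multiplication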

Example 656 Given a square matrix A = (a_ij) of order n and two scalars α and β, we have

              [ αa_11 + β   αa_12       ...   αa_1n     ]
    αA + βI = [ αa_21       αa_22 + β   ...   αa_2n     ]
              [   ...                          ...      ]
              [ αa_n1       αa_n2       ...   αa_nn + β ]

N

It is easy to verify that the operations of addition and scalar multiplication just introduced
on M(m, n) satisfy the properties (v1)-(v8) that in Chapter 3 we established for R^n, that
is:

(v1) A + B = B + A

(v2) (A + B) + C = A + (B + C)

(v3) A + O = A

(v4) A + (−A) = O

(v5) α(A + B) = αA + αB

(v6) (α + β)A = αA + βA

(v7) 1A = A

(v8) α(βA) = (αβ)A

Intuitively, we can say that M(m, n) is another example of a vector space. Note that
the neutral element for the addition is the null matrix.

15.2.3 A first taxonomy
Square matrices are particularly important. We call main (or principal) diagonal of a square
matrix the set of the elements a_ii on the diagonal. A square matrix is said to be:

(i) symmetric if a_ij = a_ji for every i, j = 1, 2, ..., n, i.e., when the two triangles separated
by the main diagonal are mirror images of each other;

(ii) lower triangular if all the elements above the main diagonal are zero, that is, a_ij = 0
for i < j;

(iii) upper triangular if all the elements below the main diagonal are zero, that is, a_ij = 0
for i > j;

(iv) diagonal if it is both lower and upper triangular, that is, if all the elements outside the
main diagonal are zero: a_ij = 0 for i ≠ j.

Example 657 The matrix

    [ 1  2  1 ]
    [ 2  4  0 ]
    [ 1  0  9 ]

is symmetric. The matrices

    [ 1  0  0 ]   [ 1  5  1 ]   [ 1  0  0 ]
    [ 3  4  0 ] , [ 0  4  2 ] , [ 0  4  0 ]
    [ 1  7  9 ]   [ 0  0  0 ]   [ 0  0  9 ]

are lower triangular, upper triangular and diagonal, respectively. N



We call transpose of a matrix A ∈ M(m, n) the matrix B ∈ M(n, m) obtained by
interchanging the rows and the columns of A, that is,

    b_ij = a_ji

for every i = 1, 2, ..., n and every j = 1, 2, ..., m. The transpose of A is denoted by A^T.

Example 658 We have:

        [  1   5   7 ]               [ 1   3  12 ]
    A = [  3   2   1 ]    and  A^T = [ 5   2  15 ]
        [ 12  15  11 ]               [ 7   1  11 ]

as well as

                                     [ 1  3 ]
    A = [ 1  0  7 ]       and  A^T = [ 0  5 ]
        [ 3  5  1 ]                  [ 7  1 ]

Note that

    (A^T)^T = A

so the "transpose of the transpose" of a matrix is the matrix itself. In particular, it is easy
to see that a square matrix A is symmetric if and only if A^T = A. In this case, transposition
has no effect. Finally, in terms of operations we have

    (αA)^T = αA^T    and    (A + B)^T = A^T + B^T                          (15.6)

for every α ∈ R and every A, B ∈ M(m, n).

A row vector x = (x_1, ..., x_n) ∈ R^n can be regarded as a 1 × n matrix, so we can identify
R^n with M(1, n). According to this identification, the transpose x^T of x is the column vector

    [ x_1 ]
    [ ... ]
    [ x_n ]

that is, x^T ∈ M(n, 1). This allows us to identify R^n also with M(n, 1).
In what follows we will often identify the vectors of R^n with matrices. Sometimes it
will be convenient to regard them as row vectors, that is, as elements of M(1, n), sometimes
as column vectors, that is, as elements of M(n, 1). In any case, one should not forget that
vectors are elements of R^n; identifications are holograms.

15.2.4 Product of matrices
It is possible to define the product of two matrices A and B under suitable conditions on
their dimensions. We first present the special case of the product of a matrix with a vector.
Let A = (a_ij) ∈ M(m, n) and x ∈ R^n. The choice of the dimensions of A and x is not
arbitrary: the product of the type Ax^T between the matrix A and the column vector x^T
requires that the number of rows of x^T be equal to the number of columns of A. If this is
the case, the product Ax^T is defined by

           [ a_11  a_12  ...  a_1n ] [ x_1 ]   [ Σ_{i=1}^n a_1i x_i ]   [ a^1 · x ]
    Ax^T = [ a_21  a_22  ...  a_2n ] [ x_2 ] = [ Σ_{i=1}^n a_2i x_i ] = [ a^2 · x ]
           [  ...              ... ] [ ... ]   [        ...         ]   [   ...   ]
           [ a_m1  a_m2  ...  a_mn ] [ x_n ]   [ Σ_{i=1}^n a_mi x_i ]   [ a^m · x ]

where a^1, a^2, ..., a^m are the rows of A and

    a^1 · x,  a^2 · x,  ...,  a^m · x

are the inner products between the rows of A and the vector x. In particular, Ax^T ∈ M(m, 1).

It is thus evident why the dimension of the vector x must be equal to the number of
columns of A: in multiplying A with x, the components of Ax^T are the inner products
between the rows of A and the vector x. But, inner products are possible only between
vectors of the same dimension.

Notation To ease notation, in what follows we will just write Ax instead of Ax^T.

Example 659 Let A ∈ M(3, 4) and x ∈ R^4 be given by

        [ 3  −2   0  −1 ]
    A = [ 0  10   2  −2 ]    and    x = (1, 2, 3, −4)
        [ 4   0  −2   3 ]

It is possible to compute the product Ax:

         [ 3  −2   0  −1 ] [  1 ]   [ 3·1 + (−2)·2 + 0·3 + (−1)·(−4) ]   [   3 ]
    Ax = [ 0  10   2  −2 ] [  2 ] = [ 0·1 + 10·2 + 2·3 + (−2)·(−4)   ] = [  34 ]
         [ 4   0  −2   3 ] [  3 ]   [ 4·1 + 0·2 + (−2)·3 + 3·(−4)    ]   [ −14 ]
                           [ −4 ]

However, it is not possible to take the product xA: the number of rows of A (i.e., 3) is not
equal to the number of columns of x (i.e., 1). N
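The same computation in NumPy, where the operator @ carries out exactly the row-by-vector
inner products just described (our sketch):

    import numpy as np

    A = np.array([[3, -2, 0, -1], [0, 10, 2, -2], [4, 0, -2, 3]])
    x = np.array([1, 2, 3, -4])
    print(A @ x)    # [  3  34 -14 ]: the i-th entry is the inner product a^i . x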

In a similar way, we define the product of two matrices A and B by suitably multiplying
the rows of A and the columns of B. The prerequisite on the dimensions of the matrices is
that the number of columns of A be equal to the number of rows of B. In other words, the
product AB is possible when A ∈ M(m, n) and B ∈ M(n, q). If we denote by a^1, a^2, ..., a^m
the rows of A and by b^1, b^2, ..., b^q the columns of B, we then have

         [ a^1 ]                        [ a^1 · b^1   a^1 · b^2   ...   a^1 · b^q ]
    AB = [ a^2 ] (b^1, b^2, ..., b^q) = [ a^2 · b^1   a^2 · b^2   ...   a^2 · b^q ]
         [ ... ]                        [    ...         ...              ...     ]
         [ a^m ]                        [ a^m · b^1   a^m · b^2   ...   a^m · b^q ]

The elements ab_ij of the product matrix AB are, therefore,

    ab_ij = a^i · b^j = Σ_{k=1}^n a_ik b_kj

for i = 1, ..., m and j = 1, ..., q.

The product matrix AB is of type m × q: so, it has the same number of rows as A and
the same number of columns as B. Note that it is possible to take the product AB of the
matrices A ∈ M(m, n) and B ∈ M(n, q) if and only if the product B^T A^T of the transpose
matrices B^T ∈ M(q, n) and A^T ∈ M(n, m) is well-defined. Momentarily it will be seen that,
indeed, (AB)^T = B^T A^T.

This definition of product between matrices finds its justification in Proposition 677,
which we discuss later in the chapter. For the moment, it is important to understand the
"mechanics" of the definition. To this end, we proceed with some examples.

Example 660 Let A ∈ M(2, 4) and B ∈ M(4, 3) be given by

                                         [  0   2   3 ]
    A = [  3  −2   8  −6 ]    and    B = [  5  −6   1 ]
        [ 13   0  −4   9 ]               [ 12   7   0 ]
                                         [ −1   9  11 ]

It is possible to compute the product AB:

    AB = [ 3·0 + (−2)·5 + 8·12 + (−6)·(−1)   3·2 + (−2)·(−6) + 8·7 + (−6)·9   3·3 + (−2)·1 + 8·0 + (−6)·11 ]
         [ 13·0 + 0·5 + (−4)·12 + 9·(−1)     13·2 + 0·(−6) + (−4)·7 + 9·9     13·3 + 0·1 + (−4)·0 + 9·11   ]

       = [  92  20  −59 ]
         [ −57  79  138 ]

However, it is not possible to take the product BA: the number of rows of A (i.e., 2) is not
equal to the number of columns of B (i.e., 3). As we just remarked, it is possible, though,
to take the product B^T A^T; indeed, the number of columns of B^T (i.e., 4) is equal to the
number of rows of A^T. N

Example 661 Consider the matrices

                                     [ 1  2  1  0 ]
    A = [ 1  3  1 ]    and    B =    [ 2  5  2  2 ]
        [ 0  1  4 ]                  [ 0  1  3  2 ]

The product matrix AB is 2 × 4. In this regard, note the useful mnemonic rule (2 × 4) =
(2 × 3)(3 × 4). We have:

    AB = [ 1·1+3·2+1·0   1·2+3·5+1·1   1·1+3·2+1·3   1·0+3·2+1·2 ]
         [ 0·1+1·2+4·0   0·2+1·5+4·1   0·1+1·2+4·3   0·0+1·2+4·2 ]

       = [ 7  18  10   8 ]
         [ 2   9  14  10 ]

N
The product of matrices has the following properties, as the reader can verify.

Proposition 662 Let A, B and C be any three matrices for which it is possible to take the
products indicated below. Then

(i) (AB)C = A(BC);

(ii) A(B + C) = AB + AC;

(iii) (A + B)C = AC + BC;

(iv) α(AB) = (αA)B = A(αB) for every α ∈ R;

(v) (AB)^T = B^T A^T.

Among the properties of the product, commutativity is missing. Indeed, the product of
matrices does not satisfy this property: if both products AB and BA are well-defined, in
general we have AB ≠ BA. The next example will illustrate this notable failure.
When AB = BA, we say that the two matrices commute. Since (AB)^T = B^T A^T, the
matrices A and B commute if and only if their transposes commute.

Example 663 Let A and B be given by

        [ 1  0  3 ]               [ 2  1  4 ]
    A = [ 2  1  0 ]    and    B = [ 0  3  1 ]
        [ 1  4  6 ]               [ 4  2  4 ]

Since A and B are square matrices, both BA and AB are well-defined 3 × 3 matrices. We
have:

         [ 2  1  4 ] [ 1  0  3 ]   [ 2·1+1·2+4·1   2·0+1·1+4·4   2·3+1·0+4·6 ]   [  8  17  30 ]
    BA = [ 0  3  1 ] [ 2  1  0 ] = [ 0·1+3·2+1·1   0·0+3·1+1·4   0·3+3·0+1·6 ] = [  7   7   6 ]
         [ 4  2  4 ] [ 1  4  6 ]   [ 4·1+2·2+4·1   4·0+2·1+4·4   4·3+2·0+4·6 ]   [ 12  18  36 ]

while

         [ 1  0  3 ] [ 2  1  4 ]   [ 1·2+0·0+3·4   1·1+0·3+3·2   1·4+0·1+3·4 ]   [ 14   7  16 ]
    AB = [ 2  1  0 ] [ 0  3  1 ] = [ 2·2+1·0+0·4   2·1+1·3+0·2   2·4+1·1+0·4 ] = [  4   5   9 ]
         [ 1  4  6 ] [ 4  2  4 ]   [ 1·2+4·0+6·4   1·1+4·3+6·2   1·4+4·1+6·4 ]   [ 26  25  32 ]

So AB ≠ BA: the product is not commutative. N
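A short NumPy check of this failure of commutativity (our illustration):

    import numpy as np

    A = np.array([[1, 0, 3], [2, 1, 0], [1, 4, 6]])
    B = np.array([[2, 1, 4], [0, 3, 1], [4, 2, 4]])
    print(np.array_equal(A @ B, B @ A))    # False: AB and BA differ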

15.3 Linear operators
15.3.1 Definition and first properties
Next we introduce linear operators.

Definition 664 An operator T : R^n → R^m is linear if

    T(αx + βy) = αT(x) + βT(y)                          (15.7)

for every x, y ∈ R^n and every α, β ∈ R.

The notion of linear operator generalizes that of linear function (Definition 640), which
is the special case m = 1, that is, R^m = R.
Linear operators are the operators which preserve the operations of addition and scalar
multiplication, thus generalizing the analogous result that we established for linear functions
(Proposition 644). Though natural, it is a significant generalization: here T(x) is a vector
of R^m, not a scalar (unless m = 1).

Proposition 665 An operator T : R^n → R^m is linear if and only if

(i) T(x + y) = T(x) + T(y) for all x, y ∈ R^n;

(ii) T(αx) = αT(x) for all x ∈ R^n and α ∈ R.

We omit the proof because it is similar to that of Proposition 644.

Example 666 Given a matrix A ∈ M(m, n), define the operator T : R^n → R^m by

    T(x) = Ax    ∀x ∈ R^n                          (15.8)

It is easy to see that T is linear. Soon we will see that all linear operators T : R^n → R^m
actually have such a form (Theorem 672).
Note that this operator can be written in the form

    T = (T_1, ..., T_m) : R^n → R^m

introduced in Section 13.7 by setting, for every i = 1, ..., m,

    T_i(x) = a^i · x

where a^i is the i-th row vector of the matrix A. N

Example 667 (i) The null or zero operator 0 : R^n → R^m defined by

    0(x) = 0    ∀x ∈ R^n

is linear.

(ii) The identity operator I : R^n → R^n defined by

    I(x) = x    ∀x ∈ R^n

is linear. N

When n = m, we have the important special case of operators T : R^n → R^n that have
the same domain and codomain.

Example 668 Let A = (a_ij) be an n × n square matrix. Define the operator T : R^n → R^n
by (15.8), i.e.,

    T(x) = Ax    ∀x ∈ R^n

Now, this operator has the same domain and codomain. N

We conclude this first section with some basic properties of linear operators that gen-
eralize those stated in Proposition 645 for linear functions (the easy proof is left to the
reader).

Proposition 669 Let T : R^n → R^m be a linear operator. We have

    T(0) = 0

and

    T( Σ_{i=1}^k α_i x^i ) = Σ_{i=1}^k α_i T(x^i)                          (15.9)

for every set of vectors {x^i}_{i=1}^k in R^n and every set of scalars {α_i}_{i=1}^k.

As we have already seen for linear functions, property (15.9) has the important conse-
quence that, once we know the values taken by a linear operator T on the elements of a basis
of R^n, we can determine the values of T for each vector of R^n.

The operations of addition and scalar multiplication for operators are defined, as usual
(cf. Section 6.3.2), pointwise: given two operators S, T : R^n → R^m, linear or not, and a
scalar α ∈ R, define S + T : R^n → R^m and αT : R^n → R^m by

    (S + T)(x) = S(x) + T(x)    ∀x ∈ R^n

and

    (αT)(x) = αT(x)    ∀x ∈ R^n

Denote by

    L(R^n, R^m)

the space of all linear operators T : R^n → R^m. In the case of linear functions, i.e., m = 1,
the space L(R^n, R) reduces to the dual space (R^n)′ that we studied before. It is immediate
to check that addition and scalar multiplication preserve linearity:

(i) if S, T ∈ L(R^n, R^m), then S + T ∈ L(R^n, R^m);

(ii) if T ∈ L(R^n, R^m) and α ∈ R, then αT ∈ L(R^n, R^m).

The space L(R^n, R^m) is thus closed under these two operations, which are also easily seen
to satisfy the "usual" properties (v1)-(v8). Again, this means that L(R^n, R^m) is, intuitively,
another example of a vector space. To ease notation, in the special case n = m, i.e., for
linear operators T : R^n → R^n having the same domain and codomain, we just write L(R^n)
in place of L(R^n, R^n).

Addition and scalar multiplication are, by now, routine operations. The next notion is,
instead, peculiar to operators.

Definition 670 Given two linear operators T : R^n → R^m and S : R^m → R^q, their product
is the function ST : R^n → R^q defined by

    (ST)(x) = S(T(x))

for every x ∈ R^n.

In other words, the product operator ST is the composite function S ∘ T. If the operators
S and T are linear, so is the product ST. Indeed:

    (ST)(αx + βy) = S(T(αx + βy)) = S(αT(x) + βT(y))
                  = αS(T(x)) + βS(T(y)) = α(ST)(x) + β(ST)(y)

for every x, y ∈ R^n and every α, β ∈ R. The product of two linear operators is, therefore,
still a linear operator.

As Proposition 677 will make clear, in general the product is not commutative: when
both products ST and TS are defined, in general we have ST ≠ TS. Hence, when one writes
ST and TS, the order in which the two operators appear is important.

Last, but not least, we state the version for operators of the remarkable Theorem 646 on
continuity.

Proposition 671 Linear operators are continuous.

We omit the proof because, like Theorem 646, this result is a special case of Theorem
833.

15.3.2 Representation
In this section we study in more detail linear operators T : R^n → R^m. We start by es-
tablishing a representation theorem for them. In Riesz's Theorem we saw that a function
f : R^n → R is linear if and only if there exists a vector α ∈ R^n such that f(x) = α · x for
every x ∈ R^n. The next result generalizes Riesz's Theorem to linear operators.

Theorem 672 An operator T : R^n → R^m is linear if and only if there exists a (unique)
m × n matrix A such that

    T(x) = Ax                          (15.10)

for every x ∈ R^n.

The matrix A is called the matrix associated to the operator T (or also the representative
matrix of the operator T). In particular, if we identify each linear operator with its associated
matrix, we can identify the space L(R^n, R^m) with the space of matrices M(m, n).
Matrices allow us, therefore, to represent operators in the form (15.10), which is of great
importance both theoretically and operationally. This is why matrices are so important:
though the fundamental notion is that of operator, thanks to the representation (15.10)
matrices become a most useful auxiliary notion that will accompany us in the rest of the
book.

Proof "If". This part is contained, essentially, in Example 666. "Only if". Let T be a linear
operator. Set

    A = [ T(e^1)  T(e^2)  ...  T(e^n) ]                          (15.11)

that is, A is the m × n matrix whose n columns are the column vectors T(e^i) for i = 1, ..., n.
We can write every x ∈ R^n as x = Σ_{i=1}^n x_i e^i. Therefore, for every x ∈ R^n,

    T(x) = T( Σ_{i=1}^n x_i e^i ) = Σ_{i=1}^n x_i T(e^i)

         = x_1 [ a_11 ]   + x_2 [ a_12 ]   + ... + x_n [ a_1n ]
               [ a_21 ]         [ a_22 ]               [ a_2n ]
               [  ... ]         [  ... ]               [  ... ]
               [ a_m1 ]         [ a_m2 ]               [ a_mn ]

         = [ a_11 x_1 + a_12 x_2 + ... + a_1n x_n ]   [ a^1 · x ]
           [ a_21 x_1 + a_22 x_2 + ... + a_2n x_n ] = [ a^2 · x ] = Ax
           [                ...                   ]   [   ...   ]
           [ a_m1 x_1 + a_m2 x_2 + ... + a_mn x_n ]   [ a^m · x ]

where a^1, a^2, ..., a^m are the rows of A.

As to uniqueness, let B be an m × n matrix for which (15.10) holds. By considering the
vectors e^i we have

    (a_11, a_21, ..., a_m1) = T(e^1) = Be^1 = (b_11, b_21, ..., b_m1)
    (a_12, a_22, ..., a_m2) = T(e^2) = Be^2 = (b_12, b_22, ..., b_m2)
    ⋮
    (a_1n, a_2n, ..., a_mn) = T(e^n) = Be^n = (b_1n, b_2n, ..., b_mn)

Therefore, A = B.

Example 673 Define T : R^3 → R^3 by

    T(x) = (0, x_2, x_3)    ∀x ∈ R^3

In other words, T projects every vector in R^3 on the plane {x ∈ R^3 : x_1 = 0}. For example,
T(2, 3, 5) = (0, 3, 5). We have

    T(e^1) = (0, 0, 0),    T(e^2) = (0, 1, 0),    T(e^3) = (0, 0, 1)

and therefore

                                     [ 0  0  0 ]
    A = [ T(e^1)  T(e^2)  T(e^3) ] = [ 0  1  0 ]
                                     [ 0  0  1 ]

Hence, T(x) = Ax for every x ∈ R^3. N

Example 674 Define T : R^3 → R^2 by

    T(x) = (x_1 − x_3, x_1 + x_2 + x_3)    ∀x ∈ R^3

For example, T(2, 3, 5) = (−3, 10). We have

    T(e^1) = (1, 1),    T(e^2) = (0, 1),    T(e^3) = (−1, 1)

and therefore

    A = [ T(e^1)  T(e^2)  T(e^3) ] = [ 1  0  −1 ]
                                     [ 1  1   1 ]

So, we can write T(x) = Ax for every x ∈ R^3. N
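The construction (15.11) used in these two examples is easy to automate: one stacks the
columns T(e^i). A Python sketch (ours) for the operator of Example 674:

    import numpy as np

    # Build the representative matrix column by column, as in (15.11).
    T = lambda x: np.array([x[0] - x[2], x[0] + x[1] + x[2]])
    A = np.column_stack([T(e) for e in np.eye(3)])
    print(A)                              # [[ 1.  0. -1.]
                                          #  [ 1.  1.  1.]]
    x = np.array([2.0, 3.0, 5.0])
    print(np.allclose(T(x), A @ x))       # True: T(x) = Ax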

15.3.3 Matrices and operations
At this point it is natural to ask what the matrix representations of the operations on
operators are. For addition and scalar multiplication we have the following simple result
(the easy proof is left to the reader).

Proposition 675 Let S, T : R^n → R^m be two linear operators and let α ∈ R. Let A and B
be the two m × n matrices associated to S and T, respectively. Then,

(i) A + B is the matrix associated to the operator S + T;

(ii) αA is the matrix associated to the operator αS.

Example 676 Let S, T : R^3 → R^3 be the linear operators defined, for all x ∈ R^3, by

    S(x) = (0, x_2, x_3)    and    T(x) = (2x_1 − x_3, x_1 + x_2 + 3x_3, 2x_1 − x_2)

In Example 673 we saw that

        [ 0  0  0 ]
    A = [ 0  1  0 ]
        [ 0  0  1 ]

is the matrix associated to the operator S. By proceeding in the same way, it is easy to
check that

        [ 2   0  −1 ]
    B = [ 1   1   3 ]
        [ 2  −1   0 ]

is the matrix associated to the operator T. By Proposition 675,

            [ 2   0  −1 ]
    A + B = [ 1   2   3 ]
            [ 2  −1   1 ]

is then the matrix associated to the operator S + T. Moreover, if we take for example α = 10,
by Proposition 675,

         [ 0   0   0 ]
    αA = [ 0  10   0 ]
         [ 0   0  10 ]

is then the matrix associated to the operator αS. N

We move to the more interesting case of the product of operators.

Proposition 677 Let S : R^m → R^q and T : R^n → R^m be two linear operators with
associated matrices, respectively,

    A = (a_ij) ∈ M(q, m)    and    B = (b_ij) ∈ M(m, n)

Then, the matrix associated to the product operator ST : R^n → R^q is the product matrix

    AB = (ab_ij) ∈ M(q, n)

The product matrix AB is, therefore, the matrix representation of the product operator
ST. This motivates the notion of product of matrices that, when it was introduced earlier
in the chapter, might have seemed quite artificial.

Proof Let {e^i}_{i=1}^n, {ẽ^i}_{i=1}^q, and {ē^i}_{i=1}^m be respectively the standard bases of R^n, R^q,
and R^m. We have

    T(e^j) = Be^j = (b_1j, b_2j, ..., b_mj)
           = b_1j (1, 0, ..., 0) + b_2j (0, 1, 0, ..., 0) + ... + b_mj (0, 0, ..., 1) = Σ_{k=1}^m b_kj ē^k

In the same way,

    S(ē^k) = Aē^k = (a_1k, ..., a_qk) = Σ_{i=1}^q a_ik ẽ^i

We can therefore write

    (ST)(e^j) = S(T(e^j)) = S( Σ_{k=1}^m b_kj ē^k ) = Σ_{k=1}^m b_kj S(ē^k)
              = Σ_{k=1}^m b_kj ( Σ_{i=1}^q a_ik ẽ^i ) = Σ_{i=1}^q ( Σ_{k=1}^m a_ik b_kj ) ẽ^i

On the other hand, if C is the q × n matrix associated to the operator ST, then

    (ST)(e^j) = Ce^j = (c_1j, ..., c_qj) = Σ_{i=1}^q c_ij ẽ^i

Therefore, c_ij = Σ_{k=1}^m a_ik b_kj and we conclude that C = AB.

As we saw in Section 15.2.4, the product of matrices is in general not commutative: this,
indeed, reflects the lack of commutativity of the product of linear operators.

15.4 Rank
15.4.1 Linear operators
The kernel, denoted ker T, of an operator T : R^n → R^m is the set

    ker T = {x ∈ R^n : T(x) = 0}                          (15.12)

That is, the kernel of T is the preimage of 0 under T, i.e., ker T = T^{−1}(0). The kernel is
thus the set of the points at which the operator takes on a null value (i.e., the zero vector 0
of R^m). When T is linear, we always have 0 ∈ ker T because T(0) = 0 by Proposition 669.

Example 678 Define T : R^2 → R^2 by

    T(x) = (x_1 − x_2, x_1 + x_2)                          (15.13)

This operator is easily seen to be linear. We have ker T = {0}, i.e., the zero vector is the
only vector where T takes on value 0. Indeed, a vector x = (x_1, x_2) of the plane belongs to
ker T when both the difference and the sum of its components are null, a property that only
the vector 0 satisfies. N

Another important set is the image (or range) of T, which is defined in the usual way as

    Im T = {T(x) : x ∈ R^n} ⊆ R^m                          (15.14)

The image is, therefore, the set of the vectors of R^m that are "reached" from R^n through
the operator T.

For linear operators the above sets turn out to be vector subspaces: the kernel of the
domain R^n and the image of the codomain R^m.

Lemma 679 If T : R^n → R^m is linear, then ker T is a vector subspace of R^n and Im T is a
vector subspace of R^m.

Proof We show the result for ker T, leaving Im T to the reader. Let x, x′ ∈ ker T, i.e.,
T(x) = 0 and T(x′) = 0. For every α, β ∈ R, we have

    T(αx + βx′) = αT(x) + βT(x′) = 0 + 0 = 0

Thus, αx + βx′ ∈ ker T and this proves that ker T is a vector subspace of R^n.

These vector subspaces are important when dealing with the properties of injectivity
and surjectivity of linear operators. In particular, by definition the operator T is surjective
when Im T = R^m, that is, when the subspace Im T coincides with the entire space R^m. As
to injectivity, by exploiting the linearity of T we have the following simple characterization
through a null kernel.

Lemma 680 A linear operator T : R^n → R^m is injective if and only if ker T = {0}.



Thus, the linear operator (15.13) is injective.

Proof "If". Suppose that ker T = {0}. We want to show that T is injective. Let x, y ∈ R^n
with x ≠ y. Since x − y ≠ 0, the hypothesis ker T = {0} implies x − y ∉ ker T, i.e.,
T(x − y) ≠ 0. By the linearity of T, we then have T(x) ≠ T(y).
"Only if". Let T : R^n → R^m be an injective linear operator. We want to show that
ker T = {0}. By Proposition 669, T(0) = 0 and so 0 ∈ ker T. Let 0 ≠ x ∈ R^n. By
injectivity, T(x) ≠ T(0) = 0 and so x ∉ ker T. We conclude that ker T = {0}.

We can now state the important Rank-Nullity Theorem, which says that the dimension
n of the Euclidean space R^n is the sum of the dimensions of the two subspaces ker T and
Im T determined by a linear operator T. To this end, we give a name to such dimensions.

Definition 681 The rank ρ(T) of a linear operator T : R^n → R^m is the dimension of Im T,
while the nullity ν(T) is the dimension of ker T.

Using this terminology, we can now state and prove the result.

Theorem 682 (Rank-Nullity) Given a linear operator T : R^n → R^m, we have

    ρ(T) + ν(T) = n                          (15.15)
Proof Setting ρ(T) = k and ν(T) = h, let {y^i}_{i=1}^k be a basis of the vector subspace Im T
of R^m and {x̄^i}_{i=1}^h a basis of the vector subspace ker T of R^n. Since {y^i}_{i=1}^k ⊆ Im T, by
definition there exist k vectors {x^i}_{i=1}^k in R^n such that T(x^i) = y^i for every i = 1, ..., k. Set

    E = {x^1, ..., x^k, x̄^1, ..., x̄^h}

To prove the theorem it is sufficient to show that E is a basis of R^n. Indeed, in this case E
consists of n vectors and therefore k + h = n.
First of all, we show that the set E is linearly independent. Let {α_1, ..., α_k, β_1, ..., β_h} be
scalars such that

    Σ_{i=1}^k α_i x^i + Σ_{i=1}^h β_i x̄^i = 0                          (15.16)

Since T(0) = 0,⁴ we have

    T( Σ_{i=1}^k α_i x^i + Σ_{i=1}^h β_i x̄^i ) = T( Σ_{i=1}^k α_i x^i ) + T( Σ_{i=1}^h β_i x̄^i ) = 0

On the other hand, since {x̄^i}_{i=1}^h is a basis of ker T, we have T( Σ_{i=1}^h β_i x̄^i ) = 0. Therefore,

    T( Σ_{i=1}^k α_i x^i ) = Σ_{i=1}^k α_i T(x^i) = Σ_{i=1}^k α_i y^i = 0                          (15.17)

Being a basis, {y^i}_{i=1}^k is a linearly independent set, so (15.17) implies α_i = 0 for every
i = 1, ..., k. Therefore, (15.16) reduces to Σ_{i=1}^h β_i x̄^i = 0, which implies β_i = 0 for every
i = 1, ..., h because {x̄^i}_{i=1}^h, as a basis, is a linearly independent set. Thus, we conclude
that the set E is linearly independent.
It remains to show that span E = R^n. Let x ∈ R^n and consider its image T(x). By
definition, T(x) ∈ Im T and therefore, since {y^i}_{i=1}^k is a basis of Im T, there exists a set
{α_i}_{i=1}^k ⊆ R such that T(x) = Σ_{i=1}^k α_i y^i. Since y^i = T(x^i) for every i = 1, ..., k, one
obtains

    T(x) = Σ_{i=1}^k α_i T(x^i) = T( Σ_{i=1}^k α_i x^i )

Therefore, T( x − Σ_{i=1}^k α_i x^i ) = 0, and so x − Σ_{i=1}^k α_i x^i ∈ ker T. On the other hand,
{x̄^i}_{i=1}^h is a basis of ker T, and therefore there exists a set {β_i}_{i=1}^h of scalars such that
x − Σ_{i=1}^k α_i x^i = Σ_{i=1}^h β_i x̄^i. In conclusion, x = Σ_{i=1}^k α_i x^i + Σ_{i=1}^h β_i x̄^i, which shows
that x ∈ span E, as desired.
⁴ In this proof we use two different zero vectors 0: the zero vector 0_{R^m} in R^m and the zero vector 0_{R^n}
in R^n. For simplicity, we omit subscripts as no confusion should arise.

Example 683 Consider the linear function T : R^2 → R given by T(x) = x_1 − x_2. Clearly,
Im T = R and so ρ(T) = 1. By the Rank-Nullity Theorem, ν(T) = 2 − 1 = 1. Indeed,
ker T = {x ∈ R^2 : x_1 = x_2} is the 45 degree line in the plane and so has dimension 1. N
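Numerically, ρ(T) and ν(T) can be computed from the singular values of the representative
matrix: the rank is the number of non-zero singular values, and the nullity is n minus that
number. A sketch (ours, with an arbitrary numerical tolerance) for Example 683:

    import numpy as np

    A = np.array([[1.0, -1.0]])               # 1x2 matrix representing T(x) = x1 - x2
    n = A.shape[1]
    rank = np.linalg.matrix_rank(A)
    s = np.linalg.svd(A, compute_uv=False)    # singular values of A
    nullity = n - np.sum(s > 1e-12)
    print(rank, nullity, rank + nullity)      # 1 1 2, as in (15.15)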

To appreciate the importance of this result, next we present some interesting conse-
quences that it has.

Corollary 684 Let T : R^n → R^m be linear.

(i) If T is injective, then n ≤ m.

(ii) If T is surjective, then n ≥ m.

Proof (i) Let T be injective. By Lemma 680, ker T = {0}. Since Im T is a vector subspace
of R^m, we have ρ(T) = dim(Im T) ≤ dim R^m = m. Therefore, (15.15) reduces to

    n = dim R^n = ρ(T) + dim{0} = ρ(T) ≤ dim R^m = m

(ii) Let T be surjective, i.e., Im T = R^m. Since ν(T) ≥ 0, (15.15) yields

    n = dim R^n = ρ(T) + ν(T) = dim R^m + ν(T) ≥ dim R^m = m

as claimed.

For a generic function, injectivity and surjectivity are altogether distinct and independent
properties: for instance, the function f : R → R given by f(x) = arctan x is injective but not
surjective since Im f = (−π/2, π/2), as seen in Section 6.5.3, while the function f : R → R
given by f(x) = x sin x is surjective but not injective (as its graph vividly shows). The next
important result, a remarkable consequence of the Rank-Nullity Theorem, shows that for
linear operators T : R^n → R^n with the same domain and codomain the two properties turn
out to be, instead, equivalent.⁵
⁵ In Section 14.1.2 we called self-maps the operators T : R^n → R^n.

Corollary 685 A linear operator T : R^n → R^n is injective if and only if it is surjective.

The injective linear operator (15.13) is thus also surjective, so it is bijective.

Proof "If". Let T be surjective, i.e., Im T = R^n. By (15.15),

    n = ρ(T) + ν(T) = n + ν(T)

Hence, ν(T) = 0. In turn, this implies that ker T = {0}. By Lemma 680, T is injective.
"Only if". Let T be injective. By Lemma 680, ker T = {0}. By (15.15),

    n = ρ(T) + 0 = ρ(T)

Hence, ρ(T) = n. As Im T ⊆ R^n, in turn this implies that Im T = R^n. We conclude that T
is surjective.

Remarkably, for a linear operator T : R^n → R^n the following properties are thus equiva-
lent:

(i) T is bijective;

(ii) T is injective, i.e., ker T = {0};

(iii) T is surjective, i.e., Im T = R^n.

An equivalent way to state this result is to say that the following conditions are
equivalent:

(i) T is bijective;

(ii) ν(T) = 0;

(iii) ρ(T) = n.

15.4.2 Rank of matrices
The rank of a matrix is one of the central notions of linear algebra.

Definition 686 The rank of a matrix A, denoted by ρ(A), is the maximum number of its
linearly independent columns.

Example 687 Let

        [ 3  6  18  2 ]
    A = [ 1  2   6  4 ]
        [ 0  1   3  6 ]
        [ 2  1   3  8 ]

Since the third column can be obtained by multiplying the second column by 3, the set of all
four columns is linearly dependent. Therefore, ρ(A) < 4. Instead, it is easy to verify that the
first, second, and fourth columns are linearly independent, like the first, third, and fourth
columns. Thus, ρ(A) = 3. Note that there are two different sets of linearly independent
columns, which have the same cardinality. N
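In practice, ranks are rarely computed by inspecting columns by hand; numerical libraries
do it via matrix factorizations. For instance, a one-line check of this example (ours):

    import numpy as np

    A = np.array([[3, 6, 18, 2], [1, 2, 6, 4], [0, 1, 3, 6], [2, 1, 3, 8]])
    print(np.linalg.matrix_rank(A))    # 3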

N.B. To establish whether k vectors x^1, x^2, ..., x^k ∈ R^n are linearly independent (with
k ≤ n, otherwise the answer is certainly negative) one can construct the n × k matrix that
has these vectors as columns. They are linearly independent if and only if the rank of this
matrix is k. O

Let A be the matrix associated to a linear operator T. Since the vector subspace Im T
is generated by the column vectors of A,⁶ we have ρ(T) ≤ ρ(A) (why?). The next result
shows that, actually, equality holds: the notions of rank for operators and for matrices are
consistent. In other words, the dimension of the image of a linear operator is equal to the
maximum number of linearly independent columns of the matrix associated to it.

Proposition 688 Let A ∈ M(m, n) be the matrix associated to a linear operator T : R^n →
R^m. Then ρ(A) = ρ(T).

Proof Denote ρ(A) = k ≤ n. By (15.11), we have A = [T(e^1), T(e^2), ..., T(e^n)]. Without
loss of generality, suppose that the k linearly independent columns are T(e^1), ..., T(e^k).
The (possible) remaining columns T(e^{k+1}), ..., T(e^n) can therefore be expressed as their
linear combinations, so that

    span{T(e^1), T(e^2), ..., T(e^n)} = span{T(e^1), T(e^2), ..., T(e^k)}

Let y ∈ Im T. By definition, there exists x ∈ R^n such that T(x) = y. Therefore, y = T(x) =
T( Σ_{i=1}^n x_i e^i ) = Σ_{i=1}^n x_i T(e^i). It follows that

    Im T = span{T(e^1), T(e^2), ..., T(e^n)} = span{T(e^1), T(e^2), ..., T(e^k)}

which proves that the set {T(e^1), T(e^2), ..., T(e^k)} is a basis of Im T. Therefore, ρ(T) =
dim(Im T) = k.

Thanks to the Rank-Nullity Theorem, this proposition has the following corollary, which
shows that the linear independence of the columns is the matrix counterpart of injectivity.

Corollary 689 A linear operator T : R^n → R^m, with associated matrix A ∈ M(m, n), is
injective if and only if the columns of A are linearly independent.

Proof By Lemma 680, T is injective if and only if ν(T) = 0. By the Rank-Nullity Theorem,
this happens if and only if ρ(T) = n, i.e., if and only if ρ(A) = n (by Proposition 688).

So far we have considered the linear independence of the columns of A. The connection
with the linear independence of the rows of A is, however, very tight, as the next important
result shows. In reading it, note that the rank of the transpose matrix A^T is the maximum
number of linearly independent rows of A.

Theorem 690 For every matrix A, the maximum numbers of its linearly independent rows
and columns coincide, i.e., ρ(A) = ρ(A^T).
⁶ Indeed, recall that the i-th column of A is T(e^i) and therefore T(x) = T( Σ_{i=1}^n x_i e^i ) = Σ_{i=1}^n x_i T(e^i).
This shows that the image T(x) is a linear combination of the columns of A.

Proof Let A = (a_ij) ∈ M(m, n). In the proof we denote the i-th row by R_i and the j-th
column by C_j. We have to prove that the subspace of R^n generated by the rows of A, called
the row space of A, has the same dimension as the subspace of R^m generated by the columns
of A, called the column space of A. Let r be the dimension of the row space of A, that is,
r = ρ(A^T), and let {x^1, x^2, ..., x^r} ⊆ R^n be a basis of this space, where

    x^i = (x^i_1, x^i_2, ..., x^i_n)    ∀i = 1, 2, ..., r

Each row R_i of A can be written in a unique way as a linear combination of {x^1, x^2, ..., x^r},
that is, there exists a vector of r coefficients (w^i_1, w^i_2, ..., w^i_r) such that

    R_i = w^i_1 x^1 + w^i_2 x^2 + ... + w^i_r x^r    ∀i = 1, 2, ..., m                          (15.18)

Let us concentrate now on the first column of A, i.e., C_1 = (a_11, a_21, ..., a_m1). The first
component a_11 of C_1 is equal to the first component of R_1, the second component a_21 of C_1
is equal to the first component of R_2, and so on until the m-th component a_m1 of C_1, which
is equal to the first component of R_m. Thanks to (15.18), we have

    a_11 = w^1_1 x^1_1 + w^1_2 x^2_1 + ... + w^1_r x^r_1
    a_21 = w^2_1 x^1_1 + w^2_2 x^2_1 + ... + w^2_r x^r_1
    ⋮
    a_m1 = w^m_1 x^1_1 + w^m_2 x^2_1 + ... + w^m_r x^r_1

that is,

          [ a_11 ]         [ w^1_1 ]         [ w^1_2 ]               [ w^1_r ]
    C_1 = [ a_21 ] = x^1_1 [ w^2_1 ] + x^2_1 [ w^2_2 ] + ... + x^r_1 [ w^2_r ]
          [  ... ]         [  ...  ]         [  ...  ]               [  ...  ]
          [ a_m1 ]         [ w^m_1 ]         [ w^m_2 ]               [ w^m_r ]

The column C_1 of A can, therefore, be written as a linear combination of the vectors
w^1, w^2, ..., w^r, where

          [ w^1_1 ]         [ w^1_2 ]                [ w^1_r ]
    w^1 = [ w^2_1 ] , w^2 = [ w^2_2 ] , ... , w^r =  [ w^2_r ]
          [  ...  ]         [  ...  ]                [  ...  ]
          [ w^m_1 ]         [ w^m_2 ]                [ w^m_r ]

In a similar way it is possible to verify that all the n columns of A can be written as linear
combinations of w^1, w^2, ..., w^r. Therefore, the column space of A is generated by the r
vectors w^1, w^2, ..., w^r of R^m, which implies that its dimension ρ(A) is lower than or equal
to r. That is,

    ρ(A) ≤ r = ρ(A^T)

By interchanging the rows and the columns and by repeating the same reasoning, we get

    r = ρ(A^T) ≤ ρ(A)

This concludes the proof.



Example 691 Consider the rows of the matrix

        [ 3  6  18 ]
    A = [ 1  2   6 ]
        [ 0  1   3 ]

Since the first row is obtained by multiplying the second one by 3, the set of all the three
rows is linearly dependent. Therefore, ρ(A^T) < 3. Instead, the two rows (3, 6, 18) and
(0, 1, 3) are linearly independent, like the rows (1, 2, 6) and (0, 1, 3). Therefore, ρ(A^T) = 2.
N

The maximum sets of linearly independent rows or columns can be different: in the
matrix of the last example we have two different sets, both for the rows and for the columns.
Yet, they have the same cardinality because ρ(A) = ρ(A^T). It is a remarkable result that, in
view of Corollary 685, shows that for a linear operator T : R^n → R^n the following conditions
are equivalent:

(i) T is injective;

(ii) T is surjective;

(iii) the columns of A are linearly independent, that is, ρ(A) = n;

(iv) the rows of A are linearly independent, that is, ρ(A^T) = n.

The equivalence of these conditions is one of the deepest results of linear algebra.

O.R. Sometimes one calls rank by rows the maximum number of linearly independent rows,
and rank by columns what we have defined as the rank, that is, the maximum number of
linearly independent columns. According to these definitions, Theorem 690 says that the
rank by columns always coincides with the rank by rows. The rank is their common value. H

15.4.3 Properties
From Theorem 690 it follows that, if A ∈ M(m, n), we have

    ρ(A) ≤ min{m, n}                          (15.19)

If it happens that ρ(A) = min{m, n}, the matrix A is said to be of full (or maximum) rank.
Indeed, the rank cannot assume a higher value.

Note that the rank of a matrix does not change if one permutes the places of two columns.
So, without loss of generality, we can assume that, for a matrix A of rank r, the first r
columns are linearly independent. This useful convention will be used several times in the
proofs below.

The next result gathers some properties of the rank.

Proposition 692 Let A, B ∈ M(m, n). Then

(i) ρ(A + B) ≤ ρ(A) + ρ(B) and ρ(αA) = ρ(A) for every α ≠ 0;

(ii) ρ(A) = ρ(CA) = ρ(AD) = ρ(CAD) if C and D are square matrices of full rank;⁷

(iii) ρ(A) = ρ(A^T A).
Point (i) shows the behavior of the rank with respect to the matrix operations of addition
and scalar multiplication. Points (ii) and (iii) are interesting invariance properties of the
rank with respect to the product of matrices. The square matrix A^T A is important in
applications and is called the Gram matrix (for instance, we will meet it in connection with
the least squares method).

Proof (i) Let r and r′ be the ranks of A and of B: there are r and r′ linearly independent
columns in A and in B, respectively. If r + r′ ≥ n the result is trivial because the number of
columns of A + B is n and there cannot be more than n linearly independent columns.
Let therefore r + r′ < n. We denote by a^s and b^s, with s = 1, ..., n, the generic columns
of the two matrices, so that the s-th column of A + B is a^s + b^s. We can always assume
that the r linearly independent columns of A are the first ones (i.e., a^1, ..., a^r) and that the
r′ linearly independent columns of B are the last ones (i.e., b^{n−r′+1}, ..., b^n). In this way the
n − (r + r′) central columns of A + B (i.e., the a^s + b^s with s = r + 1, ..., n − r′) are certainly
linear combinations of {a^1, ..., a^r, b^{n−r′+1}, ..., b^n}, because the a^s can be written as linear
combinations of {a^1, ..., a^r} and the b^s of {b^{n−r′+1}, ..., b^n}. It follows that the number of
linearly independent columns of A + B cannot exceed r + r′. We leave to the reader the
proof of the rest of the statement.

(ii) Let us prove ρ(A) = ρ(AD), leaving to the reader the proof of ρ(A) = ρ(CA) (the
equality ρ(A) = ρ(CAD) can be obtained immediately from the other two). If A = O,
the result is trivially true. Let therefore A ≠ O and let r be the rank of A; there are therefore
r linearly independent columns: let us call them a^1, a^2, ..., a^r, since we can always suppose
that they are the first r ones; the others, a^{r+1}, a^{r+2}, ..., a^n, are linear combinations of the
first ones. Let us prove, now, that the columns of AD are linear combinations of the columns
of A. To this end, let A = (a_ij) and D = (d_ij). Moreover, let α^i for i = 1, 2, ..., m and a^j for
j = 1, 2, ..., n be the rows and the columns of A, and d^j for j = 1, 2, ..., n be the columns of
D. Then

         [ α^1 ]                          [ α^1 · d^1   α^1 · d^2   ...   α^1 · d^n ]
    AD = [ α^2 ] (d^1 | d^2 | ... | d^n) = [ α^2 · d^1   α^2 · d^2   ...   α^2 · d^n ]
         [ ... ]                          [    ...         ...              ...     ]
         [ α^m ]                          [ α^m · d^1   α^m · d^2   ...   α^m · d^n ]

The first column of AD, denoted by (ad)^1, is

             [ α^1 · d^1 ]   [ a_11 d_11 + a_12 d_21 + ... + a_1n d_n1 ]
    (ad)^1 = [ α^2 · d^1 ] = [ a_21 d_11 + a_22 d_21 + ... + a_2n d_n1 ] = d_11 a^1 + d_21 a^2 + ... + d_n1 a^n
             [    ...    ]   [                  ...                   ]
             [ α^m · d^1 ]   [ a_m1 d_11 + a_m2 d_21 + ... + a_mn d_n1 ]

⁷ Of order m and n, respectively, so that the products CA and AD are well defined.

The first column of AD is, therefore, a linear combination of the columns of A. Analogously,
it is possible to prove that the second column of AD is

    (ad)^2 = d_12 a^1 + d_22 a^2 + ... + d_n2 a^n

and, in general, the j-th column of AD is

    (ad)^j = d_1j a^1 + d_2j a^2 + ... + d_nj a^n    ∀j = 1, 2, ..., n                          (15.20)

Therefore, since each column of AD is a linear combination of the columns of A, the space
generated by the columns of AD is a subspace of R^m of dimension lower than or equal to
that of the space generated by the columns of A. In other words,

    ρ(AD) ≤ ρ(A) = r                          (15.21)

Let us suppose, by contradiction, that ρ(AD) < ρ(A) = r. Then, in the linear combinations
(15.20) one of the first r columns of A always has coefficient zero (otherwise, the column
space of AD would have dimension at least r, a^1, a^2, ..., a^r being linearly independent vectors
of R^m). Without loss of generality, let us suppose that column a^1 is the one having coefficient
zero in all the linear combinations (15.20). Then, we have

    d_11 = d_12 = ... = d_1n = 0

which is a contradiction since D has full rank and so cannot have a row of only zeros.
Therefore, the space generated by the columns of AD has dimension at least r, that is,
ρ(AD) ≥ r. Together with (15.21), this proves the result.

(iii) If A, and therefore A^T, has full rank, the result follows from (ii). Suppose that A
does not have full rank and let ρ(A) = r, with r < min{m, n}. As seen in (ii), the columns
of A^T A are linear combinations of the columns of A^T, and so

    ρ(A^T A) ≤ ρ(A^T) = ρ(A) = r                          (15.22)

By assuming that the first r columns of A are linearly independent, we can write A as

    A = [ B  C ]

with B ∈ M(m, r) of full rank equal to r and C ∈ M(m, n − r). Therefore,

    A^T A = [ B^T ] [ B  C ] = [ B^T B   B^T C ]
            [ C^T ]            [ C^T B   C^T C ]

By property (ii), the submatrix B^T B, which is square of order r, has full rank r. Therefore,
the r columns of B^T B are linearly independent vectors of R^r. Consequently, the first r
columns of A^T A are linearly independent vectors of R^n (otherwise, the r columns of B^T B
would not be linearly independent). The column space of A^T A has dimension at least r,
that is, ρ(A^T A) ≥ r. Together with (15.22), this proves the result.

15.4.4 Gaussian elimination procedure
The Gaussian elimination procedure is an important algorithm for the calculation of the
rank of matrices. Another algorithm, due to Kronecker, will be presented in Section 15.6.7
after having introduced the notion of determinant.
We start with a trivial observation. There are matrices that reveal their rank immediately.
For example, both matrices

    [ 1  0  0  0  0 ]        [ 1  0  0 ]
    [ 0  1  0  0  0 ]   and  [ 0  1  0 ]                          (15.23)
    [ 0  0  1  0  0 ]        [ 0  0  1 ]
                             [ 0  0  0 ]

have rank 3: in the first one the first three columns are linearly independent (they are
the three versors of R^3); in the second one the first three rows are linearly independent.
The matrices (15.23) are a special case of echelon matrices, which are characterized by the
following properties:

(i) the rows with not all elements zero have 1 as their first non-zero component, called the
pivot element, or simply pivot;

(ii) the other elements of the column of a pivot are zero;

(iii) pivots form a "little scale" from the left to the right: the pivot of a lower row is to the
right of the pivot of an upper row;

(iv) the rows with all elements zero (if they exist) lie under the other rows, so in the lower
part of the matrix.

Matrices (15.23) are echelon matrices, and so is the matrix

    [ 1  0  0  0  0 ]
    [ 0  1  0  0  0 ]
    [ 0  0  1  3  0 ]
    [ 0  0  0  0  0 ]

in which the pivots are the leading 1s of the first three rows. Note that a square matrix is an
echelon matrix when it is diagonal, possibly followed by rows of only zeros; for example:

    [ 1  0  0 ]
    [ 0  1  0 ]
    [ 0  0  0 ]

Clearly, the non-zero rows (that is, the rows with at least one non-zero element) are linearly
independent. The rank of an echelon matrix is, therefore, obvious.

Lemma 693 The rank of an echelon matrix is equal to the number of its non-zero rows.

There exist some simple operations that permit to transform any matrix A into an echelon
matrix. Such operations, called elementary operations (by row),⁸ are:

(i) multiplying any row by a non-zero scalar (denoted by E1);

(ii) adding to any row a multiple of any other row (denoted by E2);

(iii) interchanging any two rows (denoted by E3).

The three operations amount to multiplying, on the left, the matrix A ∈ M(m, n) by
suitable m × m square matrices, called elementary. Specifically,

(i) multiplying the s-th row of A by a scalar λ amounts to multiplying, on the left, A by
the elementary matrix P_s(λ) that coincides with the identity matrix I_m except that,
in the place (s, s), we have λ instead of 1;

(ii) adding to the r-th row of A a multiple λ of the s-th row amounts to multiplying, on
the left, A by the elementary matrix S_rs(λ) that coincides with the identity matrix
I_m except that, in the place (r, s), we have λ instead of 0;

(iii) interchanging the r-th row and the s-th row of A amounts to multiplying, on the left,
A by the elementary matrix T_rs that coincides with the identity matrix I_m except that
the r-th row and the s-th row have been interchanged.
⁸ Though we could also define analogous elementary operations by column, we prefer not to do it and to
refer always to the rows in order to avoid any confusion and errors in computations. Choosing the rows over
the columns does not change the results.

Example 694 Let

        [  3  2  4  1 ]
    A = [ −1  0  6  9 ]
        [  5  3  7  4 ]

(i) Multiplying A by

             [ 1  0  0 ]
    P_2(λ) = [ 0  λ  0 ]
             [ 0  0  1 ]

on the left, we get

               [ 1  0  0 ] [  3  2  4  1 ]   [  3   2   4   1 ]
    P_2(λ) A = [ 0  λ  0 ] [ −1  0  6  9 ] = [ −λ   0  6λ  9λ ]
               [ 0  0  1 ] [  5  3  7  4 ]   [  5   3   7   4 ]

in which the second row has been multiplied by λ.

(ii) Multiplying A by

              [ 1  λ  0 ]
    S_12(λ) = [ 0  1  0 ]
              [ 0  0  1 ]

on the left, we get

                [ 1  λ  0 ] [  3  2  4  1 ]   [ 3 − λ   2   4 + 6λ   1 + 9λ ]
    S_12(λ) A = [ 0  1  0 ] [ −1  0  6  9 ] = [  −1     0      6        9   ]
                [ 0  0  1 ] [  5  3  7  4 ]   [   5     3      7        4   ]

in which to the first row one added the second one multiplied by λ.

(iii) Multiplying A by

           [ 0  1  0 ]
    T_12 = [ 1  0  0 ]
           [ 0  0  1 ]

on the left, we get

             [ 0  1  0 ] [  3  2  4  1 ]   [ −1  0  6  9 ]
    T_12 A = [ 1  0  0 ] [ −1  0  6  9 ] = [  3  2  4  1 ]
             [ 0  0  1 ] [  5  3  7  4 ]   [  5  3  7  4 ]

in which the first two rows have been exchanged. N

The next result, the proof of which we omit, shows the uniqueness of the echelon matrix
at which we arrive via elementary operations:

Lemma 695 Each matrix A ∈ M(m, n) is transformed, via elementary operations, into a
unique echelon matrix Ā ∈ M(m, n).

Naturally, different matrices can be transformed into the same echelon matrix. The
sequence of elementary operations that transforms a matrix A into the echelon matrix Ā is
called the Gaussian elimination procedure.

Example 696 Let

        [  3  2  4  1 ]
    A = [ −1  0  6  9 ]
        [  5  3  7  4 ]

We proceed as follows (the sign → means that we pass from a matrix to the next one via an
elementary operation):

        [  3  2  4  1 ]        [  1  2/3  4/3  1/3 ]        [ 1  2/3   4/3   1/3 ]
    A = [ −1  0  6  9 ]  →(1)  [ −1   0    6    9  ]  →(2)  [ 0  2/3  22/3  28/3 ]
        [  5  3  7  4 ]        [  5   3    7    4  ]        [ 5   3     7     4  ]

          [ 1   2/3   4/3   1/3 ]        [ 1    0   −18/3  −27/3 ]
    →(3)  [ 0   2/3  22/3  28/3 ]  →(4)  [ 0   2/3   22/3   28/3 ]
          [ 0  −1/3   1/3   7/3 ]        [ 0  −1/3    1/3    7/3 ]

          [ 1   0     −6    −9  ]        [ 1   0     0     3/2  ]        [ 1   0    0    3/2 ]
    →(5)  [ 0  2/3   22/3  28/3 ]  →(6)  [ 0  2/3  22/3   28/3  ]  →(7)  [ 0  2/3   0   −7/2 ]
          [ 0   0      4     7  ]        [ 0   0     4      7   ]        [ 0   0    4     7  ]

          [ 1  0  0    3/2  ]
    →(8)  [ 0  1  0  −21/4  ]
          [ 0  0  1    7/4  ]

where: (1) multiplication of the first row by 1/3; (2) addition of the first row to the second
one; (3) addition of −5 times the first row to the third one; (4) subtraction of the second
row from the first one; (5) addition of the second row multiplied by 1/2 to the third one; (6)
addition of the third row multiplied by 3/2 to the first one; (7) subtraction of the third row
multiplied by 22/12 from the second one; (8) multiplication of the second row by 3/2 and of
the third one by 1/4. Finally, we get

        [ 1  0  0    3/2  ]
    Ā = [ 0  1  0  −21/4  ]
        [ 0  0  1    7/4  ]

N
Example 697 If A is square of order n, the echelon matrix Ā that the Gaussian elimination procedure yields is square of order n and upper triangular, with diagonal composed of only 1s and 0s. N

Going back to the calculation of the rank, which was the motivation of this section,
Proposition 692 shows that the elementary operations by row do not alter the rank of A
because the elementary matrices are square matrices of full rank. We have therefore:

Proposition 698 For each matrix A we have ρ(A) = ρ(Ā).

To calculate the rank of a matrix one can, therefore, apply Gaussian elimination to obtain an echelon matrix of equal rank, whose rank is evident. For instance, in the last example ρ(Ā) = 3 because all three rows are non-zero. By Proposition 698, we have ρ(A) = 3, so the matrix A has full rank.

15.5 Invertible operators


15.5.1 Invertibility
An injective operator is usually called invertible. An invertible operator T ∈ L(R^n) has, indeed, an inverse operator T⁻¹ : R^n → R^n, whose domain is the entire space R^n because T, being injective, is also surjective by the important Corollary 685.

Lemma 699 If T : R^n → R^n is linear, then its inverse T⁻¹ : R^n → R^n is linear.

This lemma, whose proof is left to the reader, shows that the inverse operator T⁻¹ is a linear operator too, that is, T⁻¹ ∈ L(R^n) (recall that L(R^n) denotes the space of linear operators T : R^n → R^n). Moreover, it is easy to verify that

T⁻¹ T = T T⁻¹ = I     (15.24)

where I is the identity operator.

Example 700 (i) The identity operator I : R^n → R^n is clearly invertible, with I⁻¹ = I.

(ii) Define T : R^2 → R^2 by T(x) = Ax for every x ∈ R^2, where

A = [  1  0 ]
    [ -1  2 ]

The operator T is invertible, as the reader can verify, with T⁻¹(x) = Bx for every x ∈ R^2, where

B = [  1    0  ]
    [ 1/2  1/2 ]

Finding the inverse operator is not an easy task, but it need not be left to guesswork: later in the chapter we will discuss a procedure that allows the computation of B. N
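A quick numerical check of relation (15.24) on the matrices of this example (a sketch of ours, using numpy):

```python
import numpy as np

A = np.array([[1.0, 0.0], [-1.0, 2.0]])   # the matrix of T
B = np.array([[1.0, 0.0], [0.5, 0.5]])    # the matrix of the inverse operator
assert np.allclose(A @ B, np.eye(2)) and np.allclose(B @ A, np.eye(2))
```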

In the last section we saw a first characterization of invertibility through the notions of rank and nullity (Corollary 685). We now give another characterization of invertibility. What is remarkable is that the existence of either a right or a left inverse grants full-fledged invertibility.

Proposition 701 Given an operator T ∈ L(R^n), the following conditions are equivalent:

(i) T is invertible;

(ii) there exists S ∈ L(R^n) such that TS = I;

(iii) there exists R ∈ L(R^n) such that RT = I;

(iv) there exist S, R ∈ L(R^n) such that

TS = RT = I     (15.25)

Moreover, S and R are unique and we have S = R = T⁻¹.

Proof (i) implies (ii). Since T is invertible, it is enough to set S = T⁻¹.

(ii) implies (iii). Assume TS = I. We next show that ker S = ker T = {0}. By contradiction, assume that there exists x ≠ 0 such that S(x) = 0. Since T is linear, we reach the contradiction 0 = T(0) = T(S(x)) = I(x) = x. By Corollary 685, we conclude that ker S = {0} and S is bijective. Since ker S = {0}, we are left to show that ker T = {0}. By contradiction, assume that there exists x ≠ 0 such that T(x) = 0. Since S is bijective, there exists y ∈ R^n such that S(y) = x. Since x ≠ 0 and S is bijective, y must be different from 0. Since TS = I, we have y = T(S(y)) = T(x) = 0, a contradiction. By Corollary 685, we conclude that ker T = {0} and that T is injective. So, T is invertible and it is enough to set R = T⁻¹.

(iii) implies (iv). Assume that there exists R ∈ L(R^n) such that RT = I. We can use the same technique as above, with the role of T played by R and that of S played by T. This allows us to conclude that R is invertible. Since RT = I, this implies that R⁻¹(RT) = R⁻¹, yielding T = R⁻¹. Since R⁻¹ is injective, T is invertible and S and R can be chosen to be T⁻¹.

(iv) implies (i). Since (iv) is stronger than (ii), with the same technique used to prove that (ii) implies (iii) we can show that T is invertible.

Finally, we have shown that if TS = I or RT = I holds, then T is invertible. In the first case, it follows that T⁻¹(TS) = T⁻¹, yielding S = T⁻¹. Similarly, in the second case, (RT)T⁻¹ = T⁻¹, yielding R = T⁻¹.

15.5.2 Inverse matrix


The square matrix A associated to an invertible linear operator T : R^n → R^n is said to be invertible. The matrix associated to the inverse operator T⁻¹ is called the inverse matrix of A and is denoted by A⁻¹.

Example 702 (i) Back to Example 700, we have

A = [  1  0 ]     and     A⁻¹ = [  1    0  ]
    [ -1  2 ]                   [ 1/2  1/2 ]

From (15.24) we have

A⁻¹A = AA⁻¹ = I

(ii) We already remarked that the linear operator T : R^2 → R^2 given by (15.13), i.e.,

T(x) = (x1 - x2, x1 + x2)

is injective. Its matrix is

A = [ 1  -1 ]
    [ 1   1 ]

It can be checked that its inverse matrix is

A⁻¹ = [  1/2  1/2 ]
      [ -1/2  1/2 ]

So, the inverse operator T⁻¹ : R^2 → R^2 is given by

T⁻¹(x) = A⁻¹x = ( x1/2 + x2/2 , -x1/2 + x2/2 )

More generally, in view of Corollary 689, of Theorem 690 and of Proposition 701, we have the following characterization.

Corollary 703 For a square matrix A of order n the following properties are equivalent:

(i) A is invertible;

(ii) the columns of A are linearly independent;

(iii) the rows of A are linearly independent;

(iv) ρ(A) = n;

(v) there exist two square matrices B and C of order n such that AB = CA = I; such matrices are unique, with B = C = A⁻¹.

From this corollary one deduces a noteworthy property of inverse matrices.

Proposition 704 If the square matrices A and B of order n are invertible, then their product is invertible and

(AB)⁻¹ = B⁻¹A⁻¹

Proof Let A and B be of order n and invertible. We have ρ(A) = ρ(B) = n, so that ρ(AB) = n by Proposition 692. By Corollary 703, the matrix AB is invertible. Recall from (6.12) of Section 6.4 that, for the composition of invertible functions f and g, one has (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹. In particular this holds for linear operators, that is, (ST)⁻¹ = T⁻¹S⁻¹, so Proposition 677 implies (AB)⁻¹ = B⁻¹A⁻¹.

So far so good. But, operationally, how do we compute the inverse of an (invertible) matrix A, i.e., how do we find the elements of the inverse matrix A⁻¹? To answer this important question, we must first introduce determinants, the wizards of linear algebra.

15.6 Determinants
15.6.1 Definition
A matrix contained in a matrix A ∈ M(m, n) is called a submatrix of A. It can be thought of as obtained from A by deleting some rows and/or columns. In particular, we denote by Aij the (m-1) × (n-1) submatrix obtained from A by deleting row i and column j.

Example 705 Let

    [ a11 a12 a13 ]   [ 2  1  4 ]
A = [ a21 a22 a23 ] = [ 3  1  0 ]
    [ a31 a32 a33 ]   [ 1  6  3 ]

We have, for example,

A12 = [ a21 a23 ] = [ 3  0 ]      A32 = [ a11 a13 ] = [ 2  4 ]
      [ a31 a33 ]   [ 1  3 ]            [ a21 a23 ]   [ 3  0 ]

A22 = [ a11 a13 ] = [ 2  4 ]      A31 = [ a12 a13 ] = [ 1  4 ]
      [ a31 a33 ]   [ 1  3 ]            [ a22 a23 ]   [ 1  0 ]

N

Through submatrices, we can define in a recursive way the determinants of square matrices (this notion is defined only for them). To ease notation, we denote by M(n), in place of M(n, n), the space of the square matrices of order n.

Definition 706 The determinant is the function det : M(n) → R defined, for every A ∈ M(n), by

(i) det A = a11 if n = 1, A = [a11];

(ii) det A = Σ_{j=1}^n (-1)^{1+j} a1j det A1j if n > 1, A = (aij).
Example 707 If n = 2, the determinant of the matrix

A = [ a11  a12 ]
    [ a21  a22 ]

is

det A = (-1)^{1+1} a11 det([a22]) + (-1)^{1+2} a12 det([a21]) = a11 a22 - a12 a21

For example, if

A = [ 2  4 ]
    [ 1  3 ]

we have det A = 2·3 - 4·1 = 2. N
Example 708 If n = 3, the determinant of the matrix

    [ a11 a12 a13 ]
A = [ a21 a22 a23 ]
    [ a31 a32 a33 ]

is given by

det A = (-1)^{1+1} a11 det A11 + (-1)^{1+2} a12 det A12 + (-1)^{1+3} a13 det A13
      = a11 det A11 - a12 det A12 + a13 det A13
      = a11 (a22 a33 - a23 a32) - a12 (a21 a33 - a23 a31) + a13 (a21 a32 - a22 a31)
      = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 - a11 a23 a32 - a12 a21 a33 - a13 a22 a31

For example, suppose we want to calculate the determinant of the matrix

    [ 2  1  4 ]
A = [ 3  1  0 ]
    [ 1  6  3 ]

Let us first calculate the determinants of the three submatrices A11, A12 and A13. We have

det A11 = 1·3 - 0·6 = 3
det A12 = 3·3 - 0·1 = 9
det A13 = 3·6 - 1·1 = 17

and, therefore,

det A = 2 det A11 - 1 det A12 + 4 det A13 = 2·3 - 1·9 + 4·17 = 65

N

Example 709 For a lower triangular matrix A we have

det A = a11 a22 ⋯ ann

that is, its determinant is simply the product of the elements of the main diagonal. Indeed, all the other products are zero because they necessarily contain a zero element of the first row.
Since det A = det Aᵀ (Proposition 717), a similar result holds for upper triangular matrices, so also for the diagonal ones. N

Example 710 If the matrix A has all the elements of its first row zero except for the first one, which is equal to 1, then

    [ 1    0   ⋯   0  ]
    [ a21 a22  ⋯  a2n ]         [ a22 ⋯ a2n ]
det [  ⋮   ⋮   ⋱   ⋮  ]  =  det [  ⋮  ⋱  ⋮  ]
    [ an1 an2  ⋯  ann ]         [ an2 ⋯ ann ]

That is, the determinant coincides with the determinant of the submatrix A11. Indeed, in

det A = Σ_{j=1}^n (-1)^{1+j} a1j det A1j

all the summands except the first one are zero. More generally, for any scalar k we have

    [ k    0   ⋯   0  ]           [ a22 ⋯ a2n ]
det [ a21 a22  ⋯  a2n ]  =  k det [  ⋮  ⋱  ⋮  ]
    [  ⋮   ⋮   ⋱   ⋮  ]           [ an2 ⋯ ann ]
    [ an1 an2  ⋯  ann ]

Similar properties hold also for the columns. N

The determinant of a square matrix can, therefore, be calculated through a well specified procedure (an algorithm) based on its submatrices. There exist various techniques to simplify the calculation of determinants (we will see some of them shortly) but, for our purposes, it is important to know that they can be calculated through algorithms.
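Definition 706 translates almost verbatim into a (deliberately naive) recursive program. A sketch of ours in plain Python, exact on integer entries:

```python
def det(A):
    """Determinant by expansion along the first row (Definition 706)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # A1j: delete row 1 and column j+1
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[2, 4], [1, 3]]))                   # 2, as in Example 707
print(det([[2, 1, 4], [3, 1, 0], [1, 6, 3]]))  # 65, as in Example 708
```

Note that this recursion performs on the order of n! multiplications, which is why practical computations rely on elimination-based methods instead.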

15.6.2 Geometry
Geometrically, the determinant of a square matrix measures (with a sign!) the "space taken up" by its column vectors. Let us try to explain this, at least in the simplest case. So, let A be the 2 × 2 matrix

A = [ a11  a12 ]
    [ a21  a22 ]

in which we assume that a11 > a12 > 0 and a22 > a21 > 0 (the other possibilities can be similarly studied, as readers can check).

[Figure: the column vectors B = (a11, a21) and C = (a12, a22) span the parallelogram OBGC, with O the origin; the circumscribing rectangle ODEF has base a11 and height a22.]

The determinant of A is the area of the parallelogram OBGC (see the figure), i.e., twice the area of the triangle OBC that is obtained from the two column vectors of A. The area of the triangle OBC can be easily calculated by subtracting from the area of the rectangle ODEF the areas of the three triangles ODB, OCF, and BEC. Since

area ODEF = a11 a22 ,   area ODB = a11 a21 / 2 ,   area OCF = a22 a12 / 2

area BCE = (a11 - a12)(a22 - a21) / 2 = (a11 a22 - a11 a21 - a12 a22 + a12 a21) / 2

one gets

area OBC = a11 a22 - (a11 a21 + a22 a12 + a11 a22 - a11 a21 - a12 a22 + a12 a21) / 2
         = (a11 a22 - a12 a21) / 2

Therefore,

det A = area OBGC = a11 a22 - a12 a21
The reader will immediately realize that:

(i) if we exchange the two columns, the determinant changes only its sign (because the
parallelogram is covered in the opposite direction);

(ii) if the two vectors are proportional, that is, linearly dependent, the determinant is zero (because the parallelogram collapses into a segment).

For example, let

A = [ 6  4 ]
    [ 2  8 ]

One has

area ODEF = 6·8 = 48 ,   area ODB = (6·2)/2 = 6
area OCF = (8·4)/2 = 16 ,   area BCE = ((6-4)(8-2))/2 = 6

and so

area OBC = 48 - 6 - 16 - 6 = 20

We conclude that

det A = area OBGC = 40

For 3 × 3 matrices, the determinant is the volume (with sign) of the hexahedron determined by the three column vectors.
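Numerically, the geometric reading can be checked at once (a numpy sketch of ours, for the example just computed):

```python
import numpy as np

A = np.array([[6.0, 4.0], [2.0, 8.0]])
print(np.linalg.det(A))  # ~40.0: the (signed) area of the parallelogram OBGC
```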

15.6.3 Combinatorics
A permutation of the set of numbers N = {1, 2, ..., n} is any bijection σ : N → N (Appendix B.2). There are n! possible permutations. For example, the permutation

{2, 1, 3, 4, ..., n}     (15.26)

interchanges the first two elements of N and leaves the others unchanged. So, it is represented by the function σ : N → N defined by

σ(k) = { 2  if k = 1
       { 1  if k = 2
       { k  else

Let Π be the set of all the permutations of N. We have an inversion in a permutation σ ∈ Π if, for some k, k′ ∈ N, we have k < k′ and σ(k) > σ(k′). We say that the permutation σ is odd (resp., even) if it features an odd (resp., even) number of inversions. The function sgn : Π → {-1, 1} defined by

sgn σ = { +1  if σ is even
        { -1  if σ is odd

is called parity.10 In particular, an even permutation has parity +1, while an odd permutation has parity -1.

Example 711 (i) The permutation (15.26) is odd because there is only one inversion, with k = 1 and k′ = 2. So, its parity is -1. (ii) The identity permutation σ(k) = k has, clearly, no inversions. So, it is an even permutation, with parity +1. N

10 Though the symbol sgn has already been used for the signum function, no confusion should arise.

Let us go back to determinants. Consider a 2 × 2 matrix A and set N = {1, 2}. In this case Π consists of only two permutations σ and σ′, defined by

σ(k) = { 1 if k = 1        σ′(k) = { 2 if k = 1
       { 2 if k = 2                { 1 if k = 2

In particular, we have sgn σ = +1 and sgn σ′ = -1. Remarkably, we then have

det A = (sgn σ) a1σ(1) a2σ(2) + (sgn σ′) a1σ′(1) a2σ′(2)

Indeed:

(sgn σ) a1σ(1) a2σ(2) + (sgn σ′) a1σ′(1) a2σ′(2) = a11 a22 - a12 a21

The next result shows that this remarkable fact is true in general, thus providing an important combinatorial characterization of determinants (we omit the proof).

Theorem 712 We have

det A = Σ_{σ∈Π} (sgn σ) ∏_{i=1}^n a_{iσ(i)}     (15.27)

for every square matrix A = (aij) of order n.

Note that each term in the sum (15.27) contains only one element of each row and only one element of each column. This will be crucial in the proofs of the next section.
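Formula (15.27) can also be checked by brute force. A sketch of ours using itertools, with parity computed by counting inversions as defined above:

```python
from itertools import permutations

def det_by_permutations(A):
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        # count the inversions of sigma to get its parity
        inversions = sum(1 for k in range(n) for q in range(k + 1, n)
                         if sigma[k] > sigma[q])
        sign = -1 if inversions % 2 else 1
        prod = 1
        for i in range(n):
            prod *= A[i][sigma[i]]
        total += sign * prod
    return total

print(det_by_permutations([[2, 1, 4], [3, 1, 0], [1, 6, 3]]))  # 65, as before
```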

15.6.4 Properties
The next proposition collects some of the main properties of determinants, which are also useful for their computation. In the statement, "line" stands for either row or column: the properties hold, indeed, symmetrically for both the rows and the columns of the matrix. "Parallel lines" means two rows or two columns.

Proposition 713 Let A and B be two square matrices of the same order. Then:

(i) If a line of A is zero, then det A = 0.

(ii) If B is obtained from A by multiplying a line by a scalar k, then det B = k det A.

(iii) If B is obtained from A by interchanging two parallel lines, then det B = -det A.

(iv) If two parallel lines of A are equal, then det A = 0.

(v) If a line of A is the sum of two vectors b and c, then det A is the sum of the determinants of the two matrices that are obtained by taking that line equal first to b and then to c.

(vi) If B is obtained from A by adding to a line a multiple of a parallel line, then det B =
det A.
15.6. DETERMINANTS 501

Proof The proof relies on the combinatorial characterization of the determinant established in Theorem 712, in particular on the observation that each term that appears in the determinant contains exactly one element of each row and one element of each column. In the proof we only consider rows (similar arguments hold for the columns).
(i) In all the products that constitute the determinant, there appears one element of each row: if a row is zero, all the products are then zero.
(ii) For the same reason, all the products turn out to be multiplied by k.
(iii) By exchanging two rows, all the even permutations become odd and vice versa. Therefore, the determinant changes sign.
(iv) Let A be the matrix that has rows i and j equal and let Aij be the matrix A with such rows interchanged. By (iii), we have det Aij = -det A. Nevertheless, since the two interchanged rows are equal, we have A = Aij. So, det Aij = det A. This is possible if and only if det Aij = det A = 0.
(v) Suppose

    [ a^1 ]   [ a^1 ]
    [ a^2 ]   [ a^2 ]
A = [  ⋮  ] = [  ⋮  ]
    [ a^r ]   [ b+c ]
    [  ⋮  ]   [  ⋮  ]
    [ a^m ]   [ a^m ]

and let

     [ a^1 ]             [ a^1 ]
     [ a^2 ]             [ a^2 ]
Ab = [  ⋮  ]   and  Ac = [  ⋮  ]
     [  b  ]             [  c  ]
     [  ⋮  ]             [  ⋮  ]
     [ a^m ]             [ a^m ]

be the two matrices obtained by taking as r-th row b and c, respectively. Then

det A = Σ_{σ∈Π} (sgn σ) ∏_{i=1}^n a_{iσ(i)} = Σ_{σ∈Π} (sgn σ) ( ∏_{i≠r} a_{iσ(i)} ) (b+c)_{σ(r)}
      = Σ_{σ∈Π} (sgn σ) ( ∏_{i≠r} a_{iσ(i)} ) b_{σ(r)} + Σ_{σ∈Π} (sgn σ) ( ∏_{i≠r} a_{iσ(i)} ) c_{σ(r)}
      = det Ab + det Ac

which completes the proof of this point.


(vi) Let

    [ a^1 ]
A = [ a^2 ]
    [  ⋮  ]
    [ a^m ]

The matrix obtained from A by adding, for example, k times the first row to the second one, is

    [     a^1    ]
B = [ a^2 + ka^1 ]
    [      ⋮     ]
    [     a^m    ]

Moreover, let

    [ a^1  ]             [ a^1 ]
C = [ ka^1 ]   and  D =  [ a^1 ]
    [  ⋮   ]             [  ⋮  ]
    [ a^m  ]             [ a^m ]

By (v), det B = det A + det C. On the other hand, by (ii) we have det C = k det D. But, since D has two equal rows, by (iv) we have det D = 0. We conclude that det B = det A.

An important operational consequence of this proposition is that now we can say how the elementary operations E1-E3, which characterize the Gaussian elimination procedure, modify the determinant of A. Specifically:

E1: if B is obtained from A by multiplying a row of the matrix A by a constant λ ≠ 0, then det B = λ det A by Proposition 713-(ii);

E2: if B is obtained from A by adding to a row of A a multiple of another row, then det B = det A by Proposition 713-(vi);

E3: if B is obtained from A by exchanging two rows of A, then det B = -det A by Proposition 713-(iii).

In particular, if the matrix B is obtained from A via elementary operations, we have

det A ≠ 0  <=>  det B ≠ 0     (15.28)

or, equivalently, det A = 0 if and only if det B = 0. This observation leads to the following important characterization of square matrices of full rank.

Proposition 714 A square matrix A has full rank if and only if det A ≠ 0.

Proof "Only if". If A has full rank, its rows are linearly independent (Corollary 703). By Lemma 695 and Proposition 698, A can then be transformed via elementary operations into a unique echelon square matrix of full rank, that is, the identity matrix In. By (15.28), we conclude that det A ≠ 0.
"If". Let det A ≠ 0. Suppose, by contradiction, that A does not have full rank. Then its rows are not linearly independent (Corollary 703), so at least one of them is a linear combination of the others. Such a row can be reduced to zero by repeatedly adding to it carefully chosen multiples of the other rows. Denote by B the matrix so transformed. By Proposition 713-(i), det B = 0, so by (15.28) we have det A = 0, a contradiction. We conclude that A has full rank.

Corollary 703 and the previous result jointly imply the following important result.

Corollary 715 For a square matrix A the following conditions are equivalent:

(i) the rows are linearly independent;

(ii) the columns are linearly independent;

(iii) det A ≠ 0.

The determinants behave well with respect to the product, as the next result shows. It
is a key property of determinants.

Theorem 716 (Binet) If A and B are two square matrices of the same order n, then
det AB = det A det B.

So, determinants commute: det AB = det BA. This is a first interesting consequence of Binet's Theorem. Since I = A⁻¹A, another interesting consequence of this result is that

det A⁻¹ = 1 / det A

when A is invertible. Indeed, 1 = det I = det(A⁻¹A) = det A⁻¹ det A. In particular, this formula implies that det A ≠ 0 if and only if det A⁻¹ ≠ 0.

Proof If (at least) one of the two matrices has linearly dependent rows or columns, then the
statement is trivially true since the columns of AB are linear combinations of the columns
of A and the rows of AB are linear combinations of the rows of B, hence in both cases AB
has also linearly dependent rows or columns, so det AB = 0 = det A det B.
Suppose, therefore, that both A and B have full rank. Suppose first that the matrix A is diagonal. If so, det A = a11 a22 ⋯ ann. Moreover, we have

     [ a11  0   ⋯   0  ] [ b11 b12 ⋯ b1n ]   [ a11 b11  a11 b12  ⋯  a11 b1n ]
AB = [  0  a22  ⋯   0  ] [ b21 b22 ⋯ b2n ] = [ a22 b21  a22 b22  ⋯  a22 b2n ]
     [  ⋮   ⋮   ⋱   ⋮  ] [  ⋮   ⋮  ⋱  ⋮  ]   [    ⋮        ⋮     ⋱     ⋮    ]
     [  0   0   ⋯  ann ] [ bn1 bn2 ⋯ bnn ]   [ ann bn1  ann bn2  ⋯  ann bnn ]

By Proposition 713-(ii),

det AB = a11 a22 ⋯ ann det B = det A det B

proving the result in this case.


If A is not diagonal, we can transform it into a diagonal matrix by suitably applying the elementary operations E2 and E3. As we have seen, such operations are equivalent to multiplying A on the left by square matrices Srs(λ) and Trs, respectively. Let us agree to perform first the transformations T and then the transformations S(λ). Let us suppose, moreover, that the diagonalization requires h transformations S(λ) and k transformations T. If D is the diagonal matrix obtained in this way, we then have

D = S(λ) S(λ) ⋯ S(λ) T T ⋯ T A

with h factors of type S(λ) and k factors of type T. Since D is diagonal, we know that

det DB = det D det B

Since D is obtained from A through h elementary operations that do not modify the determinant and k elementary operations that only change its sign, we have det D = (-1)^k det A. Therefore,

det DB = (-1)^k det A det B     (15.29)

Analogously, since the product of matrices is associative, we have

DB = (S(λ) ⋯ S(λ) T ⋯ T A) B = (S(λ) ⋯ S(λ) T ⋯ T)(AB)

Therefore, DB is obtained from AB via h elementary operations that do not modify the determinant and k elementary operations that only change its sign. So, as before, we have

det DB = (-1)^k det AB     (15.30)

Putting together (15.29) and (15.30), we get det AB = det A det B, as desired.
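Binet's Theorem and the formula det A⁻¹ = 1/det A lend themselves to a quick randomized sanity check (a sketch of ours; any non-singular random matrices will do):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.random((4, 4)), rng.random((4, 4))
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
assert np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A))
```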

We close with a couple of other important properties of determinants.

Proposition 717 Let A be a square matrix of order n. Then:

(i) det A = det Aᵀ.

(ii) det(λA) = λ^n det A for all λ ∈ R.

Point (ii) yields the important formula

det(-A) = (-1)^n det A

when λ = -1.

Proof (i) Transposition does not alter any of the n! products in the sum (15.27), nor their parities. (ii) Let λ ∈ R. First, observe that λA = (λI)A. By Binet's Theorem, det(λA) = det((λI)A) = det(λI) det A = λ^n det A.

15.6.5 Laplace's Theorem


Let A be a square matrix of order n. The algebraic complement (or cofactor) of aij, denoted by a*ij, is the number

a*ij = (-1)^{i+j} det Aij

The matrix of algebraic complements (or cofactor matrix) of A, denoted by A*, is the matrix whose elements are the algebraic complements of the elements of A, that is,

A* = (a*ij)

with i, j = 1, 2, ..., n. The transpose (A*)ᵀ is sometimes called the (classical) adjoint matrix.

Example 718 Let

    [  1  3  0 ]
A = [  5 -1  2 ]
    [ -3  6  4 ]

For a11 = 1, we have

A11 = [ -1  2 ]   and   det A11 = -16
      [  6  4 ]

Therefore, a*11 = (-1)^{1+1} (-16) = -16.

For a12 = 3, we have

A12 = [  5  2 ]   and   det A12 = 26
      [ -3  4 ]

Therefore, a*12 = (-1)^{1+2} 26 = -26.

For a13 = 0, we have

A13 = [  5 -1 ]   and   det A13 = 27
      [ -3  6 ]

Therefore, a*13 = (-1)^{1+3} 27 = 27.

Similarly,

a*21 = (-1)^{2+1} 12 = -12 ,   a*22 = (-1)^{2+2} 4 = 4 ,   a*23 = (-1)^{2+3} 15 = -15
a*31 = (-1)^{3+1} 6 = 6 ,   a*32 = (-1)^{3+2} 2 = -2 ,   a*33 = (-1)^{3+3} (-16) = -16

We conclude that

     [ -16  -26   27 ]
A* = [ -12    4  -15 ]
     [   6   -2  -16 ]

N

Using the notion of algebraic complement, the definition of the determinant of a square matrix (Definition 706) can be viewed as the sum of the products of the elements of the first row by their algebraic complements, that is,

det A = Σ_{j=1}^n a1j a*1j

The next result shows that, actually, there is nothing special about the first row: the determinant can be computed using any row or column of the matrix. The choice of which one to use is then just a matter of analytical convenience.

Proposition 719 The determinant of a square matrix A is equal to the sum of the products of the elements of any line (row or column) by their algebraic complements.

In symbols, choosing the row i,

det A = Σ_{j=1}^n aij a*ij

or, choosing the column j,

det A = Σ_{i=1}^n aij a*ij

Proof For the first row, the result is just a rephrasing of the definition of determinant. Let us verify it for the i-th row. By points (ii) and (v) of Proposition 713 we can rewrite det A in the following way:

det A = ai1 det C1 + ⋯ + aij det Cj + ⋯ + ain det Cn     (15.31)

where Cj denotes the matrix obtained from A by replacing its i-th row with the canonical vector that has 1 in the j-th place and 0 elsewhere:

     [ a11 ⋯ a1j ⋯ a1n ]
     [  ⋮     ⋮     ⋮  ]
Cj = [  0  ⋯  1  ⋯  0  ]     (15.32)
     [  ⋮     ⋮     ⋮  ]
     [ an1 ⋯ anj ⋯ ann ]

Let us calculate the determinant of the matrix (15.32) relative to the term (i, j). Note that, to be able to apply the definition of the determinant and to use the notion of algebraic complement, it is necessary to bring the i-th row to the top and the j-th column to the left, i.e., to transform the matrix (15.32) into a matrix Ã that has (1, 0, ..., 0) as first row, (1, a1j, a2j, ..., a_{i-1,j}, a_{i+1,j}, ..., anj) as first column and Aij as the (n-1) × (n-1) South-East submatrix:

    [     1         0     ⋯      0           0      ⋯     0     ]
    [    a1j       a11    ⋯   a1,j-1      a1,j+1    ⋯    a1n    ]
    [     ⋮         ⋮             ⋮           ⋮            ⋮     ]
Ã = [ a_{i-1,j}  a_{i-1,1} ⋯ a_{i-1,j-1} a_{i-1,j+1} ⋯ a_{i-1,n} ]
    [ a_{i+1,j}  a_{i+1,1} ⋯ a_{i+1,j-1} a_{i+1,j+1} ⋯ a_{i+1,n} ]
    [     ⋮         ⋮             ⋮           ⋮            ⋮     ]
    [    anj       an1    ⋯   an,j-1      an,j+1    ⋯    ann    ]

The transformation requires i-1 exchanges of adjacent rows to bring the i-th row to the top, and j-1 exchanges of adjacent columns to bring the j-th column to the left (leaving the order of the other rows and columns unchanged). Clearly, we have

det Ã = 1 · det Aij

and so

det Cj = (-1)^{(i-1)+(j-1)} det Ã = (-1)^{i+j} det Aij = a*ij     (15.33)

By applying formula (15.31) and using (15.33) we complete the proof.
Example 720 Let

    [  1  3  4 ]
A = [ -2  0  2 ]
    [  1  3 -1 ]

By Proposition 719, we can compute the determinant using any line. It is, however, simpler to compute it using the second row, because it contains a zero, a feature that facilitates the algebra. Indeed,

det A = a21 a*21 + a22 a*22 + a23 a*23

      = (-2)(-1)^{2+1} det [ 3  4 ] + 0 + (2)(-1)^{2+3} det [ 1  3 ]
                           [ 3 -1 ]                         [ 1  3 ]

      = (-2)(-1)(-15) + 0 + (2)(-1)(0) = -30

N

The next result completes Proposition 719 by showing what happens if we use the algebraic complements of a different row (or column).

Proposition 721 The sum of the products of the elements of any row (column) by the algebraic complements of a different row (column) is zero.

In symbols, choosing the row i,

Σ_{j=1}^n aij a*qj = 0   for every q ≠ i

or, choosing the column j,

Σ_{i=1}^n aij a*iq = 0   for every q ≠ j
Proof Replace the q-th row of A by its i-th row, leaving the rest of the matrix unchanged, and expand the determinant of the resulting matrix along its q-th row: since the cofactors a*qj do not involve the q-th row, this expansion equals Σ_{j=1}^n aij a*qj. But, on the other hand, this determinant is zero because the matrix has two equal rows.

Example 722 Let

    [ -1  0 -2 ]
A = [  2  1  3 ]
    [ -2  4 -1 ]

Then

a*11 = (-1)^{1+1} (-13) = -13 ,   a*12 = (-1)^{1+2} 4 = -4 ,   a*13 = (-1)^{1+3} 10 = 10
a*21 = (-1)^{2+1} 8 = -8 ,   a*22 = (-1)^{2+2} (-3) = -3 ,   a*23 = (-1)^{2+3} (-4) = 4
a*31 = (-1)^{3+1} 2 = 2 ,   a*32 = (-1)^{3+2} 1 = -1 ,   a*33 = (-1)^{3+3} (-1) = -1

Let us add the products of the elements of the second row by the algebraic complements of the first row:

2 a*11 + a*12 + 3 a*13 = -26 - 4 + 30 = 0

Now, let us add the products of the elements of the second row by the algebraic complements of the third row:

2 a*31 + a*32 + 3 a*33 = 4 - 1 - 3 = 0

The reader can verify that, in accordance with the last result, we get 0 in all the cases in which we add the products of the elements of a row by the algebraic complements of a different row. N

The last two results are summarized in the famous, all-inclusive, Laplace's Theorem:

Theorem 723 (Laplace) Let A be a square matrix of order n. Then:

(i) choosing the row i,

Σ_{j=1}^n aij a*qj = { det A  if q = i
                     { 0      if q ≠ i

(ii) choosing the column j,

Σ_{i=1}^n aij a*iq = { det A  if q = j
                     { 0      if q ≠ j

Laplace's Theorem is the occasion to introduce the Kronecker delta, the function δ : N × N → {0, 1} defined by

δij = { 1  if i = j
      { 0  if i ≠ j

Here i and j are, thus, any two natural numbers; for instance, δ11 = δ33 = 1 and δ13 = δ31 = 0. Using this function, points (i) and (ii) of Laplace's Theorem assume the following elegant forms:

Σ_{j=1}^n aij a*qj = δiq det A

and

Σ_{i=1}^n aij a*iq = δjq det A

as the reader may verify.
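Both parts of Laplace's Theorem are compactly equivalent to the matrix identity A(A*)ᵀ = (A*)ᵀA = (det A) I, which can be verified mechanically; a sympy sketch of ours, on the matrix of Example 722:

```python
from sympy import Matrix, eye

A = Matrix([[-1, 0, -2], [2, 1, 3], [-2, 4, -1]])
C = A.cofactor_matrix()              # the matrix A* of algebraic complements
assert A * C.T == A.det() * eye(3)   # rows:    sum_j a_ij a*_qj = delta_iq det A
assert C.T * A == A.det() * eye(3)   # columns: sum_i a_ij a*_iq = delta_jq det A
```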

15.6.6 Inverses and determinants


Let us go back to inverse matrices. The next result shows the importance of determinants in their calculation.

Theorem 724 A square matrix A is invertible if and only if det A ≠ 0. In this case,

A⁻¹ = (1 / det A) (A*)ᵀ

Thus, the elements a⁻¹ij of the inverse matrix A⁻¹ are

a⁻¹ij = a*ji / det A = (-1)^{i+j} det Aji / det A     (15.34)
A (square) matrix A for which det A = 0 is called singular. With this terminology, the
theorem says that a matrix is invertible if and only if it is not singular. By Corollary 703,
the following properties are therefore equivalent:

(i) A is invertible;

(ii) det A ≠ 0, that is, A is not singular;



(iii) the columns of A are linearly independent;

(iv) the rows of A are linearly independent;

(v) ρ(A) = n.

Proof of Theorem 724 Write A = (aij) and A* = (a*ij), and denote by a^i the i-th row of A and by (a*)^q the q-th row of A*. The place (i, q) of the product A(A*)ᵀ is the inner product a^i · (a*)^q. By Laplace's Theorem,

a^i · (a*)^q = Σ_{j=1}^n aij a*qj = { det A  if i = q
                                    { 0      if i ≠ q

Analogously, the place (i, q) in the product (A*)ᵀA is

(a*)^i_C · a^q_C = Σ_{j=1}^n a*ji ajq = { det A  if i = q
                                        { 0      if i ≠ q

where (a*)^i_C is the i-th column of A* and a^q_C is the q-th column of A. Therefore,

                  [ det A    0    ⋯    0   ]
A(A*)ᵀ = (A*)ᵀA = [   0    det A  ⋯    0   ] = (det A) In
                  [   ⋮      ⋮    ⋱    ⋮   ]
                  [   0      0    ⋯  det A ]

That is,

A [ (1/det A)(A*)ᵀ ] = [ (1/det A)(A*)ᵀ ] A = In

which allows us to conclude that

A⁻¹ = (1 / det A) (A*)ᵀ

as desired.

This last theorem is important because, through determinants, it provides an algorithm that allows us both to verify the invertibility of A and to compute the elements of the inverse A⁻¹. Note that in formula (15.34) the subscript of Aji is exactly ji and not ij.

Example 725 We use formula (15.34) to calculate the inverse of the matrix

A = [ 1  2 ]
    [ 3  5 ]

We have

a⁻¹11 = (-1)^{1+1} det A11 / det A = a22 / (a11 a22 - a12 a21) = 5 / (-1) = -5
a⁻¹12 = (-1)^{1+2} det A21 / det A = -a12 / (a11 a22 - a12 a21) = -2 / (-1) = 2
a⁻¹21 = (-1)^{2+1} det A12 / det A = -a21 / (a11 a22 - a12 a21) = -3 / (-1) = 3
a⁻¹22 = (-1)^{2+2} det A22 / det A = a11 / (a11 a22 - a12 a21) = 1 / (-1) = -1

So,

A⁻¹ = [  a22/det A  -a12/det A ] = [ -5   2 ]
      [ -a21/det A   a11/det A ]   [  3  -1 ]

N

Example 726 A diagonal matrix A is invertible if no element of the diagonal is zero. In this case the inverse A⁻¹ is diagonal and formula (15.34) implies that

a⁻¹ij = { 1/aii  if i = j
        { 0      if i ≠ j

N

Example 727 For the matrix

    [  1  3  0 ]
A = [  5 -1  2 ]
    [ -3  6  4 ]

we saw that

     [ -16  -26   27 ]
A* = [ -12    4  -15 ]
     [   6   -2  -16 ]

Therefore,

        [ -16  -12    6 ]
(A*)ᵀ = [ -26    4   -2 ]
        [  27  -15  -16 ]

Also det A = -94 and so

A⁻¹ = (1/det A)(A*)ᵀ = -(1/94) [ -16  -12    6 ]   [   8/47    6/47  -3/47 ]
                               [ -26    4   -2 ] = [  13/47   -2/47   1/47 ]
                               [  27  -15  -16 ]   [ -27/94   15/94   8/47 ]

N
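In code, Theorem 724 is one line: the adjugate (the transposed cofactor matrix) divided by the determinant. A sympy sketch of ours, on the matrix of this example:

```python
from sympy import Matrix

A = Matrix([[1, 3, 0], [5, -1, 2], [-3, 6, 4]])
print(A.det())                           # -94
print(A.adjugate() / A.det())            # the inverse computed above, exactly
assert A.adjugate() / A.det() == A.inv()
```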

15.6.7 Ranks and determinants: Kronecker's Algorithm


There is an interesting determinant angle on the rank of a matrix. To see it, recall some results proved previously:

1. if the rank of a matrix is r, it contains at most r linearly independent columns (so, also at most r linearly independent rows);

2. r vectors x^1, x^2, ..., x^r of R^r are linearly independent if and only if the determinant of the square matrix of order r that has them as row (or column) vectors is non-zero;

3. if r vectors x^1, x^2, ..., x^r are linearly independent in R^r, then the r vectors y^1, y^2, ..., y^r of R^n, with n > r, that have exactly x^1, x^2, ..., x^r as their first r components are linearly independent in R^n.11

These basic results underlie (why?) the following result.

Proposition 728 A matrix A has rank r if and only if it has a square submatrix of order r with a non-zero determinant and all its square submatrices of order r + 1 (if any exist) have a zero determinant.

We call minor of order r the determinant of a square submatrix of A of order r. Under


this standard terminology, we can say that the rank of a matrix is the maximum order of its
non-zero minors.

Example 729 Let12

    [ 1  0  2  1 ]
A = [ 0  1  1  1 ]
    [ 2  0  4  2 ]

All its minors of order 3 are zero:

    [ 1  0  2 ]       [ 1  0  1 ]       [ 0  2  1 ]       [ 1  2  1 ]
det [ 0  1  1 ] = det [ 0  1  1 ] = det [ 1  1  1 ] = det [ 0  1  1 ] = 0
    [ 2  0  4 ]       [ 2  0  2 ]       [ 0  4  2 ]       [ 2  4  2 ]

On the other hand, we have the following non-zero minor of order 2:

det [ 1  0 ] = 1
    [ 0  1 ]

By the last proposition, the rank of A is 2. N

This result sheds some further light on the nature of the rank of a matrix, but the next
version is more useful to compute it as it involves fewer minors.
11 The property is easy to verify and has already been used in the proof of Proposition 692.
12 This example is from Mirsky (1955), p. 136.

Proposition 730 A matrix A has rank r if and only if it has a square submatrix B of order r with non-zero determinant and such that each square submatrix C of order r + 1 (if any exist) that contains B has a zero determinant.

The Kronecker Algorithm uses this result to compute the rank of a matrix through determinants. It proceeds as follows.

(i) We choose as "leader" a square submatrix B of order r of A that is readily seen to be non-singular; pragmatically, we often take a submatrix of order 2.

(ii) We "border" in all possible ways the submatrix B with one of the surviving rows and one of the surviving columns. If all these submatrices C of order r + 1 have a zero determinant, the rank of A is r and the procedure ends. Otherwise, if we run into a submatrix C with non-zero determinant, we start again by taking it as the new "leader".

The Kronecker Algorithm is best described using minors. To this end, given a matrix A,
we call bordered minor of a square submatrix B of order r the determinant of a submatrix
C of A of order r + 1 that contains B.

Example 731 Let

    [ 6   3  9    0 ]
A = [ 4   1  7    2 ]
    [ 8  10  6  -12 ]

Let us choose as leader the minor

det [ 6  3 ] = -6 ≠ 0
    [ 4  1 ]

With the last two columns and the last unused row, we obtain the following bordered minors:

    [ 6   3  9 ]            [ 6   3    0 ]
det [ 4   1  7 ] = 0 ,  det [ 4   1    2 ] = 0
    [ 8  10  6 ]            [ 8  10  -12 ]

So, the rank of A is 2. N
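A rough sketch of ours for the bordering step of Kronecker's Algorithm with a fixed leader, using sympy submatrices with 0-based indices:

```python
from itertools import product
from sympy import Matrix

A = Matrix([[6, 3, 9, 0], [4, 1, 7, 2], [8, 10, 6, -12]])
rows, cols = [0, 1], [0, 1]              # the leader, with minor -6 != 0
assert A[rows, cols].det() != 0
borders = [A[rows + [i], cols + [j]].det()
           for i, j in product(range(A.rows), range(A.cols))
           if i not in rows and j not in cols]
print(borders)                           # [0, 0]: all bordered minors vanish, rank 2
```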

15.6.8 Summing up
We conclude this section by noting how the rank of a matrix is simultaneously many things, each one of them being a possible definition of it. Indeed, it is:

(i) the maximum number of its linearly independent columns;

(ii) the maximum number of its linearly independent rows;

(iii) the maximum order of its non-zero minors;

(iv) the dimension of the image of the linear operator that the matrix determines.

The rank is a multi-faceted notion that plays a key role in linear algebra and its many applications. Operationally, the Gaussian elimination procedure and Kronecker's Algorithm make it possible to compute it.

15.7 Square linear systems


Using inverse matrices we can give a procedure for solving "square" linear systems of equations, i.e., systems of n equations in n unknowns:

{ a11 x1 + a12 x2 + ⋯ + a1n xn = b1
{ a21 x1 + a22 x2 + ⋯ + a2n xn = b2
{ ⋮
{ an1 x1 + an2 x2 + ⋯ + ann xn = bn

In matrix form:

A x = b     (15.35)

where A is a square n × n matrix, while x and b are vectors in R^n. We ask two questions concerning the system (15.35):

Existence: which conditions ensure that the system has a solution for every vector b ∈ R^n, that is, that for any given b ∈ R^n there exists a vector x ∈ R^n such that Ax = b?

Uniqueness: which conditions ensure that such a solution is unique, that is, that for any given b ∈ R^n there exists a unique x ∈ R^n such that Ax = b?

To frame the problem within what we have studied so far, consider the linear operator T : R^n → R^n associated to A, defined by T(x) = Ax for every x ∈ R^n. The system (15.35) can be written in functional form as

T(x) = b

So, it is immediate that:

- the system admits a solution for a given b ∈ R^n if and only if b ∈ Im T; in particular, the system admits a solution for every b ∈ R^n if and only if T is surjective, that is, Im T = R^n;

- the system admits a unique solution for a given b ∈ R^n if and only if the preimage T⁻¹(b) is a singleton; in particular, the system admits a unique solution for every b ∈ R^n if and only if T is injective.13

Since injectivity and surjectivity are, by Corollary 685, equivalent properties for linear operators from R^n into R^n, the two problems of existence and uniqueness are equivalent: there exists a solution of the system (15.35) for every b ∈ R^n if and only if such a solution is unique.
In particular, a necessary and sufficient condition for such a unique solution to exist for every b ∈ R^n is that the operator T is invertible, i.e., that one of the following equivalent conditions holds:

(i) the matrix A is invertible;

13 Recall that a function is injective if and only if all its preimages are singletons.

(ii) the matrix A is non-singular, i.e., det A ≠ 0;

(iii) the matrix A is of full rank, i.e., ρ(A) = n.

The condition required is, therefore, the invertibility of the matrix A, or one of the equivalent properties (ii) and (iii). This is the content of Cramer's Theorem, which thus follows easily from what we have learned so far.

Theorem 732 (Cramer) Let A be a square matrix of order n. The system (15.35) has one, and only one, solution for every b ∈ R^n if and only if the matrix A is invertible. In this case, the solution is given by

x = A⁻¹ b

Proof "If". Let A be invertible. The associated linear operator T : R^n → R^n is invertible, so both surjective and injective. Since T is surjective, the system has a solution. Since T is injective, this solution is unique. In particular, the solution that corresponds to a given b ∈ R^n is T⁻¹(b). Since T⁻¹(y) = A⁻¹y for every y ∈ R^n, it follows that the solution is T⁻¹(b) = A⁻¹b.14
"Only if". Assume that the system (15.35) admits one and only one solution for every b ∈ R^n. This means that, for every vector b ∈ R^n, there exists only one vector x ∈ R^n such that T(x) = b. Hence, the operator T is bijective, so invertible. It follows that A is invertible as well.

Thus, the system (15.35) admits a solution for every b if and only if the matrix A is invertible and, even more importantly, the unique solution is expressed in terms of the inverse matrix A⁻¹. Since we are able to calculate A⁻¹ using determinants (Theorem 724), we have obtained a procedure for solving linear systems of n equations in n unknowns: the formula x = A⁻¹b can indeed be written as

x = (1 / det A) (A*)ᵀ b     (15.36)

Using Laplace's Theorem, it is easy to show that formula (15.36), called Cramer's rule, can be written in detail as:

    [ det A1 / det A ]
x = [ det A2 / det A ]     (15.37)
    [        ⋮       ]
    [ det An / det A ]

where Ak denotes the matrix obtained by replacing the k-th column of the matrix A with the column vector

    [ b1 ]
b = [ b2 ]
    [ ⋮  ]
    [ bn ]

14 Alternatively, it is possible to prove the "if" part in the following, rather mechanical, way. Set x = A⁻¹b; we have Ax = A(A⁻¹b) = (AA⁻¹)b = Ib = b, so x = A⁻¹b solves the system. It is also the unique solution: indeed, if x̃ ∈ R^n is another solution, we have x̃ = Ix̃ = (A⁻¹A)x̃ = A⁻¹(Ax̃) = A⁻¹b = x, as claimed.

Example 733 A special case of the system (15.35) occurs when b = 0. Then the system is called homogeneous and, if A is invertible, by Theorem 732 the unique solution is x = 0. N

Example 734 For the system

{ x1 + 2x2 = b1
{ 3x1 + 5x2 = b2

of two equations in two unknowns we have

A = [ 1  2 ]
    [ 3  5 ]

From Example 725 we know that A is invertible. By Theorem 732, the unique solution of the system is therefore

x = A⁻¹ b = [ -5   2 ] [ b1 ] = [ -5b1 + 2b2 ]
            [  3  -1 ] [ b2 ]   [  3b1 - b2  ]

Using Cramer's rule (15.37), we see that

det A = -1 ,  det A1 = det [ b1  2 ] = 5b1 - 2b2 ,  det A2 = det [ 1  b1 ] = b2 - 3b1
                           [ b2  5 ]                             [ 3  b2 ]

Therefore,

x1 = (5b1 - 2b2) / (-1) = -5b1 + 2b2 ,   x2 = (b2 - 3b1) / (-1) = 3b1 - b2

which coincides with the solution found above. N
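Cramer's rule (15.37) as code: a numpy sketch of ours for this example, checked against a library solver:

```python
import numpy as np

def cramer(A, b):
    """x_k = det(A_k) / det(A), with A_k = A with column k replaced by b."""
    d = np.linalg.det(A)
    xs = []
    for k in range(A.shape[1]):
        Ak = A.copy()
        Ak[:, k] = b
        xs.append(np.linalg.det(Ak) / d)
    return np.array(xs)

A = np.array([[1.0, 2.0], [3.0, 5.0]])
b = np.array([1.0, 2.0])
print(cramer(A, b))                               # [-1.  1.] = (-5b1+2b2, 3b1-b2)
assert np.allclose(cramer(A, b), np.linalg.solve(A, b))
```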

Example 735 For the system

{ x1 - 2x2 + 2x3 = b1
{ 2x2 - x3 = b2
{ x2 - x3 = b3

of three equations in three unknowns we have

    [ 1 -2  2 ]
A = [ 0  2 -1 ]
    [ 0  1 -1 ]

Using submatrices, it is easy to verify that det A = -1 ≠ 0. Therefore, A is invertible and, using formula (15.34), we obtain

      [ 1  0   2 ]
A⁻¹ = [ 0  1  -1 ]
      [ 0  1  -2 ]

By Theorem 732, the unique solution of the system is

           [ 1  0   2 ] [ b1 ]   [ b1 + 2b3 ]
x = A⁻¹b = [ 0  1  -1 ] [ b2 ] = [ b2 - b3  ]
           [ 0  1  -2 ] [ b3 ]   [ b2 - 2b3 ]

For example, if b = (1, -1, 2), we have

x = (1 + 2·2, -1 - 2, -1 - 2·2) = (5, -3, -5)

Using Cramer's rule (15.37), we see that

det A = -1 ,  det A1 = -b1 - 2b3 ,  det A2 = -b2 + b3 ,  det A3 = -b2 + 2b3

Hence

x1 = (-b1 - 2b3)/(-1) = b1 + 2b3 ,  x2 = (-b2 + b3)/(-1) = b2 - b3 ,  x3 = (-b2 + 2b3)/(-1) = b2 - 2b3

which coincides with the solution found above. N

A classic linear system of n equations in n unknowns is:

{ x1 + a1 x2 + a1^2 x3 + ⋯ + a1^{n-1} xn = b1
{ x1 + a2 x2 + a2^2 x3 + ⋯ + a2^{n-1} xn = b2
{ ⋮
{ x1 + an x2 + an^2 x3 + ⋯ + an^{n-1} xn = bn     (15.38)

The coefficients of this linear system are powers based on n parameters a1, a2, ..., an. Define a polynomial px : R → R by px(a) = Σ_{j=1}^n a^{j-1} xj for all a ∈ R. We can then write the linear system (15.38) as

{ px(a1) = b1
{ px(a2) = b2
{ ⋮
{ px(an) = bn     (15.39)

To solve this linear system, let us represent it in matrix form via the n × n matrix

    [ 1  a1  a1^2  ⋯  a1^{n-1} ]
A = [ 1  a2  a2^2  ⋯  a2^{n-1} ]
    [ ⋮   ⋮    ⋮    ⋱     ⋮    ]
    [ 1  an  an^2  ⋯  an^{n-1} ]

This is the so-called Vandermonde matrix, which permits us to write the linear system (15.38) in the standard matrix form

Ax = b     (15.40)

Next we establish a key property of Vandermonde matrices.

Proposition 736 For a Vandermonde matrix A of order n we have

det A = ∏_{1≤j<i≤n} (ai - aj)     (15.41)

For instance,

    [ 1  a1  a1^2 ]
det [ 1  a2  a2^2 ] = (a2 - a1)(a3 - a2)(a3 - a1)
    [ 1  a3  a3^2 ]

Proof We proceed by induction on the order of the matrix. Initial step: for n = 1 we trivially have det A = 1. Induction step: assume that formula (15.41) holds for Vandermonde matrices of order n-1. Consider the following operation: subtract from each column the previous one multiplied by a1. By Proposition 713,

            [ 1  a1  a1^2  ⋯  a1^{n-1} ]
det A = det [ 1  a2  a2^2  ⋯  a2^{n-1} ]
            [ ⋮   ⋮    ⋮    ⋱     ⋮    ]
            [ 1  an  an^2  ⋯  an^{n-1} ]

            [ 1  a1-a1  a1^2-a1·a1  ⋯  a1^{n-1}-a1·a1^{n-2} ]
      = det [ 1  a2-a1  a2^2-a2·a1  ⋯  a2^{n-1}-a1·a2^{n-2} ]
            [ ⋮    ⋮         ⋮      ⋱           ⋮           ]
            [ 1  an-a1  an^2-an·a1  ⋯  an^{n-1}-a1·an^{n-2} ]

            [ 1    0         0       ⋯         0         ]
      = det [ 1  a2-a1  a2(a2-a1)  ⋯  a2^{n-2}(a2-a1) ]
            [ ⋮    ⋮         ⋮      ⋱         ⋮           ]
            [ 1  an-a1  an(an-a1)  ⋯  an^{n-2}(an-a1) ]

Consider now the following operation: divide each row i, except the first one, by ai - a1. Again by Proposition 713,

    [ 1    0         0       ⋯         0         ]
det [ 1  a2-a1  a2(a2-a1)  ⋯  a2^{n-2}(a2-a1) ]
    [ ⋮    ⋮         ⋮      ⋱         ⋮           ]
    [ 1  an-a1  an(an-a1)  ⋯  an^{n-2}(an-a1) ]

                              [      1      0   0   ⋯     0     ]
  = (a2-a1) ⋯ (an-a1) det     [ 1/(a2-a1)   1   a2  ⋯  a2^{n-2} ]
                              [      ⋮      ⋮   ⋮    ⋱     ⋮    ]
                              [ 1/(an-a1)   1   an  ⋯  an^{n-2} ]

                              [ 1  a2  ⋯  a2^{n-2} ]
  = (a2-a1) ⋯ (an-a1) det     [ ⋮   ⋮   ⋱     ⋮    ]
                              [ 1  an  ⋯  an^{n-2} ]

where the last equality follows from Laplace's Theorem. By the induction hypothesis,

    [ 1  a2  ⋯  a2^{n-2} ]
det [ ⋮   ⋮   ⋱     ⋮    ] = ∏_{2≤j<i≤n} (ai - aj)
    [ 1  an  ⋯  an^{n-2} ]

So,

det A = ∏_{i=2}^n (ai - a1) · ∏_{2≤j<i≤n} (ai - aj) = ∏_{1≤j<i≤n} (ai - aj)

as desired.

As a straightforward corollary of this result we have the following simple characterization of invertible Vandermonde matrices.

Corollary 737 A Vandermonde matrix A of order n is invertible if and only if its n parameters a1, a2, ..., an are distinct.

Armed with this result, let us go back to the linear system (15.40). If the parameters a1, a2, ..., an are distinct, the Vandermonde matrix A is invertible. So, by Cramer's Theorem this linear system admits a unique solution x, which in turn provides the coefficients of the interpolating polynomial px : R → R defined by px(a) = Σ_{j=1}^n a^{j-1} xj and such that

px(ai) = bi   for every i = 1, ..., n

This polynomial is thus able to match parameters and known terms, in accordance with (15.39).15

15.8 General linear systems


15.8.1 Kronecker-Capelli's Theorem
We now turn to a general linear system of m equations in n unknowns

{ a11 x1 + a12 x2 + ⋯ + a1n xn = b1
{ a21 x1 + a22 x2 + ⋯ + a2n xn = b2
{ ⋮
{ am1 x1 + am2 x2 + ⋯ + amn xn = bm

where it is no longer required that n = m, i.e., the number of equations and unknowns may differ. The system can be written in matrix form as

A x = b

where A ∈ M(m, n), x ∈ R^n, and b ∈ R^m. The square system is the special case where n = m.

Let T(x) = Ax be the operator T : R^n → R^m associated to the system, which can then be written as T(x) = b. We say that the system is:
15 There exist formulas for the inverse of a Vandermonde matrix (see, e.g., Horn and Johnson, 2013, p. 37) that make it possible to compute the unique solution of the linear system (15.40) and so the interpolating polynomial.
520 CHAPTER 15. LINEAR FUNCTIONS AND OPERATORS

(i) unsolvable when it does not admit any solution, i.e., b ∉ Im T;

(ii) solvable when it admits at least one solution, i.e., b ∈ Im T.

Moreover, a solvable linear system is said to be:

(ii.a) determined (or uniquely solvable) when it admits only one solution, i.e., T⁻¹(b) is a singleton;

(ii.b) undetermined when it admits infinitely many solutions, i.e., T⁻¹(b) has infinite cardinality.16

These two cases exhaust all the possibilities: if a system admits two solutions, it certainly has infinitely many. Indeed, if x and x′ are two different solutions, that is, Ax = Ax′ = b, then all the linear combinations αx + (1-α)x′ with α ∈ R are also solutions of the system because

A(αx + (1-α)x′) = αAx + (1-α)Ax′ = αb + (1-α)b = b
Using this terminology, in the case n = m Cramer's Theorem says that a square linear system is solvable for every vector b if and only if it is determined for every such vector. In this section we modify the analysis of the last section in two different directions:

(i) we consider general systems, without requiring that m = n;

(ii) we study the existence and uniqueness of solutions for a given vector b (so, for a specific system at hand), rather than for every such vector.

To this end, let us consider the so-called augmented (or complete) matrix of the system

A|b

of dimension m × (n+1), obtained by appending to A the column b of the known terms. The next famous result gives a necessary and sufficient condition for a linear system to have a solution.

Theorem 738 (Kronecker-Capelli) Let A ∈ M(m, n) and b ∈ R^m. The linear system Ax = b is solvable if and only if the matrix A has the same rank as the augmented matrix A|b, that is,

ρ(A) = ρ(A|b)     (15.42)

Proof Let T : R^n → R^m be the linear operator associated to the system, which can therefore be written as T(x) = b. The system is solvable if and only if b ∈ Im T. Since Im T is the vector subspace of R^m generated by the columns of A, the system is solvable if and only if b is a linear combination of such columns, that is, if and only if the matrices A and A|b have the same number of linearly independent columns (so, the same rank).
16 Since the set T⁻¹(b) is convex, it is either a singleton or it has infinite cardinality (in particular, it has the power of the continuum), tertium non datur. We will introduce convexity in the next chapter.

Example 739 Consider the system

{ x1 + 2x2 + 3x3 = 3
{ 6x1 + 4x2 + 2x3 = 7
{ 5x1 + 2x2 - x3 = 4

For both matrices

    [ 1  2  3 ]               [ 1  2  3 | 3 ]
A = [ 6  4  2 ]   and   A|b = [ 6  4  2 | 7 ]
    [ 5  2 -1 ]               [ 5  2 -1 | 4 ]

the third row is the difference between the second and the first rows. These three rows are thus not linearly independent: ρ(A) = ρ(A|b) = 2. So, the system is solvable. N
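The rank test (15.42) is immediate to run on this example (a numpy sketch of ours):

```python
import numpy as np

A = np.array([[1.0, 2, 3], [6, 4, 2], [5, 2, -1]])
b = np.array([3.0, 7, 4])
Ab = np.column_stack([A, b])
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(Ab))  # 2 2: solvable
```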

Example 740 A homogeneous system is always solvable because the zero vector is always a solution of the system. This is confirmed by the Kronecker-Capelli Theorem because the ranks of A and of A|0 are always equal. N

Note that the Kronecker-Capelli Theorem considers a given pair (A, b), while Cramer's Theorem considers, as given, only a square matrix A. This reflects the new direction (ii) mentioned above and, for this reason, the two theorems are only partly comparable in the case of square matrices A. Indeed, Cramer's Theorem considers only the case ρ(A) = n, in which condition (15.42) is automatically satisfied for every b ∈ R^n (why?). For this case, it is more powerful than the Kronecker-Capelli Theorem: existence holds for every vector b and, moreover, we also have uniqueness. But, differently from Cramer's Theorem, the Kronecker-Capelli Theorem is able to handle also the case ρ(A) < n by giving, for a given vector b, a necessary and sufficient condition for the system to be solvable.

15.8.2 Uniqueness
We now turn our attention to the uniqueness of the solutions of a system Ax = b, whose existence is guaranteed by the Kronecker-Capelli Theorem. The next result shows that for uniqueness, too, it is necessary to consider the rank of the matrix A (recall that, thanks to condition (15.19), we have ρ(A) ≤ n).

Proposition 741 Let Ax = b be a solvable linear system, with A ∈ M(m, n) and b ∈ R^m. Then:

(i) if ρ(A) = n, then the system is determined;

(ii) if ρ(A) < n, then the system is undetermined.

The proof is based on the following result, of independent interest.

Proposition 742 Let T : R^n → R^m be a linear operator and suppose T(x̄) = b. The vectors x ∈ R^n for which T(x) = b are those of the form x̄ + z with z ∈ ker T, and only them. That is,

T⁻¹(b) = { x̄ + z : z ∈ ker T }     (15.43)

Proof Being T(z) = 0, one has T(x̄ + z) = T(x̄) + T(z) = b + 0 = b. Now, let x′ be another vector for which T(x′) = b. Subtracting member by member the two equalities T(x′) = b and T(x̄) = b, we get T(x′) - T(x̄) = 0, that is, T(x′ - x̄) = 0, and therefore x′ - x̄ ∈ ker T. We conclude that x′ = x̄ + z with z ∈ ker T.

The "only if" part of Lemma 680, i.e., that linear and injective operators have trivial kernels, is a special case of this result. Indeed, suppose that the linear operator T is injective, so that T⁻¹(0) = {0}. If b = 0, we can set x̄ = 0 and (15.43) then implies {0} = T⁻¹(0) = {0 + z : z ∈ ker T} = ker T. So, ker T = {0}.

For systems, the last result takes the following form:

Corollary 743 If x̄ is a solution of the system Ax = b, then all solutions are of the form

x̄ + z

with z such that Az = 0 (i.e., z solves the homogeneous system Ax = 0).

Therefore, once we find a solution of the system Ax = b, all the other solutions can be found by adding to it the solutions of the homogeneous system Ax = 0. Besides its theoretical interest, this is relevant also operationally (especially when it is significantly simpler to solve the homogeneous system than the original one).17
That said, Corollary 743 allows us to prove Proposition 741.

Proof of Proposition 741 By hypothesis, the system has at least one solution x̄. Moreover, since ρ(A) = ρ(T), by the Rank-Nullity Theorem ρ(A) + ν(T) = n. If ρ(A) = n, we have ν(T) = 0, that is, ker T = {0}. From Corollary 743 it follows that x̄ is the unique solution. If, instead, ρ(A) < n, we have ν(T) > 0 and therefore ker T is a non-trivial vector subspace of R^n, with infinitely many elements. By Corollary 743, adding such elements to the solution x̄ we find the infinitely many solutions of the system.

15.8.3 Summing up
Summing up, we are now able to state a general result on the resolution of linear systems that combines the Kronecker-Capelli Theorem and Proposition 741.

Theorem 744 Let A ∈ M(m, n) and b ∈ R^m. The linear system Ax = b is

(i) unsolvable if and only if ρ(A) < ρ(A|b);

(ii) solvable if and only if ρ(A) = ρ(A|b). In this case, it is

(ii.a) determined if and only if ρ(A) = ρ(A|b) = n;

(ii.b) undetermined if and only if ρ(A) = ρ(A|b) < n.
17 As readers will see in more advanced courses, the representation of all solutions as the sum of a particular solution and the solutions of the associated homogeneous system holds also for systems of linear differential equations, as well as for linear differential equations of order n.

The comparison of the ranks ρ(A) and ρ(A|b) with the number n of the unknowns allows us, therefore, to establish the existence and the possible uniqueness of the solutions of the system. If the system is square, we have ρ(A) = n if and only if ρ(A) = ρ(A|b) = n for every b ∈ R^m.18 Cramer's Theorem, which was only partly comparable with the Kronecker-Capelli Theorem, becomes a special case of the more general Theorem 744.
Example 745 Consider a homogeneous linear system Ax = 0. Since, as already observed, the condition ρ(A) = ρ(A|0) is always satisfied, the system has a unique solution (namely, the zero vector) if and only if ρ(A) = n, and it is undetermined if and only if ρ(A) < n. N
O.R. It is often said that a linear system Ax = b with A ∈ M(m, n)

(i) has a unique solution if m = n, i.e., there are as many equations as unknowns;

(ii) is undetermined if m < n, i.e., there are fewer equations than unknowns;19

(iii) is unsolvable if m > n, i.e., there are more equations than unknowns.

This idea is wrong because it might well happen that some equations are redundant: some of them may be a multiple of another or a linear combination of others (in such cases, they are automatically satisfied once the others are satisfied). In view of Theorem 744, however, claims (i) and (ii) become true provided that by m we mean the number of non-redundant equations, that is, the rank of A: indeed, the rank counts the equations that cannot be expressed as linear combinations of the others. H
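Theorem 744 can be packaged as a small classifier (our own sketch; it uses floating-point ranks, so it is reliable only for well-conditioned examples):

```python
import numpy as np

def classify(A, b):
    rA = np.linalg.matrix_rank(A)
    rAb = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rA < rAb:
        return "unsolvable"
    return "determined" if rA == A.shape[1] else "undetermined"

A = np.array([[1.0, 2, 3], [6, 4, 2], [5, 2, -1]])
b = np.array([3.0, 7, 4])
print(classify(A, b))  # undetermined: rho(A) = rho(A|b) = 2 < 3
```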

15.9 Solving systems: Cramer's method


We close with a "quadrature" procedure that, by permitting the use of Cramer's rule, is useful in calculations. Consider a generic solvable linear system

A x = b

with A ∈ M(m, n), i.e., such that ρ(A) = ρ(A|b). Set ρ(A) = k.


1. If k < m, there are m-k rows that can be written as linear combinations of the other k. Given that each row of A identifies an equation of the system, there are m-k equations that, being linear combinations of the other ones, are "fictitious": they are satisfied whenever the other k are satisfied. We can simply delete them, reducing in this way the system to one with k linearly independent equations.

2. If k < n, there are n-k columns that can be written as linear combinations of the other k (so, they are "fictitious"). The corresponding n-k "unknowns" are not really unknowns (they are "fictitious unknowns") but can assume completely arbitrary values: for each choice of such values, the system reduces to one with k unknowns (and k equations) and, therefore, there is only one solution for the k "true unknowns". We can simply assign arbitrary values to the n-k "fictitious unknowns", reducing in this way the system to one with k unknowns.
18 Why? (We have already made a similar observation.)
19 Sometimes we say that there are more degrees of freedom (unknowns) than constraints (equations). The opposite holds in (iii).

As usual, we can assume that the k rows and the k columns that determine the rank of A are the first ones. Let A′ be a non-singular k × k submatrix of A,20 and write

A = [ A′  B ]
    [ C   D ]

where A′ is k × k, B is k × (n-k), C is (m-k) × k, and D is (m-k) × (n-k). We can then eliminate the last m-k rows and give arbitrary values, say z ∈ R^{n-k}, to the last n-k unknowns, obtaining in this way the system

A′x′ = b′ - Bz     (15.44)

in which x′ ∈ R^k is the vector that contains the only k "true" unknowns and b′ ∈ R^k is the vector of the first k known terms.

20 Often there is more than one possible A′, so there is some freedom in choosing which equations to delete and which unknowns are "fictitious".
The square system (15.44) satisfies the hypotheses of Cramer's Theorem for every z ∈ R^{n-k}, so it can be solved with Cramer's rule. If we call x̂′(z) the unique solution for each given z ∈ R^{n-k}, the solutions of the original system Ax = b are

( x̂′(z), z )   for every z ∈ R^{n-k}

Example 746 Consider again the system


8
< x1 + 2x2 + 3x3 = 3
>
6x1 + 4x2 + 2x3 = 7
>
:
5x1 + 2x2 x3 = 4

of Example 739, which we showed to be solvable because (A) = (Ajb) = 2.


Since the last equation is redundant (recall that it is di erence between the second and
rst equations), one has

1 2 3 3
A0 = ; B = ; C = 5 2 ; D = [ 1] ; b0 =
2 2 6 4 2 1 2 1 2 1 1 2 1 7

so that, setting b0z = b0 Bz, the square system (15.44) becomes A0 x = b0z , that is,
(
x1 + 2x2 = 3 3z
6x1 + 4x2 = 7 2z

In other words, the procedure consisted in deleting the redundant equation and in assigning
arbitrary value z to the unknown x3 .
Since det A0 6= 0, by Cramer's Rule the in nitely many solutions are described as

2 8z 1 11 + 16z 11
x1 = = + z; x2 = = 2z; x3 = z
8 4 8 8
20
Often there is more than one possible A0 , so there is some freedom in choosing which equations to delete
and which unknowns are \ ctitious".
15.9. SOLVING SYSTEMS: CRAMER'S METHOD 525

for every z ∈ R. We can verify it:

    First equation:  1·(1/4 + z) + 2·(11/8 − 2z) + 3z = (1 + 11)/4 + 0·z = 3
    Second equation: 6·(1/4 + z) + 4·(11/8 − 2z) + 2z = (6 + 22)/4 + 0·z = 7

Alternatively, we could have noted that the second equation is the sum of the first and third ones and then deleted the second equation rather than the third one. In this way the system would reduce to

    x1 + 2x2 + 3x3 = 3
    5x1 + 2x2 − x3 = 4

We can now assign an arbitrary value to the first unknown, say x1 = z̃, rather than to the third one.21 This yields the system

    2x2 + 3x3 = 3 − z̃
    2x2 − x3 = 4 − 5z̃

that is, A''x = b''_z̃, with matrix

    A'' = [ 2   3 ]
          [ 2  −1 ]

and vectors x = (x2, x3)^T and b''_z̃ = (3 − z̃, 4 − 5z̃)^T. Since det A'' ≠ 0, Cramer's rule expresses the infinitely many solutions as

    x1 = z̃,   x2 = (15 − 16z̃)/8,   x3 = −1/4 + z̃       for z̃ ∈ R

In the first way we get x1 = 1/4 + z, while in the second one x1 = z̃. Therefore z̃ = 1/4 + z. With such a value the solutions just found,

    x1 = z̃ = 1/4 + z
    x2 = (15 − 16z̃)/8 = (15 − 16(1/4 + z))/8 = (15 − 4 − 16z)/8 = 11/8 − 2z
    x3 = −1/4 + z̃ = −1/4 + 1/4 + z = z

become the old ones. The two sets of solutions are the same, just written using two different parameters. We invite the reader to delete the first equation and redo the calculations. N

21 The tilde on z helps to distinguish this case from the previous one.
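To make the procedure concrete, here is a small computational sketch; it is ours, not the book's, and assumes Python with NumPy. It replays Example 746: it checks solvability by comparing ranks, keeps the two independent equations, treats x3 = z as the "fictitious" unknown, and solves the square system (15.44) by Cramer's rule.

    # Illustrative sketch (ours): the reduction procedure of Example 746.
    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [6.0, 4.0, 2.0],
                  [5.0, 2.0, -1.0]])
    b = np.array([3.0, 7.0, 4.0])

    # Solvability: rank(A) = rank(A|b) = k = 2.
    assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(np.column_stack([A, b])) == 2

    A1 = A[:2, :2]   # the non-singular submatrix A'
    B = A[:2, 2:]    # the column that multiplies the fictitious unknown x3 = z
    b1 = b[:2]       # the first k known terms b'

    def solve_cramer(M, c):
        # Cramer's rule: x_j = det(M_j) / det(M), where M_j has column j replaced by c.
        d = np.linalg.det(M)
        x = np.empty(len(c))
        for j in range(len(c)):
            Mj = M.copy()
            Mj[:, j] = c
            x[j] = np.linalg.det(Mj) / d
        return x

    for z in (0.0, 1.0, -2.5):
        x12 = solve_cramer(A1, b1 - B.flatten() * z)   # solve A'x' = b' - Bz
        x = np.append(x12, z)
        assert np.allclose(A @ x, b)
        print(x)   # agrees with x1 = 1/4 + z, x2 = 11/8 - 2z, x3 = z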

Example 747 Consider the homogeneous system

    2x1 − x2 + 2x3 + 2x4 = 0
    x1 − x2 − 2x3 − 4x4 = 0
    x1 − 2x2 − 2x3 − 10x4 = 0

If we consider x4 as a known term, so that x' = (x1, x2, x3) and z = x4, we can write the system in the "square" form (15.44) as A'x' = −Bz with

    A' = [ 2  −1   2 ]        B = [  2  ]
         [ 1  −1  −2 ]   and      [ −4  ]
         [ 1  −2  −2 ]            [ −10 ]

The square matrix A' is invertible, with

    A'^(−1) = [ 1/3    1   −2/3 ]
              [  0     1   −1   ]
              [ 1/6  −1/2   1/6 ]

Since

    A'^(−1)(−Bz) = [ 1/3    1   −2/3 ] [ −2x4 ]   [ −(10/3)x4 ]
                   [  0     1   −1   ] [  4x4 ] = [ −6x4      ]
                   [ 1/6  −1/2   1/6 ] [ 10x4 ]   [ −(2/3)x4  ]

in view of Cramer's Theorem we conclude that the vectors x of R^4 of the form

    x = ( −(10/3)t, −6t, −(2/3)t, t )

solve the system for every t ∈ R. This confirms what we found in Section 3.7. N

The solution procedure for systems explained above, based on Cramer's rule, is theoretically elegant. However, from the computational viewpoint there is a better procedure that we do not discuss, known as the Gauss method and based on the Gaussian elimination procedure.

15.10 Coda: averages

In this coda we show how Riesz's Theorem and its variations permit a principled approach to weighted averages, a most important notion in applications.

15.10.1 The notion

Assume that a firm has n branches, for example in n different cities. We can collect in a vector x = (x1, x2, ..., xn) ∈ R^n the profits of the branches: x1 denotes the profit of the first branch, x2 denotes the profit of the second branch, and so on. A negative xi is interpreted as a loss (which, indeed, can be regarded as a negative profit).

The board of directors is interested in the firm's profit, an amount of money f(x1, ..., xn) that depends on the branches' profits via a function f. The simplest example of such a function is the sum f(x) = Σ_{i=1}^n x_i that, however, does not consider costs and gains that might arise from the centralized management of the firm. Be that as it may, the amount f(x1, ..., xn) is all that matters in the vector of profits x for the board of directors. In general, applications often feature a function of interest f : A^n → R that associates to each vector x ∈ A^n a scalar f(x), interpreted as a quantity of interest determined by the vector x.22 In turn, this function permits to define a notion of average.23

22 Throughout, A^n = A × A × ... × A ⊆ R^n, where A is an interval of the real line.
23 This notion of average was proposed in 1929 by Oscar Chisini.

Definition 748 The average or mean (in the sense of Chisini) of a vector x = (x1, x2, ..., xn) ∈ A^n with respect to a function f : A^n → R is a scalar x̄ ∈ A such that

    f(x̄, x̄, ..., x̄) = f(x1, x2, ..., xn)                      (15.45)

In words, the average x̄ of a vector x is a scalar that can replace each component of x in the function of interest f without altering its value. In terms of interpretation, the components of x represent different amounts of some homogeneous entity, for instance different amounts of money (like the profits discussed above), and the average x̄ summarizes the different amounts in a single one that, if substituted for the different components of x, gives the same value of interest; i.e., f(x̄, ..., x̄) = f(x). If all the n different branches of the firm earned profit x̄, the firm's profit would be the same. Profit x̄ can thus be viewed as a typical profit of the branches with respect to the function of interest f for the board.

Different functions f, motivated by different aims, may result in different averages x̄ of the same vector x, unless they are a strictly monotone transformation one of the other, as we show next.

Proposition 749 Given a function f : A^n → R, let g : Im f → R be a strictly monotone transformation. A scalar x̄ is the average of a vector x ∈ A^n with respect to f if and only if it is the average of x with respect to g ∘ f.

Proof It is enough to note that

    f(c, c, ..., c) = f(x1, x2, ..., xn)  ⟺  (g ∘ f)(c, c, ..., c) = (g ∘ f)(x1, x2, ..., xn)

for all c ∈ A and all x = (x1, x2, ..., xn) ∈ A^n.
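The implicit equation (15.45) also lends itself to computation. The following sketch, which is ours and assumes Python with NumPy and SciPy, finds a Chisini mean numerically by locating the root of c ↦ f(c, ..., c) − f(x) between min_i x_i and max_i x_i (uniqueness and internality are exactly what Proposition 750 below guarantees for strongly monotone continuous f).

    # Illustrative sketch (ours): Chisini mean by root finding.
    import numpy as np
    from scipy.optimize import brentq

    def chisini_mean(f, x):
        # Solve f(c, ..., c) = f(x1, ..., xn) for the scalar c.
        x = np.asarray(x, dtype=float)
        lo, hi = x.min(), x.max()          # the mean is internal
        if np.isclose(lo, hi):
            return lo
        phi = lambda c: f(np.full_like(x, c)) - f(x)
        return brentq(phi, lo, hi)         # root of a monotone function

    profits = [1.0, 2.0, 4.0, 8.0]
    print(chisini_mean(lambda v: v.sum(), profits))        # 3.75   (arithmetic)
    print(chisini_mean(np.prod, profits))                  # 2.83.. (geometric)
    print(chisini_mean(lambda v: (1 / v).sum(), profits))  # 2.13.. (harmonic)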

15.10.2 Examples

Next we present a few classic examples of averages.

Arithmetic average As previously remarked, a simple function f : A^n → R is the sum

    f(x) = Σ_{i=1}^n x_i

Formula (15.45) becomes n x̄ = Σ_{i=1}^n x_i, so the average of a vector x = (x1, x2, ..., xn) ∈ A^n with respect to this sum function is the arithmetic average

    x̄ = (1/n) Σ_{i=1}^n x_i

In our motivating example, the sum is the function of interest for the board of directors of a firm whose profits are just the sum of the profits of its branches (so, the firm's centralized activities are not profitable per se).

Different branches, however, may end up making the same profit. If θ_i is the number of branches making profit x_i, the sum of the branches' profits becomes Σ_{i=1}^n θ_i x_i.

So, let us take a function f : A^n → R given by f(x) = Σ_{i=1}^n θ_i x_i, with each θ_i > 0. Now formula (15.45) becomes x̄ Σ_{i=1}^n θ_i = Σ_{i=1}^n θ_i x_i and we get the weighted arithmetic average

    x̄ = Σ_{i=1}^n (θ_i / Σ_{j=1}^n θ_j) x_i                    (15.46)

The weights θ_i / Σ_{j=1}^n θ_j are easily seen to add up to 1. In the example, each such weight gives the proportion of the branches making profit x_i, so the weighted arithmetic average x̄ is able to account for the multiplicity or frequency of the profit levels made by the branches.24

Geometric average Another simple function f : A^n → R, with A ⊆ [0, ∞), is the product

    f(x) = Π_{i=1}^n x_i

Formula (15.45) becomes x̄^n = Π_{i=1}^n x_i, so the average of a vector x = (x1, x2, ..., xn) ∈ A^n with respect to this product function is the geometric average

    x̄ = ( Π_{i=1}^n x_i )^(1/n)

For instance, consider an investor who, in each period of time, can invest an initial monetary capital c ≥ 0 and receive at the end of the next period an amount (1 + r)c, with r ≥ 0 (see Example 295). Assume that the investor keeps investing for n periods and that in period i the interest rate is r_i, possibly different across periods. At the end, per euro invested the investor earns Π_{i=1}^n (1 + r_i) euros or, equivalently, Π_{i=1}^n R_i, where R_i = 1 + r_i is the gross return in period i.

The function of interest is thus the product Π_{i=1}^n R_i that says how much the investor earned per euro invested. The relevant average gross return R̄ is then the geometric average

    R̄ = ( Π_{i=1}^n R_i )^(1/n)

In particular, the average interest rate is r̄ = R̄ − 1.

If the same rate occurs in different periods, say rate r_i occurs θ_i times, then we have the product Π_{i=1}^n R_i^(θ_i).

24 We say that θ = (θ_1, ..., θ_n) ≠ 0 is a vector of frequencies if its elements are positive, i.e., if θ > 0. If they add up to 1, they are called weights. So, weights are normalized frequencies.

So, let us consider a function f : A^n → R given by f(x) = Π_{i=1}^n x_i^(θ_i), with θ > 0. Formula (15.45) becomes x̄^(Σ_{i=1}^n θ_i) = Π_{i=1}^n x_i^(θ_i) and we get the weighted geometric average

    x̄ = ( Π_{i=1}^n x_i^(θ_i) )^(1 / Σ_{i=1}^n θ_i)

Note that we also get this average if we take as function of interest the logarithmic transformation Σ_{i=1}^n θ_i log x_i of Π_{i=1}^n x_i^(θ_i) (cf. Proposition 749), provided that A ⊆ (0, ∞).

Harmonic average A further function f : A^n → R that we may consider is the sum of reciprocals f(x) = Σ_{i=1}^n 1/x_i, with A ⊆ (0, ∞). Now formula (15.45) becomes n/x̄ = Σ_{i=1}^n 1/x_i, thus the average of a vector x = (x1, x2, ..., xn) ∈ A^n with strictly positive components is here the harmonic average

    x̄ = n / Σ_{i=1}^n (1/x_i)

More generally, for the function f(x) = Σ_{i=1}^n θ_i/x_i, with θ > 0, we have Σ_{i=1}^n θ_i/x̄ = Σ_{i=1}^n θ_i/x_i and we get the weighted harmonic average

    x̄ = Σ_{i=1}^n θ_i / Σ_{i=1}^n (θ_i/x_i)                    (15.47)

For instance, consider a racing car driver who in a lap reaches different speeds x1, x2, ..., xn (in kilometers per hour) in different parts of the circuit, which he keeps for θ_1, θ_2, ..., θ_n kilometers, respectively. The lap time, which is the quantity of interest for the driver, is then

    Σ_{i=1}^n θ_i/x_i

The average speed that keeps such a time unchanged is the harmonic average (15.47).
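As a numerical illustration (ours, with hypothetical speeds and segment lengths; it assumes Python with NumPy), the sketch below computes the weighted averages and verifies on the lap-time example that it is the weighted harmonic average, not the arithmetic one, that leaves the quantity of interest unchanged.

    # Illustrative sketch (ours): weighted arithmetic, geometric, harmonic averages.
    import numpy as np

    def arithmetic(x, theta):
        return np.sum(theta * x) / np.sum(theta)

    def geometric(x, theta):
        return np.prod(x ** theta) ** (1.0 / np.sum(theta))

    def harmonic(x, theta):
        return np.sum(theta) / np.sum(theta / x)

    speeds = np.array([180.0, 240.0, 120.0])   # km/h on three parts of the lap
    lengths = np.array([2.0, 3.0, 1.0])        # km kept at each speed (the theta_i)

    lap_time = np.sum(lengths / speeds)
    v = harmonic(speeds, lengths)              # about 187.8 km/h
    assert np.isclose(np.sum(lengths) / v, lap_time)   # same lap time at speed v
    print(v, arithmetic(speeds, lengths))      # 187.8... vs 200.0: they differ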

Quasi-arithmetic average In all these examples a vector of frequencies characterizes the function of interest f : A^n → R. Different vectors correspond to different functions of interest within each class (different sum functions, different product functions, and so on).25 In particular, a function f : A^n → R defines, up to a strictly increasing transformation (Proposition 749), a:

(i) weighted arithmetic average if there exists θ > 0 such that

    f(x) = Σ_{i=1}^n θ_i x_i

25 It is possible to let the vector of frequencies vary as well, but this would require notions that readers will learn in more advanced courses.

(ii) weighted geometric average if there exists θ > 0 such that

    f(x) = Π_{i=1}^n x_i^(θ_i)

(iii) weighted harmonic average if there exists θ > 0 such that

    f(x) = Σ_{i=1}^n θ_i/x_i

In the uniform case θ = (1/n, ..., 1/n) we drop the adjective "weighted" and we have the classic arithmetic, geometric, and harmonic averages.

Is there a common structure across these important examples? To address this question, given a vector of frequencies θ ∈ R^n_{++} and a strictly monotone and continuous function ψ : A → R, define a function f : A^n → R by

    f(x) = Σ_{i=1}^n θ_i ψ(x_i)

Formula (15.45) becomes ψ(x̄) Σ_{i=1}^n θ_i = Σ_{i=1}^n θ_i ψ(x_i), so the average of a vector x ∈ A^n with respect to the function f is the quasi-arithmetic average

    x̄ = ψ^(−1)( Σ_{i=1}^n (θ_i / Σ_{j=1}^n θ_j) ψ(x_i) )       (15.48)

For instance, while the board of directors cares about the profits that the branches generate, shareholders are more interested in the dividends that each of them is able to generate. Assume that these dividends depend on a branch's profits according to a strictly monotone and continuous function ψ : A → R, so the sum of dividends is Σ_{i=1}^n θ_i ψ(x_i). This sum is the function of interest for shareholders and (15.48) is the average profit relevant for them, which might well differ from the average profit (15.46) relevant for the board of directors. So, before asking for an "average" it should be clarified what we are interested in: different notions of average serve different purposes.

Remarkably, the earlier examples of averages are all special cases of quasi-arithmetic averages:

(i) if ψ(x) = x, we get back the weighted arithmetic average;

(ii) if ψ(x) = log x and A ⊆ (0, ∞), we get back the weighted geometric average;

(iii) if ψ(x) = 1/x and A ⊆ (0, ∞), we get back the weighted harmonic average.

Quasi-arithmetic averages thus answer our previous question. They form a general class of averages that covers most cases of interest. For instance, a further example of a quasi-arithmetic average is the power case ψ(x) = x^k, with k ≠ 0 and A ⊆ [0, ∞), which leads to the power average

    x̄ = ( Σ_{i=1}^n (θ_i / Σ_{j=1}^n θ_j) x_i^k )^(1/k)

It is an important class of averages that generalizes weighted arithmetic averages.
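A single sketch (ours; it assumes Python with NumPy) covers the whole quasi-arithmetic family (15.48): plugging in a strictly monotone ψ, together with its inverse, recovers the arithmetic, geometric, harmonic and power averages as special cases.

    # Illustrative sketch (ours): quasi-arithmetic average (15.48).
    import numpy as np

    def quasi_arithmetic(x, theta, psi, psi_inv):
        # psi must be strictly monotone and continuous on the data's interval.
        w = theta / np.sum(theta)              # normalized frequencies (weights)
        return psi_inv(np.sum(w * psi(x)))

    x = np.array([1.0, 2.0, 4.0, 8.0])
    theta = np.ones_like(x)

    print(quasi_arithmetic(x, theta, lambda t: t, lambda t: t))      # 3.75 arithmetic
    print(quasi_arithmetic(x, theta, np.log, np.exp))                # 2.83 geometric
    print(quasi_arithmetic(x, theta, lambda t: 1/t, lambda t: 1/t))  # 2.13 harmonic
    k = 2.0
    print(quasi_arithmetic(x, theta, lambda t: t**k,
                           lambda t: t**(1/k)))                      # 4.61 power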

O.R. The invariance property established in Proposition 749 suggests that there might be an optimization angle on averages. Indeed, let ψ be differentiable on an open interval A, with ψ' > 0. It is easily checked that the quasi-arithmetic average (15.48) of a vector x ∈ A^n is a critical point of the function y ↦ Σ_{i=1}^n θ_i (ψ(x_i) − ψ(y))^2 on A.26 In the special case ψ(x) = x, the arithmetic average is the minimizer of Σ_{i=1}^n (x_i − y)^2. This optimization twist on averages can be relevant computationally. H

26 Critical points will be introduced in Section 28.1.
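A quick numerical check of this optimization angle (ours; it assumes Python with NumPy and SciPy): for ψ(x) = x the function above is minimized exactly at the arithmetic average.

    # Illustrative check (ours): the arithmetic average minimizes sum_i (x_i - y)^2.
    import numpy as np
    from scipy.optimize import minimize_scalar

    x = np.array([1.0, 2.0, 4.0, 8.0])
    res = minimize_scalar(lambda y: np.sum((x - y) ** 2))
    print(res.x, x.mean())   # both 3.75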

15.10.3 Average functions

Given f : A^n → R, define an auxiliary function φ : A → R by

    φ(c) = f(c, c, ..., c)    for all c ∈ A

Proposition 750 If f is strongly monotone and continuous, then each vector x = (x1, x2, ..., xn) ∈ A^n has a unique average x̄ ∈ A with respect to f, given by

    x̄ = φ^(−1)(f(x1, x2, ..., xn))                             (15.49)

Moreover, x̄ is internal:

    min_{i=1,...,n} x_i ≤ x̄ ≤ max_{i=1,...,n} x_i

The proof relies on two noteworthy lemmas.

Lemma 751 If f is strongly monotone and continuous, then φ^(−1) ∘ f is well defined.

Proof Since f is strongly monotone and continuous, φ is continuous and either strictly increasing or strictly decreasing. Being φ either strictly increasing or strictly decreasing, it is enough to show that Im φ = Im f. Clearly, by the definition of φ we have Im φ ⊆ Im f. As for the opposite inclusion, let x ∈ A^n. Define α = min_{i=1,...,n} x_i ∈ A as well as β = max_{i=1,...,n} x_i ∈ A. Note that α1 ≤ x ≤ β1. Since f is strongly monotone and α1, β1 ∈ A^n, it follows that either φ(α) = f(α1) ≤ f(x) ≤ f(β1) = φ(β) or φ(β) = f(β1) ≤ f(x) ≤ f(α1) = φ(α). In both cases, since φ is continuous on [α, β] ⊆ A, by the Intermediate Value Theorem there exists γ ∈ A such that φ(γ) = f(x). It follows that f(x) ∈ Im φ. Since x was arbitrarily chosen, we conclude that Im f ⊆ Im φ.

Lemma 752 If f is strongly monotone, then φ^(−1) ∘ f is strongly increasing.

Proof If f is strongly increasing, then φ and its inverse are strictly increasing (Proposition 222). In this case, the result is easily established. If f is strongly decreasing, then φ and its inverse are strictly decreasing (Proposition 222). For all x, y ∈ A^n, if x ≥ y then f(x) ≤ f(y) and so φ^(−1)(f(x)) ≥ φ^(−1)(f(y)), while if x > y then f(x) < f(y) and so φ^(−1)(f(x)) > φ^(−1)(f(y)). We conclude that φ^(−1) ∘ f is strongly increasing.

Proof of Proposition 750 By Lemma 751, φ^(−1) ∘ f is well defined. Consider x ∈ A^n. Define x̄ = φ^(−1)(f(x)). It follows that f(x̄, x̄, ..., x̄) = φ(x̄) = f(x), proving that x̄ is the average of x.

Vice versa, if c is also an average of x, we have that φ(x̄) = f(x) = φ(c). Since φ is either strictly increasing or strictly decreasing, this implies that c = x̄, and x̄ is therefore unique. As to internality, assume by contradiction that x̄ < min_{i=1,...,n} x_i, so that (x̄, ..., x̄) < x. By Lemma 752, φ^(−1) ∘ f is strongly increasing. So, we reach the contradiction x̄ = φ^(−1)(f(x̄, ..., x̄)) < φ^(−1)(f(x)) = x̄. We conclude that min_{i=1,...,n} x_i ≤ x̄. A similar argument shows that x̄ ≤ max_{i=1,...,n} x_i.

Formula (15.49) makes the average explicit, whereas in (15.45) it was defined only implicitly. So, the strong monotonicity and continuity of the function of interest make its average explicit, unique, and internal (the reader can revisit the previous examples and identify the relevant functions φ). This motivates the following definition.

Definition 753 Given a strongly monotone and continuous function f : A^n → R, its average function m : A^n → R is defined by

    m(x) = φ^(−1)(f(x1, x2, ..., xn))    for all x ∈ A^n

By the last lemma, the average function is well defined and internal:

    min_{i=1,...,n} x_i ≤ m(x) ≤ max_{i=1,...,n} x_i    for all x ∈ A^n

In particular, the average function m is a:

(i) weighted arithmetic average if and only if there exists θ > 0 such that

    m(x) = Σ_{i=1}^n (θ_i / Σ_{j=1}^n θ_j) x_i

(ii) weighted geometric average if and only if there exists θ > 0 such that

    m(x) = ( Π_{i=1}^n x_i^(θ_i) )^(1 / Σ_{i=1}^n θ_i)

(iii) weighted harmonic average if and only if there exists θ > 0 such that

    m(x) = Σ_{i=1}^n θ_i / Σ_{i=1}^n (θ_i/x_i)

More generally, the average function m is quasi-arithmetic if and only if there exist a vector θ ∈ R^n_{++} and a strictly monotone and continuous function ψ : A → R such that

    m(x) = ψ^(−1)( Σ_{i=1}^n (θ_i / Σ_{j=1}^n θ_j) ψ(x_i) )

for all x ∈ A^n.

15.10.4 Arithmetic average functions

Let us proceed from first principles by postulating some properties that we deem natural for a function of interest f to satisfy, and then try to see where these properties lead us.

For example, in our initial example suppose that two firms, overseen by the same board of directors, have profit vectors x and y that are equivalent for the board's function of interest f, so f(x) = f(y). Assume that if we merge each of the two firms with a third firm that has profit vector z, the resulting profit vectors are x + z and y + z (so, the third firm has no "synergies" with either firm). In this case, we can assume that the board, so its function of interest f, regards the two mergers as equivalent, that is,

    f(x + z) = f(y + z)

We could argue for and justify a homogeneity property of f in a similar fashion. The function of interest is then quasi-linear, a class of functions that we define next.

Definition 754 A function f : R^n → R is quasi-linear if, for all x, y ∈ R^n, we have

    f(x) = f(y)  ⟹  f(αx + z) = f(αy + z)    for all z ∈ R^n and all α ∈ R

This property turns out to characterize functions of interest that correspond to weighted arithmetic averages. Here Δ_{n−1} = { x ∈ R^n_+ : Σ_{i=1}^n x_i = 1 } denotes the standard simplex of R^n, which is the collection of all vectors of weights.27

Proposition 755 A function f : R^n → R is strongly increasing (decreasing), continuous, and quasi-linear if and only if there exist a strictly increasing (decreasing) continuous function φ : R → R and a unique vector of weights λ ∈ Δ_{n−1} such that m(x) = λ · x and

    f(x) = (φ ∘ m)(x) = φ( Σ_{i=1}^n λ_i x_i )

for all x ∈ R^n.

A couple of observations on this nice result. First, if we set λ_i = θ_i / Σ_{j=1}^n θ_j for some vector θ > 0, we can equivalently write m(x) = Σ_{i=1}^n (θ_i / Σ_{j=1}^n θ_j) x_i, as we did before. Second, quasi-linearity is easily seen to be an ordinal property,28 a feature that in the representation corresponds to the strict monotonicity of φ. In particular, φ is the identity, so that f(x) = Σ_{i=1}^n λ_i x_i, when f is such that f(c, ..., c) = c for all c ∈ R, a normalization property.29

Proof "If". Suppose that there exist a strictly increasing (decreasing) function φ : R → R and a vector λ ∈ Δ_{n−1} such that f(x) = φ(λ · x) for all x ∈ R^n.

27 Simplexes will be studied in Chapter 16 (see Example 774).
28 Ordinal and cardinal properties are discussed in Section 17.3.3.
29 Later in the book we aptly call "normalized" the functions having this property (Section 19.3).

The function is easily seen to be strongly increasing (decreasing). Moreover, since φ is strictly increasing (decreasing), for any x, y ∈ R^n we have

    f(x) = f(y) ⟹ φ(λ · x) = φ(λ · y) ⟹ λ · x = λ · y
    ⟹ λ · (αx + z) = λ · (αy + z) ⟹ f(αx + z) = f(αy + z)

for all z ∈ R^n and all α ∈ R. It is also immediate to see that f is continuous, being the composition of two continuous functions.

"Only if". Define φ : R → R by φ(c) = f(c, ..., c) for all c ∈ R. Since f is strongly increasing (decreasing) and continuous, the function φ and its inverse are strictly increasing (decreasing). So, define m : R^n → R by m = φ^(−1) ∘ f. Clearly, m(c, ..., c) = c and so m is normalized. Let x, y ∈ R^n and α, β ∈ R. Since f(x) = f(m(x)1) and f(y) = f(m(y)1), we have f(αx + βy) = f(αm(x)1 + βy) and f(αm(x)1 + βy) = f(αm(x)1 + βm(y)1). So,

    m(αx + βy) = φ^(−1)(f(αx + βy)) = φ^(−1)(f(αm(x)1 + βy))
               = φ^(−1)(f(αm(x)1 + βm(y)1)) = φ^(−1)(f((αm(x) + βm(y))1))
               = αm(x) + βm(y)

We conclude that the function m is linear. It is also positive, i.e., x ≥ 0 implies m(x) ≥ 0, because by Lemma 752 the function φ^(−1) ∘ f is strongly increasing. Then, by the Monotone Riesz's Theorem there exists a unique positive vector λ ∈ R^n_+ such that m(x) = λ · x for all x ∈ R^n. It remains to prove that Σ_{i=1}^n λ_i = 1. Since m(1) = 1, we have

    1 = m(1, ..., 1) = m( Σ_{i=1}^n e^i ) = Σ_{i=1}^n m(e^i) = Σ_{i=1}^n λ_i

as desired.

A further interesting property that f may satisfy is symmetry. In our profit example, symmetry says that the board does not care about which branch realized which profit, but only about the size of the overall profit. For instance, if n = 2, x = (1000, 4000) and y = (4000, 1000), under symmetry f(x) = f(y) because the only difference between the two vectors is which branch earned a given profit.

To state symmetry formally we need permutations, that is, bijections σ : N → N where N = {1, 2, ..., n}.30 Given x, y ∈ R^n, write x ~ y if there exists a permutation σ such that x_i = y_{σ(i)} for all i = 1, 2, ..., n. In other words, y can be obtained from x by permuting indexes.

Example 756 We have x = (1000, 4000) ~ y = (4000, 1000). Indeed, let σ : {1, 2} → {1, 2} be the permutation given by σ(1) = 2 and σ(2) = 1, in which the indexes are interchanged. Then (y_{σ(1)}, y_{σ(2)}) = (y2, y1) = (1000, 4000) = x. N

We say that f is symmetric if

    x ~ y  ⟹  f(x) = f(y)

In other words, a symmetric f assigns the same value to all vectors that can be obtained from one another via a permutation.

30 See Appendix B.2.

Proposition 757 A function f : R^n → R is strongly increasing (decreasing), continuous, quasi-linear, and symmetric if and only if there exists a strictly increasing (decreasing) continuous function φ : R → R such that

    f(x) = φ( (1/n) Σ_{i=1}^n x_i )    for all x ∈ R^n

Remarkably, this result provides a foundation for the classic arithmetic average: it is the only average function on R^n that corresponds to a function f which is strongly increasing, quasi-linear, continuous, and symmetric. As long as these properties are compelling in our application, we can summarize vectors via their arithmetic averages.

Proof In view of the last proposition, f is strongly increasing (decreasing), continuous, and quasi-linear if and only if there exist a strictly increasing (decreasing) continuous function φ : R → R and a unique vector λ ∈ Δ_{n−1} such that m(x) = λ · x for all x ∈ R^n and f = φ ∘ m. Clearly, f is symmetric if and only if m is symmetric. Thus, it remains to prove that m is symmetric if and only if λ_i = 1/n for each i = 1, 2, ..., n. "If". Suppose that λ_i = 1/n for each i = 1, 2, ..., n. Let x, y ∈ R^n be such that x ~ y. By definition, there is a permutation σ such that x_i = y_{σ(i)} for all i = 1, 2, ..., n. Clearly, finite sums are commutative, so they are invariant under permutations, i.e., Σ_{i=1}^n y_{σ(i)} = Σ_{i=1}^n y_i. Then,

    m(x) = (1/n) Σ_{i=1}^n x_i = (1/n) Σ_{i=1}^n y_{σ(i)} = (1/n) Σ_{i=1}^n y_i = m(y)

proving that m is symmetric. "Only if". Suppose m is symmetric. Note that e^i ~ e^j for all indexes i ≠ j. Indeed, it is enough to consider the permutation σ : N → N defined by

    σ(k) = j if k = i,   σ(k) = i if k = j,   σ(k) = k otherwise

By symmetry, we then have λ_i = m(e^i) = m(e^j) = λ_j for all indexes i ≠ j. So, the weights are equal. Since they add up to 1, this implies λ_i = 1/n for each i, as desired.

Summing up, Riesz's Theorem and its variations permit a principled approach to weighted averages, which are justified via the properties of the functions of interest; these functions are the fundamental objects, averages being defined through them.

15.11 Ultracoda: Hahn-Banach et similia

So far we have considered linear functions defined on the entire space R^n. However, they can be defined on any vector subspace V of R^n.

Definition 758 A function f : V → R is said to be linear if

    f(αx + βy) = αf(x) + βf(y)

for every x, y ∈ V and every α, β ∈ R.

Since V is closed with respect to sums and multiplications by a scalar, we have that αx + βy ∈ V; therefore this definition is well posed and generalizes Definition 640.

Example 759 Consider in R^3 the vector subspace

    V = { (x1, x2, 0) : x1, x2 ∈ R }

generated by the versors e^1 and e^2. It is a "zero level" plane in R^3. The function f : V → R defined by f(x) = x1 + x2 for every x ∈ V is linear. N

Given a linear function f : V → R defined on a vector subspace of R^n, one may wonder whether it can be extended to the entire space R^n while still preserving linearity or if, instead, it remains "trapped" in the subspace V without having any possible extension to R^n. More formally, we wonder whether there is a linear function f̄ : R^n → R such that f̄|_V = f, that is,

    f̄(x) = f(x)    for all x ∈ V

This is quite an important problem, as we will see shortly, also for applications. Fortunately, the following positive result holds.

Theorem 760 (Hahn-Banach) Let V be a vector subspace of R^n. Every linear function f : V → R can be linearly extended to R^n.

Proof Let dim V = k ≤ n and let {x^1, ..., x^k} be a basis of V. If k = n, there is nothing to prove since V = R^n. Otherwise, by Theorem 92, there are n − k vectors x^(k+1), ..., x^n such that the overall set {x^1, ..., x^n} is a basis of R^n. Let {r_(k+1), ..., r_n} be an arbitrary set of n − k real numbers. By Theorem 89, note that for each vector x in R^n there exists a unique collection of scalars {α_i}_{i=1}^n ⊆ R such that x = Σ_{i=1}^n α_i x^i. Define f̄ : R^n → R to be such that f̄(x) = Σ_{i=1}^k α_i f(x^i) + Σ_{i=k+1}^n α_i r_i. Since for each vector x the collection {α_i}_{i=1}^n is unique, we have that f̄ is well defined and linear (why?). Note also that

    f̄(x^i) = f(x^i) for i = 1, ..., k;    f̄(x^i) = r_i for i = k + 1, ..., n

Since {x^1, ..., x^k} is a basis of V, for every x ∈ V there are k scalars {α_i}_{i=1}^k such that x = Σ_{i=1}^k α_i x^i. Hence,

    f̄(x) = f̄( Σ_{i=1}^k α_i x^i ) = Σ_{i=1}^k α_i f̄(x^i) = Σ_{i=1}^k α_i f(x^i) = f( Σ_{i=1}^k α_i x^i ) = f(x)

We conclude that the linear function f̄ : R^n → R extends the linear function f : V → R to R^n.

As one can clearly infer from the proof, such an extension is far from unique: to every set of scalars {r_i}_{i=k+1}^n a different extension is associated.

Example 761 Consider the previous example, with the plane V = {(x1, x2, 0) : x1, x2 ∈ R} of R^3 and the linear function f : V → R defined by f(x) = x1 + x2. By the Hahn-Banach Theorem, there is a linear function f̄ : R^3 → R such that f̄(x) = f(x) for each x ∈ V. For example, f̄(x) = x1 + x2 + x3 for each x ∈ R^3, but also f̄(x) = x1 + x2 + βx3 is an extension, for each β ∈ R. This confirms the multiplicity of the extensions. N
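The proof is constructive, and a small numerical sketch (ours; it assumes Python with NumPy) mimics it: complete a basis of V to a basis of R^n, keep the values of f on the basis of V, and assign an arbitrary value r3 to the added basis vector; linearity then determines the extension everywhere.

    # Illustrative sketch (ours): the extension built in the proof of Theorem 760,
    # for V = span{e1, e2} in R^3 and f(x) = x1 + x2 as in Example 759.
    import numpy as np

    basis = np.column_stack([[1.0, 0.0, 0.0],    # basis of V ...
                             [0.0, 1.0, 0.0],
                             [0.0, 0.0, 1.0]])   # ... completed with e3
    values = np.array([1.0, 1.0, 7.0])           # f(e1), f(e2), arbitrary r3

    def f_bar(x):
        alpha = np.linalg.solve(basis, x)        # coordinates in the full basis
        return values @ alpha                    # linearity pins down f_bar

    print(f_bar(np.array([2.0, 3.0, 0.0])))      # 5.0 = f(2, 3, 0): f_bar extends f
    print(f_bar(np.array([2.0, 3.0, 1.0])))      # 12.0: off V the value depends on r3

Every different choice of r3 yields a different linear extension, in line with the multiplicity just observed.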

Although it may appear to be a fairly innocuous result, the Hahn-Banach Theorem is very powerful. Let us see one of its remarkable consequences by extending Riesz's Theorem to linear functions defined on subspaces.31

Theorem 762 (Riesz) Let V be a vector subspace of R^n. A function f : V → R is linear if and only if there exists a vector α ∈ R^n such that

    f(x) = α · x    for all x ∈ V                              (15.50)

Such a vector is unique if V = R^n.

Proof We prove the "only if" since the converse is obvious. Let f : V → R be a linear function. By the Hahn-Banach Theorem, there is a linear function f̄ : R^n → R such that f̄(x) = f(x) for each x ∈ V. By Riesz's Theorem, there is an α ∈ R^n such that f̄(x) = α · x for each x ∈ R^n. Therefore f(x) = f̄(x) = α · x for every x ∈ V, as desired.

Conceptually, the main novelty of this version of Riesz's Theorem is the loss of the uniqueness of the vector α. Indeed, the proof shows that such a vector is determined by the extension f̄ whose existence is guaranteed by the Hahn-Banach Theorem. Yet, such extensions are far from being unique, thus implying the non-uniqueness of the vector α.

Example 763 Going back to the previous examples, we already noted that all the linear functions f̄ : R^3 → R defined by f̄(x) = x1 + x2 + βx3, with β ∈ R, extend f to R^3. By setting α = (1, 1, β), we have that f̄(x) = α · x for every β ∈ R, so that

    f(x) = α · x    for all x ∈ V

for every β ∈ R. Hence, in this example there are infinitely many vectors α for which the representation (15.50) holds. N

The monotone version of the Hahn-Banach Theorem is of great importance.

Theorem 764 Let V be a vector subspace of R^n. Every (strictly) increasing linear function f : V → R can be extended to R^n so as to be (strictly) increasing and linear.

Proof We prove the statement in the particular, yet important, case in which V ∩ R^n_{++} is not empty and f is increasing.32 We start by introducing a piece of notation which is going to be useful.

31 In Section 24.6 we will see an important financial application of this result.
32 In financial applications this assumption is often satisfied (see Section 24.6). The proof of the more general case, as well as of the strictly increasing version of the result, relies on mathematical facts that the reader will encounter in more advanced courses.

Let W be a vector subspace of R^n such that V ⊆ W. Consider a linear function f̂ : W → R such that f̂(x) = f(x) for all x ∈ V. In other words, f̂ extends f to the subspace W. Define dim f̂ = dim W. Consider the set

    N = { k ∈ {1, ..., n} : k = dim f̃ and f̃ is a monotone increasing linear extension of f }

This set is not empty since it contains dim V. For, f is an extension of itself which is linear and monotone increasing by assumption. Consider now max N. Being N not empty, max N is well defined. If max N = n, then the statement is proved. Indeed, in such a case we can conclude that there exists a linear monotone increasing extension of f whose domain is a vector subspace of R^n with dimension n, that is, the domain is R^n itself. By contradiction, assume instead that n̄ = max N < n. This means that, in looking for an extension of f which preserves linearity and monotonicity, one can at most find a monotone increasing linear extension f̃ : W → R where W is a vector subspace of dimension n̄ < n. Let {x^1, ..., x^n̄} be a basis of W. Since n̄ < n, we can find at least a vector x^(n̄+1) ∈ R^n such that {x^1, ..., x^n̄, x^(n̄+1)} is still linearly independent. Fix a vector x̄ ∈ V ∩ R^n_{++}. Clearly, we have that x̄ ∈ V ⊆ W and for each z ∈ R^n there exists m ∈ N such that −mx̄ ≤ z ≤ mx̄. Let U = {x ∈ W : x ≥ x^(n̄+1)} and L = {y ∈ W : x^(n̄+1) ≥ y}. Since x̄ ∈ W, both sets are not empty. Consider now f̃(U) and f̃(L), which are both subsets of the real line. Since f̃ is monotone increasing, it is immediate to see that each element of f̃(U) is greater than or equal to each element of f̃(L). By the separation property of the real line, there exists c ∈ R such that a ≥ c ≥ b for every a ∈ f̃(U) and every b ∈ f̃(L). Observe also that each vector x ∈ span{x^1, ..., x^n̄, x^(n̄+1)} can be written in a unique way as x = y_x + β_x x^(n̄+1), where y_x ∈ W and β_x ∈ R (why?).

Define now f̂ : span{x^1, ..., x^n̄, x^(n̄+1)} → R to be such that f̂(x) = f̃(y_x) + β_x c for every x ∈ span{x^1, ..., x^n̄, x^(n̄+1)}. We leave it to the reader to verify that f̂ is indeed linear and that f̂ extends f. Note instead that f̂ is positive, that is, f̂(x) ≥ 0 for all x ∈ span{x^1, ..., x^n̄, x^(n̄+1)} ∩ R^n_+. Otherwise, there would exist x ∈ span{x^1, ..., x^n̄, x^(n̄+1)} such that x ≥ 0 and f̂(x) < 0. If β_x = 0, then y_x = y_x + β_x x^(n̄+1) = x ≥ 0, so that, since f̃ is monotone increasing, 0 > f̂(x) = f̃(y_x) ≥ 0, a contradiction. If β_x ≠ 0 (say β_x > 0; the case β_x < 0 is analogous, with U in place of L), then x^(n̄+1) ≥ −y_x/β_x and c < f̃(−y_x/β_x). In other words, we have that −y_x/β_x belongs to L, thus f̃(−y_x/β_x) ∈ f̃(L) and c ≥ f̃(−y_x/β_x) > c, a contradiction. Since we just showed that f̂ must be positive, by Proposition 650 this implies that f̂ is monotone increasing as well. To sum up, we have constructed a function (namely f̂) which extends f to a vector subspace of dimension n̄ + 1 (namely span{x^1, ..., x^n̄, x^(n̄+1)}), thus max N ≥ n̄ + 1. At the same time, our working hypothesis was that n̄ = max N, thus reaching a contradiction.

In Example 761, the function f (x) = x1 + x2 is linear and strictly increasing on V =


f(x1 ; x2 ; 0) : x1 ; x2 2 Rg and any f (x) = x1 + x2 + x3 with > 0 is a strictly increasing
linear extension for it on R3 . Note that there may be non-monotone linear extensions: it is
enough to consider f (x) with < 0.
The last theorem permits to extend the Riesz-Markov's Theorem to functions on vector
subspaces.

Theorem 765 (Monotone Riesz) Let V be a vector subspace of R^n. A function f : V → R is linear and increasing if and only if there exists a positive vector α ∈ R^n_+ such that

    f(x) = α · x    for all x ∈ V

Such a vector is unique if V = R^n. In particular,

(i) α ≫ 0 if and only if f is strongly increasing;

(ii) α > 0 if and only if f is strictly increasing.

As to (i), note that the function f(x) = x1 + x2 is strongly positive, and so is f̄(x) = x1 + x2 + βx3 with β > 0.

A nice dividend of the Hahn-Banach Theorem is the following extension result for affine functions, which will be introduced momentarily in the next chapter and play a key role in applications (cf. Chapter 42).

Theorem 766 Let C be a convex subset of R^n. If f : C → R is affine, then there exists an affine extension of f to the entire space R^n.

Proof We begin with a Claim.

Claim Let C be a convex subset of R^n. If f : C → R is affine, then for each triple x, y, z ∈ C and scalars α, β, γ ∈ R such that α + β + γ = 1 and αx + βy + γz ∈ C,

    f(αx + βy + γz) = αf(x) + βf(y) + γf(z)                    (15.51)

Proof of the Claim We start by proving that the statement is true when γ = 0. Let x, y ∈ C and α, β ∈ R be such that α + β = 1 as well as αx + βy ∈ C. We have two cases: either α, β ≥ 0 or at least one of the two is strictly negative. In the first case, since α + β = 1, we have that α ≤ 1. Since f is affine and β = 1 − α, this implies that

    f(αx + βy) = f(αx + (1 − α)y) = αf(x) + (1 − α)f(y) = αf(x) + βf(y)    (15.52)

In the second case, without loss of generality, we can assume that β < 0. Since α + β = 1, we have that α = 1 − β > 1. Define w = αx + (1 − α)y = αx + βy ∈ C. Define λ = 1/α and note that λ ∈ (0, 1). Observe that x = λw + (1 − λ)y. Since f is affine, we have that

    f(x) = f(λw + (1 − λ)y) = λf(w) + (1 − λ)f(y) = (1/α) f(αx + βy) + (1 − 1/α) f(y)

By rearranging terms, we get that (15.52) holds. We next prove that (15.51) holds. Let us now consider the more general case, that is, x, y, z ∈ C and α, β, γ ∈ R such that α + β + γ = 1 and αx + βy + γz ∈ C. We split the proof into three cases:

1. All three scalars are positive, i.e., α, β, γ ≥ 0. Since α + β + γ = 1, we have that αx + βy + γz is a standard convex combination. Since f is affine, (15.51) holds.

2. Only two scalars are positive, say α, β ≥ 0. Define w = (α/(α+β))x + (β/(α+β))y and λ = α + β. Since α + β + γ = 1 and γ < 0, we have that λ > 0. Since C is convex and x, y ∈ C, we have that w ∈ C. It is immediate to check that λw + (1 − λ)z = αx + βy + γz ∈ C, where λ ∈ R. Since (15.52) holds, we have that

    f(αx + βy + γz) = f(λw + (1 − λ)z) = λf(w) + (1 − λ)f(z)
                    = (α + β) f( (α/(α+β))x + (β/(α+β))y ) + (1 − λ)f(z)
                    = αf(x) + βf(y) + (1 − λ)f(z)
                    = αf(x) + βf(y) + γf(z)

proving the statement.

3. Only one scalar is positive, say γ ≥ 0 and α, β < 0. Define w = (α/(α+β))x + (β/(α+β))y. It follows that 1 − γ = α + β < 0 and α/(α+β), β/(α+β) > 0, as well as α/(α+β) + β/(α+β) = 1. Since C is convex and x, y ∈ C, this implies that w ∈ C. It is immediate to check that γz + (1 − γ)w = αx + βy + γz ∈ C, where γ ∈ R. Since (15.52) holds, we have that

    f(αx + βy + γz) = f(γz + (1 − γ)w) = γf(z) + (1 − γ)f(w)
                    = γf(z) + (α + β) f( (α/(α+β))x + (β/(α+β))y )
                    = αf(x) + βf(y) + γf(z)

proving the statement.

We can now prove the main statement. We do so by further assuming that 0 ∈ C and f(0) = 0; we will show that f then admits a linear extension to R^n. This proves the statement in this particular case (why?). If C = {0}, then any linear function extends f, and so any linear function is an affine extension of f. Assume C ≠ {0}. Since {0} ≠ C ⊆ R^n, there exists a linearly independent collection {x^1, ..., x^k} ⊆ C with 1 ≤ k ≤ n. Let k be the maximum number of linearly independent vectors of C. Note that span{x^1, ..., x^k} ⊇ C. Otherwise, there would exist a vector x̄ in C that does not belong to span{x^1, ..., x^k}. Now, observe that if we consider a collection {β} ∪ {β_i}_{i=1}^k ⊆ R of k + 1 scalars such that βx̄ + Σ_{i=1}^k β_i x^i = 0, then we have two cases: either β ≠ 0 or β = 0. In the former case, we could conclude that x̄ = Σ_{i=1}^k (−β_i/β) x^i ∈ span{x^1, ..., x^k}, a contradiction with x̄ ∉ span{x^1, ..., x^k}. In the latter case, we could conclude that Σ_{i=1}^k β_i x^i = 0. Since the vectors x^1, ..., x^k are linearly independent, it follows that β_i = 0 for all i ∈ {1, ..., k}, proving that {x^1, ..., x^k, x̄} is linearly independent, a contradiction with the fact that {x^1, ..., x^k} contains the maximum number of linearly independent vectors of C. Define f̄ : span{x^1, ..., x^k} → R by f̄(x) = Σ_{i=1}^k β_i f(x^i), where {β_i}_{i=1}^k is the unique collection of scalars such that x = Σ_{i=1}^k β_i x^i. By construction, f̄ is linear (why?). Next, we show that it extends f.

Let x ∈ C. There exists a unique collection of scalars {β_i}_{i=1}^k such that x = Σ_{i=1}^k β_i x^i. Divide these scalars into three sets:

    P = { i ∈ {1, ..., k} : β_i > 0 },   N = { i ∈ {1, ..., k} : β_i < 0 },   Z = { i ∈ {1, ..., k} : β_i = 0 }

Define λ = Σ_{i∈P} β_i and μ = Σ_{i∈N} β_i. We have four cases:

1. λ = 0 = μ. Then β_i = 0 for all i ∈ {1, ..., k} and x = 0, so

    f̄(x) = Σ_{i=1}^k β_i f(x^i) = 0 = f(0) = f(x)

2. λ ≠ 0 and μ = 0. Then β_i = 0 for all i ∈ N ∪ Z. Define γ_i = β_i / Σ_{j∈P} β_j > 0 for all i ∈ P. It follows that x = Σ_{i∈P} β_i x^i. Note that Σ_{i∈P} γ_i = 1 and Σ_{i∈P} β_i x^i = λ Σ_{i∈P} γ_i x^i + (1 − λ)·0. We have that

    f̄(x) = Σ_{i=1}^k β_i f(x^i) = λ Σ_{i∈P} γ_i f(x^i) = λ f( Σ_{i∈P} γ_i x^i )
         = λ f( Σ_{i∈P} γ_i x^i ) + (1 − λ) f(0) = f( λ Σ_{i∈P} γ_i x^i + (1 − λ)·0 )
         = f( Σ_{i∈P} β_i x^i ) = f(x)

3. λ = 0 and μ ≠ 0. Then β_i = 0 for all i ∈ P ∪ Z. Define γ_i = β_i / Σ_{j∈N} β_j > 0 for all i ∈ N. It follows that x = Σ_{i∈N} β_i x^i. Note that Σ_{i∈N} γ_i = 1 and Σ_{i∈N} β_i x^i = μ Σ_{i∈N} γ_i x^i + (1 − μ)·0. We have that

    f̄(x) = Σ_{i=1}^k β_i f(x^i) = μ Σ_{i∈N} γ_i f(x^i) = μ f( Σ_{i∈N} γ_i x^i )
         = μ f( Σ_{i∈N} γ_i x^i ) + (1 − μ) f(0) = f( μ Σ_{i∈N} γ_i x^i + (1 − μ)·0 )
         = f( Σ_{i∈N} β_i x^i ) = f(x)

4. λ ≠ 0 and μ ≠ 0. Define γ_i for i ∈ P and γ_i for i ∈ N as in points 2 and 3. We have that

    f̄(x) = Σ_{i=1}^k β_i f(x^i) = Σ_{i∈P∪N} β_i f(x^i)
         = Σ_{i∈P} β_i f(x^i) + Σ_{i∈N} β_i f(x^i)
         = λ Σ_{i∈P} γ_i f(x^i) + μ Σ_{i∈N} γ_i f(x^i)
         = λ f( Σ_{i∈P} γ_i x^i ) + μ f( Σ_{i∈N} γ_i x^i )
         = λ f( Σ_{i∈P} γ_i x^i ) + μ f( Σ_{i∈N} γ_i x^i ) + (1 − λ − μ) f(0)
         = f( λ Σ_{i∈P} γ_i x^i + μ Σ_{i∈N} γ_i x^i + (1 − λ − μ)·0 )
         = f( Σ_{i∈P∪N} β_i x^i ) = f(x)

where the second-to-last equality follows from the Claim, applied with weights λ, μ and 1 − λ − μ.

Thus, we have that f̄ is a linear extension of f to span{x^1, ..., x^k}. By the Hahn-Banach Theorem, f̄ can then be linearly extended to the entire space R^n, proving the statement for the case 0 ∈ C and f(0) = 0.

Now assume that either 0 ∉ C or f(0) ≠ 0. Let x̄ ∈ C. Define D = {y ∈ R^n : y = x − x̄ for some x ∈ C}. As the reader can verify, D has three notable features: (a) D is convex; (b) for each y ∈ D there exists a unique vector x_y ∈ C such that y = x_y − x̄; (c) 0 ∈ D. Define the function f̂ : D → R to be such that f̂(y) = f(x_y) − f(x̄) for every y ∈ D. The reader can verify that f̂ is affine and such that f̂(0) = 0. By the previous part of the proof, there exists a linear extension of f̂ to R^n. Denote such an extension by f̄ and define k = f(x̄) − f̄(x̄) ∈ R. It follows that for every x ∈ C

    f(x) = f̂(x − x̄) + f(x̄) = f̄(x − x̄) + f(x̄) = f̄(x) − f̄(x̄) + f(x̄) = f̄(x) + k

that is, f is extended to the entire space R^n by the affine function f̄ + k.


Chapter 16

Convexity and affinity

16.1 Convex sets

In economics it is often important to be able to combine the different alternatives among which decision makers have to choose. For example, if x and y are bundles of goods or vectors of inputs, we may want to consider also their mixtures λx + (1 − λ)y, with λ ∈ [0, 1]. If x = (10, 0) and y = (0, 10) are two vectors of inputs, the former with ten units of iron and zero of copper, the latter with zero units of iron and ten of copper, we may want to consider also their combination

    (1/2)(0, 10) + (1/2)(10, 0) = (5, 5)

that consists of five units of both materials.

The sets that always allow such combinations are called convex. They play a key role in economics.

Definition 767 A set C in R^n is said to be convex if, for every pair of points x, y ∈ C,

    λx + (1 − λ)y ∈ C

for every λ ∈ [0, 1].

The meaning of convexity is based on the notion of convex (linear) combination

    λx + (1 − λ)y

which, when λ varies in [0, 1], represents geometrically the points of the segment

    { λx + (1 − λ)y : λ ∈ [0, 1] }                              (16.1)

that joins x with y. A set C is convex if it contains the segment (16.1) that joins any two of its points x and y.

Graphically, a convex set contains the whole segment joining any two of its points, while a non-convex set does not. [Figures: a convex set and a non-convex set]

Example 768 (i) Intervals, bounded or not, are the convex sets of the real line (see Proposition 30). Convex sets can, therefore, be seen as the generalization to R^n of the notion of interval.

(ii) Neighborhoods are convex sets in R^n. To see why, take a neighborhood B_ε(x) = {y ∈ R^n : ‖x − y‖ < ε}. Let y', y'' ∈ B_ε(x) and λ ∈ [0, 1]. By the properties of the norm (Proposition 108),

    ‖x − (λy' + (1 − λ)y'')‖ = ‖λx + (1 − λ)x − λy' − (1 − λ)y''‖
                             = ‖λ(x − y') + (1 − λ)(x − y'')‖
                             ≤ λ‖x − y'‖ + (1 − λ)‖x − y''‖ < ε

Therefore, λy' + (1 − λ)y'' ∈ B_ε(x). This proves that B_ε(x) is a convex set. N

Let us see a first topological property of convex sets (for brevity, we omit its proof).

Proposition 769 The closure and the interior of a convex set are convex sets.

The converse does not hold: a non-convex set may happen to have a convex interior or closure. In the real line, the set [2, 5] ∪ {7} is not convex but its interior (2, 5) is convex; the set (0, 1) ∪ (1, 5) is not convex but its closure [0, 5] is convex. In the plane, take a square and remove from one side a point that is not a vertex: the resulting set is not convex, yet both its closure and its interior are convex.

Proposition 770 The intersection of any collection of convex sets is a convex set.

In contrast, a union of convex sets is not necessarily convex: the union (0, 1) ∪ (2, 5) is not a convex set although both sets (0, 1) and (2, 5) are so.

Proof Let {C_i}_{i∈I} be any collection of convex sets, where i runs over any, finite or infinite, index set I. Let C = ∩_{i∈I} C_i. The empty set is trivially convex, so if C = ∅ the result holds. Suppose, therefore, that C ≠ ∅. Let x, y ∈ C and λ ∈ [0, 1]. We want to prove that λx + (1 − λ)y ∈ C. Since x, y ∈ C_i for each i, we have that λx + (1 − λ)y ∈ C_i for each i because each set C_i is convex. Hence, λx + (1 − λ)y ∈ ∩_{i∈I} C_i, as desired.

Notation In the rest of the chapter C denotes a convex set in R^n.

The points of the segment (16.1) are convex combinations of the vectors x and y. In general, given a collection {x^i}_{i=1}^k of vectors, a linear combination

    Σ_{i=1}^k α_i x^i

is called a convex (linear) combination of the vectors {x^i}_{i=1}^k if the coefficients {α_i}_{i=1}^k are weights, i.e., α_i ≥ 0 for each i and Σ_{i=1}^k α_i = 1. In words, weights are positive coefficients that add up to 1. In the case k = 2, α_1 + α_2 = 1 implies α_2 = 1 − α_1, hence convex combinations of two vectors have the form λx + (1 − λ)y, with λ ∈ [0, 1], used in defining convex sets.

Via convex combinations we can define a basic class of convex sets.

Definition 771 Given a finite collection of vectors {x^i}_{i=1}^k of R^n, the polytope that they generate is the set

    P = { Σ_{i=1}^k α_i x^i : Σ_{i=1}^k α_i = 1 and α_i ≥ 0 for every i }

of all their convex combinations.

A polytope generated by a finite collection of vectors thus consists of all possible convex combinations that one can form with them. Clearly, polytopes are convex sets. In particular, the polytope generated by two vectors x and y is the segment that joins them.

In the plane, polytopes have simple geometric interpretations that take us back to high school. Given three vectors x, y and z of the plane (not aligned), the polytope

    { α_1 x + α_2 y + α_3 z : α_1, α_2, α_3 ≥ 0 and α_1 + α_2 + α_3 = 1 }

is the triangle that has them as vertices.1 [Figure: a triangle with vertices x, y, z]

In general, the polytope P generated by vectors of the plane is a polygon whose vertices are among these vectors. The polygons that we studied in high school can be regarded as the locus of all convex combinations of their vertices.

1 A caveat: if, for instance, x lies on the segment that joins y and z (i.e., the vectors are linearly dependent), the triangle generated by x, y and z reduces to that segment. In this case, the vertices are only y and z. Similar remarks apply to general polygons, as Example 772 will momentarily show.

Example 772 The rhombus [Figure: the rhombus with vertices (0, 1), (1, 0), (−1, 0), (0, −1)]

is the polytope generated by the four vectors {(0, 1), (1, 0), (−1, 0), (0, −1)}, which are its vertices. Note that the five vectors {(0, 1), (1, 0), (−1, 0), (0, −1), (1/2, 1/2)} also generate the same rhombus [Figure: the same rhombus] because the added vector (1/2, 1/2) already belongs to the rhombus. All vertices of a polytope are among the vectors that generate it, but the converse in general fails: some of these vectors, like (1/2, 1/2), may not be vertices. Later in the chapter the notion of extreme point will clarify this issue. N
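Whether a generator is itself a convex combination of the other generators, and hence not a vertex, is a linear feasibility problem. The sketch below (ours; it assumes Python with NumPy and SciPy's linprog) flags (1/2, 1/2) as redundant for the rhombus while recognizing the four actual vertices.

    # Illustrative sketch (ours): detect redundant generators of a polytope.
    import numpy as np
    from scipy.optimize import linprog

    def is_convex_combination(point, others):
        # Feasibility: point = sum_i a_i * others_i with a_i >= 0, sum_i a_i = 1.
        others = np.asarray(others, dtype=float)
        A_eq = np.vstack([others.T, np.ones(len(others))])
        b_eq = np.append(np.asarray(point, dtype=float), 1.0)
        res = linprog(np.zeros(len(others)), A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * len(others))
        return res.success

    gens = [(0, 1), (1, 0), (-1, 0), (0, -1), (0.5, 0.5)]
    for i, g in enumerate(gens):
        rest = gens[:i] + gens[i + 1:]
        print(g, "redundant" if is_convex_combination(g, rest) else "vertex")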

Proposition 773 A set is convex if and only if it is closed under all convex combinations of its own elements.

In other words, a set is convex if and only if it contains all the polytopes (all the polygons, in the plane) generated by its elements. Though they are defined in terms of segments, convex sets actually contain all polytopes. In symbols, C is convex if and only if

    Σ_{i=1}^n α_i x^i ∈ C

for every finite collection {x^i}_{i=1}^n of vectors of C and every collection {α_i}_{i=1}^n of weights.

Proof The "if" is obvious because by considering the convex combinations with n = 2 we get Definition 767. We prove the "only if". Let C be convex. We claim that C then contains any convex combination of n of its elements. We proceed by induction on n. Initial step: the claim is true for n = 2 since C contains, by the definition of convexity, any convex combination of any two of its elements. Induction step: let us assume (induction hypothesis) that the claim is true for n − 1, i.e., that C contains any convex combination of n − 1 of its elements. We want to show that this implies the claim for n, i.e., that C contains any convex combination of n of its elements. We have:

    Σ_{i=1}^n α_i x^i = Σ_{i=1}^{n−1} α_i x^i + α_n x^n = (1 − α_n) Σ_{i=1}^{n−1} (α_i / (1 − α_n)) x^i + α_n x^n

By the induction hypothesis, we have:

    Σ_{i=1}^{n−1} (α_i / (1 − α_n)) x^i ∈ C

Hence, by the convexity of C we have

    Σ_{i=1}^n α_i x^i = (1 − α_n) Σ_{i=1}^{n−1} (α_i / (1 − α_n)) x^i + α_n x^n ∈ C

as desired. This completes the inductive step, and so the induction argument. We conclude that the claim is correct, i.e., C is closed under all convex combinations of its own elements.

Through convex combinations we next introduce an important class of convex sets.

Example 774 Given the versors e^1, e^2, ..., e^n of R^n, the set

    Δ_{n−1} = { Σ_{i=1}^n α_i e^i : Σ_{i=1}^n α_i = 1 and α_i ≥ 0 for every i }
            = { (α_1, ..., α_n) : Σ_{i=1}^n α_i = 1 and α_i ≥ 0 for every i }

of all their convex combinations is called the standard simplex. For instance, the standard simplex of the plane

    Δ_1 = { αe^1 + (1 − α)e^2 : α ∈ [0, 1] } = { α(1, 0) + (1 − α)(0, 1) : α ∈ [0, 1] }
        = { (α, 1 − α) : α ∈ [0, 1] }

is the segment with endpoints the versors e^1 and e^2. The standard simplex of R^3 is:

    Δ_2 = { α_1 e^1 + α_2 e^2 + α_3 e^3 : α_1, α_2, α_3 ≥ 0 and α_1 + α_2 + α_3 = 1 }
        = { α_1(1, 0, 0) + α_2(0, 1, 0) + α_3(0, 0, 1) : α_1, α_2, α_3 ≥ 0 and α_1 + α_2 + α_3 = 1 }
        = { (α_1, α_2, α_3) : α_1, α_2, α_3 ≥ 0 and α_1 + α_2 + α_3 = 1 }
        = { (α_1, α_2, 1 − α_1 − α_2) : α_1, α_2 ≥ 0 and α_1 + α_2 ≤ 1 }

It is the equilateral triangle with vertices the versors e^1, e^2 and e^3. [Figure: the simplex Δ_2]

We can express standard simplices as follows:

    Δ_1 = { (x, 1 − x) : x ∈ [0, 1] } ⊆ R^2,   Δ_2 = { (x, y, 1 − x − y) : x, y ≥ 0 and x + y ≤ 1 } ⊆ R^3

and

    Δ_{n−1} = { x ∈ R^n_+ : Σ_{i=1}^n x_i = 1 } ⊆ R^n

Standard simplices are an important class of polytopes, as we will see later in the chapter.

16.2 The skeleton of convexity

16.2.1 Convex envelopes

To gain a deeper understanding of convexity, we introduce a key notion.2

Definition 775 We call convex envelope (or hull), written co A, of a subset A of R^n the smallest convex set that contains A.

Next we show that convex envelopes are the counterpart for convex combinations of what generated subspaces are for linear combinations (cf. Section 3.4).

Proposition 776 The convex envelope of a set is the intersection of all convex sets that contain it.

Proof Given a set A of R^n, let {C_i}_{i∈I} be the collection of all convex subsets containing A, where I is a (finite or infinite) index set. We want to show that co A = ∩_{i∈I} C_i. By Proposition 770, ∩_{i∈I} C_i is a convex set. Since A ⊆ C_i for each i, we have co A ⊆ ∩_{i∈I} C_i since, by definition, co A is the smallest convex subset containing A. On the other hand, co A belongs to the collection {C_i}_{i∈I}, being a convex subset containing A. It follows that ∩_{i∈I} C_i ⊆ co A, and we therefore conclude that ∩_{i∈I} C_i = co A.

The next result shows that convex envelopes can be represented through convex combinations.

Theorem 777 Let A be a set in R^n. A vector x ∈ R^n belongs to co A if and only if it is a convex combination of vectors of A.

In other words, x ∈ co A if and only if there exist a finite set {x^i}_{i∈I} of vectors of A and a finite set {α_i}_{i∈I} of weights such that x = Σ_{i∈I} α_i x^i.

Proof "If". Let x ∈ R^n be a convex combination of a finite set {x^i}_{i∈I} of vectors of A. The set co A is convex and, since {x^i}_{i∈I} ⊆ co A, Proposition 773 implies x ∈ co A, as desired.

"Only if". Let C be the set of all the vectors that can be expressed as convex combinations of vectors of A, i.e., x ∈ C if there exist finite sets {x^i}_{i∈I} ⊆ A and weights {α_i}_{i∈I} such that x = Σ_{i∈I} α_i x^i.

2 The rest of the chapter is for coda readers.

It is easy to see that C is a convex subset containing A. It follows that co A ⊆ C, and hence each x ∈ co A is a convex combination of vectors of A.

For instance, the polytope P generated by a set A = {x^1, ..., x^k} ⊆ R^n is its convex envelope, i.e.,

    P = co A                                                   (16.2)

In particular, standard simplices are the convex envelope of the versors.

Proposition 778 The convex envelope of a compact set is compact.

Thus, convex envelopes preserve compactness (we omit the proof). When K is a compact subset, co K is then compact. For instance, polytopes are compact because they are the convex envelope of a finite (so, compact) collection of vectors of R^n. Formally:

Corollary 779 Polytopes are compact sets.
16.2.2 Extreme points

We have seen how, given any set, we can construct through the convex combinations of its elements the smallest convex set that contains it, that is, its convex envelope. Now we consider, in a sense, the opposite problem: given a convex set C, we ask what is the smallest set of its points from which C can be reconstructed via their convex combinations. In other words, we ask what is the minimal set A ⊆ C such that co A = C.

If it exists, such a set A gives us the essence of the set C, its "skeleton". From the point of view of convexity, the knowledge of A would be equivalent to the knowledge of the entire set C because C could be reconstructed from A in a "mechanical" way by taking convex combinations of its elements.

To understand how to address this problem, we go back to the rhombus described in Example 772. There we saw how this polygon is the convex envelope of its vertices A = {(0, 1), (1, 0), (−1, 0), (0, −1)}. Indeed, as already remarked, a polygon is the convex envelope of its vertices. On the other hand, we also observed how the same rhombus can be seen as the convex envelope of the set

    A' = {(0, 1), (1, 0), (−1, 0), (0, −1), (1/2, 1/2)}

In this set, besides the vertices there is also the vector (1/2, 1/2), which is useless for the representation of the polygon because it is itself a convex combination of the vertices.3 We therefore have a redundancy in the set A', while this does not happen in the set A of the vertices, whose elements are all essential for the representation of the rhombus.

Hence, for a polygon the set of the vertices is the natural candidate to be the minimal set that allows to represent each point of the polygon as a convex combination of its elements. This motivates the notion of extreme point, which generalizes that of vertex of a polygon to any convex set.

Definition 780 A point x_0 of a convex set C is said to be an extreme point of C if x_0 = tx + (1 − t)y with t ∈ (0, 1) and x, y ∈ C implies y = x = x_0.

3 Indeed, (1/2, 1/2) = (1/2)(1, 0) + (1/2)(0, 1).

Thus, a point x_0 ∈ C is extreme if it is not a convex combination of two other vectors of C. The set of the extreme points of C is denoted by ext C. In the case of polytopes, the extreme points are called vertices. The next result gives a simple characterization of extreme points: they are the points that can be eliminated without altering the convex nature of the set considered. Indeed, if in the plane we remove a vertex from a polygon, we still have a convex set.

Lemma 781 A point x_0 of a convex set C is extreme if and only if the set C \ {x_0} is convex.

Proof Let x_0 ∈ ext C and x, y ∈ C \ {x_0}. Since C is convex, tx + (1 − t)y ∈ C for each t ∈ [0, 1]. To prove that tx + (1 − t)y ∈ C \ {x_0}, it is therefore sufficient to prove that x_0 ≠ tx + (1 − t)y. This is obvious if t ∈ {0, 1}. On the other hand, if x_0 = tx + (1 − t)y held for some t ∈ (0, 1), then Definition 780 would imply y = x = x_0, which contradicts x, y ∈ C \ {x_0}. In conclusion, tx + (1 − t)y ∈ C \ {x_0}, and the set C \ {x_0} is therefore convex.

Vice versa, assume that x_0 ∈ C is such that the set C \ {x_0} is convex. We prove that x_0 ∈ ext C. Let x, y ∈ C be such that x_0 = tx + (1 − t)y with t ∈ (0, 1). Since C \ {x_0} is convex, if x, y belong to C \ {x_0}, then tx + (1 − t)y ∈ C \ {x_0} for each t ∈ [0, 1], and hence x_0 ≠ tx + (1 − t)y for each t ∈ [0, 1]. It follows that x and y cannot both belong to C \ {x_0}, that is, at least one of them equals x_0; but then x_0 = tx + (1 − t)y with t ∈ (0, 1) forces y = x = x_0. In conclusion, x_0 ∈ ext C.

The next result shows that extreme points must be boundary points: no interior point of a convex set can be an extreme point.

Proposition 782 We have ext C ⊆ ∂C.

Proof Let x be an interior point of C. We prove that x ∉ ext C. Since x is an interior point, there exists a neighborhood B_ε(x) such that B_ε(x) ⊆ C. Consider the points (1 − ε/n)x and (1 + ε/n)x. We have:

    ‖(1 − ε/n)x − x‖ = (ε/n)‖x‖   and   ‖(1 + ε/n)x − x‖ = (ε/n)‖x‖

and hence (1 − ε/n)x, (1 + ε/n)x ∈ B_ε(x) for n sufficiently large. On the other hand,

    x = (1/2)(1 − ε/n)x + (1/2)(1 + ε/n)x

and so x ∉ ext C.

Open convex sets (like, for example, open unit balls) thus do not have extreme points. We now present other examples in which we find the extreme points of some convex sets.

Example 783 Consider the polytope co A generated by a finite collection A = {x^1, ..., x^k} ⊆ R^n. It is easy to see that ext co A is not empty, with ext co A ⊆ A. That is, the vertices of the polytope necessarily belong to the finite collection that generates it (cf. Example 772). N

Example 784 Consider the closed unit ball B̄_1(0) = {x ∈ R^n : ‖x‖ ≤ 1} of R^n. We have:

    ext B̄_1(0) = ∂B̄_1(0) = {x ∈ R^n : ‖x‖ = 1}

In words, the set of the extreme points of the closed unit ball is given by the unit sphere, i.e., by its skin. Though a quite intuitive result (just draw a circle), it is a bit delicate to prove. Since ∂B̄_1(0) = {x ∈ R^n : ‖x‖ = 1}, the last proposition implies the inclusion ext B̄_1(0) ⊆ {x ∈ R^n : ‖x‖ = 1}. As to the converse inclusion, let x_0 ∈ ∂B̄_1(0). Let x_0 = tx + (1 − t)y with x, y ∈ B̄_1(0) and t ∈ (0, 1). We want to show that x = y. Note first that 1 = ‖x_0‖ ≤ t‖x‖ + (1 − t)‖y‖ ≤ 1 forces ‖x‖ = ‖y‖ = 1. We have

    ‖tx + (1 − t)y‖² = t²‖x‖² + (1 − t)²‖y‖² + 2t(1 − t) x · y
                     = t²‖x‖² + (1 − t)²‖y‖² + 2t(1 − t)‖x‖‖y‖ cos(α − β)
                     = t² + (1 − t)² + 2t(1 − t) cos(α − β)

where the angle α − β is the difference of the angles α and β determined by the two vectors (Section C.3). If x ≠ y, we have cos(α − β) < 1, so ‖tx + (1 − t)y‖² < 1. This contradicts x_0 ∈ ∂B̄_1(0); therefore x = y. We conclude that x_0 ∈ ext B̄_1(0), as desired. N

We are now ready to address the opening question of this section. We first need a preliminary "minimality" lemma showing that ext C is included in every subset of C whose convex envelope is C itself.

Lemma 785 If A ⊆ C is such that co A = C, then ext C ⊆ A.

In other words, the extreme points of the convex hull of a set A belong to A.

Proof Let x ∈ ext C. We want to show that x ∈ A. Since x ∈ co A, there are a collection {x^i}_{i=1}^n ⊆ A and a collection {t_i}_{i=1}^n of weights such that x = Σ_{i=1}^n t_i x^i. Without loss of generality, assume t_i > 0 for every i. We have:

    x = t_1 x^1 + (1 − t_1) Σ_{i=2}^n (t_i / (1 − t_1)) x^i

Since C is convex, Σ_{i=2}^n (t_i / (1 − t_1)) x^i belongs to C. Then,

    x = x^1 = Σ_{i=2}^n (t_i / (1 − t_1)) x^i

since x is an extreme point. Set s_i = t_i / (1 − t_1) for i = 2, ..., n, so that

    x = s_2 x^2 + (1 − s_2) Σ_{i=3}^n (s_i / (1 − s_2)) x^i

Since x is an extreme point, we now have

    x = x^2 = Σ_{i=3}^n (s_i / (1 − s_2)) x^i

By proceeding in this way, we prove that x = x^i for every i. Hence, x ∈ A.

By proceeding in this way, we prove that x = xi for every i. Hence, x 2 A.

The next fundamental result shows that convex and compacts sets can be reconstructed
from its extreme points by taking all their convex combinations. We omit the proof.

Theorem 786 (Minkowski) Let K be a convex and compact subset of Rn . Then:

K = co (ext K) (16.3)

In view of the previous lemma, Minkowski's Theorem answers the opening question:
ext K is the minimal set in K for which (16.3) holds. Indeed, if A K is another set for
which K = co A, then ext K A by the lemma. Summing up:

all the points of a compact and convex set K can be expressed as convex combinations
of the extreme points;

the set of the extreme points of K is the minimal set in K for which this is true.

Minkowski's Theorem stands out as the deepest and most beautiful result of this chapter. It shows that, in a sense, convex and compact sets in $\mathbb{R}^n$ are generalized polytopes (cf. Example 783), with extreme points generalizing the role of vertices. In particular, polytopes are the convex and compact sets of $\mathbb{R}^n$ that have a finite number of extreme points (which are then their vertices).
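For polytopes in low dimensions, the extreme points can also be computed explicitly: they are the vertices reported by standard convex-hull routines. A minimal sketch, assuming the SciPy library is available (the generating collection below is an arbitrary illustration):

```python
import numpy as np
from scipy.spatial import ConvexHull

# a finite generating collection A in R^2; co(A) is a polytope
A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [1.0, 1.0], [0.5, 0.5]])   # the last point is interior

hull = ConvexHull(A)
print(A[hull.vertices])   # ext(co A): the four corners, a subset of A (cf. Lemma 785)
```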

16.3 Affine sets

Definition 787 A set A in $\mathbb{R}^n$ is said to be affine if $\alpha x + (1 - \alpha)y \in A$ for all $x, y \in A$ and all $\alpha \in \mathbb{R}$.

Geometrically, the set
$$\{\alpha x + (1 - \alpha)y : \alpha \in \mathbb{R}\}$$
represents the line passing through the points x and y. A set is thus affine when it contains all the lines that pass through any two of its points.

Example 788 The solutions of a linear system form an affine set. Formally, given an $m \times n$ matrix B and a vector $b \in \mathbb{R}^m$, the set $A = \{x \in \mathbb{R}^n : Bx = b\}$ is affine. Indeed, if $x, y \in A$ and $\alpha \in \mathbb{R}$, then
$$B(\alpha x + (1 - \alpha)y) = \alpha Bx + (1 - \alpha)By = \alpha b + (1 - \alpha)b = b$$
and so $\alpha x + (1 - \alpha)y \in A$, as desired. N
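Closure under affine combinations, with weights ranging over all of $\mathbb{R}$, is easy to verify numerically. A minimal sketch, assuming the NumPy library (the system below is an arbitrary illustration):

```python
import numpy as np

# an arbitrary underdetermined system Bx = b; its solution set A is affine
B = np.array([[1.0, 2.0, -1.0],
              [0.0, 1.0,  3.0]])
b = np.array([1.0, 2.0])

x = np.linalg.lstsq(B, b, rcond=None)[0]   # one particular solution
y = x + np.array([7.0, -3.0, 1.0])         # another one: (7, -3, 1) lies in the null space of B

for alpha in (-2.0, 0.3, 5.0):             # alpha ranges over all of R, not just [0, 1]
    z = alpha * x + (1 - alpha) * y
    assert np.allclose(B @ z, b)           # affine combinations of solutions are solutions
```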

Vector subspaces are affine, and affine sets are convex. The converses are false: a bounded interval is a simple example of a convex set which is not affine; in the last example, the solution set A is affine but not a vector subspace unless $b = 0$ (so, unless the system is homogeneous). Affinity is thus an intermediate notion between convexity and linearity. It shares with them a basic, readily proved, set stability property.

Lemma 789 The intersection of any collection of affine sets is an affine set.

The next result clarifies the nature of affine sets by showing that they are "parallel" to vector subspaces.

Proposition 790 A set A of $\mathbb{R}^n$ is affine if and only if there is a vector subspace V of $\mathbb{R}^n$ and a vector $z \in \mathbb{R}^n$ such that
$$A = V + z = \{x + z : x \in V\}$$

Proof \If". Let A = V + z, where V is a vector subspace. We want to show that A


is a ne. If x; y 2 A, then x = x1 + z and y = x2 + z for some x1 ; x2 2 V , and so
x + (1 ) y = x1 + (1 ) x2 + z 2 V + z = A.
\Only if". Take a point z 2 A and set V = A z. We want to show that V is a
vector subspace. Let x 2 V , that is, x = y z for some y 2 A. For all 2 R we have
x = y z = y + (1 ) z z. Since y; z 2 A, we have y + (1 ) z 2 A and so
x 2 A y = V . Now, let x1 ; x2 2 V , namely, x1 = y 1 z and x2 = y 2 z. Then,

y1 + y2
x1 + x2 = y 1 + y 2 2z = 2 z 2V
2

We conclude that V is a vector subspace. The nal part of the statement is easily proved.

As a dividend of this result, we can characterize vector subspaces as the affine sets containing the origin.

Corollary 791 An affine set A in $\mathbb{R}^n$ is a vector subspace if and only if $0 \in A$.

The next example illustrates these first results on affinity.

Example 792 Given any two distinct vectors x and y of $\mathbb{R}^n$, let $A = \{\alpha x + (1 - \alpha)y : \alpha \in \mathbb{R}\}$ be the line that passes through them. Clearly, the set A is affine. In accordance with the last result, we can write
$$A = V + y$$
where $V = \{\alpha(x - y) : \alpha \in \mathbb{R}\}$ is a vector subspace. N

The following proposition permits us to establish a concrete representation of affine sets by showing that they all have the form of Example 788.

Proposition 793 A set A in $\mathbb{R}^n$ is affine if and only if there is an $m \times n$ matrix B and a vector $b \in \mathbb{R}^m$ such that
$$A = \{x \in \mathbb{R}^n : Bx = b\}$$

Proof The "if" is contained in Example 788. We omit the proof of the converse, which uses the last proposition.4

4 For a proof, we refer readers to Rockafellar (1970), p. 6.

Affine sets thus correspond to solution sets of linear systems. By Corollary 791, vector subspaces then have the form $\{x \in \mathbb{R}^n : Bx = 0\}$, so they correspond to solution sets of homogeneous linear systems.5
If we define the linear operator $T : \mathbb{R}^n \to \mathbb{R}^m$ by $T(x) = Bx$, we have
$$T^{-1}(b) = \{x \in \mathbb{R}^n : Bx = b\}$$
We can thus view the last result as saying that affine sets are the level sets of linear operators. This angle will be useful later in the chapter.

5 Corollary 743 is an early version of Proposition 790 in the context of linear systems.

16.4 Affine independence

A linear combination $\sum_{i=1}^m \alpha_i x^i$ of vectors $\{x^i\}_{i=1}^m$ is affine if $\sum_{i=1}^m \alpha_i = 1$. An affine combination of two vectors x and y has the form $\alpha x + (1 - \alpha)y$, so a set is affine when it contains the affine combinations of any two of its elements. Next we show that affine combinations play for affine sets the role that linear and convex combinations played for vector spaces and convex sets, respectively.

Proposition 794 A set is affine if and only if it is closed with respect to all affine combinations of its own elements.

Proof \Only if". Let A be an a ne set. By Proposition 790, there is a vector subspace V
m
and a vector z 2 A such that A = z +V . Let xi i=1 A and let f i gm i=1 be scalars such that
Pm i
Pm Pm
i=1 i = 1. Since x z 2 V for each i, it follows that i=1 i x = z + i=1 i xi z 2
i

z+V.
\If". Assume that set A is closed under a ne combinations. Let z 2 A. We must prove
that A z is a vector space. Given x1 ; x2 2 A and 1 ; 2 2 R, we have
1 2 1 2
1 (x z) + 2 (x z) + z = 1x + 2x + (1 1 + 2) z 2A

Hence, 1 x1 z + 2 x2 z 2A z. Consequently, A z is a vector space.

There is a natural notion of affine independence.

Definition 795 A finite set of vectors $x^1, \dots, x^m$ of $\mathbb{R}^n$ is said to be affinely independent if
$$\sum_{i=1}^m \alpha_i x^i = 0 \ \text{ and } \ \sum_{i=1}^m \alpha_i = 0 \implies \alpha_1 = \dots = \alpha_m = 0$$

Unlike linear independence, where the coefficients are unconstrained, here they are required to add up to 0. So, it is a weaker notion of independence. Next we relate the two notions.
Proposition 796 A set of m vectors $\{x^i\}_{i=1}^m$ is affinely independent if and only if the $m - 1$ vectors $\{x^i - x^m\}_{i=1}^{m-1}$ are linearly independent.6

6 As the proof clarifies, the choice of $x^m$ is arbitrary: any other of the m vectors could have been chosen.

This result implies, inter alia, that there are at most n + 1 affinely independent vectors in $\mathbb{R}^n$.
Proof "Only if". Let $\{x^i\}_{i=1}^m$ be affinely independent. Set $\sum_{i=1}^{m-1} \alpha_i (x^i - x^m) = 0$. Then,
$$-\left(\sum_{i=1}^{m-1} \alpha_i\right)x^m + \sum_{i=1}^{m-1} \alpha_i x^i = 0$$
where the coefficients sum to 0, which implies $\alpha_1 = \dots = \alpha_{m-1} = 0$. So, the vectors $\{x^i - x^m\}_{i=1}^{m-1}$ are linearly independent.
"If". Suppose that the vectors $\{x^i - x^m\}_{i=1}^{m-1}$ are linearly independent. Let $\sum_{i=1}^m \alpha_i x^i = 0$ with $\sum_{i=1}^m \alpha_i = 0$. Then,
$$0 = \alpha_m x^m + \sum_{i=1}^{m-1} \alpha_i x^i = -\left(\sum_{i=1}^{m-1} \alpha_i\right)x^m + \sum_{i=1}^{m-1} \alpha_i x^i = \sum_{i=1}^{m-1} \alpha_i (x^i - x^m)$$
Hence, $\alpha_1 = \alpha_2 = \dots = \alpha_m = 0$, so the vectors $\{x^i\}_{i=1}^m$ are affinely independent.
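Proposition 796 also yields a practical test: m points are affinely independent exactly when the m − 1 difference vectors have full rank. A minimal sketch, assuming the NumPy library:

```python
import numpy as np

def affinely_independent(points: np.ndarray) -> bool:
    # points: an (m, n) array whose rows are m vectors of R^n
    diffs = points[:-1] - points[-1]          # the m - 1 vectors x^i - x^m
    return np.linalg.matrix_rank(diffs) == len(points) - 1

# three affinely independent points of R^2 (a non-degenerate triangle)...
print(affinely_independent(np.array([[0., 0.], [1., 0.], [0., 1.]])))   # True
# ...versus three collinear points
print(affinely_independent(np.array([[0., 0.], [1., 1.], [2., 2.]])))   # False
```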

Definition 797 We call the affine envelope (or hull) of a subset B of $\mathbb{R}^n$, written $\operatorname{aff} B$, the smallest affine subspace containing B.

In view of Lemma 789, the affine envelope of a set B is easily seen to be the intersection of all affine subspaces that contain B. It is also readily checked to be the collection of all affine combinations of elements of B.

Example 798 (i) The affine envelope of the standard simplex $\Delta_1$ is the affine set $\{(\alpha, 1 - \alpha) : \alpha \in \mathbb{R}\}$. Geometrically, $\Delta_1$ is the segment that joins the versors $e^1$ and $e^2$, while $\operatorname{aff}\Delta_1$ is the line that passes through them.
(ii) The affine envelope of the standard simplex $\Delta_2$ is the affine set $\{(\alpha_1, \alpha_2, 1 - \alpha_1 - \alpha_2) : \alpha_1, \alpha_2 \in \mathbb{R}\}$. Geometrically, $\Delta_2$ is the equilateral triangle with vertices the versors $e^1$, $e^2$ and $e^3$, while $\operatorname{aff}\Delta_2$ is the plane that passes through them. N

Affine envelopes permit us to introduce a natural notion of affine basis.

Definition 799 Let A be an affine set in $\mathbb{R}^n$. A finite subset B of A is an affine basis of A if it is an affinely independent set such that $\operatorname{aff} B = A$.

The next result justifies the basis terminology.

Proposition 800 Let A be an affine set in $\mathbb{R}^n$. A finite subset B of A is an affine basis of A if and only if each $x \in A$ can be written in a unique way as an affine combination of vectors in B.

An element x of A thus admits a unique representation
$$x = \sum_{i=1}^m \alpha_i x^i$$
as an affine combination of the elements of an affine basis $B = \{x^1, x^2, \dots, x^m\}$. The unique coefficients $\alpha_i$ are called the barycentric coordinates of x for the basis B.

Proof We prove the "only if" and leave the converse to the reader. Let $B = \{x^1, x^2, \dots, x^m\}$ be a basis. Clearly, any $x \in \operatorname{aff} B$ is representable as an affine combination $\sum_{i=1}^m \alpha_i x^i$. Let us show that the coefficients $\alpha_i$ are uniquely determined. Suppose that
$$x = \sum_{i=1}^m \alpha_i x^i = \sum_{i=1}^m \alpha_i' x^i$$
with $\sum_{i=1}^m \alpha_i = 1 = \sum_{i=1}^m \alpha_i'$. This implies
$$(\alpha_1 - \alpha_1')x^1 + (\alpha_2 - \alpha_2')x^2 + \dots + (\alpha_m - \alpha_m')x^m = 0$$
and
$$(\alpha_1 - \alpha_1') + (\alpha_2 - \alpha_2') + \dots + (\alpha_m - \alpha_m') = 0$$
Since the vectors are affinely independent, we have $\alpha_i = \alpha_i'$ for $i = 1, \dots, m$. So, the representation is unique.
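Computationally, the barycentric coordinates of x for a basis $\{x^1, \dots, x^m\}$ solve the linear system $\sum_i \alpha_i x^i = x$ together with $\sum_i \alpha_i = 1$. A minimal sketch, assuming the NumPy library:

```python
import numpy as np

def barycentric(basis: np.ndarray, x: np.ndarray) -> np.ndarray:
    # basis: (m, n) array of affinely independent points; x: a point of aff(basis)
    m = basis.shape[0]
    M = np.vstack([basis.T, np.ones((1, m))])    # stack sum_i a_i x^i = x and sum_i a_i = 1
    rhs = np.append(x, 1.0)
    coords, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return coords

basis = np.array([[0., 0.], [1., 0.], [0., 1.]])    # an affine basis of R^2
print(barycentric(basis, np.array([0.25, 0.25])))   # approximately [0.5, 0.25, 0.25]
```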

16.5 Simplices

An important class of polytopes is defined through affine independence.

Definition 801 A simplex in $\mathbb{R}^n$ is a polytope generated by affinely independent vectors.

A simplex is thus a convex set P of the form $\operatorname{co} B$, where $B = \{x^1, x^2, \dots, x^m\}$ is an affinely independent set, with $m \le n + 1$. It consists of the vectors that have positive barycentric coordinates when B is regarded as a basis of $\operatorname{aff} B$.

Example 802 (i) Triangles are the simplices of the plane. For instance, the right triangle with catheti of length 1 is the simplex $P = \operatorname{co} B$ generated by the affinely independent vectors $B = \{0, e^1, e^2\}$ of $\mathbb{R}^2$.
(ii) The simplex $P = \operatorname{co} B$ generated by the affinely independent vectors $B = \{e^1, \dots, e^n, 0\}$ of $\mathbb{R}^n$ is
$$P = \left\{x \in \mathbb{R}^n_+ : \sum_{i=1}^n x_i \le 1\right\}$$
It generalizes to $\mathbb{R}^n$ the previous right triangle. Even more generally, we can replace 0 with any vector $y \in \mathbb{R}^n$ by considering the simplex $P = \operatorname{co} B$ generated by the affinely independent vectors $B = \{y + e^1, \dots, y + e^n, y\}$ of $\mathbb{R}^n$. To check the affine independence of these vectors, let
$$\sum_{i=1}^{n+1} \alpha_i = 0 \qquad\text{and}\qquad \sum_{i=1}^n \alpha_i (y + e^i) + \alpha_{n+1} y = 0 \tag{16.4}$$
We need to show that $\alpha_1 = \dots = \alpha_n = \alpha_{n+1} = 0$. We have
$$\sum_{i=1}^n \alpha_i (y + e^i) + \alpha_{n+1} y = y\sum_{i=1}^{n+1} \alpha_i + \sum_{i=1}^n \alpha_i e^i = \sum_{i=1}^n \alpha_i e^i$$
Hence, (16.4) implies $\sum_{i=1}^n \alpha_i e^i = 0$, which in turn implies $\alpha_1 = \dots = \alpha_n = 0$ and so also $\alpha_{n+1} = 0$ because $\sum_{i=1}^{n+1} \alpha_i = 0$.
(iii) A linearly independent set B is affinely independent, so it generates a simplex P. In this case, $0 \notin P$ (why?). For instance, the standard simplex
$$\Delta_{n-1} = \left\{x \in \mathbb{R}^n_+ : \sum_{i=1}^n x_i = 1\right\}$$
of $\mathbb{R}^n$ (Example 774) is generated by the versors $e^1, \dots, e^n$, i.e., $\Delta_{n-1} = \operatorname{co}\{e^1, \dots, e^n\}$. N

Minkowski's Theorem ensures that the elements of a convex and compact set can be represented as convex combinations of its extreme points but, in general, this representation is not unique: for example, the origin 0 of the closed unit ball of $\mathbb{R}^n$ can be expressed in different ways as a convex combination of the ball's extreme points (cf. Example 784). Remarkably, simplices are an important class of convex compact sets for which the representation turns out to be unique, as the next important result shows.

Theorem 803 (Choquet-Meyer) For a compact convex set K of $\mathbb{R}^n$, the following conditions are equivalent:

(i) K is a simplex;

(ii) ext K is a maximal affinely independent set in K;

(iii) ext K is an affinely independent set in K;

(iv) each element of K has a unique representation as a convex combination of its extreme points.

A strong version of Minkowski's Theorem thus holds for simplices.7 The equivalence of (ii) and (iii) shows that when the set ext K is affinely independent, it is automatically maximally so.

7 It is named after the two mathematicians who developed the far-reaching consequences of this result, as readers may learn in more advanced courses.

Proof (i) implies (ii). Let $K = \operatorname{co} B$ be a simplex, where $B = \{x^1, x^2, \dots, x^m\}$ is an affinely independent set. By Minkowski's Theorem, $K = \operatorname{co}(\operatorname{ext} K)$. We want to show that $\operatorname{ext} K = B$. Let $e \in \operatorname{ext} K$. There exist weights $\{\alpha_i\}_{i=1}^m$ such that $e = \sum_{i=1}^m \alpha_i x^i$. Without loss of generality, assume that $\alpha_1 \in (0, 1)$. Then,
$$e = \alpha_1 x^1 + (1 - \alpha_1)\sum_{i=2}^m \frac{\alpha_i}{1 - \alpha_1}\, x^i$$
and so $e = x^1$ because both $x^1$ and $\sum_{i=2}^m \alpha_i x^i/(1 - \alpha_1)$ belong to K and e is an extreme point. This implies that $\operatorname{ext} K \subseteq B$. To prove the equality, suppose per contra that there exists an element of B, say $x^m$, that does not belong to ext K. Again by Minkowski's Theorem, there exist weights $\{\beta_e\}_{e \in \operatorname{ext} K}$ such that $x^m = \sum_{e \in \operatorname{ext} K} \beta_e e$, i.e., such that $\sum_{e \in \operatorname{ext} K} \beta_e (x^m - e) = 0$. Since the vectors $\{e - x^m\}_{e \in \operatorname{ext} K}$ are linearly independent (Proposition 796), this implies $\beta_e = 0$ for each $e \in \operatorname{ext} K$, which contradicts $\sum_{e \in \operatorname{ext} K} \beta_e = 1$. We conclude that $B = \operatorname{ext} K$.
It remains to prove that B, so ext K, is a maximal set of affinely independent vectors of K. Suppose that there exists a vector $x \in K \setminus B$ such that the set $B \cup \{x\}$ is affinely independent. Since $K = \operatorname{co} B$, there exist weights $\{\alpha_i\}_{i=1}^m$ such that $x = \sum_{i=1}^m \alpha_i x^i$, i.e., such that $\sum_{i=1}^m \alpha_i (x^i - x) = 0$. Since the vectors $\{x^i - x\}_{i=1}^m$ are linearly independent (Proposition 796), this implies $\alpha_i = 0$ for each $i = 1, \dots, m$, which contradicts $\sum_{i=1}^m \alpha_i = 1$. We conclude that B is a maximal set of affinely independent vectors of K.
(ii) trivially implies (iii).
(iii) implies (iv). Suppose that ext K is an affinely independent set in K. Let $x \in K$. By Minkowski's Theorem, there exist weights $\{\beta_e\}_{e \in \operatorname{ext} K}$ such that $x = \sum_{e \in \operatorname{ext} K} \beta_e e$. We want to show that these weights are unique. Indeed, suppose there exist weights $\{\gamma_e\}_{e \in \operatorname{ext} K}$ such that $x = \sum_{e \in \operatorname{ext} K} \gamma_e e$. Then, $\sum_{e \in \operatorname{ext} K} (\beta_e - \gamma_e)e = 0$ as well as $\sum_{e \in \operatorname{ext} K} (\beta_e - \gamma_e) = 0$ because $\sum_{e \in \operatorname{ext} K} \beta_e = \sum_{e \in \operatorname{ext} K} \gamma_e = 1$. By affine independence, we have $\beta_e - \gamma_e = 0$, so $\beta_e = \gamma_e$, for each $e \in \operatorname{ext} K$.
(iv) implies (i). Suppose that each element of K has a unique representation as a convex combination of its extreme points. In view of Minkowski's Theorem, it is enough to show that ext K is affinely independent. So, let $\sum_{e \in \operatorname{ext} K} \alpha_e e = 0$ with $\sum_{e \in \operatorname{ext} K} \alpha_e = 0$. Given any $0 \ne x \in K$ (if $K = \{0\}$ the result is trivially true), there exist unique weights $\{\beta_e\}_{e \in \operatorname{ext} K}$ such that $x = \sum_{e \in \operatorname{ext} K} \beta_e e$. We have
$$x = x + 0 = \sum_{e \in \operatorname{ext} K} \beta_e e + \sum_{e \in \operatorname{ext} K} \alpha_e e = \sum_{e \in \operatorname{ext} K} (\beta_e + \alpha_e)e$$
Since $\sum_{e \in \operatorname{ext} K} (\beta_e + \alpha_e) = \sum_{e \in \operatorname{ext} K} \beta_e + \sum_{e \in \operatorname{ext} K} \alpha_e = \sum_{e \in \operatorname{ext} K} \beta_e = 1$, we then have $\beta_e + \alpha_e = \beta_e$ for each $e \in \operatorname{ext} K$, so that $\alpha_e = 0$ for each $e \in \operatorname{ext} K$. We conclude that the set ext K is affinely independent.

Inspection of the proof shows that for a simplex $P = \operatorname{co}\{x^1, x^2, \dots, x^m\}$ it holds that $\operatorname{ext} P = \{x^1, x^2, \dots, x^m\}$. Thus, each element x of a simplex P can be uniquely written as a convex combination
$$x = \sum_{i=1}^m \alpha_i x^i$$
of the extreme points of P, i.e., of its vertices. The unique weights $\alpha_i$ are the positive barycentric coordinates of x in P.

Example 804 For the simplices
$$P = \left\{x \in \mathbb{R}^n_+ : \sum_{i=1}^n x_i \le 1\right\} \qquad\text{and}\qquad \Delta_{n-1} = \left\{x \in \mathbb{R}^n_+ : \sum_{i=1}^n x_i = 1\right\}$$
we have $\operatorname{ext} P = \{0, e^1, \dots, e^n\}$ and $\operatorname{ext}\Delta_{n-1} = \{e^1, \dots, e^n\}$. N
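The uniqueness that characterizes simplices can be contrasted numerically with the closed unit ball, where representations are far from unique. A small illustrative sketch, assuming the NumPy library:

```python
import numpy as np

# In the standard simplex of R^3, the weights on the versors e^1, e^2, e^3
# representing a point are the point's own coordinates, hence unique:
x = np.array([0.2, 0.3, 0.5])
E = np.eye(3)                                  # rows are the versors
assert np.allclose(x @ E, x)                   # x = 0.2 e^1 + 0.3 e^2 + 0.5 e^3

# In the closed unit ball, the origin admits many representations as a
# midpoint of antipodal extreme points:
for u in (E[0], E[1], E[2]):
    assert np.allclose(0.5 * u + 0.5 * (-u), np.zeros(3))
```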

In sum, for a simplex the set of its extreme points forms a meaningful notion of convex basis, conceptually analogous to the notion of basis for vector subspaces and for affine sets. Since no similar analog of the notion of basis exists for general convex sets, this remarkable property is peculiar to simplices among convex sets.

16.6 Dimension

We begin with a non-trivial sharpening of the "if" part of Proposition 790 which shows that the choice of $z \in A$ is immaterial and that there is a unique vector subspace parallel to an affine set.

Proposition 805 Let A be an affine set in $\mathbb{R}^n$. There is a unique vector subspace V such that, for some $z \in A$,
$$A = V + z$$
This unique V is given by $A - A = \{x - y : x, y \in A\}$ and is such that, for all $z \in A$,
$$A = V + z$$

Proof We first prove uniqueness. Let $V_1$ and $V_2$ be two subspaces such that $V_1 + z = V_2 + z$ for some $z \in A$. Let $x_1 \in V_1$. Then, there exists $x_2 \in V_2$ such that $x_1 + z = x_2 + z$. In turn, this implies $x_1 = x_2$ and so $x_1 \in V_2$. Thus, $V_1 \subseteq V_2$. By interchanging the roles of $V_1$ and $V_2$, we also have $V_2 \subseteq V_1$. We conclude that $V_1 = V_2$.
Next we prove that $A - A$ is a vector subspace. Clearly, $0 \in A - A$. By Corollary 791, it is enough to prove that $A - A$ is an affine set. Let $x, y \in A - A$ and $\alpha \in \mathbb{R}$. There exist $x^1, x^2, y^1, y^2 \in A$ such that $x = x^1 - x^2$ and $y = y^1 - y^2$. Hence,
$$\alpha x + (1 - \alpha)y = \alpha(x^1 - x^2) + (1 - \alpha)(y^1 - y^2) = \underbrace{\alpha x^1 + (1 - \alpha)y^1}_{\in A} - \underbrace{(\alpha x^2 + (1 - \alpha)y^2)}_{\in A} \in A - A$$
We conclude that $A - A$ is an affine set, so a vector subspace.
For each $z \in A$, the set $A - z$ is a vector subspace (see the proof of Proposition 790). Since $(A - z) + z = A$, by the uniqueness previously established we have $A - z = A - z'$ for all $z, z' \in A$. So, $A - z = A - A$ for all $z \in A$. In turn, this is easily seen to imply that $A = (A - A) + z$ for all $z \in A$.

It is natural to define the dimension of a non-empty affine set A, written dim A, as the dimension of its unique parallel subspace $A - A$. When needed, the dimension of the empty set is by convention $-1$.

Example 806 (i) Consider the affine set $A = \{\alpha x + (1 - \alpha)y : \alpha \in \mathbb{R}\}$ of Example 792. It is easy to check that its parallel subspace is
$$A - A = \{\alpha(x - y) : \alpha \in \mathbb{R}\}$$
Thus, $\dim A = \dim(A - A) = 1$.
(ii) Given any three distinct vectors x, y and z of $\mathbb{R}^n$, let
$$A = \{\alpha_1 x + \alpha_2 y + (1 - \alpha_1 - \alpha_2)z : \alpha_1, \alpha_2 \in \mathbb{R}\}$$
be the plane that passes through them. It is easy to check that the set A is affine, with
$$A - A = \{\alpha_1(x - z) + \alpha_2(y - z) : \alpha_1, \alpha_2 \in \mathbb{R}\}$$
Thus, $\dim A = \dim(A - A) = 2$.8 N

8 Provided the vectors $x - z$ and $y - z$ are not collinear.
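In coordinates, the dimension of the affine hull of finitely many points is the rank of their differences. A minimal sketch, assuming the NumPy library:

```python
import numpy as np

def affine_dim(points: np.ndarray) -> int:
    # dimension of the affine hull of the rows of `points`
    return int(np.linalg.matrix_rank(points[1:] - points[0]))

# a line through two points of R^3 has dimension 1
print(affine_dim(np.array([[0., 0., 0.], [1., 1., 1.]])))                  # 1
# a plane through three non-collinear points of R^3 has dimension 2
print(affine_dim(np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])))    # 2
```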

As for vector subspaces, for affine sets too the notions of dimension and independence are closely related.

Proposition 807 An affine set A has dimension m if and only if m + 1 is the maximum number of its elements that are affinely independent.

The proof relies on the following lemma.

Lemma 808 A collection $\{x^1, \dots, x^m\}$ is an affine basis of an affine set A if and only if the collection $\{x^i - x^m\}_{i=1}^{m-1}$ is a basis of the vector subspace $A - A$.

Proof In view of Proposition 796, it is enough to prove that $\operatorname{aff}\{x^1, \dots, x^m\} = A$ if and only if $\operatorname{span}\{x^i - x^m\}_{i=1}^{m-1} = A - A$. So, assume that $\operatorname{span}\{x^i - x^m\}_{i=1}^{m-1} = A - A$. Let $x \in A$. We want to show that $x \in \operatorname{aff}\{x^1, \dots, x^m\}$. By Proposition 805, $A = (A - A) + x^m$ and so there exist scalars $\{\alpha_i\}_{i=1}^{m-1}$ such that
$$x = \sum_{i=1}^{m-1} \alpha_i (x^i - x^m) + x^m = \sum_{i=1}^{m-1} \alpha_i x^i + \left(1 - \sum_{i=1}^{m-1} \alpha_i\right)x^m$$
Thus, $x \in \operatorname{aff}\{x^1, \dots, x^m\}$. As to the converse, assume that $\operatorname{aff}\{x^1, \dots, x^m\} = A$. Let $x \in A - A$. Since $A = (A - A) + x^m$, it holds that $x + x^m \in A$ and so there exist scalars $\{\alpha_i\}_{i=1}^m$, with $\sum_{i=1}^m \alpha_i = 1$, such that $x + x^m = \sum_{i=1}^m \alpha_i x^i$. In turn, this implies that
$$x = \sum_{i=1}^m \alpha_i x^i - x^m = \sum_{i=1}^m \alpha_i x^i - \left(\sum_{i=1}^m \alpha_i\right)x^m = \sum_{i=1}^{m-1} \alpha_i (x^i - x^m)$$
Thus, $x \in \operatorname{span}\{x^i - x^m\}_{i=1}^{m-1}$.

Proof \Only if". Let A be an a ne set with dimension m. By de nition, dim (A A) = m


and so, by Lemma 808, A has a basis with m + 1 elements, so has at most m + 1 a nely
independent vectors.
\If". Assume that m + 1 is the maximum number of elements of A that are a nely
independent. So, let B = x1 ; :::; xm+1 be m + 1 a nely independent vectors of A. Then,
a B A and dim (a B a B) = (m + 1) 1 = m. Thus, dim A m. It remains to show
8
Provided the vectors x z and y z are not collinear.
562 CHAPTER 16. CONVEXITY AND AFFINITY

that dim A = m. Suppose, by contradiction, that dim A = p > m. Then, there exists a basis
of A A with p elements and so, by Lemma 808, an a ne basis of A with p + 1 > m + 1
elements, a contradiction.

Affine sets of dimension 1 are called lines and those of dimension 2 planes (cf. Example 806). Next we introduce the affine sets of dimension n − 1. To this end, define a hyperplane H in $\mathbb{R}^n$ as the set of points $x \in \mathbb{R}^n$ that satisfy the condition $a \cdot x = b$ for some $0 \ne a \in \mathbb{R}^n$ and $b \in \mathbb{R}$. That is,
$$H = \{x \in \mathbb{R}^n : a \cdot x = b\}$$
In view of Riesz's Theorem, hyperplanes are the level curves of linear functions, that is, they have the form $f^{-1}(b)$ where $f : \mathbb{R}^n \to \mathbb{R}$ is a linear function.
Hyperplanes are easily seen to be affine sets. They are actually the affine sets of dimension n − 1.

Proposition 809 An affine set in $\mathbb{R}^n$ has dimension n − 1 if and only if it is a hyperplane.

Earlier in the chapter we learned that affine sets are level curves of linear operators $T : \mathbb{R}^n \to \mathbb{R}^m$ (Proposition 793). This result shows that, in particular, those of dimension n − 1 correspond to the level sets of linear functions $f : \mathbb{R}^n \to \mathbb{R}$.
The proof relies on the following lemma.

Lemma 810 If H is a hyperplane and $x^0 \notin H$, then $\operatorname{aff}(H \cup \{x^0\}) = \mathbb{R}^n$.

Proof Let $f : \mathbb{R}^n \to \mathbb{R}$ be a linear function and $\beta$ a scalar such that $H = \{x \in \mathbb{R}^n : f(x) = \beta\}$. Let $f(x^0) = \gamma$. Clearly, $\gamma \ne \beta$. Take any point $x \in \mathbb{R}^n$ with $f(x) \ne \beta$ and, for the moment, $f(x) \ne \gamma$. Consider the vector
$$x^1 = \frac{\beta - \gamma}{f(x) - \gamma}\, x + \left(1 - \frac{\beta - \gamma}{f(x) - \gamma}\right)x^0$$
It is easy to see that $f(x^1) = \beta$. Hence, $x^1 \in H$ and consequently
$$x = \frac{f(x) - \gamma}{\beta - \gamma}\, x^1 + \left(1 - \frac{f(x) - \gamma}{\beta - \gamma}\right)x^0$$
(If $f(x) = \gamma$, pick any $h \in H$ and note that $x - x^0 + h \in H$, so that $x = (x - x^0 + h) - h + x^0$ is again an affine combination of elements of $H \cup \{x^0\}$.) All vectors outside the hyperplane $H = \{x \in \mathbb{R}^n : f(x) = \beta\}$ thus lie in $\operatorname{aff}(H \cup \{x^0\})$, i.e., $H^c \subseteq \operatorname{aff}(H \cup \{x^0\})$. Since $H \subseteq \operatorname{aff}(H \cup \{x^0\})$ (why?), we conclude that $\operatorname{aff}(H \cup \{x^0\}) = \mathbb{R}^n$.
Consequently, if $A \supseteq H$ is affine, then either $A = H$ or $A = \mathbb{R}^n$. For, if $A \supsetneq H$ then there is a point $x^0 \in A$ with $x^0 \notin H$. Hence, $\operatorname{aff}(H \cup \{x^0\}) \subseteq A$ and so $A = \mathbb{R}^n$.

Proof of Proposition 809 Let H be a hyperplane. If $A \supseteq H$ is affine, by the last lemma either $A = H$ or $A = \mathbb{R}^n$. Hence, $\dim H \ge n - 1$. Since $\dim H < n$ (why?), we conclude that $\dim H = n - 1$. We omit the proof of the converse.9

9 For a proof, we refer readers to Rockafellar (1970), p. 5.

Next we extend the scope of the notion of dimension.

Definition 811 The dimension of a convex set C of $\mathbb{R}^n$, written dim C, is the dimension of its affine envelope $\operatorname{aff} C$.

In view of Example 806, $\dim\Delta_1 = 1$ and $\dim\Delta_2 = 2$. More generally, the dimension of a simplex $P = \operatorname{co}\{x^1, x^2, \dots, x^m\}$ is m − 1 because, by the Choquet-Meyer Theorem, $\{x^1, x^2, \dots, x^m\}$ is a maximal affinely independent set in P and so, by Proposition 807, $\dim P = m - 1$.
In the important case when $\dim C = n$ we say that C has full dimension. For instance, the simplex $P = \{x \in \mathbb{R}^n_+ : \sum_{i=1}^n x_i \le 1\}$ has dimension n and so is full-dimensional (cf. Example 802).
Earlier in the book (Section 5.3.2), we observed that the straight line
$$A = \{(x_1, x_2) \in \mathbb{R}^2 : x_1 + x_2 = 1\}$$
has an empty interior, as the following figure indicates:

[Figure: the line $x_1 + x_2 = 1$ in the plane]

The set A is affine and has dimension 1, so smaller than the dimension n of the space $\mathbb{R}^n$. This simple example suggests that non-empty convex sets with empty interior have dimension < n. Next we show that, indeed, this is the case, thus proving an important topological consequence of dimension.

Proposition 812 A convex subset of $\mathbb{R}^n$ has a non-empty interior if and only if it has full dimension.
Proof \If". Let C be a convex subset in Rn . We prove the contrapositive: if int C = ;, then
dim C < n. To this end, it su ces to prove there is an a ne space A C with dim A < n.
We rst prove that there are no n + 1 a nely independent vectors x1 ; x2 ; :::; xn+1 in C.
Suppose, per contra, that there exist such vectors. Then, = co x1 ; x2 ; :::; xn+1 C. In
view of Proposition 800, consider the barycentric coordinates x $ ( 1 ; 2 ; :::; n+1 ). To the
uniform barycentric coordinates (1= (n + 1) ; :::; 1= (n + 1)) corresponds the vector
1
x= x1 + + xn+1 2 C
n+1
Consequently, there is a neighborhood U of x in Rn small enough so that all its elements
have positive barycentric coordinates. Hence, U C and so C has non-empty interior,
a contradiction.

So, there are at most n affinely independent vectors in C. Let $m < n + 1$ be the maximum number of affinely independent vectors in C and denote them by $x^1, x^2, \dots, x^m$. If x is any vector of C, the linear system
$$\alpha_1 x^1 + \alpha_2 x^2 + \dots + \alpha_m x^m + \alpha x = 0$$
$$\alpha_1 + \alpha_2 + \dots + \alpha_m + \alpha = 0$$
has then a non-trivial solution (because, m being maximal, the vectors $x^1, \dots, x^m, x$ cannot be affinely independent). Since the vectors $x^1, x^2, \dots, x^m$ are affinely independent, the scalar $\alpha$ must be non-zero. Hence, $x = -\sum_{i=1}^m (\alpha_i/\alpha)x^i$ and so x is an affine combination of the vectors $x^1, x^2, \dots, x^m$. Consequently, $C \subseteq \operatorname{aff}\{x^1, x^2, \dots, x^m\}$. In turn, this implies $\operatorname{aff} C \subseteq \operatorname{aff}\{x^1, x^2, \dots, x^m\}$, thus proving that $\dim C < n$.
"Only if". Let C be a convex subset of $\mathbb{R}^n$ with non-empty interior, i.e., $\operatorname{int} C \ne \emptyset$. By Lemma 769, int C is an open convex set. Let $y \in \operatorname{int} C$. There exists $B_\varepsilon(y) \subseteq \operatorname{int} C$. There exists $\delta \in (0, 1)$ small enough so that $y + \delta e^i \in B_\varepsilon(y)$ for each versor $e^i$, $i = 1, \dots, n$. The simplex $P = \operatorname{co} B$ generated by the affinely independent vectors $y + \delta e^1, \dots, y + \delta e^n, y$ (cf. Example 802) is full-dimensional and included in $B_\varepsilon(y)$. Hence, C is full-dimensional because a convex set is easily seen to be full-dimensional when it contains a full-dimensional convex subset.
Chapter 17

Concave functions

17.1 Generalities

A convex set can represent, for example, a collection of bundles on which a utility function is defined, or a collection of inputs on which a production function is defined. The convexity of the set allows us to combine bundles or inputs. It then becomes important to study how the functions defined on such sets, be they utility or production functions, behave with respect to these combinations.
For this reason, concave and convex functions are extremely important in economics. We have already introduced them in Section 6.4.5 for scalar functions defined on intervals of $\mathbb{R}$. The following definition holds for any function defined on a convex set C of $\mathbb{R}^n$.

Definition 813 A function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is said to be concave if
$$f(\alpha x + (1 - \alpha)y) \ge \alpha f(x) + (1 - \alpha)f(y) \tag{17.1}$$
for every $x, y \in C$ and every $\alpha \in [0, 1]$, and it is said to be convex if
$$f(\alpha x + (1 - \alpha)y) \le \alpha f(x) + (1 - \alpha)f(y) \tag{17.2}$$
for every $x, y \in C$ and every $\alpha \in [0, 1]$.

The geometric interpretation is the same as the one seen in the scalar case: a function is concave if the chord that joins any two points $(x, f(x))$ and $(y, f(y))$ of its graph lies below the graph of the function, while it is convex if the opposite happens, that is, if this chord lies above the graph of the function.


[Figure: two panels, the graph of a concave function (left) and of a convex function (right)]

Indeed, such a chord consists of the points
$$\{\alpha(x, f(x)) + (1 - \alpha)(y, f(y)) : \alpha \in [0, 1]\} = \{(\alpha x + (1 - \alpha)y,\ \alpha f(x) + (1 - \alpha)f(y)) : \alpha \in [0, 1]\}$$
So, the following figure of a concave function should clarify its geometric interpretation:

[Figure: a concave function with a chord lying below its graph]

Example 814 The absolute value function $|\cdot| : \mathbb{R} \to \mathbb{R}$ is convex since
$$|\alpha x + (1 - \alpha)y| \le |\alpha x| + |(1 - \alpha)y| = \alpha|x| + (1 - \alpha)|y|$$
for every $x, y \in \mathbb{R}$ and every $\alpha \in [0, 1]$. More generally, the norm $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ is a convex function. Indeed,
$$\|\alpha x + (1 - \alpha)y\| \le \|\alpha x\| + \|(1 - \alpha)y\| = \alpha\|x\| + (1 - \alpha)\|y\| \tag{17.3}$$
for every $x, y \in \mathbb{R}^n$ and every $\alpha \in [0, 1]$. N
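The defining inequalities are straightforward to probe numerically. A minimal sketch, assuming the NumPy library, that tests inequality (17.3) for the Euclidean norm on random pairs of points:

```python
import numpy as np

rng = np.random.default_rng(1)

for _ in range(1000):
    x, y = rng.normal(size=(2, 4))       # two random points of R^4
    a = rng.uniform()                    # a weight in [0, 1]
    lhs = np.linalg.norm(a * x + (1 - a) * y)
    rhs = a * np.linalg.norm(x) + (1 - a) * np.linalg.norm(y)
    assert lhs <= rhs + 1e-12            # convexity inequality, up to rounding
```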

Note that a function f is convex if and only if $-f$ is concave: through this simple duality, the properties of convex functions can be easily obtained from those of concave functions. Accordingly, we will consider only the properties of concave functions, leaving to the reader the simple deduction of the corresponding properties of convex functions.

N.B. The natural domains of concave (convex) functions are convex sets: this ensures that the images $f(\alpha x + (1 - \alpha)y)$ are well defined for all $\alpha \in [0, 1]$. For this reason, throughout the chapter we assume, often without mentioning it, that concave (and convex) functions are defined on a convex set C of $\mathbb{R}^n$. O

We say that a function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is strictly concave if
$$f(\alpha x + (1 - \alpha)y) > \alpha f(x) + (1 - \alpha)f(y)$$
for every $x, y \in C$, with $x \ne y$, and every $\alpha \in (0, 1)$. For this important class of concave functions, the inequality (17.1) is thus required to be strict. This implies that the graph of a strictly concave function has no linear parts. In a dual way, a function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is strictly convex if
$$f(\alpha x + (1 - \alpha)y) < \alpha f(x) + (1 - \alpha)f(y)$$
for every $x, y \in C$, with $x \ne y$, and every $\alpha \in (0, 1)$. In particular, a function f is strictly convex if and only if $-f$ is strictly concave.

We now give some examples of concave and convex functions. Verifying these properties via the definition is often not easy. For this reason we invite readers to rely on their geometric intuition for these examples and to wait for some sufficient conditions based on differential calculus, presented later in the book, that greatly simplify the verification (Chapter 31).
Example 815 (i) The square root function $f : [0, \infty) \to \mathbb{R}$ given by $f(x) = \sqrt{x}$ is strictly concave.
(ii) The logarithmic function $f : (0, \infty) \to \mathbb{R}$ given by $f(x) = \log x$ is strictly concave.
(iii) The quadratic function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^2$ is strictly convex.
(iv) The cubic function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^3$ is neither concave nor convex. However, it is strictly concave on the interval $(-\infty, 0]$ and strictly convex on the interval $[0, \infty)$.
(v) The function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} x & \text{if } x \le 1 \\ 1 & \text{if } x > 1 \end{cases}$$

is concave but not strictly concave, as its graph shows:

[Figure: the graph of f, linear up to x = 1 and constant thereafter] N

Example 816 (i) The function $f : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x) = x_1^2 + x_2^2$ is strictly convex.
(ii) The Cobb-Douglas function (Example 187) is concave, as will be seen in Corollary 880. N

Example 817 The Leontief function $f : \mathbb{R}^n \to \mathbb{R}$ defined by
$$f(x) = \min_{i=1,\dots,n} x_i$$
is concave. Indeed, given any two vectors $x, y \in \mathbb{R}^n$, we have
$$\min_{i=1,\dots,n}(x_i + y_i) \ge \min_{i=1,\dots,n} x_i + \min_{i=1,\dots,n} y_i$$
because in minimizing separately x and y we have more degrees of freedom than in minimizing them jointly, i.e., their sum. It then follows that, if $x, y \in \mathbb{R}^n$ and $\alpha \in [0, 1]$, we have
$$f(\alpha x + (1 - \alpha)y) = \min_{i=1,\dots,n}(\alpha x_i + (1 - \alpha)y_i) \ge \alpha\min_{i=1,\dots,n} x_i + (1 - \alpha)\min_{i=1,\dots,n} y_i = \alpha f(x) + (1 - \alpha)f(y)$$
It is, however, not strictly concave: for any $x \in \mathbb{R}^n$,
$$f\left(\frac{1}{2}x + \frac{1}{2}\cdot 0\right) = f\left(\frac{1}{2}x\right) = \min_{i=1,\dots,n}\frac{1}{2}x_i = \frac{1}{2}\min_{i=1,\dots,n} x_i = \frac{1}{2}f(x) = \frac{1}{2}f(x) + \frac{1}{2}f(0)$$
In consumer theory, $u(x) = \min_{i=1,\dots,n} x_i$ is the Leontief utility function (Example 233). N
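A quick numerical sanity check of the concavity inequality for the Leontief function, assuming the NumPy library:

```python
import numpy as np

rng = np.random.default_rng(2)
leontief = lambda x: np.min(x)

for _ in range(1000):
    x, y = rng.normal(size=(2, 5))
    a = rng.uniform()
    lhs = leontief(a * x + (1 - a) * y)
    rhs = a * leontief(x) + (1 - a) * leontief(y)
    assert lhs >= rhs - 1e-12            # concavity inequality, up to rounding
```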

Example 818 Given a convex function $f : (0, \infty) \to \mathbb{R}$, the function $g : \mathbb{R}^2_{++} \to \mathbb{R}$ defined by
$$g(x_1, x_2) = x_1\, f\left(\frac{x_2}{x_1}\right)$$
is convex. Indeed, let $x, y \in \mathbb{R}^2_{++}$ and $\alpha \in [0, 1]$. We have
$$g(\alpha x + (1 - \alpha)y) = (\alpha x_1 + (1 - \alpha)y_1)\, f\left(\frac{\alpha x_2 + (1 - \alpha)y_2}{\alpha x_1 + (1 - \alpha)y_1}\right)$$
$$= (\alpha x_1 + (1 - \alpha)y_1)\, f\left(\frac{\alpha x_1\frac{x_2}{x_1} + (1 - \alpha)y_1\frac{y_2}{y_1}}{\alpha x_1 + (1 - \alpha)y_1}\right)$$
$$= (\alpha x_1 + (1 - \alpha)y_1)\, f\left(\frac{\alpha x_1}{\alpha x_1 + (1 - \alpha)y_1}\cdot\frac{x_2}{x_1} + \frac{(1 - \alpha)y_1}{\alpha x_1 + (1 - \alpha)y_1}\cdot\frac{y_2}{y_1}\right)$$
$$\le (\alpha x_1 + (1 - \alpha)y_1)\left[\frac{\alpha x_1}{\alpha x_1 + (1 - \alpha)y_1}\, f\left(\frac{x_2}{x_1}\right) + \frac{(1 - \alpha)y_1}{\alpha x_1 + (1 - \alpha)y_1}\, f\left(\frac{y_2}{y_1}\right)\right]$$
$$= \alpha x_1 f\left(\frac{x_2}{x_1}\right) + (1 - \alpha)y_1 f\left(\frac{y_2}{y_1}\right) = \alpha g(x) + (1 - \alpha)g(y)$$
as desired. Note that the inequality step holds because f is convex and
$$\frac{\alpha x_1}{\alpha x_1 + (1 - \alpha)y_1} + \frac{(1 - \alpha)y_1}{\alpha x_1 + (1 - \alpha)y_1} = 1$$
If f is strictly convex, this inequality is actually strict whenever $x_2/x_1 \ne y_2/y_1$, and so the previous argument shows that in this case g is strictly convex at all pairs of non-proportional points. For instance, if we consider the strictly convex function $f(t) = -\log t$, we have
$$g(x_1, x_2) = -x_1\log\frac{x_2}{x_1} = x_1\log\frac{x_1}{x_2}$$
So, the function $g : \mathbb{R}^2_{++} \to \mathbb{R}$ defined by
$$g(x_1, x_2) = x_1\log\frac{x_1}{x_2}$$
is convex (strictly so at non-proportional pairs), a noteworthy finding. N
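The convexity of this last function can also be spot-checked numerically. A minimal sketch, assuming the NumPy library:

```python
import numpy as np

rng = np.random.default_rng(3)
g = lambda x: x[0] * np.log(x[0] / x[1])     # the perspective of f(t) = -log t

for _ in range(1000):
    x, y = rng.uniform(0.1, 5.0, size=(2, 2))    # points of R^2 with positive entries
    a = rng.uniform()
    assert g(a * x + (1 - a) * y) <= a * g(x) + (1 - a) * g(y) + 1e-10
```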

Since inequalities (17.1) and (17.2) are weak, a function may be at the same time concave and convex. The next definition covers this important case.

Definition 819 A function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is said to be affine if
$$f(\alpha x + (1 - \alpha)y) = \alpha f(x) + (1 - \alpha)f(y)$$
for every $x, y \in C$ and every $\alpha \in [0, 1]$.

In other words, a function is affine if it is at the same time concave and convex. The notion of affine function is closely related to that of linear function.

Proposition 820 A function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is affine if and only if there exist a linear function $l : \mathbb{R}^n \to \mathbb{R}$ and a scalar $q \in \mathbb{R}$ such that
$$f(x) = l(x) + q \qquad \forall x \in C \tag{17.4}$$

Affine functions are thus translations of linear functions. To fix ideas, consider the important case when $0 \in C$ (for instance, when C is the entire space $\mathbb{R}^n$). Then, the translation is given by $f(0) = q$, so f is linear if and only if $f(0) = 0$. Affinity can, therefore, be seen as a weakening of linearity that permits a non-zero "intercept" q.
By Riesz's Theorem, we can recast expression (17.4) as
$$f(x) = a \cdot x + q = \sum_{i=1}^n a_i x_i + q \tag{17.5}$$
where $a \in \mathbb{R}^n$ and $q \in \mathbb{R}$. For example, when $a = (3, 4)$ and $q = 2$ we have the affine function $f : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x) = 3x_1 + 4x_2 + 2$. In the scalar case, we get
$$f(x) = mx + q \tag{17.6}$$
with $m \in \mathbb{R}$.1 Affine functions of a single variable have, therefore, a well-known form: they are the straight lines with slope m and intercept q. In particular, this confirms that the linear functions of a single variable are the straight lines passing through the origin. Indeed, for them it holds that $f(0) = q = 0$.

Proof In view of Theorem 766, it is enough to prove the result for $C = \mathbb{R}^n$. "If". Let $x, y \in \mathbb{R}^n$ and $\alpha \in [0, 1]$. We have
$$f(\alpha x + (1 - \alpha)y) = l(\alpha x + (1 - \alpha)y) + q = \alpha l(x) + (1 - \alpha)l(y) + \alpha q + (1 - \alpha)q = \alpha(l(x) + q) + (1 - \alpha)(l(y) + q)$$
So, $f(x) = l(x) + q$ is affine.
"Only if". Let $f : \mathbb{R}^n \to \mathbb{R}$ be affine and set $l(x) = f(x) - f(0)$ for every $x \in \mathbb{R}^n$. Setting $q = f(0)$, we have to show that l is linear. We start by showing that
$$l(\lambda x) = \lambda l(x) \qquad \forall x \in \mathbb{R}^n,\ \forall \lambda \in \mathbb{R} \tag{17.7}$$
For every $\lambda \in [0, 1]$ we have
$$l(\lambda x) = f(\lambda x) - f(0) = f(\lambda x + (1 - \lambda)0) - f(0) = \lambda f(x) + (1 - \lambda)f(0) - f(0) = \lambda f(x) - \lambda f(0) = \lambda l(x)$$
Let now $\lambda > 1$. Setting $y = \lambda x$, by what has just been proved we have
$$l(x) = l\left(\frac{y}{\lambda}\right) = \frac{1}{\lambda}\, l(y)$$
and so $l(\lambda x) = \lambda l(x)$. On the other hand,
$$0 = l(0) = l\left(\frac{1}{2}x - \frac{1}{2}x\right) = f\left(\frac{1}{2}x + \frac{1}{2}(-x)\right) - f(0) = \frac{1}{2}f(x) + \frac{1}{2}f(-x) - f(0) = \frac{1}{2}l(x) + \frac{1}{2}l(-x)$$
so that $l(-x) = -l(x)$. Hence, if $\lambda < 0$ then
$$l(\lambda x) = l((-\lambda)(-x)) = (-\lambda)\, l(-x) = (-\lambda)(-l(x)) = \lambda l(x)$$
All this proves that (17.7) holds. In view of Proposition 644, to complete the proof of the linearity of l we have to show that
$$l(x + y) = l(x) + l(y) \qquad \forall x, y \in \mathbb{R}^n \tag{17.8}$$
We have
$$l(x + y) = 2\, l\left(\frac{x + y}{2}\right) = 2\, l\left(\frac{x}{2} + \frac{y}{2}\right) = 2\left[f\left(\frac{x}{2} + \frac{y}{2}\right) - f(0)\right] = 2\left[\frac{1}{2}f(x) + \frac{1}{2}f(y) - f(0)\right] = l(x) + l(y)$$
as desired.

1 We use in the scalar case the more common letter m in place of a.

The definition of concavity requires the entire chord joining any two points of the graph to lie below the graph of the function. Remarkably, for continuous functions it is enough that just a single point of each such chord lies below the graph of the function. Under continuity, concavity thus takes a much less demanding appearance. This is proved in the next theorem (cf. Hardy et al., 1934, p. 73).

Theorem 821 (Jessen-Riesz) A continuous function $f : C \to \mathbb{R}$ is concave if and only if, for each $x, y \in C$, there exists $\lambda_{x,y} \in (0, 1)$ such that
$$f(\lambda_{x,y} x + (1 - \lambda_{x,y})y) \ge \lambda_{x,y} f(x) + (1 - \lambda_{x,y})f(y)$$

The inequality ensures that the point
$$(\lambda_{x,y} x + (1 - \lambda_{x,y})y,\ \lambda_{x,y} f(x) + (1 - \lambda_{x,y})f(y)) \tag{17.9}$$
of the chord joining the points $(x, f(x))$ and $(y, f(y))$ of the graph of f lies below the graph of f. The weight $\lambda_{x,y}$ is allowed to depend on x and y, so it can vary as we consider different pairs of points x and y.2

2 The weight is required to belong to (0, 1) to ensure that the point (17.9) is distinct from the endpoints $(x, f(x))$ and $(y, f(y))$ of the chord.

Proof We prove the \if", the non-trivial direction. Suppose, by contradiction, that f is not
concave. So, there exist x; y 2 C and 2 (0; 1) such that

f ( x + (1 ) y) < f (x) + (1 ) f (y) (17.10)


2
The weight is required to belong to (0; 1) to ensure that point (17.9) is distinct from the endpoints
(x; f (x)) and (y; f (y)) of the chord.
572 CHAPTER 17. CONCAVE FUNCTIONS

De ne ' : [0; 1] ! R by ' ( ) = f ( x + (1 ) y). This function is easily seen to be


continuous, with ' ( ) < ' (1) + (1 ) ' (0). Fix 1 ; 2 2 [0; 1] and set zi = i x +
(1 i ) y 2 C for i = 1; 2. There exists z1 ;z2 2 (0; 1) such that

f( z1 ;z2 z1 + (1 z1 ;z2 ) z2 ) z1 ;z2 f (z1 ) + (1 z1 ;z2 ) f (z2 ) (17.11)

To ease notation, set = z1 ;z2 . It holds

'( 1 + (1 ) 2) = f (( 1 + (1 ) 2) x + (1 ( 1 + (1 ) 2 )) y)
= f (( 1 + (1 ) 2) x + ( + (1 ) ( 1 + (1 ) 2 )) y)
= f (( 1 + (1 ) 2) x + ( (1 1) + (1 ) (1 2 )) y)
= f ( z1 + (1 ) z2 ) f (z1 ) + (1 ) f (z2 ) = ' ( 1) + (1 )'( 2

Since 1 and 2 were arbitrarily chosen, we conclude that for each 1; 2 2 [0; 1] there is
1; 2 2 (0; 1) such that

'( 1; 2 1 + (1 1; 2 ) 2) 1; 2 '( 1) + (1 1; 2 )'( 2) (17.12)

De ne : [0; 1] ! R by

( ) = '( ) (' (1) ' (0)) ' (0)

Clearly, (0) = (1) = 0 and

( ) = ' ( ) (' (1) ' (0)) ' (0) < ' (1)+(1 ) ' (0) (' (1) ' (0)) ' (0) = 0

Moreover, it is easy to see that (17.12) continues to hold, i.e., for each 1; 2 2 [0; 1] there
exists 1 ; 2 2 (0; 1) such that

( 1; 2 1 + (1 1; 2 ) 2) 1; 2 ( 1) + (1 1; 2 ) ( 2)

Consider the sets

A=f : ( ) = 0g = [0; ]\( = 0) and B = f : ( ) = 0g = [ ; 1]\( = 0)

These sets are compact because is continuous and are non-empty because 0 2 A and 1 2 B.
Thus, there exist
a = max A and b = min B
with 0 a< <b 1. Since (a) = (b) = 0, we have

( a + (1 ) b) < 0 = (a) + (1 ) (b)


for all 2 (0; 1). On the other hand, there exists a;b 2 (0; 1) such that

( a;b a + (1 a;b ) b) a;b (a) + (1 a;b ) (b)

a contradiction. We conclude that f is concave.

A special case of this theorem, involving the chord midpoint $((x + y)/2, (f(x) + f(y))/2)$, is noteworthy. Here the weight $\lambda_{x,y}$ is kept fixed and equal to 1/2.

Corollary 822 A continuous function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is concave if and only if it is midpoint concave, that is,
$$f\left(\frac{1}{2}x + \frac{1}{2}y\right) \ge \frac{1}{2}f(x) + \frac{1}{2}f(y)$$
for all $x, y \in C$.

An immediate, but important, consequence of the Jessen-Riesz Theorem is a characterization of affinity for continuous functions.

Corollary 823 A continuous function $f : C \to \mathbb{R}$ is affine if and only if, for each $x, y \in C$, there exists $\lambda_{x,y} \in (0, 1)$ such that
$$f(\lambda_{x,y} x + (1 - \lambda_{x,y})y) = \lambda_{x,y} f(x) + (1 - \lambda_{x,y})f(y)$$

In words, a continuous function is affine if and only if each chord joining two points of its graph contains at least one point that lies exactly on the graph of the function, i.e., that touches it.

17.2 Properties

17.2.1 Concave functions and convex sets

There exists a simple characterization of concave functions $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ that uses convex sets. Namely, consider the set
$$\operatorname{hypo} f = \{(x, y) \in C \times \mathbb{R} : f(x) \ge y\} \subseteq \mathbb{R}^{n+1} \tag{17.13}$$
called the hypograph of f, formed by the points $(x, y) \in \mathbb{R}^{n+1}$ that lie below the graph of the function.3 Graphically, the hypograph of a function is:

[Figure: the hypograph of a function, the region below its graph]

3 Recall that the graph is given by $\operatorname{Gr} f = \{(x, y) \in C \times \mathbb{R} : f(x) = y\} \subseteq \mathbb{R}^{n+1}$.

The next result shows that the concavity of f is equivalent to the convexity of its hypograph.

Proposition 824 A function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is concave if and only if its hypograph is a convex set in $\mathbb{R}^{n+1}$.

Proof Let f be concave, and let $(x, t), (y, z) \in \operatorname{hypo} f$. By definition, $t \le f(x)$ and $z \le f(y)$. It follows that
$$\alpha t + (1 - \alpha)z \le \alpha f(x) + (1 - \alpha)f(y) \le f(\alpha x + (1 - \alpha)y)$$
for every $\alpha \in [0, 1]$. Therefore, $(\alpha x + (1 - \alpha)y, \alpha t + (1 - \alpha)z) \in \operatorname{hypo} f$, which proves that hypo f is convex.
For the converse, suppose that hypo f is convex. Since $(x, f(x)), (y, f(y)) \in \operatorname{hypo} f$, for every $x, y \in C$ and $\alpha \in [0, 1]$ we have
$$(\alpha x + (1 - \alpha)y,\ \alpha f(x) + (1 - \alpha)f(y)) \in \operatorname{hypo} f$$
that is,
$$\alpha f(x) + (1 - \alpha)f(y) \le f(\alpha x + (1 - \alpha)y)$$
as desired.

It is easy to check that the dual result for a convex function f features the convexity of its epigraph $\operatorname{epi} f = \{(x, y) \in C \times \mathbb{R} : f(x) \le y\}$, i.e., the collection of points $(x, y) \in \mathbb{R}^{n+1}$ that lie above its graph.
Earlier in the book (Section 6.3.1) we defined the level curves of a function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ as the preimages
$$f^{-1}(k) = \{x \in C : f(x) = k\}$$
for $k \in \mathbb{R}$. In a similar way, the sets
$$\{x \in C : f(x) \ge k\}$$
are called upper contour (or superlevel) sets, denoted by $(f \ge k)$, while the sets
$$\{x \in C : f(x) \le k\}$$
are called lower contour (or sublevel) sets, denoted by $(f \le k)$. Clearly,
$$f^{-1}(k) = (f \ge k) \cap (f \le k) \tag{17.14}$$
and so sometimes we use the notation $(f = k)$ in place of $f^{-1}(k)$.
The next figure describes the upper contour sets of a function:

[Figure: the upper contour set $(f \ge k)$ of a function, the region where its graph lies above the level $y = k$]

Example 825 Consider the quadratic function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^2$. It holds, for each $t \ge 0$,
$$(f \ge t) = (-\infty, -\sqrt{t}\,] \cup [\sqrt{t}, \infty) \qquad\text{and}\qquad (f \le t) = [-\sqrt{t}, \sqrt{t}\,]$$
and, for each $t < 0$,
$$(f \ge t) = \mathbb{R} \qquad\text{and}\qquad (f \le t) = \emptyset$$
For instance,
$$(f \ge 2) = (-\infty, -\sqrt{2}\,] \cup [\sqrt{2}, \infty)\ ,\quad (f \le 1) = [-1, 1]\ ,\quad (f \ge -1) = \mathbb{R}\ ,\quad (f \le -1) = \emptyset$$
N

In economics, upper contour sets appear already in the first lectures of a course in microeconomics principles. Indeed, for a utility function $u : \mathbb{R}^n_+ \to \mathbb{R}$, the upper contour set
$$(u \ge k) = \{x \in \mathbb{R}^n_+ : u(x) \ge k\}$$
consists of all the bundles x whose utility u(x) is at least k. When n = 2, graphically $(u \ge k)$ is the region of the plane lying above the indifference curve $u^{-1}(k)$. Usually in microeconomics such regions are assumed to be convex. Indeed, it is the convexity of $(u \ge k)$ that one has in mind when talking, improperly, of convex indifference curves.4 As the next result shows, this convexity holds when the utility function u is concave.

4 This notion will be made rigorous later in the book (cf. Section 34.3).

Proposition 826 If $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is concave, then all its upper contour sets $(f \ge k)$ are convex.

Proof Given $k \in \mathbb{R}$, let $(f \ge k)$ be non-empty (otherwise, the result is obvious because empty sets are trivially convex). Let $x, y \in (f \ge k)$ and $\alpha \in [0, 1]$. By the concavity of f,
$$f(\alpha x + (1 - \alpha)y) \ge \alpha f(x) + (1 - \alpha)f(y) \ge \alpha k + (1 - \alpha)k = k$$
and therefore $\alpha x + (1 - \alpha)y \in (f \ge k)$.

We have thus shown that the usual form of the indifference curves is implied by the concavity of the utility function. That is, more rigorously, we have shown that concave functions have convex upper contour sets. The converse is not true! Take for example any strictly increasing function $f : \mathbb{R} \to \mathbb{R}$: we have
$$(f \ge k) = [f^{-1}(k), +\infty)$$
for all $k \in \mathbb{R}$.5 All the upper contour sets are therefore intervals, so convex sets, although in general strictly increasing functions might well fail to be concave.

5 To fix ideas, think of the cubic function $f(x) = x^3$, for which we have $(f \ge c) = [c^{1/3}, +\infty)$ for every $c \in \mathbb{R}$.

Example 827 The cubic function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^3$ is strictly increasing but not concave. For each $k \in \mathbb{R}$, we have $(f \ge k) = [k^{1/3}, +\infty)$ and so all the upper contour sets of the cubic function are intervals. N

In sum, concavity is a sufficient, but not necessary, condition for the convexity of the upper contour sets: there exist non-concave functions, like the cubic function, with convex upper contour sets. In particular, this means that the concavity of utility functions is a sufficient, but not necessary, condition for the "convexity" of the indifference curves. At this point it is natural to ask which class of utility functions, larger than that of the concave ones, is characterized by having "convex" indifference curves; more formally, which class of functions is characterized by having convex upper contour sets. Section 17.3 will answer this question by introducing quasi-concavity.

The dual version of the last result holds for convex functions, for which the lower contour sets $(f \le k)$ are convex. If f is affine, it then follows by (17.14) that the level sets $(f = k)$ are convex, being the intersection of convex sets. But much more can be said about the level sets of affine functions besides their convexity.

Proposition 828 Let A be an affine subset of $\mathbb{R}^n$. If $f : A \to \mathbb{R}$ is affine, then all its level sets $(f = k)$ are affine.

The proof is an immediate consequence of the following lemma.

Lemma 829 Let A be an affine subset of $\mathbb{R}^n$. A function $f : A \to \mathbb{R}$ is affine if and only if
$$f(\alpha x + (1 - \alpha)y) = \alpha f(x) + (1 - \alpha)f(y)$$
for every $x, y \in A$ and every $\alpha \in \mathbb{R}$.

Remarkably, $\alpha$ here is any scalar: it is not required to lie in [0, 1].

Proof Consider the \only if", the converse being trivial. If f is a ne, it can be written as
f (x) = l (x) + q for every x 2 Rn (Proposition 820). This implies that, for all 2 R and all
x; y 2 Rn ,

f ( x + (1 ) y) = l ( x + (1 ) y) + q = l (x) + (1 ) l (y) + q
= l (x) + (1 ) l (y) + q + (1 ) q = f (x) + (1 ) f (y)

as desired.

17.2.2 Jensen's inequality and continuity

Although concavity is defined via convex combinations involving only two elements, next we show that it actually holds for all convex combinations.

Proposition 830 (Jensen's inequality) A function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is concave if and only if
$$f\left(\sum_{i=1}^n \alpha_i x^i\right) \ge \sum_{i=1}^n \alpha_i f(x^i) \tag{17.15}$$
for every finite collection $\{x^i\}_{i=1}^n$ of vectors of C and every collection $\{\alpha_i\}_{i=1}^n$ of weights.

The inequality (17.15) is known as Jensen's inequality and is very important in applications.6 A dual version, with $\le$, holds for convex functions, while for affine functions we have a Jensen equality
$$f\left(\sum_{i=1}^n \alpha_i x^i\right) = \sum_{i=1}^n \alpha_i f(x^i)$$
So, affine functions preserve all convex combinations, be they with two or more elements.

6 The inequality is named after Johan Jensen, who introduced concave functions in 1906.

Proof \If". It is obvious because for n = 2 the Jensen inequality reduces to the de nition
of concavity.
\Only if". Let f be concave. We want to show that the Jensen inequality holds. We
proceed by induction on n. Initial step: the inequality (17.15) trivially holds for n = 2
because f is concave. Induction step: suppose that the Jensen inequality holds for n 1
Pn 1 Pn 1
(induction hypothesis), i.e., f i=1 i xi i=1 i f (xi ) for every convex combination
of n 1 elements of C. We want to show that it holds for convex combinations of n elements

6
The inequality is named after Johan Jensen, who introduced concave functions in 1906.
578 CHAPTER 17. CONCAVE FUNCTIONS

of C. If n = 1, inequality (17.15) holds trivially. Let therefore n < 1. We have:


n
! n
! !
X X1 1 n
n
X 1
f i xi =f i xi + n xn =f i xi + n xn
1 n
i=1 i=1 i=1
n
!
X1 i
= f (1 n) xi + n xn
1 n
i=1
n
!
X1 i
(1 n) f xi + n f (xn )
1 n i=1
n
X1 i
(1 n) f (xi ) + nf (xn )
1 n
i=1
n
X1 n
X
= if (xi ) + nf (xn ) = if (xi )
i=1 i=1

as desired. Here the rst inequality follows from the concavity of f and the second one from
the induction hypothesis.

It is easy to see that for strictly concave and strictly convex functions strict versions of Jensen's inequality hold. This and much more is illustrated in the next example.

Example 831 (i) The function $g : \mathbb{R}^2_{++} \to \mathbb{R}$ defined by
$$g(x_1, x_2) = x_1\log\frac{x_1}{x_2} \tag{17.16}$$
is convex, strictly so at non-proportional pairs, as we learned in Example 818. By the dual version of Jensen's inequality, for vectors whose ratios $x_{i1}/x_{i2}$ are not all equal we have
$$\sum_{i=1}^n \alpha_i x_{i1}\log\frac{x_{i1}}{x_{i2}} = \sum_{i=1}^n \alpha_i\, g(x_{i1}, x_{i2}) > g\left(\sum_{i=1}^n \alpha_i x_{i1}, \sum_{i=1}^n \alpha_i x_{i2}\right) = \left(\sum_{i=1}^n \alpha_i x_{i1}\right)\log\frac{\sum_{i=1}^n \alpha_i x_{i1}}{\sum_{i=1}^n \alpha_i x_{i2}}$$
By taking equal weights $\alpha_i = 1/n$ we get
$$\frac{1}{n}\sum_{i=1}^n x_{i1}\log\frac{x_{i1}}{x_{i2}} > \left(\frac{1}{n}\sum_{i=1}^n x_{i1}\right)\log\frac{\frac{1}{n}\sum_{i=1}^n x_{i1}}{\frac{1}{n}\sum_{i=1}^n x_{i2}}$$
that is,
$$\sum_{i=1}^n x_{i1}\log\frac{x_{i1}}{x_{i2}} > \left(\sum_{i=1}^n x_{i1}\right)\log\frac{\sum_{i=1}^n x_{i1}}{\sum_{i=1}^n x_{i2}}$$
This remarkable inequality is called the log-sum inequality and plays a key role in some important applications; in its weak form, with $\ge$, it holds for all positive vectors.
(ii) Given a convex function $f : (0, \infty) \to \mathbb{R}$, the log-sum inequality takes the general form
$$\sum_{i=1}^n x_{i1}\, f\left(\frac{x_{i2}}{x_{i1}}\right) \ge \left(\sum_{i=1}^n x_{i1}\right) f\left(\frac{\sum_{i=1}^n x_{i2}}{\sum_{i=1}^n x_{i1}}\right)$$
by applying Jensen's inequality to the convex function $g : \mathbb{R}^2_{++} \to \mathbb{R}$ defined by
$$g(x_1, x_2) = x_1\, f\left(\frac{x_2}{x_1}\right) \tag{17.17}$$
as the reader can check (recall Example 818). The inequality becomes strict when f is strictly convex and the ratios $x_{i2}/x_{i1}$ are not all equal. The function g is called a perspective function. Note that here we have the ratios $x_{i2}/x_{i1}$ while in the log case we have their reciprocals: indeed, (17.16) is the special case of (17.17) that corresponds to $f(t) = -\log t$. N
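A direct numerical check of the log-sum inequality, assuming the NumPy library:

```python
import numpy as np

rng = np.random.default_rng(4)

for _ in range(1000):
    a, b = rng.uniform(0.1, 5.0, size=(2, 6))    # positive vectors (a_i, b_i) > 0
    lhs = np.sum(a * np.log(a / b))
    rhs = np.sum(a) * np.log(np.sum(a) / np.sum(b))
    assert lhs >= rhs - 1e-10                    # log-sum inequality
```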

Concavity is preserved by addition as well as by "positive" scalar multiplication (the proof is left to the reader):

Proposition 832 Let $f, g : C \subseteq \mathbb{R}^n \to \mathbb{R}$ be two concave functions. The function $f + g$ is concave, while $\alpha f$ is concave if $\alpha \ge 0$.

Concave functions are very well behaved; in particular, they have remarkable continuity properties.

Theorem 833 A concave function is continuous at every interior point of its domain.

Geometrically, it should be easy to see that the presence of a discontinuity at an interior point of the domain forces some chord to cut the graph of the function, thereby preventing it from being concave (or convex). If the discontinuity is on the boundary, this does not necessarily happen.

Example 834 Let $f : [0, 1] \to \mathbb{R}$ be defined by:
$$f(x) = \begin{cases} 2 - x^2 & \text{if } x \in (0, 1) \\ 0 & \text{if } x \in \{0, 1\} \end{cases}$$
Then f is concave on the entire domain [0, 1] and is discontinuous at 0 and 1, i.e., at the boundary points of the domain. In accordance with the last theorem, f is continuous on (0, 1), the interior of its domain [0, 1]. N

Proof of Theorem 833 We prove the result for scalar functions. Let f be a concave function defined on an interval C of the real line. We will show that f is continuous on every closed interval [a, b] included in the interior of C: this will imply the continuity of f on the interior of C.
So, let $[a, b] \subseteq \operatorname{int} C$. Let m be the smaller of the two values f(a) and f(b); for every $x = \alpha a + (1 - \alpha)b$, with $0 \le \alpha \le 1$, that is, for every $x \in [a, b]$, one has
$$f(x) \ge \alpha f(a) + (1 - \alpha)f(b) \ge \alpha m + (1 - \alpha)m = m$$
Therefore, f is bounded below by m on [a, b]. For every
$$-\frac{b - a}{2} \le t \le \frac{b - a}{2}$$
one has, by the concavity of f, that
$$f\left(\frac{a + b}{2}\right) \ge \frac{1}{2}f\left(\frac{a + b}{2} + t\right) + \frac{1}{2}f\left(\frac{a + b}{2} - t\right)$$
That is,
$$f\left(\frac{a + b}{2} + t\right) \le 2f\left(\frac{a + b}{2}\right) - f\left(\frac{a + b}{2} - t\right)$$
Moreover, since
$$\frac{a + b}{2} - t \in [a, b] \qquad \forall t \in \left[-\frac{b - a}{2}, \frac{b - a}{2}\right]$$
we have
$$f\left(\frac{a + b}{2} - t\right) \ge m$$
whence
$$f\left(\frac{a + b}{2} + t\right) \le 2f\left(\frac{a + b}{2}\right) - m$$
By setting
$$M = 2f\left(\frac{a + b}{2}\right) - m$$
and by observing that
$$[a, b] = \left\{\frac{a + b}{2} + t : t \in \left[-\frac{b - a}{2}, \frac{b - a}{2}\right]\right\}$$
we conclude that f is also bounded above by M on [a, b]. Thus, the function f is bounded on [a, b].
Now consider the interval $[a - \varepsilon, b + \varepsilon]$, with $\varepsilon > 0$ small enough that it is still contained in the interior of C, so that f is bounded also on it (by what we have just proved). Let $m_\varepsilon$ and $M_\varepsilon$ be the infimum and the supremum of f on $[a - \varepsilon, b + \varepsilon]$. If $m_\varepsilon = M_\varepsilon$, the function is constant and, even more so, continuous. Let then $m_\varepsilon < M_\varepsilon$. Take two points $x \ne y$ in [a, b] and set
$$z = y - \varepsilon\,\frac{x - y}{|x - y|}\ ,\qquad \alpha = \frac{|x - y|}{\varepsilon + |x - y|}$$
We see immediately that $z \in [a - \varepsilon, b + \varepsilon]$ and that $y = \alpha z + (1 - \alpha)x$. Therefore,
$$f(y) \ge \alpha f(z) + (1 - \alpha)f(x) = f(x) + \alpha[f(z) - f(x)]$$
that is,
$$f(x) - f(y) \le \alpha[f(x) - f(z)] \le \alpha(M_\varepsilon - m_\varepsilon) = \frac{|x - y|}{\varepsilon + |x - y|}(M_\varepsilon - m_\varepsilon) < \frac{M_\varepsilon - m_\varepsilon}{\varepsilon}\,|x - y|$$
By interchanging the roles of x and y, we conclude that
$$|f(x) - f(y)| \le k\,|x - y|$$
where $k = (M_\varepsilon - m_\varepsilon)/\varepsilon$. Now, if $y \to x$, that is, $|x - y| \to 0$, then $f(y) \to f(x)$. This proves the continuity of f at x. Since x is arbitrary, the statement follows.

We have the following immediate, yet important, corollary.



Corollary 835 A concave function defined on an open convex set is continuous.

For instance, concave functions $f : \mathbb{R}^n \to \mathbb{R}$ defined on the entire space $\mathbb{R}^n$ are continuous. This is the case, for instance, for the norm $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$, whose continuity, proved in Proposition 560, now also follows from its convexity (Example 814).
If we strengthen the hypothesis on f we can weaken that on its domain, as the next interesting result shows.

Proposition 836 An affine function defined on a convex set is continuous.

Proof Let $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ be affine on the convex set C. By Proposition 820, we have $f = l + q$ on C. By the last result, l is continuous, so f as well is continuous.

17.3 Quasi-concave functions

17.3.1 Definition and basic notions

In the previous section we posed a question, motivated by some simple observations from utility theory, that we can reformulate as follows: given that concavity is only a sufficient condition for the convexity of the upper contour sets, which weakening of the notion of concavity permits us to identify the functions featuring convex upper contour sets? In the language of utility theory, what is the characterization of utility functions with "convex" indifference curves?
The answer to these questions is given by the following class of functions defined on convex sets C of $\mathbb{R}^n$.

Definition 837 A function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is said to be quasi-concave if
$$f(\alpha x + (1 - \alpha)y) \ge \min\{f(x), f(y)\} \tag{17.18}$$
for every $x, y \in C$ and every $\alpha \in [0, 1]$, and it is said to be quasi-convex if
$$f(\alpha x + (1 - \alpha)y) \le \max\{f(x), f(y)\} \tag{17.19}$$
for every $x, y \in C$ and every $\alpha \in [0, 1]$.

When the inequality in (17.18) is strict, with $\alpha \in (0, 1)$ and $x \ne y$, the function f is said to be strictly quasi-concave. Similarly, when the inequality in (17.19) is strict, with $\alpha \in (0, 1)$ and $x \ne y$, the function f is said to be strictly quasi-convex.
Finally, a function f is said to be quasi-affine if it is both quasi-concave and quasi-convex, that is,
$$\min\{f(x), f(y)\} \le f(\alpha x + (1 - \alpha)y) \le \max\{f(x), f(y)\} \tag{17.20}$$
for every $x, y \in C$ and every $\alpha \in [0, 1]$.

Concave functions are quasi-concave because
$$f(\alpha x + (1 - \alpha)y) \ge \alpha f(x) + (1 - \alpha)f(y) \ge \min\{f(x), f(y)\}$$
while convex functions are quasi-convex. In particular, affine functions are quasi-affine. The converses of these implications are false, as the following example shows.

Example 838 Monotone scalar functions (e.g., the cubic) are quasi-affine. Indeed, let $f : C \subseteq \mathbb{R} \to \mathbb{R}$ be increasing on the interval C and let $x, y \in C$ and $\alpha \in [0, 1]$, with $x \le y$. Then, $x \le \alpha x + (1 - \alpha)y \le y$ and increasing monotonicity implies $f(x) \le f(\alpha x + (1 - \alpha)y) \le f(y)$, that is, (17.20) holds. A similar argument applies when f is decreasing. The following figure illustrates:

[Figure: an increasing scalar function with a level $y = k$ marked]

This example shows that, unlike concave functions, quasi-concave functions may be quite irregular. For instance, they might well be discontinuous at interior points of their domain (just take any discontinuous monotone scalar function). N

Strictly concave functions are strictly quasi-concave:
$$f(\alpha x + (1 - \alpha)y) > \alpha f(x) + (1 - \alpha)f(y) \ge \min\{f(x), f(y)\}$$
while strictly convex functions are strictly quasi-convex. The converses of these implications are false. In particular, note that a quasi-concave function can be strictly convex: for example, the exponential $f(x) = e^x$. The terminology must, therefore, be taken cum grano salis.

The next important result justifies the study of quasi-concave functions.

Proposition 839 A function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is quasi-concave if and only if all its upper contour sets $(f \ge k)$ are convex.

A dual version of this result holds for quasi-convex functions with lower contour sets in
place of the upper contour ones.

Proof Let f be quasi-concave. Given a non-empty (otherwise the result is trivial) upper contour set $(f \ge k)$, let $x, y \in (f \ge k)$ and $\alpha \in [0, 1]$. We have
$$f(\alpha x + (1 - \alpha)y) \ge \min\{f(x), f(y)\} \ge k$$
and so $\alpha x + (1 - \alpha)y \in (f \ge k)$. The set $(f \ge k)$ is therefore convex.
Vice versa, suppose that all the upper contour sets $(f \ge k)$ are convex. Let $x, y \in C$ and $\alpha \in [0, 1]$. Without loss of generality, suppose $f(x) \ge f(y)$. Setting $k = f(y)$, we have $\alpha x + (1 - \alpha)y \in (f \ge k)$, and therefore
$$f(\alpha x + (1 - \alpha)y) \ge k = \min\{f(x), f(y)\}$$
This proves the quasi-concavity of f.

Quasi-concave functions are thus characterized by the convexity of their upper contour sets. So, quasi-concavity is the weakening of the notion of concavity that answers the opening question.
It is a weakening that, however, also brings some bad news. We already remarked in the last example that quasi-concave functions are, in general, much less regular than concave functions. In a similar vein, additivity preserves concavity (Proposition 832) but not quasi-concavity, as we show next.

Example 840 Let $f, g : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x^3$ and $g(x) = -x^2$. These two scalar functions are quasi-concave: the first is monotone, the second concave. Define $h : \mathbb{R} \to \mathbb{R}$ by $h = f + g$, that is, $h(x) = x^3 - x^2$. If we take the points $x = 0$ and $y = 1$, we have
$$h\left(\frac{1}{2}x + \frac{1}{2}y\right) = h\left(\frac{1}{2}\right) = -\frac{1}{8} < 0 = \min\{h(x), h(y)\}$$
So, h is not quasi-concave. N

Interestingly, quasi-concavity can be equivalently characterized by the convexity of the strict upper contour sets.

Proposition 841 A function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is quasi-concave if and only if all its strict upper contour sets $(f > k)$ are convex.

Proof The \only if" is similar


\ to that of Proposition 839 and left to the reader. As to the
\if", observe that (f k) = (f > k 1=n) for all k 2 R, so all sets (f k) are convex if
n 1
all sets (f > k) are convex. By Proposition 839, f is quasi-concave.

Interestingly, in the scalar continuous case we can fully characterize quasi-concave func-
tions.

Proposition 842 If $f : [a, b] \to \mathbb{R}$ is continuous and quasi-concave, then there exists $\bar x \in [a, b]$ such that f is increasing on $[a, \bar x]$ and decreasing on $[\bar x, b]$.

If $\bar x \in \{a, b\}$, then f is monotone: decreasing if $\bar x = a$ and increasing if $\bar x = b$ (cf. Example 838). If $\bar x \in (a, b)$, then f first increases and then decreases. These are the only patterns that a continuous and quasi-concave function on the real line may feature.

Proof Let $\bar x = \min(\arg\max_{[a,b]} f)$, i.e., $\bar x$ is the smallest maximizer of f. By the Weierstrass Theorem, $\arg\max_{[a,b]} f$ is compact and non-empty, so $\bar x$ is well defined. We divide the proof in three steps.

Step 1: $\bar x = a$. Let $x, y \in [a, b]$ be such that $x \ge y$. Since f is quasi-concave and $f(\bar x) \ge f(x)$, it follows that $(f \ge f(x))$ is a convex set and $\bar x$ belongs to it. Since $\bar x \le y \le x$, this implies that $y \in [a, x] \subseteq (f \ge f(x))$. Thus, $f(y) \ge f(x)$. Since x and y were arbitrarily chosen, it follows that f is decreasing.

Step 2: $\bar x = b$. Let $x, y \in [a, b]$ be such that $x \ge y$. Since f is quasi-concave and $f(\bar x) \ge f(y)$, it follows that $(f \ge f(y))$ is a convex set and $\bar x$ belongs to it. Since $y \le x \le \bar x$, it follows that $x \in [y, b] \subseteq (f \ge f(y))$. Thus, $f(x) \ge f(y)$. Since x and y were arbitrarily chosen, it follows that f is increasing.

Step 3: $\bar x \in (a, b)$. Define $\hat I = [a, \bar x]$ and $\tilde I = [\bar x, b]$. Denote f restricted to $\hat I$ by $\hat f$ and f restricted to $\tilde I$ by $\tilde f$. Both restrictions are continuous and quasi-concave. In particular, $\bar x$ continues to be the smallest maximizer of both of them, i.e., $\min(\arg\max_{\hat I}\hat f) = \bar x = \min(\arg\max_{\tilde I}\tilde f)$. In view of Steps 1 and 2, we conclude that $\hat f$ is increasing and $\tilde f$ is decreasing. This proves the statement.

We close with a quasi-concave version of the all-important Jensen inequality.

Proposition 843 A function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is quasi-concave if and only if
$$f\left(\sum_{i=1}^n \alpha_i x^i\right) \ge \min_{i=1,\dots,n} f(x^i) \tag{17.21}$$
for every finite collection $\{x^i\}_{i=1}^n$ of vectors of C and every collection $\{\alpha_i\}_{i=1}^n$ of weights.

The simple induction proof and the dual version for quasi-convex functions are left to the reader.

17.3.2 Convexity of indifference curves

In utility theory, quasi-concave utility functions are precisely those featuring "convex" indifference curves, the usual form of indifference curves. This makes quasi-concave utility functions the most important class of utility functions.
To elaborate, observe that a quasi-a ne function f has convex level curves (f = k)
because (f = k) = (f k) \ (f k). The converse is, however, false: injective functions
have level curves that are singletons, so convex, but they might be not quasi-a ne. For
instance, take the function f : R ! R given by
( 1
x if x 6= 0
f (x) =
0 otherwise
Since f is injective, its level curves are singletons. In particular,
8
< f0g if y = 0
f 1 (y) = n o
: 1 if y =
6 0
y

So, the level curves are, trivially, convex. Yet, f is neither quasi-concave nor quasi-convex, a fortiori not quasi-affine.

In utility theory this observation shows that a sufficient, but not necessary, condition for a utility function u to have convex (in a proper sense!) indifference curves is to be quasi-affine. Recall that previously we talked about convexity in an improper sense (hence within quotes, "convex") of the indifference curves, meaning by this the convexity of the upper contour sets (u ≥ k). Although improper, this is a common terminology. In a proper sense, the convexity of the indifference curves is the convexity of the level curves (u = k). Thanks to Proposition 839, the improper convexity of the indifference curves characterizes quasi-concave utility functions, while their proper convexity is satisfied by quasi-affine utility functions (without being, however, a characterizing property of quasi-affinity).

17.3.3 Transformations, cardinality and ordinality

Quasi-concavity is preserved by monotone transformations, unlike concavity. To shed light on this key difference, it is useful to study together the behavior of concavity and of quasi-concavity with respect to composition.

Proposition 844 Let g : C ⊆ Rⁿ → R and f : D ⊆ R → R be two functions defined on convex sets and such that Im g ⊆ D.

(i) If g is concave and f is concave and increasing, then f ∘ g : C → R is concave.

(ii) If g is quasi-concave and f is increasing, then f ∘ g : C → R is quasi-concave.

Proof (i) Let x, y ∈ C and λ ∈ [0, 1]. Thanks to the properties of the functions f and g, we have

(f ∘ g)(λx + (1 − λ)y) = f(g(λx + (1 − λ)y)) ≥ f(λg(x) + (1 − λ)g(y))
  ≥ λf(g(x)) + (1 − λ)f(g(y)) = λ(f ∘ g)(x) + (1 − λ)(f ∘ g)(y)

as desired.

(ii) Again, let x, y ∈ C and λ ∈ [0, 1]. Now we have

(f ∘ g)(λx + (1 − λ)y) = f(g(λx + (1 − λ)y)) ≥ f(min{g(x), g(y)})
  = min{f(g(x)), f(g(y))} = min{(f ∘ g)(x), (f ∘ g)(y)}

as desired.

Example 845 (i) Given any strictly positive concave function g : C ⊆ Rⁿ → (0, ∞), by Proposition 844-(i) its transformation log g is concave. (ii) Consider a version of the Cobb-Douglas function h : Rⁿ₊ → R given by

h(x) = ∏ᵢ₌₁ⁿ xᵢ^αᵢ

with exponents αᵢ > 0 (we do not require ∑ᵢ₌₁ⁿ αᵢ = 1). We have

h(x) = e^{∑ᵢ₌₁ⁿ αᵢ log xᵢ}   ∀x ∈ Rⁿ₊₊

Since the function g(x) = ∑ᵢ₌₁ⁿ αᵢ log xᵢ is easily seen to be concave on Rⁿ₊₊ and f(x) = eˣ is increasing, by Proposition 844-(ii) we conclude that h = f ∘ g is quasi-concave on Rⁿ₊₊. In turn, this easily implies that h is quasi-concave on the entire orthant Rⁿ₊ (why?). N

Between (i) and (ii) there is an important difference: concavity is preserved by the monotone transformation f ∘ g if f is both increasing and concave, while increasing monotonicity alone suffices to preserve quasi-concavity. In other terms, quasi-concavity is preserved by monotone (increasing) transformations, while this is not true for concavity. For example, if f, g : [0, ∞) → R are g(x) = √x and f(x) = x⁴, the composite function f ∘ g : [0, ∞) → R is the quasi-concave and strictly convex function x².⁷ So, with f increasing but not concave, the concavity of g only implies the quasi-concavity of f ∘ g, nothing more.
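The same example can be replayed numerically. In this illustrative Python sketch, g(x) = √x passes the concavity inequality, while the composite x² fails it yet passes the quasi-concavity one:

```python
import math, random

g = math.sqrt           # concave on [0, inf)
fg = lambda x: x**2     # (f o g)(x) = (sqrt(x))^4 = x^2

random.seed(1)
concave_fails = 0
for _ in range(1000):
    x, y = random.uniform(0, 5), random.uniform(0, 5)
    l = random.random()
    m = l * x + (1 - l) * y
    assert g(m) >= l * g(x) + (1 - l) * g(y) - 1e-9        # g is concave
    assert fg(m) >= min(fg(x), fg(y)) - 1e-9               # f o g quasi-concave
    if fg(m) < l * fg(x) + (1 - l) * fg(y) - 1e-9:
        concave_fails += 1                                 # ...but not concave
print("concavity violated in", concave_fails, "cases")
```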
This difference between (i) and (ii) is important in utility theory. A property of utility functions that is preserved by strictly increasing transformations is called ordinal, while a property that is preserved only by strictly increasing affine transformations (that is, by f(x) = αx + β with α > 0 and β ∈ R) is called cardinal. Naturally, an ordinal property is also cardinal, while the converse is false. Thanks to Proposition 844, we can thus say that quasi-concavity is an ordinal property, while concavity is only cardinal.

The distinction between cardinal and ordinal properties is, conceptually, very important. Indeed, given a utility function u : C ⊆ Rⁿ → R and a strictly increasing function f : Im u → R, we saw in Section 6.4.4 that the transformation f ∘ u : C ⊆ Rⁿ → R of the utility function u is itself a utility function equivalent to u. In other words, f ∘ u represents the same preference relation ≿, which is the fundamental economic notion (Section 6.8). Indeed,

x ≿ y ⟺ u(x) ≥ u(y) ⟺ (f ∘ u)(x) ≥ (f ∘ u)(y)

For this reason, ordinal properties, which are satisfied by u and by all its equivalent transformations f ∘ u, are characteristic of utility functions as numeric representations of an underlying preference ≿. In contrast, this is not true for cardinal properties, which might well get lost under strictly increasing transformations that are not affine.

In light of this, ordinal quasi-concavity, rather than cardinal concavity, seems to be the relevant property for utility functions u : C ⊆ Rⁿ → R. Nevertheless, before we declare quasi-concavity to be the relevant property, in place of concavity, we have to make a last subtle observation. The monotone transformation f ∘ u is quasi-concave if u is concave; does the opposite also hold? That is, is every quasi-concave function a monotone transformation of a concave function?

If this were the case, concavity would be back in business also in an ordinalist approach:⁸ given a quasi-concave function, it would then be sufficient to consider its equivalent concave version, obtained through a suitable strictly increasing transformation.

The answer to the question is negative: there exist quasi-concave functions that are not monotone transformations of concave functions.

7 Note that x⁴ is here strictly increasing because we are considering its restriction to [0, ∞). For the same reason, x² is quasi-concave.
8 Recall the discussion of Section 6.2.1.

Example 846 Let g : R → R be given by

g(x) = x       if x ≤ 0
g(x) = 0       if x ∈ (0, 1)
g(x) = x − 1   if x ≥ 1

This function is increasing, so quasi-concave. We claim that there are no strictly increasing f : R → R and concave h : R → R such that

g = f ∘ h   (17.22)

If f is strictly increasing, it has a strictly increasing inverse f⁻¹ (Proposition 222). Therefore, (17.22) is equivalent to h = f⁻¹ ∘ g. Hence, our claim amounts to saying that f⁻¹ ∘ g is not concave for any strictly increasing f : R → R. Suppose, by contradiction, that f⁻¹ ∘ g is concave. By setting x = 3/2 and y = 0, we have

f⁻¹(0) = f⁻¹(g(3/4)) = (f⁻¹ ∘ g)((1/2)x + (1/2)y)
  ≥ (1/2)(f⁻¹ ∘ g)(x) + (1/2)(f⁻¹ ∘ g)(y) = (1/2)f⁻¹(1/2) + (1/2)f⁻¹(0)

that is,

f⁻¹(0) ≥ f⁻¹(1/2)

which contradicts the fact that f⁻¹ is strictly increasing. This proves the claim. N

This example shows that there exist genuinely quasi-concave functions that cannot be represented as monotone transformations of concave functions. It is the definitive proof that quasi-concavity, and not concavity, is the relevant property in an ordinalist approach. This important conclusion was reached in 1949 by Bruno de Finetti in the article in which he introduced quasi-concave functions, whose theory was then extended in 1954 by Werner Fenchel.

17.3.4 Multivariable transformations

To conclude, we present the multivariable extension of Proposition 844. Mutatis mutandis, the concavity part extends verbatim. In contrast, the quasi-concavity part requires some additional conditions: the inner functions must now be concave, and f must be quasi-concave (note, however, that in the scalar case the quasi-concavity of f is implied by its monotonicity).

Proposition 847 Let g = (g₁, ..., gₘ) : C ⊆ Rⁿ → Rᵐ and f : D ⊆ Rᵐ → R be two functions defined on convex sets, with Im g ⊆ D.

(i) If each gᵢ is concave and f is concave and increasing, then f ∘ g : C → R is concave.

(ii) If each gᵢ is concave and f is quasi-concave and increasing, then f ∘ g : C → R is quasi-concave.

Proof We prove only (ii), as (i) can be proved similarly. Let x, y ∈ C and λ ∈ [0, 1]. We have

(f ∘ g)(λx + (1 − λ)y) = f(g(λx + (1 − λ)y))
  = f(g₁(λx + (1 − λ)y), ..., gₘ(λx + (1 − λ)y))
  ≥ f(λg₁(x) + (1 − λ)g₁(y), ..., λgₘ(x) + (1 − λ)gₘ(y))
  = f(λ(g₁(x), ..., gₘ(x)) + (1 − λ)(g₁(y), ..., gₘ(y)))
  = f(λg(x) + (1 − λ)g(y)) ≥ min{f(g(x)), f(g(y))}
  = min{(f ∘ g)(x), (f ∘ g)(y)}

where the first inequality holds because each gᵢ is concave and f is increasing, and the last one because f is quasi-concave.

Example 848 The function f : Rᵐ₊ → R given by

f(x) = ∏ᵢ₌₁ᵐ xᵢ^αᵢ

with exponents αᵢ > 0 is quasi-concave and increasing (Example 845). By Proposition 847-(ii), the function h = f ∘ g : C ⊆ Rⁿ → R given by

h(x) = ∏ᵢ₌₁ᵐ gᵢ^αᵢ(x)

is quasi-concave if each gᵢ : C ⊆ Rⁿ → R is concave and positive.

If ∑ᵢ₌₁ᵐ αᵢ = 1, the function f becomes concave (as will be seen in Corollary 880). By Proposition 847-(i), in this case the function h is concave. N

A dividend of this example is the following useful result.

Corollary 849 The product of concave and positive functions is a quasi-concave function.

17.4 Diversification principle

It is time to justify the economic relevance of the notions studied in this chapter. We will focus on consumer theory, but similar considerations hold for production theory.

We have observed many times that in consumer theory we usually consider utility functions with "convex" indifference curves, that is, utility functions with convex upper contour sets. As observed, this is why quasi-concavity is a fundamental property of utility functions. But what is the economic motivation for assuming "convex" indifference curves, that is, quasi-concave utility functions?

The answer lies in the diversification principle: if two bundles of goods ensure a certain level of utility, say k, a convex combination of them, a mixture αx + (1 − α)y, will yield at least as much. In other words, the diversification that the compound bundle αx + (1 − α)y affords, relative to the original bundles x and y, guarantees a utility level which is not smaller than the original one, i.e., k. If x = (0, 1) is the bundle composed of 0 units of water and 1 of bread, while y = (1, 0) is composed of 1 unit of water and 0 of bread, their mixture

(1/2)(0, 1) + (1/2)(1, 0) = (1/2, 1/2)

is a diversified bundle, with positive quantities of both water and bread. It is natural to think that this mixture gives a utility which is not smaller than the utility of the two original bundles.

Formally, the diversification principle translates into the condition

u(x) ≥ k and u(y) ≥ k ⟹ u(αx + (1 − α)y) ≥ k   ∀k ∈ R   (DP)

for every α ∈ [0, 1]. This is, precisely, the classic property of "convexity" of indifference curves. Mathematically, it is the convexity of the upper contour sets (u ≥ k), which corresponds to the quasi-concavity of utility functions.
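Condition DP is easy to probe numerically. The following illustrative Python sketch uses the (quasi-concave) utility u(x₁, x₂) = √(x₁x₂), an arbitrary choice of ours, and random bundles and weights:

```python
import math, random

# Diversification principle (DP) for the quasi-concave utility
# u(x1, x2) = sqrt(x1 * x2): a mixture is never worse than the worst bundle.

def u(b):
    return math.sqrt(b[0] * b[1])

random.seed(2)
for _ in range(10_000):
    x = (random.uniform(0, 10), random.uniform(0, 10))
    y = (random.uniform(0, 10), random.uniform(0, 10))
    a = random.random()
    mix = (a * x[0] + (1 - a) * y[0], a * x[1] + (1 - a) * y[1])
    assert u(mix) >= min(u(x), u(y)) - 1e-9
print("DP holds on all sampled mixtures")
```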

Everything fine? Almost: we can actually sharpen what was just said. Observe that the diversification principle implies that, for every x, y ∈ C,

u(x) = u(y) ⟹ u(αx + (1 − α)y) ≥ u(x)   ∀α ∈ [0, 1]   (PDP)

Indeed, by setting k = u(x) = u(y), we obviously have u(x) ≥ k and u(y) ≥ k, which implies u(αx + (1 − α)y) ≥ k by the diversification principle. We call condition PDP the pure diversification principle. In preferential terms, the PDP takes the nice form

x ∼ y ⟹ αx + (1 − α)y ≿ x   ∀α ∈ [0, 1]

which well expresses its nature.

The PDP is very interesting: it states that each bundle which is a mixture of indifferent bundles is preferred to the original ones. If we draw an indifference curve, we see that the weaker property PDP is often used as the property that characterizes the "convexity" of the indifference curves. Indeed, PDP is the purest and most intuitive form of the diversification principle: by combining two indifferent alternatives, we get a better one. Going back to the example of bread and water, it is plausible that

(0, 1) ∼ (1, 0) ≾ (1/2, 1/2)

The next result shows that, in most cases of interest for consumer theory, the two principles turn out to be equivalent. The result uses the notion of directed set.

Definition 850 A set C in Rⁿ is said to be directed if, for every x, y ∈ C, there exists z ∈ C such that z ≤ x and z ≤ y.

In words, a set is directed when any pair of its elements has a common lower bound that belongs to the set. In consumer theory many sets of interest are directed. For example, all sets C ⊆ Rⁿ₊ that contain the origin are directed. Indeed, 0 ≤ x for every x ∈ Rⁿ₊ and, therefore, the origin itself is a lower bound common to all pairs of elements of C.

Proposition 851 Let u : C ⊆ Rⁿ → R be a continuous and increasing function defined on a convex and directed set C. The function u is quasi-concave if and only if it satisfies condition PDP.

Proof Since the "only if" part is obvious, we prove the "if" part: the PDP implies the quasi-concavity of u. Let x, y ∈ C and α ∈ [0, 1], with u(x) ≥ u(y). Since C is directed, there exists z ∈ C such that z ≤ x and z ≤ y. By the increasing monotonicity of u, we have u(z) ≤ u(x) and u(z) ≤ u(y). Define the auxiliary function φ : [0, 1] → R by φ(t) = u(tx + (1 − t)z) for t ∈ [0, 1]. Since C is convex, the function φ is well defined. The continuity of u implies that of φ. Indeed:

tₙ → t ⟹ tₙx + (1 − tₙ)z → tx + (1 − t)z ⟹ u(tₙx + (1 − tₙ)z) → u(tx + (1 − t)z) ⟹ φ(tₙ) → φ(t)

Since φ(0) = u(z) ≤ u(y) ≤ u(x) = φ(1), by the Intermediate Value Theorem the continuity of φ implies the existence of t* ∈ [0, 1] such that φ(t*) = u(y). By setting w = t*x + (1 − t*)z, we therefore have u(w) = u(y). Moreover, z ≤ x implies w ≤ x.

By the PDP condition, it follows that u(αw + (1 − α)y) ≥ u(w) = u(y), while w ≤ x implies αw + (1 − α)y ≤ αx + (1 − α)y. Since u is increasing, we conclude that

u(αx + (1 − α)y) ≥ u(αw + (1 − α)y) ≥ u(y) = min{u(x), u(y)}

which proves that u is quasi-concave.

The result just proved guarantees that, under assumptions typically satisfied in consumer theory, the two possible interpretations, proper and improper, of the convexity of the indifference curves are equivalent. We can therefore regard the pure diversification principle, which is the clearest form of the diversification principle, as the motivation for the use of quasi-concave utility functions.

What about concave functions? They satisfy the diversification principle, and therefore their use does not violate the principle. Example 846 has shown, however, that there exist quasi-concave functions that are not monotone transformations of concave functions, i.e., that do not have the form f ∘ g with f strictly increasing and g concave. In other words, in ordinal utility theory quasi-concavity (so, the diversification principle) is a weaker property than concavity.

In conclusion, the use of concave functions is consistent with the diversification principle, but it is not justified by it. Only quasi-concavity is justified by this principle, being its mathematical counterpart.⁹

We make a last observation on the pure diversification principle that does not add much conceptually, but is useful in applications. Consider a version of condition PDP with strict inequality: for every x ≠ y,

u(x) = u(y) ⟹ u(αx + (1 − α)y) > u(x)   ∀α ∈ (0, 1)   (SDP)

9 In a microeconomics course, readers will learn that concavity can be given a compelling justification in terms of risk aversion in choice under risk.

or, equivalently, in preferential terms,

x ∼ y ⟹ αx + (1 − α)y ≻ x   ∀α ∈ (0, 1)

We thus obtain a strong form of the principle in which diversification is always strictly preferred by the consumer. Condition SDP is implied by the strict quasi-concavity of u since

u(x) = u(y) ⟹ u(αx + (1 − α)y) > min{u(x), u(y)} = u(x)   ∀α ∈ (0, 1)

Under the hypotheses of Proposition 851, strict quasi-concavity is indeed equivalent to SDP. We thus have the following version of that proposition (the proof is left to the reader).

Proposition 852 Let u : C ⊆ Rⁿ → R be a continuous and increasing function defined on a convex and directed set C. The function u is strictly quasi-concave if and only if it satisfies condition SDP.

SDP is thus the version of the diversification principle that characterizes strict quasi-concavity, a property often used in applications because it ensures the uniqueness of solutions of optimization problems, as will be discussed in Section 22.6.

We close by observing that, although important, the diversification principle does not have universal validity: there are cases in which it makes little sense. For example, if the bundle (1, 0) consists of 1 unit of brewer's yeast and 0 of cake yeast, while the bundle (0, 1) consists of 1 unit of cake yeast and 0 of brewer's yeast, and we judge them indifferent, their combination (1/2, 1/2) might be useless for making either a pizza or a cake: half a unit of each yeast may suffice for neither. In this case, the combination turns out to be rather bad.

17.5 Grand finale: Cauchy's equation

17.5.1 The basic equation

We close the chapter with a remarkable refinement of Proposition 644 which shows that, for functions satisfying a minimal regularity condition (continuity at one point), the additivity property f(x + y) = f(x) + f(y) suffices to characterize linear functions of a single variable.

This refinement is usually stated through the Cauchy functional equation: we ask whether or not there are functions f : R → R that satisfy the condition¹⁰

f(x + y) = f(x) + f(y)   ∀x, y ∈ R

Naturally, a function satisfies Cauchy's equation if and only if it is additive (cf. Definition 856). Much more is true:

Theorem 853 (Cauchy) If f : R → R is continuous at least at one point, then it satisfies Cauchy's equation if and only if it is linear, that is, such that f(x) = mx for some m ∈ R.

10 Here we talk of a functional equation because the "unknown" is a function f, and not just a scalar or a vector as is the case for the equations studied in Section 14.

In the language of Proposition 644 the theorem reads: a function f : R → R, continuous at least at one point, is linear if and only if it is additive. With a minimal regularity property (continuity at a point), the homogeneity property (i) of Proposition 644 is automatically satisfied.¹¹

N.B. The conclusion of Theorem 853 also holds when f is defined only on R₊: the proof is the same. O

Proof The "if" part is trivial; let us show the "only if" part in three steps.

(i) Taking x = y = 0, the equation gives f(0) = f(0) + f(0) = 2f(0), that is, f(0) = 0: the graphs of all functions that satisfy the equation pass through the origin.

(ii) We claim that f is continuous at every point. Let x₀ be the point at which, by hypothesis, f is continuous, so that f(x) → f(x₀) as x → x₀. Take another (generic) point z₀. By the Cauchy equation and the continuity of f at x₀,

f(x) = f(x − z₀ + z₀) = f(x − z₀) + f(z₀) → f(x₀)   as x → x₀

Therefore,

f(x − z₀) → f(x₀) − f(z₀) = f(x₀ − z₀)   as x → x₀

which proves the continuity of f at x₀ − z₀. As z₀ varies over R, the point x₀ − z₀ spans the whole real line, so f is everywhere continuous.

(iii) Using Cauchy's equation n times, we can write that, for every x ∈ R and every n ∈ N,

f(nx) = f(x + x + ··· + x) = f(x) + f(x) + ··· + f(x) = nf(x)

with n summands on both sides. Since f(0) = 0, we have 0 = f(x − x) = f(x + (−x)) = f(x) + f(−x). Thus f(−x) = −f(x), and therefore f(−nx) = (−n)f(x) for every n ∈ N. We conclude that

f(kx) = kf(x)   ∀x ∈ R, ∀k ∈ Z   (17.23)

By setting y = kx, we then have f(y) = kf(y/k), so

f(y/k) = (1/k)f(y)   ∀y ∈ R, ∀k ∈ Z with k ≠ 0   (17.24)

Combining the two equalities (17.23) and (17.24), we get

f((m/n)x) = (m/n)f(x)   ∀x ∈ R, ∀m, n ∈ Z with n ≠ 0

that is,

f(rx) = rf(x)   ∀x ∈ R, ∀r ∈ Q

Hence, putting x = 1 and denoting f(1) = a, we have f(r) = ar for every r ∈ Q. The function f is therefore linear on the rationals. Now assume x is irrational and take a sequence {rₖ} of rationals that tends to x. We know that f(rₖ) = arₖ for every k ≥ 1. Since arₖ → ax as k → ∞, the continuity of f at each x ∈ R then yields

ax = a lim_{k→∞} rₖ = lim_{k→∞} arₖ = lim_{k→∞} f(rₖ) = f(lim_{k→∞} rₖ) = f(x)

as desired.

11 In view of Theorem 853, non-linear additive functions must be discontinuous at each point, so they are extremely irregular. This makes them complicated to describe (and not particularly nice to see); for brevity we do not provide examples of them.

17.5.2 Remarkable variants

Simple variants of Cauchy's equation are:

(i) "+ ·": consider the functional equation

f(x + y) = f(x)f(y)   ∀x, y ∈ R   (17.25)

It admits the trivial solution f(x) = 0 for every x ∈ R. Every other solution is strictly positive. Indeed, if f is such a solution, for every x ∈ R we have:

f(x) = f(x/2 + x/2) = f(x/2)f(x/2) = [f(x/2)]² ≥ 0

Moreover, if there were some y with f(y) = 0, then f(x) = f((x − y) + y) = f(x − y)f(y) = 0 for every x ∈ R, which contradicts the non-triviality of f. Every non-trivial solution of (17.25) is therefore strictly positive. This allows us to take the logarithm of both sides of (17.25), so that

log f(x + y) = log f(x) + log f(y)   ∀x, y ∈ R

which is the Cauchy equation in the unknown function log f. The solution is therefore log f(x) = mx with m ∈ R, so the exponential function

f(x) = e^{mx}

is the non-trivial solution of the functional equation (17.25).

(ii) "· +": consider the functional equation

f(xy) = f(x) + f(y)   ∀x, y > 0   (17.26)

It also admits the trivial solution f(x) = 0 for every x > 0. By using the identity xy = e^{log x + log y}, (17.26) becomes

f(e^{log x + log y}) = f(e^{log x}) + f(e^{log y})   ∀x, y > 0

By setting g(x) = f(eˣ) for every x ∈ R, we obtain the Cauchy equation

g(log x + log y) = g(log x) + g(log y)

in the unknown function g. We know that its solution is g(x) = mx with m ∈ R. This yields f(eˣ) = mx, that is, f(y) = m log y for y > 0. In other words, the logarithmic function

f(x) = log xᵐ

is the solution of the functional equation (17.26).

(iii) "· ·": consider the functional equation

f(xy) = f(x)f(y)   ∀x, y ≥ 0   (17.27)

It, too, admits the trivial solution f(x) = 0 for every x ≥ 0. The reader can verify that also in this case we can take the logarithm of both sides, so that the equation reduces to (ii) with log f in place of f, with solution log f(x) = m log x, that is, the power function

f(x) = e^{m log x} = xᵐ

The results just seen are remarkable because they establish a functional-equation foundation for the elementary functions. For example, the exponential function can be characterized, as in Theorem 399, via the limit

eˣ = lim_{n→∞} (1 + x/n)ⁿ

but also, from a completely different angle, as the function that solves the functional equation (17.25). Both points of view are of great importance.

Because of the importance of this new perspective on elementary functions, we record as a theorem what we have established (throughout, continuity at least at one point is tacitly assumed, as in Theorem 853).

Theorem 854 (i) The exponential function f(x) = e^{mx}, with m ∈ R, is the unique non-trivial solution of the functional equation

f(x + y) = f(x)f(y)   ∀x, y ∈ R

(ii) The logarithmic function f(x) = log xᵐ, with m ∈ R, is the unique non-trivial solution of the functional equation

f(xy) = f(x) + f(y)   ∀x, y > 0

(iii) The power function f(x) = xᵐ, with m ∈ R, is the unique non-trivial solution of the functional equation

f(xy) = f(x)f(y)   ∀x, y ≥ 0
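As a numerical companion to Theorem 854 (an illustrative Python sketch; the value m = 1.7 is arbitrary), each family passes its functional equation up to rounding:

```python
import math, random

m = 1.7
exp_f = lambda x: math.exp(m * x)     # solves f(x + y) = f(x) f(y)
log_f = lambda x: m * math.log(x)     # solves f(x y) = f(x) + f(y), i.e. log x^m
pow_f = lambda x: x**m                # solves f(x y) = f(x) f(y)

random.seed(3)
for _ in range(1000):
    x, y = random.uniform(0.1, 5), random.uniform(0.1, 5)
    assert math.isclose(exp_f(x + y), exp_f(x) * exp_f(y), rel_tol=1e-9)
    assert math.isclose(log_f(x * y), log_f(x) + log_f(y), rel_tol=1e-9, abs_tol=1e-9)
    assert math.isclose(pow_f(x * y), pow_f(x) * pow_f(y), rel_tol=1e-9)
print("all three functional equations verified")
```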

17.5.3 Continuous compounding

A common financial problem consists in calculating the value of a monetary capital at a future date. Denote by c the amount of capital available today, date 0, and by m its terminal value, that is, its value at a date t ≥ 0. The most common, and simplest, hypothesis is that m depends only on c and t, so that

m = m(c, t) : R × R₊ → R

Here, c < 0 is interpreted as a debt. Consider the following properties of this function:

(i) m(c₁ + c₂, t) = m(c₁, t) + m(c₂, t) for every t ≥ 0 and every c₁, c₂ ∈ R;

(ii) t₁ < t₂ implies m(c, t₁) ≤ m(c, t₂) for every c ≥ 0;


(iii) m(c, 0) = c for every c ∈ R.

Condition (i) requires that the terminal value of a sum of capitals be the sum of their terminal values. Observe that it would be meaningless to suppose that m(c₁ + c₂, t) < m(c₁, t) + m(c₂, t) for some c₁, c₂ ≥ 0 because, in such a case, it would be more profitable to invest c₁ and c₂ separately than their sum c₁ + c₂. In contrast, it might be reasonable to have m(c₁ + c₂, t) ≥ m(c₁, t) + m(c₂, t), but this would lead us a bit too far away.

Condition (ii) requires that the terminal value increase with the length of the investment. This presumes that such value is measured in nominal terms. Finally, condition (iii) is obvious.

Theorem 855 Let m(·, t) be, for each t ≥ 0, continuous at least at some value of c. It satisfies conditions (i)-(iii) if and only if

m(c, t) = cf(t)

where f : [0, ∞) → R is an increasing function such that f(0) = 1.

Proof Define mₜ : R → R by mₜ(c) = m(c, t). By condition (i), mₜ satisfies the Cauchy functional equation and is continuous at least at one point, so by Theorem 853 for each t ≥ 0 there is a scalar αₜ such that mₜ(c) = αₜc. Define f : [0, ∞) → R by f(t) = αₜ, so that we can write m(c, t) = cf(t). To satisfy (ii), f must be increasing and, by (iii), we have f(0) = 1.

Under conditions (i)-(iii), the terminal value is therefore proportional to the amount c of the capital. In particular, we have f(t) = m(1, t), so f(t) can be interpreted as the terminal value at t of a unit capital. The terminal value of any other amount of capital can be obtained simply by multiplying it by f(t). For this reason, f(t) is called the compound factor.
The most common compound factor has the form

f(t) = e^{δt}

with δ ≥ 0. To see how the exponential factor may come up, suppose that one has to invest a capital c from today, 0, until date t₁ + t₂. We can think of two investment strategies:

(a) invest from the beginning to the end, thus obtaining the terminal value cf(t₁ + t₂);

(b) invest first from 0 to t₁, getting the terminal value cf(t₁), and then reinvest this amount for the remaining t₂, thus obtaining the terminal value (cf(t₁))f(t₂).

If the two terminal values differ, that is, f(t₁ + t₂) ≠ f(t₁)f(t₂), arbitrage opportunities may open up if in the financial market it is possible to lend and borrow without quantity constraints and transaction costs. Indeed, if f(t₁ + t₂) > f(t₁)f(t₂), it would be profitable to invest without interruption from 0 to t₁ + t₂ and to borrow with an interruption at t₁, earning in this way the difference f(t₁ + t₂) − f(t₁)f(t₂) > 0. Vice versa, if f(t₁ + t₂) < f(t₁)f(t₂), it would be profitable to borrow without interruption, and to invest with an interruption at t₁.

In sum, the equality f(t₁ + t₂) = f(t₁)f(t₂) must hold for every t₁, t₂ ≥ 0 in order not to open arbitrage opportunities. Remarkably, from the study of the variant (17.25) of Cauchy's equation, it follows that this equality amounts to

f(t) = e^{δt}

provided f is continuous at least at one point. The exponential compound factor is thus the outcome of a no-arbitrage argument, as is the case for many key results in finance (cf. Section 24.6).
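Numerically, the no-arbitrage restriction singles out the exponential factor. In the illustrative Python sketch below (with an arbitrary δ = 0.05), the factor 1 + δt is increasing with f(0) = 1 yet violates f(t₁ + t₂) = f(t₁)f(t₂):

```python
import math

delta = 0.05
exp_factor = lambda t: math.exp(delta * t)   # exponential compound factor
lin_factor = lambda t: 1 + delta * t         # increasing, f(0) = 1, but...

t1, t2 = 2.0, 3.0
print(exp_factor(t1 + t2) - exp_factor(t1) * exp_factor(t2))  # 0 up to rounding
print(lin_factor(t1 + t2) - lin_factor(t1) * lin_factor(t2))  # < 0: arbitrage
```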

N.B. In this section we assumed that time is continuous, so t can take any positive value and each t induces a function mₜ (see the proof of the last theorem). In contrast, if time were discrete, with t ∈ N₊, we would have a sequence of such functions. In this case, the discrete compound factor f : N₊ → R that corresponds to the exponential continuous compound factor is given by f(t) = (1 + r)ᵗ, with mₜ(c) = (1 + r)ᵗc (cf. Example 295). O

17.5.4 Additive functions

Definition 856 A function f : Rⁿ → R is called additive if it satisfies the Cauchy functional equation, that is,

f(x + y) = f(x) + f(y)   ∀x, y ∈ Rⁿ

A function f : Rⁿ → R is linear if and only if it is additive and homogeneous (Proposition 644). The following result generalizes Cauchy's Theorem to the multivariable case and, in so doing, provides a new twist on Riesz's Theorem. Remarkably, it shows that for additive functions the topological property of continuity (even just at some point) and the algebraic property of homogeneity become equivalent.

Theorem 857 For a function f : Rⁿ → R, the following conditions are equivalent:

(i) f is continuous at least at one point and additive;

(ii) f is continuous and additive;

(iii) f is linear;

(iv) there exists a (unique) vector α ∈ Rⁿ such that f(x) = α · x for all x ∈ Rⁿ.

Proof (iv) implies (iii) by Riesz's Theorem. (iii) implies (ii) by Theorem 646. (ii) trivially implies (i). Finally, to prove that (i) implies (iv) it is enough to show, along the lines of the proof of Cauchy's Theorem for scalar functions (which is easily adapted to Rⁿ, as readers can check), that (i) implies that f is homogeneous, so linear.

Interestingly, it can be proved that a function f : Rⁿ → R is the non-trivial continuous solution of the multidimensional version of variant (17.25), i.e.,

f(x + y) = f(x)f(y)   ∀x, y ∈ Rⁿ

if and only if there exists a vector α ∈ Rⁿ such that f(x) = e^{α·x} for all x ∈ Rⁿ.
Chapter 18

Homogeneous functions

18.1 Preamble: cones

Definition 858 A set C in Rⁿ is said to be a cone if, for each x ∈ C, we have λx ∈ C for all λ ≥ 0.

Geometrically, C is a cone if, any time x belongs to C, the set C also includes the whole half-line starting at the origin and passing through x.

[Figures omitted: a convex cone (left) and a cone that is not convex (right), both drawn as unions of half-lines through the origin O]

Note that the origin 0 always belongs to a cone: given any x ∈ C, by taking λ = 0 we have 0 = 0x ∈ C.

One can easily show that the closure of a cone is a cone and that the intersection of two cones is still a cone.

Proposition 859 A convex set C in Rⁿ is a cone if and only if

x, y ∈ C ⟹ αx + βy ∈ C   ∀α, β ≥ 0


While a generic convex set is closed with respect to convex combinations, convex cones are closed with respect to all linear combinations with positive coefficients (regardless of whether or not they add up to 1). This is what distinguishes them among all convex sets.

Proof "Only if". Let C be a cone. Take x, y ∈ C. We want to show that αx + βy ∈ C for all α, β ≥ 0. Fix α, β ≥ 0. If α = β = 0, then αx + βy = 0 ∈ C. Assume that α + β > 0. Since C is convex, we have

(α/(α + β))x + (β/(α + β))y ∈ C

Since C is a cone, we have

αx + βy = (α + β)[(α/(α + β))x + (β/(α + β))y] ∈ C

as desired.

"If". Suppose that x, y ∈ C implies αx + βy ∈ C for all α, β ≥ 0. We want to show that C is a cone. By taking α = β = 0, one can conclude that 0 ∈ C and, by then taking β = 0 (and y = x), that αx ∈ C for all α ≥ 0. Hence, C is a cone.

Example 860 (i) A singleton {x} ⊆ Rⁿ is always convex; it is also a cone if x = 0. (ii) The only non-trivial cones in R are the two half-lines (−∞, 0] and [0, ∞).¹ (iii) The set Rⁿ₊ = {x ∈ Rⁿ : x ≥ 0} of the positive vectors is a convex cone. N

Cones can be closed, for example Rⁿ₊, or not closed, for example Rⁿ₊₊ ∪ {0}. Vector subspaces form an important class of closed convex cones (the non-trivial proof is omitted).

Proposition 861 Vector subspaces are closed subsets of Rⁿ.

For example, this proposition implies that the graphs of straight lines passing through the origin are closed sets, because they are vector subspaces of R².

18.2 Homogeneity and returns to scale

18.2.1 Homogeneous functions

Returns to scale are a main property of production functions. Their mathematical counterpart is homogeneity. We begin with the simplest kind of homogeneity, namely positive homogeneity. For production functions, it corresponds to the hypothesis of constant returns to scale.

Definition 862 A function f : C ⊆ Rⁿ → R defined on a convex set C with 0 ∈ C is said to be positively homogeneous if

f(λx) = λf(x)   (18.1)

for all x ∈ C and all λ ∈ [0, 1].

Hence, a proportional reduction λx of all the components of a vector x determines an analogous reduction λf(x) of the value of the function.

1 The trivial cones in R are the singleton {0} and R itself.

Example 863 (i) Linear functions f : Rⁿ → R are positively homogeneous. (ii) The function f : R²₊ → R given by f(x) = √(x₁x₂) is positively homogeneous. Indeed

f(λx) = √((λx₁)(λx₂)) = √(λ²x₁x₂) = λ√(x₁x₂) = λf(x)

for all λ ≥ 0. N

For any positively homogeneous function we have

f(0) = 0   (18.2)

Indeed, for all λ ∈ [0, 1] we have f(0) = f(λ0) = λf(0), which implies f(0) = 0. Positively homogeneous functions thus have zero value at the origin.

The condition 0 ∈ C in the definition ensures that λx ∈ C for all λ ∈ [0, 1], so that (18.1) is well defined. Whenever C is a cone, as in the previous examples, property (18.1) holds, more generally, for any positive scalar λ.

Proposition 864 A function f : C ⊆ Rⁿ → R defined on a cone C is positively homogeneous if and only if

f(λx) = λf(x)   (18.3)

for all x ∈ C and all λ ≥ 0.

Proof Since the "if" side is trivial, we focus on the "only if". Let f be positively homogeneous and let x ∈ C. We must show that f(λx) = λf(x) for every λ > 1. Let λ > 1 and set y = λx, so that x = y/λ. From λ > 1 it follows that 1/λ < 1. Thanks to the positive homogeneity of f, we have f(x) = f(y/λ) = f(y)/λ = f(λx)/λ, that is, f(λx) = λf(x), as desired.

A positively homogeneous function on a cone thus preserves positive scalar multiplication: if one multiplies a vector x by any positive scalar λ, the image f(λx) is equal to the image f(x) of x times the scalar λ. Hence, both proportional reductions and increases determine analogous reductions and increases in f(x). When f is a production function, we are in a classic constant returns to scale scenario: by doubling the inputs we double the output (λ = 2), by tripling the inputs we triple the output (λ = 3), and so on.

Linear production functions are positively homogeneous, thus having constant returns to scale (Example 643). Let us now illustrate another famous example.

Example 865 Let f : R²₊ → R be a CES (constant elasticity of substitution) production function defined by

f(x) = (αx₁^ρ + (1 − α)x₂^ρ)^{1/ρ}

with α ∈ [0, 1] and ρ > 0. It is positively homogeneous:

f(λx) = (α(λx₁)^ρ + (1 − α)(λx₂)^ρ)^{1/ρ} = (λ^ρ(αx₁^ρ + (1 − α)x₂^ρ))^{1/ρ} = λ(αx₁^ρ + (1 − α)x₂^ρ)^{1/ρ} = λf(x)

for all λ ≥ 0. N
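A quick numerical check of this homogeneity (illustrative Python sketch; the parameter values α = 0.3 and ρ = 0.5 are arbitrary):

```python
import random

alpha, rho = 0.3, 0.5

def ces(x1, x2):
    return (alpha * x1**rho + (1 - alpha) * x2**rho) ** (1 / rho)

random.seed(4)
for _ in range(1000):
    x1, x2 = random.uniform(0.1, 10), random.uniform(0.1, 10)
    lam = random.uniform(0.1, 10)
    assert abs(ces(lam * x1, lam * x2) - lam * ces(x1, x2)) < 1e-8
print("f(lambda x) = lambda f(x) verified for the CES function")
```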

Apart from being constant, returns to scale may be increasing or decreasing. This motivates the following definition.

Definition 866 A function f : C ⊆ Rⁿ → R defined on a convex set C with 0 ∈ C is said to be (positively) superhomogeneous if

f(λx) ≤ λf(x)

for all x ∈ C and all λ ∈ [0, 1], while it is said to be (positively) subhomogeneous if

f(λx) ≥ λf(x)

for all x ∈ C and all λ ∈ [0, 1].

Naturally, a function is positively homogeneous if and only if it is both superhomogeneous and subhomogeneous.

Example 867 Given any scalar k > 0, the function f : [0, ∞) → R defined by f(x) = (1 + xᵏ)^{1/k} is subhomogeneous. Indeed, for each λ ∈ [0, 1] we have

f(λx) = (1 + (λx)ᵏ)^{1/k} = (1 + λᵏxᵏ)^{1/k} ≥ (λᵏ + λᵏxᵏ)^{1/k} = λ(1 + xᵏ)^{1/k} = λf(x)

as desired. N

Whenever f is a production function, subhomogeneity captures decreasing returns to scale, while superhomogeneity captures increasing returns. This can easily be seen in the next result, a version of Proposition 864 for subhomogeneous functions (we leave the analogous superhomogeneous case to the reader).

Proposition 868 A function f : C ⊆ Rⁿ → R defined on a convex cone is subhomogeneous if and only if for every x ∈ C we have

f(λx) ≥ λf(x)   ∀λ ∈ [0, 1]

and

f(λx) ≤ λf(x)   ∀λ ≥ 1

Proof We consider the "only if" side, the converse being trivial. Let f be subhomogeneous and x ∈ C. Our aim is to show that f(λx) ≤ λf(x) for all λ > 1. Take λ > 1 and set y = λx, so that x = y/λ. Since λ > 1, we have 1/λ < 1. By the subhomogeneity of f, we have f(x) = f(y/λ) ≥ f(y)/λ = f(λx)/λ, that is, f(λx) ≤ λf(x), as desired.

Thus, by doubling all inputs (λ = 2) the output is less than doubled, by tripling all inputs (λ = 3) the output is less than tripled, and so on for each λ ≥ 1. A proportional increase of all inputs brings along a less than proportional increase in output, which models decreasing returns to scale. Dual considerations hold for increasing returns to scale, which entail more than proportional increases in output as all inputs increase proportionally. Note that when λ ∈ [0, 1], so we cut inputs, opposite output patterns emerge.
2 [0; 1], so we cut inputs, opposite output patterns emerge.

Example 869 Consider the following version of a Cobb-Douglas production function f : R²₊ → R

f(x) = x₁ᵃx₂ᵇ

with a, b > 0 (we do not require a + b = 1; cf. Example 845). For each λ ∈ (0, 1) we have

f(λx) = (λx₁)ᵃ(λx₂)ᵇ = λ^{a+b}x₁ᵃx₂ᵇ = λ^{a+b}f(x)

Such a production function is, thus, positively:

(i) homogeneous if a + b = 1 (constant returns to scale);

(ii) subhomogeneous if a + b ≤ 1 (decreasing returns to scale);

(iii) superhomogeneous if a + b ≥ 1 (increasing returns to scale).

All of this can be easily extended to the general case where

f(x) = ∏ᵢ₌₁ⁿ xᵢ^{aᵢ}

with aᵢ > 0 for each i. Indeed:

f(λx) = ∏ᵢ₌₁ⁿ (λxᵢ)^{aᵢ} = ∏ᵢ₌₁ⁿ λ^{aᵢ}xᵢ^{aᵢ} = λ^{∑ᵢ₌₁ⁿ aᵢ} ∏ᵢ₌₁ⁿ xᵢ^{aᵢ} = λ^{∑ᵢ₌₁ⁿ aᵢ} f(x)

for each λ ∈ [0, 1]. It follows that f is homogeneous if ∑ᵢ₌₁ⁿ aᵢ = 1, subhomogeneous if ∑ᵢ₌₁ⁿ aᵢ ≤ 1 and superhomogeneous if ∑ᵢ₌₁ⁿ aᵢ ≥ 1. N
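The three regimes can be seen numerically. In this illustrative Python sketch we compare f(2x) with 2f(x) for exponent sums below, equal to, and above 1:

```python
def cobb_douglas(x1, x2, a, b):
    return x1**a * x2**b

x1, x2, lam = 3.0, 5.0, 2.0
for a, b in [(0.3, 0.4), (0.5, 0.5), (0.7, 0.6)]:   # a+b < 1, = 1, > 1
    lhs = cobb_douglas(lam * x1, lam * x2, a, b)
    rhs = lam * cobb_douglas(x1, x2, a, b)
    print(f"a+b = {a + b:.1f}: f(2x) = {lhs:.4f} vs 2 f(x) = {rhs:.4f}")
# a+b < 1: f(2x) < 2f(x) (decreasing returns); a+b = 1: equal;
# a+b > 1: f(2x) > 2f(x) (increasing returns).
```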

In conclusion, the notions of homogeneity are defined for λ ∈ [0, 1], that is, for proportional cuts, on convex sets containing the origin. Nonetheless, their natural domains are cones, where they model the classic returns to scale hypotheses in which both cuts, λ ∈ [0, 1], and raises, λ ≥ 1, in inputs are considered.

18.2.2 Average functions

When f : [0, ∞) → R is a scalar function defined on the positive half-line, the corresponding "average function" f_m : (0, ∞) → R is defined by

f_m(x) = f(x)/x

for each x > 0. It is important in applications: for example, if f is a production function, f_m is the average production function; if f is a cost function, f_m is the average cost function; and so on.

If f : Rⁿ₊ → R is a function of several variables, it is no longer possible to "divide" it by a vector x. We must, therefore, come up with an alternative concept of "average function". The most natural surrogate is the following. Having chosen a generic vector 0 ≠ y ∈ Rⁿ₊, consider the function f_m^y : (0, ∞) → R given by

f_m^y(z) = f(zy)/z

It yields the average value of f along the ray of positive multiples of the (arbitrarily chosen) vector y. In the case n = 1, by choosing y = 1 one gets back the previous definition of the average function.

The following characterization allows for a simple reinterpretation of subhomogeneity in terms of average functions.

Proposition 870 A function f : C ⊆ Rⁿ₊ → R defined on a convex cone, with f(0) = 0, is subhomogeneous if and only if the corresponding average functions f_m^y : (0, ∞) → R are decreasing (for any choice of 0 ≠ y ∈ C).

A function is thus subhomogeneous if and only if the corresponding average functions are decreasing. Similarly, a function is superhomogeneous if and only if its average functions are increasing. A subhomogeneous production function is, thus, characterized by a decreasing average production function. In other words, a decreasing average production function characterizes decreasing returns to scale (as is quite natural to expect).

Proof "Only if". If f is subhomogeneous one has that, for any 0 < α ≤ β,

f(αy) = f((α/β)βy) ≥ (α/β)f(βy)

that is, f(αy)/α ≥ f(βy)/β, or f_m^y(α) ≥ f_m^y(β). Therefore, the function f_m^y is decreasing.

"If". If f_m^y is decreasing, by setting β = 1 we have f_m^y(α) ≥ f_m^y(1) for 0 < α ≤ 1, and so f(αy)/α ≥ f(y), that is, f(αy) ≥ αf(y) for each 0 < α ≤ 1. Since f(0) = 0, the function f is subhomogeneous.
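As an illustration of Proposition 870, the following Python sketch (with our arbitrary subhomogeneous Cobb-Douglas, exponents summing to 0.7) shows a decreasing average function along a ray:

```python
def f(x1, x2):                 # subhomogeneous: exponents sum to 0.7
    return x1**0.3 * x2**0.4

y = (2.0, 5.0)                 # an arbitrary ray direction
zs = [0.5, 1.0, 2.0, 4.0, 8.0]
avg = [f(z * y[0], z * y[1]) / z for z in zs]
print(avg)                     # a decreasing sequence: f(zy)/z = z**(-0.3) * f(y)
assert all(a >= b for a, b in zip(avg, avg[1:]))
```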

18.2.3 Homogeneity and concavity: superlinear functions

We begin with a characterization of concavity for positively homogeneous functions. In reading the result, recall that x + y ∈ C if x, y ∈ C when C is a convex cone (Proposition 859).

Proposition 871 Let f : C ⊆ Rⁿ → R be a positively homogeneous function defined on a convex cone. Then, f is concave if and only if it is superadditive, i.e., f(x + y) ≥ f(x) + f(y) for each x, y ∈ C.

Proof "Only if". Let f be positively homogeneous and concave. Then, for all x, y ∈ C we have

(1/2)f(x + y) = f((1/2)x + (1/2)y) ≥ (1/2)f(x) + (1/2)f(y)

So, f(x + y) ≥ f(x) + f(y). "If". Let f be positively homogeneous and superadditive. Then, for all x, y ∈ C and λ ∈ [0, 1] we have

f(λx + (1 − λ)y) ≥ f(λx) + f((1 − λ)y) = λf(x) + (1 − λ)f(y)

So, f is concave.

In view of this result, we next introduce an important class of functions.

Definition 872 A function f : C ⊆ Rⁿ → R defined on a convex cone is superlinear if it is positively homogeneous and superadditive.

By the last result, superlinear functions are the positively homogeneous functions that are concave. So, concavity and positive homogeneity join forces in this important class of functions.

A function f : C ⊆ Rⁿ → R defined on a convex cone is sublinear if it is positively homogeneous and subadditive, i.e., if f(x + y) ≤ f(x) + f(y) for each x, y ∈ C. It is immediate to see that f is sublinear if and only if −f is superlinear. A dual version of the last result shows that sublinear functions are the positively homogeneous functions that are convex.

Example 873 (i) The norm ‖·‖ : Rⁿ → R is a sublinear function (cf. Example 814). (ii) Define f : Rⁿ → R by

f(x) = inf_{i∈I} αᵢ · x   ∀x ∈ Rⁿ

where {αᵢ}_{i∈I} is a collection, finite or infinite, of vectors of Rⁿ (provided the infimum is finite). This function is easily seen to be superlinear. (iii) Given a convex function f : (0, ∞) → R, consider the perspective function g : R²₊₊ → R defined by

g(x₁, x₂) = x₁ f(x₂/x₁)

This function is convex (Example 831) and is easily seen to be positively homogeneous. So, it is sublinear. N
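Part (ii) of the example is easy to test numerically. The illustrative Python sketch below checks superadditivity and positive homogeneity for the lower envelope of a small, arbitrarily chosen family of linear functions:

```python
import random

alphas = [(1.0, 2.0), (-0.5, 1.0), (2.0, 0.3)]   # a finite family of vectors

def f(x):
    """Lower envelope of the linear functions x -> a . x"""
    return min(a1 * x[0] + a2 * x[1] for a1, a2 in alphas)

random.seed(5)
for _ in range(1000):
    x = (random.uniform(-5, 5), random.uniform(-5, 5))
    y = (random.uniform(-5, 5), random.uniform(-5, 5))
    lam = random.uniform(0, 5)
    assert f((x[0] + y[0], x[1] + y[1])) >= f(x) + f(y) - 1e-9   # superadditive
    assert abs(f((lam * x[0], lam * x[1])) - lam * f(x)) < 1e-9  # pos. homogeneous
print("f = min of linear functions is superlinear")
```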

Next we report some useful properties of superlinear functions, for simplicity defined directly on Rⁿ (readers can consider versions of the next results on affine sets).

Proposition 874 Let f : Rⁿ → R be superlinear. Then, f(0) = 0 and

f(−x) ≤ −f(x)   ∀x ∈ Rⁿ   (18.4)

Furthermore, f is linear if and only if f(−x) = −f(x) for each x ∈ Rⁿ.

Proof Since f is positively homogeneous, we have f(λ0) = λf(0) for each λ ≥ 0. Since λ0 = 0, we have f(0) = λf(0) for each λ ≥ 0, which can happen only if f(0) = 0.² For each x ∈ Rⁿ, we thus have 0 = f(0) = f(x − x) ≥ f(x) + f(−x), so (18.4) holds.

Clearly, if f is linear we have f(−x) = −f(x) for each x ∈ Rⁿ. As to the converse, assume that f(−x) = −f(x) for each x ∈ Rⁿ. Consider the function g : Rⁿ → R defined by g(x) = −f(−x) for each x ∈ Rⁿ. It is easy to check that g is sublinear. From f(−x) = −f(x) it follows that f(x) = g(x) for each x ∈ Rⁿ, so f is both superlinear and sublinear, hence both concave and convex, that is, affine. By Proposition 820, there exist a linear function l : Rⁿ → R and β ∈ R such that f = l + β. On the other hand, β = f(0) = 0, so f = l. We conclude that f is linear.

2 Note that the argument is analogous to the one used in the proof of Proposition 645.

A simple consequence of the last result is the following corollary, which motivates the "superlinear" terminology.

Corollary 875 A function f : Rⁿ → R is both superlinear and sublinear if and only if it is linear.

Proof Let f be both superlinear and sublinear. By (18.4), applied to f and to the superlinear function −f, we have both f(−x) ≤ −f(x) and f(−x) ≥ −f(x) for all x ∈ Rⁿ, that is, f(−x) = −f(x) for all x ∈ Rⁿ. By Proposition 874, f is then linear. The converse is trivial.

Inequality (18.4) delivers an interesting sandwich.³

Proposition 876 Let f : Rⁿ → R be superlinear. Then, for each l ∈ (Rⁿ)′,

f(x) ≤ l(x) ∀x ∈ Rⁿ ⟺ f(x) ≤ l(x) ≤ f*(x) ∀x ∈ Rⁿ   (18.5)

In words, a linear function l dominates f pointwise if and only if it is pointwise sandwiched between f and f*, where f* : Rⁿ → R is the dual sublinear function of f defined by f*(x) = −f(−x).

Proof Let l ∈ (Rⁿ)′ and suppose that f(x) ≤ l(x) for all x ∈ Rⁿ. Fix x ∈ Rⁿ. Then we have both f(x) ≤ l(x) and f(−x) ≤ l(−x), which in turn implies f(x) ≤ l(x) = −l(−x) ≤ −f(−x) = f*(x). Since the converse implication is trivial, this proves (18.5).

18.2.4 Homogeneity and quasi-concavity

We conclude our study of homogeneity with a nice result that shows how quasi-concavity becomes equivalent to concavity as long as we consider positive functions which are also positively homogeneous. To better appreciate the significance of this result, recall that quasi-concavity is, in general, much weaker than concavity.

For this result we need a weak form of superadditivity: we say that f : C ⊆ Rⁿ → R is null-superadditive if, for all x, y ∈ C, we have f(x + y) ≥ f(y) whenever f(x) = 0. It is easy to check that f is null-superadditive if it is increasing on C ⊆ Rⁿ₊ or if f(x) > 0 for all 0 ≠ x ∈ C.

Theorem 877 Let f : C ⊆ Rⁿ → R be a positively homogeneous function defined on a convex cone. If f ≥ 0, then f is concave if and only if it is quasi-concave and null-superadditive.

3 Recall that (Rⁿ)′ denotes the dual space of Rⁿ, i.e., the collection of all linear functions on Rⁿ (Section 15.1.2).

Proof We only prove the "if", the converse being obvious. Let f ≥ 0 be quasi-concave and null-superadditive. In view of Proposition 871, it is enough to show that f is superadditive, i.e., f(x + y) ≥ f(x) + f(y) for all x, y ∈ C.

Let x, y ∈ C. First, assume that f(x) + f(y) = 0. Since f ≥ 0, we trivially have f(x + y) ≥ f(x) + f(y). Second, assume that f(x) + f(y) > 0. Since f ≥ 0, either f(x) or f(y) (or both) is strictly positive. If f(x) = 0 and f(y) > 0, then, by null-superadditivity, f(x + y) ≥ f(y) = f(x) + f(y). If f(y) = 0 and f(x) > 0, by the same argument we obtain again f(x + y) ≥ f(x) = f(x) + f(y). Finally, in the third case, f(x) > 0 and f(y) > 0. Then, there exist strictly positive scalars α and β such that f(x) ≥ α and f(y) ≥ β. Since f is positively homogeneous, f(x/α) ≥ 1 and f(y/β) ≥ 1. Hence x/α, y/β ∈ (f ≥ 1), and the latter set is convex, f being quasi-concave. Therefore,

f((x + y)/(α + β)) = f((α/(α + β))(x/α) + (β/(α + β))(y/β)) ≥ 1

that is, f(x + y) ≥ α + β. By taking α = f(x) and β = f(y), it follows that f(x + y) ≥ f(x) + f(y).

Null-superadditivity cannot be removed in this result. The function f : R → R defined by

f(x) = x   if x ≥ 0
f(x) = 0   otherwise

is positively homogeneous on the convex cone C = R, as well as quasi-concave (being increasing) and ≥ 0. Yet, f is neither concave (it is actually convex) nor null-superadditive.

Also the condition f ≥ 0 is needed: the function f : R → R given by

f(x) = 2x   if x ≥ 0
f(x) = x    if x < 0

is null-superadditive (since f(x) = 0 only if x = 0), increasing (so, quasi-concave) and positively homogeneous. Nonetheless, it is not concave (it is convex!).

Under continuity, we can establish an alternative version of the last result.

Proposition 878 Let f : C ⊆ Rⁿ → R be a positively homogeneous and continuous function defined on a convex cone with non-empty interior. If f > 0 on int C, then f is concave if and only if it is quasi-concave.

The proof relies on the next lemma, of some independent interest.

Lemma 879 Let C ⊆ Rⁿ be a convex set with int C ≠ ∅. If x ∈ C and y ∈ int C, then (1 − λ)x + λy ∈ int C for all λ ∈ (0, 1). Moreover, int C and C have the same closure.

Proof Consider λ ∈ (0, 1), x ∈ C and y ∈ int C. Since y ∈ int C, there exists ε > 0 such that B_ε(y) ⊆ C. Define δ = λε > 0. We next show that B_δ((1 − λ)x + λy) ⊆ C, which will prove that (1 − λ)x + λy ∈ int C. To do so, consider z ∈ B_δ((1 − λ)x + λy) and define

w = (z − (1 − λ)x)/λ

Observe that z = (1 − λ)x + λw. By definition of w, we have

‖y − w‖ = ‖y − (z − (1 − λ)x)/λ‖ = ‖λy + (1 − λ)x − z‖/λ < δ/λ = ε

So, w ∈ B_ε(y). Since w then belongs to C, we have z = (1 − λ)x + λw ∈ C.

Next, since int C ⊆ C, the closure of int C is contained in the closure of C. For the converse inclusion, consider a generic x ∈ C and some y ∈ int C and, for each n ≥ 1, define λₙ = 1/n ∈ (0, 1) and zₙ = (1 − λₙ)x + λₙy. By the previous part of the proof, the sequence {zₙ} belongs to int C, and it converges to x. Since x was arbitrarily chosen, this implies that C is contained in the closure of int C, and hence so is the closure of C. The two closures thus coincide.

Proof of Proposition 878 We only prove the "if", the converse being obvious. Let f be quasi-concave, with f > 0 on int C. By proceeding as in the last proof when we studied the case f(x) > 0 and f(y) > 0, we can show that f(x + y) ≥ f(x) + f(y) for all x, y ∈ int C.

Next, consider x, y ∈ int C and λ ∈ (0, 1). By Lemma 879 and since 0 ∈ C and C is convex, it follows that (1 − λ)y = λ0 + (1 − λ)y ∈ int C. A similar argument yields that λx ∈ int C. By the previous part, it follows that f(λx + (1 − λ)y) ≥ f(λx) + f((1 − λ)y) = λf(x) + (1 − λ)f(y), that is, f is concave on int C. By Lemma 879 and since C is convex, C is contained in the closure of int C. The continuity of f then implies its concavity on C.

Let us illustrate a couple of noteworthy applications of the last two results. In both applications, we will use these results to establish the concavity of some classic functions by showing their positivity, quasi-concavity and positive homogeneity. This route is far simpler than verifying concavity directly.

Corollary 880 (i) The CES production function is concave if 0 < ρ ≤ 1. (ii) The Cobb-Douglas production function is concave as long as ∑ᵢ₌₁ⁿ aᵢ = 1.

Proof of Corollary 880 (i) For ρ = 1 the statement is obvious. If ρ < 1, note that on R₊ the power function x^ρ is concave if ρ ∈ (0, 1). Hence, also g(x) = αx₁^ρ + (1 − α)x₂^ρ is concave. Since h(x) = x^{1/ρ} is strictly increasing on R₊ for any ρ > 0, it follows that f = h ∘ g is quasi-concave and, being increasing, null-superadditive. Since f ≥ 0 and thanks to Theorem 877, we conclude that f is concave, as we have previously shown its positive homogeneity.

(ii) The Cobb-Douglas function f is quasi-concave (Example 848). Since f is continuous and is > 0 on Rⁿ₊₊, Proposition 878 implies that f is concave on Rⁿ₊, as we have already seen that f is positively homogeneous whenever ∑ᵢ₌₁ⁿ aᵢ = 1.⁴

We close by observing that a homogeneous function f : C ⊆ Rⁿ → R cannot be strictly concave: indeed, for any λ ∈ [0, 1] and any x ∈ C we have

f(λx + (1 − λ)0) = f(λx) = λf(x) = λf(x) + (1 − λ)f(0)

because f(0) = 0 (cf. Example 817).

4 As the reader may check, also Theorem 877 could have been invoked.

18.3 Homotheticity

18.3.1 Semicones

For the sake of simplicity, until now we have considered convex sets containing the origin 0, and cones in particular. To introduce the notions of this final section such an assumption becomes too cumbersome to maintain, so we will consider the following generalization of the notion of cone.

Definition 881 A set C in Rⁿ is said to be a semicone if, for every x ∈ C, we have λx ∈ C for any λ > 0.⁵

Unlike the definition of cone, here we require that λx belong to C only for λ > 0 rather than for λ ≥ 0. A cone is thus, a fortiori, a semicone. However, the converse does not hold: the set Rⁿ₊₊ is a notable example of a semicone that is not a cone.

Lemma 882 A semicone C is a cone if and only if 0 ∈ C.

Therefore, semicones do not necessarily contain the origin and, when they do, they automatically become cones. In any case, the origin is always in the surroundings of a (non-empty) semicone:

Lemma 883 If C is a non-empty semicone, then 0 belongs to the closure of C.

The easy proofs of the above lemmas are left to the reader. The last lemma, in particular, leads to the following result.

Proposition 884 A closed semicone is a cone.

The distinction between cones and semicones thus disappears when considering closed sets. Finally, the following version of Proposition 859 holds for semicones, with coefficients that are now required to have a strictly positive sum, as the reader can check.

Proposition 885 A set C in Rⁿ is a convex semicone if and only if

x, y ∈ C ⟹ αx + βy ∈ C

for all α, β ≥ 0 with α + β > 0.

Proof "Only if". Consider α, β ≥ 0 such that α + β > 0 and x, y ∈ C. Since C is convex and the coefficients α/(α + β) and β/(α + β) belong to [0, 1] and sum to 1, we have (α/(α + β))x + (β/(α + β))y ∈ C. Since C is a semicone and α + β > 0, we have αx + βy = (α + β)[(α/(α + β))x + (β/(α + β))y] ∈ C.

"If". Consider x, y ∈ C. Given α ∈ [0, 1], define β = 1 − α; then β ≥ 0 and α + β = 1 > 0, so αx + (1 − α)y = αx + βy ∈ C, proving that C is convex. Similarly, given λ > 0, take the pair (α, β) = (λ, 0); then α + β = λ > 0 and λx = αx + βy ∈ C, proving that C is a semicone.

5 This terminology is not standard.

Example 886 (i) The two half-lines (−∞, 0) and (0, ∞) are semicones in R (but they are not cones). (ii) The set Rⁿ₊₊ = {x ∈ Rⁿ : x ≫ 0} of the strongly positive vectors is a convex semicone (which is not a cone). N

The notion of positive homogeneity can be easily extended to semicones.

Definition 887 A function f : C ⊆ Rⁿ → R defined on a semicone C is said to be positively homogeneous if

f(λx) = λf(x)   (18.6)

for all x ∈ C and all λ > 0.

The next result shows that this notion is consistent with what we did so far.

Lemma 888 Let f : C ⊆ Rⁿ → R be a positively homogeneous function on a semicone C. If 0 ∈ C, then f(0) = 0.

Proof If 0 ∈ C, then for every λ > 0 we have f(0) = f(λ0) = λf(0). Hence, f(0) = 0.

Thus, when the semicone is actually a cone, i.e., when it contains the origin (Lemma 882), we get back the notion of positive homogeneity on cones of the previous section. Everything fits together.

Example 889 Consider the function f : Rⁿ₊₊ → R given by f(x) = e^{∑ᵢ₌₁ⁿ aᵢ log xᵢ}, with aᵢ > 0. If ∑ᵢ₌₁ⁿ aᵢ = 1, the function is positively homogeneous. Indeed, for any λ > 0 we have

f(λx) = e^{∑ᵢ₌₁ⁿ aᵢ log(λxᵢ)} = e^{∑ᵢ₌₁ⁿ aᵢ(log λ + log xᵢ)} = e^{log λ} e^{∑ᵢ₌₁ⁿ aᵢ log xᵢ} = λf(x)   N

18.3.2 Homotheticity and utility

The following ordinal version of positive homogeneity is used in consumer theory.

Definition 890 A function f : C ⊆ Rⁿ → R defined on a semicone is said to be homothetic if

f(x) = f(y) ⟹ f(λx) = f(λy)

for every x, y ∈ C and every λ > 0.

In particular, a utility function u is homothetic whenever the ordering between consumption bundles x and y is preserved when both bundles are multiplied by the same positive constant λ. By doubling (tripling, and so on) vectors, their ranking is not altered. In preferential terms:

x ∼ y ⟹ λx ∼ λy   ∀λ > 0

This property can be interpreted, in some applications, as invariance with respect to a measurement scale.

Homotheticity has a mathematically simple, yet economically important, characterization.

Proposition 891 A continuous and strongly increasing function h : Rⁿ₊ → R is homothetic if and only if

h = f ∘ g

with g : Rⁿ₊ → R positively homogeneous and f : Im g → R strictly increasing.

In other words, a continuous and strongly increasing function is homothetic if and only if it is a strictly increasing transformation of a positively homogeneous function.⁶ In particular, homogeneous functions themselves are homothetic because f(x) = x is, trivially, strictly increasing.

In sum, homotheticity can be seen as the ordinal version of positive homogeneity. As such, it is the version relevant in ordinal utility theory.

Example 892 Let u : Rⁿ₊ → R be the Cobb-Douglas utility function u(x) = ∏ᵢ₌₁ⁿ xᵢ^{aᵢ}, with aᵢ > 0 and ∑ᵢ₌₁ⁿ aᵢ = 1. It follows from Example 869 that such a function is positively homogeneous. If f is strictly increasing, the transformations f ∘ u of the Cobb-Douglas utility function are homothetic. For example, if we consider the restriction of u to the semicone Rⁿ₊₊ (where it is still positively homogeneous) and the logarithmic transformation f(x) = log x, we obtain the log-linear utility function v = log u given by v(x) = ∑ᵢ₌₁ⁿ aᵢ log xᵢ, which is thus homothetic. N
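The defining implication of homotheticity can be probed numerically. In this illustrative Python sketch, two bundles on the same indifference curve of the log-linear utility (our arbitrary choice a₁ = a₂ = 1/2) remain indifferent after scaling:

```python
import math

a = (0.5, 0.5)

def v(x):   # log-linear (homothetic) utility
    return a[0] * math.log(x[0]) + a[1] * math.log(x[1])

# Bundles on the same indifference curve: x1 * x2 = 4 gives v = 0.5 * log 4.
x, y = (1.0, 4.0), (2.0, 2.0)
assert math.isclose(v(x), v(y), abs_tol=1e-12)

for lam in [0.5, 2.0, 7.3]:
    vx = v((lam * x[0], lam * x[1]))
    vy = v((lam * y[0], lam * y[1]))
    assert math.isclose(vx, vy, abs_tol=1e-12)   # scaled bundles stay indifferent
print("v(lambda x) = v(lambda y) whenever v(x) = v(y)")
```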

Proof The "if" part is simple. For, if h(x) = h(y) then g(x) = g(y) because f is strictly increasing. Thus, g(λx) = λg(x) = λg(y) = g(λy) for all λ > 0, yielding that h(λx) = f(g(λx)) = f(g(λy)) = h(λy) for all λ > 0, that is, h is homothetic.

The "only if" part is more involved. Set 1 = (1, ..., 1) ∈ Rⁿ₊. We first show that for each x ∈ Rⁿ₊ there exists a unique λₓ ≥ 0 such that h(x) = h(λₓ1). Note that for each x ∈ Rⁿ₊ there exist α, β ≥ 0 such that β1 ≤ x ≤ α1. Since h is strongly increasing, we have

h(β1) ≤ h(x) ≤ h(α1)

Define the map φ : [0, 1] → R by φ(θ) = h(θ(α1) + (1 − θ)(β1)) for all θ ∈ [0, 1]. Since h is continuous, φ is continuous (why?) and such that φ(0) = h(β1) ≤ h(x) ≤ h(α1) = φ(1). By the Intermediate Value Theorem, there exists θ* ∈ [0, 1] such that h(x) = φ(θ*) = h(θ*(α1) + (1 − θ*)(β1)) = h((θ*α + (1 − θ*)β)1). Define λₓ = θ*α + (1 − θ*)β ≥ 0. It is immediate that h(x) = h(λₓ1). Next, assume that λ ≥ 0 is such that h(x) = h(λ1). Since h is strongly increasing, note that if λ > λₓ (resp., λ < λₓ), then h(x) = h(λₓ1) < h(λ1) = h(x) (resp., h(x) = h(λₓ1) > h(λ1) = h(x)), a contradiction. Thus, λ is equal to λₓ, proving the uniqueness of λₓ.

Define now g : Rⁿ₊ → R by g(x) = λₓ. Since for each x in Rⁿ₊ the value λₓ is unique, g is well defined. By construction, we have

g(x) ≥ g(y) ⟺ λₓ ≥ λ_y ⟺ λₓ1 ≥ λ_y1 ⟺ h(λₓ1) ≥ h(λ_y1) ⟺ h(x) ≥ h(y)

By Proposition 265, there exists a strictly increasing function f : Im g → R such that h = f ∘ g. We are left to show that g is positively homogeneous. Consider λ > 0 and x ∈ Rⁿ₊. Since h is homothetic and h(x) = h(λₓ1), we have h(λx) = h(λλₓ1). Since g(λx) is the unique number μ ≥ 0 such that h(λx) = h(μ1), we conclude that g(λx) = λλₓ = λg(x), proving that g is positively homogeneous.

6 Let the reader be reminded that the same does not hold for quasi-concavity: as previously noted, there are quasi-concave functions which are not transformations of concave functions.

The assumptions above on h play an important role, as the next example shows.

Example 893 Define h : R²_+ → [0, ∞) by

h(x) = 1 if x_1 > 0, and h(x) = 0 if x_1 = 0

Note that R²_+ is a semicone and that h is homothetic. Indeed, given λ > 0, if h(x) = h(y), then either h(x) = h(y) = 1 and x_1, y_1 > 0 or h(x) = h(y) = 0 and x_1 = y_1 = 0. In the first case, the first components of λx and λy are still strictly positive and h(λx) = 1 = h(λy). In the second case, the first components of λx and λy are still null and h(λx) = 0 = h(λy). The homothetic function h, which is neither continuous nor strongly increasing, cannot be obtained as a composition f ∘ g where f : Im g → R is strictly increasing and g is positively homogeneous. Otherwise, consider 1 = (1,1). Since h(1) = 1 > 0 = h(0) and f is strictly increasing, we would have g(1) > g(0). Since g is positively homogeneous, we could conclude that g(0) = 0. This would imply that g(1) > 0 and g(2,2) = 2g(1) = g(1) + g(1) > g(1), which would yield 1 = h(2,2) > h(1) = 1, a contradiction. N
Chapter 19

Lipschitz functions

19.1 Global control

Lipschitz functions are an important class of functions whose definition, unlike that of concave functions, does not rely on the vector structure of R^n but only on its metric structure. Yet, we will see that Lipschitzianity sheds light on the continuity properties of linear and concave functions. (This chapter and the next one are for coda readers. They use some basic differential calculus notions that will be introduced later in the book.)

We begin with the definition, which is stated directly in terms of operators.

Definition 894 An operator f : A ⊆ R^n → R^m is said to be Lipschitz on a subset B of A if there exists a scalar k > 0 such that

‖f(x_1) − f(x_2)‖ ≤ k ‖x_1 − x_2‖   ∀x_1, x_2 ∈ B   (19.1)

A function is called Lipschitz, without further qualifications, when the inequality (19.1) holds on the entire domain of the function. When f is a (real-valued) function, this inequality takes the simpler form

|f(x_1) − f(x_2)| ≤ k ‖x_1 − x_2‖

where on the left-hand side we have the absolute value in place of the norm.

In a Lipschitz operator, the distance ‖f(x_1) − f(x_2)‖ between the images of two vectors x_1 and x_2 is controlled, through a positive coefficient k, by the distance ‖x_1 − x_2‖ between the vectors x_1 and x_2 themselves. This "variation control" that the independent variable exerts on the dependent variable is at the heart of Lipschitzianity. The rein is especially tight when k < 1, so that variations in the independent variable cause strictly smaller variations of the dependent variable. In this case, the Lipschitz operator is called a contraction.

The control nature of Lipschitzianity translates into a strong form of continuity. To see how, first note that Lipschitz operators are continuous. Indeed, let x_0 ∈ A. If x_n → x_0, we have:

‖f(x_n) − f(x_0)‖ ≤ k ‖x_n − x_0‖ → 0   (19.2)

and hence f(x_n) → f(x_0). So, f is continuous at x_0. More is true:

Lemma 895 Lipschitz operators are uniformly continuous.



The converse is false, as Example 897 will show momentarily. Because of its control
nature, Lipschitzianity thus embodies a stronger form of continuity than the uniform one.

Proof For each ε > 0, take 0 < δ_ε < ε/k. Then, ‖f(x) − f(y)‖ ≤ k‖x − y‖ < ε for each x, y ∈ A such that ‖x − y‖ < δ_ε.

Example 896 A continuously differentiable function f : [a,b] → R is Lipschitz. Indeed, set k = max_{x∈[a,b]} |f′(x)|. Since the derivative f′ is continuous on [a,b], by Weierstrass' Theorem the constant k is well defined. Let x, y ∈ [a,b] with x ≠ y. By the Mean Value Theorem, there exists c between x and y such that

(f(x) − f(y)) / (x − y) = f′(c)

Hence,

|f(x) − f(y)| / |x − y| = |f′(c)| ≤ k

So, f is Lipschitz. N
Example 897 The continuous function f : [0, ∞) → R defined by f(x) = √x is not Lipschitz. Indeed,

lim_{x→0⁺} (f(x) − f(0)) / (x − 0) = lim_{x→0⁺} √x / x = lim_{x→0⁺} 1/√x = +∞

So, setting y = 0, there is no k > 0 such that |f(x) − f(y)| ≤ k|x − y| for each x, y ≥ 0.

That said, the previous example shows that f is Lipschitz on each interval [a,b] with a > 0. So f is not Lipschitz on its entire domain, but it is on suitable subsets of it. More interestingly, by Theorem 603 the function f is uniformly continuous on each interval [0,b], with b > 0, but it is not Lipschitz on [0,b]. This also shows that the converse of the last lemma does not hold. N
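To make the contrast concrete, here is a small Python check (our own sketch, not from the book): the difference quotients of √x blow up near 0, while on [a,b] with a > 0 they stay below the bound k = max|f′| = 1/(2√a) from Example 896.

    import math

    def quot(x, y):                           # difference quotient of sqrt
        return abs(math.sqrt(x) - math.sqrt(y)) / abs(x - y)

    for x in (1e-2, 1e-4, 1e-8):              # explosion near the origin
        print(x, quot(x, 0.0))                # grows like 1/sqrt(x)

    a, b, n = 0.25, 1.0, 100                  # away from 0: Lipschitz
    k = 1 / (2 * math.sqrt(a))
    pts = [a + (b - a) * i / n for i in range(n + 1)]
    assert all(quot(s, t) <= k + 1e-12 for s in pts for t in pts if s != t)
    print("Lipschitz on [0.25, 1] with k =", k)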

Next we present a remarkable class of Lipschitz operators.

Theorem 898 Linear operators are Lipschitz.

The theorem is a consequence of the following lemma of independent interest.

Lemma 899 Given a linear operator f : R^n → R^m, there exists a constant k > 0 such that ‖f(x)‖ ≤ k‖x‖ for every x ∈ R^n.

So, if x ≠ 0 we have

‖f(x)‖ / ‖x‖ ≤ k

The ratio ‖f(x)‖/‖x‖ is thus bounded above by a constant k, so it cannot explode, for all non-zero vectors x. In other words, there is no sequence {x_n} of vectors such that ‖f(x_n)‖/‖x_n‖ → +∞.

Proof Set k = ∑_{i=1}^n ‖f(e^i)‖. We have:

‖f(x)‖ = ‖f(∑_{i=1}^n x_i e^i)‖ = ‖∑_{i=1}^n x_i f(e^i)‖ ≤ ∑_{i=1}^n |x_i| ‖f(e^i)‖

Let x = (x_1, ..., x_n) ∈ R^n. For every j = 1, ..., n we have:

|x_j| = √(x_j²) ≤ √(∑_{j=1}^n x_j²) = ‖x‖   (19.3)

So, |x_i| ≤ ‖x‖ for each i = 1, ..., n. Therefore,

∑_{i=1}^n |x_i| ‖f(e^i)‖ ≤ ∑_{i=1}^n ‖x‖ ‖f(e^i)‖ = ‖x‖ ∑_{i=1}^n ‖f(e^i)‖ = k ‖x‖

which implies ‖f(x)‖ ≤ k‖x‖, as desired.

Proof of Theorem 898 Let x, y ∈ R^n. Since f is linear, the last lemma implies

‖f(x) − f(y)‖ = ‖f(x − y)‖ ≤ k ‖x − y‖

So, f is Lipschitz.
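The constant k = ∑‖f(e^i)‖ from the proof of Lemma 899 can be computed directly. The following Python sketch (an illustration under an arbitrary choice of matrix, not from the book) verifies the Lipschitz inequality on random pairs of vectors.

    import random

    A = [[1.0, -2.0, 0.5],
         [0.0,  3.0, 1.0]]                    # a linear operator from R^3 to R^2

    def f(x):
        return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

    def norm(v):
        return sum(t * t for t in v) ** 0.5

    n = 3
    basis = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]
    k = sum(norm(f(e)) for e in basis)        # k = sum of ||f(e^i)||

    random.seed(0)
    for _ in range(1000):
        x = [random.uniform(-5, 5) for _ in range(n)]
        y = [random.uniform(-5, 5) for _ in range(n)]
        lhs = norm([u - v for u, v in zip(f(x), f(y))])
        assert lhs <= k * norm([u - v for u, v in zip(x, y)]) + 1e-9
    print("||f(x) - f(y)|| <= k ||x - y|| verified with k =", k)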

19.2 Local control


Lipschitzianity is a global property because the constant k in (19.1) is required to be the
same for each pair of vectors x and y in B. It is, however, possible to give a local version of
Lipschitzianity.

Definition 900 An operator f : A ⊆ R^n → R^m is said to be locally Lipschitz at a point x_0 ∈ A if there exist a neighborhood B_ε(x_0) and a scalar k_{x_0} > 0 such that

‖f(x) − f(y)‖ ≤ k_{x_0} ‖x − y‖   ∀x, y ∈ B_ε(x_0) ∩ A

Note the local nature of this definition: the constant k_{x_0} depends on the point x_0 at hand and the inequality is required only between points of a neighborhood of x_0 (not between any two points of the domain of f).

When f is locally Lipschitz at each point of a set B we say that it is locally Lipschitz on B. If B is the entire domain, we say that the operator is locally Lipschitz, without further qualifications.
Now, the "variation control" that the independent variable exerts on the dependent variable is only local, in a neighborhood of a given point. This local control still translates into a strong form of continuity at a point (with k_{x_0} in place of k, (19.2) still holds as x_n → x_0), but no longer across points, as was the case with global Lipschitzianity.

Example 901 A function f : [a,b] → R is locally Lipschitz at x_0 ∈ (a,b) if there is a neighborhood B_ε(x_0) ⊆ [a,b] on which f is continuously differentiable. Indeed, set

k_{x_0} = max_{x ∈ [x_0−ε′, x_0+ε′]} |f′(x)|

where 0 < ε′ < ε. Since the derivative f′ is continuous on [x_0−ε′, x_0+ε′], by Weierstrass' Theorem the constant k_{x_0} is well defined. By proceeding as in Example 896, mutatis mutandis, the reader can then check that f is locally Lipschitz at x_0. N

Clearly, an operator that is Lipschitz on B is also locally Lipschitz on B. The converse fails, as the next example shows.

Example 902 The function f : R → R defined by f(x) = x² is easily seen to be locally Lipschitz at each x ∈ R. But f is not Lipschitz. Otherwise, there would exist k > 0 such that |x² − y²| ≤ k|x − y| for all x, y ∈ R. So, |x + y| ≤ k for all x, y ∈ R, which is impossible. N

There is, however, an important case where local and global Lipschitzianity are equivalent.

Proposition 903 An operator f : A ⊆ R^n → R^m is Lipschitz on a compact set K ⊆ A if and only if it is locally Lipschitz on K.

Proof Since the "only if" is obvious, we only prove the "if." Assume that f is locally Lipschitz on K. Suppose, by contradiction, that f is not Lipschitz on K. Then there exist two sequences {x_n} and {y_n} in K such that

‖f(x_n) − f(y_n)‖ / ‖x_n − y_n‖ → +∞   (19.4)

Since K is compact, by the Bolzano-Weierstrass Theorem there exist two subsequences {x_{n_k}} and {y_{n_k}} such that x_{n_k} → x̄ ∈ K and y_{n_k} → ȳ ∈ K. Since f is continuous, we have f(x_{n_k}) → f(x̄) and f(y_{n_k}) → f(ȳ). We consider two cases.

(i) Suppose x̄ ≠ ȳ. Then, ‖x̄ − ȳ‖ > 0 and so

lim_{k→∞} ‖f(x_{n_k}) − f(y_{n_k})‖ / ‖x_{n_k} − y_{n_k}‖ = ‖f(x̄) − f(ȳ)‖ / ‖x̄ − ȳ‖ < +∞

which contradicts (19.4).

(ii) Suppose x̄ = ȳ. By hypothesis, f is locally Lipschitz at x̄, so there is B_ε(x̄) such that

‖f(x) − f(y)‖ ≤ k_{x̄} ‖x − y‖   ∀x, y ∈ B_ε(x̄) ∩ A

Since x_{n_k} → x̄ and y_{n_k} → x̄, there is a large enough k_ε ≥ 1 so that x_{n_k}, y_{n_k} ∈ B_ε(x̄) for all k ≥ k_ε. Then,

‖f(x_{n_k}) − f(y_{n_k})‖ / ‖x_{n_k} − y_{n_k}‖ ≤ k_{x̄}   ∀k ≥ k_ε

which contradicts (19.4).

In both cases, we thus end up with a contradiction. We conclude that f is Lipschitz on K.

The next important result shows that concave functions are locally Lipschitz, thus clarifying the continuity properties of these fundamental functions.

Theorem 904 A concave function f : C → R defined on an open convex set C of R^n is locally Lipschitz.

In view of Proposition 903, f is then Lipschitz on each compact set K ⊆ C. The theorem is a consequence of the following lemma of independent interest.

Lemma 905 A function continuous at an interior point of its domain is locally bounded at
that point.

Proof Let f : A ⊆ R^n → R be continuous at x_0 ∈ int A. We want to show that f is locally bounded at x_0, i.e., there exist a scalar m_{x_0} > 0 and a neighborhood B_δ(x_0) such that |f(x)| ≤ m_{x_0} for all x ∈ B_δ(x_0). By the definition of continuity, lim_{x→x_0} f(x) = f(x_0). So, for ε = 1 there is δ > 0 such that ‖x − x_0‖ < δ implies |f(x) − f(x_0)| < 1. For all x that belong to the neighborhood B_δ(x_0) we thus have

|f(x)| = |f(x) − f(x_0) + f(x_0)| ≤ |f(x) − f(x_0)| + |f(x_0)| ≤ 1 + |f(x_0)|

By setting m_{x_0} = 1 + |f(x_0)|, we conclude that f is locally bounded at x_0.

Example 906 (i) The continuous function f : (0,1) → R defined by f(x) = 1/x is unbounded but locally bounded at each point of its domain. (ii) The function f : R → R defined by

f(x) = log|x| if x ≠ 0, and f(0) = 0

is neither continuous nor locally bounded at the origin. N

Proof of Theorem 904 We want to show that f is locally Lipschitz at any x ∈ C. By the last lemma, f is locally bounded at x, i.e., there exist m_x > 0 and a neighborhood B_{2ε}(x), without loss of generality of radius 2ε, such that |f(y)| ≤ m_x for all y ∈ B_{2ε}(x). Given distinct y_1, y_2 ∈ B_ε(x), set

y_3 = y_2 + (ε / ‖y_2 − y_1‖)(y_2 − y_1)

Then, y_3 ∈ B_{2ε}(x) since

‖y_3 − x‖ ≤ ‖y_3 − y_2‖ + ‖y_2 − x‖ = ε + ‖y_2 − x‖ < 2ε

Since

y_2 = (ε / (‖y_2 − y_1‖ + ε)) y_1 + (‖y_2 − y_1‖ / (‖y_2 − y_1‖ + ε)) y_3

concavity implies

f(y_2) ≥ (ε / (‖y_2 − y_1‖ + ε)) f(y_1) + (‖y_2 − y_1‖ / (‖y_2 − y_1‖ + ε)) f(y_3)

so that

f(y_1) − f(y_2) ≤ (‖y_2 − y_1‖ / (‖y_2 − y_1‖ + ε)) (f(y_1) − f(y_3)) ≤ (‖y_2 − y_1‖ / ε) 2m_x   (19.5)

Interchanging the roles of y_1 and y_2 (with the corresponding auxiliary point), we get

f(y_2) − f(y_1) ≤ (‖y_1 − y_2‖ / ε) 2m_x

Along with (19.5), this implies

|f(y_1) − f(y_2)| ≤ (2m_x / ε) ‖y_1 − y_2‖

So, f is locally Lipschitz at x.

19.3 Translation invariance

Throughout this section, given a scalar k ∈ R, we write k1 = (k, ..., k) ∈ R^n for the constant vector with all components equal to k.

Definition 907 A function f : R^n → R, with f(1) ≠ 0, is said to be:

(i) normalized if f(k1) = k for all k ∈ R;

(ii) translation invariant if, for all x ∈ R^n,

f(x + k1) = f(x) + k f(1)   ∀k ≥ 0   (19.6)

(iii) Blackwell if (19.6) holds with ≤ in place of =, that is, f(x + k1) ≤ f(x) + k f(1) for all x ∈ R^n and k ≥ 0 (this terminology is not standard: such functions are named after David Blackwell, who showed their great importance in dynamic programming).

In words, a function is translation invariant if we can "take out positive constants", a very weak form of linearity. Indeed, if f is linear we can take out any function, a much stronger property. Even less is required of Blackwell functions.

Note that a translation invariant f is normalized provided f(0) = 0 and f(1) = 1. Indeed, by taking x = 0, we then have f(k1) = f(0 + k1) = f(0) + k f(1) = k.
Before presenting an example, we next show that translation invariance is a stronger notion than it may appear prima facie.

Lemma 908 A function f : R^n → R is translation invariant if and only if condition (19.6) holds for all scalars k ∈ R.

So, even if in the definition we only require invariance with respect to positive constants, it actually holds for any constant, positive or not.

Proof We only prove the "only if", the converse being trivial. Let f : R^n → R be translation invariant. We only need to prove that (19.6) also holds when k < 0. If k < 0, then −k > 0. By translation invariance with x + k1 in place of x, we have

f(x) = f((x + k1) − k1) = f(x + k1) − k f(1)

that is, by rearranging, f(x + k1) = f(x) + k f(1), as desired.

Example 909 Define f : R^n → R by

f(x) = min_{i=1,...,n} l_i(x)

where each l_i : R^n → R is a linear function with l_i(1) = c ≠ 0. Clearly, f(0) = 0 and f(1) = c. The function f is translation invariant: for every x ∈ R^n and k ≥ 0 we have

f(x + k1) = min_{i=1,...,n} l_i(x + k1) = min_{i=1,...,n} (l_i(x) + kc) = kc + min_{i=1,...,n} l_i(x) = f(x) + k f(1)

It is normalized if and only if c = 1. Later in the book, Theorem 1564 will characterize this class of translation invariant functions. N
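A quick numerical confirmation (our own sketch, with arbitrarily chosen coefficient vectors) that the minimum of linear functions with l_i(1) = c is translation invariant:

    import random

    c = 2.0
    L = [[1.5, 0.5], [0.5, 1.5]]              # each coefficient vector sums to c

    def f(x):                                 # f(x) = min_i l_i(x)
        return min(sum(a * t for a, t in zip(row, x)) for row in L)

    random.seed(1)
    for _ in range(100):
        x = [random.uniform(-3, 3), random.uniform(-3, 3)]
        k = random.uniform(0, 3)
        assert abs(f([t + k for t in x]) - (f(x) + k * c)) < 1e-9
    print("f(x + k1) = f(x) + k f(1) verified, with f(1) = c =", c)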

Though translation invariance is much weaker than linearity, under monotonicity we still have Lipschitzianity. Actually, for this result it is enough that the function be Blackwell.

Proposition 910 An increasing Blackwell function is Lipschitz.

Proof First, note that since f is increasing we have f(1) > 0. Let x, y ∈ R^n. By (19.3), |x_i − y_i| ≤ ‖x − y‖ for each i = 1, ..., n. Therefore, max_{i=1,...,n} |x_i − y_i| ≤ ‖x − y‖, which implies x ≤ y + ‖x − y‖ 1. Since f is increasing and Blackwell, we then have

f(x) ≤ f(y + ‖x − y‖ 1) ≤ f(y) + ‖x − y‖ f(1)

So, f(x) − f(y) ≤ f(1) ‖x − y‖ for all x, y ∈ R^n. By exchanging the roles of x and y, we also have f(y) − f(x) ≤ f(1) ‖x − y‖ for all x, y ∈ R^n. We conclude that

|f(x) − f(y)| ≤ f(1) ‖x − y‖   ∀x, y ∈ R^n

as desired.

N.B. The proof shows that an increasing Blackwell function f is a contraction if and only if f(1) < 1. In applications, this is the most relevant case. O

Remarkably, as with positive homogeneity (Theorem 877), under translation invariance concavity and quasi-concavity are equivalent properties.

Theorem 911 A translation invariant function is concave if and only if it is quasi-concave.

Proof We only prove the "if", the converse being obvious. Let f be quasi-concave. By Lemma 908, translation invariance holds for all scalars. So, for all x ∈ R^n and t ∈ R,

f(x) ≥ t ⟺ f(x) − t ≥ 0 ⟺ f(x − (t/f(1)) 1) = f(x) − (t/f(1)) f(1) ≥ 0

So, for all x ∈ R^n,

x ∈ (f ≥ t) ⟺ x − (t/f(1)) 1 ∈ (f ≥ 0)   ∀t ∈ R

which implies:

(f ≥ t) = (f ≥ 0) + (t/f(1)) 1   ∀t ∈ R

(to be precise, the right-hand side is the sum of sets (f ≥ 0) + {(t/f(1)) 1} = {x + (t/f(1)) 1 : x ∈ (f ≥ 0)} in the sense of Section 21.4; later in the proof we also add upper contour sets). If t and s are any two scalars and λ ∈ (0,1), then, since the upper contour set (f ≥ 0) is convex by quasi-concavity, we have λ(f ≥ 0) + (1−λ)(f ≥ 0) = (f ≥ 0), and so

λ(f ≥ t) + (1−λ)(f ≥ s) = λ(f ≥ 0) + (1−λ)(f ≥ 0) + ((λt + (1−λ)s)/f(1)) 1 = (f ≥ 0) + ((λt + (1−λ)s)/f(1)) 1 = (f ≥ λt + (1−λ)s)

That is,

λ(f ≥ t) + (1−λ)(f ≥ s) = (f ≥ λt + (1−λ)s)   (19.7)

Take any two points x, y ∈ R^n and set f(x) = t and f(y) = s. Then, x ∈ (f ≥ t) and y ∈ (f ≥ s), so λx + (1−λ)y ∈ λ(f ≥ t) + (1−λ)(f ≥ s). By (19.7), λx + (1−λ)y ∈ (f ≥ λt + (1−λ)s), that is,

f(λx + (1−λ)y) ≥ λt + (1−λ)s = λf(x) + (1−λ)f(y)

So, f is concave.

Example 912 Define the (negative) log-exponential function f : R^n → R by

f(x) = −(1/θ) log ∑_{i=1}^n α_i e^{−β x_i}

where β, θ > 0, α_i > 0 and ∑_{i=1}^n α_i = 1. We have f(0) = 0 and f(1) = β/θ, as well as f(x + k1) = f(x) + k f(1) for all k ≥ 0. Hence, f is translation invariant, while it is normalized if and only if β = θ. The function f is easily seen to be increasing. It is also quasi-concave because log ∑_{i=1}^n α_i e^{−β x_i} is quasi-convex, being a strictly increasing transformation of the convex function ∑_{i=1}^n α_i e^{−β x_i} (which is a sum of convex functions). By the last two results, we conclude that f is concave and Lipschitz. It is a contraction if and only if 0 < β < θ. N

Observe that a translation invariant function f : R^n → R cannot be strictly concave if f(0) = 0: indeed, for any λ ∈ [0,1] we then have f(λ0 + (1−λ)1) = f((1−λ)1) = (1−λ) f(1) = λ f(0) + (1−λ) f(1), so the concavity inequality holds as an equality along the segment between 0 and 1.
After the last result, here is another property that positive homogeneity and translation invariance share. There is, indeed, a form of duality between these properties. Given a positively homogeneous function f : R^n_+ → [0, ∞), with f(x) > 0 for all x ≠ 0, define g : R^n → R by

g(x_1, ..., x_n) = log f(e^{x_1}, ..., e^{x_n})

Then, g is translation invariant and normalized. Indeed,

g(x_1 + k, ..., x_n + k) = log f(e^{x_1+k}, ..., e^{x_n+k}) = log f(e^{x_1} e^k, ..., e^{x_n} e^k) = log (e^k f(e^{x_1}, ..., e^{x_n})) = k + g(x)

Conversely, let g : R^n → R be translation invariant and normalized. Define f : R^n_+ → [0, ∞) by

f(x_1, ..., x_n) = e^{g(log x_1, ..., log x_n)} if x ∈ R^n_{++}, and f(x) = 0 otherwise

For all x ∈ R^n_{++} and λ > 0, we then have

f(λx_1, ..., λx_n) = e^{g(log λx_1, ..., log λx_n)} = e^{g(log x_1 + log λ, ..., log x_n + log λ)} = e^{g(log x_1, ..., log x_n) + log λ} = e^{g(log x_1, ..., log x_n)} e^{log λ} = λ f(x)

In turn, this easily implies that f is positively homogeneous.
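A concrete instance of this duality (our own illustration, not from the text): the geometric mean is positively homogeneous, and its log/exp conjugate g is the arithmetic mean, which is indeed translation invariant and normalized.

    import math, random

    n = 3

    def f(x):                                 # geometric mean: positively homogeneous
        p = 1.0
        for t in x:
            p *= t
        return p ** (1.0 / n)

    def g(x):                                 # g(x) = log f(e^{x_1}, ..., e^{x_n})
        return math.log(f([math.exp(t) for t in x]))

    random.seed(3)
    for _ in range(100):
        x = [random.uniform(-2, 2) for _ in range(n)]
        k = random.uniform(-2, 2)
        assert abs(g([t + k for t in x]) - (g(x) + k)) < 1e-9  # translation invariance
        assert abs(g(x) - sum(x) / n) < 1e-9                   # g is the arithmetic mean
    print("the conjugate of the geometric mean is the arithmetic mean")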


Chapter 20

Supermodular functions

20.1 Joins and meets

Given any two vectors x, y ∈ R^n, the join x ∨ y is the vector of R^n given by

(x ∨ y)_i = max{x_i, y_i}   ∀i = 1, ..., n

while the meet x ∧ y is the vector of R^n given by

(x ∧ y)_i = min{x_i, y_i}   ∀i = 1, ..., n

The join x ∨ y is thus the smallest vector that is larger than both x and y, while the meet x ∧ y is the largest vector that is smaller than both of them. That is, for all z ∈ R^n we have

z ≥ x and z ≥ y ⟹ z ≥ x ∨ y

and

z ≤ x and z ≤ y ⟹ z ≤ x ∧ y

Example 913 Let x = (0,1) and y = (2,0) be two vectors in the plane. We have

(x ∨ y)_1 = max{x_1, y_1} = max{0, 2} = 2,  (x ∨ y)_2 = max{x_2, y_2} = max{1, 0} = 1

so x ∨ y = (2,1), while

(x ∧ y)_1 = min{x_1, y_1} = min{0, 2} = 0,  (x ∧ y)_2 = min{x_2, y_2} = min{1, 0} = 0

so x ∧ y = (0,0). N

The next simple, yet key, property relates meets, joins and sums.

Proposition 914 For all x and y in Rn we have:

x + y = x ∨ y + x ∧ y   (20.1)


Proof The equality is trivially true if x and y are scalars: min{x,y} + max{x,y} = x + y. If x and y are vectors of R^n, we then have:

x ∧ y + x ∨ y = ((x ∧ y)_1, ..., (x ∧ y)_n) + ((x ∨ y)_1, ..., (x ∨ y)_n) = ((x ∧ y)_1 + (x ∨ y)_1, ..., (x ∧ y)_n + (x ∨ y)_n) = (x_1 + y_1, ..., x_n + y_n) = x + y

as desired.
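In code, joins and meets are just componentwise maxima and minima. The following Python sketch (our own) recomputes Example 913 and checks identity (20.1):

    def join(x, y):
        return tuple(max(a, b) for a, b in zip(x, y))

    def meet(x, y):
        return tuple(min(a, b) for a, b in zip(x, y))

    x, y = (0, 1), (2, 0)
    print(join(x, y), meet(x, y))             # (2, 1) and (0, 0), as in Example 913
    # identity (20.1): x + y = x v y + x ^ y, componentwise
    assert all(a + b == j + m
               for a, b, j, m in zip(x, y, join(x, y), meet(x, y)))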

Next we report some further basic properties of joins and meets.

Proposition 915 For all x, y and z in R^n we have:

(i) x + (y ∨ z) = (x + y) ∨ (x + z) and x + (y ∧ z) = (x + y) ∧ (x + z);

(ii) (x + z) ∨ (y + z) = (x ∨ y) + z and (x + z) ∧ (y + z) = (x ∧ y) + z;

(iii) λ(x ∨ y) = (λx) ∨ (λy) and λ(x ∧ y) = (λx) ∧ (λy) for all λ ≥ 0;

(iv) x ∨ y = −[(−x) ∧ (−y)] and x ∧ y = −[(−x) ∨ (−y)].

Proof We prove only the first equality in (i) and leave the other properties to the reader. For each component i we have

(x + (y ∨ z))_i = x_i + max{y_i, z_i} = max{x_i + y_i, x_i + z_i} = ((x + y) ∨ (x + z))_i

as desired.

Joins and meets permit us to associate three positive vectors to a vector x in R^n: its positive part x⁺, its negative part x⁻ and its modulus |x|, defined via the formulas

x⁺ = x ∨ 0,  x⁻ = −(x ∧ 0)  and  |x| = x ∨ (−x)

The modulus extends the absolute value to R^n by considering its order characterization (4.4), while the norm extended it using the algebraic characterization. In terms of components, we have

x⁺_i = max{x_i, 0},  x⁻_i = −min{x_i, 0}  and  |x_i| = max{x_i, −x_i}

In words, the components of x⁺ coincide with the positive ones of x and are 0 otherwise. Similarly, the components of x⁻ coincide, up to a change of sign, with the negative ones of x and are 0 otherwise. In contrast, the components of |x| are the absolute values of the components of x (for this reason, |x| is often called the absolute value of x). For instance, for x = (−1, 2, 4) ∈ R³ we have

x⁺ = (0, 2, 4),  x⁻ = (1, 0, 0)  and  |x| = (1, 2, 4)
Next we report some, easily checked, properties of these three vectors.

Proposition 916 For all x in R^n we have:

(i) x = x⁺ − x⁻ and x⁺ ∧ x⁻ = 0;

(ii) |x| = x⁺ + x⁻.

In particular, by (ii) we have |x| = 0 if and only if x = 0.

Example 917 Let x = (x_1, ..., x_n) ∈ R^n be a portfolio of n primary assets traded in a financial market, where x_i is the traded quantity of primary asset i. If x_i ≥ 0 the portfolio is long on asset i, that is, it buys x_i units of the asset. In contrast, if x_i ≤ 0 the portfolio is short on asset i, that is, it sells −x_i units of the asset.

The decomposition x = x⁺ − x⁻ can be interpreted as a trading strategy: the positive and negative parts x⁺ and x⁻ describe, respectively, the long and short positions that portfolio x involves, i.e., how much one has to buy and sell of each primary asset to form this portfolio. For instance, let x = (1, 2, −3) ∈ R³ be a portfolio in a market with three primary assets. We have x⁺ = (1, 2, 0) and x⁻ = (0, 0, 3), so to form portfolio x one has to buy one unit of the first asset and two units of the second one and to sell three units of the third asset. N
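The positive part, negative part and modulus are equally easy to compute; this Python fragment (our own sketch) recovers the long/short decomposition of the portfolio x = (1, 2, −3) and checks Proposition 916:

    def pos(x):  return tuple(max(t, 0) for t in x)      # x+ = x v 0
    def neg(x):  return tuple(max(-t, 0) for t in x)     # x- = -(x ^ 0)
    def mod(x):  return tuple(abs(t) for t in x)         # |x| = x v (-x)

    x = (1, 2, -3)
    print(pos(x), neg(x), mod(x))                        # (1, 2, 0) (0, 0, 3) (1, 2, 3)
    assert all(p - m == t for p, m, t in zip(pos(x), neg(x), x))       # x = x+ - x-
    assert all(p + m == a for p, m, a in zip(pos(x), neg(x), mod(x)))  # |x| = x+ + x-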

Some further important properties are given next.

Proposition 918 For all x, y and z in R^n we have:

(i) if x = z − y, with z, y ≥ 0, then x⁺ ≤ z and x⁻ ≤ y;

(ii) if x = z − y and z ∧ y = 0, then z = x⁺ and y = x⁻;

(iii) x ≤ y if and only if x⁺ ≤ y⁺ and x⁻ ≥ y⁻.

Proof (i) If x = z − y, then z ≥ x and z ≥ 0. Hence, z ≥ x ∨ 0 = x⁺. The other inequality is obtained in a similar way, since y = z − x ≥ −x and y ≥ 0 imply y ≥ (−x) ∨ 0 = x⁻.

(ii) From (i) we have x⁺ ≤ z and x⁻ ≤ y. For the converse inequalities, fix a component i. Since z ∧ y = 0, either z_i = 0 or y_i = 0. If y_i = 0, then x_i = z_i ≥ 0 and so z_i = x_i = x⁺_i; if z_i = 0, then x_i = −y_i ≤ 0 and so z_i = 0 = x⁺_i. In both cases z_i = x⁺_i, so z = x⁺ and, in turn, y = z − x = x⁺ − x = x⁻.

(iii) Let x ≤ y. Since y⁺ ≥ y ≥ x and y⁺ ≥ 0, we have y⁺ ≥ x ∨ 0 = x⁺. A similar argument proves that x⁻ ≥ y⁻. As to the converse, if x⁺ ≤ y⁺ and x⁻ ≥ y⁻, then x = x⁺ − x⁻ ≤ y⁺ − y⁻ = y.

The decomposition

x = x⁺ − x⁻   (20.2)

is thus the minimal decomposition of a vector x. It enjoys the natural monotonicity property (iii). The next example illustrates.

Example 919 We can construct a portfolio x also by buying and selling according to any pair of positive vectors x′ and x″ such that x = x′ − x″. In the last example we noted that to form portfolio x = (1, 2, −3) one has to buy and sell the amounts prescribed by x⁺ = (1, 2, 0) and x⁻ = (0, 0, 3), respectively. At the same time, this portfolio can also be formed by buying an extra unit of the third asset and by selling the same extra unit of that asset. In other words, we have that x = x′ − x″, where x′ = (1, 2, 1) and x″ = (0, 0, 4). By Proposition 918-(i), we have

x⁺ ≤ x′ and x⁻ ≤ x″

So, the positive and negative parts represent the minimal holdings of the primary assets needed to construct portfolio x. By Proposition 918-(iii), larger portfolios necessarily involve larger long and smaller short positions. N

Some inequalities that hold for absolute values extend to moduli.

Proposition 920 For all x and y in R^n we have:

(i) |x + y| ∨ |x − y| = |x| + |y|;

(ii) |x + y| ≤ |x| + |y|;

(iii) ||x| − |y|| ≤ |x − y|.

The reader can prove these inequalities by first establishing them for absolute values, something that for properties (ii) and (iii) was actually done in Section 4.1.2. We close this section by showing that moduli and norms are consistent in their ranking of vectors.

Proposition 921 For all x and y in R^n we have:

|x| ≤ |y| ⟹ ‖x‖ ≤ ‖y‖

Proof Just observe that |x| ≤ |y| implies |x_i| ≤ |y_i|, and so x_i² ≤ y_i², for each i = 1, ..., n.

In particular, we then have

‖|x|‖ = ‖x‖   ∀x ∈ R^n

that is, the norm of a vector is equal to that of its modulus.

20.2 Lattices

Joins and meets permit us to introduce lattices, an important class of sets.

Definition 922 A set L of R^n is a lattice if, for any two elements x and y of L, both x ∨ y and x ∧ y belong to L.

Lattices are, thus, subsets L of R^n that are closed under joins and meets, that is, both the join and the meet of any two of its elements belong to L.

Example 923 (i) Given any x, y ∈ R^n, the quadruple {x, y, x ∨ y, x ∧ y} is the simplest example of a finite lattice. (ii) Given any a, b ∈ R^n, with a ≤ b, the interval

[a, b] = {x ∈ R^n : a ≤ x ≤ b}

is clearly a lattice. Indeed, if a ≤ x ≤ b and a ≤ y ≤ b, it is easy to check that a ≤ x ∧ y ≤ x ∨ y ≤ b. Also the open and half-closed intervals in R^n are easily seen to be lattices. (iii) A rectangle I = I_1 × ⋯ × I_n in R^n, where each I_i is an interval of the real line (bounded or not), is a lattice. The intervals [a, b] are compact rectangles in which I_i = [a_i, b_i]. N

20.3 Supermodular functions

Next we introduce functions that have lattices as their natural domain (throughout the chapter, L denotes a lattice in R^n).

Definition 924 A function f : L ⊆ R^n → R is said to be:

(i) supermodular if f(x ∨ y) + f(x ∧ y) ≥ f(x) + f(y) for all x, y ∈ L;

(ii) submodular if the inequality is reversed;

(iii) modular if it is both supermodular and submodular.

Clearly, supermodularity and submodularity are dual notions, with f supermodular if and only if −f is submodular. In the rest of the chapter we will focus on supermodular functions.

Example 925 (i) Functions of a single variable are modular. Indeed, let x, y ∈ R with, say, x ≤ y. Then, x ∧ y = x and x ∨ y = y, so modularity trivially holds. (ii) Linear functions f : R^n → R are modular: by (20.1) we have

f(x ∨ y) + f(x ∧ y) = f(x ∨ y + x ∧ y) = f(x + y) = f(x) + f(y)

for all x, y ∈ R^n. (iii) The function f : R²_+ → R defined by f(x_1, x_2) = x_1 x_2 is supermodular, as the reader can check. N
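The supermodularity of f(x_1, x_2) = x_1 x_2 claimed in point (iii) can be checked by brute force; here is a small Python test (our own) on randomly sampled pairs:

    import random

    def f(x):
        return x[0] * x[1]

    def join(x, y): return tuple(max(a, b) for a, b in zip(x, y))
    def meet(x, y): return tuple(min(a, b) for a, b in zip(x, y))

    random.seed(4)
    for _ in range(10000):
        x = (random.uniform(0, 5), random.uniform(0, 5))
        y = (random.uniform(0, 5), random.uniform(0, 5))
        assert f(join(x, y)) + f(meet(x, y)) >= f(x) + f(y) - 1e-12
    print("f(x v y) + f(x ^ y) >= f(x) + f(y) on the whole sample")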

Interestingly, the modularity notions just introduced have no bite on functions of a single variable, so they are of interest only in the multivariable case. That said, the next two results show how to manufacture supermodular functions via convex transformations.

Proposition 926 Let f : L → R be a monotone and modular function. If φ : C → R is a convex function defined on a convex set C of the real line, with Im f ⊆ C, then φ ∘ f is supermodular.
supermodular.

Proof Let x, y ∈ L. By modularity, we have f(x ∨ y) − f(x) = f(y) − f(x ∧ y). We consider two cases. (i) Suppose that f is increasing. Then f(x ∧ y) ≤ f(x) and f(x ∨ y) − f(x) = f(y) − f(x ∧ y) ≥ 0. Since φ has increasing increments (cf. Proposition 1424), we then have φ(f(y)) − φ(f(x ∧ y)) ≤ φ(f(x ∨ y)) − φ(f(x)). So, φ ∘ f is supermodular. (ii) Suppose that f is decreasing. Now, f(x ∨ y) ≤ f(x) and f(x ∧ y) − f(x) = f(y) − f(x ∨ y) ≥ 0 and, since φ has increasing increments, we have φ(f(y)) − φ(f(x ∨ y)) ≤ φ(f(x ∧ y)) − φ(f(x)). We conclude that also in this case φ ∘ f is supermodular.

Example 927 Let f : R^n → R be a positive linear function. Given any convex function φ : R → R, the function φ ∘ f is supermodular. N

Proposition 928 Let f : L → R be an increasing and supermodular function. If φ : C → R is a convex and increasing function defined on a convex set C of the real line, with Im f ⊆ C, then φ ∘ f is supermodular.

Proof Let x, y ∈ L. Since f is increasing, we have f(x ∧ y) ≤ f(x) ≤ f(x ∨ y). Set k = f(x ∨ y) − f(x) and h = f(y) − f(x ∧ y). Since f is supermodular, we have k ≥ h, and h ≥ 0 because f is increasing. Since φ has increasing increments, we then have

φ(f(y)) − φ(f(x ∧ y)) = φ(f(x ∧ y) + h) − φ(f(x ∧ y)) ≤ φ(f(x) + h) − φ(f(x)) ≤ φ(f(x) + k) − φ(f(x)) = φ(f(x ∨ y)) − φ(f(x))

where the last inequality holds because φ is increasing and k ≥ h. So, φ ∘ f is supermodular.

Example 929 Define f : R²_+ → R by f(x_1, x_2) = x_1 x_2. Given any increasing and convex function φ : R_+ → R, the function φ ∘ f is supermodular. N

20.4 Functions with increasing cross differences

20.4.1 Sections

A function f : A_1 × A_2 → R defined on a Cartesian product A_1 × A_2 induces the functions f^{x_1} : A_2 → R defined by f^{x_1}(x_2) = f(x_1, x_2) for each x_1 ∈ A_1, as well as the functions f^{x_2} : A_1 → R defined by f^{x_2}(x_1) = f(x_1, x_2) for each x_2 ∈ A_2. These functions are called the sections of f.

Example 930 Consider the function f : [1, ∞) × [3, ∞) → R defined by f(x_1, x_2) = √((x_1 − 1)(x_2 − 3)). For a fixed x_1 ≥ 1, the section f^{x_1} : [3, ∞) → R has x_2 as the independent variable. For instance, if x_1 = 5 the section f⁵ : [3, ∞) → R is defined by f⁵(x_2) = 2√(x_2 − 3). On the other hand, for a fixed x_2 ≥ 3, the section f^{x_2} : [1, ∞) → R has x_1 as the independent variable. For example, if x_2 = 12 the section f¹² : [1, ∞) → R is defined by f¹²(x_1) = 3√(x_1 − 1). N

More in general, a function f : A_1 × ⋯ × A_n → R defined on a Cartesian product A_1 × ⋯ × A_n induces, for each i = 1, ..., n, the sections f^{x_i} : A_{−i} → R defined by f^{x_i}(x_{−i}) = f(x_i, x_{−i}), in which the vector x_{−i} is the variable (recall the notation x_{−i} from Section 14.1; here A_{−i} is the Cartesian product of all the sets {A_1, ..., A_n} except A_i, i.e., A_{−i} = ×_{j≠i} A_j).

On the other hand, rather than blocking a single variable, we can do the opposite: block all but a single variable. In this case, for each i = 1, ..., n we have the section f^{x_{−i}} : A_i → R defined by f^{x_{−i}}(x_i) = f(x_i, x_{−i}), in which the scalar x_i is the variable.

Example 931 Consider the function f : [1, ∞) × [3, ∞) × [2, ∞) → R defined by f(x_1, x_2, x_3) = √((x_1 − 1)(x_2 − 3)(x_3 − 2)). For a fixed x_1 ≥ 1, the section f^{x_1} : [3, ∞) × [2, ∞) → R now has x_2 and x_3 as the independent variables; indeed, we have x_{−1} = (x_2, x_3). For instance, if x_1 = 5 the section f⁵ : [3, ∞) × [2, ∞) → R is defined by f⁵(x_2, x_3) = 2√((x_2 − 3)(x_3 − 2)). In a similar way we can define the sections f^{x_2} : [1, ∞) × [2, ∞) → R and f^{x_3} : [1, ∞) × [3, ∞) → R.

On the other hand, if we fix x_{−1} = (x_2, x_3) ∈ [3, ∞) × [2, ∞), we have the section f^{x_2,x_3} : [1, ∞) → R that has x_1 as the independent variable. For instance, if x_2 = 6 and x_3 = 10, the section f^{6,10} : [1, ∞) → R is defined by f^{6,10}(x_1) = 2√6 · √(x_1 − 1). In a similar way we can define the sections f^{x_1,x_3} : [3, ∞) → R and f^{x_1,x_2} : [2, ∞) → R. N

The sections f^{x_{−i}} can be used to formalize ceteris paribus arguments in which all variables are kept fixed, except x_i. Indeed, partial differentiation at a point x ∈ R^n can be expressed in terms of these sections:

∂f(x)/∂x_i = lim_{h→0} (f^{x_{−i}}(x_i + h) − f^{x_{−i}}(x_i)) / h

In sum, we have sections f^{x_i} in which the variable x_i is kept fixed and the other variables vary, as well as sections f^{x_{−i}} in which the opposite holds: the variable x_i is the only independent variable, the other ones being kept fixed. In a similar spirit, we can have "intermediate" sections in which we block a subset of the variables.

Example 932 Consider the function f : [1, ∞) × [3, ∞) × [2, ∞) × [−1, ∞) → R defined by f(x_1, x_2, x_3, x_4) = √((x_1 − 1)(x_2 − 3)(x_3 − 2)(x_4 + 1)). The "intermediate" section f^{x_2,x_3} : [1, ∞) × [−1, ∞) → R has x_1 and x_4 as independent variables. So, if x_2 = 6 and x_3 = 5, we have f^{6,5}(x_1, x_4) = 3√((x_1 − 1)(x_4 + 1)). N

In terms of notation, the sections f^{x_{−i}} : A_i → R and f^{x_i} : A_{−i} → R are often written as f(·, x_{−i}) : A_i → R and f(x_i, ·) : A_{−i} → R, respectively. For instance, we then write

∂f(x)/∂x_i = lim_{h→0} (f(x_i + h, x_{−i}) − f(x)) / h

Though this notation is more handy, superscripts best emphasize the parametric role of the blocked variables.

20.4.2 Increasing cross differences and complementarity

In what follows we denote by I = I_1 × ⋯ × I_n a rectangle in R^n, where each interval I_i is bounded or not.

Definition 933 A function f : I ⊆ R^n → R has increasing (cross) differences if, for each x_i ∈ I_i and h_i ≥ 0 with x_i + h_i ∈ I_i, the difference

f^{x_i + h_i}(x_{−i}) − f^{x_i}(x_{−i})

is increasing in x_{−i}, while f has decreasing differences if such a difference is decreasing in x_{−i}.

Increasing and decreasing differences are dual notions, so we will focus on the former. For functions of two variables, we have a simple characterization of this property.

Proposition 934 A function f : I ⊆ R² → R of two variables has increasing differences if and only if

f(x_1, x_2 + h_2) − f(x_1, x_2) ≤ f(x_1 + h_1, x_2 + h_2) − f(x_1 + h_1, x_2)   (20.3)

for all x_i ∈ I_i and h_i ≥ 0 with x_i + h_i ∈ I_i.



Proof Let (x_1, x_2) ∈ I and (h_1, h_2) ≥ 0 with x_i + h_i ∈ I_i. By definition, f has increasing differences when the differences

f^{x_1 + h_1}(x_2) − f^{x_1}(x_2)  and  f^{x_2 + h_2}(x_1) − f^{x_2}(x_1)

are increasing in x_2 ∈ I_2 and in x_1 ∈ I_1, respectively. In particular, we then have

f^{x_1 + h_1}(x_2) − f^{x_1}(x_2) ≤ f^{x_1 + h_1}(x_2 + h_2) − f^{x_1}(x_2 + h_2)   (20.4)

and

f^{x_2 + h_2}(x_1) − f^{x_2}(x_1) ≤ f^{x_2 + h_2}(x_1 + h_1) − f^{x_2}(x_1 + h_1)   (20.5)

each of which is a rearrangement of (20.3).

The inequality (20.3) admits an important economic interpretation. If f is a production function, it says that the marginal contribution of increasing the second input from x_2 to x_2 + h_2 increases when we increase the first input from x_1 to x_1 + h_1. By rearranging the terms in the inequality (20.3) we have

f(x_1 + h_1, x_2) − f(x_1, x_2) ≤ f(x_1 + h_1, x_2 + h_2) − f(x_1, x_2 + h_2)

So, symmetrically, an increase in the first input has a higher impact when also the second input increases. In sum, the marginal contribution of an input is increasing in the other input: the two inputs are complements.

Proposition 935 A function f : I ⊆ R^n → R has increasing differences if and only if, for each 1 ≤ i ≠ j ≤ n, the section f^{x_{−ij}} : I_i × I_j ⊆ R² → R satisfies (20.3), i.e.,

f^{x_{−ij}}(x_i, x_j + h_j) − f^{x_{−ij}}(x_i, x_j) ≤ f^{x_{−ij}}(x_i + h_i, x_j + h_j) − f^{x_{−ij}}(x_i + h_i, x_j)   (20.6)

for all (x_i, x_j) ∈ I_i × I_j and h_i, h_j ≥ 0 with x_i + h_i ∈ I_i and x_j + h_j ∈ I_j.

In terms of the previous interpretation, we can say that a production function has increasing differences if and only if its inputs are pairwise complementary. Increasing differences thus model this form of complementarity. In a dual way, decreasing differences model an analogous form of substitutability.
Proof Assume that f has increasing differences. To fix ideas, let i = 1 and j = 2. We want to show that

f^{x_{−12}}(x_1, x_2 + h_2) − f^{x_{−12}}(x_1, x_2) ≤ f^{x_{−12}}(x_1 + h_1, x_2 + h_2) − f^{x_{−12}}(x_1 + h_1, x_2)

We have

f^{x_{−12}}(x_1, x_2 + h_2) − f^{x_{−12}}(x_1, x_2) = f(x_1, x_2 + h_2, x_3, ..., x_n) − f(x_1, x_2, x_3, ..., x_n)
= f^{x_2 + h_2}(x_1, x_3, ..., x_n) − f^{x_2}(x_1, x_3, ..., x_n)
≤ f^{x_2 + h_2}(x_1 + h_1, x_3, ..., x_n) − f^{x_2}(x_1 + h_1, x_3, ..., x_n)
= f(x_1 + h_1, x_2 + h_2, x_3, ..., x_n) − f(x_1 + h_1, x_2, x_3, ..., x_n)
= f^{x_{−12}}(x_1 + h_1, x_2 + h_2) − f^{x_{−12}}(x_1 + h_1, x_2)

as desired, where the inequality holds because the difference f^{x_2 + h_2} − f^{x_2} is increasing in x_{−2}. The general case is analogous, just notationally cumbersome. So, (20.6) holds. We omit the proof of the converse.

The complementarity nature of functions with increasing differences, in which "the marginal contribution of an input is increasing in the other input", has mathematically a (cross) second-order flavor. The next differential characterization confirms this intuition.

Proposition 936 A twice continuously differentiable function f : (a, b) ⊆ R^n → R has increasing differences if and only if, for each 1 ≤ i ≠ j ≤ n, we have

∂²f(x)/∂x_i∂x_j ≥ 0   (20.7)
Proof "Only if". Suppose f has increasing differences. To fix ideas, let i = 1 and j = 2. By Proposition 935, the section f^{x_{−12}} : I_1 × I_2 → R satisfies (20.3). Let x_1 ≥ x_1′. By setting h_1 = x_1 − x_1′, we get

(f^{x_{−12}}(x_1, x_2 + h_2) − f^{x_{−12}}(x_1, x_2)) / h_2 ≥ (f^{x_{−12}}(x_1′, x_2 + h_2) − f^{x_{−12}}(x_1′, x_2)) / h_2

So, letting h_2 → 0, we conclude that

x_1 ≥ x_1′ ⟹ ∂f(x_1, x_2, ..., x_n)/∂x_2 ≥ ∂f(x_1′, x_2, ..., x_n)/∂x_2

In turn, this implies ∂²f(x)/∂x_2∂x_1 ≥ 0. A similar argument shows that ∂²f(x)/∂x_1∂x_2 ≥ 0.

"If". Suppose ∂²f(x)/∂x_i∂x_j ≥ 0 for all 1 ≤ i ≠ j ≤ n. In view of Proposition 935, it is enough to show that the sections f^{x_{−ij}} have increasing differences. Again to fix ideas, let i = 1 and j = 2. By hypothesis, ∂f^{x_{−12}}(x_1 + h_1, t)/∂x_2 ≥ ∂f^{x_{−12}}(x_1, t)/∂x_2 for every t. Since f is continuously differentiable, its partial derivatives are continuous. So, we have

f^{x_{−12}}(x_1, x_2 + h_2) − f^{x_{−12}}(x_1, x_2) = ∫_{x_2}^{x_2+h_2} ∂f^{x_{−12}}(x_1, t)/∂x_2 dt ≤ ∫_{x_2}^{x_2+h_2} ∂f^{x_{−12}}(x_1 + h_1, t)/∂x_2 dt = f^{x_{−12}}(x_1 + h_1, x_2 + h_2) − f^{x_{−12}}(x_1 + h_1, x_2)

By Proposition 934, f^{x_{−12}} has increasing differences.

Example 937 (i) Let f : R²_+ → R be a CES production function defined by f(x) = (αx_1^ρ + (1−α)x_2^ρ)^{1/ρ} with α ∈ [0,1] and ρ > 0 (cf. Example 865). We have

∂²f(x)/∂x_1∂x_2 = α(1−α)(1−ρ)(x_1 x_2)^{ρ−1}(αx_1^ρ + (1−α)x_2^ρ)^{(1/ρ)−2}

By the previous result, f has decreasing differences if ρ > 1 and increasing differences if 0 < ρ < 1. So, the parameter ρ determines whether the inputs in the CES production function are complements or substitutes. (ii) Let f : R²_+ → R be a Cobb-Douglas production function f(x) = x_1^{α_1} x_2^{α_2}, with α_1, α_2 > 0 (cf. Example 869). Since ∂²f(x)/∂x_1∂x_2 = α_1 α_2 x_1^{α_1−1} x_2^{α_2−1} ≥ 0, by the previous result f has increasing differences (so, its inputs are complements). N
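Increasing differences can also be tested directly from Definition 933, without derivatives. The following Python sketch (our own, for the Cobb-Douglas case of point (ii)) verifies that the cross difference in (20.3) is nonnegative on a grid:

    def f(x1, x2, a1=0.5, a2=0.5):            # Cobb-Douglas production function
        return x1**a1 * x2**a2

    def cross_diff(x1, x2, h1, h2):
        # [f(x1+h1, x2+h2) - f(x1+h1, x2)] - [f(x1, x2+h2) - f(x1, x2)]
        return (f(x1 + h1, x2 + h2) - f(x1 + h1, x2)
                - f(x1, x2 + h2) + f(x1, x2))

    pts = [0.5, 1.0, 2.0, 4.0]
    assert all(cross_diff(x1, x2, h1, h2) >= -1e-12
               for x1 in pts for x2 in pts
               for h1 in (0.1, 1.0) for h2 in (0.1, 1.0))
    print("increasing differences verified on the grid")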

Next we establish a key characterization of increasing differences through supermodularity, a simpler analytical property. Because of this result, one can say that supermodular functions model complementarities.

Theorem 938 A function f : I ⊆ R^n → R has increasing differences if and only if it is supermodular, i.e.,

f(x ∨ y) + f(x ∧ y) ≥ f(x) + f(y)   ∀x, y ∈ I

A function f of several variables is easily seen to admit the following "telescopic" expansion: if x ≤ y, then

f(y) − f(x) = [f(y_1, x_2, ..., x_n) − f(x_1, ..., x_n)] + [f(y_1, y_2, x_3, ..., x_n) − f(y_1, x_2, x_3, ..., x_n)] + ⋯ + [f(y_1, ..., y_n) − f(y_1, ..., y_{n−1}, x_n)]
= ∑_{i=1}^n [f(y_1, ..., y_i, x_{i+1}, ..., x_n) − f(y_1, ..., y_{i−1}, x_i, ..., x_n)]

The proof of the previous theorem relies on this expansion.

Proof Suppose first that f has increasing differences; we show that f is supermodular. Let x, y ∈ I. By (20.1), we can set

h = x ∨ y − x = y − x ∧ y ≥ 0

so that x ∨ y = x + h and x ∧ y = y − h. By the telescopic expansion, we have

f(x ∨ y) − f(x) = f(x + h) − f(x)
= ∑_{i=1}^n [f(x_1 + h_1, ..., x_i + h_i, x_{i+1}, ..., x_n) − f(x_1 + h_1, ..., x_{i−1} + h_{i−1}, x_i, ..., x_n)]
= ∑_{i=1}^n [f^{x_i + h_i}(x_1 + h_1, ..., x_{i−1} + h_{i−1}, x_{i+1}, ..., x_n) − f^{x_i}(x_1 + h_1, ..., x_{i−1} + h_{i−1}, x_{i+1}, ..., x_n)]
≥ ∑_{i=1}^n [f^{x_i + h_i}(y_1, ..., y_{i−1}, y_{i+1} − h_{i+1}, ..., y_n − h_n) − f^{x_i}(y_1, ..., y_{i−1}, y_{i+1} − h_{i+1}, ..., y_n − h_n)]
= ∑_{i=1}^n [f^{y_1,...,y_{i−1},y_{i+1}−h_{i+1},...,y_n−h_n}(x_i + h_i) − f^{y_1,...,y_{i−1},y_{i+1}−h_{i+1},...,y_n−h_n}(x_i)]
≥ ∑_{i=1}^n [f^{y_1,...,y_{i−1},y_{i+1}−h_{i+1},...,y_n−h_n}(y_i) − f^{y_1,...,y_{i−1},y_{i+1}−h_{i+1},...,y_n−h_n}(y_i − h_i)]
= ∑_{i=1}^n [f(y_1, ..., y_i, y_{i+1} − h_{i+1}, ..., y_n − h_n) − f(y_1, ..., y_{i−1}, y_i − h_i, ..., y_n − h_n)]
= f(y) − f(y − h) = f(y) − f(x ∧ y)

where the first inequality follows from increasing differences, since for j < i we have x_j + h_j = (x ∨ y)_j ≥ y_j and for j > i we have x_j ≥ (x ∧ y)_j = y_j − h_j, while the second inequality holds because a function of a single variable, like the section f^{y_1,...,y_{i−1},y_{i+1}−h_{i+1},...,y_n−h_n}, is trivially supermodular (indeed modular), and x_i + h_i = x_i ∨ y_i, y_i − h_i = x_i ∧ y_i. Rearranging, f(x ∨ y) + f(x ∧ y) ≥ f(x) + f(y).

Conversely, suppose that f is supermodular. In view of Proposition 935, it is enough to show that (20.6) holds. Let y = (x_i, x_j + h_j, x_{−ij}) ∈ R^n and z = (x_i + h_i, x_j, x_{−ij}) ∈ R^n, so that z ∨ y = (x_i + h_i, x_j + h_j, x_{−ij}) and z ∧ y = x. By the supermodularity of f, we then have

f^{x_{−ij}}(x_i + h_i, x_j + h_j) + f^{x_{−ij}}(x_i, x_j) = f(x_i + h_i, x_j + h_j, x_{−ij}) + f(x) = f(z ∨ y) + f(z ∧ y) ≥ f(z) + f(y) = f(x_i + h_i, x_j, x_{−ij}) + f(x_i, x_j + h_j, x_{−ij}) = f^{x_{−ij}}(x_i + h_i, x_j) + f^{x_{−ij}}(x_i, x_j + h_j)

as desired.

20.5 Supermodularity and concavity

In general, concavity and supermodularity are independent properties: there exist supermodular functions that are not concave (just take any non-concave function of a single variable) as well as concave functions that are not supermodular; for instance, the function f : R²_{++} → R defined by f(x_1, x_2) = log(x_1 + x_2) is concave but not supermodular.

Remarkably, supermodular functions become tightly connected to concave functions under either positive homogeneity or translation invariance, as was the case for quasi-concave functions (Theorems 877 and 911).

Theorem 939 (Choquet) Let f : R^n_+ → R be positively homogeneous. If f is supermodular, then it is concave. The converse holds if n = 2.

For production functions, this means that, under constant returns to scale, complementarity implies concavity.

Proof We only prove the result when f is twice differentiable on R^n_{++}. Let x ∈ R^n_{++} and y ∈ R^n. From

(y_i/x_i − y_j/x_j)² = y_i²/x_i² + y_j²/x_j² − 2 (y_i y_j)/(x_i x_j)

it follows that

y_i y_j = (1/2)(x_j/x_i) y_i² + (1/2)(x_i/x_j) y_j² − (1/2) x_i x_j (y_i/x_i − y_j/x_j)²

So,

∑_{1≤i,j≤n} (∂²f(x)/∂x_i∂x_j) y_i y_j = ∑_{i=1}^n (y_i²/x_i) (∑_{j=1}^n (∂²f(x)/∂x_i∂x_j) x_j) − (1/2) ∑_{1≤i,j≤n} (∂²f(x)/∂x_i∂x_j) x_i x_j (y_i/x_i − y_j/x_j)²

Since f is positively homogeneous, by Euler's formula we have

f(x) = ∑_{i=1}^n (∂f(x)/∂x_i) x_i   ∀x ∈ R^n_{++}

By differentiating with respect to x_j, we then have

∂f(x)/∂x_j = ∂f(x)/∂x_j + ∑_{i=1}^n (∂²f(x)/∂x_i∂x_j) x_i   ∀x ∈ R^n_{++}

that is,

∑_{i=1}^n (∂²f(x)/∂x_i∂x_j) x_i = 0   ∀x ∈ R^n_{++}

We conclude that, for all x ∈ R^n_{++},

∑_{1≤i,j≤n} (∂²f(x)/∂x_i∂x_j) y_i y_j = −(1/2) ∑_{1≤i,j≤n} (∂²f(x)/∂x_i∂x_j) x_i x_j (y_i/x_i − y_j/x_j)² = −(1/2) ∑_{1≤i≠j≤n} (∂²f(x)/∂x_i∂x_j) x_i x_j (y_i/x_i − y_j/x_j)² ≤ 0

where the diagonal terms vanish and the last inequality follows from (20.7) and Theorem 938. The Hessian matrix of f is thus negative semi-definite for all x ∈ R^n_{++} and so f is concave on R^n_{++}. The reader can check that the converse holds when n = 2.

Example 940 Let f : R²_+ → R be the positively homogeneous function defined by f(x) = (x_1^{α_1} x_2^{α_2})^{1/(α_1+α_2)}, with α_1, α_2 > 0. It is supermodular (why?), so it is concave by Choquet's Theorem. N

A similar result holds for translation invariant functions (we omit the proof of this noteworthy result).

Theorem 941 Let f : R^n → R be translation invariant. If f is supermodular, then it is concave. The converse holds if n = 2.

20.6 Log-convex functions

In what follows we denote by C a convex set in R^n.

Definition 942 A strictly positive function f : C → (0, ∞) is said to be log-convex if

f(λx + (1−λ)y) ≤ [f(x)]^λ [f(y)]^{1−λ}

for every x, y ∈ C and λ ∈ [0,1], and it is said to be log-concave if the inequality is reversed.

The next lemma motivates the terminology.

Lemma 943 A strictly positive function f : C → (0, ∞) defined on a convex set C is log-convex (log-concave) if and only if the composite function log f is convex (concave).

Proof We prove the convex version, the concave one being similar. "If". Let log f be convex. In view of Proposition 46, we have

f(λx + (1−λ)y) = e^{log f(λx + (1−λ)y)} ≤ e^{λ log f(x) + (1−λ) log f(y)} = e^{λ log f(x)} e^{(1−λ) log f(y)} = [f(x)]^λ [f(y)]^{1−λ}

So, f is log-convex. "Only if". Let f be log-convex. Then,

log f(λx + (1−λ)y) ≤ log([f(x)]^λ [f(y)]^{1−λ}) = λ log f(x) + (1−λ) log f(y)

as desired.
Example 944 (i) The function f : R → (0, ∞) given by f(x) = e^{x²} is log-convex. (ii) The Gaussian function f : R → (0, ∞) defined by f(x) = e^{−x²} is log-concave. (iii) The exponential function is both log-concave and log-convex. N

Log-convexity is much better behaved than log-concavity, as the next result and example show. They are far from being dual notions.

Proposition 945 (i) Log-convex functions are convex. (ii) Concave functions are log-concave, and log-concave functions are in turn quasi-concave.

Proof (i) Let f be log-convex. Since log f is convex, the result follows from the convex version of Proposition 844-(i) because we can write f = e^{log f}. (ii) Obvious.

Example 946 The quadratic function f : (0, ∞) → (0, ∞) defined by f(x) = x² is, at the same time, strictly convex and log-concave. Indeed, in view of the last lemma, it is enough to note that log f(x) = 2 log x is concave. So, the converse of point (i) of the last proposition fails (there exist convex functions that are not log-convex), while point (ii) is all we can say about log-concave functions (they can even be strictly convex). N

It is easy to check that the product of log-convex functions is log-convex, as well as that
the product of log-concave functions is log-concave. Addition, instead, does not preserve
log-concavity.

Example 947 Let f, g : R → R be the log-concave functions given by f(x) = e^x and g(x) = e^{2x}. Their sum h(x) = e^x + e^{2x} is not log-concave. Indeed,

d²/dx² log(e^x + e^{2x}) = e^{−x}/(1 + e^{−x})² > 0

so log h is not concave. N

As a further proof of the much better behavior of log-convexity, we have the following
remarkable result that shows that addition preserves log-convexity (we omit the proof).

Theorem 948 (Artin) The sum of log-convex functions is log-convex.



Example 949 Given n strictly positive scalars t_i > 0 and a strictly positive function φ : (0, ∞) → (0, ∞), define f : C → (0, ∞) by

f(x) = ∑_{i=1}^n φ(t_i) t_i^x

where C is any interval of the real line, bounded or not. By Artin's Theorem, f is log-convex. Indeed, each function φ(t_i) t_i^x is log-convex in x because log(φ(t_i) t_i^x) = log φ(t_i) + x log t_i is affine in x.

An integral version of Artin's Theorem actually permits to conclude that if φ is continuous, then the function f : C → (0, ∞) defined by

f(x) = ∫_0^∞ φ(t) t^{x−1} dt

is log-convex (provided the improper integrals are well defined for all x ∈ C). In this regard, note that the function φ(t) t^{x−1} is log-convex in x since log(φ(t) t^{x−1}) = log φ(t) + (x−1) log t is affine in x. In the special case φ(t) = e^{−t} and C = (0, ∞), the function f is the classic gamma function

Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt

We will consider this log-convex function later in the book (Chapter 30). N
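The log-convexity of the gamma function can be observed numerically via math.lgamma, which computes log Γ. This Python fragment (our own sketch) checks midpoint convexity of log Γ on a grid:

    import math

    pts = [0.1 * i for i in range(1, 80)]     # grid in (0, 8)
    for x in pts:
        for y in pts:
            mid = math.lgamma(0.5 * (x + y))
            avg = 0.5 * (math.lgamma(x) + math.lgamma(y))
            assert mid <= avg + 1e-12         # log Gamma((x+y)/2) <= average
    print("midpoint convexity of log Gamma verified on the grid")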
Chapter 21

Correspondences

21.1 A set-theoretic notion

The notion of correspondence generalizes that of function by permitting that to an element of the domain there can be associated multiple elements of the codomain, not a single one as the notion of function requires. Correspondences play an important role in economic applications, which actually provided a main motivation for their study. In this section we introduce them.

Specifically, given any two sets X and Y, a correspondence φ : X ⇉ Y is a rule that, to each element x ∈ X, associates a subset φ(x) of Y, the image of x under φ. The set dom φ = {x ∈ X : φ(x) ≠ ∅} is the domain of φ and Y is the codomain. We say that φ is viable when dom φ = X, i.e., all points of X have a non-empty image.

If φ(x) is a singleton for all x ∈ X, the correspondence reduces to a function φ : X → Y. In what follows, whenever φ(x) is a singleton, say {y}, with a small abuse of notation we will either write φ(x) = {y} or φ(x) = y.

Example 950 (i) The correspondence φ : R ⇉ R given by φ(x) = [−|x|, |x|] associates to each scalar x the interval [−|x|, |x|]. For instance, φ(1) = φ(−1) = [−1, 1] and φ(0) = {0}.

(ii) The budget correspondence B : R^n_+ × R_+ ⇉ R^n_+ defined by B(p, w) = {x ∈ R^n_+ : p · x ≤ w} associates to each pair (p, w) of prices and income the corresponding budget set.

(iii) Given a concave function f : R^n → R, the superdifferential correspondence ∂f : R^n ⇉ R^n has as image ∂f(x) the superdifferential of f at x (cf. Proposition 1524 later in the book). The superdifferential correspondence generalizes for concave functions the derivative operator ∇f : R^n → R^n defined in (27.6).

(iv) Let f : X → Y be a function between any two sets X and Y. The inverse correspondence f⁻¹ : Im f ⇉ X is defined by f⁻¹(y) = {x ∈ X : f(x) = y}. If f is injective, we get back to the inverse function f⁻¹ : Im f → X. For instance, if f : R → R is the quadratic function f(x) = x², then Im f = [0, ∞) and so the inverse correspondence f⁻¹ : [0, ∞) ⇉ R is defined by

f⁻¹(y) = {−√y, √y}

for all y ≥ 0. Recall that in Example 179 we argued that this rule does not define a function since, to each strictly positive scalar, it associates two elements of the codomain, i.e., its positive and negative square roots.

(v) All correspondences considered in this example are viable. Yet, in the last point we can equivalently write f⁻¹ : Y ⇉ X and then say that dom f⁻¹ = Im f. For instance, the inverse of the quadratic function can be equivalently written as f⁻¹ : R ⇉ R with

f⁻¹(y) = {−√y, √y} if y ≥ 0, and f⁻¹(y) = ∅ otherwise

and dom f⁻¹ = [0, ∞). Whether to specify right away the domain, as we did when writing f⁻¹ : [0, ∞) ⇉ R, or not, as we just did by writing f⁻¹ : R ⇉ R, is purely a matter of convenience and depends on the problem at hand. N
The graph Gr φ of a viable correspondence φ : X ⇉ Y is the set

Gr φ = {(x, y) ∈ X × Y : y ∈ φ(x)}

Like the graph of a function, the graph of a correspondence is a subset of X × Y. If φ is a function, we get back to the notion of graph of a function, Gr φ = {(x, y) ∈ X × Y : y = φ(x)}. Indeed, the condition y ∈ φ(x) reduces to y = φ(x) when each image φ(x) is a singleton.
Example 951 (i) The graph of the correspondence φ : R ⇉ R given by φ(x) = [−|x|, |x|] is Gr φ = {(x, y) ∈ R² : −|x| ≤ y ≤ |x|}. [Figure omitted.]

(ii) The graph of the budget correspondence B : R^n_+ × R_+ ⇉ R^n_+ is

Gr B = {(p, w, x) ∈ R^n_+ × R_+ × R^n_+ : x ∈ B(p, w)}   N
We close with an afterthought. A correspondence φ : X ⇉ Y can be viewed as a function φ : X → 2^Y that associates to each element x of X an element φ(x) ⊆ Y of the power set 2^Y of Y, which is the collection 2^Y = {A : A ⊆ Y} of all its subsets (Section 7.3). Correspondences can thus be seen as a special class of functions if we enlarge the codomain to its power set. Yet, the "set angle" φ : X ⇉ Y turns out to be more fruitful than the "point angle" φ : X → 2^Y, and for this reason in this chapter we adopt the former viewpoint.

21.2 Back to Euclidean spaces

From now on we consider viable correspondences φ : A ⇉ R^m that have as domain a subset A of R^n and as codomain R^m. We say that such a φ is:

(i) closed-valued if φ(x) is a closed subset for all x ∈ A;

(ii) compact-valued if φ(x) is a compact subset for all x ∈ A;

(iii) convex-valued if φ(x) is a convex subset for all x ∈ A.

Functions are, trivially, both compact-valued and convex-valued because singletons are compact convex sets. Let us see an important economic example.

Example 952 Suppose that the consumption set A is both closed and convex, say it is R^n_+. Then, the budget correspondence is convex-valued, as well as compact-valued if p ≫ 0 and w > 0, that is, when restricted to R^n_{++} × R_{++} (cf. Propositions 991 and 992). N

The graph of a correspondence φ : A ⇉ R^m is the subset of A × R^m given by Gr φ = {(x, y) ∈ A × R^m : y ∈ φ(x)}. It is easy to see that φ is:

(i) closed-valued when its graph Gr φ is a closed subset of A × R^m;

(ii) convex-valued when its graph Gr φ is a convex subset of A × R^m.

The converse implications are false: closedness and convexity of the graph of φ are significantly stronger assumptions than the closedness and convexity of the images φ(x). This is best seen by considering scalar functions, as we show next.

Example 953 (i) Consider f : R → R given by

f(x) = x if x < 0, and f(x) = 1 if x ≥ 0

Since f is a function, it is both closed-valued and convex-valued. However, its graph

Gr f = {(x, x) : x < 0} ∪ {(x, 1) : x ≥ 0}

is neither closed nor convex. [Figure omitted.] The lack of convexity is obvious. To see that Gr f is not closed, observe that the origin is a boundary point that does not belong to Gr f.

(ii) A scalar function f : R → R has convex graph if and only if it is affine (i.e., its graph is a straight line). The "if" is obvious. As to the "only if," suppose that Gr f ⊆ R² is convex. Given any x, y ∈ R and any λ ∈ [0,1], then (λx + (1−λ)y, λf(x) + (1−λ)f(y)) ∈ Gr f, that is, f(λx + (1−λ)y) = λf(x) + (1−λ)f(y), proving f is affine. By Proposition 820, this implies that there exist m, q ∈ R such that f(x) = mx + q. We conclude that all scalar functions that are not affine are convex-valued but do not have convex graphs. N

Recall that a real-valued function f : A → R, defined on any set A, is bounded if its image is a bounded set of the real line, i.e., if there is k > 0 such that |f(x)| ≤ k for all x ∈ A (Section 6.4.3). This notion extends naturally to functions f = (f_1, ..., f_m) : A → R^m by saying that f is bounded if its image is a bounded set of R^m (Definition 167), that is, if there exists k > 0 such that

‖f(x)‖ ≤ k   ∀x ∈ A

It is easy to check that f is bounded if and only if its component functions f_i : A → R are bounded.

In a similar vein, we say that a correspondence φ : A ⇉ R^m is bounded if there is a compact subset K ⊆ R^m such that

φ(x) ⊆ K   ∀x ∈ A

If needed, we may write φ : A ⇉ K. In any case, when φ : A → R^m is a function we get back to the notion of boundedness just introduced. Indeed, in this case φ(x) ⊆ K amounts to φ(x) ∈ K, and it is easy to see that φ(x) ∈ K for all x ∈ A if and only if there is a scalar k > 0 such that ‖φ(x)‖ ≤ k for all x ∈ A.

Example 954 The budget correspondence is bounded if the consumption set A is bounded. Indeed, by definition B(p, w) ⊆ A for all (p, w) ∈ R^n_+ × R_+. N

21.3 Hemicontinuity

There are several notions of continuity for correspondences. For bounded correspondences, the main class of correspondences for which continuity will be needed (cf. Section 41.4), the following notions are adequate.

Definition 955 A correspondence φ : A ⇉ R^m is

(i) upper hemicontinuous at x ∈ A if

x_n → x, y_n → y and y_n ∈ φ(x_n)

implies y ∈ φ(x);

(ii) lower hemicontinuous at x ∈ A if

x_n → x and y ∈ φ(x)

implies that there exist elements y_n ∈ φ(x_n) such that y_n → y;

(iii) continuous at x ∈ A if it is both upper and lower hemicontinuous at x.

A correspondence φ is upper (lower) hemicontinuous if it is upper (lower) hemicontinuous at all x ∈ A. A correspondence φ is continuous if it is both upper and lower hemicontinuous.

Intuitively, an upper hemicontinuous correspondence has no abrupt shrinks in the graph: the image of the correspondence at each point x contains all possible limits of sequences y_n ∈ φ(x_n) included in the graph. In contrast, a lower hemicontinuous correspondence has no abrupt dilations in the graph: any element in the image of a point x must be reachable as a limit of a sequence y_n ∈ φ(x_n) included in the graph.

The following examples illustrate these continuity notions.

Example 956 The correspondence φ : [0,1] ⇉ R given by

φ(x) = [0, 2] if 0 ≤ x < 1, and φ(1) = {1/2}

is lower hemicontinuous at x = 1. [Figure omitted.] Formally, let x_n → 1 and y ∈ φ(1) = {1/2}, that is, y = 1/2. If we take, for instance, y_n = 1/2 ∈ φ(x_n) for all n, we have y_n → y. In contrast, φ is not upper hemicontinuous at x = 1 (where an "abrupt shrink" in the graph occurs). For example, consider the sequences x_n = 1 − 1/n and y_n = 1/4. It holds x_n → 1 and y_n ∈ φ(x_n), but y_n trivially converges to 1/4 ∉ φ(1) = {1/2}. Finally, φ is easily seen to be continuous on [0, 1). N

Example 957 The correspondence φ : [0,1] ⇉ R given by

φ(x) = [1, 2] if 0 ≤ x < 1, and φ(1) = [1, 3]

is upper hemicontinuous at x = 1. [Figure omitted.] Formally, if x_n → 1, y_n → y and y_n ∈ φ(x_n) = [1, 2], then y ∈ [1, 2] ⊆ φ(1). In contrast, φ is not lower hemicontinuous at x = 1 (where an "abrupt dilation" in the graph occurs). For example, consider the sequence x_n = 1 − 1/n and y = 3. It holds x_n → 1 and y ∈ φ(1), but there is no sequence {y_n} with y_n ∈ φ(x_n) that converges to y. Finally, φ is easily seen to be continuous on [0, 1). N

The next two results further clarify the nature of upper hemicontinuous correspondences.

Proposition 958 A correspondence φ : A ⇉ R^m is upper hemicontinuous if its graph is a closed set. The converse is true if its domain A is a closed set.

Proof Suppose Gr φ is closed. Let x_n → x ∈ A, y_n → y and y_n ∈ φ(x_n). Since (x_n, y_n) → (x, y) and Gr φ is a closed set, we have (x, y) ∈ Gr φ, yielding that y ∈ φ(x). We conclude that φ is upper hemicontinuous.

As to the converse, assume that the domain A is closed and φ : A ⇉ R^m is upper hemicontinuous. Let {(x_n, y_n)} ⊆ Gr φ be such that (x_n, y_n) → (x, y) ∈ R^n × R^m. To show that Gr φ is closed, we need to show that (x, y) ∈ Gr φ. Since A is closed, x_n → x ∈ A. By construction, we also have that y_n → y and y_n ∈ φ(x_n) for every n. Since φ is upper hemicontinuous, we have y ∈ φ(x), proving that (x, y) ∈ Gr φ and that Gr φ is closed.

Proposition 959 An upper hemicontinuous correspondence φ : A ⇉ R^m is closed-valued.

In turn, this implies that upper hemicontinuous correspondences are compact-valued when they are bounded.

Proof Let x ∈ A. We need to show that φ(x) is a closed set. Consider {y_n} ⊆ φ(x) such that y_n → y ∈ R^m. Define {x_n} ⊆ A to be such that x_n = x for every n. It follows that x_n → x, y_n → y and y_n ∈ φ(x_n) for every n. Since φ is upper hemicontinuous, we can conclude that y ∈ φ(x), yielding that φ(x) is closed.

For bounded functions the two notions of hemicontinuity are equivalent to continuity.

Proposition 960 For a bounded function f : A → R^m and a point x ∈ A, the following properties are equivalent:

(i) f is continuous at x;

(ii) f is lower hemicontinuous at x;

(iii) f is upper hemicontinuous at x.

Proof First observe that, f being a function, y = f(x) amounts to y ∈ f(x) when we look at the function f as a single-valued correspondence.

(i) implies (ii). Let x_n → x and y = f(x). Since f is a function, we can only choose {y_n} to be such that y_n = f(x_n). By continuity, y_n = f(x_n) → f(x) = y, so f is lower hemicontinuous at x.

(ii) implies (iii). Let x_n → x and {y_n} be such that y_n ∈ f(x_n) and y_n → y. Since f is a function, we can only choose {y_n} to be such that y_n = f(x_n). Since f is lower hemicontinuous at x, it holds y_n → f(x), so y = f(x) ∈ f(x). This implies that f is upper hemicontinuous at x.

(iii) implies (i). Let x_n → x. We want to show that y_n = f(x_n) → f(x). Suppose not. Then there are ε > 0 and a subsequence {y_{n_k}} such that

‖y_{n_k} − f(x)‖ ≥ ε   ∀k ≥ 1   (21.1)

Since {y_{n_k}} is a bounded sequence of vectors (f being bounded), by the Bolzano-Weierstrass Theorem (which is easily seen to hold also for bounded sequences of vectors) there is a further subsequence {y_{n_{k_s}}} that converges to some y ∈ R^m. Since x_{n_{k_s}} → x and y_{n_{k_s}} = f(x_{n_{k_s}}), by upper hemicontinuity y ∈ f(x), that is, y = f(x). Hence, for all s large enough ‖y_{n_{k_s}} − f(x)‖ < ε, which contradicts (21.1). We conclude that y_n = f(x_n) → y = f(x), i.e., f is continuous at x.

Boundedness is key in this proposition, as the next example shows.

Example 961 The unbounded function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} \dfrac{1}{x} & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$$
has closed graph in $\mathbb{R}^2$ (why?), so it is upper hemicontinuous (Proposition 958), but it is discontinuous at the origin. N
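To see the two phenomena side by side numerically, consider the sketch below (ours, not from the text; plain Python). Along $x_n = 1/n \to 0$ the values $f(x_n) = n$ diverge, so no sequence in the graph converges to a point $(0, y)$ with $y \ne 0$, which is why closedness of the graph is never challenged; yet $f(x_n) \not\to f(0) = 0$, so continuity fails.

```python
# A minimal numerical illustration (ours) of Example 961: f has closed graph
# but is discontinuous at 0, since f(1/n) = n diverges while 1/n -> 0.
f = lambda x: 1.0 / x if x > 0 else 0.0

for n in [1, 10, 100, 1000]:
    print(n, f(1.0 / n))  # prints n itself: the values blow up near 0+
```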

21.4 Addition and scalar multiplication of sets

To complete our study of correspondences, we need to introduce addition and scalar multiplication for sets. We begin with addition.

Definition 962 Given any two sets $A$ and $B$ in $\mathbb{R}^n$, their sum $A + B$ is the set in $\mathbb{R}^n$ such that
$$A + B = \{x + y : x \in A \text{ and } y \in B\}$$

In words, $A + B$ consists of all the possible sums $x + y$ of elements of $A$ and $B$.



Note that if $0 \in A$, then $B \subseteq A + B$ because $y = 0 + y \in A + B$ for all $y \in B$.

Example 963 (i) The sum of the unit square $A = [0,1] \times [0,1]$ and of the singleton $B = \{(3,3)\}$ is the square $A + B = [3,4] \times [3,4]$. (ii) The sum of the squares $A = [0,1] \times [0,1]$ and $B = [2,3] \times [2,3]$ is the square $A + B = [2,4] \times [2,4]$. Note that $B \subseteq A + B$ since $0 \in A$. (iii) The sum of the sides $A = \{(x_1, x_2) \in [0,1] \times [0,1] : x_1 = 0\}$ and $B = \{(x_1, x_2) \in [0,1] \times [0,1] : x_2 = 0\}$ of the unit square is the unit square itself, i.e., $A + B = [0,1] \times [0,1]$. (iv) The sum of the vertical axis $A = \{(x_1, x_2) \in \mathbb{R}^2 : x_1 = 0\}$ and the horizontal axis $B = \{(x_1, x_2) \in \mathbb{R}^2 : x_2 = 0\}$ is the entire plane, i.e., $A + B = \mathbb{R}^2$. N
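For finite sets the sum is easy to compute directly. The following sketch (ours, not from the text; plain Python) spot-checks case (i) of Example 963 on the corners of the squares involved, and previews the scalar multiplication of sets introduced below; `set_sum` is a hypothetical helper name of our own.

```python
# A minimal sketch (ours): Minkowski sum of finite sets of 2D points.
from itertools import product

def set_sum(A, B):
    """Return {a + b : a in A, b in B} for finite sets of 2D points."""
    return {(a1 + b1, a2 + b2) for (a1, a2), (b1, b2) in product(A, B)}

A = {(0, 0), (0, 1), (1, 0), (1, 1)}   # corners of the unit square
B = {(3, 3)}                           # the singleton of Example 963(i)
print(sorted(set_sum(A, B)))           # corners of [3,4] x [3,4]

# Scalar multiplication of a set (see Definition 965 below): 2A
print(sorted({(2 * x1, 2 * x2) for (x1, x2) in A}))  # corners of [0,2] x [0,2]
```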

Next we give some properties of sums of sets.

Proposition 964 Let $A$ and $B$ be any two sets in $\mathbb{R}^n$. Then:

(i) if $A$ and $B$ are convex, their sum $A + B$ is convex;

(ii) if $A$ is closed and $B$ is compact, their sum $A + B$ is closed;

(iii) if $A$ and $B$ are compact, their sum $A + B$ is compact;

(iv) if $A$ or $B$ is open, their sum $A + B$ is open.

Proof (i) Let $A$ and $B$ be convex. Let $v, w \in A + B$ and $\alpha \in [0,1]$. By definition, there exist $x', x'' \in A$ and $y', y'' \in B$ such that $v = x' + y'$ and $w = x'' + y''$. Since $A$ and $B$ are convex, we have that $\alpha x' + (1 - \alpha) x'' \in A$ and $\alpha y' + (1 - \alpha) y'' \in B$. This implies that
$$\alpha v + (1 - \alpha) w = \alpha (x' + y') + (1 - \alpha)(x'' + y'') = [\alpha x' + (1 - \alpha) x''] + [\alpha y' + (1 - \alpha) y''] \in A + B$$

(ii) Let $A$ be closed and $B$ compact. Let $\{z_n\} \subseteq A + B$ be such that $z_n \to z \in \mathbb{R}^n$. We want to show that $z \in A + B$. By definition, there exist $\{x_n\} \subseteq A$ and $\{y_n\} \subseteq B$ such that $z_n = x_n + y_n$ for all $n \ge 1$. Since $B$ is compact, by the Bolzano-Weierstrass Theorem there exists a subsequence $\{y_{n_k}\} \subseteq B$ that converges to some $y \in B$. So, by the algebra of limits we have
$$\lim_{k \to \infty} x_{n_k} = \lim_{k \to \infty} (z_{n_k} - y_{n_k}) = \lim_{k \to \infty} z_{n_k} - \lim_{k \to \infty} y_{n_k} = z - y$$
Since $A$ is closed, we have $z - y \in A$. In turn, this implies $z \in A + B$. (iii) Let $A$ and $B$ be compact. By point (ii), $A + B$ is closed. As the reader can check, $A + B$ is also bounded. So, it is compact. (iv) Assume that $A$ is open. Given any $y \in B$, the set $A + y$ is easily seen to be open. Since $A + B = \bigcup_{y \in B} (A + y)$, we conclude that $A + B$ is open because it is the union of open sets.

By iterating the sum of two sets, we can define the sum
$$\sum_{i=1}^{n} A_i \tag{21.2}$$
of $n$ sets $A_i$ in $\mathbb{R}^n$. Properties (i) and (iii) just established for the sum of two sets continue to hold for sums of $n$ sets.

We turn now to scalar multiplication.



Definition 965 Given a scalar $\alpha \in \mathbb{R}$ and a set $A$ in $\mathbb{R}^n$, their product $\alpha A$ is the set in $\mathbb{R}^n$ such that $\alpha A = \{\alpha x : x \in A\}$.

Example 966 The product of the unit square $A = [0,1] \times [0,1]$ and of $\alpha = 2$ is the square $2A = [0,2] \times [0,2]$. N

The sum (21.2) thus generalizes to a linear combination
$$\sum_{i=1}^{n} \alpha_i A_i$$
of $n$ sets $A_i$ in $\mathbb{R}^n$ and $n$ scalars $\alpha_i \in \mathbb{R}$.

21.5 Combining correspondences

Through sums of sets we can define sums of correspondences.

Definition 967 Given any two correspondences $\varphi, \psi : A \rightrightarrows \mathbb{R}^m$, their sum $\varphi + \psi : A \rightrightarrows \mathbb{R}^m$ is the correspondence such that
$$(\varphi + \psi)(x) = \varphi(x) + \psi(x) \qquad \forall x \in A$$

In view of Proposition 964, the sum of two convex-valued correspondences is convex-valued, and the sum of two compact-valued correspondences is compact-valued.

Proposition 968 Let $\varphi, \psi : A \rightrightarrows \mathbb{R}^m$ be any two correspondences and $\alpha, \beta \in \mathbb{R}$. Then:

(i) if $\varphi$ and $\psi$ are bounded and upper hemicontinuous at a point, their sum $\alpha \varphi + \beta \psi$ is upper hemicontinuous at that point;

(ii) if $\varphi$ and $\psi$ are lower hemicontinuous at a point, their sum $\alpha \varphi + \beta \psi$ is lower hemicontinuous at that point.

Proof It is enough to consider the case $\alpha = \beta = 1$, as the general case then easily follows.

(i) Suppose that at $x$ we have $x_n \to x$, $y_n \to y$ and $y_n \in (\varphi + \psi)(x_n)$. We want to show that $y \in (\varphi + \psi)(x)$. By definition, for each $n$ there exist $y_n' \in \varphi(x_n)$ and $y_n'' \in \psi(x_n)$ such that $y_n = y_n' + y_n''$. Since $\varphi$ and $\psi$ are bounded, there exist compact sets $K_\varphi$ and $K_\psi$ such that $\{y_n'\} \subseteq K_\varphi$ and $\{y_n''\} \subseteq K_\psi$. Hence, both sequences are bounded, so by the Bolzano-Weierstrass Theorem there exist subsequences $\{y_{n_k}'\}$ and $\{y_{n_k}''\}$ that converge to some points $y' \in \mathbb{R}^m$ and $y'' \in \mathbb{R}^m$, respectively. Since $y_{n_k}' \in \varphi(x_{n_k})$ and $y_{n_k}'' \in \psi(x_{n_k})$ for every $k$ and $x_{n_k} \to x$, we then have $y' \in \varphi(x)$ and $y'' \in \psi(x)$ because $\varphi$ and $\psi$ are upper hemicontinuous at $x$. We conclude that $y = \lim_{k \to \infty} y_{n_k} = \lim_{k \to \infty} (y_{n_k}' + y_{n_k}'') = y' + y'' \in (\varphi + \psi)(x)$, as desired.

(ii) Suppose that at $x$ we have $x_n \to x$ and $y \in (\varphi + \psi)(x)$. We want to show that there exist elements $y_n \in (\varphi + \psi)(x_n)$ such that $y_n \to y$. By definition, there exist $y' \in \varphi(x)$ and $y'' \in \psi(x)$ such that $y = y' + y''$. Since $\varphi$ and $\psi$ are lower hemicontinuous, there exist elements $y_n' \in \varphi(x_n)$ and $y_n'' \in \psi(x_n)$ such that $y_n' \to y'$ and $y_n'' \to y''$. Setting $y_n = y_n' + y_n''$, we then have $y_n \in (\varphi + \psi)(x_n)$ and $y_n = y_n' + y_n'' \to y' + y'' = y$, as desired.

By iterating the linear combination of two correspondences, we can define the linear combination
$$\sum_{i=1}^{n} \alpha_i \varphi_i \tag{21.3}$$
of $n$ correspondences and $n$ scalars $\alpha_i \in \mathbb{R}$. The properties of linear combinations of two correspondences just established continue to hold for linear combinations of $n$ correspondences.
Part V

Optima

Chapter 22

Optimization problems

Optimization problems are fundamental in economics, which is based on the analysis of maximization or minimization problems solved by economic agents, such as individuals (consumers, producers, and investors), families, and governments. Methodological individualism is, indeed, at the heart of economic analysis, which thus aims to explain economic phenomena in terms of individual agents' purposeful and optimal (so, rational) behavior. A rational agent, the homo oeconomicus, is the idealized basic unit of analysis of economic theory. As a result, this is the central chapter of the book: it justifies the study of the notions discussed so far and of those that will be seen in the rest of the book.

22.1 Generalities

Consider the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = 1 - x^2$, with graph:

[Figure: graph of $f(x) = 1 - x^2$, a downward parabola with vertex $(0, 1)$]

It is immediate to see that $f$ attains its maximum value, equal to $1$, at the point $x = 0$, that is, at the origin (Example 251). On the other hand, there is no point at which $f$ attains a minimum value.

Suppose that, for some reason, we are interested in the behavior of $f$ only on the interval $[1, 2]$, not on the entire domain $\mathbb{R}$. Then $f$ has $0$ as maximum value, attained at the point $x = 1$, while it has $-3$ as minimum value, attained at the point $x = 2$. Graphically:

[Figure: graph of $f$ restricted to $[1, 2]$, with maximum value $0$ at $x = 1$ and minimum value $-3$ at $x = 2$]

From this example two crucial observations follow:

(i) the distinction between the maximum (minimum) value and the maximizers (minimizers): a maximizer is an element of the domain at which the function reaches its maximum value, the unique element of the codomain which is the image of all maximizers;¹

(ii) the importance of the subset of the domain in which we are interested when establishing the existence of maximizers or minimizers.

These two observations lead to the next definition, in which we consider an objective function $f$ and a subset $C$ of its domain, called the choice set.

Definition 969 Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be a real-valued function and $C$ a subset of $A$. A point $\hat{x} \in C$ is called a (global) maximizer of $f$ on $C$ if
$$f(\hat{x}) \ge f(x) \qquad \forall x \in C \tag{22.1}$$
The value $f(\hat{x})$ of the function at $\hat{x}$ is called the (global) maximum value of $f$ on $C$.

In the special case $C = A$, when the choice set is the entire domain, the point $\hat{x}$ is called a maximizer, without further specification (in this way, we recover the definition of Section 6.6).

In the initial example we considered two cases:

(i) in the first case, $C$ was the entire domain, that is, $C = \mathbb{R}$, and we had $\hat{x} = 0$ and $f(\hat{x}) = \max f(\mathbb{R}) = 1$;

(ii) in the second case, $C$ was the interval $[1, 2]$ and we had $\hat{x} = 1$ and $f(\hat{x}) = \max f([1, 2]) = 0$.
¹ As already anticipated in Section 6.6.

The maximum value of the objective function $f$ on the choice set $C$ is, thus, nothing but the maximum of the set $f(C)$ in the real line, i.e.,²
$$f(\hat{x}) = \max f(C)$$
By Proposition 36, the maximum value is unique. We denote this unique value by
$$\max_{x \in C} f(x)$$
The maximizers may, instead, not be unique. Their set, called the solution set, is denoted by $\arg\max_{x \in C} f(x)$, that is,³
$$\arg\max_{x \in C} f(x) = \left\{ \hat{x} \in C : f(\hat{x}) = \max_{x \in C} f(x) \right\}$$

For example, for the function $f : \mathbb{R} \to \mathbb{R}$ defined by
$$f(x) = \begin{cases} x + 1 & \text{if } x \le -1 \\ 0 & \text{if } -1 < x < 1 \\ -x + 1 & \text{if } x \ge 1 \end{cases}$$
with graph

[Figure: graph of $f$, constant at $0$ on $[-1, 1]$ and decreasing away from this interval on both sides]

we have $\max_{x \in \mathbb{R}} f(x) = 0$ and $\arg\max_{x \in \mathbb{R}} f(x) = [-1, 1]$. So, the set of maximizers is the entire interval $[-1, 1]$. On the other hand, if we restrict ourselves to $[1, \infty)$, we have $\max_{x \in [1, \infty)} f(x) = 0$ and $\arg\max_{x \in [1, \infty)} f(x) = \{1\}$, so $1$ is the unique maximizer of $f$ on $[1, \infty)$. Graphically:

² Recall that $f(C) = \{f(x) : x \in C\} \subseteq \mathbb{R}$ is the set (6.1) of all images of the points that belong to $C$.
³ A convenient shorthand notation is $\max_C f$ and $\arg\max_C f$. Though we will not use it, readers may want to familiarize themselves with it by trying it out on their own.

[Figure: graph of $f$ restricted to $[1, \infty)$, with unique maximizer $x = 1$]
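A brute-force grid search makes these two solution sets visible numerically. The sketch below is ours (not from the text) and uses NumPy; the grids $[-3, 3]$ and $[1, 3]$ are arbitrary truncations chosen for illustration.

```python
# A minimal sketch (ours): maximum value and approximate maximizers of the
# piecewise function above, on two different choice sets.
import numpy as np

def f(x):
    return np.where(x <= -1, x + 1, np.where(x < 1, 0.0, -x + 1))

for a, b in [(-3.0, 3.0), (1.0, 3.0)]:
    xs = np.linspace(a, b, 6001)
    ys = f(xs)
    argmax = xs[np.isclose(ys, ys.max())]
    print(f"on [{a}, {b}]: max = {ys.max()}, maximizers from {argmax.min()} to {argmax.max()}")
# Expected: max 0 with maximizers filling [-1, 1] in the first case, only {1} in the second.
```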

Next we introduce a class of maximizers.

Definition 970 Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be a real-valued function and $C$ a subset of $A$. A point $\hat{x} \in C$ is called a strong maximizer if
$$f(\hat{x}) > f(x)$$
for all $x \in C$ distinct from $\hat{x}$.

To be strong is what characterizes maximizers that are unique.

Proposition 971 A maximizer is strong if and only if it is unique.

Proof "Only if". Let $\hat{x}_1, \hat{x}_2 \in C$ be two strong maximizers. We want to show that they are equal, i.e., $\hat{x}_1 = \hat{x}_2$. Suppose, by contradiction, that they are distinct, i.e., $\hat{x}_1 \ne \hat{x}_2$. By the definition of strong maximizer, we then have both $f(\hat{x}_1) > f(\hat{x}_2)$ and $f(\hat{x}_2) > f(\hat{x}_1)$, a contradiction. We conclude that $\hat{x}_1 = \hat{x}_2$.

"If". Let $\hat{x} \in C$ be the unique maximizer. By hypothesis, $f(\hat{x}) \ge f(x)$ for all $x \in C$. Let $y \in C$ be such that $f(\hat{x}) = f(y)$. We have $f(y) = f(\hat{x}) \ge f(x)$ for all $x \in C$, and so $y$ as well is a maximizer. As $\hat{x} \in C$ is the unique maximizer, this implies that $y = \hat{x}$. Hence, $f(\hat{x}) > f(x)$ for all $x \in C$ distinct from $\hat{x}$. We conclude that $\hat{x}$ is a strong maximizer.

The uniqueness of maximizers is an all-important property that greatly simplifies the study of how maximizers and maximum values change when the choice set $C$ changes; for example, how optimal bundles, and their utility, change when the budget set changes as a consequence of variations in income and prices (see Section 22.1.4). In economic applications this analysis, known as comparative statics, plays a fundamental role. It is particularly effective when maximizers are unique. Indeed, it is much easier to keep track of and compare unique solutions, e.g., unique optimal bundles for each profile of prices and income, than sets of them.

Until now we have talked about maximizers, but analogous considerations hold for minimizers. For example, in Definition 969 an element $\hat{x} \in C$ is a (global) minimizer of $f$ on $C$ if $f(\hat{x}) \le f(x)$ for every $x \in C$, with (global) minimum value $f(\hat{x}) = \min f(C)$, denoted by $\min_{x \in C} f(x)$. Maximizing and minimizing are actually two sides of the same coin, as formalized by the next result. Its obvious proof is based on the observation that
$$f(x) \ge f(y) \iff -f(x) \le -f(y)$$
for all $x, y \in A$.

Proposition 972 Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be a real-valued function and $C$ a subset of $A$. A point $\hat{x} \in C$ is a minimizer of $f$ on $C$ if and only if it is a maximizer of $-f$ on $C$, and it is a maximizer of $f$ on $C$ if and only if it is a minimizer of $-f$ on $C$. In particular,
$$\min_{x \in C} f(x) = -\max_{x \in C} (-f)(x) \quad \text{and} \quad \max_{x \in C} f(x) = -\min_{x \in C} (-f)(x)$$

For example, the minimizers of the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^2 - 1$ are the maximizers of the function $(-f)(x) = 1 - x^2$ seen at the beginning of the section.

Thus, between maximizers and minimizers there is a natural duality that makes the results for one case a simple dual version of those for the other. From a mathematical viewpoint, the choice of which of these two equivalent problems to study is only a question of convenience bearing no conceptual relevance. Given their great importance in economic applications, throughout the book we focus on the properties of maximizers, leaving the analogous properties for minimizers to the reader. In any case, the neutral term extremal refers to both maximizers and minimizers, with optimum value referring to both maximum and minimum values.

The problem of maximizing an objective function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ on a given choice set $C \subseteq A \subseteq \mathbb{R}^n$, that is, of finding its maximum value and its maximizers, is called a maximization problem. In a maximization problem, the maximizers are called solutions. The solutions are said to be strong if so are the maximizers. By Proposition 971, a solution is strong if and only if it is unique.

Analogous notions hold for minimization problems, in which we look for the minimum value and the minimizers of an objective function on a given choice set. Finally, optimization problems include both maximization and minimization problems; they are "genderless".⁴

Formally, given an objective function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ and a choice set $C \subseteq A$, we write a maximization problem as
$$\max_x f(x) \quad \text{sub } x \in C \tag{22.2}$$
and a minimization problem with $\min$ in place of $\max$. Here "sub" is short for "subject to". The $x$ below $\max$ indicates the choice variable, that is, the variable that we control to maximize the objective function. When $C = A$, sometimes we omit the clause "sub $x \in C$" since $x$ necessarily belongs to the domain of $f$. In the important case when the choice set $C$ is open, we talk of unconstrained optimization problems.⁵ Otherwise, we talk of constrained optimization problems.
⁴ Because of our maximization emphasis, however, in what follows we often use interchangeably the terms "optimization problem" and "maximization problem".
⁵ Since an open set $C$ is still a constraint, this terminology is not that satisfactory. To make some sense of it, note that all the points $x$ of an open set $C$ are interior and so have a neighborhood $B_\varepsilon(x)$ included in $C$. One can thus "move around" the point $x$ while still remaining within $C$. In this local sense, an open choice set allows for some freedom.

22.1.1 Beginner's luck

Normally, it is quite complicated to solve an optimization problem. Nevertheless, maximizers (or minimizers) can sometimes be found by working with one's bare hands, as the next examples show.

Example 973 Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = 2x - x^2$ and consider the optimization problem
$$\max_x f(x) \quad \text{sub } x \in \mathbb{R}$$
that is, we look for maximizers of $f$ on its entire domain.

[Figure: graph of $f(x) = 2x - x^2$, a downward parabola with vertex $(1, 1)$]

We can write $f(x) = 2x - x^2 - 1 + 1 = 1 - (x - 1)^2$, so one has
$$f(x) \le 1 \qquad \forall x \in \mathbb{R}$$
Since $f$ takes on the value $1$ only at $\hat{x} = 1$, we can say that $\hat{x} = 1$ is the unique maximizer of $f$ on $\mathbb{R}$. The maximum value of $f$ on $\mathbb{R}$ is the scalar $1$. In symbols:
$$\arg\max_{x \in \mathbb{R}} f(x) = \{1\} \quad \text{and} \quad \max_{x \in \mathbb{R}} f(x) = 1$$
Finally, $f$ is unbounded below and so has no minimizers; i.e., $\arg\min_{x \in \mathbb{R}} f(x) = \emptyset$. N

Example 974 Define $f : \mathbb{R}^2 \to \mathbb{R}$ by $f(x) = x_1^2 - 6 x_1 x_2 + 12 x_2^2$ for all $x = (x_1, x_2) \in \mathbb{R}^2$ and consider the optimization problem
$$\min_x f(x) \quad \text{sub } x \in \mathbb{R}^2$$
Since $f(x_1, x_2) = x_1^2 - 6 x_1 x_2 + 9 x_2^2 + 3 x_2^2 = (x_1 - 3 x_2)^2 + 3 x_2^2$, the function $f$ is the sum of two squares and so is positive. Since it assumes the value $0$ only at the origin $0 = (0, 0)$, we conclude that the origin is the unique minimizer of $f$ on $\mathbb{R}^2$. Its minimum value is the scalar $0$. In symbols:
$$\arg\min_{x \in \mathbb{R}^2} f(x) = \{0\} \quad \text{and} \quad \min_{x \in \mathbb{R}^2} f(x) = 0$$
Finally, $f$ is unbounded above and so has no maximizers; i.e., $\arg\max_{x \in \mathbb{R}^2} f(x) = \emptyset$. N
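The completed-square argument can be cross-checked numerically. The sketch below is ours (not from the text) and assumes SciPy is available; the starting point is arbitrary.

```python
# A minimal numerical cross-check (ours) of Example 974.
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**2 - 6*x[0]*x[1] + 12*x[1]**2
res = minimize(f, x0=np.array([5.0, -7.0]))  # arbitrary starting point
print(res.x, res.fun)  # approximately (0, 0) and 0, the unique minimizer
```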

Example 975 Let $f : \mathbb{R}^3 \to \mathbb{R}$ be given by $f(x) = e^{-x_1^2 - x_2^2 - x_3^2}$ for all $x = (x_1, x_2, x_3) \in \mathbb{R}^3$ and consider the optimization problem
$$\max_x f(x) \quad \text{sub } x \in \mathbb{R}^3$$
Since $0 < f(x_1, x_2, x_3) \le 1$ for all $(x_1, x_2, x_3) \in \mathbb{R}^3$ and $f(0, 0, 0) = 1$, the origin $0 = (0, 0, 0)$ is a maximizer of $f$ on $\mathbb{R}^3$. It is actually the unique maximizer (why?). The maximum value of $f$ on $\mathbb{R}^3$ is the scalar $1$. In symbols:
$$\arg\max_{x \in \mathbb{R}^3} f(x) = \{0\} \quad \text{and} \quad \max_{x \in \mathbb{R}^3} f(x) = 1$$
However, $f$ does not have a minimizer, so $\arg\min_{x \in \mathbb{R}^3} f(x) = \emptyset$, because it never attains the infimum of its values, that is, $0$. N

Example 976 Define $f : \mathbb{R} \to \mathbb{R}$ by $f(x) = \cos x$ and consider the optimization problem
$$\min_x f(x) \quad \text{sub } x \in \mathbb{R}$$
Since $-1 \le \cos x \le 1$, all the points at which $f(x) = 1$ are maximizers and all the points at which $f(x) = -1$ are minimizers. The maximizers are, therefore, the points $2k\pi$ with $k \in \mathbb{Z}$ and the minimizers are the points $(2k + 1)\pi$ with $k \in \mathbb{Z}$. The maximum and minimum values are the scalars $1$ and $-1$, respectively. In symbols:
$$\arg\max_{x \in \mathbb{R}} f(x) = \{2k\pi : k \in \mathbb{Z}\} \quad \text{and} \quad \arg\min_{x \in \mathbb{R}} f(x) = \{(2k + 1)\pi : k \in \mathbb{Z}\}$$
as well as
$$\max_{x \in \mathbb{R}} f(x) = 1 \quad \text{and} \quad \min_{x \in \mathbb{R}} f(x) = -1$$
These maximizers and minimizers on $\mathbb{R}$ are not unique. However, if we consider a smaller choice set, such as $C = [0, 2\pi)$, we find that the unique maximizer is the origin $0$ and the unique minimizer is the point $\pi$. N

Example 977 For a constant function, all the points of the domain are simultaneously
maximizers and minimizers. Its constant value is simultaneously the maximum and minimum
value. N

O.R. (i) Definition 969 does not require the function to satisfy any special property; in particular, neither continuity nor differentiability is invoked. For example, the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = |x|$ attains its minimum value at the origin, where it is not differentiable. The function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} x + 1 & \text{if } x \le 1 \\ -x & \text{if } x > 1 \end{cases}$$
with graph

[Figure: graph of this piecewise function, which jumps down from the point $(1, 2)$]

attains its maximum value at the point $\hat{x} = 1$, where it is discontinuous.

It may also happen that an isolated point is extremal. For example, the function defined by
$$f(x) = \begin{cases} x + 1 & \text{if } x \le 1 \\ 5 & \text{if } x = 2 \\ -x & \text{if } x > 4 \end{cases}$$
with graph

[Figure: graph of this function, with the isolated point $(2, 5)$ lying above the rest of the graph]

attains its maximum value at $\hat{x} = 2$, an isolated point of the domain $(-\infty, 1] \cup \{2\} \cup (4, +\infty)$ of $f$.
(ii) As we have already observed, the maximum value of $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ on $C \subseteq A$ is nothing but $\max f(C)$. It is a value actually attained by $f$, that is, there exists a point $\hat{x} \in C$ such that $f(\hat{x}) = \max f(C)$. We can, therefore, choose a point in $C$ at which $f$ "attains" the maximum.

When the maximum value does not exist, the image set $f(C)$ might still have a finite supremum $\sup f(C)$. The unpleasant aspect is that there might well be no point in $C$ that attains such a value, that is, we might not be able to attain it. Pragmatically, this aspect is less negative than it might appear prima facie. Indeed, as Proposition 127 indicates, we can choose a point at which $f$ is arbitrarily close to the sup. If $\sup f(C) = 48$, we will never be able to get exactly $48$, but we can get arbitrarily close to it: we can always choose a point at which the function has value $47.9$ and, if this is not enough, a point at which $f$ takes value $47.999999999999$, and so on. Similar remarks hold for minimum values. H

22.1.2 Basic properties

We now study a few basic properties of the optimization problems (22.2). We begin with an important property of invariance.

Proposition 978 Let $g : B \subseteq \mathbb{R} \to \mathbb{R}$ be a strictly increasing function with $\operatorname{Im} f \subseteq B$. The two optimization problems
$$\max_x f(x) \quad \text{sub } x \in C \tag{22.3}$$
and
$$\max_x (g \circ f)(x) \quad \text{sub } x \in C \tag{22.4}$$
are equivalent, that is, they have the same solutions.

Proof Since $g$ is strictly increasing, by Proposition 221 we have
$$s \ge t \iff g(s) \ge g(t) \qquad \forall s, t \in \operatorname{Im} f$$
Thus,
$$f(x) \ge f(y) \iff g(f(x)) \ge g(f(y)) \qquad \forall x, y \in A$$
Therefore,
$$f(\hat{x}) \ge f(x) \;\; \forall x \in C \iff (g \circ f)(\hat{x}) \ge (g \circ f)(x) \;\; \forall x \in C$$
This proves that a vector $\hat{x} \in C$ solves problem (22.3) if and only if it solves problem (22.4).

Thus, two objective functions, here $f$ and $\tilde{f} = g \circ f$, are equivalent when one is a strictly increasing transformation of the other.⁶ Later in the chapter, we will comment more on this simple, yet conceptually important, result.
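A quick numerical illustration of this invariance, ours and not from the text: composing the objective with a strictly increasing $g$ changes the maximum value but not the maximizer. Here $f(x) = 2x - x^2$ and $g(t) = e^t$ are our own choices.

```python
# A minimal sketch (ours) of Proposition 978 on a grid.
import numpy as np

xs = np.linspace(-4.0, 4.0, 8001)
f = 2*xs - xs**2
g_of_f = np.exp(f)  # strictly increasing transformation of f

print(xs[np.argmax(f)], xs[np.argmax(g_of_f)])  # same maximizer: 1.0
print(f.max(), g_of_f.max())                    # different maximum values: 1 and e
```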

Let us now consider the case, important in economic applications (as we will soon see), in which the objective function is strongly increasing.

Proposition 979 Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be a real-valued function and $C$ a subset of $A$. If $f$ is strongly increasing on $C$, then $\arg\max_{x \in C} f(x) \subseteq \partial C$.

⁶ Note that $\tilde{f} = g \circ f$ if and only if $f = g^{-1} \circ \tilde{f}$, so one can move back and forth between equivalent objective functions via strictly increasing transformations.

Proof Let $\hat{x} \in \arg\max_{x \in C} f(x)$. We want to show that $\hat{x} \in \partial C$. Suppose, by contradiction, that $\hat{x} \notin \partial C$, i.e., $\hat{x}$ is an interior point of $C$. There exists, therefore, a neighborhood $B_\varepsilon(\hat{x})$ of $\hat{x}$ included in $C$. Set
$$\hat{x}_\varepsilon = \left( \hat{x}_1 + \frac{\varepsilon}{2}, \dots, \hat{x}_n + \frac{\varepsilon}{2} \right)$$
Clearly, $\hat{x}_\varepsilon \in B_\varepsilon(\hat{x})$ and $\hat{x}_\varepsilon > \hat{x}$. Since $f$ is strongly increasing on $C$, we obtain that $f(\hat{x}_\varepsilon) > f(\hat{x})$, which contradicts the optimality of $\hat{x}$. We conclude that $\hat{x} \in \partial C$.

The possible solutions of the optimization problem (22.2) are, thus, boundary points when the objective function is strongly increasing (a fortiori, when it is strictly increasing; cf. Proposition 225). With this kind of objective function, we can thus simplify problem (22.2) as follows:
$$\max_x f(x) \quad \text{sub } x \in \partial C$$
We will soon see a remarkable application of this observation in Walras' law.

The last proposition implies that when $\partial C \cap C = \emptyset$, which happens for example when $C$ is open, the optimization problem (22.2) does not admit any solution if $f$ is strongly increasing. A trivial example is $f(x) = x$ on $C = (0, 1)$, as the graph shows:

[Figure: graph of $f(x) = x$ on $(0, 1)$: the supremum $1$ is approached but never attained]

Finally, let us consider an obvious, yet noteworthy, property of monotonicity in C.

Proposition 980 Given $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$, let $C$ and $C'$ be any two subsets of $A$. Then
$$C \subseteq C' \implies \max_{x \in C} f(x) \le \max_{x \in C'} f(x)$$

Proof Let $\hat{x}' \in \arg\max_{x \in C'} f(x)$ and $\hat{x} \in \arg\max_{x \in C} f(x)$. As $\hat{x} \in C \subseteq C'$, we have $\hat{x} \in C'$. Thus,
$$\max_{x \in C'} f(x) = f(\hat{x}') \ge f(\hat{x}) = \max_{x \in C} f(x)$$
because $f(\hat{x}') \ge f(x)$ for all $x \in C'$.

Larger choice sets always lead to higher maximum values of the objective function. In other terms, having more opportunities to choose from is never detrimental, whatever the form of the objective function. This simple principle of monotonicity is often important. The basic economic principle that removing constraints on agents' choices can only benefit them is, indeed, formalized by this proposition.

Example 981 Recall the initial example in which we considered two different choice sets, $\mathbb{R}$ and $[1, 2]$, for the function $f(x) = 1 - x^2$. We had $\max_{x \in [1, 2]} f(x) = 0 < 1 = \max_{x \in \mathbb{R}} f(x)$, in accordance with the last proposition. N

In contrast, adding constraints is never beneficial and may even result in empty choice sets. For instance, suppose $C'$ is the set of all points $x = (x_1, x_2, x_3) \in \mathbb{R}^3$ such that
$$x_1 + x_2 + x_3 < 2 \tag{22.5}$$
$$x_1 - x_2 + x_3 \ge 2 \tag{22.6}$$
This choice set is not empty: for instance, the point $(0, -5, -3)$ belongs to it. Now, let $C$ be the choice set that satisfies these constraints as well as the additional one
$$x_2 \ge 0 \tag{22.7}$$
So, $C \subseteq C'$. Yet, too many constraints: the choice set $C$ is empty. Indeed, suppose by contradiction that $C \ne \emptyset$ and let $x \in C$. By (22.6) and (22.7), $x_1 + x_3 \ge 2 + x_2 \ge 2$. Along with (22.5), this implies $2 > x_1 + x_2 + x_3 \ge x_1 + x_3 \ge 2$. This contradiction shows that $C = \emptyset$.

Optimization problems with concave objective functions and convex choice sets are all-important in applications. We begin with a basic, yet remarkable, property of their solution sets.⁷

Proposition 982 Let $C$ be a convex set in $\mathbb{R}^n$. If $f : C \to \mathbb{R}$ is concave, then $\arg\max_{x \in C} f(x)$ is convex.

Proof Let $\hat{x}_1, \hat{x}_2 \in \arg\max_{x \in C} f(x)$ and let $\alpha \in [0, 1]$. We want to show that $\alpha \hat{x}_1 + (1 - \alpha) \hat{x}_2 \in \arg\max_{x \in C} f(x)$. By concavity,
$$f(\alpha \hat{x}_1 + (1 - \alpha) \hat{x}_2) \ge \alpha f(\hat{x}_1) + (1 - \alpha) f(\hat{x}_2) = \alpha \max_{x \in C} f(x) + (1 - \alpha) \max_{x \in C} f(x) = \max_{x \in C} f(x)$$
Therefore, $f(\alpha \hat{x}_1 + (1 - \alpha) \hat{x}_2) = \max_{x \in C} f(x)$, i.e., $\alpha \hat{x}_1 + (1 - \alpha) \hat{x}_2 \in \arg\max_{x \in C} f(x)$.

Since the solution set $\arg\max_{x \in C} f(x)$ is convex, there are three possibilities:

(i) $\arg\max_{x \in C} f(x)$ is empty: there are no maximizers;

(ii) $\arg\max_{x \in C} f(x)$ is a singleton: there exists a unique maximizer;

(iii) $\arg\max_{x \in C} f(x)$ consists of infinitely many points: there exist infinitely many maximizers.⁸
⁷ To ease exposition, we consider a function $f$ defined directly on a convex choice set $C$. Of course, $f$ can be seen as the restriction of a function defined on a larger domain $A$ that includes $C$.
⁸ Indeed, if $\hat{x}_1$ and $\hat{x}_2$ are two distinct maximizers, all their convex combinations $\alpha \hat{x}_1 + (1 - \alpha) \hat{x}_2$, as $\alpha$ varies in $[0, 1]$, are still maximizers because of the convexity of $\arg\max_{x \in C} f(x)$.

Thus, under concavity we cannot have finitely many distinct solutions, like 3 or 7.

Example 983 (i) The logarithmic function $f : (0, \infty) \to \mathbb{R}$ defined by $f(x) = \log x$ is strictly concave. It is easy to see that it has no maximizers, that is, $\arg\max_{x > 0} f(x) = \emptyset$.

(ii) The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = 1 - x^2$ is strictly concave and has a unique maximizer $\hat{x} = 0$, so that $\arg\max_{x \in \mathbb{R}} f(x) = \{0\}$.

(iii) Define $f : \mathbb{R} \to \mathbb{R}$ by
$$f(x) = \begin{cases} x & \text{if } x \le 1 \\ 1 & \text{if } x \in (1, 2) \\ 3 - x & \text{if } x \ge 2 \end{cases}$$
with graph

[Figure: graph of this tent-shaped function, flat at height $1$ on $[1, 2]$]

The function $f$ is concave and $\arg\max_{x \in \mathbb{R}} f(x) = [1, 2]$. N

The last function of this example, with infinitely many maximizers, is concave but not strictly concave. The next result shows that, indeed, strict concavity implies that maximizers, if they exist, are necessarily unique. In other words, for strictly concave functions the solution set is at most a singleton.

Proposition 984 Let $C$ be a convex set in $\mathbb{R}^n$. If $f : C \to \mathbb{R}$ is strictly concave, then $\arg\max_{x \in C} f(x)$ is at most a singleton.

Proof Let $\hat{x}_1, \hat{x}_2 \in C$ be two maximizers. We want to show that $\hat{x}_1 = \hat{x}_2$. Suppose, by contradiction, that $\hat{x}_1 \ne \hat{x}_2$. Since $\hat{x}_1$ and $\hat{x}_2$ are maximizers, we have $f(\hat{x}_1) = f(\hat{x}_2) = \max_{x \in C} f(x)$. Set
$$z = \frac{1}{2} \hat{x}_1 + \frac{1}{2} \hat{x}_2$$
Since $C$ is convex, $z \in C$. Moreover, by strict concavity,
$$f(z) = f\left( \frac{1}{2} \hat{x}_1 + \frac{1}{2} \hat{x}_2 \right) > \frac{1}{2} f(\hat{x}_1) + \frac{1}{2} f(\hat{x}_2) = \frac{1}{2} \max_{x \in C} f(x) + \frac{1}{2} \max_{x \in C} f(x) = \max_{x \in C} f(x) \tag{22.8}$$
which is a contradiction (no element of $C$ can have a strictly higher value than the maximum value). We conclude that $\hat{x}_1 = \hat{x}_2$, as desired.

In the last example, $f(x) = 1 - x^2$ is a strictly concave function with a unique maximizer $\hat{x} = 0$, while $f(x) = \log x$ is a strictly concave function that has no maximizers. The clause "at most" is, therefore, indispensable because, unfortunately, maximizers might not exist.

Having (at most) a unique maximizer is the key characteristic of strictly concave functions that motivates their widespread use in economic applications. Indeed, strict quasi-concavity is the simplest condition which guarantees the uniqueness of the maximizer, a key property for comparative statics exercises (as we remarked earlier in the chapter).

22.1.3 Cogito ergo solvo

Optimization problems are often solved through the differential methods that will be studied later in the book. However, before using any "method", it is important to ponder the problem at hand and see whether our insight can suggest anything relevant about it. This often permits us to simplify the problem, sometimes even to guess a solution that we can then try to verify.

We will illustrate all this through a few optimization problems, some of them inspired by classic economic problems. Here, however, we abstract from applications and treat them in purely analytical terms.
Example 985 Let $f : \mathbb{R} \to \mathbb{R}$ be the scalar function defined by $f(x) = (1 - x^2)^3$. Consider the optimization problem
$$\max_x f(x) \quad \text{sub } x \ge 0$$
Through the strictly increasing function $g : \mathbb{R} \to \mathbb{R}$ defined by the cube root $g(x) = x^{1/3}$, we get the transformed objective function $\tilde{f} : \mathbb{R} \to \mathbb{R}$ given by $\tilde{f}(x) = g(f(x)) = 1 - x^2$. The problem
$$\max_x \tilde{f}(x) \quad \text{sub } x \ge 0$$
is equivalent to the previous one by Proposition 978 but, clearly, it is more tractable. We can actually do better by getting rid of the constant $1$ in the objective function (constants affect the maximum value but not the maximizers). So, we can just study the problem
$$\max_x -x^2 \quad \text{sub } x \ge 0$$
which features a strictly concave objective function. Clearly, the unique solution is $\hat{x} = 0$. By plugging it into the original objective function, we get the maximum value $f(\hat{x}) = 1$.⁹ N

Example 986 Let $f : \mathbb{R}^2_{++} \to \mathbb{R}$ be defined by $f(x) = \log x_1 + \log x_2$. Consider the optimization problem
$$\max_x f(x) \quad \text{sub } x_1 + x_2 = 1$$
The objective function is strictly concave and the choice set $\{x \in \mathbb{R}^2_{++} : x_1 + x_2 = 1\}$ is convex. By Proposition 984, there is at most a unique solution $\hat{x}$. The problem is symmetric in each $x_i$, so it is natural to guess a symmetric solution $\hat{x}$ with equal components $\hat{x}_1 = \hat{x}_2$. If so, $\hat{x}_1 = \hat{x}_2 = 1/2$ because of the constraint $x_1 + x_2 = 1$. Let us verify this guess. Since the logarithmic function is strictly concave, if $y \ne \hat{x}$ and $y_1 + y_2 = 1$, we have
$$f(y) - f(\hat{x}) = \log 2y_1 + \log 2y_2 = 2\left( \frac{1}{2} \log 2y_1 + \frac{1}{2} \log 2y_2 \right) < 2 \log (y_1 + y_2) = 2 \log 1 = 0$$
So, $\hat{x}$ indeed uniquely solves the problem. Here the maximum value is $f(\hat{x}) = -\log 4$. N

⁹ By taking the strictly increasing transformation $x^{1/3} - 1$ rather than $x^{1/3}$ we get directly $-x^2$. A suitable choice of the transformation thus speeds up matters.

The next three examples are a bit more complicated, but they are important in applications and show how a little thinking can save many calculations. For a given vector $\alpha = (\alpha_1, \dots, \alpha_n) \in \mathbb{R}^n_{++}$ and scalar $\delta > 0$, all these examples study, under different objective functions $f$, the optimization problem
$$\max_x f(x) \quad \text{sub } x \in C \tag{22.9}$$
with convex choice set $C = \{x \in \mathbb{R}^n_+ : \alpha \cdot x = \delta\}$.

Example 987 Consider problem (22.9) with a Cobb-Douglas objective function $f : \mathbb{R}^n_+ \to \mathbb{R}$ defined by
$$f(x) = \prod_{i=1}^{n} x_i^{a_i}$$
with $\sum_{i=1}^{n} a_i = 1$ and $a_i > 0$ for each $i$. It is easy to see that the maximizers belong to $\mathbb{R}^n_{++}$, that is, they have strictly positive components. Indeed, if $x$ lies on some axis of $\mathbb{R}^n$, i.e., $x_i = 0$ for some $i$, then $f(x) = 0$. Since $f \ge 0$ on $C$, with $f > 0$ at the strictly positive elements of $C$, it is easy to see that such $x$ cannot solve the problem. For this reason, in place of (22.9) we can consider the equivalent optimization problem
$$\max_x f(x) \quad \text{sub } x \in C \cap \mathbb{R}^n_{++} \tag{22.10}$$
We can do better: since $f > 0$ on $\mathbb{R}^n_{++}$, we can consider the logarithmic transformation $\tilde{f} = \log f$ of the objective function $f$, that is, the log-linear function $\tilde{f}(x) = \sum_{i=1}^{n} a_i \log x_i$. The problem
$$\max_x \tilde{f}(x) \quad \text{sub } x \in C \cap \mathbb{R}^n_{++} \tag{22.11}$$
is equivalent to the previous one by Proposition 978. It is, however, more tractable because of the log-linear form of the objective function.

Let us ponder problem (22.11). It has a strictly concave objective function and a convex choice set $C \cap \mathbb{R}^n_{++}$. By Proposition 984, there is at most a unique solution $\hat{x}$. With this, suppose first that the coefficients $a_i$ and $\alpha_i$ are both equal among themselves, with $a_i = 1/n$ (because $\sum_{i=1}^{n} a_i = 1$) and $\alpha_i = 1$ for each $i$. The problem is then symmetric in each $x_i$, so it is natural to guess a symmetric solution $\hat{x}$, with $\hat{x}_1 = \dots = \hat{x}_n$. Then, $\hat{x}_i = \delta a_i$ for each $i$ because of the constraint $\sum_{i=1}^{n} x_i = \delta$. If, instead, the coefficients differ, the asymmetry in the solutions should depend on the coefficients $\alpha_i$ and $a_i$ peculiar to each $x_i$. An (educated) guess is that
$$\hat{x} = \left( \delta \frac{a_1}{\alpha_1}, \dots, \delta \frac{a_n}{\alpha_n} \right) \tag{22.12}$$
Let us verify this guess. We have $\hat{x} \in C \cap \mathbb{R}^n_{++}$ because $\hat{x} \in \mathbb{R}^n_{++}$ and
$$\sum_{i=1}^{n} \alpha_i \hat{x}_i = \sum_{i=1}^{n} \alpha_i \, \delta \frac{a_i}{\alpha_i} = \delta \sum_{i=1}^{n} a_i = \delta$$
We now show that $\sum_{i=1}^{n} a_i \log y_i < \sum_{i=1}^{n} a_i \log \hat{x}_i$ for every $y \in C \cap \mathbb{R}^n_{++}$ with $y \ne \hat{x}$. Since $\log x$ is strictly concave, by Jensen's inequality (17.15) we have
$$\sum_{i=1}^{n} a_i \log y_i - \sum_{i=1}^{n} a_i \log \hat{x}_i = \sum_{i=1}^{n} a_i \log \frac{\alpha_i y_i}{\delta a_i} < \log \sum_{i=1}^{n} a_i \frac{\alpha_i y_i}{\delta a_i} = \log \left( \frac{1}{\delta} \sum_{i=1}^{n} \alpha_i y_i \right) = \log 1 = 0$$
as desired. We conclude that (22.12) is the unique solution of the problem. N
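The closed form (22.12) lends itself to a numerical sanity check. The sketch below is ours (not from the text) and assumes SciPy is available; the coefficients $a$, $\alpha$ and $\delta$ are arbitrary choices.

```python
# A minimal numerical check (ours) of (22.12) via constrained optimization.
import numpy as np
from scipy.optimize import minimize

a     = np.array([0.5, 0.3, 0.2])   # a_i > 0 summing to 1
alpha = np.array([1.0, 2.0, 4.0])   # alpha_i > 0
delta = 10.0

neg_f = lambda x: -np.sum(a * np.log(x))                 # minus the log objective
budget = {'type': 'eq', 'fun': lambda x: alpha @ x - delta}
res = minimize(neg_f, x0=np.ones(3), method='SLSQP',
               constraints=[budget], bounds=[(1e-9, None)] * 3)

print(res.x)              # numerical solution
print(delta * a / alpha)  # closed form (22.12): [5.0, 1.5, 0.5]
```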

Example 988 Consider problem (22.9) with a Leontief objective function $f : \mathbb{R}^n \to \mathbb{R}$ defined by
$$f(x) = \min_{i = 1, \dots, n} x_i$$
Recall that $f$ is concave but not strictly concave (cf. Example 817). Because of the symmetry of the objective function, we again guess a symmetric solution $\hat{x}$, which then has the form
$$\hat{x} = \left( \frac{\delta}{\sum_{i=1}^{n} \alpha_i}, \dots, \frac{\delta}{\sum_{i=1}^{n} \alpha_i} \right) \tag{22.13}$$
because of the constraint. To verify this guess, let $x^* \in C$ be a solution of the problem, so that $f(x^*) \ge f(y)$ for all $y \in C$. As we will see, by Weierstrass' Theorem such a solution exists. We want to show that $x^* = \hat{x}$. It is easy to check that, if $k = (k, \dots, k) \in \mathbb{R}^n$ is a constant vector and $\lambda \ge 0$ is a positive scalar, we have
$$f(\lambda x + k) = \lambda f(x) + k \qquad \forall x \in \mathbb{R}^n \tag{22.14}$$
In turn, this implies
$$f(x^*) \ge f\left( \frac{1}{2} x^* + \frac{1}{2} \hat{x} \right) = \frac{1}{2} f(x^*) + \frac{1}{2} \frac{\delta}{\sum_{i=1}^{n} \alpha_i}$$
So, $\min_{i = 1, \dots, n} x_i^* = f(x^*) \ge \delta / \sum_{i=1}^{n} \alpha_i$, that is, $x_i^* \ge \delta / \sum_{i=1}^{n} \alpha_i$ for each $i$. Suppose $x^* \ne \hat{x}$, that is, $x^* > \hat{x}$. Since $x^* \in C$, we reach the contradiction
$$\delta = \sum_{i=1}^{n} \alpha_i x_i^* > \sum_{i=1}^{n} \alpha_i \frac{\delta}{\sum_{j=1}^{n} \alpha_j} = \delta$$
We conclude that $x^* = \hat{x}$. The constant vector (22.13) is thus the unique solution of the problem. Interestingly, we have a unique solution even without strict concavity. N
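Also (22.13) can be verified numerically. Since maximizing $\min_i x_i$ subject to $\alpha \cdot x = \delta$ can be rewritten as the linear program of maximizing $t$ subject to $x_i \ge t$, a standard LP solver applies. The sketch is ours (not from the text) and assumes SciPy.

```python
# A minimal LP check (ours) of (22.13): variables (x_1,...,x_n, t), maximize t.
import numpy as np
from scipy.optimize import linprog

alpha, delta = np.array([1.0, 2.0, 4.0]), 10.0
n = len(alpha)

c = np.zeros(n + 1); c[-1] = -1.0                  # minimize -t, i.e., maximize t
A_ub = np.hstack([-np.eye(n), np.ones((n, 1))])    # rows encode t - x_i <= 0
b_ub = np.zeros(n)
A_eq = np.append(alpha, 0.0).reshape(1, -1)        # alpha . x = delta
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[delta],
              bounds=[(0, None)] * n + [(None, None)])

print(res.x[:n])                         # numerical solution
print(np.full(n, delta / alpha.sum()))   # closed form (22.13): all equal to 10/7
```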

Example 989 Since $\delta$ is a strictly positive scalar, it holds
$$\{x \in \mathbb{R}^n_+ : \alpha \cdot x = \delta\} = \delta \{x \in \mathbb{R}^n_+ : \alpha \cdot x = 1\}$$
As the reader can easily check, once we solve problem (22.9) for a normalized choice set
$$C = \{x \in \mathbb{R}^n_+ : \alpha \cdot x = 1\} \tag{22.15}$$
with each $\alpha_i > 0$, we can then easily retrieve the solutions when $\delta \ne 1$. The use of a normalized choice set permits us to ease notation, a significant simplification.

With this motivation, here we study problem (22.9) with a normalized choice set (22.15). We consider a convex objective function $f : \mathbb{R}^n \to \mathbb{R}$. We start by observing that the elements of the convex choice set (22.15) can be written as convex combinations of the vectors
$$\tilde{e}^i = \frac{1}{\alpha_i} e^i = \left( 0, \dots, 0, \frac{1}{\alpha_i}, 0, \dots, 0 \right) \qquad \forall i = 1, \dots, n$$
Indeed, if $x \in C$ then
$$x = \sum_{i=1}^{n} x_i e^i = \sum_{i=1}^{n} \alpha_i x_i \frac{1}{\alpha_i} e^i = \sum_{i=1}^{n} \alpha_i x_i \tilde{e}^i$$
where $\alpha_i x_i \ge 0$ for each $i$ and $\sum_{i=1}^{n} \alpha_i x_i = 1$ (because $x \in C$). It is easy to check that each $\tilde{e}^i$ belongs to $C$. We are now in a position to say something about the optimization problem (22.9). Since $f$ is convex, we have
$$f(x) = f\left( \sum_{i=1}^{n} \alpha_i x_i \tilde{e}^i \right) \le \sum_{i=1}^{n} \alpha_i x_i f(\tilde{e}^i) \le \max_{i = 1, \dots, n} f(\tilde{e}^i)$$
Thus, to find a maximizer it is enough to check which $\tilde{e}^i$ receives the highest evaluation under $f$. Since the vectors $\tilde{e}^i$ lie on the axes of $\mathbb{R}^n$, in this way we find what in the economics jargon are called corner solutions.
That said, there might well be maximizers that this simple reasoning may neglect. In other words, we only showed that:
$$\arg\max_{x \in \{\tilde{e}^1, \dots, \tilde{e}^n\}} f(x) \subseteq \arg\max_{x \in C} f(x)$$
To say something more about all possible maximizers, i.e., about the set $\arg\max_{x \in C} f(x)$, we need to assume more on the objective function $f$. We consider two important cases:

(i) Assume that $f$ is strictly convex. Then, the only maximizers in $C$ are among the vectors $\tilde{e}^j$, that is,
$$\arg\max_{x \in \{\tilde{e}^1, \dots, \tilde{e}^n\}} f(x) = \arg\max_{x \in C} f(x)$$
So, problem (22.9) reduces to the much simpler problem
$$\max_x f(x) \quad \text{sub } x \in \{\tilde{e}^1, \dots, \tilde{e}^n\} \tag{22.16}$$

Indeed, strict convexity yields a strict inequality as soon as $\alpha_i x_i > 0$ for at least two indexes $i$, that is,
$$f(x) = f\left( \sum_{i=1}^{n} \alpha_i x_i \tilde{e}^i \right) < \sum_{i=1}^{n} \alpha_i x_i f(\tilde{e}^i)$$
For instance, consider the problem
$$\max_x x_1^2 + x_2^2 + x_3^2 \quad \text{sub } \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3 = 1$$
It is enough to solve the problem
$$\max_x x_1^2 + x_2^2 + x_3^2 \quad \text{sub } x \in \left\{ \left( \frac{1}{\alpha_1}, 0, 0 \right), \left( 0, \frac{1}{\alpha_2}, 0 \right), \left( 0, 0, \frac{1}{\alpha_3} \right) \right\}$$
For example, if $\alpha_1 < \alpha_2 < \alpha_3$, then $\tilde{e}^1 = (1/\alpha_1, 0, 0)$ is the only solution, while if $\alpha_1 = \alpha_2 < \alpha_3$, then $\tilde{e}^1 = (1/\alpha_1, 0, 0)$ and $\tilde{e}^2 = (0, 1/\alpha_2, 0)$ are the only two solutions.

(ii) Assume that $f$ is affine, i.e., $f(x) = a_0 + a_1 x_1 + \dots + a_n x_n$. Then, the set of maximizers consists of the vectors $\tilde{e}^j$ that solve problem (22.16) and of their convex combinations (as the reader can easily check). That is,
$$\operatorname{co} \left( \arg\max_{x \in \{\tilde{e}^1, \dots, \tilde{e}^n\}} f(x) \right) = \arg\max_{x \in C} f(x)$$
where the left-hand side is the convex envelope of the vectors in $\arg\max_{x \in \{\tilde{e}^1, \dots, \tilde{e}^n\}} f(x)$, which is a polytope. For instance, consider the problem
$$\max_x a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 \quad \text{sub } \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3 = 1 \tag{22.17}$$
as well as the simpler problem
$$\max_x a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 \quad \text{sub } x \in \left\{ \left( \frac{1}{\alpha_1}, 0, 0 \right), \left( 0, \frac{1}{\alpha_2}, 0 \right), \left( 0, 0, \frac{1}{\alpha_3} \right) \right\} \tag{22.18}$$
For instance, if $a_1 / \alpha_1 > a_2 / \alpha_2 > a_3 / \alpha_3$, then $\tilde{e}^1 = (1/\alpha_1, 0, 0)$ is the only solution of problem (22.18), so of problem (22.17). On the other hand, if $a_1 / \alpha_1 = a_2 / \alpha_2 > a_3 / \alpha_3$, then $\tilde{e}^1 = (1/\alpha_1, 0, 0)$ and $\tilde{e}^2 = (0, 1/\alpha_2, 0)$ solve problem (22.18), so the polytope
$$\operatorname{co}\{\tilde{e}^1, \tilde{e}^2\} = \{t \tilde{e}^1 + (1 - t) \tilde{e}^2 : t \in [0, 1]\} = \left\{ \left( \frac{t}{\alpha_1}, \frac{1 - t}{\alpha_2}, 0 \right) : t \in [0, 1] \right\}$$
is the set of all solutions of problem (22.17).

To sum up, some simple arguments show that optimization problems featuring convex objective functions and linear constraints have corner solutions. Section 22.6.2 will discuss these problems, which often arise in applications. N
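For strictly convex objectives, the reduction to problem (22.16) turns the search for maximizers into a finite comparison, which takes one line of code. The sketch below is ours (not from the text); the weights $\alpha$ are an arbitrary choice.

```python
# A minimal sketch (ours): corner solutions of max sum x_i^2 on alpha . x = 1.
import numpy as np

alpha = np.array([1.0, 2.0, 4.0])
f = lambda x: np.sum(x**2)  # strictly convex, as in case (i) above

corners = [np.eye(3)[i] / alpha[i] for i in range(3)]  # the vectors e~i
best = max(corners, key=f)
print(best)  # (1, 0, 0) = e~1, since alpha_1 is the smallest weight
```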

22.1.4 Consumption and production

To illustrate the centrality of optimization problems in economics, we consider the classic optimization problems of the consumer and of the producer.

Consider first a consumer whose preference is represented by a utility function $u : A \subseteq \mathbb{R}^n_+ \to \mathbb{R}$, where the domain $A$ is a set of bundles $x = (x_1, x_2, \dots, x_n)$ of $n$ goods, called the consumption set of the consumer. It consists of the bundles that are of interest to the consumer.

Example 990 (i) Let $u : \mathbb{R}^2_+ \to \mathbb{R}$ be the CES utility function
$$u(x) = (\alpha x_1^\rho + (1 - \alpha) x_2^\rho)^{\frac{1}{\rho}}$$
with $\alpha \in [0, 1]$ and $\rho \in (0, 1]$. In this case the consumption set is $A = \mathbb{R}^2_+$.

(ii) Let $u : \mathbb{R}^2_{++} \to \mathbb{R}$ be the log-linear utility function
$$u(x) = a \log x_1 + (1 - a) \log x_2$$
with $a \in (0, 1)$. Here the consumption set is $A = \mathbb{R}^2_{++}$. CES and log-linear consumers have, therefore, different consumption sets.

(iii) Suppose that the consumer has a subsistence bundle $\underline{x} \gg 0$, so that he can consider only bundles $x \ge \underline{x}$ (in order to survive). In this case it is natural to take as consumption set the closed and convex set
$$A = \{x \in \mathbb{R}^n_+ : x \ge \underline{x}\} \subseteq \mathbb{R}^n_{++} \tag{22.19}$$
For instance, we can consider the restrictions of CES and log-linear utility functions to this set $A$. N

Denote by $p = (p_1, p_2, \dots, p_n) \in \mathbb{R}^n_+$ the vector of the market prices of the goods. Suppose that the consumer has income $w \ge 0$. The budget set of the consumer
$$B(p, w) = \{x \in \mathbb{R}^n_+ : p \cdot x \le w\}$$
consists of the affordable bundles, i.e., of the bundles that he can purchase given the vector of prices $p$ and his income $w$. We write $B(p, w)$ to highlight the dependence of the budget set on $p$ and on $w$. For example,
$$w \le w' \implies B(p, w) \subseteq B(p, w') \tag{22.20}$$
that is, to a greater income there corresponds a larger budget set. Analogously,
$$p \ge p' \implies B(p, w) \subseteq B(p', w) \tag{22.21}$$
that is, to lower prices there corresponds a larger budget set.


The next two results show some remarkable properties of the budget set. We begin with convexity.

Proposition 991 The budget set $B(p, w)$ is convex.

Proof Let $x, y \in B(p, w)$ and $\alpha \in [0, 1]$. We have
$$p \cdot (\alpha x + (1 - \alpha) y) = \alpha (p \cdot x) + (1 - \alpha)(p \cdot y) \le \alpha w + (1 - \alpha) w = w$$
Hence, $\alpha x + (1 - \alpha) y \in B(p, w)$. The budget set is therefore convex.

Interestingly, the compactness of budget sets requires the price of each good to be strictly positive, so that none of them is free.

Proposition 992 The budget set $B(p, w)$ is compact if $p \gg 0$.

The importance of the no-free-goods condition $p \gg 0$ is obvious: if some of the goods were free (and available in unlimited quantity), the consumer could obtain any quantity of them and the budget set would then be unbounded. Note that when $w = 0$ and $p \gg 0$ we have $B(p, w) = \{0\}$ and so, being a singleton, the budget set is trivially compact.

Proof Let $p \gg 0$. Let us show that $B(p, w)$ is closed. Consider a sequence of bundles $\{x^k\} \subseteq B(p, w)$ such that
$$x^k = (x_1^k, \dots, x_n^k) \to x = (x_1, \dots, x_n)$$
Since $p \cdot x^k \le w$ for each $k \ge 1$, we have $p \cdot x = \lim_k p \cdot x^k \le w$. Therefore, $x \in B(p, w)$. By Theorem 174, the set $B(p, w)$ is closed.

We are left to show that $B(p, w)$ is a bounded set. Suppose, per contra, that there exists a sequence $\{x^k\} \subseteq B(p, w)$ such that
$$\lim_{k \to \infty} x_i^k = +\infty$$
for some good $i$ (i.e., such that the quantity of good $i$ gets unboundedly larger and larger along the sequence $\{x^k\}$ of bundles). Since $p \gg 0$, in particular it holds $p_i > 0$. As $x^k \ge 0$ for all $k \ge 1$, we therefore have
$$p \cdot x^k \ge p_i x_i^k \ge 0 \qquad \forall k \ge 1$$
We thus reach the contradiction
$$w \ge \lim_{k \to \infty} p \cdot x^k \ge \lim_{k \to \infty} p_i x_i^k = p_i \lim_{k \to \infty} x_i^k = +\infty$$
We conclude that $B(p, w)$ is both closed and bounded, i.e., it is compact.

The consumer (optimization) problem consists in maximizing the consumer's utility function $u : A \subseteq \mathbb{R}^n_+ \to \mathbb{R}$ on the budget set $B(p, w)$, that is,
$$\max_x u(x) \quad \text{sub } x \in B(p, w) \cap A \tag{22.22}$$
Given prices and income, the intersection $B(p, w) \cap A$ of the budget set and of the consumption set is the choice set of the consumer problem, consisting of the bundles that are both affordable and relevant for the consumer. Let us denote by $C(p, w)$ this choice set, that is,
$$C(p, w) = B(p, w) \cap A = \{x \in A : p \cdot x \le w\}$$
We can then conveniently write the consumer problem as
$$\max_x u(x) \quad \text{sub } x \in C(p, w)$$
Consumers with different consumption sets may feature different choice sets even when they confront the same budget set: for instance, for a log-linear consumer we have $C(p, w) = \{x \in \mathbb{R}^n_{++} : p \cdot x \le w\}$, while for a CES consumer we have $C(p, w) = B(p, w)$. In particular, the CES consumer exemplifies the important case $A = \mathbb{R}^n_+$ when the consumer problem takes the simpler form
$$\max_x u(x) \quad \text{sub } x \in B(p, w) \tag{22.23}$$
A bundle $\hat{x} \in C(p, w)$ is optimal when it solves the optimization problem (22.22), i.e., when
$$u(\hat{x}) \ge u(x) \qquad \forall x \in C(p, w)$$
In particular, $\max_{x \in C(p, w)} u(x) = u(\hat{x})$ is the maximum utility that can be attained by the consumer.

By Proposition 978, every strictly increasing transformation $\tilde{u} = g \circ u$ of $u$ defines an optimization problem
$$\max_x \tilde{u}(x) \quad \text{sub } x \in C(p, w) \tag{22.24}$$
equivalent to the original one (22.22) in that it has the same solutions, i.e., the same optimal bundles. The choice of which one to solve among these equivalent problems is only a matter of analytical convenience. The utility functions $\tilde{u}$ and $u$ are thus equivalent objective functions. Such equivalence is also economic in that they represent the same underlying preference (Section 6.8). These economic and mathematical equivalences shed light on one another.

Example 993 The log-linear utility function $u(x) = \sum_{i=1}^{n} a_i \log x_i$ is an analytically convenient transformation of the Cobb-Douglas utility function, as already observed. N
The maximum utility $\max_{x \in C(p, w)} u(x)$ depends on the income $w$ and on the price vector $p$. So, we can define a function $v : \mathbb{R}^n_{++} \times [0, \infty) \to \mathbb{R}$ by
$$v(p, w) = \max_{x \in C(p, w)} u(x) \qquad \forall (p, w) \in \mathbb{R}^n_{++} \times [0, \infty)$$
It is called the indirect utility function.¹⁰ When prices and income vary, it indicates how the maximum utility that the consumer may attain varies.
Example 994 The unique optimal bundle for the log-linear utility function $u(x) = a \log x_1 + (1 - a) \log x_2$, with $a \in (0, 1)$, is given by $\hat{x}_1 = a w / p_1$ and $\hat{x}_2 = (1 - a) w / p_2$ (Example 987). It follows that the indirect utility function associated to the log-linear utility function is
$$v(p, w) = u(\hat{x}) = a \log \frac{a w}{p_1} + (1 - a) \log \frac{(1 - a) w}{p_2}$$
$$= a (\log a + \log w - \log p_1) + (1 - a)(\log (1 - a) + \log w - \log p_2)$$
$$= \log w + a \log a + (1 - a) \log (1 - a) - (a \log p_1 + (1 - a) \log p_2)$$
for every $(p, w) \in \mathbb{R}^2_{++} \times \mathbb{R}_{++}$. N
¹⁰ Here, we are tacitly assuming that a maximizer exists for every pair $(p, w)$ of prices and income. Later in the chapter we will present results, namely Weierstrass' and Tonelli's theorems, that guarantee this.
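The closed forms in Example 994 can be cross-checked numerically. The sketch below is ours (not from the text) and assumes SciPy is available; the values of $a$, $p$ and $w$ are arbitrary.

```python
# A minimal numerical check (ours) of the log-linear consumer problem.
import numpy as np
from scipy.optimize import minimize

a, p, w = 0.3, np.array([2.0, 5.0]), 100.0

neg_u = lambda x: -(a * np.log(x[0]) + (1 - a) * np.log(x[1]))
budget = {'type': 'ineq', 'fun': lambda x: w - p @ x}   # p . x <= w
res = minimize(neg_u, x0=np.array([1.0, 1.0]), method='SLSQP',
               constraints=[budget], bounds=[(1e-9, None)] * 2)

x_hat = np.array([a * w / p[0], (1 - a) * w / p[1]])
print(res.x, x_hat)              # both approximately (15, 14)
print(p @ res.x)                 # approximately w = 100: Walras' law
print(-res.fun, -neg_u(x_hat))   # the indirect utility v(p, w)
```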

Thanks to (22.20) and (22.21), the property of monotonicity seen in Proposition 980 takes the following form for indirect utility functions.

Proposition 995 Let $w, w' \ge 0$ and $p, p' \gg 0$. Then,
$$w \le w' \implies v(p, w) \le v(p, w') \qquad \text{and} \qquad p \ge p' \implies v(p, w) \le v(p', w)$$
In other words, consumers always benefit both from a higher income and from lower prices, regardless of their utility functions (provided they are continuous).

As previously observed (Section 6.4.4), it is natural to assume that the utility function $u : A \subseteq \mathbb{R}^n_+ \to \mathbb{R}$ is, at least, increasing. By Proposition 979, if we assume that $u$ is actually strongly increasing, the solution of the consumer problem belongs to the boundary of the budget set. Yet, a sharper result holds because of the particular form of the budget set. To ease matters, we assume that $A = \mathbb{R}^n_+$, so the consumer problem takes the form (22.23).

Proposition 996 (Walras' Law) Let $u : \mathbb{R}^n_+ \to \mathbb{R}$ be strongly increasing. If $\hat{x}$ is a solution of the consumer problem, then $p \cdot \hat{x} = w$.

Proof Let $x \in B(p, w)$ be such that $p \cdot x < w$. It is easy to see that there exists $y \gg x$ such that $p \cdot y \le w$. Indeed, taking any $0 < \varepsilon < (w - p \cdot x) / \sum_{i=1}^{n} p_i$, it is sufficient to set
$$y = (x_1 + \varepsilon, \dots, x_n + \varepsilon)$$
Since $u$ is strongly increasing, we have $u(y) > u(x)$ and therefore $x$ cannot be a solution of the consumer problem.

The consumer therefore allocates all his income to the purchase of an optimal bundle $\hat{x}$, that is, $p \cdot \hat{x} = w$.¹¹ This property is called Walras' law. Thanks to it, in the consumer problem with strongly increasing utility functions $u : \mathbb{R}^n_+ \to \mathbb{R}$ we can replace the budget set $B(p, w)$ with the budget line
$$\beta(p, w) = \{x \in \mathbb{R}^n_+ : p \cdot x = w\} \subseteq \partial B(p, w)$$
Problem (22.23) then reduces to
$$\max_x u(x) \quad \text{sub } x \in \beta(p, w)$$
which is the form of the consumer problem often studied in introductory courses.

Turn now to a producer who must decide the quantity $y$ of a given output to produce. In taking such a decision the producer must consider both the revenue $r(y)$ that he will obtain by selling the quantity $y$ and the cost $c(y)$ that he will bear to produce it.

Let $r : [0, \infty) \to \mathbb{R}$ be the revenue function and $c : [0, \infty) \to \mathbb{R}$ be the cost function of the producer. His profit is therefore represented by the function $\pi : [0, \infty) \to \mathbb{R}$ given by
$$\pi(y) = r(y) - c(y)$$

¹¹ Proposition 996 is sharper than Proposition 979 because there exist points of the boundary $\partial B(p, w)$ such that $p \cdot x < w$. For example, $0 \in \partial B(p, w)$.

The producer (optimization) problem is to maximize his pro t function : [0; 1) ! R, that
is,
max (y) sub y 0 (22.25)
y

In particular, a quantity y^ 0 of output is a maximizer if

(^
y) (y) 8y 0

while maxy2[0;1) (y) is the maximum pro t that can be obtained by the producer. The set
of the (pro t) maximizing outputs is arg maxy2[0;1) (y).

The form of the revenue function depends on the structure of the market in which the producer sells the output, while that of the cost function depends on the structure of the market where the producer buys the inputs necessary to produce the good. Let us consider some classic market structures.

(i) The output market is perfectly competitive, so that its sale price $p \ge 0$ is independent of the quantity that the producer decides to produce. In such a case the revenue function $r : [0, \infty) \to \mathbb{R}$ is given by
$$r(y) = p y$$

(ii) The producer is a monopolist on the output market. Let us suppose that the demand function on this market is $D : [0, \infty) \to \mathbb{R}$, where $D(y)$ denotes the unit price at which the market absorbs the quantity $y$ of the output. Usually, for obvious reasons, we assume that the demand function is decreasing: the market absorbs greater and greater quantities of output as its unit price gets lower and lower. The revenue function $r : [0, \infty) \to \mathbb{R}$ is therefore given by
$$r(y) = y D(y)$$

(iii) The input market is perfectly competitive, that is, the vectors
$$x = (x_1, x_2, \dots, x_n)$$
of inputs necessary for the production of $y$ have prices gathered in the vector
$$w = (w_1, w_2, \dots, w_n) \in \mathbb{R}^n_+$$
that are independent of the quantities that the producer decides to buy ($w_i$ is the price of the $i$-th input). The cost of a vector $x$ of inputs is thus equal to $w \cdot x = \sum_{i=1}^{n} w_i x_i$. But how does this cost translate into a cost function $c(y)$?

To answer this question, assume that $f : \mathbb{R}^n_+ \to \mathbb{R}$ is the production function that the producer has at his disposal to transform a vector $x \in \mathbb{R}^n_+$ of inputs into the quantity $f(x)$ of output. The cost $c(y)$ of producing the quantity $y$ of output is then obtained by minimizing the cost $w \cdot x$ among all the vectors $x \in \mathbb{R}^n_+$ that belong to the isoquant
$$f^{-1}(y) = \{x \in \mathbb{R}^n_+ : f(x) = y\}$$

that is, among all the vectors that allow to produce the quantity y of output. Indeed,
in terms of production the inputs in f 1 (y) are equivalent and so the producer will
opt for the cheaper ones. In other terms, the cost function c : [0; 1) ! R is given by

c (y) = min w x
x2f 1 (y)

that is, it is equal to the minimum value of the minimization problem for the cost w x
on the isoquant f 1 (y). Since the linear objective function w x is continuous, by the
Weierstrass Theorem this problem has a solution, so the cost function is well de ned,
when the isoquant f 1 (y) is compact.

To sum up, a producer who, for example, is a monopolist in the output market and faces
perfect competition in the inputs' markets, has a pro t function

(y) = r (y) c (y) = yD (y) min w x


x2f 1 (y)

Instead, a producer who faces perfect competition in all markets, for the output and the
inputs', has a pro t function

(y) = r (y) c (y) = py min w x


x2f 1 (y)
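As a concrete illustration, the sketch below (ours, not from the text; it assumes SciPy) computes the cost function $c(y)$ for the hypothetical Cobb-Douglas production function $f(x) = \sqrt{x_1 x_2}$ and input prices $w = (4, 1)$, for which one can check by hand that $c(y) = 2 y \sqrt{w_1 w_2} = 4y$.

```python
# A minimal sketch (ours): c(y) = min w . x on the isoquant f(x) = y.
import numpy as np
from scipy.optimize import minimize

w = np.array([4.0, 1.0])            # competitive input prices
f = lambda x: np.sqrt(x[0] * x[1])  # hypothetical production function

def cost(y):
    iso = {'type': 'eq', 'fun': lambda x: f(x) - y}     # stay on the isoquant
    res = minimize(lambda x: w @ x, x0=np.array([y, y]), method='SLSQP',
                   constraints=[iso], bounds=[(1e-9, None)] * 2)
    return res.fun

for y in [1.0, 2.0, 3.0]:
    print(y, cost(y))  # approximately 4y, matching the closed form
```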

22.1.5 Comments

Ordinality Properties of functions that are preserved under strictly increasing transformations are called ordinal, as we mentioned when discussing utility theory (Sections 6.4.4 and 17.3.3). In view of Proposition 978, a property may hold for all equivalent objective functions only if it is ordinal. For instance, all of them can be quasi-concave but not concave (quasi-concavity, but not concavity, is an ordinal property). So, if we are interested in a property of solutions and wonder which properties of objective functions would ensure it, ideally we should look for ordinal properties. If we come up with sufficient conditions that are not so, for instance concavity or continuity conditions, chances are that there exist more general sufficient conditions that are ordinal. In any case, any necessary condition must be ordinal in that it has to hold for all equivalent objective functions.

To illustrate this subtle, yet important, methodological point, consider the uniqueness of solutions, a most desirable property for comparative statics exercises (as we remarked earlier in the chapter). We will soon learn that strict quasi-concavity is an ordinal property that ensures such uniqueness (Theorem 1032). So does strict concavity as well, which is not an ordinal property. Yet, conceptually, strict quasi-concavity is the best way to frame this sufficient condition, though, operationally, strict concavity might be the workable version. What about a necessary condition for uniqueness of solutions? At the end of the chapter we will digress on cuneiformity, an ordinal property that is both necessary and sufficient for uniqueness (Proposition 1067). As soon as we look for necessary conditions, ordinality takes center stage.

Rationality Optimization problems are fundamental also in the natural sciences, as Leonida Tonelli well explains in a 1940 piece: "Maximum and minimum questions have always had a great importance also in the interpretation of natural phenomena because they are governed by a general principle of parsimony. Nature, in its manifestations, tends to save the most possible of what it uses; therefore, the solutions that it finds are always solutions of either minimization or maximization problems". The general principle to which Tonelli alludes, the so-called principle of minimum action, is a metaphysical principle (in the most basic meaning of this term). Not by chance, Tonelli continues by writing "Euler said that, since the construction of the world is the most perfect and was established by the wisest creator, nothing happens in this world without an underlying maximum or minimum principle". In economics, instead, the centrality of optimization problems is based on a (secular) assumption of rationality of economic agents. The resulting optimal choices of the agents, for example optimal bundles for consumers and optimal outputs for producers, are the natural benchmark with respect to which to assess any suboptimal, boundedly rational, behavior that agents may exhibit.

A matter of interpretation As just remarked, optimality is a most important organizing principle in the natural and social sciences. Yet, a proper scientific interpretation of the elements of an optimization problem is all-important. Otherwise, at a purely formal level any alternative can be made optimal by adding suitable constraints. Indeed, in the maximization problem (22.2) let us take an arbitrary element $\bar{x} \in C$. If we further constrain this problem by considering the subset of $C$ given by $C_{\bar{x}} = \{x \in C : f(x) \le f(\bar{x})\}$, then we make $\bar{x}$ optimal, as it trivially solves the maximization problem
$$\max_x f(x) \quad \text{sub } x \in C_{\bar{x}}$$
In an optimization problem one must beware of ad hoc constraints, so as not to undermine its role as an organizing principle.

22.2 Existence: Weierstrass' Theorem

The first fundamental question that arises for optimization problems, of both theoretical and applied relevance, is the existence of a solution. Fortunately, there exist remarkable existence results which guarantee, under very general conditions, the existence of a solution. The most famous and fundamental among them, already introduced for functions of a single variable in Section 13.5, is Weierstrass' Theorem, also known as the Extreme Value Theorem. It guarantees the existence of both a maximizer and a minimizer for continuous functions defined on compact sets. Given the centrality of optimization problems in economic applications, Weierstrass' Theorem is one of the most important results that we present in this book.

Theorem 997 (Weierstrass) A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ continuous on a compact subset $K$ of $A$ admits (at least) a minimizer and (at least) a maximizer in $K$, that is, there exist $x_1, x_2 \in K$ such that
$$f(x_1) = \max_{x \in K} f(x) \quad \text{and} \quad f(x_2) = \min_{x \in K} f(x)$$

Thanks to this result, the optimization problem (22.2), that is,
$$\max_x f(x) \quad \text{sub } x \in C$$
admits a solution whenever $f$ is continuous and $C$ is compact. This holds also for the dual optimization problem with $\min$ in place of $\max$.

The hypotheses of continuity and compactness in Weierstrass' Theorem cannot be weakened, as the simple examples presented in Section 13.5 show.

A classic economic application of Weierstrass' Theorem is the consumer problem.

Proposition 998 If the utility function $u : A \subseteq \mathbb{R}^n_+ \to \mathbb{R}$ is continuous on the closed set $A$, then the consumer problem
$$\max_x u(x) \quad \text{sub } x \in C(p, w)$$
has a solution provided $p \gg 0$.

Proof By Proposition 992, the budget set $B(p, w)$ is compact. As $A$ is closed, the set $C(p, w) = B(p, w) \cap A$ is compact (why?). By the Weierstrass Theorem, the consumer problem then has a solution.

In words, when the utility function is continuous and the consumption set is closed, optimal bundles exist as long as there are no free goods. These conditions are fairly mild and often satisfied.12 In particular, the most important case of a closed consumption set is $A = \mathbb{R}^n_+$, as happens for the CES consumer of the next example.

Example 999 The CES utility function $u : \mathbb{R}^2_+ \to \mathbb{R}$ given by
$$u(x) = \left(\alpha x_1^{\rho} + (1 - \alpha) x_2^{\rho}\right)^{\frac{1}{\rho}}$$
with $\alpha \in [0, 1]$ and $\rho \in (0, 1]$, is continuous. By the Weierstrass Theorem, the consumer problem with this utility function has a solution provided $p \gg 0$. N
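
As a computational aside, here is a minimal Python sketch that solves a CES consumer problem numerically; the parameter values and the use of scipy are illustrative assumptions, not part of the theory.

```python
# Illustrative sketch: maximize the CES utility u(x) = (a*x1^r + (1-a)*x2^r)^(1/r)
# on the budget set {x >= 0 : p.x <= w}; a maximizer exists by Weierstrass' Theorem
# since the budget set is compact when p >> 0.
import numpy as np
from scipy.optimize import minimize

a, r = 0.4, 0.5                             # alpha in [0,1], rho in (0,1]
p, w = np.array([2.0, 3.0]), 10.0

def u(x):
    return (a * x[0]**r + (1 - a) * x[1]**r)**(1 / r)

res = minimize(lambda x: -u(x), x0=[1.0, 1.0],
               bounds=[(0, None), (0, None)],
               constraints=[{"type": "ineq", "fun": lambda x: w - p @ x}])
print(res.x, -res.fun)                      # optimal bundle and maximum utility
```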

The next example shows what may happen with free goods, that is, when the Weierstrass Theorem cannot be applied.

Example 1000 Consider the strictly monotone utility function $u : \mathbb{R}^2_+ \to \mathbb{R}$ defined by $u(x) = x_1 + x_2$. Let $p_2 = 0$. The budget set $B(p, w) = \{x \in \mathbb{R}^2_+ : p_1 x_1 \le w\}$ is unbounded (so not compact) because any amount, however large, of the free good is affordable and so belongs to the budget set. The consumer problem
$$\max_{(x_1, x_2)} x_1 + x_2 \quad \text{sub} \quad p_1 x_1 \le w$$
has no solution. For, suppose per contra that bundle $\hat{x} = (\hat{x}_1, \hat{x}_2) \in B(p, w)$ is optimal. Given any $\varepsilon > 0$, the strictly greater bundle $\hat{x}_\varepsilon = (\hat{x}_1, \hat{x}_2 + \varepsilon)$ still belongs to $B(p, w)$ and has a strictly higher utility than $\hat{x}$ since
$$u(\hat{x}_\varepsilon) = \hat{x}_1 + \hat{x}_2 + \varepsilon > \hat{x}_1 + \hat{x}_2 = u(\hat{x})$$
12 Free goods short-circuit the consumer problem, so constraints may actually help consumers to focus: homo oeconomicus e vinculis ratiocinatur (to paraphrase a sentence of Carl Schmitt).

This contradicts the optimality of $\hat{x}$ and so we conclude that there are no optimal bundles. Intuitively, since the two goods are perfect substitutes for him, the consumer shifts all his consumption onto the free good but then enters a hopeless consumption spree.

Assume that, for some reason, the consumer loses interest in the free good. He now features the strongly, but not strictly, monotone utility function $u(x) = x_1$. In this case,
$$\arg\max_{x \in B(p, w)} u(x) = \left\{ \left( \frac{w}{p_1}, \hat{x}_2 \right) : \hat{x}_2 \ge 0 \right\}$$
The consumer thus exhausts all his wealth on good 1 and is indifferent over any amount of good 2. There exist uncountably many optimal bundles: the presence of a free good now results in an abundance of optimal bundles. N

Given the importance of the Weierstrass Theorem, we close the section with two possible proofs. First, we need an important remark on notation.

Notation In the rest of the book, to simplify notation we denote sequences of vectors too by $\{x_n\}$. If needed, the writing $\{x_n\} \subseteq \mathbb{R}^n$ should clarify the vector nature of the sequence, even though here $n$ denotes both the dimension of the space $\mathbb{R}^n$ and a generic term $x_n$ of the sequence. It is a slight abuse of notation, as the same letter denotes two altogether different entities, but hopefully it should not cause any confusion.

The first proof of Weierstrass' Theorem is based on the following lemma.

Lemma 1001 Let $A$ be a subset of the real line. There exists a sequence $\{a_n\} \subseteq A$ that converges to $\sup A$.

Proof Set $\alpha = \sup A$. Suppose that $\alpha \in \mathbb{R}$. By Proposition 127, for every $\varepsilon > 0$ there exists $a_\varepsilon \in A$ such that $a_\varepsilon > \alpha - \varepsilon$. By taking $\varepsilon = 1/n$ for every $n \ge 1$, it is therefore possible to build a sequence $\{a_n\} \subseteq A$ such that $a_n > \alpha - 1/n$ for every $n$. It is immediate to see that $a_n \to \alpha$.

Suppose now $\alpha = +\infty$. It follows that for every $K > 0$ there exists $a_K \in A$ such that $a_K \ge K$. By taking $K = n$ for every $n \ge 1$, we can therefore build a sequence $\{a_n\}$ such that $a_n \ge n$ for every $n$. It is immediate to see that $a_n \to +\infty$.

First proof of the Weierstrass Theorem Set $\alpha = \sup_{x \in C} f(x)$, that is, $\alpha = \sup f(C)$. By the previous lemma, there exists a sequence $\{a_n\} \subseteq f(C)$ such that $a_n \to \alpha$. Let $\{x_n\} \subseteq C$ be such that $a_n = f(x_n)$ for every $n \ge 1$. Since $C$ is compact, the Bolzano-Weierstrass Theorem yields a subsequence $\{x_{n_k}\} \subseteq \{x_n\}$ that converges to some $\hat{x} \in C$, that is, $x_{n_k} \to \hat{x} \in C$. Since $\{a_n\}$ converges to $\alpha$, also the subsequence $\{a_{n_k}\}$ converges to $\alpha$. Since $f$ is continuous, it follows that
$$\alpha = \lim_{k \to \infty} a_{n_k} = \lim_{k \to \infty} f(x_{n_k}) = f(\hat{x})$$
We conclude that $\hat{x}$ is a solution and $\alpha = \max f(C)$, that is, $\hat{x} \in \arg\max_{x \in C} f(x)$ and $\alpha = \max_{x \in C} f(x)$. A similar argument shows that $\arg\min_{x \in C} f(x)$ is not empty.

The second proof of Weierstrass' Theorem is based on Proposition 597, which says that
the continuous image of a compact set is compact.

Second proof of Weierstrass' Theorem We prove the result for $n = 1$. By Proposition 597, $f(K)$ is compact, so it is bounded. By the Least Upper Bound Principle, there exists $\sup f(K)$. Since $\sup f(K) \in \partial f(K)$ (why?) and $f(K)$ is closed, it follows that $\sup f(K) \in f(K)$. Therefore, $\sup f(K) = \max f(K)$, that is, there exists $x_1 \in K$ such that $f(x_1) = \max_{x \in K} f(x)$. A similar argument shows that $\arg\min_{x \in K} f(x)$ is not empty.
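
As a computational aside, the existence result can be illustrated numerically: on a compact interval, a fine grid search approximates the maximizer and the minimizer whose existence Weierstrass' Theorem guarantees. The function below is an illustrative choice of ours.

```python
# Illustrative sketch: grid search for the extrema of a continuous function
# on the compact set K = [0, 2]; Weierstrass' Theorem guarantees they exist.
import numpy as np

f = lambda x: x * np.sin(3 * x)          # continuous on [0, 2]
xs = np.linspace(0.0, 2.0, 100_001)      # fine grid over K
ys = f(xs)
print("approx maximizer:", xs[ys.argmax()], "max value:", ys.max())
print("approx minimizer:", xs[ys.argmin()], "min value:", ys.min())
```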

22.3 Existence: Tonelli's Theorem

22.3.1 Coercivity

Weierstrass' Theorem guarantees the existence of both maximizers and minimizers. However, when studying optimization problems in economics, one is generally interested in the existence of maximizers or of minimizers, but rarely in both. For example, in many economic applications the existence of maximizers is of crucial importance, while that of minimizers is of little or no interest at all.

For this reason we now introduce a class of functions which, thanks to an ingenious use of Weierstrass' Theorem, are guaranteed to admit maximizers under weaker hypotheses, without making any mention of minimizers.13 Recall that for a function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ the upper contour set $\{x \in A : f(x) \ge t\}$ is denoted by $(f \ge t)$.

Definition 1002 A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is said to be coercive on a subset $C$ of $A$ if there is a scalar $t \in \operatorname{Im} f$ such that the set
$$(f \ge t) \cap C = \{x \in C : f(x) \ge t\} \tag{22.26}$$
is non-empty and compact.

Thus, a function is coercive on $C$ when at least one upper contour set has a non-empty and compact intersection with $C$. In particular, when $A = C$ the function is just said to be coercive, without any further specification.

Next we show that, under some basic conditions, the condition $t \in \operatorname{Im} f$ can be relaxed to just $t \in \mathbb{R}$, that is, the scalar $t$ can be chosen freely, a useful simplification when checking coercivity.

Proposition 1003 If $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is continuous and $C \subseteq A$ is closed, in the last definition the scalar $t$ can be chosen arbitrarily in $\mathbb{R}$.

This result is a simple consequence of the following important property of the upper and lower contour sets of continuous functions, which refines what we saw in Example 595.

Lemma 1004 Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be continuous on a closed subset $C$ of $A$. Then, the sets $(f \ge t) \cap C$ and $(f \le t) \cap C$ are both closed for every $t \in \mathbb{R}$.

The hypothesis that $C$ is closed is crucial. Take for example the identity function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x$. If $C = (0, 1)$, we have $(f \ge t) \cap C = [t, 1)$ for every $t \in (0, 1)$ and these sets are not closed.
13 Needless to say, the theorems of this section can be "flipped over" (just take $-f$) in order to guarantee the existence of minimizers, now without caring about maximizers.

Proof If $(f \ge t)$ is empty, we have that $(f \ge t) \cap C = \emptyset$, which is trivially closed. So, let $(f \ge t)$ be non-empty. Let $\{x_n\} \subseteq (f \ge t) \cap C$ be a sequence converging to $x \in \mathbb{R}^n$. By Theorem 174, to prove that $(f \ge t) \cap C$ is closed one must show that $x \in (f \ge t) \cap C$. The fact that $C$ is closed implies that $x \in C$. The continuity of $f$ at $x$ implies that $f(x_n) \to f(x)$. Since $f(x_n) \ge t$ for every $n$, a simple application of Proposition 320 shows that $f(x) \ge t$, that is, $x \in (f \ge t)$. We conclude that $x \in (f \ge t) \cap C$, as desired. A similar argument proves that also the set $(f \le t) \cap C$ is closed.

Proof of Proposition 1003 Let $(f \ge t) \cap C$ be non-empty and compact for some $t \in \mathbb{R}$. Then, there exists $y \in C$ such that $f(y) \ge t$. Since $f$ is continuous and $C$ is closed, by Lemma 1004 the set $(f \ge f(y)) \cap C$ is closed; it is actually compact because it is a closed subset of the compact set $(f \ge t) \cap C$ (cf. Proposition 173). We conclude that the set $(f \ge f(y)) \cap C$ is compact, proving the coercivity of $f$ on $C$ since $f(y) \in \operatorname{Im} f$.

With this, we next illustrate coercivity with a few examples.

Example 1005 The function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = -x^2$ is coercive. Its graph is a downward parabola

[Figure: graph of the downward parabola $y = -x^2$, cut by a horizontal line $y = t$]

that already suggests its coercivity. Formally, we have
$$\{x \in \mathbb{R} : f(x) \ge t\} = \begin{cases} \left[-\sqrt{-t}, \sqrt{-t}\right] & \text{if } t \le 0 \\ \emptyset & \text{if } t > 0 \end{cases}$$
So, $\{x \in \mathbb{R} : f(x) \ge t\}$ is non-empty and compact for every $t \le 0$. N
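
For readers who like to compute, here is a small Python sketch of ours that reproduces the upper contour sets just derived.

```python
# Illustrative sketch: the upper contour set (f >= t) of f(x) = -x^2 is the
# closed interval [-sqrt(-t), sqrt(-t)] for t <= 0, hence non-empty and compact.
import numpy as np

def upper_contour_interval(t):
    """Endpoints of {x in R : -x^2 >= t}, or None if the set is empty."""
    if t > 0:
        return None                     # -x^2 <= 0 < t for every x
    return (-np.sqrt(-t), np.sqrt(-t))  # a closed, bounded interval

for t in (-4.0, -1.0, 0.0, 0.5):
    print(t, upper_contour_interval(t))
```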



Example 1006 Consider the cosine function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = \cos x$, with graph:

[Figure: graph of $y = \cos x$]

This function is coercive on $[-\pi, \pi]$. For example, for $t = 0$ one has that
$$\{x \in [-\pi, \pi] : f(x) \ge 0\} = \left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$$
More generally, from the graph it is easy to see that the set $\{x \in [-\pi, \pi] : f(x) \ge t\}$ is non-empty and compact for every $t \le 1$. However, the function fails to be coercive on the entire real line: the set $\{x \in \mathbb{R} : f(x) \ge t\}$ is unbounded (so, not compact) for every $t \le 1$ and is empty for every $t > 1$ (as one can easily see from the graph). N

As the last example shows, coercivity is a joint property of the function $f$ and of the set $C$, that is, of the pair $(f, C)$. It is an ordinal property:

Proposition 1007 Given a function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$, let $g : B \subseteq \mathbb{R} \to \mathbb{R}$ be strictly increasing with $\operatorname{Im} f \subseteq B$. The function $f$ is coercive on $C \subseteq A$ if and only if the composite function $g \circ f$ is coercive on $C$.

Proof In proving Proposition 978 we noted that
$$f(x) \ge f(y) \iff (g \circ f)(x) \ge (g \circ f)(y) \qquad \forall x, y \in A \tag{22.27}$$
It follows that $(f \ge t) = (g \circ f \ge g(t))$ for all $t \in \operatorname{Im} f$ and $(g \circ f \ge s) = (f \ge g^{-1}(s))$ for all $s \in \operatorname{Im} g \circ f$. In turn, this readily implies that $f$ is coercive on $C \subseteq A$ if and only if so is $g \circ f$.

Example 1008 Thanks to Example 1005 and Proposition 1007, the famous Gaussian function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = e^{-x^2}$ is coercive. This should be clear by inspection of its graph:

[Figure: graph of the Gaussian function $y = e^{-x^2}$]

which is the well-known "bell curve" found in statistics courses (cf. Section 36.4). N

Continuous functions are coercive on compact sets, a simple consequence of the closedness of their upper contour sets established in Lemma 1004.

Proposition 1009 A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ which is continuous on a compact subset $C$ of $A$ is coercive on $C$.

Proof Let $C \subseteq A$ be compact. If $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is continuous on $C$, Lemma 1004 implies that every set $(f \ge t) \cap C$ is closed. Since a closed subset of a compact set is compact itself, it follows that every $(f \ge t) \cap C$ is compact. Therefore, $f$ is coercive on $C$.

Continuous functions $f$ on compact sets $C$ are, thus, a first relevant example of pairs $(f, C)$ exhibiting coercivity. Let us see a few more examples.

Example 1010 Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = 1 - x^2$. Its graph is:

[Figure: graph of the downward parabola $y = 1 - x^2$]

This function is coercive, as the graph suggests. Formally, we have
$$\{x \in \mathbb{R} : f(x) \ge t\} = \begin{cases} \left[-\sqrt{1 - t}, \sqrt{1 - t}\right] & \text{if } t \le 1 \\ \emptyset & \text{if } t > 1 \end{cases}$$
and so the set $\{x \in \mathbb{R} : f(x) \ge t\}$ is non-empty and compact for every $t \le 1$. For example, for $t = 0$ we have
$$\{x \in \mathbb{R} : f(x) \ge 0\} = [-1, 1]$$
which suffices to conclude that $f$ is coercive; indeed, Definition 1002 requires the mere existence of at least one scalar $t \in \operatorname{Im} f$ for which the set $\{x \in \mathbb{R} : f(x) \ge t\}$ is non-empty and compact. N

Example 1011 The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = e^{-|x|}$ is coercive. Indeed
$$\{x \in \mathbb{R} : f(x) \ge t\} = \begin{cases} \mathbb{R} & \text{if } t \le 0 \\ \left[\log t, -\log t\right] & \text{if } t \in (0, 1] \\ \emptyset & \text{if } t > 1 \end{cases}$$
and so $\{x \in \mathbb{R} : f(x) \ge t\}$ is non-empty and compact for each $t \in (0, 1]$. N

Example 1012 Define $f : \mathbb{R} \to \mathbb{R}$ by
$$f(x) = \begin{cases} \log |x| & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases}$$
and let $C = [-1, 1]$. We have
$$\{x \in \mathbb{R} : f(x) \ge t\} = \begin{cases} \left(-\infty, -e^t\right] \cup \left[e^t, +\infty\right) \cup \{0\} & \text{if } t \le 0 \\ \left(-\infty, -e^t\right] \cup \left[e^t, +\infty\right) & \text{if } t > 0 \end{cases}$$
and so
$$\{x \in \mathbb{R} : f(x) \ge t\} \cap C = \begin{cases} \emptyset & \text{if } t > 0 \\ \left[-1, -e^t\right] \cup \left[e^t, 1\right] \cup \{0\} & \text{if } t \le 0 \end{cases}$$
Thus $f$ is coercive on the compact set $[-1, 1]$. Note that $f$ is discontinuous at $0$, thus making Proposition 1009 inapplicable. N

22.3.2 Tonelli

The fact that coercivity and continuity of a function guarantee the existence of a maximizer is rather intuitive. The upper contour set $(f \ge t)$ indeed "cuts out" the low part of $\operatorname{Im} f$ (the part below the value $t$), leaving untouched the high part, where the maximum value lies. The following result, a version of a result of Leonida Tonelli, formalizes this intuition by establishing the existence of maximizers for coercive functions.

Theorem 1013 (Tonelli) A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ which is coercive and continuous on a subset $C$ of $A$ admits (at least) a maximizer in $C$, that is, there exists $\hat{x} \in C$ such that
$$f(\hat{x}) = \max_{x \in C} f(x)$$
If, in addition, $C$ is closed, then $\arg\max_{x \in C} f(x)$ is compact.

Proof Since $f$ is coercive, there exists $t \in \mathbb{R}$ such that the set $K = (f \ge t) \cap C$ is non-empty and compact. By Weierstrass' Theorem, there exists $\hat{x} \in K$ such that $f(\hat{x}) \ge f(x)$ for every $x \in K$. At the same time, if $x \in C \setminus K$ we have that $f(x) < t$ and so $f(\hat{x}) \ge t > f(x)$. It follows that $f(\hat{x}) \ge f(x)$ for every $x \in C$, that is, $f(\hat{x}) = \max_{x \in C} f(x)$.

It remains to show that $\arg\max_{x \in C} f(x)$ is compact if $C$ is closed. Since $\arg\max_{x \in C} f(x) \subseteq K$, it is enough to show that $\arg\max_{x \in C} f(x)$ is closed (in that a closed subset of a compact set is, in turn, compact). Clearly, we have
$$\arg\max_{x \in C} f(x) = \left(f \ge \max_{x \in C} f(x)\right) \cap C$$
So, $\arg\max_{x \in C} f(x)$ is closed by Lemma 1004, as desired.

Thanks to Proposition 1009, the hypotheses of Tonelli's Theorem are weaker than those of Weierstrass' Theorem. On the other hand, weaker hypotheses lead to a weaker result (as always, no free meals) in which only the existence of a maximizer is guaranteed, without making any mention of minimizers. Since, as we already noted, in many economic optimization problems one is interested in the existence of maximizers, Tonelli's Theorem is important because it allows us to "trim off" hypotheses of Weierstrass' Theorem that are overabundant with respect to our needs. In particular, we can use Tonelli's Theorem in optimization problems where the choice set is not compact; for example, in Chapter 37 we will use it with open choice sets.

To sum up, the optimization problem (22.2), that is,
$$\max_{x} f(x) \quad \text{sub} \quad x \in C$$
has a solution if $f$ is coercive and continuous on $C$. Under such hypotheses, one cannot say anything about the dual minimization problem with $\min$ instead of $\max$.
Example 1014 The functions $f, g : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = 1 - x^2$ and $g(x) = e^{-x^2}$ are both coercive (see Examples 1010 and 1008). Since they are continuous as well, by Tonelli's Theorem we can say that $\arg\max_{x \in \mathbb{R}} f(x) \ne \emptyset$ and $\arg\max_{x \in \mathbb{R}} g(x) \ne \emptyset$; as easily seen from their graphs, for both functions the origin is the global maximizer. Note that, instead, $\arg\min_{x \in \mathbb{R}} f(x) = \arg\min_{x \in \mathbb{R}} g(x) = \emptyset$. Indeed, the set $\mathbb{R}$ is not compact, thus making Weierstrass' Theorem inapplicable. N

A constant function on $\mathbb{R}^n$ is a simple example of a continuous function that, trivially, admits maximizers (and minimizers as well) but is not coercive. So, coercivity is not a necessary condition for the existence of maximizers, even for continuous objective functions. Yet, by Tonelli's Theorem it becomes a sufficient condition for continuous objective functions.

N.B. The coercivity of $f$ on $C$ amounts to saying that there exists a non-empty compact set $K$ such that
$$\arg\max_{x \in C} f(x) \subseteq K \subseteq C$$
Indeed, just take $K = (f \ge t) \cap C$ in (22.26) because, if the solution set is non-empty, we trivially have $\arg\max_{x \in C} f(x) = \left(f \ge \max_{x \in C} f(x)\right) \cap C$. In words, coercivity thus requires that the solution set can be "inscribed" in a compact subset of the choice set. Such a compact subset can be regarded as a first, possibly very rough, estimate of the solution set. However rough, in view of Tonelli's Theorem such an estimate ensures for continuous functions the existence of solutions. In this vein, Tonelli's Theorem can be viewed as the outcome of two elements: (i) the continuity of the objective function, (ii) a preliminary "compact" estimate of the solution set.14 O

22.3.3 Supercoercivity

In light of Tonelli's Theorem, it becomes important to identify classes of coercive functions. Supercoercive functions are a first relevant example.15

Definition 1015 A function $f : \mathbb{R}^n \to \mathbb{R}$ is said to be supercoercive if, for every sequence $\{x_n\} \subseteq \mathbb{R}^n$,
$$\|x_n\| \to +\infty \implies f(x_n) \to -\infty$$

Supercoercivity requires $f$ to diverge to $-\infty$ along any possible unbounded sequence $\{x_n\} \subseteq \mathbb{R}^n$, i.e., any sequence such that $\|x_n\| \to +\infty$. In words, the function cannot keep taking, indefinitely, increasing values on a sequence that "dashes off" to infinity. This makes all upper contour sets bounded:

Proposition 1016 A function $f : \mathbb{R}^n \to \mathbb{R}$ is supercoercive if and only if all its upper contour sets are bounded.

Proof "Only if". Let $f : \mathbb{R}^n \to \mathbb{R}$ be supercoercive. Suppose, by contradiction, that there is an upper contour set $(f \ge t)$ which is not bounded. Then, there is a sequence $\{x_n\} \subseteq (f \ge t)$ such that $\|x_n\| \to +\infty$. That is, $\{x_n\} \subseteq \mathbb{R}^n$ is such that $\|x_n\| \to +\infty$ and $f(x_n) \ge t$ for each $n$. But $\|x_n\| \to +\infty$ implies $f(x_n) \to -\infty$ because $f$ is supercoercive. This contradiction proves that all sets $(f \ge t)$ are bounded.

"If". Suppose that all upper contour sets are bounded. Let $\{x_n\} \subseteq \mathbb{R}^n$ be such that $\|x_n\| \to +\infty$. Fix any scalar $t < \sup_{x \in \mathbb{R}^n} f(x)$, so that the corresponding upper contour set $(f \ge t)$ is not empty. Since it is bounded, by Definition 167 there exists $K > 0$ large enough so that $\|x\| < K$ for all $x \in (f \ge t)$. Since $\|x_n\| \to +\infty$, there exists $n_t \ge 1$ large enough so that $x_n \notin (f \ge t)$ for all $n \ge n_t$, i.e., $f(x_n) < t$ for all $n \ge n_t$. In turn, this implies that $\limsup f(x_n) \le t$. Since this inequality holds for all scalars $t < \sup_{x \in \mathbb{R}^n} f(x)$, we conclude that $\limsup f(x_n) = -\infty$, which in turn trivially implies that $\lim f(x_n) = -\infty$, as desired.
14 Coda readers will learn in Chapter 23 that (i) can be substantially weakened.
15 For the sake of simplicity, here we focus on functions defined on $\mathbb{R}^n$, although the analysis holds for functions defined on a subset $A$ of $\mathbb{R}^n$ as well (in the next definition one then requires $\{x_n\} \subseteq A$).

Example 1017 (i) The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = -x^2$ is supercoercive. Indeed, since $|x_n|^2 = x_n^2$ for every $n$, we have that $|x_n| \to +\infty$ only if $x_n^2 \to +\infty$. This implies that
$$|x_n| \to +\infty \implies f(x_n) \to -\infty$$
yielding that the function is supercoercive.
(ii) The function $f : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x) = -x_1^2 - x_2^2$ is supercoercive. Indeed,
$$f(x) = -\left(x_1^2 + x_2^2\right) = -\left(\sqrt{x_1^2 + x_2^2}\right)^2 = -\|x\|^2$$
and so $\|x_n\| \to +\infty$ implies $f(x_n) \to -\infty$.
(iii) More generally, the function $f : \mathbb{R}^n \to \mathbb{R}$ given by $f(x) = -\|x\|^2 = -\sum_{i=1}^n x_i^2$ is supercoercive. N
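
As a computational aside (our own illustration, a heuristic check rather than a proof), one can probe supercoercivity along unbounded sequences $x_n = n\, d$ for randomly chosen directions $d$:

```python
# Heuristic check: supercoercivity of f(x) = -||x||^2 requires f(x_n) -> -infinity
# whenever ||x_n|| -> +infinity; we test it along a few random unbounded sequences.
import numpy as np

f = lambda x: -np.dot(x, x)             # f(x) = -||x||^2 on R^n
rng = np.random.default_rng(0)

for _ in range(3):
    d = rng.normal(size=3)
    d /= np.linalg.norm(d)              # random unit direction in R^3
    values = [f(n * d) for n in (1, 10, 100, 1000)]
    print(values)                       # roughly -1, -100, -10000, -1000000
```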

Example 1018 (i) The function $f : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x) = -(x_1 - x_2)^2$ is not supercoercive. Consider the sequence $x_n = (n, n)$. One has that $f(x_n) = 0$ for every $n \ge 1$, although $\|x_n\| = n\sqrt{2} \to +\infty$.
(ii) The exponential function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = e^x$ is not supercoercive: just consider the sequence $x_n = -n$. Its cousin $f(x) = -e^{x^2}$ is, instead, easily checked to be supercoercive.
(iii) The negative quadratic function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = -x^2$ is supercoercive, as previously checked. Its strictly increasing transformation $e^{-x^2}$ is, however, not supercoercive: just observe that the upper contour set $(e^{-x^2} \ge 0)$ is equal to the real line, so it is unbounded. N

The last example shows that supercoercivity, unlike coercivity, is not an ordinal property. Yet, it implies coercivity for functions $f$ that are continuous on a closed set $C$. As a result, Tonelli's Theorem can be applied to the pair $(f, C)$.

Proposition 1019 A supercoercive function $f : \mathbb{R}^n \to \mathbb{R}$ which is continuous on a closed subset $C$ of $\mathbb{R}^n$ is coercive there. In particular, the sets $(f \ge t) \cap C$ are compact for every $t \in \mathbb{R}$.

Proof The last result implies that, for every $t \in \mathbb{R}$, the sets $(f \ge t) \cap C$ are bounded. Since $f$ is continuous and $C$ is closed, such sets are also closed. Indeed, take $\{x_n\} \subseteq (f \ge t) \cap C$ such that $x_n \to x \in \mathbb{R}^n$. By Theorem 174, to show that $(f \ge t) \cap C$ is closed it suffices to show that $x \in (f \ge t) \cap C$. As $C$ is closed, we have $x \in C$. Since $f$ is continuous, we have $\lim f(x_n) = f(x)$. Since $f(x_n) \ge t$ for every $n \ge 1$, it follows that $f(x) \ge t$, that is, $x \in (f \ge t)$. Hence, $x \in (f \ge t) \cap C$ and the set $(f \ge t) \cap C$ is closed. Since it is bounded, it is compact.

The reader should note that, for a supercoercive and continuous function, all the sets $(f \ge t) \cap C$ are compact, while coercivity requires only that at least one of them be non-empty and compact. This shows, once again, how supercoercivity is a much stronger property than coercivity. However, it is simpler both to formulate and to verify, which explains its appeal.

The next result establishes a simple comparison criterion for supercoercivity.



Proposition 1020 Let $f : \mathbb{R}^n \to \mathbb{R}$ be supercoercive. If $g : \mathbb{R}^n \to \mathbb{R}$ is such that, for some $k > 0$,
$$\|x\| \ge k \implies g(x) \le f(x) \qquad \forall x \in \mathbb{R}^n$$
then $g$ is supercoercive.

Proof Let $\{x_n\} \subseteq \mathbb{R}^n$ be such that $\|x_n\| \to +\infty$. This implies that there exists $\bar{n} \ge 1$ such that $\|x_n\| \ge k$, and so $g(x_n) \le f(x_n)$, for every $n \ge \bar{n}$. At the same time, since $f$ is supercoercive, the sequence $\{f(x_n)\}$ is such that $f(x_n) \to -\infty$. This implies that for each $K \in \mathbb{R}$ there exists $n_K \ge 1$ such that $f(x_n) < K$ for all $n \ge n_K$. For each $K \in \mathbb{R}$, set $\bar{n}_K = \max\{\bar{n}, n_K\}$. We then have $g(x_n) \le f(x_n) < K$ for all $n \ge \bar{n}_K$, thus proving that $g(x_n) \to -\infty$ as well.

Supercoercivity is thus inherited via dominance: given a function $g$, if we can find a supercoercive function $f$ such that $g \le f$ on some set $\{x \in \mathbb{R}^n : \|x\| \ge k\}$, then also $g$ is supercoercive. A natural supercoercive "test" function $f : \mathbb{R}^n \to \mathbb{R}$ is
$$f(x) = \alpha \|x\| + \beta$$
with $\alpha < 0$ and $\beta \in \mathbb{R}$. It is a very simple function, easily seen to be supercoercive. If a function $g : \mathbb{R}^n \to \mathbb{R}$ is such that
$$g(x) \le \alpha \|x\| + \beta \tag{22.28}$$
on some set $\{x \in \mathbb{R}^n : \|x\| \ge k\}$, then it is supercoercive.

Example 1021 Let $g : \mathbb{R}^n \to \mathbb{R}$ be defined by $g(x) = 1 - \|x\|^{\theta}$. If $\theta \ge 1$, then $g$ is supercoercive. Indeed, on $\{x \in \mathbb{R}^n : \|x\| \ge 1\}$ we have $\|x\|^{\theta} \ge \|x\|$, so
$$g(x) = 1 - \|x\|^{\theta} \le 1 - \|x\|$$
The inequality (22.28) holds with $\alpha = -1$ and $\beta = 1$, so $g$ is supercoercive (for $\theta = 2$ and $n = 1$, we get back the function $g(x) = 1 - x^2$ that was shown to be coercive in Example 1010).

Since $g$ is continuous, by Tonelli's Theorem it has at least one maximizer in $\mathbb{R}^n$. Yet, it is easily seen that the function has no minimizers (here Weierstrass' Theorem is useless because $\mathbb{R}^n$ is not compact). N
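
A small Python sketch of ours (the exponent value and the solver are illustrative assumptions) locates the maximizer whose existence Tonelli's Theorem guarantees:

```python
# Illustrative sketch: g(x) = 1 - ||x||^theta with theta >= 1 is supercoercive
# (dominated by 1 - ||x|| for ||x|| >= 1), so a maximizer exists by Tonelli.
import numpy as np
from scipy.optimize import minimize

theta = 3.0
g = lambda x: 1 - np.linalg.norm(x)**theta

res = minimize(lambda x: -g(x), x0=np.array([0.7, -0.4]))
print(res.x, -res.fun)   # maximizer near the origin, maximum value near 1
```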

22.4 Separation theorems

It is sometimes useful to separate convex sets through linear functions.16 As a dividend of Tonelli's Theorem, in this section we establish a non-trivial result of this kind.

Recall that a hyperplane in $\mathbb{R}^n$ is, for some $0 \ne a \in \mathbb{R}^n$ and $b \in \mathbb{R}$, the set
$$H = \{x \in \mathbb{R}^n : a \cdot x = b\}$$
That is, hyperplanes are the level curves of linear functions (cf. Section 16.6). A hyperplane $H$ defines two closed half-spaces
$$H_+ = \{x \in \mathbb{R}^n : a \cdot x \ge b\} \quad \text{and} \quad H_- = \{x \in \mathbb{R}^n : a \cdot x \le b\}$$
16 For instance, see the proofs of Theorems 1530, 1532 and 1539.

whose intersection is $H$, i.e., $H_+ \cap H_- = H$, as well as two open half-spaces
$$\operatorname{int} H_+ = \{x \in \mathbb{R}^n : a \cdot x > b\} \quad \text{and} \quad \operatorname{int} H_- = \{x \in \mathbb{R}^n : a \cdot x < b\}$$
formed by their interiors.

Definition 1022 Two sets $X$ and $Y$ of $\mathbb{R}^n$ are:

(i) separated if there exists a hyperplane $H$ such that $X \subseteq H_+$ and $Y \subseteq H_-$;

(ii) strictly separated if there exists a hyperplane $H$ such that $X \subseteq \operatorname{int} H_+$ and $Y \subseteq \operatorname{int} H_-$.

In words, two sets are (strictly) separated when they belong to opposite closed (open)
half-spaces. Intuitively, the separating hyperplane acts like a watershed between them.
Next we give a straightforward, yet useful, characterization.

Lemma 1023 Two sets $X$ and $Y$ of $\mathbb{R}^n$ are:

(i) separated if and only if there exist $0 \ne a \in \mathbb{R}^n$ and $b \in \mathbb{R}$ such that $a \cdot x \ge b \ge a \cdot y$ for all $x \in X$ and $y \in Y$;

(ii) strictly separated if and only if there exist $0 \ne a \in \mathbb{R}^n$ and $b \in \mathbb{R}$ such that $a \cdot x > b > a \cdot y$ for all $x \in X$ and $y \in Y$.

This characterization suggests a more stringent notion of separation, in which the sets
are completely contained in opposite open half-spaces.

Definition 1024 Two sets $X$ and $Y$ of $\mathbb{R}^n$ are strongly separated if there exists $0 \ne a \in \mathbb{R}^n$ such that
$$\inf_{x \in X} a \cdot x > \sup_{y \in Y} a \cdot y$$
i.e., there exist $b \in \mathbb{R}$ and $\varepsilon > 0$ such that $a \cdot x \ge b + \varepsilon > b \ge a \cdot y$ for all $x \in X$ and $y \in Y$.

Our first result studies the basic case of separation between convex sets and single points.

Proposition 1025 Let $C$ be a convex set in $\mathbb{R}^n$ and let $x_0 \notin C$.

(i) If $C$ is closed, then $\{x_0\}$ and $C$ are strongly separated.

(ii) If $C$ is open, then $\{x_0\}$ and $C$ are strictly separated.

Proof We only prove (i), while we omit the non-trivial proof of (ii). Without loss of generality (why?), assume that $x_0 = 0 \notin C$. Consider the continuous function $f : \mathbb{R}^n \to \mathbb{R}$ given by $f(x) = -\|x\|^2$. This function is supercoercive (Example 1017). By Proposition 1019, $f$ is coercive on the closed set $C$, so it has a maximizer $c \in C$ by Tonelli's Theorem. If $x$ is any point of $C$, we have $\|c\|^2 \le \|\lambda c + (1 - \lambda) x\|^2$ for every $\lambda \in (0, 1)$. Hence
$$\|c\|^2 \le \lambda^2 \|c\|^2 + (1 - \lambda)^2 \|x\|^2 + 2\lambda (1 - \lambda)\, c \cdot x$$
$$(1 + \lambda) \|c\|^2 \le (1 - \lambda) \|x\|^2 + 2\lambda\, c \cdot x$$
For $\lambda \to 1$, we get $\|c\|^2 \le c \cdot x$ for all $x \in C$. Therefore, setting $\beta = \|c\|^2 / 2$ we have $c \cdot x \ge \|c\|^2 > \beta > 0 = c \cdot x_0$, which is the desired separation property.
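
The proof's construction can be mirrored numerically. In the Python sketch below (our illustration; the set $C$ and the solver are assumptions), the point of a closed convex set nearest to the origin yields the separating hyperplane:

```python
# Illustrative sketch: maximizing f(x) = -||x||^2 over C amounts to finding the
# point c of C nearest to the origin; {x : c.x = ||c||^2 / 2} then separates {0} from C.
import numpy as np
from scipy.optimize import minimize

# C = {x in R^2 : x1 + x2 >= 2}, a closed convex set not containing the origin
cons = [{"type": "ineq", "fun": lambda x: x[0] + x[1] - 2}]
res = minimize(lambda x: x @ x, x0=np.array([3.0, 3.0]), constraints=cons)

c = res.x                       # nearest point, here approximately (1, 1)
beta = (c @ c) / 2              # the separating level: c.x >= 2*beta > beta > 0 on C
print(c, beta)
```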

The strong separation in point (i) continues to hold if we replace singletons with compact
convex sets, as the next important result shows.

Theorem 1026 (Strong Hyperplane Separation Theorem) A compact convex set and
a closed convex set are strongly separated if they are disjoint.

Proof Let $K$ be a compact convex set and $C$ be a closed convex set, with $K \cap C = \emptyset$. The set $K - C = \{x - y : x \in K, y \in C\}$ is a closed and convex set (Proposition 964) that does not contain the origin $0$ since $K \cap C = \emptyset$. By Proposition 1025-(i), the sets $\{0\}$ and $K - C$ are then strongly separated. So, there exist $0 \ne a \in \mathbb{R}^n$, $b \in \mathbb{R}$ and $\varepsilon > 0$ such that $0 = a \cdot 0 \le b < b + \varepsilon \le a \cdot (x - y)$ for all $x \in K$ and $y \in C$. This implies $a \cdot x \ge b + \varepsilon + a \cdot y$ for all $x \in K$ and $y \in C$. Since $K$ is compact, by the Weierstrass Theorem there exists $\hat{x} \in K$ such that $a \cdot x \ge a \cdot \hat{x} \ge b + \varepsilon + a \cdot y$ for all $x \in K$ and $y \in C$. Hence, $a \cdot \hat{x} \ge b + \varepsilon + \sup_{y \in C} a \cdot y$, that is, $\min_{x \in K} a \cdot x > \sup_{y \in C} a \cdot y$. We conclude that $K$ and $C$ are strongly separated.

The strict separation in point (ii) of Proposition 1025 leads to the following separation result based on interiors.

Proposition 1027 Two convex sets are separated if they have disjoint and non-empty interiors.

Proof Let $A$ and $B$ be two convex sets with $\operatorname{int} A$ and $\operatorname{int} B$ non-empty and disjoint. Then, $\operatorname{int} A - \operatorname{int} B$ is an open and convex set (Proposition 964) that does not contain the origin $0$. By Proposition 1025-(ii), the sets $\{0\}$ and $\operatorname{int} A - \operatorname{int} B$ are then strictly separated. So, there exist $0 \ne a \in \mathbb{R}^n$ and $b \in \mathbb{R}$ such that $0 = a \cdot 0 < b < a \cdot (x - y)$ for all $x \in \operatorname{int} A$ and $y \in \operatorname{int} B$. This implies
$$a \cdot x > b + a \cdot y \tag{22.29}$$
for all $x \in \operatorname{int} A$ and $y \in \operatorname{int} B$. Since $A$ and $B$ are convex, $A \subseteq \overline{\operatorname{int} A}$ and $B \subseteq \overline{\operatorname{int} B}$ (why?). So, if $x \in A$ and $y \in B$, there exist sequences $\{x_n\} \subseteq \operatorname{int} A$ and $\{y_n\} \subseteq \operatorname{int} B$ such that $x_n \to x$ and $y_n \to y$. By (22.29), $a \cdot x_n \ge b + a \cdot y_n$ for all $n \ge 1$, and so $a \cdot x = \lim_{n \to \infty} a \cdot x_n \ge b + \lim_{n \to \infty} a \cdot y_n = b + a \cdot y$. Since $b > 0$, we conclude that $a \cdot x \ge b + a \cdot y \ge a \cdot y$ for all $x \in A$ and all $y \in B$, as desired.

As the reader can check, this argument can be adapted to prove, more generally, that
two convex sets are separated if they have only boundary points in common and at least one
of them has a non-empty interior.

22.5 Local extremal points

Let us now consider a local, weaker version of the notion of maximizer. By itself, it is a weakening of little interest, particularly for economic applications, in which we are mainly interested in global extrema. For example, in the consumer problem it is not of much interest whether a bundle is a local maximizer or not: what matters is whether it is a global maximizer or not.

Nevertheless, thanks to differential calculus, local maximizers are of great instrumental importance, in primis (but not only) in the solution of optimization problems. For this reason, we devote this section to them.

Consider a function $f : \mathbb{R} \to \mathbb{R}$ whose graph recalls the profile of a mountain range:

[Figure: graph of a function with several peaks, like the profile of a mountain range]

The highest peak is the (global) maximum value, but intuitively the other peaks, too, correspond to points that, locally, are maximizers. The next definition formalizes this simple idea.

Definition 1028 Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be a real-valued function and $C$ a subset of $A$. A vector $\hat{x} \in C$ is said to be a local maximizer of $f$ on $C$ if there exists a neighborhood $B_\varepsilon(\hat{x})$ of $\hat{x}$ such that
$$f(\hat{x}) \ge f(x) \qquad \forall x \in B_\varepsilon(\hat{x}) \cap C \tag{22.30}$$
The value $f(\hat{x})$ of the function at $\hat{x}$ is called a local maximum value of $f$ on $C$.

The local maximizer is strong, so unique, if in (22.30) we have $f(\hat{x}) > f(x)$ for every $x \in B_\varepsilon(\hat{x}) \cap C$ such that $x \ne \hat{x}$. In the terminology of the optimization problem (22.2), a local maximizer of $f$ on $C$ is called a local solution of the problem. We have analogous definitions for local minimizers, with $\le$ and $<$ in place of $\ge$ and $>$.

A global maximizer on C is obviously also a local maximizer. The notion of local max-
imizer is, indeed, much weaker than that of global maximizer. As the next example shows,
it may happen that there are (even many) local maximizers and no global maximizers.

Example 1029 (i) Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x^6 - 3x^2 + 1$. In Example 1674 we will see that its graph is:

[Figure: graph of $y = x^6 - 3x^2 + 1$]

In particular, the origin $x = 0$ is a local maximizer, but not a global one. Indeed, $\lim_{x \to +\infty} f(x) = \lim_{x \to -\infty} f(x) = +\infty$, thus the function has no global maximizers.
(ii) Let $f : \mathbb{R} \to \mathbb{R}$ be given by
$$f(x) = \begin{cases} \cos x & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases}$$
with the graph

[Figure: graph of the function, a cosine wave for $x \le 0$ followed by the half-line $y = x$ for $x > 0$]

The function has infinitely many local maximizers (i.e., $x = -2k\pi$ for $k \in \mathbb{N}$), but no global ones. N

Terminology In what follows maximizers (and minimizers) are understood to be global even if not stated explicitly. The adjective "local" will always be added when they are local in the sense of the previous definition.

O.R. The most important part of the definition of a local maximizer is "if there exists a neighborhood". A common mistake is to replace the correct "if there exists a neighborhood" by the incorrect "if, taking a neighborhood $B_\varepsilon(\hat{x})$ of $\hat{x}$". In this way, we do not define a local maximizer but a global one. Indeed, to fix a priori a neighborhood $B_\varepsilon(\hat{x})$ amounts to considering $B_\varepsilon(\hat{x})$, rather than $C$, as the choice set, so a different optimization problem will be addressed. Relatedly, in the neighborhood $B_\varepsilon(\hat{x})$ in (22.30) the local maximizer is, clearly, a global one. Such a "choice set" is, however, chosen by the function, not posited by us. So, it is typically of little interest for the application that motivated the optimization problem. Applications discipline optimization problems, not vice versa. H

O.R. An isolated point $x_0$ of $C$ is always both a local maximizer and a local minimizer. Indeed, by definition there is a neighborhood $B_\varepsilon(x_0)$ of $x_0$ such that $B_\varepsilon(x_0) \cap C = \{x_0\}$, so the inequalities $f(x_0) \ge f(x)$ and $f(x_0) \le f(x)$ for every $x \in B_\varepsilon(x_0) \cap C$ reduce to $f(x_0) \ge f(x_0)$ and $f(x_0) \le f(x_0)$, which are trivially true.

Considering isolated points as both local maximizers and local minimizers is a bit odd. To avoid this, we could reformulate the definition of local maximizer and minimizer by requiring $\hat{x}$ to be a limit point of $C$. However, an even more unpleasant consequence would result: if an isolated point were a global extremal (e.g., recall the example at the end of Section 22.1.1), we should say that it is not so in the local sense. Thus, the remedy would be worse than the disease. H

22.6 Concavity and quasi-concavity

22.6.1 Maxima

Concave functions find their most classic application in the study of optimization problems, in which they enjoy truly remarkable properties. We already discussed some of them earlier in the chapter. The next striking result shows that local maximizers of concave functions are automatically global.

Theorem 1030 (Fenchel) Let $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ be a concave function defined on a convex set. If the point $\hat{x} \in C$ is a local maximizer, then it is a global maximizer.

Proof Let $\hat{x} \in C$ be a local maximizer. By definition, there exists a neighborhood $B_\varepsilon(\hat{x})$ such that
$$f(\hat{x}) \ge f(x) \qquad \forall x \in B_\varepsilon(\hat{x}) \cap C \tag{22.31}$$
Suppose, by contradiction, that $\hat{x}$ is not a global maximizer. Then, there exists $y \in C$ such that $f(y) > f(\hat{x})$. Since $f$ is concave, for every $t \in (0, 1)$ we have
$$f(t\hat{x} + (1 - t) y) \ge t f(\hat{x}) + (1 - t) f(y) > t f(\hat{x}) + (1 - t) f(\hat{x}) = f(\hat{x}) \tag{22.32}$$
Moreover, since $C$ is convex, we have $t\hat{x} + (1 - t) y \in C$ for every $t \in (0, 1)$. On the other hand,
$$\lim_{t \to 1} \|t\hat{x} + (1 - t) y - \hat{x}\| = \|y - \hat{x}\| \lim_{t \to 1} (1 - t) = 0$$
Therefore, there exists $\bar{t} \in (0, 1)$ such that $t\hat{x} + (1 - t) y \in B_\varepsilon(\hat{x})$ for every $t \in (\bar{t}, 1)$. From (22.32) it follows that for such $t$ we have $f(t\hat{x} + (1 - t) y) > f(\hat{x})$, which contradicts (22.31). We conclude that $\hat{x}$ is a global maximizer.

This important result does not hold for quasi-concave functions:



Example 1031 Let $f : \mathbb{R} \to \mathbb{R}$ be given by
$$f(x) = \begin{cases} 2 & \text{if } x \le 0 \\ 2 - x & \text{if } x \in (0, 1) \\ 1 & \text{if } x \ge 1 \end{cases}$$
Graphically:

[Figure: graph of $f$, constant at $2$ for $x \le 0$, decreasing from $2$ to $1$ on $(0,1)$, constant at $1$ for $x \ge 1$]

This function is quasi-concave because it is monotone. All points $x > 1$ are local maximizers, but not global maximizers. N

When $f$ is quasi-concave, the solution set $\arg\max_{x \in C} f(x)$ is convex, as was the case under concavity (Proposition 982). Indeed, let $\hat{x}_1, \hat{x}_2 \in \arg\max_{x \in C} f(x)$ and let $t \in [0, 1]$. By quasi-concavity,
$$f(t\hat{x}_1 + (1 - t)\hat{x}_2) \ge \min\{f(\hat{x}_1), f(\hat{x}_2)\} = \max_{x \in C} f(x)$$
and therefore
$$f(t\hat{x}_1 + (1 - t)\hat{x}_2) = \max_{x \in C} f(x)$$
i.e., $t\hat{x}_1 + (1 - t)\hat{x}_2 \in \arg\max_{x \in C} f(x)$.
As we discussed before, when the solution set is convex there are either no solutions, or a unique solution, or infinitely many solutions. The uniqueness of solutions is ensured by strict quasi-concavity, as the next important result shows.

Theorem 1032 A strictly quasi-concave function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ defined on a convex set has at most one maximizer.

Proposition 984 is the special case when $f$ is strictly concave.

Proof The proof is similar to that of Proposition 984, once observed that (22.8) becomes, by strict quasi-concavity,
$$f(z) = f\left(\frac{1}{2}\hat{x}_1 + \frac{1}{2}\hat{x}_2\right) > \min\{f(\hat{x}_1), f(\hat{x}_2)\} = \max_{x \in C} f(x)$$

22.6.2 Minima

Minimization problems for concave functions also have some noteworthy properties.

Proposition 1033 Let $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ be a function defined on a convex set.

(i) If $f$ is concave and non-constant, then $\arg\min_{x \in C} f(x) \subseteq \partial C$.

(ii) If $f$ is strictly quasi-concave, then $\arg\min_{x \in C} f(x) \subseteq \operatorname{ext} C$.

Proof Suppose $\arg\min_{x \in C} f(x) \ne \emptyset$ (otherwise the result is trivially true). (i) Let $\hat{x} \in \arg\min_{x \in C} f(x)$. Since $f$ is not constant, there exists $y \in C$ such that $f(y) > f(\hat{x})$. Suppose, by contradiction, that $\hat{x}$ is an interior point of $C$. Set $z_\lambda = \lambda \hat{x} + (1 - \lambda) y$ with $\lambda \in \mathbb{R}$. The points $z_\lambda$ are the points of the straight line that passes through $\hat{x}$ and $y$. Since $\hat{x}$ is an interior point of $C$, there exists $\bar{\lambda} > 1$ such that $z_{\bar{\lambda}} \in C$. On the other hand, $\hat{x} = \frac{1}{\bar{\lambda}} z_{\bar{\lambda}} + \left(1 - \frac{1}{\bar{\lambda}}\right) y$. Therefore, we get the contradiction
$$f(\hat{x}) = f\left(\frac{1}{\bar{\lambda}} z_{\bar{\lambda}} + \left(1 - \frac{1}{\bar{\lambda}}\right) y\right) \ge \frac{1}{\bar{\lambda}} f(z_{\bar{\lambda}}) + \left(1 - \frac{1}{\bar{\lambda}}\right) f(y) > \frac{1}{\bar{\lambda}} f(\hat{x}) + \left(1 - \frac{1}{\bar{\lambda}}\right) f(\hat{x}) = f(\hat{x})$$
It follows that $\hat{x} \in \partial C$, as desired. (ii) Let $\hat{x} \in \arg\min_{x \in C} f(x)$. Suppose, by contradiction, that $\hat{x} \notin \operatorname{ext} C$. Then, there exist $x, y \in C$ with $x \ne y$ and $\lambda \in (0, 1)$ such that $\hat{x} = \lambda x + (1 - \lambda) y$. By strict quasi-concavity, $f(\hat{x}) = f(\lambda x + (1 - \lambda) y) > \min\{f(x), f(y)\} \ge f(\hat{x})$, a contradiction. We conclude that $\hat{x} \in \operatorname{ext} C$, as desired.

Hence, under (i) the search for minimizers can be restricted to the boundary points of $C$.17 More is true under (ii), where the search can be restricted to the extreme points of $C$, an even smaller set (Proposition 782).

Example 1034 Consider the strictly concave function $f : [-1, 1] \to \mathbb{R}$ defined by $f(x) = 1 - x^2$. Since $\{-1, 1\}$ is the set of extreme points of $C = [-1, 1]$, by the last proposition the minimizers belong to this set. Clearly, both its elements are minimizers. N

Point (i) fails under quasi-concavity, as the next example shows.

Example 1035 Consider the non-constant quasi-concave function $f : [0, 1] \to \mathbb{R}$ defined by $f(1) = 1$ and $f(x) = 0$ if $x \in [0, 1)$. We have $\arg\min_{x \in C} f(x) = [0, 1)$, and so the origin is the only minimizer which is a boundary point. N

Extreme points take center stage in the compact case, a remarkable fact because the set of extreme points can be a small subset of the frontier: for instance, if $C$ is a polytope we can restrict the search for minimizers to the vertices.

Theorem 1036 (Bauer) Let $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ be a continuous function defined on a convex and compact set.

17 Results of this kind, which ensure that solutions of optimization problems are boundary points, are sometimes classified as maximum principles. An earlier instance of a maximum principle is Proposition 979.

(i) If $f$ is quasi-concave, then
$$\min_{x \in C} f(x) = \min_{x \in \operatorname{ext} C} f(x) \tag{22.33}$$
and
$$\emptyset \ne \arg\min_{x \in \operatorname{ext} C} f(x) \subseteq \arg\min_{x \in C} f(x) \subseteq \operatorname{co}\left(\arg\min_{x \in \operatorname{ext} C} f(x)\right) \tag{22.34}$$

(ii) If $f$ is strictly quasi-concave, then
$$\emptyset \ne \arg\min_{x \in C} f(x) \subseteq \operatorname{ext} C$$

Relative to the previous result, now Weierstrass' Theorem ensures the existence of minimizers, and so the equality (22.33) is meaningful. More interestingly, by building on Minkowski's Theorem this theorem says that a quasi-concave function attains its minimum value at some extreme point. Under the hypotheses of Bauer's Theorem, in terms of value attainment the minimization problem
$$\min_{x} f(x) \quad \text{sub} \quad x \in C \tag{22.35}$$
thus reduces to the much simpler problem
$$\min_{x} f(x) \quad \text{sub} \quad x \in \operatorname{ext} C \tag{22.36}$$
that only involves extreme points. In particular, when $f$ is strictly quasi-concave we can take advantage of both (i) and (ii), so
$$\emptyset \ne \arg\min_{x \in \operatorname{ext} C} f(x) = \arg\min_{x \in C} f(x)$$
The minimization problem (22.35) thus reduces to the simpler problem (22.36) in terms of both solutions and value attainment.

Proof By Weierstrass' Theorem, $\arg\min_{x \in C} f(x) \ne \emptyset$. Point (ii) thus follows from the last result. As to (i), we first prove that
$$\arg\min_{x \in C} f(x) \subseteq \operatorname{co}\left(\operatorname{ext} C \cap \arg\min_{x \in C} f(x)\right) \tag{22.37}$$
That is, minimizers are convex combinations of extreme points which are, themselves, minimizers. Let $\hat{x} \in \arg\min_{x \in C} f(x)$. By Minkowski's Theorem, we have $C = \operatorname{co}(\operatorname{ext} C)$. Therefore, there exist finite collections $\{x_i\}_{i \in I} \subseteq \operatorname{ext} C$ and $\{\lambda_i\}_{i \in I} \subseteq (0, 1]$, with $\sum_{i \in I} \lambda_i = 1$, such that $\hat{x} = \sum_{i \in I} \lambda_i x_i$. Since $\hat{x}$ is a minimizer, we have $f(x_i) \ge f(\hat{x})$ for all $i \in I$. Together with quasi-concavity, this implies that
$$f(\hat{x}) = f\left(\sum_{i \in I} \lambda_i x_i\right) \ge \min_{i \in I} f(x_i) \ge f(\hat{x}) \tag{22.38}$$
Hence, we conclude that $\sum_{i \in I} \lambda_i f(x_i) = f(\hat{x})$, which is easily seen to imply $f(x_i) = f(\hat{x})$ for all $i \in I$. It follows that for each $i \in I$ we have $x_i \in \arg\min_{x \in C} f(x) \cap \operatorname{ext} C$, proving (22.37).

We are ready to prove (22.34). By (22.37), we have $\arg\min_{x \in C} f(x) \cap \operatorname{ext} C \ne \emptyset$. Consider $x \in \arg\min_{x \in C} f(x) \cap \operatorname{ext} C$. Let $\hat{x} \in \arg\min_{x \in \operatorname{ext} C} f(x)$. Since $x \in \operatorname{ext} C$, we have $f(\hat{x}) \le f(x)$. Since $x \in \arg\min_{x \in C} f(x)$, we have $f(x) \le f(\hat{x})$. This implies that $f(x) = f(\hat{x})$ and, therefore, $\hat{x} \in \arg\min_{x \in C} f(x)$. Since $\hat{x}$ was arbitrarily chosen, it follows that $\arg\min_{x \in \operatorname{ext} C} f(x) \subseteq \arg\min_{x \in C} f(x) \cap \operatorname{ext} C$, proving the first inclusion in (22.34). Clearly, $\operatorname{ext} C \cap \arg\min_{x \in C} f(x) \subseteq \arg\min_{x \in \operatorname{ext} C} f(x)$. So, $\operatorname{ext} C \cap \arg\min_{x \in C} f(x) = \arg\min_{x \in \operatorname{ext} C} f(x)$ and (22.37) yields the second inclusion in (22.34).

It remains to prove (22.33). Let $\hat{x} \in \arg\min_{x \in C} f(x)$. By (22.34), there exist finite collections $\{\hat{x}_i\}_{i \in I} \subseteq \arg\min_{x \in \operatorname{ext} C} f(x)$ and $\{\lambda_i\}_{i \in I} \subseteq (0, 1]$, with $\sum_{i \in I} \lambda_i = 1$, such that $\hat{x} = \sum_{i \in I} \lambda_i \hat{x}_i$. By quasi-concavity:
$$\min_{x \in C} f(x) = f(\hat{x}) = f\left(\sum_{i \in I} \lambda_i \hat{x}_i\right) \ge \min_{i \in I} f(\hat{x}_i) = \min_{x \in \operatorname{ext} C} f(x) \ge \min_{x \in C} f(x)$$
So, (22.33) holds.

Minimization problems for concave functions are, conceptually, equivalent to maximization problems for convex functions. So, Example 989 can now be viewed as an early illustration of Bauer's Theorem. Let us see other examples.

Example 1037 (i) The function $f$ in Example 1034 is strictly concave. In particular, we have $\arg\min_{x \in \operatorname{ext} C} f(x) = \arg\min_{x \in C} f(x) = \{-1, 1\}$, while $\operatorname{co}\left(\arg\min_{x \in \operatorname{ext} C} f(x)\right) = [-1, 1]$.
(ii) Consider the simplex $\Delta^2 = \left\{x \in \mathbb{R}^3_+ : \sum_{i=1}^3 x_i = 1\right\}$ of $\mathbb{R}^3$. Define $f : \Delta^2 \to \mathbb{R}$ by
$$f(x) = -\frac{1}{2}(1 - x_1 - x_2)^2 - \frac{1}{2}(1 - x_3)^2$$
It is easy to check that $f$ is continuous and concave. Since $\Delta^2$ is convex and compact with extreme points the versors $e^1, e^2, e^3$, by Bauer's Theorem-(i) we have
$$\emptyset \ne \arg\min_{i \in \{1,2,3\}} f(e^i) \subseteq \arg\min_{x \in \Delta^2} f(x) \subseteq \operatorname{co}\left(\arg\min_{i \in \{1,2,3\}} f(e^i)\right) \tag{22.39}$$
It is immediate to check that $f(e^i) = -1/2$ for all $i \in \{1, 2, 3\}$, that is,
$$\arg\min_{i \in \{1,2,3\}} f(e^i) = \{e^1, e^2, e^3\} \quad \text{and} \quad \operatorname{co}\left(\arg\min_{i \in \{1,2,3\}} f(e^i)\right) = \Delta^2$$
Let $\bar{x} = (1/4, 1/4, 1/2) \in \Delta^2$ and $\hat{x} = (1/2, 1/2, 0)$. We have $f(\bar{x}) = -1/4 > -1/2 = f(\hat{x})$, so $\bar{x}$ does not belong to $\arg\min_{x \in \Delta^2} f(x)$ but, clearly, belongs to $\operatorname{co}\left(\arg\min_{i \in \{1,2,3\}} f(e^i)\right)$. Moreover, $\hat{x}$ belongs to $\arg\min_{x \in \Delta^2} f(x)$ but, clearly, does not belong to $\arg\min_{i \in \{1,2,3\}} f(e^i)$. This proves that the inclusions in (22.39) are strict. N
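
As a computational aside, Bauer's Theorem can be illustrated on Example 1037-(ii): the minimum over the three versors already equals the minimum over the whole simplex. The random sampling below is our own illustrative device, not part of the theory.

```python
# Illustrative check of Bauer's Theorem on the simplex of R^3.
import numpy as np

f = lambda x: -0.5 * (1 - x[0] - x[1])**2 - 0.5 * (1 - x[2])**2
versors = np.eye(3)                                # extreme points e1, e2, e3

rng = np.random.default_rng(1)
samples = rng.dirichlet(np.ones(3), size=10_000)   # random points of the simplex

print("min over extreme points:", min(f(e) for e in versors))   # -0.5
print("min over sampled points:", min(f(x) for x in samples))   # about -0.5, never below
```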

22.7 Affinity

22.7.1 Quasi-affine objective functions

If we consider quasi-affine functions, i.e., functions that are both quasi-concave and quasi-convex, we have the following corollary of Bauer's Theorem.

Corollary 1038 Let $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ be a continuous function defined on a convex and compact set. If $f$ is quasi-affine, then
$$\max_{x \in C} f(x) = \max_{x \in \operatorname{ext} C} f(x) \quad \text{and} \quad \min_{x \in C} f(x) = \min_{x \in \operatorname{ext} C} f(x) \tag{22.40}$$
as well as
$$\emptyset \ne \arg\max_{x \in C} f(x) = \operatorname{co}\left(\arg\max_{x \in \operatorname{ext} C} f(x)\right) \tag{22.41}$$
and
$$\emptyset \ne \arg\min_{x \in C} f(x) = \operatorname{co}\left(\arg\min_{x \in \operatorname{ext} C} f(x)\right) \tag{22.42}$$

When $f$ is affine, the hypothesis of continuity becomes superfluous by Proposition 836.

Proof By (22.33) we have (22.40). The sets in (22.41) and (22.42) are non-empty by Weierstrass' Theorem. Since $f$ is quasi-affine, it is also quasi-concave. By (22.34), we have (22.42) because $\arg\min_{x \in C} f(x)$ is convex, given that $f$ is quasi-convex. Since $-f$ is also quasi-affine, the result holds for $\arg\max_{x \in C} f(x)$ as well.

For quasi-affine functions we therefore have an especially effective version of Weierstrass' Theorem: not only do both maximizers and minimizers exist, but they can be found by solving the much simpler optimization problems
$$\max_{x} f(x) \quad \text{sub} \quad x \in \operatorname{ext} C \qquad \text{and} \qquad \min_{x} f(x) \quad \text{sub} \quad x \in \operatorname{ext} C$$
that only involve extreme points. Moreover, by (22.40), the values attained are the same. So, the simpler problems are equivalent to the original ones in terms of both solutions and value attainment.

An earlier instance of such a remarkable simplification afforded by quasi-affine objective functions was discussed in Example 989-(ii). Next we provide another couple of examples.

Example 1039 (i) Consider the affine function $f : \mathbb{R}^3 \to \mathbb{R}$ defined by $f(x) = x_1 + 2x_2 - x_3 + 5$ and the simplex $\Delta^2 = \{(x_1, x_2, 1 - x_1 - x_2) : x_1, x_2 \ge 0 \text{ and } x_1 + x_2 \le 1\}$. Its extreme points are the versors $e^1$, $e^2$, and $e^3$. By the last corollary, some of them have to be maximizers or minimizers. We have
$$f(e^3) = 4 < f(e^1) = 6 < f(e^2) = 7$$
By (22.41) and (22.42), $\arg\max_{x \in C} f(x) = \{e^2\}$ and $\arg\min_{x \in C} f(x) = \{e^3\}$.
(ii) Consider the affine function $f : \mathbb{R}^3 \to \mathbb{R}$ defined by $f(x) = x_1 + 2x_2 + 2x_3 + 5$. Now we have
$$f(e^1) = 6 < f(e^2) = f(e^3) = 7$$
By (22.41) and (22.42),
$$\arg\max_{x \in C} f(x) = \operatorname{co}\{e^2, e^3\} = \{(0, \lambda, 1 - \lambda) : \lambda \in [0, 1]\}$$
and $\arg\min_{x \in C} f(x) = \{e^1\}$. N



22.7.2 Linear programming

Corollary 1038 and its variations play a key role in linear programming, which studies optimization problems with linear objective functions and affine constraints. To study these problems we need to introduce an important class of convex sets. Specifically, given an $m \times n$ matrix $A = (a_{ij})$ and a vector $b \in \mathbb{R}^m$, the convex set
$$P = \{x \in \mathbb{R}^n : Ax \le b\} = \left\{x \in \mathbb{R}^n : \sum_{j=1}^n a_{ij} x_j \le b_i \quad \forall i = 1, \ldots, m\right\}$$
of $\mathbb{R}^n$ is called a polyhedron. Let us write explicitly the row vectors of the matrix $A$ as:
$$a^1 = (a_{11}, a_{12}, \ldots, a_{1n}), \quad \ldots, \quad a^m = (a_{m1}, a_{m2}, \ldots, a_{mn})$$
Each row vector $a^i$ thus identifies an inequality constraint $a^i \cdot x \le b_i$ that a vector $x \in \mathbb{R}^n$ has to satisfy in order to belong to the polyhedron. We can indeed write $P$ as the intersection
$$P = \bigcap_{i=1}^m H_i$$
of the half-spaces $H_i = \{x \in \mathbb{R}^n : a^i \cdot x \le b_i\}$ seen in Section 22.4.

Example 1040 (i) Affine sets are the polyhedra featuring equality constraints (Proposition 793). (ii) Simplices are polyhedra: for instance, $\Delta^2$ in $\mathbb{R}^3$ can be written as $\{x \in \mathbb{R}^3 : Ax \le b\}$ with $b = (0, 0, 0, 1) \in \mathbb{R}^4$ and
$$A = \begin{bmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \\ 1 & 1 & 1 \end{bmatrix}$$
Clearly, simplices are examples of compact polyhedra. N

Example 1041 Given $b = (1, 1, 2)$ and
$$A = \begin{bmatrix} 1 & -2 & 2 \\ 0 & 2 & -1 \\ 0 & 1 & -1 \end{bmatrix}$$
we have the polyhedron
$$P = \{x \in \mathbb{R}^3 : Ax \le b\} = \left\{x = (x_1, x_2, x_3) \in \mathbb{R}^3 : \begin{array}{l} x_1 - 2x_2 + 2x_3 \le 1 \\ 2x_2 - x_3 \le 1 \\ x_2 - x_3 \le 2 \end{array}\right\}$$
This polyhedron is not bounded: for instance, the vectors $x_n = (-n, 1/2, 0)$ belong to $P$ for all $n \ge 1$. N

Example 1042 The elements of a polyhedron are often required to be positive, so let $P = \{x \in \mathbb{R}^n_+ : Ax \le b\}$. This polyhedron can be written, however, in the standard form $P' = \{x \in \mathbb{R}^n : A'x \le b'\}$ via suitable $A'$ and $b'$. For instance, if we require the elements of the polyhedron of the previous example to be positive, we have $b' = (1, 1, 2, 0, 0, 0)$ and
$$A' = \begin{bmatrix} 1 & -2 & 2 \\ 0 & 2 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \\ -1 & 0 & 0 \end{bmatrix}$$
in which we added (negative) versors to the matrix $A$. In sum, the standard formulation of polyhedra easily accommodates positivity constraints. N

We can characterize the extreme points of polyhedra. To this end, denote by $A_x$ the submatrix of $A$ that consists of the rows $a^i$ of $A$ featuring constraints that are binding at $x$, i.e., such that $a^i \cdot x = b_i$. Clearly, $\rho(A_x) \le \rho(A) \le \min\{m, n\}$.

Proposition 1043 Let $P = \{x \in \mathbb{R}^n : Ax \le b\}$ be a polyhedron. A vector $x \in P$ is an extreme point of $P$ if and only if $\rho(A_x) = n$.

In other words, a vector is an extreme point of a polyhedron of $\mathbb{R}^n$ if and only if there exist $n$ linearly independent constraints binding at that vector. Besides its theoretical interest, this characterization operationalizes the search for extreme points by reducing it to checking a matrix property.

Proof We prove the "if", leaving the converse to the reader. Suppose that $\rho(A_x) = n$. We want to show that $x$ is an extreme point. Suppose, by contradiction, that there exist $\lambda \in (0, 1)$ and two distinct vectors $x', x'' \in P$ such that $x = \lambda x' + (1 - \lambda) x''$. Denote by $I(x) = \{i \in \{1, \ldots, m\} : a^i \cdot x = b_i\}$ the set of binding constraints. Then,
$$b_i = a^i \cdot x = a^i \cdot (\lambda x' + (1 - \lambda) x'') = \lambda\, a^i \cdot x' + (1 - \lambda)\, a^i \cdot x'' \le b_i \qquad \forall i \in I(x)$$
so
$$a^i \cdot x' = a^i \cdot x'' = b_i \qquad \forall i \in I(x)$$
This implies that $x'$ and $x''$ are both solutions of the linear system
$$a^i \cdot x = b_i \qquad \forall i \in I(x)$$
In view of Theorem 744, this contradicts the hypothesis $\rho(A_x) = n$. We conclude that $x$ is an extreme point of $P$.

Example 1044 Let us check that the versors $e^1$, $e^2$ and $e^3$ are the extreme points of the simplex $\Delta^2$. For each $x \in \mathbb{R}^3$ we have
$$Ax = b \iff \begin{cases} x_1 = 0 \\ x_2 = 0 \\ x_3 = 0 \\ x_1 + x_2 + x_3 = 1 \end{cases}$$
So,
$$A_{e^1} = \begin{bmatrix} 0 & -1 & 0 \\ 0 & 0 & -1 \\ 1 & 1 & 1 \end{bmatrix}$$
By Proposition 1043, the versor $e^1$ is an extreme point of $\Delta^2$ because $\rho(A_{e^1}) = 3$. A similar argument shows that also $e^2$ and $e^3$ are extreme points of $\Delta^2$. Moreover, it is easy to see that no other points $x$ of $\Delta^2$ are such that $\rho(A_x) = 3$ (indeed, to have $\rho(A_x) > 2$ at least two coordinates of $x$ have to be $0$). N
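
The rank test of Proposition 1043 is easy to operationalize. The Python sketch below (our illustration; the numerical tolerance is an arbitrary choice) checks extreme points of the simplex representation above:

```python
# Illustrative sketch: x in P = {x : Ax <= b} is extreme iff the binding rows
# of A at x have rank n (Proposition 1043).
import numpy as np

A = np.array([[-1.0, 0, 0], [0, -1.0, 0], [0, 0, -1.0], [1.0, 1, 1]])
b = np.array([0.0, 0.0, 0.0, 1.0])

def is_extreme(x, tol=1e-9):
    x = np.asarray(x, dtype=float)
    if np.any(A @ x > b + tol):          # x must belong to the polyhedron
        return False
    binding = np.isclose(A @ x, b, atol=tol)
    return np.linalg.matrix_rank(A[binding]) == len(x)

print(is_extreme([1, 0, 0]))             # True: e1 is an extreme point
print(is_extreme([0.5, 0.5, 0]))         # False: binding rows have rank 2
```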
The last result has the following important consequence (cf. Example 783).

Corollary 1045 A polyhedron has at most a finite number of extreme points.
Proof Let $P$ be a polyhedron. Using the notation of the last proof, if $x$ is an extreme point of $P$, then $\rho(A_x) = n$ and so, by Proposition 741, $x$ is the unique solution in $\mathbb{R}^n$ of the linear system $A_x x = b_x$, where $b_x$ is the vector that consists of the scalars $\{b_i : i \in I(x)\}$. Thus,
$$I(x) = I(x') \implies A_x = A_{x'} \implies x = x' \qquad \forall x, x' \in \operatorname{ext} P$$
Equivalently, we have
$$x \ne x' \implies I(x) \ne I(x') \qquad \forall x, x' \in \operatorname{ext} P \tag{22.43}$$
Since $I(x) \subseteq \{1, \ldots, m\}$ for all $x \in \operatorname{ext} P$, this implies that the set $\operatorname{ext} P$ is at most finite.

Polyhedra are easily seen to be closed. So, they are compact if and only if they are bounded. Bounded polyhedra are actually old friends.

Proposition 1046 A convex set in $\mathbb{R}^n$ is a bounded polyhedron if and only if it is a polytope.

A bounded polyhedron $P$ can thus be written as the convex envelope of a finite collection of vectors $x_i \in \mathbb{R}^n$, i.e., as a polytope $P = \operatorname{co}\{x_1, \ldots, x_m\}$.18

Proof We only prove the "only if". Let $P$ be a bounded, so compact, polyhedron. By Minkowski's Theorem, $P = \operatorname{co}(\operatorname{ext} P)$. By Corollary 1045, $\operatorname{ext} P$ is finite and so $P$ is a polytope.

Given a vector $c \in \mathbb{R}^n$ and a non-empty polyhedron $P$, a linear programming problem has the form
$$\max_{x} c \cdot x \quad \text{sub} \quad x \in P \tag{22.44}$$
or, equivalently,
$$\max_{x_1, \ldots, x_n} \sum_{j=1}^n c_j x_j \quad \text{sub} \quad \sum_{j=1}^n a_{1j} x_j \le b_1, \; \sum_{j=1}^n a_{2j} x_j \le b_2, \; \ldots, \; \sum_{j=1}^n a_{mj} x_j \le b_m$$
In view of Corollary 1038, we can solve this optimization problem when $P$ is bounded (so compact).

18 Recall (16.2).

Theorem 1047 (Fundamental Theorem of Linear Programming) For a linear programming problem with $P$ bounded, we have
$$\max_{x \in P} c \cdot x = \max_{x \in \{y \in P : \rho(A_y) = n\}} c \cdot x \tag{22.45}$$
and
$$\emptyset \ne \arg\max_{x \in P} c \cdot x = \operatorname{co}\left(\arg\max_{x \in \{y \in P : \rho(A_y) = n\}} c \cdot x\right) \tag{22.46}$$

Though an immediate consequence of Corollary 1038 and Proposition 1043, this is an important result (as its name shows). In words, it says that when $P$ is bounded (so, compact), then: (i) by (22.46), a solution of the linear programming problem (22.44) exists and is either an extreme point of the polyhedron $P$ or a convex combination of extreme points; (ii) by (22.45), in terms of value attainment we can consider the simpler problem
$$\max_{x} c \cdot x \quad \text{sub} \quad x \in \{y \in P : \rho(A_y) = n\}$$
that involves only the extreme points.

Example 1048 Consider the linear programming problem
$$\max_{x} c \cdot x \quad \text{sub} \quad x \in \Delta^{n-1}$$
By the Fundamental Theorem of Linear Programming, the solution set is
$$\operatorname{co}\left(\arg\max_{e^i \in \Delta^{n-1}} c \cdot e^i\right) = \operatorname{co}\left\{e^i : i \in \arg\max_{j = 1, \ldots, n} c_j\right\}$$
For instance, if $n = 4$ and $c = (1, 3, 3, -4)$, the problem is
$$\max_{x_1, x_2, x_3, x_4} x_1 + 3(x_2 + x_3) - 4x_4 \quad \text{sub} \quad x = (x_1, x_2, x_3, x_4) \in \Delta^3$$
Its solution set is $\{\lambda e^2 + (1 - \lambda) e^3 : \lambda \in [0, 1]\}$. N
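
As a computational aside (our illustration; the choice of scipy's linprog, whose variables default to $x \ge 0$, is an implementation assumption), the last example can be solved with an off-the-shelf solver:

```python
# Illustrative sketch: linprog minimizes, so we pass -c to maximize c.x over
# the simplex {x >= 0 : x1 + ... + x4 = 1}.
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 3.0, 3.0, -4.0])
res = linprog(-c, A_eq=np.ones((1, 4)), b_eq=[1.0])   # bounds default to x >= 0

print(res.x, c @ res.x)   # a vertex solution, e.g. e2 or e3, with value 3
```

Note that the solver returns a single vertex of the solution set $\operatorname{co}\{e^2, e^3\}$, consistently with the theorem: solutions are extreme points or their convex combinations.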

A general study of optimization problems with equality and inequality constraints will be carried out in Chapter 39. Linear programming is the special case of a concave optimization problem (Section 39.4) in which the objective function is linear and the constraints are expressed via affine functions.19

22.8 Consumption

22.8.1 Optimal bundles

Let us go back to the consumer problem:
$$\max_{x} u(x) \quad \text{sub} \quad x \in C(p, w)$$
19 By Riesz's Theorem and Proposition 820, we can write the objective function and the constraints in the inner product and matrix form that (22.44) features.

If u : A Rn+ ! R is continuous and the consumption set A is closed, Weierstrass' Theorem


ensures via Proposition 998 that the consumer problem does have a solution.
If instead the consumption set A is not closed, Weierstrass' Theorem is no longer appli-
cable { the set C (p; w) is not compact { and it is necessary to assume u to be coercive on
C (p; w) in order to apply Tonelli's Theorem, which becomes key in this case. Furthermore,
if A is convex and if u is strictly quasi-concave, by Theorem 1032 the solution is unique. To
sum up:

Theorem 1049 If the utility function $u : A \subseteq \mathbb{R}^n_+ \to \mathbb{R}$ is continuous and coercive on $C(p, w)$, the consumer problem has a solution. Such a solution is unique if $A$ is convex and $u$ is strictly quasi-concave.

This powerful theorem generalizes Proposition 998 and covers most cases of interest in consumer theory. For instance, consider the log-linear utility function $u : \mathbb{R}^n_{++} \to \mathbb{R}$ given by $u(x) = \sum_{i=1}^n a_i \log x_i$, with $a_i > 0$ and $\sum_{i=1}^n a_i = 1$. It has an open consumption set $\mathbb{R}^n_{++}$, so Proposition 998 cannot be applied. Fortunately, the following lemma shows that it is coercive on $C(p, w)$. Since it is also continuous and strictly concave, by Theorem 1049 the consumer problem with log-linear utility has a unique solution.

Lemma 1050 The log-linear utility function $u : \mathbb{R}^n_{++} \to \mathbb{R}$ is coercive on $C(p, w)$, provided $p \gg 0$.

Proof By Proposition 1007, it suffices to show that the result holds for the Cobb-Douglas utility function $u(x) = \prod_{i=1}^n x_i^{a_i}$ defined over $\mathbb{R}^n_{++}$. We begin by showing that the upper contour sets $(u \ge t)$ are closed for every $t > 0$ (only such scalars will matter in what follows). Let $t > 0$, so that $(u \ge t) \ne \emptyset$. Consider a sequence $\{x_n\} \subseteq (u \ge t)$ that converges to a bundle $\tilde{x} \in \mathbb{R}^n$. To prove that $(u \ge t)$ is closed, it is necessary to show that $\tilde{x} \in (u \ge t)$. Since $\{x_n\} \subseteq \mathbb{R}^n_{++}$, we have $\tilde{x} \ge 0$. Let us show that $\tilde{x} \gg 0$. Suppose, by contradiction, that $\tilde{x}$ has at least one null coordinate. This implies that $u(x_n) \to \prod_{i=1}^n \tilde{x}_i^{a_i} = 0$, thus contradicting
$$u(x_n) \ge t > 0 \qquad \forall n \ge 1$$
In conclusion, $\tilde{x} \gg 0$. Hence, $\tilde{x}$ belongs to the domain of $u$, so by continuity we have $u(x_n) \to u(\tilde{x})$. As $u(x_n) \ge t$ for every $n$, we conclude that $u(\tilde{x}) \ge t$, that is, $\tilde{x} \in (u \ge t)$, as desired.

It is easily seen that, for $t > 0$ small enough, the intersection $(u \ge t) \cap C(p, w)$ is non-empty. We have
$$(u \ge t) \cap B(p, w) = \left\{x \in \mathbb{R}^n_{++} : u(x) \ge t\right\} \cap \left\{x \in \mathbb{R}^n_{++} : p \cdot x \le w\right\} = \left\{x \in \mathbb{R}^n_{++} : u(x) \ge t\right\} \cap \left\{x \in \mathbb{R}^n_+ : p \cdot x \le w\right\}$$
As $(u \ge t)$ is closed and $\left\{x \in \mathbb{R}^n_+ : p \cdot x \le w\right\}$ is compact since $p \gg 0$, it follows that the intersection $(u \ge t) \cap B(p, w)$ is a compact set. The function $u$ is thus coercive on $C(p, w)$.

22.8.2 Demand function

The solution set of the consumer problem, i.e., the set of optimal bundles, is $\arg\max_{x \in C(p, w)} u(x)$. If the utility function is strictly quasi-concave, this set is at most a singleton. Let us denote the unique optimal bundle by $\hat{x}(p, w)$, so as to highlight its dependence on the income $w$ and on the price vector $p$. In particular, such a dependence can be formalized by means of a function $D : \mathbb{R}^n_{++} \times \mathbb{R}_+ \to \mathbb{R}^n$ defined by
$$D(p, w) = \hat{x}(p, w) \qquad \forall (p, w) \in \mathbb{R}^n_{++} \times \mathbb{R}_+$$
The function $D$ is referred to as the consumer's demand function: it associates to each vector $(p, w)$ the corresponding unique optimal bundle. Of central importance in economics, the demand function thus describes how the solution of the consumer problem varies as prices and income change.20

The study of the demand function is usually based on methods of constrained optimization that rely on differential calculus, as we will see in Section 38.5. However, in the important case of log-linear utility functions the demand for good $i$ is, in view of Example 987,
$$D_i(p, w) = a_i \frac{w}{p_i} \tag{22.47}$$
The demanded quantity of good $i$ depends on income $w$, on its price $p_i$ and on the relative importance $a_i$ that the log-linear utility function gives it with respect to the other goods. Specifically, the larger $a_i$ is, the higher is good $i$'s relative importance and, ceteris paribus (i.e., keeping prices and income constant), the higher is its demand.
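
The closed form (22.47) can be checked numerically. In the Python sketch below (our illustration; the parameter values are arbitrary assumptions), a constrained optimizer recovers the log-linear demand:

```python
# Illustrative sketch: maximize u(x) = sum_i a_i log(x_i) on the budget set and
# compare the numerical solution with the closed form D_i = a_i * w / p_i.
import numpy as np
from scipy.optimize import minimize

a = np.array([0.2, 0.3, 0.5])           # weights, positive and summing to 1
p, w = np.array([1.0, 2.0, 4.0]), 12.0

u = lambda x: a @ np.log(x)             # log-linear utility on R^n_{++}
res = minimize(lambda x: -u(x), x0=np.full(3, 1.0),
               bounds=[(1e-9, None)] * 3,
               constraints=[{"type": "ineq", "fun": lambda x: w - p @ x}])

print(res.x)          # numerical optimal bundle
print(a * w / p)      # closed form (22.47): (2.4, 1.8, 1.5)
```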

Demand functions have an important invariance property.

Proposition 1051 Given a demand function $D : \mathbb{R}^n_{++} \times \mathbb{R}_+ \to \mathbb{R}^n$, we have
$$D(\alpha p, \alpha w) = D(p, w) \qquad \forall \alpha > 0 \tag{22.48}$$

The proof is straightforward: it is enough to note that the budget set does not change if one multiplies prices and income by the same scalar $\alpha > 0$, that is,
$$B(\alpha p, \alpha w) = \left\{x \in \mathbb{R}^n_+ : (\alpha p) \cdot x \le \alpha w\right\} = \left\{x \in \mathbb{R}^n_+ : p \cdot x \le w\right\} = B(p, w)$$

As simple as it may seem, this proposition underscores an important economic concept: only relative prices matter. To see why, choose any good among those in bundle $x$, for example the first good, and call it the numeraire, that is, the unit of account. By setting its price to $1$, we can express income and the other goods' prices in terms of the numeraire:
$$\left(1, \frac{p_2}{p_1}, \ldots, \frac{p_n}{p_1}; \frac{w}{p_1}\right)$$
By Proposition 1051, the demand remains the same:
$$\hat{x}(p_1, \ldots, p_n, w) = \hat{x}\left(1, \frac{p_2}{p_1}, \ldots, \frac{p_n}{p_1}, \frac{w}{p_1}\right) \qquad \forall p \gg 0$$
20 Demand functions are a first, important, illustration of the importance of the uniqueness of the solution of an optimization problem.

As an example, suppose that bundle $x$ consists of different kinds of fruit: apples, bananas, oranges, and so on. Assume that good 1, the numeraire, is apples. Set $\tilde{w} = w / p_1$ and $q_i = p_i / p_1$ for every $i = 2, \ldots, n$, so that
$$\left(1, \frac{p_2}{p_1}, \frac{p_3}{p_1}, \ldots, \frac{p_n}{p_1}, \frac{w}{p_1}\right) = (1, q_2, q_3, \ldots, q_n, \tilde{w})$$
In terms of the "apple" numeraire, the price of one unit of fruit 2 is $q_2$ apples, the price of one unit of fruit 3 is $q_3$ apples, ..., the price of one unit of fruit $n$ is $q_n$ apples, while the value of income is $\tilde{w}$ apples. To give a concrete example, if
$$\left(1, \frac{p_2}{p_1}, \frac{p_3}{p_1}, \ldots, \frac{p_n}{p_1}, \frac{w}{p_1}\right) = (1, 3, 7, \ldots, 5, 12)$$
the price of one unit of fruit 2 is 3 apples, the price of one unit of fruit 3 is 7 apples, ..., the price of one unit of good $n$ is 5 apples, while the value of income is 12 apples.

Any good in bundle $x$ can be chosen as numeraire: it is merely a conventional choice within an economy (justified by political reasons, availability of the good itself, etc.); consumers can solve their optimization problems using any numeraire whatsoever. Such a role, however, can also be taken by an artificial object, money, for instance euros. In this case, we say that the price of a unit of apples is $p_1$ euros, the price of a unit of fruit 2 is $p_2$ euros, the price of a unit of fruit 3 is $p_3$ euros, ..., the price of a unit of fruit $n$ is $p_n$ euros, while the value of income is $w$ euros. It is a mere change of scale, akin to measuring quantities of fruit in kilograms rather than in pounds. In conclusion, in consumer theory money is a mere unit of account, nothing but a "veil". The choice of optimal bundles does not vary if relative prices $p_2/p_1, \ldots, p_n/p_1$ and relative income $w/p_1$ remain unchanged. "Nominal" price and income variations do not affect consumers' choices.
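
The homogeneity property (22.48) and the numeraire normalization can also be checked numerically. A short sketch, reusing the hypothetical `demand` function from the earlier example:

```python
# Invariance (22.48): scaling prices and income by alpha > 0 leaves the
# demanded bundle unchanged, so we may normalize by the numeraire price p_1.
alpha = 3.5
x_scaled = demand([alpha * pi for pi in p], a, alpha * w)
print(x_scaled)      # [15.0, 17.5], same bundle as before

p1 = p[0]            # numeraire normalization: divide prices and income by p_1
x_numeraire = demand([pi / p1 for pi in p], a, w / p1)
print(x_numeraire)   # [15.0, 17.5] again: only relative prices matter
```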

22.9 Equilibrium analysis

22.9.1 Exchange economies

In the previous section we studied the behavior of individual consumers. But how do these individual behaviors interact in a market? In particular, how is the individual analysis of this section connected with the aggregate market analysis of Chapter 14?
The simplest way to answer these important questions is through an exchange economy, a simple yet coherent general equilibrium model. Suppose there is a finite collection $I$ of agents, each with a utility function $u_i : \mathbb{R}^n_+ \to \mathbb{R}$ and with an initial endowment $\omega_i \in \mathbb{R}^n_+$ of $n$ goods (potatoes, apples, and so on).21 The exchange economy is thus represented by a collection $\mathcal{E} = \{(u_i, \omega_i)\}_{i \in I}$, where each pair $(u_i, \omega_i)$ summarizes all economically relevant characteristics of agent $i$, his "economic persona".

Assume that agents can trade, buy or sell, among themselves any quantity of the $n$ goods at a price vector $p \in \mathbb{R}^n_+$ (say, in euros). There are no impediments to trade. Agent $i$ has a budget set
$$B_i(p, p \cdot \omega_i) = \{x \in \mathbb{R}^n_+ : p \cdot x \le p \cdot \omega_i\}$$

21 To ease matters, we assume that all agents have $\mathbb{R}^n_+$ as their consumption set.

where the income $w = p \cdot \omega_i$ now depends on prices, because agent $i$ can fund his consumption by trading his endowment at the market price $p$, thus earning up to $p \cdot \omega_i$ euros. The vector $z = x - \omega_i$ is the vector of net trades, per each good, of agent $i$ if he selects bundle $x$.22
As a trader, agent $i$ exchanges goods at the market price. As a consumer, agent $i$ solves the optimization problem
$$\max_x u_i(x) \quad \text{sub } x \in B_i(p, p \cdot \omega_i)$$
Agents thus play two roles in this economy. Their trader role is, however, ancillary to their consumer role: what agent $i$ cares about is consumption, trading being only instrumental to that.
Assume that there is a unique optimal bundle $\hat{x}_i(p, p \cdot \omega_i)$. Since it only depends on the price vector $p$, the demand function $D_i : \mathbb{R}^n_+ \to \mathbb{R}^n_+$ of agent $i$ can be defined by
$$D_i(p) = \hat{x}_i(p, p \cdot \omega_i) \qquad \forall p \in \mathbb{R}^n_+$$
The individual demand $D_i$ still has the remarkable invariance property $D_i(\alpha p) = D_i(p)$ for every $\alpha > 0$. So, nominal changes in prices do not affect agents' consumption behavior. Moreover, if $u_i : \mathbb{R}^n_+ \to \mathbb{R}$ is strongly increasing, then Walras' law is easily seen to hold for agent $i$, i.e.,
$$p \cdot D_i(p) = p \cdot \omega_i \tag{22.49}$$
We can now aggregate individual behavior. The aggregate demand function $D : \mathbb{R}^n_+ \to \mathbb{R}^n$ is defined by
$$D(p) = \sum_{i \in I} D_i(p)$$
Note that the aggregate demand function inherits the invariance property of individual demand functions, that is,
$$D(\alpha p) = D(p) \qquad \forall \alpha > 0 \tag{22.50}$$
So, nominal changes do not affect the aggregate demand of goods. Condition A.2 of the Arrow-Debreu's Theorem (Chapter 14) is thus satisfied.

Let $\omega = \sum_{i \in I} \omega_i$ be the sum of individual endowments, so the total resources in the economy. The aggregate supply function $S : \mathbb{R}^n_+ \to \mathbb{R}^n$ is given by such sum, i.e.,
$$S(p) = \omega$$
So, in this simplified exchange economy the aggregate supply function does not depend on prices. It is a "flat" supply.

In this economy we have the weak Walras' law
$$p \cdot E(p) \le 0$$
where $E : \mathbb{R}^n_+ \to \mathbb{R}^n$ is the excess demand function defined by $E(p) = D(p) - \omega$. Indeed,
$$p \cdot D(p) = p \cdot \sum_{i \in I} D_i(p) = \sum_{i \in I} p \cdot D_i(p) \le \sum_{i \in I} p \cdot \omega_i = p \cdot \omega$$

22 We say "net trade" because $z$ may be the outcome of several market operations, here not modelled, in which agents may have been on both sides of the market (i.e., buyers and sellers).

If Walras' law (22.49) holds for each agent $i \in I$, then its aggregate version holds:
$$p \cdot E(p) = 0$$
So, besides condition A.2, also conditions W.1 and W.2 used in the Arrow-Debreu's Theorem naturally arise in this simple exchange economy.
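
As an illustration of these aggregation steps, the following sketch builds a small exchange economy with log-linear (Cobb-Douglas) agents, whose demand has the closed form (22.47) with income $w = p \cdot \omega_i$, and verifies the aggregate Walras' law $p \cdot E(p) = 0$ numerically; all the numbers are hypothetical.

```python
import numpy as np

# Hypothetical two-agent, two-good exchange economy with log-linear utilities.
A = np.array([[0.3, 0.7],      # agent 1's weights a_1
              [0.6, 0.4]])     # agent 2's weights a_2
omega = np.array([[4.0, 1.0],  # agent 1's endowment omega_1
                  [2.0, 3.0]]) # agent 2's endowment omega_2

def individual_demand(p, a, omega_i):
    return a * (p @ omega_i) / p  # D_i(p) = a_i * (p . omega_i) / p_i

def excess_demand(p):
    D = sum(individual_demand(p, A[i], omega[i]) for i in range(2))
    return D - omega.sum(axis=0)  # E(p) = D(p) - omega

p = np.array([1.0, 2.0])
E = excess_demand(p)
print(E)                       # excess demand at p
print(p @ E)                   # ~0: aggregate Walras' law p . E(p) = 0
print(excess_demand(3.5 * p))  # same vector: invariance E(alpha p) = E(p)
```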

The wellbeing of each agent $i$ in the economy $\mathcal{E}$ depends on the bundle of goods $x_i = (x_{i1}, \ldots, x_{in}) \in \mathbb{R}^n_+$ that he receives, as ranked via a utility function $u_i : \mathbb{R}^n_+ \to \mathbb{R}$. A consumption allocation of such bundles is a vector
$$x = \left(x_1, \ldots, x_{|I|}\right) \in \left(\mathbb{R}^n_+\right)^{|I|}$$
Next we define allocations that may arise via market exchanges which are, at the same time, voluntary and feasible.

Definition 1052 A pair $(p, x) \in \mathbb{R}^n_+ \times \left(\mathbb{R}^n_+\right)^{|I|}$ of prices and consumption allocations is a weak Arrow-Debreu (market) equilibrium of the exchange economy $\mathcal{E}$ if

(i) $x_i = D_i(p)$ for each $i \in I$;

(ii) $\sum_{i \in I} x_i \le \omega$.

If equality holds in (ii), we say that $(p, x)$ is an Arrow-Debreu (market) equilibrium.

The optimality condition (i) requires that allocation $x$ consist of bundles that, at the price level $p$, are optimal for each agent $i$; so, as a trader, agent $i$ is freely trading. The market clearing condition (ii) requires that such allocation $x$ rely on trades that are feasible in the market. Jointly, conditions (i) and (ii) ensure that allocation $x$ is attained via market exchanges that are both voluntary and feasible.

The Arrow-Debreu equilibrium notion thus aggregates individual behavior. What distinguishes a weak equilibrium from an equilibrium is that in the latter optimal bundles exhaust endowments, so no resources are left unused. The next result is trivial mathematically, yet of great economic importance in that it shows that the aggregate equilibrium notions of Chapter 14 can be interpreted in terms of a simple exchange economy.

Lemma 1053 Given a pair $(p, x) \in \mathbb{R}^n_+ \times \left(\mathbb{R}^n_+\right)^{|I|}$ of prices and consumption allocations, set $q = \sum_{i \in I} x_i$. The pair $(p, x)$ is a:

(i) Arrow-Debreu equilibrium if and only if (14.8) holds, i.e., $q = D(p) = S(p)$;

(ii) weak Arrow-Debreu equilibrium if and only if (14.10) holds, i.e., $q = D(p) \le S(p)$.

In view of this result, we can then establish the existence of a weak market equilibrium of the exchange economy $\mathcal{E}$ using the existence results of Chapter 14, in particular Arrow-Debreu's Theorem. For simplicity, next we consider the existence of a weak market price equilibrium, i.e., a price $p$ such that $E(p) \le 0$ (so, at $p$ there is no excess demand).

Proposition 1054 Let $\mathcal{E} = \{(u_i, \omega_i)\}_{i \in I}$ be an economy in which, for each agent $i \in I$, the endowment $\omega_i$ is strictly positive and the utility function $u_i$ is continuous and strictly quasi-concave on a convex and compact consumption set $A_i$. Then, a weak Arrow-Debreu equilibrium of the exchange economy $\mathcal{E}$ exists.

Proof Let $i \in I$. If $u_i$ is continuous and strictly quasi-concave on the compact set $A_i$, by the Maximum Theorem (to be presented in Chapter 41) the individual demand function $D_i$ is continuous on $\mathbb{R}^n_{++}$. The aggregate demand $D$ is then also continuous on $\mathbb{R}^n_{++}$, so condition A.1 is satisfied. Since we already noted that conditions A.2 and W.1 hold, we conclude that a weak market price equilibrium exists by Arrow-Debreu's Theorem.

In sum, in this simple exchange economy we have connected individual and aggregate behavior via an equilibrium notion. In particular, the existence of a (weak) market equilibrium is established only via conditions on agents' individual characteristics, i.e., utility functions and endowments, as methodological individualism prescribes. Indeed, aggregating individual behavior via an equilibrium notion is a common mode of analysis in economics.

A caveat, however, is in order: how does a market price equilibrium come about? The previous analysis provides conditions under which it exists, but says nothing about what kind of individual choices may actually implement it. A deus ex machina, the "market", sets equilibrium prices, which is a significant limitation of the analysis from a methodological individualism viewpoint.

22.9.2 Invisible hand

The set of all consumption allocations in the economy $\mathcal{E} = \{(u_i, \omega_i)\}_{i \in I}$ is
$$C(\omega) = \left\{ x \in \left(\mathbb{R}^n_+\right)^{|I|} : \sum_{i \in I} x_i \le \omega \right\}$$
All allocations in $C(\omega)$ can, in principle, be attained via trading; for this reason, we call them attainable allocations. Yet, if there exists a mighty planner, say a pharaoh, endowed with a vector $\omega$ of goods, the attainable allocations may result not from trading but from an arbitrary consumption allocation selected by the pharaoh, who decides which bundle each agent can consume.
The operator $f : \left(\mathbb{R}^n_+\right)^{|I|} \to \mathbb{R}^{|I|}$ given by
$$f(x) = \left(u_1(x_1), \ldots, u_{|I|}(x_{|I|})\right) \tag{22.51}$$
represents the utility profile across agents of each allocation. So, the image
$$f(C(\omega)) = \{f(x) : x \in C(\omega)\}$$
consists of all utility profiles $\left(u_1(x_1), \ldots, u_{|I|}(x_{|I|})\right)$ that agents can achieve at attainable allocations. Because of its importance, we denote such image by the more evocative symbol $U_{\mathcal{E}}$, i.e., we set $U_{\mathcal{E}} = f(C(\omega))$. The subscript reminds us that this set depends on the individual characteristics, utility functions and endowments, of the agents in the economy.

A vector $x \in \left(\mathbb{R}^n_+\right)^{|I|}$ is said to be a (weak, resp.) equilibrium market allocation of economy $\mathcal{E}$ if there is a non-zero price vector $p$ such that the pair $(p, x)$ is a (weak, resp.) Arrow-Debreu equilibrium of the exchange economy $\mathcal{E}$. Clearly, equilibrium allocations are attainable.
Can a benevolent pharaoh improve upon an equilibrium market allocation? Specifically, given an equilibrium market allocation $x$, is there an alternative attainable allocation $x'$ such that $f(x') > f(x)$, i.e., such that under $x'$ at least one agent is strictly better off than under allocation $x$ and none is worse off?

Formally, a negative answer to this question amounts to saying that equilibrium market allocations are Pareto optimal, that is, they result in utility profiles that are maximal in the set $U_{\mathcal{E}}$, i.e., that are Pareto optima in such set (Section 2.5). Remarkably, this is indeed the case, as the next fundamental result shows.

Theorem 1055 (First Welfare Theorem) Let $\mathcal{E} = \{(u_i, \omega_i)\}_{i \in I}$ be an economy in which $\omega \gg 0$ and, for each agent $i \in I$, the utility function $u_i : \mathbb{R}^n_+ \to \mathbb{R}$ is concave and strongly increasing. An equilibrium allocation of economy $\mathcal{E}$ is (if it exists) Pareto optimal.

Thus, it is not possible to Pareto improve upon an equilibrium allocation. The First Welfare Theorem can be viewed as a possible formalization of the famous invisible hand of Adam Smith. Indeed, an exchange economy reaches via feasible and voluntary exchanges an equilibrium allocation that even a benevolent pharaoh would not be able to Pareto improve upon, i.e., he would not be able to select a different attainable allocation that makes at least one agent strictly better off, yet none worse off.

Proof Suppose there exists an equilibrium allocation $x \in C(\omega)$ under a non-zero price vector $p$. Suppose, by contradiction, that there exists a different $x' \in C(\omega)$ such that $f(x') > f(x)$. Let $i \in I$. If $u_i(x'_i) > u_i(x_i)$, then $p \cdot x'_i > p \cdot \omega_i$ because $x_i$ is an optimal bundle. If $u_i(x'_i) = u_i(x_i)$, then $p \cdot x'_i \ge p \cdot \omega_i$; indeed, if $p \cdot x'_i < p \cdot \omega_i$ then $x'_i$ is an optimal bundle that violates the individual Walras' law, a contradiction because $u_i$ is strongly increasing and $A_i$ is closed under majorization (Proposition 996). Being $f(x') > f(x)$, we conclude that $p \cdot \sum_{i \in I} x'_i > p \cdot \omega$. On the other hand, from $x' \in C(\omega)$ it follows that $p \cdot \omega \ge p \cdot \sum_{i \in I} x'_i$ because $p > 0$. We thus reached the contradiction $p \cdot \sum_{i \in I} x'_i > p \cdot \omega \ge p \cdot \sum_{i \in I} x'_i$. This proves that $x$ is a Pareto optimum.

The First Welfare Theorem establishes a property of equilibrium allocations without worrying about their existence. To address this further issue, it is enough to combine this theorem and Proposition 1054.

22.10 Least squares

The method of least squares is of central importance in applied mathematics. Like all great ideas, it can be analyzed from multiple perspectives, as we will see in this section.

22.10.1 Linear systems

Let us start with a linear algebra approach. A linear system of equations
$$A x = b \tag{22.52}$$
with $A$ of order $m \times n$, $x$ of order $n \times 1$ and $b$ of order $m \times 1$, may not have a solution. This is often the case when a system has more equations than unknowns, i.e., $m > n$.

When a system has no solution, there is no vector $\hat{x} \in \mathbb{R}^n$ such that $A \hat{x} = b$. That said, one may wonder whether there is a surrogate for a solution, a vector $x \in \mathbb{R}^n$ that minimizes the approximation error
$$\|Ax - b\| \tag{22.53}$$
that is, the distance between the vector of constants $b$ and the image $Ax$ of the linear operator $F(x) = Ax$. The error is null in the fortunate case where $x$ solves the system: $Ax - b = 0$. In general, the error (22.53) is positive, as the norm is non-negative and vanishes only at solutions.

By Proposition 978, minimizing the approximation error is equivalent to minimizing the quadratic transformation $\|Ax - b\|^2$ of the norm. This justifies the following definition.

Definition 1056 A vector $x \in \mathbb{R}^n$ is said to be a least squares solution of system (22.52) if it solves the optimization problem
$$\min_x \|Ax - b\|^2 \quad \text{sub } x \in \mathbb{R}^n \tag{22.54}$$

A least squares solution is an approximate solution of the linear system: it is the best we can do to minimize the distance between the vectors $Ax$ and $b$ in $\mathbb{R}^m$. As $\|\cdot\|^2$ is a sum of squares, finding the least squares solution by solving the optimization problem (22.54) is called the least squares method. The fathers of this method are Gauss and Legendre, who suggested it to analyze astronomical data at the beginning of the nineteenth century.

As we remarked, when it exists, the linear system's solution is also a least squares solution. To be a good surrogate, a least squares solution should exist also when the system has no solution. In other words, the more general the conditions ensuring the existence of solutions of the optimization problem (22.54), the more useful the least squares method. The following fundamental result shows that such solutions do indeed exist and are unique under the hypothesis that $\rho(A) = n$. In the more relevant case where $m > n$, this amounts to requiring that the matrix $A$ has maximum rank. The result relies on Tonelli's Theorem for existence and on Theorem 1032 for uniqueness.

Theorem 1057 Let $m \ge n$. The optimization problem (22.54) has a unique solution if $\rho(A) = n$.

Later in the book we will see the form of this unique solution (Sections 24.4 and 31.5.1). To prove the result, let us consider the function $g : \mathbb{R}^n \to \mathbb{R}$ defined by
$$g(x) = -\|Ax - b\|^2$$
so that problem (22.54) is equivalent to the optimization problem
$$\max_x g(x) \quad \text{sub } x \in \mathbb{R}^n \tag{22.55}$$
The following lemma illustrates the remarkable properties of the objective function $g$ which allow us to use Tonelli's Theorem and Theorem 1032. Note that the condition $\rho(A) = n$ is equivalent to requiring injectivity of the linear operator $F(x) = Ax$ (Corollary 689).

Lemma 1058 If $\rho(A) = n$, then $g$ is supercoercive and strictly concave.

Proof Let us start by showing that $g$ is strictly concave. Take $x_1, x_2 \in \mathbb{R}^n$ with $x_1 \ne x_2$ and $\lambda \in (0, 1)$. The condition $\rho(A) = n$ implies that $F$ is injective, hence $F(x_1) \ne F(x_2)$. Therefore,
$$\|F(\lambda x_1 + (1 - \lambda) x_2) - b\|^2 = \|\lambda F(x_1) + (1 - \lambda) F(x_2) - (\lambda b + (1 - \lambda) b)\|^2 = \|\lambda (F(x_1) - b) + (1 - \lambda)(F(x_2) - b)\|^2 < \lambda \|F(x_1) - b\|^2 + (1 - \lambda) \|F(x_2) - b\|^2 \tag{22.56}$$
where the strict inequality follows from the strict convexity of $\|\cdot\|^2$.23 So,
$$g(\lambda x_1 + (1 - \lambda) x_2) = -\|F(\lambda x_1 + (1 - \lambda) x_2) - b\|^2 > -\lambda \|F(x_1) - b\|^2 - (1 - \lambda) \|F(x_2) - b\|^2 = \lambda g(x_1) + (1 - \lambda) g(x_2)$$
which implies the strict concavity of $g$.

Let us now show that $g$ is supercoercive. As $F$ is injective, its inverse $F^{-1} : \operatorname{Im} F \to \mathbb{R}^n$ exists and is continuous (Proposition 671). Furthermore, the function $f : \mathbb{R}^m \to \mathbb{R}$ defined by $f(y) = -\|y - b\|^2$ is supercoercive. Indeed,
$$\|y\| = \|y - b + b\| \le \|y - b\| + \|b\|$$
hence
$$\|y\| \to +\infty \implies \|y - b\| \to +\infty \implies f(y) = -\|y - b\|^2 \to -\infty$$
Set $B_t = \{y \in \operatorname{Im} F : f(y) \ge t\} = (f \ge t) \cap \operatorname{Im} F$ for $t \in \mathbb{R}$. As $f$ is supercoercive and continuous, by Proposition 1019 $f$ is coercive on the closed set $\operatorname{Im} F$ and the sets $B_t = (f \ge t) \cap \operatorname{Im} F$ are compact for every $t$. Furthermore,
$$(g \ge t) = \{x \in \mathbb{R}^n : f(F(x)) \ge t\} = \{x \in \mathbb{R}^n : F(x) \in B_t\} = F^{-1}(B_t)$$
Since $F^{-1}$ is continuous and $B_t$ is compact, by Proposition 597, $F^{-1}(B_t)$ is compact. It follows that $(g \ge t)$ is compact for every $t$, which implies that $g$ is supercoercive (Proposition 1016).

Proof of Theorem 1057 In light of the previous lemma, problem (22.55), and so problem
(22.54), has a solution thanks to Tonelli's Theorem because g is coercive. Such a solution is
unique thanks to Theorem 1032 because g is strictly concave.
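
Theorem 1057 can be illustrated numerically. Below is a minimal Python sketch with a hypothetical matrix and vector; `numpy.linalg.lstsq` computes a least squares solution of an overdetermined system along with the rank of $A$, which here equals $n$ and so guarantees uniqueness.

```python
import numpy as np

# An overdetermined system Ax = b (m = 4 equations, n = 2 unknowns) with no
# exact solution; since rank(A) = n = 2, the least squares solution is unique.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

x_hat, residual, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print(rank)                                # 2 = n, so the minimizer is unique
print(x_hat)                               # the least squares solution
print(np.linalg.norm(A @ x_hat - b) ** 2)  # minimized approximation error
```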

22.10.2 Descriptive statistics

Let us now consider the least squares method from a more statistical perspective. Suppose a farmer must choose how much fertilizer $x$ (input) to use for the next crop of potatoes $y$ (output). He does not know the production function $f : \mathbb{R}_+ \to \mathbb{R}$ associating to each level of input $x$ the corresponding level of output $y$, so that, given an output objective $y$, he cannot simply compute the inverse $f^{-1}(y)$.

23 Indeed, the function $\|x\|^2 = \sum_{i=1}^n x_i^2$ is strictly convex, as we already noted for $n = 2$ in Example 816.

However, the farmer does have data on the pairs $(x_i, y_i)$ of input and output over the previous $m$ years, that is, for $i = 1, \ldots, m$. The farmer wishes to find the linear production function $f(x) = \beta x$, with $\beta \in \mathbb{R}$, that best fits his data. Linearity is assumed for the sake of simplicity: once one becomes familiar with the method, more complex formulations of $f$ can be considered.

It is still unclear what "best fits his data" means precisely. This is, indeed, the crux of the matter. According to the least squares method, it consists in requiring the function to be $f(x) = \beta x$, where the coefficient $\beta$ minimizes
$$\sum_{i=1}^m (y_i - \beta x_i)^2$$
that is, the sum of the squares of the errors $y_i - \beta x_i$ that are made by using the production function $f(x) = \beta x$ to evaluate output. Therefore, one is faced with the following optimization problem
$$\min_{\beta} \sum_{i=1}^m (y_i - \beta x_i)^2 \quad \text{sub } \beta \in \mathbb{R}$$
By denoting by $X = (x_1, \ldots, x_m)$ and $Y = (y_1, \ldots, y_m)$ the data vectors regarding input and output, the problem can be restated as
$$\min_{\beta} \|\beta X - Y\|^2 \quad \text{sub } \beta \in \mathbb{R} \tag{22.57}$$
which is the special case $n = 1$ of the optimization problem (22.54) with the notation $A = X$, $x = \beta$ and $b = Y$.24

By Theorem 1057, problem (22.57) has a unique solution $\hat{\beta} \in \mathbb{R}$ because the rank condition is trivially satisfied when $n = 1$. The farmer can use the production function
$$\hat{f}(x) = \hat{\beta} x$$
to decide how much fertilizer to use for the next crop, for whichever level of output he might choose. Given the data he has at hand and the (possibly simplistic) choice of a linear production function, the least squares method suggests to the farmer that this is the production function that best fits the available data.

24 Unfortunately, the notation we have used, which is standard in statistics, is not consistent with that of problem (22.54). In particular, here $\beta$ plays the role of $x$ in (22.54). The only benefit of inconsistent notation is that it provides a litmus test to check whether a topic has been understood.

[Figure: the input-output observations $(x_i, y_i)$ plotted in the $(x, y)$ plane, together with the fitted line.]

Such a procedure can be used in the analysis of data regarding any pair of variables. The independent variable $x$, referred to as the regressor, is generally not unique. For example, suppose the same farmer needs $n$ kinds of input $x_1, x_2, \ldots, x_n$, that is, $n$ regressors, to produce a quantity $y$ of output. The data collected by the farmer are thus
$$X_1 = (x_{11}, x_{12}, \ldots, x_{1m}), \quad X_2 = (x_{21}, x_{22}, \ldots, x_{2m}), \quad \ldots, \quad X_n = (x_{n1}, x_{n2}, \ldots, x_{nm})$$
where $x_{ij}$ is the quantity of input $i$ used in year $j$. The vector $Y = (y_1, \ldots, y_m)$ denotes the output, as before. The linear production function is now a function of several variables, that is, $f(x) = \beta \cdot x$ with $x \in \mathbb{R}^n$. The $m \times n$ data matrix
$$X = \begin{bmatrix} X_1^T & X_2^T & \cdots & X_n^T \end{bmatrix} = \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{n1} \\ x_{12} & x_{22} & \cdots & x_{n2} \\ \vdots & \vdots & & \vdots \\ x_{1m} & x_{2m} & \cdots & x_{nm} \end{bmatrix} \tag{22.58}$$
has the vectors $X_1, X_2, \ldots, X_n$ as columns, so that the latter contain the data on each regressor throughout the years.

The least squares method leads to
$$\min_{\beta} \|X \beta - Y\|^2 \quad \text{sub } \beta \in \mathbb{R}^n$$
which is the optimization problem (22.54) with the notation $A = X$, $x = \beta$ and $b = Y$. If $\rho(X) = n$, Theorem 1057 says that this problem has a unique solution $\hat{\beta} \in \mathbb{R}^n$. The linear production function which the farmer extracts from the available data is $\hat{f}(x) = \hat{\beta} \cdot x$, where the vector of coefficients $\hat{\beta} = (\hat{\beta}_1, \ldots, \hat{\beta}_n)$ assigns to each regressor $x_i$ the explanatory power $\hat{\beta}_i$ prescribed by the least squares method.
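
As an illustration, here is a sketch of the farmer's multi-regressor problem with hypothetical data. The closed form used below, the normal equations $X^T X \beta = X^T Y$, is only previewed here: the book derives the form of the solution later (Sections 24.4 and 31.5.1).

```python
import numpy as np

# Hypothetical data: m = 5 years, n = 2 inputs (columns = regressors X_1, X_2).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
Y = np.array([5.1, 4.2, 11.3, 9.8, 15.2])

beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Preview of the closed form (normal equations), valid when rank(X) = n:
beta_normal = np.linalg.solve(X.T @ X, X.T @ Y)
print(np.allclose(beta_hat, beta_normal))  # True: the two routes agree
print(beta_hat)  # explanatory power of each regressor
```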

22.11 Operator optima

22.11.1 Operator optimization problems

So far we have considered objective functions $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ that take on scalar values. In some important applications, however, the objective function is an operator $f : A \subseteq \mathbb{R}^n \to \mathbb{R}^m$ that takes on vectors as values. If we write the operator $f$ as an $m$-tuple $(f_1, \ldots, f_m)$ of functions $f_i : A \subseteq \mathbb{R}^n \to \mathbb{R}$, it becomes clear that each alternative $x \in A$ is now evaluated through multiple criteria $(f_1(x), \ldots, f_m(x))$. In a consumer problem, consumers may for example evaluate bundles according to $m$ criteria, each represented by a function $f_i$ (for instance, for a car both the color and the speed might matter, taken as indicators of design and performance, respectively). In a planner problem, $x$ can be an allocation of some resources among the $m$ agents of an economy; the planner's objective function $f$ is an operator that assesses an allocation through the utility function $f_i$ of each agent $i$ (cf. Section 22.9).

To address an optimization problem with operators as objective functions, we need the notion of Pareto optimum (Section 2.5).

Definition 1059 Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}^m$ be an operator and $C$ a subset of $A$. An element $\hat{x} \in C$ is called a Pareto optimizer of $f$ on $C$ if there is no $x \in C$ such that
$$f(x) > f(\hat{x})$$
The value $f(\hat{x})$ of the function at $\hat{x}$ is called a Pareto value of $f$ on $C$.

Because of the planner example, sometimes $f$ is called the social objective function and $C$ the social choice set. Note that a Pareto value of the objective function $f$ on the choice set $C$ is a Pareto optimum of the set $f(C) = \{f(x) : x \in C\}$. Unlike the maximum value, which is unique, there are in general multiple Pareto values. The collection of all such values is called the Pareto frontier of $f$ on $C$ (in accordance with the terminology of Section 2.5).
We will write an operator optimization problem as
$$\operatorname*{opt}_x f(x) \quad \text{sub } x \in C \tag{22.59}$$
A vector $\hat{x} \in C$ solves this problem if it is a Pareto optimizer of $f$ on $C$. We denote by $\arg \operatorname{opt}_{x \in C} f(x)$ the set of all solutions. When $m = 1$, we get back to the maximization problem (22.2).25 Problems (22.59) are often called vector maximization problems.
To study operator optimization problems, a scalarization of the objective function is often useful. Specifically, consider the scalar function $W_\lambda : A \subseteq \mathbb{R}^n \to \mathbb{R}$ defined by
$$W_\lambda(x) = \sum_{i=1}^m \lambda_i f_i(x)$$
where $\lambda$ denotes a strictly positive and normalized element of $\mathbb{R}^m$, i.e., $\lambda \gg 0$ and $\sum_{i=1}^m \lambda_i = 1$. The vector $\lambda$ can be interpreted as a vector of weights. Again in view of the planner problem, in which $\lambda_i$ would "weight" agent $i$, the function $W_\lambda$ is sometimes called a (social) welfare function.

The next result is a first illustration of the usefulness of the scalarization provided by welfare functions.

25 As the reader can check, a dual notion of Pareto optimality would lead to minimum problems.

Lemma 1060 We have $\arg \max_{x \in C} W_\lambda(x) \subseteq \arg \operatorname{opt}_{x \in C} f(x)$ for every $\lambda$.

Proof Fix $\lambda \gg 0$ with $\sum_{i=1}^m \lambda_i = 1$. Let $\hat{x} \in \arg \max_{x \in C} W_\lambda(x)$. The point $\hat{x}$ is clearly a Pareto optimizer. Otherwise, there would exist $x \in C$ such that $f(x) > f(\hat{x})$. But, being $\lambda \gg 0$, this implies $W_\lambda(x) = \lambda \cdot f(x) > \lambda \cdot f(\hat{x}) = W_\lambda(\hat{x})$, a contradiction.

This lemma implies the next Weierstrass-type result that ensures the existence of solutions for an operator optimization problem.

Proposition 1061 An operator $f : A \subseteq \mathbb{R}^n \to \mathbb{R}^m$ which is continuous on a compact subset $K$ of $A$ admits (at least) an optimizer in $K$, that is, there exists $\hat{x} \in K$ such that there is no $x \in K$ for which $f(x) > f(\hat{x})$.

Proof The function $W_\lambda$ is continuous if the operator $f$ is continuous. By Weierstrass' Theorem, $\arg \max_{x \in K} W_\lambda(x) \ne \emptyset$. Then, by the previous lemma, $\arg \operatorname{opt}_{x \in K} f(x) \ne \emptyset$.

Scalarization is most effective when
$$\arg \operatorname*{opt}_{x \in C} f(x) = \bigcup_{\lambda} \arg \max_{x \in C} W_\lambda(x) \tag{22.60}$$
In this case, by suitably choosing the vector of weights $\lambda$ we can retrieve all optimizers. The next examples show that this may, or may not, happen.

Example 1062 (i) Consider $f : [0, 1] \to \mathbb{R}^2$ given by $f(x) = (e^x, e^{-x})$. All the points of the unit interval are Pareto optimizers for $f$. The welfare function $W_\lambda : [0, 1] \to \mathbb{R}$ is given by $W_\lambda(x) = \lambda e^x + (1 - \lambda) e^{-x}$, where $\lambda \in (0, 1)$. Its maximizer is $\hat{x} = 0$ if $(1 - \lambda)/\lambda \ge e$ and $\hat{x} = 1$ otherwise. Hence, only the two Pareto optimizers $\{0, 1\}$ can be found through scalarization. (ii) Consider $f : [0, 1] \to \mathbb{R}^2$ given by $f(x) = (x^2, -x^2)$. Again, all the points of the unit interval are Pareto optimizers for $f$. The welfare function $W_\lambda : [0, 1] \to \mathbb{R}$ is given by $W_\lambda(x) = \lambda x^2 - (1 - \lambda) x^2 = (2\lambda - 1) x^2$, where $\lambda \in (0, 1)$. We have
$$\arg \max_{x \in C} W_\lambda(x) = \begin{cases} \{0\} & \text{if } \lambda < \frac{1}{2} \\ [0, 1] & \text{if } \lambda = \frac{1}{2} \\ \{1\} & \text{if } \lambda > \frac{1}{2} \end{cases}$$
and so (22.60) holds. In this case, all Pareto optimizers can be retrieved via scalarization. N
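
Scalarization can also be explored numerically. The sketch below sweeps the weight $\lambda$ for Example 1062-(ii), maximizing $W_\lambda$ over a grid of the unit interval, and recovers the Pareto optimizers $0$ and $1$ (and, at $\lambda = 1/2$, ties over the whole grid); the grid size and tolerance are arbitrary choices.

```python
import numpy as np

# Example 1062-(ii): f(x) = (x**2, -x**2) on [0, 1], so that
# W_lambda(x) = (2*lam - 1) * x**2.
grid = np.linspace(0.0, 1.0, 101)

def maximizers(lam, tol=1e-12):
    W = (2 * lam - 1) * grid ** 2
    return grid[W >= W.max() - tol]

print(maximizers(0.3))       # [0.]: lambda < 1/2 selects x = 0
print(maximizers(0.7))       # [1.]: lambda > 1/2 selects x = 1
print(len(maximizers(0.5)))  # 101: at lambda = 1/2 every grid point is optimal
```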

22.11.2 Planner's problem

Consider again a planner, the pharaoh, who has to allocate at his discretion an overall endowment $\omega \in \mathbb{R}^n_+$ among a finite set $I$ of agents (Section 22.9). The set of attainable consumption allocations is the set
$$C(\omega) = \left\{ x \in \left(\mathbb{R}^n_+\right)^{|I|} : \sum_{i \in I} x_i \le \omega \right\} \tag{22.61}$$

Given $f : \left(\mathbb{R}^n_+\right)^{|I|} \to \mathbb{R}^{|I|}$ defined in (22.51), i.e., $f(x) = \left(u_1(x_1), \ldots, u_{|I|}(x_{|I|})\right)$, the operator optimization problem of the planner is
$$\operatorname*{opt}_x f(x) \quad \text{sub } x \in C(\omega) \tag{22.62}$$
The solutions of this problem, i.e., the Pareto optimizers, are called Pareto optimal allocations (in accordance with the terminology of the First Welfare Theorem).

In view of the previous discussion, the planner can tackle his problem through a welfare function $W_\lambda(x) = \sum_{i \in I} \lambda_i u_i(x_i)$ and the associated optimization problem
$$\max_x W_\lambda(x) \quad \text{sub } x \in C(\omega) \tag{22.63}$$
Unless (22.60) holds, some Pareto optimizers will be missed by a planner who relies on this scalar optimization problem, whatever $\lambda$ he chooses to scalarize with.

Example 1063 Consider an exchange economy with two agents and one good. Assume that the total amount of the good in the economy is $\omega > 0$. For the sake of simplicity, assume that the two agents have the same preferences over this single good. In this way, they share the same utility function, for example a linear $u : \mathbb{R}_+ \to \mathbb{R}$ defined by
$$u_1(x) = u_2(x) = x$$
A planner has to allocate the total endowment $\omega$ to the two agents. In other words, he has to choose an attainable vector $x = (x_1, x_2) \in \mathbb{R}^2_+$, that is, such that $x_1 + x_2 \le \omega$, where $x_1$ will be the share of $\omega$ allotted to the first agent and $x_2$ the share of the second agent. Indeed, each agent can only receive a non-negative quantity of the good, $x \in \mathbb{R}^2_+$, and the planner cannot allocate to the agents more than what is available in the economy, $x_1 + x_2 \le \omega$. Here the collection (22.61) of attainable allocations is
$$C(\omega) = \left\{ x \in \mathbb{R}^2_+ : x_1 + x_2 \le \omega \right\}$$
Define $f : \mathbb{R}^2_+ \to \mathbb{R}^2_+$ by
$$f(x_1, x_2) = (x_1, x_2)$$
In other words, the function $f$ associates to each allocation $x$ the utility profile $(u_1(x_1), u_2(x_2)) \in \mathbb{R}^2_+$. This latter vector represents the utility of the two agents coming from the feasible allocation $x$. The planner operator optimization problem (22.59) is here
$$\operatorname*{opt}_x f(x) \quad \text{sub } x \ge 0 \text{ and } x_1 + x_2 \le \omega$$
It is easy to check that
$$\arg \operatorname*{opt}_{x \in C(\omega)} f(x) = \left\{ x \in \mathbb{R}^2_+ : x_1 + x_2 = \omega \right\}$$
that is, the allocations that exhaust total resources are the Pareto optimizers of $f$ on $C(\omega)$. Since the agents' utility functions are linear, the Pareto frontier is $\left\{ x \in \mathbb{R}^2_+ : x_1 + x_2 = \omega \right\}$. N

Example 1064 If in the previous example we have two agents and two goods, we get back to the setup of the Edgeworth box (Section 2.5). Recall that we assumed that there is a unit of each good to split among the two agents (Albert and Barbara), so $\omega = (1, 1)$. They have the same utility function $u_i : \mathbb{R}^2_+ \to \mathbb{R}$ defined by
$$u_i(x_{i1}, x_{i2}) = \sqrt{x_{i1} x_{i2}}$$
The collection (22.61) of attainable allocations becomes26
$$C(\omega) = \left\{ x \in \left(\mathbb{R}^2_+\right)^2 : x_{11} + x_{21} \le 1 \text{ and } x_{12} + x_{22} \le 1 \right\}$$
Define $f : \left(\mathbb{R}^2_+\right)^2 \to \mathbb{R}^2_+$ by $f(x_1, x_2) = \left(\sqrt{x_{11} x_{12}}, \sqrt{x_{21} x_{22}}\right)$. The planner operator optimization problem (22.59) is here
$$\operatorname*{opt}_x f(x) \quad \text{sub } x \ge 0, \; x_{11} + x_{21} \le 1 \text{ and } x_{12} + x_{22} \le 1$$
By Proposition 62,
$$\arg \operatorname*{opt}_{x \in C(\omega)} f(x) = \left\{ x \in \left(\mathbb{R}^2_+\right)^2 : 0 \le x_{11} = x_{12} = 1 - x_{21} = 1 - x_{22} \le 1 \right\}$$
that is, the allocations that are symmetric (i.e., give the same quantity of each good) and that exhaust total resources are the Pareto optimizers of $f$ on $C(\omega)$. The Pareto frontier is
$$\left\{ \left(\sqrt{x_{11} x_{12}}, \sqrt{x_{21} x_{22}}\right) \in \mathbb{R}^2_+ : 0 \le x_{11} = x_{12} = 1 - x_{21} = 1 - x_{22} \le 1 \right\}$$
N

O.R. As the First Welfare Theorem suggests, there is a close connection between Pareto optimal allocations and the equilibrium allocations that would arise if agents were given individual endowments and could trade among themselves under a price vector. We do not discuss this topic further; readers will study it in some microeconomics course. Just note that, through such connection, the possible equilibrium allocations may be found by solving the operator optimization problem (22.62) or, under condition (22.60), the standard optimization problem (22.63). H

22.12 Coda: cuneiform functions

Strict quasi-concavity is the most standard condition ensuring the uniqueness of solutions of optimization problems (Theorem 1032). It is, however, a sufficient condition that requires the convexity of the choice set, and so it is, for example, useless for finite choice sets. Let us consider the following class of functions.27 Here $A$ is any set.

Definition 1065 A real-valued function $f : A \to \mathbb{R}$ is said to be cuneiform if, for every pair of distinct elements $x, y \in A$, there exists an element $z \in A$ such that $f(z) > \min\{f(x), f(y)\}$.

26 We denote by $x_i = (x_{i1}, \ldots, x_{in}) \in \mathbb{R}^n$ a bundle of goods of agent $i$.
27 Our terminology is not standard.

It is an ordinal property: if $f : A \to \mathbb{R}$ is cuneiform and $g : \operatorname{Im} f \to \mathbb{R}$ is strictly increasing, then the composition $g \circ f : A \to \mathbb{R}$ is cuneiform as well. The next example shows two important classes of cuneiform functions.

Example 1066 (i) Strictly quasi-concave functions $f : C \to \mathbb{R}$ defined on convex sets $C$ of $\mathbb{R}^n$ are cuneiform. Indeed, given any two distinct elements $x, y \in C$, by setting $z = (1/2) x + (1/2) y$ we have
$$f(z) = f\left(\frac{1}{2} x + \frac{1}{2} y\right) > \frac{1}{2} f(x) + \frac{1}{2} f(y) \ge \min\{f(x), f(y)\}$$
(ii) Injective functions $f : A \to \mathbb{R}$ are cuneiform. Let $x, y \in A$ be any two distinct elements of $A$. Since injectivity implies $f(x) \ne f(y)$, without loss of generality we can assume that $f(x) > f(y)$. So, $x$ itself can play the role of $z$ in Definition 1065. An important class of cuneiform functions are, thus, the strictly monotone functions (increasing or decreasing) defined on any subset, finite or not, of the real line. N

The next result shows that being cuneiform is a necessary and sufficient condition for the uniqueness of solutions. In view of the last example, this result generalizes the uniqueness result that we established for strictly quasi-concave functions.

Proposition 1067 A function $f : A \to \mathbb{R}$ has at most one maximizer if and only if it is cuneiform.

Proof "If". Let $f : A \to \mathbb{R}$ be cuneiform. We want to show that there exists at most one maximizer in $A$. Suppose, by contradiction, that there exist in $A$ two such points $x'$ and $x''$, i.e., $f(x') = f(x'') = \max_{x \in A} f(x)$. Since $f$ is cuneiform, there exists $z \in A$ such that
$$f(z) > \min\{f(x'), f(x'')\} = f(x') = f(x'') = \max_{x \in A} f(x)$$
which contradicts the optimality of $x'$ and $x''$. "Only if". Suppose that there exists at most one maximizer in $A$. Let $x'$ and $x''$ be any two distinct elements in $A$. If there are no maximizers, then in particular $x'$ and $x''$ are not maximizers; so, there exists $z \in A$ such that $f(z) > \min\{f(x'), f(x'')\}$. We conclude that $f$ is cuneiform. On the other hand, if there is one maximizer, it is easy to check that it plays the role of $z$ in Definition 1065. Also in this case $f$ is cuneiform.

Though for brevity we omit the details, it is easy to see that there is a dual notion in which the inequality in the previous definition is reversed, and for which the previous result holds for minimizers.
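
On a finite set, the cuneiform property of Definition 1065, and with it Proposition 1067, can be checked by brute force. A minimal sketch with hypothetical values:

```python
from itertools import combinations

def is_cuneiform(values):
    """Check Definition 1065 on a finite list of values f(x), x in A."""
    return all(any(z > min(vx, vy) for z in values)
               for vx, vy in combinations(values, 2))

def argmax_count(values):
    return sum(v == max(values) for v in values)

# An injective function on a finite set: cuneiform, unique maximizer.
print(is_cuneiform([3, 1, 4, 2]), argmax_count([3, 1, 4, 2]))  # True 1
# Two maximizers: not cuneiform, in line with Proposition 1067.
print(is_cuneiform([3, 1, 3]), argmax_count([3, 1, 3]))        # False 2
```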

22.13 Ultracoda: no illusions


Solving optimization problems is, in general, a quite complex endeavor, even when a limited
number of variables is involved. In this section we will present an example of an optimization
problem whose solution is as complicated as proving Fermat's Last Theorem.28 The latter,
which was nally proven after three centuries of unfruitful e orts, states that, for n 3,
28
Based on Murty and Kabadi (1987).
714 CHAPTER 22. OPTIMIZATION PROBLEMS

there do not exist any three positive integers x, y and z such that xn + y n = z n (Section
1.3.2)
Let us consider the optimization problem
$$\min_{x, y, z, n} f(x, y, z, n) \quad \text{sub } (x, y, z, n) \in C$$
where the objective function $f : \mathbb{R}^3 \times \mathbb{N} \to \mathbb{R}$ is given by
$$f(x, y, z, n) = (x^n + y^n - z^n)^2 + (1 - \cos 2\pi x)^2 + (1 - \cos 2\pi y)^2 + (1 - \cos 2\pi z)^2$$
and the choice set is $C = \left\{ (x, y, z, n) \in \mathbb{R}^3 \times \mathbb{N} : x, y, z \ge 1, \; n \ge 3 \right\}$.

It is an optimization problem in four variables, one of which, $n$, is discrete, which makes it impossible to use differential and convex methods. At first sight this might seem a difficult problem, but not an intractable one. Let us have a closer look. We have $f \ge 0$ because $f$ is a sum of squares. In particular,
$$\inf_{(x, y, z, n) \in C} f(x, y, z, n) = 0$$
since $\lim_{n \to \infty} f\left(1, 1, \sqrt[n]{2}, n\right) = \lim_{n \to \infty} \left(1 - \cos 2\pi \sqrt[n]{2}\right)^2 = 0$. Indeed, $\lim_{n \to \infty} \sqrt[n]{2} = 1$ (Proposition 346).
The infimum of the problem is thus 0. The question is whether there is a solution of the problem, that is, a vector $(\hat{x}, \hat{y}, \hat{z}, \hat{n}) \in C$ such that $f(\hat{x}, \hat{y}, \hat{z}, \hat{n}) = 0$. Since $f$ is a sum of squares, this requires that at such a vector they all be null:
$$\hat{x}^{\hat{n}} + \hat{y}^{\hat{n}} - \hat{z}^{\hat{n}} = 1 - \cos 2\pi \hat{x} = 1 - \cos 2\pi \hat{y} = 1 - \cos 2\pi \hat{z} = 0$$
The last three equalities imply that the points $\hat{x}$, $\hat{y}$ and $\hat{z}$ are integers.29 In order to belong to the set $C$, they must be positive. Therefore, the vector $(\hat{x}, \hat{y}, \hat{z}, \hat{n}) \in C$ must be made up of three positive integers such that $\hat{x}^{\hat{n}} + \hat{y}^{\hat{n}} = \hat{z}^{\hat{n}}$ for $\hat{n} \ge 3$. This is possible if and only if Fermat's Last Theorem is false. Now that we know it to be true, we can conclude that this optimization problem has no solution. We could not have made such a statement before 1994: till then, it would have been unclear whether this optimization problem had a solution. Be that as it may, solving this optimization problem, which only has four variables, amounts to solving one of the most well-known problems in mathematics.

29 Recall that $\cos 2\pi x = 1$ if and only if $x$ is an integer.
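
The infimum computation can be illustrated numerically (a sketch; the point is that $f$ gets arbitrarily close to 0 along $(1, 1, \sqrt[n]{2}, n)$ while, by Fermat's Last Theorem, never attaining it on $C$):

```python
import math

def f(x, y, z, n):
    # Objective of the ultracoda problem: zero iff x, y, z are positive
    # integers with x**n + y**n = z**n.
    return ((x**n + y**n - z**n) ** 2
            + (1 - math.cos(2 * math.pi * x)) ** 2
            + (1 - math.cos(2 * math.pi * y)) ** 2
            + (1 - math.cos(2 * math.pi * z)) ** 2)

for n in (3, 10, 50, 200):
    z = 2 ** (1.0 / n)           # the n-th root of 2 tends to 1
    print(n, f(1.0, 1.0, z, n))  # values shrink toward the infimum 0
```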
Chapter 23

Semicontinuous optimization

In some optimization problems, continuity turns out to be too strong a property, and a weakened notion of continuity, called semicontinuity, comes to play a key role. Fortunately, a more general version of Tonelli's Theorem continues to hold. We first introduce semicontinuity, and then present this ultimate version of Tonelli's Theorem.1

23.1 Semicontinuous functions

23.1.1 Definition

Recall that a function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is continuous at a point $x_0 \in A$ when, for each $\varepsilon > 0$, there exists $\delta_\varepsilon > 0$ such that2
$$\|x - x_0\| < \delta_\varepsilon \implies f(x_0) - \varepsilon < f(x) < f(x_0) + \varepsilon \qquad \forall x \in A \tag{23.1}$$
If in this definition we keep only the second inequality, we have the following weakening of continuity.

Definition 1068 A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is said to be upper semicontinuous at $x_0 \in A$ if, for each $\varepsilon > 0$, there exists $\delta_\varepsilon > 0$ such that
$$\|x - x_0\| < \delta_\varepsilon \implies f(x) < f(x_0) + \varepsilon \qquad \forall x \in A$$
A function that is upper semicontinuous at each point of a set $E$ is called upper semicontinuous on $E$. The function is called upper semicontinuous when it is upper semicontinuous at all the points of its domain.3

Upper semicontinuity has a dual notion of lower semicontinuity, with $f(x) > f(x_0) - \varepsilon$ in place of $f(x) < f(x_0) + \varepsilon$.

Proposition 1069 A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is both upper and lower semicontinuous at a point $x_0 \in A$ if and only if it is continuous at $x_0$.
1 This chapter is for coda readers.
2 Clearly, the sandwich $f(x_0) - \varepsilon < f(x) < f(x_0) + \varepsilon$ amounts to $|f(x_0) - f(x)| < \varepsilon$.
3 Semicontinuity was introduced by René Baire, who studied it in detail in his 1899 piece.


Proof The "if" is obvious. As to the converse, assume that $f$ is both upper and lower semicontinuous at $x_0 \in A$. Fix $\varepsilon > 0$. There exist $\delta'_\varepsilon, \delta''_\varepsilon > 0$ such that, for each $x \in A$,
$$\|x - x_0\| < \delta'_\varepsilon \implies f(x) < f(x_0) + \varepsilon \quad \text{and} \quad \|x - x_0\| < \delta''_\varepsilon \implies f(x) > f(x_0) - \varepsilon$$
So, by taking $\delta_\varepsilon = \min\{\delta'_\varepsilon, \delta''_\varepsilon\}$, we have
$$\|x - x_0\| < \delta_\varepsilon \implies f(x_0) - \varepsilon < f(x) < f(x_0) + \varepsilon \qquad \forall x \in A$$
In view of (23.1), we conclude that $f$ is continuous at $x_0$, as desired.

The study of the two forms of semicontinuity, upper and lower, is analogous: indeed, it is easy to see that $f$ is upper semicontinuous if and only if $-f$ is lower semicontinuous. For this reason, we will focus on upper semicontinuity, which is the more relevant notion for the study of maximizers.

The next result presents the sequential characterization of upper semicontinuity.

Proposition 1070 A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is upper semicontinuous at a point $x_0 \in A$ if and only if $\limsup f(x_n) \le f(x_0)$ for each sequence $\{x_n\} \subseteq A$ such that $x_n \to x_0$.

By Proposition 552, for continuous functions we have $\lim f(x_n) = f(x_0)$, so this sequential characterization of semicontinuous functions helps to understand to what extent upper semicontinuity generalizes continuity. For lower semicontinuous functions, we have the dual condition $\liminf f(x_n) \ge f(x_0)$.4

Proof Let $f$ be upper semicontinuous at the point $x_0$. Let $\{x_n\}$ be such that $x_n \to x_0$. Fix $\varepsilon > 0$. There is $n_\varepsilon \ge 1$ such that $\|x_n - x_0\| < \delta_\varepsilon$ for all $n \ge n_\varepsilon$. By Definition 1068, we then have $f(x_n) < f(x_0) + \varepsilon$ for each $n \ge n_\varepsilon$. Therefore, $\limsup f(x_n) \le f(x_0) + \varepsilon$. Since this is true for each $\varepsilon > 0$, we conclude that $\limsup f(x_n) \le f(x_0)$.

Suppose now that $\limsup f(x_n) \le f(x_0)$ for each sequence $\{x_n\}$ such that $x_n \to x_0$. Suppose, per contra, that $f$ is not upper semicontinuous at $x_0$. Then there exists $\varepsilon > 0$ such that for each $\delta > 0$ there is $x_\delta$ with $\|x_\delta - x_0\| < \delta$ and $f(x_\delta) \ge f(x_0) + \varepsilon$. Setting $\delta = 1/n$, it follows that for each $n$ there exists $x_n$ such that $\|x_n - x_0\| < 1/n$ and $f(x_n) \ge f(x_0) + \varepsilon$. In this way we can construct a sequence $\{x_n\}$ such that $x_n \to x_0$ and $f(x_n) \ge f(x_0) + \varepsilon$ for each $n$. Therefore, $\liminf f(x_n) \ge f(x_0) + \varepsilon > f(x_0)$, which contradicts $\limsup f(x_n) \le f(x_0)$. We conclude that $f$ is upper semicontinuous at $x_0$.

Example 1071 The function $f : [0, 1] \to \mathbb{R}$ defined by
$$f(x) = \begin{cases} 1 & \text{if } x = 0 \\ x & \text{if } x \in (0, 1] \end{cases}$$
is upper semicontinuous. Indeed, it is continuous, so upper semicontinuous, at each $x \in (0, 1]$. As to the origin $x = 0$, consider $\{x_n\} \subseteq [0, 1]$ with $x_n \to 0$. For each such $x_n$ we have $f(x_n) \le 1$ and therefore $\limsup f(x_n) \le 1 = f(0)$. By Proposition 1070, $f$ is upper semicontinuous also at 0. N
4 Being $\limsup f(x_n) \ge \liminf f(x_n)$, a function is then both upper and lower semicontinuous at $x_0$ if and only if $\limsup f(x_n) = f(x_0) = \liminf f(x_n)$, i.e., if and only if $\lim f(x_n) = f(x_0)$ (cf. Proposition 412). This confirms from a sequential angle that a function is both upper and lower semicontinuous if and only if it is continuous.

Example 1072 Recall that the function $f : \mathbb{R} \to \mathbb{R}$ given by (13.2), i.e.,
$$f(x) = \begin{cases} x & \text{for } x < 1 \\ 2 & \text{for } x = 1 \\ 1 & \text{for } x > 1 \end{cases}$$
has a removable discontinuity at $x_0 = 1$, as its graph shows:

[Figure: graph of $f$ in the $(x, y)$ plane, with the isolated value $f(1) = 2$ above the rest of the graph at $x_0 = 1$.]

The function is upper semicontinuous at $x_0 = 1$. In fact, let $\{x_n\} \subseteq \mathbb{R}$ with $x_n \to 1$. For every such $x_n$ we have $f(x_n) \le 1$ and therefore $\limsup f(x_n) \le 1 < 2 = f(1)$. By Proposition 1070, $f$ is then upper semicontinuous at $x_0$ (so, it is upper semicontinuous because it is continuous at each $x \ne x_0$). N

This last example shows that, in general, if a function $f$ has a removable discontinuity at a point $x_0$, i.e., the limit $\lim_{x \to x_0} f(x)$ exists but is different from $f(x_0)$, then at $x_0$ it is either upper semicontinuous, if $f(x_0) > \lim_{x \to x_0} f(x)$, or lower semicontinuous, if $f(x_0) < \lim_{x \to x_0} f(x)$.

Example 1073 Recall that the function $f : \mathbb{R} \to \mathbb{R}$ given by (13.5), i.e.,
$$f(x) = \begin{cases} 2 & \text{if } x \ge 1 \\ x & \text{if } x < 1 \end{cases} \tag{23.2}$$
has a non-removable jump discontinuity at $x_0 = 1$. However, it is upper semicontinuous at $x_0$. In fact, let $\{x_n\} \subseteq \mathbb{R}$ with $x_n \to 1$. For every such $x_n$ we have $f(x_n) \le 2$ and therefore $\limsup f(x_n) \le 2 = f(1)$. By Proposition 1070, $f$ is upper semicontinuous also at 1 (so, it is upper semicontinuous because it is continuous at each $x \ne x_0$). N
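
The sequential characterization can be probed numerically; a rough sketch for the jump function (23.2), sampling sequences that approach $x_0 = 1$ from both sides:

```python
# f from (23.2): upper semicontinuous at x0 = 1, since f(1) = 2 sits above
# the limit of f along every sequence approaching 1.
def f(x):
    return 2 if x >= 1 else x

for x_seq in ([1 - 1 / n for n in range(1, 6)],   # from the left: f -> 1
              [1 + 1 / n for n in range(1, 6)]):  # from the right: f -> 2
    tail = [f(x) for x in x_seq]
    print(tail, "-> limsup <=", f(1))
```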

In general, the reader can verify that an increasing function $f : \mathbb{R} \to \mathbb{R}$ of a single variable is upper semicontinuous at $x_0$ if and only if it is continuous at $x_0$ from the right, that is, $\lim_{x \to x_0^+} f(x) = f(x_0)$, while it is lower semicontinuous at $x_0$ if and only if it is there continuous from the left, that is, $\lim_{x \to x_0^-} f(x) = f(x_0)$. For example, let us modify the function (23.2) at $x_0 = 1$, so as to have
$$f(x) = \begin{cases} 2 & \text{if } x > 1 \\ x & \text{if } x \le 1 \end{cases}$$
It is now lower semicontinuous at $x_0 = 1$.

23.1.2 Properties

The upper contour sets of continuous functions are closed (Example 1004 and Lemma 1004). Remarkably, this property is still true for upper semicontinuous functions, so this weaker notion of continuity preserves this important property.

Proposition 1074 Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be upper semicontinuous on a closed subset $C$ of $A$. Then, the sets $(f \ge t) \cap C$ are closed for every $t \in \mathbb{R}$.

Proof Let $f$ be upper semicontinuous on $C$. Fix $t \in \mathbb{R}$. We want to show that $(f \ge t) \cap C$ is closed. Let $\{x_n\} \subseteq (f \ge t) \cap C$ with $x_n \to x \in \mathbb{R}^n$. By Theorem 174, it is enough to show that $x \in (f \ge t) \cap C$. Note that $x \in C$ because $C$ is closed. Moreover, $f(x_n) \ge t$ for each $n \ge 1$. Since $f$ is upper semicontinuous, by Proposition 1070 we have
$$t \le \limsup f(x_n) \le f(x)$$
Therefore $x \in (f \ge t)$. We conclude that $x \in (f \ge t) \cap C$, as desired.

Example 1075 Given a closed subset $C$ of $\mathbb{R}^n$, let $1_C : \mathbb{R}^n \to \mathbb{R}$ be defined by
$$1_C(x) = \begin{cases} 1 & \text{if } x \in C \\ 0 & \text{if } x \notin C \end{cases}$$
In words, the function $1_C$ takes on value 1 on $C$ and 0 elsewhere. Though not continuous, it is upper semicontinuous. Indeed, let $x_0 \in \mathbb{R}^n$. If $x_0 \in C$, then $1_C(x_0) \ge 1_C(x)$ for all $x \in \mathbb{R}^n$, so it trivially holds that $\limsup 1_C(x_n) \le 1_C(x_0)$ whenever $x_n \to x_0$. If $x_0 \notin C$, then it belongs to the open set $C^c$; so, if $x_n \to x_0$, there is $n_0 \ge 1$ such that $x_n \in C^c$, and thus $1_C(x_n) = 0$, for all $n \ge n_0$. Hence $\lim 1_C(x_n) = 1_C(x_0) = 0$. By Proposition 1070, we conclude that $1_C$ is upper semicontinuous since $x_0$ was arbitrarily chosen. Its upper contour sets
$$(1_C \ge t) = \begin{cases} \mathbb{R}^n & \text{if } t \le 0 \\ C & \text{if } t \in (0, 1] \\ \emptyset & \text{if } t > 1 \end{cases}$$
are closed for each $t \in \mathbb{R}$, in accordance with the last result. N

From the previous result it follows that Proposition 1009 also continues to hold under upper semicontinuity.

Proposition 1076 An upper semicontinuous function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is coercive on every compact and non-empty subset $C \subseteq A$.

Proof Let $C \subseteq A$ be compact. If $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is upper semicontinuous on $C$, Proposition 1074 implies that every set $(f \ge t) \cap C$ is closed. Since a closed subset of a compact set is, in turn, compact, it follows that every $(f \ge t) \cap C$ is compact. This shows that $f$ is coercive on $C$.

A final important property is the stability of upper semicontinuity with respect to infima and suprema of functions.

Proposition 1077 Given a family $\{f_i\}_{i \in I}$ of functions $f_i : A \subseteq \mathbb{R}^n \to \mathbb{R}$ upper semicontinuous at $x_0 \in A$, define $h : A \subseteq \mathbb{R}^n \to (-\infty, +\infty]$ and $g : A \subseteq \mathbb{R}^n \to [-\infty, +\infty)$ by
$$g(x) = \inf_{i \in I} f_i(x) \quad \text{and} \quad h(x) = \sup_{i \in I} f_i(x)$$
Then, the function $g$ is upper semicontinuous at $x_0 \in A$, while the function $h$ is upper semicontinuous at $x_0 \in A$ provided $I$ is finite.

In words, upper semicontinuity is preserved by infima over sets of functions of any cardinality, while it is preserved under suprema only over finite sets of functions. In this latter case, we can actually write $h(x) = \max_{i \in I} f_i(x)$.

The last example showed that there is a tight connection between upper semicontinuous functions and closed sets. It is therefore not surprising that the stability of upper semicontinuous functions relative to infima and suprema mirrors that of closed sets relative to intersections and unions, respectively.

Example 1078 The union of the closed sets $A_n = [-1 + 1/n, 1 - 1/n]$ is the open interval $(-1, 1)$, as noted after Corollary 166. The supremum of the infinitely many upper semicontinuous functions
$$f_n(x) = 1_{[-1 + \frac{1}{n}, 1 - \frac{1}{n}]}(x)$$
is the lower, but not upper, semicontinuous function
$$h(x) = \sup_{n \in \mathbb{N}} 1_{[-1 + \frac{1}{n}, 1 - \frac{1}{n}]}(x) = 1_{(-1, 1)}(x)$$
N

Proof of Proposition 1077 Let $x_0 \in A$. Given $\varepsilon > 0$, there exists $i \in I$ such that $f_i(x_0) < g(x_0) + \varepsilon$. Since $f_i$ is upper semicontinuous, there exists $\delta_\varepsilon > 0$ such that
$$\|x - x_0\| < \delta_\varepsilon \implies f_i(x) < f_i(x_0) + \varepsilon \qquad \forall x \in A$$
So,
$$\|x - x_0\| < \delta_\varepsilon \implies g(x) \le f_i(x) < f_i(x_0) + \varepsilon < g(x_0) + 2\varepsilon \qquad \forall x \in A$$
that is,
$$\|x - x_0\| < \delta_\varepsilon \implies g(x) < g(x_0) + 2\varepsilon \qquad \forall x \in A$$
This proves that $g$ is upper semicontinuous at $x_0 \in A$. We leave to the reader the proof that $h$ is upper semicontinuous at $x_0 \in A$ when $I$ is finite.

Dual properties hold for lower semicontinuous functions: lower semicontinuity is preserved by suprema over sets of functions of any cardinality, while it is preserved under infima only over finite sets of functions. Now the analogy is with the stability properties of open sets relative to intersections and unions. Indeed, a tight connection, dual to the one established in Example 1075, is easily seen to exist between lower semicontinuous functions and open sets.

In view of Proposition 1069, we then have the following important corollary about the "finite" stability of continuous functions.

Corollary 1079 Given a finite family $\{f_i\}_{i=1}^n$ of functions $f_i : A \subseteq \mathbb{R}^n \to \mathbb{R}$ continuous at $x_0 \in A$, the functions $g, h : A \subseteq \mathbb{R}^n \to \mathbb{R}$ defined by
$$g(x) = \min_{i = 1, \ldots, n} f_i(x) \quad \text{and} \quad h(x) = \max_{i = 1, \ldots, n} f_i(x)$$
are both continuous at $x_0 \in A$.

Infima and suprema of infinitely many continuous functions are, in general, no longer continuous. This fragility of continuity is a main reason for the importance of lower and upper semicontinuity.

23.2 The (almost) ultimate Tonelli

Proposition 1076 shows that upper semicontinuity is the natural form of continuity to use for coercivity. Not surprisingly, then, we can now state and prove a version of Tonelli's Theorem in which upper semicontinuity replaces continuity, thus substantially broadening the scope of the theorem.

Theorem 1080 (Tonelli) A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ which is coercive and upper semicontinuous on a subset $C$ of $A$ admits (at least) a maximizer in $C$, that is, there exists $\hat{x} \in C$ such that
$$f(\hat{x}) = \max_{x \in C} f(x)$$
If, in addition, $C$ is closed, then $\arg \max_{x \in C} f(x)$ is compact.

The proof is a slight modification of the first proof of Weierstrass' Theorem, which essentially still goes through under upper semicontinuity (a further sign that upper semicontinuity is the relevant notion of continuity for establishing the existence of maximizers).

Proof Since $f$ is coercive, there exists $t \in \mathbb{R}$ such that the upper contour set $\Gamma = (f \ge t) \cap C$ is non-empty and compact. Set $\alpha = \sup_{x \in \Gamma} f(x)$, that is, $\alpha = \sup f(\Gamma)$. By Lemma 1001, there exists a sequence $\{a_n\} \subseteq f(\Gamma)$ such that $a_n \to \alpha$. Let $\{x_n\} \subseteq \Gamma$ be such that $a_n = f(x_n)$ for every $n \ge 1$. Since $\Gamma$ is compact, the Bolzano-Weierstrass Theorem yields a subsequence $\{x_{n_k}\}$ of $\{x_n\}$ that converges to some $\hat{x} \in \Gamma$, that is, $x_{n_k} \to \hat{x} \in \Gamma$. Since $\{a_n\}$ converges to $\alpha$, also the subsequence $\{a_{n_k}\}$ converges to $\alpha$. Since $f$ is upper semicontinuous, it follows that
$$\alpha = \lim_{k \to \infty} a_{n_k} = \lim_{k \to \infty} f(x_{n_k}) \le f(\hat{x}) \le \alpha$$
Here the penultimate inequality is due to upper semicontinuity. So, $\alpha = f(\hat{x})$ and we thus conclude that $f(\hat{x}) \ge f(x)$ for every $x \in \Gamma$. At the same time, if $x \in C \setminus \Gamma$ we have $f(x) < t$ and so $f(\hat{x}) \ge t > f(x)$. It follows that $f(\hat{x}) \ge f(x)$ for every $x \in C$, that is, $f(\hat{x}) = \max_{x \in C} f(x)$.

It remains to show that $\arg \max_{x \in C} f(x)$ is compact if $C$ is closed. Since $\arg \max_{x \in C} f(x) \subseteq \Gamma$, it is enough to show that $\arg \max_{x \in C} f(x)$ is closed. Clearly,
$$\arg \max_{x \in C} f(x) = \left(f \ge \max_{x \in C} f(x)\right) \cap C$$
and so $\arg \max_{x \in C} f(x)$ is closed by Proposition 1074, as desired.

Example 1081 (i) The function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} 2 & \text{if } x = 0 \\ e^{-\|x\|} & \text{if } x \ne 0 \end{cases}$$
is coercive and upper semicontinuous. Thanks to Tonelli's Theorem, it has a maximizer in $C = \mathbb{R}$. Note that, instead, this function has no minimizers (here Weierstrass' Theorem does not hold because the function is not continuous and $\mathbb{R}$ is not compact).

(ii) Consider the upper semicontinuous function $f : [0, 1] \to \mathbb{R}$ of Example 1071. By Proposition 1076, this function is coercive on its compact domain $[0, 1]$, so by Tonelli's Theorem it has at least a maximizer. Note that also this function has no minimizers (here Weierstrass' Theorem cannot be applied because the function is not continuous). N

The compactness of the set of maximizers makes coercivity a necessary condition for global optimality for upper semicontinuous objective functions.

Corollary 1082 A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ which is upper semicontinuous on a closed subset $C$ of $A$ is coercive on $C$ if and only if $\arg \max_{x \in C} f(x)$ is non-empty and compact.

If $\arg \max_{x \in C} f(x)$ is not compact, coercivity is no longer a necessary condition for optimality, as the constant function on $\mathbb{R}^n$ shows (recall the discussion after Example 1014).

Proof "If". Let $\arg \max_{x \in C} f(x)$ be non-empty and compact. Since $\arg \max_{x \in C} f(x) = (f \ge \max_{x \in C} f(x)) \cap C$, the function $f$ is coercive on $C$. "Only if". Let $f$ be coercive on $C$. By Tonelli's Theorem, $\arg \max_{x \in C} f(x)$ is non-empty and compact.

We conclude by observing that dual versions of the results we established hold for minimizers, with, for instance, lower contour sets in place of the upper ones (as readers can check).

23.3 The ordinal Tonelli


There is a feature of the previous general form of Tonelli's Theorem that, conceptually, is
still a bit unsatisfactory: unlike coercivity (Proposition 1007), upper semicontinuity is not
an ordinal notion, as the next example shows.

Example 1083 The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x$ is trivially continuous. In contrast, the strictly increasing function $g : \mathbb{R} \to \mathbb{R}$ defined by
$$g(x) = \begin{cases} x + 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ x - 1 & \text{if } x < 0 \end{cases}$$
is neither lower nor upper semicontinuous at 0. To see the failure of upper semicontinuity, just note that $x_n = 1/n \to 0$ but $\lim g(x_n) = 1 > 0 = g(0)$. Since $g \circ f = g$, this proves that lower and upper semicontinuity are not preserved by strictly increasing transformations, so they are not ordinal properties. N
Since upper semicontinuity is not an ordinal notion, we might end up with equivalent objective functions, in the sense of Section 22.1.5, for which Tonelli's Theorem is applicable to only one of them, thus creating an unnatural asymmetry between them. To address this issue, next we present an ordinal version of upper semicontinuity.

Definition 1084 A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is said to be upper quasi-continuous at $x_0 \in A$ if
$$f(x_n) \ge f(y) \implies f(x_0) \ge f(y) \qquad \forall y \in A \tag{23.3}$$
for each sequence $\{x_n\} \subseteq A$ such that $x_n \to x_0$.
It is an ordinal notion, as the next result shows.

Proposition 1085 Given a function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$, let $g : B \subseteq \mathbb{R} \to \mathbb{R}$ be strictly increasing with $\operatorname{Im} f \subseteq B$. The function $f$ is upper quasi-continuous at $x_0 \in A$ if and only if the composite function $g \circ f$ is upper quasi-continuous at $x_0$.

Proof We only prove the "only if", the converse being similarly proved. Let $f$ be upper quasi-continuous. We want to show that $g \circ f$ is upper quasi-continuous at $x_0$. Let $\{x_n\} \subseteq A$ be such that $x_n \to x_0$. Suppose that $y \in A$ is such that $(g \circ f)(x_n) \ge (g \circ f)(y)$. Since $g$ is strictly increasing, by Proposition 221 we have
$$f(x_n) \ge f(y) \iff (g \circ f)(x_n) \ge (g \circ f)(y) \tag{23.4}$$
Since $f$ is upper quasi-continuous at $x_0$, we then have $f(x_0) \ge f(y)$. In view of (23.4), this in turn implies $(g \circ f)(x_0) \ge (g \circ f)(y)$, thus proving that $g \circ f$ is upper quasi-continuous at $x_0$.

Besides being ordinal, upper quasi-continuity is weaker than upper semicontinuity.

Proposition 1086 If a function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is upper semicontinuous at $x_0 \in A$, then it is upper quasi-continuous at $x_0$.

Proof Let $f$ be upper semicontinuous at $x_0 \in A$. Let $x_n \to x_0$ and $y \in A$ be such that $f(x_n) \ge f(y)$ for all $n \ge 1$. By upper semicontinuity, we then have $f(x_0) \ge \limsup f(x_n) \ge \liminf f(x_n) \ge f(y)$. So, $f$ is upper quasi-continuous at $x_0$.

Like the upper contour sets of upper semicontinuous functions, those of upper quasi-continuous functions are also closed, but with an important caveat: the "levels" have to be images of the function.

Proposition 1087 Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be upper quasi-continuous on a closed subset $C$ of $A$. Then, the sets $(f \ge t) \cap C$ are closed for every $t \in \operatorname{Im} f$.

Proof Let $f$ be upper quasi-continuous on $C$. Fix $t \in \operatorname{Im} f$; we want to show that $(f \ge t) \cap C$ is closed. Since $t \in \operatorname{Im} f$, there is $x_t \in A$ such that $f(x_t) = t$. Let $\{x_n\} \subseteq (f \ge f(x_t)) \cap C$ with $x_n \to x \in \mathbb{R}^n$. By Theorem 174, it is enough to show that $x \in (f \ge f(x_t)) \cap C$. Note that $x \in C$ since $C$ is closed. Moreover, $f(x_n) \ge f(x_t)$ for each $n \ge 1$. Since $f$ is upper quasi-continuous, we have $f(x) \ge f(x_t) = t$. Therefore, $x \in (f \ge t)$. We conclude that $x \in (f \ge t) \cap C$, as desired.

We can now state and prove a general ordinal version of Tonelli's Theorem in which upper quasi-continuity replaces upper semicontinuity.5

Theorem 1088 (Ordinal Tonelli) A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ which is coercive and upper quasi-continuous on a subset $C$ of $A$ admits (at least) a maximizer in $C$. If, in addition, $C$ is closed, then $\arg \max_{x \in C} f(x)$ is compact.
The proof relies on a sharpening of Lemma 1001.

Lemma 1089 Let $A$ be a subset of the real line. There exists a convergent and increasing sequence $\{a_n\} \subseteq A$ such that $a_n \uparrow \sup A$.

Proof Set $\alpha = \sup A$. Suppose first that $\alpha \in \mathbb{R}$. In the proof of Lemma 1001 we proved the existence of a sequence $\{a_n\} \subseteq A$ such that $a_n \le \alpha$ and $a_n \to \alpha$. Set $b_n = \max\{a_1, \ldots, a_n\}$. Then $0 \le \alpha - b_n \le \alpha - a_n \to 0$, so $b_n \uparrow \alpha$. Suppose now $\alpha = +\infty$. In the proof of Lemma 1001 we proved the existence of a sequence $\{a_n\} \subseteq A$ such that $a_n \to +\infty$. Again, by setting $b_n = \max\{a_1, \ldots, a_n\}$, we have $b_n \uparrow +\infty$.

Proof Since $f$ is coercive, there exists $t \in \mathbb{R}$ such that the upper contour set $\Gamma = (f \ge t) \cap C$ is non-empty and compact. Set $\alpha = \sup_{x \in \Gamma} f(x)$. By Lemma 1089, there exists a sequence $\{a_n\} \subseteq f(\Gamma)$ such that $a_n \uparrow \alpha$. Let $\{x_n\} \subseteq \Gamma$ be such that $a_n = f(x_n)$ for every $n \ge 1$. Since $\Gamma$ is compact, the Bolzano-Weierstrass Theorem yields a subsequence $\{x_{n_k}\}$ of $\{x_n\}$ that converges to some $\hat{x} \in \Gamma$. We want to show that $\alpha = f(\hat{x})$. Suppose, by contradiction, that $f(\hat{x}) < \alpha$. Since $a_{n_k} \uparrow \alpha$, there exists $\bar{k} \ge 1$ large enough so that $a_{n_{\bar{k}}} > f(\hat{x})$. Since $\{a_n\}$ is increasing, we then have $f(x_{n_k}) \ge f(x_{n_{\bar{k}}})$ for all $k \ge \bar{k}$. Since $f$ is upper quasi-continuous at $\hat{x}$, this implies $f(\hat{x}) \ge f(x_{n_{\bar{k}}}) > f(\hat{x})$, a contradiction.6 We conclude that $\alpha = f(\hat{x})$. So, $f(\hat{x}) \ge f(x)$ for every $x \in \Gamma$. At the same time, if $x \in C \setminus \Gamma$ we have $f(x) < t$ and so $f(\hat{x}) \ge t > f(x)$. It follows that $f(\hat{x}) \ge f(x)$ for every $x \in C$, as desired.

In view of Proposition 1087, the compactness of $\arg \max_{x \in C} f(x)$ can be proved along the lines of the semicontinuous Tonelli's Theorem.

Corollary 1082 continues to hold in the upper quasi-continuous case, as readers can check.

The ordinal Tonelli's Theorem is the most general form of this existence theorem that we present. The earlier pre-coda version of Tonelli's Theorem for continuous functions, Theorem 1013, is enough for the results of the book. Yet, when readers later come across topics that rely on Tonelli's Theorem, they may wonder how much generality would be gained via its stronger semicontinuous and quasi-continuous versions.
would be gained via its stronger semicontinuous and quasi-continuous versions.
5 We leave to readers the dual minimization version, based on a lower quasi-continuity notion.
6 Here $x_{n_{\bar{k}}}$ plays the role of $y$ in (23.3).

23.4 Asymptotic analysis: beyond compactness

23.4.1 Recession cones

For concave functions we can establish existence results for maximizers that do not rely on any form of compactness, unlike Tonelli's and Weierstrass' Theorems.

Definition 1090 The recession (or asymptotic) cone $R_C$ of a set $C$ of $\mathbb{R}^n$ is defined by
$$R_C = \{y \in \mathbb{R}^n : x + ty \in C \text{ for all } x \in C \text{ and all } t \ge 0\}$$
with the convention $R_\emptyset = \mathbb{R}^n$.

The vectors in $R_C$ are called directions of recession. Intuitively, along these directions the set $C$ is unbounded.

Example 1091 In the plane, the recession cones of the convex sets
$$C_1 = \left\{ (x_1, x_2) \in \mathbb{R}^2 : x_2 \ge \frac{1}{x_1} \text{ and } x_1 > 0 \right\} \quad \text{and} \quad C_2 = \left\{ (x_1, x_2) \in \mathbb{R}^2 : x_2 \ge x_1^2 \right\}$$
are the positive orthant and the vertical axis, respectively. That is, $R_{C_1} = \mathbb{R}^2_+$ and $R_{C_2} = \{(0, x_2) : x_2 \ge 0\}$. N

Lemma 1092 Let $C$ be a subset of $\mathbb{R}^n$. Then, $R_C$ is a convex cone, which is closed if $C$ itself is closed.

Proof Let $y \in R_C$, so that $x + ty \in C$ for all $x \in C$ and all $t \ge 0$. For each $\alpha \ge 0$, we then have $x + t(\alpha y) = x + (t\alpha) y \in C$ for all $x \in C$ and all $t \ge 0$. So, $\alpha y \in R_C$ and we conclude that $R_C$ is a cone. To show that it is convex, let $y', y'' \in R_C$ and $\lambda \in [0, 1]$. Then,
$$x + t\left(\lambda y' + (1 - \lambda) y''\right) = \lambda (x + ty') + (1 - \lambda)(x + ty'') \in C$$
for all $x \in C$ and all $t \ge 0$, as desired.

The next lemma gives some basic properties of recession cones of closed convex sets. Observe that, by point (iii), to check whether a vector y is a direction of recession it is enough to check a single x ∈ C.

Lemma 1093 Let C be a closed convex subset of R^n. Then, the following conditions are equivalent:

(i) y ∈ R^n belongs to R_C;

(ii) there exist {x_n} ⊆ C and {λ_n} ⊆ R_+, with λ_n ↑ ∞, such that lim (x_n/λ_n) = y;

(iii) there exists x ∈ C such that x + ty ∈ C for all t ≥ 0.



Proof (i) trivially implies (iii). (iii) implies (ii). Suppose there is x ∈ C such that x + ty ∈ C for all t ≥ 0. Hence x_n = x + ny ∈ C for all n ≥ 1. If we set λ_n = n, we have x_n/λ_n → y.
(ii) implies (i). Suppose y ∈ R^n is such that lim (x_n/λ_n) = y for some {x_n} ⊆ C and {λ_n} ⊆ R_+, with λ_n ↑ ∞. Given any x ∈ C and t ≥ 0, set

z_n = (1 − t/λ_n) x + (t/λ_n) x_n    ∀n ≥ 1

We have z_n ∈ C for all n large enough (i.e., such that t/λ_n ≤ 1), and lim_n z_n = x + ty. Since C is closed, we have x + ty ∈ C. So, y ∈ R_C.

Example 1094 Let C = R^2_{++} ∪ {0}. It is easy to see that R_C = C, so the recession cone might not be closed when C is not closed. Note that properties (ii) and (iii) of the last lemma are not true for this set C. N

Recession cones are stable with respect to intersections.

Proposition 1095 Let C be a closed convex subset of R^n. If D is a closed convex subset of R^n such that C ∩ D ≠ ∅, then R_{C∩D} = R_C ∩ R_D.

It is easy to see that more is true: given a collection {C_i}_{i∈I} of closed convex sets with non-empty intersection, we have ⋂_{i∈I} R_{C_i} = R_{⋂_{i∈I} C_i}.

Proof We prove only that R_{C∩D} ⊆ R_C ∩ R_D since the converse inclusion is trivial. Let y ∈ R_{C∩D} and x ∈ C ∩ D. Then, x + ty ∈ C ∩ D for all t ≥ 0. By Lemma 1093-(iii), y belongs to both R_C and R_D.

Given an m × n matrix A = (a_{ij}) and a vector b ∈ R^m, consider a non-empty polyhedron P = {x ∈ R^n : Ax ≤ b}. Next we show that its recession cone has a neat form.

Corollary 1096 We have R_P = {y ∈ R^n : Ay ≤ 0}.

Proof By Proposition 1095, we have R_P = ⋂_{i=1}^m R_{H_i}, where H_i = {x ∈ R^n : a_i · x ≤ b_i} (cf. Section 22.7.2). We claim that R_{H_i} ⊆ {y ∈ R^n : a_i · y ≤ 0}. Indeed, let y ∈ R_{H_i}. Then, a_i · (x + ty) ≤ b_i for all x ∈ H_i and all t ≥ 0, so t(a_i · y) ≤ b_i − a_i · x, i.e., a_i · y ≤ (b_i − a_i · x)/t for all t > 0. By letting t → +∞, we conclude that a_i · y ≤ 0. As the converse inclusion is easily established, we conclude that R_{H_i} = {y ∈ R^n : a_i · y ≤ 0}, as desired.

Since the directions of recession of a set are, intuitively, the directions along which the set is unbounded, it is natural to expect that a set without such directions be bounded. The next important result confirms that this intuition is correct as long as the set is closed and convex. Thus, this is the class of sets where the notion of recession cone is most meaningful.

Theorem 1097 Let C be a closed convex subset of R^n. Then, C is bounded if and only if R_C = {0}.

So, a closed and convex set is compact if and only if its recession cone is trivial. It is a
remarkable characterization of compactness for closed convex sets.

Proof If C is bounded, then for any sequence {x_n} ⊆ C and any {λ_n} ⊆ R_+ with λ_n ↑ ∞ we have x_n/λ_n → 0. By Lemma 1093-(ii), R_C = {0}. Conversely, let R_C = {0} and suppose, by contradiction, that C is unbounded. Then, there is a sequence {x_n} ⊆ C with ‖x_n‖ ↑ ∞. Each x_n/‖x_n‖ belongs to the unit ball B_1(0) of R^n and so, B_1(0) being compact, there are z ∈ B_1(0) and a subsequence {x_{n_k}} such that x_{n_k}/‖x_{n_k}‖ → z. By Lemma 1093-(ii), z ∈ R_C, which contradicts R_C = {0} since z belongs to the unit sphere ∂B_1(0) and so is non-zero.

Example 1098 (i) Let C = {(x_1, 0) : x_1 ≥ 0} ∪ {(0, x_2) : x_2 ≥ 0} be the union of the positive horizontal and vertical axes, a non-convex and unbounded set of the plane. We have R_C = {0}, which shows that in the last theorem the hypothesis of convexity is key.
Since R_{C_1} = C_1 where C_1 = {(x_1, 0) : x_1 ≥ 0}, this example also shows that larger sets might well not feature larger recession cones.
(ii) Let C = ((0, 1) × [0, ∞)) ∪ {(0, 0), (1, 0)}, a convex, non-closed and unbounded set of the plane. We have R_C = {0}, and so in the last theorem also the closedness hypothesis is key. N

Next we give a noteworthy consequence of the last theorem.

Corollary 1099 A non-empty polyhedron P = {x ∈ R^n : Ax ≤ b} is a polytope if and only if x = 0 is the unique solution of the homogeneous system of linear inequalities Ax ≤ 0.

Proof By Corollary 1096, R_P = {y ∈ R^n : Ay ≤ 0}. In view of Proposition 1046, it is then enough to note that, by Theorem 1097, P is bounded if and only if 0 is the unique solution of Ax ≤ 0.
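Corollary 1099 turns boundedness of a polyhedron into a purely algebraic test on A. As an illustrative sketch (Python with scipy; the matrices are made up), one can decide whether Ay ≤ 0 has a non-zero solution by maximizing each coordinate of y over the cone intersected with the box [−1, 1]^n: since the cone is scale-invariant, it is trivial if and only if every such maximum is zero.

import numpy as np
from scipy.optimize import linprog

def recession_cone_is_trivial(A):
    # Tests {y : Ay <= 0} = {0} by maximizing +/- y_i over the cone cut
    # with the box [-1, 1]^n; the cone is trivial iff every maximum is 0.
    n = A.shape[1]
    for i in range(n):
        for sign in (1.0, -1.0):
            c = np.zeros(n)
            c[i] = -sign                          # linprog minimizes, so flip sign
            res = linprog(c, A_ub=A, b_ub=np.zeros(A.shape[0]),
                          bounds=[(-1, 1)] * n, method="highs")
            if res.status == 0 and -res.fun > 1e-9:
                return False
    return True

# Unit square {x : 0 <= x <= 1}: a polytope
A_square = np.array([[1.0, 0], [-1, 0], [0, 1], [0, -1]])
print(recession_cone_is_trivial(A_square))                        # True

# Positive orthant {x >= 0}: unbounded, hence not a polytope
print(recession_cone_is_trivial(np.array([[-1.0, 0], [0, -1]])))  # False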

A vector y ∈ R^n is a direction of recession if, given any x ∈ C, we remain in C by moving forward along the direction y, i.e., x + ty ∈ C for all t ≥ 0. The next stronger definition requires that this happen by moving both backward and forward, i.e., x + ty ∈ C for all t ∈ R.

Definition 1100 The lineality space L_C of a set C of R^n is defined by

L_C = {y ∈ R^n : x + ty ∈ C for all x ∈ C and all t ∈ R}

The vectors in L_C are called directions of constancy. Along them, going both backward and forward, the set C is "symmetrically" unbounded. A vector space, not just a cone, then results.

Proposition 1101 The lineality space L_C is a vector space, with

L_C = R_C ∩ (−R_C) = R_C ∩ R_{−C}    (23.5)

Thus, a vector y is a direction of constancy if and only if both y and −y are directions of recession. A direction of constancy can thus be regarded as a two-sided direction of recession.

Proof It is easy to check that L_C is a vector space. Let y ∈ R_C ∩ (−R_C). Given any t < 0 and x ∈ C, consider x + ty. Since −y ∈ R_C and −t > 0, we have x + (−t)(−y) ∈ C, i.e., x + ty ∈ C. This shows that y ∈ L_C. Conversely, let y ∈ L_C. Clearly, L_C ⊆ R_C. Moreover, given any t < 0 and x ∈ C, we have x + ty ∈ C, i.e., x + (−t)(−y) ∈ C. This implies −y ∈ R_C, i.e., y ∈ −R_C.
It remains to show that R_{−C} = −R_C. We have

y ∈ R_{−C} ⟺ −x + ty ∈ −C  ∀x ∈ C, ∀t ≥ 0
⟺ x + t(−y) ∈ C  ∀x ∈ C, ∀t ≥ 0 ⟺ −y ∈ R_C ⟺ y ∈ −R_C

So, R_{−C} = −R_C.

Along with Corollary 1096, this proposition has the following interesting consequence.

Corollary 1102 We have L_P = {y ∈ R^n : Ay = 0}.

Proof By Corollary 1096, R_P = {y ∈ R^n : Ay ≤ 0} and −R_P = {y ∈ R^n : Ay ≥ 0}. By (23.5), we have L_P = {y ∈ R^n : Ay = 0}.

A cone C in R^n is pointed provided x ∈ C and −x ∈ C imply x = 0. Equivalently, C ∩ (−C) = {0}. Geometrically, a pointed cone does not contain straight lines: if, by contradiction, for some 0 ≠ x ∈ C we have λx ∈ C for all λ ∈ R, then −x ∈ C and so x = 0, a contradiction.
In view of (23.5), the recession cone R_C is pointed if and only if the lineality space L_C is trivial, i.e., L_C = {0}. Geometrically, a trivial lineality space thus corresponds to a recession cone that does not contain straight lines.

23.4.2 Recession cones of functions


Next we define the recession cone of a function as the intersection of the recession cones of its upper contour sets (f ≥ α).

Definition 1103 The recession cone R_f of a function f : C → R is defined by

R_f = ⋂_{α∈R} R_{(f≥α)}

Observe that R_f is necessarily a closed and convex cone. Moreover, it is an ordinal notion.
Proposition 1104 Given f : C → R, let g : B ⊆ R → R be a strictly increasing function with Im f ⊆ B. Then, R_f = R_{g∘f}.

Proof It is enough to observe that f and g ∘ f have the same upper contour sets.

The following result clarifies the nature of R_f for concave functions by showing that R_f = R_{(f≥α)} for all non-empty (f ≥ α).

Lemma 1105 Let f : C → R be an upper semicontinuous concave function defined on a closed convex subset C of R^n. Then,

R_f = R_{(f≥α)}

for all non-empty (f ≥ α). In particular, R_f = {y ∈ R^n : (y, 0) ∈ R_{hypo f}}.⁷


⁷ The hypograph hypo f was defined in Section 17.2.1.

Proof Fix α ∈ R such that (f ≥ α) ≠ ∅. We have y ∈ R_{(f≥α)} if and only if x + ty ∈ (f ≥ α) for all x ∈ (f ≥ α) and all t ≥ 0, i.e., if and only if f(x + ty) ≥ α for all x ∈ (f ≥ α) and all t ≥ 0. Set A = (f ≥ α) × {α} ⊆ R^n × R. Then,

(y, s) ∈ R_A ⟺ (x, α) + t(y, s) ∈ A  ∀(x, α) ∈ A, ∀t ≥ 0
⟺ (x + ty, α + ts) ∈ A  ∀(x, α) ∈ A, ∀t ≥ 0
⟺ f(x + ty) ≥ α and s = 0  ∀x ∈ (f ≥ α), ∀t ≥ 0

Hence, R_A = {(y, 0) : y ∈ R_{(f≥α)}}. On the other hand, A = hypo f ∩ (R^n × {α}). Since R_{R^n×{α}} = R^n × {0}, by Proposition 1095 we have

R_A = R_{hypo f} ∩ R_{R^n×{α}} = {(y, 0) : (y, 0) ∈ R_{hypo f}}

We conclude that

{(y, 0) : y ∈ R_{(f≥α)}} = {(y, 0) : (y, 0) ∈ R_{hypo f}}

i.e., R_{(f≥α)} = {y ∈ R^n : (y, 0) ∈ R_{hypo f}}. Since α was arbitrarily chosen, we conclude that R_{(f≥α)} = R_f.

This lemma has an interesting consequence.

Corollary 1106 For an upper semicontinuous concave function f : C → R defined on a closed convex subset C of R^n the following conditions are equivalent:

(i) R_f = {0};

(ii) some non-empty upper contour set is compact;

(iii) all upper contour sets are compact.

In view of Proposition 1016, this corollary implies that an upper semicontinuous concave function f : R^n → R has a trivial recession cone if and only if it is supercoercive. So, the condition R_f = {0} can be viewed as a general condition of supercoercivity for concave functions defined on closed convex sets of R^n.

Proof In view of the last lemma, the only non-trivial implication is that (ii) implies (iii). By hypothesis, all upper contour sets of f are closed and convex. Suppose that one of them, say (f ≥ ᾱ) for some ᾱ ∈ R, is non-empty and compact. By Theorem 1097, R_{(f≥ᾱ)} = {0}. By the last lemma, we then have R_{(f≥α)} = R_{(f≥ᾱ)} = {0} for all non-empty (f ≥ α). Again by Theorem 1097, (f ≥ α) is then compact for all α ∈ R.

Next we characterize Rf for concave functions. In particular, point (iii) shows that Rf
is the set of all directions of increase of f , while point (v) provides a remarkable asymptotic
characterization of the elements of Rf .

Proposition 1107 Let f : C → R be an upper semicontinuous concave function defined on a closed convex subset C of R^n. Then, the following conditions are equivalent:

(i) y ∈ R_f;

(ii) f(x + ty) ≥ f(x) for all t ≥ 0 and all x ∈ C;

(iii) f(x + ty) is, as a function of t, increasing on [0, ∞) for all x ∈ C;

(iv) inf_{t∈[0,∞)} f(x + ty) > −∞ for all x ∈ C;

(v) lim_{t→+∞} f(x + ty)/t ≥ 0 for all x ∈ C.

The proof relies on the following lemma, which reports yet another remarkable property
of concave functions.

Lemma 1108 For a concave function φ : [0, ∞) → R the following properties are equivalent:

(i) φ is increasing;

(ii) φ is bounded below;

(iii) lim_{x→+∞} φ(x)/x ≥ 0.

Proof (iii) implies (i). Suppose, by contradiction, that φ is not increasing. So, there exist w > y ≥ 0 with φ(w) < φ(y). Let z > w. There is λ ∈ (0, 1) such that w = λz + (1 − λ)y. Since φ is concave, we have φ(w) ≥ λφ(z) + (1 − λ)φ(y) > λφ(z) + (1 − λ)φ(w), and so φ(w) > φ(z). Since z was arbitrarily chosen, we have φ(w + h) < φ(w) for all h > 0. In turn, this implies φ′_+(w) ≤ 0. In view of Corollary 1552, from φ(y) > φ(w) > φ(z) it follows that 0 ∉ ∂φ(w). By Proposition 1518, we have ∂φ(w) = [φ′_+(w), φ′_−(w)]. Since 0 ∉ ∂φ(w) and φ′_+(w) ≤ 0, we have φ′_−(w) < 0. By the definition of superdifferential (Section 32.1), we have

φ(x) ≤ φ(w) + φ′_−(w)(x − w)    ∀x ≥ 0

We can write each scalar x > w as x = tw for some t > 1. For each t ≥ 1 we thus have φ(tw) ≤ φ(w) + φ′_−(w)(tw − w) = φ(w) + (t − 1)φ′_−(w)w, which in turn implies

lim_{x→+∞} φ(x)/x = lim_{t→+∞} φ(tw)/(tw) ≤ lim_{t→+∞} [φ(w)/(tw) + (t − 1)φ′_−(w)w/(tw)]
= lim_{t→+∞} φ(w)/(tw) + φ′_−(w) lim_{t→+∞} (t − 1)/t = φ′_−(w) < 0

because φ′_−(w) < 0. We reached a contradiction, so we conclude that φ is increasing.
(i) trivially implies (ii). (ii) implies (iii). Suppose that φ is bounded below, i.e., there exists some k ∈ R such that φ ≥ k. We have φ(t)/t ≥ k/t for all t > 0, so lim_{t→+∞} φ(t)/t ≥ lim_{t→+∞} k/t = 0.

Proof (i) implies (ii). Let x ∈ C and let t ≥ 0. We have x ∈ (f ≥ f(x)), and so y ∈ ⋂_{α∈R} R_{(f≥α)} implies x + ty ∈ (f ≥ f(x)) for all t ≥ 0, i.e., f(x + ty) ≥ f(x).
(ii) implies (iii). Let x ∈ C and let t′ > t″ ≥ 0. As f(x + ty) ≥ f(x) for all t ≥ 0, we have x + ty ∈ C for all t ≥ 0. Hence, f(x + t′y) = f((x + t″y) + (t′ − t″)y) ≥ f(x + t″y) since x + t″y ∈ C.
(iii) trivially implies (iv), because then inf_{t∈[0,∞)} f(x + ty) = f(x) > −∞.

(iv) implies (v). Define φ : [0, ∞) → R by φ(t) = f(x + ty). The function φ is concave. Indeed, for any t′, t″ ≥ 0 and λ ∈ [0, 1], we have

φ(λt′ + (1 − λ)t″) = f(x + (λt′ + (1 − λ)t″)y) = f(λ(x + t′y) + (1 − λ)(x + t″y))
≥ λf(x + t′y) + (1 − λ)f(x + t″y) = λφ(t′) + (1 − λ)φ(t″)

as desired. By (iv), φ is bounded below. By Lemma 1108, φ is then increasing. So, φ(t) ≥ φ(0) for all t ≥ 0, which implies

lim_{t→+∞} f(x + ty)/t = lim_{t→+∞} φ(t)/t ≥ lim_{t→+∞} φ(0)/t = 0

as desired.
(v) implies (i). Consider again the function φ. By Lemma 1108, φ is increasing. Hence, f(x + ty) = φ(t) ≥ φ(0) = f(x). That is, (x, f(x)) ∈ hypo f is such that (x, f(x)) + t(y, 0) ∈ hypo f for all t ≥ 0. By Lemma 1093-(iii), this implies (y, 0) ∈ R_{hypo f}, and so, by Lemma 1105, y ∈ R_f.

Though conceptually illuminating, this proposition is less useful for finding the elements of R_f. The next result shows that points (iv) and (v) still characterize the elements of R_f if they just hold for some x ∈ C, something that greatly simplifies the identification of the elements of R_f.

Proposition 1109 Let f : C → R be an upper semicontinuous and concave function defined on a closed convex subset C of R^n. Then, the following conditions are equivalent:

(i) y ∈ R_f;

(ii) there is x ∈ C such that inf_{t≥0} f(x + ty) > −∞;

(iii) there is x ∈ C such that lim_{t→+∞} f(x + ty)/t ≥ 0.

Proof (i) implies (iii) by Proposition 1107. (iii) implies (ii). Let x_0 ∈ C be such that lim_{t→+∞} f(x_0 + ty)/t ≥ 0. Again, define the concave function φ : [0, ∞) → R by φ(t) = f(x_0 + ty). By Lemma 1108, φ is bounded below, so inf_{t≥0} f(x_0 + ty) > −∞.
(ii) implies (i). Let x_0 ∈ C be such that inf_{t≥0} f(x_0 + ty) > −∞. By Lemma 1108, φ is increasing. Hence, f(x_0 + ty) = φ(t) ≥ φ(0) = f(x_0). That is, (x_0, f(x_0)) ∈ hypo f is such that (x_0, f(x_0)) + t(y, 0) ∈ hypo f for all t ≥ 0. This implies (y, 0) ∈ R_{hypo f} and so, by Lemma 1105, y ∈ R_f.

Example 1110 Let f : R^n → R be a superlinear (and so continuous) function. Set x = 0 in Proposition 1109. Then, y ∈ R_f if and only if f(y) = lim_{t→+∞} f(ty)/t ≥ 0, where the equality holds by positive homogeneity. Hence, R_f = {y ∈ R^n : f(y) ≥ 0} = (f ≥ 0). N
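For a positively homogeneous f the limit in Proposition 1109 is just f(y) itself, which the following small Python sketch makes concrete (the choice of a Leontief-type f and the test directions are ours, for illustration only).

import numpy as np

f = lambda x: np.min(x)   # superlinear: positively homogeneous and superadditive

def asymptotic_slope(f, y, t=1e6):
    # Numerical proxy for lim_{t -> +inf} f(ty)/t (exact here, by homogeneity)
    return f(t * np.asarray(y)) / t

for y in [(1.0, 2.0), (1.0, 0.0), (1.0, -0.5)]:
    in_Rf = asymptotic_slope(f, y) >= 0
    print(y, "is" if in_Rf else "is not", "in R_f")   # the first two are, the third is not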

Example 1111 Given a non-empty polyhedron P = {x ∈ R^n : Ax ≤ b}, a square negative semi-definite symmetric matrix B of order n and a vector c ∈ R^n, consider the linear-quadratic optimization problem

max_x f(x)  sub  x ∈ P

where f : R^n → R is the multivariable quadratic function defined by

f(x) = ½ x · Bx + c · x

We have R_f = {y ∈ R^n : By = 0 and c · y ≥ 0}. Indeed, by Proposition 1109 we have

lim_{t→+∞} f(ty)/t = lim_{t→+∞} [(t²/2) y · By + t c · y]/t = c · y + (y · By) lim_{t→+∞} t/2 ≥ 0 ⟺ By = 0 and c · y ≥ 0

where we used that, B being negative semi-definite, y · By = 0 if and only if By = 0.
If B is negative definite, so invertible (Proposition 1195), we then have R_f = {0} because y = 0 is the only solution of By = 0. So, f is supercoercive and continuous on P and, by Tonelli's Theorem, the set of solutions of the linear-quadratic optimization problem is then non-empty and compact. Since f is strictly concave, it is actually a singleton (which can be computed with the methods of Chapter 39). N
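As a numerical companion (Python with scipy; B, c, A, b are made-up data satisfying the assumptions), one can solve such a linear-quadratic problem with a generic constrained optimizer; with B negative definite the maximizer found is the unique one guaranteed above.

import numpy as np
from scipy.optimize import minimize, LinearConstraint

B = np.array([[-2.0, 0.0], [0.0, -1.0]])   # negative definite: f strictly concave
c = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0]])                 # P = {x : x1 + x2 <= 1}
b = np.array([1.0])

f = lambda x: 0.5 * x @ B @ x + c @ x

res = minimize(lambda x: -f(x), x0=np.zeros(2),
               jac=lambda x: -(B @ x + c),
               constraints=[LinearConstraint(A, -np.inf, b)],
               method="trust-constr")
print(res.x, f(res.x))   # unique maximizer, approximately (1/3, 2/3)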
Definition 1112 The lineality space of a function f : C → R is defined by L_f = ⋂_{α∈R} L_{(f≥α)}.

Next we show that the vector spaces L_{(f≥α)} also coincide across the non-empty upper contour sets (f ≥ α) of a concave function.

Lemma 1113 Let f : C → R be an upper semicontinuous concave function. Then,

L_f = L_{(f≥α)}

for all non-empty (f ≥ α).

Proof Define the auxiliary function g : −C → R by g(x) = f(−x) for all x ∈ −C. The function g is upper semicontinuous and concave if f is. We show that

R_{−(f≥α)} = {y ∈ R^n : (y, 0) ∈ R_{hypo g}}    (23.6)

for all α ∈ R such that (f ≥ α) ≠ ∅, i.e., such that (g ≥ α) ≠ ∅. Fix α ∈ R such that (f ≥ α) ≠ ∅. We have −(f ≥ α) = (g ≥ α). In fact, y ∈ −(f ≥ α) if and only if −y ∈ (f ≥ α), i.e., if and only if g(y) = f(−y) ≥ α. By Lemma 1105 applied to g, (23.6) holds.
We conclude that the cones R_{−(f≥α)} = −R_{(f≥α)} are all equal provided (f ≥ α) ≠ ∅. Along with Lemma 1105 and (23.5), this implies L_f = L_{(f≥α)} for all (f ≥ α) ≠ ∅.

As readers can check, Propositions 1107 and 1109 take the following form for lineality
spaces, where the real line replaces the positive half-line.

Proposition 1114 Let f : C → R be an upper semicontinuous concave function defined on a closed convex subset C of R^n. Then, the following conditions are equivalent:

(i) y ∈ L_f;

(ii) f(x + ty) = f(x) for all t ∈ R and all x ∈ C;

(iii) f(x + ty) is, as a function of t, constant on R for all x ∈ C;

(iv) inf_{t∈R} f(x + ty) > −∞ for all x ∈ C;

(v) lim_{t→±∞} f(x + ty)/t = 0 for all x ∈ C;

(vi) there is x ∈ C such that inf_{t∈R} f(x + ty) > −∞;

(vii) there is x ∈ C such that lim_{t→±∞} f(x + ty)/t = 0.

Example 1115 Let f : R^n → R be a superlinear (and so continuous) function. Set x = 0 in Proposition 1114. Then, y ∈ L_f if and only if

f(y) = lim_{t→+∞} f(ty)/t = 0  and  −f(−y) = lim_{t→−∞} f(ty)/t = lim_{t→−∞} f((−t)(−y))/t = 0

So, L_f = {y ∈ R^n : f(y) = f(−y) = 0}. To appreciate this finding, recall that f(y) ≤ −f(−y) for all y ∈ R^n (Proposition 874).
In particular, if f is linear we have L_f = {0} because in this case f(y) = −f(−y) for all y ∈ R^n and f(y) = 0 if and only if y = 0 (Section 15.1). N

23.4.3 Maxima
We can finally state and prove the main result of this section, an existence result for maximizers of concave functions that does not rely on any compactness assumption. In reading the result recall that, under the hypotheses of the theorem, the set arg max_{x∈C} f(x) is convex.

Theorem 1116 A function f : A ⊆ R^n → R which is concave and upper semicontinuous on a closed convex subset C of A admits (at least) a maximizer in C if R_C ∩ R_f = L_C ∩ L_f.

By Proposition 1095 and Lemma 1105, R_C ∩ R_f = R_C ∩ R_{(f≥α)} = R_{C∩(f≥α)}, as well as L_C ∩ L_f = L_{C∩(f≥α)}, for each α ∈ R such that (f ≥ α) ∩ C is non-empty. So, the key condition R_C ∩ R_f = L_C ∩ L_f requires that the directions of recession common to the choice set C and to an upper contour set (f ≥ α) of the objective function f be two-sided, so that one can freely move both backward and forward along them without any impediment.
Since L_C ∩ L_f ⊆ R_C ∩ R_f, the key condition is trivially satisfied when there are no such common directions, i.e., when R_C ∩ R_f = {0}. This case corresponds to coercivity, as the next result shows. It is a version of Corollary 1082 that takes advantage of the present convex setup. In particular, the fact that (ii) implies (i) shows that in this setup Theorem 1116 generalizes Tonelli's Theorem.

Corollary 1117 For a function f : A ⊆ R^n → R which is concave and upper semicontinuous on a closed convex subset C of A, the following conditions are equivalent:

(i) R_C ∩ R_f = {0};

(ii) f is coercive;

(iii) arg max_{x∈C} f(x) is non-empty and compact.



Proof (i) implies (iii). Suppose R_C ∩ R_f = {0}. Then, the condition R_C ∩ R_f ⊆ L_C ∩ L_f is trivially satisfied, and so arg max_{x∈C} f(x) ≠ ∅ by Theorem 1116. Since ∅ ≠ arg max_{x∈C} f(x) = (f ≥ max_{x∈C} f(x)) ∩ C, by Proposition 1095 we have R_{(f≥max_{x∈C} f(x))∩C} = R_{(f≥max_{x∈C} f(x))} ∩ R_C = R_f ∩ R_C = {0}. By Theorem 1097, the set (f ≥ max_{x∈C} f(x)) ∩ C is then compact, as desired.
(iii) implies (ii). Suppose that the convex set arg max_{x∈C} f(x) is non-empty and compact. Since arg max_{x∈C} f(x) = (f ≥ max_{x∈C} f(x)) ∩ C, the function f is coercive.
(ii) implies (i). By coercivity, there is t ∈ R such that (f ≥ t) ∩ C is compact and non-empty. So, by Proposition 1095 and Theorem 1097 we have R_C ∩ R_f = R_C ∩ R_{(f≥t)} = R_{C∩(f≥t)} = {0}, as desired.

The proof of Theorem 1116 relies on the next lemma, which gives a condition under which a monotone sequence of closed convex sets has non-empty intersection.

Lemma 1118 Let {C_n} be a monotone sequence of closed convex sets of R^n, with C_1 ⊇ ⋯ ⊇ C_n ⊇ ⋯. Then ⋂_n C_n ≠ ∅ provided ⋂_n R_{C_n} = ⋂_n L_{C_n}.

Proof Observe first that there exists, for every n ≥ 1, an element x_n ∈ C_n of minimal norm, i.e.,

x_n ∈ arg min_{x∈C_n} ‖x‖

Indeed, given any x̄_n ∈ C_n, the set {x ∈ C_n : ‖x‖ ≤ ‖x̄_n‖} is compact, so a minimizer exists by (a dual) Tonelli's Theorem because the norm is coercive on C_n.
Because of the monotonicity of the sequence {C_n}, the sequence {‖x_n‖} is easily seen to be increasing: ‖x_1‖ ≤ ⋯ ≤ ‖x_n‖ ≤ ⋯. It is also bounded. Suppose, by contradiction, that ‖x_n‖ ↑ ∞. The sequence x_n/‖x_n‖ belongs to the unit ball B_1(0) of R^n, which is compact. By the Bolzano-Weierstrass Theorem, there exist a subsequence {x_{n_k}}_k and a vector y ∈ ∂B_1(0) such that lim_k x_{n_k}/‖x_{n_k}‖ = y. Fix m ≥ 1. Then, x_{n_k} ∈ C_m for all k large enough. By Lemma 1093, y ∈ R_{C_m}. Since m is arbitrary, y ∈ ⋂_n R_{C_n} and so, by hypothesis, y ∈ ⋂_n L_{C_n}. Since lim_k x_{n_k}/‖x_{n_k}‖ = y, we have

lim_k ‖x_{n_k} − ‖x_{n_k}‖ y‖ / ‖x_{n_k}‖ = 0    (23.7)

Moreover, y ∈ ⋂_n L_{C_n} implies x_{n_k} + t_k y ∈ C_{n_k} for all t_k ∈ R and all k ≥ 1. Then,

‖x_{n_k} + t_k y‖ ≥ ‖x_{n_k}‖

for all k ≥ 1 since, by construction, each x_{n_k} is a minimum-norm vector of C_{n_k}. Setting t_k = −‖x_{n_k}‖, we then have

‖x_{n_k} − ‖x_{n_k}‖ y‖ / ‖x_{n_k}‖ ≥ 1    ∀k ≥ 1    (23.8)

which contradicts (23.7). We conclude that the monotone sequence {‖x_n‖}_n is bounded, so {x_n} is bounded. Then, by Corollary 324 there is a vector z ∈ R^n such that lim_k x_{n_k} = z along a subsequence. It is easy to check that z ∈ ⋂_n C_n, and so ⋂_n C_n ≠ ∅. □

Proof of Theorem 1116 Set α = sup_{x∈C} f(x) and consider an increasing sequence {α_n} ⊆ f(C) with α_n ↑ α (Lemma 1089). Set C_n = (f ≥ α_n) ∩ C for all n ≥ 1. Since f is upper semicontinuous and concave on C, each C_n is closed and convex. Then, by Proposition 1095 we have R_{C_n} = R_{(f≥α_n)∩C} = R_{(f≥α_n)} ∩ R_C = R_f ∩ R_C. Similarly, L_{C_n} = L_f ∩ L_C, and so R_{C_n} = L_{C_n} for all n ≥ 1. By Lemma 1118, ⋂_n C_n ≠ ∅. Let x* ∈ ⋂_n C_n. We have f(x*) ≥ α_n for all n ≥ 1, and so f(x*) = α = sup_{x∈C} f(x). We conclude that x* is a maximizer, as desired.

We illustrate the results of this section with some examples.

Example 1119 Consider the trivial optimization problem

max_x f(x)  sub  x ∈ R^n

where the objective function f : R^n → R is constant. Here Theorem 1116, but not Tonelli's Theorem, applies. In fact, R_f = L_f = R_{R^n} = L_{R^n} = R^n, so the condition R_C ∩ R_f = L_C ∩ L_f is trivially satisfied. Instead, the condition R_C ∩ R_f = {0} is not satisfied and, indeed, arg max_{x∈R^n} f(x) = R^n. N

Example 1120 Given a non-empty polyhedron P = {x ∈ R^n : Ax ≤ b}, consider the optimization problem

max_x f(x)  sub  x ∈ P

By Corollary 1117, the set of solutions of this problem is non-empty and compact if and only if

{x ∈ R_f : Ax ≤ 0} = {0}

that is, if and only if x = 0 is the only vector of R_f that solves the system of linear inequalities Ax ≤ 0. By Corollary 1099, this happens for every concave and upper semicontinuous f if and only if P is bounded (i.e., it is a polytope). N

Example 1121 Given a superlinear (and so continuous) function f : R^n → R and a closed convex subset C of R^n, consider the optimization problem

max_x f(x)  sub  x ∈ C    (23.9)

In view of Examples 1110 and 1115, by Theorem 1116 this problem has solutions provided

{x ∈ R_C : f(x) ≥ 0} = {x ∈ L_C : f(x) = f(−x) = 0}

which is equivalent to

f(x) ≥ 0 ⟹ −x ∈ R_C and f(−x) = 0    ∀x ∈ R_C    (23.10)

In particular, by Corollary 1117 the set of solutions is non-empty and compact if and only if

f(x) ≥ 0 ⟹ x = 0    ∀x ∈ R_C    (23.11)

For example, consider the Leontief superlinear function f : R^n → R defined by

f(x) = min_{i=1,...,n} x_i

Intuitively, arg max_{x∈C} f(x) is non-empty if the set C has no positive directions of recession, i.e., if R_C ∩ R^n_+ = {0}. Indeed, these are the directions along which f can keep growing. Condition (23.10) makes this simple insight precise. For, let x ∈ R_C be such that f(x) ≥ 0. It then follows that x ≥ 0 and so R_C ∩ R^n_+ = {0} implies x = 0. Hence, f(−x) = 0 and −x ∈ R_C, so condition (23.10) holds. Under the condition R_C ∩ R^n_+ = {0}, condition (23.11) also holds, so we conclude that arg max_{x∈C} f(x) is non-empty and compact. N

Example 1122 If in the last example the objective function f is linear, by Example 1115 we have L_f = {0}. So, by Corollary 1117, in this case the set of solutions is non-empty and compact if and only if R_C ∩ R_f = {0}. This is the case when {x ∈ R_C : f(x) ≥ 0} = {0}, that is,

f(x) ≥ 0 ⟹ x = 0    ∀x ∈ R_C

For instance, given a vector c ∈ R^n and a non-empty polyhedron P = {x ∈ R^n : Ax ≤ b}, consider the linear programming problem

max_x f(x)  sub  x ∈ P

with f(x) = c · x. This problem has a non-empty and compact set of solutions if and only if

R_P ∩ R_f = {x ∈ R^n : Ax ≤ 0} ∩ {x ∈ R^n : c · x ≥ 0} = {0}

that is, if and only if x = 0 is the unique solution of the following system of linear inequalities:

−c_1 x_1 − c_2 x_2 − ⋯ − c_n x_n ≤ 0
a_11 x_1 + a_12 x_2 + ⋯ + a_1n x_n ≤ 0
a_21 x_1 + a_22 x_2 + ⋯ + a_2n x_n ≤ 0
⋮
a_m1 x_1 + a_m2 x_2 + ⋯ + a_mn x_n ≤ 0

This is the case, for instance, if P is bounded, because then R_P = {0} by Theorem 1097 (cf. the Fundamental Theorem of Linear Programming). N
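A computational rendition of this test (Python with scipy; the data are illustrative): stack −c on top of A as an extra inequality row and check, with the same box device used after Corollary 1099, whether the homogeneous system admits a non-zero solution.

import numpy as np
from scipy.optimize import linprog

def lp_has_nonempty_compact_solutions(A, c):
    # Tests {x : Ax <= 0 and c.x >= 0} = {0}, the condition of Example 1122
    G = np.vstack([-np.atleast_2d(np.asarray(c, dtype=float)), A])
    n = G.shape[1]
    for i in range(n):
        for sign in (1.0, -1.0):
            obj = np.zeros(n)
            obj[i] = -sign
            res = linprog(obj, A_ub=G, b_ub=np.zeros(G.shape[0]),
                          bounds=[(-1, 1)] * n, method="highs")
            if res.status == 0 and -res.fun > 1e-9:
                return False
    return True

# max x1 + x2 on the unit square: compact solution set (a vertex)
A_square = np.array([[1.0, 0], [-1, 0], [0, 1], [0, -1]])
print(lp_has_nonempty_compact_solutions(A_square, c=[1.0, 1.0]))              # True

# max x1 on {x : x1 <= 0}: the solution set is a whole line, not compact
print(lp_has_nonempty_compact_solutions(np.array([[1.0, 0]]), c=[1.0, 0.0]))  # False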

In the rest of the book we will never invoke Theorem 1116. However, when we use Tonelli's Theorem for an optimization problem with a concave objective function, the coda reader should wonder about the use of the more general Theorem 1116. Finally, we leave to readers an ordinal version of Theorem 1116.
Chapter 24

Projections and approximations

24.1 Projection Theorem


In this chapter we address a simple general problem, with far-reaching implications: given a point x ∈ R^n and a vector subspace V of R^n, we would like to identify, if it exists, the point m of the vector subspace V which is "closest" to x. Formally, m is the point of V that minimizes ‖x − y‖ as y varies in V. Graphically:

[Figure: the point x, its projection m on the subspace V, and the distance ‖x − m‖ between them.]

Clearly, the problem is trivial if x belongs to V: just set m = x. Things become interesting when x is not in V. In this regard, note that we can paraphrase the problem by saying that it consists in finding in V the best approximation of a given x ∈ R^n: the vector subspace V thus represents the space of "admissible approximations" and x − m is interpreted as an "approximation error" because it represents the error made by approximating x with m.
The problem described above is an optimization problem that consists in minimizing ‖x − y‖ under the constraint y ∈ V, that is,

min_y ‖x − y‖  sub  y ∈ V    (24.1)

The relevant questions about this problem are:


(i) Does a solution m exist?

(ii) If it exists, is it unique?

(iii) How can it be characterized?

The following theorem addresses all these questions. It relies on the notions of orthogonality we studied earlier in the book (Chapter 4). In particular, recall that two vectors x, y ∈ R^n are orthogonal, written x ⊥ y, when their inner product is null. When x is orthogonal to all vectors in a subset S of R^n, we write x ⊥ S.

Theorem 1123 (Projection Theorem) Let V be a vector subspace of R^n. For every x ∈ R^n, the optimization problem (24.1) has a unique solution, given by the vector m ∈ V whose error x − m is orthogonal to V, that is, (x − m) ⊥ V.

Note that the uniqueness of m implies that ‖x − m‖ < ‖x − y‖ for each y ∈ V different from m.

This remarkable result ensures the existence and uniqueness of the solution, thus answering the first two questions, and characterizes it as the vector in V which makes the approximation error orthogonal to V itself. Orthogonality of the error is a key property of the solution that has a number of consequences in applications. Furthermore, Theorem 1128 will show how orthogonality allows for identifying the solution in closed form in terms of a basis of V, thus fully answering also the last question.

To prove the theorem, given x ∈ R^n consider the function f : R^n → R defined by f(y) = −‖x − y‖². Since minimizing ‖x − y‖ amounts to maximizing −‖x − y‖², problem (24.1) can be rewritten as

max_y f(y)  sub  y ∈ V    (24.2)

Thanks to the following lemma, one can apply Tonelli's Theorem and Theorem 1032 to this optimization problem.

Lemma 1124 The function f is strictly concave and coercive on V.

Proof The proof is analogous to that of Lemma 1058 and is thus left to the reader (note that, by Proposition 861, V is a closed and convex subset of R^n).

Proof of the Projection Theorem In light of the previous lemma, problem (24.2), and so problem (24.1), has a solution by Tonelli's Theorem because f is coercive on V, and such a solution is unique by Theorem 1032 because f is strictly concave.
It remains to show that, if m minimizes ‖x − y‖, then (x − m) ⊥ V. Suppose, by contradiction, that there is ỹ ∈ V which is not orthogonal to x − m. Without loss of generality, suppose that ‖ỹ‖ = 1 (otherwise, it would suffice to take ỹ/‖ỹ‖, which has norm 1) and that (x − m) · ỹ = α ≠ 0. Denote by y′ the element of V given by y′ = m + αỹ. We have

‖x − y′‖² = ‖x − m − αỹ‖² = ‖x − m‖² − 2α(x − m) · ỹ + α² = ‖x − m‖² − α² < ‖x − m‖²

thus contradicting the assumption that m minimizes ‖x − y‖, as the element y′ would make ‖x − y‖ even smaller. The contradiction proves the desired result.

Denote by V^⊥ = {x ∈ R^n : x ⊥ V} the set of vectors that are orthogonal to V. The reader can easily check that this set is a vector subspace of R^n. It is called the orthogonal complement of V.

Example 1125 Let V = span{y_1, ..., y_k} be the vector subspace generated by the vectors {y_i}_{i=1}^k and let Y ∈ M(k, n) be the matrix whose rows are these vectors. Given x ∈ R^n, we have x ⊥ V if and only if Yx = 0. Therefore, V^⊥ consists of all the solutions of this homogeneous linear system. N

The Projection Theorem has the following important corollary.

Corollary 1126 Let V be a vector subspace of R^n. Each vector x ∈ R^n can be uniquely decomposed as

x = y + z    (24.3)

with y ∈ V and z ∈ V^⊥.

Proof It suffices to set y = m and z = x − m.

In words, any vector can be uniquely represented as the sum of a vector in V and one in its orthogonal complement V^⊥, and this can be done for any vector subspace V of R^n. The uniqueness of such a decomposition is remarkable as it entails that the vectors y and z are uniquely determined. For this reason we say that R^n is a direct sum of the subspaces V and V^⊥, in symbols R^n = V ⊕ V^⊥, as we will see momentarily in Section 24.5. In many applications it is important to be able to regard R^n as a direct sum of one of its subspaces and its orthogonal complement.

24.2 Projections
Given a vector subspace V of R^n, the solution of the minimization problem (24.1) is called the projection of x onto V. In this way one can define an operator P_V : R^n → R^n, called the projection, that associates to each x ∈ R^n its projection P_V(x).

Proposition 1127 The projection is a linear operator.

Proof Take x, y ∈ R^n and α, β ∈ R. Our aim is to show that P_V(αx + βy) = αP_V(x) + βP_V(y). For every z ∈ V, we have

(αP_V(x) + βP_V(y) − (αx + βy)) · z = (α(P_V(x) − x) + β(P_V(y) − y)) · z
= α(P_V(x) − x) · z + β(P_V(y) − y) · z = 0

Therefore,

(αP_V(x) + βP_V(y) − (αx + βy)) ⊥ V

and, by the Projection Theorem and the uniqueness of decomposition (24.3), αP_V(x) + βP_V(y) is the projection of αx + βy onto V, that is, P_V(αx + βy) = αP_V(x) + βP_V(y).

Being linear, projections have a matrix representation. To find it, consider a set {y_i}_{i=1}^k of vectors that generate the subspace V, that is, V = span{y_1, ..., y_k}. Given x ∈ R^n, by the Projection Theorem we have (x − P_V(x)) ⊥ V, so

(x − P_V(x)) · y_i = 0    ∀i = 1, ..., k

which are called the normal equations of the projection. Since P_V(x) ∈ V, we can write it as a linear combination P_V(x) = Σ_{j=1}^k α_j y_j. The normal equations then become

(x − Σ_{j=1}^k α_j y_j) · y_i = 0    ∀i = 1, ..., k

that is,

Σ_{j=1}^k α_j (y_j · y_i) = x · y_i    ∀i = 1, ..., k

We thus end up with the system

α_1 (y_1 · y_1) + α_2 (y_2 · y_1) + ⋯ + α_k (y_k · y_1) = x · y_1
α_1 (y_1 · y_2) + α_2 (y_2 · y_2) + ⋯ + α_k (y_k · y_2) = x · y_2
⋮
α_1 (y_1 · y_k) + α_2 (y_2 · y_k) + ⋯ + α_k (y_k · y_k) = x · y_k

Let Y ∈ M(n, k) be the matrix that has the generating vectors {y_i}_{i=1}^k as columns. We can rewrite the system in matrix form as

(Y^T Y) α = Y^T x    (24.4)

where Y^T Y is k × k, α is k × 1, Y^T is k × n and x is n × 1. We thus end up with the square Gram matrix Y^T Y, which has rank equal to that of Y by Proposition 692, that is, ρ(Y^T Y) = ρ(Y).
If the vectors {y_i}_{i=1}^k are linearly independent, the matrix Y has full rank k and so the Gram matrix is invertible. By multiplying both sides of system (24.4) by the inverse (Y^T Y)^{-1} of the Gram matrix, we then have

α = (Y^T Y)^{-1} Y^T x

So, the projection is given by

P_V(x) = Σ_{j=1}^k α_j y_j = Yα = Y (Y^T Y)^{-1} Y^T x    ∀x ∈ R^n
We have thus proven the important:

Theorem 1128 Let V be a vector subspace of R^n generated by the linearly independent vectors {y_i}_{i=1}^k.¹ The projection P_V : R^n → R^n onto V is given by

P_V(x) = Y (Y^T Y)^{-1} Y^T x    ∀x ∈ R^n    (24.5)

where Y ∈ M(n, k) is the matrix that has these vectors as columns.

¹ The assumption that V is generated by the linearly independent vectors {y_i}_{i=1}^k is equivalent to requiring that these vectors be a basis for V. The theorem can be equivalently formulated as: let {y_i}_{i=1}^k be a basis of a vector subspace V of R^n.

In conclusion, the matrix Y (Y^T Y)^{-1} Y^T represents the linear operator P_V.

Example 1129 Consider the vector subspace V = {αy : α ∈ R} of the plane generated by a single non-zero vector y ∈ R^2. By (24.5), we have

P_V(x) = (y · x / y · y) y    ∀x ∈ R^2

Geometrically, V is the straight line determined by y (cf. Example 87). So, P_V(x) is the point αy of this line that is closest to x. Its coefficient α is the ratio

α = y · x / y · y = Σ_{i=1}^n x_i y_i / Σ_{i=1}^n y_i²    (24.6)

This can also be checked directly because the optimization problem (24.1) here takes the form

min_α Σ_{i=1}^n (x_i − α y_i)²  sub  α ∈ R

and the value of α that solves this problem is easily checked to be (24.6). For instance, let y = (2, 3) ∈ R^2, so that V = {(2α, 3α) : α ∈ R}. We have

P_V(x) = ((2, 3) · (x_1, x_2) / (2, 3) · (2, 3)) (2, 3) = ((2x_1 + 3x_2)/13) (2, 3) = ((4x_1 + 6x_2)/13, (6x_1 + 9x_2)/13)

for all x ∈ R^2. N
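Formula (24.5) is directly computable; a minimal numpy sketch (the vectors are those of the example, the test point is our own) also verifies the orthogonality of the error asserted by the Projection Theorem.

import numpy as np

def projection_matrix(Y):
    # P = Y (Y^T Y)^{-1} Y^T, formula (24.5); the columns of Y are a basis of V
    return Y @ np.linalg.inv(Y.T @ Y) @ Y.T

y = np.array([[2.0], [3.0]])           # V = span{(2, 3)}, as in Example 1129
P = projection_matrix(y)
x = np.array([1.0, 1.0])
print(P @ x)                            # [10/13, 15/13], matching (24.6)
print((x - P @ x) @ y.flatten())        # ~ 0: the error is orthogonal to V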

24.3 The ultimate Riesz


Projections make possible an important refinement of Theorem 762, the version of Riesz's Theorem for vector subspaces. Given a linear function f : V → R, let Π be the set of vectors π ∈ R^n for which (15.50) holds, that is, the vectors such that

f(x) = π · x    ∀x ∈ V

By Theorem 762, this set is non-empty. Remarkably, the projections onto V of its elements all coincide:

Lemma 1130 P_V(π) = P_V(π′) for each π, π′ ∈ Π.

Proof Take π ∈ Π. By (24.3) it holds that π = P_V(π) + y with y ∈ V^⊥, so that

f(x) = π · x = (P_V(π) + y) · x = P_V(π) · x + y · x = P_V(π) · x    ∀x ∈ V

If π′ ∈ Π we have

f(x) = P_V(π′) · x = P_V(π) · x    ∀x ∈ V

and so (P_V(π′) − P_V(π)) · x = 0 for every x ∈ V. It follows that P_V(π′) − P_V(π) ∈ V^⊥, that is, P_V(π′) − P_V(π) ∈ V ∩ V^⊥ since by definition P_V(π′) − P_V(π) ∈ V. However, V ∩ V^⊥ = {0} and so P_V(π′) − P_V(π) = 0, that is, P_V(π′) = P_V(π).

In light of this lemma, let us denote the common projection by π*, that is, π* = P_V(π) with π ∈ Π. By the decomposition (24.3), every π ∈ Π can be uniquely written as π = π* + ε, where ε ∈ V^⊥, so that the vectors ε and π* are orthogonal. In other words, Π = {π* + ε : ε ∈ V^⊥}. Since

f(x) = π · x = (π* + ε) · x = π* · x + ε · x = π* · x    ∀x ∈ V

the projection π* is the only vector in V that represents f. We have thus proven the following version of Riesz's Theorem for vector subspaces.

Theorem 1131 (Riesz) Let V be a vector subspace of R^n. A function f : V → R is linear if and only if there is a unique vector π ∈ V such that

f(x) = π · x    ∀x ∈ V

In what follows, when mentioning Riesz's Theorem we will refer to this general version
of the result.

Example 1132 In Example 761 we have π* = (1, 1, 0) ∈ V. N

Projections have made it possible to address the multiplicity of the vectors that afflicted Theorem 762, which in turn resulted from the multiplicity of the extensions f̄ : R^n → R, provided by the Hahn-Banach Theorem (Section 15.11), of a function f defined on a subspace of R^n.
In particular, if f̄ : R^n → R is a linear function and π̄ is the unique vector of R^n such that f̄(x) = π̄ · x for every x ∈ R^n, then for its restriction f̄|_V to a vector subspace V the vector π* = P_V(π̄) is the only vector in V such that f̄(x) = π* · x for every x ∈ V. By (24.5), we then have the following remarkable formula

π* = Y (Y^T Y)^{-1} Y^T π̄

24.4 Least squares and projections


The idea of approximation that underlies both least squares (Section 22.10) and projections suggests the existence of a connection between the two notions. Let us make this intuition more precise.

Least squares The least squares solution x* ∈ R^n solves the minimization problem

min_x ‖Ax − b‖²  sub  x ∈ R^n    (24.7)

At the same time, since the image Im F of the linear operator F(x) = Ax is a vector subspace of R^m, the projection P_{Im F}(b) of the vector b ∈ R^m solves the optimization problem

min_y ‖y − b‖²  sub  y ∈ Im F

that is,

‖P_{Im F}(b) − b‖ ≤ ‖y − b‖    ∀y ∈ Im F

Therefore, a vector x* ∈ R^n is a least squares solution if and only if

Ax* = P_{Im F}(b)    (24.8)

that is, if and only if its image Ax* is the projection of b onto the vector subspace Im F generated by the columns of A. The image Ax* is often denoted by y*. With this notation, (24.8) can be rewritten as y* = P_{Im F}(b).

Errors Equality (24.8) shows the tight relationship between projections and least squares. In particular, by the Projection Theorem the error Ax* − b is orthogonal to the vector subspace Im F:

(Ax* − b) ⊥ Im F

or, equivalently, (y* − b) ⊥ Im F.
The vector subspace Im F is generated by the columns of A, which are therefore orthogonal to the approximation error. For example, in the statistical interpretation of least squares from Section 22.10.2, the matrix A is denoted by X and has the form (22.58); each column X_i of X displays the data on the i-th regressor in every period. If we identify each such column with the regressor whose data it portrays, we can see Im F as the vector subspace of R^m generated by the regressors. The least squares method is equivalent to considering the projection of the output vector Y on the subspace generated by the regressors X_1, ..., X_n. In particular, the regressors are orthogonal to the approximation error:

(Xβ* − Y) ⊥ X_i    ∀i = 1, ..., n

By setting Y* = Xβ*, one equivalently has that (Y* − Y) ⊥ X_i for every i = 1, ..., n, a classic property of least squares that we already mentioned.

Solution's formula Assume that ρ(A) = n, so the matrix A has full rank and the linear operator F is injective (Corollary 689). In this case, we have

x* = F^{-1}(P_{Im F}(b))    (24.9)

so that the least squares solution can be determined via the projection. Equality (24.9) is even more significant if we can express it in matrix form. In doing so, note that the linearly independent (since ρ(A) = n) columns of A generate the subspace Im F, thus taking the role of the matrix Y from Section 24.2. By Theorem 1128, we have

Ax* = P_{Im F}(b) = A (A^T A)^{-1} A^T b

By multiplying both sides by the matrix A^T we get

(A^T A) x* = A^T A (A^T A)^{-1} A^T b = A^T b

Finally, by Proposition 692 we have ρ(A) = ρ(A^T A) = n, so the Gram matrix A^T A of order n is invertible. By multiplying both sides by its inverse (A^T A)^{-1}, we have the following remarkable matrix formula for the least squares solution:

x* = (A^T A)^{-1} A^T b

This is the matrix representation of (24.9) that is made possible by the matrix representation
of projections established in Theorem 1128. Cramer's Theorem is the special case when A is
an invertible square matrix of order n. Indeed, in this case also the transpose AT is invertible
(Proposition 717), so by Proposition 704 we have
1 1
x = AT A AT b = A 1
AT AT b = A 1
b

We have thus found the least squares solution when the matrix A has full rank. Using
the statistical notation, we end up with the well-known least squares formula
1
= X TX X TY

24.5 Direct sums


The sum

V_1 + V_2 = {x_1 + x_2 : x_1 ∈ V_1, x_2 ∈ V_2}

of two vector subspaces of R^n (cf. Section 21.4) is easily seen to be itself a vector subspace of R^n.

Lemma 1133 Each vector x ∈ V_1 + V_2 can be uniquely written as

x = x_1 + x_2

with x_1 ∈ V_1 and x_2 ∈ V_2 if and only if V_1 ∩ V_2 = {0}.

Proof "If". Suppose that V_1 ∩ V_2 = {0}. Let x ∈ V_1 + V_2. Suppose that x_1, x′_1 ∈ V_1 and x_2, x′_2 ∈ V_2 are such that x = x_1 + x_2 = x′_1 + x′_2. Then, x_1 − x′_1 = x′_2 − x_2 and so x_1 − x′_1, x′_2 − x_2 ∈ V_1 ∩ V_2. We conclude that x_1 = x′_1 and x_2 = x′_2, as desired.
"Only if". Suppose that each x ∈ V_1 + V_2 can be uniquely written as x = x_1 + x_2. Let 0 ≠ x ∈ V_1 ∩ V_2. Since x ∈ V_1 + V_2, there exist x_1 ∈ V_1 and x_2 ∈ V_2 such that x = x_1 + x_2. Then, it also holds that x = (x_1 + x) + (x_2 − x) with x_1 + x ∈ V_1 and x_2 − x ∈ V_2, a contradiction.

When V_1 ∩ V_2 = {0}, we say that V_1 + V_2 is a direct sum of the two vector subspaces and denote it by

V_1 ⊕ V_2

A basic illustration is the plane as the direct sum of its Cartesian axes.
In view of the last lemma, when x belongs to a direct sum V_1 ⊕ V_2, it can be uniquely written as a sum x = x_1 + x_2 with x_1 ∈ V_1 and x_2 ∈ V_2. In particular, this uniqueness permits to define the projections P_{V_1} : V_1 ⊕ V_2 → V_1 and P_{V_2} : V_1 ⊕ V_2 → V_2 by

P_{V_1}(x) = x_1  and  P_{V_2}(x) = x_2

We can thus write x = P_{V_1}(x) + P_{V_2}(x) for all x ∈ V_1 ⊕ V_2.

Example 1134 Let V_23 = {x ∈ R^3 : x_2 = x_3 = 0} and V_13 = {x ∈ R^3 : x_1 = x_3 = 0} be two axes of R^3. Clearly, V_23 ∩ V_13 = {0}. It holds that

V_23 ⊕ V_13 = V_3

that is, the direct sum of these two axes is the horizontal plane V_3 = {x ∈ R^3 : x_3 = 0} of R^3. In particular, the projections P_{V_23} : V_3 → V_23 and P_{V_13} : V_3 → V_13 are given by

P_{V_23}(x) = (x_1, 0, 0)  and  P_{V_13}(x) = (0, x_2, 0)

for all x = (x_1, x_2, 0) ∈ V_3. Clearly, x = P_{V_23}(x) + P_{V_13}(x) for all x ∈ V_3. N

The next simple lemma highlights an important property of these projections.

Lemma 1135 Projections are linear operators.

Proof Let x, x′ ∈ V_1 ⊕ V_2 and α, β ∈ R. Then,

αx + βx′ = α(P_{V_1}(x) + P_{V_2}(x)) + β(P_{V_1}(x′) + P_{V_2}(x′))
= (αP_{V_1}(x) + βP_{V_1}(x′)) + (αP_{V_2}(x) + βP_{V_2}(x′))

Since αP_{V_1}(x) + βP_{V_1}(x′) ∈ V_1 and αP_{V_2}(x) + βP_{V_2}(x′) ∈ V_2, we thus have

P_{V_1}(αx + βx′) = αP_{V_1}(x) + βP_{V_1}(x′)  and  P_{V_2}(αx + βx′) = αP_{V_2}(x) + βP_{V_2}(x′)

as desired.

By the Projection Theorem, we have

R^n = V ⊕ V^⊥

for every vector subspace V of R^n. In this case, the direct sum of the two vector subspaces V and V^⊥ is the entire space R^n. This orthogonal form is the only possible form that such a direct sum can take, as we show next.

Proposition 1136 If the space R^n is the direct sum of two vector subspaces V_1 and V_2, then V_1 = V_2^⊥ and V_2 = V_1^⊥.

Proof Suppose that R^n = V_1 ⊕ V_2. By the Projection Theorem, we have R^n = V_1 ⊕ V_1^⊥. In turn, this easily implies that V_2 = V_1^⊥. A similar argument proves that V_1 = V_2^⊥.

Thus, R^n can be the direct sum only of a vector subspace and its orthogonal complement. In this case, we are back to the projection P_V of Section 24.2 and to its twin P_{V^⊥}.

24.6 A finance illustration

We consider a two-period frictionless financial market. At date 0 (today) investors trade n primary assets, in any quantity and without any kind of impediment (transaction costs, short-sales constraints, etc.), that pay out at date 1 (tomorrow), contingent on which state s ∈ S = {s_1, ..., s_k} obtains tomorrow. States are mutually exclusive (only one of them obtains) and provide an exhaustive description of uncertainty (at least one of them obtains).
Let L = {y_1, ..., y_n} ⊆ R^k be the collection of primary assets and p = (p_1, p_2, ..., p_n) ∈ R^n the vector of their market prices (per unit of asset). The pair (L, p) describes the financial market.

24.6.1 Portfolios and contingent claims


A primary asset j = 1, ..., n is a state-contingent payoff traded in the market, denoted by

y_j = (y_{1j}, ..., y_{kj}) ∈ R^k

where y_{ij} represents its payoff if state s_i obtains. Portfolios of primary assets can be formed in the market, each identified by a vector of weights x = (x_1, ..., x_n) ∈ R^n, where x_j is the traded quantity of primary asset y_j. If x_j ≥ 0 (resp., x_j ≤ 0) the portfolio is long (resp., short) on asset y_j, that is, it buys (resp., sells) x_j units of the asset (cf. Example 917).
In particular, the primary asset y_1 is identified by the portfolio e^1 = (1, 0, ..., 0) ∈ R^n, the primary asset y_2 by e^2 = (0, 1, 0, ..., 0) ∈ R^n, and so on.
The linear combination

Σ_{j=1}^n x_j y_j ∈ R^k

is the state-contingent payoff that, tomorrow, portfolio x ensures.

Example 1137 Suppose the payments of the primary assets depend on the state of the economy (e.g., dividends if the assets are shares), which can be of three types:

s_1 = "recession",  s_2 = "stasis",  s_3 = "growth"

Each primary asset y_j can be described as a vector

y_j = (y_{1j}, y_{2j}, y_{3j}) ∈ R^3

in which y_{ij} is the payment of the asset in case state s_i obtains, for i = 1, 2, 3. Suppose there exist only four assets on the market, with L = {y_1, y_2, y_3, y_4}. Let x_j be the quantity of asset y_j held, so that the vector of coefficients x = (x_1, x_2, x_3, x_4) ∈ R^4 represents a portfolio formed with these assets. The quantities x_j can be both positive and negative. In the first case we are long in the asset and we are paid y_{ij} in case state s_i obtains; when x_j is negative we are instead short on the asset and we have to pay y_{ij} when s_i obtains. The payment of a portfolio x ∈ R^4 in the different states is, therefore, given by the linear combination x_1 y_1 + x_2 y_2 + x_3 y_3 + x_4 y_4 ∈ R^3. For instance, suppose

y_1 = (−1, 0, 2), y_2 = (−3, 0, 3), y_3 = (0, 2, 4), y_4 = (−2, 0, 2)    (24.10)

Then, the portfolio x = (1, 2, 1, 2) has payoff y_1 + 2y_2 + y_3 + 2y_4 = (−11, 2, 16) ∈ R^3. N
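The payoff computation is a matrix-vector product with the payoff matrix Y introduced in the next subsection; a two-line check in Python (numpy) on the data (24.10):

import numpy as np

# Columns are the primary assets of (24.10); rows are the states s1, s2, s3
Y = np.array([[-1, -3, 0, -2],
              [ 0,  0, 2,  0],
              [ 2,  3, 4,  2]], dtype=float)
x = np.array([1.0, 2.0, 1.0, 2.0])   # the portfolio of the example
print(Y @ x)                          # [-11.  2. 16.]: the payoff in each state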

We call contingent claim any state-contingent payoff w ∈ R^k. A claim w is replicable (in the market) if there exists a portfolio x such that w = Σ_{j=1}^n x_j y_j. In words, replicable contingent claims are the state-contingent payoffs that, tomorrow, can be attained by trading, today, the primary assets. The market W is the vector subspace of R^k consisting of all replicable contingent claims, that is,

W = span L

The market is complete if W = R^k: if so, all contingent claims are replicable. Otherwise, the market is incomplete. In view of Example 90, completeness of the market amounts to the replicability of the k Arrow (or pure) contingent claims e^i ∈ R^k that pay out one euro if state s_i obtains and zero otherwise. These important claims uniquely identify states.

Example 1138 In the previous example the market generated by the four primary assets (24.10) is easily seen to be complete. On the other hand, suppose that only the first two assets are available, that is, L = {y_1, y_2}. Then, W = span L = {(x, 0, y) : x, y ∈ R}, so the market is now incomplete. Indeed, it is not possible to replicate contingent claims that feature non-zero payments when state s_2 obtains. N
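Completeness is a rank condition on the payoff matrix (rank Y = k), and a replicating portfolio solves the linear system Yx = w; a sketch in Python (numpy), again on the data (24.10):

import numpy as np

Y = np.array([[-1, -3, 0, -2],
              [ 0,  0, 2,  0],
              [ 2,  3, 4,  2]], dtype=float)
print(np.linalg.matrix_rank(Y) == Y.shape[0])   # True: the market is complete

w = np.array([-11.0, 2.0, 16.0])                # a claim to replicate
x, *_ = np.linalg.lstsq(Y, w, rcond=None)       # one replicating portfolio
print(np.allclose(Y @ x, w))                    # True: w is replicable

print(np.linalg.matrix_rank(Y[:, :2]))          # 2 < 3: the first two assets alone are incomplete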

24.6.2 Market value


The payoff operator R : R^n → R^k given by

R(x) = Σ_{j=1}^n x_j y_j

is the linear operator that describes the contingent claim determined by portfolio x. In other words, R_i(x) is the payoff of portfolio x if state s_i obtains. Clearly, W = Im R and so the rank ρ(R) of the linear operator R : R^n → R^k is the dimension of the market W.
To derive the matrix representation of the payoff operator R, consider the payoff matrix

Y = (y_{ij}) =
[ y_11  y_12  ⋯  y_1n ]
[ y_21  y_22  ⋯  y_2n ]
[  ⋮     ⋮        ⋮   ]
[ y_k1  y_k2  ⋯  y_kn ]

It has k rows (states) and n columns (assets). The entry y_{ij} represents the payoff of primary asset y_j in state s_i. In words, Y is the matrix rendering of the collection L of primary assets.
It is easy to see that the payoff operator R : R^n → R^k can be represented as

R(x) = Yx

The payoff matrix Y is thus the matrix associated with the operator R. Its rank is then the dimension of the market W (see Section 15.4.2).
In a frictionless market, the (market) value

v(x) = p · x = Σ_{j=1}^n p_j x_j

of a portfolio x is its cost, today, caused by the market operations it requires.² The (market) value function v : R^n → R is the linear function that assigns to each portfolio x its value v(x). In particular, the value of a primary asset is its price. For, recalling that the primary asset y_j is identified by the portfolio e^j, we have

v(e^j) = p · e^j = p_j    (24.11)

Note that it is the frictionless nature of the market that ensures the linearity of the value function. For instance, if there are transaction costs and the price of asset y_j thus depends on the traded quantity (e.g., v(2e^j) < 2p_j), then the value function is no longer linear.

² Since there are no restrictions to trade, and so it is possible to go long or short on assets, to be precise v(x) is actually a cost if positive, but a benefit if negative.

24.6.3 Law of one price


The law of one price is a fundamental property of a financial market.

Definition 1139 The financial market (L, p) satisfies the law of one price (LOP) if, for all portfolios x, x′ ∈ R^n,

R(x) = R(x′) ⟹ v(x) = v(x′)    (24.12)

In words, portfolios that induce the same contingent claim must share the same market value. Indeed, the contingent claims that they determine are all that matters about portfolios, which are just instruments to achieve them. If two portfolios inducing the same contingent claim had different market values, a (sure) saving opportunity would be missed in the market. The LOP requires that the financial market take advantage of any such opportunity.
Since W = Im R, we have R(x) = R(x′) if and only if x, x′ ∈ R^{-1}(w) for some w ∈ W. The LOP can then be equivalently stated as follows: given any replicable claim w ∈ W,

x, x′ ∈ R^{-1}(w) ⟹ v(x) = v(x′)    (24.13)

All portfolios x that replicate a contingent claim w thus share the same value v (x). It is
then natural to regard such common value as the price of the claim.

Definition 1140 The price p_w of a replicable contingent claim w ∈ W is the value of a replicating portfolio x ∈ R^{-1}(w), that is, p_w = v(x) where w = R(x).

In words, p_w is the market cost v(x) incurred today to form a portfolio x that tomorrow will ensure the contingent claim w, that is, w = R(x). By the form (24.13) of the LOP, the definition is well posed: it is immaterial which specific replicating portfolio x is considered to determine the price p_w. The LOP thus permits to price all replicable claims.
For primary assets we get back to (24.11), that is, p_j = v(e^j). In general, we have

p_w = v(x) = Σ_{j=1}^n p_j x_j    ∀x ∈ R^{-1}(w)

The price of a contingent claim in the market is thus the linear combination of the prices of the primary assets held in any replicating portfolio, weighted according to the assets' weights in that portfolio.

Example 1141 (i) The portfolio x = (c, ..., c), consisting of c units of each primary asset, replicates the contingent claim w = R(x) = c Σ_{j=1}^n y_j. We have p_w = c Σ_{j=1}^n p_j. (ii) The portfolio x = (p_1, ..., p_n), in which the holding of each primary asset is proportional to its market price, replicates the contingent claim w = R(x) = Σ_{j=1}^n p_j y_j. We have p_w = Σ_{j=1}^n p_j². N

In sum, the LOP makes it possible to establish a first pricing formula

p_w = Σ_{j=1}^n p_j x_j    ∀x ∈ R^{-1}(w)    (24.14)

which permits to price all contingent claims in the market, starting from the market prices of the primary assets.

24.6.4 Pricing rules


In a market that satisfies the LOP, the previous definition permits to define the pricing rule f : W → R as the function that associates to each replicable contingent claim w ∈ W its price p_w, that is,

f(w) = p_w

The next result is a fundamental consequence of the LOP.

Theorem 1142 Suppose the financial market (L, p) satisfies the LOP. Then, the pricing rule f : W → R is linear.

Proof First observe that, by the LOP, v = f ∘ R, that is, v(x) = f(R(x)) for each x ∈ R^n. Let us prove the linearity of f. Let w, w′ ∈ W and α, β ∈ R. We want to show that f(αw + βw′) = αf(w) + βf(w′). Since W = Im R, there exist vectors x, x′ ∈ R^n such that R(x) = w and R(x′) = w′. By Definition 1140, p_w = v(x) and p_{w′} = v(x′). By the linearity of R and v, we then have

f(αw + βw′) = f(αR(x) + βR(x′)) = f(R(αx + βx′)) = v(αx + βx′)
= αv(x) + βv(x′) = αp_w + βp_{w′} = αf(w) + βf(w′)

The function f : W → R is thus linear on W.

The fact that the linearity of the pricing rule characterizes the (frictionless) financial markets in which the LOP holds is a remarkable result, upon which modern asset pricing theory relies. It permits to price all contingent claims in the market in terms of other contingent claims, thus generalizing formula (24.14). For, suppose a contingent claim w can be written as a linear combination of some replicable contingent claims, that is, w = Σ_{j=1}^m α_j w_j. Then w is replicable, with

p_w = f(w) = f(Σ_{j=1}^m α_j w_j) = Σ_{j=1}^m α_j f(w_j) = Σ_{j=1}^m α_j p_{w_j}    (24.15)

Formula (24.14) is the special case where the contingent claims w_j are the primary assets and their weights are the portfolio ones. In general, it may be easier (e.g., more natural from a financial standpoint) to express a contingent claim in terms of other contingent claims rather than in terms of primary assets. The pricing formula

p_w = Σ_{j=1}^m α_j p_{w_j}    (24.16)

permits to price contingent claims when they are expressed in terms of other contingent claims.

Inspection of the proof of Theorem 1142 shows that the pricing rule inherits its linearity from that of the value function, which in turn depends on the frictionless nature of the financial market. We conclude that, in the final analysis, the pricing rule is linear because the financial market is frictionless. Whether or not the market is complete is, instead, irrelevant.

24.6.5 Pricing kernels


Much more is true, however. Indeed, the Theorem of Riesz (in its version for subspaces,
Theorem 762, since the market W is not necessarily complete) leads to the following key
representation result for the pricing rule.

Theorem 1143 Suppose the financial market (L, p) satisfies the LOP. Then, there exists a unique vector π ∈ W such that

f(w) = π · w    ∀w ∈ W    (24.17)

Proof By Theorem 1142, the function f : W → R is linear. By Theorem 762, there exists a unique vector π ∈ W such that f(w) = π · w for every w ∈ W.

The representing vector π is called the pricing kernel. When the market is complete, π ∈ R^k. In this case we have π_i = p_{e^i}, where p_{e^i} is the price of the Arrow contingent claim e^i; indeed, by (24.17)

p_{e^i} = f(e^i) = π · e^i = π_i

In words, the i-th component π_i of the pricing kernel is the price of the Arrow contingent claim that corresponds to state s_i. That is, π_i is the cost of having, for sure, one euro tomorrow if state s_i obtains (and zero otherwise).
As a result, when the market is complete the price of a contingent claim w is the weighted average

p_w = f(w) = π · w = Σ_{i=1}^k π_i w_i    (24.18)

of its payments in the different states, each state weighted according to how much it costs today to have one euro tomorrow in that state. Consequently, the knowledge of the pricing kernel (i.e., of the prices of the Arrow contingent claims) permits to price all contingent claims in the market via the pricing formula

p_w = Σ_{i=1}^k π_i w_i    (24.19)

The earlier pricing formulas (24.14) and (24.16) require, to price each claim, the knowledge of replicating portfolios or of the prices of some other contingent claims. In contrast, the pricing formula (24.19) requires only a single piece of information, the pricing kernel, to price all claims. In particular, for primary assets it takes the form p_j = Σ_{i=1}^k π_i y_{ij}.

Example 1144 In the three-state economy of Example 1137, there are three Arrow contingent claims e^1, e^2, and e^3. Suppose that the today market price of having tomorrow one euro in the recession state (and zero otherwise) is higher than in the stasis state, which is in turn higher than in the growth state, say p_{e^1} = 3, p_{e^2} = 2, and p_{e^3} = 1. Then, the pricing kernel is π = (3, 2, 1) and the pricing formula (24.19) becomes p_w = 3w_1 + 2w_2 + w_3 for all w ∈ W. For instance, the price of the contingent claim w = (2, 1, 4) is p_w = 12. N
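In a complete market the pricing kernel can be recovered from the primary assets themselves, since p_j = Σ_i π_i y_{ij} means Y^T π = p. A sketch in Python (numpy), with asset prices constructed to be consistent with the kernel of this example:

import numpy as np

Y = np.array([[-1, -3, 0, -2],
              [ 0,  0, 2,  0],
              [ 2,  3, 4,  2]], dtype=float)   # payoff matrix of (24.10)
p = Y.T @ np.array([3.0, 2.0, 1.0])            # prices consistent with the kernel (3, 2, 1)

pi, *_ = np.linalg.lstsq(Y.T, p, rcond=None)   # solve p_j = sum_i pi_i y_ij
print(pi)                                      # [3. 2. 1.]
print(pi @ np.array([2.0, 1.0, 4.0]))          # 12.0 = p_w via (24.19)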

24.6.6 Arbitrage
A portfolio x ∈ R^n is an arbitrage if either of the following conditions holds:

I: Yx ≥ 0 and p · x < 0;    II: Yx > 0 and p · x ≤ 0

A portfolio that satisfies condition I has a strictly negative market value and, nevertheless, ensures a positive payment in all states. On the other hand, a portfolio that satisfies condition II has a negative market value and, nevertheless, a strictly positive payoff in at least some states. Well-functioning financial markets should be able to take advantage of any such opportunity of a sure gain, and so they should feature no arbitrage portfolios.
In this section we study such well-functioning markets. In particular, in a market without arbitrages of type I we have

I    R(x) ≥ 0 ⟹ v(x) ≥ 0    ∀x ∈ R^n    (24.20)

while in one without arbitrages of type II we have

II    R(x) > 0 ⟹ v(x) > 0    ∀x ∈ R^n    (24.21)

The first no-arbitrage condition is enough to ensure that the market satisfies the LOP.

Lemma 1145 A financial market (L, p) without arbitrages of type I satisfies the LOP.

Proof By applying (24.20) to the portfolio −x, we have

R(−x) ≥ 0 ⟹ v(−x) ≥ 0    ∀x ∈ R^n

that is,

R(x) ≤ 0 ⟹ v(x) ≤ 0    ∀x ∈ R^n

Along with (24.20), this implies

R(x) = 0 ⟹ v(x) = 0    ∀x ∈ R^n

Let x and x′ be two portfolios such that R(x) = R(x′). The linearity of R implies R(x − x′) = 0, and so v(x − x′) = 0, i.e., v(x′) = v(x) by the linearity of v.

Consider a complete market, that is, W = R^k. Thanks to the lemma, the no-arbitrage condition (24.20) implies that contingent claims are priced according to formula (24.17). But much more is true: under this no-arbitrage condition the vector π is positive, and so the pricing rule is linear and increasing. Better claims command higher market prices.

Proposition 1146 A complete nancial market (L; p), with p 6= 0, satis es the no arbitrage
condition (24.20) if and only if the pricing rule is linear and increasing, that is, there exists
a unique vector 2 Rk+ such that

f (w) = w 8w 2 W (24.22)

Proof \If". Let R (x) 0. Then, v (x) = f (R (x)) = R (x) 0 since 0 by hypothesis.
\Only if". Since the market is complete, we have W = Im R = Rk . By Lemma 1145, the
LOP holds and so f is linear (Proposition 1142). We need to show that f is increasing. Since
f is linear, this amounts to show that it is positive, i.e., that w 0 implies f (w) 0. Let
w 2 Rk with w 0. Being Im R = Rk , there exists x 2 Rn such that R (x) = w. We thus
have R (x) = w 0, and so (24.20) implies v (x) 0 because of the no arbitrage condition.
Hence, f (w) = f (R (x)) = v (x) 0. We conclude that the linear function f is positive, and
so increasing. By the Riesz-Markov Theorem, there exists a unique positive vector 2 Rk+
such that f (z) = z for every z 2 Rk .

The result becomes sharper when the market also satisfies the second no arbitrage condition (24.21): the vector $\pi$ then becomes strictly positive, so that the pricing rule is linear and strictly increasing. Strictly better claims thus command strictly higher market prices. As the no arbitrage conditions (24.20) and (24.21) are both compelling, a well-functioning market should actually satisfy both of them. We thus have the following important result (as its demanding name shows).3

Theorem 1147 (Fundamental Theorem of Finance) A complete financial market $(L, p)$, with $p \neq 0$, satisfies the no arbitrage conditions (24.20) and (24.21) if and only if the pricing rule is linear and strictly increasing, that is, there exists a unique vector $\pi \in \mathbb{R}^k_{++}$ such that
$$f(w) = \pi \cdot w \qquad \forall w \in W \tag{24.23}$$

Proof \If". Let R (x) > 0. Then, v (x) = f (R (x)) = R (x) > 0 because 0 by
hypothesis. \Only if." By Proposition 1146, f is linear and increasing. We need to show
that f is strictly increasing. Since f is linear, this amounts to show that is strictly positive,
i.e., that w > 0 implies f (w) > 0. Let w 2 Rk with w > 0. Being Im R = Rk , there
exists x 2 Rn such that R (x) = w. We thus have R (x) = w > 0, and so (24.21) implies
v (x) 0 because of the linearity of v. Hence, f (w) = f (R (x)) = v (x) > 0. We conclude
that the linear function f is strictly positive, and so strictly increasing. By the Riesz-Markov
Theorem, there exists a unique strongly positive vector 2 Rk++ such that f (z) = z for
every z 2 Rk .

The price of any replicable contingent claim $w$ is thus the weighted average
$$p_w = f(w) = \pi \cdot w = \sum_{i=1}^k \pi_i w_i$$
of its payments in the different states, with strictly positive weights. If market prices do not have this form, the market is not exhausting all arbitrage opportunities. Some sure gains are still possible.

3 We refer interested readers to Cochrane (2005) and Ross (2005).
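In a complete market the pricing kernel can be recovered from the prices of the primary assets by solving the linear system $p_j = \sum_i \pi_i y_{ij}$, i.e., $p = Y^T \pi$. A sketch in Python, with an illustrative payoff matrix and price vector of our own choosing:

import numpy as np

# Illustrative complete market: 3 states, 3 primary assets.
# Column j of Y lists the state payoffs of asset j.
Y = np.array([[1.0, 2.0, 0.0],
              [1.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])
p = np.array([2.1, 2.9, 1.5])   # today's asset prices

# p_j = sum_i pi_i y_ij, i.e., p = Y^T pi; solve for the kernel.
pi = np.linalg.solve(Y.T, p)
print(pi)                                     # [1.0, 0.9, 0.2]

# Fundamental Theorem of Finance: no arbitrage iff pi >> 0.
print("arbitrage-free:", bool(np.all(pi > 0)))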

24.7 Coda monotona

24.7.1 Ideals

Recall from Section 20.1 that joins and meets permit to associate to a vector $x$ in $\mathbb{R}^n$ its positive part $x^+$, negative part $x^-$ and modulus $|x|$, defined via the formulas
$$x^+ = x \vee 0, \qquad x^- = -(x \wedge 0) \qquad \text{and} \qquad |x| = x \vee (-x)$$

If $x$ is an element of a vector subspace $V$, it might well happen that $x^+$ or $x^-$ or $|x|$ does not belong to $V$. For instance, none of the positive vectors $x^+ = (0, 1)$, $x^- = (1, 0)$ and $|x| = (1, 1)$ belongs to the one-dimensional vector subspace of $\mathbb{R}^2$ generated by the vector $x = (-1, 1)$. Yet, in view of the properties in Propositions 915 and 916, if a vector subspace $V$ is closed under just one of the operations $x \vee y$, $x \wedge y$, $x^+$, $x^-$ and $|x|$, then it is closed under all the others. This is the case for the next class of vector subspaces.

Definition 1148 A vector subspace of $\mathbb{R}^n$ is called a Riesz subspace if it is a lattice.

A Riesz subspace thus contains the joins $x \vee y$ and meets $x \wedge y$ of all pairs of its vectors $x$ and $y$. Therefore, it also contains the positive parts $x^+$ and negative parts $x^-$, as well as the moduli $|x|$, of all its elements $x$.

Example 1149 Let $0 \neq x \in \mathbb{R}^n$. The one-dimensional vector subspace generated by $x$ is a Riesz subspace of $\mathbb{R}^n$ when either $x \geq 0$ or $x \leq 0$, that is, when $|x| = \pm x$. Otherwise, it is not; see, for example, the case $x = (-1, 1)$ above. N

The following class of Riesz subspaces plays a special role.

Definition 1150 A vector subspace $V$ is an ideal if, for all $x \in V$,
$$|y| \leq |x| \implies y \in V \qquad \forall y \in \mathbb{R}^n$$

Since $x \in V$ implies $|x| \in V$, an ideal is a Riesz subspace. In particular, when $0 \leq x \in V$ we have $y \in V$ for all $0 \leq y \leq x$. Next we show that this is indeed a characterizing property.

Proposition 1151 A Riesz subspace $V$ is an ideal if and only if, for all $x \in V$,
$$0 \leq y \leq x \implies y \in V \qquad \forall y \in \mathbb{R}^n$$

In other words, a Riesz subspace is an ideal if and only if it contains all intervals $[0, x]$ determined by its positive elements $x$.

Proof The condition is clearly necessary, as previously discussed. As to sufficiency, let $|y| \leq |x|$ with $x \in V$. If $V$ is a Riesz subspace, we have $|x| \in V$. Moreover, $y^+ \leq |y| \leq |x|$ as well as $y^- \leq |y| \leq |x|$. That is, both $y^+$ and $y^-$ belong to $[0, |x|]$. Consequently, they belong to $V$ and so $y = y^+ - y^- \in V$.

Example 1152 Fix $I \subseteq \{1, \dots, n\}$ and let
$$V_I = \{x \in \mathbb{R}^n : x_i = 0 \text{ for all } i \in I\}$$
with $V_\emptyset = \mathbb{R}^n$. This vector subspace is an ideal. Indeed, let $0 \leq x \in V_I$. If $0 \leq y \leq x$, we clearly have $y_i = 0$ for all $i \in I$ and so $y \in V_I$. By Proposition 1151, we conclude that $V_I$ is an ideal. N

The ideal $V_I$ is easily seen to be isomorphic to $\mathbb{R}^{|I^c|}$. This is actually the general form of an ideal, as we show next.

Proposition 1153 A Riesz subspace $V$ is an ideal if and only if there is an index set $I \subseteq \{1, \dots, n\}$ such that $V = V_I$.

This result implies, inter alia, that $\mathbb{R}^n$ is the only ideal that contains non-zero constant vectors.

Proof In view of the last example, we just need the prove the \only if". Let V be an
ideal in Rn . If V = Rn we have V = V; . So, assume that V =
6 Rn . For each x 2 V set
I (x) = fi 2 f1; :::; ng : xi = 0g.

Claim For each x 2 V there exists 0 y 2 V such that I (x) = I (y). Moreover, for any
nite family fxi gki=1 of positive vectors in V there exists 0 x 2 V such that
k
\
I (xi ) = I (x) (24.24)
i=1

6 k 2 R, let kI(x) be the vector in Rn which is 0 if


Proof of the Claim For each scalar 0 =
i 2 I and k otherwise, i.e., (
0 if i 2 I
kI(x) =
k if i 2=I
For > 0 large enough, we have kI(x) jxj = j xj. Since x belongs to the ideal V ,
we conclude that kI(x) 2 V . Clearly, I (x) = I x + kI(x) . For k > 0 large enough, we have
x + kI(x) 0. By setting y = x + kI(x) , the rst part of the claim is then proved. As to
k
_
(24.24), it is enough to set x = xi .
i=1

Let 0 x 2 V . If x 0, then for each versor ei there is i > 0 small enough so that
ei
i 2 [0; x]. In turn, this implies span [0; x] = Rn and so V = Rn , a contradiction. Thus,
I (x) 6= ; for all 0 x 2 V . By the Claim, I (x) 6= ; for all x 2 V . Set
\
I= I (x) = fi 2 f1; :::; ng : 9x 2 V; xi = 0g
x2V
\
Since 0 2 V , this intersection is non-empty. Moreover, by the Claim we have I = I (x).
0 x2V
Since there are at most nitely many distinct sets fI (x) : 0 x 2 V g, by (24.24) there exists
0 xI 2 V such that I (xI ) = I.
It remains to show that $V = V_I$. Clearly, $V \subseteq V_I$. To prove the converse inclusion, let $x \in V_I$. For a large enough $\lambda > 0$ we have $|x| \leq \lambda x_I$ because $\{i \in \{1, \dots, n\} : x_i \neq 0\} \subseteq I^c = \{i \in \{1, \dots, n\} : (x_I)_i > 0\}$. Since $\lambda x_I$ belongs to the ideal $V$, it follows that $x \in V$. We conclude that $V = V_I$.

24.7.2 Positive projections

The order structure of $\mathbb{R}^n$ permits to introduce an order-theoretic notion of orthogonality.

Definition 1154 Two vectors $x$ and $y$ in $\mathbb{R}^n$ are disjoint, written $x \perp y$, if $|x| \wedge |y| = 0$.

Clearly, $\perp$ is a symmetric relation: we have $x \perp y$ if and only if $y \perp x$. We can naturally extend disjointness to pairs of sets by requiring that all their elements be pairwise disjoint. In particular, two vector subspaces $V_1$ and $V_2$ are disjoint when $x_1 \perp x_2$ for all $x_1 \in V_1$ and all $x_2 \in V_2$. This notion provides an order-theoretic angle on direct sums of ideals.

Proposition 1155 Two ideals $V_1$ and $V_2$ are disjoint if and only if $V_1 \cap V_2 = \{0\}$.

Proof "If". Suppose that $V_1 \cap V_2 = \{0\}$. Let $x_1 \in V_1$ and $x_2 \in V_2$. We have $|x_1| \in V_1$ and $|x_2| \in V_2$. Since $|x_1| \wedge |x_2| \leq |x_1|$ and $|x_1| \wedge |x_2| \leq |x_2|$, we have $|x_1| \wedge |x_2| \in V_1 \cap V_2$ because $V_1$ and $V_2$ are ideals. Hence, $|x_1| \wedge |x_2| = 0$.
"Only if". Suppose that $V_1$ and $V_2$ are disjoint. Let $x \in V_1 \cap V_2$. Then, $|x| = |x| \wedge |x| = 0$ and so $x = 0$.

We can now establish the monotonicity of projections, a key order-theoretic property.

Proposition 1156 Let $V_1$ and $V_2$ be two disjoint ideals. Then,
$$x \leq y \implies P_{V_1}(x) \leq P_{V_1}(y) \quad \text{and} \quad P_{V_2}(x) \leq P_{V_2}(y)$$
for all $x, y \in V_1 \oplus V_2$.

This result implies that, for each $x \geq 0$ that belongs to $V_1 \oplus V_2$, in the unique decomposition $x = y_1 + y_2$ we have $y_1 \geq 0$ and $y_2 \geq 0$. The direct sum thus acquires an order-theoretic nature.

Proof We prove the result for $P_{V_1}$ (the argument for $P_{V_2}$ is similar). Since $P_{V_1}$ is a linear operator, it is enough to prove that it is positive (Proposition 650). Let $x \geq 0$. Since $x - P_{V_1}(x) = P_{V_2}(x) \in V_2$, we have $|x - P_{V_1}(x)| \wedge |P_{V_1}(x)| = 0$. If $0 = |P_{V_1}(x)| \leq |x - P_{V_1}(x)|$, then $P_{V_1}(x) = 0 \geq 0$. If $0 = |x - P_{V_1}(x)| \leq |P_{V_1}(x)|$, then $P_{V_1}(x) - x = 0$ and so $P_{V_1}(x) = x \geq 0$. We conclude that the linear operator $P_{V_1}$ is positive, as desired.

This result has an immediate, yet important, consequence.

Corollary 1157 Let $V$ be an ideal. Then,
$$x \leq y \implies P_V(x) \leq P_V(y)$$
for all $x, y \in \mathbb{R}^n$.
When $V$ is an ideal we can thus uniquely decompose each $0 \leq x \in \mathbb{R}^n$ as
$$x = y + z \tag{24.25}$$
with $0 \leq y \in V$ and $0 \leq z \in V^\perp$. It is an order version of the earlier, purely algebraic, decomposition (24.3).
This corollary has an interesting converse that shows that ideals are needed when dealing with positive projections.

Proposition 1158 Let $V$ be a vector subspace. If the projections $P_V$ and $P_{V^\perp}$ are positive, then $V$ and $V^\perp$ are ideals.

Proof We only prove that $V$ is an ideal, the argument for $V^\perp$ being similar. Given $x \in V$, let $y \in \mathbb{R}^n$ be such that $0 \leq y \leq x$. In view of Proposition 1151, we need to show that $y \in V$. Set $z = x - y \geq 0$. It holds
$$x = y + z = P_V(y) + P_{V^\perp}(y) + P_V(z) + P_{V^\perp}(z)$$
and so
$$\underbrace{P_{V^\perp}(y) + P_{V^\perp}(z)}_{\in V^\perp} = \underbrace{x - (P_V(y) + P_V(z))}_{\in V} \in V \cap V^\perp = \{0\}$$
Thus, $P_{V^\perp}(y) + P_{V^\perp}(z) = 0$. Since $P_{V^\perp}(y)$ and $P_{V^\perp}(z)$ are both positive, we then have $P_{V^\perp}(y) = P_{V^\perp}(z) = 0$. In turn, this implies $y = P_V(y) \in V$, as desired. We conclude that $V$ is an ideal.
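For the coordinate ideals $V_I$ the orthogonal projection simply zeroes out the coordinates in $I$, so the positive decomposition (24.25) can be checked directly. A small Python sketch (the index set and the vector are illustrative):

import numpy as np

def proj_ideal(x, I):
    """Orthogonal projection onto the ideal V_I = {x : x_i = 0 for all i in I}."""
    y = np.array(x, dtype=float)
    y[list(I)] = 0.0
    return y

I = {1, 3}                            # illustrative index set in R^4
x = np.array([3.0, 1.0, 2.0, 5.0])    # a positive vector

y = proj_ideal(x, I)                  # component in V_I
z = x - y                             # component in (V_I)^perp = V_{I^c}
print(y, z)                           # both positive, as (24.25) predicts
assert np.all(y >= 0) and np.all(z >= 0) and np.allclose(x, y + z)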

24.7.3 The ultimate Riesz-Markov

The positive cone $V_+ = \{x \in V : x \geq 0\}$ of a vector subspace $V$ is the collection of its positive elements. It has the dual cone
$$V'_+ = \{x \in \mathbb{R}^n : x \cdot y \geq 0 \text{ for all } y \in V_+\}$$
In view of Riesz's Theorem, the dual cone is isomorphic to the collection
$$\{f \in V' : f(y) \geq 0 \text{ for all } y \in V_+\}$$
of linear functions $f : \mathbb{R}^n \to \mathbb{R}$ that are positive on $V_+$. This observation motivates the "dual" terminology. In a similar spirit, the set
$$\{f \in V' : f(y) > 0 \text{ for all } 0 \neq y \in V_+\}$$
of linear functions $f : \mathbb{R}^n \to \mathbb{R}$ that are strictly positive on the non-zero elements of $V_+$ is isomorphic to the set
$$V'_{++} = \{x \in \mathbb{R}^n : x \cdot y > 0 \text{ for all } 0 \neq y \in V_+\}$$
Clearly, when $V = \mathbb{R}^n$ we have $V'_+ = \mathbb{R}^n_+$ and $V'_{++} = \mathbb{R}^n_{++}$.

Definition 1159 Let $x, y \in \mathbb{R}^n$. We write:

(i) $x \geq^* y$ if $x - y \in V'_+$;

(ii) $x >^* y$ if $x \geq^* y$ and $x \neq^* y$;4

(iii) $x \gg^* y$ if $x - y \in V'_{++}$.

In other words, we have:

(i) $x \geq^* y$ if $x \cdot z \geq y \cdot z$ for all $z \in V_+$;

(ii) $x >^* y$ if $x \geq^* y$ and $x \cdot z > y \cdot z$ for some $z \in V_+$;

(iii) $x \gg^* y$ if $x \cdot z > y \cdot z$ for all $0 < z \in V_+$.

We call these binary relations dual orders. They are transitive and, in particular, $\geq^*$ is a preorder (i.e., it is reflexive and transitive). It is immediate to see that subspaces with larger positive cones induce coarser dual orders. In particular, the standard orders on $\mathbb{R}^n$ can be seen as the dual orders induced by $\mathbb{R}^n_+$, the largest positive cone, and so are coarser than all other dual orders:
$$x \geq y \implies x \geq^* y, \qquad x > y \implies x >^* y, \qquad x \gg y \implies x \gg^* y$$
for all $x, y \in \mathbb{R}^n$.

Next we give an extremal characterization of dual orders based on the Minkowski Theorem. Here
$$\Delta(V) = \left\{x \in V_+ : \sum_{i=1}^n x_i = 1\right\}$$
is the simplex of the subspace $V$, which is non-empty when $V_+ \neq \{0\}$. It reduces to the standard simplex $\Delta_{n-1}$ when $V = \mathbb{R}^n$.

Proposition 1160 Let $V$ be a subspace with $V_+ \neq \{0\}$. For all $x, y \in \mathbb{R}^n$ we have:

(i) $x \geq^* y$ if and only if $x \cdot z \geq y \cdot z$ for all $z \in \operatorname{ext} \Delta(V)$;

(ii) $x >^* y$ if and only if $x \geq^* y$ and $x \cdot z > y \cdot z$ for some $z \in \operatorname{ext} \Delta(V)$;

(iii) $x \gg^* y$ if and only if $x \cdot z > y \cdot z$ for all $z \in \operatorname{ext} \Delta(V)$.

Proof (i) We prove only the "if", the converse being trivial. Assume that $x \cdot z \geq y \cdot z$ for all $z \in \operatorname{ext} \Delta(V)$. By the Minkowski Theorem, each $z \in \Delta(V)$ is a convex combination $z = \sum_{i=1}^m \alpha_i z_i$ of extreme points $z_i$ of the simplex $\Delta(V)$. Thus,
$$x \cdot z = \sum_{i=1}^m \alpha_i (x \cdot z_i) \geq \sum_{i=1}^m \alpha_i (y \cdot z_i) = y \cdot z$$
This implies that $x \cdot z \geq y \cdot z$ for all $z \in \Delta(V)$. In turn, this readily implies $x \cdot z \geq y \cdot z$ for all $z \in V_+$, that is, $x \geq^* y$. This completes the proof of (i). The other points are similarly proved.

4 We write $x =^* y$ if both $x \geq^* y$ and $y \geq^* x$.

Dual orders take a familiar form when $V$ is an ideal. To show this, we call support of a vector subspace $V$ of $\mathbb{R}^n$, written $\operatorname{supp} V$, the set
$$\{i \in \{1, \dots, n\} : \exists x \in V, \ x_i \neq 0\}$$
A vector subspace has full support when $\operatorname{supp} V = \{1, \dots, n\}$. By Proposition 1153, an ideal $V$ does not have full support if and only if it is distinct from $\mathbb{R}^n$: indeed, $\operatorname{supp} V$ is the complement $I^c$ of the index set $I$.

Lemma 1161 Let $V$ be an ideal. Then, for all $x, y \in \mathbb{R}^n$ we have:

(i) $x \geq^* y$ if and only if $x_i \geq y_i$ for all $i \in \operatorname{supp} V$;

(ii) $x >^* y$ if and only if $x \geq^* y$ and $x_i > y_i$ for at least some $i \in \operatorname{supp} V$;

(iii) $x \gg^* y$ if $x_i > y_i$ for all $i \in \operatorname{supp} V$.

In particular, when $V = \mathbb{R}^n$ the dual orders reduce to the standard ones $(\geq, >, \gg)$.

Proof In view of Proposition 1160, it is enough to observe that $\operatorname{ext} \Delta(V) = \{e^i : i \in \operatorname{supp} V\}$ for the ideal $V$, as is easily checked.

Next we further illustrate the dual orders with a couple of one-dimensional vector subspaces.

Example 1162 (i) Let $V$ be the vector subspace $\{\alpha(-1, 1) : \alpha \in \mathbb{R}\}$ of the plane $\mathbb{R}^2$ generated by the point $(-1, 1)$. Its positive cone $V_+ = \{0\}$ is trivial and so for all vectors $x, y \in \mathbb{R}^2$ we have $x \geq^* y$. Hence, for no $x, y \in \mathbb{R}^2$ we have $x >^* y$, let alone $x \gg^* y$. These two partial orders are here empty and so have no bite.
(ii) Let $V$ be the vector subspace $\{\alpha(1, 1) : \alpha \in \mathbb{R}\}$ of the plane $\mathbb{R}^2$ generated by the point $(1, 1)$, graphically the 45 degree line. Its positive cone is $V_+ = \{\alpha(1, 1) : \alpha \geq 0\}$. In particular, we have
$$x \geq^* y \iff x_1 + x_2 \geq y_1 + y_2$$
for all $x, y \in \mathbb{R}^2$ because, being $\Delta(V) = \{(1/2, 1/2)\}$,
$$x \geq^* y \iff (x_1, x_2) \cdot \left(\tfrac{1}{2}, \tfrac{1}{2}\right) \geq (y_1, y_2) \cdot \left(\tfrac{1}{2}, \tfrac{1}{2}\right) \iff \tfrac{1}{2} x_1 + \tfrac{1}{2} x_2 \geq \tfrac{1}{2} y_1 + \tfrac{1}{2} y_2 \iff x_1 + x_2 \geq y_1 + y_2$$
Similarly, we have
$$x >^* y \iff x \gg^* y \iff x_1 + x_2 > y_1 + y_2$$
for all $x, y \in \mathbb{R}^2$. N

Next we state and prove the ultimate version of the Riesz-Markov Theorem.

Theorem 1163 (Riesz-Markov) Let $V$ be a vector subspace. A function $f : V \to \mathbb{R}$ is linear and increasing if and only if there is a unique vector $\alpha \geq^* 0$ in $V$ such that
$$f(x) = \alpha \cdot x \qquad \forall x \in V$$
If, in addition, $V_+$ is non-trivial, then:

(i) $\alpha >^* 0$ if and only if $f$ is strongly increasing;

(ii) $\alpha \gg^* 0$ if and only if $f$ is strictly increasing;

(iii) $\alpha \in \Delta(V)$ if and only if $f(P_V(\mathbf{1})) = 1$.

Before proving this result, we illustrate it using the vector subspaces discussed in the last example.

Example 1164 (i) If $f : V \to \mathbb{R}$ is an increasing linear function defined on an ideal, then there exists a unique $\alpha \in V$, with $\alpha_i \geq 0$ for all $i \in \operatorname{supp} V$, such that $f(x) = \alpha \cdot x$ for all $x \in V$. It is easily seen to be given by
$$\alpha_i = \begin{cases} f(e^i) & \text{if } i \in \operatorname{supp} V \\ 0 & \text{else} \end{cases}$$
If $\alpha_i > 0$ for all $i \in \operatorname{supp} V$, the function $f$ is strictly increasing, while it is strongly increasing if $\alpha_i > 0$ for at least some $i \in \operatorname{supp} V$. Finally, since
$$\left(P_V(\mathbf{1})\right)_i = \begin{cases} 1 & \text{if } i \in \operatorname{supp} V \\ 0 & \text{else} \end{cases}$$
we have $\alpha \in \Delta(V)$ if and only if $f(P_V(\mathbf{1})) = 1$.
(ii) The only linear and increasing function $f : V \to \mathbb{R}$ defined on the vector subspace $V = \{\alpha(-1, 1) : \alpha \in \mathbb{R}\}$ of $\mathbb{R}^2$ is the zero function $f(x) = 0$ for all $x \in V$. The zero vector $\alpha = 0$ is the unique vector $\geq^* 0$ in $V$ such that $f(x) = \alpha \cdot x$ for all $x \in V$.5
(iii) If $f : V \to \mathbb{R}$ is a linear and increasing function defined on the vector subspace $V = \{\alpha(1, 1) : \alpha \in \mathbb{R}\}$ of $\mathbb{R}^2$, the unique vector $\alpha \geq^* 0$ in $V$ such that $f(x) = \alpha \cdot x$ for all $x \in V$ is given by
$$\alpha = \left( \frac{f(1, 1)}{2}, \frac{f(1, 1)}{2} \right) \tag{24.26}$$
In particular, we have $f(x) = f(1, 1)(x_1 + x_2)/2$ for all $x \in V$.6 If $f(1, 1) \neq 0$, the function $f$ is strictly (so, strongly) increasing. This is the case under the normalization $f(1, 1) = 1$, for which we have $\alpha = (1/2, 1/2)$, in accordance with the Riesz-Markov Theorem since $\Delta(V) = \{(1/2, 1/2)\}$. N

The proof of the Riesz-Markov Theorem relies on a lemma of independent interest that generalizes Corollary 1157.

Lemma 1165 Let $V$ be a vector subspace. Then,
$$x \leq y \implies P_V(x) \leq^* P_V(y) \tag{24.27}$$
for all $x, y \in \mathbb{R}^n$.

5 For each $\beta \in \mathbb{R}^2$ with $\beta_1 = \beta_2$, it holds $f(x) = \beta \cdot x$ for all $x \in V$. Among them, $\beta = 0$ is the only one that belongs to $V$.
6 For each $\beta \in \mathbb{R}^2$ with $\beta_1 + \beta_2 = f(1, 1)$, it holds $f(x) = \beta \cdot x$ for all $x \in V$. Among them, (24.26) is the only one that belongs to $V$.

Proof Let $x \geq 0$. For each $y \in V_+$ we have $0 \leq y \cdot x = y \cdot (P_V(x) + P_{V^\perp}(x)) = y \cdot P_V(x)$. Since $y$ is arbitrary, we conclude that $P_V(x) \geq^* 0$. Since $P_V$ is linear, this implies (24.27).

Proof of the Riesz-Markov Theorem Let $f$ be an increasing linear function. By Proposition 765, there exists $\beta \in \mathbb{R}^n_+$ such that $f(x) = \beta \cdot x$ for all $x \in V$. Set $\alpha = P_V(\beta)$; note that $\alpha \cdot x = P_V(\beta) \cdot x = \beta \cdot x = f(x)$ for all $x \in V$ because $x \perp P_{V^\perp}(\beta)$. By the last lemma, $\alpha \geq^* 0$. As to uniqueness, let $\alpha' \in V$ be such that $f(x) = \alpha' \cdot x$ for all $x \in V$. Then, $(\alpha - \alpha') \cdot x = 0$ for all $x \in V$ and so $\alpha - \alpha' \in V^\perp$. In turn, this implies $\alpha - \alpha' \in V^\perp \cap V = \{0\}$, i.e., $\alpha = \alpha'$. Hence, $\alpha$ is unique.
As to the converse, let $f(x) = \alpha \cdot x$ for all $x \in V$, with $\alpha \geq^* 0$. Let $x, y \in V$ with $x \geq y$. Since $\alpha \geq^* 0$, we have $\alpha \cdot (x - y) \geq 0$ because $x - y \in V_+$, and so $f(x) \geq f(y)$. We conclude that $f$ is increasing.
(i) "If". Let $f$ be a strongly increasing linear function. By the first part of the theorem, $f(x) = \alpha \cdot x$ for all $x \in V$, with $\alpha \geq^* 0$. We then have $\alpha \cdot x = f(x) > 0$ for all $0 \ll x \in V$, and so $\alpha >^* 0$.
"Only if". Let $f(x) = \alpha \cdot x$ for all $x \in V$, with $\alpha >^* 0$. There exists $\bar{x} \in V_+$ such that $\alpha \cdot \bar{x} > 0$. Let $x \gg 0$. There exists $\lambda > 0$ such that $\lambda x \geq \bar{x}$. Hence, $\alpha \cdot (\lambda x - \bar{x}) \geq 0$, which implies $\lambda \alpha \cdot x \geq \alpha \cdot \bar{x} > 0$, so that $f(x) = \alpha \cdot x > 0$. We conclude that $f$ is strongly increasing.
(ii) "If". Let $f$ be a strictly increasing linear function. By the first part of the theorem, $f(x) = \alpha \cdot x$ for all $x \in V$, with $\alpha \geq^* 0$. We then have $\alpha \cdot x = f(x) > 0$ for all $0 < x \in V$. Hence, $\alpha \gg^* 0$.
"Only if". Let $f(x) = \alpha \cdot x$ for all $x \in V$, with $\alpha \gg^* 0$. Then, $\alpha \cdot x > 0$ for all $0 < x \in V$ and so, for all $x, y \in V$, $x > y$ implies $f(x) = \alpha \cdot x > \alpha \cdot y = f(y)$. Thus, $f$ is strictly increasing.
(iii) "If". Let $f$ be an increasing linear function with $f(P_V(\mathbf{1})) = 1$. By the first part of the theorem, $f(x) = \alpha \cdot x$ for all $x \in V$, with $\alpha \geq^* 0$. Then, since $\alpha \in V$ we have
$$1 = f(P_V(\mathbf{1})) = \alpha \cdot (P_V(\mathbf{1}) + P_{V^\perp}(\mathbf{1})) = \alpha \cdot \mathbf{1} = \sum_{i=1}^n \alpha_i$$
Hence, $\alpha \in \Delta(V)$.
"Only if". Let $f(x) = \alpha \cdot x$ for all $x \in V$, with $\alpha \in \Delta(V)$. Then, since $\alpha \in V$ we have
$$f(P_V(\mathbf{1})) = \alpha \cdot (P_V(\mathbf{1}) + P_{V^\perp}(\mathbf{1})) = \alpha \cdot \mathbf{1} = \sum_{i=1}^n \alpha_i = 1$$
as desired.
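A quick numerical illustration of the Riesz-Markov representation, on the 45 degree line of Example 1162(ii): projecting any positive vector $\beta$ that represents $f$ onto $V$ yields the unique representing vector $\alpha$ in $V$. A sketch in Python:

import numpy as np

# V = span{(1,1)}: the 45 degree line of Example 1162(ii).
v = np.array([1.0, 1.0])
P = np.outer(v, v) / (v @ v)        # orthogonal projector onto V

beta = np.array([0.3, 0.7])         # f(x) = beta . x is increasing on V
alpha = P @ beta                    # unique representing vector in V
print(alpha)                        # [0.5, 0.5], i.e., alpha in Delta(V)

# alpha and beta agree as evaluation rules on V:
x = 2.5 * v
print(beta @ x, alpha @ x)          # both 2.5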
Chapter 25

Forms and spectra

In this chapter we consider two important topics, eigenvalues of symmetric matrices and quadratic forms. The latter are an important class of functions that plays a key role in optimization, as will be seen later in the book. The first topic is also instrumental to the second one, so we begin by studying eigenvalues.

25.1 Spectra

25.1.1 Eigenvalues

Definition 1166 Let $A$ be a symmetric matrix of order $n$. A scalar $\lambda \in \mathbb{R}$ is called an eigenvalue of $A$ and a vector $0 \neq x \in \mathbb{R}^n$ is called an eigenvector of $A$ if they jointly solve the equation
$$Ax = \lambda x \tag{25.1}$$

An eigenvalue and eigenvector pair $(\lambda, x)$ of $A$ is called an eigenpair of $A$. The collection of all eigenvalues of $A$ is called the spectrum of $A$ and is denoted by $\sigma(A)$.
Since $Ax - \lambda x = (A - \lambda I)x$, the equation (25.1) can be written as a homogeneous linear system
$$(A - \lambda I)x = 0$$
This simple remark underlies a basic characterization of eigenvalues.

Proposition 1167 Let $A$ be a symmetric matrix. A scalar $\lambda \in \mathbb{R}$ is an eigenvalue of $A$ if and only if it makes the matrix $A - \lambda I$ singular, that is,
$$\det(A - \lambda I) = 0 \tag{25.2}$$

We can thus write $\sigma(A) = \{\lambda \in \mathbb{R} : \det(A - \lambda I) = 0\}$.

Proof "Only if". Let $\lambda \in \sigma(A)$. So, there is $0 \neq x \in \mathbb{R}^n$ such that $(A - \lambda I)x = 0$, which in turn implies that the matrix $A - \lambda I$ is singular, that is, $\det(A - \lambda I) = 0$.
"If". Suppose $\lambda \in \mathbb{R}$ is such that $\det(A - \lambda I) = 0$. Then $A - \lambda I$ is singular, so there exists $0 \neq x \in \mathbb{R}^n$ such that $(A - \lambda I)x = 0$, that is, such that $Ax = \lambda x$. We conclude that $\lambda \in \sigma(A)$.

To find an eigenpair $(\lambda, x)$, one can first find the eigenvalue $\lambda$ by solving (25.2) and then find the eigenvector $x$ by solving the homogeneous linear system $(A - \lambda I)x = 0$.

Example 1168 For $n = 2$, we have
$$A - \lambda I = \begin{bmatrix} a_{11} - \lambda & a_{12} \\ a_{21} & a_{22} - \lambda \end{bmatrix}$$
and so
$$\det(A - \lambda I) = (a_{11} - \lambda)(a_{22} - \lambda) - a_{12} a_{21} = \lambda^2 - (a_{11} + a_{22})\lambda + a_{11} a_{22} - a_{12} a_{21} = \lambda^2 - \lambda \operatorname{tr} A + \det A$$
where $\operatorname{tr} A = a_{11} + a_{22}$ is the trace of the matrix. In general, the trace of a matrix is the sum of its diagonal entries, i.e., $\operatorname{tr} A = \sum_{i=1}^n a_{ii}$.
For instance, for the symmetric matrix
$$A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}$$
we have
$$A - \lambda I = \begin{bmatrix} 1 - \lambda & 2 \\ 2 & 3 - \lambda \end{bmatrix}$$
and
$$\det(A - \lambda I) = (1 - \lambda)(3 - \lambda) - 4 = \lambda^2 - 4\lambda - 1$$
Since the solutions of this second-order equation are $2 \pm \sqrt{5}$, by Proposition 1167 we have $\sigma(A) = \{2 + \sqrt{5}, 2 - \sqrt{5}\}$. To find the two eigenvectors, we need to solve the two systems
$$\begin{cases} (-1 - \sqrt{5}) x_1 + 2 x_2 = 0 \\ 2 x_1 + (1 - \sqrt{5}) x_2 = 0 \end{cases}$$
and
$$\begin{cases} (-1 + \sqrt{5}) x_1 + 2 x_2 = 0 \\ 2 x_1 + (1 + \sqrt{5}) x_2 = 0 \end{cases}$$
From the first system, one easily sees that the eigenvectors associated to the eigenvalue $2 + \sqrt{5}$ are
$$\left\{ \alpha \left( \frac{\sqrt{5} - 1}{2}, 1 \right) : 0 \neq \alpha \in \mathbb{R} \right\}$$
while those associated to the eigenvalue $2 - \sqrt{5}$ are
$$\left\{ \alpha \left( \frac{-\sqrt{5} - 1}{2}, 1 \right) : 0 \neq \alpha \in \mathbb{R} \right\}$$
N

This example shows two things: (i) there can be multiple eigenvalues; (ii) to each such eigenvalue there may correspond multiple eigenvectors. The latter point is best clarified by the next result.
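In practice eigenpairs are computed numerically rather than via the characteristic equation; a sketch of Example 1168 with NumPy's symmetric eigensolver:

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 3.0]])

# eigh is specialized to symmetric matrices: real eigenvalues,
# orthonormal eigenvectors (returned as the columns of V).
lam, V = np.linalg.eigh(A)
print(lam)                      # approx [2 - sqrt(5), 2 + sqrt(5)]

# Each column solves A v = lambda v:
for l, v in zip(lam, V.T):
    assert np.allclose(A @ v, l * v)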
Proposition 1169 Let $A$ be a symmetric matrix of order $n$. If $\lambda \in \sigma(A)$, then the collection
$$W_\lambda = \{x \in \mathbb{R}^n : (A - \lambda I)x = 0\}$$
is a vector subspace of $\mathbb{R}^n$, called the eigenspace of $\lambda$.

The eigenvectors associated to an eigenvalue $\lambda$ are thus the non-zero elements of the eigenspace $W_\lambda$.

Proof Let $x, x' \in W_\lambda$ and $\alpha, \beta \in \mathbb{R}$. In view of Proposition 64, it is enough to show that $\alpha x + \beta x' \in W_\lambda$. We have
$$(A - \lambda I)(\alpha x + \beta x') = \alpha (A - \lambda I)x + \beta (A - \lambda I)x' = 0$$
and so $\alpha x + \beta x' \in W_\lambda$, as desired.

Example 1170 In the last example, we have
$$W_{2+\sqrt{5}} = \left\{ \alpha \left( \frac{\sqrt{5} - 1}{2}, 1 \right) : \alpha \in \mathbb{R} \right\}$$
and
$$W_{2-\sqrt{5}} = \left\{ \alpha \left( \frac{-\sqrt{5} - 1}{2}, 1 \right) : \alpha \in \mathbb{R} \right\}$$
Both eigenspaces have dimension 1 (geometrically, they are the straight lines determined by any eigenvector). N

Each eigenvalue $\lambda$ induces a linear operator $T_\lambda : \mathbb{R}^n \to \mathbb{R}^n$ defined by $T_\lambda(x) = (A - \lambda I)x$. The eigenspace $W_\lambda$ is just the kernel of $T_\lambda$, i.e., $W_\lambda = \ker T_\lambda$. If we denote by $\nu_\lambda$ the dimension of $W_\lambda$ and by $\rho_\lambda$ that of the image subspace $\operatorname{Im} T_\lambda$, by the Rank-Nullity Theorem we have
$$\nu_\lambda + \rho_\lambda = n$$
Clearly, the dimension $\nu_\lambda$ of the eigenspace represents the number of linearly independent eigenvectors associated with $\lambda \in \sigma(A)$. Since the matrix $A - \lambda I$ is singular, we thus have $1 \leq \nu_\lambda \leq n$. In the last example, $\nu_{2+\sqrt{5}} = \nu_{2-\sqrt{5}} = 1$.

Example 1171 We can have $\nu_\lambda = n$. In fact, assume $A = I$. It is easy to check that $\lambda = 1$ is the only eigenvalue. Indeed:
$$0 = \det(I - \lambda I) = \det((1 - \lambda) I) = (1 - \lambda)^n \det I = (1 - \lambda)^n \implies \lambda = 1$$
where the third equality follows from Proposition 717. This implies that $T_1 = I - I = O$. Therefore, $W_1 = \ker O = \mathbb{R}^n$ and so $\nu_1 = n$. The same conclusion holds when $A = O$. It is easy to check that $\lambda = 0$ is the only eigenvalue. Indeed:
$$0 = \det(O - \lambda I) = \det(-\lambda I) = (-\lambda)^n \det I = (-1)^n \lambda^n \implies \lambda = 0$$
where, again, the third equality follows from Proposition 717. This implies that $T_0 = O - 0 \cdot I = O$. Therefore, $W_0 = \ker O = \mathbb{R}^n$ and so $\nu_0 = n$. N
Eigenvectors associated to distinct eigenvalues are orthogonal. It is a remarkable property with a remarkable consequence: since orthogonal eigenvectors are linearly independent (Proposition 116), there exist at most $n$ distinct eigenvalues; i.e., $|\sigma(A)| \leq n$.

Proposition 1172 Let $A$ be a symmetric matrix of order $n$. For all eigenpairs $(\lambda, x)$ and $(\lambda', x')$ of $A$, we have
$$\lambda \neq \lambda' \implies x \perp x'$$

Proof Let $\lambda, \lambda' \in \sigma(A)$, with $\lambda \neq \lambda'$. We have
$$Ax \cdot x' = \lambda x \cdot x' = \lambda (x \cdot x')$$
as well as
$$Ax \cdot x' = x \cdot A^T x' = x \cdot Ax' = x \cdot \lambda' x' = \lambda' (x \cdot x')$$
So, $\lambda (x \cdot x') = \lambda' (x \cdot x')$, which implies $x \cdot x' = 0$ because $\lambda \neq \lambda'$.

Having established that there are at most $n$ eigenvalues, it remains to address the most basic question: do eigenvalues exist? Formally, is the spectrum $\sigma(A)$ non-empty? To address this question we introduce an important notion.

Definition 1173 Let $A$ be a symmetric matrix of order $n$. The characteristic polynomial $p_A : \mathbb{R} \to \mathbb{R}$ of $A$ is defined by
$$p_A(t) = \det(tI - A)$$

A scalar is thus an eigenvalue of $A$ if and only if it is a real solution of the polynomial equation
$$p_A(t) = 0$$
which is called the characteristic equation of $A$. Indeed:
$$p_A(t) = 0 \iff \det(tI - A) = 0 \iff \det((-1)(A - tI)) = 0 \iff (-1)^n \det(A - tI) = 0 \iff \det(A - tI) = 0$$
So, the number of eigenvalues of $A$ is equal to the number of distinct real solutions of the characteristic equation of $A$. It becomes then important to understand what the degree of the characteristic polynomial is.
For $n = 2$, from Example 1168 we deduce that the characteristic polynomial has degree 2 and is given by
$$p_A(t) = t^2 - t \operatorname{tr} A + \det A \tag{25.3}$$
Remarkably, both the trace and the determinant of $A$ appear. As to the latter, we have
$$p_A(0) = \det(-A) = (-1)^n \det A$$
and so the constant coefficient of the characteristic polynomial is indeed $(-1)^n \det A$.
Another noteworthy feature of the polynomial (25.3) is that its roots are all real.1 Indeed, the discriminant of the second-order equation $t^2 - t \operatorname{tr} A + \det A = 0$ is $(a_{11} - a_{22})^2 + 4a_{12}^2 \geq 0$.
The next result generalizes what we just saw for $n = 2$. In particular, the trace appears, up to sign, as the coefficient of $t^{n-1}$. We omit the proof, which is a non-trivial elaboration based on (15.27).

1 Recall from high school that the roots of a polynomial $f$ are the solutions of the polynomial equation $f(x) = 0$.
Theorem 1174 The characteristic polynomial $p_A : \mathbb{R} \to \mathbb{R}$ of a symmetric matrix $A$ of order $n$ has degree $n$ and is given by
$$p_A(t) = t^n + \alpha_{n-1} t^{n-1} + \dots + (-1)^n \det A$$
where $\alpha_{n-1} = -\operatorname{tr} A$. Its roots are all real.

In view of this result we conclude that $1 \leq |\sigma(A)| \leq n$ because the characteristic polynomial has at least 1 and at most $n$ roots, as some roots can be repeated, as is well known. In particular, denote by $m(\lambda)$ the multiplicity of the eigenvalue $\lambda \in \sigma(A)$, that is, its multiplicity as a root of the characteristic polynomial. We then have
$$\sum_{\lambda \in \sigma(A)} m(\lambda) = n$$
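As a quick numerical check of Theorem 1174, NumPy's legacy `poly` helper returns the coefficients of the monic characteristic polynomial of a square matrix; a sketch on the matrix of Example 1168:

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 3.0]])

# Coefficients, highest degree first: here t^2 - 4t - 1,
# i.e., t^2 - (tr A) t + det A as in (25.3).
coeffs = np.poly(A)
print(coeffs)                                      # [ 1., -4., -1.]
assert np.isclose(coeffs[1], -np.trace(A))         # alpha_{n-1} = -tr A
assert np.isclose(coeffs[-1], (-1)**2 * np.linalg.det(A))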

This remark paves the way to the following result.

Proposition 1175 Let $A$ be a symmetric matrix of order $n$. We have
$$\dim W_\lambda = m(\lambda)$$
for each $\lambda \in \sigma(A)$.

We omit the proof of this result. To understand its scope, we need the following important orthogonalization result, whose proof introduces a classic orthogonalization procedure based on the Projection Theorem.2

Proposition 1176 Let $S$ be a set of $k$ linearly independent vectors of $\mathbb{R}^n$. There exists a set $\tilde{S}$ of $k$ orthonormal vectors of $\mathbb{R}^n$ such that $\operatorname{span} S = \operatorname{span} \tilde{S}$.

So, we can always generate a vector subspace through an orthonormal basis. In particular, Proposition 1175 then implies that there exists an orthonormal basis of $\mathbb{R}^n$ formed by eigenvectors of a symmetric matrix.3 Such a basis is called an eigenbasis.

Example 1177 For the symmetric matrix
$$A = \begin{bmatrix} 0 & 2 & 2 \\ 2 & 0 & 1 \\ 2 & 1 & 0 \end{bmatrix}$$
we have
$$\sigma(A) = \left\{ -1, \ \frac{1}{2} - \frac{1}{2}\sqrt{33}, \ \frac{1}{2} + \frac{1}{2}\sqrt{33} \right\}$$
The eigenvectors
$$(0, -1, 1), \qquad \left( -\frac{1}{4} - \frac{1}{4}\sqrt{33}, 1, 1 \right), \qquad \left( -\frac{1}{4} + \frac{1}{4}\sqrt{33}, 1, 1 \right)$$
form a basis of $\mathbb{R}^3$ that, once normalized (by dividing each vector by its norm), becomes an eigenbasis. N

2 Recall from Section 4.2.2 that an orthogonal set composed of unit vectors is called orthonormal.
3 Eigenvectors can always be normalized: if $x$ is an eigenvector, so is $x / \|x\|$.
Example 1178 Consider the symmetric matrix
$$A = \begin{bmatrix} \frac{7}{3} & -\frac{2\sqrt{2}}{3} & 0 \\ -\frac{2\sqrt{2}}{3} & \frac{5}{3} & 0 \\ 0 & 0 & 3 \end{bmatrix}$$
We have $\sigma(A) = \{1, 3\}$ with $m(1) = 1$ and $m(3) = 2$. The normalized eigenvector associated to the eigenvalue 1 is
$$\left( \frac{1}{\sqrt{3}}, \sqrt{\frac{2}{3}}, 0 \right)$$
Normalized eigenvectors associated to the eigenvalue 3 are
$$(0, 0, 1) \qquad \text{and} \qquad \left( -\sqrt{\frac{2}{3}}, \frac{1}{\sqrt{3}}, 0 \right)$$
So,
$$W_1 = \left\{ \alpha \left( \frac{1}{\sqrt{3}}, \sqrt{\frac{2}{3}}, 0 \right) : \alpha \in \mathbb{R} \right\}$$
and
$$W_3 = \left\{ \alpha (0, 0, 1) + \beta \left( -\sqrt{\frac{2}{3}}, \frac{1}{\sqrt{3}}, 0 \right) : \alpha, \beta \in \mathbb{R} \right\}$$
We have $\dim W_1 = 1$ and $\dim W_3 = 2$. The vectors
$$\left( \frac{1}{\sqrt{3}}, \sqrt{\frac{2}{3}}, 0 \right), \qquad (0, 0, 1), \qquad \left( -\sqrt{\frac{2}{3}}, \frac{1}{\sqrt{3}}, 0 \right)$$
form an eigenbasis of $\mathbb{R}^3$. N

Proof of Proposition 1176 Let $S = \{x_1, \dots, x_k\}$ be a set of $k$ linearly independent vectors of $\mathbb{R}^n$. We can turn $S$ into an orthonormal basis of $\operatorname{span} S$ via the so-called Gram-Schmidt orthonormalization. Define a family $\tilde{S} = \{\tilde{x}_1, \dots, \tilde{x}_k\}$ of vectors as follows. Consider first the pair $x_1, x_2$ and define
$$\tilde{x}_1 = \frac{x_1}{\|x_1\|}$$
To define $\tilde{x}_2$, first consider the auxiliary vector $y_2 = x_2 - (x_2 \cdot \tilde{x}_1)\tilde{x}_1$ and then set
$$\tilde{x}_2 = \frac{y_2}{\|y_2\|}$$
Clearly, $\tilde{x}_1, \tilde{x}_2 \in \operatorname{span}\{x_1, x_2\}$, so $\operatorname{span}\{\tilde{x}_1, \tilde{x}_2\} \subseteq \operatorname{span}\{x_1, x_2\}$. Note that, since $\tilde{x}_1 \cdot \tilde{x}_1 = \|\tilde{x}_1\|^2 = 1$, we have
$$y_2 = x_2 - P_{\operatorname{span}\{\tilde{x}_1\}}(x_2)$$
That is, we defined $y_2$ by subtracting from $x_2$ its projection on the vector subspace generated by $\tilde{x}_1$ (cf. Example 1129). By the Projection Theorem, we then have $\tilde{x}_1 \perp y_2$ and so $\tilde{x}_1 \perp \tilde{x}_2$. This can be easily checked directly:
$$\tilde{x}_2 \cdot \tilde{x}_1 = \frac{x_2 - (x_2 \cdot \tilde{x}_1)\tilde{x}_1}{\|y_2\|} \cdot \tilde{x}_1 = \frac{1}{\|y_2\|}\left[\tilde{x}_1 \cdot x_2 - (x_2 \cdot \tilde{x}_1)(\tilde{x}_1 \cdot \tilde{x}_1)\right] = \frac{1}{\|y_2\|}\left[\tilde{x}_1 \cdot x_2 - (x_2 \cdot \tilde{x}_1)\right] = 0$$
We also have $y_2 \neq 0$; otherwise
$$x_2 = \frac{(x_2 \cdot x_1)}{\|x_1\|^2} x_1$$
and so $x_2$ and $x_1$ would be linearly dependent. Clearly, $x_1 \in \operatorname{span}\{\tilde{x}_1, \tilde{x}_2\}$. On the other hand, $x_2 = y_2 + (x_2 \cdot \tilde{x}_1)\tilde{x}_1 = \|y_2\| \tilde{x}_2 + (x_2 \cdot \tilde{x}_1)\tilde{x}_1$, so $x_2 \in \operatorname{span}\{\tilde{x}_1, \tilde{x}_2\}$. Thus, $\operatorname{span}\{x_1, x_2\} \subseteq \operatorname{span}\{\tilde{x}_1, \tilde{x}_2\}$. We conclude that $\operatorname{span}\{x_1, x_2\} = \operatorname{span}\{\tilde{x}_1, \tilde{x}_2\}$.
We can continue by induction: at each step we define the auxiliary vector $y_k = x_k - \sum_{j=1}^{k-1} (x_k \cdot \tilde{x}_j)\tilde{x}_j$, which is non-zero because of the linear independence of the vectors in $S$, and then set $\tilde{x}_k = y_k / \|y_k\|$. One can then prove that the collection $\tilde{S} = \{\tilde{x}_1, \dots, \tilde{x}_k\}$ so constructed is such that $\operatorname{span} S = \operatorname{span} \tilde{S}$.
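A compact implementation of the Gram-Schmidt procedure used in this proof (a sketch; for serious numerical work one would rather use a QR factorization such as `np.linalg.qr`):

import numpy as np

def gram_schmidt(S):
    """Orthonormalize the linearly independent rows of S."""
    basis = []
    for x in np.asarray(S, dtype=float):
        # Subtract the projection of x on the span of the vectors found so far.
        y = x - sum((x @ e) * e for e in basis)
        basis.append(y / np.linalg.norm(y))   # y != 0 by linear independence
    return np.array(basis)

S = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
Q = gram_schmidt(S)
print(np.round(Q @ Q.T, 10))   # identity: the rows are orthonormal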

If a symmetric matrix is invertible, then its inverse matrix is symmetric as well. Indeed, $I = I^T = (A^{-1}A)^T = A^T (A^{-1})^T = A (A^{-1})^T$, and so $(A^{-1})^T = A^{-1}$. This motivates the following elegant result that shows that the eigenvalues of the inverse matrix are the reciprocals of the eigenvalues of the original matrix.

Proposition 1179 A symmetric matrix $A$ is invertible if and only if all its eigenvalues are non-zero. In this case,
$$\lambda \in \sigma(A) \iff \frac{1}{\lambda} \in \sigma(A^{-1}) \tag{25.4}$$

Proof To prove the equivalence, just note that $p_A(0) = \det(-A) = (-1)^n \det A$, so $0 \in \sigma(A)$ if and only if $\det A = 0$. It remains to prove (25.4). Let $A$ be invertible and $\lambda \in \sigma(A)$. By what was just established, we have $\lambda \neq 0$. Let $0 \neq x \in \mathbb{R}^n$ be an eigenvector associated with $\lambda$. Then, $Ax = \lambda x$ implies
$$x = A^{-1}(Ax) = A^{-1}(\lambda x) = \lambda A^{-1}x$$
which, in turn, implies
$$A^{-1}x = \frac{1}{\lambda} x$$
So, $1/\lambda \in \sigma(A^{-1})$. We conclude that $\lambda \in \sigma(A)$ implies $1/\lambda \in \sigma(A^{-1})$. Since $A = (A^{-1})^{-1}$, the converse implication is also true.

25.1.2 Diagonalization

We are now ready to move towards the result that motivates, for us, the study of eigenvalues.

Definition 1180 A square matrix $B$ is said to be orthogonal if $B^T B = I$.

Orthogonal matrices generalize a key feature of identity matrices, namely that they have orthonormal rows as well as orthonormal columns (given by the orthonormal set $e^1, \dots, e^n$). The next proposition clarifies.

Proposition 1181 For a square matrix $B$, the following conditions are equivalent:

(i) $B$ is orthogonal;

(ii) $B$ has orthonormal rows;

(iii) $B$ has orthonormal columns;

(iv) $B$ is invertible, with $B^{-1} = B^T$.

The proof relies on the following lemma.

Lemma 1182 A square matrix $B$ has:

(i) orthonormal rows if and only if $BB^T = I$;

(ii) orthonormal columns if and only if $B^T B = I$.

Proof We prove (i) and leave (ii) to readers. Denote by $b_1, \dots, b_n$ the rows of $B$. First note that, by definition,
$$\left(BB^T\right)_{ij} = \sum_{k=1}^n b_{ik} b_{jk} = \begin{cases} \|b_i\|^2 & \text{if } i = j \\ b_i \cdot b_j & \text{if } i \neq j \end{cases}$$
"Only if". Suppose that the rows $b_1, \dots, b_n$ of $B$ are orthonormal, i.e., $b_i \cdot b_j = 0$ for all $1 \leq i \neq j \leq n$ and $\|b_i\|^2 = 1$ for all $i = 1, \dots, n$. Then
$$\left(BB^T\right)_{ij} = \begin{cases} \|b_i\|^2 = 1 & \text{if } i = j \\ b_i \cdot b_j = 0 & \text{if } i \neq j \end{cases}$$
that is, $BB^T = I$.
"If". Suppose that $BB^T = I$. Then,
$$\begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} = \left(BB^T\right)_{ij} = \begin{cases} \|b_i\|^2 & \text{if } i = j \\ b_i \cdot b_j & \text{if } i \neq j \end{cases}$$
and so $b_i \cdot b_j = 0$ for all $1 \leq i \neq j \leq n$, as well as $\|b_i\|^2 = \sum_{k=1}^n b_{ik}^2 = 1$ for all $i$.
Proof of Proposition 1181 Before starting, recall by Binet's Theorem that $\det(B^T B) = \det B^T \det B$ and $\det(BB^T) = \det B \det B^T$. This implies that
$$\det(B^T B) \neq 0 \iff \det B \neq 0 \text{ and } \det B^T \neq 0 \iff \det(BB^T) \neq 0 \tag{25.5}$$

(i) implies (iv). Since $B$ is orthogonal, we have that $B^T B = I$. By (25.5), we can conclude that $\det(B^T B) = 1$ and $\det B \neq 0$. By Theorem 724, $B$ is invertible. Moreover,
$$B^T B = I \implies (B^T B)B^{-1} = B^{-1} \implies B^T I = B^{-1} \implies B^T = B^{-1}$$
(iv) implies (ii). Since $B^T = B^{-1}$, it follows that $BB^T = BB^{-1} = I$. By point (i) of Lemma 1182, $B$ has orthonormal rows.
(ii) implies (iii). By point (i) of Lemma 1182 and since $B$ has orthonormal rows, we have that $BB^T = I$. By (25.5) and since $BB^T = I$, we can conclude that $\det(BB^T) = 1$ and $\det B \neq 0$. By Theorem 724, $B$ is invertible. Moreover,
$$BB^T = I \implies B^{-1}(BB^T) = B^{-1} \implies IB^T = B^{-1} \implies B^T = B^{-1}$$
Since $B^T = B^{-1}$, this implies that $B^T B = B^{-1}B = I$. By point (ii) of Lemma 1182, $B$ has orthonormal columns.
(iii) implies (i). By point (ii) of Lemma 1182 and since $B$ has orthonormal columns, we have that $B^T B = I$, that is, $B$ is orthogonal.

Example 1183 In view of Example 117, the vectors
$$\left( \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}} \right), \qquad \left( \frac{2}{\sqrt{6}}, -\frac{1}{\sqrt{6}}, -\frac{1}{\sqrt{6}} \right), \qquad \left( 0, \frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}} \right)$$
are orthonormal. So, by Proposition 1181 the matrix
$$A = \begin{bmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ \frac{2}{\sqrt{6}} & -\frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{6}} \\ 0 & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix} \tag{25.6}$$
is orthogonal. Its determinant is 1. Indeed, the determinant of any orthogonal matrix is either $1$ or $-1$: by Binet's Theorem
$$1 = \det(B^T B) = \det B^T \det B = (\det B)^2$$
and so $\det B = \pm 1$. N

Example 1184 In view of Proposition 1172, a matrix of order $n$ whose rows are normalized eigenvectors associated to $n$ distinct eigenvalues is orthogonal. For instance, consider the symmetric matrix
$$A = \begin{bmatrix} \frac{13}{5} & \frac{4}{5} \\ \frac{4}{5} & \frac{7}{5} \end{bmatrix}$$
We have the eigenpairs $(\lambda_1, x^1) = (1, (-1/2, 1))$ and $(\lambda_2, x^2) = (3, (2, 1))$. The normalized eigenvectors are
$$\frac{x^1}{\|x^1\|} = \left( -\frac{1}{\sqrt{5}}, \frac{2}{\sqrt{5}} \right), \qquad \frac{x^2}{\|x^2\|} = \left( \frac{2}{\sqrt{5}}, \frac{1}{\sqrt{5}} \right)$$
So, the matrix
$$B = \begin{bmatrix} -\frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{bmatrix} \tag{25.7}$$
is orthogonal. Its determinant is $-1$. Note that this orthogonal matrix is symmetric. N
We can now state the main result of this section.

Theorem 1185 A symmetric matrix $A$ is orthogonally diagonalizable, that is, there exists an orthogonal matrix $B$ such that
$$B^T A B = \Lambda$$
where $\Lambda$ is the diagonal matrix that has the eigenvalues of $A$ as its entries, each repeated according to its multiplicity.

Since $A = BB^T A BB^T = B \Lambda B^T$, the diagonalization implies that a symmetric matrix can be decomposed as:
$$A = B \Lambda B^T$$
which is a most convenient spectral decomposition of a symmetric matrix. Note that, by Binet's Theorem,
$$\det A = (\det B)^2 \det \Lambda = \det \Lambda = \lambda_1 \lambda_2 \cdots \lambda_n$$
because $\det B = \pm 1$ (see Example 1183). So, the determinant of a symmetric matrix is the product of its eigenvalues (with their multiplicities):
$$\det A = \lambda_1 \lambda_2 \cdots \lambda_n \tag{25.8}$$
With a radiation metaphor, we can think of the spectrum of the matrix $A$ as its "X-ray" plate: if we radiated $A$ we would observe its skeleton $\Lambda$. The orthogonal matrix $B$ consists of the "soft tissues" of the matrix $A$ that let X-rays pass through.
The next proof clarifies how to construct the orthogonal matrix $B$ through the eigenvectors. So, the diagonalization is actually a joint outcome of the eigenpairs.
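A numerical sketch of the spectral decomposition of Theorem 1185, on the matrix of Example 1184:

import numpy as np

A = np.array([[13/5, 4/5],
              [4/5,  7/5]])

lam, B = np.linalg.eigh(A)      # columns of B are orthonormal eigenvectors
Lam = np.diag(lam)

# B^T A B = Lambda and A = B Lambda B^T, as in Theorem 1185.
assert np.allclose(B.T @ A @ B, Lam)
assert np.allclose(A, B @ Lam @ B.T)
print(lam)                               # [1., 3.]
print(np.prod(lam), np.linalg.det(A))    # (25.8): both equal 3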

Proof Let $A$ be a symmetric matrix of order $n$. Assume first that it has $n$ distinct eigenvalues and denote by $x^i$ a normalized eigenvector associated to the eigenvalue $\lambda_i$, for $i = 1, \dots, n$. Let $B$ be the matrix whose rows are these normalized eigenvectors:
$$B = \begin{bmatrix} x^1 \\ x^2 \\ \vdots \\ x^n \end{bmatrix}$$
We have
$$x^i \cdot Ax^i = x^i \cdot \lambda_i x^i = \lambda_i (x^i \cdot x^i) = \lambda_i \|x^i\|^2 = \lambda_i$$
where the last equality holds because $x^i$ is normalized, i.e., $\|x^i\| = 1$. Moreover, for $i \neq j$ we have
$$x^i \cdot Ax^j = x^i \cdot \lambda_j x^j = \lambda_j (x^i \cdot x^j) = 0$$
where the last equality holds because $x^i \perp x^j$ when $i \neq j$. Since $(BAB^T)_{ij} = x^i \cdot Ax^j$, in turn this implies
$$BAB^T = \begin{bmatrix} x^1 \cdot Ax^1 & x^1 \cdot Ax^2 & \cdots & x^1 \cdot Ax^n \\ x^2 \cdot Ax^1 & x^2 \cdot Ax^2 & \cdots & x^2 \cdot Ax^n \\ \vdots & \vdots & \ddots & \vdots \\ x^n \cdot Ax^1 & x^n \cdot Ax^2 & \cdots & x^n \cdot Ax^n \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} = \Lambda$$
so that the orthogonal matrix $B^T$, whose columns are the normalized eigenvectors, satisfies $(B^T)^T A B^T = \Lambda$, as desired. Finally, if some eigenvalues are repeated, one repeats accordingly the associated eigenvectors, choosing orthonormal eigenvectors within each eigenspace (which is possible by Propositions 1175 and 1176).

The proof shows that the role of the orthogonal matrix in the diagonalization can be played by the matrix whose columns are normalized eigenvectors associated to distinct eigenvalues. The next examples illustrate this (and show that other orthogonal matrices can also be considered).

Example 1186 As in Example 1184, consider the symmetric matrix
$$A = \begin{bmatrix} \frac{13}{5} & \frac{4}{5} \\ \frac{4}{5} & \frac{7}{5} \end{bmatrix}$$
We have $\sigma(A) = \{1, 3\}$ with $m(1) = m(3) = 1$. By (25.8), $\det A = 1 \cdot 3 = 3$. The matrix (25.7) of normalized eigenvectors is
$$B = \begin{bmatrix} -\frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{bmatrix}$$
Through it, we get the orthogonal diagonalization of the matrix $A$:
$$A = \begin{bmatrix} -\frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} -\frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{bmatrix}$$
That said, a slightly different orthogonal diagonalization of $A$ is
$$A = \begin{bmatrix} \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \\ -\frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{bmatrix}$$
where now the orthogonal matrix is
$$B = \begin{bmatrix} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{bmatrix}$$
N
Example 1187 Consider the symmetric matrix
$$A = \begin{bmatrix} \frac{7}{3} & -\frac{2\sqrt{2}}{3} & 0 \\ -\frac{2\sqrt{2}}{3} & \frac{5}{3} & 0 \\ 0 & 0 & 3 \end{bmatrix}$$
We have $\sigma(A) = \{1, 3\}$ with $m(1) = 1$ and $m(3) = 2$. By (25.8), $\det A = 1 \cdot 3 \cdot 3 = 9$. In view of Example 1178,
$$B = \begin{bmatrix} \frac{1}{\sqrt{3}} & 0 & -\sqrt{\frac{2}{3}} \\ \sqrt{\frac{2}{3}} & 0 & \frac{1}{\sqrt{3}} \\ 0 & 1 & 0 \end{bmatrix}$$
is a matrix whose columns are normalized eigenvectors. Through it, we get the orthogonal diagonalization of matrix $A$:
$$A = \begin{bmatrix} \frac{1}{\sqrt{3}} & 0 & -\sqrt{\frac{2}{3}} \\ \sqrt{\frac{2}{3}} & 0 & \frac{1}{\sqrt{3}} \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{3}} & \sqrt{\frac{2}{3}} & 0 \\ 0 & 0 & 1 \\ -\sqrt{\frac{2}{3}} & \frac{1}{\sqrt{3}} & 0 \end{bmatrix}$$
A different orthogonal diagonalization of matrix $A$ is:
$$A = \begin{bmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ \frac{2}{\sqrt{6}} & -\frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{6}} \\ 0 & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} & 0 \\ \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{2}} \end{bmatrix}$$
where the orthogonal matrix (25.6) appears. N

We close by noting that eigenvalues and eigenvectors can be defined for any square matrix, not necessarily symmetric. Yet, in this general case eigenvalues have to be allowed to take complex values, so we do not discuss this important topic, which we leave to more advanced courses.
25.2 Forms

25.2.1 Forms

A function $f : \mathbb{R}^n \to \mathbb{R}$ of the form
$$f(x_1, \dots, x_n) = k \, x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n}$$
with $k \in \mathbb{R}$ and $\alpha_i \in \mathbb{N}$, is called a monomial of degree $m$ if $\sum_{i=1}^n \alpha_i = m$. For example, $f(x_1, x_2) = 2x_1 x_2$ is a monomial of second degree, while $f(x_1, x_2, x_3) = 5x_1 x_2^3 x_3^4$ is a monomial of eighth degree.

Definition 1188 A function $f : \mathbb{R}^n \to \mathbb{R}$ is a form if it is a sum of monomials of the same degree.

For instance, the functions $f(x_1, x_2, x_3) = x_1^8 + 5x_1 x_2^3 x_3^4$ and $f(x_1, x_2, x_3) = x_1 x_3 + 5x_2 x_3 + x_3^2$ are forms, while the function $f(x_1, x_2, x_3) = x_1 x_2 x_3 + x_1 x_2^5 x_3$ is not a form.
A form is linear if it is a sum of monomials of first degree, which we can write as $f(x) = \sum_{i=1}^n k_i x_i$. By Riesz's Theorem, linear forms are thus the linear functions.
A form is quadratic if it is a sum of monomials of second degree. For example, $f(x_1, x_2, x_3) = 3x_1 x_3 - x_2 x_3$ is a quadratic form because it is the sum of the monomials of second degree $3x_1 x_3$ and $-x_2 x_3$. It is easy to see that the following functions are quadratic forms:
$$f(x) = x^2$$
$$f(x_1, x_2) = x_1^2 + x_2^2 - 4x_1 x_2$$
$$f(x_1, x_2, x_3, x_4) = x_1 x_4 - 2x_1^2 + 3x_2 x_3$$
Quadratic forms are the most important nonlinear forms. In what follows we study them in detail.

25.2.2 Quadratic forms

There is a bijective correspondence between quadratic forms and symmetric matrices, as the next result shows.

Proposition 1189 There is a bijective correspondence between quadratic forms $f : \mathbb{R}^n \to \mathbb{R}$ and symmetric matrices $A$ of order $n$ established by:4
$$f(x) = x \cdot Ax = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j \qquad \forall x \in \mathbb{R}^n \tag{25.9}$$

In other words, given a symmetric matrix $A$ there exists a unique quadratic form $f : \mathbb{R}^n \to \mathbb{R}$ for which (25.9) holds. Vice versa, given a quadratic form $f : \mathbb{R}^n \to \mathbb{R}$ there exists a unique symmetric matrix $A$ for which (25.9) holds.

Proof For the "if", just note that $x \cdot Ax = \sum_{i=1}^n a_{ii} x_i^2 + 2 \sum_{1 \leq i < j \leq n} a_{ij} x_i x_j$, so $f(x) = x \cdot Ax$ is a quadratic form. As to the converse, let $f : \mathbb{R}^n \to \mathbb{R}$ be a quadratic form. It is easy to see that $f(x) = x \cdot Ax$ where $a_{ii}$ corresponds to the coefficient of $x_i^2$ and $a_{ij} + a_{ji}$ corresponds to the coefficient of $x_i x_j$. In particular, $A$ is symmetric if and only if $a_{ij}$ and $a_{ji}$ are both equal to half of the coefficient of $x_i x_j$. So, there is a unique symmetric matrix $A$ for which we have $f(x) = x \cdot Ax$.

4 To ease notation we write $x \cdot Ax$ instead of the more precise $xAx^T$ (cf. the discussion on vector notation in Section 15.2.4). So, we drop all the "T".

The matrix $A = (a_{ij})$ is called the symmetric matrix associated to the quadratic form $f$. We can write (25.9) in an extended manner as
$$f(x) = a_{11} x_1^2 + a_{22} x_2^2 + a_{33} x_3^2 + \dots + a_{nn} x_n^2 + 2a_{12} x_1 x_2 + 2a_{13} x_1 x_3 + \dots + 2a_{1n} x_1 x_n + 2a_{23} x_2 x_3 + \dots + 2a_{2n} x_2 x_n + \dots + 2a_{n-1,n} x_{n-1} x_n$$
The coefficients of the squares $x_1^2, x_2^2, \dots, x_n^2$ are therefore the elements $a_{11}, a_{22}, \dots, a_{nn}$ of the diagonal of $A$, while for every $i \neq j$ the coefficient of the monomial $x_i x_j$ is $2a_{ij}$. It is therefore simple to move from the symmetric matrix to the quadratic form and vice versa. Let us see some examples.

Example 1190 The symmetric matrix associated to the quadratic form $f(x_1, x_2, x_3) = 3x_1 x_3 - x_2 x_3$ is given by
$$A = \begin{bmatrix} 0 & 0 & \frac{3}{2} \\ 0 & 0 & -\frac{1}{2} \\ \frac{3}{2} & -\frac{1}{2} & 0 \end{bmatrix}$$
Indeed, for every $x \in \mathbb{R}^3$ we have:
$$x \cdot Ax = (x_1, x_2, x_3) \cdot \left( \frac{3}{2} x_3, -\frac{1}{2} x_3, \frac{3}{2} x_1 - \frac{1}{2} x_2 \right) = \frac{3}{2} x_1 x_3 - \frac{1}{2} x_2 x_3 + \frac{3}{2} x_1 x_3 - \frac{1}{2} x_2 x_3 = 3x_1 x_3 - x_2 x_3$$
Note that also the matrices
$$A = \begin{bmatrix} 0 & 0 & 3 \\ 0 & 0 & -1 \\ 0 & 0 & 0 \end{bmatrix} \qquad \text{and} \qquad A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 3 & -1 & 0 \end{bmatrix} \tag{25.10}$$
are such that $f(x) = x \cdot Ax$, although they are not symmetric. What we lose without symmetry is the bijective correspondence between quadratic forms and matrices. Indeed, while given the quadratic form $f(x_1, x_2, x_3) = 3x_1 x_3 - x_2 x_3$ there exists a unique symmetric matrix for which (25.9) holds, this is no longer true if we do not require the symmetry of the matrix, as the two matrices in (25.10) show: for both of them, (25.9) holds. N
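Numerically, the unique symmetric matrix of Proposition 1189 can be obtained from any matrix $C$ representing the form as $(C + C^T)/2$, exactly as in Example 1190. A sketch (the helper names are ours):

import numpy as np

def quad_form(A, x):
    """Evaluate f(x) = x . Ax."""
    x = np.asarray(x, dtype=float)
    return float(x @ A @ x)

# f(x1,x2,x3) = 3 x1 x3 - x2 x3 via a non-symmetric matrix...
C = np.array([[0.0, 0.0, 3.0],
              [0.0, 0.0, -1.0],
              [0.0, 0.0, 0.0]])
A = (C + C.T) / 2      # ...and its unique symmetric representative
x = np.array([1.0, 2.0, 3.0])
print(quad_form(C, x), quad_form(A, x))   # same value: 3*1*3 - 2*3 = 3.0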
Example 1191 As to the quadratic form $f(x_1, x_2) = x_1^2 + x_2^2 - 4x_1 x_2$, we have
$$A = \begin{bmatrix} 1 & -2 \\ -2 & 1 \end{bmatrix}$$
Indeed, for every $x \in \mathbb{R}^2$ we have
$$x \cdot Ax = (x_1, x_2) \cdot (x_1 - 2x_2, -2x_1 + x_2) = x_1^2 - 2x_1 x_2 - 2x_1 x_2 + x_2^2 = x_1^2 + x_2^2 - 4x_1 x_2$$
N

Example 1192 Let $f : \mathbb{R}^n \to \mathbb{R}$ be defined by $f(x) = \|x\|^2 = \sum_{i=1}^n x_i^2$. The symmetric matrix associated to this quadratic form is the identity matrix $I$. Indeed, $x \cdot Ix = x \cdot x = \sum_{i=1}^n x_i^2$. More generally, let $f(x) = \sum_{i=1}^n \alpha_i x_i^2$ with $\alpha_i \in \mathbb{R}$ for every $i = 1, \dots, n$. It is easy to see that the symmetric matrix associated to $f$ is the diagonal matrix
$$\begin{bmatrix} \alpha_1 & 0 & \cdots & 0 \\ 0 & \alpha_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \alpha_n \end{bmatrix}$$
N

Observe that if $f : \mathbb{R}^n \to \mathbb{R}$ is a quadratic form, we have $f(0) = 0$. According to the sign of $f$, it is possible to classify quadratic forms as follows.

Definition 1193 A quadratic form $f : \mathbb{R}^n \to \mathbb{R}$ is said to be:

(i) positive (negative) semi-definite if $f(x) \geq 0$ ($\leq 0$) for all $x \in \mathbb{R}^n$;

(ii) positive (negative) definite if $f(x) > 0$ ($< 0$) for all $0 \neq x \in \mathbb{R}^n$;

(iii) indefinite if there exist $x, x' \in \mathbb{R}^n$ such that $f(x) < 0 < f(x')$.

Note a basic duality: $f$ is negative definite (semi-definite) if and only if $-f$ is positive definite (semi-definite). So, properties established for the positive cases have dual versions for the negative ones. For this reason, only the positive case is often explicitly considered.
In view of Proposition 1189, we have a parallel classification for symmetric matrices that translates that of the corresponding quadratic form. In particular, a symmetric matrix $A$ of order $n$ is:

(i) positive (negative) semi-definite if $x \cdot Ax \geq 0$ ($\leq 0$) for all $x \in \mathbb{R}^n$;

(ii) positive (negative) definite if $x \cdot Ax > 0$ ($< 0$) for all $0 \neq x \in \mathbb{R}^n$.


Here the previous duality takes the following form: a symmetric matrix $A$ is negative definite (semi-definite) if and only if $-A$ is positive definite (semi-definite). So, to check whether a matrix $A$ is negative definite (semi-definite) one can check whether $-A$ is positive definite (semi-definite). Criteria that establish whether a symmetric matrix is positive definite (semi-definite) thus have dual versions for the negative case.

Example 1194 Consider the symmetric matrix
$$A = \begin{bmatrix} 2 & 1 \\ 1 & \frac{1}{2} \end{bmatrix}$$
We have $x \cdot Ax = 2(x_1 + x_2/2)^2$, so $A$ is positive semi-definite but not positive definite. For instance, for $x = (1, -2)$ we have $x \cdot Ax = 0$. N

Next we give a first important property of positive definite matrices.

Proposition 1195 A positive definite matrix is invertible. Its inverse matrix is also positive definite.

Proof Let $A$ be a positive definite matrix. Suppose, by contradiction, that it is not invertible. Then, there exists $0 \neq x \in \mathbb{R}^n$ such that $Ax = 0$. In turn, this implies $x \cdot Ax = 0$, which contradicts the positive definiteness of $A$.
It remains to prove that $A^{-1}$ is a positive definite matrix. Let $0 \neq x \in \mathbb{R}^n$. Set $y = A^{-1}x$, so that $0 \neq y \in \mathbb{R}^n$. We then have
$$x \cdot A^{-1}x = x \cdot y = \sum_{i=1}^n x_i y_i = \sum_{i=1}^n \left( \sum_{j=1}^n a_{ij} y_j \right) y_i = \sum_{i=1}^n \sum_{j=1}^n a_{ij} y_j y_i = y \cdot Ay > 0$$
as desired.

The previous result can be sharpened as follows.

Proposition 1196 A positive semi-definite matrix is invertible if and only if it is positive definite.

Positive definite matrices can thus be regarded as the positive semi-definite matrices that are invertible. So, the positive semi-definite matrices that are not positive definite are the singular ones (cf. Example 1194).
The proof of this remarkable property relies on the following lemma, of independent interest.

Lemma 1197 Let $A$ be a positive semi-definite matrix. For each $x \in \mathbb{R}^n$, we have $x \cdot Ax = 0$ if and only if $Ax = 0$.

When $A$ is positive semi-definite, the homogeneous linear system $Ax = 0$ is thus equivalent to the quadratic equation $x \cdot Ax = 0$.

Proof We prove the "only if", the converse being trivial. Let $A$ be a positive semi-definite matrix and let $x \in \mathbb{R}^n$ be such that $x \cdot Ax = 0$. Define an auxiliary map $p : \mathbb{R} \to \mathbb{R}$ by $p(t) = (tx + Ax) \cdot A(tx + Ax)$.5 Some algebra shows that
$$p(t) = t^2 x \cdot Ax + 2t (Ax \cdot Ax) + x \cdot (AAA)x = 2t (Ax \cdot Ax) + x \cdot (AAA)x$$
Since $A$ is positive semi-definite, we have $p(t) \geq 0$ for all $t \in \mathbb{R}$. So, $Ax \cdot Ax = 0$ because, otherwise, we would have $p(t) < 0$ for some $t < 0$. In turn, from $Ax \cdot Ax = 0$ it follows that $Ax = 0$, as desired.

Proof of Proposition 1196 We prove the "only if" as the converse is given by Proposition 1195. Let $A$ be a positive semi-definite matrix which is invertible. Let $x \in \mathbb{R}^n$ be such that $x \cdot Ax = 0$. By the last lemma, $Ax = 0$. Since $A$ is invertible, we have $x = 0$. So, $A$ is positive definite.

Since positive definite matrices are invertible, their eigenvalues are non-zero (Proposition 1179). Much more is true: next we show that eigenvalues provide a key characterization of positive definite matrices.

Proposition 1198 A symmetric matrix is:

(i) positive definite if and only if all its eigenvalues are strictly positive;

(ii) positive semi-definite if and only if all its eigenvalues are positive.

The positivity of eigenvalues is thus what characterizes positive semi-definite matrices among symmetric matrices, as well as positive definite matrices among symmetric invertible matrices (cf. Proposition 1179). The proof of this spectral characterization relies on the orthogonal diagonalization of symmetric matrices previously established in this chapter.

Proof We only prove (i) because the proof of (ii) is similar.6 "Only if". Suppose the symmetric matrix $A$ is positive definite. Let $\lambda \in \sigma(A)$. From the proof of Theorem 1185 we know that $\lambda = xAx^T$, where $x$ is a normalized eigenvector associated to the eigenvalue $\lambda$. Since $x \neq 0$, we have
$$\lambda = xAx^T > 0$$
as desired. "If". Suppose that $\lambda > 0$ for all $\lambda \in \sigma(A)$. By Theorem 1185, there is an orthogonal matrix $B$ such that $B^T A B = \Lambda$. Let $0 \neq x \in \mathbb{R}^n$. Set $y^T = B^T x^T$. Since $B$ is invertible, $y \neq 0$. Since $B$ is orthogonal, $x^T = By^T$. Then,
$$xAx^T = (By^T)^T A (By^T) = y (B^T A B) y^T = y \Lambda y^T = \sum_{i=1}^n \lambda_i y_i^2 > 0$$
as desired.

Next we give a first dividend of the spectral characterization.

5 This map generalizes the one we have seen in proving the Cauchy-Schwarz inequality. Here, we follow Horn and Johnson (2013), p. 431.
6 Here, we use the formal notation $xAx^T$ rather than its shorthand version $x \cdot Ax$.
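Proposition 1198 also suggests the standard numerical test of definiteness: inspect the signs of the spectrum. A sketch (the tolerance and the helper name are our choices):

import numpy as np

def classify(A, tol=1e-12):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    lam = np.linalg.eigvalsh(A)
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam >= -tol):
        return "positive semi-definite"
    if np.all(lam < -tol):
        return "negative definite"
    if np.all(lam <= tol):
        return "negative semi-definite"
    return "indefinite"

print(classify(np.array([[2.0, 1.0], [1.0, 0.5]])))    # positive semi-definite
print(classify(np.array([[1.0, -2.0], [-2.0, 1.0]])))  # indefinite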

Corollary 1199 Let $A$ be a symmetric matrix.

(i) If $A$ is positive definite, then $\det A > 0$.

(ii) If $A$ is positive semi-definite, then $\det A \geq 0$.

We close by showing that positive definite matrices are invertible matrices of a familiar form: they are Gram matrices. It is another remarkable dividend of the orthogonal diagonalization.

Theorem 1200 A symmetric matrix $A$ is positive definite if and only if there exists an invertible matrix $B$ such that $A = B^T B$.

Proof "If". Assume that there exists an invertible matrix $B$ such that $A = B^T B$. Let $0 \neq x \in \mathbb{R}^n$ and set $y = Bx$. Since $B$ is invertible, $y \neq 0$. Then $x \cdot Ax = x \cdot B^T Bx = (Bx) \cdot (Bx) = y \cdot y > 0$, and so $A$ is positive definite.
"Only if". Assume that $A$ is a positive definite matrix. By Theorem 1185, there is an orthogonal matrix $C$ such that $A = C \Lambda C^T$. Then
$$A = (C \Lambda^{\frac{1}{2}})(\Lambda^{\frac{1}{2}} C^T)$$
where $\Lambda^{\frac{1}{2}}$ is the diagonal matrix with entries that are the square roots of the corresponding entries of $\Lambda$ (since $A$ is positive definite, all the diagonal terms of $\Lambda$ are strictly positive). By Binet's Theorem, $\det(C \Lambda^{\frac{1}{2}}) = \det C \det \Lambda^{\frac{1}{2}} = \pm \sqrt{\lambda_1}\sqrt{\lambda_2}\cdots\sqrt{\lambda_n} \neq 0$, so $C \Lambda^{\frac{1}{2}}$ is invertible. Moreover, $(C \Lambda^{\frac{1}{2}})^T = \Lambda^{\frac{1}{2}} C^T$. If we set $B = \Lambda^{\frac{1}{2}} C^T$, we conclude that $A = (C \Lambda^{\frac{1}{2}})(\Lambda^{\frac{1}{2}} C^T) = B^T B$.

Interestingly, inspection of the proof shows that, via the change of variable $y = Bx$, we can write a positive definite quadratic form $f(x) = x \cdot Ax$ as an inner product $y \cdot y$, that is, as a sum of squares. This observation sheds further light on the nature of positive definite quadratic forms (cf. also Brioschi's Theorem below).

O.R. It can be proved that the matrix $B$ in the last theorem can be uniquely chosen to be upper triangular with strictly positive diagonal entries. This is important from a computational viewpoint because triangular matrices are especially easy to handle. So, denote by $L$ the transpose of $B$, which is lower triangular with strictly positive diagonal entries. The "triangular" version of the factorization established in this last result is often written as
$$A = LL^T$$
and is called the Cholesky factorization. If one is able to compute $L$, this factorization may greatly simplify, for example, the computation of the solution of a linear system $Ax = b$ when $A$ is positive definite. Indeed, in this case if one first finds the solution $y^*$ of the system $Ly = b$, then solving the system $Ax = b$ amounts to solving the system $L^T x = y^*$. In the two steps, computations are substantially simplified by the triangular nature of the involved matrices (as readers will learn in more advanced courses). H
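A sketch of this two-step triangular solve in Python (`np.linalg.cholesky` returns the lower factor $L$; `solve_triangular` is from SciPy):

import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])          # positive definite
b = np.array([2.0, 3.0])

L = np.linalg.cholesky(A)           # lower triangular, A = L L^T
y = solve_triangular(L, b, lower=True)     # solve L y = b
x = solve_triangular(L.T, y, lower=False)  # solve L^T x = y
print(x, np.linalg.solve(A, b))     # same solution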
25.2.3 Ubi minor

In some cases it is easy to check the sign of a quadratic form, which is a key operational issue in applications. For example, it is immediate to see that the quadratic form $f(x) = \sum_{i=1}^n \alpha_i x_i^2$ is positive semi-definite if and only if $\alpha_i \geq 0$ for every $i$, while it is positive definite if and only if $\alpha_i > 0$ for every $i$.
In general, however, it is not simple to determine directly the sign of a quadratic form and, therefore, some useful criteria have been elaborated that, typically, involve the associated symmetric matrices.

Proposition 1201 Let $A$ be a symmetric matrix of order $n$.

(i) If $A$ is positive definite, then $a_{ii} > 0$ for all $i = 1, \dots, n$.

(ii) If $A$ is positive semi-definite, then $a_{ii} \geq 0$ for all $i = 1, \dots, n$.

Proof It is enough to observe that $f(e^i) = a_{ii}$ for all $i = 1, \dots, n$.

In the negative cases, the inequalities are reversed. These conditions are only necessary: for instance, the matrix
$$\begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$$
is easily checked not to be positive definite. Yet, as previously remarked, the conditions become sufficient for "diagonal" quadratic forms $f(x) = \sum_{i=1}^n \alpha_i x_i^2$.
This result gives simple necessary "diagonal" conditions that are mostly useful for a preliminary inspection: if these conditions are violated, the matrix cannot be definite or semi-definite. For a deeper analysis, we need more sophisticated criteria. Among them, we will next present the classic Sylvester-Jacobi criterion. To introduce it, we need some terminology about submatrices and minors (Section 15.6.7). Let $A$ be a square matrix of order $n$. We call:

(i) principal submatrices the square submatrices of $A$ obtained by eliminating $k$ rows and columns that have the same indexes (places), with $0 \leq k \leq n - 1$;

(ii) principal minors the determinants of principal submatrices;

(iii) leading principal submatrices the principal submatrices obtained by eliminating the last $k$ rows and columns, with $0 \leq k \leq n - 1$, that is, in symbols
$$A_1 = [a_{11}], \quad A_2 = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad A_3 = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \quad \dots, \quad A_n = A \tag{25.11}$$

(iv) leading principal minors the determinants of leading principal submatrices.

Example 1202 Let
$$A = \begin{bmatrix} 1 & 3 & 2 \\ 10 & 1 & 2 \\ 3 & 5 & 7 \end{bmatrix}$$
Its principal minors are the determinants
$$\det A = -101, \quad \det \begin{bmatrix} 1 & 3 \\ 10 & 1 \end{bmatrix} = -29, \quad \det \begin{bmatrix} 1 & 2 \\ 5 & 7 \end{bmatrix} = -3, \quad \det [1] = 1, \quad \det \begin{bmatrix} 1 & 2 \\ 3 & 7 \end{bmatrix} = 1, \quad \det [7] = 7$$
These principal minors correspond, respectively, to the principal submatrices obtained by eliminating:

0 rows and 0 columns,
1 row and 1 column of index 3 (i.e., the elimination of the last row and column),
1 row and 1 column of index 1 (i.e., the elimination of the first row and column),
2 rows and 2 columns of indexes 2 and 3, as well as 2 rows and 2 columns of indexes 1 and 3 (in both cases, we end up with the square submatrix [1]),
1 row and 1 column of index 2 (i.e., the elimination of the middle row and column),
2 rows and 2 columns of indexes 1 and 2.

Finally, the matrix $A$ has only three leading principal minors:
$$\det A_1 = \det [1] = 1, \quad \det A_2 = \det \begin{bmatrix} 1 & 3 \\ 10 & 1 \end{bmatrix} = -29, \quad \det A_3 = \det A = -101$$

A first use of this terminology is in a remarkable decomposition of a quadratic form into a sum of squares proved by Francesco Brioschi in 1856.7 Here for $k = 1, \dots, n$ we denote by $A_k$ the matrices (25.11) and adopt the convention $\det A_0 = 1$.

Theorem 1203 (Brioschi) Let $A$ be a symmetric matrix of order $n$ with non-zero leading principal minors. Then, there exists an upper triangular matrix $C$ of order $n$ with unitary diagonal entries such that, for all $x \in \mathbb{R}^n$,
$$x \cdot Ax = \sum_{k=1}^n \frac{\det A_k}{\det A_{k-1}} z_k^2 \tag{25.12}$$
where $z = Cx$.

Proof We only prove the case $n = 2$, originally established by Lagrange (a complete proof can be found in Debreu, 1952). The left hand side of (25.12) is
$$x \cdot Ax = a_{11} x_1^2 + a_{22} x_2^2 + 2a_{12} x_1 x_2$$
The matrix $C$ has here the form
$$C = \begin{bmatrix} 1 & c \\ 0 & 1 \end{bmatrix}$$
where $c \in \mathbb{R}$. So, $z = Cx$ implies $(z_1, z_2) = (x_1 + cx_2, x_2)$. The right hand side of (25.12) is then:
$$\frac{\det A_1}{\det A_0} z_1^2 + \frac{\det A_2}{\det A_1} z_2^2 = a_{11} z_1^2 + \frac{a_{11}a_{22} - a_{12}^2}{a_{11}} z_2^2 = a_{11}(x_1 + cx_2)^2 + \frac{a_{11}a_{22} - a_{12}^2}{a_{11}} x_2^2$$
$$= a_{11} x_1^2 + 2a_{11}c x_1 x_2 + a_{11}c^2 x_2^2 + \frac{a_{11}a_{22} - a_{12}^2}{a_{11}} x_2^2 = a_{11} x_1^2 + 2a_{11}c x_1 x_2 + \frac{a_{11}^2 c^2 + a_{11}a_{22} - a_{12}^2}{a_{11}} x_2^2$$
By equating the coefficients of the two sides of (25.12), we have
$$2a_{12} = 2a_{11}c \qquad \text{and} \qquad a_{22} = \frac{a_{11}^2 c^2 + a_{11}a_{22} - a_{12}^2}{a_{11}}$$
By setting $c = a_{12}/a_{11}$ (by hypothesis, $a_{11} = \det A_1 \neq 0$), the coefficients are easily seen to match. So, the matrix
$$C = \begin{bmatrix} 1 & \frac{a_{12}}{a_{11}} \\ 0 & 1 \end{bmatrix}$$
is the sought-after triangular matrix.

7 The cases $n = 2, 3$ were already proved by Lagrange (see Debreu, 1952, for a discussion of this result and for relevant historical references).

Next we generalize the simple necessary conditions of Proposition 1201, which are the
special case of the next result involving only \diagonal" principal minors of the form aii .
Proposition 1204 Let A be a symmetric matrix of order n.

(i) If A is positive definite, then its principal minors are all strictly positive.

(ii) If A is positive semi-definite, then its principal minors are all positive.

Proof We only prove (i) because the proof of (ii) is similar. Assume that A is positive definite. Let A_{ii} be the (n−1) × (n−1) principal submatrix of A obtained by eliminating the row and column of index i; e.g., A₁₁ results from the elimination of the first row and column. Let x ∈ R^{n−1} and let x̃ ∈ Rⁿ be such that x̃_{−i} = x and x̃_i = 0.⁸ Then, x · A_{ii} x = x̃ · A x̃ > 0 for all 0 ≠ x ∈ R^{n−1}. So, A_{ii} is positive definite. A similar argument, just notationally messier, proves that any principal submatrix B of A is positive definite. By Corollary 1199, det B > 0.

This proposition empowers the preliminary analysis that was based on Proposition 1201: now it is enough to exhibit any principal minor that violates the positivity conditions to conclude that the matrix is not definite or semi-definite. Yet, the main interest of this result is as a stepping stone towards the Sylvester-Jacobi criterion, which we can now state and prove.

⁸ Recall the notation x_{−i} introduced in Section 14.

Proposition 1205 (Sylvester-Jacobi criterion) A symmetric matrix A is:

(i) positive definite if and only if its leading principal minors are all strictly positive;

(ii) negative definite if and only if its leading principal minors are not zero and alternate in sign, starting with a negative sign;

(iii) indefinite if its leading principal minors are not zero and the sequence of their signs respects neither (i) nor (ii).

Remarkably, it is enough to check just the leading principal minors to establish whether a symmetric matrix is positive or negative definite, a computationally much lighter task than checking all principal minors.

Proof (i) The "only if" part follows from Proposition 1204. The "if" part follows from Brioschi's Theorem. Indeed, let 0 ≠ x ∈ Rⁿ and let C be the triangular matrix of Brioschi's Theorem. Since C has unitary diagonal entries, it is invertible, so z = Cx ≠ 0. Hence, (25.12) implies x · Ax > 0. Point (ii) is just the dual "negative" version of (i) because det(−A) = (−1)ⁿ det A for a square matrix of order n (see Proposition 717). Finally, point (iii) is a straightforward consequence of points (i) and (ii).
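As an illustration, here is a small Python sketch (our addition; the helper name `classify_by_leading_minors` is ours) that applies the criterion by computing the leading principal minors with NumPy:

```python
import numpy as np

def classify_by_leading_minors(A, tol=1e-12):
    """Classify a symmetric matrix via the Sylvester-Jacobi criterion.

    The criterion is silent when some leading minor is zero, so that case
    is reported as inconclusive.
    """
    n = A.shape[0]
    minors = [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]
    if any(abs(m) < tol for m in minors):
        return "inconclusive (a leading minor is zero)", minors
    if all(m > 0 for m in minors):
        return "positive definite", minors
    # negative definite: signs alternate -, +, -, + ...
    if all((m < 0) == (k % 2 == 1) for k, m in enumerate(minors, start=1)):
        return "negative definite", minors
    return "indefinite", minors

A = np.array([[1.0, 0.5, 0.0], [0.5, 2.0, 0.5], [0.0, 0.5, 1.0]])
print(classify_by_leading_minors(A))  # positive definite; minors 1, 1.75, 1.5
```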

Example 1206 Let f(x₁, x₂, x₃) = x₁² + 2x₂² + x₃² + (x₁ + x₃)x₂. The symmetric matrix associated to f is:
$$A = \begin{bmatrix} 1 & \frac{1}{2} & 0 \\ \frac{1}{2} & 2 & \frac{1}{2} \\ 0 & \frac{1}{2} & 1 \end{bmatrix}$$
Indeed, we have
$$x \cdot Ax = (x_1, x_2, x_3) \cdot \begin{bmatrix} 1 & \frac{1}{2} & 0 \\ \frac{1}{2} & 2 & \frac{1}{2} \\ 0 & \frac{1}{2} & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = (x_1, x_2, x_3) \cdot \left( x_1 + \frac{1}{2}x_2,\; \frac{1}{2}x_1 + 2x_2 + \frac{1}{2}x_3,\; \frac{1}{2}x_2 + x_3 \right)$$
$$= x_1^2 + 2x_2^2 + x_3^2 + (x_1 + x_3)x_2$$

The leading principal minors are:
$$\det A_1 = 1 > 0, \qquad \det A_2 = \det \begin{bmatrix} 1 & \frac{1}{2} \\ \frac{1}{2} & 2 \end{bmatrix} = \frac{7}{4} > 0, \qquad \det A_3 = \det A = \frac{3}{2} > 0$$
Hence, by the Sylvester-Jacobi criterion the quadratic form f is positive definite. N

We close with a Sylvester-Jacobi criterion for positive semi-definite matrices.⁹

⁹ We omit the proof, as well as the dual version for negative semi-definite matrices (which is left to readers).

Proposition 1207 A symmetric matrix is positive semi-definite if all its principal minors, leading or not, are positive.

Here we need to consider all principal minors, not just the leading ones as in the positive definite case. The Sylvester-Jacobi criterion is thus computationally heavier, and so less attractive, in the positive semi-definite case.

Example 1208 The leading principal minors of the matrix
$$A = \begin{bmatrix} 0 & 0 \\ 0 & a_{22} \end{bmatrix}$$
are positive, yet the matrix is positive semi-definite if and only if a₂₂ ≥ 0. N
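The pitfall is easy to reproduce numerically. Below is a small Python sketch (our addition) that enumerates all principal minors, as Proposition 1207 requires; `itertools.combinations` picks the row-and-column index sets that are kept:

```python
import itertools
import numpy as np

def all_principal_minors(A):
    """Return the determinants of all principal submatrices of A."""
    n = A.shape[0]
    minors = []
    for k in range(1, n + 1):
        for idx in itertools.combinations(range(n), k):
            sub = A[np.ix_(idx, idx)]
            minors.append(np.linalg.det(sub))
    return minors

# Example 1208 with a22 = -1: the leading minors are 0 and 0 (all >= 0),
# but the principal minor det[a22] = -1 shows A is not positive semi-definite.
A = np.array([[0.0, 0.0], [0.0, -1.0]])
print(all_principal_minors(A))  # [0.0, -1.0, 0.0]
```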
Part VI

Differential calculus
Chapter 26

Derivatives

26.1 Marginal analysis


Consider a function c : R₊ → R whose value c(x) represents the cost (say, in euros) required to produce the quantity x of an output. Suppose that the producer wants to evaluate the impact on costs of a variation Δx in the output produced. For example, if x = 100 and Δx = 3, he has to evaluate the impact on costs of a positive variation (that is, of an increment) of 3 units of output with respect to the current production of 100 units.
The output variation Δx determines a variation of the cost
$$\Delta c = c(x + \Delta x) - c(x)$$
If Δx is a non-zero discrete variation, that is,
$$\Delta x \in \{\dots, -3, -2, -1, 1, 2, 3, \dots\}$$
the average cost of each additional unit of output in Δx is given by
$$\frac{\Delta c}{\Delta x} = \frac{c(x + \Delta x) - c(x)}{\Delta x} \tag{26.1}$$
The ratio Δc/Δx, called the difference quotient, is fundamental in evaluating the impact on the cost of the variation Δx of the quantity produced. Let us illustrate it with the following table, in which c(x)/x denotes the average cost (in euros) of each unit produced:
x    | c(x)  | c(x)/x                 | Δc/Δx
100  | 4,494 | 44.94                  |
102  | 4,500 | 4,500/102 ≈ 44.11765   | (4,500 − 4,494)/2 = 3
105  | 4,510 | 4,510/105 ≈ 42.95238   | (4,510 − 4,500)/3 ≈ 3.33
106  | 4,515 | 4,515/106 ≈ 42.59434   | (4,515 − 4,510)/1 = 5

As production increases, the average cost decreases while the difference quotient increases. This means that the average cost of each additional unit increases. Therefore, increasing production is, "at the margin", more and more expensive for the producer. In

particular, the last additional unit has determined an increase in costs of 5 euros: for the producer such an increase in production is profitable if (and only if) there is an at least equal increase in the difference quotient of the return R(x), that is, in the return of each additional unit:
$$\frac{\Delta R}{\Delta x} = \frac{R(x + \Delta x) - R(x)}{\Delta x} \tag{26.2}$$
Let us add to the table two columns with the returns and their difference quotients:

x    | c(x)  | c(x)/x    | Δc/Δx  | R(x)  | ΔR/Δx
100  | 4,494 | 44.94     |        | 5,000 |
102  | 4,500 | 44.11765  | 3      | 5,100 | (5,100 − 5,000)/2 = 50
105  | 4,510 | 42.95238  | 3.33   | 5,200 | (5,200 − 5,100)/3 ≈ 33.33
106  | 4,515 | 42.59434  | 5      | 5,204 | (5,204 − 5,200)/1 = 4

The first two increases in production are profitable for the producer: they determine difference quotients of the returns equal to 50 euros and 33.33 euros, respectively, versus difference quotients of the costs equal to 3 euros and 3.33 euros, respectively. After the last increment in production, the difference quotient of the returns decreases to only 4 euros, lower than the corresponding value of 5 euros of the difference quotient of the costs. The producer will therefore find it profitable to increase the production to 105 units, but not to 106. That this choice is correct is confirmed by the trend of the profit π(x) = R(x) − c(x), which for convenience we add to the table:
x    | c(x)  | c(x)/x    | Δc/Δx  | R(x)  | ΔR/Δx  | π(x)
100  | 4,494 | 44.94     |        | 5,000 |        | 506
102  | 4,500 | 44.11765  | 3      | 5,100 | 50     | 600
105  | 4,510 | 42.95238  | 3.33   | 5,200 | 33.33  | 690
106  | 4,515 | 42.59434  | 5      | 5,204 | 4      | 689

The profit of the producer continues to increase up to the level 105 of produced output, but decreases in case of a further increase to 106. The "incremental" information, quantified by difference quotients such as (26.1) and (26.2), is therefore key for the producer's ability to assess his production decisions. In contrast, the information on average costs or on average returns is, for instance, completely irrelevant (in our example it is actually misleading: the decrease in average costs can lead to wrong decisions). In the economics jargon, the producer should decide based on what happens at the margin, not on average.
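The table computations are easy to reproduce; here is a minimal Python sketch (our addition) of the difference quotients above:

```python
# Production levels, costs, and returns from the tables above.
xs = [100, 102, 105, 106]
cost = {100: 4494, 102: 4500, 105: 4510, 106: 4515}
ret = {100: 5000, 102: 5100, 105: 5200, 106: 5204}

for prev, cur in zip(xs, xs[1:]):
    dx = cur - prev
    dc = (cost[cur] - cost[prev]) / dx   # difference quotient of costs (26.1)
    dR = (ret[cur] - ret[prev]) / dx     # difference quotient of returns (26.2)
    profit = ret[cur] - cost[cur]
    print(cur, dc, dR, profit)  # moving to 106 has dR = 4 < dc = 5
```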

Until now we have considered the ratio (26.1) for discrete variations Δx. Idealizing, let us consider arbitrary non-zero variations Δx ∈ R and, in particular, smaller and smaller variations, that is, Δx → 0. The limit c′(x) is given by
$$c'(x) = \lim_{\Delta x \to 0} \frac{c(x + \Delta x) - c(x)}{\Delta x} \tag{26.3}$$

When it exists and is finite, c′(x) is called the marginal cost at x: it indicates the variation in cost determined by infinitesimal variations of output with respect to the "initial" quantity x.
This idealization permits us to frame marginal analysis within differential calculus, a fundamental mathematical theory that will be the subject matter of the chapters of this part of the book. Because it formalizes marginal analysis, differential calculus pervades economics.

26.2 Derivatives

For a function f : (a, b) → R, the difference quotient (26.1) takes the form
$$\frac{\Delta f}{\Delta x} = \frac{f(x + h) - f(x)}{(x + h) - x} = \frac{f(x + h) - f(x)}{h} \tag{26.4}$$
where Δx = h denotes a generic variation, positive if h > 0 or negative if h < 0.¹

Definition 1209 A function f : (a, b) → R is said to be derivable at a point x₀ ∈ (a, b) if the limit
$$\lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} \tag{26.5}$$
exists and is finite. This limit is called the derivative of f at x₀, and is denoted by f′(x₀).

Therefore, the derivative is nothing but the limit of the difference quotient when it exists and is finite. Other notations used for the derivative at x₀ are
$$Df(x_0) \quad \text{and} \quad \frac{df}{dx}(x_0)$$
The notation f′(x₀), which we will mostly use, is probably the most convenient; sometimes we will also use the other two notations, whenever convenient.²
Note the double requirement that the limit exist and be finite: if at a point the limit of the difference quotient (26.5) exists but is infinite, the function does not have a derivative at that point (see Example 1213).

A few remarks are in order. (i) Differential calculus, of which derivatives are a first key notion, originated in the works of Leibniz and Newton in the second part of the seventeenth century. Newton was motivated by physics, which indeed features a classic example of a derivative: let t be the time and s be the distance covered by a moving object. Suppose the function s(t) indicates the total distance covered until time t. The difference quotient Δs/Δt is the average velocity in a time interval of length Δt. Therefore, its derivative at a point t₀ can be interpreted as the instantaneous velocity at t₀. If space is measured in kilometers and time in hours, the velocity is measured in km/h, that is, in "kilometers per hour" (as speedometers do).
(ii) In applications, the dependent and independent variables y and x that appear in a function y = f(x) take a concrete meaning and are both evaluated in terms of a unit of

¹ Since the domain (a, b) is an open interval, for h sufficiently small we have x + h ∈ (a, b).
² Different notations for the same mathematical object can be convenient in different contexts. For this reason, it may be important to have several notations at hand (provided they are then used consistently).

measure (€, $, kg, liters, years, miles, parsecs, etc.): if we denote by T the unit of measure of the dependent variable y and by S that of the independent variable x, the difference quotient Δy/Δx (and so the derivative, if it exists) is then expressed in the unit of measure T/S. For instance, if in the initial example the cost is expressed in euros and the quantity produced in quintals, the difference quotient (26.1) is expressed in €/q, that is, in "euros per quintal".
(iii) The notation df/dx (or the equivalent dy/dx) is meant to suggest that the derivative is a limit of ratios.³ Note, however, that df/dx is only a symbol, not a true ratio; indeed, it is the limit of ratios. Nevertheless, heuristically it is often treated as a true ratio (see, for example, the remark on the chain rule at the end of Section 26.9). This can be a useful trick to help our intuition, as long as whatever is found in this way is then checked formally.
(iv) The terminology "derivable at" is not so common, but its motivation will become apparent in Section 26.12.2. In any case, a function f : (a, b) → R which is derivable at each point of (a, b) is called derivable, without any further qualification.

26.3 Geometric interpretation

The derivative has an important geometric interpretation. Given a function f : (a, b) → R and a point x₀ ∈ (a, b), consider the straight line passing through the points (x₀, f(x₀)) and (x₀ + h, f(x₀ + h)), where h ≠ 0 is a variation. Assume, for simplicity, that h > 0 (similar considerations hold for h < 0):

[Figure: secant line through the points (x₀, f(x₀)) and (x₀ + h, f(x₀ + h)) on the graph of f]

The equation of such a straight line is obtained by solving the system
$$\begin{cases} f(x_0) = m x_0 + q \\ f(x_0 + h) = m(x_0 + h) + q \end{cases}$$
A simple calculation gives
$$y = f(x_0) + \frac{f(x_0 + h) - f(x_0)}{h}(x - x_0) \tag{26.6}$$

³ This notation is due to Leibniz, while the f′ notation is due to Lagrange.

which is the equation of the sought-after straight line passing through the points (x₀, f(x₀)) and (x₀ + h, f(x₀ + h)). Taking the limit as h → 0, we get
$$y = f(x_0) + f'(x_0)(x - x_0) \tag{26.7}$$
that is, the equation of the straight line which is tangent to the graph of f at the point (x₀, f(x₀)) ∈ Gr f.
As h tends to 0, the straight line (26.6) thus tends to the tangent (straight) line, whose slope is the derivative f′(x₀). The graph of the tangent line is:

[Figure: tangent line to the graph of f at (x₀, f(x₀)), obtained as the limit of the secant lines]

In sum, geometrically the derivative can be regarded as the slope of the tangent line at the point (x₀, f(x₀)). In turn, the tangent line can be regarded as a local approximation of the function f at x₀, a key observation that will be developed through the fundamental notion of differential (Section 26.12).

Example 1210 Consider the function f : R → R given by f(x) = x² − 1. At a point x ∈ R we have
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{\left[(x + h)^2 - 1\right] - \left(x^2 - 1\right)}{h} = \lim_{h \to 0} \frac{h^2 + 2xh}{h} = \lim_{h \to 0} (h + 2x) = 2x$$
The derivative exists at each x ∈ R and is given by 2x. For example, the derivative at x = 1 is f′(1) = 2, with tangent line
$$y = f(1) + f'(1)(x - 1) = 2x - 2$$

at the point (1, 0) ∈ Gr f. Graphically:

[Figure: parabola f(x) = x² − 1 with tangent line y = 2x − 2 at the point (1, 0)]

The derivative at the origin is f′(0) = 0, with tangent line
$$y = f(0) + f'(0)\, x = -1$$
at the point (0, −1) ∈ Gr f. Graphically:

[Figure: parabola f(x) = x² − 1 with horizontal tangent line y = −1 at the point (0, −1)]

In this case the tangent line is horizontal (constant) and is always equal to −1. N

Example 1211 Consider a constant function f : R → R, that is, f(x) = k for every x ∈ R. For every h ≠ 0 we have
$$\frac{f(x + h) - f(x)}{h} = \frac{k - k}{h} = 0$$
and therefore f′(x) = 0 for every x ∈ R. The derivative of a constant function is zero. N

Example 1212 Consider the function f : R → R given by
$$f(x) = \begin{cases} \dfrac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}$$

with graph:

[Figure: hyperbola f(x) = 1/x, with f(0) = 0]

At a point x ≠ 0 we have
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{\frac{1}{x + h} - \frac{1}{x}}{h} = \lim_{h \to 0} \frac{x - (x + h)}{h x (x + h)} = \lim_{h \to 0} \frac{-h}{h x (x + h)} = \lim_{h \to 0} \frac{-1}{x(x + h)} = -\frac{1}{x^2}$$
The derivative exists at each x ≠ 0 and is given by −x⁻². For example, the derivative at x = 1 is f′(1) = −1, and at x = −2 it is f′(−2) = −1/4.
If we consider the origin x = 0 we have, for h ≠ 0,
$$\frac{f(0 + h) - f(0)}{h} = \frac{\frac{1}{h} - 0}{h} = \frac{1}{h^2}$$
so that
$$\lim_{h \to 0} \frac{f(0 + h) - f(0)}{h} = +\infty$$
The limit is not finite and hence the function does not have a derivative at x = 0. Recall that the function is not continuous at this point (Example 554). N

Example 1213 Consider the function f : R → R given by
$$f(x) = \begin{cases} \sqrt{x} & \text{if } x \geq 0 \\ -\sqrt{-x} & \text{if } x < 0 \end{cases}$$

with graph:

[Figure: the signed square root function, increasing through the origin with a vertical tangent at x = 0]

Take x = 0. For h > 0 we have
$$\frac{f(x + h) - f(x)}{h} = \frac{\sqrt{h}}{h} = \frac{1}{\sqrt{h}} \to +\infty$$
and, for h < 0, we have
$$\frac{f(x + h) - f(x)}{h} = \frac{-\sqrt{-h}}{h} = \frac{\sqrt{-h}}{-h} = \frac{1}{\sqrt{-h}} \to +\infty$$
Therefore,
$$\lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = +\infty$$
Since the limit is not finite, the function does not have a derivative at x = 0. Note that, differently from the previous example, the function is continuous at this point. N

26.4 Derivative function

Given a function f : (a, b) → R, the set D ⊆ (a, b) of the points of the domain where f is derivable is called the domain of derivability of f. In Examples 1210 and 1211 the domain of the function coincides with that of derivability. On the contrary, in Examples 1212 and 1213 the domain of the function is R, while the domain of derivability is R ∖ {0}.
We can now introduce a new function: the derivative function.

Definition 1214 Let f : (a, b) → R be a function with domain of derivability D ⊆ (a, b). The function f′ : D → R that to each x ∈ D associates the derivative f′(x) is called the derivative function of f.

The derivative function f′ describes the derivative of f at the different points where it exists, thus describing its overall behavior. In the examples previously discussed:

(i) for f(x) = x² − 1, the derivative function f′ : R → R is given by f′(x) = 2x;

(ii) for f(x) = k, the derivative function f′ : R → R is given by f′(x) = 0;

(iii) for f(x) = 1/x = x⁻¹, the derivative function f′ : R ∖ {0} → R is given by f′(x) = −x⁻².

The notion of derivative function allows us to frame in a bigger picture the computations that we did in the examples of the last section: to compute the derivative of a function f at a generic point x of the domain amounts to computing its derivative function f′. When we found that the derivative of f(x) = x² − 1 is, at any point x ∈ R, given by 2x, we actually found that its derivative function f′ : R → R is given by f′(x) = 2x.
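Before moving on, a quick numerical illustration (our addition): the difference quotients of f(x) = x² − 1 at x = 1 approach the value f′(1) = 2 as h shrinks.

```python
def f(x):
    return x ** 2 - 1

x0 = 1.0
for h in [0.1, 0.01, 0.001, 1e-6]:
    dq = (f(x0 + h) - f(x0)) / h  # difference quotient (26.5)
    print(h, dq)                  # tends to f'(1) = 2 as h -> 0
```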

Example 1215 Let r : R₊ → R be the return function and c : R₊ → R be the cost function of a producer (see Section 22.1.4). The derivative function r′ : D ⊆ R₊ → R is called the marginal return function, and the derivative function c′ : D ⊆ R₊ → R is called the marginal cost function. Their economic interpretation should be, by now, clear. N

26.5 One-sided derivatives

Until now we have considered the two-sided limit (26.5) of the difference quotient. Sometimes it is useful to consider separately positive and negative variations h. To this end, we introduce the notions of right and left derivatives.

Definition 1216 A function f : (a, b) → R is said to be derivable from the right at the point x₀ ∈ (a, b) if the one-sided limit
$$\lim_{h \to 0^+} \frac{f(x_0 + h) - f(x_0)}{h} \tag{26.8}$$
exists and is finite, and to be derivable from the left at x₀ ∈ (a, b) if the one-sided limit
$$\lim_{h \to 0^-} \frac{f(x_0 + h) - f(x_0)}{h} \tag{26.9}$$
exists and is finite.

When it exists and is finite, the limit (26.8) is called the right derivative of f at x₀, and it is denoted by f′₊(x₀). Analogously, when it exists and is finite, the limit (26.9) is called the left derivative of f at x₀, and it is denoted by f′₋(x₀). Since two-sided limits exist if and only if both one-sided limits exist and coincide (Proposition 521), we have:

Proposition 1217 A function f : (a, b) → R is derivable at x₀ ∈ (a, b) if and only if it is derivable from both the right and the left, with f′₊(x₀) = f′₋(x₀). In this case,
$$f'(x_0) = f'_+(x_0) = f'_-(x_0)$$


Example 1218 Consider the function f : R → R given by
$$f(x) = \begin{cases} 1 - x^2 & \text{if } x \leq 0 \\ 1 & \text{if } x > 0 \end{cases}$$
with graph:

[Figure: graph of f, equal to the downward parabola 1 − x² for x ≤ 0 and to the constant 1 for x > 0]

It is easy to see that the function is derivable at each point x ≠ 0, with
$$f'(x) = \begin{cases} -2x & \text{if } x < 0 \\ 0 & \text{if } x > 0 \end{cases}$$
At 0 we have
$$f'_+(0) = \lim_{h \to 0^+} \frac{f(0 + h) - f(0)}{h} = \lim_{h \to 0^+} \frac{1 - 1}{h} = 0$$
$$f'_-(0) = \lim_{h \to 0^-} \frac{f(0 + h) - f(0)}{h} = \lim_{h \to 0^-} \frac{(1 - h^2) - 1}{h} = \lim_{h \to 0^-} (-h) = 0$$
Therefore, by Proposition 1217 the function is derivable also at 0, with f′(0) = 0. In conclusion,
$$f'(x) = \begin{cases} -2x & \text{if } x \leq 0 \\ 0 & \text{if } x > 0 \end{cases}$$
N

Through one-sided derivatives we can classify two important classes of points where derivability fails. Specifically, a point x₀ of the domain of f is called:

(i) a corner point if the right derivative and the left derivative exist but are different, i.e., f′₊(x₀) ≠ f′₋(x₀);

(ii) a cuspidal point (or a cusp) if the right and left limits of the difference quotient are infinite with different signs:
$$\lim_{h \to 0^+} \frac{f(x_0 + h) - f(x_0)}{h} = \pm\infty \quad \text{and} \quad \lim_{h \to 0^-} \frac{f(x_0 + h) - f(x_0)}{h} = \mp\infty$$

Example 1219 Let f : R → R be given by f(x) = |x|, with graph

[Figure: V-shaped graph of the absolute value function, with a corner at the origin]

At x₀ = 0 we have
$$\frac{f(x_0 + h) - f(x_0)}{h} = \frac{|h|}{h} = \begin{cases} 1 & \text{if } h > 0 \\ -1 & \text{if } h < 0 \end{cases}$$
The two-sided limit of the difference quotient does not exist at 0, so the function is not derivable at 0. Nevertheless, at 0 the one-sided derivatives exist. In particular,
$$f'_+(0) = \lim_{h \to 0^+} \frac{f(0 + h) - f(0)}{h} = 1, \qquad f'_-(0) = \lim_{h \to 0^-} \frac{f(0 + h) - f(0)}{h} = -1$$
The origin x₀ = 0 is, therefore, a corner point. The reader can check that the function is derivable at each point x ≠ 0, with
$$f'(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x < 0 \end{cases}$$

Example 1220 The function
$$f(x) = \begin{cases} \sqrt{x} & \text{if } x \geq 0 \\ \sqrt{-x} & \text{if } x < 0 \end{cases}$$

has a cuspidal point at the origin x = 0, as we can see from its graph:

[Figure: graph of f(x) = √|x|, with a cusp at the origin]

We close by noting that the right and left derivative functions are defined in the same way, mutatis mutandis, as the derivative function. In Example 1219, the one-sided derivative functions f′₊ : R → R and f′₋ : R → R are given by
$$f'_+(x) = \begin{cases} 1 & \text{if } x \geq 0 \\ -1 & \text{if } x < 0 \end{cases} \qquad \text{and} \qquad f'_-(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x \leq 0 \end{cases}$$

26.6 Derivability and continuity

A first important property of derivable functions is their continuity.

Proposition 1221 A function f : (a, b) → R derivable at a point x₀ ∈ (a, b) is continuous at x₀.

Proof We have to prove that lim_{x→x₀} f(x) = f(x₀). Since f is derivable at x₀, the limit of the difference quotient exists and is finite, and it is equal to f′(x₀):
$$\lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} = f'(x_0)$$
Let us rewrite the limit by setting x = x₀ + h, so that h = x − x₀. Observing that, as h tends to 0, x tends to x₀, we get:
$$\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = f'(x_0)$$
Therefore, by the algebra of limits (Proposition 333) we have:
$$\lim_{x \to x_0} (f(x) - f(x_0)) = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0}(x - x_0) = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} \cdot \lim_{x \to x_0} (x - x_0) = f'(x_0) \cdot 0 = 0$$
where the last equality holds since f′(x₀) exists and is finite. We have thus proved that lim_{x→x₀}(f(x) − f(x₀)) = 0. On the other hand, again by the algebra of limits, we have:
$$0 = \lim_{x \to x_0} (f(x) - f(x_0)) = \lim_{x \to x_0} f(x) - \lim_{x \to x_0} f(x_0) = \lim_{x \to x_0} f(x) - f(x_0)$$
Therefore lim_{x→x₀} f(x) = f(x₀), as desired.

Derivability at a point thus implies continuity at that point. The converse is false: the absolute value function f(x) = |x| is continuous at x = 0 but is not derivable at that point (Example 1219). In other words, continuity is a necessary, but not sufficient, condition for derivability.⁴

Proposition 1221, and the examples seen until now, allow us to identify five possible causes of non-derivability at a point x:

(i) f is not continuous at x (Example 1212).

(ii) f has a corner point at x (Example 1219).

(iii) f has a cuspidal point at x (Example 1220).

(iv) f has at x a point at which a one-sided derivative exists but, on the other side, the limit of the difference quotient is +∞ or −∞; for example, the function
$$f(x) = \begin{cases} \sqrt{x} & \text{if } x \geq 0 \\ x & \text{if } x < 0 \end{cases}$$
is such that f′₋(0) = 1 and lim_{h→0⁺} (f(x₀ + h) − f(x₀))/h = +∞.

(v) f has a vertical tangent at x; for example, the function
$$f(x) = \begin{cases} \sqrt{x} & \text{if } x \geq 0 \\ -\sqrt{-x} & \text{if } x < 0 \end{cases}$$
seen in Example 1213 has a vertical tangent at x = 0 because lim_{h→0} f(h)/h = +∞.

The five cases just identified are, however, not exhaustive: there are other sources of non-derivability. For example, the function
$$f(x) = \begin{cases} x \sin \dfrac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}$$
is continuous everywhere.⁵ At the origin x₀ = 0 it is, however, not derivable because the limit
$$\lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} = \lim_{h \to 0} \frac{h \sin \frac{1}{h} - 0}{h} = \lim_{h \to 0} \sin \frac{1}{h}$$

⁴ In the coda we say more on this important issue.
⁵ Indeed, lim_{x→0} x sin(1/x) = 0 because |sin(1/x)| ≤ 1 and so |x sin(1/x)| ≤ |x|.

does not exist. The origin is not a corner point and there is no vertical tangent at this point. The lack of derivability here is due to the fact that f has, in any neighborhood of the origin, infinitely many oscillations, which are such that the difference quotient sin(1/h) oscillates infinitely many times between −1 and 1. Note that in this example the one-sided derivatives f′₊(0) and f′₋(0) do not exist either.
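A quick numerical peek (our addition) shows the difference quotient sin(1/h) refusing to settle as h shrinks:

```python
import math

# Difference quotient of f(x) = x*sin(1/x) (with f(0) = 0) at the origin is sin(1/h).
for h in [0.1, 0.01, 0.001, 0.0001, 0.00001]:
    print(h, math.sin(1 / h))  # the values keep oscillating in [-1, 1]; no limit
```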

We close with an important piece of terminology, often used in the rest of the book.

Terminology When f is derivable at all the interior points of (a, b) and is one-sided derivable at the endpoints a and b, we say that it is derivable on the closed interval [a, b]. It is immediate to see that f is then also continuous on such an interval.

The next proposition clarifies the import of this piece of terminology by showing that a function is derivable on a closed interval if and only if it is the restriction to such an interval of a function which is derivable on the entire real line. So, one can always regard a function derivable on a closed interval as a restriction of a function derivable everywhere.

Proposition 1222 A function f : [a, b] → R is derivable on [a, b] if and only if it is the restriction to [a, b] of a function g : R → R derivable on R.

Proof We prove only the "only if", the converse being obvious. So, let f : [a, b] → R be derivable on [a, b], that is, derivable at all the interior points (a, b) and one-sided derivable at the endpoints a and b. Define g : R → R by
$$g(x) = \begin{cases} f(a) + f'_+(a)(x - a) & \text{if } x < a \\ f(x) & \text{if } x \in [a, b] \\ f(b) + f'_-(b)(x - b) & \text{if } x > b \end{cases}$$
Clearly, f is the restriction of g to [a, b], i.e., g(x) = f(x) for all x ∈ [a, b]. Next we show that g is derivable on R. It is easily seen to be derivable on (−∞, a) ∪ (a, b) ∪ (b, ∞). Indeed, g′(x) = f′(x) for all x ∈ (a, b), g′(x) = f′₊(a) for all x < a, and g′(x) = f′₋(b) for all x > b. The only points which are delicate are x = a and x = b. By hypothesis, we have g′₊(a) = f′₊(a) and g′₋(b) = f′₋(b). At the same time, we have
$$g'_-(a) = \lim_{h \to 0^-} \frac{g(a + h) - g(a)}{h} = \lim_{h \to 0^-} \frac{f(a) + f'_+(a)(a + h - a) - f(a)}{h} = f'_+(a) = g'_+(a)$$
proving that g′₋(a) = g′₊(a). So, g is derivable at a. Similarly, we have that
$$g'_+(b) = \lim_{h \to 0^+} \frac{g(b + h) - g(b)}{h} = \lim_{h \to 0^+} \frac{f(b) + f'_-(b)(b + h - b) - f(b)}{h} = f'_-(b) = g'_-(b)$$
proving that g′₊(b) = g′₋(b). So, g is derivable at b. We conclude that g is derivable on R.

26.7 Derivatives of elementary functions

Proposition 1223 The power function f : R → R given by f(x) = xⁿ, with integer n ≥ 1, is derivable at each x ∈ R, with derivative function f′ : R → R given by
$$f'(x) = n x^{n-1} \tag{26.10}$$

For example, the function f(x) = x⁵ has derivative function f′(x) = 5x⁴ and the function f(x) = x³ has derivative function f′(x) = 3x².
We give two proofs of this basic result.

Proof 1 By Newton's binomial formula, we have
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{(x + h)^n - x^n}{h} = \lim_{h \to 0} \frac{\sum_{k=0}^{n} \frac{n!}{k!(n-k)!} x^{n-k} h^k - x^n}{h}$$
$$= \lim_{h \to 0} \frac{x^n + n x^{n-1} h + \frac{n(n-1)}{2} x^{n-2} h^2 + \cdots + n x h^{n-1} + h^n - x^n}{h}$$
$$= \lim_{h \to 0} \left( n x^{n-1} + \frac{n(n-1)}{2} x^{n-2} h + \cdots + n x h^{n-2} + h^{n-1} \right) = n x^{n-1}$$
as claimed.

Proof 2 We establish (26.10) by induction, using the derivative of the product of functions (see Section 26.8). First, we show that the derivative of the function f(x) = x is equal to 1. The limit of the difference quotient of f is
$$\lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{x + h - x}{h} = \lim_{h \to 0} \frac{h}{h} = 1$$
Therefore f′(x) = 1, so (26.10) holds for n = 1. Suppose that (26.10) holds for n − 1 (induction hypothesis), that is,
$$D(x^{n-1}) = (n - 1) x^{n-2}$$
Consider the function xⁿ = x · x^{n−1}. Using the derivative of the product of functions (see (26.13) below) and the induction hypothesis, we have
$$D(x^n) = 1 \cdot x^{n-1} + x \cdot D(x^{n-1}) = x^{n-1} + x (n - 1) x^{n-2} = (1 + n - 1) x^{n-1} = n x^{n-1}$$
that is, (26.10).

Proposition 1224 The exponential function f : R → R given by f(x) = αˣ, with α > 0, is derivable at each x ∈ R, with derivative function f′ : R → R given by
$$f'(x) = \alpha^x \log \alpha$$
In particular, deˣ/dx = eˣ, that is, the derivative function of the exponential function is the exponential function itself. So, the exponential function equals its derivative function, a truly remarkable invariance property that gives the exponential function a special status in differential calculus.

Proof We have
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{\alpha^{x+h} - \alpha^x}{h} = \lim_{h \to 0} \alpha^x \frac{\alpha^h - 1}{h} = \alpha^x \lim_{h \to 0} \frac{\alpha^h - 1}{h} = \alpha^x \log \alpha$$
where the last equality follows from the basic limit (12.34).

Proposition 1225 The function f : R → R given by f(x) = sin x is derivable at each x ∈ R, with derivative function f′ : R → R given by
$$f'(x) = \cos x$$
Proof From the basic trigonometric formula sin(a + b) = sin a cos b + cos a sin b, it follows that
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{\sin(x + h) - \sin x}{h} = \lim_{h \to 0} \frac{\sin x \cos h + \cos x \sin h - \sin x}{h}$$
$$= \lim_{h \to 0} \frac{\sin x (\cos h - 1) + \cos x \sin h}{h} = \sin x \lim_{h \to 0} \frac{\cos h - 1}{h} + \cos x \lim_{h \to 0} \frac{\sin h}{h} = \cos x$$
The last equality follows from the basic limits (12.32) and (12.31) for cos x and sin x, respectively.

In a similar way it is possible to prove that the function f : R → R given by f(x) = cos x is derivable at each x ∈ R, with derivative function f′ : R → R given by
$$f'(x) = -\sin x \tag{26.11}$$

26.8 Algebra of derivatives

In Section 6.3.2 we studied the algebra of functions, that is, their sums, products and quotients. Let us now see how derivatives behave with respect to these fundamental operations. We begin with addition.

Proposition 1226 Let f, g : (a, b) → R be two functions derivable at x ∈ (a, b). The sum function f + g : (a, b) → R is derivable at x, with
$$(f + g)'(x) = f'(x) + g'(x)$$



The result actually holds, more generally, for any linear combination αf + βg : (a, b) → R, with α, β ∈ R:
$$(\alpha f + \beta g)'(x) = \alpha f'(x) + \beta g'(x) \tag{26.12}$$
In particular, the derivative of αf(x) is αf′(x).

Proof We prove the result directly in the more general form (26.12). We have
$$(\alpha f + \beta g)'(x) = \lim_{h \to 0} \frac{(\alpha f + \beta g)(x + h) - (\alpha f + \beta g)(x)}{h} = \lim_{h \to 0} \frac{(\alpha f)(x + h) + (\beta g)(x + h) - (\alpha f)(x) - (\beta g)(x)}{h}$$
$$= \lim_{h \to 0} \left( \alpha \frac{f(x + h) - f(x)}{h} + \beta \frac{g(x + h) - g(x)}{h} \right) = \alpha \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} + \beta \lim_{h \to 0} \frac{g(x + h) - g(x)}{h}$$
$$= \alpha f'(x) + \beta g'(x)$$
as desired.

Thus, the sum behaves in a simple manner with respect to derivatives: the "derivative of a sum" is the "sum of the derivatives".⁶ More subtle is the case of the product of functions.

Proposition 1227 Let f, g : (a, b) → R be two functions derivable at x ∈ (a, b). The product function fg : (a, b) → R is derivable at x, with
$$(fg)'(x) = f'(x) g(x) + f(x) g'(x) \tag{26.13}$$

Proof We have
$$(fg)'(x) = \lim_{h \to 0} \frac{(fg)(x + h) - (fg)(x)}{h} = \lim_{h \to 0} \frac{f(x + h) g(x + h) - f(x) g(x)}{h}$$
$$= \lim_{h \to 0} \frac{f(x + h) g(x + h) - f(x) g(x + h) + f(x) g(x + h) - f(x) g(x)}{h}$$
$$= \lim_{h \to 0} \frac{g(x + h)(f(x + h) - f(x)) + f(x)(g(x + h) - g(x))}{h}$$
$$= \lim_{h \to 0} g(x + h) \cdot \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} + f(x) \lim_{h \to 0} \frac{g(x + h) - g(x)}{h} = g(x) f'(x) + f(x) g'(x)$$

⁶ The converse does not hold: if the sum of two functions has a derivative, it is not necessarily true that the individual functions have a derivative (for example, the origin is a corner point of both f(x) = |x| and g(x) = −|x|, but the sum f + g is a constant function that has a derivative at every point of the real line). The same is true for the multiplication and division operations on functions.

as desired. In the last step we used lim_{h→0} g(x + h) = g(x), which holds thanks to the continuity of g, in turn ensured by its derivability.

The derivative of the product, therefore, is not the product of the derivatives, but is given by the more subtle product rule (26.13). A similar rule, the so-called quotient rule, holds mutatis mutandis for the quotient.

Proposition 1228 Let f, g : (a, b) → R be two functions derivable at x ∈ (a, b), with g(x) ≠ 0. The quotient function f/g : (a, b) → R is derivable at x, with
$$\left( \frac{f}{g} \right)'(x) = \frac{f'(x) g(x) - f(x) g'(x)}{g(x)^2} \tag{26.14}$$
Proof We start with the case in which f is constant and equal to 1. We have
$$\left( \frac{1}{g} \right)'(x) = \lim_{h \to 0} \frac{\frac{1}{g(x+h)} - \frac{1}{g(x)}}{h} = \lim_{h \to 0} \frac{g(x) - g(x + h)}{g(x) g(x + h) h}$$
$$= -\frac{1}{g(x)} \lim_{h \to 0} \frac{g(x + h) - g(x)}{h} \cdot \lim_{h \to 0} \frac{1}{g(x + h)} = -\frac{g'(x)}{g(x)^2}$$
Now consider any f : (a, b) → R. Thanks to (26.13), we have
$$\left( \frac{f}{g} \right)'(x) = \left( f \cdot \frac{1}{g} \right)'(x) = f'(x) \frac{1}{g(x)} + f(x) \left( \frac{1}{g} \right)'(x) = \frac{f'(x)}{g(x)} - f(x) \frac{g'(x)}{g(x)^2} = \frac{f'(x) g(x) - f(x) g'(x)}{g(x)^2}$$
as desired.

Example 1229 (i) Let f, g : R → R be given by f(x) = x³ and g(x) = sin x. We have
$$(f + g)'(x) = 3x^2 + \cos x \qquad \forall x \in \mathbb{R}$$
and
$$(fg)'(x) = 3x^2 \sin x + x^3 \cos x \qquad \forall x \in \mathbb{R}$$
as well as
$$\left( \frac{f}{g} \right)'(x) = \frac{3x^2 \sin x - x^3 \cos x}{\sin^2 x} \qquad \forall x \in \mathbb{R} \setminus \{n\pi : n \in \mathbb{Z}\}$$
In the last formula {nπ : n ∈ Z} is the set of the points {..., −2π, −π, 0, π, 2π, ...} where the function g(x) = sin x in the denominator is zero.
(ii) Let f : R → R be given by f(x) = tan x. Since tan x = sin x / cos x, we have
$$f'(x) = 1 + \tan^2 x = \frac{1}{\cos^2 x}$$

as the reader can check.

(iii) Let c : [0, ∞) → R be a cost function, with marginal cost function c′ : (0, ∞) → R. Consider the average cost function c_m : (0, ∞) → R given by
$$c_m(x) = \frac{c(x)}{x}$$
By the quotient rule, we have
$$c_m'(x) = \frac{x c'(x) - c(x)}{x^2} = \frac{c'(x) - \frac{c(x)}{x}}{x} = \frac{c'(x) - c_m(x)}{x}$$
Since x > 0, we have
$$c_m'(x) \geq 0 \iff c'(x) - c_m(x) \geq 0 \iff c'(x) \geq c_m(x) \tag{26.15}$$
Therefore, at a point x the variation in average costs is positive if and only if marginal costs are larger than average costs. In other words, average costs keep increasing as long as they are lower than marginal costs (cf. the numerical examples with which we began the chapter).
More generally, the same reasoning holds for each function f : [0, ∞) → R that represents, as x ≥ 0 varies, an economic "quantity": return, profit, etc. The function f_m : (0, ∞) → R defined by
$$f_m(x) = \frac{f(x)}{x}$$
is the corresponding "average quantity" (average return, average profit, etc.), while the derivative function f′(x) represents the "marginal quantity" (marginal return, marginal profit, etc.). At each x > 0, the value f′(x) can be interpreted geometrically as the slope of the tangent line of f at x, while f_m(x) is the slope of the straight line passing through the origin and the point (x, f(x)).

[Figure: two panels showing the graph of f with, at a point x, the tangent line of slope f′(x) and the ray from the origin of slope f(x)/x]

Geometrically, (26.15) says that the variation of the average f_m is positive at a point x > 0, that is, f_m′(x) ≥ 0, as long as the slope of the tangent line is larger than that of the straight line passing through the origin and the point (x, f(x)), that is, f′(x) ≥ f_m(x). N
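A minimal Python check of (26.15) (our addition), using the hypothetical cost function c(x) = 100 + x², whose average cost is minimized exactly where marginal and average costs cross:

```python
def c(x):
    return 100 + x ** 2   # hypothetical cost function

def c_marginal(x):
    return 2 * x          # c'(x)

def c_average(x):
    return c(x) / x       # c_m(x) = 100/x + x

# c_m decreases while c' < c_m and increases once c' > c_m;
# the two cross at x = 10, where the average cost 20 is minimal.
for x in [5, 10, 15]:
    print(x, c_marginal(x), c_average(x))
```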

26.9 The chain rule

Let us now turn to the derivative of a composite function g ∘ f. How can we calculate it starting from the derivatives of the functions f and g? The answer is provided by the important formula (26.16), called the chain rule.

Proposition 1230 Let f : (a, b) → R and g : (c, d) → R be two functions with Im f ⊆ (c, d). If f is derivable at x ∈ (a, b) and g is derivable at f(x), then the composite function g ∘ f : (a, b) → R is derivable at x, with
$$(g \circ f)'(x) = g'(f(x))\, f'(x) \tag{26.16}$$
Thus, the chain rule features the product of the derivatives g′ and f′, where g′ has as its argument the image f(x). Before proving it, we provide a simple heuristic argument. For h small enough, we have
$$\frac{g(f(x + h)) - g(f(x))}{h} = \frac{g(f(x + h)) - g(f(x))}{f(x + h) - f(x)} \cdot \frac{f(x + h) - f(x)}{h}$$
If h → 0, then
$$\lim_{h \to 0} \frac{g(f(x + h)) - g(f(x))}{h} = \lim_{h \to 0} \frac{g(f(x + h)) - g(f(x))}{f(x + h) - f(x)} \cdot \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = g'(f(x))\, f'(x)$$
Note that we tacitly assumed that the denominator f(x + h) − f(x) is always different from zero, something that the hypotheses of the theorem do not guarantee. For this reason, we need the following rigorous proof.

Proof Since g is derivable at y = f(x), we have
$$\lim_{k \to 0} \frac{g(y + k) - g(y)}{k} = g'(y)$$
This is equivalent to
$$\frac{g(y + k) - g(y)}{k} = g'(y) + o(1) \quad \text{as } k \to 0$$
This equality holds for k ≠ 0 and implies
$$g(y + k) - g(y) = \left( g'(y) + o(1) \right) k \quad \text{as } k \to 0 \tag{26.17}$$
which holds also for k = 0. Choose h small enough and set k = f(x + h) − f(x). Since f is derivable at x, f is continuous at x, so k → 0 as h → 0. By (26.17), we have
$$g(f(x + h)) - g(f(x)) = \left( g'(f(x)) + o(1) \right) [f(x + h) - f(x)] \quad \text{as } h \to 0$$
It follows that
$$\frac{g(f(x + h)) - g(f(x))}{h} = \left( g'(f(x)) + o(1) \right) \frac{f(x + h) - f(x)}{h} \to g'(f(x))\, f'(x)$$
proving the statement.

Example 1231 Let f, g : R → R be given by f(x) = x³ and g(x) = sin x. We have, at each x ∈ R, (g ∘ f)(x) = sin x³ and (f ∘ g)(x) = sin³ x, so
$$(g \circ f)'(x) = g'(f(x))\, f'(x) = \cos(x^3) \cdot 3x^2 = 3x^2 \cos x^3$$
and
$$(f \circ g)'(x) = f'(g(x))\, g'(x) = 3 \sin^2 x \cos x$$

Example 1232 Let f : (a, b) → R be any function derivable at each x ∈ (a, b) and let g(x) = eˣ. We have
$$(g \circ f)'(x) = g'(f(x))\, f'(x) = e^{f(x)} f'(x) \tag{26.18}$$
For example, if f(x) = x⁴, then (g ∘ f)(x) = e^{x⁴} and (26.18) becomes (g ∘ f)′(x) = 4x³ e^{x⁴}. N

The chain rule is very useful to compute the derivative of a function that can be written as a composition of other functions.

Example 1233 Let φ : R → R be given by φ(x) = sin³(9x + 1). To calculate φ′(x) it is useful to write φ as
$$\varphi = f \circ g \circ h \tag{26.19}$$
where f, g, h : R → R are given by f(x) = x³, g(x) = sin x, and h(x) = 9x + 1. By the chain rule, we have
$$\varphi'(x) = f'((g \circ h)(x)) \cdot (g \circ h)'(x) = f'((g \circ h)(x)) \cdot g'(h(x)) \cdot h'(x) = 3 \sin^2(9x + 1) \cdot \cos(9x + 1) \cdot 9 = 27 \sin^2(9x + 1) \cos(9x + 1)$$
Expressing the function φ as in (26.19) thus simplifies the computation of its derivative. N
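The computation is easy to double-check symbolically; here is a small sketch (our addition) using the SymPy library:

```python
import sympy as sp

x = sp.symbols('x')
phi = sp.sin(9 * x + 1) ** 3   # the function of Example 1233
dphi = sp.diff(phi, x)         # SymPy applies the chain rule for us
print(dphi)                    # 27*sin(9*x + 1)**2*cos(9*x + 1)
```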

O.R. If we write z = f(x) and y = g(z), we clearly have y = g(f(x)). What we have proved can be summarized by stating that
$$\frac{dy}{dx} = \frac{dy}{dz} \cdot \frac{dz}{dx}$$
which is easy to remember if the symbol d·/d· is interpreted as a true ratio: it is a kind of Pinocchio, a puppet that behaves like a true kid. H

O.R. The chain rule has an onion flavor because the derivative of a composite function is obtained by successively "peeling" the function from the outside:
$$(f \circ g \circ h \circ \cdots)' = (f(g(h(\cdots))))' = f'(g(h(\cdots))) \cdot g'(h(\cdots)) \cdot h'(\cdots) \cdots$$
H

26.10 Derivative of inverse functions

Theorem 1234 Let f : (a, b) → R be an injective function derivable at x₀ ∈ (a, b). If f′(x₀) ≠ 0, the inverse function f⁻¹ is derivable at y₀ = f(x₀), with
$$\left( f^{-1} \right)'(y_0) = \frac{1}{f'(x_0)} \tag{26.20}$$

In short, the derivative of the inverse function of f, at y₀, is the reciprocal of the derivative of f, at x₀. The geometric intuition of this result should be clear once one remembers that the graph of the inverse function is the mirror image of the graph of the function with respect to the 45 degree line (see Section 6.4.2).

It would be nice to invoke the chain rule and say that from y₀ = f(f⁻¹(y₀)) it follows that 1 = f′(f⁻¹(y₀)) · (f⁻¹)′(y₀), so that 1 = f′(x₀) · (f⁻¹)′(y₀), which is formula (26.20). Unfortunately, we cannot use the chain rule because we are not sure (yet) that f⁻¹ is derivable: indeed, this is what we first need to prove in this theorem.

Proof Set f(x₀ + h) = y₀ + k and observe that, by the continuity of f, when h → 0, also k → 0. By the definition of inverse function, x₀ = f⁻¹(y₀) and x₀ + h = f⁻¹(y₀ + k). Therefore, h = f⁻¹(y₀ + k) − f⁻¹(y₀). By hypothesis, there exists
$$\lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} = f'(x_0)$$
But
$$\frac{f(x_0 + h) - f(x_0)}{h} = \frac{y_0 + k - y_0}{f^{-1}(y_0 + k) - f^{-1}(y_0)} = \frac{1}{\dfrac{f^{-1}(y_0 + k) - f^{-1}(y_0)}{k}}$$
Therefore, provided f′(x₀) ≠ 0, the limit of the ratio
$$\frac{f^{-1}(y_0 + k) - f^{-1}(y_0)}{k}$$
as k → 0 also exists, and it is the reciprocal of the previous one, i.e., (f⁻¹)′(y₀) = 1/f′(x₀).

The derivative of the inverse function is thus given by a fraction with 1 in the numerator and, in the denominator, the derivative f′ evaluated at the preimage f⁻¹(y), that is,
$$\left( f^{-1} \right)'(y) = \frac{1}{f'(x)} = \frac{1}{f'(f^{-1}(y))}$$

Example 1235 Let f : R → R be the exponential function f(x) = eˣ, so that f⁻¹ : (0, ∞) → R is the logarithmic function f⁻¹(y) = log y. Given that deˣ/dx = eˣ = y, we have
$$\frac{d \log y}{dy} = \frac{1}{f'(x)} = \frac{1}{e^x} = \frac{1}{e^{\log y}} = \frac{1}{y}$$
for every y > 0. N

This example, along with the chain rule, yields the important formula
$$\frac{d \log f(x)}{dx} = \frac{f'(x)}{f(x)}$$
for strictly positive derivable functions f. It is the logarithmic version of (26.18).

The last example, again along with the chain rule, also leads to an important generalization of Proposition 1223.

Proposition 1236 The power function f : (0, ∞) → R given by f(x) = xᵃ, with a ∈ R, is derivable at each x > 0, with derivative function f′ : (0, ∞) → R given by
$$f'(x) = a x^{a-1}$$
Proof We have
$$x^a = e^{\log x^a} = e^{a \log x} \tag{26.21}$$
Setting f(x) = eˣ and g(x) = a log x, from (26.21) it follows that
$$\frac{d(x^a)}{dx} = f'(g(x))\, g'(x) = e^{a \log x} \cdot \frac{a}{x} = x^a \cdot \frac{a}{x} = a x^{a-1}$$
as desired.

Let us see two more examples.

Example 1237 Let f : [−π/2, π/2] → R be given by f(x) = sin x, so that f⁻¹ : [−1, 1] → [−π/2, π/2] is given by f⁻¹(y) = arcsin y. From (26.20) we have
$$\frac{d \sin x}{dx} = \cos x = \sqrt{1 - \sin^2 x} = \sqrt{1 - y^2}$$
and so
$$\frac{d \arcsin y}{dy} = \frac{1}{\sqrt{1 - y^2}}$$
for every y ∈ (−1, 1). In the same way we prove that
$$\frac{d \arccos y}{dy} = -\frac{1}{\sqrt{1 - y^2}}$$
for every y ∈ (−1, 1). N

Example 1238 Let f : (−π/2, π/2) → R be given by f(x) = tan x, so that f⁻¹ : R → (−π/2, π/2) is given by f⁻¹(y) = arctan y. From (26.20) we have
$$\frac{d \tan x}{dx} = 1 + \tan^2 x = 1 + y^2$$
and so, for every y ∈ R,
$$\frac{d \arctan y}{dy} = \frac{1}{1 + y^2}$$
N

The condition f′(x₀) ≠ 0 cannot be omitted in Theorem 1234. Indeed, when this condition fails, anything goes: the inverse may or may not be derivable at y₀ = f(x₀), something that one has to check directly. The next example illustrates.

Example 1239 Define f, g : R → R by f(x) = x³ and g(x) = x^{1/3}. We have:

(i) f⁻¹ = g and f′(0) = 0; the inverse f⁻¹ is not derivable at f(0) = 0.

(ii) g⁻¹ = f and g is not derivable at 0; the inverse g⁻¹ is derivable at g(0) = 0, with (g⁻¹)′(0) = 0.

So, anything goes when the condition f′(x₀) ≠ 0 fails. N

O.R. Denoting by y = f(x) a function and by x = f⁻¹(y) its inverse, we can summarize what we have seen by writing
$$\frac{dx}{dy} = \frac{1}{\dfrac{dy}{dx}}$$
Again the symbol d·/d· behaves like a true ratio, a further proof of its Pinocchio nature. H

We relegate to an example the derivative of a function with variable base and exponent.

Example 1240 Let F : R → R be the function given by F(x) = [f(x)]^{g(x)} with f : R → (0, ∞) and g : R → R. Since one can write
$$F(x) = e^{\log [f(x)]^{g(x)}} = e^{g(x) \log f(x)}$$
the chain rule yields
$$F'(x) = e^{g(x) \log f(x)}\, D[g(x) \log f(x)] = F(x) \left( g'(x) \log f(x) + g(x) \frac{f'(x)}{f(x)} \right)$$
For example, the derivative of F(x) = xˣ is
$$\frac{d x^x}{dx} = x^x \left( \log x + x \cdot \frac{1}{x} \right) = x^x (1 + \log x)$$
while the derivative of F(x) = x^{x²} is
$$\frac{d x^{x^2}}{dx} = x^{x^2} \left( 2x \log x + x^2 \cdot \frac{1}{x} \right) = x^{x^2 + 1} (1 + 2 \log x)$$
The reader can try to calculate the derivative of F(x) = x^{xˣ}. N
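For readers who want to verify these formulas, a short SymPy sketch (our addition):

```python
import sympy as sp

x = sp.symbols('x', positive=True)
print(sp.diff(x ** x, x))         # x**x*(log(x) + 1)
print(sp.diff(x ** (x ** 2), x))  # x**(x**2)*(2*x*log(x) + x),
                                  # i.e. x**(x**2 + 1)*(1 + 2*log(x))
```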

26.11 Formulary

The chain rule permits us to broaden considerably the scope of the results on the derivatives of elementary functions seen in Section 26.7. In Example 1232 we already saw how to calculate the derivative of a generic function e^{f(x)}, which is much more general than the exponential eˣ of Proposition 1224.

In a similar way it is possible to generalize all the results on the derivatives of elementary functions seen until now. We summarize all this in two tables: the first one lists the derivatives of elementary functions, while the second one contains their generalizations obtained through the chain rule.

f          | f′                              | Reference
k          | 0                               | Example 1211
xᵃ         | a x^{a−1}                       | Proposition 1236
eˣ         | eˣ                              | Proposition 1224
αˣ         | αˣ log α                        | Proposition 1224
log x      | 1/x                             | Example 1235
logₐ x     | 1/(x log a)                     | Exercise for the reader
sin x      | cos x                           | Proposition 1225
cos x      | −sin x                          | Formula (26.11)
tan x      | 1/cos² x = 1 + tan² x           | Example 1229
cotan x    | −1/sin² x = −(1 + cotan² x)     | Exercise for the reader
arcsin x   | 1/√(1 − x²)                     | Example 1237
arccos x   | −1/√(1 − x²)                    | Exercise for the reader
arctan x   | 1/(1 + x²)                      | Example 1238
arccotan x | −1/(1 + x²)                     | Exercise for the reader

Given their importance in so many contexts, it is useful to memorize the previous table, much as one learned the multiplication tables by heart as a child. Let us now see its general version obtained through the chain rule. In the next table, f denotes the elementary functions of the previous table, while g is any derivable function with image A = Im g. Most of the derivatives that arise in applications can be calculated by using this last table properly.

f ∘ g        | (f ∘ g)′                                       | Image of g
g(x)ᵃ        | a g(x)^{a−1} g′(x)                             | A ⊆ R
e^{g(x)}     | g′(x) e^{g(x)}                                 | A ⊆ R
α^{g(x)}     | g′(x) α^{g(x)} log α                           | A ⊆ R
log g(x)     | g′(x)/g(x)                                     | A ⊆ (0, ∞)
logₐ g(x)    | (g′(x)/g(x)) · 1/log a                         | A ⊆ (0, ∞)
sin g(x)     | g′(x) cos g(x)                                 | A ⊆ R
cos g(x)     | −g′(x) sin g(x)                                | A ⊆ R
tan g(x)     | g′(x)/cos² g(x) = g′(x)(1 + tan² g(x))         | A ⊆ R
arcsin g(x)  | g′(x)/√(1 − g²(x))                             | A ⊆ [−1, 1]
arccos g(x)  | −g′(x)/√(1 − g²(x))                            | A ⊆ [−1, 1]
arctan g(x)  | g′(x)/(1 + g²(x))                              | A ⊆ R

26.12 Differentiability and linearity

When we introduced the notion of derivative at the beginning of the chapter, we emphasized its meaning as a way to represent the incremental, "marginal", behavior of a function f : (a, b) → R at a point x₀ ∈ (a, b). This section will show that the derivative can also be seen from a different perspective, as a linear approximation of the increment of the function. These two perspectives, with their interplay, are at the heart of differential calculus.

26.12.1 Differential

A fundamental question is whether it is possible to approximate a function f : (a, b) → R locally, that is, in a neighborhood of a given point of its domain, by an affine function, namely, by a straight line (recall Proposition 820). If this is possible, we could locally approximate the function, even a very complicated one, by the simplest function: a straight line.
To make this idea precise, given a function f : (a, b) → R and a point x₀ ∈ (a, b), suppose that there exists an affine function r : R → R that approximates f at x₀ in the sense that
$$f(x_0 + h) = r(x_0 + h) + o(h) \quad \text{as } h \to 0 \tag{26.22}$$
for every h such that x₀ + h ∈ (a, b), i.e., for every h ∈ (a − x₀, b − x₀).


When h = 0, the local approximation condition (26.22) becomes f(x₀) = r(x₀). This condition thus requires two properties for a straight line r : R → R to be considered an adequate approximation of f at x₀. First, the straight line must coincide with f at x₀, that is, f(x₀) = r(x₀): at the point x₀ the approximation must be exact, without any error. Second, and most important, the approximation error
$$f(x_0 + h) - r(x_0 + h)$$
at x₀ + h is o(h), that is, as x₀ + h approaches x₀, the error goes to zero faster than h: the approximation is (locally) "very good".

Since the straight line r can be written as r(x) = mx + q, the condition f(x₀) = r(x₀) implies
$$r(x_0 + h) = m(x_0 + h) + q = mh + m x_0 + q = mh + f(x_0)$$
Denote by l : R → R the linear function defined by l(h) = mh, which geometrically is a straight line passing through the origin. The approximation condition (26.22) can be equivalently written as
$$f(x_0 + h) - f(x_0) = l(h) + o(h) \quad \text{as } h \to 0 \tag{26.23}$$
This expression (26.23) emphasizes the linearity of the approximation l(h) of the difference f(x₀ + h) − f(x₀), as well as the goodness of this approximation: the difference f(x₀ + h) − f(x₀) − l(h) is o(h). This emphasis is important and motivates the following definition.

Definition 1241 A function f : (a, b) → R is said to be differentiable at x₀ ∈ (a, b) if there exists a linear function l : R → R such that
$$f(x_0 + h) = f(x_0) + l(h) + o(h) \quad \text{as } h \to 0 \tag{26.24}$$
for every h ∈ (a − x₀, b − x₀).

In other words, the definition requires that there exists a number m ∈ R, independent of h but dependent on x₀, such that
$$f(x_0 + h) = f(x_0) + mh + o(h) \quad \text{as } h \to 0$$
Therefore, f is differentiable at x₀ if the linear function l(x) = mx approximates the difference f(x₀ + h) − f(x₀) with an error that is o(h), so it goes to zero faster than h. Equivalently, f is differentiable at x₀ if the affine function r : R → R given by
$$r(h) = f(x_0) + l(h)$$
approximates f at x₀ according to the condition (26.22).

Definition 1242 The linear function l : R → R in (26.24) is called the differential of f at x₀ and is denoted by df(x₀) : R → R.

With such notation, (26.24) becomes⁷
$$f(x_0 + h) = f(x_0) + df(x_0)(h) + o(h) \quad \text{as } h \to 0 \tag{26.25}$$

⁷ Note that h in df(x₀)(h) is the argument of the differential df(x₀) : R → R. In other words, df(x₀) is a function of the variable h, while x₀ indicates the point at which the differential approximates the function f.

By setting h = x − x₀, we can write (26.25) in the form
$$f(x) = f(x_0) + df(x_0)(x - x_0) + o(x - x_0) \quad \text{as } x \to x_0 \tag{26.26}$$
which we will often use.

A final piece of terminology: a function f : (a, b) → R which is differentiable at each point of (a, b) is called differentiable, without any further qualification.

O.R. Differentiability says that a function can be well approximated by a straight line, that is, by the simplest type of function, at least near the point of interest. The approximation is good in the close proximity of the point but, as we move away from it, in general its quality deteriorates rapidly. Such an approximation, even if rough, conveys at least two valuable pieces of information:

(i) its mere existence ensures that the function is well behaved (it is continuous);

(ii) it reveals whether the function goes up or down and, with its slope, it tells us approximately what the rate of change of the function is at the point studied.

These two pieces of information are often useful in applications. Chapter 29 will study these issues in more depth and will present sharper local approximations. H

26.12.2 Differentiability and derivability

The next key result shows that the two perspectives on derivability, incremental and of linear approximation, are consistent. Recalling the geometric interpretation of the derivative (Section 26.3), not surprisingly all this means that the tangent line is exactly the affine function that satisfies condition (26.22).

Theorem 1243 A function f : (a, b) → R is differentiable at x₀ ∈ (a, b) if and only if it is derivable at this point. In this case, the differential df(x₀) : R → R is given by
$$df(x_0)(h) = f'(x_0)\, h$$
The differential at a point can thus be written in terms of the derivative at that point. Inter alia, this also shows the uniqueness of the differential df(x₀).

Proof "If". Let f be a function derivable at x₀ ∈ (a, b). We have
$$\lim_{h \to 0} \frac{f(x_0 + h) - f(x_0) - f'(x_0) h}{h} = \lim_{h \to 0} \left( \frac{f(x_0 + h) - f(x_0)}{h} - f'(x_0) \right) = 0$$
that is, f(x₀ + h) − f(x₀) − f′(x₀)h = o(h). Setting m = f′(x₀), this implies (26.24) and therefore f is differentiable at x₀.

"Only if". Let f be differentiable at x₀ ∈ (a, b). By (26.24),
$$f(x_0 + h) - f(x_0) = l(h) + o(h) \quad \text{as } h \to 0$$
The linear function l : R → R is a straight line passing through the origin, so there exists m ∈ R such that l(h) = mh. Hence
$$\lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} = \lim_{h \to 0} \frac{l(h) + o(h)}{h} = m \in \mathbb{R}$$
At x₀ the limit of the difference quotient thus exists and is finite, and therefore f is derivable at x₀.

Differentiability and derivability are, therefore, equivalent notions for scalar functions. When they hold, we have, as h → 0,
$$f(x_0 + h) = f(x_0) + df(x_0)(h) + o(h) = f(x_0) + f'(x_0) h + o(h) \tag{26.27}$$
or, equivalently, as x → x₀,
$$f(x) = f(x_0) + df(x_0)(x - x_0) + o(x - x_0) = f(x_0) + f'(x_0)(x - x_0) + o(x - x_0) \tag{26.28}$$
The reader may recall from (26.7) that
$$r(x) = f(x_0) + f'(x_0)(x - x_0) \tag{26.29}$$
is the equation of the tangent line at x₀. This confirms the natural intuition that this line is the affine approximation that makes f differentiable at x₀. Graphically:

[Figure: graph of f with its tangent line at (x₀, f(x₀)); near x₀ the tangent line approximates f with an error that is o(h)]
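The o(h) error in (26.27) can be seen numerically. A minimal sketch (our addition) for f(x) = eˣ at x₀ = 0, where f′(0) = 1:

```python
import math

x0 = 0.0
for h in [0.1, 0.01, 0.001]:
    err = math.exp(x0 + h) - (math.exp(x0) + 1.0 * h)  # f(x0+h) minus the tangent line
    print(h, err, err / h)  # err/h -> 0, so the error is indeed o(h)
```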

O.R. The difference f(x₀ + h) − f(x₀) is called the increment of f at x₀ and is often denoted by Δf(x₀)(h). When f is differentiable at x₀, we have
$$\Delta f(x_0)(h) = df(x_0)(h) + o(h)$$
So,
$$\Delta f(x_0) \sim df(x_0) \quad \text{as } h \to 0$$
when f′(x₀) ≠ 0. Indeed,
$$\frac{\Delta f(x_0)(h)}{h} = \frac{df(x_0)(h)}{h} + \frac{o(h)}{h} = \frac{f'(x_0) h}{h} + \frac{o(h)}{h} = f'(x_0) + \frac{o(h)}{h} \to f'(x_0)$$
The two infinitesimals Δf(x₀) and df(x₀) are, therefore, of the same order. This is another way of saying that, when f is differentiable at x₀, the differential approximates the true increment well. H

26.12.3 Differentiability and continuity

A fundamental property of differentiable functions, and therefore of derivable functions, is continuity. In view of Theorem 1243, Proposition 1221 can now be regarded as a corollary of the following result.

Proposition 1244 If f : (a, b) → R is differentiable at x₀ ∈ (a, b), then it is continuous at x₀.

The converse is clearly false, as shown by the absolute value function f(x) = |x| at x₀ = 0.

Proof By (26.28), we have
$$\lim_{x \to x_0} f(x) = \lim_{x \to x_0} \left[ f(x_0) + f'(x_0)(x - x_0) + o(x - x_0) \right] = f(x_0) + f'(x_0) \lim_{x \to x_0} (x - x_0) = f(x_0)$$
Therefore, f is continuous at x₀.

26.12.4 A terminological turning point

In view of the equivalence established in Theorem 1243, from now on we say that a function f : (a, b) → R is "differentiable" at x₀ rather than "derivable". This is also in accordance with the more standard terminology. The key conceptual distinction between the two viewpoints embodied by derivability and differentiability should be kept in mind, however, as it will be key in multivariable calculus.
Another important piece of terminology involves the derivative function f′ : D → R of a function f : (a, b) → R (Section 26.4). Specifically, when the derivative function f′ is continuous on a subset E of its domain, we say that the function f is continuously differentiable on E. That is, f is continuously differentiable on E if its derivative is continuous at all points of E. In particular, when D = E, the function is said to be continuously differentiable, without further specification. We denote by
$$C^1(E)$$
the collection of all continuously differentiable functions on a set E. When we write f ∈ C¹(E) we thus mean that the function f is continuously differentiable on E.

26.13 Higher order derivatives

26.13.1 Generalities

Consider the derivative function
$$f' : D \to \mathbb{R}$$
of a function f : (a, b) → R. Being itself a function, f′ in turn admits a derivative at a point x of its domain D, denoted by f″(x) and given by
$$f''(x) = \lim_{h \to 0} \frac{f'(x + h) - f'(x)}{h} \tag{26.30}$$
when this limit exists and is finite. In this case, we call f″(x) the second derivative of f : (a, b) → R at x and say that f is twice differentiable at x.

Example 1245 (i) The quadratic function f : R → R given by f(x) = x² is twice differentiable at all points of the real line, with derivative function f′ : R → R given by
$$f'(x) = 2x$$
In turn, f′ has derivative f″(x) = 2 at each x ∈ R.

(ii) The logarithmic function f : (0, ∞) → R given by f(x) = log x is twice differentiable at all strictly positive points of the real line, with derivative function f′ : (0, ∞) → R given by
$$f'(x) = \frac{1}{x}$$
In turn, f′ has derivative f″(x) = −x⁻² at each x > 0. N

Let
$$D' \subseteq D$$
be the domain of differentiability of f′ (i.e., the collection of points where the second derivative exists). The second derivative function f″ : D′ → R associates to every x ∈ D′ the second derivative f″(x).

Example 1246 (i) The quadratic function has second derivative function f″ : R → R given by f″(x) = 2 for all x ∈ R. (ii) The logarithmic function has second derivative function f″ : (0, ∞) → R given by f″(x) = −x⁻². N

In turn, f″ admits a derivative at a point x ∈ D′, denoted by f‴(x) and given by
$$f'''(x) = \lim_{h \to 0} \frac{f''(x + h) - f''(x)}{h}$$
when this limit exists and is finite. In this case, we call f‴(x) the third derivative of f at x and say that f is three times differentiable at x. If
$$D'' \subseteq D' \tag{26.31}$$
denotes the domain of differentiability of f″, we can write the third derivative function as f‴ : D″ → R.

Example 1247 (i) The quadratic function is three times differentiable at all points of the real line: its second derivative function f″ : R → R has derivative f‴(x) = 0 at each x ∈ R. Thus, the third derivative function f‴ : R → R of the quadratic function is the zero function.
(ii) The logarithmic function is three times differentiable at all strictly positive points of the real line: its second derivative function f″ : (0, ∞) → R has derivative f‴(x) = 2x⁻³ at each x > 0. So, the third derivative function f‴ : (0, ∞) → R of the logarithmic function is given by f‴(x) = 2x⁻³.
(iii) Define f : R → R by
$$f(x) = |x|^3 = \begin{cases} x^3 & \text{for } x \geq 0 \\ (-x)^3 & \text{for } x < 0 \end{cases}$$
We have
$$f'(x) = \begin{cases} 3x^2 & \text{for } x \geq 0 \\ -3(-x)^2 & \text{for } x < 0 \end{cases} = \begin{cases} 3x^2 & \text{for } x \geq 0 \\ -3x^2 & \text{for } x < 0 \end{cases}$$
and
$$f''(x) = 6|x| = \begin{cases} 6x & \text{for } x \geq 0 \\ -6x & \text{for } x < 0 \end{cases}$$
Thus, f‴(0) does not exist and
$$f'''(x) = \begin{cases} 6 & \text{for } x > 0 \\ -6 & \text{for } x < 0 \end{cases}$$
We conclude that f belongs to C²(R) but not to C³(R). N

We can iterate ad libitum, with fourth derivative, fifth derivative, and so on. Denoting by
$$f^{(n)} : D^{(n-1)} \to \mathbb{R}$$
the n-th derivative function of f : (a, b) → R, we can define by recurrence the differentiability of higher order of a function.

Definition 1248 A function f : (a, b) → R which is n − 1 times differentiable at x ∈ D^{(n−1)} is said to be n times differentiable at x if the limit
$$\lim_{h \to 0} \frac{f^{(n-1)}(x + h) - f^{(n-1)}(x)}{h} \tag{26.32}$$
exists and is finite.

For n = 0 we put f⁽⁰⁾ = f. When n = 1, we have ordinary differentiability and the limit (26.32) defines the (first) derivative. When n = 2, this limit defines the second derivative, and so on.

Example 1249 (i) The quadratic function is n times differentiable for each n ≥ 1. Its n-th derivative function f⁽ⁿ⁾ : R → R is given, for each n ≥ 3, by f⁽ⁿ⁾(x) = 0.
(ii) The function f : R → R given by f(x) = e⁻ˣ is n times differentiable for each n ≥ 1. Its n-th derivative function f⁽ⁿ⁾ : R → R is given by f⁽ⁿ⁾(x) = (−1)ⁿ e⁻ˣ. N

Functions can be classified according to their degree of differentiability. Specifically, when the derivative function f^(n) : D^(n−1) → R is continuous on a subset E of its domain, we say that f is n times continuously differentiable on E. As usual, when E = (a, b) the function is said to be n times continuously differentiable, without further specification.
We denote by Cⁿ(E) the collection of all n times continuously differentiable functions on E. For n = 1 we go back to the class C¹(E) of the continuously differentiable functions, previously introduced.

Example 1250 (i) The quadratic function is n times continuously differentiable, so f ∈ Cⁿ(R), for all n ≥ 1. The logarithmic function is n times continuously differentiable, so f ∈ Cⁿ(0, ∞), for all n ≥ 1.
(ii) Let f : R → R be given by f(x) = x⁴. At each x ∈ R we have

$$f'(x) = 4x^3, \quad f''(x) = 12x^2, \quad f'''(x) = 24x, \quad f^{(iv)}(x) = 24, \quad f^{(v)}(x) = 0$$

and f^(n)(x) = 0 for every n ≥ 5. The function f is thus n times continuously differentiable, so f ∈ Cⁿ(R), for all n ≥ 1.
(iii) The function f : R → R given by

$$f(x) = \begin{cases} x^2 & \text{for } x > 0 \\ 0 & \text{for } x \le 0 \end{cases}$$

is C¹(R) but not C²(R). N

As derivability implies continuity, we have the following simple but interesting result.

Proposition 1251 A function f : (a, b) → R is infinitely differentiable if and only if f ∈ Cⁿ(a, b) for all n ≥ 1.⁸

In words, a function is infinitely differentiable if and only if it has continuous derivative functions of all orders. We denote by C^∞(E) the collection of all infinitely differentiable functions on E. By the last result,

$$C^\infty(E) = \bigcap_{n=1}^{\infty} C^n(E)$$

Infinitely differentiable functions are also called smooth.

Example 1252 The quadratic function and the function f : R → R given by f(x) = x⁴ both belong to C^∞(R). N

Observe that the difference quotient in (26.30) is meaningful when the point x ∈ D is interior, i.e., when

$$x \in \operatorname{int} D$$

⁸ Infinitely differentiable means that f has derivatives of all orders on (a, b). To ease notation we write Cⁿ(a, b) in place of Cⁿ((a, b)).

This interiority condition ensures the existence of a neighborhood B_ε(x) = (x − ε, x + ε) of x contained in D, so that the values f'(x + h) are well defined for all x + h ∈ B_ε(x), i.e., for all −ε < h < ε.
Thus, D' consists of interior points, that is,

$$D' \subseteq \operatorname{int} D$$

Similarly, we have:

$$D'' \subseteq \operatorname{int} D', \quad D''' \subseteq \operatorname{int} D'', \quad \dots, \quad D^{(n-1)} \subseteq \operatorname{int} D^{(n-2)}$$

For this reason, in Definition 1248 we tacitly (to ease exposition) assumed that x ∈ int D^(n−1), a hypothesis that we now make explicit. This observation leads to a lemma that will come in handy when studying Taylor approximations.

Lemma 1253 If f : (a, b) → R is n times differentiable at a point x₀ ∈ (a, b), there is a neighborhood B_ε(x₀) ⊆ (a, b) of x₀ on which f is n − 1 times differentiable.

Proof Let f be n times differentiable at x₀ ∈ (a, b). Hence, x₀ belongs to the domain D^(n−1) of the n-th derivative function f^(n). Since D^(n−1) ⊆ int D^(n−2), there exists a neighborhood B_ε(x₀) included in D^(n−2), i.e., in the domain of the (n−1)-th derivative function f^(n−1). As, by definition, f is n − 1 times differentiable on D^(n−2), so it is on B_ε(x₀). □

In words, a function n times differentiable at a point is n − 1 times differentiable over a small enough neighborhood of that point. For instance, if f is twice differentiable at a point x₀, there is a small enough neighborhood B_ε(x₀) of x₀ over which f is differentiable. A trade-off emerges: differentiability of some order at a point x₀ gets upgraded to differentiability over an interval (x₀ − ε, x₀ + ε) provided one gives up one order of differentiability, moving from n to n − 1.

26.13.2 Higher order chain rule


The algebra of derivatives easily extends to higher order derivatives f^(n) via simple induction arguments on n. In particular, linear combinations αf + βg preserve differentiability of order n at a point x₀, with

$$(\alpha f + \beta g)^{(n)}(x_0) = \alpha f^{(n)}(x_0) + \beta g^{(n)}(x_0) \qquad (26.33)$$

This formula generalizes formula (26.12). The same is true for products fg, with the so-called Leibniz formula

$$(fg)^{(n)}(x_0) = \sum_{k=0}^{n} \binom{n}{k} f^{(n-k)}(x_0) \, g^{(k)}(x_0)$$

This interesting formula reduces to the standard product formula (26.13) when n = 1:

$$(fg)'(x_0) = \binom{1}{0} f^{(1)}(x_0) g^{(0)}(x_0) + \binom{1}{1} f^{(0)}(x_0) g^{(1)}(x_0) = f'(x_0) g(x_0) + f(x_0) g'(x_0)$$
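As a quick computational aside (the test functions sin x and e^{2x} are chosen here purely for illustration), a minimal sympy sketch can verify the Leibniz formula for a given order n.

    # A minimal sketch: verifying Leibniz's formula
    # (fg)^(n) = sum_k C(n,k) f^(n-k) g^(k) symbolically, for n = 3.
    import sympy as sp

    x = sp.symbols('x')
    f = sp.sin(x)          # any smooth test functions will do
    g = sp.exp(2*x)

    n = 3
    leibniz = sum(sp.binomial(n, k) * sp.diff(f, x, n - k) * sp.diff(g, x, k)
                  for k in range(n + 1))
    assert sp.simplify(leibniz - sp.diff(f * g, x, n)) == 0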

Let us turn to composition. The chain rule (g ∘ f)'(x) = g'(f(x)) f'(x) gives the first derivative of a composite function g ∘ f. It is sometimes important in applications to find

the higher order derivatives of the composite function g ∘ f, that is, to extend the chain rule to higher order derivatives. Some simple algebra shows that, when the involved derivatives exist, we have:

$$(g \circ f)''(x) = g''(f(x)) f'(x)^2 + g'(f(x)) f''(x) \qquad (26.34)$$

and

$$(g \circ f)'''(x) = g'''(f(x)) f'(x)^3 + 3 g''(f(x)) f''(x) f'(x) + g'(f(x)) f'''(x) \qquad (26.35)$$

A pattern seems to emerge in these formulas. Indeed, the next result establishes a powerful combinatorial formula that gives the derivatives of any order of a composite function, a higher order chain rule.

Theorem 1254 (Faà di Bruno) Let f : (a, b) → R and g : (c, d) → R be two functions with Im f ⊆ (c, d). If f is n times differentiable at x ∈ (a, b) and g is n times differentiable at f(x), then the composite function g ∘ f : (a, b) → R is n times differentiable at x, with

$$(g \circ f)^{(n)}(x) = \sum \frac{n!}{k_1! k_2! \cdots k_n!} \, g^{(k)}(f(x)) \left( \frac{f'(x)}{1!} \right)^{k_1} \left( \frac{f''(x)}{2!} \right)^{k_2} \cdots \left( \frac{f^{(n)}(x)}{n!} \right)^{k_n}$$

where k = k₁ + ⋯ + kₙ and the sum is over all natural numbers k₁, k₂, ..., kₙ that solve the equation

$$n = k_1 + 2k_2 + \cdots + n k_n$$

This combinatorial formula for the derivative (g ∘ f)^(n) is the so-called Faà di Bruno formula.⁹ For instance, for n = 2 the sum is over all natural numbers k₁ and k₂ that solve the equation

$$2 = k_1 + 2k_2$$

There are two solutions. The first one is k₁ = 2 and k₂ = 0, which gives the term

$$\frac{2!}{2!0!} \, g^{(2)}(f(x)) \left( \frac{f'(x)}{1} \right)^2 \left( \frac{f''(x)}{2!} \right)^0 = g''(f(x)) f'(x)^2$$

The second solution is k₁ = 0 and k₂ = 1, which gives the term

$$\frac{2!}{0!1!} \, g^{(1)}(f(x)) \left( \frac{f'(x)}{1} \right)^0 \left( \frac{f''(x)}{2!} \right)^1 = g'(f(x)) f''(x)$$

Formula (26.34) is thus a special case of the Faà di Bruno formula. The reader can check that also (26.35) is a special case of this formula for n = 3.

Example 1255 For n = 4, the equation

$$4 = k_1 + 2k_2 + 3k_3 + 4k_4$$

⁹ This formula was established by Francesco Faà di Bruno in 1855, though an earlier version was proved by Louis Arbogast in 1800.

is solved by the following five quadruples (k₁, k₂, k₃, k₄) of natural numbers:

$$(0,0,0,1), \quad (2,1,0,0), \quad (1,0,1,0), \quad (4,0,0,0), \quad (0,2,0,0)$$

These quadruples give the terms:

$$(0,0,0,1) \implies \frac{4!}{1!} \, g^{(1)}(f(x)) \frac{f^{(4)}(x)}{4!} = g'(f(x)) f^{(4)}(x)$$

$$(2,1,0,0) \implies \frac{4!}{2!1!} \, g^{(3)}(f(x)) f'(x)^2 \frac{f''(x)}{2!} = 6 g'''(f(x)) f'(x)^2 f''(x)$$

$$(1,0,1,0) \implies \frac{4!}{1!1!} \, g^{(2)}(f(x)) f'(x) \frac{f'''(x)}{3!} = 4 g''(f(x)) f'(x) f'''(x)$$

$$(4,0,0,0) \implies \frac{4!}{4!} \, g^{(4)}(f(x)) f'(x)^4 = g^{(iv)}(f(x)) f'(x)^4$$

$$(0,2,0,0) \implies \frac{4!}{2!} \, g^{(2)}(f(x)) \left( \frac{f''(x)}{2!} \right)^2 = 3 g''(f(x)) f''(x)^2$$

By the Faà di Bruno formula, we thus have

$$(g \circ f)^{(iv)}(x) = g'(f(x)) f^{(4)}(x) + g''(f(x)) \left( 4 f'(x) f'''(x) + 3 f''(x)^2 \right) + 6 g'''(f(x)) f'(x)^2 f''(x) + g^{(iv)}(f(x)) f'(x)^4$$

N
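A symbolic check of this n = 4 expansion may reassure the skeptical reader; the test functions below are hypothetical choices, picked only because they are smooth.

    # A minimal sketch: checking the n = 4 Faa di Bruno expansion
    # of Example 1255 on concrete smooth functions.
    import sympy as sp

    x = sp.symbols('x')
    f = x**3 + sp.sin(x)   # illustrative test functions
    g = sp.exp(x)

    gk = lambda k: sp.diff(g, x, k).subs(x, f)   # g^(k)(f(x))
    fk = lambda k: sp.diff(f, x, k)              # f^(k)(x)

    faa = (gk(1)*fk(4)
           + gk(2)*(4*fk(1)*fk(3) + 3*fk(2)**2)
           + 6*gk(3)*fk(1)**2*fk(2)
           + gk(4)*fk(1)**4)
    assert sp.simplify(faa - sp.diff(g.subs(x, f), x, 4)) == 0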

26.14 Discrete limits


We conclude by showing that the differential analysis of this chapter is closely connected with the discrete calculus of Chapter 10. Given a function f : R → R, fix x₀ ∈ R and h > 0. Set aₙ = f(x₀ + nh) for every n ≥ 0.¹⁰ Define the difference quotients:

$$\Delta_h f(x_0) = \frac{\Delta a_0}{h}, \quad \Delta_h^2 f(x_0) = \frac{\Delta^2 a_0}{h^2}, \quad \dots, \quad \Delta_h^k f(x_0) = \frac{\Delta^k a_0}{h^k}$$

We have:

$$\Delta_h f(x_0) = \frac{\Delta a_0}{h} = \frac{f(x_0 + h) - f(x_0)}{h}$$

$$\Delta_h^2 f(x_0) = \frac{\Delta^2 a_0}{h^2} = \frac{1}{h^2} (\Delta a_1 - \Delta a_0) = \frac{f(x_0 + 2h) - 2f(x_0 + h) + f(x_0)}{h^2}$$

$$\vdots$$

$$\Delta_h^k f(x_0) = \frac{1}{h^k} \sum_{i=0}^{k} (-1)^{k-i} \binom{k}{i} f(x_0 + ih)$$

where the last equality follows from (10.5). By definition the first derivative is the limit, as h approaches 0, of the difference quotient Δ_h f(x₀). Interestingly, the next result shows that also the second difference quotient converges to the second derivative, the third difference quotient converges to the third derivative, and so on.

¹⁰ Here it is convenient to start the sequence at n = 0.

Proposition 1256 Let f be n − 1 times differentiable on R and n times differentiable at x₀. We have

$$f^{(k)}(x_0) = \lim_{h \to 0} \Delta_h^k f(x_0) \qquad \text{for all } 1 \le k \le n$$

Proof We only prove the case n = 2. In Chapter 29 we will establish the following quadratic approximation:

$$f(x_0 + h) = f(x_0) + f'(x_0) h + \frac{1}{2} f''(x_0) h^2 + o(h^2)$$

Then f(x₀ + 2h) = f(x₀) + 2f'(x₀)h + 2f''(x₀)h² + o(h²), so

$$f(x_0 + 2h) - 2f(x_0 + h) + f(x_0) = f''(x_0) h^2 + o(h^2)$$

as desired.¹¹ □

Conceptually, this result shows that derivatives can be viewed as limits of finite differences, so the "discrete" and "continuous" calculi are consistent. Indeed, some important continuous properties can be viewed as inherited, via limits, from discrete ones: for instance, the algebra of derivatives can be easily deduced from that of finite differences via limits. All this is important (and, in a sense, reassuring) because discrete properties are often much easier to grasp intuitively.
By establishing a "direct" characterization of second and of higher order derivatives, this proposition is also important for their numerical computation. For instance, inspection of the proof shows that f''(x₀) = Δ²_h f(x₀) + o(1) as h → 0. In general, Δ²_h f(x₀) is much easier to compute numerically than f''(x₀), with the approximation error vanishing as h shrinks; indeed, the proof shows that f''(x₀)h² is approximated by Δ²a₀ with an error of magnitude o(h²).
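A small numerical sketch makes this convergence tangible; the function e^x and the point x₀ = 0 are arbitrary illustrative choices, the only assumption being enough smoothness.

    # A minimal numerical sketch of Proposition 1256: the k-th difference
    # quotient tends to f^(k)(x0) as h shrinks.
    import math

    def diff_quot(f, x0, h, k):
        # k-th forward difference quotient Delta_h^k f(x0) / h^k
        s = sum((-1)**(k - i) * math.comb(k, i) * f(x0 + i*h)
                for i in range(k + 1))
        return s / h**k

    f, x0 = math.exp, 0.0          # f^(k)(0) = 1 for every k
    for h in (0.1, 0.01, 0.001):
        print(h, diff_quot(f, x0, h, 2), diff_quot(f, x0, h, 3))
    # both columns approach 1 as h shrinks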

A leap of faith Consider a function f : R → R. Fix a point x₀ ∈ R and an integer n ≥ 1. Let x be any point in R, say x ≥ x₀. Set

$$h = \frac{x - x_0}{n}$$

and xᵢ = x₀ + ih for i = 1, ..., n. So, x₀ ≤ x₁ ≤ ⋯ ≤ xₙ = x and the n points xᵢ form an evenly-spaced subdivision of the interval [x₀, x]. The choice of n determines how fine the subdivision is: larger values of n correspond to finer subdivisions.
By the Newton difference formula (10.14), we have¹²

$$f(x) = f(x_0 + nh) = a_n = \sum_{k=0}^{n} \frac{n^{(k)}}{k!} \Delta^k a_0 = \sum_{k=0}^{n} \frac{n^{(k)}}{k!} \Delta_h^k f(x_0) \, h^k$$

We thus get the noteworthy formula

$$f(x) = \sum_{k=0}^{n} \frac{n^{(k)}}{n^k} \frac{\Delta_h^k f(x_0)}{k!} (x - x_0)^k \qquad \forall x \in \mathbb{R}$$

¹¹ For a direct proof of this result, we refer readers to Jordan (1893) pp. 116-118.
¹² A notation short circuit: here n plays the role of m in (10.14) and k that of j, while in the notation of (10.14) here we have n = 0.

So far so good. Yet, from this formula one might be tempted to take finer and finer subdivisions by letting n → +∞. For each k we have

$$n^{(k)} \sim n^k$$

as well as

$$\Delta_h^k f(x_0) \to f^{(k)}(x_0)$$

provided f is infinitely differentiable. Indeed, by Proposition 1256 we have Δᵏ_h f(x₀) → f^(k)(x₀) as h → 0, so as n → +∞. Unfortunately, the equivalence relation ∼ does not necessarily go through sums, let alone through infinite ones (cf. Lemma 355). Yet, if we take a leap of faith, in an eighteenth-century style, we "then" have a series expansion

$$f(x) \sim \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k \qquad \forall x \in \mathbb{R}$$

Fortunately, later in the book Chapter 30 will make all this rigorous by showing that infinitely differentiable functions that are analytic admit an (exact) series expansion, something that makes them the most tractable class of functions. Though rough, the previous heuristic argument based on discrete calculus thus opens a door on a key topic.

26.15 Coda: Weierstrass' monster


For a long time it was firmly believed, often on the basis of some (fallacious) geometric intuition, that continuous functions, though possibly not differentiable at some points (as the absolute value function forcefully shows), had to be nevertheless differentiable at some points of their domain.
It thus came as a total surprise, even as a shock, when Karl Weierstrass proved the existence of a continuous function which is nowhere differentiable, that is, not differentiable at any point of its domain. That happened in 1872: next we state, without proof, his stark finding (in a more general version proved by Godfrey Hardy in 1916).

Theorem 1257 (Weierstrass) The function f : R → R defined by

$$f(x) = \sum_{n=0}^{\infty} a^n \cos(b^n x)$$

where 0 < a < 1 < b, with ab ≥ 1, is continuous and nowhere differentiable.

For example, the function

$$f(x) = \sum_{n=0}^{\infty} \frac{1}{2^n} \cos(3^n x) \qquad (26.36)$$

satisfies the hypotheses of the theorem, so it is continuous and nowhere differentiable. This function is defined, point by point, via a trigonometric series that can be proved to be convergent. For instance, at x = 0 we have

$$f(0) = \sum_{n=0}^{\infty} \frac{1}{2^n} \cos(0) = \sum_{n=0}^{\infty} \frac{1}{2^n} = \frac{1}{1 - \frac{1}{2}} = 2$$
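Since the series is dominated by the geometric series Σ(1/2)ⁿ, its partial sums converge fast; a minimal numerical sketch (purely illustrative, with 40 terms as an arbitrary truncation) reproduces the value f(0) = 2.

    # A minimal sketch: partial sums of the series (26.36); each term is
    # bounded by (1/2)^n, so truncation converges quickly.
    import math

    def weierstrass(x, terms=40):
        return sum(0.5**n * math.cos(3.0**n * x) for n in range(terms))

    print(weierstrass(0.0))   # approximately 2, as computed above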

Though f is definitely not a friendly function, we can get a sense of its behavior via its graph:

[Figure: graph of the Weierstrass function (26.36), a jagged curve oscillating at every scale.]
Nowhere differentiable continuous functions have long been regarded as oddities, even as "monsters". It was thus a second, even bigger surprise to learn in the 1930s that this kind of function describes the typical trajectory of the Brownian motion, a fundamental random phenomenon which is pervasive in applications. A prominent economic example of such phenomena is given by stock prices, whose great volatility generates a bumpy behavior over time of the kind depicted in the last graph.
So, these "monsters" eventually took center stage (readers will encounter them again in courses in probability theory and in finance). Here they are also a powerful illustration of how our "finitary" intuition may miserably fail even on the real line, which should be the most familiar uncountably infinite set. Let alone on other, less familiar, such sets.
We close by somehow vindicating the original intuition that continuous functions have to be differentiable somewhere, that is, at least at some points of their domain. A stronger notion of continuity is, however, needed.

Theorem 1258 (Rademacher) A locally Lipschitz continuous function f : (a, b) → R is somewhere differentiable.

We omit the proof of this result, a version of a much more general theorem due to Hans Rademacher.¹³ To appreciate its scope, recall that concave and convex functions defined on open convex sets are locally Lipschitz continuous (Theorem 904).

¹³ He actually qualified the "somewhere" by showing that the set of points at which the function is not differentiable is small in a well-defined sense (it has "measure zero", as readers will learn in more advanced courses).
Chapter 27

Differential calculus in several variables

27.1 Partial derivatives

27.1.1 The notion

Our study of differential calculus has so far focused on functions of a single variable. Its extension to functions of several variables is a fundamental, but subtle, topic. We can begin, however, with a simple notion of differentiation in Rⁿ: partial differentiation. Let us start with the two-dimensional case. Consider the origin x = (0, 0) in the plane. There are, intuitively, two main directions along which to approach the origin: the horizontal one, that is, moving along the horizontal axis, and the vertical one, that is, moving along the vertical axis.

[Figure: the horizontal and vertical directions through the origin O of the plane.]

As we can approach the origin along the two main directions, vertical and horizontal, the
same can be done for any point x in the plane.


[Figure: the two main directions through a generic point x = (x₁, x₂) of the plane.]

To formalize this intuition, let us consider the two versors e¹ = (1, 0) and e² = (0, 1) in R². For every x = (x₁, x₂) ∈ R² and every scalar h ∈ R, we have

$$x + h e^1 = (x_1, x_2) + (h, 0) = (x_1 + h, x_2)$$

Graphically:

[Figure: the point x + he¹ lies on the horizontal line through x, with first coordinate x₁ + h.]
The set

$$\{ x + h e^1 : h \in \mathbb{R} \}$$

is, therefore, formed by the vectors of R² with the same second coordinate as x, but with a different first coordinate.

[Figure: the set {x + he¹ : h ∈ R} is the horizontal straight line through x.]

Graphically, it is the horizontal straight line that passes through the point x. For example, if x is the origin (0, 0), the set

$$\{ x + h e^1 : h \in \mathbb{R} \} = \{ (h, 0) : h \in \mathbb{R} \}$$

is the horizontal axis. Similarly, for every scalar h ∈ R we have

$$x + h e^2 = (x_1, x_2) + (0, h) = (x_1, x_2 + h)$$

Graphically:

[Figure: the point x + he² lies on the vertical line through x, with second coordinate x₂ + h.]

In this case the set {x + he² : h ∈ R} is formed by the vectors of R² with the same first coordinate as x, but with a different second coordinate.

1 2
{ x + he , h ∈ ℜ }
0.8

0.6
x x
2
0.4

0.2

-0.2

-0.4

-0.6

O x
-0.8 1

-1
-1 -0.5 0 0.5 1

Graphically, it is the vertical straight line that passes through the point x. When x is the origin (0, 0), the set {x + he² : h ∈ R} is the vertical axis.

The partial derivative ∂f/∂x₁(x) of a function f : R² → R at a point x ∈ R² considers the effect on f of infinitesimal variations along the horizontal straight line {x + he¹ : h ∈ R}, while the partial derivative ∂f/∂x₂(x) considers the effect on f of infinitesimal variations along the vertical straight line {x + he² : h ∈ R}. In other words, we study the function f at x by moving along the two basic directions parallel to the Cartesian axes. In particular, we define the partial derivatives at x as the limits¹

$$\frac{\partial f}{\partial x_1}(x) = \lim_{h \to 0} \frac{f(x + h e^1) - f(x)}{h} = \lim_{h \to 0} \frac{f(x_1 + h, x_2) - f(x_1, x_2)}{h} \qquad (27.1)$$

$$\frac{\partial f}{\partial x_2}(x) = \lim_{h \to 0} \frac{f(x + h e^2) - f(x)}{h} = \lim_{h \to 0} \frac{f(x_1, x_2 + h) - f(x_1, x_2)}{h} \qquad (27.2)$$

when they exist finite.
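A numerical taste of these definitions: a minimal Python sketch (the step h = 10⁻⁶ and the function x₁x₂² are arbitrary illustrative choices) approximates (27.1) and (27.2) by their difference quotients.

    # A minimal numerical sketch: approximating the partial derivatives
    # (27.1)-(27.2) by forward difference quotients.
    def partials(f, x1, x2, h=1e-6):
        d1 = (f(x1 + h, x2) - f(x1, x2)) / h
        d2 = (f(x1, x2 + h) - f(x1, x2)) / h
        return d1, d2

    f = lambda x1, x2: x1 * x2**2
    print(partials(f, 1.0, 2.0))   # close to (4, 4), since grad f = (x2^2, 2 x1 x2)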

Though key for understanding the meaning of partial derivatives, (27.1) and (27.2) are less useful to compute them. To this end, for a fixed x ∈ R² we introduce two auxiliary scalar functions, called projections, φ₁, φ₂ : R → R defined by

$$\varphi_1(t) = f(t, x_2), \qquad \varphi_2(t) = f(x_1, t)$$

Note that φᵢ is a function of only the i-th variable, denoted by t, while the other variable is kept constant. It is immediate to see that for the partial derivatives ∂f/∂xᵢ at the point x ∈ R² we have

$$\frac{\partial f}{\partial x_1}(x) = \lim_{h \to 0} \frac{\varphi_1(x_1 + h) - \varphi_1(x_1)}{h} = \varphi_1'(x_1) \qquad (27.3)$$

$$\frac{\partial f}{\partial x_2}(x) = \lim_{h \to 0} \frac{\varphi_2(x_2 + h) - \varphi_2(x_2)}{h} = \varphi_2'(x_2) \qquad (27.4)$$

¹ The symbol ∂, a stylized d, takes the place of d to stress that we are not dealing with functions of a single variable.

The partial derivative ∂f/∂xᵢ is nothing but the ordinary derivative φᵢ' of the scalar function φᵢ calculated at t = xᵢ, with i = 1, 2. Thus, using the auxiliary functions φᵢ we go back to the differentiation of scalar functions studied in the last chapter. Formulas (27.3) and (27.4) are very useful for the computation of partial derivatives, which is thus reduced to the computation of standard derivatives of scalar functions.

Example 1259 (i) Let f : R² → R be given by f(x₁, x₂) = x₁x₂. Let us compute the partial derivatives of f at x = (1, −1). We have

$$\varphi_1(t) = f(t, -1) = -t, \qquad \varphi_2(t) = f(1, t) = t$$

Therefore, at the point t = 1 we have φ₁'(1) = −1 and at the point t = −1 we have φ₂'(−1) = 1, which implies

$$\frac{\partial f}{\partial x_1}(1, -1) = \varphi_1'(1) = -1, \qquad \frac{\partial f}{\partial x_2}(1, -1) = \varphi_2'(-1) = 1$$

More generally, at any point x ∈ R² we have

$$\varphi_1(t) = t x_2, \qquad \varphi_2(t) = x_1 t$$

Therefore, their derivatives at the point x are φ₁'(x₁) = x₂ and φ₂'(x₂) = x₁. Hence,

$$\frac{\partial f}{\partial x_1}(x) = \varphi_1'(x_1) = x_2, \qquad \frac{\partial f}{\partial x_2}(x) = \varphi_2'(x_2) = x_1$$

(ii) Let f : R² → R be given by f(x₁, x₂) = x₁²x₂. Let us compute the partial derivatives of f at x = (1, 2). We have

$$\varphi_1(t) = f(t, 2) = 2t^2, \qquad \varphi_2(t) = f(1, t) = t$$

Therefore, at the point t = 1 we have φ₁'(1) = 4 and at the point t = 2 we have φ₂'(2) = 1. Hence,

$$\frac{\partial f}{\partial x_1}(1, 2) = \varphi_1'(1) = 4, \qquad \frac{\partial f}{\partial x_2}(1, 2) = \varphi_2'(2) = 1$$

Again, more generally, at any point x ∈ R² we have

$$\varphi_1(t) = t^2 x_2, \qquad \varphi_2(t) = x_1^2 t$$

Therefore, their derivatives at the point x are φ₁'(x₁) = 2x₁x₂ and φ₂'(x₂) = x₁², so

$$\frac{\partial f}{\partial x_1}(x) = \varphi_1'(x_1) = 2 x_1 x_2, \qquad \frac{\partial f}{\partial x_2}(x) = \varphi_2'(x_2) = x_1^2$$

N

Thus, to calculate ∂f/∂x₁(x) we considered f as a function of the single variable x₁, keeping constant the other variable x₂, and we calculated its standard derivative at x₁. This is what, implicitly, the projection φ₁ did. Similarly, to calculate ∂f/∂x₂(x) through the projection φ₂ amounts to considering f as a function of the single variable x₂, keeping constant the other variable x₁, and calculating its standard derivative at x₂. Once all this has been understood, we can skip a step and no longer mention projections explicitly. The calculation of partial derivatives then essentially reduces to that of standard derivatives.

Example 1260 Let f : R × R₊₊ → R be given by f(x₁, x₂) = x₁ log x₂. Let us calculate the partial derivatives at x ∈ R × R₊₊. We start with ∂f/∂x₁(x). If we consider f as a function of the single variable x₁, its derivative is log x₂. Therefore,

$$\frac{\partial f}{\partial x_1}(x) = \log x_2$$

On the other hand, φ₁(t) = t log x₂, and therefore at the point t = x₁ we have φ₁'(x₁) = log x₂. Let us move to ∂f/∂x₂(x). If we consider f as a function of the single variable x₂, its derivative is x₁/x₂. Therefore,

$$\frac{\partial f}{\partial x_2}(x) = \frac{x_1}{x_2}$$

N
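The projection device of this example can be mimicked symbolically; a minimal sympy sketch (an illustrative aside, not part of the text) carries out exactly the reduction to scalar derivatives.

    # A minimal sketch: the projections of Example 1260 in sympy,
    # reducing partial derivatives to ordinary scalar derivatives.
    import sympy as sp

    t, x1, x2 = sp.symbols('t x1 x2', positive=True)
    f = lambda a, b: a * sp.log(b)

    phi1 = f(t, x2)                      # x2 kept constant
    phi2 = f(x1, t)                      # x1 kept constant
    print(sp.diff(phi1, t))              # log(x2)  = df/dx1
    print(sp.diff(phi2, t).subs(t, x2))  # x1/x2    = df/dx2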
O.R. Geometrically, at a point (x̄₁, x̄₂) the projection φ₁(t) = f(t, x̄₂) is obtained by sectioning the surface that represents f with the vertical plane of equation x₂ = x̄₂, while the projection φ₂(t) = f(x̄₁, t) is obtained by sectioning the same surface with the vertical plane (perpendicular to the previous one) of equation x₁ = x̄₁. Therefore, as with a panettone, the surface is cut with two planes perpendicular to one another: the projections are nothing but the shapes of the two slices and, as such, scalar functions (whose graph lies on the plane with which we cut the surface).

The partial derivatives at (x̄₁, x̄₂) are therefore simply the slopes of the two projections at this point. H

The notion of partial derivative extends in a natural way to functions of n variables by considering the versors e¹ = (1, 0, ..., 0), e² = (0, 1, ..., 0), ..., eⁿ = (0, 0, ..., 1) of Rⁿ. Throughout the chapter we consider functions f : U → R defined (at least) on an open set U in Rⁿ.

Definition 1261 A function f : U → R is said to be partially derivable at a point x ∈ U if, for each i = 1, 2, ..., n, the limits

$$\lim_{h \to 0} \frac{f(x + h e^i) - f(x)}{h} \qquad (27.5)$$

exist and are finite. These limits are called the partial derivatives of f at x.

The limit (27.5) is the i-th partial derivative of f at x, denoted by either f'ₓᵢ(x) or

$$\frac{\partial f}{\partial x_i}(x)$$

Often, it is actually convenient to write

$$\frac{\partial f(x)}{\partial x_i}$$

The choice among these alternatives will be just a matter of convenience. The vector

$$\left( \frac{\partial f}{\partial x_1}(x), \frac{\partial f}{\partial x_2}(x), \dots, \frac{\partial f}{\partial x_n}(x) \right) \in \mathbb{R}^n$$

of the partial derivatives of f at x is called the gradient of f at x, denoted by ∇f(x) or, simply, by f'.²
When f is partially derivable at all the points of a subset E of U, for brevity we say that f is partially derivable on E. When f is partially derivable at all the points of its domain, it is called partially derivable, without further specification.
Clearly, partial derivability reduces to standard derivability when f is a scalar function.

Also in the general case of n independent variables, to calculate the partial derivatives at a point x one can introduce the projections φᵢ defined by

$$\varphi_i(t) = f(x_1, \dots, x_{i-1}, t, x_{i+1}, \dots, x_n) \qquad \forall i = 1, 2, \dots, n$$

Using the scalar function φᵢ, we have

$$\frac{\partial f(x)}{\partial x_i} = \lim_{h \to 0} \frac{\varphi_i(x_i + h) - \varphi_i(x_i)}{h} = \varphi_i'(x_i) \qquad \forall i = 1, 2, \dots, n$$

which generalizes to Rⁿ formulas (27.3) and (27.4), reducing in this case, too, the calculation of partial derivatives to that of standard derivatives of scalar functions.

Example 1262 Let f : R⁴ → R be defined by f(x₁, x₂, x₃, x₄) = x₁ + e^{x₂x₃} + 2x₄². At each point x ∈ R⁴ we have

$$\varphi_1(t) = t + e^{x_2 x_3} + 2x_4^2, \qquad \varphi_2(t) = x_1 + e^{t x_3} + 2x_4^2$$

$$\varphi_3(t) = x_1 + e^{x_2 t} + 2x_4^2, \qquad \varphi_4(t) = x_1 + e^{x_2 x_3} + 2t^2$$

and therefore

$$\varphi_1'(t) = 1, \qquad \varphi_2'(t) = x_3 e^{t x_3}$$

$$\varphi_3'(t) = x_2 e^{x_2 t}, \qquad \varphi_4'(t) = 4t$$

² The symbol ∇ is called nabla.

Hence

$$\frac{\partial f}{\partial x_1}(x) = \varphi_1'(x_1) = 1, \qquad \frac{\partial f}{\partial x_2}(x) = \varphi_2'(x_2) = x_3 e^{x_2 x_3}$$

$$\frac{\partial f}{\partial x_3}(x) = \varphi_3'(x_3) = x_2 e^{x_2 x_3}, \qquad \frac{\partial f}{\partial x_4}(x) = \varphi_4'(x_4) = 4x_4$$

By putting them together, we have the gradient

$$\nabla f(x) = \left( 1, \, x_3 e^{x_2 x_3}, \, x_2 e^{x_2 x_3}, \, 4x_4 \right)$$

N

As in the special case n = 2, also in the general case to calculate the partial derivative ∂f(x)/∂xᵢ through the projection φᵢ amounts to considering f as a function of the single variable xᵢ, keeping constant the other n − 1 variables. We then calculate the ordinary derivative at xᵢ of this scalar function. In other words, we study the incremental behavior of f with respect to variations of xᵢ only, by keeping constant the other variables.

27.1.2 A continuity failure

The following example shows that for functions of several variables, with n ≥ 2, the existence of partial derivatives does not imply continuity, contrary to the scalar case n = 1.

Example 1263 The function f : R² → R defined by

$$f(x_1, x_2) = \begin{cases} 0 & \text{if } x_1 x_2 = 0 \\ 1 & \text{if } x_1 x_2 \ne 0 \end{cases}$$

is partially derivable at the origin, but is discontinuous there. Intuitively, this happens because the function is 0 on the axes and 1 off the axes. Formally, fix any 0 < ε < 1. Consider the points of the straight line x₂ = x₁ different from the origin, that is, the set of the points (t, t) with t ≠ 0.³ We have f(t, t) = 1 and each neighborhood B(0, 0) of the origin contains (infinitely many) such points. Therefore,

$$|f(t, t) - f(0, 0)| = |1 - 0| = 1 > \varepsilon \qquad \forall t \ne 0$$

Hence, for every 0 < ε < 1 there is no neighborhood B(0, 0) such that

$$|f(x) - f(0, 0)| < \varepsilon \qquad \forall x \in B(0, 0)$$

This shows that f is not continuous at (0, 0).
Let us now consider the partial derivatives of f at (0, 0). We have

$$\frac{\partial f}{\partial x_1}(0, 0) = \lim_{h \to 0} \frac{f(h, 0) - f(0, 0)}{h} = \lim_{h \to 0} \frac{0 - 0}{h} = 0$$

and

$$\frac{\partial f}{\partial x_2}(0, 0) = \lim_{h \to 0} \frac{f(0, h) - f(0, 0)}{h} = \lim_{h \to 0} \frac{0 - 0}{h} = 0$$

so that ∇f(0, 0) = (0, 0). In conclusion, f is partially derivable at (0, 0) but is not continuous at (0, 0). N
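The discontinuity is easy to see numerically; a minimal sketch (illustrative only) evaluates f on the axes and on the diagonal at ever smaller scales.

    # A minimal numerical sketch of Example 1263: f vanishes on the axes
    # (so both partial derivatives at the origin are 0), yet f = 1 on the
    # diagonal, however close to the origin.
    f = lambda x1, x2: 0.0 if x1 * x2 == 0 else 1.0

    for t in (0.1, 0.001, 1e-9):
        print(f(t, 0.0), f(0.0, t), f(t, t))   # 0.0 0.0 1.0 at every scale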
As will be shortly seen in Section 27.2, in Rⁿ a notion of differentiability is required to guarantee both continuity and derivability.

³ We can actually choose any straight line passing through the origin, except the axes.

27.1.3 Derivative operator

The set D ⊆ U of the points of the domain where a function f : U → R is partially derivable is called, as in the scalar case (Section 26.4), the domain of (partial) derivability of f.
Since the gradient is a vector of Rⁿ, to extend to vector functions the notion of derivative function it is necessary to consider operators.

Definition 1264 Let f : U → R be a function with domain of derivability D ⊆ U. The operator

$$\nabla f = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right) : D \to \mathbb{R}^n \qquad (27.6)$$

that associates to every x ∈ D the gradient ∇f(x) is called the derivative operator.

Example 1265 Taking again Example 1262, let f : R4 ! R be given by f (x1 ; x2 ; x3 ; x4 ) =


x1 + ex2 x3 + 2x24 . It is easy to check that the derivative operator rf : R4 ! R4 is given by

rf (x) = (1; x3 ex2 x3 ; x2 ex2 x3 ; 4x4 )

As emphasized in (27.6), the operator rf : D ! Rn can be regarded (cf. Section 13.7) as


the n-tuple (@f =@x1 ; :::; @f =@xn ) of functions of several variables, i.e., its partial derivatives
@f =@xi : D Rn ! R.

Example 1266 The partial derivatives

$$\frac{\partial f}{\partial x_1}(x) = x_2 x_3, \qquad \frac{\partial f}{\partial x_2}(x) = x_1 x_3, \qquad \frac{\partial f}{\partial x_3}(x) = x_1 x_2$$

of the function f(x₁, x₂, x₃) = x₁x₂x₃ are functions on all of R³. Together they form the derivative operator

$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}(x), \frac{\partial f}{\partial x_2}(x), \frac{\partial f}{\partial x_3}(x) \right) = (x_2 x_3, \, x_1 x_3, \, x_1 x_2)$$

of f. N

27.1.4 Ceteris paribus: marginal analysis

Partial derivability embodies a ceteris paribus approach, a methodological principle that studies the effect of a single explanatory variable by keeping fixed the other ones, so as not to confound matters. It informs much of economic analysis, in particular the all-important marginal analysis in which partial derivatives play, indeed, a fundamental role. Here we consider two classic examples.

Production Let f : A ⊆ Rⁿ₊ → R₊ be a production function which specifies that the producer is able to transform a vector x ∈ Rⁿ₊ of inputs into the quantity f(x) of output. The partial derivative

$$\frac{\partial f(x)}{\partial x_i} \qquad (27.7)$$

quantifies the variation in the output produced that the producer obtains for infinitesimal variations of the i-th input, when the values of the other inputs are kept fixed.
In other words, the partial derivative (27.7) isolates the effect on the output caused by variations in the i-th input, ceteris paribus, that is, by keeping fixed the quantities of the other inputs. The partial derivative (27.7) is called the marginal product of input i, with i = 1, 2, ..., n, and plays a key role in the production decisions of producers.

Utility Let u : A ⊆ Rⁿ → R be a utility function. If we assume that u has a cardinal interpretation, i.e., that u(x) quantifies the pleasure obtained by consuming the bundle x, then the difference

$$u(x + h e^i) - u(x) \qquad (27.8)$$

indicates the variation in pleasure that the consumer experiences when one varies the quantity consumed of the good i in the bundle x, ceteris paribus, that is, when the quantities consumed of the other goods are kept fixed. It follows that the partial derivative

$$\frac{\partial u(x)}{\partial x_i} \qquad (27.9)$$

quantifies the variation in pleasure that the consumer enjoys for infinitesimal variations of the good i, the quantities consumed of the other goods being fixed. It is called the marginal utility of the good i in the bundle x and it is central in the cardinalist vision of consumer theory.
In the ordinalist approach, instead, marginal utilities are no longer meaningful because the differences (27.8) have no meaning. It is easy to construct examples in which, at two bundles x and y, we have

$$u(x + h e^i) - u(x) > u(y + h e^i) - u(y) \quad \text{and} \quad (g \circ u)(x + h e^i) - (g \circ u)(x) < (g \circ u)(y + h e^i) - (g \circ u)(y)$$

with g : R → R strictly increasing. Since u and g ∘ u are utility functions that are equivalent from the ordinal point of view, this shows that the differences (27.8) per se have no meaning. For this reason, ordinalist consumer theory uses marginal rates of substitution and not marginal utilities, as we will see in Section 34.3.2. Nevertheless, marginal utility remains a notion commonly used in economics because of its intuitive appeal.

27.2 Differential

The notion of differential introduced in Definition 1241 naturally extends to functions of several variables.

Definition 1267 A function f : U → R is said to be differentiable at a point x ∈ U if there exists a linear function l : Rⁿ → R such that

$$f(x + h) = f(x) + l(h) + o(\|h\|) \quad \text{as } \|h\| \to 0 \qquad (27.10)$$



for every h ∈ Rⁿ such that x + h ∈ U.⁴

The linear function l is called the differential of f at x, denoted by df(x) : Rⁿ → R. The differential is the linear approximation at the point x of the variation f(x + h) − f(x), with error of magnitude o(‖h‖), that is,⁵

$$f(x + h) - f(x) = df(x)(h) + o(\|h\|) \qquad (27.11)$$

i.e.,

$$\lim_{h \to 0} \frac{f(x + h) - f(x) - df(x)(h)}{\|h\|} = \lim_{h \to 0} \frac{o(\|h\|)}{\|h\|} = 0$$

By Riesz's Theorem, the linear function df(x) : Rⁿ → R has the representation

$$df(x)(h) = \alpha \cdot h$$

for a suitable vector α ∈ Rⁿ. The next important theorem identifies such a vector and shows that differentiability guarantees both continuity and partial derivability.

Theorem 1268 If f : U → R is differentiable at x ∈ U, then it is both continuous and partially derivable at that point, with

$$df(x)(h) = \nabla f(x) \cdot h = \sum_{i=1}^{n} \frac{\partial f(x)}{\partial x_i} h_i \qquad (27.12)$$

for every h = (h₁, ..., hₙ) ∈ Rⁿ.

Approximation (27.11) thus takes the remarkable form

$$f(x + h) = f(x) + \nabla f(x) \cdot h + o(\|h\|) \qquad (27.13)$$

When f is scalar we find again the classic expression

$$df(x)(h) = f'(x) h \qquad \forall h \in \mathbb{R}$$

of the differential in the scalar case (Theorem 1243). In this case, approximation (27.13) thus reduces to the scalar one (26.26).

Proof Let f : U → R be differentiable at x ∈ U. By (27.10), we can write

$$\lim_{h \to 0} f(x + h) = \lim_{h \to 0} \left( f(x) + l(h) + o(\|h\|) \right) = \lim_{h \to 0} f(x) + \lim_{h \to 0} l(h) + \lim_{h \to 0} o(\|h\|) \qquad (27.14)$$

But:

⁴ In the scalar case the clause "for every h ∈ Rⁿ such that x₀ + h ∈ U" reduces to the clause "for every h ∈ (a − x₀, b − x₀)" of Definition 1241.
⁵ As in the scalar case, note that in df(x)(h) the argument of the differential df(x) : Rⁿ → R is h. In other words, df(x) is a function of the variable h, while x denotes the specific point at which the differential approximates the function f.

(i) lim_{h→0} l(h) = l(0) = 0, since linear functions l : Rⁿ → R are continuous (Theorem 646);
(ii) by the definition of little-o, lim_{h→0} o(‖h‖) = 0.

Therefore, (27.14) implies lim_{h→0} f(x + h) = f(x), so the function is continuous at x.
To show the existence of partial derivatives at x, let us consider the case n = 2 (the general case does not present novelties, except of notation). In this case, (27.10) implies the existence of α = (α₁, α₂) ∈ R² such that

$$\lim_{(h_1, h_2) \to (0,0)} \frac{f(x_1 + h_1, x_2 + h_2) - f(x_1, x_2) - \alpha_1 h_1 - \alpha_2 h_2}{\sqrt{h_1^2 + h_2^2}} = 0 \qquad (27.15)$$

Setting h₂ = 0 in (27.15), we have

$$0 = \lim_{h_1 \to 0} \frac{f(x_1 + h_1, x_2) - f(x_1, x_2) - \alpha_1 h_1}{|h_1|} = \lim_{h_1 \to 0} \left| \frac{f(x_1 + h_1, x_2) - f(x_1, x_2)}{h_1} - \alpha_1 \right|$$

and therefore

$$\alpha_1 = \lim_{h_1 \to 0} \frac{f(x_1 + h_1, x_2) - f(x_1, x_2)}{h_1} = \frac{\partial f(x_1, x_2)}{\partial x_1}$$

In a similar way it is possible to prove that α₂ = ∂f(x₁, x₂)/∂x₂, that is, ∇f(x₁, x₂) = α. In conclusion, both partial derivatives exist, so the function f is partially derivable, with

$$df(x_1, x_2)(h_1, h_2) = \nabla f(x_1, x_2) \cdot (h_1, h_2)$$

This proves (27.12). □

To illustrate approximation (27.13), and so the strength of differentiability, in the next example we report a curious fact about partial derivatives that differentiability entails.

Example 1269 If f : U ⊆ R² → R is differentiable at x ∈ U, then

$$\lim_{h \to 0} \frac{f(x_1 + h, x_2 + h) - f(x_1, x_2 + h)}{h} = \frac{\partial f(x)}{\partial x_1}$$

Indeed, by (27.13) we have

$$f(x_1, x_2 + h) = f(x_1, x_2) + \frac{\partial f(x)}{\partial x_2} h + o(h)$$

as well as

$$f(x_1 + h, x_2 + h) = f(x_1, x_2) + \frac{\partial f(x)}{\partial x_1} h + \frac{\partial f(x)}{\partial x_2} h + o(h)$$

because o(√(h² + h²)) = o(√2 |h|) = o(h). Hence,

$$\lim_{h \to 0} \frac{f(x_1 + h, x_2 + h) - f(x_1, x_2 + h)}{h} = \lim_{h \to 0} \frac{\frac{\partial f(x)}{\partial x_1} h + o(h)}{h} = \frac{\partial f(x)}{\partial x_1}$$

as desired. A similar argument holds for x₂. N

Denoting by x₀ the point at hand and setting x = x₀ + h, expression (27.12) can be rewritten as

$$df(x_0)(x - x_0) = \nabla f(x_0) \cdot (x - x_0)$$

So, the affine function r : Rⁿ → R defined by

$$r(x) = f(x_0) + \nabla f(x_0) \cdot (x - x_0) \qquad (27.16)$$

generalizes the tangent line (26.29). The approximation (27.10) assumes the form f(x) = r(x) + o(‖x − x₀‖), that is,

$$f(x) = f(x_0) + \nabla f(x_0) \cdot (x - x_0) + o(\|x - x_0\|)$$

This vector form generalizes the scalar one (26.26).

In the special case n = 2, the affine function (27.16) that best approximates a function f : U ⊆ R² → R at a point x₀ = (x₀₁, x₀₂) ∈ U takes the form⁶

$$r(x_1, x_2) = f(x_{01}, x_{02}) + \frac{\partial f(x_0)}{\partial x_1} (x_1 - x_{01}) + \frac{\partial f(x_0)}{\partial x_2} (x_2 - x_{02})$$

It is called the tangent plane to f at the point x₀ = (x₀₁, x₀₂). Graphically:

[Figure: the tangent plane to the graph of f at x₀, in (x₁, x₂, x₃)-space.]

For n ≥ 3, the affine function (27.16) that best approximates a function in the neighborhood of a point x₀ of its domain is called tangent hyperplane. For obvious reasons, it cannot be visualized graphically.
We close with a piece of terminology. When f is differentiable at all the points of a subset E of U, for brevity we say that f is differentiable on E. When f is differentiable at all the points of its domain, it is called differentiable, without further specification.

⁶ Here x₀₁ and x₀₂ denote the components of the vector x₀.

27.2.1 Differentiability and partial derivability

Partial derivability does not imply continuity when n ≥ 2 (Example 1263). In view of the last theorem, partial derivability then does not imply differentiability, again unlike the scalar case n = 1. The next example illustrates this failure.
Example 1270 Let f : R² → R be given by f(x₁, x₂) = √|x₁x₂|. In this case, we have

$$\frac{\partial f}{\partial x_1}(0, 0) = \lim_{h \to 0} \frac{f(h, 0) - f(0, 0)}{h} = \lim_{h \to 0} \frac{0 - 0}{h} = 0$$

and

$$\frac{\partial f}{\partial x_2}(0, 0) = \lim_{k \to 0} \frac{f(0, k) - f(0, 0)}{k} = \lim_{k \to 0} \frac{0 - 0}{k} = 0$$

Therefore, f has partial derivatives at (0, 0) and ∇f(0, 0) = (0, 0). On the other hand, f is not differentiable at (0, 0). Indeed, by contradiction, assume that f is differentiable at (0, 0). In this case, we would have

$$f(h, k) = f(0, 0) + \nabla f(0, 0) \cdot (h, k) + o\left( \sqrt{h^2 + k^2} \right)$$

Since f(0, 0) = 0 and ∇f(0, 0) = (0, 0), we could conclude that f(h, k) = o(√(h² + k²)). By using the definition of little-o and this fact, we would obtain

$$0 = \lim_{(h,k) \to (0,0)} \frac{o\left( \sqrt{h^2 + k^2} \right)}{\sqrt{h^2 + k^2}} = \lim_{(h,k) \to (0,0)} \frac{f(h, k)}{\sqrt{h^2 + k^2}} = \lim_{(h,k) \to (0,0)} \sqrt{\frac{|hk|}{h^2 + k^2}}$$

But this is not possible. Indeed, if for example we consider the points on the straight line x₂ = x₁, that is, of the form (t, t), we get

$$\sqrt{\frac{|hk|}{h^2 + k^2}} = \sqrt{\frac{t^2}{t^2 + t^2}} = \sqrt{\frac{1}{2}} \qquad \forall t \ne 0$$

This shows that f is not differentiable at (0, 0),⁷ even if it has partial derivatives at (0, 0). N
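The obstruction along the diagonal can also be seen numerically; a minimal sketch (illustrative only) shows the quotient stuck at 1/√2.

    # A minimal numerical sketch of Example 1270: along the diagonal the
    # quotient f(h,h)/||(h,h)|| stays at 1/sqrt(2), so f(h,k) is not
    # o(||(h,k)||) at the origin.
    import math

    f = lambda x1, x2: math.sqrt(abs(x1 * x2))
    for t in (0.1, 0.001, 1e-6):
        print(f(t, t) / math.hypot(t, t))   # always 0.7071... = 1/sqrt(2)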

Summing up:

- differentiability implies partial derivability (Theorem 1268), but not vice versa when n ≥ 2 (Example 1270);
- differentiability implies continuity (Theorem 1268);
- partial derivability does not imply continuity when n ≥ 2 (Example 1263).

It is natural to ask which additional hypotheses are required for partial derivability to imply differentiability (so, continuity). The answer is given by the next remarkable result that extends Theorem 1243 to the vector case by showing that, under a simple regularity hypothesis (the continuity of partial derivatives), a partially derivable function is also differentiable (so, continuous).
⁷ For the more demanding reader: note that each neighbourhood of the origin contains points of the type (t, t) with t ≠ 0. For such points we have √(|hk|/(h² + k²)) = 1/√2. Therefore, for 0 < ε < 1/√2 there is no neighbourhood of the origin such that, for all its points (h, k) ≠ (0, 0), we have |√(|hk|/(h² + k²)) − 0| < ε.

Theorem 1271 Let f : U → R be partially derivable. If the partial derivatives are continuous, then f is differentiable.

Proof⁸ For simplicity of notation, we consider the case in which n = 2, the function f is defined on the entire plane R², and the partial derivatives ∂f/∂x₁ and ∂f/∂x₂ exist on R². Apart from more complicated notation, the general case can be proved in a similar way.
Therefore, let f : R² → R and x ∈ R². Assume that ∂f/∂x₁ and ∂f/∂x₂ are both continuous at x. By adding and subtracting f(x₁ + h₁, x₂), for each h ∈ R² we have:

$$f(x + h) - f(x) = \left[ f(x_1 + h_1, x_2) - f(x_1, x_2) \right] + \left[ f(x_1 + h_1, x_2 + h_2) - f(x_1 + h_1, x_2) \right] \qquad (27.17)$$

The partial derivative ∂f/∂x₁(x) is the derivative of the function ψ₁ : R → R defined by ψ₁(x₁) = f(x₁, x₂), in which x₂ is considered as a constant. By the Mean Value Theorem,⁹ there exists z₁ ∈ (x₁, x₁ + h₁) ⊆ R such that

$$\psi_1'(z_1) = \frac{\psi_1(x_1 + h_1) - \psi_1(x_1)}{h_1} = \frac{f(x_1 + h_1, x_2) - f(x_1, x_2)}{h_1}$$

Similarly, the partial derivative ∂f/∂x₂(x + h) is the derivative of the function ψ₂ : R → R defined by ψ₂(x₂) = f(x₁ + h₁, x₂), in which x₁ + h₁ is considered as a constant. Again by the Mean Value Theorem, there exists z₂ ∈ (x₂, x₂ + h₂) ⊆ R such that

$$\psi_2'(z_2) = \frac{\psi_2(x_2 + h_2) - \psi_2(x_2)}{h_2} = \frac{f(x_1 + h_1, x_2 + h_2) - f(x_1 + h_1, x_2)}{h_2}$$

Since by construction ∂f/∂x₁(z₁, x₂) = ψ₁'(z₁) and ∂f/∂x₂(x₁ + h₁, z₂) = ψ₂'(z₂), we can rewrite (27.17) as:

$$f(x + h) - f(x) = \frac{\partial f}{\partial x_1}(z_1, x_2) \, h_1 + \frac{\partial f}{\partial x_2}(x_1 + h_1, z_2) \, h_2$$

⁸ Since this proof uses the Mean Value Theorem for scalar functions that will be presented in the next chapter, it is best understood after learning that result. The same remark applies to the proof of Schwarz's Theorem.
⁹ The Mean Value Theorem for scalar functions will be studied in the next chapter.

On the other hand, by definition ∇f(x) · h = ∂f/∂x₁(x₁, x₂) h₁ + ∂f/∂x₂(x₁, x₂) h₂. Thus:

$$\begin{aligned}
&\lim_{h \to 0} \frac{|f(x + h) - f(x) - \nabla f(x) \cdot h|}{\|h\|} \\
&= \lim_{h \to 0} \frac{\left| \left[ \frac{\partial f}{\partial x_1}(z_1, x_2) - \frac{\partial f}{\partial x_1}(x_1, x_2) \right] h_1 + \left[ \frac{\partial f}{\partial x_2}(x_1 + h_1, z_2) - \frac{\partial f}{\partial x_2}(x_1, x_2) \right] h_2 \right|}{\|h\|} \\
&\le \lim_{h \to 0} \left| \frac{\partial f}{\partial x_1}(z_1, x_2) - \frac{\partial f}{\partial x_1}(x_1, x_2) \right| \frac{|h_1|}{\|h\|} + \lim_{h \to 0} \left| \frac{\partial f}{\partial x_2}(x_1 + h_1, z_2) - \frac{\partial f}{\partial x_2}(x_1, x_2) \right| \frac{|h_2|}{\|h\|} \\
&\le \lim_{h \to 0} \left| \frac{\partial f}{\partial x_1}(z_1, x_2) - \frac{\partial f}{\partial x_1}(x_1, x_2) \right| + \lim_{h \to 0} \left| \frac{\partial f}{\partial x_2}(x_1 + h_1, z_2) - \frac{\partial f}{\partial x_2}(x_1, x_2) \right|
\end{aligned}$$

where the last inequality holds because

$$0 \le \frac{|h_1|}{\|h\|} \le 1 \quad \text{and} \quad 0 \le \frac{|h_2|}{\|h\|} \le 1$$

On the other hand, since z₁ ∈ (x₁, x₁ + h₁) and z₂ ∈ (x₂, x₂ + h₂), we have z₁ → x₁ as h₁ → 0 and z₂ → x₂ as h₂ → 0. Therefore, ∂f/∂x₁ and ∂f/∂x₂ being both continuous at x, we have

$$\lim_{h \to 0} \frac{\partial f}{\partial x_1}(z_1, x_2) = \frac{\partial f}{\partial x_1}(x_1, x_2) \quad \text{and} \quad \lim_{h \to 0} \frac{\partial f}{\partial x_2}(x_1 + h_1, z_2) = \frac{\partial f}{\partial x_2}(x_1, x_2)$$

which implies

$$\lim_{h \to 0} \left| \frac{\partial f}{\partial x_1}(z_1, x_2) - \frac{\partial f}{\partial x_1}(x_1, x_2) \right| = \lim_{h \to 0} \left| \frac{\partial f}{\partial x_2}(x_1 + h_1, z_2) - \frac{\partial f}{\partial x_2}(x_1, x_2) \right| = 0$$

In conclusion, we have proved that

$$\lim_{h \to 0} \frac{|f(x + h) - f(x) - \nabla f(x) \cdot h|}{\|h\|} = 0$$

and the function f is thus differentiable at x. □

Example 1272 (i) Consider the function f : Rⁿ → R given by f(x) = ‖x‖². Its gradient is

$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x) \right) = (2x_1, \dots, 2x_n) = 2x \qquad \forall x \in \mathbb{R}^n$$

The partial derivatives are continuous on Rⁿ and therefore f is differentiable on Rⁿ. By (27.12), at each x ∈ Rⁿ we have

$$df(x)(h) = \nabla f(x) \cdot h \qquad \forall h \in \mathbb{R}^n$$

and

$$\|x + h\|^2 - \|x\|^2 = 2x \cdot h + o(\|h\|)$$

as ‖h‖ → 0.
(ii) Consider the function f : Rⁿ₊₊ → R given by f(x) = Σᵢ₌₁ⁿ log xᵢ. Its gradient is

$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x) \right) = \left( \frac{1}{x_1}, \dots, \frac{1}{x_n} \right) \qquad \forall x \in \mathbb{R}^n_{++}$$

The partial derivatives are continuous on Rⁿ₊₊ and therefore f is differentiable on Rⁿ₊₊. By (27.12), at each x ∈ Rⁿ₊₊ we have

$$df(x)(h) = \nabla f(x) \cdot h \qquad \forall h \in \mathbb{R}^n$$

so that, as ‖h‖ → 0,

$$\sum_{i=1}^{n} \log(x_i + h_i) - \sum_{i=1}^{n} \log x_i = \sum_{i=1}^{n} \frac{h_i}{x_i} + o(\|h\|)$$

N
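For f(x) = ‖x‖² the residual of the linear approximation can be computed exactly; a minimal sketch (the point x and the increments are arbitrary illustrative choices) shows the residual over ‖h‖ vanishing, as (27.13) predicts.

    # A minimal numerical sketch: for f(x) = ||x||^2 the residual
    # f(x+h) - f(x) - grad f(x).h equals ||h||^2, visibly o(||h||).
    import math

    x = [1.0, -2.0, 3.0]
    for s in (1e-1, 1e-2, 1e-3):
        h = [s, -s, 2*s]
        fx  = sum(v*v for v in x)
        fxh = sum((v + w)**2 for v, w in zip(x, h))
        lin = sum(2*v*w for v, w in zip(x, h))
        norm_h = math.sqrt(sum(w*w for w in h))
        print(norm_h, (fxh - fx - lin) / norm_h)   # the ratio equals ||h|| -> 0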

27.2.2 Total differential

In an imprecise, yet suggestive, way expression (27.12) is often written as

$$df = \frac{\partial f}{\partial x_1} dx_1 + \cdots + \frac{\partial f}{\partial x_n} dx_n \qquad (27.18)$$

This formula, called the total differential of f, shows how the overall effect df on f is decomposed into the sum of the effects that the infinitesimal variations dxᵢ of the individual variables have on f. The summands (∂f/∂xᵢ) dxᵢ are sometimes called partial differentials.
For example, if f : Rⁿ → R is a production function with n inputs, the total differential tells us that the overall variation df of the output is the result of the sum of the effects

$$\frac{\partial f}{\partial x_i} dx_i$$

that the infinitesimal variations dxᵢ of each input have on the production function. In a more economic language, the overall variation of the output df is given by the sum of the infinitesimal variations dxᵢ of the inputs, multiplied by their respective marginal products ∂f/∂xᵢ. The greater (in absolute value) the marginal product ∂f/∂xᵢ of input i, the greater the impact of its variation on output.
Similarly, if u : Rⁿ₊ → R is a utility function, the total differential takes the form

$$du = \frac{\partial u}{\partial x_1} dx_1 + \cdots + \frac{\partial u}{\partial x_n} dx_n$$

The overall variation du of utility decomposes into the sum of the effects

$$\frac{\partial u}{\partial x_i} dx_i$$

on the utility function of infinitesimal variations dxᵢ of the single goods that belong to the bundle x: the overall variation of utility du is the sum of the infinitesimal variations of the goods dxᵢ, multiplied by their respective marginal utilities ∂u/∂xᵢ.

Example 1273 Let u : Rⁿ₊₊ → R be the log-linear utility function u(x₁, ..., xₙ) = Σᵢ₌₁ⁿ aᵢ log xᵢ with aᵢ > 0 and Σᵢ₌₁ⁿ aᵢ = 1. Its total differential is

$$du = \frac{a_1}{x_1} dx_1 + \cdots + \frac{a_n}{x_n} dx_n$$

The impact of each infinitesimal variation dxᵢ on the overall variation of utility du is determined by the coefficient aᵢ/xᵢ. N

However evocative, one should not forget that the total differential (27.18) is only a heuristic version of the differential df(x), which is the rigorous notion.¹⁰

27.2.3 Chain rule

One of the most useful formulas of differential calculus for scalar functions is the chain rule (f ∘ g)'(x) = f'(g(x)) g'(x) for composite functions f ∘ g. This rule generalizes to functions of several variables as follows (we omit the proof as later we will prove a more general chain rule).

Theorem 1274 (Chain rule) Let g : U ⊆ Rⁿ → R and f : B ⊆ R → R with Im g ⊆ B. If g is differentiable at x ∈ U and if f is differentiable at g(x), then the composition f ∘ g : U ⊆ Rⁿ → R is differentiable at x, with

$$\nabla (f \circ g)(x) = f'(g(x)) \nabla g(x) = \left( f'(g(x)) \frac{\partial g(x)}{\partial x_1}, \dots, f'(g(x)) \frac{\partial g(x)}{\partial x_n} \right)$$

In the scalar case n = 1, we get back the classic rule (f ∘ g)'(x) = f'(g(x)) g'(x). Moreover, by Theorem 1268 the differential of the composition f ∘ g is:

$$d(f \circ g)(x)(h) = f'(g(x)) \sum_{i=1}^{n} \frac{\partial g(x)}{\partial x_i} h_i \qquad (27.19)$$

The total differential form of (27.19) reads

$$d(f \circ g) = \frac{df}{dg} \frac{\partial g}{\partial x_1} dx_1 + \cdots + \frac{df}{dg} \frac{\partial g}{\partial x_n} dx_n \qquad (27.20)$$

The variation of f ∘ g can be decomposed according to the different infinitesimal variations dxᵢ, each of which induces the variation (∂g/∂xᵢ) dxᵢ on g, which in turn causes a variation df/dg on f. Summing these partial effects we get the overall variation d(f ∘ g).

¹⁰ As we already remarked a few times, heuristics plays an important role in the quest for new results (of a "vanguard of heuristic efforts towards the new" wrote Carlo Emilio Gadda). The rigorous verification of the results so obtained is, however, key; only a few outstanding mathematicians, dear to the gods, can rely on intuition without caring too much about rigor. Yet, one of them, the great Archimedes, writes in his Method: "... certain things became clear to me by a mechanical method, although they had to be demonstrated by geometry afterwards because their investigation by the said method did not furnish an actual demonstration." (Trans. Heath).

Example 1275 (i) Let f : R → R be given by f(x) = e^{2x} and let g : R² → R be given by g(x) = x₁x₂². Let us calculate with the chain rule the differential of the composite function f ∘ g : R² → R given by

$$(f \circ g)(x) = e^{2 x_1 x_2^2}$$

We have

$$\nabla (f \circ g)(x) = \left( 2 x_2^2 e^{2 x_1 x_2^2}, \, 4 x_1 x_2 e^{2 x_1 x_2^2} \right)$$

and therefore

$$d(f \circ g)(x)(h) = 2 e^{2 x_1 x_2^2} \left( x_2^2 h_1 + 2 x_1 x_2 h_2 \right)$$

for every h ∈ R². The total differential is

$$d(f \circ g) = 2 e^{2 x_1 x_2^2} \left( x_2^2 \, dx_1 + 2 x_1 x_2 \, dx_2 \right)$$

(ii) Let f : (0, ∞) → R be given by f(x) = log x and let g : R²₊₊ ∪ R²₋₋ → R be given by g(x₁, x₂) = √(x₁x₂). Here the function g must be restricted to R²₊₊ ∪ R²₋₋ to satisfy the condition Im g ⊆ (0, ∞). Let us calculate with the chain rule the differential of the composite function f ∘ g : R²₊₊ ∪ R²₋₋ → R given by

$$(f \circ g)(x) = \log \sqrt{x_1 x_2}$$

We have

$$\frac{\partial g(x)}{\partial x_1} = \frac{1}{2} \sqrt{\frac{x_2}{x_1}} \quad \text{and} \quad \frac{\partial g(x)}{\partial x_2} = \frac{1}{2} \sqrt{\frac{x_1}{x_2}}$$

so that

$$\nabla (f \circ g)(x) = \left( f'(g(x)) \frac{\partial g(x)}{\partial x_1}, \, f'(g(x)) \frac{\partial g(x)}{\partial x_2} \right) = \left( \frac{1}{\sqrt{x_1 x_2}} \frac{1}{2} \sqrt{\frac{x_2}{x_1}}, \, \frac{1}{\sqrt{x_1 x_2}} \frac{1}{2} \sqrt{\frac{x_1}{x_2}} \right) = \left( \frac{1}{2x_1}, \frac{1}{2x_2} \right)$$

and

$$d(f \circ g)(x)(h) = \frac{1}{2x_1} h_1 + \frac{1}{2x_2} h_2$$

for every h ∈ R². The total differential is

$$d(f \circ g) = \frac{1}{2x_1} dx_1 + \frac{1}{2x_2} dx_2$$

(iii) Let g : Rⁿ₊₊ → R and f : R₊ → R be given by g(x) = Σᵢ₌₁ⁿ aᵢxᵢ^ρ and f(x) = x^{1/ρ}, with aᵢ ∈ R and ρ ≠ 0, so that f ∘ g : Rⁿ₊₊ → R is

$$(f \circ g)(x) = \left( \sum_{i=1}^{n} a_i x_i^\rho \right)^{\frac{1}{\rho}}$$

We have, for every x ∈ Rⁿ₊₊,

$$\nabla g(x) = \left( \frac{\partial g}{\partial x_1}(x), \dots, \frac{\partial g}{\partial x_n}(x) \right) = \left( \rho a_1 x_1^{\rho - 1}, \dots, \rho a_n x_n^{\rho - 1} \right)$$

so that

$$\nabla (f \circ g)(x) = \left( f'(g(x)) \frac{\partial g(x)}{\partial x_1}, \dots, f'(g(x)) \frac{\partial g(x)}{\partial x_n} \right) = \left( \frac{1}{\rho} \left( \sum_{i=1}^{n} a_i x_i^\rho \right)^{\frac{1}{\rho} - 1} \rho a_1 x_1^{\rho - 1}, \dots, \frac{1}{\rho} \left( \sum_{i=1}^{n} a_i x_i^\rho \right)^{\frac{1}{\rho} - 1} \rho a_n x_n^{\rho - 1} \right)$$

$$= \left( a_1 \left( \sum_{i=1}^{n} a_i x_i^\rho \right)^{\frac{1}{\rho} - 1} x_1^{\rho - 1}, \dots, a_n \left( \sum_{i=1}^{n} a_i x_i^\rho \right)^{\frac{1}{\rho} - 1} x_n^{\rho - 1} \right)$$

and

$$d(f \circ g)(x)(h) = \left( \sum_{i=1}^{n} a_i x_i^\rho \right)^{\frac{1}{\rho} - 1} \sum_{i=1}^{n} a_i x_i^{\rho - 1} h_i$$

for every h ∈ Rⁿ. The total differential is

$$d(f \circ g) = \left( \sum_{i=1}^{n} a_i x_i^\rho \right)^{\frac{1}{\rho} - 1} \sum_{i=1}^{n} a_i x_i^{\rho - 1} dx_i$$

(iv) Let g : Rⁿ → R and f : R₊₊ → R be given by g(x) = Σᵢ₌₁ⁿ aᵢ e^{−λxᵢ} and f(x) = −(1/λ) log x, with aᵢ ∈ R and λ ≠ 0, so that f ∘ g : Rⁿ → R is

$$(f \circ g)(x) = -\frac{1}{\lambda} \log \sum_{i=1}^{n} a_i e^{-\lambda x_i}$$

We have, for every x ∈ Rⁿ,

$$\nabla g(x) = \left( \frac{\partial g}{\partial x_1}(x), \dots, \frac{\partial g}{\partial x_n}(x) \right) = \left( -\lambda a_1 e^{-\lambda x_1}, \dots, -\lambda a_n e^{-\lambda x_n} \right)$$

so that

$$\nabla (f \circ g)(x) = \left( f'(g(x)) \frac{\partial g(x)}{\partial x_1}, \dots, f'(g(x)) \frac{\partial g(x)}{\partial x_n} \right) = \left( \frac{a_1 e^{-\lambda x_1}}{\sum_{i=1}^{n} a_i e^{-\lambda x_i}}, \dots, \frac{a_n e^{-\lambda x_n}}{\sum_{i=1}^{n} a_i e^{-\lambda x_i}} \right)$$

and

$$d(f \circ g)(x)(h) = \sum_{i=1}^{n} \frac{a_i e^{-\lambda x_i}}{\sum_{j=1}^{n} a_j e^{-\lambda x_j}} h_i = \frac{1}{g(x)} \sum_{i=1}^{n} a_i e^{-\lambda x_i} h_i$$

for every h ∈ Rⁿ. The total differential is

$$d(f \circ g) = \frac{1}{g(x)} \sum_{i=1}^{n} a_i e^{-\lambda x_i} dx_i$$

N
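The CES-type gradient of point (iii) lends itself to a symbolic check; the following sympy sketch makes the illustrative assumptions n = 2, ρ = 1/2 and a₁ = a₂ = 1.

    # A minimal sketch: checking the gradient formula of Example 1275-(iii)
    # against direct differentiation, for n = 2, rho = 1/2, a_i = 1.
    import sympy as sp

    x1, x2 = sp.symbols('x1 x2', positive=True)
    rho = sp.Rational(1, 2)
    g = x1**rho + x2**rho
    ces = g**(1/rho)

    claimed = g**(1/rho - 1) * x1**(rho - 1)   # a1 (sum)^{1/rho - 1} x1^{rho - 1}
    assert sp.simplify(sp.diff(ces, x1) - claimed) == 0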

27.3 Partial derivatives of higher order

Consider a function f : U → R defined (at least) on an open set U in Rⁿ and partially derivable there. As already observed (Section 27.1.3), its partial derivatives ∂f/∂xᵢ can, in turn, be seen as functions of n variables

$$\frac{\partial f}{\partial x_i} : U \to \mathbb{R}$$

Example 1276 The partial derivatives

$$\frac{\partial f}{\partial x_1}(x) = e^{x_2} \quad \text{and} \quad \frac{\partial f}{\partial x_2}(x) = x_1 e^{x_2}$$

of the function f(x₁, x₂) = x₁ e^{x₂} are functions on R². N

Hence, it makes sense to talk about the existence of partial derivatives of the partial derivative functions ∂f/∂xᵢ : U → R at a point x ∈ U. In this case, for every i, j = 1, ..., n we have the partial derivative

$$\frac{\partial \left( \frac{\partial f}{\partial x_i} \right)}{\partial x_j}(x)$$

with respect to xⱼ of the partial derivative ∂f/∂xᵢ. These partial derivatives are called second-order partial derivatives of f and are denoted by

$$\frac{\partial^2 f}{\partial x_i \partial x_j}(x)$$

or by f''ₓᵢₓⱼ. When i = j we write

$$\frac{\partial^2 f}{\partial x_i^2}(x)$$

instead of ∂²f/∂xᵢ∂xᵢ. Using this notation, we can construct the matrix

$$\begin{bmatrix}
\frac{\partial^2 f}{\partial x_1^2}(x) & \frac{\partial^2 f}{\partial x_1 \partial x_2}(x) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(x) \\
\frac{\partial^2 f}{\partial x_2 \partial x_1}(x) & \frac{\partial^2 f}{\partial x_2^2}(x) & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n}(x) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1}(x) & \frac{\partial^2 f}{\partial x_n \partial x_2}(x) & \cdots & \frac{\partial^2 f}{\partial x_n^2}(x)
\end{bmatrix}$$

of second-order partial derivatives. It is called the Hessian matrix of f and is denoted by ∇²f(x).

Example 1277 Let f : R³ → R be given by f(x) = e^{x₁x₂} + 3x₂x₃ for every x ∈ R³, and let us compute its Hessian matrix. We have:

$$\frac{\partial f}{\partial x_1}(x) = x_2 e^{x_1 x_2}, \qquad \frac{\partial f}{\partial x_2}(x) = x_1 e^{x_1 x_2} + 3x_3, \qquad \frac{\partial f}{\partial x_3}(x) = 3x_2$$

and so

$$\frac{\partial^2 f}{\partial x_1^2}(x) = x_2^2 e^{x_1 x_2}, \qquad \frac{\partial^2 f}{\partial x_1 \partial x_2}(x) = (1 + x_1 x_2) e^{x_1 x_2}, \qquad \frac{\partial^2 f}{\partial x_1 \partial x_3}(x) = 0$$

$$\frac{\partial^2 f}{\partial x_2 \partial x_1}(x) = (1 + x_1 x_2) e^{x_1 x_2}, \qquad \frac{\partial^2 f}{\partial x_2^2}(x) = x_1^2 e^{x_1 x_2}, \qquad \frac{\partial^2 f}{\partial x_2 \partial x_3}(x) = 3$$

$$\frac{\partial^2 f}{\partial x_3 \partial x_1}(x) = 0, \qquad \frac{\partial^2 f}{\partial x_3 \partial x_2}(x) = 3, \qquad \frac{\partial^2 f}{\partial x_3^2}(x) = 0$$

It follows that the Hessian matrix of f is

$$\nabla^2 f(x) = \begin{bmatrix}
x_2^2 e^{x_1 x_2} & (1 + x_1 x_2) e^{x_1 x_2} & 0 \\
(1 + x_1 x_2) e^{x_1 x_2} & x_1^2 e^{x_1 x_2} & 3 \\
0 & 3 & 0
\end{bmatrix}$$

N
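For readers who want to double-check such computations, a minimal sympy sketch (an illustrative aside) reproduces this Hessian directly.

    # A minimal sketch: sympy reproduces the Hessian matrix of Example 1277.
    import sympy as sp

    x1, x2, x3 = sp.symbols('x1 x2 x3')
    f = sp.exp(x1 * x2) + 3 * x2 * x3
    H = sp.hessian(f, (x1, x2, x3))
    sp.pprint(H)   # matches the matrix computed above, and is symmetric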

The second-order partial derivatives can, in turn, be seen as functions of several variables. We can therefore look for their partial derivatives, which (if they exist) are called the third-order partial derivatives. We can then move to their partial derivatives (if they exist) and get the fourth-order derivatives, and so on.
For instance, going back to the previous example, consider the partial derivative

$$\frac{\partial^2 f}{\partial x_1 \partial x_2}(x) = (1 + x_1 x_2) e^{x_1 x_2}$$

The third-order derivatives exist and are

$$\frac{\partial^3 f}{\partial x_1 \partial x_2 \partial x_1}(x) = \frac{\partial \left( \frac{\partial^2 f}{\partial x_1 \partial x_2} \right)}{\partial x_1}(x) = \left( 2x_2 + x_1 x_2^2 \right) e^{x_1 x_2}$$

$$\frac{\partial^3 f}{\partial x_1 \partial x_2^2}(x) = \frac{\partial \left( \frac{\partial^2 f}{\partial x_1 \partial x_2} \right)}{\partial x_2}(x) = \left( 2x_1 + x_1^2 x_2 \right) e^{x_1 x_2}$$

$$\frac{\partial^3 f}{\partial x_1 \partial x_2 \partial x_3}(x) = \frac{\partial \left( \frac{\partial^2 f}{\partial x_1 \partial x_2} \right)}{\partial x_3}(x) = 0$$

and clearly we can go on to the fourth-order partial derivatives, etc.

Example 1278 Let f : R² → R be given by f(x₁, x₂) = x₁x₂. It is immediate that f has continuous partial derivatives of any order. More generally, this holds for all polynomials in several variables. N

The following theorem establishes a key interchangeability property of second-order partial derivatives.

Theorem 1279 (Schwarz) Let f : U → R be a function that has second-order partial derivatives on U. If they are continuous at x ∈ U, then

$$\frac{\partial^2 f}{\partial x_i \partial x_j}(x) = \frac{\partial^2 f}{\partial x_j \partial x_i}(x) \qquad (27.21)$$

for every i, j = 1, ..., n.

Proof For simplicity we consider the case n = 2. In this case, (27.21) reduces to:

$$\frac{\partial^2 f}{\partial x_1 \partial x_2} = \frac{\partial^2 f}{\partial x_2 \partial x_1} \qquad (27.22)$$

Again for simplicity, we also assume that the domain is the whole space R², so that we consider a function f : R² → R. By definition,

$$\frac{\partial f}{\partial x_1}(x) = \lim_{h_1 \to 0} \frac{f(x_1 + h_1, x_2) - f(x_1, x_2)}{h_1}$$

and therefore:

$$\frac{\partial^2 f}{\partial x_1 \partial x_2}(x) = \lim_{h_2 \to 0} \frac{\frac{\partial f}{\partial x_1}(x_1, x_2 + h_2) - \frac{\partial f}{\partial x_1}(x_1, x_2)}{h_2} = \lim_{h_2 \to 0} \frac{1}{h_2} \left[ \lim_{h_1 \to 0} \frac{f(x_1 + h_1, x_2 + h_2) - f(x_1, x_2 + h_2)}{h_1} - \lim_{h_1 \to 0} \frac{f(x_1 + h_1, x_2) - f(x_1, x_2)}{h_1} \right]$$

Let Φ : R² → R be an auxiliary function defined by:

$$\Phi(h_1, h_2) = f(x_1 + h_1, x_2 + h_2) - f(x_1, x_2 + h_2) - f(x_1 + h_1, x_2) + f(x_1, x_2)$$

for each (h₁, h₂) ∈ R². Using the function Φ, we can write:

$$\frac{\partial^2 f}{\partial x_1 \partial x_2}(x) = \lim_{h_2 \to 0} \lim_{h_1 \to 0} \frac{\Phi(h_1, h_2)}{h_2 h_1} \qquad (27.23)$$

Consider in addition the scalar auxiliary function ψ₁ : R → R defined by ψ₁(x) = f(x, x₂ + h₂) − f(x, x₂) for each x ∈ R. We have:

$$\psi_1'(x) = \frac{\partial f}{\partial x_1}(x, x_2 + h_2) - \frac{\partial f}{\partial x_1}(x, x_2) \qquad (27.24)$$

Moreover, by the Mean Value Theorem there exists z₁ ∈ (x₁, x₁ + h₁) such that

$$\psi_1'(z_1) = \frac{\psi_1(x_1 + h_1) - \psi_1(x_1)}{h_1} = \frac{\Phi(h_1, h_2)}{h_1}$$

and therefore, by (27.24), such that

$$\frac{\partial f}{\partial x_1}(z_1, x_2 + h_2) - \frac{\partial f}{\partial x_1}(z_1, x_2) = \frac{\Phi(h_1, h_2)}{h_1} \qquad (27.25)$$

Let ψ₂ : R → R be another auxiliary scalar function defined by ψ₂(x) = ∂f/∂x₁(z₁, x) for each x ∈ R. We have:

$$\psi_2'(x) = \frac{\partial^2 f}{\partial x_2 \partial x_1}(z_1, x) \qquad (27.26)$$

By the Mean Value Theorem, there exists z₂ ∈ (x₂, x₂ + h₂) such that

$$\psi_2'(z_2) = \frac{\psi_2(x_2 + h_2) - \psi_2(x_2)}{h_2} = \frac{\frac{\partial f}{\partial x_1}(z_1, x_2 + h_2) - \frac{\partial f}{\partial x_1}(z_1, x_2)}{h_2}$$

and therefore, by (27.26), such that

$$\frac{\partial^2 f}{\partial x_2 \partial x_1}(z_1, z_2) = \frac{\frac{\partial f}{\partial x_1}(z_1, x_2 + h_2) - \frac{\partial f}{\partial x_1}(z_1, x_2)}{h_2}$$

Together with (27.25), this implies that

$$\frac{\partial^2 f}{\partial x_2 \partial x_1}(z_1, z_2) = \frac{\Phi(h_1, h_2)}{h_2 h_1} \qquad (27.27)$$

Go back now to (27.23). Thanks to (27.27), expression (27.23) becomes:

$$\frac{\partial^2 f}{\partial x_1 \partial x_2}(x) = \lim_{h_2 \to 0} \lim_{h_1 \to 0} \frac{\partial^2 f}{\partial x_2 \partial x_1}(z_1, z_2) \qquad (27.28)$$

On the other hand, since zᵢ ∈ (xᵢ, xᵢ + hᵢ) for i = 1, 2, we have zᵢ → xᵢ when hᵢ → 0. Being ∂²f/∂x₂∂x₁ continuous by hypothesis at x = (x₁, x₂), we therefore have

$$\lim_{h_2 \to 0} \lim_{h_1 \to 0} \frac{\partial^2 f}{\partial x_2 \partial x_1}(z_1, z_2) = \frac{\partial^2 f}{\partial x_2 \partial x_1}(x_1, x_2) \qquad (27.29)$$

Putting together (27.28) and (27.29), we get (27.22), as desired. □

Thus, when they are continuous, the order in which we take partial derivatives does not matter: we can compute first the partial derivative with respect to xᵢ and then the one with respect to xⱼ, or vice versa, with the same result. So, we can choose the way that seems computationally easier, obtaining then "for free" the other second-order partial derivative. This simplifies considerably the computation of derivatives and, moreover, results in an elegant symmetry property of the Hessian matrix.
Example 1280 (i) Let f : R³ → R be given by f(x₁, x₂, x₃) = x₁²x₂x₃. Simple calculations show that:

$$\frac{\partial^2 f}{\partial x_1 \partial x_2}(x) = \frac{\partial^2 f}{\partial x_2 \partial x_1}(x) = 2 x_1 x_3$$

in accordance with Schwarz's Theorem, because the second-order partial derivatives are continuous.
(ii) Let f : R³ → R be given by f(x₁, x₂, x₃) = cos(x₁x₂) + e^{−x₃}. The Hessian matrix of f is

$$\nabla^2 f(x) = \begin{bmatrix}
-x_2^2 \cos(x_1 x_2) & -\sin(x_1 x_2) - x_1 x_2 \cos(x_1 x_2) & 0 \\
-\sin(x_1 x_2) - x_1 x_2 \cos(x_1 x_2) & -x_1^2 \cos(x_1 x_2) & 0 \\
0 & 0 & e^{-x_3}
\end{bmatrix}$$

In accordance with Schwarz's Theorem, this matrix is symmetric. N



To conclude, we show a case not covered by Schwarz's Theorem.

Example 1281 Following Peano (1884), define f : R² → R by:

$$f(x_1, x_2) = \begin{cases} x_1 x_2 \dfrac{x_1^2 - x_2^2}{x_1^2 + x_2^2} & \text{if } (x_1, x_2) \ne (0, 0) \\ 0 & \text{if } (x_1, x_2) = (0, 0) \end{cases}$$

The reader can verify that: (i) f has continuous partial derivatives ∂f/∂x₁ and ∂f/∂x₂; (ii) f has second-order partial derivatives ∂²f/∂x₁∂x₂ and ∂²f/∂x₂∂x₁ defined on all of R², but discontinuous at the origin (0, 0). Therefore, the hypothesis of continuity of the second-order partial derivatives of Schwarz's Theorem does not hold at the origin, so the theorem cannot say anything about the behavior of these derivatives at the origin. Let us calculate them:

$$\frac{\partial^2 f}{\partial x_1 \partial x_2}(0, 0) = -1 \quad \text{and} \quad \frac{\partial^2 f}{\partial x_2 \partial x_1}(0, 0) = 1$$

So,

$$\frac{\partial^2 f}{\partial x_1 \partial x_2}(0, 0) \ne \frac{\partial^2 f}{\partial x_2 \partial x_1}(0, 0)$$

The continuity of the second-order partial derivatives is, therefore, needed for the validity of equality (27.21). N
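The asymmetry at the origin can be glimpsed numerically; in the minimal sketch below (illustrative only) the second mixed difference quotient Φ(h, k)/(hk) of the Schwarz proof is evaluated with one increment much smaller than the other, mimicking the two iterated limits.

    # A minimal numerical sketch of Example 1281: the two iterated limits of
    # the mixed difference quotient at the origin approach different values.
    def f(x1, x2):
        if x1 == 0.0 and x2 == 0.0:
            return 0.0
        return x1 * x2 * (x1**2 - x2**2) / (x1**2 + x2**2)

    def mixed_quot(h, k):
        # second mixed difference quotient Phi(h, k)/(h k) at the origin
        return (f(h, k) - f(0.0, k) - f(h, 0.0) + f(0.0, 0.0)) / (h * k)

    print(mixed_quot(1e-10, 1e-5))   # h << k: about -1
    print(mixed_quot(1e-5, 1e-10))   # h >> k: about +1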

27.4 Taking stock: the natural domain of analysis

We have studied so far partial derivability and differentiability, and established some remarkable properties. In particular, we learned that the continuity of partial derivatives, of different orders, is key for some highly desirable properties. Some terminology is, thus, in order. We say that a function f of several variables that has partial derivatives of order n continuous on a set E is n times continuously differentiable on E. The set of all such functions is denoted by Cⁿ(E), thus extending the terminology of the scalar case (Section 26.13).
In particular, C¹(E) and C²(E) are the classes of the functions with continuous first-order derivatives and with continuous first- and second-order derivatives on E, respectively. Two fundamental results, Theorem 1271 and Schwarz's Theorem, show the importance of these classes: the former showed that the functions in C¹(E) are differentiable (so continuous), the latter that for the functions in C²(E) the mixed partial derivatives are equal.
The most significant results of differential calculus hold for functions of, at least, class C¹(E), which is, therefore, the natural space in which to carry out analyses that rely on differential methods. In applications, functions are typically assumed to belong to C¹(E).

27.5 Incremental and approximation viewpoints


27.5.1 Directional derivatives
Via the difference quotient

    lim_{h→0} (f(x + heⁱ) − f(x))/h        (27.30)

partial derivatives consider infinitesimal variations along the basic directions identified by the vectors eⁱ. But, what about the other directions? Intuitively, there are infinitely many ways to approach a point in ℝⁿ and one may wonder about infinitesimal variations along them. In particular, are they consistent, in some sense, with the variations along the basic directions? In this section we address this issue and, in so doing, we expatiate on the incremental (marginal) viewpoint in multivariable differential calculus.
To take into account the infinite directions along which we can approach a point in ℝⁿ, we generalize the quotient (27.30) as follows

    lim_{h→0} (f(x + hy) − f(x))/h

This limit represents the infinitesimal increments of the function f at the point x when we move along the direction determined by the vector y of ℝⁿ, which is no longer required to be a versor eⁱ. Graphically:

[Figure omitted: increments of f along a generic direction y.]

This suggests the following definition.

Definition 1282 A function f : U → ℝ is said to be derivable at a point x ∈ U if, for each y ∈ ℝⁿ, the limit

    f′(x; y) = lim_{h→0} (f(x + hy) − f(x))/h        (27.31)

exists and is finite. This limit is called the directional derivative of f at x along the direction y.

The function f′(x; ·) : ℝⁿ → ℝ is called the directional derivative of f at x.¹¹ To better understand this notion, observe that, given any two vectors x, y ∈ ℝⁿ, the straight line ⟨x, y⟩ that passes through them is given by

    ⟨x, y⟩ = {(1 − h)x + hy : h ∈ ℝ}

¹¹ Note that directional derivatives only consider "linear" approaches to a point x, namely along straight lines. In Section 12.3.2 we saw that there are highly nonlinear ways to approach a point.

Going back to (27.31), we have

    f(x + hy) = f((1 − h)x + h(x + y))

Therefore, the ratio

    (f(x + hy) − f(x))/h

tells us which is the "incremental" behavior of the function when we move along the line ⟨x, x + y⟩. Each y ∈ ℝⁿ identifies a line and, therefore, gives us a direction along which we can study the increments of the function.

Not all lines ⟨x, x + y⟩ identify different directions: the next result shows that, given a vector y ∈ ℝⁿ, all vectors αy identify the same direction when α ≠ 0.

Proposition 1283 Given a point x ∈ ℝⁿ, for each y, y′ ∈ ℝⁿ we have ⟨x, x + y⟩ = ⟨x, x + y′⟩ if and only if there exists α ≠ 0 such that y′ = αy.

Proof "If". Suppose that y′ = αy with α ≠ 0. We have

    x + y′ = x + αy = (1 − α)x + α(x + y)

and therefore x + y′ ∈ ⟨x, x + y⟩. This implies ⟨x, x + y′⟩ ⊆ ⟨x, x + y⟩. Since y = (1/α)y′, by proceeding in a similar way we can prove that ⟨x, x + y⟩ ⊆ ⟨x, x + y′⟩. We conclude that ⟨x, x + y⟩ = ⟨x, x + y′⟩. "Only if". Suppose that ⟨x, x + y′⟩ = ⟨x, x + y⟩. Suppose y ≠ y′ (otherwise the result is trivially true). At least one of them has then to be non-zero, say y′. Since x + y′ ∈ ⟨x, x + y⟩ and y′ ≠ 0, there exists h ≠ 0 such that x + y′ = (1 − h)x + h(x + y). This implies y′ = hy and therefore, by setting α = h, we have the desired result.

The next corollary shows that this redundancy of the directions translates, in a simple and elegant way, in the homogeneity of the directional derivative, a property that permits to determine the value f′(x; αy) for every scalar α once we know the value of f′(x; y).

Corollary 1284 If f is derivable at a point x ∈ U, then the directional derivative f′(x; ·) : ℝⁿ → ℝ is homogeneous, i.e., for every α ∈ ℝ and every y ∈ ℝⁿ, we have

    f′(x; αy) = αf′(x; y)        (27.32)

Proof Let α ≠ 0. Since h → 0 if and only if αh → 0, we have:

    lim_{h→0} (f(x + (αh)y) − f(x))/(αh) = lim_{αh→0} (f(x + (αh)y) − f(x))/(αh) = f′(x; y)

Dividing and multiplying by α, we therefore have:

    lim_{h→0} (f(x + h(αy)) − f(x))/h = α lim_{h→0} (f(x + (αh)y) − f(x))/(αh) = αf′(x; y)

It follows that the limit

    f′(x; αy) = lim_{h→0} (f(x + h(αy)) − f(x))/h

exists, it is finite and is equal to αf′(x; y), as desired. On the other hand, if α = 0 we have

    f′(x; αy) = f′(x; 0) = lim_{h→0⁺} (f(x + 0) − f(x))/h = 0

Therefore, f′(x; αy) = 0 = αf′(x; y), which completes the proof.

Partial derivatives are nothing but the directional derivatives computed along the fundamental directions in ℝⁿ represented by the versors eⁱ. That is,

    f′(x; eⁱ) = ∂f(x)/∂xᵢ

for each i = 1, 2, ..., n. So, functions that are derivable at x are partially derivable there. The converse is false, as the next example shows.

Example 1285 In Example 1263 we showed that the function f : ℝ² → ℝ defined by

    f(x₁, x₂) = 0   if x₁x₂ = 0
    f(x₁, x₂) = 1   if x₁x₂ ≠ 0

is partially derivable at the origin. However, it is not derivable at the origin 0 = (0, 0). Indeed, consider x = 0 and y = (1, 1). We have

    (f(x + hy) − f(x))/h = f(h, h)/h = 1/h   ∀h ≠ 0

so the limit (27.31) does not exist, and the function is not derivable at 0. N

In sum, partial derivability is a weaker notion than derivability, something not surprising: the former notion controls, indeed, only two directions out of the infinitely many ones controlled by the latter notion.

27.5.2 Algebra
Like that of partial derivatives, also the calculus of directional derivatives can be reduced to the calculus of ordinary derivatives of scalar functions. Given a point x ∈ ℝⁿ and a direction y ∈ ℝⁿ, define an auxiliary scalar function φ as φ(h) = f(x + hy) for every h ∈ ℝ. The domain of φ is the set {h ∈ ℝ : x + hy ∈ U}, which is an open set in ℝ containing 0. By definition of derivative, we have

    φ′(0) = lim_{h→0} (φ(h) − φ(0))/h = lim_{h→0} (f(x + hy) − f(x))/h = f′(x; y)        (27.33)

The derivative f′(x; y) can therefore be seen as the ordinary derivative of the scalar function φ computed at the point 0. Naturally, when φ is differentiable at 0, formula (27.33) reduces to f′(x; y) = φ′(0).

Example 1286 (i) Let f : ℝ³ → ℝ be defined by f(x₁, x₂, x₃) = x₁² + x₂² + x₃². Compute the directional derivative of f at x = (1, −1, 2) along the direction y = (2, 3, 5). We have:

    x + hy = (1 + 2h, −1 + 3h, 2 + 5h)

and therefore

    φ(h) = f(x + hy) = (1 + 2h)² + (−1 + 3h)² + (2 + 5h)²

It follows that φ′(h) = 76h + 18 and, by (27.33), we conclude that f′(x; y) = φ′(0) = 18.
(ii) Let us generalize the previous example and consider the function f : ℝⁿ → ℝ defined by f(x) = ‖x‖². We have

    φ′(h) = d/dh Σᵢ₌₁ⁿ (xᵢ + hyᵢ)² = 2 Σᵢ₌₁ⁿ yᵢ(xᵢ + hyᵢ) = 2y · (x + hy)

Therefore, f′(x; y) = φ′(0) = 2x · y. The directional derivative of f(x) = ‖x‖² thus exists at all the points and along all possible directions, that is, f is derivable on ℝⁿ. Its general form is

    f′(x; y) = 2x · y

In the special direction y = (2, 3, 5) of point (i), we indeed have f′(x; y) = 2(1, −1, 2) · (2, 3, 5) = 18.
(iii) Consider the function f : ℝ² → ℝ defined by

    f(x₁, x₂) = x₁x₂²/(x₁² + x₂²)   if (x₁, x₂) ≠ (0, 0)
    f(x₁, x₂) = 0                   if (x₁, x₂) = (0, 0)

Consider the origin 0 = (0, 0). For every y ∈ ℝ² we have φ(h) = f(hy) = h y₁y₂²/(y₁² + y₂²) and so f′(0; y) = φ′(0) = y₁y₂²/(y₁² + y₂²). In conclusion,

    f′(0; y) = f(y)

for every y ∈ ℝ². So, the function f is derivable at the origin and equals its own directional derivative there. N
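
The reduction to the auxiliary function φ is easy to automate. Below is a minimal sketch, again assuming the sympy library, that recomputes point (i) of the example above.

    import sympy as sp

    h = sp.symbols('h', real=True)

    def directional_derivative(f, x, y):
        # phi(h) = f(x + h y); the directional derivative is phi'(0)
        phi = f(*[xi + h * yi for xi, yi in zip(x, y)])
        return sp.diff(phi, h).subs(h, 0)

    f = lambda x1, x2, x3: x1**2 + x2**2 + x3**2
    print(directional_derivative(f, (1, -1, 2), (2, 3, 5)))   # 18, as in the text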

Using the auxiliary functions φ, it is easy to prove that for directional derivatives the usual algebraic rules hold:

(i) (αf + βg)′(x; y) = αf′(x; y) + βg′(x; y);

(ii) (fg)′(x; y) = f′(x; y)g(x) + f(x)g′(x; y);

(iii) (f/g)′(x; y) = (f′(x; y)g(x) − f(x)g′(x; y))/g²(x).

27.5.3 The two viewpoints

Derivability is conceptually important in that it represents, via the directional derivative f′(x; ·) : ℝⁿ → ℝ, the incremental, marginal, behavior of a vector function f : U → ℝ at a point x ∈ U.
Differentiability, on the other hand, represents the linear approximation standpoint (Section 27.2), which is the other fundamental viewpoint that we learned to characterize differential calculus. Remarkably, for functions of a single variable the two viewpoints are equivalent, as Theorem 1243 showed by proving that, at a given point, a scalar function is derivable if and only if it is differentiable. We will now show that for functions of several variables this equivalence no longer holds, thus making it all the more important to distinguish the two viewpoints.

Theorem 1287 If a function f : U → ℝ is differentiable at a point x ∈ U, then it is derivable at x, with

    f′(x; y) = df(x)(y) = ∇f(x) · y   ∀y ∈ ℝⁿ        (27.34)

Thus, differentiability implies derivability. Moreover, from the incremental behavior along the basic directions (i.e., from the partial derivatives) we can retrieve such behavior along any direction through linear combinations. Under differentiability, incremental behavior is thus consistent across directions.
The next example shows that the converse of the previous theorem is false, i.e., derivability does not imply differentiability. It also shows that, without differentiability, incremental behavior might fail to be consistent across directions.

Example 1288 In Example 1286-(iii) we studied a function f : ℝ² → ℝ that, at the origin 0 = (0, 0), has directional derivative f′(0; y) = f(y). Since the function f is not linear, the directional derivative f′(0; ·) : ℝ² → ℝ is not a linear function, so it cannot coincide with the differential (which, by definition, is a linear function). Hence, in view of the last theorem we can say that f is not differentiable at 0; otherwise, equality (27.34) would hold.
In sum, this example shows that a function derivable at a point might not be differentiable at that point. The nonlinear nature of the directional derivative f′(0; ·) also shows how unrelated may be the behavior along different directions. N

We already learned that partial derivability does not imply differentiability (Example 1270). Now we learned that even full-fledged derivability is not enough to imply differentiability. It is, indeed, not even enough to imply continuity: there exist functions that are derivable at some point but are discontinuous there, as the following example shows.

Example 1289 Let f : ℝ² → ℝ be defined by

    f(x₁, x₂) = x₁⁴x₂²/(x₁⁸ + x₂⁴)   if (x₁, x₂) ≠ (0, 0)
    f(x₁, x₂) = 0                    if (x₁, x₂) = (0, 0)

Take x = 0 = (0, 0). Clearly, f′(0; 0) = 0. Moreover, for every 0 ≠ y ∈ ℝ² we have:

    f′(0; y) = lim_{h→0} f(hy)/h = lim_{h→0} (hy₁)⁴(hy₂)²/(h[(hy₁)⁸ + (hy₂)⁴])
             = lim_{h→0} h⁶y₁⁴y₂²/(h⁵(h⁴y₁⁸ + y₂⁴)) = lim_{h→0} hy₁⁴y₂²/(h⁴y₁⁸ + y₂⁴) = 0

Therefore, f′(0; y) = 0 for every y ∈ ℝ² and the directional derivative at the origin 0 is then the null linear function. It follows that f is derivable at 0. However, it is not continuous at 0 (a fortiori, it is not differentiable at 0 by Theorem 1268). Indeed, consider the points (t, t²) ∈ ℝ² that lie on the graph of the parabola x₂ = x₁². For each t ≠ 0 we have

    f(t, t²) = t⁴(t²)²/(t⁸ + (t²)⁴) = t⁸/(t⁸ + t⁸) = 1/2

Along these points the function is constant and takes on value 1/2. It follows that lim_{t→0} f(t, t²) = 1/2 and, being f(0) = 0, the function is discontinuous at 0. N

Summing up, we just learned that:

(i) differentiability implies derivability (Theorem 1287), but not vice versa when n ≥ 2 (Example 1288);

(ii) derivability does not imply continuity when n ≥ 2 (Example 1289).

These relations sharpen some of the findings of Section 27.2.1 on partial derivability.
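
Under differentiability, equality (27.34) can be checked mechanically. A minimal sketch, again assuming sympy, on the smooth function f(x) = ‖x‖² of Example 1286-(ii):

    import sympy as sp

    h, x1, x2, y1, y2 = sp.symbols('h x1 x2 y1 y2', real=True)
    f = x1**2 + x2**2                       # f(x) = ||x||^2 on R^2

    # directional derivative via phi(h) = f(x + h y)
    phi = f.subs({x1: x1 + h*y1, x2: x2 + h*y2})
    dir_der = sp.diff(phi, h).subs(h, 0)

    grad_dot_y = sp.diff(f, x1)*y1 + sp.diff(f, x2)*y2
    print(sp.simplify(dir_der - grad_dot_y))   # 0: f'(x; y) = grad f(x) . y

No such agreement can hold for the function of Example 1286-(iii), whose directional derivative at the origin is not linear in y.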

27.6 Differential of operators


27.6.1 Representation
In Section 27.2 we noted that the differential df(x) : ℝⁿ → ℝ of a function f : U → ℝ is such that

    lim_{h→0} (f(x + h) − f(x) − df(x)(h))/‖h‖ = 0

or, equivalently,

    lim_{h→0} |f(x + h) − f(x) − df(x)(h)|/‖h‖ = 0

This suggests the following generalization of the definition of differential to the case of operators.

Definition 1290 An operator f : U → ℝᵐ is said to be differentiable at a point x ∈ U if there exists a linear operator df(x) : ℝⁿ → ℝᵐ such that

    lim_{h→0} ‖f(x + h) − f(x) − df(x)(h)‖/‖h‖ = 0        (27.35)

The operator df(x) is said to be the differential of f at x.

This definition generalizes Definition 1267, which is the special case m = 1. The linear approximation is now given by a linear operator with values in ℝᵐ, while at the numerator of the incremental ratio in (27.35) we find a norm instead of an absolute value because we now have to deal with vectors in ℝᵐ.
The differential for operators satisfies properties that are similar to those that we saw in the case m = 1. Naturally, instead of the vector representation of Theorem 1268 we now have a more general matrix representation based on the operator version of Riesz's Theorem (Theorem 672). To see its form, we introduce the Jacobian matrix. Recall that an operator f : U → ℝᵐ can be regarded as an m-tuple (f₁, ..., fₘ) of functions defined on U and with values in ℝ. The Jacobian matrix Df(x) of an operator f : U → ℝᵐ at x ∈ U is, then, an m × n matrix given by:

              ⎡ ∂f₁/∂x₁(x)   ∂f₁/∂x₂(x)   ...   ∂f₁/∂xₙ(x) ⎤
    Df(x) =   ⎢ ∂f₂/∂x₁(x)   ∂f₂/∂x₂(x)   ...   ∂f₂/∂xₙ(x) ⎥
              ⎢     ...          ...      ...       ...    ⎥
              ⎣ ∂fₘ/∂x₁(x)   ∂fₘ/∂x₂(x)   ...   ∂fₘ/∂xₙ(x) ⎦

that is,

              ⎡ ∇f₁(x) ⎤
    Df(x) =   ⎢ ∇f₂(x) ⎥        (27.36)
              ⎢   ...  ⎥
              ⎣ ∇fₘ(x) ⎦

We can now give the matrix representation of differentials, which shows that the Jacobian matrix Df(x) is, indeed, the matrix associated to the linear operator df(x). This representation generalizes the vector representation of Theorem 1268 because the Jacobian matrix Df(x) reduces to the gradient ∇f(x) in the special case m = 1.

Theorem 1291 Let f : U → ℝᵐ be differentiable at x ∈ U. Then,

    df(x)(h) = Df(x)h   ∀h ∈ ℝⁿ

Proof We begin by considering a simple property of the norm. Let x = (x₁, ..., xₙ) ∈ ℝⁿ. For every j = 1, ..., n we have:

    |xⱼ| = √(xⱼ²) ≤ √(Σⱼ₌₁ⁿ xⱼ²) = ‖x‖        (27.37)

Now assume that f is differentiable at x ∈ U. Set h = teʲ with j = 1, ..., n. By definition,

    lim_{t→0} ‖f(x + teʲ) − f(x) − df(x)(teʲ)‖/‖teʲ‖ = 0

and therefore, being ‖teʲ‖ = |t|, we have

    lim_{t→0} ‖(f(x + teʲ) − f(x))/t − df(x)(eʲ)‖ = 0        (27.38)

From inequality (27.37), for each i = 1, ..., m we have

    |(fᵢ(x + teʲ) − fᵢ(x))/t − dfᵢ(x)(eʲ)| ≤ ‖(f(x + teʲ) − f(x))/t − df(x)(eʲ)‖

Together with (27.38), this implies

    lim_{t→0} |(fᵢ(x + teʲ) − fᵢ(x))/t − dfᵢ(x)(eʲ)| = 0

for each i = 1, ..., m. We can therefore conclude that, for every i = 1, ..., m and every j = 1, ..., n, we have:

    ∂fᵢ/∂xⱼ(x) = lim_{t→0} (fᵢ(x + teʲ) − fᵢ(x))/t = dfᵢ(x)(eʲ)        (27.39)

The matrix associated to a linear operator f : ℝⁿ → ℝᵐ is (Theorem 672):

    A = [f(e¹), f(e²), ..., f(eⁿ)]

In our case, thanks to (27.39) we therefore have

    A = [df(x)(e¹), ..., df(x)(eⁿ)]

      ⎡ df₁(x)(e¹)   df₁(x)(e²)   ...   df₁(x)(eⁿ) ⎤
    = ⎢ df₂(x)(e¹)   df₂(x)(e²)   ...   df₂(x)(eⁿ) ⎥
      ⎣ dfₘ(x)(e¹)   dfₘ(x)(e²)   ...   dfₘ(x)(eⁿ) ⎦

      ⎡ ∂f₁/∂x₁(x)   ∂f₁/∂x₂(x)   ...   ∂f₁/∂xₙ(x) ⎤
    = ⎢ ∂f₂/∂x₁(x)   ∂f₂/∂x₂(x)   ...   ∂f₂/∂xₙ(x) ⎥ = Df(x)
      ⎣ ∂fₘ/∂x₁(x)   ∂fₘ/∂x₂(x)   ...   ∂fₘ/∂xₙ(x) ⎦

as desired.

Example 1292 The Hessian matrix of a function f : A ⊆ ℝⁿ → ℝ is the Jacobian matrix of its derivative operator ∇f : D → ℝⁿ, as the reader can easily check. N

Example 1293 Let f : ℝ³ → ℝ² be defined by f(x₁, x₂, x₃) = (2x₁² + x₂ + x₃, x₁ − x₂⁴). For example, if x = (2, 5, 3), then f(x₁, x₂, x₃) = (2·4 + 5 + 3, 2 − 625) = (16, −623) ∈ ℝ². We have:

    f₁(x₁, x₂, x₃) = 2x₁² + x₂ + x₃ ;  f₂(x₁, x₂, x₃) = x₁ − x₂⁴

and so

    Df(x) = ⎡ 4x₁     1     1 ⎤
            ⎣  1   −4x₂³   0  ⎦

By Theorem 1291, the differential at x is given by the linear operator df(x) : ℝ³ → ℝ² defined by

    df(x)(h) = Df(x)h = (4x₁h₁ + h₂ + h₃, h₁ − 4x₂³h₂)

for each h ∈ ℝ³. For example, at x = (2, 5, 3) we have df(x)(h) = (8h₁ + h₂ + h₃, h₁ − 500h₂). N
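
Jacobian matrices of this kind can be generated symbolically. A hedged sketch with sympy's Matrix.jacobian, applied to the operator of the last example:

    import sympy as sp

    x1, x2, x3 = sp.symbols('x1 x2 x3', real=True)
    F = sp.Matrix([2*x1**2 + x2 + x3, x1 - x2**4])   # f : R^3 -> R^2
    X = sp.Matrix([x1, x2, x3])

    J = F.jacobian(X)
    print(J)                               # [[4*x1, 1, 1], [1, -4*x2**3, 0]]
    print(J.subs({x1: 2, x2: 5, x3: 3}))   # [[8, 1, 1], [1, -500, 0]]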

Example 1294 Let f : ℝ → ℝ³ be defined by f(x) = (x, sin x, cos x). For example, if x = π, then f(x) = (π, 0, −1) ∈ ℝ³. We have:

    f₁(x) = x ;  f₂(x) = sin x ;  f₃(x) = cos x

and so

            ⎡   1    ⎤
    Df(x) = ⎢  cos x ⎥
            ⎣ −sin x ⎦

By Theorem 1291, the differential at x is given by the linear operator df(x) : ℝ → ℝ³ defined by

    df(x)(h) = Df(x)h = (h, h cos x, −h sin x)

for each h ∈ ℝ. For example, at x = π we have df(x)(h) = (h, −h, 0). N

Example 1295 (i) Let f : ℝⁿ → ℝᵐ be the linear operator represented by f(x) = Ax, with

        ⎡ a₁₁  a₁₂  ...  a₁ₙ ⎤
    A = ⎢ a₂₁  a₂₂  ...  a₂ₙ ⎥
        ⎣ aₘ₁  aₘ₂  ...  aₘₙ ⎦

Let a¹, ..., aᵐ be the row vectors of A, that is, a¹ = (a₁₁, a₁₂, ..., a₁ₙ), ..., aᵐ = (aₘ₁, aₘ₂, ..., aₘₙ). We have:

    f₁(x₁, ..., xₙ) = a¹ · x = a₁₁x₁ + ... + a₁ₙxₙ
    f₂(x₁, ..., xₙ) = a² · x = a₂₁x₁ + ... + a₂ₙxₙ
    ...
    fₘ(x₁, ..., xₙ) = aᵐ · x = aₘ₁x₁ + ... + aₘₙxₙ

which implies Df(x) = A. Hence, the Jacobian matrix of a linear operator coincides with the associated matrix A. By Theorem 1291, the differential of a linear operator f is therefore given by the linear operator itself, i.e., at each x ∈ ℝⁿ it holds

    df(x)(h) = f(h)   ∀h ∈ ℝⁿ        (27.40)

This naturally generalizes the well-known result that, for scalar functions of the form f(x) = ax with a ∈ ℝ, the differential is df(x)(h) = ah.

(ii) Let f : U → ℝⁿ be the identity operator f(x) = x for all x ∈ U. By (27.40), at each x ∈ U the differential df(x) is the identity operator itself, i.e.,

    df(x)(h) = h   ∀h ∈ ℝⁿ

The Jacobian matrix is the identity matrix, i.e., Df(x) = I at each x ∈ U. N

27.6.2 Chain rule


Next we state the chain rule for operators, the most general form of this rule that we study.

Theorem 1296 Let g : U ⊆ ℝⁿ → ℝᵐ and f : B ⊆ ℝᵐ → ℝ^q with g(U) ⊆ B. If g is differentiable at x ∈ U and f is differentiable at g(x), then the composition f ∘ g : U ⊆ ℝⁿ → ℝ^q is differentiable at x, with

    d(f ∘ g)(x) = df(g(x)) ∘ dg(x)        (27.41)

The right-hand side is the product of the linear operators df(g(x)) and dg(x). By Theorem 677, its matrix representation is given by the product Df(g(x))Dg(x) of the Jacobian matrices. We thus have the fundamental chain rule formula:

    D(f ∘ g)(x) = Df(g(x))Dg(x)        (27.42)

In the scalar case n = m = q = 1, the rule takes its basic form (f ∘ g)′(x) = f′(g(x))g′(x) studied in Proposition 1230.
Another important special case is when q = 1. In this case we have f : B ⊆ ℝᵐ → ℝ and g = (g₁, ..., gₘ) : U ⊆ ℝⁿ → ℝᵐ, with g(U) ⊆ B. For the composite function f ∘ g : U ⊆ ℝⁿ → ℝ the chain rule takes the form:

r (f g) (x)
= rf (g (x)) Dg (x)
2 @g1 @g1 @g1
3
@x1 (x) @x2 (x) @xn (x)
6 7
@f @f 6 @g2 @g2 @g2 7
= (g (x)) ; :::; (g (x)) 6 @x1 (x) @x2 (x) @xn (x) 7
@x1 @xm 6 7
4 5
@gm @gm @gm
@x1 (x) @x2 (x) @xn (x)
m m
!
X @f @gi X @f @gi
= (g (x)) (x) ; :::; (g (x)) (x)
@xi @x1 @xi @xn
i=1 i=1

As to the di erential, for each h 2 Rn we have

d (f g) (x) (h) = r (f g) (x) h


m
X m
X
@f @gi @f @gi
= (g (x)) (x) h1 + + (g (x)) (x) hn
@xi @x1 @xi @xn
i=1 i=1

Grouping the terms for @f =@xi , we get the following equivalent form:
n
X n
X
@f @g1 @f @gm
d (f g) (x) (h) = (g (x)) (x) hi + + (g (x)) (x) hi
@x1 @xi @xm @xi
i=1 i=1
862 CHAPTER 27. DIFFERENTIAL CALCULUS IN SEVERAL VARIABLES

which can be reformulated in the following imprecise, yet expressive, way:


n
X @f @g1 @f @gm
d (f g) = dxi + + dxi (27.43)
@g1 @xi @gm @xi
i=1

This is the formula of the total di erential for the composite function f g. The total
variation d (f g) of f g is the result of the sum of the e ects on the function f of the
variations of the single functions gi determined by in nitesimal variations dxi of the di erent
variables.

In the next two points we consider two subcases of the case q = 1.

(i) When q = m = 1 we return, with f : B ⊆ ℝ → ℝ and g : U ⊆ ℝⁿ → ℝ, to the chain rule ∇(f ∘ g)(x) = f′(g(x))∇g(x) of Theorem 1274. It corresponds to the differential (27.19).

(ii) Suppose q = n = 1. Let f : B ⊆ ℝᵐ → ℝ and g : U ⊆ ℝ → ℝᵐ, with g(U) ⊆ B. The composite function f ∘ g : U ⊆ ℝ → ℝ is scalar and for this function we have:

                                                                 ⎡ dg₁/dx(x) ⎤
    (f ∘ g)′(x) = ∇f(g(x))Dg(x) = (∂f/∂x₁(g(x)), ..., ∂f/∂xₘ(g(x))) ⎢    ...    ⎥
                                                                 ⎣ dgₘ/dx(x) ⎦
                = Σᵢ₌₁ᵐ ∂f/∂xᵢ(g(x)) dgᵢ/dx(x)

The differential is

    d(f ∘ g)(x)(h) = Σᵢ₌₁ᵐ ∂f/∂xᵢ(g(x)) dgᵢ/dx(x) h

for each h ∈ ℝ, and the total differential (27.43) becomes:

    d(f ∘ g) = (∂f/∂g₁ dg₁/dx + ... + ∂f/∂gₘ dgₘ/dx) dx

Example 1297 To illustrate subcase (ii), consider a production function f : ℝᵐ → ℝ whose m inputs depend on a common parameter, the time t, which indicates the availability of the different inputs at t. Inputs are then represented by a function g = (g₁, ..., gₘ) : ℝ → ℝᵐ, where gᵢ(t) denotes what is the quantity of input i at time t. The composition f ∘ g : ℝ → ℝ is a scalar function that tells us how the output varies according to the parameter t. We have

    d(f ∘ g) = (∂f/∂g₁ dg₁/dt + ... + ∂f/∂gₘ dgₘ/dt) dt        (27.44)

that is, the total variation d(f ∘ g) of the output is the result of the sum of the effects that the variations of the availability of the different inputs, due to infinitesimal variations dt of time, have on the production function. In this example, (27.44) has therefore a clear economic interpretation. More concretely, let g : ℝ → ℝ³ be defined by g(t) = (1/t, 3/t, e^t) for t ≠ 0, and let f : ℝ³ → ℝ be defined by f(x₁, x₂, x₃) = 3x₁² − x₁x₂ + 6x₁x₃. We have:

    (f ∘ g)′(t) = ∂f/∂x₁(g(t)) dg₁/dt(t) + ∂f/∂x₂(g(t)) dg₂/dt(t) + ∂f/∂x₃(g(t)) dg₃/dt(t)
                = 6e^t (1/t − 1/t²)

Therefore,

    d(f ∘ g)(t)(h) = 6e^t (1/t − 1/t²) h   ∀h ∈ ℝ

and the total differential (27.44) is

    d(f ∘ g) = 6e^t (1/t − 1/t²) dt

N

Next we give a chain rule example with q ≠ 1.

Example 1298 Consider the operators f : ℝ² → ℝ² defined by f(x₁, x₂) = (x₁, x₁x₂) and g : ℝ³ → ℝ² defined by g(x₁, x₂, x₃) = (2x₁² + x₂ + x₃, x₁ − x₂⁴). Since both f and g are differentiable at each point of their domain, by the chain rule the composition f ∘ g : ℝ³ → ℝ² is itself differentiable at each point of its domain ℝ³. By the chain rule, the Jacobian matrix of f ∘ g : ℝ³ → ℝ² is given by:

    D(f ∘ g)(x) = Df(g(x))Dg(x)

In Example 1293 we saw that

    Dg(x) = ⎡ 4x₁     1     1 ⎤
            ⎣  1   −4x₂³   0  ⎦

On the other hand, we also know that:

    Df(x) = ⎡  1   0  ⎤
            ⎣ x₂   x₁ ⎦

and therefore

    Df(g(x)) = ⎡     1              0          ⎤
               ⎣ x₁ − x₂⁴   2x₁² + x₂ + x₃     ⎦

It follows that:

    Df(g(x))Dg(x) = ⎡     1              0          ⎤ ⎡ 4x₁     1     1 ⎤
                    ⎣ x₁ − x₂⁴   2x₁² + x₂ + x₃     ⎦ ⎣  1   −4x₂³   0  ⎦

                  = ⎡          4x₁                           1                      1     ⎤
                    ⎣ 6x₁² − 4x₁x₂⁴ + x₂ + x₃   x₁ − 8x₁²x₂³ − 5x₂⁴ − 4x₂³x₃   x₁ − x₂⁴   ⎦

which implies that the differential at x of f ∘ g is given by the linear operator d(f ∘ g) : ℝ³ → ℝ² defined by

    d(f ∘ g)(x)(h) = Df(g(x))Dg(x) (h₁, h₂, h₃)ᵀ

For example, at x = (2, 1, −1) we have:

    d(f ∘ g)(x)(h) = (8h₁ + h₂ + h₃, 16h₁ − 31h₂ + h₃)

Naturally, though it is in general more complicated, the Jacobian matrix of the composition f ∘ g can be computed directly, without using the chain rule, by writing explicitly the form of f ∘ g and by computing its partial derivatives. In this example, f ∘ g : ℝ³ → ℝ² is given by

    (f ∘ g)(x₁, x₂, x₃) = (2x₁² + x₂ + x₃, (x₁ − x₂⁴)(2x₁² + x₂ + x₃))
                        = (2x₁² + x₂ + x₃, 2x₁³ + x₁x₂ + x₁x₃ − 2x₁²x₂⁴ − x₂⁵ − x₂⁴x₃)

Therefore,

    (f ∘ g)₁(x) = 2x₁² + x₂ + x₃
    (f ∘ g)₂(x) = 2x₁³ + x₁x₂ + x₁x₃ − 2x₁²x₂⁴ − x₂⁵ − x₂⁴x₃

and we have:

    ∂(f ∘ g)₁/∂x₁ = 4x₁ ;  ∂(f ∘ g)₁/∂x₂ = 1 ;  ∂(f ∘ g)₁/∂x₃ = 1
    ∂(f ∘ g)₂/∂x₁ = 6x₁² − 4x₁x₂⁴ + x₂ + x₃
    ∂(f ∘ g)₂/∂x₂ = x₁ − 8x₁²x₂³ − 5x₂⁴ − 4x₂³x₃
    ∂(f ∘ g)₂/∂x₃ = x₁ − x₂⁴

The Jacobian matrix

    ⎡ ∂(f ∘ g)₁/∂x₁   ∂(f ∘ g)₁/∂x₂   ∂(f ∘ g)₁/∂x₃ ⎤
    ⎣ ∂(f ∘ g)₂/∂x₁   ∂(f ∘ g)₂/∂x₂   ∂(f ∘ g)₂/∂x₃ ⎦

coincides with the one found through the chain rule. N
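
The agreement between the two routes can also be cross-checked mechanically. A hedged sympy sketch (our illustration, with f and g as in Example 1298):

    import sympy as sp

    x1, x2, x3, u, v = sp.symbols('x1 x2 x3 u v', real=True)
    X = sp.Matrix([x1, x2, x3])

    g = sp.Matrix([2*x1**2 + x2 + x3, x1 - x2**4])      # g : R^3 -> R^2
    f = sp.Matrix([u, u*v])                             # f : R^2 -> R^2

    direct = f.subs({u: g[0], v: g[1]}).jacobian(X)     # D(f o g) directly
    chain = f.jacobian(sp.Matrix([u, v])).subs({u: g[0], v: g[1]}) * g.jacobian(X)

    print(sp.simplify(direct - chain))                  # zero matrix
    print(direct.subs({x1: 2, x2: 1, x3: -1}))          # [[8, 1, 1], [16, -31, 1]]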

We close with an interesting application of the chain rule. A function f : ℝⁿ₊ → ℝ is (positively) homogeneous of order λ ∈ ℝ if f(tx) = t^λ f(x) for each t > 0 and x ∈ ℝⁿ₊.¹²

¹² If f is positively homogeneous on ℝⁿ₊, then it is homogeneous of order λ = 1 on ℝⁿ₊. This notion is thus consistent with what we did in Chapter 18.

Corollary 1299 Let f : ℝⁿ₊ → ℝ be homogeneous of order λ. If f is differentiable on ℝⁿ₊₊, then the derivative operator ∇f : ℝⁿ₊₊ → ℝⁿ is such that

    ∇f(x) · x = λf(x)   ∀x ∈ ℝⁿ₊₊        (27.45)

This equality is called Euler's Formula.¹³ The more interesting cases are λ = 0 and λ = 1. For instance, the indirect utility function v : ℝⁿ₊₊ × ℝ₊ → ℝ is easily seen to be homogeneous of degree 0 (cf. Proposition 1051). By Euler's Formula, we have:

    Σᵢ₌₁ⁿ (∂v(p, w)/∂pᵢ) pᵢ = −(∂v(p, w)/∂w) w

for all (p, w) ∈ ℝⁿ⁺¹₊₊.

Proof Fix x ∈ ℝⁿ₊₊ and consider the scalar function φ : (0, ∞) → ℝ defined by φ(t) = f(tx). If we define g : (0, ∞) → ℝⁿ₊₊ by g(t) = tx, we can write φ = f ∘ g. By (27.44), we have φ′(t) = ∇f(tx) · x. On the other hand, homogeneity implies φ(t) = t^λ f(x), so φ′(t) = λt^{λ−1} f(x). We conclude that ∇f(tx) · x = λt^{λ−1} f(x). For t = 1, it is Euler's Formula.

¹³ The reader can also check that the partial derivatives are homogeneous of order λ − 1.
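
Euler's Formula is easy to verify on concrete homogeneous functions. A minimal sketch, assuming sympy and using a Cobb-Douglas function of our choosing, f(x₁, x₂) = x₁^a x₂^b, which is homogeneous of order a + b:

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2', positive=True)
    a, b = sp.symbols('a b', real=True)
    f = x1**a * x2**b

    euler_lhs = sp.diff(f, x1)*x1 + sp.diff(f, x2)*x2   # grad f(x) . x
    print(sp.simplify(euler_lhs - (a + b)*f))           # 0, as (27.45) predicts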

27.6.3 Proof of the chain rule (Theorem 1296)


We show that (27.41) holds, i.e., that

    lim_{h→0} ‖(f ∘ g)(x + h) − (f ∘ g)(x) − (df(g(x)) ∘ dg(x))(h)‖/‖h‖ = 0        (27.46)

Set

    u(h) = g(x + h) − g(x) − dg(x)(h)
    v(k) = f(g(x) + k) − f(g(x)) − df(g(x))(k)

We have

    (f ∘ g)(x + h) − (f ∘ g)(x) − (df(g(x)) ∘ dg(x))(h)
    = f(g(x + h)) − f(g(x)) − df(g(x))(dg(x)(h))
    = f(g(x + h)) − f(g(x)) − df(g(x))(g(x + h) − g(x) − u(h))
    = f(g(x + h)) − f(g(x)) − df(g(x))(g(x + h) − g(x)) + df(g(x))(u(h))
    = v(g(x + h) − g(x)) + df(g(x))(u(h))

To prove (27.46) thus amounts to proving that

    lim_{h→0} ‖v(g(x + h) − g(x)) + df(g(x))(u(h))‖/‖h‖ = 0        (27.47)

Consider the linear operator df(g(x)). By Lemma 899, there exists k₁ > 0 such that ‖df(g(x))(h)‖ ≤ k₁‖h‖ for each h ∈ ℝᵐ. Since u(h) ∈ ℝᵐ for each h ∈ ℝⁿ, we therefore have ‖df(g(x))(u(h))‖ ≤ k₁‖u(h)‖. On the other hand, g is differentiable at x, and so lim_{h→0} ‖u(h)‖/‖h‖ = 0. It follows that

    lim_{h→0} ‖df(g(x))(u(h))‖/‖h‖ ≤ k₁ lim_{h→0} ‖u(h)‖/‖h‖ = 0        (27.48)

Since f is differentiable at g(x), we have

    lim_{k→0} ‖v(k)‖/‖k‖ = 0        (27.49)

Fix ε > 0. By (27.49), there exists δ_ε > 0 such that ‖k‖ ≤ δ_ε implies ‖v(k)‖/‖k‖ ≤ ε. In other words, there exists δ_ε > 0 such that ‖g(x + h) − g(x)‖ ≤ δ_ε implies

    ‖v(g(x + h) − g(x))‖/‖g(x + h) − g(x)‖ ≤ ε

On the other hand, since g is continuous at x, there exists δ₁ > 0 such that ‖h‖ ≤ δ₁ implies ‖g(x + h) − g(x)‖ ≤ δ_ε. Therefore, for ‖h‖ sufficiently small we have ‖v(g(x + h) − g(x))‖ ≤ ε‖g(x + h) − g(x)‖. By applying Lemma 899 to the linear operator dg(x), there exists k₂ > 0 such that

    ‖v(g(x + h) − g(x))‖ ≤ ε‖g(x + h) − g(x)‖        (27.50)
                         = ε‖u(h) + dg(x)(h)‖
                         ≤ ε‖u(h)‖ + ε‖dg(x)(h)‖ ≤ ε‖u(h)‖ + εk₂‖h‖

Go back to (27.47). Using (27.48) and (27.50), we have:

    lim_{h→0} ‖v(g(x + h) − g(x)) + df(g(x))(u(h))‖/‖h‖
    ≤ lim_{h→0} ‖v(g(x + h) − g(x))‖/‖h‖ + lim_{h→0} ‖df(g(x))(u(h))‖/‖h‖
    ≤ ε lim_{h→0} ‖u(h)‖/‖h‖ + εk₂ lim_{h→0} ‖h‖/‖h‖ = εk₂

Since ε was fixed arbitrarily, it can be taken as small as we like. Therefore:

    lim_{h→0} ‖v(g(x + h) − g(x)) + df(g(x))(u(h))‖/‖h‖ ≤ lim_{ε→0} εk₂ = 0

as desired.
Chapter 28

Differential methods

28.1 Extremal and critical points


28.1.1 Preamble
So far we have considered the notions of derivability and differentiability for functions defined on open intervals (a, b) for scalar functions and, more generally, on open sets U for functions of several variables. To study optimization problems we have to consider functions f : A ⊆ ℝⁿ → ℝ defined on any subset A of ℝⁿ, open or not. Fortunately, all we saw until now for a generic point of an open set U extends immediately to the interior points of any set A. This is best seen in the scalar case. So, let x₀ be an interior point of A ⊆ ℝ. By definition, there exists a neighborhood U of x₀ such that U ⊆ A. The restriction f|U of f on U is derivable at x₀ if the limit

    lim_{h→0} (f|U(x₀ + h) − f|U(x₀))/h

exists and is finite. But, for every h small enough so that x₀ + h ∈ U we have

    (f|U(x₀ + h) − f|U(x₀))/h = (f(x₀ + h) − f(x₀))/h

and so

    f|U′(x₀) = lim_{h→0} (f(x₀ + h) − f(x₀))/h

We can therefore consider directly the limit

    lim_{h→0} (f(x₀ + h) − f(x₀))/h

and say that its value, denoted by f′(x₀), is the derivative of f at the interior point x₀ if it exists and is finite.
In sum, derivability and differentiability are local notions that use only the properties of the function in a neighborhood, however small, of the point at hand. They can therefore be defined at any interior point of any set.


28.1.2 Fermat's Theorem


In Section 22.5 we studied in detail the notions of local maximizers and minimizers. As we
remarked, in applications they are of little interest per se but they have a key instrumental
importance. The next fundamental result, Fermat's Theorem, is central for their study.

Theorem 1300 (Fermat) Let f : A ⊆ ℝ → ℝ be defined on a set A in ℝ and C a subset of A. Let f be differentiable at an interior point x̂ of C. If x̂ is a local extremal point (a maximizer or a minimizer) of f on C, then

    f′(x̂) = 0        (28.1)

Proof Let x̂ ∈ C be an interior point and a local maximizer on C (a similar argument holds if it is a local minimizer). There exists therefore B_ε(x̂) such that (22.30) holds, that is, f(x̂) ≥ f(x) for every x ∈ B_ε(x̂) ∩ C. For every h > 0 sufficiently small, that is, h ∈ (0, ε), we have x̂ + h ∈ B_ε(x̂). Hence

    (f(x̂ + h) − f(x̂))/h ≤ 0   ∀h ∈ (0, ε)

which implies

    lim_{h→0} (f(x̂ + h) − f(x̂))/h = lim_{h→0⁺} (f(x̂ + h) − f(x̂))/h ≤ 0        (28.2)

where the limits exist and are equal because f is differentiable at x̂.
On the other hand, for every h < 0 sufficiently small, that is, h ∈ (−ε, 0), we have x̂ + h ∈ B_ε(x̂). Therefore,

    (f(x̂ + h) − f(x̂))/h ≥ 0   ∀h ∈ (−ε, 0)

which implies

    lim_{h→0} (f(x̂ + h) − f(x̂))/h = lim_{h→0⁻} (f(x̂ + h) − f(x̂))/h ≥ 0        (28.3)

where, again, the limits exist and are equal because f is differentiable at x̂.
Together, inequalities (28.2) and (28.3) imply

    f′(x̂) = lim_{h→0} (f(x̂ + h) − f(x̂))/h = 0

as desired.

A necessary condition for an interior point x̂ to be a local maximizer (or minimizer) is therefore that the derivative at such point is, if it exists, zero. This condition, often called first-order (necessary) condition (abbreviated as FOC), has a simple heuristic interpretation. As we will see shortly, if f′(x₀) > 0 the function is strictly increasing at x₀, while if f′(x₀) < 0 the function is strictly decreasing. If f is maximized at x₀, it is neither strictly increasing there (otherwise, an infinitesimal increase in x would be beneficial), nor strictly decreasing there (otherwise, an infinitesimal decrease in x would be beneficial). Thus, the derivative, if it exists, must be zero.¹

The first-order condition (28.1) will turn out to be key in solving optimization problems, hence the important instrumental role of local extremal points. Conceptually, it tells us that in order to maximize (or minimize) an objective function we need to consider what happens at the margin: a point cannot be a maximizer if there is still room for improvement through infinitesimal changes, be they positive or negative. At a maximizer, all marginal opportunities must have been exhausted.
The fundamental principle highlighted by the first-order condition is that, to maximize levels of utility (or of production or of welfare and so on), one needs to work at the margin. In economics, the understanding of this principle was greatly facilitated by a proper mathematical formalization of the optimization problem that made it possible to rely on differential calculus, so, on the shoulders of the giants who created it. What becomes crystal clear through calculus is highly non-trivial otherwise, in particular if we just use a purely literary analysis. Only in the 1870s was the marginal principle fully understood; it was at the heart of the marginalist theory of value, pioneered in the 1870s by Jevons, Menger, and Walras. This approach has continued to evolve since then (at first with the works of Edgeworth, Marshall, and Pareto) and, over the years, has shown a surprising ability to shed light on economic phenomena. In all this, the first-order condition and its generalizations (momentarily we will see its multivariable version) is, like Shakespeare's Julius Caesar, the colossus that bestrides the economics world.

That said, let us continue with the analysis of Fermat's Theorem. It is important to focus on the following aspects:

(i) the hypothesis that x̂ is an interior point of C;

(ii) the hypothesis of differentiability at x̂;

(iii) the condition f′(x̂) = 0 is only necessary.

Let us discuss them one by one.

(i) The hypothesis that x̂ is an interior point of C is essential for Fermat's Theorem. Indeed, consider for example f : ℝ → ℝ given by f(x) = x, and let C = [0, 1]. The boundary point x = 0 is a global minimizer of f on [0, 1], but f′(0) = 1 ≠ 0. In the same way, the boundary point x = 1 is a maximizer, but f′(1) = 1 ≠ 0. Therefore, if x is a boundary local extremal point, it is not necessarily true that f′(x) = 0.

(ii) Fermat's Theorem cannot be applied to functions that, even if they have interior maximizers or minimizers, are not differentiable at these points. A classic example is the function f : ℝ → ℝ given by f(x) = |x|: the point x = 0 is a global minimizer but f, at that point, does not admit derivative, so the condition f′(x) = 0 is not relevant in this case. Another example is the following.

¹ This heuristic argument can be also articulated as follows. Since f is derivable at x₀, we have f(x₀ + h) − f(x₀) = f′(x₀)h + o(h). Heuristically, we can set f(x₀ + h) − f(x₀) = f′(x₀)h by neglecting the term o(h). If f′(x₀) > 0, we have f(x₀ + h) > f(x₀) if h > 0, so a strict increase is strictly beneficial; if f′(x₀) < 0, we have f(x₀ + h) > f(x₀) if h < 0, so a strict decrease is strictly beneficial. Only if f′(x₀) = 0 such strictly beneficial variations cannot occur, so f may be maximized at x₀.

Example 1301 Let f : ℝ → ℝ be given by f(x) = ∛((x² − 5x + 6)²).

[Figure omitted: graph of f, with global minimizers at x = 2 and x = 3 and a local maximizer at x = 5/2.]

Since x² − 5x + 6 = (x − 2)(x − 3) is zero for x = 2 and x = 3, we conclude that

    f(x) ≥ f(2) = f(3) = 0   ∀x ∈ ℝ

Therefore, x = 2 and x = 3 are global minimizers. The derivative of f is

    f′(x) = (2/3)(x² − 5x + 6)^{−1/3}(2x − 5) = 2(2x − 5)/(3 ∛(x² − 5x + 6))

and so it does not exist where x² − 5x + 6 is zero, that is, at the two minimizers! The point x = 5/2 is such that f′(x) = 0 and is a local maximizer (being unbounded above, this function has no global maximizers). N

(iii) Lastly, the condition f′(x) = 0 is only necessary. The following simple example should not leave any doubt on this.

Example 1302 Let f : ℝ → ℝ be the cubic function f(x) = x³.

[Figure omitted: graph of the cubic function through the origin.]

We have f′(0) = 0, although the origin x₀ = 0 is neither a local maximizer nor a local minimizer.² Condition (28.1) is therefore necessary, but not sufficient, for a point to be a local extremum. N

We now address the multivariable version of Fermat's Theorem. In this case the first-order condition (28.1) takes the more general form (28.4) in which gradients replace derivatives.

Theorem 1303 Let f : A ⊆ ℝⁿ → ℝ be defined on a set A in ℝⁿ and C a subset of A. Suppose f is differentiable at an interior point x̂ of C. If x̂ is a local extremal point (a maximizer or a minimizer) of f on C, then

    ∇f(x̂) = 0        (28.4)

We leave the proof to the reader. Indeed, mutatis mutandis, it is the same as that of Fermat's Theorem.³

The observations (i)-(iii), just made for the scalar case, continue to hold in the multivariable case. In particular, as in the scalar case the first-order condition is necessary, but not sufficient, as the next example shows.

Example 1304 Let f : ℝ² → ℝ be given by f(x₁, x₂) = x₁² − x₂². We have

    ∇f(x) = (2x₁, −2x₂)

so the first-order condition (28.4) takes the form

    2x₁ = 0
    −2x₂ = 0

² Indeed, f(x) < 0 for every x < 0 and f(x) > 0 for every x > 0.
³ In the sequel, by Fermat's Theorem we will mean both the original scalar version as well as the present multivariable version (the context will clarify which one we are referring to).

The unique solution of this system is (0; 0), which in turn is the unique point in R2 where
f satis es condition (28.4). It is easy to see that this point is neither a maximizer nor a
minimizer. Indeed, if we consider any point (0; x2 ) di erent from the origin on the vertical
axis and any point (x1 ; 0) di erent from the origin on the horizontal axis, we have

f (0; x2 ) = x22 < 0 and f (x1 ; 0) = x21 > 0

that is, being f (0; 0) = 0,

f (0; x2 ) < f (0; 0) < f (x1 ; 0) 80 6= x1 ; x2 2 R

In every neighborhood of the point (0; 0) there are, therefore, both points in which the
function is strictly positive and points in which it is strictly negative: as we can see from the
gure

0
x3

-2

-4
2
1 2
0 1
0
-1
-1
x2 -2 -2
x1

the origin (0; 0) is a \saddle" point of f which is neither a maximizer nor a minimizer. N

Next we introduce a classification of points that will play a key role in our analysis.

Definition 1305 A point x ∈ A such that ∇f(x) = 0 (i.e., for n = 1 a point such that f′(x) = 0) is said to be a critical or stationary point of f : A ⊆ ℝⁿ → ℝ.

Throughout the book we use interchangeably the terms critical and stationary. Using this terminology, Theorem 1303 can be paraphrased as saying that a necessary condition for an interior point x to be a local minimizer or maximizer is to be stationary.

Example 1306 Let f : ℝ → ℝ be given by f(x) = 10x³(x − 1)². The first-order condition (28.1) becomes

    10x²(x − 1)(5x − 3) = 0

The points that satisfy it are x = 0, x = 1 and x = 3/5. They are the stationary points of f. N

Example 1307 Let f : ℝ² → ℝ be given by f(x₁, x₂) = 2x₁² + x₂² − 3(x₁ + x₂) + x₁x₂ − 3. We have

    ∇f(x) = (4x₁ − 3 + x₂, 2x₂ − 3 + x₁)

So here the first-order condition (28.4) assumes the form

    4x₁ − 3 + x₂ = 0
    2x₂ − 3 + x₁ = 0

It is easy to see that x = (3/7, 9/7) is the unique solution of the system, so it is the unique stationary point of f. N

28.1.3 Unconstrained optima: incipit

The role of Fermat's Theorem in solving optimization problems will be treated in detail in Chapter 37. We can, however, see a first simple use of this important theorem in an unconstrained optimization problem

    max_x f(x)   sub x ∈ C        (28.5)

where C is an open set of ℝⁿ.⁴

Let us assume, as usual in applications, that f is differentiable on C. Any local extremal point is thus interior (since C is open) and f is differentiable at that point. By Fermat's Theorem, the local extremal points of f on C are also stationary points. This is true, a fortiori, for any solution of problem (28.5) because it is, obviously, also a local maximizer.
Therefore, to find the possible solutions of problem (28.5) it is necessary to solve the first-order condition

    ∇f(x) = 0

The solutions of the optimization problem, if they exist, are among the solutions of this condition, which is necessary (but not sufficient!) for a point to be a local extremal one.

Example 1308 Let f : ℝ² → ℝ be given by

    f(x) = −x₁⁴ − x₂⁴ + 4x₁x₂

We have ∇f(x) = (−4x₁³ + 4x₂, −4x₂³ + 4x₁), so the first-order condition is

    −4x₁³ + 4x₂ = 0
    −4x₂³ + 4x₁ = 0

that is,

    x₁³ = x₂
    x₂³ = x₁

The stationary points are (0, 0), (1, 1), and (−1, −1). Among them we have to look for the possible solutions of the unconstrained optimization problem

    max_x f(x)   sub x ∈ ℝ²

N
⁴ Recall that in Section 22.1 optimization problems were called unconstrained when C is open.
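
Stationary points like these can be located mechanically by solving the first-order condition. A hedged sympy sketch for Example 1308 (declaring the symbols real so that only real solutions are returned):

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2', real=True)
    f = -x1**4 - x2**4 + 4*x1*x2

    foc = [sp.diff(f, x1), sp.diff(f, x2)]   # the system grad f(x) = 0
    print(sp.solve(foc, [x1, x2]))           # [(-1, -1), (0, 0), (1, 1)]

Which of these stationary points, if any, solve the optimization problem is a separate question, taken up in Chapter 37.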

28.2 Mean Value Theorem


In this section we study the important Mean Value Theorem, one of the classic results of differential calculus. We start with a special case, known as Rolle's Theorem.

Theorem 1309 (Rolle) Let f : [a, b] → ℝ be continuous on [a, b], with f(a) = f(b), and differentiable on (a, b). Then, there exists (at least) one critical point x̂ ∈ (a, b), that is, a point x̂ ∈ (a, b) such that f′(x̂) = 0.

This theorem, which provides a simple sufficient condition for a function to have a critical point, has an immediate graphical intuition:

[Figure omitted: a function with f(a) = f(b) and a horizontal tangent at an interior point c.]

Proof By Weierstrass' Theorem, there exist x₁, x₂ ∈ [a, b] such that f(x₁) = min_{x∈[a,b]} f(x) and f(x₂) = max_{x∈[a,b]} f(x). Denote m = min_{x∈[a,b]} f(x) and M = max_{x∈[a,b]} f(x). If m = M, then f is constant, that is, f(x) = m = M, and therefore f′(x) = 0 for every x ∈ (a, b). If m < M, then at least one of the points x₁ and x₂ is interior to [a, b]. Indeed, they cannot be both boundary points because f(a) = f(b). If x₁ is an interior point of [a, b], that is, x₁ ∈ (a, b), then by Fermat's Theorem we have f′(x₁) = 0, so x̂ = x₁. Analogously, if x₂ ∈ (a, b), we have f′(x₂) = 0, and therefore x̂ = x₂.

Example 1310 Let f : [−1, 1] → ℝ be given by f(x) = √(1 − x²). This function is continuous on [−1, 1] and differentiable on (−1, 1). Since f(−1) = f(1) = 0, by Rolle's Theorem there exists a critical point x̂ ∈ (−1, 1), that is, a point such that f′(x̂) = 0. In particular, from f′(x) = −x(1 − x²)^{−1/2} it follows that this point is x̂ = 0. N
Given a function f : [a, b] → ℝ, consider the points (a, f(a)) and (b, f(b)) of its graph. The straight line passing through these points has equation

    y = f(a) + ((f(b) − f(a))/(b − a))(x − a)        (28.6)

as the reader can verify by solving the system

    f(a) = ma + q
    f(b) = mb + q
This straight line plays a key role in the important Mean Value (or Lagrange's) Theorem, which we now state and prove.

Theorem 1311 (Mean Value) Let f : [a, b] → ℝ be continuous on [a, b] and differentiable on (a, b). Then, there exists x̂ ∈ (a, b) such that

    f′(x̂) = (f(b) − f(a))/(b − a)        (28.7)

Rolle's Theorem is the special case in which f(a) = f(b), so that condition (28.7) becomes f′(x̂) = 0.
Note that

    (f(b) − f(a))/(b − a)

is the slope of the straight line (28.6) passing through the points (a, f(a)) and (b, f(b)) of the graph of f, while f′(x) is the slope of the straight line tangent to the graph of f at the point (x, f(x)). The Mean Value Theorem establishes, therefore, a simple sufficient condition for the existence of a point x̂ ∈ (a, b) such that the straight line tangent at (x̂, f(x̂)) is parallel to the straight line passing through the points (a, f(a)) and (b, f(b)). Graphically:

[Figure omitted: a tangent line at an interior point c parallel to the chord through (a, f(a)) and (b, f(b)).]

Note that the increment f(b) − f(a) on the whole interval [a, b] can be written, thanks to the Mean Value Theorem, as

    f(b) − f(a) = f′(x̂)(b − a)

or, in an equivalent way, as

    f(b) − f(a) = f′(a + t̂(b − a))(b − a)

for a suitable 0 ≤ t̂ ≤ 1. Indeed, we have

    [a, b] = {(1 − t)a + tb : t ∈ [0, 1]} = {a + t(b − a) : t ∈ [0, 1]}

so every point x̂ ∈ [a, b] can be written in the form a + t̂(b − a) for a suitable t̂ ∈ [0, 1].
Proof Let g : [a, b] → ℝ be the auxiliary function defined by

    g(x) = f(x) − [f(a) + ((f(b) − f(a))/(b − a))(x − a)]

It is the difference between f and the straight line passing through the points (a, f(a)) and (b, f(b)). The function g is continuous on [a, b] and differentiable on (a, b). Moreover, g(a) = g(b) = 0. By Rolle's Theorem, there exists x̂ ∈ (a, b) such that g′(x̂) = 0. But

    g′(x) = f′(x) − (f(b) − f(a))/(b − a)

and therefore

    f′(x̂) − (f(b) − f(a))/(b − a) = 0

That is, x̂ satisfies condition (28.7).
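
The mean value point x̂ can be computed explicitly in simple cases. A small sketch, with sympy and an example of our choosing, f(x) = x³ on [0, 2]:

    import sympy as sp

    x = sp.symbols('x', positive=True)
    f = x**3
    a, b = 0, 2

    slope = (f.subs(x, b) - f.subs(x, a)) / (b - a)   # (f(b) - f(a))/(b - a) = 4
    print(sp.solve(sp.Eq(sp.diff(f, x), slope), x))   # [2*sqrt(3)/3], a point of (0, 2)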

A first interesting application of the Mean Value Theorem shows that constant functions are characterized by having a zero derivative at every point.

Corollary 1312 Let f : [a, b] → ℝ be continuous on [a, b] and differentiable on (a, b). Then f′(x) = 0 for every x ∈ (a, b) if and only if f is constant, that is, if and only if there exists k ∈ ℝ such that

    f(x) = k   ∀x ∈ [a, b]

Proof Let us prove the "only if", since the "if" is the simple property of derivatives seen in Example 1211. Let x ∈ (a, b] and let us apply the Mean Value Theorem on the interval [a, x]. It yields a point x̂ ∈ (a, x) such that

    0 = f′(x̂) = (f(x) − f(a))/(x − a)

that is, f(x) = f(a). Since x is any point in (a, b], it follows that f(x) = f(a) for any x ∈ [a, b].

This characterization of constant functions will prove important in the theory of integration. In particular, the following simple generalization of Corollary 1312 will be key.

Corollary 1313 Let f, g : [a, b] → ℝ be continuous on [a, b] and differentiable on (a, b). Then f′(x) = g′(x) for every x ∈ (a, b) if and only if there exists k ∈ ℝ such that

    f(x) = g(x) + k   ∀x ∈ [a, b]

Two functions that have the same first derivative are, thus, equal up to an (additive) constant k.

Proof Here too we prove the "only if", the "if" being obvious. Let h : [a, b] → ℝ be the auxiliary function h(x) = f(x) − g(x). We have h′(x) = f′(x) − g′(x) = 0 for every x ∈ (a, b). Therefore, by Corollary 1312 h is constant on [a, b]. That is, there exists k ∈ ℝ such that h(x) = k for every x ∈ [a, b], so f(x) = g(x) + k for every x ∈ [a, b].
Via higher order derivatives, next we establish the ultimate version of the Mean Value Theorem.⁵

Theorem 1314 Let f : [a, b] → ℝ be n − 1 times continuously differentiable on [a, b] and n times differentiable on (a, b). Then, there exists x̂ ∈ (a, b) such that

    f(b) − f(a) = Σ_{k=1}^{n−1} (f^(k)(a)/k!)(b − a)^k + (f^(n)(x̂)/n!)(b − a)^n        (28.8)

The Mean Value Theorem is the special case n = 1 because (28.7) can be equivalently written as

    f(b) − f(a) = f′(x̂)(b − a)

The mean-value formula (28.8) can be seen as a version of Taylor's formula, arguably the most important formula of calculus that will be studied in detail later in the book (Chapter 29). For this reason, we call it the Lagrange-Taylor formula.

Proof Let g : [a, b] → ℝ be the auxiliary function defined by

    g(x) = f(b) − f(x) − Σ_{k=1}^{n−1} (f^(k)(x)/k!)(b − x)^k − (k̄/n!)(b − x)^n

The function g is continuous on [a, b] and differentiable on (a, b). Some algebra shows that

    g′(x) = ((b − x)^{n−1}/(n − 1)!)(k̄ − f^(n)(x))

Let the scalar k̄ be such that g(a) = 0, i.e.,

    k̄ = (f(b) − f(a) − Σ_{k=1}^{n−1} (f^(k)(a)/k!)(b − a)^k) n!/(b − a)^n

We thus have g(a) = g(b) = 0. By Rolle's Theorem, there exists x̂ ∈ (a, b) such that g′(x̂) = 0. So

    0 = ((b − x̂)^{n−1}/(n − 1)!)(k̄ − f^(n)(x̂))

and therefore k̄ = f^(n)(x̂). We thus have

    0 = g(a) = f(b) − f(a) − Σ_{k=1}^{n−1} (f^(k)(a)/k!)(b − a)^k − (f^(n)(x̂)/n!)(b − a)^n

which implies (28.8).

We close by noting that, as easily checked, there is a dual version of (28.8) involving the derivatives at the other endpoint of the interval:

    f(a) − f(b) = Σ_{k=1}^{n−1} (f^(k)(b)/k!)(a − b)^k + (f^(n)(x̂)/n!)(a − b)^n        (28.9)

where, again, x̂ ∈ (a, b).
⁵ In the statement we adopt the convention that "0 times continuous differentiability" just amounts to continuity. Moreover, f^(0) = f.
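
A quick numerical illustration of the Lagrange-Taylor formula (28.8), with n = 2 and f = exp on [0, 1] (an example of our choosing): the point x̂ solving e − 1 = f′(0) + f″(x̂)/2 can be computed in closed form and indeed lies in (0, 1).

    import math

    # f(b) - f(a) - f'(a)(b - a) with f = exp, a = 0, b = 1
    residual = math.e - 1 - 1.0
    c = math.log(2 * residual)   # solve exp(c)/2 = residual for c
    print(c, 0 < c < 1)          # approx 0.3622, True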
28.3 Continuity properties of the derivative

The derivative function may exist at a point without being continuous at that point, as the next example shows.

Example 1315 Let f : ℝ → ℝ be defined by

    f(x) = x² sin(1/x)   if x ≠ 0
    f(x) = 0             if x = 0

As the reader can check (cf. Example 1372 below), we have

    f′(x) = 2x sin(1/x) − cos(1/x)   if x ≠ 0
    f′(x) = 0                        if x = 0

So, f is differentiable at 0, but the derivative function f′ is discontinuous there. N
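
The discontinuity at the origin can be seen numerically: along the points 1/(2kπ) we have f′ = −1, while along 1/((2k + 1)π) we have f′ = 1, so f′ has no limit at 0. A small probe (our illustration):

    import math

    fp = lambda x: 2 * x * math.sin(1 / x) - math.cos(1 / x)

    for k in range(1, 4):
        x_even = 1 / (2 * k * math.pi)        # here cos(1/x) = 1, so fp is near -1
        x_odd = 1 / ((2 * k + 1) * math.pi)   # here cos(1/x) = -1, so fp is near 1
        print(round(fp(x_even), 3), round(fp(x_odd), 3))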

Although it might be discontinuous, the derivative function still satisfies the intermediate value property of Lemma 574, as the next important result proves.

Theorem 1316 (Darboux) Let f : [a, b] → ℝ be differentiable, with f′(a) < f′(b). If

    f′(a) ≤ z ≤ f′(b)

then there exists a ≤ c ≤ b such that f′(c) = z. If f′ is strictly increasing, such c is unique.

Proof Let f′(a) < z < f′(b) (otherwise the result is trivially true). Set g(x) = f(x) − zx. We have g′(x) = f′(x) − z, and therefore g′(a) < 0 and g′(b) > 0. The function g is continuous on [a, b] and, therefore, by Weierstrass' Theorem it has a minimizer x_m on [a, b]. Let us prove that the minimizer x_m is interior. Since g′(a) < 0, there exists a point x₁ ∈ (a, b) such that g(x₁) < g(a). Analogously, being g′(b) > 0, there exists a point x₂ ∈ (a, b) such that g(x₂) < g(b). This implies that neither a nor b are minimizers of g on [a, b], so x_m ∈ (a, b). By Fermat's Theorem, g′(x_m) = 0, that is, f′(x_m) = z. In conclusion, there exists c ∈ (a, b) such that f′(c) = z.

As in Lemma 574, the case f′(a) > f′(b) is analogous. We can thus say that, for any z such that

    min{f′(a), f′(b)} ≤ z ≤ max{f′(a), f′(b)}

there exists a ≤ c ≤ b such that f′(c) = z. If f′ is strictly monotone, such c is unique.

Since in general the derivative function is not continuous (so Weierstrass' Theorem cannot be invoked), Darboux's Theorem does not imply, unlike Lemma 574, a version of the Intermediate Value Theorem for the derivative function. Still, Darboux's Theorem is per se a remarkable property of continuity of the derivative function that implies, inter alia, that such function can only have essential non-removable discontinuities.

Corollary 1317 If f : [a, b] → ℝ is differentiable, then its derivative function f′ : [a, b] → ℝ cannot have removable discontinuities or jump discontinuities.
Proof Let us suppose, by contradiction, that f′ has at x₀ ∈ (a, b) a removable discontinuity, that is, lim_{x→x₀} f′(x) = L ≠ f′(x₀). Suppose that L < f′(x₀) (the proof is analogous if L > f′(x₀)). If ε is such that 0 < ε < f′(x₀) − L, then there exists δ > 0 such that

    x₀ ≠ x ∈ (x₀ − δ, x₀ + δ) ⟹ L − ε < f′(x) < L + ε < f′(x₀)

By taking any 0 < δ′ < δ, we therefore have

    x₀ ≠ x ∈ (x₀ − δ′, x₀ + δ′) ⟹ L − ε < f′(x) < L + ε < f′(x₀)        (28.10)

Consider the interval [x₀ − δ′, x₀]. By (28.10), we have f′(x₀ − δ′) < f′(x₀). By Darboux's Theorem, for every f′(x₀ − δ′) < z < f′(x₀) there exists c ∈ (x₀ − δ′, x₀) such that f′(c) = z. But this contradicts (28.10) which implies that, taking z ∈ (L + ε, f′(x₀)), there is no c ∈ [x₀ − δ′, x₀] such that f′(c) = z. Hence, f′ cannot have removable discontinuities.
The function f′ cannot have jump discontinuities either. Suppose, by contradiction, that f′ has such a discontinuity at x₀ ∈ (a, b), that is, lim_{x→x₀⁺} f′(x) ≠ lim_{x→x₀⁻} f′(x). Suppose that f′(x₀) = lim_{x→x₀⁺} f′(x) (the proof is analogous if f′(x₀) = lim_{x→x₀⁻} f′(x)). By setting L = lim_{x→x₀⁻} f′(x), the proof proceeds in an analogous way to the one seen for the removable discontinuity, as the reader can verify.

28.4 Monotonicity and differentiability

There is a strict link between the monotonicity of a differentiable function and the sign of its derivative. This allows us to study monotonicity through differential conditions based on properties of the derivatives. Such conditions are important, both conceptually and operationally, to check the monotonicity properties of a function. For simplicity, we only consider scalar functions.
We start by introducing the concept of monotonicity of a function at a point of its domain.

Definition 1318 A function f : A ⊆ ℝ → ℝ is said to be (locally) increasing at a limit point x₀ ∈ A if there exists a neighborhood B_ε(x₀) of x₀ such that

    x < x₀ < y ⟹ f(x) ≤ f(x₀) ≤ f(y)   ∀x, y ∈ B_ε(x₀) ∩ A

Moreover, the function is said to be (locally) strictly increasing if these inequalities are all strict.

Similar definitions hold for the (strictly) decreasing monotonicity at a point.

Example 1319 A function f : ℝ → ℝ is increasing at a point x₀ ∈ ℝ if there exists ε > 0 such that

    x < x₀ < y ⟹ f(x) ≤ f(x₀) ≤ f(y)   ∀x, y ∈ (x₀ − ε, x₀ + ε)

It is strictly increasing if all these inequalities are strict. For instance, the quadratic function f(x) = x² is strictly increasing at all points x₀ > 0 and strictly decreasing at all points x₀ < 0; at the origin, it is neither increasing nor decreasing. N
To avoid misunderstandings, recall that in Section 6.4.4 we defined monotonicity in a global way by saying (in Definition 217) that a function f : A ⊆ ℝ → ℝ is increasing if

    x > y ⟹ f(x) ≥ f(y)   ∀x, y ∈ A

and strictly increasing if

    x > y ⟹ f(x) > f(y)   ∀x, y ∈ A

with analogous definitions for decreasing monotonicity. Obviously, an increasing function on A is increasing at each point of A. We will see momentarily that, in general, the converse does not hold: local monotonicity at each point of A does not guarantee global monotonicity on A.

The following result is immediate.

Proposition 1320 Let f : A ⊆ ℝ → ℝ be differentiable at an interior point x₀ ∈ A.

(i) If f is increasing at x₀, then f′(x₀) ≥ 0.

(ii) If f′(x₀) > 0, then f is strictly increasing at x₀.

A dual characterization holds for (strictly) decreasing monotonicity.

Proof If f is increasing, the difference quotients of f at x₀ are all positive (at least for h sufficiently small), so their limit is ≥ 0. If instead f′(x₀) > 0, the difference quotients are, at least for h close to 0, strictly positive by the Theorem on the permanence of sign. It follows that f(x₀ + h) > f(x₀) for h > 0 and f(x₀ + h) < f(x₀) for h < 0, with h sufficiently small, so f is strictly increasing at x₀.

Example 1321 The function f : ℝ → ℝ defined by f(x) = 2x² − 3x is strictly increasing at x₀ = 5, since f′(5) = 17 > 0, and strictly decreasing at x₀ = 0, since f′(0) = −3 < 0. N

Note the asymmetry between points (i) and (ii) of the previous proposition:

    f increasing at x₀ ⟹ f′(x₀) ≥ 0        (28.11)

but

    f′(x₀) > 0 ⟹ f strictly increasing at x₀        (28.12)

The non-negativity of the derivative is necessary for the increasing monotonicity, while its strict positivity is sufficient for the strictly increasing monotonicity.
This asymmetry is unavoidable because the converses of (28.11) and (28.12) do not hold. For example, the function f(x) = −x³ is strictly decreasing at 0 although f′(0) = 0, so the converse of (28.11) is false. The function f(x) = x³ is strictly increasing at x₀ = 0 (indeed x³ > 0 for every x > 0 and x³ < 0 for every x < 0), but f′(0) = 0, so the converse of (28.12) is false as well.

We might think that, if a function is monotone at each point of a set A, it enjoys the same type of monotonicity on the entire set A, i.e., globally. This is not the case. Indeed, consider the function f(x) = −1/x defined on the open set ℝ − {0}. It is strictly increasing at each point of its domain because f′(x) = 1/x² > 0 for every x ≠ 0. However, it is not increasing at all because, for example, −1 < 1, while f(−1) = 1 > −1 = f(1). Graphically:

[Figure omitted: graph of f(x) = −1/x, increasing on each of its two branches.]

Therefore, monotonicity at each point of a set does not imply global monotonicity (of the same type). Intuitively, this may happen because if such set is a union of disjoint intervals, then at each interval the function "gets back to the beginning". The next important result confirms this intuition by showing that the implication does hold when the set is an interval (so we get rid of the case of unions of disjoint intervals just mentioned). It is the classic differential criterion of monotonicity.
Proposition 1322 Let f : (a, b) → ℝ be a differentiable function, with a, b ∈ ℝ̄. Then, f is (globally) increasing on (a, b) if and only if f′(x) ≥ 0 for every x ∈ (a, b).

Because of the clause a, b ∈ ℝ̄, the interval (a, b) can be unbounded, for example (a, b) = ℝ. A dual result, with negativity of the derivative on (a, b), holds for the decreasing monotonicity. Note that Corollary 1312 is then a special case since f′(x) = 0 for every x ∈ (a, b) is equivalent to having both f′(x) ≥ 0 and f′(x) ≤ 0 for every x ∈ (a, b) and therefore, being simultaneously increasing and decreasing, f is constant.

Proof "Only if". Suppose that f is increasing. Let x ∈ (a, b). For every h > 0 we have f(x + h) ≥ f(x), hence

    (f(x + h) − f(x))/h ≥ 0

It follows that

    f′(x) = lim_{h→0} (f(x + h) − f(x))/h = lim_{h→0⁺} (f(x + h) − f(x))/h ≥ 0

"If". Let f′(x) ≥ 0 for every x ∈ (a, b). Let x₁, x₂ ∈ (a, b) with x₁ < x₂. By the Mean Value Theorem, there exists x̂ ∈ [x₁, x₂] such that

    f′(x̂) = (f(x₂) − f(x₁))/(x₂ − x₁)        (28.13)

Since f′(x̂) ≥ 0 and x₂ − x₁ > 0, this shows that f(x₂) ≥ f(x₁).

Example 1323 (i) Let f : ℝ → ℝ be given by f(x) = 3x⁵ + 2x³. Since f′(x) = 15x⁴ + 6x² ≥ 0 for every x ∈ ℝ, by Proposition 1322 the function is increasing. (ii) Let f : ℝ → ℝ be the quadratic function f(x) = x². We have f′(x) = 2x and hence Proposition 1322 (and its analog for decreasing monotonicity) shows that f is neither increasing nor decreasing on ℝ, and that it is increasing on (0, ∞) and decreasing on (−∞, 0), as previously argued (Example 1319). N

Next we show that the strict positivity of the derivative implies strict increasing monotonicity, thus providing a most useful differential criterion of strict monotonicity.

Proposition 1324 Let f : (a, b) → ℝ be a differentiable function, with a, b ∈ ℝ̄. If f′(x) > 0 for every x ∈ (a, b), then f is (globally) strictly increasing on (a, b).

Proof The proof is similar to that of Proposition 1322 and is a simple application of the Mean Value Theorem. Let f′(x) > 0 for every x ∈ (a, b) and let x₁, x₂ ∈ (a, b) with x₁ < x₂. By the Mean Value Theorem, there exists c ∈ [x₁, x₂] such that

    f′(c) = (f(x₂) − f(x₁))/(x₂ − x₁)        (28.14)

Since f′(c) > 0 for every c and x₂ − x₁ > 0, from (28.14) it follows that f(x₂) > f(x₁).

The next example shows that the converse of the last result is false, so that for the
derivative of a strictly increasing function on an interval we can only say that it is 0 (and
not > 0).6

Example 1325 Let f : R ! R be the cubic function f (x) = x3 . It is strictly increasing at


every point, but f 0 (0) = 0 and so it is not true that f 0 (x) > 0 for every x 2 R. N

Propositions 1322 and 1324 give very useful di erential criteria for the monotonicity of
scalar functions (dual versions hold for decreasing monotonicity). They hold also for closed or
half-closed intervals, once the derivatives at the boundary points are understood as one-sided
ones.
We illustrate Proposition 1324 with an example.

Example 1326 (i) By Proposition 1324 (and its analog for decreasing monotonicity), the
quadratic function f (x) = x2 is strictly increasing on (0; 1) and strictly decreasing on
( 1; 0).
(ii) By Proposition 1324, the function f (x) = 3x5 + 2x3 is strictly increasing both on
( 1; 0) and on (0; 1). Nevertheless, the proposition cannot say anything about the strict
increasing monotonicity of f on R because f 0 (0) = 0. We can, however, check whether f is
strictly increasing on R through the de nition of strict increasing monotonicity. To this end,
note that f (y) < f (0) = 0 < f (x) for every y < 0 < x, so f is indeed strictly increasing on
the entire real line. N
6
Later in the book, we will see that the converse holds under concavity (Corollary 1441).
28.5. SUFFICIENT CONDITIONS FOR LOCAL EXTREMAL POINTS 883

That said, we close with a curious characterization of strict monotonicity that, in a sense,
completes Proposition 1324.

Proposition 1327 Let f : (a; b) ! R be a di erentiable function, with a; b 2 R. Then f


is strictly increasing if and only if f 0 0 and, for every a x0 < x00 b, there exists
0 00 0
z 2 [x ; x ] such that f (z) > 0.

Thus, it is the strict positivity at the points of an \order dense" subset of the domain (a; b)
that characterizes strictly increasing functions. In view of Proposition 218, for a di erentiable
monotone function this strict positivity amounts to injectivity.

Proof \If" Since f 0 0, the function f is increasing (Proposition 1322). Suppose, by


contradiction, that it is not strictly increasing. Then, there exist a x0 < x00 b such
that f (x) = f (x ) for all x 2 [x ; x ]. By Corollary 1312, f (x) = 0 for all x 2 [x ; x00 ], a
0 0 00 0 0

contradiction. We conclude that f is strictly increasing.


\Only if" Let f be strictly increasing. A fortiori, f is increasing, so f 0 0 by Proposition
1322. Suppose, by contradiction, that there exist a x0 < x00 b such that f 0 (x) = 0 for all
x 2 [x0 ; x00 ]. Again by Corollary 1312, the function f is constant on [x0 ; x00 ], a contradiction.

We can revisit the last two examples in view of Proposition 1327. Indeed, by this result
we can say that the cubic function and the function f (x) = 3x5 + 2x3 are both strictly
increasing because their derivatives are everywhere strictly positive except at the origin.

A nal twist: under continuous di erentiability, the \dense" strict positivity of the deriva-
tive actually characterizes strictly increasing functions.

Corollary 1328 Let f : (a; b) ! R be a continuously di erentiable function, with a; b 2 R.


Then f is strictly increasing if and only if, for every a x0 < x00 b, there exists z 2 [x0 ; x00 ]
such that f 0 (z) > 0.

Proof In view of Proposition 1327, it is enough to show that f 0 0 if for every a x0 <
x00 b there exists x0 z x00 such that f 0 (z) > 0. Let x 2 (a; b). For each n large enough
so that x + 1=n 2 (a; b), there is a point x zn x + 1=n with f 0 (zn ) > 0. Since f 0 is
continuous, from zn ! x it follows that f 0 (x) = lim f 0 (zn ) 0. Since x was arbitrarily
chosen, we conclude that f 0 0.

28.5 Su cient conditions for local extremal points


28.5.1 Local extremal points
In Section 28.1 we studied the fundamental necessary condition for a point to be a local
extremal, namely, the derivative being equal to zero at that point. The simple example
f (x) = x3 showed that the condition is necessary, but not su cient. By the results just es-
tablished on monotonicity and di erentiability, we can now integrate what we saw in Section
28.1 with a su cient condition for the existence of local extremal points. For simplicity, we
will consider only scalar functions.
884 CHAPTER 28. DIFFERENTIAL METHODS

The su cient condition in which we are interested is based on a simple intuition: for x0
to be a local maximizer, there must exist a neighborhood of x0 in which the function rst
increases { i.e., f 0 (x) > 0 if x < x0 { and then, once it has reached the maximum value at
x0 , decreases { i.e., f 0 (y) < 0 if y > x0 . Graphically:

6
y

O x x
0
1

0
-1 0 1 2 3 4 5

Proposition 1329 Let f : A R ! R and C A. An interior point x0 of C is a local


maximizer if there exists a neighborhood B" (x0 ) of x0 such that f is continuous at x0 and
di erentiable at each x0 6= x 2 B" (x0 ), with

x < x0 < y =) f 0 (x) 0 f 0 (y) 8x; y 2 B" (x0 ) \ C (28.15)

If the inequalities in (28.15) are strict, the local maximizer is strong (so unique).

In a dual way, a local minimizer if in (28.15) we have f 0 (x) 0 f 0 (y), which is strong
if f 0 (x) < 0 < f 0 (y).7 Note that the di erentiability of f at x0 is not required, only its
continuity.

Proof Without loss of generality, assume that B" (x0 ) = (x0 "; x0 + ") C. Let x 2
(x0 "; x0 ). By the Mean Value Theorem, there exists 2 (x0 "; x0 ) such that

f (x0 ) f (x)
= f 0( )
x0 x

By (28.15), we have f 0 ( ) 0, from which we deduce that f (x0 ) f (x). In a similar way,
we can prove that f (x0 ) f (y) for every y 2 (x0 ; x0 + "). So, f (x0 ) f (x) for every
x 2 B" (x0 ) and therefore x0 is a local maximizer.

In particular, the following classic corollary of Proposition 1329 holds. Though weaker,
in many cases it is good enough.
7
In particular, if in (28.15) we have f 0 (x) = 0 = f 0 (y), the point x0 is simultaneously a local maximizer
and a local minimizer, that is, the function f is locally constant at x0 .
28.5. SUFFICIENT CONDITIONS FOR LOCAL EXTREMAL POINTS 885

Corollary 1330 Let f : A R ! R and C A. An interior point x0 of C is a local


maximizer if there exists a neighborhood B" (x0 ) of x0 on which f is di erentiable, with
f 0 (x0 ) = 0 and

x < x0 < y =) f 0 (x) 0 f 0 (y) 8x; y 2 B" (x0 ) \ C (28.16)

If the inequalities in (28.16) are strict, the local maximizer is strong (so unique).

Example 1331 Let f : R ! R be given by f (x) = 1 x2 and take x0 = 0. We have


f 0 (x) = 2x and hence (28.15) is satis ed in a strict sense. Thanks to Proposition 1329 or
to Corollary 1330, x0 is a strong maximizer. N

Example 1332 Let f : R ! R be given by f (x) = jxj and take x0 = 0. The function is
continuous at x0 and di erentiable at each x 6= 0. We have
(
1 if x < 0
f 0 (x) =
1 if x > 0

and hence (28.15) is satis ed in a strict sense. By Proposition 1329, x0 is a strong local
maximizer. Note that in this case Corollary 1330 cannot be applied. N

The previous su cient condition can be substantially simpli ed if we assume that the
function is twice continuously di erentiable. In this case, it is indeed su cient to evaluate
the sign of the second derivative at the point.

Corollary 1333 Let f : A R ! R and C A. An interior point x0 of C is a strong


(so unique) local maximizer if there exists a neighborhood B" (x0 ) of x0 on which f is twice
continuously di erentiable, with f 0 (x0 ) = 0 and f 00 (x0 ) < 0.

Proof Thanks to the continuity of f 00 at x0 , we have limx!x0 f 00 (x) = f 00 (x0 ) < 0. The
Theorem on the permanence of sign implies the existence of a neighborhood B" (x0 ) such
that f 00 (x) < 0 for every x 2 B" (x0 ). Hence, by Proposition 1324 the rst derivative f 0 is
strictly decreasing in B" (x0 ), that is,

x < x0 < y =) f 0 (x) > f 0 (x0 ) = 0 > f 0 (y)

By Proposition 1329, x0 is a strong local maximizer.

Example 1334 Going back to Example 1331, in view of Corollary 1333 it is actually su -
cient to observe that f 00 (0) = 2 < 0 to conclude that x0 = 0 is a strong local maximizer.
Instead, Corollary 1333 cannot be applied to Example 1331 because f (x) = jxj is not
di erentiable at x0 = 0. N

The next example shows that the condition f 00 (x0 ) < 0 is su cient, but not necessary:
there exist local maximizers x0 for which we do not have f 00 (x0 ) < 0.

Example 1335 Let f : R ! R be given by f (x) = x4 . The point x0 = 0 is a local


maximizer, yet f 00 (x0 ) = 0. N
886 CHAPTER 28. DIFFERENTIAL METHODS

28.5.2 Searching local extremal points via rst and second-order condi-
tions
Let x0 be an interior point of C, that is, x0 2 int C. In view of Corollary 1333, we can say
that:

(i) if f 0 (x0 ) = 0 and f 00 (x0 ) < 0, then x0 is a local maximizer;

(ii) if f 0 (x0 ) = 0 and f 00 (x0 ) > 0, then x0 is a local minimizer;

(iii) f 0 (x0 ) = 0 and f 00 (x0 ) 0 does not exclude that x0 is a local maximizer;

(iv) f 0 (x0 ) = 0 and f 00 (x0 ) 0 does not exclude that x0 is a local minimizer.

We can therefore reformulate the previous corollary as follows.

Corollary 1336 Let f : A R ! R and C A. If f is twice continuously di erentiable


on int C, then:

(i) necessary condition for a point x0 2 int C to be a local maximizer is that f 0 (x0 ) = 0
and f 00 (x0 ) 0;

(ii) su cient condition for a point x0 2 int C to be a strong local maximizer is that f 0 (x0 ) =
0 and f 00 (x0 ) < 0.

Intuitively, if f 0 (x0 ) = 0 and f 00 (x0 ) < 0, the derivative function f 0 at x0 is zero and
strictly decreasing (because its derivative f 00 is strictly negative): therefore it goes, being
zero at x0 , from positive values to negative ones. Hence, the function is increasing before
x0 , stationary at x0 and decreasing after x0 . It follows that x0 is a maximizer.8 A similar
intuition holds for the necessary part.
As it should be clear by now, (i) is a necessary but not su cient condition, while (ii) is
a su cient but not necessary condition. It is an unavoidable asymmetry which we have to
live with.

Terminology The conditions on the second derivatives of the last corollary are called second-
order conditions. In particular:

(i) the inequality f 00 (x0 ) 0 (resp., f 00 (x0 ) 0) is called second-order necessary condition
for a maximizer (resp., for a minimizer ).

(ii) the inequality f 00 (x0 ) < 0 (resp., f 00 (x0 ) > 0) is called second-order su cient condition
for a maximizer (resp., for a minimizer ).

The interest of Corollary 1336 is in allowing to establish a procedure for the search of
local maximizers and minimizers on C of a twice-di erentiable function f : A R ! R.
Though it will be considerably re ned in Section 29.2.2, it is often good enough.
Suppose that f is twice continuously di erentiable on the set of the interior points int C
of C. The procedure has two stages, based on the rst and second-order su cient conditions.
Speci cally:
8
Alternatively, at x0 the function f is stationary and concave (see below), so it admits a maximizer.
28.5. SUFFICIENT CONDITIONS FOR LOCAL EXTREMAL POINTS 887

1. Determine the set S int C of the stationary interior points of f ; in other words, solve
the rst-order condition f 0 (x) = 0.

2. Compute f 00 at each of the stationary points x 2 S and check the second-order su cient
conditions: the point x is a strong local maximizer if f 00 (x) < 0, while it is a strong
local minimizer if f 00 (x) > 0. If f 00 (x) = 0 the procedure fails.

The procedure is based on Corollary 1336-(ii). The rst stage { i.e., the solution of the
rst-order condition { is based on Fermat's Theorem: stationary points are the only interior
points that are possible candidates for local extremal points. Hence, the knowledge acquired
in the rst stage is \negative", it rules out all the interior points that are not stationary as
none of them can be a local maximizer or minimizer.
The second stage { i.e., the check of the second-order condition { examines one by one the
possible candidates from the rst stage to see if they meet the su cient condition established
in Corollary 1336-(ii).

Example 1337 Let f : R ! R be given by f (x) = 10x3 (x 1)2 and C = R. Via the
procedure, we search the local extremal points of f on R. We have C = int C = R and f is
twice continuously di erentiable on R. As to stage 1, by recalling what we saw in Example
1306, we have:
S = f0; 1; 3=5g
The stationary points in S are the unique candidates for local extremal points. As to stage
2, we have
f 00 (x) = 60x (x 1)2 + 120x2 (x 1) + 20x3
and therefore f 00 (0) = 0, f 00 (1) > 0 and f 00 (3=5) < 0. Hence, the point 1 is a strong local
minimizer, the point 3=5 is a strong local maximizer, while the nature of the point 0 remains
undetermined. N

The procedure, although very useful, has important limitations. First of all, it can deal
only with the interior points of C at which f is twice continuously di erentiable. It is,
instead, completely silent on the other points of C { that is, on its boundary points as well
as on its interior points at which f is not twice continuously di erentiable.

Example 1338 Let f : [0; 1] ! R be de ned by


(
x if x 2 (0; 1)
f (x) =
2 if x 2 f0; 1g

The boundary points 0 and 1 are maximizers, but the procedure is not able to recognize
them as such. N

A further limitation of the procedure is its indeterminacy in the case f 00 (0) = 0, as the
simple function f (x) = x4 most eloquently shows: whether or not the stationary point x = 0
is a local minimizer cannot be determined through the procedure because f 00 (0) = 0. Let us
see another example which is as trivial as disconcerting (for the procedure's self-esteem).
888 CHAPTER 28. DIFFERENTIAL METHODS

Example 1339 A constant function f : R ! R is trivially twice continuously di erentiable


on R. Given any open set C of R, we have f 0 (x) = f 00 (x) = 0 for every x 2 C. Therefore,
all the points of C are stationary and the procedure is not able to say anything about their
nature. But, each point in C is trivially both a maximizer and a minimizer (and a global
one too!). N

28.5.3 Searching global extremal points via rst and second-order condi-
tions
We can apply what we just learned to the unconstrained optimization problem (28.5), re ning
for the scalar case the analysis of Section 28.1.3. So, consider the unconstrained optimization
problem

max f (x) sub x 2 C


x
where the set C is an open set of the real line. Assume that f is twice continuously di er-
entiable on C, that is, f 2 C 2 (C).
By Corollary 1336-(i), we now have a further necessary condition for a point x^ 2 C to
be a solution, that is, the second-order necessary condition f 00 (^
x) 0. We thus have the
following procedure for nding solutions of the unconstrained optimization problem:

1. Determine the set S C of the stationary interior points of f by solving the rst-order
condition f 0 (x) = 0.

2. Compute f 00 in each of the stationary points x 2 S and compute the set S2 =


fx 2 S : f 00 (x) 0g.

3. Determine the set

S3 = x 2 S2 : f (x) f x0 for all x0 2 S2

of the points of C that are candidate solutions of the optimization problem.

Note that the procedure is not conclusive because a key piece of information is lacking:
whether the problem actually admits a solution. The di erential methods of this chapter
do not ensure the existence of a solution, which only Weierstrass' and Tonelli's Theorems
are able to guarantee (in the absence of concavity properties of the objective functions).
In Chapter 37, we will show how the elimination method re nes, in a resolutive way, the
procedure that we outlined here by combining such existence theorems with the di erential
methods.

Example 1340 As usual, the study of the cubic function f (x) = x3 is of illuminating
simplicity: though the unconstrained optimization problem

max x3 sub x 2 R
x

does not admit solutions, nevertheless the procedure determines the singleton S3 = f0g.
According to the procedure, the point 0 is the unique candidate solution of the problem:
unfortunately, the solution does not exist and it is, therefore, a useless candidacy. N
28.5. SUFFICIENT CONDITIONS FOR LOCAL EXTREMAL POINTS 889

The next examples illustrate the procedure.


4 2
Example 1341 Let f : R ! R be given by f (x) = e x +x and let C = R. Let us apply
the procedure to the unconstrained optimization problem
x4 +x2
max e sub x 2 R
x

The rst-order condition f 0 (x) = 0 is


x4 +x2
4x3 + 2x e =0
p
So, x = 0 and x = 1= 2 are the unique stationary points, that is,
1 1
S= p ; 0; p
2 2
Since
x4 +x2
f 00 (x) = 2 8x6 8x4 4x2 + 1 e
p p
we have f 00 (0) > 0 and f 00 1= 2 = f 00 1= 2 < 0, so
1 1
S2 = p ;p
2 2
p p
On the other hand, f 1= 2 = f 1= 2 , and hence S3 = S2 . In conclusion, the points
p
x = 1= 2 are the candidate solutions of the unconstrained optimization problem. Example
1683, through the elimination method, will show that these points are, indeed, solutions of
the problem. N
Example 1342 Consider again Example 1337 and the unconstrained optimization problem
max 10x3 (x 1)2 sub x 2 R
x

From Example 1337 we know that


S = f0; 1; 3=5g
as well as that f 00 (0) = 0, f 00 (1) > 0 and f 00 (3=5) < 0. Hence,
3
S2 = 0;
5
Since
3
f (0) = 0 < f
5
we get
3
S3 =
5
The point x = 3=5 is therefore the unique candidate solution of the unconstrained optimiza-
tion problem. As in the example of the cubic function, unfortunately this candidacy is vain:
indeed,
lim 10x3 (x 1)2 = +1
x!+1
Therefore the function, being unbounded above, has no global maximizers on R. The un-
constrained optimization problem has no solutions. N
890 CHAPTER 28. DIFFERENTIAL METHODS

It is important to observe how the global nature of the solution gives a di erent perspec-
tive on Corollary 1336. Of this result, we are interested in point (i) that provides a necessary
conditions for local maximizers (second-order necessary condition of the form f 00 (x) 0).
At the same time, in the previous search for local extremal points we considered point (ii) of
such result that covers su ciency (second-order su cient condition of the form f 00 (x) < 0).
From the \global" point of view, the fact that f 00 (x) < 0 implies that x is a strong local
maximizer is of secondary importance. Indeed, it is not conclusive: the point could be just a
local maximizer and, moreover, we could also have solutions where f 00 (x) = 0.9 In contrast,
the information f 00 (x) > 0 is conclusive in that it excludes, ipso facto, that x may be a
solution.
This is another example of how the global point of view, the one which we are really
interested in applications, can lead to view things in a di erent way relative to a local point
of view.10

28.5.4 A false start: global extremal points


The intuition presented at the beginning of the section can lead, for open domains and
with global hypotheses of di erentiability, to simple su cient conditions for global extremal
points. Also here we limit ourselves to the scalar case.

Proposition 1343 Let f : (a; b) ! R be di erentiable, with a; b 2 R. A point x0 2 (a; b) is


a global maximizer if, for every x; y 2 (a; b), we have

x < x0 < y =) f 0 (x) 0 f 0 (y) (28.17)

If the inequalities are strict, the maximizer is strong (so unique).

Naturally, x < x0 < y implies f 0 (x) 0 f 0 (y) is the dual version of (28.17) that leads
to global minimizers.

Proof Let x 2 (a; b) be such that x < x0 . Fixing any " 2 (x0 x; x0 a), it follows that
x 2 (x0 "; x0 ). By the Mean Value Theorem there exists 2 (x0 "; x0 ) such that

f (x0 ) f (x)
= f 0( )
x0 x

By (28.17), f 0 ( ) 0, from which we deduce that f (x0 ) f (x). In an analogous way we


prove that f (x0 ) f (y) for every y > x0 . In conclusion, f (x0 ) f (x) for every x 2 (a; b),
and so x0 is a maximizer.

Example 1344 If we go back to f (x) = 1 x2 of Example 1331, we have

x < 0 < y =) f 0 (x) > 0 > f 0 (y)

So, by Proposition 1343 x0 = 0 is a unique global maximizer. N


9
For example, this is the case for the unconstrained optimization problem maxx x4 sub x 2 R.
10
Calculus courses often emphasized the local viewpoint. Motivated by applications, throughout the book
we do the opposite.
28.6. DE L'HOSPITAL'S THEOREM AND RULE 891

Despite being attractive because of its simplicity, the global hypothesis (28.17) on deriva-
tives is less relevant than one can think prima facie because in applications it is typically
subsumed by concavity. Indeed, under concavity the rst derivative (if exists) is decreasing
(cf. Corollary 1426), so condition (28.17) automatically holds provided the rst-order con-
dition f 0 (x0 ) = 0 holds. Though condition (28.17) can be used to nd the maximizers of
functions that are not concave { e.g., in Section 36.4 we will apply it to the Gaussian func-
tion, which is neither concave nor convex { it is much more convenient to consider a general
property of a function, like concavity, that does not require, a priori, the identi cation of
a point x0 on which to check (28.17). All this explains the brevity of this section (and its
title). The role of concavity, instead, will be studied at length later in the book.

28.6 De l'Hospital's Theorem and rule


28.6.1 Indeterminate forms 0=0 and 1=1
In this section we consider the so-called de l'Hospital's rule,11 another classic application of
the Mean Value Theorem that is most useful in the computation of limits that come in the
indeterminate forms 0=0 and 1=1.
As we will see, the rule says that, under suitable conditions, it is possible to reduce the
computation of the limit of a ratio limx!x0 f (x) =g (x) to that of the ratio of the derivatives,
that is, limx!x0 f 0 (x) =g 0 (x). Since this latter limit may be simpler than the former one, the
rule o ers one more instrument in the calculation of limits. As just anticipated, it reveals
itself particularly valuable for the indeterminate forms of the type 0=0 and 1=1 (to which,
as we know, it is possible to reduce all the other ones).

Theorem 1345 (de l'Hospital) Let f; g : (a; b) ! R be di erentiable on (a; b), with a; b 2
R and g 0 (x) 6= 0 for every x 2 (a; b), and let x0 2 [a; b], with
f 0 (x)
lim =L2R (28.18)
x!x0 g 0 (x)
If either limx!x0 f (x) = limx!x0 g (x) = 0 or limx!x0 f (x) = limx!x0 g (x) = 1, then
f (x)
lim =L
x!x0 g (x)
Thus, de l'Hospital's rule says that, under the hypotheses just indicated, we have
f 0 (x) f (x)
lim = L =) lim =L
x!x0 g 0 (x) x!x0 g (x)

i.e., the calculation of the limit limx!x0 f (x) =g (x) can be reduced to the calculation of
the limit of the ratio of the derivatives limx!x0 f 0 (x) =g 0 (x). The simpler the second limit
compared to the original one, the greater the usefulness of the rule.

Note that the { by now usual { clause a; b 2 R allows the interval (a; b) to be unbounded.
The rule holds, therefore, also for limits as x ! 1. Moreover, it applies also to one-sided
limits, even if for brevity we have omitted this case in the statement.
11
The result is actually due to Johann Bernoulli.
892 CHAPTER 28. DIFFERENTIAL METHODS

We omit the proof of the l'Hospital's Theorem. Next we illustrate his rule with some
examples.

Example 1346 Let f : ( 1; 1) ! R be given by f (x) = log (1 + x) and let g : R ! R be


given by g (x) = x. For x0 = 0 the limit limx!x0 f (x) =g (x) is of the indeterminate form
0=0. Let us see if de l'Hospital's rule can be applied and be of any help.
Let B" (0) = ( "; ") be a neighborhood of x0 such that ( "; ") ( 1; 1). In ( "; ") the
hypotheses of de l'Hospital's rule are satis ed. Hence,
1
f 0 (x) 1 f (x) log (1 + x)
lim 0
= lim 1+x = lim = 1 =) lim = lim =1
x!x0 g (x) x!0 1 x!0 1 + x x!x 0 g (x) x!0 x
So, de l'Hospital's rule proved to be useful in the solution of an indeterminate form. N

Example 1347 Let f; g : R ! R be given by f (x) = sin x and g (x) = x. Set x0 = 0 and
consider the classic limit limx!x0 f (x) =g (x). In every interval ( "; ") the hypotheses of de
l'Hospital's rule are satis ed, so

f 0 (x) cos x f (x) sin x


lim = lim = lim cos x = 1 =) lim = lim =1
x!x0 g 0 (x) x!0 1 x!0 x!x0 g (x) x!0 x

It is nice to see how de l'Hospital's rule solves, in a simple way, this classic limit. N

Example 1348 Let f : (0; 1) ! R be given by f (x) = log x and g : R ! R be given


by g (x) = x. Setting x0 = +1, the limit limx!x0 f (x) =g (x) is in the indeterminate form
1=1. In every interval (a; +1), with a > 0, the hypotheses of de l'Hospital's rule are
satis ed. So,
1
f 0 (x) x f (x) log x
lim 0
= lim = 0 =) lim = lim =0
x!x0 g (x) x!+1 1 x!x0 g (x) x!+1 x

The next example shows that for the solution of some limits it may be necessary to apply
de l'Hospital's rule several times.

Example 1349 Let f; g : R ! R be given by f (x) = ex and g (x) = x2 . Setting x0 = +1,


the limit limx!x0 f (x) =g (x) is in the indeterminate form 1=1. In every interval (a; +1),
with a > 0, the hypotheses of de l'Hospital's rule are satis ed. We have

f 0 (x) ex 1 ex f (x) ex 1 ex
lim = lim = lim =) lim = lim = lim (28.19)
x!x0 g 0 (x) x!+1 2x 2 x!+1 x x!x0 g (x) x!+1 x2 2 x!+1 x
obtaining a simpler limit, but still not solved.
Let us apply again de l'Hospital's rule to the derivative functions f 0 ; g 0 : R ! R given by
f (x) = ex and g 0 (x) = x. Again in every interval (a; +1), with a > 0, the hypotheses of
0

de l'Hospital's rule are satis ed, and hence

f 00 (x) ex f 0 (x) ex
lim = lim = +1 =) lim = lim = +1
x!x0 g 00 (x) x!+1 1 x!x0 g 0 (x) x!+1 x
28.6. DE L'HOSPITAL'S THEOREM AND RULE 893

Thanks to (28.19), we conclude that

f (x) ex
lim = lim 2 = +1
x!x0 g (x) x!+1 x

To calculate this limit we had to apply de l'Hospital's rule twice. N

Example 1350 In a similar way it is possible to calculate the limit of the ratio between
f (x) = 1 cos x and g (x) = x2 as x ! 0:

f 0 (x) sin x cos x 1 f (x) 1 cos x 1


lim 0
= lim = lim = =) lim = lim 2
=
x!x0 g (x) x!0 2x x!0 2 2 x!x0 g (x) x!0 x 2
N

In some cases de l'Hospital's rule is useless or even counterproductive. This happens


when the behavior of the ratio f 0 (x) =g 0 (x) is more irregular than that of the original ratio
f (x) =g (x). The next examples illustrate this unpleasant situation.
2
Example 1351 Let f; g : R ! R be given by f (x) = ex and g (x) = ex . Setting x0 = +1,
the limit limx!x0 f (x) =g (x) is in the indeterminate form 1=1. In every interval (a; +1),
with a > 0, the hypotheses of de l'Hospital's rule are satis ed. We have
2 2 2 2
f 0 (x) 2xex xex f (x) ex xex
lim = lim = 2 lim =) lim = lim = 2 lim
x!x0 g 0 (x) x!+1 ex x!+1 ex x!x0 g (x) x!+1 ex x!+1 ex

and therefore the application of de l'Hospital's rule has led to a more complicated limit than
the original one. In this case, the rule is useless, while the limit can be solved very easily in
a direct way:
2
ex 2
lim = lim ex x = lim ex(x 1) = +1
x!+1 ex x!+1 x!+1

As usual, cogito ergo solvo: mindless mechanical arguments may well lead astray. N

Example 1352 Let f; g : R ! R be given by f (x) = sin x and g (x) = x. By setting x0 =


+1, we can easily prove that limx!x0 f (x) =g (x) = 0. On the other hand, in every interval
(a; +1), with a > 0, the hypotheses of de l'Hospital's rule are satis ed since limx!+1 g (x) =
+1. However, the limit
f 0 (x) cos x
lim 0 = lim
x!x0 g (x) x!+1 1

does not exist. If we tried to compute the simple limit limx!x0 f (x) =g (x) = 0 through de
l'Hospital's rule we would have used a tool both useless, given the simplicity of the limit,
and ine ective. Again, a mechanical use of the rule can be very misleading. N

Summing up, de l'Hospital's rule is a useful tool in the computation of limits, but its use-
fulness must be evaluated case by case. Moreover, it is important to note that de l'Hospital's
Theorem states that, if lim f 0 =g 0 exists, then lim f =g exists too, and the two limits are equal.
The converse does not hold: it may happen that lim f =g exists but not lim f 0 =g 0 . We have
already seen an example of this, but we show two other examples, a bit more complicated.
894 CHAPTER 28. DIFFERENTIAL METHODS

Example 1353 Given f (x) = x sin x and g (x) = x + sin x, we have

sin x
f (x) x sin x 1
lim = lim = lim x =1
x!1 g (x) x!1 x + sin x x!1 sin x
1+
x
but
f 0 (x) 1 cos x
lim 0
= lim
x!1 g (x) x!1 1 + cos x

does not exist because both the numerator and the denominator oscillate between 0 and 2,
so the ratio oscillates between 0 and +1. N

1
Example 1354 Given f (x) = x2 sin and g (x) = x, we have
x

f (x) x2 sin x1 1
lim = lim = lim x sin = 0
x!0 g (x) x!0 x x!0 x

But
f 0 (x) 2x sin x1 cos x1
lim = lim
x!0 g 0 (x) x!0 1
does not exist because in the numerator the rst summand tends to 0 and the second one
does not admit limit. N

28.6.2 Other hospitalized indeterminacies


De l'Hospital's rule can be applied, through suitable manipulations, also to the indeterminate
forms 1 1 and 0 1.
Let us start with the form 0 1. Let f; g : (a; b) ! R be di erentiable on (a; b) and
let x0 2 [a; b] be such that limx!x0 f (x) = 0 and limx!x0 g (x) = 1, so that the limit
limx!x0 f (x) g (x) appears in the indeterminate form 0 1. Let, for example, limx!x0 g (x) =
+1 (the case limx!x0 g (x) = 1 is analogous). There exists a > 0 such that g (x) > 0 for
every x 2 (a; +1). Therefore,

f (x)
lim f (x) g (x) = lim 1
x!x0 x!x0
g(x)

with limx!x0 1=g (x) = 0 and de l'Hospital's rule is applicable to the functions f and 1=g. If
f is di erent from zero in a neighborhood of x0 , we can also write

g (x)
lim f (x) g (x) = lim 1
x!x0 x!x0
f (x)

with limx!x0 1=f (x) = 1. In this case, de l'Hospital's rule can be applied to the functions
g and 1=f . Which one of the two possible applications of the rule is more convenient must
be evaluated case by case.
28.6. DE L'HOSPITAL'S THEOREM AND RULE 895

Example 1355 Let f : R ! R be given by f (x) = x and g : (0; 1) ! R be given by


g (x) = log x. Setting x0 = 0, the one-sided limit limx!x+ f (x) g (x) is in the indeterminate
0
form 0 1. The function 1=x is de ned and strictly positive on (0; 1). On each interval
(a; +1), with a > 0, the hypotheses of de l'Hospital's rule are satis ed for the functions
log x and 1=x since limx!0+ log x = 1 and limx!0+ 1=x = +1. Hence
1
g 0 (x) x g (x)
lim 0 = lim 1 = lim ( x) = 0 =) lim 1 = lim f (x) g (x) = 0
x!x0 1 x!0+ x!0+ x!x+ x!x+
x2 0 f (x) 0
f (x)

Turn now to the indeterminate form 1 1. Let f; g : (a; b) ! R be di erentiable on


(a; b) and let x0 2 [a; b] be such that limx!x0 f (x) = +1 and limx!x0 g (x) = 1. Let us
suppose, for simplicity, that in a neighborhood of x0 both g and f are di erent from zero.
There are at least two possible ways to proceed. We can consider

g (x)
lim (f (x) + g (x)) = lim f (x) 1 + (28.20)
x!x0 x!x0 f (x)

and apply de l'Hospital's rule to the limit limx!x0 g (x) =f (x), which has the form 1=1.
Alternatively, we can consider
1 1
+
f (x) g (x)
lim (f (x) + g (x)) = lim (28.21)
x!x0 x!x0 1
f (x) g (x)

and apply de l'Hospital's rule to the limit


1 1
+
f (x) g (x)
lim
x!x0 1
f (x) g (x)

which is in the form 0=0.

Example 1356 Let f : R ! R be given by f (x) = x and g : (0; 1) ! R be given by g (x) =


log x. Setting x0 = +1, the limit limx!x0 (f (x) + g (x)) is in the indeterminate form
1 1. In Example 1348 we saw, thanks to de l'Hospital's rule, that limx!+1 (log x) =x = 0.
It follows that
log x
lim (x log x) = lim x 1 = +1
x!+1 x!+1 x
and hence the approach (28.20) has allowed to calculate the limit. N
896 CHAPTER 28. DIFFERENTIAL METHODS
Chapter 29

Approximation

29.1 Taylor's polynomial approximation


29.1.1 Polynomial expansions
Thanks to Theorem 1243, a function f : (a; b) ! R di erentiable at x0 2 (a; b) admits locally,
at this point, the linear approximation

f (x0 + h) = f (x0 ) + f 0 (x0 ) h + o (h) as h ! 0

This approximation has two basic properties:

(i) the simplicity of the approximating function, given by the a ne function

f (x0 ) + f 0 (x0 ) h = f (x0 ) + df (x0 ) (h) (29.1)

(geometrically, a straight line);

(ii) the quality of the approximation, given by the error term o (h).

Intuitively, there is a tension between these two properties: the simpler the approximating
function, the worse the quality of the approximation. In other terms, the simpler we desire
the approximating function to be, the higher the error which we may incur.
In this section we study in detail the trade-o between these two key properties when
the approximating function is a general polynomial of degree n, not necessarily of degree 1
{ i.e., a straight line like (29.1). The desideratum that we posit is that to a more complex
approximating polynomial { i.e., to a polynomial with a higher degree n { corresponds an
improved error term of magnitude o (hn ) that, as h ! 0, goes to zero faster than hn . An
increase in the complexity of the approximating polynomial should thus be compensated by
an improvement in the quality of the approximation.
To formalize these ideas, we introduce polynomial expansions. Recall that a polynomial
pn : R ! R of, at most, degree n 0 has the form pn (h) = 0 + 1 h + 2 h2 + + n hn .

De nition 1357 A function f : (a; b) ! R admits a polynomial expansion of degree n at


x0 2 (a; b) if there exists a polynomial pn : R ! R, of at most degree n, such that

f (x0 + h) = pn (h) + o (hn ) as h ! 0 (29.2)

897
898 CHAPTER 29. APPROXIMATION

for every 0 6= h 2 (a x0 ; b x0 ).1

For n = 1, the polynomial pn reduces to the a ne function r (h) = 0 + 1 h of Section


26.12.1, so the approximation (29.2) reduces to (26.22). Therefore, for n = 1 the expansion
of f at x0 is equal, apart from the known term 0 , to the di erential of f at x0 .
For n 2 the notion of polynomial expansion goes beyond that of di erential. In par-
ticular, f has a polynomial expansion of degree n at x0 2 (a; b) if there exists a polynomial
pn : R ! R that approximates f (x0 + h) with an error of magnitude o (hn ). To a polyno-
mial approximation of degree n corresponds, therefore, an error term of magnitude o (hn ),
thus formalizing the aforementioned trade-o between the complexity of the approximating
function and the goodness of the approximation.
For example, for n = 2 we have the quadratic approximation
2
f (x0 + h) = 0 + 1h + 2h + o h2 as h ! 0

Relative to the linear approximation

f (x0 + h) = 0 + h + o (h) as h ! 0

the quadratic approximation features a more complicated approximating polynomial: a


quadratic function { the polynomial of second degree 0 + 1 h + 2 h2 instead of a straight
line { the polynomial of rst degree 0 + h. On the other hand, the error term is now
better: we have o h2 instead of o (h).

N.B. By setting x = x0 + h, the polynomial expansion can be equivalently written as


n
X
f (x) = k (x x0 )k + o ((x x0 )n ) as x ! x0 (29.3)
k=0

for every x 2 (a; b). This equivalent form is often used. O

Next we establish a key property: when they exist, polynomial expansions are unique.

Lemma 1358 A function f : (a; b) ! R has at most one polynomial expansion of degree n
at each point x0 2 (a; b).

To better understand this important lemma, we rst prove it in the special quadratic
case and then in full generality.

Quadratic case We want to show that f has at most one quadratic expansion at each
point x0 2 (a; b). Suppose that, for every 0 6= h 2 (a x0 ; b x0 ), there exist two quadratic
expansions
2 2
0 + 1h + 2h + o h = 0 + 1 h + 2 h2 + o h2 (29.4)
To show that they are equal we need to show that their coe cients are equal, i.e., that
0 = 0 , 1 = 1 and 2 = 2 . To this end, we rst observe that

2
0 = lim 0 + 1h + 2h + o h2 = lim 0 + 1h + 2h
2
+ o h2 = 0
h!0 h!0
1
The condition 0 6= h 2 (a x0 ; b x0 ) ensures that (29.2) holds for every h 6= 0 with x0 + h 2 (a; b), i.e.,
for every h 6= 0 where f (x0 + h) is well de ned.
29.1. TAYLOR'S POLYNOMIAL APPROXIMATION 899

In turn, this implies that (30.8) becomes


2
1h + 2h + o h2 = 1h + 2h
2
+ o h2 (29.5)
Dividing both sides by h, we get

1 + 2h + o (h) = 1 + 2h + o (h)
Hence,
1 = lim ( 1 + 2h + o (h)) = lim ( 1 + 2h + o (h)) = 1
h!0 h!0
In turn, this implies that (29.5) becomes
2
2h + o h2 = 2h
2
+ o h2
Dividing both sides by h2 , we get
o h2 o h2
= 2+ 2+
h2 h2
By taking the limits as h ! 0, we get 2 = 2 . This completes the proof that the two
quadratic expansions in (30.8) are equal.

Proof of Lemma 1358 Suppose that, for every 0 6= h 2 (a x0 ; b x0 ), there exist two
expansions
2 n
0 + 1h + 2h + + nh + o (hn ) = 0 + 1h + 2h
2
+ + nh
n
+ o (hn ) (29.6)
We want to show that they are equal, i.e., that they have equal coe cients. We begin by
observing that
2 n
0 = lim 0 + 1h + 2h + + nh + o (hn )
h!0
2 n
= lim 0 + 1h + 2h + + nh + o (hn ) = 0
h!0

In turn, this implies that (29.6) becomes


2 n
1h + 2h + + nh + o (hn ) = 1h + 2h
2
+ + nh
n
+ o (hn ) (29.7)
Dividing both sides by h, we get
n 1
1 + 2h + + nh + o hn 1
= 1 + 2h + + nh
n 1
+ o hn 1

Hence,
n 1
1 = lim 1 + 2h + + nh + o hn 1
h!0
n 1
= lim 1 + 2h + + nh + o hn 1
= 1
h!0

In turn, this implies that (29.7) becomes


2 n
2h + + nh + o (hn ) = 2h
2
+ + nh
n
+ o (hn )
Continuing in this way we can show that all coe cients are equal, i.e., k = k for all
1 k n. This proves that at most one polynomial p (h) can satisfy approximation (29.2).
900 CHAPTER 29. APPROXIMATION

29.1.2 Taylor and Peano


Next we introduce an all-important polynomial.

De nition 1359 Let f : (a; b) ! R be a function n times di erentiable at a point x0 2 (a; b).
The polynomial Tn : R ! R of degree at most n given by

1 1 (n)
Tn (h) = f (x0 ) + f 0 (x0 ) h + f 00 (x0 ) h2 + + f (x0 ) hn
2 n!
Xn
f (k) (x0 ) k
= h
k!
k=0

is called the Taylor polynomial of degree n of f at x0 .

To ease notation we put f (0) = f . The polynomial Tn has as coe cients the derivatives of
f at the point x0 , up to order n. In particular, at the origin x0 = 0 the Taylor's polynomial
is called Maclaurin's polynomial.

The next approximation result, fundamental and of great elegance, shows that when f
has a suitable number of derivatives at x0 , the unique polynomial expansion at x0 is given
precisely by the Taylor polynomial.

Theorem 1360 (Taylor-Peano) If f : (a; b) ! R is n times di erentiable at x0 2 (a; b),


then it has at x0 a unique polynomial expansion pn of degree n, given by

pn (h) = Tn (h) (29.8)

Under a simple hypothesis of di erentiability at x0 , we thus obtain the fundamental


polynomial approximation
n
X f (k) (x0 )
f (x0 + h) = Tn (h) + o (hn ) = hk + o (hn ) as h ! 0 (29.9)
k!
k=0

The Taylor polynomial Tn is the unique polynomial of degree at most n that satis es De -
nition 1357, i.e., which is able to approximate f (x0 + h) with error o (hn ).

Approximation (29.9) is called Taylor's expansion (or formula) of order n of f at x0 .


In the important special case x0 = 0 it is called Maclaurin's expansion (or formula) of order
n of f .2

For n = 1, the Taylor-Peano Theorem coincides with the \if" part of Theorem 1243
because
T1 (h) = f (x0 ) + df (x0 ) (h)
2
The formula is named after Brook Taylor, who came up with it in 1715. Approximation (29.8) was proved
in 1884 by Giuseppe Peano (see annotation 67 of Genocchi and Peano, 1884, which also contains historical
remarks on Taylor's formula).
29.1. TAYLOR'S POLYNOMIAL APPROXIMATION 901

For n = 1, the polynomial approximation (29.9) thus reduces to the linear approximation
(26.27), that is, to

f (x0 + h) = f (x0 ) + f 0 (x0 ) h + o (h) as h ! 0

For n = 2, it becomes the quadratic (or second-order) approximation

1
f (x0 + h) = f (x0 ) + f 0 (x0 ) h + f 00 (x0 ) h2 + o h2 as h ! 0 (29.10)
2
and so on for higher orders.

Approximation (29.9) is key in applications and is the di erential form of the aforemen-
tioned tension between the complexity of the approximating polynomial and the goodness
of the approximation. The trade-o must be solved case by case, according to the relative
importance that the two properties of the approximation { complexity and quality { have in
the particular application which we are interested in. That said, in many cases the quadratic
approximation (29.10) is a good compromise and so, among all the possible approximations,
it has a special importance.

O.R. The linear approximation is graphically, as by now we know very well, the straight line
tangent to the graph of the function. The quadratic approximating is, instead, the parabola
that shares at x0 the same value of the function, the same slope ( rst derivative), and the
same curvature (second derivative). For this reason it is called the osculating parabola.3 H

In view of the importance of the Taylor-Peano Theorem, we rst prove the special
quadratic case, with a simpli ed argument that uses a stronger hypothesis, and then prove
the result in full generality and rigor.

Quadratic case Let x0 2 (a; b). Assume that f is twice di erentiable on the entire interval
(a; b), not just at x0 . We want to establish the quadratic approximation (29.10). De ne the
auxiliary function ' : (a; b) ! R by

1 00
' (h) = f (x0 + h) f (x0 ) f 0 (x0 ) h f (x0 ) h2
2

We want to show that ' (h) = o h2 , i.e.,

' (h)
lim =0 (29.11)
h!0 h2

It holds
'0 (h) = f 0 (x0 + h) f 0 (x0 ) f 00 (x0 ) h
As f is twice di erentiable, by Proposition 1244 both f and f 0 are continuous at x0 . Hence,

lim ' (h) = ' (0) = 0 and lim '0 (h) = '0 (0) = 0
h!0 h!0
3
From the Latin os, mouth, so it is the \kissing" parabola (where the kiss is with f at x0 ).
902 CHAPTER 29. APPROXIMATION

By de l'Hospital's rule, we then have


'0 (h) ' (h)
lim = L =) lim =L
h!0 2h h!0 h2

with L 2 R. To prove (29.11) it thus su ces to prove that

'0 (h)
lim =0
h!0 h
We have
'0 (h) f 0 (x0 + h) f 0 (x0 ) f 00 (x0 ) h
lim = lim
h!0 h h!0 h
f 0 (x0 + h) 0
f (x0 )
= lim f 00 (x0 ) = f 00 (x0 ) f 00 (x0 ) = 0
h!0 h
as desired.

Proof of the Taylor-Peano Theorem In light of Lemma 1358, it is su cient to show


that the Taylor polynomial satis es (29.2). Let us start by observing preliminarily that, by
Lemma 1253, the higher order derivative functions f (k) exists, for every 1 k n 1, on a
small enough neighborhood B" (x0 ) (a; b) of the point x0 .
With this, de ne the auxiliary functions ' : B" (x0 ) ! R and : R ! R by
n
X f (k) (x0 )
' (h) = f (x0 + h) hk and (h) = hn
k!
k=0

We have to prove that ' (h) = o (hn ), i.e.,

' (h) ' (h)


lim n
= lim =0 (29.12)
h!0 h h!0 (h)

We have, for every 0 k n 1,


(k) (k)
lim (h) = (0) (29.13)
h!0

By formula (26.33), we have, for every 0 k n 1,


n
Xk n
Xk
(k) (k) f (k+j) (x0 ) j f (k+j) (x0 ) j
' (h) = f (x0 + h) h = f (k) (x0 + h) f (k)
(x0 ) h
j! j!
j=0 j=1

As f is n times di erentiable at x0 , by Proposition 1244, f (k) is continuous at x0 for every


0 k n 1. Hence,
lim '(k) (h) = '(k) (0) = 0 (29.14)
h!0

Thanks to (29.13) and (29.14), we can apply de l'Hospital's rule n 1 times, and get

'(n 1) (h) '(n 2) (h) '(0) (h)


lim (n 1) (h)
= L =) lim (n 2) (h)
= L =) =) lim =L (29.15)
h!0 h!0 h!0 (0) (h)
29.1. TAYLOR'S POLYNOMIAL APPROXIMATION 903

with L 2 R. Simple calculations show that (n 1) (h) = n!h. From (29.15) it then follows
that
'(n 1) (h) '(n 1) (h) '(0) (h) ' (h)
lim = lim (n 1) = 0 =) lim (0) = lim =0
h!0 h h!0 (h) h!0 (h) h!0 hn
To prove (29.12) it thus remains to show that

'(n 1) (h)
lim =0
h!0 h
We have
'(n 1) (h) f (n 1) (x
0 + h) f (n 1) (x )
0 hf (n) (x0 )
lim = lim
h!0 h h!0 h !
f (n 1) (x
0 + h) f (n 1) (x )
0
= lim f (n) (x0 )
h!0 h
= f (n) (x0 ) f (n) (x0 ) = 0

as desired.

As seen for (29.3), by setting x = x0 + h the polynomial approximation (29.9) can be


rewritten as
Xn
f (k) (x0 )
f (x) = (x x0 )k + o ((x x0 )n ) (29.16)
k!
k=0
This is the form in which the approximation is often stated.

We now illustrate Taylor's (or Maclaurin's) expansions with some examples.

Example 1361 Polynomials have, Pn trivially, polynomial approximations. Indeed, if f : R !


k
R is itself a polynomial f (x) = k=0 k x , we obtain the identity
n
X f (k) (0)
f (x) = xk 8x 2 R
k!
k=0

since, as the reader can easily verify, one has


f (k) (0)
k = 81 k n
k!
Each polynomial can therefore be equivalently rewritten as a Maclaurin's expansion. For
example, for f (x) = x4 3x3 we have

f 0 (x) = 4x3 9x2 ; f 00 (x) = 12x2 18x ; f 000 (x) = 24x 18 ; f (iv) (x) = 24

and so
f 00 (0)
0 = f (0) = 0 , 1 = f 0 (0) = 0 , 2 = =0
2!
f 000 (0) 18 f (iv) (0) 24
3 = = = 3 , 4 = = =1
3! 6 4! 24
N
904 CHAPTER 29. APPROXIMATION

Example 1362 The function f : ( 1; 1) ! R given by f (x) = log (1 + x) is n times


di erentiable at each point of its domain, with
(n 1)!
f (n) (x) = ( 1)n+1 8n 1 (29.17)
(1 + x)n
Therefore, Taylor's expansion of order n of f at x0 > 1 is

h h2
log (1 + x0 + h) = log (1 + x0 ) +
2 (1 + x0 )2
1 + x0
h3 n+1 hn
+ + + ( 1) + o (hn )
3 (1 + x0 )3 n (1 + x0 )n
Xn
hk
= log (1 + x0 ) + ( 1)k+1 k
+ o (hn )
k=1
k (1 + x0 )

or equivalently, using (29.16),


n
X (x x0 )k
log (1 + x) = log (1 + x0 ) + ( 1)k+1 k
+ o ((x x0 )n )
k=1
k (1 + x0 )

A simple polynomial thus locally approximates the logarithmic function. In particular, the
Maclaurin's expansion of order n of f is

x2 x3 xn
log (1 + x) = x + + + ( 1)n+1 + o (xn ) (29.18)
2 3 n
n
X xk
= ( 1)k+1 + o (xn )
k
k=1

Example 1363 In a similar way the reader can verify the Maclaurin's expansions of order
n of the following elementary functions:
X xk n
x2 x3 xn
ex = 1 + x + + + + + o (xn ) = + o (xn )
2 3! n! k!
k=0
n X ( 1)k n
1 3 1 ( 1)
sin x = x x + x5 + + x2n+1
+o x2n+1
= x2k+1 + o x2n+1
3! 5! (2n + 1)! (2k + 1)!
k=0
n n
X
1 2 1 ( 1) 2n ( 1)k 2k
cos x = 1 x + x4 + + x + o x2n = x + o x2n
2 4! (2n)! (2k)!
k=0

Here too it is important to observe how these functions can be locally (well) approximated
by simple polynomials. N

Example 1364 The function f : ( 1; 1) ! R given by

f (x) = log 1 + x3 3 sin2 x


29.1. TAYLOR'S POLYNOMIAL APPROXIMATION 905

is in nitely di erentiable at each point of its domain. Let us calculate the second-order
Maclaurin expansion. We have
3x2 3x4 + 6x
f 0 (x) = 6 cos x sin x , f 00 (x) = 6(cos2 x sin2 x)
1 + x3 (1 + x3 )2
So,
1
f (x) = f (0) + f 0 (0) x + f 00 (0) x2 + o x2 = 3x2 + o x2 (29.19)
2
N

Example 1365 The function f : ( 1; 1) ! R given by


x
f (x) = e (log (1 + x) 1) + 1

is in nitely di erentiable at each point of its domain. We leave to the reader to verify that
the third-order Taylor expansion at x0 = 3 is given by
log 4 1 5 4 log 4 16 log 4 25
f (x) = 3
+1+ (x 3) + (x 3)2
e 4e3 32e3
63 32 log 4
+ 3
(x 3)3 + o (x 3)3
192e
N

As a rst illustration of the usefulness of Taylor expansions, we show how they signi -
cantly simplify the calculation of limits. Indeed, by suitably expanding f at x0 we reduce the
original limit to a simple limit of polynomials. We illustrate this with a couple of examples.

Example 1366 (i) Consider the limit

log 1 + x3 3 sin2 x
lim
x!0 log (1 + x)
Since the limit involves x ! 0, we can use the second-order Maclaurin's expansions (29.19)
and (29.18) to approximate the numerator and the denominator, respectively. Using Lemma
547 and the little-o algebra, we have
log 1 + x3 3 sin2 x 3x2 + o x2 3x2
lim = lim = lim =0
x!0 log (1 + x) x!0 x + o (x) x!0 x
The calculation of the limit has, therefore, been considerably simpli ed through the combined
use of Maclaurin's expansions and of the comparison of in nitesimals seen in Lemma 547.
(ii) Consider the limit
x sin x
lim
x!0 log2 (1 + x)

This limit can also be calculated by combining an expansion and a comparison of in nitesi-
mals:
x sin x x (x + o (x)) x2 + o x2 x2
lim = lim = lim = lim =1
x!0 log2 (1 + x) x!0 (x + o (x))2 x!0 x2 + o (x2 ) x!0 x2

N
906 CHAPTER 29. APPROXIMATION

29.1.3 Taylor and Lagrange


Under stronger di erentiability assumptions, we can sharpen the approximation (29.9) by
using the Lagrange-Taylor formulas (28.8) and (28.9) of the ultimate version of the Mean
Value Theorem.

Theorem 1367 (Taylor-Lagrange) Let f : (a; b) ! R be a function n times continuously


di erentiable. If f is n + 1 times continuously di erentiable at x0 2 (a; b), then for every
0 6= h 2 (a x0 ; b x0 ) there exists 0 < #h < 1 such that
n
X f (k) (x0 ) f (n+1) (x0 + #h h) n+1
f (x0 + h) = hk + h (29.20)
k! (n + 1)!
k=0

In particular,
f (n+1) (x0 + #h h) hn+1
= o (hn ) as h ! 0 (29.21)
(n + 1)!

Under the hypotheses of this theorem,4 the error term o (hn ) can be thus taken equal to

f (n+1) (x0 + #h h) n+1


h (29.22)
(n + 1)!

where the (n + 1)-th derivative is computed at an intermediate point x0 +#h h between x0 and
x0 +h. This expression allows us to better control the approximation error: if f (n+1) (x) k
for all x 2 (a; b), then
n
X f (k) (x0 ) k
f (x0 + h) hk jhjn+1
k! (n + 1)!
k=0

Error term (29.22) is called the Lagrange remainder, while o (hn ) is called the Peano
remainder. The former permits error estimates, as just remarked, but the latter is often
enough to express the quality of the approximation.

Proof Let 0 6= h 2 (a x0 ; b x0 ), i.e., such that x0 + h 2 (a; b). Suppose that h > 0 (a
similar argument holds when h < 0). Consider the interval [x0 ; x0 + h] (a; b). By formula
(28.8), we have
n
X f (k) (x0 ) k f (n+1) (^
x) n+1
f (x0 + h) = h + h
k! (n + 1)!
k=0

for some x ^ 2 (x0 ; x0 + h). Thus, for some 0 < t < 1 we have x
^ = tx0 + (1 t) (x0 + h), so
x
^ = x0 + #h by setting # = 1 t. We thus get (29.20). As the number # depends on h, we
write #h .
So far we only needed f to be n times continuously di erentiable. Now, the n + 1 times
continuous di erentiability at x0 allows us to write:

f (n+1) (x0 + #h h) hn+1 1


lim n
= lim f (n+1) (x0 + #h h) h = 0
h!0 (n + 1)! h (n + 1)! h!0
4
Proved in Chapter 6 of Lagrange (1813).
29.2. OMNIBUS 907

since #h h ! 0 as h ! 0 because #h 2 (0; 1) for all h. This proves (29.21).

By setting x = x0 + h, the polynomial approximation (29.20) can be rewritten as


n
X f (k) (x0 ) f (n+1) ((1 #x ) x0 + #x x) (x x0 )n+1
f (x) = (x x0 )k + (29.23)
k! (n + 1)!
k=0

In particular, at the origin this approximation becomes


n
X f (k) (0) f (n+1) (#x x) xn+1
f (x) = xk + (29.24)
k! (n + 1)!
k=0

Example 1368 Consider the function f : ( 1; 1) ! R given by f (x) = log (1 + x). In


view of Example 1362, the Lagrange remainder of order n at the origin is
n+1
f (n+1) (#x x) xn+1 n! ( 1)n+2 x
= ( 1)n+2 xn+1
=
(n + 1)! (n + 1)! (1 + #x x)n+1 1+n 1 + #x x
Thus, formula (29.24) here takes the form
n
X n+1
xk ( 1)n+2 x
log (1 + x) = ( 1)k+1 +
k 1+n 1 + #x x
k=1

For the coda reader, an elegant application of what found in this example is the following
series expansion of a logarithmic function, which inter alia generalizes Proposition 407.
Corollary 1369 It holds
1
X xk
log (1 + x) = ( 1)k+1 8x 2 ( 1; 1] (29.25)
k
k=1

Proof Let x 2 [0; 1]. In view of the last example, for each n there exists #x;n 2 (0; 1) such
that
Xn n+1
xk ( 1)n+2 x 1
log (1 + x) ( 1)k+1 =
k 1+n 1 + #x;n x n+1
k=1
where the inequality holds because 0 x= (1 + #x;n x) 1. As n ! +1, we get (29.25). For
the case x 2 ( 1; 0) one needs the so-called Cauchy remainder (for brevity, we omit details).

29.2 Omnibus
29.2.1 Omnibus proposition for local extremal points
Although for simplicity we have studied the Taylor-Peano Theorem for functions de ned on
intervals (a; b), it holds at any interior points x0 of any set A where f is n times di erentiable.
This version allows us to state an \omnibus" proposition for local extremal points that
includes and extends both the necessary condition f 0 (x0 ) = 0 of Fermat's Theorem and
the su cient condition f 0 (x0 ) = 0 and f 00 (x0 ) < 0 of Corollary 1333 (see also Corollary
1336-(ii)).
908 CHAPTER 29. APPROXIMATION

Proposition 1370 Let f : A R ! R and C A. If f is n times di erentiable at an


interior point x0 of C, with f (k) (x0 ) = 0 for every 1 k n 1 and f (n) (x0 ) 6= 0, then:

(i) If n is even and f (n) (x0 ) < 0, the point x0 is a strong local maximizer.

(ii) If n is even and f (n) (x0 ) > 0, the point x0 is a strong local minimizer.

(iii) If n is odd, x0 is not a local extremal point; moreover, f is increasing or decreasing at


x0 depending on whether f (n) (x0 ) > 0 or f (n) (x0 ) < 0.

For n = 1, point (iii) is nothing but the fundamental rst-order necessary condition
f 0 (x0 ) = 0. Indeed, for n = 1, point (iii) states that if f 0 (x0 ) 6= 0, then x0 is not a
local extremal point (i.e., neither a local maximizer nor a local minimizer). By taking the
contrapositive, this amounts to saying that if x0 is a local extremal point, then f 0 (x0 ) = 0.
Hence, (iii) extends to higher order derivatives the rst-order necessary condition.
Point (i) instead, together with the hypothesis f (k) (x0 ) = 0 for every 1 k n 1,
extends to higher order derivatives the second-order su cient condition f 00 (x0 ) < 0 for
strong local maximizers. Indeed, for n = 2 (i) is exactly condition f 00 (x0 ) < 0. Analogously,
(ii) extends the analogous condition f 00 (x0 ) > 0 for minimizers.5

N.B. In this and in the next section we will focus on the generalization of su ciency point
(ii) of Corollary 1336. It is possible to generalize in a similar way its necessity point (i), as
readers can check. O

Proof (i). Let n be even and let f (n) (x0 ) < 0. By the Taylor-Peano Theorem, from the
hypothesis f (k) (x0 ) = 0 for every 1 k n 1 and f (n) (x0 ) 6= 0 it follows that

f (n) (x0 ) n f (n) (x0 ) n o (hn )


f (x0 + h) f (x0 ) = h + o (hn ) = h 1+
n! n! hn

Since limh!0 o (hn ) =hn = 0, there exists > 0 such that jhj < implies jo (hn ) =hn j < 1.
Hence,
o (hn )
h2( ; ) =) 1 + >0
hn
Since f (n) (x0 ) < 0, we have therefore, because hn > 0 being n even,

f (n) (x0 ) n o (hn )


h2( ; ) =) h 1+ < 0 =) f (x0 + h) f (x0 ) < 0
n! hn

that is, setting x = x0 + h,

x 2 (x0 ; x0 + ) =) f (x) < f (x0 )

So, x0 is a local maximizer. This proves (i). In a similar way we prove (ii). Finally, (iii) can
be proved by adapting in a suitable way the proof of Fermat's Theorem.
5
Observe that, given what has been proved about the Taylor's approximation, the case n = 2 presents
an interesting improvement with respect to Corollary 1333: it is required that the function f be twice
di erentiable on the neighborhood B" (x0 ), but f 00 is not required to be continuous.
29.2. OMNIBUS 909

Example 1371 (i) Consider the function f : R ! R given by f (x) = x4 . We saw in


Example 1335 that, for its maximizer x0 = 0, it was not possible to apply the su cient
condition f 0 (x0 ) = 0 and f 00 (x0 ) < 0. We have, however,
f 0 (0) = f 00 (0) = f 000 (0) = 0 and f (iv) (0) < 0
Since n = 4 is even, by Proposition 1370-(i) we conclude that x0 = 0 is a local maximizer
(actually, it is a global maximizer, but using Proposition 1370 is not enough to conclude
this).
(ii) Consider the function f : R ! R given by f (x) = x3 . At x0 = 0 we have
f 0 (0) = f 00 (0) = 0 and f 000 (0) < 0
Since n = 3 is odd, by Proposition 1370-(iii) we conclude that x0 = 0 is not a local extremal
point (rather, at x0 the function is strictly decreasing).
(iii) The function de ned by f (x) = x6 clearly attains its minimum value at x0 = 0.
Indeed, one has f 0 (0) = f 00 (0) = = f (v) (0) = 0 and f (vi) (0) = 6! > 0.
The function f (x) = x is clearly increasing at x0 = 0. One has f 0 (0) = f 00 (0) =
5

f (0) = f (iv) (0) = 0 and f (v) (0) = 5! = 120 > 0.


000 N
Proposition 1370 is a powerful result. Yet, it has important limitations. Like Corollary
1333, it can only treat interior points and is useless for local extremal points that are not
strong, so not unique, whose derivatives of any order are, in general, zero. The most basic
instance of this failure are constant functions: all their points are, trivially, both maximizers
and minimizers, but Proposition 1370 (like Corollary 1333) is not able to tell us anything
about them.
Moreover, to apply Proposition 1370 it is necessary that the function has a su cient
number of derivatives at a stationary point, which may not be the case as the next example
shows.
Example 1372 Consider the function $f:\mathbb{R}\to\mathbb{R}$ defined by

$$f(x)=\begin{cases} x^2\sin\dfrac{1}{x} & \text{if } x\neq 0\\ 0 & \text{if } x=0\end{cases}$$

It is continuous at the origin $x=0$. Indeed, since $|\sin(1/h)|\le 1$, by applying the comparison criterion it follows that

$$\lim_{h\to 0} f(0+h)=\lim_{h\to 0} h^2\sin\frac{1}{h}=0$$

It is differentiable at the origin because

$$\lim_{h\to 0}\frac{f(0+h)-f(0)}{h}=\lim_{h\to 0}\frac{h^2\sin\frac{1}{h}-0}{h}=\lim_{h\to 0} h\sin\frac{1}{h}=0$$

The origin is thus a stationary point for $f$. But the function does not admit a second derivative there. Indeed,

$$f'(x)=\begin{cases} 2x\sin\dfrac{1}{x}-\cos\dfrac{1}{x} & \text{if } x\neq 0\\ 0 & \text{if } x=0\end{cases}$$

and therefore

$$\lim_{h\to 0}\frac{f'(0+h)-f'(0)}{h}=\lim_{h\to 0}\frac{2h\sin\frac{1}{h}-\cos\frac{1}{h}}{h}=\lim_{h\to 0}\left(2\sin\frac{1}{h}-\frac{1}{h}\cos\frac{1}{h}\right)$$

does not exist. Thus, Proposition 1370 cannot be applied and so it is not able to say anything about the nature of the stationary point $x=0$. Nevertheless, the graph of $f$ shows that the origin is not a local extremal point since $f$ has infinitely many oscillations in any neighborhood of zero. N

Example 1373 The general version of the previous example considers the function $f:\mathbb{R}\to\mathbb{R}$ defined, for $n\ge 1$, by

$$f(x)=\begin{cases} x^n\sin\dfrac{1}{x} & \text{if } x\neq 0\\ 0 & \text{if } x=0\end{cases}$$

and shows that it does not have a derivative of order $n$ at the origin (in the case $n=1$, this means that at the origin the first derivative does not exist). We leave the analysis of this example to the reader. N

29.2.2 Omnibus procedure for the search of local extremal points

Thanks to Proposition 1370, we can refine the procedure seen in Section 28.5.2 for the search of local extremal points of a function $f:A\subseteq\mathbb{R}\to\mathbb{R}$ on a set $C$. To fix ideas, let us study two important special cases.

Twice differentiable functions

Suppose that $f$ is twice differentiable on the interior points of $C$, that is, on $\operatorname{int} C$. The omnibus procedure consists of the following two stages:

1. Determine the set $S$ of stationary points by solving the first-order condition $f'(x)=0$. If $S=\emptyset$ the procedure ends (we conclude that, since there are no stationary points, there are no extremal ones); otherwise we move to the next step.

2. Calculate $f''$ at each of the stationary points $x\in S$: the point $x$ is a strong local maximizer if the second-order condition $f''(x)<0$ holds; it is a strong local minimizer if $f''(x)>0$; if $f''(x)=0$, the procedure is not able to determine the nature of $x$.

This is the classic procedure to find local extremal points based on the first-order and second-order conditions of Section 28.5.2. The version just presented improves on what we saw there because, by what we observed in a previous footnote, it requires only that the function have two derivatives on $\operatorname{int} C$, not necessarily continuous. However, we are still left with the other limitations discussed in Section 28.5.2. A small computational sketch of the two stages is given below.
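For readers who wish to experiment, the following minimal Python sketch (ours, not part of the text; it assumes the sympy library, and the function name is our own choice) carries out the two stages symbolically:

    import sympy as sp

    def classify_stationary_points(f, x):
        """Stage 1: solve f'(x) = 0; stage 2: test the sign of f''."""
        results = {}
        for p in sp.solve(sp.diff(f, x), x):        # stationary points
            f2 = sp.diff(f, x, 2).subs(x, p)        # second derivative at p
            if f2 < 0:
                results[p] = "strong local maximizer"
            elif f2 > 0:
                results[p] = "strong local minimizer"
            else:
                results[p] = "inconclusive: f''(p) = 0"
        return results

    x = sp.symbols("x")
    print(classify_stationary_points(x**3 - 3*x, x))
    # {-1: 'strong local maximizer', 1: 'strong local minimizer'}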

Infinitely differentiable functions

Suppose that $f$ is infinitely differentiable on $\operatorname{int} C$. The omnibus procedure consists of the following stages:

1. Determine the set $S$ of the stationary points by solving the equation $f'(x)=0$. If $S=\emptyset$, the procedure ends; otherwise move to the next step.

2. Compute $f''$ at each of the stationary points $x\in S$: the point $x$ is a strong local maximizer if $f''(x)<0$, and a strong local minimizer if $f''(x)>0$. Call $S^{(2)}$ the subset of $S$ of the points such that $f''(x)=0$. If $S^{(2)}=\emptyset$, the procedure ends; otherwise move to the next step.

3. Compute $f'''$ at each point of $S^{(2)}$: if $f'''(x)\neq 0$, the point $x$ is not an extremal one. Call $S^{(3)}$ the subset of $S^{(2)}$ in which $f'''(x)=0$. If $S^{(3)}=\emptyset$, the procedure ends; otherwise move to the next step.

4. Compute $f^{(iv)}$ at each point of $S^{(3)}$: the point $x$ is a strong local maximizer if $f^{(iv)}(x)<0$, a strong local minimizer if $f^{(iv)}(x)>0$. Call $S^{(4)}$ the subset of $S^{(3)}$ in which $f^{(iv)}(x)=0$. If $S^{(4)}=\emptyset$, the procedure ends; otherwise move to the next step.

5. Iterate the procedure until $S^{(n)}=\emptyset$.

The procedure thus ends if there exists $n$ such that $S^{(n)}=\emptyset$. Otherwise, the procedure iterates ad libitum (or ad nauseam).

Example 1374 Consider again the function $f(x)=-x^4$, with $C=\mathbb{R}$. We saw in Example 1335 that for its maximizer $x_0=0$ it was not possible to apply the sufficient condition $f'(x_0)=0$ and $f''(x_0)<0$. We have, however,

$$f'(0)=f''(0)=f'''(0)=0 \quad\text{and}\quad f^{(iv)}(0)<0$$

so that

$$S=S^{(2)}=S^{(3)}=\{0\} \quad\text{and}\quad S^{(4)}=\emptyset$$

Stage 1 identifies the set $S=\{0\}$, about which stage 2 has, however, nothing to say since $f''(0)=0$. Stage 3 does not add any extra information either, since $f'''(0)=0$. Stage 4 instead is conclusive: since $f^{(iv)}(0)<0$, we can assert that $x=0$ is a strong local maximizer (actually, it is a global maximizer, but this procedure does not allow us to say this). N

Naturally, the procedure is of practical interest when it ends after a few stages. The iteration is mechanical enough to be automated, as the sketch below shows.
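The following minimal Python sketch (ours; it assumes sympy, and the name omnibus is our own) applies the iterated test of Proposition 1370 at each stationary point:

    import sympy as sp

    def omnibus(f, x, max_order=12):
        """At each stationary point, find the first non-vanishing derivative."""
        report = {}
        for p in sp.solve(sp.diff(f, x), x):
            verdict = "inconclusive up to order %d" % max_order
            for n in range(2, max_order + 1):
                d = sp.diff(f, x, n).subs(x, p)     # n-th derivative at p
                if d == 0:
                    continue                        # point survives: go deeper
                if n % 2 == 1:
                    verdict = "not a local extremal point"
                elif d < 0:
                    verdict = "strong local maximizer (order %d)" % n
                else:
                    verdict = "strong local minimizer (order %d)" % n
                break
            report[p] = verdict
        return report

    x = sp.symbols("x")
    print(omnibus(-x**4, x))    # {0: 'strong local maximizer (order 4)'}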

29.3 Multivariable Taylor expansion


In this section we study a version of the Taylor expansion for functions of several variables.
The quadratic forms studied in Chapter 25 will play a key role.

29.3.1 Taylor expansion

By Theorem 1271, a function $f:U\to\mathbb{R}$ defined on an open set $U$ in $\mathbb{R}^n$ with continuous partial derivatives is differentiable at each $x\in U$, that is, it can be linearly approximated as

$$f(x+h)=f(x)+df(x)(h)+o(\|h\|)=f(x)+\nabla f(x)\cdot h+o(\|h\|) \qquad (29.26)$$

for every $h\in\mathbb{R}^n$ such that $x+h\in U$. As already seen in Section 27.2, if, with a small change of notation, we denote by $x_0$ the point at which $f$ is differentiable and we set $h=x-x_0$, this approximation assumes the following equivalent, but more expressive, form:

$$f(x)=f(x_0)+df(x_0)(x-x_0)+o(\|x-x_0\|)=f(x_0)+\nabla f(x_0)\cdot(x-x_0)+o(\|x-x_0\|) \qquad (29.27)$$

for every $x\in U$.

We can now present the Taylor expansion for functions of several variables. As in the scalar case, also in the general multivariable case the Taylor expansion refines the first-order approximation (29.27). In stating it, we limit ourselves to a second-order approximation that suffices for our purposes.⁶

Theorem 1375 Let $f:U\to\mathbb{R}$ be twice continuously differentiable. Then, at each $x_0\in U$ we have

$$f(x)=f(x_0)+\nabla f(x_0)\cdot(x-x_0)+\frac{1}{2}(x-x_0)\cdot\nabla^2 f(x_0)(x-x_0)+o\left(\|x-x_0\|^2\right) \qquad (29.28)$$

for every $x\in U$.

Expression (29.28) is called the quadratic (or second-order) Taylor expansion (or formula). The polynomial in the variable $x$

$$f(x_0)+\nabla f(x_0)\cdot(x-x_0)+\frac{1}{2}(x-x_0)\cdot\nabla^2 f(x_0)(x-x_0)$$

is called the Taylor polynomial of second degree at the point $x_0$. The second-degree term is a quadratic form. Its associated matrix, the Hessian $\nabla^2 f(x)$, is symmetric by Schwarz's Theorem.

In the format (29.26) the quadratic Taylor expansion is

$$f(x_0+h)=f(x_0)+\nabla f(x_0)\cdot h+\frac{1}{2}\,h\cdot\nabla^2 f(x_0)h=f(x_0)+\sum_{i=1}^n\frac{\partial f(x_0)}{\partial x_i}h_i+\frac{1}{2}\sum_{i=1}^n\frac{\partial^2 f(x_0)}{\partial x_i^2}h_i^2+\frac{1}{2}\sum_{i=1}^n\sum_{j\neq i}\frac{\partial^2 f(x_0)}{\partial x_j\partial x_i}h_ih_j$$

where we have also written the expansion through sums, a version that may be useful to carry out calculations.

Naturally, if terminated at the first order, the Taylor expansion reduces to (29.26) and (29.27). Moreover, observe that in the scalar case the Taylor polynomial assumes the well-known form:

$$f(x_0)+f'(x_0)(x-x_0)+\frac{1}{2}f''(x_0)(x-x_0)^2$$

⁶ In the rest of this section $U$ is an open convex set.

Indeed, in this case we have $\nabla^2 f(x_0)=f''(x_0)$, and therefore

$$(x-x_0)\cdot\nabla^2 f(x_0)(x-x_0)=f''(x_0)(x-x_0)^2 \qquad (29.29)$$

As in the scalar case, here too we have a trade-off between the simplicity of the approximation and its accuracy. Indeed, the first-order approximation (29.27) has the advantage of simplicity compared to the quadratic one: we approximate with a linear function rather than with a second-degree polynomial, but to the detriment of the degree of accuracy of the approximation, given by $o(\|x-x_0\|)$ instead of the better $o\left(\|x-x_0\|^2\right)$.

Also in the multivariable case, therefore, the choice of the order at which to terminate the Taylor expansion depends on the particular use we are interested in, and on which aspect of the approximation is more important, simplicity or accuracy.
Example 1376 Let $f:\mathbb{R}^2\to\mathbb{R}$ be given by $f(x_1,x_2)=3x_1^2e^{x_2^2}$. We have:

$$\nabla f(x)=\left(6x_1e^{x_2^2},\ 6x_1^2x_2e^{x_2^2}\right)$$

and

$$\nabla^2 f(x)=\begin{bmatrix} 6e^{x_2^2} & 12x_1x_2e^{x_2^2}\\ 12x_1x_2e^{x_2^2} & 6x_1^2e^{x_2^2}\left(1+2x_2^2\right)\end{bmatrix}$$

By Theorem 1375, the Taylor expansion at $x_0=(1,1)$ is

$$f(x)=f(1,1)+\nabla f(1,1)\cdot(x_1-1,x_2-1)+\frac{1}{2}(x_1-1,x_2-1)\cdot\nabla^2 f(1,1)(x_1-1,x_2-1)+o\left(\|(x_1-1,x_2-1)\|^2\right)$$
$$=3e+(6e,6e)\cdot(x_1-1,x_2-1)+\frac{1}{2}(x_1-1,x_2-1)\begin{bmatrix}6e & 12e\\ 12e & 18e\end{bmatrix}\begin{pmatrix}x_1-1\\ x_2-1\end{pmatrix}+o\left((x_1-1)^2+(x_2-1)^2\right)$$
$$=3e\left(x_1^2-4x_1+5-8x_2+4x_1x_2+3x_2^2\right)+o\left((x_1-1)^2+(x_2-1)^2\right)$$

Hence, $f$ is approximated at the point $(1,1)$ by the second-degree Taylor polynomial

$$3e\left(x_1^2-4x_1+5-8x_2+4x_1x_2+3x_2^2\right)$$

with level of accuracy given by $o\left((x_1-1)^2+(x_2-1)^2\right)$. N
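Computations like these are easy to mis-copy, so a symbolic check may be welcome. The following short Python sketch (ours; it assumes the sympy library) recomputes the second-degree Taylor polynomial at $(1,1)$:

    import sympy as sp

    x1, x2 = sp.symbols("x1 x2")
    f = 3*x1**2*sp.exp(x2**2)

    g = sp.Matrix([f.diff(x1), f.diff(x2)]).subs({x1: 1, x2: 1})  # gradient at (1,1)
    H = sp.hessian(f, (x1, x2)).subs({x1: 1, x2: 1})              # Hessian at (1,1)

    h = sp.Matrix([x1 - 1, x2 - 1])
    T2 = f.subs({x1: 1, x2: 1}) + (g.T*h)[0] + sp.Rational(1, 2)*(h.T*H*h)[0]
    print(sp.factor(sp.expand(T2)))
    # expected: 3*E*(x1**2 + 4*x1*x2 - 4*x1 + 3*x2**2 - 8*x2 + 5)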

Proof of Theorem 1375 For simplicity, assume that the domain of $f$ is all $\mathbb{R}^n$. Fix a point $y\in\mathbb{R}^n$ and introduce the auxiliary functions $\varphi:\mathbb{R}\to\mathbb{R}$ and $\gamma:\mathbb{R}\to\mathbb{R}^n$ defined by $\varphi(t)=f(x_0+ty)$ and $\gamma(t)=x_0+ty$ for each $t\in\mathbb{R}$. We have $\varphi(t)=f(\gamma(t))$ for every $t\in\mathbb{R}$, i.e., $\varphi=f\circ\gamma$. In particular,

$$\varphi(0)=f(\gamma(0))=f(x_0) \qquad (29.30)$$

By Theorem 1274, the function $\varphi$ is twice differentiable on $\mathbb{R}$. In particular, Taylor's formula for scalar functions yields:

$$\varphi(t)=\varphi(0)+\varphi'(0)t+\frac{1}{2}\varphi''(0)t^2+o(t^2) \qquad (29.31)$$

for each $t\in\mathbb{R}$. Since $\varphi=f\circ\gamma$, by the chain rule we have:

$$\varphi'(t)=\sum_{i=1}^n\frac{\partial f}{\partial x_i}(\gamma(t))\,\gamma_i'(t)=\sum_{i=1}^n\frac{\partial f}{\partial x_i}(x_0+ty)\,y_i \qquad (29.32)$$

for each $t\in\mathbb{R}$. In particular,

$$\varphi'(0)=\sum_{i=1}^n\frac{\partial f}{\partial x_i}(x_0)\,y_i=\nabla f(x_0)\cdot y \qquad (29.33)$$

Consider now the auxiliary scalar function $\varphi_i:\mathbb{R}\to\mathbb{R}$ defined by $\varphi_i(t)=(\partial f/\partial x_i)(x_0+ty)$ for each $t\in\mathbb{R}$ and each $i=1,\dots,n$. We have $\varphi_i=(\partial f/\partial x_i)\circ\gamma$, and so by the chain rule we have:

$$\varphi_i'(t)=\sum_{j=1}^n\frac{\partial\left(\frac{\partial f}{\partial x_i}\right)}{\partial x_j}(\gamma(t))\,\gamma_j'(t)=\sum_{j=1}^n\frac{\partial^2 f}{\partial x_j\partial x_i}(x_0+ty)\,y_j$$

Together with (29.32), this implies:

$$\varphi''(t)=\sum_{i=1}^n\varphi_i'(t)\,y_i=\sum_{i=1}^n\sum_{j=1}^n\frac{\partial^2 f}{\partial x_j\partial x_i}(x_0+ty)\,y_jy_i$$

and therefore

$$\varphi''(0)=\sum_{i=1}^n\sum_{j=1}^n\frac{\partial^2 f}{\partial x_j\partial x_i}(x_0)\,y_jy_i=y\cdot\nabla^2 f(x_0)y \qquad (29.34)$$

Until now $y$ was an arbitrary point of $\mathbb{R}^n$. Now set $y=(x-x_0)/\|x-x_0\|$ in (29.30), (29.33) and (29.34). When computed at $t=\|x-x_0\|$, the expansion (29.31) becomes:

$$\varphi(\|x-x_0\|)=f(x_0)+\nabla f(x_0)\cdot\frac{x-x_0}{\|x-x_0\|}\,\|x-x_0\|+\frac{1}{2}\,\frac{x-x_0}{\|x-x_0\|}\cdot\nabla^2 f(x_0)\,\frac{x-x_0}{\|x-x_0\|}\,\|x-x_0\|^2+o\left(\|x-x_0\|^2\right)$$

By definition,

$$\varphi(\|x-x_0\|)=f\left(x_0+\frac{x-x_0}{\|x-x_0\|}\,\|x-x_0\|\right)=f(x)$$

Therefore, we conclude that:

$$f(x)=f(x_0)+\nabla f(x_0)\cdot(x-x_0)+\frac{1}{2}(x-x_0)\cdot\nabla^2 f(x_0)(x-x_0)+o\left(\|x-x_0\|^2\right)$$

as desired.

We close with a first-order approximation with Lagrange remainder that sharpens the approximation (29.26) with Peano remainder.⁷

⁷ Higher-order approximations with Lagrange remainders are notationally cumbersome, and we leave them to more advanced courses.

Theorem 1377 Let $f:U\to\mathbb{R}$ be twice continuously differentiable. If $x_0\in U$, then for every $0\neq h\in\mathbb{R}^n$ such that $x_0+h\in U$ there exists $0<\vartheta<1$ with

$$f(x_0+h)=f(x_0)+\nabla f(x_0)\cdot h+\frac{1}{2}\,h\cdot\nabla^2 f(x_0+\vartheta h)h \qquad (29.35)$$

Note that the same differentiability assumption that permitted the quadratic approximation (29.28) with a Peano remainder only allows for a first-order approximation with the sharper Lagrange remainder. As usual, no free meals.
Example 1378 If $f:U\to\mathbb{R}$ is only continuously differentiable, in place of (29.35) we have the following cruder approximation:

$$f(x_0+h)=f(x_0)+\nabla f(x_0+\vartheta h)\cdot h \qquad (29.36)$$

An ingenious example of Peano (1884) shows that the continuity of the partial derivatives is needed. Define $f:\mathbb{R}^2\to\mathbb{R}$ by

$$f(x)=\begin{cases}\dfrac{x_1x_2}{\sqrt{x_1^2+x_2^2}} & \text{if } x\neq(0,0)\\ 0 & \text{if } x=(0,0)\end{cases}$$

We have discontinuous (why?) partial derivatives

$$\nabla f(x)=\begin{cases}\left(\dfrac{x_2^3}{\left(x_1^2+x_2^2\right)^{3/2}},\ \dfrac{x_1^3}{\left(x_1^2+x_2^2\right)^{3/2}}\right) & \text{if } x\neq(0,0)\\ (0,0) & \text{if } x=(0,0)\end{cases}$$

Assume that the approximation (29.36) holds:

$$f(x_1+h_1,x_2+h_2)=f(x_1,x_2)+\frac{\partial f(x_1+\vartheta h_1,x_2+\vartheta h_2)}{\partial x_1}h_1+\frac{\partial f(x_1+\vartheta h_1,x_2+\vartheta h_2)}{\partial x_2}h_2$$

Let $x=(-a,-a)$ and $h_1=h_2=a+b$, where $a$ and $b$ are any two scalars. Then,

$$f(b,b)=f(-a,-a)+\left(\frac{\partial f(t,t)}{\partial x_1}+\frac{\partial f(t,t)}{\partial x_2}\right)(a+b)$$

where $t=-a+\vartheta(a+b)$. That is,

$$\frac{b}{\sqrt{2}}=\frac{a}{\sqrt{2}}+\left(\frac{\partial f(t,t)}{\partial x_1}+\frac{\partial f(t,t)}{\partial x_2}\right)(a+b)$$

On the other hand,

$$\nabla f(t,t)=\begin{cases}\left(\dfrac{t^3}{(2t^2)^{3/2}},\ \dfrac{t^3}{(2t^2)^{3/2}}\right) & \text{if } t\neq 0\\ (0,0) & \text{if } t=0\end{cases}=\begin{cases}\left(\dfrac{1}{2\sqrt{2}},\ \dfrac{1}{2\sqrt{2}}\right) & \text{if } t>0\\ (0,0) & \text{if } t=0\\ \left(-\dfrac{1}{2\sqrt{2}},\ -\dfrac{1}{2\sqrt{2}}\right) & \text{if } t<0\end{cases}$$

So,

$$\frac{b-a}{\sqrt{2}}=\left(\frac{\partial f(t,t)}{\partial x_1}+\frac{\partial f(t,t)}{\partial x_2}\right)(a+b)=\begin{cases}\dfrac{1}{\sqrt{2}}(a+b) & \text{if } t>0\\ 0 & \text{if } t=0\\ -\dfrac{1}{\sqrt{2}}(a+b) & \text{if } t<0\end{cases}$$

that is,

$$\frac{b-a}{a+b}=\begin{cases}1 & \text{if } t>0\\ 0 & \text{if } t=0\\ -1 & \text{if } t<0\end{cases}$$

But this contradicts the arbitrary nature of the scalars $a$ and $b$. For instance, if $b=2a$ then $(b-a)/(a+b)=1/3$. N

29.3.2 Second-order conditions

Using the Taylor expansion (29.28) we can state a second-order condition for local extremal points. Indeed, this expansion allows us to approximate locally a function $f:U\to\mathbb{R}$ at a point $x_0\in U$ by a second-degree polynomial in the following way:

$$f(x)=f(x_0)+\nabla f(x_0)\cdot(x-x_0)+\frac{1}{2}(x-x_0)\cdot\nabla^2 f(x_0)(x-x_0)+o\left(\|x-x_0\|^2\right)$$

If $\hat{x}$ is a local extremal point (either a maximizer or minimizer), by Fermat's Theorem we have $\nabla f(\hat{x})=0$ and therefore the approximation becomes

$$f(x)=f(\hat{x})+\frac{1}{2}(x-\hat{x})\cdot\nabla^2 f(\hat{x})(x-\hat{x})+o\left(\|x-\hat{x}\|^2\right) \qquad (29.37)$$

that is,

$$f(\hat{x}+h)=f(\hat{x})+\frac{1}{2}\,h\cdot\nabla^2 f(\hat{x})h+o\left(\|h\|^2\right)$$

By heuristically neglecting the little-o, we can write

$$f(\hat{x}+h)-f(\hat{x})\approx\frac{1}{2}\,h\cdot\nabla^2 f(\hat{x})h$$

The extremal status of $\hat{x}$ thus corresponds, heuristically, to the sign of the quadratic form $h\cdot\nabla^2 f(\hat{x})h$, with $\hat{x}$ a maximizer (minimizer) if and only if this sign is negative (positive). The next result makes this heuristic argument rigorous.⁸

Theorem 1379 Let $f:U\to\mathbb{R}$ be twice continuously differentiable. Let $\hat{x}\in U$ be a stationary point.⁹

(i) If $\hat{x}$ is a local maximizer (minimizer) on $U$, the quadratic form $h\cdot\nabla^2 f(\hat{x})h$ is negative (positive) semi-definite.

(ii) If the quadratic form $h\cdot\nabla^2 f(\hat{x})h$ is negative (positive) definite, then $\hat{x}$ is a strong local maximizer (minimizer).

⁸ The non-trivial proof shows how delicate it can be to turn a simple heuristic into a formal argument.
⁹ For simplicity we continue to consider functions defined on open sets. We leave to readers the routine extension of the results to functions $f:A\subseteq\mathbb{R}^n\to\mathbb{R}$ and to interior points $\hat{x}$ that belong to a choice set $C\subseteq A$.

This theorem is the multivariable analog of Corollary 1336. Indeed, in the proof we will use that corollary, since we will be able to reduce the problem from functions of several variables to functions of a single variable.

Proof (i) Let $\hat{x}$ be a local maximizer on $U$. We want to prove that the quadratic form $h\cdot\nabla^2 f(\hat{x})h$ is negative semi-definite. For simplicity, let us suppose that $\hat{x}$ is the origin $0$. First of all, let us prove that $v\cdot\nabla^2 f(0)v\le 0$ for every unit vector $v$ of $\mathbb{R}^n$. We will then prove that $h\cdot\nabla^2 f(0)h\le 0$ for every vector $h\in\mathbb{R}^n$.

Since $0$ is a local maximizer and $U$ is open, there exists a small enough neighborhood $B_\varepsilon(0)$ so that $B_\varepsilon(0)\subseteq U$ and $f(0)\ge f(x)$ for every $x\in B_\varepsilon(0)$. Note that every vector $x\in B_\varepsilon(0)$ can be written as $x=tv$, where $v$ is a unit vector of $\mathbb{R}^n$ (i.e., $\|v\|=1$) and $t\in\mathbb{R}$.¹⁰ Clearly, $tv\in B_\varepsilon(0)$ if and only if $|t|<\varepsilon$. Fix an arbitrary unit vector $v$ in $\mathbb{R}^n$, and define the function $\varphi_v:(-\varepsilon,\varepsilon)\to\mathbb{R}$ by $\varphi_v(t)=f(tv)$. Since $tv\in B_\varepsilon(0)$ for $|t|<\varepsilon$, we have

$$\varphi_v(0)=f(0)\ge f(tv)=\varphi_v(t)$$

for every $t\in(-\varepsilon,\varepsilon)$. It follows that $t=0$ is a local maximizer for the function $\varphi_v$ and hence, $\varphi_v$ being differentiable and $t=0$ an interior point of the domain of $\varphi_v$, by applying Corollary 1336 we get $\varphi_v'(0)=0$ and $\varphi_v''(0)\le 0$. By applying the chain rule to the function

$$\varphi_v(t)=f(tv_1,tv_2,\dots,tv_n)$$

we get $\varphi_v'(t)=\nabla f(tv)\cdot v$ and $\varphi_v''(t)=v\cdot\nabla^2 f(tv)v$. The first-order and second-order conditions become

$$\varphi_v'(0)=\nabla f(0)\cdot v=0 \quad\text{and}\quad \varphi_v''(0)=v\cdot\nabla^2 f(0)v\le 0$$

Since the unit vector $v$ of $\mathbb{R}^n$ is arbitrary, this last inequality holds for every unit vector of $\mathbb{R}^n$.

Now, let $h\in\mathbb{R}^n$. In much the same way as before, observe that $h=t_hv$ for some unit vector $v\in\mathbb{R}^n$ and $t_h\in\mathbb{R}$ such that $|t_h|=\|h\|$.

[Figure: the vector $h=t_hv$ as a rescaling of the unit vector $v$.]

¹⁰ Intuitively, $v$ represents the direction of $x$ and $t$ its norm (indeed, $\|x\|=|t|$).

Then

$$h\cdot\nabla^2 f(0)h=t_hv\cdot\nabla^2 f(0)(t_hv)=t_h^2\,v\cdot\nabla^2 f(0)v$$

Since $v\cdot\nabla^2 f(0)v\le 0$, we also have $h\cdot\nabla^2 f(0)h\le 0$. This holds for every $h\in\mathbb{R}^n$, so the quadratic form $h\cdot\nabla^2 f(0)h$ is negative semi-definite.
(ii) We prove the result for maximizers (a similar argument holds for minimizers). Since $f$ is twice continuously differentiable, we have, for each $x\in U$,

$$f(x)=f(\hat{x})+\nabla f(\hat{x})\cdot(x-\hat{x})+\frac{1}{2}(x-\hat{x})\cdot\nabla^2 f(\hat{x})(x-\hat{x})+o\left(\|x-\hat{x}\|^2\right)$$

Consider the unit sphere $\partial B_1(0)=\{h\in\mathbb{R}^n:\|h\|=1\}$. Define the quadratic form $g:\mathbb{R}^n\to\mathbb{R}$ by

$$g(h)=h\cdot\nabla^2 f(\hat{x})h$$

Clearly, $g$ is continuous. Moreover, by hypothesis $g$ is negative definite; hence, $g(h)<0$ for all $h\in\partial B_1(0)$. As the unit sphere is compact, by the Weierstrass Theorem there exists a maximizer $\bar{h}\in\partial B_1(0)$ of $g$, that is,

$$0>g(\bar{h})\ge g(h) \qquad \forall h\in\partial B_1(0)$$

Set $\varepsilon=-g(\bar{h})/2>0$. By definition of little-o, there exists $\delta>0$ such that, for all $x\in U$,

$$0<\|x-\hat{x}\|<\delta \implies \frac{\left|o\left(\|x-\hat{x}\|^2\right)\right|}{\|x-\hat{x}\|^2}<\varepsilon \qquad (29.38)$$

Since $U$ is open, there exists $0<\delta'\le\delta$ such that $B_{\delta'}(\hat{x})\subseteq U$. Let $x\in B_{\delta'}(\hat{x})$ with $x\neq\hat{x}$. We can write

$$x=\hat{x}+th$$

for some $t\in(0,\delta')$ and some $h\in\partial B_1(0)$. Since $\nabla f(\hat{x})=0$ and $0<\|x-\hat{x}\|<\delta$, by (29.38) we then have:

$$\frac{f(x)-f(\hat{x})}{\|x-\hat{x}\|^2}=\frac{\frac{1}{2}(x-\hat{x})\cdot\nabla^2 f(\hat{x})(x-\hat{x})}{\|x-\hat{x}\|^2}+\frac{o\left(\|x-\hat{x}\|^2\right)}{\|x-\hat{x}\|^2}$$
$$=\frac{1}{2}\,\frac{th\cdot\nabla^2 f(\hat{x})\,th}{\|th\|^2}+\frac{o\left(\|x-\hat{x}\|^2\right)}{\|x-\hat{x}\|^2}=\frac{1}{2}\,\frac{h\cdot\nabla^2 f(\hat{x})h}{\|h\|^2}+\frac{o\left(\|x-\hat{x}\|^2\right)}{\|x-\hat{x}\|^2}$$
$$=\frac{1}{2}\,g(h)+\frac{o\left(\|x-\hat{x}\|^2\right)}{\|x-\hat{x}\|^2}\le\frac{1}{2}\,g(\bar{h})+\frac{o\left(\|x-\hat{x}\|^2\right)}{\|x-\hat{x}\|^2}<-\varepsilon+\varepsilon=0$$

Since $x$ was arbitrarily chosen in $B_{\delta'}(\hat{x})$, we conclude that $f(x)-f(\hat{x})<0$ for all $x\in B_{\delta'}(\hat{x})$ with $x\neq\hat{x}$, proving that $\hat{x}$ is a strong local maximizer.

In the scalar case we get back to the usual second-order conditions, based on the sign of the second derivative $f''(\hat{x})$. Indeed, we already observed in (29.29) that in the scalar case one has

$$x\cdot\nabla^2 f(\hat{x})x=f''(\hat{x})x^2$$

Thus, in this case the sign of the quadratic form depends only on the sign of $f''(\hat{x})$. That is, it is negative (positive) definite if and only if $f''(\hat{x})<0$ ($>0$), and it is negative (positive) semi-definite if and only if $f''(\hat{x})\le 0$ ($\ge 0$).

Naturally, as in the scalar case, also in this general multivariable case condition (i) is only necessary for $\hat{x}$ to be a local maximizer on $U$. Note that this condition implies that, if the quadratic form $h\cdot\nabla^2 f(\hat{x})h$ is indefinite, the point $\hat{x}$ is neither a local maximizer nor a local minimizer on $U$.

Example 1380 Consider the function $f(x_1,x_2)=x_1^2x_2$. At the origin $\hat{x}=0$ we have $\nabla^2 f(0)=O$. The corresponding quadratic form $x\cdot\nabla^2 f(0)x$ is identically zero and is therefore both negative and positive semi-definite. Nevertheless, $\hat{x}=0$ is neither a local maximizer nor a local minimizer. Indeed, taking a generic neighborhood $B_\varepsilon(0)$, let $x=(x_1,x_2)\in B_\varepsilon(0)$ be such that $x_1=x_2$. Let $t$ be such a common value, so that

$$(t,t)\in B_\varepsilon(0)\iff\|(t,t)\|=\sqrt{t^2+t^2}=|t|\sqrt{2}<\varepsilon\iff|t|<\frac{\varepsilon}{\sqrt{2}}$$

Since $f(t,t)=t^3$, for every $(t,t)\in B_\varepsilon(0)$ we have $f(t,t)<f(0)$ if $t<0$ and $f(0)<f(t,t)$ if $t>0$, which shows that $\hat{x}=0$ is neither a local maximizer nor a local minimizer.¹¹ N

¹¹ In an alternative way, it is sufficient to observe that at each point of the I or II quadrant, except the axes, we have $f(x_1,x_2)>0$, and that at each point of the III or IV quadrant, except the axes, we have $f(x_1,x_2)<0$. Every neighborhood of the origin necessarily contains both points of the I and II quadrants (except the axes), for which $f(x_1,x_2)>0=f(0)$, and points of the III and IV quadrants (except the axes), for which $f(x_1,x_2)<0=f(0)$. Hence $0$ is neither a local maximizer nor a local minimizer.

Example 1381 Following Peano (1884), define $f:\mathbb{R}^2\to\mathbb{R}$ by

$$f(x)=x_2^4-3x_1x_2^2+2x_1^2=\left(x_2^2-2x_1\right)\left(x_2^2-x_1\right)$$

We have

$$\nabla f(x)=\left(\frac{\partial f}{\partial x_1}(x),\frac{\partial f}{\partial x_2}(x)\right)=\left(-3x_2^2+4x_1,\ 4x_2^3-6x_1x_2\right)$$

and

$$\nabla^2 f(x)=\begin{bmatrix}\dfrac{\partial^2 f}{\partial x_1^2}(x) & \dfrac{\partial^2 f}{\partial x_1\partial x_2}(x)\\ \dfrac{\partial^2 f}{\partial x_2\partial x_1}(x) & \dfrac{\partial^2 f}{\partial x_2^2}(x)\end{bmatrix}=\begin{bmatrix}4 & -6x_2\\ -6x_2 & 12x_2^2-6x_1\end{bmatrix}$$

So, at the origin $\hat{x}=0$ the Hessian is the positive semi-definite matrix

$$\nabla^2 f(0)=\begin{bmatrix}4 & 0\\ 0 & 0\end{bmatrix}$$

Given a scalar $m\in\mathbb{R}$, let $E_m$ be the collection of points $(x_1,x_2)\in\mathbb{R}^2$ of the plane such that $x_2=\sqrt{2mx_1}$, that is,

$$E_m=\begin{cases}\left\{\left(x_1,\sqrt{2mx_1}\right):x_1\ge 0\right\} & \text{if } m>0\\ \{(x_1,0):x_1\in\mathbb{R}\} & \text{if } m=0\\ \left\{\left(x_1,\sqrt{2mx_1}\right):x_1\le 0\right\} & \text{if } m<0\end{cases}$$

Clearly, $0\in E_m$ for all $m\in\mathbb{R}$. Each $E_m$ with $m\neq 0$ is the graph of a root function. For instance, for $m=1/2$ we have $E_{1/2}=\left\{\left(x_1,\sqrt{x_1}\right)\in\mathbb{R}^2:x_1\ge 0\right\}$, which is the graph of the basic root function.

For each $x\in E_m$, we have

$$f\left(x_1,\sqrt{2mx_1}\right)=(2mx_1-2x_1)(2mx_1-x_1)=4x_1^2(m-1)\left(m-\frac{1}{2}\right)$$

Thus

$$m\in\left(\frac{1}{2},1\right)\implies f(x_1,x_2)<0 \qquad \forall\, 0\neq(x_1,x_2)\in E_m$$
$$m\notin\left[\frac{1}{2},1\right]\implies f(x_1,x_2)>0 \qquad \forall\, 0\neq(x_1,x_2)\in E_m$$

Since $f(0)=0$ we conclude that the origin is neither a local minimizer nor a local maximizer. Yet, the origin is a maximizer of $f$ on the set $E_m$ if $m\in(1/2,1)$, while it is a minimizer of $f$ on the set $E_m$ if $m\notin[1/2,1]$. So, if we approach the origin along the graph of a root function, we reach at the origin a maximum value if $m\in(1/2,1)$, a minimum value if $m\notin[1/2,1]$. The behavior of this function is quite peculiar. N
Similarly, condition (ii) is only sufficient for $\hat{x}$ to be a local maximizer.
Example 1382 Consider the function $f(x)=-x_1^2x_2^2$. The origin $\hat{x}=0$ is clearly a (global) maximizer for the function $f$, but $\nabla^2 f(0)=O$, so the corresponding quadratic form $x\cdot\nabla^2 f(0)x$ is not negative definite. N
The Hessian $\nabla^2 f(\hat{x})$ is the symmetric matrix associated to the quadratic form $x\cdot\nabla^2 f(\hat{x})x$. We can therefore equivalently state Theorem 1379 in the following way:

a necessary condition for $\hat{x}$ to be a maximizer (minimizer) is that the Hessian matrix $\nabla^2 f(\hat{x})$ is negative (positive) semi-definite;

a sufficient condition for $\hat{x}$ to be a strong maximizer (minimizer) is that the Hessian matrix is negative (positive) definite.
This Hessian version is important operationally because there exist criteria, such as the Sylvester-Jacobi one, to determine whether a symmetric matrix is positive/negative definite or semi-definite. For instance, consider a generic function of two variables $f:\mathbb{R}^2\to\mathbb{R}$ that is twice continuously differentiable. Let $x_0\in\mathbb{R}^2$ be a stationary point, i.e., $\nabla f(x_0)=(0,0)$, and let

$$\nabla^2 f(x_0)=\begin{bmatrix}\dfrac{\partial^2 f}{\partial x_1^2}(x_0) & \dfrac{\partial^2 f}{\partial x_1\partial x_2}(x_0)\\ \dfrac{\partial^2 f}{\partial x_2\partial x_1}(x_0) & \dfrac{\partial^2 f}{\partial x_2^2}(x_0)\end{bmatrix}=\begin{bmatrix}a & b\\ c & d\end{bmatrix} \qquad (29.39)$$

be the Hessian matrix computed at the point $x_0$. Since the gradient at $x_0$ is zero, the point is a candidate to be a maximizer or minimizer of $f$. To determine its exact nature, it is necessary to analyze the Hessian matrix at the point. By Theorem 1379, $x_0$ is a maximizer if the Hessian is negative definite, a minimizer if it is positive definite, and it is neither a maximizer nor a minimizer if it is indefinite. If the Hessian is only semi-definite, positive or negative, it is not possible to draw conclusions on the nature of $x_0$. Applying the Sylvester-Jacobi criterion to the matrix (29.39) we have that:

(i) if $a>0$ and $ad-bc>0$, the Hessian is positive definite, so $x_0$ is a strong local minimizer;

(ii) if $a<0$ and $ad-bc>0$, the Hessian is negative definite, so $x_0$ is a strong local maximizer;

(iii) if $ad-bc<0$, the Hessian is indefinite, and therefore $x_0$ is neither a local maximizer nor a local minimizer.

In all the other cases it is not possible to say anything on the nature of the point $x_0$. A small computational sketch of this test follows.
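Here is a minimal sketch of the test in Python (ours, not from the text); by Schwarz's Theorem the Hessian is symmetric, so we take $b=c$:

    def classify_2x2(a, b, d):
        """Sylvester-Jacobi test for the symmetric Hessian [[a, b], [b, d]]."""
        det = a*d - b*b
        if det > 0:
            return "strong local minimizer" if a > 0 else "strong local maximizer"
        if det < 0:
            return "neither a local maximizer nor a local minimizer"
        return "inconclusive: semi-definite Hessian"

    # Hessian of Example 1383 below, computed at the stationary point (-1, 0)
    print(classify_2x2(6, 0, 2))    # strong local minimizer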

Example 1383 Let $f:\mathbb{R}^2\to\mathbb{R}$ be given by $f(x_1,x_2)=3x_1^2+x_2^2+6x_1$. We have $\nabla f(x)=(6x_1+6,\ 2x_2)$ and

$$\nabla^2 f(x)=\begin{bmatrix}6 & 0\\ 0 & 2\end{bmatrix}$$

It is easy to see that the unique point where the gradient vanishes is $x_0=(-1,0)\in\mathbb{R}^2$, that is, $\nabla f(-1,0)=(0,0)$. Moreover, in view of the previous discussion, since $a>0$ and $ad-bc>0$, the point $x_0=(-1,0)$ is a strong local minimizer of $f$. N

Example 1384 Let $f:\mathbb{R}^3\to\mathbb{R}$ be given by $f(x_1,x_2,x_3)=x_1^3+x_2^3+3x_3^2-2x_3+x_1^2x_2^2$. We have

$$\nabla f(x)=\left(3x_1^2+2x_1x_2^2,\ 3x_2^2+2x_1^2x_2,\ 6x_3-2\right)$$

and

$$\nabla^2 f(x)=\begin{bmatrix}6x_1+2x_2^2 & 4x_1x_2 & 0\\ 4x_1x_2 & 6x_2+2x_1^2 & 0\\ 0 & 0 & 6\end{bmatrix}$$

The stationary points are $x'=(-3/2,-3/2,1/3)$ and $x''=(0,0,1/3)$. At $x'$, we have

$$\nabla^2 f(x')=\begin{bmatrix}-\frac{9}{2} & 9 & 0\\ 9 & -\frac{9}{2} & 0\\ 0 & 0 & 6\end{bmatrix}$$

and therefore

$$\det\left[-\frac{9}{2}\right]<0,\quad \det\begin{bmatrix}-\frac{9}{2} & 9\\ 9 & -\frac{9}{2}\end{bmatrix}<0,\quad \det\nabla^2 f(x')<0$$

By the Sylvester-Jacobi criterion the Hessian matrix is indefinite. By Theorem 1379, the point $x'=(-3/2,-3/2,1/3)$ is neither a local minimizer nor a local maximizer. For the point $x''=(0,0,1/3)$ we have

$$\nabla^2 f(x'')=\begin{bmatrix}0 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 6\end{bmatrix}$$

which is positive semi-definite since $x\cdot\nabla^2 f(x'')x=6x_3^2$ (note that it is not positive definite: for example, we have $(1,1,0)\cdot\nabla^2 f(x'')(1,1,0)=0$). N

29.3.3 Multivariable unconstrained optima

Lastly, we can generalize to the multivariable case the partial procedure for the solution of unconstrained optimization problems, discussed in Section 28.5.3. Consider the unconstrained optimization problem

$$\max_x f(x) \quad\text{sub } x\in C$$

where $C$ is an open convex set of $\mathbb{R}^n$. Assume that $f\in C^2(C)$. By Theorem 1379-(i), the procedure of Section 28.5.3 assumes the following form:

1. Determine the set $S\subseteq C$ of the stationary interior points of $f$ by solving the first-order condition $\nabla f(x)=0$ (Section 28.1.3).

2. Calculate the Hessian matrix $\nabla^2 f$ at each of the stationary points $x\in S$ and determine the set

$$S_2=\left\{x\in S:\nabla^2 f(x)\text{ is negative semi-definite}\right\}$$

3. Determine the set

$$S_3=\left\{x\in S_2: f(x)\ge f(x')\text{ for every } x'\in S_2\right\}$$

of the points of $C$ that are candidate solutions of the optimization problem.

Also here the procedure is not conclusive because nothing ensures the existence of a solution. Later in the book we will discuss this crucial problem by combining, in the method of elimination, such existence theorems with the differential methods. The three steps can be sketched in code as follows.
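The following Python sketch (ours; it assumes sympy and numpy, and candidates is our own name) mirrors the three steps, checking negative semi-definiteness through the eigenvalues of the Hessian:

    import sympy as sp
    import numpy as np

    def candidates(f, variables):
        """Steps 1-3: stationary points, semi-definiteness check, best value."""
        grad = [sp.diff(f, v) for v in variables]
        S = sp.solve(grad, variables, dict=True)                  # step 1
        H = sp.hessian(f, variables)
        S2 = [p for p in S                                        # step 2
              if max(np.linalg.eigvalsh(np.array(H.subs(p), dtype=float))) <= 1e-12]
        if not S2:
            return []
        best = max(float(f.subs(p)) for p in S2)                  # step 3
        return [p for p in S2 if float(f.subs(p)) == best]

    x1, x2 = sp.symbols("x1 x2")
    f = -2*x1**2 - x2**2 + 3*(x1 + x2) - x1*x2 + 3                # Example 1385 below
    print(candidates(f, (x1, x2)))                                # [{x1: 3/7, x2: 9/7}]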

Example 1385 Let $f:\mathbb{R}^2\to\mathbb{R}$ be given by $f(x_1,x_2)=-2x_1^2-x_2^2+3(x_1+x_2)-x_1x_2+3$ and consider the unconstrained optimization problem

$$\max_x f(x) \quad\text{sub } x\in\mathbb{R}^2_{++}$$

Here $C=\mathbb{R}^2_{++}$ is the first quadrant of the plane without the axes (hence an open set). We have

$$\nabla f(x)=(-4x_1+3-x_2,\ -2x_2+3-x_1)$$

Therefore, from the first-order condition $\nabla f(x)=0$ it follows that the unique stationary point is $x=(3/7,9/7)$, that is, $S=\{(3/7,9/7)\}$. We have

$$\nabla^2 f(x)=\begin{bmatrix}-4 & -1\\ -1 & -2\end{bmatrix}$$

By the Sylvester-Jacobi criterion, the Hessian matrix $\nabla^2 f(x)$ is negative definite.¹² Hence, $S_2=\{(3/7,9/7)\}$. Since $S_2$ is a singleton, we trivially have $S_3=S_2$. In conclusion, the point $x=(3/7,9/7)$ is the unique candidate to be a solution of the unconstrained optimization problem. One can show that this point is indeed the solution of the problem. For the moment we can only say that, by Theorem 1379-(ii), it is a local maximizer. N

¹² Since $\nabla^2 f(x)$ is negative definite for all $x\in\mathbb{R}^2_{++}$, this also proves that $f$ is concave.
Chapter 30

Analytic functions

30.1 A calculus paradise


Eighteenth-century mathematicians, most notably Euler and Lagrange, focused on functions that can be studied via power series. In particular, Lagrange in his 1813 calculus book (first edition 1797) considered functions $f$ that, at each point $x_0$ of their domain, could be expressed via a power series in $h$ as follows:

$$f(x_0+h)=f(x_0)+\alpha_1h+\alpha_2h^2+\cdots+\alpha_kh^k+\cdots=f(x_0)+\sum_{k=1}^{\infty}\alpha_kh^k \qquad (30.1)$$

The alpha coefficients depend, of course, on the point $x_0$. Lagrange defined the first derivative $f'(x_0)$ at $x_0$ as the first coefficient $\alpha_1$. As $x_0$ varies, this defines a new function $f'$, the derivative function, which is assumed to admit, like the parent function, a power expansion in $h$ at $x_0$. Lagrange defined the second derivative $f''(x_0)$ as the first coefficient of the expansion of $f'$. One can continue in this way to define derivatives of all orders. These functions are, indeed, assumed to be infinitely differentiable.

Example 1386 By the Newton binomial formula, for the power function $f(x)=x^n$ we have

$$f(x_0+h)=(x_0+h)^n=x_0^n+\binom{n}{1}x_0^{n-1}h+\binom{n}{2}x_0^{n-2}h^2+\cdots+h^n$$
$$=f(x_0)+nx_0^{n-1}h+\frac{n(n-1)}{2}x_0^{n-2}h^2+\cdots+h^n$$

So, $f'(x_0)=nx_0^{n-1}$. In turn,

$$f'(x_0+h)=n(x_0+h)^{n-1}=nx_0^{n-1}+n\binom{n-1}{1}x_0^{n-2}h+\cdots+nh^{n-1}$$
$$=f'(x_0)+n(n-1)x_0^{n-2}h+\frac{n(n-1)(n-2)}{2}x_0^{n-3}h^2+\cdots+nh^{n-1}$$

So, $f''(x_0)=n(n-1)x_0^{n-2}$. By iterating, one finds the higher-order derivatives of the power function (which are $0$ for orders $>n$). N


Lagrange then proves, in a key result, that the alpha coefficients can be expressed in terms of the higher-order derivatives, so that (30.1) becomes

$$f(x_0+h)=f(x_0)+f'(x_0)h+\frac{1}{2}f''(x_0)h^2+\cdots+\frac{f^{(k)}(x_0)}{k!}h^k+\cdots=f(x_0)+\sum_{k=1}^{\infty}\frac{f^{(k)}(x_0)}{k!}h^k \qquad (30.2)$$

This exact Taylor formula makes these functions the best behaved from a differentiable viewpoint, a calculus paradise.
Lagrange's calculus thus builds on power series, an approach to calculus different from the incremental one adopted in this book, which has been the standard approach since the nineteenth-century works, in primis by Cauchy and Weierstrass, that made calculus rigorous by freeing it from the uncertain metaphysical status of infinitesimals. Indeed, such a status was a main concern for Lagrange, whose approach was motivated by the desire to introduce derivatives without using any, suspiciously metaphysical, notion of infinitesimal.¹

In this chapter we will study, within the incremental approach that we adopted, the functions that admit a power series representation (30.1), the so-called analytic functions. In particular, in Proposition 1398 we will establish for them an exact Taylor formula (30.2), thus recovering in our setup Lagrange's important representation. Many differentiable functions are not analytic, as should be obvious by now, and indeed analytic functions no longer play the central theoretical role that they had at the time of Lagrange. Yet, they are widely used in applications because of their remarkable differential properties.

30.2 Asymptotic scales and expansions

Up to now, in the book we have considered polynomial expansions. Although they are the most relevant, it may be useful to mention other expansions, so as to better contextualize the polynomial case itself. Their study was pioneered by Henri Poincaré in 1886.

Let us take any open interval $(a,b)$, bounded or unbounded.² A family of scalar functions $\Phi=\{\varphi_n\}_{n=0}^{\infty}$ defined on $(a,b)$ is said to be an asymptotic scale at $x_0\in[a,b]$³ if, for every $n\ge 0$, we have

$$\varphi_{n+1}=o(\varphi_n) \quad\text{as } x\to x_0$$

Example 1387 (i) Power functions

$$\varphi_n(x)=(x-x_0)^n$$

are an asymptotic scale at $x_0\in(a,b)$.

¹ This purpose is already clear from the full title of his book, whose translation is "Theory of analytic functions containing the principles of differential calculus without any consideration of infinitesimally small, of vanishing quantities, of limits and fluxions, and reduced to the algebraic analysis of finite quantities."
² In other words, $a,b\in\overline{\mathbb{R}}$. Throughout the chapter we will maintain this assumption (an instance of an unbounded $(a,b)$ is, of course, the entire real line).
³ The expression $x_0\in[a,b]$ entails that $x_0$ is an accumulation point of $(a,b)$. For example, $x_0\in[-\infty,+\infty]$ when $(a,b)=(-\infty,+\infty)$.

(ii) Negative power functions

$$\varphi_n(x)=x^{-n}$$

are an asymptotic scale at $x_0=+\infty$.⁴ More generally, the powers $\varphi_n(x)=x^{-\alpha_n}$ form an asymptotic scale at $x_0=+\infty$ as long as $\alpha_{n+1}>\alpha_n$ for every $n\ge 1$.

(iii) The trigonometric functions

$$\varphi_n(x)=\sin^n(x-x_0)$$

form an asymptotic scale at $x_0\in(a,b)$, while the logarithms

$$\varphi_n(x)=\log^{-n}x$$

form an asymptotic scale at $x_0=+\infty$. N

Let us now give a general definition of expansion.

Definition 1388 A function $f:(a,b)\to\mathbb{R}$ admits at a point $x_0\in[a,b]$ an expansion of order $n$ with respect to the scale $\Phi$ if there exist scalars $\{\alpha_k\}_{k=0}^n$ such that

$$f(x)=\sum_{k=0}^n\alpha_k\varphi_k(x)+o(\varphi_n) \quad\text{as } x\to x_0 \qquad (30.3)$$

for every $x\in(a,b)$.

Polynomial expansions (29.3), i.e.,

$$f(x)=\sum_{k=0}^n\alpha_k(x-x_0)^k+o((x-x_0)^n) \quad\text{as } x\to x_0$$

are the special case of (30.3) in which the asymptotic scale is formed by power functions. Contrary to the polynomial case, where $x_0$ had to be a scalar, now we can take $x_0=\pm\infty$. Indeed, general expansions are relevant because, relative to the special case of polynomial expansions, they may allow us to approximate functions for large values of the argument, that is, asymptotically.

In symbols, condition (30.3) can be expressed as

$$f(x)\sim\sum_{k=0}^n\alpha_k\varphi_k(x) \quad\text{as } x\to x_0$$

For example, for $n=2$ we get the quadratic approximation:

$$f(x)\sim\alpha_0\varphi_0(x)+\alpha_1\varphi_1(x)+\alpha_2\varphi_2(x) \quad\text{as } x\to x_0$$

By using the scale of power functions, we get back to the usual polynomial quadratic approximation

$$f(x)\sim\alpha_0+\alpha_1x+\alpha_2x^2 \quad\text{as } x\to 0$$

⁴ When, as in this example, we have $x_0=+\infty$, the interval $(a,b)$ is understood to be unbounded with $b=+\infty$ (the example of the negative power function scale was made by Poincaré himself).

However, with the scale of negative power functions we get:

$$f(x)\sim\alpha_0+\frac{\alpha_1}{x}+\frac{\alpha_2}{x^2} \quad\text{as } x\to+\infty$$

In such a case, $x_0$ being $+\infty$, we are dealing with a quadratic asymptotic approximation.

Example 1389 It holds:

$$\frac{1}{x-1}\sim\frac{1}{x}+\frac{1}{x^2} \quad\text{as } x\to+\infty \qquad (30.4)$$

Indeed,

$$\frac{1}{x-1}-\frac{1}{x}-\frac{1}{x^2}=\frac{1}{(x-1)x^2}=o\left(\frac{1}{x^2}\right) \quad\text{as } x\to+\infty$$

Approximation (30.4) is asymptotic. For values close to $0$, we may instead consider the quadratic polynomial approximation:

$$\frac{1}{x-1}\sim-1-x-x^2 \quad\text{as } x\to 0$$

N

The key uniqueness property of polynomial expansions (Lemma 1358) still holds in the general case.

Lemma 1390 A function $f:(a,b)\to\mathbb{R}$ has, at each point $x_0\in[a,b]$, at most one expansion of order $n$ with respect to the scale $\Phi$.

Proof Consider the expansion $\sum_{k=0}^n\alpha_k\varphi_k(x)+o(\varphi_n)$ at $x_0\in[a,b]$. We have

$$\lim_{x\to x_0}\frac{f(x)}{\varphi_0(x)}=\lim_{x\to x_0}\frac{\sum_{k=0}^n\alpha_k\varphi_k(x)+o(\varphi_n)}{\varphi_0(x)}=\alpha_0 \qquad (30.5)$$
$$\lim_{x\to x_0}\frac{f(x)-\alpha_0\varphi_0(x)}{\varphi_1(x)}=\lim_{x\to x_0}\frac{\sum_{k=1}^n\alpha_k\varphi_k(x)+o(\varphi_n)}{\varphi_1(x)}=\alpha_1 \qquad (30.6)$$
$$\vdots$$
$$\lim_{x\to x_0}\frac{f(x)-\sum_{k=0}^{n-1}\alpha_k\varphi_k(x)}{\varphi_n(x)}=\alpha_n \qquad (30.7)$$

Suppose that, for every $x\in(a,b)$, there are two different expansions

$$\sum_{k=0}^n\alpha_k\varphi_k(x)+o(\varphi_n)=\sum_{k=0}^n\beta_k\varphi_k(x)+o(\varphi_n) \qquad (30.8)$$

Equalities (30.5)-(30.7) must hold for both expansions. Hence, by (30.5) we have that $\alpha_0=\beta_0$. Iterating such a procedure, from equality (30.6) we get $\alpha_1=\beta_1$, and so on until $\alpha_n=\beta_n$.

Limits (30.5)-(30.7) are crucial: it is easy to prove that the expansion (30.3) holds if and only if the limits exist (and are finite).⁵ Such limits, in turn, determine the expansion coefficients $\{\alpha_k\}_{k=0}^n$.

⁵ The "only if" part is shown in the previous proof; the reader can verify the converse.

Example 1391 Let us determine the quadratic asymptotic approximation with respect to
the scale of negative power functions for the function f : ( 1; 1) ! R de ned by
1
f (x) =
1+x
Thanks to the equalities (30.5)-(30.7), we have
1
f (x) 1
0 = lim = lim 1+x = lim =0
x!x0 '0 (x) x!x0 1 x!x0 1 + x
1
f (x) 0 '0 (x) 1+x x
1 = lim = lim 1 = lim =1
x!x0 '1 (x) x!x0
x
x!x0 1+x
1 1
f (x) 0 '0 (x) 1 '1 (x) 1+x x x
2 = lim = lim 1 = lim = 1
x!x0 '2 (x) x!x0
x2
x!x0 1+x
Hence, the desired approximation is
2
1 1 1
as x ! +1
1+x x x
By the previous lemma, it is the only quadratic asymptotic approximation with respect to
the scale of negative power functions. N
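These limits can be delegated to a computer algebra system. A minimal Python sketch (ours; it assumes sympy) recovers the three coefficients:

    import sympy as sp

    x = sp.symbols("x", positive=True)
    f = 1/(1 + x)
    scale = [sp.Integer(1), 1/x, 1/x**2]     # the negative power scale

    alphas, rest = [], f
    for phi in scale:
        a = sp.limit(rest/phi, x, sp.oo)     # the limits (30.5)-(30.7)
        alphas.append(a)
        rest -= a*phi
    print(alphas)                            # [0, 1, -1]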

If we change the scale, the expansion changes as well. For example, approximation (30.4) is a quadratic approximation for $1/(x-1)$ with respect to the scale of negative power functions. However, by changing scale one obtains a different quadratic approximation: for example, when at $x_0=+\infty$ we consider the asymptotic scale $\varphi_n(x)=(x+1)/x^{2n}$, we obtain the quadratic asymptotic approximation

$$\frac{1}{x-1}\sim\frac{x+1}{x^2}+\frac{x+1}{x^4} \quad\text{as } x\to+\infty$$

In fact,

$$\frac{1}{x-1}-\frac{x+1}{x^2}-\frac{x+1}{x^4}=\frac{1}{(x-1)x^4}=o\left(\frac{x+1}{x^4}\right) \quad\text{as } x\to+\infty$$

In conclusion, different asymptotic scales lead to different unique approximations (as long as they exist). Different functions can, however, share the same expansion, as the next example shows.

Example 1392 As $x\to+\infty$, we have both

$$\frac{1}{1+x}\sim\frac{1}{x}-\frac{1}{x^2} \quad\text{and}\quad \frac{1+e^{-x}}{1+x}\sim\frac{1}{x}-\frac{1}{x^2}$$

Indeed,

$$\frac{1+e^{-x}}{1+x}-\frac{1}{x}+\frac{1}{x^2}=\frac{1+x^2e^{-x}}{(1+x)x^2}=o\left(\frac{1}{x^2}\right) \quad\text{as } x\to+\infty$$

Therefore $1/x-1/x^2$ is the quadratic asymptotic approximation of both functions $1/(1+x)$ and $(1+e^{-x})/(1+x)$. N

Earlier in the book we considered two formulations of the De Moivre-Stirling formula:

$$\log n!=n\log n-n+o(n)=n\log n-n+\frac{1}{2}\log n+\log\sqrt{2\pi}+o(1)$$

The first one is slightly less precise, but easier to derive (Section 8.14.7). Although they deal with discrete variables, these formulas are in spirit two expansions, as $n\to+\infty$, of the function $\log n!$. The former is a quadratic asymptotic approximation with respect to a scale whose first two terms are $\{n\log n,\ n\}$, for example the scale

$$\left\{n\log n,\ n,\ 1,\ \frac{1}{n},\ \frac{1}{n^2},\ \dots\right\}$$

The latter is an expansion of order 4 with respect to a scale whose first four terms are $\{n\log n,\ n,\ \log n,\ 1\}$, for example the scale

$$\left\{n\log n,\ n,\ \log n,\ 1,\ \frac{1}{n},\ \dots\right\}$$
To incarnate this spirit, we introduce a famous function.

Definition 1393 The gamma function $\Gamma:(0,+\infty)\to\mathbb{R}$ is defined by the improper integral

$$\Gamma(x)=\int_0^{+\infty}t^{x-1}e^{-t}\,dt$$

This function is log-convex (cf. Example 949). Moreover, it satisfies a key recursion.

Lemma 1394 $\Gamma(x+1)=x\Gamma(x)$ for all $x>0$.

Proof By integrating by parts, one obtains that for every $0<a<b$

$$\int_a^b t^xe^{-t}\,dt=\left[-e^{-t}t^x\right]_a^b+x\int_a^b t^{x-1}e^{-t}\,dt=-e^{-b}b^x+e^{-a}a^x+x\int_a^b t^{x-1}e^{-t}\,dt$$

If $a\downarrow 0$ we have $e^{-a}a^x\to 0$, while if $b\uparrow+\infty$ we have $e^{-b}b^x\to 0$,⁶ thus implying the desired result.

By iterating, for every $n\ge 1$ we thus have:

$$\Gamma(n+1)=n\Gamma(n)=n(n-1)\Gamma(n-1)=\cdots=n!\,\Gamma(1)=n!$$

since $\Gamma(1)=1$. The gamma function can therefore be viewed as the extension to the real line of the factorial function

$$f(n)=n!$$

which is defined only on the natural numbers (so, it is a sequence).⁷ It is an important function that, through the next remarkable result, makes more rigorous the interpretation in terms of expansions of the two versions of the De Moivre-Stirling formula.

⁶ Since $x>0$, we have $\lim_{a\to 0}a^x=0$ as $-\infty=x\lim_{a\to 0}\log a=\lim_{a\to 0}\log a^x$.
⁷ Instead of $\Gamma(n+1)=n!$ we would have exactly $\Gamma(n)=n!$ if in the gamma function the exponent were $x$ instead of $x-1$ (we adopt the standard notation). This detail also explains the opposite sign of the logarithmic term in the approximations of $n!$ and of $\Gamma(x)$. The properties of the gamma function, including the next theorem and its proof, can be found in Artin (1964).
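Both the recursion and the factorial interpretation are easy to check numerically; the Python standard library exposes the gamma function as math.gamma (a quick sketch, ours):

    import math

    x = 2.7
    print(math.gamma(x + 1), x*math.gamma(x))      # recursion: the two agree

    for n in range(1, 6):                          # Gamma(n + 1) = n!
        print(n, math.gamma(n + 1), math.factorial(n))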

Theorem 1395 We have, as $x\to+\infty$,

$$\log\Gamma(x)=x\log x-x+o(x)=x\log x-x-\frac{1}{2}\log x+\log\sqrt{2\pi}+o(1)$$

In the expansion notation, we can thus write that, for $x\to+\infty$,

$$\log\Gamma(x)\sim x\log x-x \quad\text{and}\quad \log\Gamma(x)\sim x\log x-x-\frac{1}{2}\log x+\log\sqrt{2\pi}$$

30.3 Analytic functions

30.3.1 Generalities

If a sequence of coefficients $\{\alpha_k\}_{k=0}^{\infty}$ is such that (30.3) holds for every $n$, we write

$$f(x)\sim\sum_{k=0}^{\infty}\alpha_k\varphi_k(x) \quad\text{as } x\to x_0$$

for every $x\in(a,b)$. The expression $\sum_{k=0}^{\infty}\alpha_k\varphi_k(x)$ is called the asymptotic expansion of $f$ at $x_0$. It is a series that, in general, does not necessarily converge to the value $f(x)$. It might not even converge at all. Indeed, an asymptotic expansion is an approximation with a certain degree of accuracy, nothing more. The next example presents the different, fortunate or less fortunate, cases one can encounter.

Example 1396 (i) The function $f:(1,+\infty)\to\mathbb{R}$ defined by

$$f(x)=\frac{1}{x-1}$$

has, with respect to the scale of negative power functions, the asymptotic expansion

$$f(x)\sim\sum_{k=1}^{\infty}\frac{1}{x^k} \quad\text{as } x\to+\infty \qquad (30.9)$$

The asymptotic expansion is, for every given $x$, a geometric series. Therefore, it converges for every $x>1$ (i.e., for every $x$ in the domain of $f$), with

$$f(x)=\sum_{k=1}^{\infty}\frac{1}{x^k}$$

In this (fortunate) case the asymptotic expansion is correct: the series determined by the asymptotic expansion converges to $f(x)$ for every $x\in(a,b)$.

(ii) Also the function $f:(1,+\infty)\to\mathbb{R}$ defined by

$$f(x)=\frac{1+e^{-x}}{x-1}$$

has, with respect to the scale of negative power functions, the asymptotic expansion (30.9) for $x\to+\infty$. However, in this case we have, for every $x>1$,

$$f(x)\neq\sum_{k=1}^{\infty}\frac{1}{x^k}$$

In this example the asymptotic expansion is merely an approximation, with degree of accuracy $x^{-n}$ for every $n$.

(iii) Consider the function $f:(1,+\infty)\to\mathbb{R}$ defined by:⁸

$$f(x)=e^{-x}\int_1^x\frac{e^t}{t}\,dt$$

By repeatedly integrating by parts, we get that:

$$\int_1^x\frac{e^t}{t}\,dt=\left[\frac{e^t}{t}\right]_1^x+\int_1^x\frac{e^t}{t^2}\,dt=\left[\frac{e^t}{t}+\frac{e^t}{t^2}\right]_1^x+\int_1^x\frac{2e^t}{t^3}\,dt=\cdots$$
$$=\left[e^t\left(\frac{1}{t}+\frac{1}{t^2}+\frac{2!}{t^3}+\cdots+\frac{(n-1)!}{t^n}\right)\right]_1^x+n!\int_1^x\frac{e^t}{t^{n+1}}\,dt$$

Since

$$0\le\lim_{x\to+\infty}\frac{\int_1^x\frac{e^t}{t^{n+1}}\,dt}{\frac{e^x}{x^n}}=\lim_{x\to+\infty}\frac{\int_1^{x/2}\frac{e^t}{t^{n+1}}\,dt+\int_{x/2}^x\frac{e^t}{t^{n+1}}\,dt}{\frac{e^x}{x^n}}\le\lim_{x\to+\infty}\frac{e^{x/2}\frac{x}{2}+\frac{2^{n+1}}{x^{n+1}}e^x}{\frac{e^x}{x^n}}=\lim_{x\to+\infty}\left(\frac{x^{n+1}}{2e^{x/2}}+\frac{2^{n+1}}{x}\right)=0$$

we have

$$\int_1^x\frac{e^t}{t^{n+1}}\,dt=o\left(\frac{e^x}{x^n}\right) \quad\text{as } x\to+\infty$$

Hence, writing $g(x)=\int_1^x(e^t/t)\,dt$,

$$f(x)=\frac{g(x)}{e^x}=\frac{1}{x}+\frac{1}{x^2}+\frac{2!}{x^3}+\frac{3!}{x^4}+\cdots+\frac{(n-1)!}{x^n}+o\left(\frac{1}{x^n}\right) \quad\text{as } x\to+\infty$$

and

$$f(x)\sim\sum_{k=1}^{\infty}\frac{(k-1)!}{x^k} \quad\text{as } x\to+\infty$$

For any given $x>1$, the ratio criterion implies

$$\sum_{k=1}^{\infty}\frac{(k-1)!}{x^k}=\sum_{k=1}^{\infty}\frac{k!}{kx^k}=+\infty$$

The asymptotic expansion thus determines a divergent series. In this, very unfortunate, case not only does the series not converge to $f$, but it even diverges. N

⁸ This example is from de Bruijn (1961).

Let us go back to the polynomial case, in which the asymptotic expansion of a function $f:(a,b)\to\mathbb{R}$ at a point $x_0\in(a,b)$ has the power series form:

$$f(x)\sim\sum_{k=0}^{\infty}\alpha_k(x-x_0)^k \quad\text{as } x\to x_0$$

When $f$ is infinitely differentiable at $x_0$, by Taylor-Peano's Theorem the asymptotic expansion becomes

$$f(x)\sim\sum_{k=0}^{\infty}\frac{f^{(k)}(x_0)}{k!}(x-x_0)^k \quad\text{as } x\to x_0$$

The right-hand side of the expansion is a power series called the Taylor series (Maclaurin when $x_0=0$) of the function $f$ at the point $x_0$, with coefficients $\alpha_k=f^{(k)}(x_0)/k!$. By setting $h=x-x_0$, the Taylor series becomes

$$\sum_{k=0}^{\infty}\frac{f^{(k)}(x_0)}{k!}h^k$$

So, the Taylor series is a power series in $h$ (so, in the difference $x-x_0$), a key observation.

But when can we turn $\sim$ into $=$, that is, when can these approximations become, at least locally, exact? To answer this important question, we introduce the following classic notion.

Definition 1397 A function $f:(a,b)\to\mathbb{R}$ is said to be analytic at $x_0\in(a,b)$ if there is a neighborhood $B(x_0)$ and a sequence of scalar coefficients $\{\alpha_k\}_{k=0}^{\infty}$ such that

$$f(x)=\sum_{k=0}^{\infty}\alpha_k(x-x_0)^k \qquad \forall x\in B(x_0) \qquad (30.10)$$

A function $f:(a,b)\to\mathbb{R}$ is said to be analytic if it is analytic at each point of its domain.

In words, $f$ is analytic at a point $x_0$ of its domain when the polynomial asymptotic expansion of $f$ at $x_0$ is no longer an approximation but, at least locally in a neighborhood $B(x_0)$ of $x_0$, exact. Next we show that, remarkably, the power series in this exact expansion is actually the Taylor series of $f$ at $x_0$ (so, it is uniquely pinned down).

Proposition 1398 A function $f:(a,b)\to\mathbb{R}$ is analytic if and only if it is infinitely differentiable and, at each $x_0\in(a,b)$, there is a neighborhood $B(x_0)$ such that

$$f(x)=\sum_{k=0}^{\infty}\frac{f^{(k)}(x_0)}{k!}(x-x_0)^k \qquad (30.11)$$

for all $x\in B(x_0)$.

We can equivalently write this formula as

$$f(x)=f(x_0)+\sum_{k=1}^{\infty}\frac{f^{(k)}(x_0)}{k!}(x-x_0)^k$$

because at $k=0$ we have $f^{(0)}=f$.

Proof The converse being trivial, let us consider the "only if". Let $f$ be analytic. Since, by hypothesis, the series $\sum_{k=0}^{\infty}\alpha_k(x-x_0)^k$ in (30.10) is convergent for every $x\in B(x_0)$, with sum $f(x)$, one can show that $f$ is infinitely differentiable at each $x_0\in(a,b)$. Let $n\ge 1$. By Taylor-Peano's Theorem, we have

$$f(x)\sim\sum_{k=0}^n\frac{f^{(k)}(x_0)}{k!}(x-x_0)^k \quad\text{as } x\to x_0$$

Lemma 1390 implies that $\alpha_k=f^{(k)}(x_0)/k!$ for every $1\le k\le n$. Since $n$ was arbitrarily chosen, the desired result follows.

This proposition shows that analytic functions are, indeed, the class of functions studied by Lagrange in his 1813 book. A particularly interesting case occurs when one can take $B(x_0)$ equal to the entire domain $(a,b)$ at a point $x_0\in(a,b)$, so that the analytic function $f$ admits at $x_0$ a global exact Taylor expansion

$$f(x)=\sum_{k=0}^{\infty}\frac{f^{(k)}(x_0)}{k!}(x-x_0)^k \qquad \forall x\in(a,b)$$

At the origin, this global exact Taylor expansion takes the convenient Maclaurin form

$$f(x)=\sum_{k=0}^{\infty}\frac{f^{(k)}(0)}{k!}x^k \qquad \forall x\in(a,b)$$

A global exact expansion, in particular at the origin, renders $f$ most tractable. Not surprisingly, next we show that this property is enjoyed, at all points of its domain, by the marvelous exponential function.
marvelous exponential function.

Example 1399 By Theorem 399,

$$e^x=\sum_{k=0}^{\infty}\frac{x^k}{k!} \qquad \forall x\in\mathbb{R} \qquad (30.12)$$

By substitution, we then have

$$e^x=e^{x_0}+e^{x_0}\sum_{k=1}^{\infty}\frac{(x-x_0)^k}{k!} \qquad \forall x\in\mathbb{R}$$

The exponential function is thus analytic. Remarkably, it admits at each point $x_0$ of its domain a global exact Taylor expansion, the Maclaurin one (30.12) being the most convenient. N
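The global character of this expansion can be felt numerically: partial sums of the Taylor series centered at any $x_0$ converge to $e^x$ for every $x$, however far from the center. A brief Python sketch (ours, standard library only):

    import math

    def taylor_exp(x, x0, n):
        """Order-n partial sum of the Taylor series of exp centered at x0."""
        return math.exp(x0)*sum((x - x0)**k/math.factorial(k) for k in range(n + 1))

    # center at x0 = 2, evaluate far away at x = -3
    for n in (5, 10, 20, 40):
        print(n, taylor_exp(-3.0, 2.0, n), math.exp(-3.0))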

30.3.2 Generating functions

We now present an important class of analytic functions. Recall that a function $f:A\subseteq\mathbb{R}\to\mathbb{R}$ is a generating function for a scalar sequence $\{a_n\}$ if

$$f(x)=\sum_{n=0}^{\infty}a_nx^n \qquad \forall x\in A$$

The domain $A$ is the interval of convergence of the power series on the right-hand side. Its interior is the open interval of convergence $(-r,r)$ identified by the radius of convergence $r$ of the power series (Chapter 11).

Generating functions are, by definition, expandable as power series on their domains: next we show that, as one may expect, this property makes them analytic on the open interval of convergence. In so doing, we generalize Proposition 480.
Proposition 1400 Let $\sum_{n=0}^{\infty}a_nx^n$ be a power series with radius of convergence $r\in(0,+\infty]$. The function $f:(-r,r)\to\mathbb{R}$ given by $f(x)=\sum_{n=0}^{\infty}a_nx^n$ is analytic, with

$$a_n=\frac{f^{(n)}(0)}{n!} \qquad \forall n\ge 0 \qquad (30.13)$$

Proof Let $x_0\in(-r,r)$ and $B_\varepsilon(x_0)\subseteq(-r,r)$. By the binomial formula, for each $x\in B_\varepsilon(x_0)$ we have

$$f(x)=\sum_{n=0}^{\infty}a_nx^n=\sum_{n=0}^{\infty}a_n(x-x_0+x_0)^n=\sum_{n=0}^{\infty}a_n\sum_{m=0}^n\binom{n}{m}x_0^{n-m}(x-x_0)^m=\sum_{m=0}^{\infty}\left(\sum_{n=m}^{\infty}\binom{n}{m}a_nx_0^{n-m}\right)(x-x_0)^m$$

where for the change in the order of summation in the last step we refer readers to, e.g., Rudin (1976) p. 176. By setting

$$b_m=\sum_{n=m}^{\infty}\binom{n}{m}a_nx_0^{n-m}$$

we then have $f(x)=\sum_{m=0}^{\infty}b_m(x-x_0)^m$ for all $x\in B_\varepsilon(x_0)$. This proves the analyticity of $f$.

A generating function $f$ is thus analytic on the interior of its domain. As a result, at each point $x_0\in(-r,r)$ there is a neighborhood $B(x_0)\subseteq(-r,r)$ in which $f$ has an exact Taylor expansion, i.e.,

$$f(x)=\sum_{n=0}^{\infty}\frac{f^{(n)}(x_0)}{n!}(x-x_0)^n \qquad \forall x\in B(x_0)$$

In particular, by (30.13) at $x_0=0$ we have the global exact Maclaurin expansion

$$f(x)=\sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}x^n \qquad \forall x\in(-r,r) \qquad (30.14)$$

that is, we can take $B(0)=(-r,r)$. We can thus express $f$ in differential form.

For perspective, we close with an easy corollary for general power series, not necessarily centered at the origin.

Corollary 1401 Let $\sum_{n=0}^{\infty}a_n(x-x_0)^n$ be a power series with $x_0\in\mathbb{R}$ and radius of convergence $r\in(0,+\infty]$. The function $f:(x_0-r,x_0+r)\to\mathbb{R}$ given by $f(x)=\sum_{n=0}^{\infty}a_n(x-x_0)^n$ is analytic, with

$$a_n=\frac{f^{(n)}(x_0)}{n!} \qquad \forall n\ge 0$$

As the reader can check, here we have a global exact Taylor expansion at $x_0$.

30.3.3 Analytic failures

Summing up, answering the previous "approximation vs. exact" question amounts to establishing the analyticity of a function: we can then turn $\sim$ into $=$, at least locally. By Proposition 1398, being infinitely differentiable is a necessary condition for a function to be analytic. However, the following remarkable example shows that such a condition is not sufficient, a surprising fact that makes it necessary to introduce analytic functions as the class of infinitely differentiable functions for which such failure does not occur (again Proposition 1398). As $n$ times differentiable functions may not be $n+1$ times differentiable, so infinitely differentiable ones may not be analytic.

Example 1402 (i) The function $f:\mathbb{R}\to\mathbb{R}$ given by

$$f(x)=\begin{cases}e^{-1/x^2} & \text{if } x\neq 0\\ 0 & \text{if } x=0\end{cases} \qquad (30.15)$$

is infinitely differentiable at every point of the real line, in particular at the origin. So,

$$f(x)\sim\sum_{k=0}^{\infty}\frac{f^{(k)}(0)}{k!}x^k \quad\text{as } x\to 0$$

However, as observed by Cauchy, we have (why?) $f^{(n)}(0)=0$ for every $n\ge 1$, so

$$f(x)\neq 0=\sum_{k=0}^{\infty}\frac{f^{(k)}(0)}{k!}x^k \qquad \forall\, 0\neq x\in\mathbb{R}$$

This means that the Maclaurin series

$$\sum_{k=0}^{\infty}\frac{f^{(k)}(0)}{k!}x^k$$

converges (trivially) to the zero function, not to $f(x)$, for all $x\neq 0$. As a result, the function $f$ is not analytic although it is infinitely differentiable.

(ii) Relatedly, the function $f:\mathbb{R}\to\mathbb{R}$ given by

$$f(x)=\begin{cases}e^{-1/x} & \text{if } x>0\\ 0 & \text{if } x\le 0\end{cases} \qquad (30.16)$$

is infinitely differentiable, but non-analytic at the origin. Indeed, here as well one can prove that $f^{(n)}(0)=0$ for every $n\ge 1$. So, the Maclaurin series converges to the zero function, not to $f(x)$, for all $x>0$. N
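The flatness of (30.15) at the origin can be felt numerically: $f(x)$ vanishes faster than any power of $x$, which is why every Maclaurin coefficient is zero. A small Python sketch (ours, standard library only):

    import math

    def f(x):
        return math.exp(-1.0/x**2) if x != 0 else 0.0

    # f(x)/x^n -> 0 as x -> 0 for every n: the function beats every power
    for n in (1, 5, 10):
        for x in (0.5, 0.2, 0.1):
            print(n, x, f(x)/x**n)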

In these examples analyticity fails at the origin because the Maclaurin series (i.e., the Taylor series at the origin) converges but, unfortunately, not to the function $f$. There exist more dramatic examples, discovered around the year 1880, where analyticity fails at a point because the Taylor series does not even converge (except, trivially, at the origin itself). Functions of this kind are called nowhere analytic. The existence of this class of functions is counter-intuitive; interestingly, they came up at about the same time when nowhere differentiable functions, another counter-intuitive class of functions, emerged.

Rather than reporting a (typically involved) example of a nowhere analytic function, here we observe that their existence is a consequence of the Borel-Peano Theorem that will be presented in the coda. Indeed, consider the power series

$$\sum_{n=0}^{\infty}n!\,x^n$$

It has radius of convergence $0$, so it diverges at all $x\neq 0$ (Example 473). By the Borel-Peano Theorem there exists an infinitely differentiable function $f:\mathbb{R}\to\mathbb{R}$ such that, for each $n\ge 0$,

$$f^{(n)}(0)=n!$$

This function is nowhere analytic because its Maclaurin series at the origin does not converge for each $x\neq 0$.

Each power series with a zero radius of convergence thus determines a nowhere analytic function. As there are plenty of them, this shows that there exist, pace Lagrange, plenty of nowhere analytic functions. In so doing, it also shows the depth of the Borel-Peano Theorem.

Summing up, analyticity may fail at a point $x_0$ because in each neighborhood $B(x_0)$ the Taylor series

$$\sum_{k=0}^{\infty}\frac{f^{(k)}(x_0)}{k!}(x-x_0)^k$$

at $x_0$ either converges to the "wrong" function (i.e., not to $f$) or, more dramatically, does not converge at all (except, trivially, at the point $x_0$ itself).

30.3.4 Analyticity criteria

Next we present two classic analyticity criteria.⁹ To introduce the first one, for an infinitely differentiable function $f:(a,b)\to\mathbb{R}$ define an auxiliary function $r_f:(a,b)\to[0,+\infty]$ by

$$r_f(x)=\frac{1}{\displaystyle\limsup_{k\to\infty}\sqrt[k]{\frac{\left|f^{(k)}(x)\right|}{k!}}}$$

The value $r_f(x)$ is, by the Cauchy-Hadamard Theorem, the radius of convergence of the Taylor series of $f$ at $x$. If $f$ is analytic, we clearly have

$$r_f(x)>0 \qquad \forall x\in(a,b) \qquad (30.17)$$

Indeed, $r_f(x_0)=0$ at a point $x_0\in(a,b)$ would imply that the Taylor series of $f$ at $x_0$ does not converge at any point $x\neq x_0$. Condition (30.17) is thus necessary for analyticity. It is, however, not sufficient: for instance, it is satisfied by the non-analytic function $f$ defined in (30.16) since $f$ is analytic at all $x\neq 0$ and $r_f(0)=+\infty$.¹⁰ A stronger version becomes, however, sufficient.

Theorem 1403 (Pringsheim) An infinitely differentiable function $f:(a,b)\to\mathbb{R}$ is analytic if there is $\delta>0$ such that $r_f\ge\delta$.

In words, an infinitely differentiable function is analytic if the radius of convergence of its Taylor series is uniformly bounded away from zero.

⁹ They have been proved by Alfred Pringsheim in 1893 and by Sergei Bernstein in 1912. We only outline the proof of Bernstein's result and refer interested readers to Krantz and Parks (2002).
¹⁰ Indeed, the coefficients of the Maclaurin series are all zero and so it converges on the entire real line, though not to $f(x)$ for $x>0$.

Example 1404 Consider the hyperbola $f:(0,+\infty)\to\mathbb{R}$ defined by $f(x)=1/x$. It holds

$$f^{(n)}(x)=(-1)^n\frac{n!}{x^{n+1}} \qquad \forall n\ge 0$$

and so

$$r_f(x)=\frac{1}{\displaystyle\lim_{n\to\infty}\sqrt[n]{\frac{1}{x^{n+1}}}}=x$$

By the Pringsheim criterion, $f$ is analytic on the interval $(a,+\infty)$ for every $a>0$. In turn, this readily implies the analyticity of $f$. N

The second, quite striking, analyticity criterion is based on the sign of the derivatives. It is the occasion to introduce an important class of functions.

Definition 1405 An infinitely differentiable function $f:(a,b)\to\mathbb{R}$ is said to be absolutely monotone if

$$f^{(k)}\ge 0 \qquad \forall k\ge 0 \qquad (30.18)$$

In words, an infinitely differentiable function is absolutely monotone when its derivatives of all orders are positive at all points of its domain. Observe that for $k=0$ condition (30.18) becomes $f=f^{(0)}\ge 0$; absolutely monotone functions are thus positive. For $k\ge 1$, this condition implies that absolutely monotone functions and all their derivative functions are increasing. More importantly, next we show that they are analytic.

Theorem 1406 (Bernstein) An absolutely monotone function is analytic.

Proof Let $f:(a,b)\to\mathbb{R}$ be absolutely monotone. We prove the theorem in the special case when, at each $x\in(a,b)$, there is a constant $M_x>0$ such that

$$\frac{f^{(n)}(x)}{n!}\le M_x \qquad \forall n\ge 1 \qquad (30.19)$$

Let $x_0\in(a,b)$ and fix $0<h<1$ such that $x_0+h<b$. By the Lagrange-Taylor formula (28.8), we have

$$f(x_0+h)-f(x_0)=\sum_{k=1}^{n-1}\frac{f^{(k)}(x_0)}{k!}h^k+\frac{f^{(n)}(\hat{x})}{n!}h^n$$

Because of the positivity of the derivatives, the function $f$ and its derivatives are increasing functions. So,

$$0\le\frac{f^{(n)}(\hat{x})}{n!}h^n\le\frac{f^{(n)}(x_0+h)}{n!}h^n\le M_{x_0+h}h^n\to 0 \quad\text{as } n\to\infty$$

under the hypothesis (30.19). In turn, this implies

$$f(x_0+h)-f(x_0)=\sum_{k=1}^{\infty}\frac{f^{(k)}(x_0)}{k!}h^k \qquad (30.20)$$

Similarly, by the dual formula (28.9), if we take $-1<h<0$ such that $x_0+h>a$ we have

$$f(x_0+h)-f(x_0)=\sum_{k=1}^{n-1}\frac{f^{(k)}(x_0)}{k!}h^k+\frac{f^{(n)}(\hat{x})}{n!}h^n$$

and, since here $\hat{x}\le x_0$ while the derivatives are positive and increasing,

$$0\le\left|\frac{f^{(n)}(\hat{x})}{n!}h^n\right|\le\frac{f^{(n)}(x_0)}{n!}|h|^n\le M_{x_0}|h|^n\to 0 \quad\text{as } n\to\infty$$

under the hypothesis (30.19). In turn, this implies (30.20).

We conclude that there exists a small enough neighborhood $B(x_0)$ of $x_0$ such that

$$f(x)=f(x_0)+\sum_{k=1}^{\infty}\frac{f^{(k)}(x_0)}{k!}(x-x_0)^k$$

for all $x\in B(x_0)$.

Example 1407 (i) The exponential function $e^x$ is, clearly, absolutely monotone on the real line. (ii) The function $f:(-1,0)\to\mathbb{R}$ defined by $f(x)=-\log(-x)$ is easily seen to be absolutely monotone. N

Example 1408 For the function $f:\mathbb{R}\setminus\{1\}\to\mathbb{R}$ defined by

$$f(x)=\frac{1}{1-x}$$

we have, for all $k\ge 1$,

$$f^{(k)}(x)=\frac{k!}{(1-x)^{k+1}}$$

Indeed, we can proceed by induction. For $k=1$, the result is obvious. If we assume that the result is true for $k-1$ (induction hypothesis), then

$$f^{(k)}(x)=\frac{df^{(k-1)}(x)}{dx}=\frac{d}{dx}\left(\frac{(k-1)!}{(1-x)^k}\right)=(k-1)!\,\frac{d(1-x)^{-k}}{dx}=(k-1)!\,k\,(1-x)^{-k-1}=\frac{k!}{(1-x)^{k+1}}$$

as desired. At all $x<1$ we thus have $f^{(k)}(x)\ge 0$ for all $k\ge 1$ (and, trivially, $f\ge 0$). That is, the function $f$ is absolutely monotone, so analytic by Bernstein's Theorem, on the interval $(-\infty,1)$.

Hence, at all points $x_0<1$ there is a neighborhood $B(x_0)\subseteq(-\infty,1)$ such that

$$f(x)=\sum_{k=0}^{\infty}\frac{f^{(k)}(x_0)}{k!}(x-x_0)^k=\sum_{k=0}^{\infty}\frac{(x-x_0)^k}{(1-x_0)^{k+1}} \qquad \forall x\in B(x_0)$$

Note that, by the properties of the geometric series,

$$f(x)=\sum_{k=0}^{\infty}\frac{(x-x_0)^k}{(1-x_0)^{k+1}} \qquad \forall x\in(2x_0-1,1)$$

because $|(x-x_0)/(1-x_0)|<1$ if and only if $x\in(2x_0-1,1)$. So, we can take $B(x_0)=(2x_0-1,1)$, a neighborhood of $x_0$ of radius $1-x_0$.¹¹ For instance, at $x_0=0$ we have

$$f(x)=\sum_{k=0}^{\infty}\frac{f^{(k)}(0)}{k!}x^k=\sum_{k=0}^{\infty}x^k \qquad \forall x\in(-1,1)$$

Here $B(0)=(-1,1)$. N

¹¹ Note that $x_0<1$ implies $2x_0-1<1$.

Example 1409 Given any $\alpha\in\mathbb{R}$, define $f:(-1,1)\to\mathbb{R}$ by

$$f(x)=(1+x)^{\alpha}$$

One can show by induction that, for each $x\in(-1,1)$,

$$f^{(k)}(x)=\alpha(\alpha-1)\cdots(\alpha-k+1)(1+x)^{\alpha-k}=\alpha^{(k)}(1+x)^{\alpha-k}$$

The function $f$ is thus absolutely monotone, so analytic by Bernstein's Theorem. It holds:¹²

$$(1+x)^{\alpha}=\sum_{k=0}^{\infty}\binom{\alpha}{k}x^k \qquad \forall x\in(-1,1)$$

This is the beautiful formula (11.12) that in Example 485 permitted us to say that $f$ is the generating function of the binomial sequence (11.11). As remarked back then, it generalizes formula (B.8). To see why it holds, observe that, for each natural number $k\ge 0$,

$$f^{(k)}(0)=\alpha^{(k)}$$

Thus, for each $x\in(-1,1)$,

$$f(x)=\sum_{k=0}^{\infty}\frac{f^{(k)}(0)}{k!}x^k=\sum_{k=0}^{\infty}\binom{\alpha}{k}x^k$$

as desired. N

¹² The relevant terminology is in Example 485.

Absolute monotonicity has an alter ego.

Definition 1410 An infinitely differentiable function $f:(a,b)\to\mathbb{R}$ is said to be completely monotone if

$$(-1)^kf^{(k)}\ge 0 \qquad \forall k\ge 0 \qquad (30.21)$$

In words, an infinitely differentiable function is completely monotone if its derivative functions have alternating signs. For $k=0$ condition (30.21) becomes $(-1)^0f^{(0)}=f\ge 0$, and so completely monotone functions are automatically positive. For $k=1$ this condition implies that $f$ is decreasing and for $k=2$ that it is convex.

Example 1411 (i) The function $f:\mathbb{R}\to\mathbb{R}$ given by $f(x)=e^{-\beta x}$, with $\beta>0$, is completely monotone. (ii) The hyperbola $f:(0,+\infty)\to\mathbb{R}$ given by $f(x)=1/x$ is completely monotone. N

Complete and absolute monotonicity are dual notions: a function $f(x)$ is absolutely monotone if and only if $f(a+b-x)$ is completely monotone.¹³ In turn, this duality readily proves the following corollary of Bernstein's Theorem.

Corollary 1412 A completely monotone function is analytic.

In sum, absolutely and completely monotone functions are important examples of analytic functions. For instance, the analyticity of the exponential function can be seen as a consequence of its absolute monotonicity.

¹³ The sum $a+b-x$ presupposes that the interval $(a,b)$ is bounded. Yet, the duality is readily extended to unbounded open intervals. For instance, on the real line $f(x)$ is absolutely monotone if and only if $f(-x)$ is completely monotone.
30.3.5 Basic properties

We begin by showing that linear combinations and products of analytic functions are analytic.

Proposition 1413 Let $f, g : (a, b) \to \mathbb{R}$ be analytic.

(i) The linear combination $\alpha f + \beta g : (a, b) \to \mathbb{R}$ is analytic for all $\alpha, \beta \in \mathbb{R}$;

(ii) The product $fg : (a, b) \to \mathbb{R}$ is analytic.

Proof We prove only (i) and leave (ii) to the reader. Let $x_0$ be any point of the interval $(a, b)$. We want to show that the function $\alpha f + \beta g : (a, b) \to \mathbb{R}$ is analytic at $x_0$. By definition, there exist neighborhoods $B^f(x_0)$ and $B^g(x_0)$ as well as scalar sequences $\{\alpha_k^f\}_{k=0}^{\infty}$ and $\{\alpha_k^g\}_{k=0}^{\infty}$ such that
$$f(x) = \sum_{k=0}^{\infty} \alpha_k^f (x-x_0)^k \quad \forall x \in B^f(x_0), \qquad g(x) = \sum_{k=0}^{\infty} \alpha_k^g (x-x_0)^k \quad \forall x \in B^g(x_0)$$
Thus,
$$(\alpha f + \beta g)(x) = \sum_{k=0}^{\infty} \left(\alpha \alpha_k^f + \beta \alpha_k^g\right)(x-x_0)^k \qquad \forall x \in B^f(x_0) \cap B^g(x_0)$$
As $B^f(x_0) \cap B^g(x_0)$ is a neighborhood of the point $x_0$, this proves that $\alpha f + \beta g$ is analytic at $x_0$. ∎

¹³ To form the sum $a + b - x$ presupposes that the interval $(a, b)$ is bounded. Yet, the duality is readily extended to unbounded open intervals. For instance, on the real line $f(x)$ is absolutely monotone if and only if $f(-x)$ is completely monotone.
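The proof is constructive: locally, analytic functions are power series, and a linear combination simply acts coefficientwise. A small Python sketch (ours), with series truncated to plain coefficient lists:

```python
# Minimal sketch (our addition): locally, analytic functions are power series,
# and a linear combination acts coefficientwise on truncated coefficients.
from math import factorial, isclose, exp

def lin_comb(a, coeffs_f, b, coeffs_g):
    # coefficients of a*f + b*g around the same point x0
    return [a * cf + b * cg for cf, cg in zip(coeffs_f, coeffs_g)]

def eval_series(coeffs, x, x0=0.0):
    return sum(c * (x - x0) ** k for k, c in enumerate(coeffs))

# f(x) = 1/(1-x): coefficients 1, 1, 1, ...; g(x) = e^x: coefficients 1/k!
N = 60
cf = [1.0] * N
cg = [1.0 / factorial(k) for k in range(N)]
h = lin_comb(2.0, cf, -3.0, cg)          # h = 2f - 3g
x = 0.4
assert isclose(eval_series(h, x), 2 / (1 - x) - 3 * exp(x), rel_tol=1e-12)
print("coefficientwise combination evaluated correctly at x =", x)
```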
Next we turn to compositions.¹⁴

Proposition 1414 If the functions $f : (a, b) \to \mathbb{R}$ and $g : (c, d) \to \mathbb{R}$ are both analytic, with $\operatorname{Im} f \subseteq (c, d)$, then their composite function $g \circ f : (a, b) \to \mathbb{R}$ is analytic.

Since the Faa di Bruno formula gives the derivatives of all orders of the composite function $g \circ f$, this result provides a chain rule for analytic functions.

Example 1415 Define $g : (-\infty, 1) \to \mathbb{R}$ and $h : (-1, \infty) \to \mathbb{R}$ by
$$g(x) = \frac{1}{1-x} \qquad \text{and} \qquad h(x) = \frac{1}{1+x}$$
We have $h(x) = g(-x)$ for all $x > -1$, so $h = g \circ f$ where $f(x) = -x$. Since $g$ and $f$ are analytic functions, we conclude that also $h$ is an analytic function. N
Combined with analyticity criteria, these stability properties under linear combinations, products and compositions permit us to establish that many functions of interest are analytic. The following result shows that, indeed, some classic elementary functions are analytic.

Proposition 1416 (i) The logarithmic function is analytic, with
$$\log(1+x) = \sum_{k=1}^{\infty} (-1)^{k+1} \frac{x^k}{k} \qquad \forall x \in (-1, 1] \qquad (30.22)$$

(ii) The trigonometric functions sine and cosine are analytic, with
$$\sin x = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k+1)!} x^{2k+1} \qquad \text{and} \qquad \cos x = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k)!} x^{2k} \qquad \forall x \in \mathbb{R}$$

Remarkably, when one looks at their polynomial asymptotic expansions, exponential and logarithmic functions look much more similar to trigonometric functions than they appear to be prima facie. The emergence of deeper connections between what may otherwise appear to be very different classes of functions is a further dividend of analyticity.

Proof We consider only point (i). Expansion (30.22) holds by Corollary 1369. So, at $x_0 = 0$ we have $\log(1+x) = \sum_{k=1}^{\infty} (-1)^{k+1} x^k / k$ for every $x \in (-1, 1]$. So, this logarithmic function is the generating function for the sequence $\{(-1)^{n+1}/n\}$ with the interval $(-1, 1]$ as domain. By Proposition 1400, it is then analytic on $(-1, 1)$. We omit the proof of the analyticity on the entire interval $(-1, +\infty)$. ∎

Trigonometric functions thus have global exact Maclaurin expansions. In contrast, the logarithmic function $\log(1+x)$ has an exact Maclaurin expansion only for $x \in (-1, 1]$.

¹⁴ For a proof, see Krantz and Parks (2002) p. 19.
Yet, by the last result $\log(1+x)$ is analytic on the entire interval $(-1, +\infty)$. Thus, in view of (29.17), for each $x_0 > -1$ there is a neighborhood $B(x_0) \subseteq (-1, +\infty)$ such that
$$\log(1+x) = \log(1+x_0) + \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k} \frac{(x-x_0)^k}{(1+x_0)^k} \qquad \forall x \in B(x_0)$$
Similarly, the function $1/(1+x)$ is a generating function on the interval $(-1, 1)$, as seen in Example 479, so it has an exact polynomial asymptotic expansion at $x_0 = 0$ for $x \in (-1, 1)$. Yet, the last example showed that this function is analytic on the entire interval $(-1, +\infty)$.

Observe that at $x = 1$ the exact logarithmic expansion (30.22) takes the elegant form
$$\log 2 = \sum_{k=1}^{\infty} (-1)^{k+1} \frac{1}{k} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots$$
This formula was proved earlier in the book in Proposition 407 with a direct method. We can now see of which forest it is a tree.
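A numerical aside (ours): the alternating series above converges quite slowly, with error of order $1/n$ after $n$ terms, as a few partial sums show.

```python
# Our numerical illustration: partial sums of the expansion (30.22) at x = 1
# converge (slowly, since the series is the alternating harmonic) to log 2.
from math import log

def log1p_series(x, terms):
    return sum((-1) ** (k + 1) * x ** k / k for k in range(1, terms + 1))

for n in [10, 1_000, 100_000]:
    print(n, log1p_series(1.0, n), "vs", log(2))
# the error at x = 1 is roughly 1/(2n), as alternating-series bounds suggest
```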
Analytic functions have a striking zero property.

Proposition 1417 A non-zero analytic function has at most countably many zeros.

The proof is an immediate consequence of the following topological lemma.

Lemma 1418 The zeros of a non-zero analytic function are isolated points.

Proof Given a non-zero analytic function $f : (a, b) \to \mathbb{R}$, let
$$Z = \{x \in (a, b) : f(x) = 0\}$$
be the collection of the zero points of $f$. Let $x_0 \in Z$. We want to show that $x_0$ is an isolated point. To this end, observe that, being $f$ analytic, there is a neighborhood $B(x_0)$ and a scalar sequence $\{\alpha_k\}_{k=0}^{\infty}$ such that
$$f(x) = \sum_{k=0}^{\infty} \alpha_k (x-x_0)^k \qquad \forall x \in B(x_0)$$
Let $\bar{k}$ be the position of the first non-zero $\alpha_k$ (as $f$ is non-zero, such a position exists). Define $\varphi : B(x_0) \to \mathbb{R}$ by
$$\varphi(x) = \sum_{k=\bar{k}}^{\infty} \alpha_k (x-x_0)^{k-\bar{k}} = \alpha_{\bar{k}} + \alpha_{\bar{k}+1}(x-x_0) + \alpha_{\bar{k}+2}(x-x_0)^2 + \cdots$$
This function is continuous (why?) and allows us to write
$$f(x) = (x-x_0)^{\bar{k}}\,\varphi(x) \qquad \forall x \in B(x_0)$$
Clearly, $\varphi(x_0) = \alpha_{\bar{k}} \neq 0$. As $\varphi$ is continuous, there is a small enough $\varepsilon > 0$ such that $\varphi(x) \neq 0$ for all $x \in B_{\varepsilon}(x_0)$. As $(x-x_0)^{\bar{k}} \neq 0$ for all $x \neq x_0$ in $B_{\varepsilon}(x_0)$, we conclude that $f(x) \neq 0$ for all $x \neq x_0$ in $B_{\varepsilon}(x_0)$. Thus $\{x_0\} = Z \cap B_{\varepsilon}(x_0)$, proving that the zero point $x_0$ is isolated. ∎

An analytic function is thus either identically zero or has at most countably many zeros. This result nicely complements the Bolzano-type theorems that ensure the existence of zeros. In particular, it implies that an equation defined by an analytic function has at most countably many solutions.
Next we present a couple of interesting consequences of this cardinality property that further illustrate the remarkable properties of analytic functions. The first one is about the cardinality of level sets.

Corollary 1419 The level sets of a non-constant analytic function are at most countable.

Proof Let $(f = k)$ be a non-empty level set, with $k \in \mathbb{R}$. A point of the level set $(f = k)$ is a zero of the analytic function $h = f - k$. As $h$ is non-zero since $f$ is non-constant, by Proposition 1417 the level set has at most countably many elements. ∎
The second consequence is a remarkable unique extension property. To appreciate it, observe that $I$ can be taken of arbitrarily small length.

Corollary 1420 Two analytic functions $f$ and $g$ that are equal on an open interval $I \subseteq (a, b)$ are equal everywhere.

Proof Let $f(x) = g(x)$ for all $x \in I$. The analytic function $h = f - g$ then has uncountably many zeros, so by Proposition 1417 it is identically zero. ∎
In conclusion, analytic functions are a fundamental subclass of infinitely differentiable functions. Thanks to their asymptotic expansion, which is both polynomial and exact (what more could one want?), they are the most tractable functions. This makes them perfect for applications, which can hardly do without them.

30.4 Coda

30.4.1 Hille-Taylor's formula

We can now state a beautiful version of Taylor's formula, due to Einar Hille, for continuous functions.¹⁵

Theorem 1421 (Hille) Let $f : (0, +\infty) \to \mathbb{R}$ be a bounded continuous function and $x_0 > 0$. Then, for each $h > 0$,
$$f(x_0 + h) = \lim_{\delta \to 0^+} \sum_{k=0}^{\infty} \frac{\Delta_{\delta}^k f(x_0)}{k!}\, h^k \qquad (30.23)$$

¹⁵ We omit its non-trivial proof (see Feller, 1966).
We call the limit (30.23) the Hille-Taylor formula. When $f$ is infinitely differentiable, this formula intuitively should approach the series expansion (30.11), i.e.,
$$f(x_0 + h) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!} h^k$$
because $\lim_{\delta \to 0^+} \Delta_{\delta}^k f(x_0) = f^{(k)}(x_0)$ for every $k \geq 1$ (Proposition 1256). This is actually true when $f$ is analytic because in this case (30.11) and (30.23) together imply
$$\lim_{\delta \to 0^+} \sum_{k=0}^{\infty} \frac{\Delta_{\delta}^k f(x_0)}{k!} h^k = \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!} h^k$$
The Hille-Taylor formula, however, holds when $f$ is just bounded and continuous, thus providing a remarkable generalization of the Taylor expansion for analytic functions.
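To see formula (30.23) at work, here is an illustrative Python sketch (our addition). It assumes that $\Delta_{\delta}^k$ is the usual normalized $k$-th forward difference quotient, which converges to $f^{(k)}$ as in Proposition 1256; exact rational arithmetic sidesteps the roundoff that plagues high-order differences in floating point.

```python
# Illustration (our addition) of the Hille-Taylor formula (30.23), taking
# Delta_delta^k f(x0) to be the normalized k-th forward difference quotient
#   delta^(-k) * sum_j (-1)^(k-j) * C(k, j) * f(x0 + j*delta),
# which tends to f^(k)(x0) as delta -> 0+.
from fractions import Fraction
from math import comb, factorial

def diff_quot(f, x0, delta, k):
    s = sum((-1) ** (k - j) * comb(k, j) * f(x0 + j * delta)
            for j in range(k + 1))
    return s / delta ** k

f = lambda x: 1 / (1 + x)        # bounded and continuous on (0, +infinity)
x0, h, delta = Fraction(1), Fraction(1, 2), Fraction(1, 100)
approx = sum(diff_quot(f, x0, delta, k) * h ** k / factorial(k)
             for k in range(15))
print(float(approx), "vs", float(f(x0 + h)))   # both close to 0.4
```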
30.4.2 Borel's Theorem

Let $f$ be the "Cauchy" non-analytic function of Example 1402, i.e.,
$$f(x) = \begin{cases} e^{-1/x^2} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}$$
and let $g : \mathbb{R} \to \mathbb{R}$ be the zero function (a trivially analytic function). We have $f^{(k)}(0) = g^{(k)}(0)$ for all $k \geq 0$, so $f$ and $g$ are an example of two distinct infinitely differentiable functions that have the same Maclaurin series. Indeed, Taylor series pin down uniquely only analytic functions.

But do coefficients of Taylor (in particular, of Maclaurin) series have some characterizing property? Is there some peculiar property that such coefficients satisfy? For analytic functions the answer is positive: the Cauchy-Hadamard Theorem requires
$$\limsup \sqrt[n]{|\alpha_n|} < +\infty$$
So, only scalar sequences $\{\alpha_k\}_{k=0}^{\infty}$ satisfying such a bound may qualify as coefficients of a Taylor series of some analytic function. Yet, we do not live in the Lagrange calculus paradise: there exist infinitely differentiable functions that are not analytic. Indeed, the next deep theorem, whose highly non-trivial proof we omit, shows that, in general, the previous questions have a negative answer.¹⁶

Theorem 1422 (Borel-Peano) For any sequence of scalars $\{c_k\}_{k=0}^{\infty}$ there is an infinitely differentiable function $f : \mathbb{R} \to \mathbb{R}$ such that
$$f^{(k)}(0) = c_k \qquad \forall k \geq 0 \qquad (30.24)$$

¹⁶ The theorem was independently proved between 1884 and 1895 by Giuseppe Peano and Emile Borel (Borel's version is the best known, whence the name of this subsection).

Proof In his Annotation 67 in Genocchi and Peano (1884), Peano defines $f : \mathbb{R} \to \mathbb{R}$ by
$$f(x) = \sum_{k=0}^{\infty} \frac{a_k x^k}{1 + b_k x^2}$$
where the scalar sequence $\{a_k\}$ is arbitrary, while the scalar sequence $\{b_k\}$ is positive and chosen to ensure the convergence of the series in a neighborhood of the origin. Peano showed that, given any scalar sequence $\{c_k\}$, by judiciously choosing the sequences $\{a_k\}$ and $\{b_k\}$ one can establish (30.24). ∎

So, anything goes: given any scalar sequence $\{\alpha_k\}_{k=0}^{\infty}$, there is an infinitely differentiable function $f$, not analytic when $\limsup \sqrt[n]{|\alpha_n|} = +\infty$, such that $f^{(k)}(0) = \alpha_k k!$ for all $k$, so with those scalars as the coefficients of its Maclaurin series.

This function is actually not unique: given any function $f$ satisfying (30.24) and any scalar $\lambda$, the function $f_{\lambda} : \mathbb{R} \to \mathbb{R}$ defined by
$$f_{\lambda}(x) = \begin{cases} f(x) + \lambda e^{-1/x^2} & \text{if } x \neq 0 \\ f(0) & \text{if } x = 0 \end{cases}$$
is easily seen to satisfy (30.24) as well. A continuum of infinitely differentiable functions that satisfy (30.24) thus exists.
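A numerical aside (ours) on why all the Maclaurin coefficients of the Cauchy function vanish: $e^{-1/x^2}$ goes to $0$ faster than any power of $x$, as the ratios below suggest.

```python
# Our numerical illustration: the "Cauchy" function e^(-1/x^2) is flat at 0.
# Even divided by a high power of x, it still vanishes as x -> 0, which is
# consistent with f^(k)(0) = 0 for all k (all Maclaurin coefficients zero).
from math import exp

def f(x):
    return exp(-1.0 / x ** 2) if x != 0 else 0.0

for x in [0.5, 0.2, 0.1, 0.05]:
    print(x, f(x) / x ** 10)   # ratio f(x)/x^10 -> 0 as x -> 0
```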
Chapter 31

Concavity and differentiability

Concave functions have remarkable differential properties that confirm the great tractability of these widely used functions. The study of these properties is the subject matter of this chapter. We begin with scalar functions and then move to functions of several variables. Throughout the chapter $C$ always denotes a convex set (so an interval in the scalar case). For brevity, we will focus on concave functions, leaving to the readers the dual results that hold for convex functions.

31.1 Scalar functions

31.1.1 Decreasing marginal effects

The differential properties of a scalar concave function $f : C \subseteq \mathbb{R} \to \mathbb{R}$ follow from a simple geometric observation. Given two points $x$ and $y$ in the domain of $f$, the chord that joins the points $(x, f(x))$ and $(y, f(y))$ of the graph has slope
$$\frac{f(y) - f(x)}{y - x}$$
as one can verify with a simple modification of what was done for (26.6). Graphically:

[Figure: the chord joining $(x, f(x))$ and $(y, f(y))$, with vertical increment $f(y) - f(x)$ and horizontal increment $y - x$.]
If the function $f$ is concave, the slope of the chord decreases when we move the chord rightward. This basic geometric property characterizes concavity, as the next lemma shows.

Lemma 1423 A function $f : C \subseteq \mathbb{R} \to \mathbb{R}$ is concave if and only if, for any four points $x, w, y, z \in C$ with $x \leq w \leq y \leq z$, $x \neq y$ and $w \neq z$, we have
$$\underbrace{\frac{f(y) - f(x)}{y - x}}_{\text{slope } AC} \geq \underbrace{\frac{f(z) - f(w)}{z - w}}_{\text{slope } BD} \qquad (31.1)$$

In other words, by moving rightward from $[x, y]$ to $[w, z]$, the slope of the chords decreases. Graphically:

[Figure: points $A = (x, f(x))$, $B = (w, f(w))$, $C = (y, f(y))$, $D = (z, f(z))$ on the graph of a concave function, with chord $AC$ steeper than chord $BD$.]

Note that a strict inequality in (31.1) characterizes strict concavity.
Proof "Only if". Let $f$ be concave. The proof is divided in two steps: first we show that the chord $AC$ has a greater slope than the chord $BC$:

[Figure: chords $AC$ and $BC$, with $A = (x, f(x))$, $B = (w, f(w))$, $C = (y, f(y))$.]

Then, we show that the chord $BC$ has a greater slope than the chord $BD$:

[Figure: chords $BC$ and $BD$, with $B = (w, f(w))$, $C = (y, f(y))$, $D = (z, f(z))$.]

The first step amounts to proving (31.1) for $z = y$. Since $x \leq w < y$, there exists $\lambda \in [0, 1]$ such that $w = \lambda x + (1-\lambda) y$. Since $f$ is concave, we have $f(w) \geq \lambda f(x) + (1-\lambda) f(y)$, so that
$$\frac{f(y) - f(w)}{y - w} \leq \frac{f(y) - \lambda f(x) - (1-\lambda) f(y)}{y - \lambda x - (1-\lambda) y} = \frac{f(y) - f(x)}{y - x} \qquad (31.2)$$
This completes the first step. We now move to the second step, which amounts to proving (31.1) for $x = w$. Since $w < y \leq z$, there exists $\mu \in [0, 1]$ such that $y = \mu w + (1-\mu) z$. Further, since $f$ is concave we have $f(y) \geq \mu f(w) + (1-\mu) f(z)$, so that
$$\frac{f(y) - f(w)}{y - w} \geq \frac{\mu f(w) + (1-\mu) f(z) - f(w)}{\mu w + (1-\mu) z - w} = \frac{f(z) - f(w)}{z - w} \qquad (31.3)$$
Finally, from (31.2) and (31.3) it follows that
$$\underbrace{\frac{f(z) - f(w)}{z - w}}_{\text{slope } BD} \leq \underbrace{\frac{f(y) - f(w)}{y - w}}_{\text{slope } BC} \leq \underbrace{\frac{f(y) - f(x)}{y - x}}_{\text{slope } AC} \qquad (31.4)$$
as desired.

"If". Assume (31.1). Let $x, z \in C$, with $x < z$, and $\lambda \in [0, 1]$. Set $y = \lambda x + (1-\lambda) z$. If in (31.1) we set $w = x$, we have
$$\frac{f(\lambda x + (1-\lambda) z) - f(x)}{\lambda x + (1-\lambda) z - x} \geq \frac{f(z) - f(x)}{z - x}$$
Since $\lambda x + (1-\lambda) z - x = (1-\lambda)(z - x)$, we then have
$$\frac{f(\lambda x + (1-\lambda) z) - f(x)}{(1-\lambda)(z - x)} \geq \frac{f(z) - f(x)}{z - x}$$
that is, $f(\lambda x + (1-\lambda) z) - f(x) \geq (1-\lambda)(f(z) - f(x))$. In turn, this implies that $f$ is concave, as desired. ∎
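The slope inequality (31.1) is easy to probe numerically. A small Python check (ours) samples random quadruples $x \leq w \leq y \leq z$ for the concave function $\log$:

```python
# Our numerical check of Lemma 1423 for the concave f(x) = log(x): chord
# slopes over [x, y] and [w, z] with x <= w <= y <= z decrease rightward.
import random
from math import log

def slope(f, u, v):
    return (f(v) - f(u)) / (v - u)

random.seed(0)
for _ in range(10_000):
    x, w, y, z = sorted(random.uniform(0.1, 10.0) for _ in range(4))
    if x < y and w < z:
        assert slope(log, x, y) >= slope(log, w, z) - 1e-12
print("chord slopes of log decrease rightward: (31.1) verified")
```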
The geometric property (31.1) has the following analytical counterpart, of great economic significance.

Proposition 1424 If $f : C \subseteq \mathbb{R} \to \mathbb{R}$ is concave, then it has decreasing increments (or differences), i.e.,
$$f(x + h) - f(x) \geq f(y + h) - f(y) \qquad (31.5)$$
for all $x, y \in C$ and $h \geq 0$ with $x \leq y$ and $y + h \in C$. The converse is true if $f$ is continuous.

Proof Let $x \leq y$ and $h \geq 0$. Then the points $y$ and $x + h$ belong to the interval $[x, y + h]$. Under the change of variable $z = y + h$, we have $x + h, z - h \in [x, z]$. Hence there is a $\lambda \in [0, 1]$ for which $x + h = \lambda x + (1-\lambda) z$. It is immediate to check that $z - h = (1-\lambda) x + \lambda z$. By the concavity of $f$, we then have $f(x + h) \geq \lambda f(x) + (1-\lambda) f(z)$ and $f(z - h) \geq (1-\lambda) f(x) + \lambda f(z)$. Adding the two inequalities, we have
$$f(x + h) + f(z - h) \geq f(x) + f(z)$$
so that
$$f(x + h) - f(x) \geq f(z) - f(z - h) = f(y + h) - f(y)$$
as desired. We omit the proof of the converse. ∎
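Again, a quick numerical probe (ours) of the decreasing-increments property (31.5), here for the concave $\sqrt{x}$:

```python
# Our numerical check of Proposition 1424: for the concave f(x) = sqrt(x),
# increments f(x + h) - f(x) shrink as the starting point moves rightward.
from math import sqrt
import random

random.seed(1)
for _ in range(10_000):
    x, y = sorted(random.uniform(0.0, 10.0) for _ in range(2))
    h = random.uniform(0.0, 5.0)
    lhs = sqrt(x + h) - sqrt(x)      # increment starting at the smaller x
    rhs = sqrt(y + h) - sqrt(y)      # same-size increment starting at y >= x
    assert lhs >= rhs - 1e-12
print("decreasing increments (31.5) verified for sqrt")
```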
The inequality (31.5) does not change if we divide both sides by a scalar $h > 0$. Hence,
$$f'_+(x) = \lim_{h \to 0^+} \frac{f(x+h) - f(x)}{h} \geq \lim_{h \to 0^+} \frac{f(y+h) - f(y)}{h} = f'_+(y)$$
provided the limits exist. Similarly $f'_-(x) \geq f'_-(y)$, and so $f'(x) \geq f'(y)$ when the (bilateral) derivative exists. Concave functions $f$ thus feature decreasing marginal effects as their argument increases, so they embody a fundamental economic principle: additional units have a lower and lower marginal impact on levels (of utility, of production, and so on; we then talk of decreasing marginal utility, decreasing marginal returns, and so on). It is through this principle that forms of concavity first entered economics.¹

The next result establishes this property rigorously by showing that one-sided derivatives exist and are decreasing.
Proposition 1425 Let $f : C \subseteq \mathbb{R} \to \mathbb{R}$ be concave. Then,

(i) the right $f'_+(x)$ and left $f'_-(x)$ derivatives exist at each $x \in \operatorname{int} C$;²

(ii) the right $f'_+$ and left $f'_-$ derivative functions are both decreasing on $\operatorname{int} C$;

(iii) $f'_+(x) \leq f'_-(x)$ for each $x \in \operatorname{int} C$.

¹ In his famous 1738 essay, Daniel Bernoulli wrote: "Now it is highly probable that any increase in wealth, no matter how insignificant, will always result in an increase in utility which is inversely proportionate to the quantity of goods already possessed." This is where the principle first appeared, and through it Bernoulli justified the use of a logarithmic (so concave) utility function. This magnificent insight of Bernoulli was way ahead of his time (see for instance Stigler, 1950).
² The interior, $\operatorname{int} C$, of an interval $C$ is an open interval: whether $C$ is $[a,b]$ or $[a,b)$ or $(a,b]$, we always have $\operatorname{int} C = (a,b)$.

A concave function therefore has remarkable regularity properties: at each interior point of its domain, it is automatically continuous (Theorem 833) and has decreasing one-sided derivative functions.³
Proof Since $x_0$ is an interior point, it has a neighborhood $(x_0 - \varepsilon, x_0 + \varepsilon)$ included in $C$, that is, $(x_0 - \varepsilon, x_0 + \varepsilon) \subseteq C$. Let $0 < a < \varepsilon$, so that $[x_0 - a, x_0 + a] \subseteq C$. Let $\phi : [-a, a] \setminus \{0\} \to \mathbb{R}$ be defined by
$$\phi(h) = \frac{f(x_0 + h) - f(x_0)}{h}$$
Property (31.1) implies that $\phi$ is decreasing, that is,
$$h' \leq h'' \implies \phi(h') = \frac{f(x_0 + h') - f(x_0)}{x_0 + h' - x_0} \geq \frac{f(x_0 + h'') - f(x_0)}{x_0 + h'' - x_0} = \phi(h'') \qquad (31.6)$$
Indeed, if $h' < 0 < h''$ it is sufficient to apply (31.1) with $w = y = x_0$, $x = x_0 + h'$ and $z = x_0 + h''$. If $h' \leq h'' < 0$, apply (31.2) with $y = x_0$, $x = x_0 + h'$ and $w = x_0 + h''$. If $0 < h' \leq h''$, apply (31.3) with $w = x_0$, $y = x_0 + h'$ and $z = x_0 + h''$.

Since $\phi$ is decreasing, we have $\phi(a) \leq \phi(h) \leq \phi(-a)$ for every $h$, that is, $\phi$ is bounded. Therefore, $\phi$ is both decreasing and bounded, which implies that the right limit and the left limit of $\phi$ at $0$ exist and are finite. This proves the existence of one-sided derivatives. Moreover, the fact that $\phi$ is monotonically decreasing implies $\phi(h') \geq \phi(h'')$ for every $h' < 0 < h''$, so that
$$f'_+(x_0) = \lim_{h \to 0^+} \phi(h) \leq \lim_{h \to 0^-} \phi(h) = f'_-(x_0)$$
To show the monotonicity, consider $x, y \in \operatorname{int} C$ such that $x < y$. By (31.5),
$$\frac{f(x+h) - f(x)}{h} \geq \frac{f(y+h) - f(y)}{h} \qquad \forall h \in (0, a]$$
Hence,
$$f'_+(x) = \lim_{h \to 0^+} \frac{f(x+h) - f(x)}{h} \geq \lim_{h \to 0^+} \frac{f(y+h) - f(y)}{h} = f'_+(y)$$
which implies that the right derivative function is decreasing. A similar argument holds for the left derivative function. ∎
Clearly, if in addition $f$ is differentiable at $x$, then $f'(x) = f'_+(x) = f'_-(x)$. In particular:

Corollary 1426 If a concave function $f : C \subseteq \mathbb{R} \to \mathbb{R}$ is differentiable on $\operatorname{int} C$, then its derivative function $f'$ is decreasing on $\operatorname{int} C$.

³ For brevity, one often says that a "derivative is increasing" rather than the more precise "derivative function is increasing". In what follows, at times we too will take this liberty. The more one masters a topic, the more one is tempted to abuse notation and terminology for the sake of brevity. Sometimes this is needed not to get trapped in pedantic matters, but other times it is what makes some topics impenetrable to beginners. As we already remarked (Section 1.1.2), a proper balance between rigor and pedantry is key for effective scientific communication.
Example 1427 (i) The concave function $f(x) = -|x|$ does not have a derivative at $x = 0$. Nevertheless, the one-sided derivatives exist at each point of the domain, with
$$f'_+(x) = \begin{cases} 1 & \text{if } x < 0 \\ -1 & \text{if } x \geq 0 \end{cases}$$
and
$$f'_-(x) = \begin{cases} 1 & \text{if } x \leq 0 \\ -1 & \text{if } x > 0 \end{cases}$$
Therefore, $f'_+(x) \leq f'_-(x)$ for every $x \in \mathbb{R}$ and both one-sided derivative functions are decreasing.

(ii) The concave function
$$f(x) = \begin{cases} x + 1 & \text{if } x \leq -1 \\ 0 & \text{if } -1 < x < 1 \\ 1 - x & \text{if } x \geq 1 \end{cases}$$
does not have a derivative at $x = -1$ and at $x = 1$. Yet, the one-sided derivatives exist at each point of the domain, with
$$f'_+(x) = \begin{cases} 1 & \text{if } x < -1 \\ 0 & \text{if } -1 \leq x < 1 \\ -1 & \text{if } x \geq 1 \end{cases}$$
and
$$f'_-(x) = \begin{cases} 1 & \text{if } x \leq -1 \\ 0 & \text{if } -1 < x \leq 1 \\ -1 & \text{if } x > 1 \end{cases}$$
Therefore, $f'_+(x) \leq f'_-(x)$ for every $x \in \mathbb{R}$ and both one-sided derivative functions are decreasing.

(iii) The concave function $f(x) = 1 - x^2$ is differentiable on $\mathbb{R}$ with $f'(x) = -2x$. The derivative function is decreasing. N
Proposition 1425 says, inter alia, that at interior points $x$ we have $f'_+(x) \leq f'_-(x)$. The next result says that we actually have $f'_+(x) = f'_-(x)$, and so $f$ is differentiable at $x$, at all points $x \in C$ except those belonging to an, at most, countable subset of $C$. For the three concave functions of the previous example, such set of non-differentiability points is $\{0\}$, $\{-1, 1\}$ and $\emptyset$, respectively.

Theorem 1428 A concave function $f : C \subseteq \mathbb{R} \to \mathbb{R}$ is differentiable at all the points of $C$ with the exception of an, at most, countable subset $E \subseteq C$.

The proof relies on a few interesting lemmas that, in a crescendo, shed further light on one-sided derivatives of concave functions. The first one significantly refines the inequality established in Proposition 1425-(iii).
Lemma 1429 If $f : C \subseteq \mathbb{R} \to \mathbb{R}$ is concave, then
$$x < y \implies f'_-(x) \geq f'_+(x) \geq f'_-(y) \geq f'_+(y) \qquad \forall x, y \in \operatorname{int} C$$

Proof Let $x, y \in \operatorname{int} C$ be such that $x < y$. By Proposition 1425-(iii), we have $f'_-(x) \geq f'_+(x)$ and $f'_-(y) \geq f'_+(y)$. It remains to prove that $f'_+(x) \geq f'_-(y)$. In view of (31.4), the reader can check that
$$\underbrace{\frac{f(z) - f(w)}{z - w}}_{\text{slope } BD} \leq \underbrace{\frac{f(y) - f(w)}{y - w}}_{\text{slope } BC} \leq \underbrace{\frac{f(y) - f(x)}{y - x}}_{\text{slope } AC} \leq \underbrace{\frac{f(w) - f(x)}{w - x}}_{\text{slope } AB} \qquad (31.7)$$
provided $x < w < y < z$. In particular, we have (why?):
$$f'_-(y) \leq \frac{f(y) - f(w)}{y - w} \leq \frac{f(w) - f(x)}{w - x} \leq f'_+(x)$$
as desired. ∎
Next we prove the one-sided continuity (Section 23.1) of one-sided derivatives.

Lemma 1430 The right (left) derivative function of a concave $f : C \subseteq \mathbb{R} \to \mathbb{R}$ is continuous from the right (left), i.e.,
$$\lim_{y \to x^+} f'_+(y) = f'_+(x) \qquad \text{and} \qquad \lim_{y \to x^-} f'_-(y) = f'_-(x) \qquad \forall x \in \operatorname{int} C$$

Proof Let $x \in \operatorname{int} C$. We show only that $\lim_{y \to x^+} f'_+(y) = f'_+(x)$, as a similar argument proves that $\lim_{y \to x^-} f'_-(y) = f'_-(x)$. Since $f'_+ : \operatorname{int} C \to \mathbb{R}$ is decreasing, the limit $\lim_{y \to x^+} f'_+(y)$ exists. In particular, $f'_+(y) \leq f'_+(x)$ for all $y \in C$ with $y > x$, and so
$$\lim_{y \to x^+} f'_+(y) \leq f'_+(x) \qquad (31.8)$$
Let $z \in \operatorname{int} C$ with $z > x$. By (31.7),
$$\frac{f(y) - f(w)}{y - w} \geq \frac{f(z) - f(w)}{z - w}$$
if $w < y < z$. So,
$$f'_+(w) \geq \frac{f(z) - f(w)}{z - w} \qquad \forall w \in (x, z)$$
By the continuity of $f$ (Theorem 833), we then have
$$\lim_{w \to x^+} f'_+(w) \geq \frac{f(z) - f(x)}{z - x}$$
In turn, this implies
$$\lim_{w \to x^+} f'_+(w) \geq \lim_{z \to x^+} \frac{f(z) - f(x)}{z - x} = f'_+(x)$$
In view of (31.8), we conclude that $\lim_{y \to x^+} f'_+(y) = f'_+(x)$. ∎

The final lemma refines the previous one by showing that the derivative functions of concave functions can have only jump discontinuities, of the form $[f'_+(x), f'_-(x)]$.
Lemma 1431 For a concave $f : C \subseteq \mathbb{R} \to \mathbb{R}$ it holds, for all $x \in \operatorname{int} C$,
$$\lim_{y \to x^+} f'_+(y) = f'_+(x) \leq f'_-(x) = \lim_{y \to x^-} f'_+(y)$$
and
$$\lim_{y \to x^+} f'_-(y) = f'_+(x) \leq f'_-(x) = \lim_{y \to x^-} f'_-(y)$$

Proof By Lemma 1429, we have $f'_-(x) \leq f'_+(y) \leq f'_-(y)$ for all $y \in \operatorname{int} C$ with $y < x$. So, by Lemma 1430 we have
$$f'_-(x) \leq \lim_{y \to x^-} f'_+(y) \leq \lim_{y \to x^-} f'_-(y) = f'_-(x)$$
We conclude that $\lim_{y \to x^-} f'_+(y) = f'_-(x)$ for all $x \in \operatorname{int} C$. A similar argument proves that $f'_+(x) = \lim_{y \to x^+} f'_-(y)$ for all $x \in \operatorname{int} C$. ∎
Proof of Theorem 1428 Let $x \in \operatorname{int} C$. A function $f$ is differentiable at $x$ if and only if $f'_+(x) = f'_-(x)$. By Lemma 1431, we have $f'_+(x) = f'_-(x)$ if and only if $\lim_{y \to x^+} f'_+(y) = \lim_{y \to x^-} f'_+(y)$, i.e., if and only if the right derivative function $f'_+ : \operatorname{int} C \to \mathbb{R}$ is continuous at $x$. Since $f'_+$ is decreasing, the set $D \subseteq \operatorname{int} C$ of its discontinuities is at most countable (Proposition 564). If $x \notin D$, we conclude that $f$ is differentiable at $x$. Thus, the set $E$ of points where $f$ is not differentiable is included in the set $D \cup \partial C$. This proves the theorem. ∎

A final remark on this proof. Since $f'(x) = f'_+(x) = f'_-(x)$ at all $x \in C \setminus D$, the derivative function $f'$ is continuous on $C \setminus D$ (cf. Section 23.1.1). The concave function $f$ is thus continuously differentiable on $C \setminus D$.
31.1.2 Chords and tangents

Theorem 1432 Let $f : (a, b) \to \mathbb{R}$ be differentiable at $x \in (a, b)$. If $f$ is concave, then
$$f(y) \leq f(x) + f'(x)(y - x) \qquad \forall y \in (a, b) \qquad (31.9)$$
If $f$ is strictly concave, the inequality is strict when $x \neq y$.

Proof Let $f$ be concave and let $x$ and $y$ be two distinct points of $(a, b)$. If $\lambda \in (0, 1)$, we have
$$f(x + (1-\lambda)(y - x)) = f(\lambda x + (1-\lambda) y) \geq \lambda f(x) + (1-\lambda) f(y) = f(x) + (1-\lambda)[f(y) - f(x)]$$
Therefore,
$$\frac{f(x + (1-\lambda)(y - x)) - f(x)}{1-\lambda} \geq f(y) - f(x)$$
Dividing and multiplying the left-hand side by $y - x$, we get
$$\frac{f(x + (1-\lambda)(y - x)) - f(x)}{(1-\lambda)(y - x)}\,(y - x) \geq f(y) - f(x)$$
This inequality holds for every $\lambda \in (0, 1)$. Hence, thanks to the differentiability of $f$ at $x$, we have
$$\lim_{\lambda \to 1^-} \frac{f(x + (1-\lambda)(y - x)) - f(x)}{(1-\lambda)(y - x)}\,(y - x) = f'(x)(y - x)$$
Therefore, $f'(x)(y - x) \geq f(y) - f(x)$, as desired.

Let $f$ be strictly concave. Suppose there exists $y \in (a, b)$, with $y \neq x$, such that $f(y) = f(x) + f'(x)(y - x)$. Then,
$$f\left(\frac{1}{2}x + \frac{1}{2}y\right) > \frac{1}{2}f(x) + \frac{1}{2}f(y) = \frac{1}{2}f(x) + \frac{1}{2}\left[f(x) + f'(x)(y - x)\right] = f(x) + f'(x)\,\frac{y - x}{2} \geq f\left(\frac{1}{2}x + \frac{1}{2}y\right)$$
where the last inequality follows from (31.9). This contradiction completes the proof. ∎
The right-hand side of inequality (31.9) is the tangent line of $f$ at $x$, that is, the linear approximation of $f$ that holds, locally, at $x$. By Theorem 1432, such a line always lies above the graph of the function: the approximation is in "excess".

Geometrically, this remarkable property is clear: the definition of concavity requires that the straight line passing through the two points $(x, f(x))$ and $(y, f(y))$ lies below the graph of $f$ in the interval between $x$ and $y$, and hence that it lies above it outside that interval.⁵ Letting $y$ tend to $x$, the straight line becomes tangent and lies all above the curve.

[Figure: the tangent line $f(x) + f'(x)(y - x)$ at $x$ lying above the graph of a concave function, shown together with points $y_1$ and $y_2$.]

⁵ For completeness, let us prove it. Let $z$ be outside the interval $[x, y]$: suppose that $z > y$. We can then write $y = \lambda x + (1-\lambda) z$ with $\lambda \in (0, 1)$ and, by the concavity of $f$, we have $f(y) \geq \lambda f(x) + (1-\lambda) f(z)$, that is, $f(z) \leq (1-\lambda)^{-1} f(y) - \lambda (1-\lambda)^{-1} f(x)$. But, since $1/(1-\lambda) = \mu > 1$ and $1 - \mu = 1 - 1/(1-\lambda) = -\lambda/(1-\lambda) < 0$, we have $f(z) = f(\mu y + (1-\mu) x) \leq \mu f(y) + (1-\mu) f(x)$ for every $\mu > 1$. If $z < x$, we reason similarly.
In the previous theorem we assumed differentiability at a given point $x$. If we assume it on the entire interval $(a, b)$, the inequality (31.9) characterizes concavity.

Theorem 1433 Let $f : (a, b) \to \mathbb{R}$ be differentiable on $(a, b)$. Then, $f$ is concave if and only if
$$f(y) \leq f(x) + f'(x)(y - x) \qquad \forall x, y \in (a, b) \qquad (31.10)$$
while $f$ is strictly concave if and only if inequality (31.10) is strict when $x \neq y$.

Thus, for a function $f$ differentiable on an open interval, a necessary and sufficient condition for concavity of $f$ is that the tangent lines at the various points of its domain all lie above its graph.

Proof The "only if" follows from the previous theorem. We prove the "if". Suppose that inequality (31.10) holds and consider the point $z = \lambda x + (1-\lambda) y$. Let us consider (31.10) twice: first with the points $z$ and $x$, and then with the points $z$ and $y$. Then:
$$f'(z)(1-\lambda)(x - y) \geq f(x) - f(\lambda x + (1-\lambda) y)$$
$$f'(z)\,\lambda(y - x) \geq f(y) - f(\lambda x + (1-\lambda) y)$$
Let us multiply the first inequality by $\lambda$, the second one by $1-\lambda$, and add them. We get
$$0 \geq \lambda f(x) + (1-\lambda) f(y) - f(\lambda x + (1-\lambda) y)$$
Given the arbitrariness of $x$ and $y$, we conclude that $f$ is concave. A similar argument shows that, if inequality (31.10) is strict when $x \neq y$, then $f$ is strictly concave. ∎
31.1.3 Concavity criteria

The last theorem established a first differential characterization of concavity. Condition (31.10) can be viewed as a concavity criterion that can be used to check whether a given differentiable function is, indeed, concave. However, though key conceptually, condition (31.10) turns out to be not that useful operationally as a concavity criterion. For this reason, in this section we will establish other differential characterizations of concavity that lead to more useful concavity criteria.

To this end, recall that a significant property established in Proposition 1425 is the decreasing monotonicity of the one-sided derivative functions of concave functions. The next important result shows that for continuous functions this property characterizes concavity.

Theorem 1434 Let $f : C \subseteq \mathbb{R} \to \mathbb{R}$ be continuous. Then:

(i) $f$ is concave if and only if the right derivative function $f'_+$ exists and is decreasing on $\operatorname{int} C$;

(ii) $f$ is strictly concave if and only if the right derivative function $f'_+$ exists and is strictly decreasing on $\operatorname{int} C$.
Proof (i) We only prove the "if" since the converse follows from Proposition 1425. For simplicity, assume that $f$ is differentiable on the open interval $\operatorname{int} C$. By hypothesis, $f'$ is decreasing on $\operatorname{int} C$. Let $x, y \in \operatorname{int} C$, with $x < y$, and $\lambda \in (0, 1)$. Set $z = \lambda x + (1-\lambda) y$, so that $x < z < y$. By the Mean Value Theorem, there exist $\xi_x \in (x, z)$ and $\xi_y \in (z, y)$ such that
$$f'(\xi_x) = \frac{f(z) - f(x)}{z - x}, \qquad f'(\xi_y) = \frac{f(y) - f(z)}{y - z}$$
Since $f'$ is decreasing, $f'(\xi_x) \geq f'(\xi_y)$. Hence,
$$\frac{f(\lambda x + (1-\lambda) y) - f(x)}{\lambda x + (1-\lambda) y - x} \geq \frac{f(y) - f(\lambda x + (1-\lambda) y)}{y - \lambda x - (1-\lambda) y}$$
Being $\lambda x + (1-\lambda) y - x = (1-\lambda)(y - x)$ and $y - \lambda x - (1-\lambda) y = \lambda(y - x)$, we then have
$$\frac{f(\lambda x + (1-\lambda) y) - f(x)}{(1-\lambda)(y - x)} \geq \frac{f(y) - f(\lambda x + (1-\lambda) y)}{\lambda(y - x)}$$
In turn, this easily implies $f(\lambda x + (1-\lambda) y) \geq \lambda f(x) + (1-\lambda) f(y)$, as desired.⁶ (ii) This part is left to the reader. ∎
A similar result, left to the reader, holds for the other one-sided derivative $f'_-$. This theorem thus establishes a differential characterization of concavity by showing that it is equivalent to the decreasing monotonicity of one-sided derivative functions.
Example 1435 Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x - |x^3|$, that is,
$$f(x) = \begin{cases} x + x^3 & \text{if } x < 0 \\ x - x^3 & \text{if } x \geq 0 \end{cases}$$
The function $f$ is continuous. It has one-sided derivatives at each point of the domain, with
$$f'_+(x) = \begin{cases} 1 + 3x^2 & \text{if } x < 0 \\ 1 - 3x^2 & \text{if } x \geq 0 \end{cases}$$
and
$$f'_-(x) = \begin{cases} 1 + 3x^2 & \text{if } x \leq 0 \\ 1 - 3x^2 & \text{if } x > 0 \end{cases}$$
To see that this is the case, consider the origin, which is the most delicate point. We have
$$f'_+(0) = \lim_{h \to 0^+} \frac{f(h) - f(0)}{h} = \lim_{h \to 0^+} \frac{h - h^3}{h} = \lim_{h \to 0^+} \left(1 - h^2\right) = 1$$
and
$$f'_-(0) = \lim_{h \to 0^-} \frac{f(h) - f(0)}{h} = \lim_{h \to 0^-} \frac{h + h^3}{h} = \lim_{h \to 0^-} \left(1 + h^2\right) = 1$$
Therefore, $f'_+(x) \leq f'_-(x)$ for every $x \in \mathbb{R}$ and both one-sided derivative functions are decreasing. By Theorem 1434, the function $f$ is concave. N

⁶ Using a version of the Mean Value Theorem for unilateral derivatives, we can prove the result without any differentiability assumption on $f$.
One-sided derivatives are key in the previous theorem because concavity per se only ensures their existence, not that of the two-sided derivatives. One-sided derivatives are, however, less easy to handle than the two-sided derivative. So, in applications differentiability is often assumed. In this case we have the following simple consequence of the previous theorem that provides a useful concavity criterion for functions.

Corollary 1436 Let $f : C \subseteq \mathbb{R} \to \mathbb{R}$ be differentiable on $\operatorname{int} C$ and continuous on $C$. Then:

(i) $f$ is concave if and only if $f'$ is decreasing on $\operatorname{int} C$;

(ii) $f$ is strictly concave if and only if $f'$ is strictly decreasing on $\operatorname{int} C$.

Under differentiability, a necessary and sufficient condition for a function to be (strictly) concave is, thus, that its first derivative is (strictly) decreasing.⁷

Proof We only prove (i), as (ii) is similar. Let $f : C \subseteq \mathbb{R} \to \mathbb{R}$ be differentiable on $\operatorname{int} C$ and continuous on $C$. If $f$ is concave, Theorem 1434 implies that $f' = f'_+$ is decreasing. Vice versa, if $f' = f'_+$ is decreasing then Theorem 1434 implies that $f$ is concave. ∎

Example 1437 Consider the functions $f, g : \mathbb{R} \to \mathbb{R}$ given by $f(x) = -|x^3|$ and $g(x) = -e^{-x}$.

[Figure: graph of $f(x) = -|x^3|$.]

[Figure: graph of $g(x) = -e^{-x}$.]

Both functions are differentiable on their domain, with
$$f'(x) = \begin{cases} 3x^2 & \text{if } x \leq 0 \\ -3x^2 & \text{if } x > 0 \end{cases} \qquad \text{and} \qquad g'(x) = e^{-x}$$
The derivatives are strictly decreasing and therefore $f$ and $g$ are strictly concave thanks to Corollary 1436. N

⁷ When $C$ is open, the continuity assumption becomes superfluous (a similar observation applies to Corollary 1438 below).
The previous corollary provides a simple differential criterion of concavity that reduces the test of concavity to that, often operationally simple, of a property of first derivatives. The next result shows that it is actually possible to do even better, by recalling the differential characterization of monotonicity seen in Section 28.4.

Corollary 1438 Let $f : C \subseteq \mathbb{R} \to \mathbb{R}$ be twice differentiable on $\operatorname{int} C$ and continuous on $C$. Then:

(i) $f$ is concave if and only if $f'' \leq 0$ on $\operatorname{int} C$;

(ii) $f$ is strictly concave if $f'' < 0$ on $\operatorname{int} C$.

Proof (i) It is sufficient to observe that, thanks to the "decreasing" version of Proposition 1322, the first derivative $f'$ is decreasing on $\operatorname{int} C$ if and only if $f''(x) \leq 0$ for every $x \in \operatorname{int} C$. (ii) It follows from the "strictly decreasing" version of Proposition 1324. ∎

Under the further hypothesis that $f$ is twice differentiable on $\operatorname{int} C$, concavity thus becomes equivalent to the negativity of the second derivative, a condition often easier to check than the decreasing monotonicity of the first derivative. In any case, thanks to the last two corollaries we now have powerful differential tests of concavity.⁸

⁸ As the reader can check, dual results hold for convex functions, with increasing monotonicity instead of decreasing monotonicity (and $f'' \geq 0$ instead of $f'' \leq 0$).
Note the asymmetry between points (i) and (ii): while in (i) the condition $f'' \leq 0$ is a necessary and sufficient condition for concavity, in (ii) the condition $f'' < 0$ is only a sufficient condition for strict concavity, as the function $f(x) = -x^4$ exemplifies. This follows from the analogous asymmetry for monotonicity between Propositions 1322 and 1324.

Example 1439 (i) The functions $f(x) = \sqrt{x}$ and $g(x) = \log x$ have, respectively, derivatives $f'(x) = 1/(2\sqrt{x})$ and $g'(x) = 1/x$ that are strictly decreasing. Therefore, they are strictly concave. The second derivatives $f''(x) = -1/(4x^{3/2}) < 0$ and $g''(x) = -1/x^2 < 0$ confirm this conclusion.

(ii) The function $f(x) = x^2$ has derivative $f'(x) = 2x$ that is strictly increasing. Therefore, it is strictly convex. Indeed, $f''(x) = 2 > 0$.

(iii) The function $f(x) = x^3$ has derivative $f'(x) = 3x^2$ that is strictly decreasing on $(-\infty, 0]$ and strictly increasing on $[0, \infty)$. Indeed, the second derivative $f''(x) = 6x$ is $\leq 0$ on $(-\infty, 0]$ and $\geq 0$ on $[0, \infty)$. N

Example 1440 (i) The power functions $f : [0, \infty) \to \mathbb{R}$ defined by $f(x) = x^n$ are strictly convex for all $n > 1$. Indeed, we have $f''(x) = n(n-1)x^{n-2} > 0$ for all $x > 0$. (ii) The fractional power functions $f : [0, \infty) \to \mathbb{R}$ defined by $f(x) = x^{1/n}$ are strictly concave for all $n > 1$. Indeed, we have
$$f''(x) = \frac{1}{n}\left(\frac{1}{n} - 1\right)x^{\frac{1}{n}-2} < 0$$
for all $x > 0$. (iii) The negative exponential function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = -e^{-x}$ is strictly concave because $f''(x) = -e^{-x} < 0$. N
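The second-derivative criterion lends itself to a simple numerical concavity test. A sketch (ours), estimating $f''$ by central differences on a grid; it is only a heuristic check, not a proof:

```python
# A small sketch (ours) of the criterion in Corollary 1438: estimate f'' by
# central differences on a grid and flag concavity when f'' <= 0 throughout.
from math import log, sqrt

def looks_concave(f, a, b, n=1000, eps=1e-8):
    hstep = (b - a) / n
    xs = [a + k * hstep for k in range(1, n)]
    # central second difference: f''(x) ~ (f(x+h) - 2 f(x) + f(x-h)) / h^2
    return all((f(x + hstep) - 2 * f(x) + f(x - hstep)) / hstep ** 2 <= eps
               for x in xs)

print(looks_concave(log, 0.1, 10))             # True:  (log x)'' = -1/x^2 < 0
print(looks_concave(sqrt, 0.1, 10))            # True:  concave
print(looks_concave(lambda x: x ** 2, -5, 5))  # False: strictly convex
print(looks_concave(lambda x: x ** 3, -5, 5))  # False: convex on [0, 5]
```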
We close with a monotonicity twist: the converse of Proposition 1324 holds under concavity. So, under this property the strict monotonicity of a function and the strict positivity of its derivative become equivalent properties.

Corollary 1441 A concave and differentiable $f : (a, b) \to \mathbb{R}$, with $a, b \in \overline{\mathbb{R}}$, is strictly increasing if and only if $f' > 0$.

The negative exponential $f(x) = -e^{-x}$ illustrates this result. On the other hand, the cubic function $f(x) = x^3$, often used to show the failure of the converse in Proposition 1324, is not concave (cf. Example 1325).

Proof The "if" follows from Proposition 1324. As to the converse, we have $f' \geq 0$ since $f$ is increasing (Proposition 1322). By Corollary 1436-(i), $f'$ is decreasing since $f$ is concave. It remains to prove that $f' > 0$. Suppose, by contradiction, that $f'(x_0) = 0$ for some $x_0 \in (a, b)$. Since $f'$ is decreasing and $f' \geq 0$, this implies $f'(x) = 0$ for all $x \in [x_0, b)$. By Corollary 1312, $f$ is constant on any closed subinterval of $[x_0, b)$, thus contradicting the strict monotonicity of $f$. We conclude that $f' > 0$. ∎

We could further strengthen the previous corollary by dropping the differentiability assumption. Since concave functions always admit right and left derivatives at each interior point of their domain (Proposition 1425), the result continues to hold if we replace $f'$ with either $f'_-$ or $f'_+$.
31.1.4 Degree of concavity

When is a scalar function more concave than another one? This question is important in any application in which a property is modelled via the concavity of a function, so that a higher degree of concavity corresponds to a higher "intensity" of such property. A classic example is risk aversion, a behavioral property characterized analytically through the concavity of utility functions. More concave utility functions thus correspond to a higher degree of risk aversion, a comparison that often plays a key role in economic applications.

The next definition addresses the opening question.

Definition 1442 A function $h : C \subseteq \mathbb{R} \to \mathbb{R}$ is (strictly) more concave than a function $g : C \subseteq \mathbb{R} \to \mathbb{R}$, with $\operatorname{Im} g$ convex, if there exists a (strictly) concave and strictly increasing transformation $f : \operatorname{Im} g \to \mathbb{R}$ such that
$$h = f \circ g \qquad (31.11)$$

Intuitively, the concavity of $f$ magnifies that of $g$, thus making the composition $f \circ g$, i.e., the function $h$, more concave than $g$. Geometrically, $h$ is more "curved" than $g$.

Example 1443 (i) The function $h(x) = 2^{-1}\log x$ is more concave than $g(x) = \sqrt{x}$ on $C = (0, \infty)$. Indeed, just take $f(x) = \log x$. (ii) Let $g(x) = x$ be the identity function. It is easy to check that a strictly increasing function $h$ is more concave than $g$ if and only if it is concave. N

The last definition does not require the function $g$ to be concave: in principle, a function can be more concave than another one even though neither is concave. Yet, the case when $g$ is concave is the most important and is useful to fix ideas. In this case, also $h$ is concave (Proposition 844) and so the definition compares the degree of concavity of two concave functions. In particular, by the Intermediate Value Theorem the clause "$\operatorname{Im} g$ convex" in the last definition holds if $g$ is continuous.
So far so good. Yet, given two functions $h$ and $g$, it is not easy in general to check directly the existence of a concave and strictly increasing transformation $f$ such that $h = f \circ g$ or $g = f \circ h$, as the last definition requires. Fortunately, there is a simple differential criterion. To state it, assume that the convex set $C$ is an open interval $(a, b)$, possibly unbounded, and that $g$ is twice differentiable with $g'(x) \neq 0$ for all $x \in (a, b)$. The function $\rho_g : (a, b) \to \mathbb{R}$ defined by
$$\rho_g(x) = -\frac{g''(x)}{g'(x)} \qquad \forall x \in (a, b)$$
is called the index of (relative) concavity of $g$.⁹ The computation of this index requires only the knowledge of the first and second derivatives of the function at hand.
Example 1444 (i) Let $g : (0, \infty) \to \mathbb{R}$ be the logarithmic function $g(x) = \log x$. We have $g'(x) = 1/x$ and $g''(x) = -1/x^2$. So, $\rho_g : (0, \infty) \to \mathbb{R}$ is given by
$$\rho_g(x) = -\frac{g''(x)}{g'(x)} = \frac{\frac{1}{x^2}}{\frac{1}{x}} = \frac{1}{x}$$
The concavity index is here strictly decreasing.

(ii) Let $g : \mathbb{R} \to \mathbb{R}$ be the negative exponential function $g(x) = -e^{-kx}$, with $k \in \mathbb{R}$. We have $g'(x) = ke^{-kx}$ and $g''(x) = -k^2 e^{-kx}$. So, $\rho_g : \mathbb{R} \to \mathbb{R}$ is given by
$$\rho_g(x) = -\frac{g''(x)}{g'(x)} = \frac{k^2 e^{-kx}}{k e^{-kx}} = k$$
The concavity index is here constant.

(iii) Let $g : (0, \infty) \to \mathbb{R}$ be the power function $g(x) = x^p$, with $p \in \mathbb{R}$. We have $g'(x) = px^{p-1}$ and $g''(x) = p(p-1)x^{p-2}$. So, $\rho_g : (0, \infty) \to \mathbb{R}$ is given by
$$\rho_g(x) = -\frac{g''(x)}{g'(x)} = -\frac{p(p-1)x^{p-2}}{px^{p-1}} = \frac{1-p}{x}$$
Since $\rho'_g(x) = (p-1)/x^2$, this concavity index is strictly increasing if $p > 1$, strictly decreasing if $p < 1$, and constant if $p = 1$ (i.e., if $g$ is a straight line). N

⁹ This index was introduced by de Finetti (1952) p. 700 and, independently, by Pratt (1964) and Arrow (1971), who developed its far-reaching economic applications. For this reason, it is often called the Arrow-Pratt index (and Theorem 1445 is named after them).
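The concavity index can likewise be estimated numerically. A small Python sketch (ours) reproduces the three indexes just computed:

```python
# A numerical version (ours) of the concavity index rho_g = -g''/g', estimated
# by central differences; it reproduces the three cases of Example 1444.
from math import log, exp

def rho(g, x, h=1e-5):
    g1 = (g(x + h) - g(x - h)) / (2 * h)             # g'(x)
    g2 = (g(x + h) - 2 * g(x) + g(x - h)) / h ** 2   # g''(x)
    return -g2 / g1

print(rho(log, 2.0), "vs", 1 / 2.0)                     # log x: rho = 1/x
k = 3.0
print(rho(lambda x: -exp(-k * x), 0.7), "vs", k)        # -e^(-kx): rho = k
p = 0.5
print(rho(lambda x: x ** p, 2.0), "vs", (1 - p) / 2.0)  # x^p: rho = (1-p)/x
```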
The next remarkable result shows that the comparison of concavity of two functions can be performed via their concavity indexes. This result makes operational the comparative notion introduced in the last definition.

Theorem 1445 (Arrow-Pratt) Let $g, h : (a, b) \to \mathbb{R}$ be twice differentiable, with $g', h' > 0$. The following conditions are equivalent:

(i) $h$ is more concave than $g$;

(ii) $\rho_h \geq \rho_g$.¹⁰
The proof of this theorem relies on a lemma of independent interest.

Lemma 1446 Let $f : (a, b) \to \mathbb{R}$ be strictly monotone and differentiable. If $f$ is twice differentiable at $x_0 \in (a, b)$, with $f'(x_0) \neq 0$, then its inverse function $f^{-1}$ is twice differentiable at $y_0 = f(x_0)$, with
$$\left(f^{-1}\right)''(y_0) = -\frac{f''(x_0)}{[f'(x_0)]^3}$$

This formula is a special case of a combinatorial formula for higher order derivatives of inverses that reminds one of that of Faa di Bruno.¹¹ For instance, the third derivative of the inverse is:
$$\left(f^{-1}\right)'''(y_0) = 3\,\frac{[f''(x_0)]^2}{[f'(x_0)]^5} - \frac{f'''(x_0)}{[f'(x_0)]^4}$$

Proof Suppose that $f$ is strictly increasing (the decreasing case is similar), so that $f'(x_0) > 0$. Since $f$ is twice differentiable, the derivative function $f'$ is continuous. By Theorem 532, there exists a neighborhood $B_{\varepsilon}(x_0) \subseteq (a, b)$ such that $f'(x) > 0$ for all $x \in B_{\varepsilon}(x_0)$. Without loss of generality, assume that $B_{\varepsilon}(x_0) = (a, b)$, i.e., that $f' > 0$.

¹⁰ That is, $\rho_h(x) \geq \rho_g(x)$ for all $x \in (a, b)$.
¹¹ We refer interested readers to Johnson (2002).
To ease notation, denote $f^{-1}$ by $\varphi$. Set $\varphi(y_0 + h) = x_0 + k$ and observe that, by the continuity of $\varphi$, when $h \to 0$, also $k \to 0$. Moreover, we have $h = f(x_0 + k) - f(x_0)$ because $\varphi^{-1} = f$. That said, by Theorem 1234 we have $\varphi'(y) = 1/f'(\varphi(y))$ for all $y \in \operatorname{Im} f$. We then have
$$\lim_{h \to 0} \frac{\varphi'(y_0 + h) - \varphi'(y_0)}{h} = \lim_{h \to 0} \frac{\frac{1}{f'(\varphi(y_0 + h))} - \frac{1}{f'(x_0)}}{h} = \lim_{h \to 0} \frac{f'(x_0) - f'(\varphi(y_0 + h))}{h\, f'(x_0)\, f'(\varphi(y_0 + h))}$$
$$= \frac{1}{f'(x_0)} \lim_{h \to 0} \frac{f'(x_0) - f'(\varphi(y_0 + h))}{h} \,\lim_{h \to 0} \frac{1}{f'(\varphi(y_0 + h))} = \frac{1}{[f'(x_0)]^2} \lim_{k \to 0} \frac{f'(x_0) - f'(x_0 + k)}{h}$$
$$= \frac{1}{[f'(x_0)]^2} \lim_{k \to 0} \frac{f'(x_0) - f'(x_0 + k)}{f(x_0 + k) - f(x_0)} = \frac{1}{[f'(x_0)]^2} \lim_{k \to 0} \frac{\frac{f'(x_0) - f'(x_0 + k)}{k}}{\frac{f(x_0 + k) - f(x_0)}{k}} = \frac{1}{[f'(x_0)]^2}\, \frac{\lim_{k \to 0} \frac{f'(x_0) - f'(x_0 + k)}{k}}{\lim_{k \to 0} \frac{f(x_0 + k) - f(x_0)}{k}}$$
$$= \frac{1}{[f'(x_0)]^2} \cdot \frac{-f''(x_0)}{f'(x_0)} = -\frac{f''(x_0)}{[f'(x_0)]^3}$$
where the fourth equality follows from the continuity of $f'$ and the seventh one from the condition $f'(x_0) \neq 0$. ∎
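The formula of Lemma 1446 can be verified symbolically. A short check (ours) with the sympy library on $f(x) = e^x$, whose inverse is the logarithm:

```python
# Symbolic check (our addition) of the inverse-derivative formula of
# Lemma 1446 on f(x) = e^x, with inverse log(y):
#   (f^-1)''(y0) = -f''(x0) / f'(x0)^3   where y0 = f(x0).
import sympy as sp

x, y = sp.symbols("x y")
f = sp.exp(x)
finv = sp.log(y)                                # inverse of f
lhs = sp.diff(finv, y, 2)                       # (f^-1)''(y) = -1/y^2
rhs = -sp.diff(f, x, 2) / sp.diff(f, x) ** 3    # -e^x / e^(3x) = -e^(-2x)
# compare at matching points y = e^x
assert sp.simplify(lhs.subs(y, sp.exp(x)) - rhs) == 0
print("Lemma 1446 verified for f = exp:", sp.simplify(rhs))
```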
Proof of Arrow-Pratt's Theorem (i) implies (ii). Suppose that there exists a concave and strictly increasing function $f : \operatorname{Im} g \to \mathbb{R}$ such that $h = f \circ g$. Since $h$ and $g$ are injective, simple algebra shows that $f = h \circ g^{-1}$. Since $g$ is twice differentiable, with $g' > 0$, by Lemma 1446 the inverse $g^{-1}$ is also twice differentiable. Since $h$ is twice differentiable, also $f$ is then twice differentiable (cf. the Theorem of Faa di Bruno).

Since $h = f \circ g$, by the chain rule we have
$$h'(x) = f'(g(x))\,g'(x) \qquad \text{and} \qquad h''(x) = f''(g(x))\,g'(x)^2 + g''(x)\,f'(g(x)) \qquad (31.12)$$
So,
$$\rho_h(x) = -\frac{h''(x)}{h'(x)} = -\frac{f''(g(x))\,g'(x)^2 + g''(x)\,f'(g(x))}{h'(x)} = -\frac{f''(g(x))\,g'(x)^2}{h'(x)} + \rho_g(x)$$
In turn, this implies:
$$\frac{h'(x)}{g'(x)^2}\,(\rho_g(x) - \rho_h(x)) = f''(g(x)) \qquad (31.13)$$
We have $f''(g(x)) \leq 0$ because $f$ is concave, as well as $h'(x)/g'(x)^2 > 0$ because $h' > 0$. We conclude that $\rho_h \geq \rho_g$.

(ii) implies (i). Assume that $\rho_h \geq \rho_g$ on $C$. Set $f = h \circ g^{-1}$. Since $g$ is strictly increasing and twice differentiable, with $g' > 0$, by Lemma 1446 its inverse $g^{-1}$ is twice differentiable as well. Since $h$ is twice differentiable, also $f$ is twice differentiable (again, cf. the Theorem of Faa di Bruno).

Since $h = f \circ g$ and $h$ is twice differentiable, (31.12) holds. In particular, from $h', g' > 0$ it follows that $f' > 0$, while by (31.13) we have that $\rho_h \geq \rho_g$ implies $f'' \leq 0$. We conclude that $f$ is strictly increasing and concave. ∎

Inspection of the proof shows that, under the hypotheses of the last theorem, the transformation $f$ in (31.11) is twice differentiable.
Example 1447 (i) In view of the last example, the negative exponential function can be written as $-e^{-\lambda x}$, where the scalar $\lambda \in \mathbb{R}$ is the constant value of the concavity index of this function. Given any two such functions $g(x) = -e^{-\lambda_1 x}$ and $h(x) = -e^{-\lambda_2 x}$, by the last theorem $g$ is more concave than $h$ if and only if $\lambda_1 \geq \lambda_2$.

The coefficient $\lambda$ thus characterizes the degree of concavity of the negative exponential function, which is higher the greater is this coefficient. It is a remarkable property of this function.

(ii) Consider the power functions $g, h : (0, \infty) \to \mathbb{R}$ given by $g(x) = x^p$ and $h(x) = x^q$, with non-zero $p, q \in \mathbb{R}$. On $(0, \infty)$ we have $g', h' > 0$, so we can invoke the Arrow-Pratt Theorem. In particular, it implies that $h$ is more concave than $g$ if and only if
$$\frac{1-q}{x} \geq \frac{1-p}{x} \qquad \forall x > 0$$
that is, if and only if $q \leq p$. Note that these power functions are concave when their exponents are $< 1$ and convex when they are $> 1$. This example thus confirms that, as remarked before, we can compare the degree of concavity also of non-concave functions. N
A key property of a concave function $g : C \subseteq \mathbb{R} \to \mathbb{R}$ is the Jensen inequality: for every finite collection $\{x_1, x_2, \ldots, x_n\}$ of elements of a convex set $C$, we have
$$g\left(\sum_{i=1}^n \alpha_i x_i\right) \geq \sum_{i=1}^n \alpha_i\, g(x_i) \qquad (31.14)$$
for all $\alpha_i \geq 0$ such that $\sum_{i=1}^n \alpha_i = 1$. A strictly increasing $g$ has a strictly increasing inverse $g^{-1}$ (Proposition 222), and so this inequality can be equivalently stated as
$$\sum_{i=1}^n \alpha_i x_i \geq g^{-1}\left(\sum_{i=1}^n \alpha_i\, g(x_i)\right)$$
that is, as an inequality between a weighted arithmetic mean and a quasi-arithmetic mean (Section 15.10). This "quasi-arithmetic" angle on the Jensen inequality paves the way to the next result.
Theorem 1448 (de Finetti-Jessen) Let $g, h : C \subseteq \mathbb{R} \to \mathbb{R}$ be strictly increasing. The following conditions are equivalent:

(i) $h$ is more concave than $g$;

(ii) for every finite collection $\{x_1, x_2, \ldots, x_n\}$ of elements of $C$, we have
$$h^{-1}\left(\sum_{i=1}^n \alpha_i\, h(x_i)\right) \leq g^{-1}\left(\sum_{i=1}^n \alpha_i\, g(x_i)\right) \qquad (31.15)$$
for all $\alpha_i \geq 0$ such that $\sum_{i=1}^n \alpha_i = 1$.

Thus, quasi-arithmetic means can be ranked according to the degree of concavity of the strictly increasing functions that define them. The Jensen inequality for strictly increasing functions is the special case in which $g$ is the identity function $g(x) = x$, so that $h$ is concave (cf. Example 1443).

This quasi-arithmetic generalization of the Jensen inequality plays a key role in risk theory, in particular in the study of risk aversion. It was independently proved in 1931 by Bruno de Finetti and Børge Jessen.
Proof (i) implies (ii). Suppose that $h$ is more concave than $g$, i.e., $h = f \circ g$ with $f : \operatorname{Im} g \to \mathbb{R}$ concave and strictly increasing. Since both $g$ and $f$ are strictly increasing, we have $h^{-1} = (f \circ g)^{-1} = g^{-1} \circ f^{-1}$. Let $\{x_1, x_2, \ldots, x_n\}$ be a finite collection in $C$ and let $\alpha_i \geq 0$ be such that $\sum_{i=1}^n \alpha_i = 1$. We have:
$$h^{-1}\left(\sum_{i=1}^n \alpha_i\, h(x_i)\right) = \left(g^{-1} \circ f^{-1}\right)\left(\sum_{i=1}^n \alpha_i\, (f \circ g)(x_i)\right) \leq \left(g^{-1} \circ f^{-1}\right)\left(f\left(\sum_{i=1}^n \alpha_i\, g(x_i)\right)\right) = g^{-1}\left(\sum_{i=1}^n \alpha_i\, g(x_i)\right)$$
where the inequality holds because the function $g^{-1} \circ f^{-1}$ is strictly increasing and, by the Jensen inequality, the concavity of $f$ implies $\sum_{i=1}^n \alpha_i (f \circ g)(x_i) \leq f(\sum_{i=1}^n \alpha_i\, g(x_i))$. Moreover, the last equality holds because $f$ is strictly increasing, so $f^{-1}(f(x)) = x$ for all $x \in \operatorname{Im} g$.

(ii) implies (i). Suppose that (31.15) holds. Since $g$ is strictly increasing, we can define $f : \operatorname{Im} g \to \mathbb{R}$ as $f = h \circ g^{-1}$. We then have
$$h^{-1}(h(x)) = x = g^{-1}(g(x)) \qquad \forall x \in C$$
So
$$h(x) = h\left(g^{-1}(g(x))\right) = \left(h \circ g^{-1}\right)(g(x)) = (f \circ g)(x) \qquad \forall x \in C$$
We thus have $h = f \circ g$. It remains to show that $f$ is concave and strictly increasing. The latter property holds because $g^{-1}$ is strictly increasing, being $g$ strictly increasing. As to concavity, let $\{x_1, x_2, \ldots, x_n\}$ be a finite collection in $C$ and let $\alpha_i \geq 0$ be such that $\sum_{i=1}^n \alpha_i = 1$. By (31.15) we have
$$h\left(g^{-1}\left(\sum_{i=1}^n \alpha_i\, g(x_i)\right)\right) \geq h\left(h^{-1}\left(\sum_{i=1}^n \alpha_i\, h(x_i)\right)\right)$$
because $h$ is strictly increasing. Since $f = h \circ g^{-1}$, in turn this inequality implies
$$f\left(\sum_{i=1}^n \alpha_i\, g(x_i)\right) = h\left(g^{-1}\left(\sum_{i=1}^n \alpha_i\, g(x_i)\right)\right) \geq h\left(h^{-1}\left(\sum_{i=1}^n \alpha_i\, h(x_i)\right)\right) = \sum_{i=1}^n \alpha_i\, h(x_i) = \sum_{i=1}^n \alpha_i\, (f \circ g)(x_i) \qquad (31.16)$$
The Jensen inequality $f(\sum_{i=1}^n \alpha_i\, g(x_i)) \geq \sum_{i=1}^n \alpha_i\, (f \circ g)(x_i)$ thus holds. So, $f$ is concave because $\operatorname{Im} g$ is its domain. ∎
Example 1449 (i) The inverse of the negative exponential $-e^{-\lambda x}$ is $-\lambda^{-1}\log(-x)$. It is a special case of formula (6.11) that can be checked directly:
$$-\frac{1}{\lambda}\log\left(-\left(-e^{-\lambda x}\right)\right) = -\frac{1}{\lambda}\log e^{-\lambda x} = -\frac{1}{\lambda}(-\lambda x) = x \qquad \forall x \in \mathbb{R}$$
So, if we let $h(x) = -e^{-\lambda_1 x}$ and $g(x) = -e^{-\lambda_2 x}$ with $\lambda_1 \geq \lambda_2$, then for every finite collection $\{x_1, x_2, \ldots, x_n\}$ of scalars we have
$$-\frac{1}{\lambda_1}\log\left(\sum_{i=1}^n \alpha_i\, e^{-\lambda_1 x_i}\right) \leq -\frac{1}{\lambda_2}\log\left(\sum_{i=1}^n \alpha_i\, e^{-\lambda_2 x_i}\right)$$
for all $\alpha_i \geq 0$ such that $\sum_{i=1}^n \alpha_i = 1$. Note that this inequality involves log-exponential functions (Example 912).

(ii) Since $h(x) = 2^{-1}\log x$ is more concave than $g(x) = \sqrt{x}$ on $(0, \infty)$, for every finite collection $\{x_1, x_2, \ldots, x_n\}$ of strictly positive scalars we have
$$e^{2\sum_{i=1}^n \alpha_i \log \sqrt{x_i}} \leq \left(\sum_{i=1}^n \alpha_i \sqrt{x_i}\right)^2$$
for all $\alpha_i \geq 0$ such that $\sum_{i=1}^n \alpha_i = 1$. Indeed, $g^{-1}(x) = x^2$ for $x \geq 0$ and $h^{-1}(x) = e^{2x}$. N
The next result is a powerful, though straightforward, consequence of the de Finetti-Jessen theorem.

Corollary 1450 Let $-\infty < q \leq p < +\infty$ be non-zero. For every finite collection $\{x_1, x_2, \ldots, x_n\}$ of strictly positive scalars, we have
$$\left(\sum_{i=1}^n \alpha_i x_i^q\right)^{\frac{1}{q}} \leq \left(\sum_{i=1}^n \alpha_i x_i^p\right)^{\frac{1}{p}} \qquad (31.17)$$
for all $\alpha_i \geq 0$ such that $\sum_{i=1}^n \alpha_i = 1$.

Inequality (31.17) is called the power mean inequality because it compares power means.

Proof In view of the de Finetti-Jessen theorem, it is enough to note that, by Example 1447-(ii), the function $h(x) = x^q$ is more concave than the function $g(x) = x^p$ if $q \leq p$. ∎
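A numerical probe (ours) of the power mean inequality (31.17) on random weights, points and exponents:

```python
# Our numerical check of the power mean inequality (31.17): random weights
# summing to one, random positive points, random nonzero exponents q <= p.
import random

def power_mean(xs, ws, r):
    return sum(w * x ** r for x, w in zip(xs, ws)) ** (1.0 / r)

random.seed(2)
for _ in range(10_000):
    xs = [random.uniform(0.1, 10.0) for _ in range(5)]
    ws = [random.random() for _ in range(5)]
    total = sum(ws)
    ws = [w / total for w in ws]                 # weights summing to one
    q, p = sorted(random.choice([-2.0, -0.5, 0.5, 1.0, 3.0]) for _ in range(2))
    assert power_mean(xs, ws, q) <= power_mean(xs, ws, p) + 1e-9
print("power mean inequality (31.17) verified on random instances")
```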
We close with a pointwise dominance characterization of relative concavity that shows a somewhat surprising connection between two, prima facie, quite different ways to compare functions. To this end, note that we can always normalize a strictly increasing function $f : [a, b] \to \mathbb{R}$, so that it assumes value $0$ at $a$ and $1$ at $b$, via its transformation $f_{a,b} : [a, b] \to \mathbb{R}$ defined by
$$f_{a,b}(x) = \frac{f(x) - f(a)}{f(b) - f(a)} \qquad \forall x \in [a, b]$$
To see that $f_{a,b}$ is a transformation of $f$, just note that $f_{a,b} = \alpha f + \beta$, where
$$\alpha = \frac{1}{f(b) - f(a)} > 0 \qquad \text{and} \qquad \beta = -\frac{f(a)}{f(b) - f(a)}$$
So, $f_{a,b} = \varphi \circ f$, where $\varphi$ is the strictly increasing affine function $\varphi(x) = \alpha x + \beta$.

The function $f$ and its normalization $f_{a,b}$ share the same monotonicity, continuity, and concavity properties. More importantly, they share the same degree of concavity. So, if we take any two functions $f, g : [a, b] \to \mathbb{R}$, their normalizations $f_{a,b}$ and $g_{a,b}$ factor out some differences that are immaterial for the comparison of their degrees of concavity.

More generally, we can normalize $f$ on any subinterval $[c, d] \subseteq [a, b]$, so that it assumes value $0$ at $c$ and $1$ at $d$, via its transformation $f_{c,d} : [a, b] \to \mathbb{R}$ defined by
$$f_{c,d}(x) = \frac{f(x) - f(c)}{f(d) - f(c)} \qquad \forall x \in [a, b]$$
Now, $f_{c,d} = \alpha f + \beta$ where
$$\alpha = \frac{1}{f(d) - f(c)} > 0 \qquad \text{and} \qquad \beta = -\frac{f(c)}{f(d) - f(c)} \qquad (31.18)$$
Again, the function $f$ and its normalization $f_{c,d}$ share the same monotonicity, continuity, and concavity properties as well as the same degree of concavity.

The next definition builds upon these normalizations.
Definition 1451 Given any two functions $f, g : [a, b] \to \mathbb{R}$, we say that $f$ hereditarily dominates $g$ if $f_{c,d} \geq g_{c,d}$ for all $a \leq c < d \leq b$.¹²

In words, $f$ hereditarily dominates $g$ if all normalizations of $f$ pointwise dominate those of $g$. This notion leads to a noteworthy pointwise dominance characterization of their relative concavity.

Theorem 1452 Let $f, g : [a, b] \to \mathbb{R}$ be strictly increasing, continuous and concave functions. Then, $f$ is more concave than $g$ if and only if $f$ hereditarily dominates $g$.
Proof Since $f$ and $g$ are strictly increasing and continuous, there exists a strictly increasing and continuous function $h : \operatorname{Im} g \to \operatorname{Im} f$ such that $f = h \circ g$. Let $\alpha, \alpha' > 0$ and $\beta, \beta' \in \mathbb{R}$ be such that $\tilde{f} = \alpha f + \beta$ and $\tilde{g} = \alpha' g + \beta'$. Define $\tilde{h} : \operatorname{Im} \tilde{g} \to \mathbb{R}$ to be such that
$$\tilde{h}(t) = \alpha\, h\left(\frac{t - \beta'}{\alpha'}\right) + \beta \qquad \forall t \in \operatorname{Im} \tilde{g} \qquad (31.19)$$
Equivalently, we have
$$h(s) = \frac{\tilde{h}(\alpha' s + \beta') - \beta}{\alpha} \qquad \forall s \in \operatorname{Im} g \qquad (31.20)$$
as well as
$$\frac{\tilde{f}(x) - \beta}{\alpha} = f(x) = h(g(x)) = h\left(\frac{\tilde{g}(x) - \beta'}{\alpha'}\right) \qquad \forall x \in [a, b]$$
which, by (31.19), yields $\tilde{f} = \tilde{h} \circ \tilde{g}$.

¹² That is, $f_{c,d}(x) \geq g_{c,d}(x)$ for all $x \in [c, d]$.
(i) implies (ii). The proof of this implication is divided in a few claims.

Claim 1 $f$ is more concave than $g$ if and only if $\alpha f + \beta$ is more concave than $\alpha' g + \beta'$ for all $\alpha, \alpha' > 0$ and $\beta, \beta' \in \mathbb{R}$.

Proof of Claim 1 Observe that $f$ is more concave than $g$ if and only if $h$ is concave, if and only if $\tilde{h}$ is concave, if and only if $\tilde{f}$ is more concave than $\tilde{g}$. ∎

Claim 2 If $\alpha f + \beta$ is more concave than $\alpha' g + \beta'$ for all $\alpha, \alpha' > 0$ and $\beta, \beta' \in \mathbb{R}$, then $f_{c,d}$ is more concave than $g_{c,d}$ for all $a \leq c < d \leq b$.

Proof of Claim 2 Consider $c, d \in [a, b]$ such that $c < d$. Observe that $f_{c,d} = \alpha f + \beta$ and $g_{c,d} = \alpha' g + \beta'$, where $\alpha$ and $\beta$ are defined as in (31.18) and $\alpha'$ and $\beta'$ are defined similarly with $g$ in place of $f$. By hypothesis and since $c$ and $d$ were arbitrarily chosen, this implies the thesis. ∎

Claim 3 If $f_{c,d}$ is more concave than $g_{c,d}$ for all $a \leq c < d \leq b$, then $f$ hereditarily dominates $g$.

Proof of Claim 3 Consider $c, d \in [a, b]$ such that $c < d$. Since $f_{c,d}$ is more concave than $g_{c,d}$, there exists a concave $h_{c,d} : \operatorname{Im} g_{c,d} \to \operatorname{Im} f_{c,d}$ such that $f_{c,d} = h_{c,d} \circ g_{c,d}$. Since $f_{c,d}(c) = 0 = g_{c,d}(c)$ and $f_{c,d}(d) = 1 = g_{c,d}(d)$, we have that $h_{c,d}(0) = 0$ and $h_{c,d}(1) = 1$. Since $h_{c,d}$ is concave, it follows that
$$h_{c,d}(t) = h_{c,d}(t \cdot 1 + (1-t) \cdot 0) \geq t\, h_{c,d}(1) + (1-t)\, h_{c,d}(0) = t \qquad \forall t \in [0, 1]$$
Since $g_{c,d}(c) \leq g_{c,d}(x) \leq g_{c,d}(d)$ for all $x \in [c, d]$, we conclude that $f_{c,d} = h_{c,d} \circ g_{c,d} \geq g_{c,d}$. Since $c$ and $d$ were arbitrarily chosen, the implication follows. ∎

Together, these claims prove that (i) implies (ii) and, on the way, establish some properties of independent interest.

(ii) implies (i). By contradiction, assume that $f$ is not more concave than $g$. This implies that $h$ is not concave. It follows that there exist $t_1, t_2, t_3 \in \operatorname{Im} g$ with $t_1 < t_2 < t_3$ such that
$$h(t_2) < \frac{t_3 - t_2}{t_3 - t_1}\, h(t_1) + \frac{t_2 - t_1}{t_3 - t_1}\, h(t_3)$$
which yields
$$\frac{h(t_2) - h(t_1)}{h(t_3) - h(t_1)} < \frac{t_2 - t_1}{t_3 - t_1}$$
By construction, there exist $x_1, x_2, x_3 \in [a, b]$ such that $g(x_i) = t_i$ for $i = 1, 2, 3$. Note that $x_1 < x_2 < x_3$. We have that
$$\frac{f(x_2) - f(x_1)}{f(x_3) - f(x_1)} = \frac{h(t_2) - h(t_1)}{h(t_3) - h(t_1)} < \frac{t_2 - t_1}{t_3 - t_1} = \frac{g(x_2) - g(x_1)}{g(x_3) - g(x_1)} \qquad (31.21)$$
Next, consider $\alpha, \alpha' > 0$ and $\beta, \beta' \in \mathbb{R}$. Define $\tilde{f} = \alpha f + \beta$ and $\tilde{g} = \alpha' g + \beta'$. Note that
$$\frac{\tilde{f}(x_2) - \tilde{f}(x_1)}{\tilde{f}(x_3) - \tilde{f}(x_1)} = \frac{f(x_2) - f(x_1)}{f(x_3) - f(x_1)} \qquad \text{and} \qquad \frac{\tilde{g}(x_2) - \tilde{g}(x_1)}{\tilde{g}(x_3) - \tilde{g}(x_1)} = \frac{g(x_2) - g(x_1)}{g(x_3) - g(x_1)} \qquad (31.22)$$
Since $\alpha, \alpha' > 0$ and $\beta, \beta' \in \mathbb{R}$ were arbitrarily chosen, this latter fact is true if we choose them to be such that $\tilde{f} = f_{x_1, x_3}$ and $\tilde{g} = g_{x_1, x_3}$. By (31.22) and (31.21), we conclude that
$$\tilde{f}(x_2) = \frac{\tilde{f}(x_2) - \tilde{f}(x_1)}{\tilde{f}(x_3) - \tilde{f}(x_1)} = \frac{f(x_2) - f(x_1)}{f(x_3) - f(x_1)} < \frac{g(x_2) - g(x_1)}{g(x_3) - g(x_1)} = \frac{\tilde{g}(x_2) - \tilde{g}(x_1)}{\tilde{g}(x_3) - \tilde{g}(x_1)} = \tilde{g}(x_2)$$
which contradicts the fact that $f$ hereditarily dominates $g$. ∎
31.2 Intermezzo: inner monotone operators

In the next section we will study the differential properties of concave functions of several variables. This important topic relies, in turn, on another important topic, monotone operators, that we now present. For this topic, we also need a more general view on definite and semi-definite matrices.

31.2.1 Definite matrices revisited

A moment's reflection shows that the classification of definite and semi-definite matrices introduced in Section 29.3 actually holds for any square matrix, not necessarily symmetric.

Definition 1453 A square matrix $A$ of order $n$ is said to be:

(i) positive (negative) semi-definite if $x \cdot Ax \geq 0$ ($\leq 0$) for all $x \in \mathbb{R}^n$;

(ii) positive (negative) definite if $x \cdot Ax > 0$ ($< 0$) for all $0 \neq x \in \mathbb{R}^n$.

That said, given any $n \times n$ matrix $A$, its symmetric part is the $n \times n$ symmetric matrix $A_s$ defined by
$$A_s = \frac{1}{2}\left(A + A^T\right)$$
To see the symmetry of $A_s$ note that, by (15.6), we have
$$A_s^T = \frac{1}{2}\left(A + A^T\right)^T = \frac{1}{2}\left(A^T + A\right) = A_s$$
In particular, $A$ is symmetric if and only if $A = A_s$.

Lemma 1454 We have $x \cdot Ax = x \cdot A_s x$ for all square matrices $A$ of order $n$.

So, by considering their symmetric parts the general classification of square matrices of the last definition reduces to that of symmetric matrices of Section 29.3. Nevertheless, the next subsection will illustrate the usefulness of the general classification.

Proof We have
$$x \cdot Ax = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j = \sum_{i=1}^n \sum_{j=1}^n \left(\frac{1}{2}a_{ij} + \frac{1}{2}a_{ij}\right) x_i x_j = \frac{1}{2}\, x \cdot Ax + \frac{1}{2}\sum_{j=1}^n \sum_{i=1}^n a_{ij} x_j x_i$$
$$= \frac{1}{2}\, x \cdot Ax + \frac{1}{2}\sum_{i=1}^n \sum_{j=1}^n a_{ji} x_i x_j = \frac{1}{2}\, x \cdot Ax + \frac{1}{2}\, x \cdot A^T x = x \cdot A_s x$$
as desired. ∎
31.2.2 Monotone operators and the law of demand

The following important class of operators, which exploits the inner product structure of $\mathbb{R}^n$, was introduced by George Minty in 1962.

Definition 1455 An operator $g = (g_1, \ldots, g_n) : C \subseteq \mathbb{R}^n \to \mathbb{R}^n$ is said to be inner decreasing if
$$(g(x) - g(y)) \cdot (x - y) = \sum_{i=1}^n (g_i(x) - g_i(y))(x_i - y_i) \leq 0 \qquad \forall x, y \in C \qquad (31.23)$$
It is strictly inner decreasing if the inequality is strict when $x \neq y$.

When the inequality (31.23) is reversed, we have an inner increasing operator, which becomes strictly inner increasing when the inequality is strict. An operator is inner monotone when it is either inner decreasing or inner increasing.¹³
Example 1456 Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a linear operator given by $T(x) = Ax$, where $A$ is a square matrix of order $n$. By the linearity of $T$, the inequality (31.23) becomes
$$(T(x) - T(y)) \cdot (x - y) = T(x - y) \cdot (x - y) \leq 0 \qquad \forall x, y \in \mathbb{R}^n$$
that is,
$$Az \cdot z = T(z) \cdot z \leq 0 \qquad \forall z \in \mathbb{R}^n$$
We conclude that $T$ is:

(i) inner decreasing (increasing) if and only if $A$ is negative (positive) semi-definite;

(ii) strictly inner decreasing (increasing) if and only if $A$ is negative (positive) definite.

The inner monotonicity of a linear operator $T$ and the definiteness of its matrix $A$ are two faces of the same coin. N

¹³ We use the adjective "inner" to distinguish this notion, based on inner products, from the more standard notion of monotonicity in $\mathbb{R}^n$. Yet, this adjective is not standard.
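Example 1456, combined with Lemma 1454, yields a practical test for linear operators: check the eigenvalues of the symmetric part. A sketch (ours) using numpy:

```python
# Our sketch of Example 1456 and Lemma 1454: T(x) = Ax is inner decreasing
# iff the symmetric part A_s = (A + A^T)/2 is negative semi-definite, which
# we test via the eigenvalues of A_s.
import numpy as np

def inner_decreasing_linear(A, tol=1e-10):
    As = (A + A.T) / 2                       # symmetric part
    return bool(np.all(np.linalg.eigvalsh(As) <= tol))

A = np.array([[-1.0, 3.0],
              [-3.0, -1.0]])                 # skew part drops out: A_s = -I
print(inner_decreasing_linear(A))            # True
rng = np.random.default_rng(0)
x, y = rng.standard_normal(2), rng.standard_normal(2)
print(float((A @ x - A @ y) @ (x - y)))      # <= 0, consistent with (31.23)
```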

Example 1457 Define f : ℝ² → ℝ² by

$$f(x_1,x_2) = (x_1,\, x_1+x_2)$$

Here f₁, f₂ : ℝ² → ℝ are given by f₁(x₁,x₂) = x₁ and f₂(x₁,x₂) = x₁ + x₂. We have:

$$(f(x)-f(y))\cdot(x-y) = (f_1(x)-f_1(y))(x_1-y_1) + (f_2(x)-f_2(y))(x_2-y_2)$$
$$= (x_1-y_1)(x_1-y_1) + (x_1+x_2-y_1-y_2)(x_2-y_2)$$
$$= (x_1-y_1)^2 + (x_1-y_1)(x_2-y_2) + (x_2-y_2)^2 \ge \frac{1}{2}(x_1-y_1)^2 + \frac{1}{2}(x_2-y_2)^2 \ge 0$$

The penultimate inequality follows from the high school inequality ab ≥ −(a² + b²)/2, which
yields

$$a^2+b^2+ab \ge a^2+b^2-\left(\frac{1}{2}a^2+\frac{1}{2}b^2\right) = \frac{1}{2}a^2+\frac{1}{2}b^2 \ge 0 \qquad \forall a,b\in\mathbb{R}$$

We conclude that f is inner increasing. N

The reader can verify that for n = 1 inner monotonicity is equivalent to standard monotonicity. When n ≥ 2, the two notions become altogether independent: in the next example
we present an inner monotone operator that is not monotone, as well as a monotone operator
that is not inner monotone.

Example 1458 (i) Define g = (g₁, g₂) : ℝ² → ℝ² by

$$g(x_1,x_2) = (-x_2,\,-x_1)$$

Here g₁, g₂ : ℝ² → ℝ are given by g₁(x₁,x₂) = −x₂ and g₂(x₁,x₂) = −x₁. This operator
is, clearly, strictly decreasing, i.e., x > y implies g(x) < g(y). It is not inner monotone: for
x = (1,0) and y = (0,1), we have

$$(g(x)-g(y))\cdot(x-y) = (g_1(x)-g_1(y))(x_1-y_1) + (g_2(x)-g_2(y))(x_2-y_2) = 1\cdot 1 + (-1)\cdot(-1) = 2$$

while for x = (−1,0) and y = (0,1), we have

$$(g(x)-g(y))\cdot(x-y) = (g_1(x)-g_1(y))(x_1-y_1) + (g_2(x)-g_2(y))(x_2-y_2) = 1\cdot(-1) + 1\cdot(-1) = -2$$

Thus, g is an instance of a monotone operator which is not inner monotone.

(ii) Define g = (g₁, g₂) : ℝ²₊₊ → ℝ² by

$$g(x_1,x_2) = \left(\sqrt{\frac{x_2}{x_1}},\,\sqrt{\frac{x_1}{x_2}}\right)$$

Here g₁, g₂ : ℝ²₊₊ → ℝ are given by g₁(x₁,x₂) = √(x₂/x₁) and g₂(x₁,x₂) = √(x₁/x₂). This
operator is the derivative operator of the concave function f : ℝ²₊₊ → ℝ given by f(x) =
2√(x₁x₂). As it will be seen later in the chapter (Theorem 1473), this ensures that g is an inner
decreasing operator. The operator g is not monotone: if we take x = (1,1) and y = (4,1), we
have x < y but the images g(x) = (1,1) and g(y) = (1/2,2) are not comparable. Thus, g is
an instance of an inner monotone operator which is not monotone. N
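The following snippet (an illustrative sketch, not part of the text's formal development) replays the computations of part (i) numerically.

```python
# Example 1458-(i) in code: g(x1, x2) = (-x2, -x1) is strictly
# decreasing, yet (g(x)-g(y)).(x-y) changes sign, so g is not inner
# monotone.
import numpy as np

def g(v):
    return np.array([-v[1], -v[0]])

pairs = [(np.array([1.0, 0.0]), np.array([0.0, 1.0])),
         (np.array([-1.0, 0.0]), np.array([0.0, 1.0]))]
for x, y in pairs:
    print((g(x) - g(y)) @ (x - y))   # prints 2.0 then -2.0
```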

When g is inner decreasing and the vectors x and y have equal components, except for
an index i, then

$$x_i > y_i \implies g_i(x) \le g_i(y) \tag{31.24}$$

because in this case (g(x) − g(y))·(x − y) = (gᵢ(x) − gᵢ(y))(xᵢ − yᵢ) ≤ 0. The sharper
implication

$$x_i > y_i \implies g_i(x) < g_i(y) \tag{31.25}$$

holds when g is strictly inner decreasing.

Proposition 1459 Let g : C ⊆ ℝⁿ → ℝⁿ be a continuously differentiable operator defined
on an open convex set. Then,

(i) g is inner decreasing if and only if the Jacobian matrix Dg(x) is negative semi-definite
for all x ∈ C;

(ii) g is strictly inner decreasing if the Jacobian matrix Dg(x) is negative definite for all
x ∈ C.

This differential criterion is the multivariable version of Propositions 1322 and 1324. A
dual "positive" version holds for inner increasing operators, as the reader can check.

Proof We only prove (i) and leave (ii) to the reader. Suppose that g is inner decreasing.
Let x ∈ C and y ∈ ℝⁿ. Then, for a scalar h > 0 small enough we have
(g(x+hy) − g(x))·((x+hy) − x) ≤ 0. Since g is continuously differentiable, we have

$$0 \ge \lim_{h\to 0^+}\frac{(g(x+hy)-g(x))\cdot((x+hy)-x)}{h^2} = \lim_{h\to 0^+}\frac{g(x+hy)-g(x)}{h}\cdot y = Dg(x)y\cdot y$$

Since this holds for any y ∈ ℝⁿ, we conclude that Dg(x) is negative semi-definite.

Conversely, suppose that Dg(x) is negative semi-definite at all x ∈ C. Let x₁, x₂ ∈ C
and define φ : [0,1] → ℝ by

$$\varphi(t) = (x_1-x_2)\cdot(g(tx_1+(1-t)x_2)-g(x_2))$$

To prove that g is inner decreasing it is enough to show that φ(1) ≤ 0. But φ(0) = 0 and
φ is decreasing since, for all t ∈ (0,1),

$$\varphi'(t) = (x_1-x_2)\cdot Dg(tx_1+(1-t)x_2)(x_1-x_2) \le 0$$

Hence, φ(1) ≤ φ(0) = 0.


Example 1460 Consider an affine operator f : ℝⁿ → ℝⁿ given by f(x) = Ax + b, where
A is an n × n matrix and b ∈ ℝⁿ. By the last result, f is inner decreasing if and only if A
is negative semi-definite, while it is strictly inner decreasing if A is negative definite. This
confirms what we saw in Example 1456. N

A market demand function D : ℝⁿ₊ → ℝⁿ₊ (Section 22.9) is a strictly inner decreasing
operator if, for all price vectors p ≠ p′,

$$(D(p)-D(p'))\cdot(p-p') < 0$$

that is, if D satisfies the law of demand. In this case, (31.25) takes the form

$$p_i > p_i' \implies D_i(p) < D_i(p')$$

which means that, ceteris paribus, a higher price of good i results in a lower demand for this
good. Inner monotonicity thus formalizes a key economic concept. Its Jacobian characterization established in the last proposition plays an important role in demand theory.
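As an illustration, the following Python sketch builds a toy linear demand system; the numbers are hypothetical and chosen only so that the matrix B is positive definite.

```python
# A toy demand system (hypothetical, for illustration only): D(p) = a - Bp
# with B positive definite satisfies the law of demand, since
# (D(p) - D(q)).(p - q) = -(p - q).B(p - q) < 0 whenever p != q.
import numpy as np

a = np.array([10.0, 8.0])
B = np.array([[2.0, 0.5],
              [0.5, 1.0]])           # symmetric positive definite

def D(p):
    return a - B @ p

rng = np.random.default_rng(2)
p, q = rng.uniform(0.1, 3.0, size=(2, 2))
print((D(p) - D(q)) @ (p - q))       # strictly negative
```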
With this motivation, we turn to an interesting property of inner monotone operators.
To state it, we need a piece of terminology: a vector α ∈ ℝⁿ defines the level set
(g = α) = {x ∈ C : g(x) = α} of an operator g : C ⊆ ℝⁿ → ℝⁿ.

Proposition 1461 The level sets of an inner monotone operator are convex.

This result generalizes the simple observation that the level sets of monotone scalar
functions are intervals. It holds trivially, however, for strictly inner monotone operators, as
they are automatically injective: x ≠ y implies (g(x) − g(y))·(x − y) ≠ 0, and so g(x) ≠ g(y).

Proof Let g : C ⊆ ℝⁿ → ℝⁿ be an inner decreasing operator (the argument for an increasing
one is similar). For a vector α ∈ ℝⁿ, consider the level set (g = α). Let x, y ∈ (g = α) and
λ ∈ [0,1]. We want to show that g(λx + (1−λ)y) = α. Let x ≠ y, otherwise this property
trivially holds. To ease notation, set μ = 1 − λ. It holds

$$(g(\lambda x+\mu y)-g(x))\cdot(\lambda x+\mu y-x) \le 0 \quad\text{and}\quad (g(\lambda x+\mu y)-g(y))\cdot(\lambda x+\mu y-y) \le 0$$

that is, since λx + μy − x = μ(y − x) and λx + μy − y = λ(x − y),

$$(g(\lambda x+\mu y)-g(x))\cdot(y-x) \le 0 \quad\text{and}\quad (g(\lambda x+\mu y)-g(y))\cdot(x-y) \le 0$$

By adding up, we have

$$(g(\lambda x+\mu y)-g(x))\cdot(y-x) + (g(\lambda x+\mu y)-g(y))\cdot(x-y) = (g(y)-g(x))\cdot(y-x) = 0$$

where the last equality holds because g(x) = g(y) = α. Thus,

$$(g(\lambda x+\mu y)-g(x))\cdot(y-x) = (g(\lambda x+\mu y)-g(y))\cdot(x-y) = 0$$

because two non-positive scalars that add up to zero are, in turn, both equal to zero. As
x ≠ y, we conclude that g(λx + μy) = g(x) = α, as desired.

31.3 Multivariable case

Concave functions of several variables have important differential properties. Armed with
what we learned in the Intermezzo, we now study them.

Unless otherwise stated, in the rest of this section C denotes an open and convex set in
ℝⁿ. This assumption eases the exposition but, in view of what we did in the scalar case,
readers should be able to easily extend the analysis to any convex set, open or not.

31.3.1 Derivability and differentiability

We begin by studying directional derivatives, which continue to play a key role also in the
multivariable case. We introduce them for functions defined on an open set U.

Definition 1462 A function f : U → ℝ is said to be derivable from the right at a point
x ∈ U along the direction y ∈ ℝⁿ if the limit

$$f'_+(x;y) = \lim_{h\to 0^+}\frac{f(x+hy)-f(x)}{h} \tag{31.26}$$

exists and is finite. This limit is called the directional right derivative of f at x along the
direction y.

The function f′₊(x;·) : ℝⁿ → ℝ is called the directional right derivative of f at x.
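A one-sided difference quotient with small h > 0 approximates this limit numerically; the following sketch (with an arbitrary smooth function, not taken from the text) illustrates the definition.

```python
# A finite-difference sketch of the directional right derivative (31.26):
# f(x) = -||x||^2 at x along the direction y.
import numpy as np

def right_dir_derivative(f, x, y, h=1e-7):
    # one-sided difference quotient with a small h > 0
    return (f(x + h * y) - f(x)) / h

f = lambda v: -np.dot(v, v)
x = np.array([1.0, 2.0])
y = np.array([0.5, -1.0])
print(right_dir_derivative(f, x, y))   # close to grad f(x).y = -2x.y = 3.0
```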


In a similar manner, by considering h → 0⁻ we can define the directional left derivative
f′₋(x;·) : ℝⁿ → ℝ of f at x.

Clearly, f is derivable at x if and only if it is both left and right derivable at x with
f′₋(x;·) = f′₊(x;·). In this case, we have f′(x;·) = f′₊(x;·) = f′₋(x;·). The following
duality result between the two one-sided directional derivative functions is useful.

Proposition 1463 If a function f : U → ℝ is derivable at x ∈ U from one side, then it is
also derivable from the other side. In this case

$$f'_-(x;y) = -f'_+(x;-y) \qquad \forall y\in\mathbb{R}^n \tag{31.27}$$

This result implies, inter alia, that f′₊(x;·) is superlinear if and only if f′₋(x;·) is sublinear.

Proof Assume that f is derivable from the right at x ∈ U. For each y ∈ ℝⁿ we then have:

$$f'_+(x;-y) = \lim_{h\to 0^+}\frac{f(x+h(-y))-f(x)}{h} = \lim_{h\to 0^+}\frac{f(x+(-h)y)-f(x)}{h} = -\lim_{h\to 0^-}\frac{f(x+hy)-f(x)}{h} = -f'_-(x;y)$$

So, f is derivable from the left at x, and (31.27) holds. A similar argument shows that
derivability from the left yields derivability from the right.

Next we collect a few important properties of one-sided directional derivatives.

Proposition 1464 Let f : C → ℝ be concave. Then,

(i) the right f′₊(x;·) : ℝⁿ → ℝ and left f′₋(x;·) : ℝⁿ → ℝ directional derivatives exist at
each x ∈ C;

(ii) the right directional derivative f′₊(x;·) : ℝⁿ → ℝ is superlinear at each x ∈ C;

(iii) the left directional derivative f′₋(x;·) : ℝⁿ → ℝ is sublinear at each x ∈ C;

(iv) f′₊(x;·) ≤ f′₋(x;·) for each x ∈ C.

The proof relies on the following lemma, which shows that the difference quotient is
decreasing.

Lemma 1465 Let f : C → ℝ be concave. Given any x ∈ C and y ∈ ℝⁿ there exists ε > 0
such that x + hy ∈ C for all h ∈ (−ε, ε) and the function

$$h \mapsto \frac{f(x+hy)-f(x)}{h} \tag{31.28}$$

is decreasing on (−ε,0) ∪ (0,ε).

Proof Let x ∈ C and y ∈ ℝⁿ. If y = 0, the statement is trivially true. So, let y ≠ 0. Since
C is open, there exists a ball B_δ(x) contained in C, where δ > 0. If we set ε = δ/(2‖y‖), then
x + hy ∈ C for all h ∈ (−ε, ε). Define g : (−ε, ε) → ℝ by g(h) = f(x + hy). Since f is
concave, the reader can easily verify that g is concave too. Next, observe that

$$\frac{f(x+hy)-f(x)}{h} = \frac{g(h)-g(0)}{h} \tag{31.29}$$

for all h ∈ (−ε,0) ∪ (0,ε). Since g is concave, the same arguments used to prove (31.6) show
that the right-hand side of (31.29) is decreasing on (−ε,0) ∪ (0,ε).

Proof of Proposition 1464 (i) In view of Proposition 1463, we can focus on the right
derivative function f′₊(x;·) : ℝⁿ → ℝ. By Lemma 1465, the difference quotient is decreasing
on (−ε,0) ∪ (0,ε), so the limit (31.26) exists:

$$\lim_{h\to 0^+}\frac{f(x+hy)-f(x)}{h} = \sup_{h>0}\frac{f(x+hy)-f(x)}{h}$$

To show that it is finite, observe that the difference quotient is decreasing on (−ε,0) ∪ (0,ε),
thus for each h ∈ (0,ε)

$$\frac{f(x+hy)-f(x)}{h} \le \frac{f\left(x-\frac{\varepsilon}{2}y\right)-f(x)}{-\frac{\varepsilon}{2}} \in \mathbb{R}$$

(ii) The proof of the positive homogeneity of f′₊(x;·) is analogous to that of the homogeneity of f′(x;·) in Corollary 1284. For each λ ∈ [0,1], the concavity of f yields

$$\frac{f(x+h(\lambda y_1+(1-\lambda)y_2))-f(x)}{h} \ge \frac{\lambda(f(x+hy_1)-f(x)) + (1-\lambda)(f(x+hy_2)-f(x))}{h}$$

Taking limits as h → 0⁺, this implies that f′₊(x;·) : ℝⁿ → ℝ is concave. Hence,

$$f'_+(x;y_1+y_2) = f'_+\left(x;2\,\frac{y_1+y_2}{2}\right) = 2f'_+\left(x;\frac{y_1+y_2}{2}\right) \ge 2\left(\frac{f'_+(x;y_1)}{2}+\frac{f'_+(x;y_2)}{2}\right) = f'_+(x;y_1)+f'_+(x;y_2)$$

This shows that f′₊(x;·) : ℝⁿ → ℝ is superadditive, and so superlinear.

(iii) By Proposition 1463, it follows from point (ii).

(iv) Since f′₊(x;·) : ℝⁿ → ℝ is superlinear, by Proposition 874 we have
f′₊(x;y) ≤ −f′₊(x;−y) for each y ∈ ℝⁿ. By Proposition 1463, the result then follows.

The last result leads to an interesting characterization of derivability via one-sided derivative functions.

Corollary 1466 Let f : C → ℝ be concave. Given x ∈ C, the following properties are
equivalent:

(i) f is derivable at x;

(ii) f′₊(x;·) = f′₋(x;·);

(iii) f′₊(x;·) : ℝⁿ → ℝ is linear;

(iv) f′₋(x;·) : ℝⁿ → ℝ is linear.

In this case, the directional derivative function f′(x;·) : ℝⁿ → ℝ is linear, with

$$f'(x;y) = \nabla f(x)\cdot y \qquad \forall y\in\mathbb{R}^n \tag{31.30}$$

A concave function derivable at a point has, thus, a linear directional derivative function,
represented via the inner product (31.30). Since, in general, the directional derivative function is only homogeneous (Corollary 1284), it is a further noteworthy property of concavity
that the much stronger property of linearity, with its inner product representation, holds.

Proof (iv) implies (iii). Assume that f′₋(x;·) : ℝⁿ → ℝ is linear. By (31.27), for all y ∈ ℝⁿ
we have

$$f'_+(x;y) = -f'_-(x;-y) = f'_-(x;y)$$

where the last equality uses the linearity of f′₋(x;·). So, f′₊(x;·) = f′₋(x;·) is linear.

(iii) implies (ii). Assume that f′₊(x;·) : ℝⁿ → ℝ is linear. Since f′₊(x;·) ≤ f′₋(x;·), we
have f′₊(x;y) ≤ f′₋(x;y) for each y ∈ ℝⁿ, while (31.27) and the linearity of f′₊(x;·) yield

$$f'_+(x;y) \le f'_-(x;y) = -f'_+(x;-y) = f'_+(x;y)$$

This proves that f′₊(x;·) = f′₋(x;·).

(ii) implies (i). Assume that f′₊(x;·) = f′₋(x;·). For each y ∈ ℝⁿ we have

$$\lim_{h\to 0^+}\frac{f(x+hy)-f(x)}{h} = f'_+(x;y) = f'_-(x;y) = \lim_{h\to 0^-}\frac{f(x+hy)-f(x)}{h}$$

and so the bilateral limit

$$f'(x;y) = \lim_{h\to 0}\frac{f(x+hy)-f(x)}{h}$$

exists finite. We conclude that f is derivable at x.

(i) implies (iv). Assume that f is derivable at x. In view of Proposition 1464, the
directional derivative function f′(x;·) : ℝⁿ → ℝ is linear because it is both superlinear,
being f′(x;·) = f′₊(x;·), and sublinear, being f′(x;·) = f′₋(x;·). This completes the proof of
the equivalence among conditions (i)-(iv).

Finally, assume that f is derivable (so, partially derivable) at x. By what has just been
proved, f′(x;·) : ℝⁿ → ℝ is linear. By Riesz's Theorem, there is a vector α ∈ ℝⁿ such that
f′(x;y) = α·y for every y ∈ ℝⁿ. Then,

$$\frac{\partial f(x)}{\partial x_i} = f'(x;e^i) = \alpha\cdot e^i = \alpha_i \qquad \forall i=1,...,n$$

Thus, α = ∇f(x).

A remarkable property of concave functions of several variables is that for them partial
derivability and differentiability are equivalent notions.

Theorem 1467 Let f : C → ℝ be concave. Given x ∈ C, the following properties are
equivalent:

(i) f is partially derivable at x;

(ii) f is derivable at x;

(iii) f is differentiable at x.

Thus, for concave functions we recover the remarkable equivalence between derivability
and differentiability that holds for scalar functions but fails, in general, for functions of
several variables (cf. Section 27.2.1). This is another sign of the great analytical convenience
of concavity.

Proof Since (iii) implies (i) by Theorem 1268, it is enough to prove that (i) implies (ii)
and that (ii) implies (iii).

(i) implies (ii). Suppose f is partially derivable at x. Then, f′₊(x;eⁱ) = f′₋(x;eⁱ) for
each versor eⁱ of ℝⁿ. By Proposition 1464, f′₊(x;·) is superlinear and f′₋(x;·) is sublinear.
So, f′₊(x;0) = f′₋(x;0) = 0. Let 0 ≠ y ∈ ℝⁿ₊. Since f′₊(x;eⁱ) = f′₋(x;eⁱ), we have:

$$f'_+(x;y) = \left(\sum_{i=1}^n y_i\right)f'_+\left(x;\sum_{i=1}^n \frac{y_i}{\sum_{i=1}^n y_i}e^i\right) \ge \left(\sum_{i=1}^n y_i\right)\sum_{i=1}^n \frac{y_i}{\sum_{i=1}^n y_i}f'_+(x;e^i)$$
$$= \left(\sum_{i=1}^n y_i\right)\sum_{i=1}^n \frac{y_i}{\sum_{i=1}^n y_i}f'_-(x;e^i) \ge \left(\sum_{i=1}^n y_i\right)f'_-\left(x;\sum_{i=1}^n \frac{y_i}{\sum_{i=1}^n y_i}e^i\right) = f'_-(x;y)$$

So, f′₊(x;y) = f′₋(x;y) because, again by Proposition 1464, f′₊(x;y) ≤ f′₋(x;y). We
conclude that f′₊(x;·) = f′₋(x;·) on ℝⁿ₊. A similar argument, based on f′₊(x;−eⁱ) =
f′₋(x;−eⁱ), shows that f′₊(x;·) = f′₋(x;·) on ℝⁿ₋. Let y ∈ ℝⁿ. Define the positive vectors
y⁺ = max{y,0} and y⁻ = −min{y,0}. Since y = y⁺ − y⁻, we have

$$f'_+(x;y) = f'_+(x;y^+-y^-) \ge f'_+(x;y^+) + f'_+(x;-y^-) = f'_-(x;y^+) + f'_-(x;-y^-) \ge f'_-(x;y^+-y^-) = f'_-(x;y)$$

By Proposition 1464, we conclude that f′₊(x;y) = f′₋(x;y). In turn, this implies f′₊(x;·) =
f′₋(x;·) on ℝⁿ. By Corollary 1466, f is derivable at x.

(ii) implies (iii). Suppose f is derivable at x. To show that f is differentiable at x, in
view of the last corollary we need to show that

$$\lim_{h\to 0}\frac{f(x+h)-f(x)-\nabla f(x)\cdot h}{\|h\|} = 0$$

We omit the non-trivial proof.

If we require differential properties on the entire domain C, not just at a point x ∈ C,
we can sharpen the last result by adding a fourth remarkable equivalent property.

Theorem 1468 Let f : C → ℝ be concave. The following properties are equivalent:

(i) f is partially derivable;

(ii) f is derivable;

(iii) f is differentiable;

(iv) f is continuously differentiable.

Thus, differentiable concave functions are actually continuously differentiable.

Proof Properties (i)-(iii) are equivalent by the last theorem. As (iv) implies (iii) by Theorem
1271, it remains to prove that (iii) implies (iv). So, let f be differentiable. We want to show
that it is continuously differentiable. We will actually prove that, at each x ∈ C,

$$f'(x;y) = \lim f'(x_n;y) \qquad \forall y\in\mathbb{R}^n \tag{31.31}$$

for all sequences {xₙ} ⊆ C that converge to x. To this end, first observe that, since f is
derivable on C, at each x ∈ C it holds

$$f'(x;y) = f'_-(x;y) = f'_+(x;y) \qquad \forall y\in\mathbb{R}^n$$

This key equality will be tacitly used throughout the proof. Let x ∈ C. Take {xₙ} ⊆ C such
that xₙ → x. Let k be a scalar such that f′₊(x;y) < k. Since the difference quotient (31.28)
decreases to f′₋(x;y) = f′₊(x;y) as h → 0⁻, there exists h < 0 close enough to 0 so that

$$\frac{f(x+hy)-f(x)}{h} < k$$

Since f is continuous (Theorem 833), for n large enough we have

$$\frac{f(x_n+hy)-f(x_n)}{h} < k$$

Since the difference quotient at xₙ is decreasing as well, and h < 0, by Proposition 1464 we get

$$f'_+(x_n;y) \le f'_-(x_n;y) \le \frac{f(x_n+hy)-f(x_n)}{h} < k$$

In turn, this implies

$$\limsup f'_+(x_n;y) \le k \tag{31.32}$$

Set kₘ = f′₊(x;y) + 1/m. By (31.32), lim sup f′₊(xₙ;y) ≤ kₘ for each m. Hence,

$$\limsup f'(x_n;y) = \limsup f'_+(x_n;y) \le \lim_{m\to\infty}k_m = f'_+(x;y) = f'(x;y)$$

By (31.27), applied at −y, we then have

$$\liminf f'(x_n;y) = \liminf f'_-(x_n;y) = \liminf\left(-f'_+(x_n;-y)\right) = -\limsup f'_+(x_n;-y) \ge -f'_+(x;-y) = f'_-(x;y) = f'(x;y)$$

We conclude that

$$\limsup f'(x_n;y) \le f'(x;y) \le \liminf f'(x_n;y)$$

that is,

$$\limsup f'(x_n;y) = f'(x;y) = \liminf f'(x_n;y)$$

In view of Proposition 412, this implies (31.31).

31.3.2 A key inequality

To state the multivariable version of the key inequality (31.9), we take a closer look at multivariable concavity. Intuitively, the concavity of a function f : C → ℝ defined on a convex
set of ℝⁿ is closely related to its concavity on all line segments {tx + (1−t)y : t ∈ [0,1]}
determined by vectors x and y that belong to C. Proposition 1470 will make precise this
intuition, which is important both conceptually, to better understand the scope of concavity,
and operationally, since the restrictions of f on line segments are scalar functions, in general
much easier to study than the original function f.

Given a convex set C and x, y ∈ C, set C_{x,y} = {t ∈ ℝ : (1−t)x + ty ∈ C}. That is, C_{x,y}
is the set of all t values such that (1−t)x + ty ∈ C. Clearly, [0,1] ⊆ C_{x,y}. Moreover, we
have the following property (under our maintained hypothesis that C is an open convex set),
as the reader can prove.

Lemma 1469 C_{x,y} is an open interval.

Define φ_{x,y} : C_{x,y} → ℝ by

$$\varphi_{x,y}(t) = f((1-t)x+ty) \tag{31.33}$$

Proposition 1470 For a function f : C → ℝ, the following properties are equivalent:

(i) f is (strictly) concave;

(ii) φ_{x,y} is (strictly) concave for all x, y ∈ C;

(iii) φ_{x,y} is (strictly) concave on [0,1] for all x, y ∈ C.

Proof We consider the concave case, and leave to the reader the strictly concave one. (i)
implies (ii). Suppose f is concave. Let x, y ∈ C and t₁, t₂ ∈ C_{x,y}. Then, for each λ ∈ [0,1],

$$\varphi_{x,y}(\lambda t_1+(1-\lambda)t_2) = f((1-(\lambda t_1+(1-\lambda)t_2))x + (\lambda t_1+(1-\lambda)t_2)y)$$
$$= f(\lambda((1-t_1)x+t_1y) + (1-\lambda)((1-t_2)x+t_2y))$$
$$\ge \lambda f((1-t_1)x+t_1y) + (1-\lambda)f((1-t_2)x+t_2y) = \lambda\varphi_{x,y}(t_1) + (1-\lambda)\varphi_{x,y}(t_2)$$

and so φ_{x,y} is concave.

Since (ii) trivially implies (iii), it remains to prove that (iii) implies (i). Let x, y ∈ C.
Since φ_{x,y} is concave on [0,1], we have

$$f((1-t)x+ty) = \varphi_{x,y}(t) \ge t\varphi_{x,y}(1) + (1-t)\varphi_{x,y}(0) = (1-t)f(x) + tf(y)$$

for all t ∈ [0,1], as desired.
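The next Python sketch (an illustration with an arbitrary concave quadratic, not from the text) checks concavity of f along a segment through the scalar section φ, in the spirit of the equivalence just proved.

```python
# Concavity of f on R^2 checked along a scalar section
# phi(t) = f((1-t)x + ty), as in Proposition 1470.
import numpy as np

f = lambda v: -v[0]**2 - v[1]**2 + v[0] * v[1]   # a concave quadratic
x, y = np.array([1.0, 0.0]), np.array([-2.0, 3.0])

t = np.linspace(0.0, 1.0, 201)
phi = np.array([f((1 - s) * x + s * y) for s in t])
# midpoint concavity on an equally spaced grid
print(np.all(phi[1:-1] >= (phi[:-2] + phi[2:]) / 2 - 1e-12))   # True
```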

The previous result permits us to establish the sought-after multivariable inequality.

Theorem 1471 Let f : C → ℝ be differentiable at x ∈ C. If f is concave, then

$$f(y) \le f(x) + \nabla f(x)\cdot(y-x) \qquad \forall y\in C \tag{31.34}$$

If f is strictly concave, the inequality is strict when x ≠ y.

Proof Let f be concave. Fix x, y ∈ C. Let φ_{x,y} : C_{x,y} → ℝ be given by (31.33). By Lemma
1469, C_{x,y} is an open interval, and by Proposition 1470 the function φ_{x,y} is concave on C_{x,y}.
Since f is differentiable at x, note that:¹⁴

$$\lim_{\varepsilon\to 0}\frac{\varphi(\varepsilon)-\varphi(0)}{\varepsilon} = \lim_{\varepsilon\to 0}\frac{f(x+\varepsilon(y-x))-f(x)}{\varepsilon}$$

where the latter limit exists and is finite. So, φ is derivable at 0 ∈ C_{x,y}. Since [0,1] ⊆
C_{x,y}, by (31.9) we have

$$\varphi(1) \le \varphi(0) + \varphi'(0)(1-0) = \varphi(0) + f'(x;y-x)$$

i.e., f(y) ≤ f(x) + ∇f(x)·(y−x) (Theorem 1287). So, the inequality (31.34) holds. We
leave to the reader the strictly concave part.

31.3.3 Concavity criteria

So far we have considered the differentiability properties of concave functions of several variables.
We now change angle and ask whether, given a differentiable function of several variables,
there exist some criteria based on differentiability that allow us to determine whether the
function is concave. For instance, is there a multivariable counterpart of the property of
decreasing monotonicity of the first derivative?

The key inequality (31.34) permits us to establish a first differential characterization of
concavity that extends Theorem 1433 to functions of several variables.

¹⁴To ease notation, in the rest of the proof we write φ in place of φ_{x,y}.

Theorem 1472 Let f : C → ℝ be differentiable. Then, f is concave if and only if

$$f(y) \le f(x) + \nabla f(x)\cdot(y-x) \qquad \forall x,y\in C \tag{31.35}$$

while f is strictly concave if and only if inequality (31.35) is strict when x ≠ y.

The right-hand side of (31.35) is the linear approximation of f at x; geometrically, it
is the hyperplane tangent to f at x, that is, the multivariable version of the tangent line.
By this theorem, such approximation is from above, that is, the tangent hyperplane always
lies above the graph of a concave function. The differential characterizations of concavity
discussed in the previous section for scalar functions thus nicely extend to functions of
several variables.

Proof The "only if" follows from (31.34). As to the converse, suppose that (31.35) holds.
For each x ∈ C, consider the function Fₓ : C → ℝ given by Fₓ(y) = f(x) + ∇f(x)·(y−x).
By (31.35), f(y) ≤ Fₓ(y) for all x, y ∈ C. Since Fₓ(x) = f(x), we conclude that f(y) =
min_{x∈C} Fₓ(y) for each y ∈ C. Since each Fₓ is affine, we conclude that f is concave since,
as the reader can check, a function that is a minimum of a family of concave functions is
concave. We leave to the reader the strictly concave part.
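Numerically, inequality (31.35) says that the tangent hyperplane lies above the graph; the following sketch (arbitrary concave quadratic, random sample points, not from the text) verifies this.

```python
# Checking the gradient inequality (31.35): for concave f the linear
# approximation at x dominates f everywhere.
import numpy as np

f = lambda v: -(v[0] - 1)**2 - 2 * (v[1] + 1)**2
grad = lambda v: np.array([-2 * (v[0] - 1), -4 * (v[1] + 1)])

rng = np.random.default_rng(3)
x = rng.normal(size=2)
ys = rng.normal(size=(1000, 2))
gaps = np.array([f(x) + grad(x) @ (y - x) - f(y) for y in ys])
print(gaps.min() >= 0)            # True: approximation from above
```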

Though conceptually important, the previous differential characterization of concavity
is less useful operationally. In this regard, the next result is more useful. It shows that
inner monotonicity is the multivariable counterpart of the decreasing monotonicity of the
first derivative that, in the scalar case, characterizes concavity (Corollary 1436). Note that
for functions of several variables the derivative function f′ becomes the derivative operator
∇f : C → ℝⁿ (Section 27.1.3).

Theorem 1473 Let f : C → ℝ be differentiable. Then,

(i) f is concave if and only if the derivative operator ∇f : C → ℝⁿ is inner decreasing,
i.e.,

$$(\nabla f(y)-\nabla f(x))\cdot(y-x) \le 0 \qquad \forall x,y\in C \tag{31.36}$$

(ii) f is strictly concave if and only if ∇f : C → ℝⁿ is strictly inner decreasing, i.e., the
previous inequality is strict when x ≠ y.

This result is a major motivation for the study of inner monotonicity. Yet, not all inner
monotone operators are gradient operators of concave functions: for instance, the operator
in Example 1457 is inner monotone but not a gradient operator, as the reader can check.

Proof (i) Suppose f is concave. Let x, y ∈ C. By (31.35),

$$f(y) \le f(x) + \nabla f(x)\cdot(y-x) \quad\text{and}\quad f(x) \le f(y) + \nabla f(y)\cdot(x-y)$$

So, ∇f(x)·(x−y) ≤ f(x) − f(y) ≤ ∇f(y)·(x−y). In turn, this implies
(∇f(x) − ∇f(y))·(x−y) ≤ 0, and we conclude that ∇f : C → ℝⁿ is inner decreasing.

Conversely, suppose ∇f : C → ℝⁿ is inner decreasing, i.e., (31.36) holds. Suppose first
that n = 1. Let x ∈ C, and define ψₓ : C → ℝ by ψₓ(y) = f(x) + ∇f(x)(y−x) − f(y).
Then ψ′ₓ(y) = ∇f(x) − ∇f(y), and so ψ′ₓ(y) ≤ 0 if y < x and ψ′ₓ(y) ≥ 0 if y > x. Hence,
ψₓ has a minimum at x, i.e.,

$$0 = \psi_x(x) \le \psi_x(y) = f(x) + \nabla f(x)(y-x) - f(y) \qquad \forall y\in C$$

Since x was arbitrary, we conclude that f(y) ≤ f(x) + ∇f(x)(y−x) for all x, y ∈ C. By
Theorem 1472, f is concave. This completes the proof for n = 1.

Suppose now that n > 1. Let x, y ∈ C and let φ_{x,y} : C_{x,y} → ℝ be given by (31.33). By
Lemma 1469, C_{x,y} is an open interval, with [0,1] ⊆ C_{x,y}. Then, φ_{x,y} is differentiable on C_{x,y},
with

$$\varphi'_{x,y}(t) = \nabla f((1-t)x+ty)\cdot(y-x) \qquad \forall t\in C_{x,y} \tag{31.37}$$

Let t₂ > t₁ in C_{x,y}. Since ∇f is inner decreasing and (1−t₁)x + t₁y − ((1−t₂)x + t₂y) =
(t₂−t₁)(x−y), we then have

$$0 \ge (\nabla f((1-t_1)x+t_1y) - \nabla f((1-t_2)x+t_2y))\cdot((1-t_1)x+t_1y-((1-t_2)x+t_2y))$$
$$= (t_2-t_1)(\nabla f((1-t_1)x+t_1y) - \nabla f((1-t_2)x+t_2y))\cdot(x-y)$$

and so, by (31.37),

$$0 \ge (\nabla f((1-t_1)x+t_1y) - \nabla f((1-t_2)x+t_2y))\cdot(x-y) = \varphi'_{x,y}(t_2) - \varphi'_{x,y}(t_1)$$

and we conclude that φ′_{x,y}(t₁) ≥ φ′_{x,y}(t₂), i.e., φ′_{x,y} is decreasing on C_{x,y}. By what was
already proved, φ_{x,y} is then concave, and so

$$f((1-t)x+ty) = \varphi_{x,y}(t) \ge (1-t)\varphi_{x,y}(0) + t\varphi_{x,y}(1) = (1-t)f(x) + tf(y)$$

This shows that f is concave.


(ii) Assume that f is strictly concave. Thus, f is concave and, by the previous part of the
proof,

$$(\nabla f(y)-\nabla f(x))\cdot(y-x) \le 0 \qquad \forall x,y\in C$$

Consider now x, y ∈ C such that x ≠ y. By Proposition 1470, φ_{x,y} is strictly concave and
differentiable on the open interval C_{x,y}. By Corollary 1436, φ′_{x,y} is strictly decreasing, and

$$\varphi'_{x,y}(t) = \nabla f((1-t)x+ty)\cdot(y-x) \qquad \forall t\in C_{x,y}$$

Since φ′_{x,y} is strictly decreasing, we have

$$(\nabla f(y)-\nabla f(x))\cdot(y-x) = \varphi'_{x,y}(1) - \varphi'_{x,y}(0) < 0$$

proving one implication. Assume now that (31.36) holds with strict inequality whenever
x ≠ y. Let x, y ∈ C be such that x ≠ y, and consider one more time the function
φ_{x,y} : C_{x,y} → ℝ. Again, φ_{x,y} is differentiable on C_{x,y} with

$$\varphi'_{x,y}(t) = \nabla f((1-t)x+ty)\cdot(y-x) \qquad \forall t\in C_{x,y}$$

Let t₂ > t₁ in C_{x,y}. By assumption, and since (1−t₁)x + t₁y ≠ (1−t₂)x + t₂y, we get

$$0 > (\nabla f((1-t_1)x+t_1y) - \nabla f((1-t_2)x+t_2y))\cdot((1-t_1)x+t_1y-((1-t_2)x+t_2y))$$
$$= (t_1-t_2)(\nabla f((1-t_1)x+t_1y) - \nabla f((1-t_2)x+t_2y))\cdot(y-x)$$

yielding

$$0 < (\nabla f((1-t_1)x+t_1y) - \nabla f((1-t_2)x+t_2y))\cdot(y-x) = \varphi'_{x,y}(t_1) - \varphi'_{x,y}(t_2)$$

and we conclude that φ′_{x,y}(t₂) < φ′_{x,y}(t₁), i.e., φ′_{x,y} is strictly decreasing on C_{x,y}. By
Corollary 1436 and Proposition 1470, φ_{x,y} is strictly concave, and so is f.

A dual result, with the opposite inequality, characterizes convex functions. The next result
makes truly operational this characterization via a condition of negativity on the Hessian
matrix ∇²f(x) of f (that is, the matrix of second partial derivatives of f), which generalizes
the condition f″(x) ≤ 0 of Corollary 1438. In other words, in the general case the role of
the second derivative is played by the Hessian matrix.

Proposition 1474 Let f : C → ℝ be twice continuously differentiable. Then:

(i) f is concave if and only if ∇²f(x) is negative semi-definite for every x ∈ C;

(ii) f is strictly concave if ∇²f(x) is negative definite for every x ∈ C.

Proof The result follows from Proposition 1459 once one remembers that the Hessian matrix
of a function of several variables is the Jacobian matrix of its derivative operator (Exercise
1292). So, the Hessian matrix ∇²f(x) of f is the Jacobian matrix of the derivative operator
∇f : C → ℝⁿ, which plays here the role of g in Proposition 1459.

This is the most useful differential criterion to establish concavity and strict concavity
for functions of several variables. Naturally, dual results hold for convex functions, which
are characterized by having positive semi-definite Hessian matrices.
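In practice the criterion is applied by checking the sign of the Hessian's eigenvalues; a minimal Python sketch (with an arbitrary concave function, not from the text) follows.

```python
# Hessian-based concavity check in the spirit of Proposition 1474:
# sample points and test negative semi-definiteness via the largest
# eigenvalue of the Hessian.
import numpy as np

def hessian(x):
    # Hessian of f(x1, x2) = log(x1) + log(x2), concave on the open orthant
    return np.diag([-1.0 / x[0]**2, -1.0 / x[1]**2])

rng = np.random.default_rng(4)
pts = rng.uniform(0.1, 5.0, size=(500, 2))
print(max(np.linalg.eigvalsh(hessian(p)).max() for p in pts) <= 0)  # True
```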

Example 1475 (i) In Example 1206 we considered the function f : ℝ³ → ℝ given by

$$f(x_1,x_2,x_3) = x_1^2 + 2x_2^2 + x_3^2 + (x_1+x_3)x_2$$

and we saw how its Hessian matrix was positive definite. By Proposition 1474, f is strictly
convex.

(ii) Consider the CES production function f : ℝ²₊ → ℝ defined by

$$f(x) = \left(\alpha x_1^{\rho} + (1-\alpha)x_2^{\rho}\right)^{\frac{1}{\rho}}$$

with α ∈ [0,1] and ρ > 0 (Example 865). Some tedious algebra shows that the Hessian
matrix is

$$\nabla^2 f(x) = \alpha(1-\alpha)(\rho-1)\,t^{\frac{1}{\rho}-2}\,x_1^{\rho-2}x_2^{\rho-2}\,H$$

where t = αx₁^ρ + (1−α)x₂^ρ and

$$H = \begin{bmatrix} x_2^2 & -x_1x_2 \\ -x_1x_2 & x_1^2 \end{bmatrix}$$

If ξ = (ξ₁, ξ₂), we have

$$\xi\cdot H\xi = x_2^2\xi_1^2 - 2x_1x_2\xi_1\xi_2 + x_1^2\xi_2^2 = (x_2\xi_1 - x_1\xi_2)^2 \ge 0$$

Thus, the matrix H is positive semi-definite. It follows that for ρ > 1 the matrix ∇²f(x)
is positive semi-definite for all x₁, x₂ > 0, so by Proposition 1474 f is convex, while f is
concave when 0 < ρ < 1.

In Corollary 880 we already established the concavity of the CES functions without doing
any calculation. Readers can compare the pros and cons of the two approaches. N
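The closed form of the Hessian just displayed can be cross-checked numerically; in the following sketch the parameter values are arbitrary, with 0 < ρ < 1 so that f should be concave.

```python
# Numerical cross-check of the CES Hessian of Example 1475-(ii):
# with 0 < rho < 1 the Hessian should be negative semi-definite.
import numpy as np

alpha, rho = 0.4, 0.5

def ces_hessian(x1, x2):
    t = alpha * x1**rho + (1 - alpha) * x2**rho
    scale = (alpha * (1 - alpha) * (rho - 1) * t**(1 / rho - 2)
             * x1**(rho - 2) * x2**(rho - 2))
    H = np.array([[x2**2, -x1 * x2],
                  [-x1 * x2, x1**2]])
    return scale * H

rng = np.random.default_rng(5)
pts = rng.uniform(0.2, 4.0, size=(200, 2))
top = max(np.linalg.eigvalsh(ces_hessian(a, b)).max() for a, b in pts)
print(top <= 1e-12)    # True: largest eigenvalue is (numerically) zero
```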

The next example considers an important class of functions and shows how fruitful it
can be to integrate the differential criteria of the last result with some ad hoc reasoning that
exploits the specific features of the functions at hand.

Example 1476 Given a square symmetric matrix A of order n, a vector b ∈ ℝⁿ and a scalar
c ∈ ℝ, define the linear-quadratic function f : ℝⁿ → ℝ by

$$f(x) = \frac{1}{2}x\cdot Ax + b\cdot x + c$$

It becomes a multivariable quadratic function when b = 0 and c = 0. Since ∇²f(x) = A
(why?), by Proposition 1474 the function f is concave if and only if A is negative semi-definite. It is also true that f is strictly concave if and only if A is negative definite. The "if"
follows again from Proposition 1474. As to the "only if", suppose by contradiction that f is
strictly concave and A is negative semi-definite but not negative definite. Thus, there is a
vector 0 ≠ x ∈ ℝⁿ such that x·Ax = 0, and so for all λ ∈ [0,1] we have

$$f(\lambda x+(1-\lambda)0) = f(\lambda x) = \lambda(b\cdot x) + c = \lambda(b\cdot x+c) + (1-\lambda)c = \lambda f(x) + (1-\lambda)f(0)$$

because f(0) = c and f(x) = b·x + c. This contradicts the strict concavity of f, as desired. N

31.4 Ultramodular functions

The monotonicity of the increments is a key economic characterization of concave and convex
scalar functions (Section 31.1.1). Unfortunately, such characterization no longer holds for
functions of several variables, as will be seen momentarily. This motivates the next definition.

Definition 1477 A function f : I ⊆ ℝⁿ → ℝ is said to be ultramodular if, for all x, y ∈ I
with x ≥ y and for all h ≥ 0, we have

$$f(x+h)-f(x) \ge f(y+h)-f(y)$$

provided x + h, y + h ∈ I, while it is said to be inframodular if the inequality is reversed.

In words, ultramodular functions exhibit increasing differences, so increasing marginal
effects, like scalar convex functions. Unlike the weaker Definition 933, they do not consider
such differences only across different variables, but consider any possible increase h ≥ 0.

Similarly, inframodular functions exhibit decreasing differences, so decreasing marginal
effects, like scalar concave functions (Proposition 1424). Clearly, f is ultramodular if and
only if −f is inframodular, so the two properties are dual and results stated for one are
easily translated for the other.

Ultramodular functions are supermodular. Indeed, in view of the equality (20.1), we can set
h = x∨y − y = x − x∧y ≥ 0. So, if f is ultramodular, we have

$$f(x)-f(x\wedge y) = f(x\wedge y+h)-f(x\wedge y) \le f(y+h)-f(y) = f(x\vee y)-f(y)$$

which implies that f is supermodular. The converse is false: for instance, the function
f(x₁,x₂) = √(x₁x₂) is supermodular but not ultramodular (Example 1482).

The next result further clarifies the relations between supermodularity and ultramodularity.

Theorem 1478 If f : [a,b] ⊆ ℝⁿ → ℝ is supermodular and separately convex,¹⁵ then it is
ultramodular. The converse holds provided f is locally bounded from below at a.

Proof Let f be ultramodular. Then, it is supermodular. It is easy to check that each section
f(·, x₋ᵢ) : [aᵢ,bᵢ] → ℝ is ultramodular. By Proposition 1424, the section f(·, x₋ᵢ) : [aᵢ,bᵢ] → ℝ
is convex. We omit the proof of the converse.

Earlier in the chapter we learned the remarkable differential properties of concave functions (Section 31.3). It is useful to compare them with those of inframodular functions, which
are also sharp (inframodular functions are, indeed, much better behaved than submodular
functions).¹⁶ A first important result is that, as for concave functions (Theorem 1467), also
for inframodular functions partial derivability is equivalent to differentiability.¹⁷

Proposition 1479 A bounded and inframodular function f : (a,b) ⊆ ℝⁿ → ℝ is partially
derivable if and only if it is differentiable.

Next, we consider a differential criterion for inframodularity.

Proposition 1480 A partially derivable f : (a,b) ⊆ ℝⁿ → ℝ is inframodular if and only if
the derivative operator ∇f : (a,b) ⊆ ℝⁿ → ℝⁿ is decreasing, i.e.,

$$x \ge y \implies \nabla f(x) \le \nabla f(y) \qquad \forall x,y\in(a,b)$$

Now the standard monotonicity of the derivative operator characterizes inframodularity,
while inner monotonicity characterized concavity (Theorem 1473).

Proposition 1481 A twice continuously differentiable f : (a,b) ⊆ ℝⁿ → ℝ is inframodular
if and only if the Hessian matrix ∇²f(x) is negative, i.e.,

$$\frac{\partial^2 f(x)}{\partial x_i\partial x_j} \le 0 \qquad \forall i,j=1,...,n$$
¹⁵That is, each section f(·, x₋ᵢ) : [aᵢ,bᵢ] → ℝ is convex in xᵢ.
¹⁶We omit the proofs of these differentiability results (their inframodular, rather than ultramodular,
formulation is self-explanatory).
¹⁷In reading the result, recall from Section 2.3 that (a,b) = {x ∈ ℝⁿ : aᵢ < xᵢ < bᵢ}.

Again, a standard negativity condition on the Hessian matrix characterizes inframodularity, while for concave functions we needed a negativity notion based on quadratic forms
(Proposition 1474). Note that submodularity requires this negativity property only when
i ≠ j. This differential characterization thus sheds further light on the relations between
sub/supermodularity and infra/ultramodularity.

The differential characterizations established in the last two results show that, unlike the
scalar case, inframodularity and concavity are quite unrelated properties in the multivariable
case, as we remarked at the beginning of this section.
Example 1482 (i) Define f : ℝ²₊ → ℝ by f(x₁,x₂) = x₁^{α₁}x₂^{α₂}, with α₁, α₂ > 0. This
function is supermodular (Example 937). Its Hessian matrix is

$$\nabla^2 f(x) = \begin{bmatrix} \alpha_1(\alpha_1-1)x_1^{\alpha_1-2}x_2^{\alpha_2} & \alpha_1\alpha_2 x_1^{\alpha_1-1}x_2^{\alpha_2-1} \\ \alpha_1\alpha_2 x_1^{\alpha_1-1}x_2^{\alpha_2-1} & \alpha_2(\alpha_2-1)x_1^{\alpha_1}x_2^{\alpha_2-2} \end{bmatrix}$$

So, f is ultramodular if and only if α₁, α₂ ≥ 1.

(ii) In view of the previous point, the concave and supermodular function f : ℝ²₊ → ℝ
defined by f(x₁,x₂) = √(x₁x₂) is neither ultramodular nor inframodular.

(iii) The convex function f : ℝ² → ℝ defined by f(x₁,x₂) = log(e^{x₁} + e^{x₂}) is neither
ultramodular nor inframodular: its Hessian matrix is

$$\nabla^2 f(x) = \frac{e^{x_1+x_2}}{(e^{x_1}+e^{x_2})^2}\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$$

whose diagonal terms are positive and whose off-diagonal terms are negative. This function
is, however, submodular. N

31.5 Global optimization

31.5.1 Sufficiency of the first-order condition

Though the first-order condition is in general only necessary, in Section 22.6 we saw that
the maximizers of concave functions are necessarily global (Fenchel's Theorem). We may
then expect that for concave functions the first-order condition may come to play a decisive
role. Indeed, the results studied in this chapter allow us to show that for concave functions
the first-order condition is also sufficient. In other words, a stationary point of a concave
function is, necessarily, a global maximizer. It is a truly remarkable property of concave
functions, a main reason behind their popularity.

To ease matters, we start by considering a scalar concave function f : (a,b) → ℝ that is
differentiable. The inequality (31.10), that is,

$$f(y) \le f(x) + f'(x)(y-x) \qquad \forall x,y\in(a,b)$$

implies that a point x̂ ∈ (a,b) is a global maximizer if f′(x̂) = 0. Indeed, if x̂ ∈ (a,b) is such
that f′(x̂) = 0, the inequality implies

$$f(y) \le f(\hat{x}) + f'(\hat{x})(y-\hat{x}) = f(\hat{x}) \qquad \forall y\in(a,b)$$

On the other hand, if x̂ ∈ (a,b) is a maximizer, it follows that f′(x̂) = 0 by Fermat's
Theorem. Therefore:

Proposition 1483 Let f : (a,b) → ℝ be a concave and differentiable function. A point
x̂ ∈ (a,b) is a global maximizer of f on (a,b) if and only if f′(x̂) = 0.

Example 1484 (i) Consider the function f : ℝ → ℝ given by f(x) = −(x+1)⁴ + 2. We have
f″(x) = −12(x+1)² ≤ 0. The function is concave on ℝ and it is therefore sufficient to find
a point where its first derivative is zero to find a maximizer. We have f′(x) = −4(x+1)³.
Hence f′ is zero only at x̂ = −1. The point x̂ = −1 is the unique global maximizer, and the
maximum value of f on ℝ is f(−1) = 2.

(ii) Consider the function f : ℝ → ℝ given by f(x) = x(1−x). Because f′(1/2) = 0
and f″(x) = −2 < 0, the point x̂ = 1/2 is the unique global maximizer of f on ℝ. N

The last proposition easily extends to functions f : A ⊆ ℝⁿ → ℝ of several variables using
the multivariable version (31.35) of inequality (31.10). We can therefore state the following
general result.

Theorem 1485 Let f : C → ℝ be a concave function differentiable on int C and continuous
on C. A point x̂ ∈ int C is a global maximizer of f on C if and only if ∇f(x̂) = 0.

Proof In view of Fermat's Theorem, we need to prove the "if" part, that is, sufficiency. So,
let x̂ ∈ int C be such that ∇f(x̂) = 0. We want to show that x̂ is a global maximizer. By
inequality (31.34), we have

$$f(y) \le f(\hat{x}) + \nabla f(\hat{x})\cdot(y-\hat{x}) \qquad \forall y\in \operatorname{int} C$$

Since f is continuous, the inequality is easily seen to hold for all y ∈ C. Since ∇f(x̂) = 0,
we conclude that f(y) ≤ f(x̂) for all y ∈ C, as desired.

It is hard to overestimate the importance of this result in optimization theory, as we will
learn later in the book in Section 37.

Example 1486 Consider the function f : ℝ² → ℝ given by f(x₁,x₂) = −(x₁−1)² − (x₂+3)² − 6.
We have

$$\nabla^2 f(x_1,x_2) = \begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix}$$

Since −2 < 0 and det ∇²f(x₁,x₂) = 4 > 0, the Hessian matrix is negative definite for every
(x₁,x₂) ∈ ℝ² and hence f is strictly concave. We have

$$\nabla f(x_1,x_2) = \left(-2(x_1-1),\,-2(x_2+3)\right)$$

The unique point where the gradient is zero is (1,−3), which is, therefore, the unique global
maximizer. The maximum value of f on ℝ² is f(1,−3) = −6. N

Example 1487 In Section 22.10 we considered the least squares optimization problem

$$\max_x g(x) \quad\text{sub } x\in\mathbb{R}^n \tag{31.38}$$

with g : ℝⁿ → ℝ defined by g(x) = −‖Ax − b‖². We learned that if ρ(A) = n, then
there is a unique solution x̂ (Theorem 1057). In Section 24.4 we then noted, via the Projection Theorem, that such solution is given by x̂ = (AᵀA)⁻¹Aᵀb. This can be established
also from Theorem 1485. Indeed, ∇g(x) = −2Aᵀ(Ax − b), and so the first-order condition
−2Aᵀ(Ax − b) = 0 can be written as a linear system

$$A^TAx = A^Tb \tag{31.39}$$

Since ρ(A) = n, by Proposition 692 we have ρ(AᵀA) = n, so the Gram matrix is invertible.
By Cramer's Theorem, x̂ = (AᵀA)⁻¹Aᵀb is the unique solution of the linear system (31.39),
so by Theorem 1485 the only solution of the optimization problem (31.38). N
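The following sketch verifies this numerically on randomly generated data; note that numpy's least squares routine solves the same problem directly.

```python
# Example 1487 in code: the stationary point of g(x) = -||Ax - b||^2
# solves the normal equations A^T A x = A^T b, which np.linalg.lstsq
# computes directly.
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(8, 3))      # rank 3 with probability one
b = rng.normal(size=8)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_normal, x_lstsq))   # True
```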

Example 1488 Let A be a square and negative definite symmetric matrix of order n. Consider the quadratic optimization problem

$$\max_x f(x) \quad\text{sub } x\in\mathbb{R}^n$$

where f : ℝⁿ → ℝ is the linear-quadratic function defined by

$$f(x) = \frac{1}{2}x\cdot Ax + b\cdot x + c$$

with b ∈ ℝⁿ and c ∈ ℝ. This function is strictly concave because A is negative definite
(cf. Example 1476). So, there is at most one maximizer (Theorem 1032). Since
∇f(x) = Ax + b (why?), the first-order condition is the linear system

$$Ax + b = 0$$

Since A is invertible (cf. Proposition 1195), by Theorem 1485 we conclude that the unique
maximizer is x̂ = −A⁻¹b. N
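In code (with an arbitrary negative definite A), the computation looks as follows; the random trial points merely illustrate global optimality.

```python
# Example 1488 in code: with A symmetric negative definite, the unique
# maximizer of (1/2) x.Ax + b.x + c is x = -A^{-1} b.
import numpy as np

A = np.array([[-3.0, 1.0],
              [1.0, -2.0]])          # symmetric negative definite
b = np.array([1.0, -1.0])
c = 0.5

f = lambda x: 0.5 * x @ A @ x + b @ x + c
x_hat = np.linalg.solve(A, -b)       # first-order condition Ax + b = 0

rng = np.random.default_rng(7)
trials = rng.normal(scale=3.0, size=(1000, 2)) + x_hat
print(all(f(x_hat) >= f(x) for x in trials))   # True: global maximizer
```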

These examples show, inter alia, that to optimize quadratic functions amounts to solving
linear equations, the simplest and best understood class of equations. This explains the
popularity of quadratic optimization problems.

We close by noting that for scalar functions f : (a,b) → ℝ, where C = (a,b), the last
theorem also follows from Proposition 1343. That said, the last theorem is the result used in
applications because of the conceptual and analytical appeal of concavity (cf. the discussion
that ends Section 28.5.4).

31.5.2 A deeper result

A function f : C → ℝ defined on a convex set of ℝⁿ is called (weakly) concavifiable if there
exists a concave function g : C → ℝ that dominates f, that is, g ≥ f. As the examples below
will show, concavifiability is a much weaker condition than concavity.

Proposition 1489 Let f : C → ℝ be concavifiable. Then there exists a concave function
co f : C → ℝ such that

(i) co f ≥ f;

(ii) h ≥ co f for all concave functions h : C → ℝ such that h ≥ f.

In words, a concavifiable function admits a smallest concave function that pointwise
dominates it.

Proof Let {gᵢ}_{i∈I} be the collection of all concave functions gᵢ : C → ℝ such that gᵢ ≥ f.
This collection is not empty because f is concavifiable. Define co f : C → ℝ by

$$\operatorname{co} f(x) = \inf_{i\in I} g_i(x)$$

For each x ∈ C, the scalar f(x) is a lower bound for the set {gᵢ(x) : i ∈ I}. By the Least
Upper Bound Principle, inf_{i∈I} gᵢ(x) exists, so the function co f is well-defined. It is easily
seen to be concave. Indeed, let λ ∈ [0,1] and x, y ∈ C. By Proposition 127, for each ε > 0
there exists i_ε such that

$$\operatorname{co} f(\lambda x+(1-\lambda)y) > g_{i_\varepsilon}(\lambda x+(1-\lambda)y)-\varepsilon \ge \lambda g_{i_\varepsilon}(x)+(1-\lambda)g_{i_\varepsilon}(y)-\varepsilon \ge \lambda\operatorname{co} f(x)+(1-\lambda)\operatorname{co} f(y)-\varepsilon$$

Since this inequality holds for every ε > 0, we conclude that co f(λx + (1−λ)y) ≥ λ co f(x) +
(1−λ) co f(y), so co f is concave. In turn, this implies that co f(x) = min_{i∈I} gᵢ(x). In particular, co f satisfies properties (i) and (ii).

The function co f is called the concave envelope of f.

Example 1490 (i) Both the sine and cosine functions are concavifiable. Their concave
envelope is constant to 1, i.e., co sin(x) = co cos(x) = 1 for all x ∈ ℝ. (ii) Let f : ℝ → ℝ be
the Gaussian function f(x) = e^{−x²}. It is concavifiable, with

$$\operatorname{co} f(x) = \begin{cases} f(x) & x\in\left[-\frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}}\right] \\ e^{-\frac{1}{2}} & \text{else} \end{cases}$$

(iii) The quadratic function is not concavifiable on the real line. (iv) Functions that have
at least a global maximizer are automatically concavifiable: just take the function constant
to the maximum value. For instance, continuous supercoercive functions f : ℝⁿ → ℝ are
concavifiable. N

Concavifiability permits us to generalize the fundamental Theorem 1485.

Theorem 1491 Let f : C → ℝ be a concavifiable function differentiable on int C and
continuous on C. A point x̂ ∈ int C is a global maximizer of f on C if and only if ∇f(x̂) = 0
and co f(x̂) = f(x̂).

This remarkable result shows how concavity is deeply connected to global maximization,
more than it may appear prima facie. It is a result, however, mostly of theoretical interest
because concave envelopes are, in general, not easy to compute. Indeed, Theorem 1485 can
be regarded as its operational special case.

The proof relies on two elegant lemmas of independent interest.

Lemma 1492 Let f, g : C → ℝ with g concave and g ≥ f. If f is differentiable at x ∈ int C
and if g(x) = f(x), then g is differentiable at x with ∇f(x) = ∇g(x).
Proof Assume that f is differentiable at x ∈ int C. Since g ≥ f and g(x) = f(x), we have,
for h small enough,

$$\frac{g(x+hy)-g(x)}{h} \ge \frac{f(x+hy)-f(x)}{h} \quad \forall h>0, \qquad \frac{g(x+hy)-g(x)}{h} \le \frac{f(x+hy)-f(x)}{h} \quad \forall h<0$$

So, for all y ∈ ℝⁿ we have:

$$g'_+(x;y) = \lim_{h\to 0^+}\frac{g(x+hy)-g(x)}{h} \ge \lim_{h\to 0^+}\frac{f(x+hy)-f(x)}{h} = \lim_{h\to 0^-}\frac{f(x+hy)-f(x)}{h} \ge \lim_{h\to 0^-}\frac{g(x+hy)-g(x)}{h} = g'_-(x;y)$$

By Proposition 1464-(iv), we conclude that g′₊(x;·) = g′₋(x;·) = f′(x;·). By Corollary
1466, we conclude that g is differentiable at x, as well as that ∇f(x) = ∇g(x).

Lemma 1493 Let f : C → ℝ be concavifiable. If a point x̂ of C is a global maximizer of f
on C, then it is a global maximizer of co f on C. In particular, co f(x̂) = f(x̂).

In words, global maximizers of concavifiable functions are global maximizers of their
concave envelopes, and they share the same maximum value.

Proof Let x̂ ∈ C be a global maximizer of f on C. The function constant to f(x̂) is a
concave function that pointwise dominates f. So,

$$f(\hat{x}) \ge \operatorname{co} f(x) \qquad \forall x\in C \tag{31.40}$$

In particular, we then have f(x̂) ≥ co f(x̂) ≥ f(x̂), thus co f(x̂) = f(x̂). In view of (31.40),
in turn this implies that co f(x̂) ≥ co f(x) for all x ∈ C, so x̂ is a global maximizer of co f
on C.

Proof of Theorem 1491 "If". By hypothesis, f is differentiable at x̂ ∈ int C. Since
co f(x̂) = f(x̂), by Lemma 1492 the concave envelope is differentiable at x̂ with ∇co f(x̂) =
∇f(x̂). So, ∇co f(x̂) = 0. Since f is continuous, by proceeding as in the proof of Theorem
1485 we can show that inequality (31.34) implies that x̂ is a global maximizer of co f. Hence,

$$f(\hat{x}) = \operatorname{co} f(\hat{x}) \ge \operatorname{co} f(x) \ge f(x) \qquad \forall x\in C$$

We conclude that x̂ is a global maximizer of f.

"Only if". Let x̂ ∈ int C be a global maximizer of f on C. By Lemma 1493, x̂ is a
global maximizer of co f on C, with co f(x̂) = f(x̂). By Lemma 1492, co f is differentiable
at x̂ with ∇co f(x̂) = ∇f(x̂). By Fermat's Theorem, ∇co f(x̂) = 0. We conclude that
∇f(x̂) = 0.

In view of Lemma 1493, in optimization problems with convex choice sets (e.g., consumer
problems, since budget sets are convex) in terms of value attainment one can assume that
the objective function is concave. Thus, if in such problems we are interested only in the
value functions, without any loss we can just deal with concave objective functions.

This is no longer the case, however, if we are interested also in the solutions per se, i.e.,
in the solution correspondence. Indeed, in this regard Lemma 1493 says only that

$$\arg\max_{x\in C} f(x) \subseteq \arg\max_{x\in C} \operatorname{co} f(x)$$

So, by replacing an objective function with its concave envelope we do not lose solutions,
but we might well get intruders that solve the concavified problem but not the original one.
To understand the scope of this issue, note that co(arg max_{x∈C} f(x)) ⊆ arg max_{x∈C} co f(x),
where co(·) denotes here the convex hull, because the solutions of a concave objective function
form a convex set. Thus, the best one can hope is that

$$\operatorname{co}\left(\arg\max_{x\in C} f(x)\right) = \arg\max_{x\in C} \operatorname{co} f(x)$$

Even in such best case, there might well be many vectors that solve the optimization problem
for the concave envelope co f but not for the original objective function f. We thus might
end up overestimating the solution correspondence. For instance, if in a consumer problem
we replace a utility function with its concave envelope, we do not lose any optimal bundle
but we might well get "extraneous" bundles, optimal for the concave envelope but not for
the original utility function. For an analytical example, if we maximize the cosine function
over the real line, the maximizers are the points x̂ = 2kπ with k ∈ ℤ (Example 976). If we
replace the cosine function with its concave envelope, the maximizers become all the points
of the real line. So, the solution set is vastly inflated. Still, the common maximum value is 1.

A final remark. There is a dual notion of convex envelope of a function, as the largest
dominated convex function, relevant for minimization problems (the reader can establish the
dual version of Theorem 1491).

31.5.3 A fundamental logarithmic inequality

As a dividend of the results of Section 31.5.1, next we prove an important logarithmic
inequality.

Proposition 1494 For all x > 0 it holds

$$\frac{x-1}{x} \le \log x \le x-1 \tag{31.41}$$

with equality if and only if x = 1.

This inequality is sometimes stated in the equivalent form x/(1+x) ≤ log(x+1) ≤ x
for all x > −1.

Proof Define the auxiliary convex function f : (0,∞) → ℝ by f(x) = x − 1 − log x. We
have

$$f'(x) = 1-\frac{1}{x} = 0 \iff x=1$$

By the dual version for convex functions of Proposition 1483, we conclude that x̂ = 1 is the
global minimizer of f (the fact that its domain is an unbounded interval is immaterial, as
the reader can easily check). So, f(x) ≥ f(1) = 0 for all x > 0, which implies log x ≤ x − 1
for all x > 0. Now, by applying this inequality to 1/x we get log(1/x) ≤ (1/x) − 1 for all
x > 0, which implies (x−1)/x ≤ log x. This completes the proof of (31.41), as we leave to
the reader to check the equality part.
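The bounds are easy to check numerically, as in the following short sketch.

```python
# A quick numerical check of the bounds (31.41) on a grid of x > 0.
import numpy as np

x = np.linspace(0.01, 10.0, 2000)
lower, upper = (x - 1) / x, x - 1
print(np.all(lower <= np.log(x)) and np.all(np.log(x) <= upper))  # True
```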

We can sharpen the last inequality through the following finding of Jovan Karamata: for
all x > 0 with x ≠ 1 it holds

$$\frac{\log x}{x-1} \le \frac{1}{\sqrt{x}}$$

This inequality permits us to refine (31.41) once we consider separately the two cases when
x is greater or lower than 1. Indeed, for 0 < x ≤ 1 we have

$$\frac{x-1}{\sqrt{x}} \le \log x \le x-1 \tag{31.42}$$

while for x ≥ 1 we have

$$\frac{x-1}{x} \le \log x \le \frac{x-1}{\sqrt{x}} \tag{31.43}$$

At the cost of a slightly more complicated expression, with two cases to consider, we thus
get sharper bounds for the logarithm. Next we illustrate them.

Example 1495 (i) Define f : ℝ₊ → ℝ by f(x) = x log x with the convention 0 log 0 = 0,
that is, set

$$f(x) = \begin{cases} x\log x & \text{if } x\neq 0 \\ 0 & \text{if } x=0 \end{cases} \tag{31.44}$$

This function is continuous and strictly convex. As to continuity, observe that f is continuous
at each x₀ > 0 because on (0,∞) it is the product of the continuous functions x and log x.
To check that it is continuous also at the origin, let {xₙ} be a sequence of positive scalars
that converges to 0. Without loss of generality, assume that 0 ≤ xₙ ≤ 1 for all n ≥ 1. By
the inequality (31.42) we have, for each n ≥ 1, the sandwich

$$\sqrt{x_n}(x_n-1) \le x_n\log x_n \le x_n(x_n-1)$$

So, lim f(xₙ) = 0 = f(0). By the sequential characterization of continuity (Proposition
552), we conclude that f is continuous at 0.

The function f is also strictly convex. Indeed, it is strictly convex on (0,∞) because
f″(x) = 1/x > 0 for all x > 0 (the dual of Corollary 1438). Since it is continuous, f is then
strictly convex on [0,∞) (why?).

(ii) Define F : ℝⁿ₊ → ℝ by F(x) = Σᵢ₌₁ⁿ xᵢ log xᵢ with the convention 0 log 0 = 0, that
is, set F(x) = Σᵢ₌₁ⁿ f(xᵢ), where f : ℝ₊ → ℝ is given by (31.44). This function extends
the entropy (Example 239) to the entire positive orthant. Since f is continuous, also F
is continuous (cf. Example 558-(iii)). Similarly, F inherits the strict convexity of f. We
conclude that the entropy function defined on ℝⁿ₊ is continuous and strictly convex.

(iii) Inequalities (31.42) and (31.43) readily imply that, for x > 0,

$$|\log x| \le \frac{|x-1|}{\sqrt{x}}$$

Though less sharp, this simple inequality may suffice in some applications. N


Karamata's logarithmic inequality is part of the following nice result of 1949,¹⁸ with
third and fourth order sharpenings of Karamata's inequality that are, however, increasingly
complex.

Theorem 1496 (Blanusa-Karamata) For all x > 0 with x ≠ 1 it holds

$$\frac{\log x}{x-1} \le \frac{7+16x^{\frac{1}{4}}+7x^{\frac{1}{2}}}{7x^{\frac{1}{4}}-x^{\frac{1}{2}}+18x^{\frac{3}{4}}-x+7x^{\frac{5}{4}}} \le \frac{1+x^{\frac{1}{3}}}{x+x^{\frac{1}{3}}} \le \frac{1}{\sqrt{x}}$$

31.6 Coda: strong concavity

In this final coda section we introduce a strong form of concavity that will turn out to have
remarkable optimality properties.

Definition 1497 A function f : C → ℝ defined on a convex set of ℝⁿ is said to be strongly
concave if there exists k > 0 such that the function g : C → ℝ defined by

$$g(x) = f(x) + k\|x\|^2$$

is concave.

The next result shows the strength of this notion of concavity.

Proposition 1498 Strongly concave functions are strictly concave.

So,

strong concavity ⟹ strict concavity ⟹ concavity

Intuitively, a strongly concave function is "so concave" that it remains concave even when
added to the quadratic, so strictly convex, function k‖x‖². Note that the sum of a concave
function and of a strongly concave function is strongly concave, so there is a simple way to
construct strongly concave functions.

Proof Let f : C → ℝ be strongly concave. By definition, there exists k > 0 such that the
function g : C → ℝ defined by g(x) = f(x) + k‖x‖² is concave. Let x, y ∈ C, with x ≠ y,
and λ ∈ (0,1). Since ‖x‖² = Σᵢ₌₁ⁿ xᵢ² is strictly convex, we have

$$f(\lambda x+(1-\lambda)y) = g(\lambda x+(1-\lambda)y) - k\|\lambda x+(1-\lambda)y\|^2$$
$$> \lambda g(x)+(1-\lambda)g(y) - k\left(\lambda\|x\|^2+(1-\lambda)\|y\|^2\right) = \lambda f(x)+(1-\lambda)f(y)$$

as desired.

Strong concavity is, thus, a strong version of strict concavity. The next result shows the
great interest of such stronger version.

¹⁸Reported on pp. 77-78 (Karamata) and 156-157 (Blanusa) of the 1949 issue of the Bull. Soc. Math.
Phys. Serbie (see also Mitrinovic, 1970, p. 272).

Proposition 1499 Let f : C ! R be strongly concave and upper semicontinuous on a closed


convex set of Rn . Then, f is coercive (supercoercive when C = Rn ).

In Example 1010 we showed that the function f (x) = 1 x2 is coercive. Since this
function is easily seen to be strongly concave, the example can be now seen as an illustration
of the proposition just stated.
The proof relies on a lemma of independent interest.

Lemma 1500 An upper semicontinuous and concave function f : C → ℝ admits a dominating affine function r : C → ℝ, i.e., r ≥ f.

Proof Since f is concave and upper semicontinuous, the convex set hypo f is closed. For,
let {(xₙ,tₙ)} ⊆ hypo f be such that (xₙ,tₙ) → (x,t) ∈ ℝⁿ⁺¹. We need to show that (x,t) ∈
hypo f. By definition, tₙ ≤ f(xₙ) for each n ≥ 1, so t = lim tₙ ≤ lim sup f(xₙ) ≤ f(x)
because f is upper semicontinuous. This shows that (x,t) ∈ hypo f.

Let (x₀,t₀) ∉ hypo f, with x₀ ∈ C and t₀ > f(x₀). By Proposition 1025, hypo f and
(x₀,t₀) are strongly separated, that is, there exist (a,c) ∈ ℝⁿ⁺¹, b ∈ ℝ and ε > 0 such that

$$a\cdot x_0 + ct_0 \ge b+\varepsilon > b \ge a\cdot x + ct \qquad \forall (x,t)\in \operatorname{hypo} f \tag{31.45}$$

We have c > 0. For, suppose that c = 0. Then a·x₀ ≥ b + ε > b ≥ a·x for all x ∈ C, so
in particular a·x₀ > a·x₀ by taking x = x₀, a contradiction. Next, suppose c < 0. Again,
by taking x = x₀ and t = f(x₀), from (31.45) it follows that ct₀ > cf(x₀). So t₀ < f(x₀),
which contradicts t₀ > f(x₀).

In sum, c > 0. Without loss of generality, set c = 1. Define the affine function r : C → ℝ
by r(x) = a·(x₀ − x) + t₀. By (31.45), we then have r(x) ≥ t for all (x,t) ∈ hypo f. In
particular, this is the case for (x, f(x)) for all x ∈ C, so r(x) ≥ f(x) for all x ∈ C. We
conclude that r is the sought-after affine function.

Proof of Proposition 1499 We first show that every upper contour set (f ≥ t) is bounded.
If C is bounded, then the statement is trivially true, (f ≥ t) being a subset of C. Otherwise,
suppose, by contradiction, that there exists an unbounded sequence {xₙ} ⊆ (f ≥ t), i.e.,
such that ‖xₙ‖ → +∞. Since f is strongly concave, there exists k > 0 such that the
function g(x) = f(x) + k‖x‖² is concave. Since f is upper semicontinuous, so is g. Since
g is concave and upper semicontinuous, by the previous lemma there is an affine function
r : C → ℝ, with r(x) = a·x + b for some a ∈ ℝⁿ and b ∈ ℝ, such that r ≥ g. This implies
that a·xₙ + b ≥ f(xₙ) + k‖xₙ‖² for all n. By the Cauchy-Schwarz inequality we have
a·xₙ ≤ ‖a‖‖xₙ‖ for all n, so

$$t \le f(x_n) \le a\cdot x_n + b - k\|x_n\|^2 \le \|a\|\|x_n\| + b - k\|x_n\|^2 = b - \|x_n\|(k\|x_n\|-\|a\|)$$

Then f(xₙ) → −∞ as ‖xₙ‖ → +∞ because ‖xₙ‖(k‖xₙ‖ − ‖a‖) → +∞ as ‖xₙ‖ → +∞.
But this contradicts f(xₙ) ≥ t for all n. We conclude that (f ≥ t) is bounded. Since f is
upper semicontinuous and C is closed, the set (f ≥ t) is also closed (Proposition 1074), so
compact. This proves that f is coercive. Finally, since we proved that f(xₙ) → −∞ as
‖xₙ‖ → +∞, when C = ℝⁿ the function f is supercoercive.

Along with Tonelli's Theorem (Theorem 1080), this proposition readily implies the following remarkable unique existence result that combines the best of the two worlds of coercivity
and concavity: coercivity ensures that a maximizer exists, strict concavity guarantees
uniqueness.

Theorem 1501 Let f : C → ℝ be strongly concave and upper semicontinuous on a closed
convex set of ℝⁿ. Then, f has a unique maximizer in C, that is, there exists a unique x̂ ∈ C
such that f(x̂) = max_{x∈C} f(x).

In view of this remarkable result, one may wonder whether there are strong concavity
criteria. The next result shows that this is, indeed, the case.¹⁹

¹⁹A "first-order" differential criterion based on derivative operators will be presented later in the book in
Section 43.3.

Proposition 1502 A twice continuously differentiable function f : C → ℝ defined on an
open convex set of ℝⁿ is strongly concave if and only if there exists c < 0 such that the matrix
∇²f(x) − cI is negative semi-definite, i.e.,

$$y\cdot\nabla^2 f(x)y \le c\|y\|^2 \tag{31.46}$$

for all x ∈ C and all y ∈ ℝⁿ.

In particular, a twice differentiable scalar function f is strongly concave if and only if
there exists c < 0 such that f″(x) ≤ c < 0 for all x ∈ C. In words, strong concavity amounts
to a uniformly strictly negative second derivative.

Proof "Only if". Since f is strongly concave, there exists k > 0 such that the function
g(x) = f(x) + k‖x‖² is concave. Since f is twice continuously differentiable, so is g. Since g
is concave, we have y·∇²g(x)y ≤ 0 for all x ∈ C and all y ∈ ℝⁿ (Proposition 1474). Some
simple algebra shows that ∇²g(x) = ∇²f(x) + 2kI for all x ∈ C, where I is the identity
matrix of order n (note that ‖x‖² = x·Ix). By setting c = −2k, this proves the implication.

"If". Set k = −c/2 > 0. Define g to be such that g(x) = f(x) + k‖x‖². Since f is twice
continuously differentiable, so is g, and ∇²g(x) = ∇²f(x) + 2kI = ∇²f(x) − cI for all x ∈ C.
Since ∇²f(x) − cI is negative semi-definite for all x ∈ C, so is ∇²g(x). By Proposition 1474,
it follows that g is concave, yielding that f is strongly concave.
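The following sketch illustrates the criterion on an arbitrary example (ours, not from the text): f(x) = −‖x‖² + sin x₁ satisfies (31.46) with c = −1, since the second derivative of sin x₁ is bounded by one in absolute value.

```python
# Checking the strong-concavity criterion (31.46): the shifted Hessian
# H(x) - cI must stay negative semi-definite for some fixed c < 0.
import numpy as np

def hessian(x):
    H = -2.0 * np.eye(2)            # Hessian of -||x||^2
    H[0, 0] += -np.sin(x[0])        # second derivative of sin(x1)
    return H

c = -1.0                            # works because |sin| <= 1
rng = np.random.default_rng(8)
pts = rng.normal(scale=3.0, size=(300, 2))
ok = all(np.linalg.eigvalsh(hessian(p) - c * np.eye(2)).max() <= 1e-12
         for p in pts)
print(ok)                           # True
```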

Needless to say, there is a dual notion of strong convexity: a function f : C → ℝ defined
on a convex set of ℝⁿ is said to be strongly convex if there exists k > 0 such that the function
g : C → ℝ defined by g(x) = f(x) − k‖x‖² is convex. Since f is strongly convex if and only
if −f is strongly concave, readers can check that dual versions of the results of this section
hold for strongly convex functions.

31.7 Ultracoda: projections on convex sets


A nice application of strong concavity is a far-reaching generalization of the Projection
Theorem for closed convex sets.
¹⁹ A "first-order" differential criterion based on derivative operators will be presented later in the book in Section 43.3.

Theorem 1503 (Projection Theorem) Let C be a closed and convex set of R^n. For every x ∈ R^n, the optimization problem

min_y ‖x − y‖  sub  y ∈ C   (31.47)

has a unique solution m ∈ C, characterized by the condition

(x − m)·(m − y) ≥ 0  ∀y ∈ C   (31.48)

The solution m of the minimization problem (31.47) is called the projection of x onto C.


We can de ne an operator PC : Rn ! Rn , called projection, that associates to each vector
x 2 Rn its projection PC (x) 2 C.
This notion of projection generalizes the one studied earlier in the book (Section 24.2)
because this version of the Projection Theorem generalizes the earlier one for vector sub-
spaces. Indeed, the next simple result shows that, when C is a vector subspace, condition
(31.48) reduces to the orthogonality of the error { i.e., to the condition (x m) ?C { that
characterized the solution of the earlier version of the Projection Theorem.

Proposition 1504 If C is a vector subspace and m ∈ C, condition (31.48) is equivalent to (x − m) ⟂ C.

Proof Let C be a vector subspace. By taking y = 0 and y = 2m, condition (31.48) is easily seen to imply (x − m)·m = 0. So, 0 ≤ (x − m)·(m − y) = −(x − m)·y for all y ∈ C. Fix y ∈ C. Applying this inequality to both y and −y, which also belongs to C, we get (x − m)·y = 0. Since y was arbitrarily chosen, we conclude that (x − m)·y = 0 for all y ∈ C, i.e., (x − m) ⟂ C.

Conversely, assume (x − m) ⟂ C. Since C is a vector subspace and m ∈ C, we have m − y ∈ C for all y ∈ C, yielding (x − m)·(m − y) = 0 for all y ∈ C, proving (31.48).

To prove this general form of the Projection Theorem, given an x ∈ R^n we consider the function f : R^n → R defined by f(y) = −‖x − y‖². Problem (31.47) can be rewritten as

max_y f(y)  sub  y ∈ C   (31.49)

Thanks to the following lemma, we can apply Theorem 1501 to this optimization problem.²⁰

Lemma 1505 The function f is strongly concave on Rn . In particular, it is strongly concave


on C.

Proof Simple algebra shows that ∇²f(y) = −2I for all y ∈ R^n, so z·∇²f(y)z = −2‖z‖² ≤ −‖z‖² for all y ∈ R^n and all z ∈ R^n. By taking c = −1, condition (31.46) is satisfied. This proves that f is strongly concave on R^n and, in particular, when we restrict the domain to C.

²⁰ The reader should compare this result with Lemma 1124. In a similar vein, the function of Lemma 1058 can be shown to be strongly concave. In these cases, strong concavity combines strict concavity and coercivity, thus confirming its dual role across concavity and coercivity.

Proof of the Projection Theorem Recall that m ∈ C is a solution of the optimization problem (31.47) if and only if it satisfies

‖x − m‖² ≤ ‖x − y‖²  ∀y ∈ C   (31.50)

if and only if it is a solution of the optimization problem (31.49). In view of the previous lemma, f is strongly concave on C. Clearly, it is also continuous. Since C is a closed and convex set of R^n, by Theorem 1501 there exists a unique solution m ∈ C of the optimization problem (31.49).

It remains to show that conditions (31.48) and (31.50) are equivalent, so that condition (31.48) characterizes the minimizer m.²¹ Fix any y ∈ C and let y_t = ty + (1 − t)m for t ∈ [0, 1]. From (31.50) it follows that, for each t ∈ (0, 1], we have

0 ≤ ‖x − y_t‖² − ‖x − m‖² = ‖m − y_t‖² + 2(x − m)·(m − y_t)
  = ‖m − ty − (1 − t)m‖² + 2(x − m)·(m − ty − (1 − t)m)
  = t²‖m − y‖² + 2t(x − m)·(m − y)

In turn, this implies that

t‖m − y‖² + 2(x − m)·(m − y) ≥ 0  ∀t ∈ (0, 1]

By letting t go to 0, we thus have (x − m)·(m − y) ≥ 0. Since y was arbitrarily chosen, we conclude that (31.48) holds. Conversely, assume (31.48). For all y ∈ C we have

‖x − m‖² − ‖x − y‖² = ‖x − m‖² − ‖(x − m) + (m − y)‖²
  = ‖x − m‖² − ‖x − m‖² − ‖m − y‖² − 2(x − m)·(m − y)
  = −‖m − y‖² − 2(x − m)·(m − y) ≤ 0

Thus, (31.48) implies ‖x − m‖² − ‖x − y‖² ≤ 0, so (31.50). Summing up, we proved that conditions (31.48) and (31.50) are equivalent.

Example 1506 Let C = {x ∈ R^n : Ax = b} be the affine set determined by an m × n matrix A, with m ≤ n (cf. Proposition 793). If A has full rank, i.e., ρ(A) = m, then

P_C(x) = x + Aᵀ(AAᵀ)⁻¹(b − Ax)  ∀x ∈ R^n   (31.51)

In particular, if m = 1, so that C = {x ∈ R^n : a·x = b}, we have

P_C(x) = x + ((b − a·x)/‖a‖²) a  ∀x ∈ R^n

To prove (31.51), we have two possible methods. The first one verifies condition (31.48). To this end, we start by making a few observations. First, note that for each square matrix B of order n and each pair of vectors x and y in R^n

x·By = xᵀBy = (xᵀBy)ᵀ = yᵀBᵀx = y·Bᵀx = Bᵀx·y

²¹ Here we follow Zarantonello (1971).

Define the square matrix D of order n to be such that D = Aᵀ(AAᵀ)⁻¹A. We leave to the reader to verify that D = Dᵀ = D². Next, note that

AP_C(x) = A(x + Aᵀ(AAᵀ)⁻¹(b − Ax)) = Ax + AAᵀ(AAᵀ)⁻¹(b − Ax) = Ax + (b − Ax) = b

that is, P_C(x) ∈ C. With this in mind, we can show that P_C(x) satisfies (31.48). Let y ∈ C. Since Ay = b, observe that

(x − P_C(x))·(P_C(x) − y)
  = −Aᵀ(AAᵀ)⁻¹(Ay − Ax) · (Aᵀ(AAᵀ)⁻¹(Ay − Ax) + (x − y))
  = −D(y − x)·D(y − x) − D(y − x)·(x − y)
  = −DᵀD(y − x)·(y − x) − D(y − x)·(x − y)
  = −D(y − x)·(y − x) − D(y − x)·(x − y)
  = D(y − x)·(x − y) − D(y − x)·(x − y) = 0

Since y was arbitrarily chosen in C, (31.48) holds.

The second method to prove (31.51) relies on results about constrained optimization that we will study later in the book. Consider the optimization problem

min_y ‖x − y‖²  sub  y ∈ C

The Lagrangian is L(y, λ) = ‖x − y‖² + λ·(b − Ay), so ∇_y L(y, λ) = 2(y − x) − Aᵀλ. The first-order condition (38.17) is then

2(y − x) = Aᵀλ
Ay = b

By multiplying the first equation by A, it becomes 2A(y − x) = AAᵀλ. Since ρ(A) = m, we have ρ(AAᵀ) = m (cf. Proposition 692, recalling that ρ(AAᵀ) = ρ(AᵀA) = ρ(A)). So, the matrix AAᵀ is invertible and, by solving for λ, we then get λ = 2(AAᵀ)⁻¹A(y − x). By replacing this value of λ in the first equation, we get

y − x = Aᵀ(AAᵀ)⁻¹A(y − x) = Aᵀ(AAᵀ)⁻¹Ay − Aᵀ(AAᵀ)⁻¹Ax = Aᵀ(AAᵀ)⁻¹b − Aᵀ(AAᵀ)⁻¹Ax = Aᵀ(AAᵀ)⁻¹(b − Ax)

Thus, y = x + Aᵀ(AAᵀ)⁻¹(b − Ax) solves the optimization problem (cf. Theorem 1736). N
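A minimal numerical sketch of formula (31.51) (in Python, assuming NumPy; an illustration, not part of the formal development): it computes P_C(x) for a generic full-rank matrix A and checks both that AP_C(x) = b and that condition (31.48) holds, with equality, against points of C.

```python
# Projection onto the affine set C = {x : Ax = b} when rank(A) = m, via (31.51).
import numpy as np

def project_affine(A, b, x):
    A = np.asarray(A, dtype=float)
    return x + A.T @ np.linalg.solve(A @ A.T, b - A @ x)

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 5))    # a generic, hence full-rank, 2x5 matrix
b = rng.standard_normal(2)
x = rng.standard_normal(5)

m = project_affine(A, b, x)
print(np.allclose(A @ m, b))       # True: the projection belongs to C

w = rng.standard_normal(5)
v = w - A.T @ np.linalg.solve(A @ A.T, A @ w)  # v lies in the null space of A
y = m + v                                      # so y belongs to C
print(np.isclose((x - m) @ (m - y), 0.0))      # (31.48) holds with equality
```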

Example 1507 Let C = R^n₊ be the positive orthant. Then,

P_C(x) = x⁺  ∀x ∈ R^n   (31.52)

where x⁺ = max{x, 0} is the positive part of the vector x. For instance, if n = 3 we have P_C(1, −3, 2) = (1, 0, 2). To verify the form of this projection, we use the characterization (31.48). So, let m ≥ 0 be such that

(x − m)·(m − y) ≥ 0  ∀y ∈ R^n₊

We want to show that m = x⁺. By setting y = 0 and y = 2m, we get (x − m)·m ≥ 0 and (x − m)·m ≤ 0, respectively. So, (x − m)·m = 0. By setting y = eᵢ, we then have 0 = (x − m)·m ≥ (x − m)·eᵢ = xᵢ − mᵢ, so m ≥ x. In turn, from 0 = Σⁿᵢ₌₁ (xᵢ − mᵢ)mᵢ, where each addendum is ≤ 0, it then follows that xᵢ = mᵢ if xᵢ > 0 and mᵢ = 0 if xᵢ ≤ 0. That is, m = x⁺. N
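In code, (31.52) is just a componentwise positive part; a quick sanity check of (31.48) on random points of the orthant (in Python, assuming NumPy):

```python
import numpy as np

x = np.array([1.0, -3.0, 2.0])
m = np.maximum(x, 0.0)                        # P_C(x) = x^+
print(m)                                      # [1. 0. 2.]

rng = np.random.default_rng(1)
ys = rng.uniform(0, 5, size=(1000, 3))        # random points of R^3_+
print(np.all((ys - m) @ (x - m) <= 1e-12))    # (x - m).(m - y) >= 0 for all y
```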

Example 1508 The uniqueness of the solution of the minimization problem (31.47) relies on the convexity of C. Indeed, consider the optimization problem

min_y ‖x − y‖  sub  y ∈ ∂B₁(0)

where ∂B₁(0) = {x ∈ R^n : ‖x‖ = 1} is the unit sphere, a closed but non-convex set. If we take the origin x = 0, it is easy to see (just draw a picture for the case n = 2) that

arg min_{y∈∂B₁(0)} ‖y‖ = ∂B₁(0)

That is, every element of the unit sphere is a projection of the origin onto the unit sphere itself. The lack of convexity of the unit sphere thus causes a dramatic failure of the uniqueness of the projection of the origin. N

We now turn to some basic properties of the projection operator P_C : R^n → R^n defined by a closed and convex set C. We begin with a useful lemma.

Lemma 1509 It holds that

((I − P_C)(x) − (I − P_C)(y))·(P_C(x) − P_C(y)) ≥ 0   (31.53)

for all x, y ∈ R^n.

Proof Let x, y ∈ R^n. By (31.48),

(x − P_C(x))·(P_C(x) − P_C(y)) ≥ 0

as well as

(y − P_C(y))·(P_C(y) − P_C(x)) ≥ 0

By adding, we get (31.53).

A first interesting property is inner monotonicity.

Proposition 1510 The projection operator P_C : R^n → R^n is inner increasing, i.e.,

(P_C(x) − P_C(y))·(x − y) ≥ 0

for all x, y ∈ R^n.

Proof Let x, y ∈ R^n. By (31.53), we have

(x − y)·(P_C(x) − P_C(y)) ≥ (P_C(x) − P_C(y))·(P_C(x) − P_C(y)) ≥ 0   (31.54)

as desired.

Projection operators feature a sharp form of Lipschitzianity.

Proposition 1511 The projection operator P_C : R^n → R^n is nonexpansive, i.e., Lipschitz with

‖P_C(x) − P_C(y)‖ ≤ ‖x − y‖   (31.55)

for all x, y ∈ R^n.

Proof Let x, y ∈ R^n. By (31.54), we have

‖P_C(x) − P_C(y)‖² = (P_C(x) − P_C(y))·(P_C(x) − P_C(y)) ≤ (x − y)·(P_C(x) − P_C(y)) ≤ ‖x − y‖ ‖P_C(x) − P_C(y)‖

where the last inequality follows from the Cauchy-Schwarz inequality. In turn, this implies (31.55).

A nonexpansive operator is thus a Lipschitz operator with unit coefficient. Being nonexpansive, projection operators are continuous, a final noteworthy property that we record for later reference.

Corollary 1512 The projection operator P_C : R^n → R^n is continuous.
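Both inner monotonicity and nonexpansiveness are easy to probe numerically. The following small sketch (in Python, assuming NumPy) does so for the orthant projection of Example 1507:

```python
import numpy as np

P = lambda x: np.maximum(x, 0.0)              # projection onto the orthant

rng = np.random.default_rng(2)
monotone, nonexpansive = True, True
for _ in range(1000):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    monotone &= (P(x) - P(y)) @ (x - y) >= -1e-12                     # (31.54)
    nonexpansive &= np.linalg.norm(P(x) - P(y)) <= np.linalg.norm(x - y) + 1e-12
print(monotone, nonexpansive)                 # True True
```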


Chapter 32

Convex Analysis

32.1 Superdifferentials

32.1.1 A useful surrogate

Differentiable concave functions feature the important inequality (Theorem 1472):¹

f(y) ≤ f(x) + ∇f(x)·(y − x)  ∀y ∈ C

This inequality has a natural geometric interpretation: the tangent hyperplane (line, in the scalar case) lies above the graph of f, which it touches only at (x, f(x)). Remarkably, next we show that this property actually characterizes the differentiability of concave functions. In other words, this geometric property is peculiar to the tangent hyperplanes of concave functions.

Theorem 1513 A concave function f : C → R is differentiable at x ∈ C if and only if there exists a unique vector ξ ∈ R^n such that

f(y) ≤ f(x) + ξ·(y − x)  ∀y ∈ C   (32.1)

In this case, ξ = ∇f(x).

The proof relies on this lemma of independent interest.

Lemma 1514 Let f : C → R be concave. Then, a vector ξ ∈ R^n satisfies inequality (32.1) if and only if

f′₊(x; z) ≤ ξ·z  ∀z ∈ R^n   (32.2)

Proof "Only if". Suppose ξ ∈ R^n satisfies (32.1). Let z ∈ R^n. Since C is open, for h > 0 small enough we have x + hz ∈ C, so

hξ·z = ξ·((x + hz) − x) ≥ f(x + hz) − f(x)

Since, by Proposition 1464, f′₊(x; ·) : R^n → R exists, we then have

f′₊(x; z) = lim_{h→0⁺} (f(x + hz) − f(x))/h ≤ ξ·z

¹ Unless otherwise stated, throughout this chapter C denotes an open and convex set in R^n.


so ξ satisfies (32.2).

"If". Assume that ξ ∈ R^n satisfies (32.2). Let y ∈ C. Since C is open and convex, for each t ∈ (0, 1] we have x + t(y − x) ∈ C. Then, by Lemma 1465,

ξ·(y − x) ≥ f′₊(x; y − x) ≥ (f(x + t(y − x)) − f(x))/t   (32.3)

which is (32.1) when t = 1.

Proof of Theorem 1513 "Only if". Assume f is differentiable at x ∈ C. By Theorem 1471, we have that ∇f(x) satisfies (32.1). We are left to prove that ∇f(x) is the unique vector doing this. Let ξ ∈ R^n satisfy (32.1). By Corollary 1466 and Lemma 1514, and since f is differentiable at x, we have that

∇f(x)·z = f′₊(x; z) ≤ ξ·z  ∀z ∈ R^n   (32.4)

By choosing z = eᵢ (the i-th versor) with i ∈ {1, ..., n}, (32.4) yields that the i-th component of ∇f(x) is smaller than or equal to ξᵢ for all i ∈ {1, ..., n}. By choosing z = −eᵢ with i ∈ {1, ..., n}, (32.4) yields that the i-th component of ∇f(x) is greater than or equal to ξᵢ for all i ∈ {1, ..., n}. This proves that ∇f(x) = ξ.

"If". Assume there is a unique vector ξ ∈ R^n such that (32.1) holds. By the previous lemma, ξ is the unique vector satisfying f′₊(x; z) ≤ ξ·z for all z ∈ R^n. By Proposition 1464, we have that f′₊(x; ·) : R^n → R is superlinear. By Theorem 1564 (there is no circularity in using this result in the current proof), there exists a non-empty, compact and convex set D ⊆ R^n such that f′₊(x; z) = min_{ξ′∈D} ξ′·z for all z ∈ R^n. This implies that each vector ξ′ ∈ D satisfies f′₊(x; z) ≤ ξ′·z for all z ∈ R^n. Since ξ is the unique vector satisfying this condition, ξ = ξ′ for all ξ′ ∈ D, that is, D = {ξ}. We can conclude that f′₊(x; z) = ξ·z for all z ∈ R^n and, in particular, f′₊(x; ·) : R^n → R is a linear function. By Corollary 1466, f is derivable at x. By Theorem 1467, f is differentiable at x.

For concave functions, differentiability is thus equivalent to the existence of a unique vector, the gradient, for which the basic inequality (32.1) holds; equivalently, to the existence of a unique linear function l : R^n → R such that f(y) ≤ f(x) + l(y − x) for all y ∈ C. Consequently, non-differentiability is equivalent either to the existence of multiple vectors for which (32.1) holds or to the non-existence of any such vector. This observation motivates the next definition, where C is any convex (possibly not open) set.

Definition 1515 A function f : C → R is said to be superdifferentiable at a basepoint x ∈ C if the set ∂f(x) formed by the vectors ξ ∈ R^n such that

f(y) ≤ f(x) + ξ·(y − x)  ∀y ∈ C   (32.5)

is non-empty. The set ∂f(x) is called the superdifferential of f at x.

The superdifferential thus consists of all vectors (and so of all linear functions) for which (32.1) holds. There may exist no such vector (Example 1522 below); in this case the superdifferential is empty and the function is not superdifferentiable at the basepoint.

To visualize the superdifferential, given a basepoint x ∈ C consider the affine function r : R^n → R defined by

r(y) = f(x) + ξ·(y − x)

with ξ ∈ ∂f(x). The affine function r is such that

r(x) = f(x)   (32.6)

r(y) ≥ f(y)  ∀y ∈ C   (32.7)

In words, r is equal to f at the basepoint x and dominates f elsewhere. It follows that ∂f(x) identifies the set of all affine functions that touch the graph of f at x and that lie above this graph at all other points of the domain. In the scalar case, affine functions are straight lines. So, in the next figure the straight lines r, r′ and r″ belong to the superdifferential ∂f(x) of a concave scalar function.

It is easy to see that, at the points where the function is differentiable, the only straight line that satisfies conditions (32.6)-(32.7) is the tangent line f(x) + f′(x)(y − x). But, at the points where the function is not differentiable, we might well have several straight lines r : R → R that satisfy such conditions, that is, that touch the graph of the function at the basepoint x and lie above such graph elsewhere. The superdifferential, being the collection of these straight lines, can thus be viewed as a surrogate of the tangent line, i.e., of the differential. This is the idea behind the superdifferential: it is a surrogate of the differential when the latter does not exist. The next result, an immediate consequence of Theorem 1513, confirms this intuition.

Proposition 1516 A concave function f : C → R is differentiable at x ∈ C if and only if ∂f(x) is a singleton. In this case, ∂f(x) = {∇f(x)}.

In the following example we determine the superdifferential of a simple scalar function.

Example 1517 Consider f : R → R defined by f(x) = 1 − |x|. The only point where f is not differentiable is x = 0. By Proposition 1516, we have ∂f(x) = {f′(x)} for each x ≠ 0. It remains to determine ∂f(0). This amounts to finding the scalars ξ that satisfy the inequality

1 − |y| ≤ 1 − |0| + ξ(y − 0)  ∀y ∈ R

i.e., the scalars ξ such that −|y| ≤ ξy for each y ∈ R. If y = 0, this inequality trivially holds for all ξ. If y ≠ 0, we have

ξ (y/|y|) ≥ −1   (32.8)

Since

y/|y| = 1 if y > 0, and y/|y| = −1 if y < 0

from (32.8) it follows that both ξ ≥ −1 and −ξ ≥ −1. That is, ξ ∈ [−1, 1]. We conclude that ∂f(0) = [−1, 1]. Thus:

∂f(x) =  {−1}     if x > 0
         [−1, 1]  if x = 0
         {1}      if x < 0

N

We can recast what we found in the example as

∂f(x) =  {f′(x)}           if x ≠ 0
         [f′₊(x), f′₋(x)]  if x = 0

Next we show that this is always the case for scalar functions.

Proposition 1518 Let f : (a, b) → R be a concave function, with a, b ∈ R̄. Then,

∂f(x) = [f′₊(x), f′₋(x)]  ∀x ∈ (a, b)   (32.9)

In words, the superdifferential of a scalar function consists of all coefficients that lie between the right and left derivatives. This makes precise the geometric intuition we gave above on scalar functions.

Proof We only prove that ∂f(x) ⊆ [f′₊(x), f′₋(x)]. Let ξ ∈ ∂f(x). Given any h ≠ 0, by definition we have f(x + h) ≤ f(x) + ξh. If h > 0, we then have

(f(x + h) − f(x))/h ≤ (f(x) + ξh − f(x))/h = ξ

and so f′₊(x) ≤ ξ. If h < 0, then

(f(x + h) − f(x))/h ≥ (f(x) + ξh − f(x))/h = ξ

and so ξ ≤ f′₋(x). We conclude that ξ ∈ [f′₊(x), f′₋(x)], as desired.
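Formula (32.9) can be spot-checked numerically. The sketch below (in Python, assuming NumPy; an illustrative aside) tests the supergradient inequality of f(x) = 1 − |x| at x = 0 on a grid: only coefficients in [f′₊(0), f′₋(0)] = [−1, 1] should pass.

```python
import numpy as np

f = lambda x: 1.0 - np.abs(x)
ys = np.linspace(-2, 2, 401)

def is_supergradient(xi, x0=0.0):
    # xi is a supergradient at x0 iff f(y) <= f(x0) + xi*(y - x0) for all y
    return bool(np.all(f(ys) <= f(x0) + xi * (ys - x0) + 1e-12))

for xi in [-1.5, -1.0, 0.0, 1.0, 1.5]:
    print(xi, is_supergradient(xi))   # only -1.0, 0.0, 1.0 pass
```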

Next we compute the superdifferential of an important function of several variables.



Example 1519 Consider the Leontief function f : R^n → R given by f(x) = min_{i=1,...,n} xᵢ. Let us find ∂f(0), that is, the vectors ξ ∈ R^n such that ξ·x ≥ f(x) for all x ∈ R^n. Let ξ ∈ ∂f(0). From:

ξᵢ = ξ·eᵢ ≥ f(eᵢ) = 0  ∀i = 1, ..., n

Σⁿᵢ₌₁ ξᵢ = ξ·(1, ..., 1) ≥ f(1, ..., 1) = 1

−Σⁿᵢ₌₁ ξᵢ = ξ·(−1, ..., −1) ≥ f(−1, ..., −1) = −1

we conclude that Σⁿᵢ₌₁ ξᵢ = 1 and ξᵢ ≥ 0 for each i = 1, ..., n. That is, ξ belongs to the simplex Δ_{n−1}. Thus, ∂f(0) ⊆ Δ_{n−1}. On the other hand, if ξ ∈ Δ_{n−1}, then

ξ·x ≥ ξ·(min_{i=1,...,n} xᵢ, ..., min_{i=1,...,n} xᵢ) = min_{i=1,...,n} xᵢ  ∀x ∈ R^n

and so ξ ∈ ∂f(0). We conclude that ∂f(0) = Δ_{n−1}, that is, the superdifferential at the origin is the simplex. The reader can check that, for every x ∈ R^n,

∂f(x) = {ξ ∈ Δ_{n−1} : ξ·x = f(x)}

i.e., ∂f(x) consists of the vectors ξ of the simplex such that ξ·x = f(x). N
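The simplex characterization can be probed numerically as well; a small sketch (in Python, assuming NumPy) checks the supergradient inequality ξ·x ≥ minᵢ xᵢ at the origin for a simplex vector and for a vector outside the simplex:

```python
import numpy as np

rng = np.random.default_rng(3)
xs = rng.standard_normal((2000, 4))             # test points

xi_in = rng.dirichlet(np.ones(4))               # a random simplex vector
print(np.all(xs @ xi_in >= xs.min(axis=1) - 1e-12))    # True: supergradient

xi_out = np.array([0.5, 0.5, 0.5, -0.5])        # sums to 1 but is not positive
print(np.all(xs @ xi_out >= xs.min(axis=1) - 1e-12))   # False
```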

Example 1520 We can generalize the previous example by showing that for any positively homogeneous function f : R^n → R we have

∂f(x) = {ξ ∈ ∂f(0) : ξ·x = f(x)}   (32.10)

provided ∂f(x) ≠ ∅. Indeed, let ξ ∈ ∂f(x). By positive homogeneity, if we take y = 2x in (32.5) we have

2f(x) = f(2x) ≤ f(x) + ξ·(2x − x) = f(x) + ξ·x

that is, f(x) ≤ ξ·x. By (18.2), if we take instead y = 0 we have

0 = f(0) ≤ f(x) + ξ·(0 − x) = f(x) − ξ·x

so f(x) ≥ ξ·x. We conclude that f(x) = ξ·x for all ξ ∈ ∂f(x). In turn, this implies that (32.5) takes the form

f(y) ≤ ξ·y  ∀y ∈ R^n

for all ξ ∈ ∂f(x), i.e., ∂f(x) ⊆ ∂f(0). In turn, this is easily seen to imply (32.10).² N

Before, we argued that the superdifferential is a surrogate of the differential. To be a useful surrogate, however, it must often exist, otherwise it would be of little help. The next key result shows that, indeed, concave functions are everywhere superdifferentiable and that, moreover, this property exactly characterizes them (another proof of the tight connection between superdifferentiability and concavity).
² The argument shows that (32.10) actually holds for any superhomogeneous function f : R^n → R with f(0) = 0.

Theorem 1521 A function f : C ! R is concave if and only if @f (x) is non-empty for all
x 2 C.

In view of Proposition 1516, this result generalizes Theorem 1513.

Proof \If". Suppose @f (x) 6= ; at all x 2 C. Let x1 ; x2 2 C and t 2 [0; 1]. Let 2
@f (tx1 + (1 t) x2 ). By (32.5),

f (x1 ) f (tx1 + (1 t) x2 ) + (x1 (tx1 + (1 t) x2 ))


f (x2 ) f (tx1 + (1 t) x2 ) + (x2 (tx1 + (1 t) x2 ))

that is,

f (x1 ) (1 t) (x1 x2 ) f (tx1 + (1 t) x2 )


f (x2 ) t (x2 x1 ) f (tx1 + (1 t) x2 )

Hence,

f (tx1 + (1 t) x2 )
tf (x1 ) t (1 t) (x1 x2 ) + (1 t) f (x2 ) (1 t) t (x2 x1 )
= tf (x1 ) + (1 t) f (x2 )

as desired.
\Only if". Suppose f is concave. Let x 2 C. By Proposition 1464, we have that f+0 (x; ) :
n
R ! R is superlinear. By Theorem 1564 (there is no circularity in using this result in
the current proof), there exists a non-empty, compact and convex set D Rn such that
0
f+ (x; z) = min 2D z for all z 2 R . Since D is non-empty, there exists 2 Rn such that
n
0
f+ (x; z) z for all z 2 Rn . By Lemma 1514, @f (x) is non-empty.

The maintained hypothesis that C is open is key for the last two propositions, as the next example shows.

Example 1522 Consider f : [0, ∞) → R defined by f(x) = √x. The only point of the (closed) domain at which the function is not differentiable is the boundary point x = 0. The superdifferential ∂f(0) is given by the scalars ξ such that

√y ≤ √0 + ξ(y − 0)  ∀y ≥ 0   (32.11)

i.e., such that √y ≤ ξy for each y ≥ 0. If y = 0, this inequality holds for all ξ. If y > 0, the inequality is equivalent to ξ ≥ √y/y = 1/√y. But, letting y tend to 0, this implies ξ ≥ lim_{y→0⁺} 1/√y = +∞. Therefore, there is no scalar ξ for which (32.11) holds. It follows that ∂f(0) = ∅. We conclude that f is not superdifferentiable at the boundary point 0. N

N.B. In this section we focus on open convex sets C to ease matters and fix ideas. Yet, this example shows that non-open domains may be important. Fortunately, the results of this section can be extended to such domains. For instance, Theorem 1521 can be stated for any convex set C with non-empty interior (that is, C is possibly not open, but int C ≠ ∅) by saying that a continuous function f : C → R is concave on C if and only if ∂f(x) is non-empty at all x ∈ int C, i.e., at all interior points x of C.³ The concave function f(x) = √x is indeed differentiable (and so superdifferentiable, with ∂f(x) = {f′(x)}) at all x ∈ (0, ∞), that is, at all interior points of the function's domain R₊. O

There is a tight relationship between superdifferentials and directional derivatives, as the next result shows. Note that (32.13) generalizes (32.9) to the multivariable case.

Theorem 1523 Let f : C → R be concave. Then,

∂f(x) = {ξ ∈ R^n : f′₊(x; y) ≤ ξ·y for all y ∈ R^n}   (32.12)
      = {ξ ∈ R^n : f′₊(x; y) ≤ ξ·y ≤ f′₋(x; y) for all y ∈ R^n}   (32.13)

and

f′₊(x; y) = min_{ξ∈∂f(x)} ξ·y  ∀y ∈ R^n   (32.14)

Proof Lemma 1514 implies (32.12), while (32.13) follows from (32.12) via (31.27). Finally, by (32.12), we have that f′₊(x; y) ≤ min_{ξ∈∂f(x)} ξ·y for all y ∈ R^n. On the other hand, by Proposition 1464, we have that f′₊(x; ·) : R^n → R is superlinear. By Theorem 1564 (there is no circularity in using this result in the current proof), there exists a non-empty, compact and convex set D ⊆ R^n such that f′₊(x; y) = min_{ξ∈D} ξ·y for all y ∈ R^n. Observe that if ξ ∈ D, then f′₊(x; y) ≤ ξ·y for all y ∈ R^n. By Lemma 1514, this implies that ξ ∈ ∂f(x), that is, D ⊆ ∂f(x). We can conclude that f′₊(x; y) = min_{ξ∈D} ξ·y ≥ min_{ξ∈∂f(x)} ξ·y for all y ∈ R^n, proving the opposite inequality.

32.1.2 Properties

We begin with a first important property of the superdifferential.

Proposition 1524 If f : C → R is concave, then the set ∂f(x) is compact at each x ∈ C.

Proof It is easy to check that ∂f(x) is closed and convex. To show that ∂f(x) is compact, assume that it is non-empty (otherwise the result is trivially true) and, without loss of generality, that 0 ∈ C, x = 0 and f(0) = 0 (replacing f with f − f(0) alters no superdifferential). Since f is continuous on the open set C (Theorem 833), by Lemma 905 there exist a neighborhood B_ε(0) ⊆ C and a constant k > 0 such that |f(y)| ≤ k‖y‖ for all y ∈ B_ε(0). Let ξ ∈ ∂f(0). Since y ∈ B_ε(0) if and only if −y ∈ B_ε(0), by (32.5) we have:

−k‖y‖ ≤ f(y) ≤ ξ·y ≤ −f(−y) ≤ k‖y‖  ∀y ∈ B_ε(0)

Hence, |ξ·y| ≤ k‖y‖ for all y ∈ B_ε(0). For each versor eᵢ, there is δ > 0 small enough so that δeᵢ ∈ B_ε(0). Hence,

|ξᵢ| = |ξ·(δeᵢ)|/δ ≤ k‖δeᵢ‖/δ = k  ∀i = 1, ..., n

so |ξᵢ| ≤ k for each i = 1, ..., n. Since ξ was arbitrarily chosen in ∂f(0), by Proposition 169 we conclude that ∂f(0) is a bounded (so, compact) set.

³ If the domain C is not assumed to be open, we need to require continuity (which is otherwise automatically satisfied by Theorem 833).

As is the case for the standard differential, the monotonicity of the function affects the sign of the superdifferential.

Proposition 1525 If f : C → R is increasing, then ∂f(x) ⊆ R^n₊ for all x ∈ C.

Later, Theorem 1564 will establish a converse of this result for superlinear functions.

Proof Assume that ∂f(x) ≠ ∅, otherwise the result is trivially true. Let ξ ∈ ∂f(x). Set y = x + εeᵢ with ε > 0. By taking ε small enough, we have y ∈ C. Since f is increasing, we have

0 ≤ f(x + εeᵢ) − f(x) ≤ ξ·(εeᵢ) = εξᵢ

for each i = 1, ..., n. We conclude that ξ ≥ 0.

We can further restrict the superdifferential within the simplex.

Proposition 1526 If f : R^n → R is increasing and normalized, then ∂f(k) ⊆ Δ_{n−1} for all k ∈ R (here k denotes the constant vector whose components are all equal to k). If, in addition, f is translation invariant, then ∂f(0) = ∂f(k) for all k ∈ R.

The Leontief function f : R^n → R given by f(x) = min_{i=1,...,n} xᵢ is increasing, normalized and translation invariant. As we saw in Example 1519, we have ∂f(k) = ∂f(0) = Δ_{n−1} for all k ∈ R.

Proof We prove the result for k = 0 and leave the more general case to readers. Assume that ∂f(0) ≠ ∅, otherwise the result is trivially true. Let

ξ ∈ ∂f(0) = {ξ ∈ R^n : f(x) ≤ ξ·x, ∀x ∈ R^n}

Since f is increasing, ξ ≥ 0 by Proposition 1525. Moreover, Σⁿᵢ₌₁ ξᵢ = ξ·1 ≥ f(1) = 1 and −Σⁿᵢ₌₁ ξᵢ = ξ·(−1) ≥ f(−1) = −1, so that Σⁿᵢ₌₁ ξᵢ = 1. We conclude that ξ ∈ Δ_{n−1}, as desired.

Assume that, in addition, f is translation invariant. Let ξ ∈ ∂f(k). We have

f(x) ≤ f(k) + ξ·(x − k) = f(0) + k + ξ·x − k = f(0) + ξ·x

for all x ∈ R^n. We conclude that ξ ∈ ∂f(0). As to the converse, let ξ ∈ ∂f(0). We have

f(x) ≤ f(0) + ξ·x = f(0) + k + ξ·x − k = f(k) + ξ·(x − k)

for all x ∈ R^n. We conclude that ξ ∈ ∂f(k).

We can define the superdifferential correspondence ∂f : C ⇉ R^n that to each vector x ∈ C associates its superdifferential ∂f(x). It is viable and compact-valued (Theorem 1521 and Proposition 1524) and reduces to the derivative operator ∇f : C → R^n if it is single-valued, i.e., if f is differentiable (Proposition 1516). Superdifferential correspondences have an important inner monotonicity property that generalizes, even for operators, the inner monotonicity property established in Theorem 1473.

Definition 1527 A correspondence f : A ⊆ R^n ⇉ R^n is cyclically (inner) decreasing if, for any finite sequence x₀, x₁, ..., xₘ, it holds that

y₀·(x₁ − x₀) + y₁·(x₂ − x₁) + ⋯ + y_{m−1}·(xₘ − x_{m−1}) + yₘ·(x₀ − xₘ) ≥ 0

for each yᵢ ∈ f(xᵢ) with i = 0, ..., m.

When m = 1 and f is single-valued, this reduces to the inner monotonicity condition (31.23):

(x₁ − x₀)·(y₁ − y₀) = −(x₁ − x₀)·(y₀ − y₁) = −(y₀·(x₁ − x₀) + y₁·(x₀ − x₁)) ≤ 0

where yᵢ = f(xᵢ) with i = 0, 1. Even for functions, cyclical monotonicity is a stronger property than monotonicity, as the next example shows.

Example 1528 In Example 1460 we noted that an affine operator f : R^n → R^n given by f(x) = Ax + b, where A is an n × n matrix and b ∈ R^n, is monotone if and only if A is negative semi-definite, that is, if and only if its symmetric part Aˢ is negative semi-definite (cf. Lemma 1454). One can show that f is cyclically monotone if and only if the n × n matrix A itself is symmetric and negative semi-definite (see Rockafellar, 1970). So, any affine operator that features a negative semi-definite matrix A which is not symmetric is monotone but not cyclically monotone. N
We can now state and prove the announced monotonicity property.

Theorem 1529 The superdifferential correspondence of a concave function f : C → R is cyclically decreasing.

In view of the last example, this result extends the scope of Theorem 1473 even for differentiable concave functions. In this case, cyclical monotonicity takes the form

∇f(x₀)·(x₁ − x₀) + ∇f(x₁)·(x₂ − x₁) + ⋯ + ∇f(x_{m−1})·(xₘ − x_{m−1}) + ∇f(xₘ)·(x₀ − xₘ) ≥ 0

Proof Let ∂f : C ⇉ R^n be the superdifferential correspondence of f. Let x₀, x₁, ..., xₘ be a finite sequence of vectors in C. By the definition of superdifferential, we have

f(x₁) − f(x₀) ≤ y₀·(x₁ − x₀)  ∀y₀ ∈ ∂f(x₀)
f(x₂) − f(x₁) ≤ y₁·(x₂ − x₁)  ∀y₁ ∈ ∂f(x₁)
⋮
f(xₘ) − f(x_{m−1}) ≤ y_{m−1}·(xₘ − x_{m−1})  ∀y_{m−1} ∈ ∂f(x_{m−1})
f(x₀) − f(xₘ) ≤ yₘ·(x₀ − xₘ)  ∀yₘ ∈ ∂f(xₘ)

So, by adding up the terms on both sides we have

0 = (f(x₁) − f(x₀)) + (f(x₂) − f(x₁)) + ⋯ + (f(xₘ) − f(x_{m−1})) + (f(x₀) − f(xₘ))
  ≤ y₀·(x₁ − x₀) + y₁·(x₂ − x₁) + ⋯ + y_{m−1}·(xₘ − x_{m−1}) + yₘ·(x₀ − xₘ)

as desired.

A dual result, with a dual notion of cyclically increasing function, holds for convex
functions, as the reader can check.

32.1.3 Supercalculus

There is a non-trivial calculus for superdifferentials. Next we give a non-trivial sum rule that generalizes the analogous classic differential rule. In words, the "superdifferential of a sum" is the "sum of the superdifferentials," where the latter is a sum of sets.⁴

Theorem 1530 If f, g : R^n → R are concave, then

∂(f + g)(x) = ∂f(x) + ∂g(x)  ∀x ∈ R^n   (32.15)

Proof Let ξ_f ∈ ∂f(x) and ξ_g ∈ ∂g(x). We have, for all y ∈ R^n,

(f + g)(y) = f(y) + g(y) ≤ f(x) + ξ_f·(y − x) + g(x) + ξ_g·(y − x) = (f + g)(x) + (ξ_f + ξ_g)·(y − x)

So, ξ_f + ξ_g ∈ ∂(f + g)(x). We conclude that ∂f(x) + ∂g(x) ⊆ ∂(f + g)(x).

As to the converse,⁵ let ξ ∈ ∂(f + g)(x), i.e.,

f(y) + g(y) ≤ f(x) + g(x) + ξ·(y − x)  ∀y ∈ R^n   (32.16)

We want to show that there exist ξ_f ∈ ∂f(x) and ξ_g ∈ ∂g(x) such that

ξ = ξ_f + ξ_g   (32.17)

Define φ, ψ : R^n → R by

φ(y) = f(y) − f(x) − ξ·(y − x)  and  ψ(y) = g(x) − g(y)

By (32.16), φ(y) ≤ ψ(y) for all y ∈ R^n. So, the epigraph

epi ψ = {(y, t) ∈ R^n × R : ψ(y) ≤ t} = {(y, t) ∈ R^n × R : g(x) − g(y) ≤ t}

of ψ and the hypograph

hypo φ = {(y, t) ∈ R^n × R : φ(y) ≥ t} = {(y, t) ∈ R^n × R : f(y) − f(x) − ξ·(y − x) ≥ t}

of φ are sets with disjoint interiors. In particular, the set epi ψ is convex because ψ is a convex function and the set hypo φ is convex because φ is a concave function. Since both functions are continuous, both sets are closed with non-empty interiors. By Proposition 1027, the sets epi ψ and hypo φ are separated, i.e., there exists 0 ≠ (η, α) ∈ R^n × R such that, for all y ∈ R^n and t ∈ R with (y, t) ∈ epi ψ, and all z ∈ R^n and s ∈ R with (z, s) ∈ hypo φ,

η·y + αt ≥ η·z + αs

By taking t = g(x) − g(y) and s = f(z) − f(x) − ξ·(z − x), we have

η·y + α(g(x) − g(y)) ≥ η·z + α(f(z) − f(x) − ξ·(z − x))

⁴ Sums of sets are studied in Section 21.4. For convenience, in the next two results we assume that the domain is R^n and leave to readers their versions for general convex domains.
⁵ The next separation argument is based on Ekeland and Temam (1999), p. 27.

for all y; z 2 Rn . If y = z = 0, we get [f (x) + g (x) f (0) g (0) x] 0. By (32.16),


f (x) + g (x) f (0) g (0) x 0. So, 0. We actually have > 0. For, suppose by
contradiction that = 0. We then have y z for all y; z 2 Rn . This implies = 0, which
contradicts ( ; ) 6= 0. We conclude that > 0.
Set = = . We have, for all y; z 2 Rn ,

y + g (x) g (y) z + f (z) f (x) (z x)

By taking z = x, we have

g (y) (y x) + g (x) 8y 2 Rn

So, 2 @g (x). By taking y = x, we have

f (z) f (x) + (z x) 8z 2 Rn

So, 2 @f (x). By setting g = and f = , this proves (32.17).

Example 1531 The function h(x) = 1 − |x| − x² − 2x can be written as h = f + g, where f(x) = 1 − |x| and g(x) = −x² − 2x. By the sum rule (32.15), we then have

∂h(x) =  {−2x − 3}  if x > 0
         [−3, −1]   if x = 0
         {−2x − 1}  if x < 0

N
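The sum rule of this example can be verified numerically in the same spirit as before (in Python, assuming NumPy): on a grid, the scalars passing the supergradient test for h at 0 should be exactly those in [−3, −1] = [−1, 1] + {−2}.

```python
import numpy as np

h = lambda x: 1.0 - np.abs(x) - x**2 - 2.0 * x
ys = np.linspace(-2, 2, 401)

def is_supergradient(xi):
    return bool(np.all(h(ys) <= h(0.0) + xi * ys + 1e-12))

for xi in [-3.5, -3.0, -2.0, -1.0, -0.5]:
    print(xi, is_supergradient(xi))   # only -3.0, -2.0, -1.0 pass
```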

Next we present a noteworthy chain rule for superdifferentials. Note that it requires a monotonicity assumption.

Theorem 1532 Let g = (g₁, ..., g_k) : R^n → R^k be concave. If f : R^k → R is concave and increasing, then

∂(f ∘ g)(x) = { Σᵏᵢ₌₁ λᵢ ∂gᵢ(x) : λ ∈ ∂f(g(x)) }   (32.18)

Proof By definition, ∂gᵢ(x) = {ξᵢ ∈ R^n : gᵢ(y) ≤ gᵢ(x) + ξᵢ·(y − x), ∀y ∈ R^n}. Let ξ ∈ Σᵏᵢ₌₁ λᵢ ∂gᵢ(x) ⊆ R^n for some λ ∈ ∂f(g(x)). There exist ξᵢ ∈ ∂gᵢ(x) for each i = 1, ..., k such that ξ = Σᵏᵢ₌₁ λᵢξᵢ. We have:

(f∘g)(y) = f(g₁(y), ..., g_k(y)) ≤ f(g₁(x) + ξ₁·(y − x), ..., g_k(x) + ξ_k·(y − x))
  ≤ f(g₁(x), ..., g_k(x)) + λ·(ξ₁·(y − x), ..., ξ_k·(y − x))
  = (f∘g)(x) + Σᵏᵢ₌₁ λᵢ ξᵢ·(y − x)
  = (f∘g)(x) + ξ·(y − x)

where the first inequality holds because f is increasing. Thus, ξ ∈ ∂(f∘g)(x). Since ξ was arbitrarily chosen, this proves that {Σᵏᵢ₌₁ λᵢ ∂gᵢ(x) : λ ∈ ∂f(g(x))} ⊆ ∂(f∘g)(x).

As to the converse,⁶ let ξ ∈ ∂(f∘g)(x). We want to show that there exists λ ∈ ∂f(g(x)) such that

ξ ∈ Σᵏᵢ₌₁ λᵢ ∂gᵢ(x)   (32.19)

To this end, set

A = {(g(x + h) − z, f(g(x)) + ξ·h) : h ∈ R^n, z ∈ R^k₊} ⊆ R^k × R

Consider the epigraph epi f = {(z, t) ∈ R^k × R : f(z) ≤ t} of f. Since f is increasing,

(z, t) ∈ epi f, z′ ≤ z and t′ ≥ t ⟹ (z′, t′) ∈ epi f  ∀z, z′ ∈ R^k, ∀t, t′ ∈ R   (32.20)

In turn, this property implies that A ⊆ epi f. Indeed, since ξ ∈ ∂(f∘g)(x) we have, for all h ∈ R^n,

f(g(x + h)) ≤ f(g(x)) + ξ·h

with (g(x + h), f(g(x + h))) ∈ Gr f. By (32.20), we have (g(x + h), f(g(x)) + ξ·h) ∈ epi f. For all z ≥ 0 and h ∈ R^n, we have

(g(x + h) − z, f(g(x)) + ξ·h) ≤ (g(x + h), f(g(x)) + ξ·h)

Again by (32.20), we have (g(x + h) − z, f(g(x)) + ξ·h) ∈ epi f for all z ≥ 0 and h ∈ R^n. We conclude that A ⊆ epi f.

We also have co A ⊆ epi f. Indeed, given a collection of weights {αᵢ}ᵐᵢ₌₁, by the concavity of g we have, for all collections {hᵢ} ⊆ R^n and {zᵢ} ⊆ R^k₊,

Σᵐᵢ₌₁ αᵢ(g(x + hᵢ) − zᵢ, f(g(x)) + ξ·hᵢ) = (Σᵢ αᵢg(x + hᵢ) − Σᵢ αᵢzᵢ, f(g(x)) + ξ·Σᵢ αᵢhᵢ)

with Σᵢ αᵢg(x + hᵢ) ≤ g(x + Σᵢ αᵢhᵢ). Since (g(x + Σᵢ αᵢhᵢ) − Σᵢ αᵢzᵢ, f(g(x)) + ξ·Σᵢ αᵢhᵢ) ∈ A ⊆ epi f, by (32.20) we have Σᵐᵢ₌₁ αᵢ(g(x + hᵢ) − zᵢ, f(g(x)) + ξ·hᵢ) ∈ epi f. We conclude that co A ⊆ epi f.

Since f is continuous, hypo f is a closed and convex set with a non-empty interior, whose interior is disjoint from co A ⊆ epi f. By the remark after Proposition 1027, the convex sets co A and hypo f of R^k × R are separated, i.e., there exists 0 ≠ (η, α) ∈ R^k × R such that, for all y ∈ R^k and t ∈ R with (y, t) ∈ hypo f, and all h ∈ R^n,

η·y + αt ≤ η·g(x + h) + α(f(g(x)) + ξ·h)

⁶ This separation argument, somehow reminiscent of that used in Theorem 1530, is based on Lemaire (1985), p. 106.

Taking y = g (x) and h = 0, we have [t f (g (x))] 0 for all t f (g (x)). So, 0.


We actually have < 0. For, suppose by contradiction that = 0. We then have z k,
where k = g (x + h), for all z 2 Rk . This implies = 0, which contradicts ( ; ) 6= 0. We
conclude that < 0.
Set = = . We thus have, for all y 2 Rk , all t 2 R with (y; t) 2 hypo f , and all h 2 Rn ,

y+t g (x + h) + f (g (x)) + h

By taking t = f (y) and h = 0, we have

f (y) f (g (x)) + (g (x) y) 8y 2 Rk

So, 2 @f (g (x)). Since f is increasing, 0. By taking y = g (x) and t = f (g (x)), we


also have, for all h 2 Rn ,

g (x) g (x + h) + h 8h 2 Rn

In turn, this implies g (x) g (y) h for all h 2 Rn , i.e.,

g (y) g (x) + h 8h 2 Rn (32.21)


P
De ne : Rn ! Rk by (x) = ki=1 i gi (x).
Since 0, by Theorem 1530 we have
P P
@ (x) = ki=1 i @gi (x). By (32.21), 2 @ (x) = ki=1 i @gi (x). By setting = ,
this proves (32.19).

Example 1533 Given n concave functions g₁, ..., gₙ : R^n → R, consider the function h : R^n → R defined by

h(x) = min_{i=1,...,n} gᵢ(x)

The function h can be written as h = f∘g, where f : R^n → R is the Leontief function f(x) = min_{i=1,...,n} xᵢ and g = (g₁, ..., gₙ). In view of Example 1519, by the chain rule (32.18) we have

∂h(x) = { Σⁿᵢ₌₁ λᵢ ∂gᵢ(x) : λ ∈ Δ_{n−1} and λ·g(x) = h(x) }

for all x ∈ R^n. N

32.2 Ordinal superdifferentials

32.2.1 A quasi-concave notion

The next definition introduces a notion of superdifferential, due to Greenberg and Pierskalla (1973), suitable for quasi-concave functions. In reading it, keep in mind that quasi-concavity is an ordinal notion, unlike concavity, which is cardinal (Section 17.3.3).

Definition 1534 A function f : C → R is ordinally superdifferentiable at a point x ∈ C if the set ∂°f(x) defined by

∂°f(x) = {ξ ∈ R^n : ξ·(y − x) ≤ 0 ⟹ f(y) ≤ f(x), ∀y ∈ C}

is non-empty. The set ∂°f(x) is called the ordinal superdifferential of f at x.



The next result shows the ordinal nature of this notion, thus justifying its name.

Proposition 1535 Let g : C → R be ordinally superdifferentiable at x ∈ C. If f : D ⊆ R → R is strictly increasing, with Im g ⊆ D, then f∘g : C ⊆ R^n → R is ordinally superdifferentiable at x, with ∂°(f∘g)(x) = ∂°g(x).

Proof Given x, y ∈ C, it is enough to observe that g(y) ≤ g(x) if and only if (f∘g)(y) ≤ (f∘g)(x) (cf. Proposition 221).

Because of its ordinal nature, the ordinal superdifferential is a convex semicone, as the next result shows.
Proposition 1536 Let f : C → R be ordinally superdifferentiable at x ∈ C. Then, ∂°f(x) is a convex semicone.

Proof Let ξ, ξ′ ∈ ∂°f(x). In view of Proposition 885, we need to show that αξ + βξ′ ∈ ∂°f(x) whenever α, β ≥ 0 and α + β > 0. Let y ∈ C be such that

(αξ + βξ′)·(y − x) = αξ·(y − x) + βξ′·(y − x) ≤ 0

It follows that at least one addendum must be non-positive. Without loss of generality, say the first: αξ·(y − x) ≤ 0. We have two cases: either α > 0 or α = 0. In the former case, we have that ξ·(y − x) ≤ 0. In the latter case, since α + β > 0, we have βξ′·(y − x) ≤ 0 and β > 0, so ξ′·(y − x) ≤ 0. We can conclude that either ξ·(y − x) ≤ 0 or ξ′·(y − x) ≤ 0, which implies f(y) ≤ f(x), given that ξ, ξ′ ∈ ∂°f(x), yielding that αξ + βξ′ ∈ ∂°f(x).

Next we show that for concave functions the notions of ordinal superdifferential and of superdifferential are connected. Before doing so, we introduce an ancillary result which shows how monotonicity is captured by the ordinal superdifferential.

Proposition 1537 Let f : C → R be ordinally superdifferentiable at x ∈ C. If f is strongly increasing, then ∂°f(x) ⊆ R^n₊ \ {0}.

So, the elements of the ordinal superdifferential of a strongly increasing function are positive and non-zero vectors.

Proof Note that ξ ∈ R^n is such that ξ ∈ ∂°f(x) if and only if

f(y) > f(x) ⟹ ξ·(y − x) > 0  ∀y ∈ C   (32.22)

Let ξ ∈ ∂°f(x). Take z ∈ R^n₊₊. Since C is open, it follows that x + z/n ∈ C for n large enough and, in particular, x + z/n ≫ x. Since f is strongly increasing, we have that f(x + z/n) > f(x), yielding that

ξ·(z/n) > 0

and so ξ·z > 0. Since z ∈ R^n₊₊ was arbitrarily chosen, we thus have ξ·z > 0 for all z ∈ R^n₊₊. By the continuity of the function x ↦ ξ·x, we then have ξ·z ≥ 0 for all z ∈ R^n₊, proving that ξ ≥ 0. Finally, let 1 be the constant vector whose components are all 1. Since 1 ≫ 0, the vector ξ must be different from 0, otherwise we would reach the contradiction 0 = ξ·1 > 0. We conclude that ξ > 0.

We can now relate superdifferentials and the ordinal ones.



Proposition 1538 Given f : C → R, we have ∂f(x) ⊆ ∂°f(x) for all x ∈ C. If, in addition, f is strongly increasing and concave, then

∂°f(x) = ⋃_{λ>0} λ∂f(x) = {λξ : ξ ∈ ∂f(x) and λ > 0}   (32.23)

Proof Assume that ∂f(x) ≠ ∅, otherwise the result is trivially true. Let ξ ∈ ∂f(x). By definition, we have that f(y) − f(x) ≤ ξ·(y − x) for all y ∈ C. This implies that if y ∈ C and ξ·(y − x) ≤ 0, then f(y) ≤ f(x), yielding that ξ ∈ ∂°f(x) and ∂f(x) ⊆ ∂°f(x). Now, assume that f is concave and strongly increasing, and let x ∈ C. Note that ∂f(x) is non-empty. By the previous part of the proof, we have that ∂f(x) ⊆ ∂°f(x). By Proposition 1536, it follows that ⋃_{λ>0} λ∂f(x) ⊆ ∂°f(x). Vice versa, consider ξ ∈ ∂°f(x). By Proposition 1537 and since f is strongly increasing, we have that ξ > 0. Let y ∈ R^n be such that ξ·y = 0. It follows that, for every h > 0 small enough, x + hy ∈ C and ξ·((x + hy) − x) ≤ 0. Since ξ ∈ ∂°f(x), it follows that f(x + hy) − f(x) ≤ 0 for every h > 0 small enough. We can conclude that

f′₊(x; y) = lim_{h→0⁺} (f(x + hy) − f(x))/h ≤ 0

Since y was arbitrarily chosen, it follows that f′₊(x; y) ≤ 0 for all y ∈ R^n such that ξ·y = 0. Define V = {y ∈ R^n : ξ·y = 0} and g : V → R by g(y) = 0. Clearly, V is a vector subspace and g is linear. By the Hahn-Banach Theorem (Theorem 1563), since f′₊(x; ·) ≤ g on V and f′₊(x; ·) is superlinear, g admits a linear extension ḡ : R^n → R such that f′₊(x; y) ≤ ḡ(y) for every y ∈ R^n. By Riesz's Theorem, there exists ξ′ ∈ R^n such that ḡ(y) = ξ′·y for every y ∈ R^n. We conclude that

ξ·y = 0 ⟹ ξ′·y = 0   (32.24)

By Theorem 1523, it follows that ξ′ ∈ ∂f(x). Since f is strongly increasing, we also have that ξ′ > 0.⁷ We are left to show that ξ = λξ′ for some λ > 0. By Theorem 1562 and since (32.24) holds, we have that ξ′ = αξ for some α ∈ R. Since ξ > 0 and ξ′ > 0, we have that α > 0, and it is enough to set λ = 1/α > 0.

The next result shows that the ordinal superdifferential is to quasi-concave functions what the superdifferential is to concave functions (Theorem 1521).

Theorem 1539 If a lower semicontinuous function f : C → R is quasi-concave, then ∂°f(x) ≠ ∅ for all x ∈ C. The converse holds if f is bounded above.

Proof Let x ∈ C. We have two cases: either x is a maximizer of f on C or it is not. In the first case, choose ξ = 0. The implication

ξ·(y − x) ≤ 0 ⟹ f(y) ≤ f(x)

trivially holds because f(y) ≤ f(x) for all y ∈ C, x being a maximizer. Thus, ξ ∈ ∂°f(x) and this latter set is non-empty. In the second case, since x is not a maximizer and f is lower semicontinuous and quasi-concave, the strict upper contour set

(f > f(x)) = {y ∈ C : f(y) > f(x)}

⁷ By the previous part of the proof, ξ′ ∈ ∂f(x) ⊆ ∂°f(x). By Proposition 1537 and since f is strongly increasing, ξ′ ∈ ∂°f(x) ⊆ R^n₊ \ {0}.

is non-empty, open and convex.⁸ Since x does not belong to it, by Proposition 1025 there exists ξ ∈ R^n such that if y ∈ (f > f(x)), that is f(y) > f(x), then ξ·y > ξ·x. By taking the contrapositive, we have that ξ ∈ ∂°f(x) and this latter set is non-empty.

As to the converse, assume that f is bounded above, i.e., there exists M ∈ R such that f(y) ≤ M for all y ∈ C. We need to introduce two connected ancillary objects. We start with the function G : R^n × C → R such that, for every ξ ∈ R^n and every x ∈ C,

G(ξ, x) = sup{f(y) : y ∈ C, ξ·y ≤ ξ·x}

Note that f(x) ≤ G(ξ, x) ≤ M for every ξ ∈ R^n and every x ∈ C. If we fix ξ, note that the function x ↦ G(ξ, x) is quasi-concave on C. Indeed, consider z, ẑ ∈ C and λ ∈ [0, 1]. Without loss of generality, assume that ξ·z ≤ ξ·ẑ. It follows that

ξ·z ≤ ξ·(λz + (1 − λ)ẑ) ≤ ξ·ẑ

We thus have that

{f(y) : ξ·y ≤ ξ·z} ⊆ {f(y) : ξ·y ≤ ξ·(λz + (1 − λ)ẑ)}

yielding that G(ξ, λz + (1 − λ)ẑ) ≥ G(ξ, z) ≥ min{G(ξ, z), G(ξ, ẑ)}, proving quasi-concavity. The second ancillary function is f̂ : C → R such that, for every x ∈ C,

f̂(x) = inf_{ξ∈R^n} G(ξ, x)

Observe that f(x) ≤ f̂(x) ≤ M for every x ∈ C. Note that f̂ is also quasi-concave on C (why?).

We can now prove the quasi-concavity of f. Consider x ∈ C. Let ξ ∈ ∂°f(x). This implies that if y ∈ C is such that ξ·(y − x) ≤ 0, then f(y) ≤ f(x). It follows that

f̂(x) ≤ G(ξ, x) = sup{f(y) : y ∈ C, ξ·y ≤ ξ·x} = f(x) ≤ f̂(x)

This implies that f̂(x) = f(x). Since x ∈ C was arbitrarily chosen, we can conclude that f = f̂, yielding that f is quasi-concave.

In the next result, which relates ordinal superdifferentiability and differentiability, the semicone nature of the ordinal superdifferential further emerges.

Proposition 1540 Let f : C → R be a strongly increasing, continuous and quasi-concave function. If f is differentiable at x ∈ C, then

∂°f(x) = {λ∇f(x) : λ > 0}   (32.25)

provided ∇f(x) ≠ 0.

The proof relies on a lemma of some independent interest.

Lemma 1541 If f : C → R is continuous, then

{0 ≠ ξ ∈ R^n : ξ·(y − x) < 0 ⟹ f(y) ≤ f(x), ∀y ∈ C} ⊆ ∂°f(x)

⁸ This set is open by the dual of Proposition 1074.

Proof Let ξ be an element of the set on the left-hand side. To prove the inclusion, let y ∈ C be such that ξ·(y − x) ≤ 0. We want to show that f(y) ≤ f(x). By assumption, if ξ·(y − x) < 0, then f(y) ≤ f(x). Suppose then that ξ·y = ξ·x. Since ξ ≠ 0, there is some z ∈ R^n such that ξ·z > 0. Let yₙ = y − z/n. Since C is open, we have yₙ ∈ C for n sufficiently large. Clearly, ξ·yₙ = ξ·y − ξ·z/n < ξ·x. By assumption, it follows that f(yₙ) ≤ f(x). Since f is continuous, by taking the limit we have f(y) = limₙ→∞ f(yₙ) ≤ f(x), as desired. We conclude that ξ ∈ ∂°f(x).

Proof of Proposition 1540 Suppose f is differentiable at x ∈ C. Let us first prove that ∇f(x) ∈ ∂°f(x), provided ∇f(x) ≠ 0. In view of Lemma 1541, it is enough to prove that ∇f(x)·(y − x) < 0 implies f(y) ≤ f(x). Since f is differentiable, by Theorem 1287 we have that

∇f(x)·(y − x) = lim_{t→0} (f(x + t(y − x)) − f(x))/t

If ∇f(x)·(y − x) < 0, then f(x + t(y − x)) − f(x) < 0 for t sufficiently small and in (0, 1). Namely, f((1 − t)x + ty) < f(x). Since f is quasi-concave, we have f(x) > f((1 − t)x + ty) ≥ min{f(x), f(y)}, yielding that f(x) > min{f(x), f(y)} = f(y). It follows that ∇f(x) ∈ ∂°f(x). Since ∂°f(x) is a semicone, we can also conclude that {λ∇f(x) : λ > 0} ⊆ ∂°f(x).

As to the converse inclusion ∂°f(x) ⊆ {λ∇f(x) : λ > 0}, consider ξ ∈ ∂°f(x). We want to show that ξ = λ∇f(x) for some λ > 0. By Proposition 1537 and since f is strongly increasing, we have that ξ > 0. Let z ∈ R^n be such that ξ·z = 0. For t small enough, we have that x + tz ∈ C and ξ·((x + tz) − x) ≤ 0. Since ξ ∈ ∂°f(x), we have that f(x + tz) ≤ f(x) for t sufficiently small. By Theorem 1287, this implies that

∇f(x)·z = lim_{t→0} (f(x + tz) − f(x))/t ≤ 0

Since z was arbitrarily chosen, we have that ξ·z = 0 implies ∇f(x)·z ≤ 0. Since ξ·z = 0 if and only if ξ·(−z) = 0, we can conclude that ξ·z = 0 implies ∇f(x)·z = 0. By Theorem 1562, we have that ∇f(x) = αξ for some α ∈ R. Since f is strongly increasing and ∇f(x) ≠ 0, we have that ∇f(x) > 0. Since ξ > 0 and ∇f(x) > 0, we have that α > 0. If we set λ = 1/α > 0, then ξ = λ∇f(x), proving the inclusion.

Both the condition ∇f(x) ≠ 0 and the assumption of strong monotonicity in this proposition are needed, as we next show.

Example 1542 For the continuous and quasi-concave function f(x) = x³ we have 0 = f′(0) ∉ ∂°f(0) = (0, +∞). On the other hand, for the continuous and concave function f(x) = −x², the origin is a global maximizer and 0 = f′(0) ∈ ∂°f(0) = R. N

32.2.2 Quasi-concavity criteria

We now turn to differential criteria for quasi-concavity. We begin with the quasi-concave counterpart of Theorem 1472.

Theorem 1543 A differentiable f : C → R is quasi-concave if and only if, for each x, y ∈ C,

∇f(x)·(y − x) < 0 ⟹ f(y) < f(x)   (32.26)



Proof We begin by proving the "only if" part. Note that, by contraposition, (32.26) is equivalent to the following property: for each x, y ∈ C,

f(y) ≥ f(x) ⟹ ∇f(x)·(y − x) ≥ 0   (32.27)

Consider x, y ∈ C and assume that f(y) ≥ f(x). Since f is quasi-concave, it follows that f((1 − t)x + ty) ≥ f(x) for every t ∈ (0, 1). By Theorem 1287, we have that

∇f(x)·(y − x) = lim_{t→0⁺} (f(x + t(y − x)) − f(x))/t = lim_{t→0⁺} (f((1 − t)x + ty) − f(x))/t ≥ 0

yielding that ∇f(x)·(y − x) ≥ 0.

We next prove the "if" part. By contradiction, assume that there exist x, y ∈ C and t̂ ∈ (0, 1) such that f((1 − t̂)x + t̂y) < min{f(x), f(y)}. Define φ : [0, 1] → R by φ(t) = f((1 − t)x + ty) for all t ∈ [0, 1]. Since f is differentiable on C, φ is differentiable on (0, 1), continuous on [0, 1], and φ(0) = f(x) as well as φ(1) = f(y). Consider A = arg min_{t∈[0,1]} φ(t). By Weierstrass' Theorem, A is non-empty. Moreover, A is a closed subset of [0, 1]. It follows that t̃ = max A is well-defined. Since φ(t̂) = f((1 − t̂)x + t̂y) < min{f(x), f(y)}, we have that φ(t̃) ≤ φ(t̂) < min{φ(0), φ(1)}, yielding that t̃ ∈ (0, 1). Clearly, there exists t* ∈ (t̃, 1) such that φ′(t*) > 0. Otherwise, we would have φ′(t) ≤ 0 for all t ∈ (t̃, 1), yielding that φ is decreasing on (t̃, 1) and, in particular, by continuity, on [t̃, 1]. This would imply that φ(t̃) ≥ φ(1) ≥ min{φ(0), φ(1)}, a contradiction. Consider then the set L = {t ∈ (t̃, 1) : φ′(t) > 0}. Since t* ∈ L, L is non-empty and bounded from below by t̃. We can define s = inf L. Next, we prove that s = t̃. Clearly, s ≥ t̃. By contradiction, assume that s > t̃. By definition of L, this would imply that φ′(t) ≤ 0 for all t ∈ (t̃, s), yielding that φ is decreasing on (t̃, s) and, in particular, by continuity, on [t̃, s]. It would follow that φ(s) ≤ φ(t̃), that is, s ∈ A and s > t̃, a contradiction with t̃ = max A.

By a dual version of Lemma 1001, there exists a sequence {sₙ} ⊆ L such that sₙ → s. Since φ is continuous, s = t̃, and φ(s) = φ(t̃) < min{φ(0), φ(1)}, there exists n̄ such that φ(s_n̄) < min{φ(0), φ(1)} as well as φ′(s_n̄) > 0. Next, define also y_n̄ = (1 − s_n̄)x + s_n̄y. Note that

y_n̄ − x = s_n̄(y − x)  and  φ′(s_n̄) = ∇f(y_n̄)·(y − x)

Since φ′(s_n̄) > 0 and s_n̄ > 0, this implies that 0 < s_n̄φ′(s_n̄) = ∇f(y_n̄)·(y_n̄ − x), that is, ∇f(y_n̄)·(x − y_n̄) < 0. By assumption,

∇f(y_n̄)·(x − y_n̄) < 0 ⟹ f(x) < f(y_n̄)

We can conclude that

min{φ(0), φ(1)} ≤ φ(0) = f(x) < f(y_n̄) = φ(s_n̄)

a contradiction.

The next result is the quasi-concave counterpart of Theorem 1473, where a suitable notion of quasi-monotonicity is used.

Proposition 1544 A differentiable f : C → R is quasi-concave if and only if the derivative operator ∇f : C → R^n is (inner) quasi-monotone, i.e.,

∇f(x)·(y − x) < 0 ⟹ ∇f(y)·(y − x) ≤ 0  ∀x, y ∈ C   (32.28)



Proof \If" Assume that the derivative operator rf : C ! Rn is quasi-monotone. Suppose,


by contradiction, that f is not quasi-concave. By (32.26), there exists a pair x; y 2 C
for which rf (x) (y x) < 0 and f (y) f (x). De ne ' : [0; 1] ! R by ' (t) =
f (ty + (1 t) x). Since f is di erentiable on C, we have that ' is di erentiable on (0; 1), con-
tinuous on [0; 1], and such that ' (1) = f (y) f (x) = ' (0). De ne also yt = (1 t) x + ty
for all t 2 [0; 1]. Note that for each t 2 (0; 1)

yt x = t (y x) and '0 (t) = rf (yt ) (y x) (32.29)

Since rf (x) (y x) < 0, we have that for each t 2 (0; 1)

rf (x) (yt x) = trf (x) (y x) < 0

So, by (32.28) and (32.29), we have that for each t 2 (0; 1)

t'0 (t) = trf (yt ) (y x) = rf (yt ) (yt x) 0

The function ' is thus decreasing on (0; 1). By continuity, ' is decreasing on [0; 1]. Since
' (1) ' (0), this implies that ' is constant on [0; 1]. Since '+ (0) = rf (x) (y x) (why?),
in turn, this implies that 0 = '0+ (0) = rf (x) (y x) < 0, a contradiction.
\Only if" Let f be quasi-concave and suppose that (32.28) does not hold. So, there exists
a pair x; y 2 C such that

rf (x) (y x) < 0 and rf (y) (y x) > 0

In particular, we have that rf (y) (x y) < 0. Since f is quasi-concave, by Theorem 1543


these two inequalities imply f (y) < f (x) and f (x) < f (y), a contradiction.

32.2.3 A normalization

A simple consequence of formula (32.23) is that, when a concave f is strongly increasing and normalized, we have the sharp equality ∂f(x) = ∂°f(x) ∩ Δ_{n−1}. This observation suggests a normalized version of the ordinal superdifferential, whose elements are required to belong to the simplex.

Definition 1545 The normalized ordinal superdifferential ∂ⁿ°f(x) of a function f : C → R at a point x ∈ C is defined by

∂ⁿ°f(x) = {ξ ∈ Δ_{n−1} : ξ·(y − x) ≤ 0 ⟹ f(y) ≤ f(x), ∀y ∈ C}

This notion is best suited for strongly increasing functions, for which it has the same scope as ordinal superdifferentiability.

Proposition 1546 A strongly increasing f : C → R is ordinally superdifferentiable at x ∈ C if and only if ∂ⁿ°f(x) ≠ ∅.

Proof The "if" is obvious because ∂ⁿ°f(x) ⊆ ∂°f(x). As to the converse, suppose that ∂°f(x) ≠ ∅. By Proposition 1537, ∂°f(x) ⊆ R^n₊ \ {0}. Let ξ ∈ ∂°f(x), i.e.,

f(y) > f(x) ⟹ ξ·(y − x) > 0  ∀y ∈ C   (32.30)


From ξ > 0 it follows that Σⁿᵢ₌₁ ξᵢ = ξ·1 > 0, so that (32.30) amounts to

f(y) > f(x) ⟹ (ξ/Σⁿᵢ₌₁ ξᵢ)·(y − x) > 0  ∀y ∈ C

We conclude that ∂ⁿ°f(x) = ∂°f(x) ∩ Δ_{n−1} ≠ ∅.

Next we state a sharp inequality characterization.

Lemma 1547 If f : C → R is strongly increasing and lower semicontinuous, then

∂ⁿ°f(x) = {ξ ∈ Δ_{n−1} : ∀y ∈ C, ξ·y < ξ·x ⟹ f(y) < f(x)}  ∀x ∈ C

Proof Let ξ ∈ {ξ ∈ Δ_{n−1} : ∀y ∈ C, ξ·y < ξ·x ⟹ f(y) < f(x)}. Let y ∈ C be such that ξ·(y − x) ≤ 0. If ξ·y < ξ·x, then f(y) < f(x). If ξ·y = ξ·x, define yₖ = y − (1/k)1. Clearly, ξ·yₖ < ξ·y for all k ≥ 1 because ξ·1 = 1 > 0. Since C is open, eventually yₖ ∈ C. So, eventually f(yₖ) < f(x). By passing to the limit and since f is lower semicontinuous, we have f(y) ≤ lim inf_k f(yₖ) ≤ f(x). We conclude that ξ ∈ ∂ⁿ°f(x).

As to the converse inclusion, let ξ ∈ ∂ⁿ°f(x). Let y ∈ C be such that ξ·y < ξ·x. We want to show that f(y) < f(x). If we choose t > 0 small enough, we have y + t1 ∈ C and ξ·(y + t1) ≤ ξ·x. Since ξ ∈ ∂ⁿ°f(x) and f is strongly monotone, this implies that f(y) < f(y + t1) ≤ f(x), as desired.

The inequality characterization of this lemma permits us to prove the relevance of the normalized ordinal superdifferential for strongly increasing quasi-concave functions.

Proposition 1548 If a strongly increasing f : C → R is lower semicontinuous and quasi-concave, then ∂ⁿ°f(x) is non-empty, convex and compact for all x ∈ C.

Proof By Theorem 1539 and Proposition 1546, ∂ⁿ°f(x) is non-empty; it is easily seen to be convex. It remains to show that it is closed (so, compact, because the simplex is compact). Suppose that the sequence {ξₖ} ⊆ ∂ⁿ°f(x) converges to ξ ∈ R^n. Let y ∈ C be such that ξ·y < ξ·x. To prove that ∂ⁿ°f(x) is closed we need to show that f(y) < f(x), so that ξ ∈ ∂ⁿ°f(x) by Lemma 1547. Eventually, we have ξₖ·y < ξₖ·x. Indeed, set ε = ξ·x − ξ·y > 0; eventually, ξₖ·x > ξ·x − ε/2 and ξₖ·y < ξ·y + ε/2, so that ξₖ·y < ξ·y + ε/2 = ξ·x − ε/2 < ξₖ·x. We conclude that, eventually, ξₖ·y < ξₖ·x. By Lemma 1547, this implies that f(y) < f(x), as desired.

Remarkably, the normalized ordinal superdifferential of a quasi-concave function is a compact set, unlike the standard ordinal superdifferential. It is thus closer in nature to the superdifferential of concave functions. This similarity is corroborated by the next result, a direct consequence of Proposition 1538.

Proposition 1549 If f : C → R is strongly increasing and normalized, then ∂f(x) ⊆ ∂ⁿ°f(x) for all x ∈ C. If, in addition, f is concave, then ∂f(x) = ∂ⁿ°f(x).

We close with the sharp normalized form of Proposition 1540.

Proposition 1550 Let f : C → R be a strongly increasing, continuous and quasi-concave function. If f is differentiable at x ∈ C, then ∂ⁿ°f(x) = {∇f(x)/(∇f(x)·1)}, the gradient normalized so that its components add up to one, provided ∇f(x) ≠ 0.

32.3 Optimization

A main motivation for the study of superdifferentials is that they permit neat characterizations of (global) maximizers, as we next show. Here f is a generic real-valued function, possibly non-concave, on a generic domain A, possibly non-convex.

Theorem 1551 Given a function f : A ⊆ R^n → R, the following conditions are equivalent:

(i) a point x̂ ∈ A is a maximizer;

(ii) f is superdifferentiable at x̂ and 0 ∈ ∂f(x̂);

(iii) f is ordinally superdifferentiable at x̂ and 0 ∈ ∂°f(x̂).

Proof (i) implies (ii). Let x̂ ∈ A be a maximizer. We have f(x) ≤ f(x̂) + 0·(x − x̂) for every x ∈ A, and so 0 ∈ ∂f(x̂). (ii) implies (iii). It is enough to observe that ∂f(x) ⊆ ∂°f(x) for all x ∈ A. (iii) implies (i). Let 0 ∈ ∂°f(x̂). It follows that if y ∈ A and 0·(y − x̂) ≤ 0, then f(y) ≤ f(x̂). Since 0·(y − x̂) = 0 ≤ 0 holds for every y ∈ A, we have that f(y) ≤ f(x̂) for all y ∈ A, i.e., x̂ ∈ A is a maximizer.

This theorem gives as a corollary the most general version of the first-order condition for concave functions. Indeed, in view of Proposition 1516, the earlier Theorem 1485 is a special case of this result.

Corollary 1552 Given a concave function f : C → R, a point x̂ ∈ C is a maximizer if and only if 0 ∈ ∂f(x̂).

Proof It is enough to observe that, by Theorem 1521, ∂f(x) ≠ ∅ for all x ∈ C, i.e., f is superdifferentiable at all x ∈ C.

The next simple example shows how this corollary makes it possible to find maximizers of concave functions even when Fermat's Theorem does not apply because there are points where the function is not differentiable.

Example 1553 For the concave function f : R → R defined by f(x) = 1 − |x| we have (Example 1517):

∂f(x) =  {−1}     if x > 0
         [−1, 1]  if x = 0
         {1}      if x < 0

By the last corollary, x̂ = 0 is the unique maximizer because 0 ∈ ∂f(0) and 0 ∉ ∂f(x) for all x ≠ 0. N
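Numerically, the first-order condition 0 ∈ ∂f(x̂) can be seen at work by brute-force maximization on a grid (in Python, assuming NumPy), both for this example and for the function of Example 1531, whose maximizer solves −2x − 1 = 0 on x < 0:

```python
import numpy as np

xs = np.linspace(-2, 2, 4001)

f = lambda x: 1.0 - np.abs(x)                    # 0 in @f(0) = [-1, 1]
print(xs[np.argmax(f(xs))])                      # ~ 0.0

h = lambda x: 1.0 - np.abs(x) - x**2 - 2.0 * x   # @h(x) = {-2x - 1} for x < 0
print(xs[np.argmax(h(xs))])                      # ~ -0.5, where 0 in @h(-0.5)
```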

Theorem 1551 also permits us to prove a general first-order condition for quasi-concave functions.

Corollary 1554 Let f : C → R be quasi-concave and lower semicontinuous. Then, x̂ ∈ C is a maximizer if and only if 0 ∈ ∂°f(x̂).

Proof It is enough to observe that, by Theorem 1539, ∂°f(x) ≠ ∅ for all x ∈ C, i.e., f is ordinally superdifferentiable at all x ∈ C.

It is easy to check that, at a maximizer x̂, we actually have ∂°f(x̂) = R^n. So, for a quasi-concave function this result gives a loose first-order condition.

Example 1555 For the quasi-concave function f : R → R defined by f(x) = e^{1−|x|} we have:

∂°f(x) =  (−∞, 0)  if x > 0
          R        if x = 0
          (0, +∞)  if x < 0

By the last corollary, x̂ = 0 is the unique maximizer because 0 ∈ ∂°f(0) and 0 ∉ ∂°f(x) for all x ≠ 0. N

32.4 Inclusion equations

32.4.1 Inclusion equations and fixed points

A correspondence f : A ⇉ R^n, with domain A ⊆ R^n, defines an inclusion equation

0 ∈ f(x)

If f is a function, the inclusion equation reduces to a standard equation f(x) = 0. The generalized first-order condition 0 ∈ ∂f(x̂) of Theorem 1551 is a most important example of an inclusion equation. Later we will see that inclusion equations naturally arise in market analysis.

Like equations, inclusion equations too may be solved via fixed point analysis. Indeed, such analysis can be generalized to correspondences. Specifically, a correspondence f : A ⇉ R^n is said to be a self-correspondence if f(x) ⊆ A for all x ∈ A. In words, a self-correspondence associates a subset of A with each element of A. So, we often write f : A ⇉ A.

Example 1556 (i) All correspondences f : R^n ⇉ R^n are, trivially, self-correspondences. (ii) The correspondence f : [0, 1] ⇉ [0, 1] given by f(x) = [0, x²] is a self-correspondence because x² ∈ [0, 1] for all x ∈ [0, 1]. N

The notion of fixed point naturally extends to self-correspondences.

Definition 1557 Given a self-correspondence f : A ⇉ A, a vector x ∈ A is said to be a fixed point of f if x ∈ f(x).

For instance, for the self-correspondence f : [0, 1] ⇉ [0, 1] given by f(x) = [0, x²], the endpoints 0 and 1 are fixed points in that 0 ∈ f(0) = {0} and 1 ∈ f(1) = [0, 1].

The next important theorem establishes the existence of fixed points by generalizing Brouwer's Theorem (we omit its non-trivial proof).⁹

⁹ It is named after Shizuo Kakutani, who proved it in 1941.

Theorem 1558 (Kakutani) An upper hemicontinuous and convex-valued self-correspondence f : K ⇉ K defined on a convex compact subset K of R^n has a fixed point.

Clearly, Brouwer's Theorem is a special case of Kakutani's Theorem. It is an important result because, like standard equations, an inclusion equation 0 ∈ f(x) too may be solved by finding a suitable self-correspondence g : K ⇉ K, defined on a convex compact subset K of R^n, such that 0 ∈ f(x) if and only if x ∈ g(x). In this case, the solution of an inclusion equation reduces to the search for the fixed points of a self-correspondence. The Arrow-Debreu Theorem will momentarily provide a classic illustration of this solution method.

32.4.2 Aggregate market analysis

The aggregate market analysis of Section 14.1.3 can be generalized to the case of demand and supply correspondences $D, S : \mathbb{R}^n_+ \rightrightarrows \mathbb{R}^n_+$. As we will see momentarily, such generalization can be easily motivated within an exchange economy.
Let $E : \mathbb{R}^n_+ \rightrightarrows \mathbb{R}^n$ be the excess demand correspondence defined by $E(p) = D(p) - S(p)$, with positive part $E^+ : \mathbb{R}^n_+ \rightrightarrows \mathbb{R}^n_+$ defined by $E^+(p) = \{\max\{z, 0\} : z \in E(p)\}$. A pair $(p, q) \in \mathbb{R}^n_+ \times \mathbb{R}^n_+$ of prices and quantities is a weak market equilibrium if $p$ is such that $E(p) \cap \mathbb{R}^n_- \neq \emptyset$, i.e.,
$$0 \in E^+(p) \tag{32.31}$$
and $q \in D(p)$. The pair $(p, q)$ is a market equilibrium if $p$ is such that
$$0 \in E(p) \tag{32.32}$$
and $q \in D(p)$.
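To make the two inclusion equations concrete, here is a toy sketch with a made-up, single-valued excess demand function (so the inclusions reduce to componentwise conditions); the specification is purely hypothetical and merely satisfies the assumptions listed below.

```python
# Sketch: testing (32.31) and (32.32) for a made-up two-good excess demand
# E(p) = (p2/p1 - 1, p1/p2 - 1), which is homogeneous of degree 0 and
# satisfies p . E(p) = 0 (Walras' law).

import numpy as np

def E(p):
    p1, p2 = p
    return np.array([p2 / p1 - 1.0, p1 / p2 - 1.0])

def weak_equilibrium(p):   # (32.31): 0 in E+(p), i.e., here E(p) <= 0
    return bool(np.all(E(p) <= 1e-12))

def equilibrium(p):        # (32.32): 0 in E(p)
    return bool(np.allclose(E(p), 0.0))

for p in ([1.0, 1.0], [1.0, 2.0], [3.0, 3.0]):
    print(p, weak_equilibrium(np.array(p)), equilibrium(np.array(p)))
# The market clears exactly when p1 = p2, i.e., along a ray of prices.
```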
The existence of equilibria thus reduces to the solution of some inclusion equations defined by the excess market demand correspondence. To solve these inclusion equations, and thus establish the existence of equilibria, we consider the following assumptions on such correspondence:

E.1 $E$ is upper hemicontinuous, convex-valued, and bounded on $\mathbb{R}^n_{++}$;

A.2 $E(\lambda p) = E(p)$ for each $\lambda > 0$ and all $p \in \mathbb{R}^n_+$;

W.1 $p \cdot E(p) \le 0$ for all $p \in \mathbb{R}^n_+$;¹⁰

E.4 $E_i(p) \subseteq \mathbb{R}_+$ if $p_i = 0$;

W.2 $p \cdot E(p) = 0$ for all $p \in \mathbb{R}^n_+$.

We denoted the assumptions as in the earlier Section 14.1.3 because they have the same economic interpretation (upon which we already expatiated). We use the letter "E" for the first and fourth assumptions because they have to adapt their mathematical form to the more general setting of correspondences.
We can now state and prove a general version of Arrow-Debreu's Theorem.

¹⁰ The inequality $p \cdot E(p) \le 0$ means $\sum_{i \in I} p_i z_i \le 0$ for all $z \in E(p)$.

Theorem 1559 (Arrow-Debreu) Under assumptions E.1, A.2, and W.1 a weak market equilibrium exists. If, in addition, assumptions E.4 and W.2 hold, then a market equilibrium exists.

Proof We follow Debreu (1959). Since $E$ is bounded, there is a compact set $K$ in $\mathbb{R}^n$ such that $E(p) \subseteq K$ for all $p \in \mathbb{R}^n_+$. Without loss of generality, we can assume that $K$ is convex. By A.2, we can limit ourselves to the upper hemicontinuous restriction $E : \Delta^{n-1} \rightrightarrows K$. Define $g : K \rightrightarrows \Delta^{n-1}$ by
$$g(z) = \arg\max_{p \in \Delta^{n-1}} p \cdot z$$
By the Maximum Theorem (Chapter 41), $g$ is a compact-valued and upper hemicontinuous correspondence. Moreover, it is convex-valued (Proposition 1763). Consider the product correspondence $\varphi : \Delta^{n-1} \times K \rightrightarrows \Delta^{n-1} \times K$ defined by $\varphi(p, z) = g(z) \times E(p)$. The correspondence $\varphi$ is easily seen to be upper hemicontinuous and convex-valued (as readers can check) on the compact and convex set $\Delta^{n-1} \times K$. By Kakutani's Theorem, there exists a fixed point $(\bar{p}, \bar{z}) \in \Delta^{n-1} \times K$ such that $(\bar{p}, \bar{z}) \in \varphi(\bar{p}, \bar{z}) = g(\bar{z}) \times E(\bar{p})$. So, $\bar{z} \in E(\bar{p})$ and $\bar{p} \in g(\bar{z})$, which respectively imply, by W.1, $\bar{p} \cdot \bar{z} \le 0$ and, by definition of $g$, $p \cdot \bar{z} \le \bar{p} \cdot \bar{z}$ for all $p \in \Delta^{n-1}$. Thus, we have
$$p \cdot \bar{z} \le 0 \qquad \forall p \in \Delta^{n-1}$$
In particular, by taking the price versor $e^i \in \Delta^{n-1}$, we then get
$$\bar{z}_i = e^i \cdot \bar{z} \le 0 \qquad \forall i \in I$$
We conclude that $\bar{z} \in E(\bar{p}) \cap \mathbb{R}^n_-$, so $0 \in E^+(\bar{p})$.
Assume E.4 and W.2. We want to show that $\bar{z} = 0$. Suppose, by contradiction, that $\bar{z}_i < 0$ for some good $i$. By E.4, $\bar{p}_i > 0$, so $\bar{p}_i \bar{z}_i < 0$. By W.2, $\bar{p} \cdot \bar{z} = 0$; since prices are positive, there must then exist some $j$ such that $\bar{p}_j \bar{z}_j > 0$, which contradicts $\bar{z} \le 0$. We conclude that $\bar{z} = 0$, yielding that $0 \in E(\bar{p})$.

32.4.3 Back to agents: exchange economy

The previous aggregate market analysis with demand and supply correspondences can be understood in terms of the simple exchange economy $\mathcal{E} = \{(u_i, \omega_i)\}_{i \in I}$ of Section 22.9. In what follows, we assume that each agent has a bounded consumption set $A_i = [0, b_i]$ where $b_i \in \mathbb{R}^n_{++}$. In the optimization problem
$$\max_x u_i(x) \quad \text{sub} \quad x \in B_i(p, p \cdot \omega_i)$$
that, in his consumer role, agent $i$ solves, we no longer assume that the solution is unique, but permit multiple optimal bundles. Consequently, now we have a demand correspondence $D_i : \mathbb{R}^n_+ \rightrightarrows \mathbb{R}^n_+$ defined by
$$D_i(p) = \arg\max_{x \in B_i(p, p \cdot \omega_i)} u_i(x) \qquad \forall p \in \mathbb{R}^n_+$$
The aggregate demand correspondence $D : \mathbb{R}^n_+ \rightrightarrows \mathbb{R}^n$ is still defined by
$$D(p) = \sum_{i \in I} D_i(p)$$

where now, though, the sum is in the sense of (21.3). The aggregate demand correspondence still inherits the invariance property of individual demand correspondences – i.e., $D(\lambda p) = D(p)$ for all $\lambda > 0$ – since this invariance property is easily seen to continue to hold for each agent.
The aggregate supply function $S : \mathbb{R}^n_+ \to \mathbb{R}^n$ continues to be $S(p) = \{\omega\}$. So, the weak Walras' law still takes the form $p \cdot E(p) \le 0$, where $E : \mathbb{R}^n_+ \rightrightarrows \mathbb{R}^n$ is the excess demand correspondence defined by $E(p) = D(p) - \{\omega\}$. If Walras' law holds for each agent $i \in I$, i.e., $p \cdot D_i(p) = p \cdot \omega_i$ for each $i \in I$, then its aggregate version $p \cdot E(p) = 0$ holds.
Here a pair $(p, x) \in \mathbb{R}^n_+ \times (\mathbb{R}^n_+)^{|I|}$ of prices and consumption allocations is a weak Arrow-Debreu (market) equilibrium of the exchange economy $\mathcal{E}$ if

(i) $x_i \in D_i(p)$ for each $i \in I$;

(ii) $\sum_{i \in I} x_i \le \omega$.

The pair $(p, x)$ becomes an Arrow-Debreu (market) equilibrium if in the market clearing condition (ii) we have equality, so that optimal bundles exhaust endowments.
The next result, a general version of Lemma 1053, connects the Arrow-Debreu and the aggregate market equilibrium notions.

Lemma 1560 Given a pair $(p, x) \in \mathbb{R}^n_+ \times (\mathbb{R}^n_+)^{|I|}$ of prices and consumption allocations, set $q = \sum_{i \in I} x_i$. The pair $(p, x)$ is a:

(i) Arrow-Debreu equilibrium if and only if $p$ solves the inclusion equation (32.32) and $q \in D(p)$;

(ii) weak Arrow-Debreu equilibrium if and only if $p$ solves the inclusion equation (32.31) and $q \in D(p)$.

We can now establish which properties of the utility functions and endowments of the agents of economy $\mathcal{E}$ imply the properties of the aggregate demand correspondence that Arrow-Debreu's Theorem requires. For simplicity, we consider weak equilibria and prove the desired existence result that generalizes Proposition 1054.

Proposition 1561 Let $\mathcal{E} = \{(u_i, \omega_i)\}_{i \in I}$ be an economy in which, for each agent $i \in I$, the endowment $\omega_i$ is strictly positive and the utility function $u_i$ is continuous and quasi-concave on a consumption set $A_i = [0, b_i]$ where $b_i \in \mathbb{R}^n_{++}$. Then, a weak market price equilibrium of the exchange economy $\mathcal{E}$ exists.

This existence result generalizes Proposition 1054 in that utility functions are only required to be quasi-concave and not strictly quasi-concave.

Proof Let $i \in I$. If $u_i$ is continuous on the compact set $A_i$, by the Maximum Theorem (Chapter 41) the individual demand correspondence $D_i$ is bounded and upper hemicontinuous on $\mathbb{R}^n_{++}$. Moreover, since $u_i$ is quasi-concave, $D_i$ is convex-valued (Proposition 1763). The aggregate demand correspondence $D$ inherits these properties, i.e., it is bounded, convex-valued, and upper hemicontinuous on $\mathbb{R}^n_{++}$. So, condition E.1 is satisfied. Since we already noted that conditions A.2 and W.1 hold, we conclude that a weak market price equilibrium exists by Arrow-Debreu's Theorem.

32.5 Coda: a linear algebra aggregation result

The results of the last section rely on an interesting linear algebra result that we next state and prove.

Theorem 1562 Let $\{\alpha_i\}_{i=1}^k \subseteq \mathbb{R}^n$ be a finite collection of vectors. A vector $\beta \in \mathbb{R}^n$ belongs to $\mathrm{span}\{\alpha_1, \dots, \alpha_k\}$ if and only if
$$\alpha_i \cdot x = 0 \;\; \forall i = 1, \dots, k \implies \beta \cdot x = 0 \qquad \forall x \in \mathbb{R}^n \tag{32.33}$$

Proof The "if" part is obvious and left to the reader. "Only if". Before starting, we introduce some derived objects, since reasoning in terms of linear functions rather than vectors will simplify things quite significantly. Define $f_i : \mathbb{R}^n \to \mathbb{R}$ by $f_i(x) = \alpha_i \cdot x$ for each $i = 1, \dots, k$. Similarly, define $f : \mathbb{R}^n \to \mathbb{R}$ by $f(x) = \beta \cdot x$. Next, define the operator $F : \mathbb{R}^n \to \mathbb{R}^k$ to be such that the $i$-th component of $F(x)$ is $F(x)_i = f_i(x)$. Since $F$ is linear (why?), note that $\mathrm{Im}\,F$ is a vector subspace of $\mathbb{R}^k$. Next, we define the function $g : \mathrm{Im}\,F \to \mathbb{R}$ as follows:
$$g(v) = f(x) \text{ where } x \in F^{-1}(v) \qquad \forall v \in \mathrm{Im}\,F$$
First, we need to show that $g$ is well-defined. In other words, we need to check that to each vector of $\mathrm{Im}\,F$ the function $g$ assigns one and only one value. In fact, by definition, given $v \in \mathrm{Im}\,F$ there always exists a vector $x \in \mathbb{R}^n$ such that $F(x) = v$. The potential issue is that there might exist a second vector $y \in \mathbb{R}^n$ such that $F(y) = v$, but $f(x) \neq f(y)$. We next show that this latter inequality never holds. Indeed, since $F$ is linear, if $F(y) = v$, then $F(x) - F(y) = 0$ and $F(x - y) = 0$. By definition of $F$, we have $\alpha_i \cdot (x - y) = f_i(x - y) = 0$ for every $i = 1, \dots, k$. By (32.33), this yields that $\beta \cdot (x - y) = f(x - y) = 0$, that is, $f(x) = f(y)$. We just proved that $g$ is well-defined. The reader can verify that $g$ is also linear. By the Hahn-Banach's Theorem (Theorem 760), $g$ admits an extension to $\mathbb{R}^k$. By the Riesz's Theorem, there exists a vector $\lambda \in \mathbb{R}^k$ such that $g(v) = \sum_{i=1}^k \lambda_i v_i$ for all $v \in \mathbb{R}^k$. By definition of $f_i$, $f$, $g$, and $F$, we conclude that for every $x \in \mathbb{R}^n$
$$\beta \cdot x = f(x) = g(F(x)) = \sum_{i=1}^k \lambda_i f_i(x) = \sum_{i=1}^k \lambda_i \alpha_i \cdot x$$
yielding that $\beta = \sum_{i=1}^k \lambda_i \alpha_i$, i.e., $\beta \in \mathrm{span}\{\alpha_1, \dots, \alpha_k\}$.¹¹

¹¹ Readers who struggle with this last step should consult the proof of the Riesz's Theorem (in particular, the part dealing with "uniqueness").
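Numerically, condition (32.33) can be tested through the common kernel of the $\alpha_i$; the following sketch (illustrative, with made-up vectors and the symbols $\alpha_i$, $\beta$ used above) does so via the singular value decomposition.

```python
# Sketch: Theorem 1562 in coordinates. beta lies in span{alpha_1,...,alpha_k}
# iff beta . x = 0 for every x in the common kernel of the alpha_i.

import numpy as np

alphas = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0]])          # the alpha_i, stacked as rows
beta_in  = 2.0 * alphas[0] + 3.0 * alphas[1]  # in the span by construction
beta_out = np.array([1.0, 0.0, 0.0])          # not in the span

_, s, Vt = np.linalg.svd(alphas)
rank = int(np.sum(s > 1e-12))
kernel = Vt[rank:]            # rows spanning {x : alpha_i . x = 0 for all i}

for beta in (beta_in, beta_out):
    print(beta, bool(np.allclose(kernel @ beta, 0.0)))  # True iff in span
```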
Chapter 33

Nonlinear Riesz's Theorems

Superdifferentials permit us to establish representation results for superlinear functions that generalize Riesz's Theorem. This beautiful topic is the subject matter of this chapter (for coda readers).

33.1 The ultimate Hahn-Banach's Theorem

In presenting the Hahn-Banach's Theorem (Section 15.11), we remarked that a linear function defined on a vector subspace of $\mathbb{R}^n$ admits, in general, many linear extensions. The next more powerful version of the theorem gives some control over them.
Theorem 1563 (Hahn-Banach) Let $g : \mathbb{R}^n \to \mathbb{R}$ be a concave function and $V$ a vector subspace of $\mathbb{R}^n$. If $f : V \to \mathbb{R}$ is a linear function such that $f(x) \ge g(x)$ for all $x \in V$, then there exists a linear function $\bar{f} : \mathbb{R}^n \to \mathbb{R}$ that extends $f$ to $\mathbb{R}^n$ with $\bar{f}(x) \ge g(x)$ for all $x \in \mathbb{R}^n$.

The version of the theorem seen in Section 15.11 is a special case. Indeed, let $f : V \to \mathbb{R}$ be any linear function defined on $V$. Theorem 898 is easily seen to hold for linear functions defined on vector subspaces, so there is $k > 0$ such that $|f(x)| \le k\|x\|$ for all $x \in V$. The function $g : \mathbb{R}^n \to \mathbb{R}$ defined by $g(x) = -k\|x\|$ is concave (Example 814). Since $f(x) \ge g(x)$ for all $x \in V$, by the last theorem there exists a linear function $\bar{f} : \mathbb{R}^n \to \mathbb{R}$ that extends $f$ to $\mathbb{R}^n$.

Proof Let $\dim V = k \le n$ and let $\{x_1, \dots, x_k\}$ be a basis of $V$. If $k = n$, there is nothing to prove since $V = \mathbb{R}^n$. Otherwise, by Theorem 92 there are $n - k$ vectors $\{x_{k+1}, \dots, x_n\}$ such that the overall set $\{x_1, \dots, x_n\}$ is a basis of $\mathbb{R}^n$. Let $V_1 = \mathrm{span}\{x_1, \dots, x_{k+1}\}$. Clearly, $V \subseteq V_1$. Given any $x \in V_1$, there exists a unique collection of scalars $\{\alpha_i\}_{i=1}^{k+1} \subseteq \mathbb{R}$ such that $x = \sum_{i=1}^k \alpha_i x_i + \alpha_{k+1} x_{k+1}$. Since $\sum_{i=1}^k \alpha_i x_i \in V$, every element of $V_1$ can be uniquely written as $x + \alpha x_{k+1}$, with $x \in V$ and $\alpha \in \mathbb{R}$. That is, $V_1 = \{x + \alpha x_{k+1} : x \in V, \alpha \in \mathbb{R}\}$.
Let $r$ be an arbitrary scalar. Define $f_1 : V_1 \to \mathbb{R}$ by $f_1(x + \alpha x_{k+1}) = f(x) + \alpha r$ for all $x \in V$ and all $\alpha \in \mathbb{R}$. The function $f_1$ is linear, with $f_1(x_{k+1}) = r$, and is equal to $f$ on $V$. We need to show that $r$ can be chosen so that $f_1(x) \ge g(x)$ for all $x \in V_1$.
If $\alpha > 0$, we have that for every $x \in V$
$$f_1(x + \alpha x_{k+1}) \ge g(x + \alpha x_{k+1}) \iff f(x) + \alpha r \ge g(x + \alpha x_{k+1}) \iff r \ge \frac{g(x + \alpha x_{k+1}) - f(x)}{\alpha}$$
So, for all $\alpha > 0$ and all $x \in V$, we need
$$r \ge \frac{g(x + \alpha x_{k+1}) - f(x)}{\alpha}$$
If $\alpha < 0$, setting $\beta = -\alpha > 0$ we have that for every $x \in V$
$$f_1(x + \alpha x_{k+1}) \ge g(x + \alpha x_{k+1}) \iff f(x) - \beta r \ge g(x - \beta x_{k+1}) \iff r \le \frac{f(x) - g(x - \beta x_{k+1})}{\beta}$$
So, for all $\beta > 0$ and all $y \in V$, we need
$$r \le \frac{f(y) - g(y - \beta x_{k+1})}{\beta}$$
Summing up, we have $f_1(x) \ge g(x)$ for all $x \in V_1$ if and only if we choose $r \in \mathbb{R}$ so that
$$\inf_{y \in V, \beta > 0} \frac{f(y) - g(y - \beta x_{k+1})}{\beta} \ge r \ge \sup_{x \in V, \alpha > 0} \frac{g(x + \alpha x_{k+1}) - f(x)}{\alpha}$$
It remains to prove that such a choice of $r$ is possible, i.e., that
$$\inf_{y \in V, \beta > 0} \frac{f(y) - g(y - \beta x_{k+1})}{\beta} \ge \sup_{x \in V, \alpha > 0} \frac{g(x + \alpha x_{k+1}) - f(x)}{\alpha} \tag{33.1}$$
Note that
$$\frac{f(y) - g(y - \beta x_{k+1})}{\beta} \ge \frac{g(x + \alpha x_{k+1}) - f(x)}{\alpha}$$
$$\iff \alpha f(y) - \alpha g(y - \beta x_{k+1}) \ge \beta g(x + \alpha x_{k+1}) - \beta f(x)$$
$$\iff \alpha f(y) + \beta f(x) \ge \alpha g(y - \beta x_{k+1}) + \beta g(x + \alpha x_{k+1})$$
$$\iff f(\alpha y + \beta x) \ge \alpha g(y - \beta x_{k+1}) + \beta g(x + \alpha x_{k+1})$$
But, since $g$ is concave and $f(x) \ge g(x)$ for all $x \in V$, we have
$$f(\alpha y + \beta x) = (\alpha + \beta) f\left(\frac{\alpha}{\alpha + \beta} y + \frac{\beta}{\alpha + \beta} x\right) \ge (\alpha + \beta) g\left(\frac{\alpha}{\alpha + \beta} y + \frac{\beta}{\alpha + \beta} x\right)$$
$$= (\alpha + \beta) g\left(\frac{\alpha}{\alpha + \beta}(y - \beta x_{k+1}) + \frac{\beta}{\alpha + \beta}(x + \alpha x_{k+1})\right)$$
$$\ge (\alpha + \beta)\left[\frac{\alpha}{\alpha + \beta} g(y - \beta x_{k+1}) + \frac{\beta}{\alpha + \beta} g(x + \alpha x_{k+1})\right] = \alpha g(y - \beta x_{k+1}) + \beta g(x + \alpha x_{k+1})$$
Thus, for all $x, y \in V$ and all $\alpha, \beta > 0$, we have
$$\frac{f(y) - g(y - \beta x_{k+1})}{\beta} \ge \frac{g(x + \alpha x_{k+1}) - f(x)}{\alpha}$$
In turn, this implies (33.1), as desired. We conclude that there exists a linear function $f_1 : V_1 \to \mathbb{R}$ that extends $f$ and such that $f_1(x) \ge g(x)$ for all $x \in V_1$.
Consider now $V_2 = \mathrm{span}\{x_1, \dots, x_{k+1}, x_{k+2}\}$. By proceeding as before, we can show the existence of a linear function $f_2 : V_2 \to \mathbb{R}$ that extends $f_1$ and such that $f_2(x) \ge g(x)$ for all $x \in V_2$. In particular, being $V \subseteq V_1 \subseteq V_2$, the linear function $f_2$ is such that $f_2(x) = f_1(x) = f(x)$ for all $x \in V$. So, $f_2$ extends $f$ to $V_2$. By iterating, we reach a final extension $f_{n-k} : \mathbb{R}^n \to \mathbb{R}$ that extends $f$ and is such that $f_{n-k}(x) \ge g(x)$ for all $x \in V_{n-k} = \mathrm{span}\{x_1, \dots, x_n\} = \mathbb{R}^n$. This completes the proof.

33.2 Representation of superlinear functions

Next we establish a key characterization of superlinear functions. In reading the result, recall that $\partial f(0) = \{\lambda \in \mathbb{R}^n : \lambda \cdot x \ge f(x) \text{ for every } x \in \mathbb{R}^n\}$ is a non-empty compact and convex set in $\mathbb{R}^n$ if $f$ is superlinear (Section 32.1), as well as that a translation invariant function $f$ is normalized provided $f(0) = 0$ – e.g., $f$ is superlinear – and $f(1) = 1$ (Section 19.3).

Theorem 1564 A function $f : \mathbb{R}^n \to \mathbb{R}$ is superlinear if and only if there is a non-empty compact and convex set $C \subseteq \mathbb{R}^n$ such that
$$f(x) = \min_{\lambda \in C} \lambda \cdot x \qquad \forall x \in \mathbb{R}^n \tag{33.2}$$
Moreover, $C$ is unique and is given by $\partial f(0)$. In particular,

(i) $\partial f(0) \subseteq \mathbb{R}^n_+$ if and only if $f$ is increasing;

(ii) $\partial f(0) \subseteq \mathbb{R}^n_+ \setminus \{0\}$ if and only if $f$ is strongly increasing;

(iii) $\partial f(0) \subseteq \mathbb{R}^n_{++}$ if and only if $f$ is strictly increasing;

(iv) $\partial f(0) \subseteq \Delta^{n-1}$ if and only if $f$ is increasing and translation invariant with $f(1) = 1$.

This result, a consequence of the Hahn-Banach's Theorem, is a nonlinear version of Riesz's Theorem which shows that superlinear functions can be represented as lower envelopes of the linear functions $l(x) = \lambda \cdot x$ that pointwise dominate them. Together, points (i)-(iii) form a nonlinear version of the monotone Riesz's Theorem stated in Theorems 651 and 765, with stronger conditions of monotonicity – recall (6.26) – that translate into stronger properties of $\partial f(0)$. Finally, point (iv) is a nonlinear version of Proposition 755.
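Before the proof, a small illustrative sketch (with a made-up finite set $C$) shows the lower-envelope representation (33.2) at work and verifies superlinearity numerically:

```python
# Sketch: f(x) = min over lambda in C of lambda . x is superlinear
# (the 'if' direction of Theorem 1564), for a made-up finite set C.

import numpy as np

C = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])

def f(x):
    return float(np.min(C @ x))

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    assert f(x + y) >= f(x) + f(y) - 1e-9        # superadditivity
    assert abs(f(2.5 * x) - 2.5 * f(x)) <= 1e-9  # positive homogeneity
print("superlinearity verified on random samples")
```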

Proof We prove the "only if" part, as the "if" follows from Example 873. Suppose $f$ is superlinear. By the Hahn-Banach's Theorem, $\partial f(0)$ is not empty. Indeed, let $\bar{x} \in \mathbb{R}^n$ and consider the vector subspace $V_{\bar{x}} = \{\alpha \bar{x} : \alpha \in \mathbb{R}\}$ generated by $\bar{x}$ (see Example 87). Define $l_{\bar{x}} : V_{\bar{x}} \to \mathbb{R}$ by $l_{\bar{x}}(\alpha \bar{x}) = \alpha f(\bar{x})$ for all $\alpha \in \mathbb{R}$. The function $l_{\bar{x}}$ is linear on the vector subspace $V_{\bar{x}}$. Since $f$ is superlinear, recall that $f(\bar{x}) \le -f(-\bar{x})$, that is, $-f(\bar{x}) \ge f(-\bar{x})$. We next show that $l_{\bar{x}} \ge f$ on $V_{\bar{x}}$. Since $f$ is superlinear, if $\alpha \ge 0$, then $l_{\bar{x}}(\alpha \bar{x}) = \alpha f(\bar{x}) = f(\alpha \bar{x})$. If $\alpha < 0$, then $l_{\bar{x}}(\alpha \bar{x}) = \alpha f(\bar{x}) = (-\alpha)(-f(\bar{x})) \ge (-\alpha) f(-\bar{x}) = f(\alpha \bar{x})$, proving that $l_{\bar{x}} \ge f$ on $V_{\bar{x}}$. By the Hahn-Banach's Theorem, there exists $l \in (\mathbb{R}^n)'$ such that $l \ge f$ on $\mathbb{R}^n$ and $l = l_{\bar{x}}$ on $V_{\bar{x}}$.¹ By the Riesz's Theorem, there exists $\lambda \in \mathbb{R}^n$ such that $l(x) = \lambda \cdot x$ for all $x \in \mathbb{R}^n$. We have thus showed that $\lambda \in \partial f(0)$ and $f(\bar{x}) = \lambda \cdot \bar{x}$. The first fact implies that $\partial f(0)$ is not empty, hence $\min_{\lambda \in \partial f(0)} \lambda \cdot x \ge f(x)$ for all $x \in \mathbb{R}^n$, while the second fact implies that
$$f(\bar{x}) = \lambda \cdot \bar{x} = \min_{\lambda \in \partial f(0)} \lambda \cdot \bar{x} \tag{33.3}$$
Since $\bar{x}$ was arbitrarily chosen, (33.3) holds for every $\bar{x} \in \mathbb{R}^n$. Next, suppose $C, C' \subseteq \mathbb{R}^n$ are any two non-empty convex and compact sets such that
$$f(x) = \min_{\lambda \in C} \lambda \cdot x = \min_{\lambda \in C'} \lambda \cdot x \qquad \forall x \in \mathbb{R}^n$$

¹ Recall that $(\mathbb{R}^n)'$ denotes the dual space of $\mathbb{R}^n$, i.e., the collection of all linear functions on $\mathbb{R}^n$ (Section 15.1.2).

We want to show that $C = C'$. Suppose, by contradiction, that there is $\bar{\lambda} \in C$ such that $\bar{\lambda} \notin C'$. Since $C'$ is a non-empty compact and convex set in $\mathbb{R}^n$, by Proposition 1025 there is a "separating" pair $(a, b) \in \mathbb{R}^n \times \mathbb{R}$ such that $\lambda \cdot a \ge b + \varepsilon > b \ge \bar{\lambda} \cdot a$ for all $\lambda \in C'$ and for some $\varepsilon > 0$. Thus, we reach the contradiction
$$f(a) = \min_{\lambda \in C'} \lambda \cdot a > \bar{\lambda} \cdot a \ge \min_{\lambda \in C} \lambda \cdot a = f(a)$$
We conclude that $C = C'$. In turn, in view of (33.3) this implies that $\partial f(0)$ is the unique non-empty compact and convex set in $\mathbb{R}^n$ for which (33.2) holds.
(i) Let $\partial f(0) \subseteq \mathbb{R}^n_+$. If $x, y \in \mathbb{R}^n$ are such that $x \ge y$, then $\lambda \cdot x \ge \lambda \cdot y$ for all $\lambda \in \partial f(0)$. Let $\lambda_x \in \partial f(0)$ be such that $f(x) = \lambda_x \cdot x$. Then,
$$f(y) = \min_{\lambda \in \partial f(0)} \lambda \cdot y \le \lambda_x \cdot y \le \lambda_x \cdot x = f(x)$$
as desired. Conversely, assume that $f$ is increasing. Then, for each $i = 1, \dots, n$ we have
$$0 \le f(e^i) = \min_{\lambda \in \partial f(0)} \lambda \cdot e^i = \min_{\lambda \in \partial f(0)} \lambda_i$$
So, $0 \le \lambda_i$ for all $\lambda \in \partial f(0)$, which implies $\lambda \ge 0$ for all $\lambda \in \partial f(0)$.


(ii) The "only if" is similar to that of (i) and left to the reader. As to the converse, assume that $f$ is strongly increasing. Then, $f$ is increasing, yielding that $\partial f(0) \subseteq \mathbb{R}^n_+$. Moreover, we have that
$$0 < f(1) = \min_{\lambda \in \partial f(0)} \lambda \cdot 1 = \min_{\lambda \in \partial f(0)} \sum_{i=1}^n \lambda_i \le \sum_{i=1}^n \lambda_i \qquad \forall \lambda \in \partial f(0)$$
So, $0 \notin \partial f(0)$.
(iii) The proof is similar to (i) and left to the reader.
(iv) Let $\partial f(0) \subseteq \Delta^{n-1}$. By (i), $f$ is increasing. It remains to prove that it is translation invariant. Let $x \in \mathbb{R}^n$ and $k \in \mathbb{R}$. We have $\lambda \cdot k = k$ because $\lambda \in \Delta^{n-1}$. So,
$$f(x + k) = \min_{\lambda \in \partial f(0)} \lambda \cdot (x + k) = \min_{\lambda \in \partial f(0)} (\lambda \cdot x + \lambda \cdot k) = \min_{\lambda \in \partial f(0)} (\lambda \cdot x + k) = k + \min_{\lambda \in \partial f(0)} \lambda \cdot x = f(x) + k$$
as desired. Conversely, assume that $f$ is increasing and translation invariant. By point (i), $\partial f(0) \subseteq \mathbb{R}^n_+$. Moreover, since $f(k) = k$ for all $k \in \mathbb{R}$, we have
$$\sum_{i=1}^n \lambda_i = \lambda \cdot 1 \ge \min_{\lambda \in \partial f(0)} \lambda \cdot 1 = f(1) = 1 \qquad \forall \lambda \in \partial f(0)$$
and
$$-\sum_{i=1}^n \lambda_i = \lambda \cdot (-1) \ge \min_{\lambda \in \partial f(0)} \lambda \cdot (-1) = f(-1) = -1 \qquad \forall \lambda \in \partial f(0)$$
So, we have both $\sum_{i=1}^n \lambda_i \ge 1$ and $\sum_{i=1}^n \lambda_i \le 1$, which implies $\sum_{i=1}^n \lambda_i = 1$. We conclude that $\partial f(0) \subseteq \Delta^{n-1}$.

The previous theorem has the following important corollary.



Corollary 1565 A superlinear function $f : \mathbb{R}^n \to \mathbb{R}$ is linear if and only if $\partial f(0)$ is a singleton.

Proof Let $f$ be superlinear. By the Riesz's Theorem, if $f$ is linear, then there exists $\lambda \in \mathbb{R}^n$ such that $f(x) = \lambda \cdot x$ for all $x \in \mathbb{R}^n$. Note that $\lambda \in \partial f(0)$. Consider $\lambda' \in \partial f(0)$. Define $l : \mathbb{R}^n \to \mathbb{R}$ by $l(x) = \lambda' \cdot x$ for all $x \in \mathbb{R}^n$. Since $\lambda' \in \partial f(0)$ and $l \in (\mathbb{R}^n)'$, we have that $l \ge f$. By (18.5), this implies that $f = l$ and, in particular, $\lambda' = \lambda$, that is, $\partial f(0) = \{\lambda\}$. Conversely, if $\partial f(0)$ is a singleton, say $\partial f(0) = \{\lambda\}$, then (33.2) implies $f(x) = \lambda \cdot x$ for all $x \in \mathbb{R}^n$, proving linearity.

We can actually say something more about the domain of additivity of a superlinear function. To this end, consider the collection $A_f = \{x \in \mathbb{R}^n : f(x) = -f(-x)\}$ of all vectors where the gap $-f(-x) - f(x) \ge 0$ closes.

Proposition 1566 Let $f : \mathbb{R}^n \to \mathbb{R}$ be a superlinear function. A vector $y \in \mathbb{R}^n$ is such that
$$f(x + y) = f(x) + f(y) \qquad \forall x \in \mathbb{R}^n \tag{33.4}$$
if and only if $y \in A_f$. Moreover, $A_f$ is a vector subspace of $\mathbb{R}^n$.

So, $A_f$ is a vector subspace of $\mathbb{R}^n$ that describes the domain of additivity of a superlinear function $f$. In particular, $f$ is linear if and only if $A_f = \mathbb{R}^n$. The dimension of $A_f$ is thus a (rough) indication of the failure of additivity of $f$. For instance, by Lemma 908 a function $f : \mathbb{R}^n \to \mathbb{R}$, with $f(1) \neq 0$, is translation invariant if and only if $1 \in A_f$; in this case, the dimension of $A_f$ is at least 1.

Proof We begin with a key observation. If $y \in A_f$, then
$$f(y) = \lambda \cdot y \qquad \forall \lambda \in \partial f(0) \tag{33.5}$$
Indeed, for each $\lambda \in \partial f(0)$ we have $f(y) \le \lambda \cdot y = -\lambda \cdot (-y) \le -f(-y) = f(y)$, so $f(y) = \lambda \cdot y$.
"If". Suppose $y \in A_f$. By (33.2) and (33.5), we have for each $x \in \mathbb{R}^n$
$$f(x + y) = \min_{\lambda \in \partial f(0)} \lambda \cdot (x + y) = \min_{\lambda \in \partial f(0)} (\lambda \cdot x + \lambda \cdot y) = \min_{\lambda \in \partial f(0)} \lambda \cdot x + f(y) = f(x) + f(y)$$
as desired. "Only if". By taking $x = -y$ in (33.4), we have $0 = f(0) = f(-y + y) = f(-y) + f(y)$, so $f(y) = -f(-y)$, proving that $y \in A_f$.
Finally, we show that $A_f$ is a vector subspace. First, by definition of $A_f$, observe that $y \in A_f$ if and only if $-y \in A_f$. Let $y \in A_f$ and $\alpha \in \mathbb{R}$. If $\alpha \ge 0$, we have $f(\alpha y) = \alpha f(y) = -\alpha f(-y) = -f(-\alpha y)$, so $\alpha y \in A_f$. Since $-y \in A_f$ and given what we have just proved, if $\alpha < 0$, then $-\alpha > 0$ and $\alpha y = (-\alpha)(-y) \in A_f$. We conclude that $\alpha y \in A_f$ for all $\alpha \in \mathbb{R}$.
Let $x, y \in A_f$. We have that $-x, -y \in A_f$. Let $\lambda \in \partial f(0)$. By (33.2) and (33.5), we then have
$$f(x + y) = \min_{\lambda \in \partial f(0)} \lambda \cdot (x + y) = \lambda \cdot x + \lambda \cdot y = -\lambda \cdot (-x) - \lambda \cdot (-y) = -(f(-x) + f(-y)) \ge -f(-x - y) \ge f(x + y)$$
So, $-f(-x - y) = f(x + y)$, which implies $x + y \in A_f$. We conclude that $A_f$ is a vector subspace of $\mathbb{R}^n$.

33.3 Modelling bid-ask spreads

33.3.1 Setup

In Section 24.6 we studied a basic finance framework in which $n$ primary assets $L = \{y_1, \dots, y_n\} \subseteq \mathbb{R}^k$ are traded in a frictionless financial market. In contrast, we now allow for bid-ask spreads, a classic market friction in which primary assets might have different buying and selling prices. Buying one unit of asset $j$ costs $p^a_j$, the ask price, while selling one unit of the same asset $j$ yields instead $p^b_j$, the bid price, possibly with $p^a_j \neq p^b_j$. In financial markets, this is a fairly common situation. For an everyday example, readers may think of buying and selling one unit of a currency, say euros for dollars, at a bank. The price of such operations – the exchange rate – applied by the bank will be different depending on whether we buy or sell one dollar; in particular, typically the price at which we buy is greater than the one at which we sell, so $p^a_j \ge p^b_j$. Differences between ask and bid prices are called bid-ask spreads.
Here we thus assume that each primary asset $j$ has bid and ask prices $p^b_j$ and $p^a_j$, with $p^a_j \ge p^b_j \ge 0$. Set $p^b = (p^b_1, \dots, p^b_n) \in \mathbb{R}^n_+$ and $p^a = (p^a_1, \dots, p^a_n) \in \mathbb{R}^n_+$. The triple $(L, p^b, p^a)$ describes a financial market with bid-ask spreads. If $p^a_j = p^b_j$ for each $j$, we are back to the frictionless framework of Section 24.6.

33.3.2 Market values

Recall from Example 919 that the decomposition
$$x = x^+ - x^-$$
can be interpreted as a trading strategy: if $x$ denotes a portfolio, its positive and negative parts $x^+$ and $x^-$ describe, respectively, the long and short positions that it involves – i.e., how much one has to buy and sell of each primary asset to form portfolio $x$. To describe how much it costs to form a portfolio $x$, we then need the ask market value $v_a : \mathbb{R}^n \to \mathbb{R}$ defined by
$$v_a(x) = \sum_{j=1}^n x^+_j p^a_j - \sum_{j=1}^n x^-_j p^b_j \qquad \forall x \in \mathbb{R}^n \tag{33.6}$$
So, $v_a(x)$ is the cost of portfolio $x$. In particular, since each primary asset $y_j$ corresponds to the portfolio $e^j$, we have $v_a(e^j) = p^a_j$. Note that we can attain the primary assets' holdings of portfolio $x$ also by buying and selling according to any pair of positive vectors $x'$ and $x''$ such that $x = x' - x''$. In this case, the cost of $x$ would be
$$\sum_{j=1}^n x'_j p^a_j - \sum_{j=1}^n x''_j p^b_j \tag{33.7}$$
A moment's reflection shows that there are actually infinitely many possible decompositions of $x$ as a difference of two positive vectors $x'$ and $x''$. Each of them is a possible trading strategy that delivers the assets' holdings that portfolio $x$ features. Yet, as observed in Example 919, we have
$$x^+ \le x' \quad \text{and} \quad x^- \le x''$$

The positive and negative parts thus represent the minimal holdings of the primary assets needed to construct portfolio $x$. As a result, they are readily seen to be the cheapest among such trading strategies and so we can focus on them and forget about alternative, more expensive, buying and selling pairs $x'$ and $x''$.

Proposition 1567 The ask market value $v_a : \mathbb{R}^n \to \mathbb{R}$ is such that, for each $x \in \mathbb{R}^n$,
$$v_a(x) = \min\left\{\sum_{j=1}^n x'_j p^a_j - \sum_{j=1}^n x''_j p^b_j : x', x'' \ge 0 \text{ and } x = x' - x''\right\}$$

The ask market value has a noteworthy property.

Proposition 1568 The ask market value $v_a : \mathbb{R}^n \to \mathbb{R}$ is sublinear.

Proof Consider $x, \bar{x} \in \mathbb{R}^n$. Note that $x^+ + \bar{x}^+ \ge (x + \bar{x})^+ \ge 0$ and $x^- + \bar{x}^- \ge (x + \bar{x})^- \ge 0$ (why?). At the same time, we have
$$x + \bar{x} = x^+ - x^- + \bar{x}^+ - \bar{x}^- = (x^+ + \bar{x}^+) - (x^- + \bar{x}^-)$$
By Proposition 1567, we have
$$v_a(x + \bar{x}) \le \sum_{j=1}^n (x^+_j + \bar{x}^+_j) p^a_j - \sum_{j=1}^n (x^-_j + \bar{x}^-_j) p^b_j = \sum_{j=1}^n x^+_j p^a_j - \sum_{j=1}^n x^-_j p^b_j + \sum_{j=1}^n \bar{x}^+_j p^a_j - \sum_{j=1}^n \bar{x}^-_j p^b_j = v_a(x) + v_a(\bar{x})$$
proving subadditivity. Next, consider $x \in \mathbb{R}^n$ and $\alpha \ge 0$. Since $\alpha x^+ = (\alpha x)^+$ and $\alpha x^- = (\alpha x)^-$, we have that
$$v_a(\alpha x) = \sum_{j=1}^n (\alpha x)^+_j p^a_j - \sum_{j=1}^n (\alpha x)^-_j p^b_j = \alpha\left(\sum_{j=1}^n x^+_j p^a_j - \sum_{j=1}^n x^-_j p^b_j\right) = \alpha v_a(x)$$
proving positive homogeneity and the statement.

Let us take an alternative "bid" perspective: now $x \in \mathbb{R}^n$ is no longer a portfolio that we want to form, but rather a portfolio that we already hold and want to liquidate. How much are we going to cash in if we were to sell it on the market? The answer is given by the bid market value $v_b : \mathbb{R}^n \to \mathbb{R}$ defined by
$$v_b(x) = \sum_{j=1}^n x^+_j p^b_j - \sum_{j=1}^n x^-_j p^a_j$$
In particular, we have $v_b(e^j) = p^b_j$ for each primary asset $j$. There is a tight relationship between bid and ask market values, as we next show.

Proposition 1569 We have $v_b \le v_a$, with
$$v_b(x) = -v_a(-x) \qquad \forall x \in \mathbb{R}^n \tag{33.8}$$
In particular, $v_b$ is superlinear.

So, ask and bid market values are one the dual of the other. The superlinearity of $v_b$ is a first dividend of this duality.

Proof By definition of $v_a$ and $v_b$ and since $p^a_j \ge p^b_j$ for each $j$, we have that $v_b(x) \le v_a(x)$ for all $x \in \mathbb{R}^n$. If $x \in \mathbb{R}^n$, then
$$-v_a(-x) = -\left(\sum_{j=1}^n (-x)^+_j p^a_j - \sum_{j=1}^n (-x)^-_j p^b_j\right) = -\left(\sum_{j=1}^n x^-_j p^a_j - \sum_{j=1}^n x^+_j p^b_j\right) = \sum_{j=1}^n x^+_j p^b_j - \sum_{j=1}^n x^-_j p^a_j = v_b(x)$$
proving the first part of the statement. Consider now $x, x' \in \mathbb{R}^n$. Since $v_a$ is sublinear, we have that $v_a(-x - x') \le v_a(-x) + v_a(-x')$, yielding that
$$v_b(x + x') = -v_a(-x - x') \ge -v_a(-x) - v_a(-x') = v_b(x) + v_b(x')$$
proving $v_b$ is superadditive. Finally, consider $x \in \mathbb{R}^n$ and $\alpha \ge 0$. Since $v_a$ is sublinear, we have that
$$v_b(\alpha x) = -v_a(-\alpha x) = -v_a(\alpha(-x)) = -\alpha v_a(-x) = \alpha v_b(x)$$
proving positive homogeneity.
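A short sketch (with hypothetical prices) computes both market values and checks the duality (33.8) together with the inequality $v_b \le v_a$:

```python
# Sketch: ask and bid market values (33.6) under bid-ask spreads.

import numpy as np

pa = np.array([10.0, 5.0, 2.0])   # made-up ask prices, with pa >= pb >= 0
pb = np.array([9.5, 5.0, 1.8])    # made-up bid prices

def va(x):
    x = np.asarray(x, float)
    return float(np.maximum(x, 0.0) @ pa - np.maximum(-x, 0.0) @ pb)

def vb(x):
    x = np.asarray(x, float)
    return float(np.maximum(x, 0.0) @ pb - np.maximum(-x, 0.0) @ pa)

x = np.array([2.0, -1.0, 3.0])    # long assets 1 and 3, short asset 2
print(va(x), vb(x))               # cost of forming x vs. value of liquidating x
assert vb(x) <= va(x)             # bid value never exceeds ask value
assert np.isclose(vb(x), -va(-x)) # the duality (33.8)
```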

By Proposition 1566, the set of portfolios without bid-ask spreads $\{x \in \mathbb{R}^n : v_b(x) = v_a(x)\}$ is a vector subspace of $\mathbb{R}^n$ over which the bid and ask market values are linear.

33.3.3 Law of one price

The law of one price continues to be key also in the presence of bid-ask spreads. Recall that the payoff operator $R : \mathbb{R}^n \to \mathbb{R}^k$ defined by $R(x) = \sum_{j=1}^n x_j y_j$ is a linear operator that describes the contingent claim determined by each portfolio $x$ (Section 24.6). In particular, its image $W = \mathrm{Im}\,R$ is the set of replicable claims.

Definition 1570 The financial market $(L, p^b, p^a)$ satisfies the law of one price (LOP) if, for all portfolios $x, x' \in \mathbb{R}^n$, we have
$$R(x) = R(x') \implies v_a(x) = v_a(x') \tag{33.9}$$
or, equivalently,
$$R(x) = R(x') \implies v_b(x) = v_b(x') \tag{33.10}$$

Conditions (33.9) and (33.10) are equivalent because of the bid-ask duality (33.8), so the definition is well posed. Note that if $p^a_i = p^b_i$ for all $i$, then we get back to the LOP of Section 24.6 since $v_a = v$. The rationale behind this more general version of the LOP is, mutatis mutandis, the same: portfolios that induce the same contingent claims should have the same market value, whether we form or liquidate them.
In a market with bid-ask spreads, the LOP allows us to define a pair of pricing rules. Specifically, the ask pricing rule $f_a : W \to \mathbb{R}$ and the bid pricing rule $f_b : W \to \mathbb{R}$ are the functions that associate to each replicable contingent claim $w \in W$ its ask and bid prices, respectively. That is, for each $w \in W$ we have
$$f_a(w) = v_a(x) \quad \text{and} \quad f_b(w) = v_b(x)$$
where $x \in R^{-1}(w)$. Clearly, we have $f_b \le f_a$ and, by the bid-ask duality (33.8), also the pricing rules are dual:
$$f_b(w) = -f_a(-w) \qquad \forall w \in W \tag{33.11}$$
Next we show that they also inherit the shape of their corresponding market values.
Next we show that they also inherit the shape of their corresponding market values.

Theorem 1571 Suppose the nancial market L; pb ; pa satis es the LOP. Then, the ask
pricing rule fa : W ! R is sublinear and the bid pricing rule fb : W ! R is superlinear.

In sum, the pricing of contingent claims made possible by the LOP inherits the bid and
ask duality of the underlying market values.

Proof First, we verify that $f_a$ is well-defined. In other words, we are going to check that to each vector $w$ of $W$ the rule defining $f_a$ assigns one and only one value. Indeed, assume that there exist $x, x' \in \mathbb{R}^n$ such that $R(x) = w = R(x')$. The potential issue could be that $v_a(x) \neq v_a(x')$. But the LOP exactly prevents this from happening. Next, consider $w, w' \in W$. By definition, there exist $x, x' \in \mathbb{R}^n$ such that $R(x) = w$ and $R(x') = w'$. Since $R$ is linear, we also have that $R(x + x') = R(x) + R(x') = w + w'$. Since $v_a$ is sublinear, this yields that
$$f_a(w + w') = v_a(x + x') \le v_a(x) + v_a(x') = f_a(w) + f_a(w')$$
proving that $f_a$ is subadditive. Consider now $w \in W$ and $\alpha \ge 0$. By definition, there exists $x \in \mathbb{R}^n$ such that $R(x) = w$. Since $R$ is linear, we also have that $R(\alpha x) = \alpha R(x) = \alpha w$. Since $v_a$ is sublinear, this yields that
$$f_a(\alpha w) = v_a(\alpha x) = \alpha v_a(x) = \alpha f_a(w)$$
proving that $f_a$ is positively homogeneous. We conclude that $f_a$ is sublinear. By the same arguments, the function $f_b$ is also well-defined. By the bid-ask duality (33.11), $f_b$ turns out to be superlinear.

33.3.4 Pricing kernels

In Theorem 1564 we established a representation result for superlinear functions that we can now use to provide a representation result for ask and bid pricing rules that generalizes Theorem 1143. Recall that the financial market is complete when $W = \mathbb{R}^k$.

Theorem 1572 Suppose the financial market $(L, p^b, p^a)$ is complete and satisfies the LOP. Then, there exists a unique non-empty, compact, and convex set $C \subseteq \mathbb{R}^k$ such that
$$f_a(w) = \max_{\lambda \in C} \lambda \cdot w \quad \text{and} \quad f_b(w) = \min_{\lambda \in C} \lambda \cdot w$$
for all $w \in \mathbb{R}^k$. In particular, $C = \partial f_b(0)$.

Compared to the linear case of Section 24.6, bid-ask spreads result in a multiplicity of pricing kernels $\lambda$, given by the set $C$. In particular, the ask and bid prices of a claim $w$ can be expressed as $f_a(w) = \lambda_{a,w} \cdot w$ and $f_b(w) = \lambda_{b,w} \cdot w$ via pricing kernels $\lambda_{a,w}$ and $\lambda_{b,w}$ in $C$ that, respectively, attain the maximum and the minimum for the linear pricing $\lambda \cdot w$.

Proof Consider $f_b : \mathbb{R}^k \to \mathbb{R}$. By Theorem 1564 and since $f_b : \mathbb{R}^k \to \mathbb{R}$ is superlinear, there exists a unique non-empty, compact, and convex set $C \subseteq \mathbb{R}^k$ such that
$$f_b(w) = \min_{\lambda \in C} \lambda \cdot w \qquad \forall w \in \mathbb{R}^k$$
where $C = \partial f_b(0)$. Since $f_a(w) = -f_b(-w)$ for all $w \in \mathbb{R}^k$, it follows that
$$f_a(w) = -f_b(-w) = -\min_{\lambda \in C} \lambda \cdot (-w) = \max_{\lambda \in C} \lambda \cdot w \qquad \forall w \in \mathbb{R}^k$$
proving the statement also for $f_a$.

Let us continue to consider a complete market. In such a market there are no arbitrages I if, for all $x, x' \in \mathbb{R}^n$,
$$R(x') \ge R(x) \implies v_a(x') \ge v_a(x) \tag{33.12}$$
or, equivalently,² if
$$R(x') \ge R(x) \implies v_b(x') \ge v_b(x) \tag{33.13}$$
Without bid-ask spreads, the unique pricing rule is linear, so each of these two conditions reduces to (24.20) because for linear functions positivity and monotonicity are equivalent properties (Proposition 650). Here we need to make explicit the monotonicity assumption that in the linear case was implicitly assumed.
It is easy to see that the no arbitrage conditions (33.12) and (33.13) imply the LOPs (33.9) and (33.10). Under such stronger conditions we can get a stronger version of the last result in which the pricing kernels are positive, thus generalizing Proposition 1146.

Proposition 1573 Suppose the financial market $(L, p^b, p^a)$ is complete and has no arbitrages I. Then, there exists a non-empty, compact, and convex set $C \subseteq \mathbb{R}^k_+$ such that
$$f_a(w) = \max_{\lambda \in C} \lambda \cdot w \quad \text{and} \quad f_b(w) = \min_{\lambda \in C} \lambda \cdot w$$
for all $w \in \mathbb{R}^k$. If, in addition, the risk-free contingent claim 1 has no bid-ask spread, with $f_a(1) = f_b(1) = 1$, then $C \subseteq \Delta^{k-1}$.

² To see the equivalence, note that $R(x') \ge R(x) \implies R(-x) \ge R(-x') \implies v_a(-x) \ge v_a(-x') \implies -v_a(-x') \ge -v_a(-x) \implies v_b(x') \ge v_b(x)$.

Since the market is complete, by Proposition 1566 the set of contingent claims without
bid-ask spreads Afb = w 2 Rk : fb (w) = fa (w) is a vector subspace of Rk over which the
bid and ask pricing rules are linear. The second part of the result says that if the constant
(so, risk free) contingent claim 1 belongs to such subspace and if its price is normalized to
1, then the pricing kernels are actually probability measures.3

Proof Under condition (33.12), the superlinear function fb is easily seen to be increasing. By
Theorem 1564-(i), we then have C = @fb (0) Rn+ . If 1 2 Af , then f is translation invariant.
By Theorem 1564-(iv), we then have C = @fb (0) n 1 provided fa (1) = fb (1) = 1.

Finally, the absence of arbitrages II is here modelled via strong monotonicity. So, the
resulting nonlinear version of the Fundamental Theorem of Finance, in which C Rn++ ,
relies on Theorem 1564-(iii). We leave the details to readers.

3
A similar normalization holds in Proposition 1146, as the reader can check.
Chapter 34

Implicit functions

34.1 The problem

So far we have studied scalar functions $f : A \subseteq \mathbb{R} \to \mathbb{R}$ by writing them in explicit form:
$$y = f(x)$$
This form separates the independent variable $x$ from the dependent one $y$, so it permits us to determine the values of the latter from those of the former. The same function can be rewritten in implicit form through an equation that keeps all the variables on the same side of the equality sign:
$$g(x, f(x)) = 0$$
where $g$ is a function of two variables defined by
$$g(x, y) = f(x) - y$$

Example 1574 (i) The function $f(x) = x^2 + x - 3$ can be written in implicit form as $g(x, f(x)) = 0$ with $g(x, y) = x^2 + x - 3 - y$. (ii) The function $f(x) = 1 + \lg x$ can be written in implicit form as $g(x, f(x)) = 0$ with $g(x, y) = 1 + \lg x - y$. N
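For instance, a two-line check (purely illustrative) confirms that $g$ vanishes along the graph of $f$ in case (i):

```python
# Sketch: the identity g(x, f(x)) = 0 for f(x) = x^2 + x - 3.

f = lambda x: x**2 + x - 3
g = lambda x, y: x**2 + x - 3 - y

print(all(g(x, f(x)) == 0 for x in (-2.0, -0.5, 0.0, 1.0, 3.0)))  # True
```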

Note that
$$g^{-1}(0) \cap (A \times \mathrm{Im}\,f) = \mathrm{Gr}\,f$$
The graph of the function $f$ thus coincides with the level curve $g^{-1}(0)$ of the function $g$ of two variables.¹

Example 1575 Consider the function $f : [-1, 1] \to \mathbb{R}$ defined by $f(x) = 1 - x^2$, whose

¹ The rectangle $A \times \mathrm{Im}\,f$ has as its factors – its edges, geometrically – the domain and image of $f$. Clearly, $\mathrm{Gr}\,f \subseteq A \times \mathrm{Im}\,f$. For example, for the function $f(x) = \sqrt{x}$ this rectangle is the orthant $\mathbb{R}^2_+$ of the plane, while for the function $f(x) = 2\sqrt{x - x^2}$ it is the unit square $[0, 1] \times [0, 1]$ of the plane.


graph is the parabola inscribed in the rectangle A Im f = [ 1; 1] [0; 1].

y
2.5

1.5

0.5

0
-1 O 1 x
-0.5

-1
-2 -1 0 1 2

We can write f in implicit form as g x; 1 x2 = 0 with g : R2 ! R de ned by g (x; y) =


1 x2 y. Since g 1 (0) = (x; y) 2 R2 : 1 x2 = y , we have
1
g (0) \ (A Im f ) = (x; y) 2 [ 1; 1] [0; 1] : 1 x2 = y = Gr f

The implicit rewriting of a scalar function $f$ whose explicit form is known is nothing more than a curiosity because the explicit form contains all the relevant information on $f$, in particular on the dependence between the independent variable $x$ and the dependent one $y$. Unfortunately, applications often feature important scalar functions that are not given in a "ready to use" explicit form, but only in implicit form through equations $g(x, y) = 0$. For this reason, it is important to consider the inverse problem: does an equation of the type $g(x, y) = 0$ implicitly define a scalar function $f$? In other words, does a function $f$ exist such that $g(x, f(x)) = 0$? If so, which properties does it have? For instance, is it unique? Is it convex or concave? Is it differentiable?
This chapter will address these motivating questions by showing that, under suitable regularity conditions, this function $f$ exists and is unique (locally or globally, as it will become clear) and that it may enjoy remarkable properties. As usual, we will emphasize a global viewpoint, the one most relevant for applications.

An important preliminary observation: there is a close connection between implicit functions and level curves that permits us to express in functional terms the properties of the level curves, a most useful way to describe such properties which will be seen later in the chapter (Section 34.3.2). Because of its importance, in the next lemma we make this connection rigorous. Note that in the lemma the sets $A$ and $B$ play the roles of domain and codomain of the implicit functions considered. In other words, the lemma considers functions $f : A \to B$ that belong to a posited space $B^A$ (cf. Section 6.3.2). It is a purely set-theoretic result that considers generic sets $A$, $B$, $C$ and $D$.

Proposition 1576 Let $g : C \to D$, with $A \times B \subseteq C$, and $k \in D$. For a function $f : A \to B$ the following two properties are equivalent:

(i) $f$ is the unique function in $B^A$ with the property
$$g(x, f(x)) = k \qquad \forall x \in A \tag{34.1}$$

(ii) $f$ satisfies the equality
$$g^{-1}(k) \cap (A \times B) = \mathrm{Gr}\,f \tag{34.2}$$

Condition (34.2) amounts to saying that
$$g(x, y) = k \iff y = f(x) \qquad \forall (x, y) \in A \times B$$
that is, the level curve $g^{-1}(k)$ of the function $g$ is described on the rectangle $A \times B$ by the function of a single variable $f$. Thus, $f$ provides a "functional description" of this level curve that specifies the relationship existing between the arguments $x$ and $y$ of $g$ when they belong to the level curve $g^{-1}(k)$. By the lemma, for a function $f$ to be implicit – so to satisfy condition (34.1) – thus amounts to providing a functional description of the level curve.

Proof (i) implies (ii). We first show that $\mathrm{Gr}\,f \subseteq g^{-1}(k) \cap (A \times B)$. Let $(x, y) \in \mathrm{Gr}\,f$. By definition, $(x, y) \in A \times B$ and $y = f(x)$, thus $g(x, y) = g(x, f(x)) = k$. This implies $(x, y) \in g^{-1}(k) \cap (A \times B)$, so $\mathrm{Gr}\,f \subseteq g^{-1}(k) \cap (A \times B)$. As to the converse inclusion, let $(\bar{x}, \bar{y}) \in g^{-1}(k) \cap (A \times B)$. We want to show that $\bar{y} = f(\bar{x})$. Suppose not, i.e., $\bar{y} \neq f(\bar{x})$. Define $\tilde{f} : A \to \mathbb{R}$ by $\tilde{f}(x) = f(x)$ if $x \neq \bar{x}$ and $\tilde{f}(\bar{x}) = \bar{y}$. Since $g(\bar{x}, \bar{y}) = k$, we have $g(x, \tilde{f}(x)) = k$ for every $x \in A$. Since $(\bar{x}, \bar{y}) \in A \times B$, we have $\tilde{f} \in B^A$. Being by construction $\tilde{f} \neq f$, this contradicts the uniqueness of $f$. We conclude that (34.2) holds, as desired.
(ii) implies (i). Let $f \in B^A$ be such that (34.2) holds. By definition, $(x, f(x)) \in \mathrm{Gr}\,f$ for each $x \in A$. By (34.2), we have $(x, f(x)) \in g^{-1}(k)$, so $g(x, f(x)) = k$ for each $x \in A$. It remains to prove the uniqueness of $f$. Let $h \in B^A$ satisfy (34.1). We have $\mathrm{Gr}\,h \subseteq g^{-1}(k) \cap (A \times B)$ since we can argue as in the first inclusion of the first part of the proof. By (34.2), this inclusion then yields $\mathrm{Gr}\,h \subseteq \mathrm{Gr}\,f$. In turn, this implies $h = f$. Indeed, if we consider $x \in A$, then $(x, h(x)) \in \mathrm{Gr}\,h \subseteq \mathrm{Gr}\,f$. Since $(x, h(x)) \in \mathrm{Gr}\,f$, then $(x, h(x)) = (x', f(x'))$ for some $x' \in A$. This implies $x = x'$ and $h(x) = f(x')$, and so $h(x) = f(x)$. Since $x$ was arbitrarily chosen, we conclude that $f = h$, as desired.

N.B. If $C = A \times B$, then (34.2) simplifies to
$$g^{-1}(k) = \mathrm{Gr}\,f$$
Indeed, in this case $g^{-1}(k) = \{(x, y) \in A \times B : g(x, y) = k\} \subseteq A \times B$ and so $g^{-1}(k) \cap (A \times B) = g^{-1}(k)$. O

34.2 Implicit functions

To address the motivating questions that we posed we need some more structure. For this reason, throughout the section we assume that $A \subseteq \mathbb{R}^n$, $C \subseteq \mathbb{R}^{n+1}$ and $B$ and $D$ are intervals in $\mathbb{R}$. By taking advantage of this added structure, the next result provides a simple answer to a key existence question.

Proposition 1577 Let $g : C \to D$, with $A \times B \subseteq C$, and $k \in D$. If $g$ is continuous in $y$ and
$$\inf_{y \in B} g(x, y) < k < \sup_{y \in B} g(x, y) \qquad \forall x \in A \tag{34.3}$$
then there exists $f : A \to B$ such that $g(x, f(x)) = k$ for all $x \in A$.

In this case we say that equation $g(x, y) = k$ implicitly defines $f$ on the rectangle $A \times B$.

Proof For simplicity, let $k = 0$. Let $x_0 \in A$. By condition (34.3), there exist scalars $y', y'' \in B$, say with $y' \le y''$, such that $g(x_0, y') \le 0 \le g(x_0, y'')$. Since $g$ is continuous in $y$, by Bolzano's Theorem there exists $y_0 \in [y', y'']$ such that $g(x_0, y_0) = 0$. Since $x_0$ was arbitrarily chosen, this proves the existence of the implicit function $f$.

Next comes the uniqueness of the implicit function. Recall that a function is strictly monotone if it is either strictly increasing or strictly decreasing.

Proposition 1578 Let $g : C \to D$, with $A \times B \subseteq C$, and $k \in D$. If $g$ is strictly monotone in $y$, then there exists at most one function $f : A \to B$ in $B^A$ such that $g(x, f(x)) = k$ for all $x \in A$.

Proof Let $f, h : A \to B$ be such that $g(x, f(x)) = g(x, h(x)) = k$ for all $x \in A$. We want to show that $h = f$. Suppose, by contradiction, that $h \neq f$. So, there is at least some $\bar{x} \in A$ with $h(\bar{x}) \neq f(\bar{x})$, say $h(\bar{x}) > f(\bar{x})$. The function $g$ is strictly monotone in $y$, say increasing. Thus, $k = g(\bar{x}, h(\bar{x})) > g(\bar{x}, f(\bar{x})) = k$, a contradiction. We conclude that $h = f$.

The last two propositions show that, if $g$ is continuous and strictly monotone in $y$ and satisfies condition (34.3), then equation $g(x, y) = k$ implicitly defines a unique function $f$ on the rectangle $A \times B$.
When $g$ is partially derivable in $y$, a convenient differential condition that ensures the strict monotonicity of $g$ in $y$ is that either
$$\frac{\partial g}{\partial y}(x, y) > 0 \qquad \forall (x, y) \in A \times B$$
or that the opposite inequality holds for all $(x, y) \in A \times B$. This type of differential monotonicity condition will play a key role in what follows, in particular in the local and global versions of the Implicit Function Theorem.

Example 1579 Define $g : \mathbb{R}^2 \to \mathbb{R}$ by $g(x, y) = x^2 - 2y - e^y$. Equation
$$g(x, y) = 0$$
defines on the entire plane a unique implicit function $f : \mathbb{R} \to \mathbb{R}$. Indeed, $g$ is differentiable with
$$\frac{\partial g(x, y)}{\partial y} = -2 - e^y < 0 \qquad \forall y \in \mathbb{R}$$
Therefore, $g$ is strictly decreasing in $y$. Moreover, condition (34.3) holds because
$$\lim_{y \to -\infty} g(x, y) = +\infty \quad \text{and} \quad \lim_{y \to +\infty} g(x, y) = -\infty \qquad \forall x \in \mathbb{R}$$
By Propositions 1577 and 1578, there is a unique implicit function $f : \mathbb{R} \to \mathbb{R}$ such that
$$g(x, f(x)) = x^2 - 2f(x) - e^{f(x)} = 0 \qquad \forall x \in \mathbb{R}$$
Note that we are not able to write $y$ as an explicit function of $x$, that is, we are not able to provide the explicit form of $f$. N
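Although $f$ has no closed form, the existence argument (Bolzano's Theorem applied to the strictly decreasing $g(x, \cdot)$) translates into a bisection scheme; the following sketch is illustrative only.

```python
# Sketch: evaluating the implicit function of Example 1579 by bisection.
# g(x, y) = x^2 - 2y - e^y is strictly decreasing in y, so for each x
# there is exactly one root y = f(x).

import math

def g(x, y):
    return x * x - 2.0 * y - math.exp(y)

def f(x, lo=-50.0, hi=50.0):
    # on this bracket g(x, lo) > 0 > g(x, hi); halve it until convergence
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(x, mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

for x in (0.0, 1.0, 2.0):
    y = f(x)
    print(f"f({x}) = {y:.6f},  residual g(x, f(x)) = {g(x, y):.1e}")
```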

The following example exhibits a discontinuous $g$ which is not strictly monotone in $y$. Nevertheless, we have a unique implicit function, thus showing that the conditions of the last two propositions are only sufficient.

Example 1580 Let $g : (\mathbb{R} \setminus \{0\}) \times \mathbb{R} \to \mathbb{R}$ be defined for each $x \neq 0$ as
$$g(x, y) = \begin{cases} \dfrac{y}{x} - 1 & \text{if } x, y \in \mathbb{Q} \\[1ex] -\dfrac{y}{x} - 1 & \text{otherwise} \end{cases}$$
On $(\mathbb{R} \setminus \{0\}) \times \mathbb{R}$ there is a unique implicit function $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} x & \text{if } 0 \neq x \in \mathbb{Q} \\ -x & \text{otherwise} \end{cases}$$
as the reader can check. N

Having discussed existence and uniqueness, we can now turn to the properties that the implicit function $f$ inherits from $g$. In short, the continuity of $g$ is passed on to the implicit function, as well as its monotonicity and convexity, although reversed.

Proposition 1581 Let $g : C \to D$, with $A \times B \subseteq C$, and $k \in D$. If $g$ is strictly increasing in $y$ and $f : A \to B$ is such that $g(x, f(x)) = k$ for all $x \in A$, then:

(i) $f$ is strictly decreasing (increasing) if $g$ is strictly increasing (decreasing) in $x$;

(ii) $f$ is (strictly) convex if $g$ is (strictly) quasi-concave, provided the sets $A$, $B$ and $C$ are convex;

(iii) $f$ is (strictly) concave if $g$ is (strictly) quasi-convex, provided the sets $A$, $B$ and $C$ are convex;

(iv) $f$ is continuous if $g$ is continuous, provided the sets $A$ and $B$ are open.


We leave to the reader the dual version of this result in which the strict monotonicity of
g in y changes from increasing to decreasing.

Proof (i) Suppose first that $g$ is strictly increasing in $x$. Let us show that $f$ is strictly decreasing. Take $x, x' \in A$ with $x > x'$. Suppose, by contradiction, that $f(x) \ge f(x')$. We have
$$k = g(x, f(x)) > g(x', f(x)) \ge g(x', f(x')) = k$$
This contradiction shows that $f(x) < f(x')$. We conclude that $f$ is strictly decreasing.
Now, suppose that $g$ is strictly decreasing in $x$. To show that $f$ is strictly increasing, take again $x, x' \in A$ with $x > x'$ and suppose, by contradiction, that $f(x) \le f(x')$. We now have
$$k = g(x, f(x)) \le g(x, f(x')) < g(x', f(x')) = k$$
This contradiction shows that $f(x) > f(x')$. Thus, $f$ is strictly increasing.
(ii) Let $g$ be quasi-concave. Let us show that $f$ is convex. Let $x, x' \in A$ and $\alpha \in [0, 1]$. From $g(x, f(x)) = g(x', f(x')) = k$ it follows that
$$g(\alpha x + (1 - \alpha)x', \alpha f(x) + (1 - \alpha)f(x')) \ge g(x, f(x)) = g(\alpha x + (1 - \alpha)x', f(\alpha x + (1 - \alpha)x'))$$
Hence, $\alpha f(x) + (1 - \alpha)f(x') \ge f(\alpha x + (1 - \alpha)x')$ as $g$ is strictly increasing in $y$. A similar argument can be used to show the strict version.
(iii) Similar, mutatis mutandis, to point (ii).
(iv) Consider a point $\bar{x}$ and the corresponding value $\bar{y} = f(\bar{x})$. Since $A \times B$ is open, the point $(\bar{x}, \bar{y})$ is interior. Hence, there exists $\varepsilon > 0$ such that $B_\varepsilon(\bar{x}, \bar{y}) \subseteq A \times B$. Let $m \ge 1$ be large enough so that $0 < 1/m < \varepsilon$. Since $g(\bar{x}, \bar{y}) = k$ and $g$ is strictly increasing in $y$, we have $g(\bar{x}, \bar{y} - 1/m) < k < g(\bar{x}, \bar{y} + 1/m)$. By the continuity of $g$, the functions $g(\cdot, \bar{y} - 1/m)$ and $g(\cdot, \bar{y} + 1/m)$ are both continuous in $x$. So, there exists (cf. the Theorem on the permanence of sign) a small enough neighborhood $B_{\tilde{\varepsilon}}(\bar{x}) \subseteq A$ such that
$$g\left(x, \bar{y} - \frac{1}{m}\right) < k < g\left(x, \bar{y} + \frac{1}{m}\right) \qquad \forall x \in B_{\tilde{\varepsilon}}(\bar{x})$$
Since $g$ is strictly increasing in $y$, we then have
$$f(\bar{x}) - \frac{1}{m} < f(x) < f(\bar{x}) + \frac{1}{m} \qquad \forall x \in B_{\tilde{\varepsilon}}(\bar{x}) \tag{34.4}$$
In turn, this guarantees that $f$ is continuous at $\bar{x}$. In fact, let $x_n \to \bar{x}$. Fix any $m \ge 1$ large enough so that $0 < 1/m < \varepsilon$. By what we just proved, there exists $\tilde{\varepsilon} > 0$ such that (34.4) holds. By the definition of convergence, there is $n_{\tilde{\varepsilon}} \ge 1$ such that $x_n \in B_{\tilde{\varepsilon}}(\bar{x})$ for every $n \ge n_{\tilde{\varepsilon}}$, so that
$$f(\bar{x}) - \frac{1}{m} < f(x_n) < f(\bar{x}) + \frac{1}{m} \qquad \forall n \ge n_{\tilde{\varepsilon}}$$
Thus
$$f(\bar{x}) - \frac{1}{m} \le \liminf f(x_n) \le \limsup f(x_n) \le f(\bar{x}) + \frac{1}{m}$$

Since this holds for all $m$ large enough, we have
$$f(\bar{x}) = \lim_{m \to \infty}\left(f(\bar{x}) - \frac{1}{m}\right) \le \liminf f(x_n) \le \limsup f(x_n) \le \lim_{m \to \infty}\left(f(\bar{x}) + \frac{1}{m}\right) = f(\bar{x})$$
We conclude that $\lim f(x_n) = f(\bar{x})$. Since $\bar{x}$ was arbitrarily chosen, the function $f$ is continuous.

We turn, for the case $n = 1$, to the all-important issue of the differentiability of the implicit function.

Proposition 1582 Let $g : C \subseteq \mathbb{R}^2 \to D$, with $A \times B \subseteq C$, and $k \in D$. Suppose that:

(i) the sets $A$ and $B$ are open;

(ii) $g$ is continuously differentiable on $A \times B$, with either $\partial g(x, y)/\partial y > 0$ for all $(x, y) \in A \times B$ or $\partial g(x, y)/\partial y < 0$ for all $(x, y) \in A \times B$.

If $f : A \to B$ is such that $g(x, f(x)) = k$ for all $x \in A$, then it is continuously differentiable, with
$$f'(x) = -\frac{\dfrac{\partial g}{\partial x}(x, y)}{\dfrac{\partial g}{\partial y}(x, y)} \tag{34.5}$$
for all $(x, y) \in g^{-1}(k) \cap (A \times B)$.

In the next section we will discuss at length the differential formula (34.5), which plays a fundamental role in applications.

Proof Since either $\partial g(x, y)/\partial y > 0$ for all $(x, y) \in A \times B$ or the opposite inequality holds, $g$ is strictly monotone in $y$. By Proposition 1578, $f$ is then the unique function in $B^A$ such that $g(x, f(x)) = k$ for all $x \in A$. Let $x \in A$ and $y = f(x)$. Set $h_2 = f(x + h_1) - f(x)$. Since $g$ is continuously differentiable, for every $h_1, h_2 \neq 0$ there exists $0 < \vartheta < 1$ such that²
$$g(x + h_1, y + h_2) = g(x, y) + \frac{\partial g(x + \vartheta h_1, y + \vartheta h_2)}{\partial x} h_1 + \frac{\partial g(x + \vartheta h_1, y + \vartheta h_2)}{\partial y} h_2$$
If $h_1$ is small enough so that $x + h_1 \in A$ and $y + h_2 \in B$, we then have
$$0 = \frac{\partial g(x + \vartheta h_1, y + \vartheta h_2)}{\partial x} h_1 + \frac{\partial g(x + \vartheta h_1, y + \vartheta h_2)}{\partial y} h_2 \tag{34.6}$$
By Proposition 1581-(iv), the implicit function $f$ is continuous. Hence, if $h_1 \to 0$ then $h_2 \to 0$. So, by (34.6) we have
$$f'(x) = \lim_{h_1 \to 0} \frac{h_2}{h_1} = \lim_{h_1 \to 0} -\frac{\partial g(x + \vartheta h_1, y + \vartheta h_2)/\partial x}{\partial g(x + \vartheta h_1, y + \vartheta h_2)/\partial y} = -\frac{\partial g(x, y)/\partial x}{\partial g(x, y)/\partial y} \tag{34.7}$$
because of the continuity of $\partial g/\partial x$ and of $\partial g/\partial y$. In turn, this shows that the continuity of the derivative function $f'$ is a direct consequence of the continuity of $\partial g/\partial x$ and of $\partial g/\partial y$.

² It is a cruder version of approximation (29.35).

because of the continuity of @g=@x and of @g=@y. In turn, this shows that the continuity of
the derivative function f 0 is a direct consequence of the continuity of @g=@x and of @g=@y.
From (34.7) it follows that

@g
(x; f (x))
f 0 (x) = @x 8x 2 A
@g
(x; f (x))
@y

However, the uniqueness of f ensures that g 1 (k) \ (A B) = Gr f (Proposition 1576). In


turn, this implies formula (34.5) because (x; y) 2 g 1 (k) \ (A B) if and only if y = f (x).

Example 1583 In the last example we learned that the equation
$$g(x, y) = x^2 - 2y - e^y = 0$$
defines on the plane a unique implicit function $f : \mathbb{R} \to \mathbb{R}$. The function $g$ is continuously differentiable, with
$$\frac{\partial g}{\partial y}(x, y) = -2 - e^y < 0 \qquad \forall (x, y) \in \mathbb{R}^2$$
By Proposition 1582, $f$ is then continuously differentiable, with
$$f'(x) = -\frac{\partial g(x, y)/\partial x}{\partial g(x, y)/\partial y} = \frac{2x}{2 + e^y} \qquad \forall (x, y) \in g^{-1}(0)$$
Though we were not able to provide the explicit form of $f$, we have formula (34.5) for its derivative. For instance, at each $(x_0, y_0) \in g^{-1}(0)$ we can then write the first-order approximation
$$f(x) = f(x_0) + f'(x_0)(x - x_0) + o(x - x_0) = y_0 + \frac{2x_0}{2 + e^{y_0}}(x - x_0) + o(x - x_0)$$
that gives us some precious information on $f$. N
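As a sanity check (illustrative code), one can compare formula (34.5) with a finite-difference derivative of the bisection-computed $f$ from the earlier sketch:

```python
# Sketch: checking f'(x) = 2x / (2 + e^y) of Example 1583 numerically.

import math

def g(x, y):
    return x * x - 2.0 * y - math.exp(y)

def f(x, lo=-50.0, hi=50.0):      # bisection, as in the earlier sketch
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(x, mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

x0 = 1.0
y0 = f(x0)
h = 1e-6
finite_diff = (f(x0 + h) - f(x0 - h)) / (2.0 * h)  # central difference
implicit = 2.0 * x0 / (2.0 + math.exp(y0))         # formula (34.5)
print(finite_diff, implicit)                        # the two values agree
```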

In the next example we use the results of this section without explicitly referring to them, a task that we leave to the reader.

Example 1584 Let us study the dependence of $y$ on $x$ determined by the equation
$$e^{\frac{y}{3}} - e^{-\frac{2}{3}y} + xy - x\ln\frac{4}{3} = 0$$
where both scalars $x$ and $y$ are strictly positive. To this end, define $g : (0, \infty) \times (0, b) \to \mathbb{R}$, with $b \in (0, \infty]$, by
$$g(x, y) = e^{\frac{y}{3}} - e^{-\frac{2}{3}y} + xy - x\log\frac{4}{3}$$
The function $g$ is strictly increasing in $y$. So, equation $g(x, y) = 0$ implicitly defines a function $f : (0, \infty) \to (0, b)$ that describes the dependence of $y$ on $x$. Observe that, along the solutions of the equation,
$$\log\frac{4}{3} - y = \frac{e^{\frac{y}{3}} - e^{-\frac{2}{3}y}}{x} > 0 \tag{34.8}$$
for all $(x, y) \in (0, \infty) \times (0, b)$. Hence, we can take $b = \log(4/3)$. The function $g$ is then strictly decreasing in $x$. In turn, this implies that the function $f$ is strictly increasing. Since $g$ is continuous, $f$ is also continuous. Moreover, by (34.8) we have
$$f(x) < \log\frac{4}{3} \quad \forall x > 0 \qquad \text{and} \qquad \lim_{x \to +\infty} f(x) = \log\frac{4}{3}$$
We also have
$$f'(x) = -\frac{\dfrac{\partial g}{\partial x}(x, y)}{\dfrac{\partial g}{\partial y}(x, y)} = \frac{\log\dfrac{4}{3} - y}{\dfrac{1}{3}e^{\frac{y}{3}} + \dfrac{2}{3}e^{-\frac{2}{3}y} + x}$$
Thus, $f'(x) > 0$ for all $(x, y) \in (0, \infty) \times (0, b)$ on the level curve. This confirms that $f$ is strictly increasing. More interestingly, $f'$ is strictly decreasing, which proves that $f$ is strictly concave. This completes the study of the function $f$, so of the dependence of $y$ on $x$. N

34.3 A local perspective

34.3.1 Implicit Function Theorem

We now address the motivating questions from a local perspective, which is particularly well suited for differential calculus, as the next famous result shows.³ It is the most important result in the study of implicit functions and is widely used in applications. In particular, we focus on a point $(x_0, y_0)$ that solves equation $g(x, y) = 0$, i.e., such that $g(x_0, y_0) = 0$ or, equivalently, such that $(x_0, y_0) \in g^{-1}(0)$.

Theorem 1585 (Implicit Function Theorem) Let $g : U \to \mathbb{R}$ be defined (at least) on an open set $U$ of $\mathbb{R}^2$ and let $g(x_0, y_0) = 0$. If $g$ is continuously differentiable on a neighborhood of $(x_0, y_0)$, and
$$\frac{\partial g}{\partial y}(x_0, y_0) \neq 0 \tag{34.9}$$
then there exist neighborhoods $B(x_0)$ and $V(y_0)$ and a unique function $f : B(x_0) \to V(y_0)$ such that
$$g(x, f(x)) = 0 \qquad \forall x \in B(x_0) \tag{34.10}$$
The function $f$ is continuously differentiable on $B(x_0)$, with
$$f'(x) = -\frac{\dfrac{\partial g}{\partial x}(x, y)}{\dfrac{\partial g}{\partial y}(x, y)} \tag{34.11}$$
for every $(x, y) \in g^{-1}(0) \cap (B(x_0) \times V(y_0))$.

³ This theorem first appeared in lecture notes that Ulisse Dini prepared in the 1870s. For this reason, sometimes it is named after him.

Along with the continuous differentiability of $g$, the easily checked differential condition (34.9) thus ensures that locally, near the point $(x_0, y_0)$, there exists a unique and continuously differentiable implicit function $f : B(x_0) \to V(y_0)$. It is a remarkable achievement: the hypotheses of the global results of the previous section (Propositions 1577, 1578 and 1582) are clumsier. Yet, the global viewpoint – the most relevant for applications – will be partly vindicated by the Global Implicit Function Theorem of the next chapter and, more important here, the proof of the Implicit Function Theorem will show how this theorem actually builds on the previous global results.
To emphasize the local perspective of the Implicit Function Theorem, here we say that equation $g(x, y) = 0$ implicitly defines a unique $f$ at the point $(x_0, y_0) \in g^{-1}(0)$.

Proof Suppose, without loss of generality, that (34.9) takes the positive form
$$\frac{\partial g}{\partial y}(x_0, y_0) > 0 \tag{34.12}$$
Since $g$ is continuously differentiable, by the Theorem on the permanence of sign there exists a neighborhood $\tilde{B}(x_0, y_0) \subseteq U$ for which
$$\frac{\partial g}{\partial y}(x, y) > 0 \qquad \forall (x, y) \in \tilde{B}(x_0, y_0) \tag{34.13}$$
Let $\varepsilon > 0$ be small enough so that
$$[x_0 - \varepsilon, x_0 + \varepsilon] \times [y_0 - \varepsilon, y_0 + \varepsilon] \subseteq \tilde{B}(x_0, y_0)$$
Since $\partial g(x, y)/\partial y > 0$ for every $(x, y) \in [x_0 - \varepsilon, x_0 + \varepsilon] \times [y_0 - \varepsilon, y_0 + \varepsilon]$, the function $g(x, \cdot)$ is strictly increasing in $y$ for every $x \in [x_0 - \varepsilon, x_0 + \varepsilon]$. So, $g(x_0, y_0 - \varepsilon) < 0 = g(x_0, y_0) < g(x_0, y_0 + \varepsilon)$. The functions $g(\cdot, y_0 - \varepsilon)$ and $g(\cdot, y_0 + \varepsilon)$ are both continuous in $x$, so again by the Theorem on the permanence of sign there exists a small enough neighborhood $B(x_0) \subseteq [x_0 - \varepsilon, x_0 + \varepsilon]$ so that
$$g(x, y_0 - \varepsilon) < 0 < g(x, y_0 + \varepsilon) \qquad \forall x \in B(x_0) \tag{34.14}$$
By Bolzano's Theorem, for each $x \in B(x_0)$ there exists $y_0 - \varepsilon < y < y_0 + \varepsilon$ such that $g(x, y) = 0$. By the strict monotonicity of $g(x, \cdot)$ on $[y_0 - \varepsilon, y_0 + \varepsilon]$, such $y$ is unique. By setting $V(y_0) = (y_0 - \varepsilon, y_0 + \varepsilon)$, we have thus defined a unique implicit function $f : B(x_0) \to V(y_0)$ on the rectangle $B(x_0) \times V(y_0)$ such that (34.10) holds.⁴
Having established the existence of a unique implicit function, its differential properties now follow from Proposition 1582.

Since the function $f : B(x_0) \to V(y_0)$ defined implicitly by the equation $g(x, y) = 0$ at $(x_0, y_0)$ is unique, in view of Proposition 1576 the relation (34.10) is equivalent to
$$g(x, y) = 0 \iff y = f(x) \qquad \forall (x, y) \in B(x_0) \times V(y_0) \tag{34.15}$$

⁴ Though we gave a simple direct proof, after having established (34.14) we could have just invoked Propositions 1577 and 1578 to conclude that there exists a unique $f$. Indeed, (34.14) implies (34.3), so the existence of $f$ is a consequence of Proposition 1577. In a similar vein, its uniqueness follows from Proposition 1578 because $g$ is strictly increasing in $y$.

that is, to
$$g^{-1}(0) \cap (B(x_0) \times V(y_0)) = \mathrm{Gr}\,f \tag{34.16}$$
Thus, the level curve $g^{-1}(0)$ – so, the solutions of the equation $g(x, y) = 0$ – can be represented locally by the graph of the implicit function. In the final analysis, this is the reason why the theorem is so important in applications (as we will see shortly in Section 34.3.2).
Inspection of the proof of the Implicit Function Theorem shows that on the rectangle $B(x_0) \times V(y_0)$ we have either $\partial g(x, y)/\partial y > 0$ or $\partial g(x, y)/\partial y < 0$. Assume the former, so that $g$ is strictly increasing in $y$. By Proposition 1581, we then have that:

(i) $f$ is strictly decreasing if $\partial g(x, y)/\partial x > 0$ on $B(x_0) \times V(y_0)$;

(ii) $f$ is (strictly) convex if $g$ is (strictly) quasi-concave, provided the set $U$ is convex;

(iii) $f$ is (strictly) concave if $g$ is (strictly) quasi-convex, provided the set $U$ is convex.

Thus, some basic properties of the implicit function provided by the Implicit Function Theorem can be easily established. Note that formula (34.11) permits the computation of the first derivative of the implicit function even without knowing the function in explicit form. Since the first derivative is often what is really needed for such a function (because, for example, we are interested in solving a first-order condition), this is a most useful feature of the Implicit Function Theorem.

At the point $(x_0, y_0)$, formula (34.11) takes the form
$$f'(x_0) = -\frac{\dfrac{\partial g}{\partial x}(x_0, y_0)}{\dfrac{\partial g}{\partial y}(x_0, y_0)}$$
Note that the use of formula (34.11) is based on the clause "$(x, y) \in g^{-1}(0) \cap (B(x_0) \times V(y_0))$" that requires fixing both variables $x$ and $y$. This is the price to pay in implicit derivability. In contrast, in explicit derivability it is sufficient to fix the variable $x$ to compute $f'(x)$.
On the other hand, we can rewrite (34.11) as
$$f'(x) = -\frac{\dfrac{\partial g}{\partial x}(x, f(x))}{\dfrac{\partial g}{\partial y}(x, f(x))} \tag{34.17}$$
for each $x \in B(x_0)$, thus emphasizing the role played by the implicit function. Formulations (34.11) and (34.17) are both useful, for different reasons; it is better to keep both of them in mind. As we remarked, formulation (34.11) allows one to compute the first derivative of $f$ even without knowing $f$ itself, thereby yielding a useful first-order local approximation of $f$. For this reason in the examples we will always use (34.11), because the closed form of $f$ will not be available.

We can provide a heuristic derivation of formula (34.11) through the total differential
$$dg = \frac{\partial g}{\partial x} dx + \frac{\partial g}{\partial y} dy$$
of the function $g$. We have $dg = 0$ for variations $(dx, dy)$ that keep us along the level curve $g^{-1}(0)$. Therefore,
$$\frac{\partial g}{\partial x} dx = -\frac{\partial g}{\partial y} dy$$
which "yields" (the power of heuristics!):
$$\frac{dy}{dx} = -\frac{\partial g/\partial x}{\partial g/\partial y}$$
It is a rather rough (and incorrect) argument, yet useful to remember formula (34.11).

Example 1586 In the trivial case of a linear function $g(x, y) = ax + by - c$, equation $g(x, y) = 0$ becomes $ax + by - c = 0$, and yields
$$y = f(x) = -\frac{a}{b}x + \frac{c}{b}$$
provided $b \neq 0$. Even in this very simple case, the existence of an implicit function requires the condition $b = \partial g(x, y)/\partial y \neq 0$. N

Example 1587 Let $g : \mathbb{R}^2 \to \mathbb{R}$ be given by $g(x, y) = x^2 - xy^3 + y^5 - 16$. Let us determine whether equation $g(x, y) = 0$ defines implicitly a function at the point $(x_0, y_0) = (4, -2) \in g^{-1}(0)$. The function $g$ is continuously differentiable on $\mathbb{R}^2$, with $\partial g(x, y)/\partial y = -3xy^2 + 5y^4$, and therefore
$$\frac{\partial g}{\partial y}(4, -2) = 32 \neq 0$$
By the Implicit Function Theorem, there exists a unique continuously differentiable $f : B(4) \to V(-2)$ such that
$$x^2 - xf^3(x) + f^5(x) = 16 \qquad \forall x \in B(4)$$
Moreover, since $\partial g(x, y)/\partial x = 2x - y^3$, we have
$$f'(4) = -\frac{\dfrac{\partial g}{\partial x}(4, -2)}{\dfrac{\partial g}{\partial y}(4, -2)} = -\frac{2 \cdot 4 - (-2)^3}{-3 \cdot 4 \cdot (-2)^2 + 5 \cdot (-2)^4} = -\frac{16}{32} = -\frac{1}{2}$$
In general, at each point $(x, y) \in g^{-1}(0) \cap (B(x_0) \times V(y_0))$ in which $\partial g(x, y)/\partial y \neq 0$, we have
$$f'(x) = -\frac{2x - y^3}{-3xy^2 + 5y^4} = \frac{y^3 - 2x}{-3xy^2 + 5y^4}$$
In particular, the first-order local approximation in a neighborhood of $x_0$ is
$$f(x) = f(x_0) + f'(x_0)(x - x_0) + o(x - x_0) = y_0 + \frac{y_0^3 - 2x_0}{-3x_0 y_0^2 + 5y_0^4}(x - x_0) + o(x - x_0)$$
for every $x \in B(x_0)$.⁵ N

Sometimes it is possible to find stationary points of the implicit function without knowing its explicit form. When this happens, it is a remarkable application of the Implicit Function Theorem. For instance, consider in the previous example the point (4, 2) ∈ g⁻¹(0). We have (∂g/∂y)(4, 2) = 32 ≠ 0. Let f : B(4) → V(2) be the unique function then defined implicitly at the point (4, 2).⁶ We have:
$$f'(4) = -\frac{\dfrac{\partial g}{\partial x}(4, 2)}{\dfrac{\partial g}{\partial y}(4, 2)} = -\frac{0}{32} = 0$$
Therefore, x₀ = 4 is a stationary point for the implicit function f. It is possible to check that it is actually a local maximizer.
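These computations are easy to verify with a computer algebra system. A minimal sketch (illustrative only, not part of the text), using sympy's idiff, which computes the derivative of an implicitly defined function exactly as in formula (34.11):

```python
import sympy as sp

x, y = sp.symbols("x y")
g = x**2 - x*y**3 + y**5 - 16

fp = sp.idiff(g, y, x)             # dy/dx along g(x, y) = 0
print(fp.subs({x: 4, y: -2}))      # -1/2, as in Example 1587
print(fp.subs({x: 4, y: 2}))       # 0: x0 = 4 is stationary at (4, 2)

# second derivative (anticipating Section 34.3.3): negative, so a local maximizer
fpp = sp.idiff(g, y, x, 2)
print(fpp.subs({x: 4, y: 2}))      # -1/16 < 0
```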

Example 1588 (i) Consider the function g : R² → R given by g(x, y) = 7x² + 2y − e^y. The hypotheses of the Implicit Function Theorem are satisfied at each point (x₀, y₀) ∈ R² with ∂g(x₀, y₀)/∂y = 2 − e^{y₀} ≠ 0. Thus, the equation g(x, y) = 0 implicitly defines at such a point (x₀, y₀) ∈ g⁻¹(0) a continuously differentiable function f : B(x₀) → V(y₀) with
$$f'(x) = -\frac{\partial g(x, y)/\partial x}{\partial g(x, y)/\partial y} = -\frac{14x}{2 - e^y} \qquad (34.18)$$
for every (x, y) ∈ g⁻¹(0) ∩ (B(x₀) × V(y₀)).

Even if we do not know the explicit form of f, we have been able to find its derivative function f'. The first-order local approximation is
$$f(x) = f(x_0) + f'(x_0)(x - x_0) + o(x - x_0) = y_0 - \frac{14x_0}{2 - e^{y_0}}(x - x_0) + o(x - x_0)$$
at (x₀, y₀). For example, at the point (1/√7, 0) ∈ g⁻¹(0) we have, as x → 1/√7,
$$f(x) = -2\sqrt{7}\left(x - \frac{1}{\sqrt{7}}\right) + o\left(x - \frac{1}{\sqrt{7}}\right)$$

(ii) Let g : R² → R be given by g(x, y) = x³ + 4ye^x + y² + xe^y. If g(x₀, y₀) = 0 and ∂g(x₀, y₀)/∂y ≠ 0, then by the Implicit Function Theorem the equation g(x, y) = 0 defines at (x₀, y₀) a unique continuously differentiable function f : B(x₀) → V(y₀) with
$$f'(x) = -\frac{\partial g(x, y)/\partial x}{\partial g(x, y)/\partial y} = -\frac{3x^2 + 4ye^x + e^y}{4e^x + 2y + xe^y}$$

⁵ The reader can verify that also (−12, −2) ∈ g⁻¹(0) and (∂g/∂y)(−12, −2) ≠ 0, and calculate f'(−12) for the implicit function defined at (−12, −2).
⁶ This function is different from the previous implicit function defined at the other point (4, −2).

for every (x, y) ∈ g⁻¹(0) ∩ (B(x₀) × V(y₀)). The first-order local approximation is
$$f(x) = f(x_0) + f'(x_0)(x - x_0) + o(x - x_0) = y_0 - \frac{3x_0^2 + 4y_0e^{x_0} + e^{y_0}}{4e^{x_0} + 2y_0 + x_0e^{y_0}}(x - x_0) + o(x - x_0)$$
at (x₀, y₀). For example, if (x₀, y₀) = (0, 0) we have ∂g(0, 0)/∂y = 4 ≠ 0, so
$$f'(0) = -\frac{\partial g(0, 0)/\partial x}{\partial g(0, 0)/\partial y} = -\frac{1}{4}$$
and, as x → 0,
$$f(x) = y_0 + f'(0)x + o(x) = -\frac{1}{4}x + o(x)$$
N
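Both computations can be cross-checked symbolically; an illustrative sketch (not part of the text):

```python
import sympy as sp

x, y = sp.symbols("x y")

g1 = 7*x**2 + 2*y - sp.exp(y)
print(sp.simplify(sp.idiff(g1, y, x).subs({x: 1/sp.sqrt(7), y: 0})))  # -2*sqrt(7)

g2 = x**3 + 4*y*sp.exp(x) + y**2 + x*sp.exp(y)
print(sp.idiff(g2, y, x).subs({x: 0, y: 0}))                          # -1/4
```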

By exchanging the variables in the Implicit Function Theorem, we can say that the continuity of the partial derivatives of g in a neighborhood of (x₀, y₀) and the condition ∂g(x₀, y₀)/∂x ≠ 0 ensure the existence of a (unique) implicit function x = φ(y) such that locally g(φ(y), y) = 0. It follows that, if at least one of the two partial derivatives ∂g(x₀, y₀)/∂x and ∂g(x₀, y₀)/∂y is not zero, there is locally a univocal tie between the two variables. As a result, the Implicit Function Theorem cannot be applied only when both partial derivatives ∂g(x₀, y₀)/∂y and ∂g(x₀, y₀)/∂x are zero.
For example, if g(x, y) = x² + y² − 1, then for every point (x₀, y₀) that satisfies the equation g(x, y) = 0 we have ∂g(x₀, y₀)/∂y = 2y₀, which is zero only for y₀ = 0 (and hence x₀ = ±1). At the two points (1, 0) and (−1, 0) the equation does not define any implicit function of the type y = f(x). But ∂g(±1, 0)/∂x = ±2 ≠ 0 and, therefore, at such points the equation defines an implicit function of the type x = φ(y). Symmetrically, at the two points (0, 1) and (0, −1) the equation defines an implicit function of the type y = f(x) but not one of the type x = φ(y).

This last remark suggests a final important observation on the Implicit Function Theorem. Suppose that, as at the beginning of the chapter, φ is a standard function defined in explicit form, which can be written in implicit form as
$$g(x, y) = \varphi(x) - y \qquad (34.19)$$
Given (x₀, y₀) ∈ g⁻¹(0), suppose ∂g(x₀, y₀)/∂x ≠ 0. The Implicit Function Theorem, in "exchanged" form, then ensures the existence of neighborhoods B(y₀) and V(x₀) and of a unique function f : B(y₀) → V(x₀) such that
$$g(f(y), y) = 0 \qquad \forall y \in B(y_0)$$
that is, by recalling (34.19),
$$\varphi(f(y)) = y \qquad \forall y \in B(y_0)$$
The function f is, therefore, the inverse of φ on the neighborhood B(y₀). The Implicit Function Theorem thus implies the existence, locally around the point y₀, of the inverse of φ. In particular, formula (34.11) here becomes
$$f'(y_0) = -\frac{\dfrac{\partial g}{\partial y}(x_0, y_0)}{\dfrac{\partial g}{\partial x}(x_0, y_0)} = \frac{1}{\varphi'(x_0)}$$
which is the classic formula (26.20) for the derivative of the inverse function. In sum, there is a close connection between implicit and inverse functions, which the reader will see later in the book (Section 35.1).

34.3.2 Level curves and marginal rates

Though so far in this section we considered the equation g(x, y) = 0, there is nothing special about 0 and we can actually consider any scalar k. Though mathematically it is an obvious generalization of the Implicit Function Theorem, because of its importance in applications we next state and prove the version of the theorem for a generic scalar k, possibly different from 0.

Proposition 1589 Let g : U → R be defined (at least) on an open set U of R² and let g(x₀, y₀) = k. If g is continuously differentiable on a neighborhood of (x₀, y₀), with
$$\frac{\partial g}{\partial y}(x_0, y_0) \neq 0$$
then there exist neighborhoods B(x₀) and V(y₀) and a unique function f : B(x₀) → V(y₀) such that
$$g(x, f(x)) = k \qquad \forall x \in B(x_0)$$
The function f is continuously differentiable on B(x₀), with
$$f'(x) = -\frac{\dfrac{\partial g}{\partial x}(x, y)}{\dfrac{\partial g}{\partial y}(x, y)} \qquad (34.20)$$
for every (x, y) ∈ g⁻¹(k) ∩ (B(x₀) × V(y₀)).

This is the version of the Implicit Function Theorem which we will refer to in the rest of
the section when discussing marginal rates.

Proof Define g_k : U ⊆ R² → R by g_k(x, y) = g(x, y) − k. We have g(x, y) = k if and only if g_k(x, y) = 0, that is, g⁻¹(k) = g_k⁻¹(0). Moreover, ∂g_k(x₀, y₀)/∂y = ∂g(x₀, y₀)/∂y ≠ 0. By the Implicit Function Theorem, there exist neighborhoods B(x₀) and V(y₀) and a unique function f : B(x₀) → V(y₀) such that g_k(x, f(x)) = 0 for all x ∈ B(x₀). In turn, this implies g(x, f(x)) = k for all x ∈ B(x₀). Since f is continuously differentiable, the result is proved.

In view of Proposition 1576, the implicit function f : B(x₀) → V(y₀) permits us to establish a functional representation of the level curve g⁻¹(k) through the basic relation
$$g^{-1}(k) \cap (B(x_0) \times V(y_0)) = \operatorname{Gr} f \qquad (34.21)$$
which is the general form of (34.16) for any k ∈ R. Implicit functions thus describe the link between the variables x and y that belong to the same level curve, thus making it possible to formulate through them some key properties of these curves. The great effectiveness of this formulation explains the importance of implicit functions, as mentioned right after (34.15).

For example, the isoquant g⁻¹(k) is a level curve of the production function g : R²₊ → R, which features two inputs, x and y, and one output. The points (x, y) that belong to the isoquant are all the input combinations that keep the quantity of output produced constant. The implicit function y = f(x) tells us, locally, how the quantity y has to change, when x varies, to keep the output produced constant. Therefore, the properties of the function f : B(x₀) → V(y₀) characterize, locally, the relations between the inputs that guarantee the level k of output. We usually assume that f is:

(i) decreasing, that is, f'(x) ≤ 0 for every x ∈ B(x₀): the two inputs are partially substitutable and, to keep the quantity produced unchanged at the level k, lower quantities of the input x have to be matched by larger quantities of the input y (and vice versa);

(ii) convex, that is, f''(x) ≥ 0 for every x ∈ B(x₀): greater levels of x have to be matched by larger and larger quantities of y to compensate (negative) infinitesimal variations of x, so as to keep production at level k.

Remarkably, as noted after the proof of the Implicit Function Theorem, via Proposition 1581 we can tell which properties of g induce these desirable properties.
Example 1590 Consider a Cobb-Douglas production function g : R²₊₊ → R given by g(x, y) = x^α y^{1−α}, with 0 < α < 1. Given any k > 0, let (x₀, y₀) ∈ R²₊₊ be such that g(x₀, y₀) = k. Since g : R²₊₊ → R is continuously differentiable, with ∂g(x₀, y₀)/∂y ≠ 0, by the Implicit Function Theorem there exist neighborhoods B(x₀) and V(y₀) and a unique implicit function f_k : B(x₀) → V(y₀) such that g(x, f_k(x)) = k for all x ∈ B(x₀). The implicit function f_k is continuously differentiable, as well as strictly decreasing and strictly convex because g is strictly increasing and strictly quasi concave (Proposition 1581).⁷ N
The absolute value |f'| of the derivative of the implicit function is called the marginal rate of transformation because, for infinitesimal variations of the inputs, it describes their degree of substitutability, that is, the variation of y that balances an increase in x. Thanks to the functional representation (34.21) of the isoquant, geometrically the marginal rate of transformation can be interpreted as the slope of the isoquant at (x, y). This is the classic interpretation of the rate, which follows from (34.21).
The Implicit Function Theorem implies the classic formula
$$MRT_{x,y} = -f'(x) = \frac{\dfrac{\partial g}{\partial x}(x, y)}{\dfrac{\partial g}{\partial y}(x, y)} \qquad (34.22)$$

⁷ Later in the chapter we will revisit this example (Example 1607).

This is the usual form in which the notion of marginal rate of transformation MRT_{x,y} appears.

Example 1591 Let g : R²₊ → R be the Cobb-Douglas production function g(x, y) = x^α y^{1−α}, with 0 < α < 1. The corresponding marginal rate of transformation is
$$MRT_{x,y} = \frac{\dfrac{\partial g}{\partial x}(x, y)}{\dfrac{\partial g}{\partial y}(x, y)} = \frac{\alpha x^{\alpha - 1} y^{1 - \alpha}}{(1 - \alpha) x^{\alpha} y^{-\alpha}} = \frac{\alpha}{1 - \alpha}\frac{y}{x}$$
For example, at a point at which we use equal quantities of the two inputs (that is, x = y), if we increase the first input by one unit, the second one must decrease by α/(1 − α) units to leave unchanged the quantity of output produced: in particular, when α = 1/2, the decrease of the second one must be of one unit. At a point at which we use a quantity of the second input five times larger than that of the first input (that is, y = 5x), an increase of one unit of the first input is compensated by a decrease of 5α/(1 − α) units of the second one. N
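A short symbolic confirmation of these numbers; a sketch, not part of the text:

```python
import sympy as sp

x, y, a = sp.symbols("x y alpha", positive=True)
g = x**a * y**(1 - a)

MRT = sp.simplify(sp.diff(g, x) / sp.diff(g, y))            # alpha*y/((1 - alpha)*x)
print(MRT)
print(sp.simplify(MRT.subs({a: sp.Rational(1, 2), y: x})))  # 1: one-for-one when x = y
print(sp.simplify(MRT.subs({y: 5*x}) / (a/(1 - a))))        # 5: MRT = 5*alpha/(1 - alpha)
```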

Similar considerations hold for the level curves of a utility function u : R²₊ → R, that is, for its indifference curves u⁻¹(k). The implicit functions provided by the Implicit Function Theorem tell us, locally, how one has to vary the quantity y when x varies to keep the overall utility level constant. For them we assume properties of monotonicity and convexity similar to those assumed for the implicit functions defined by isoquants. The monotonicity of the implicit function reflects the partial substitutability of the two goods: it is possible to consume a bit less of one good and a bit more of the other one and yet keep unchanged the overall level of utility. The convexity of the implicit function models the classic hypothesis of decreasing rates of substitution: when the quantity of a good, for example x, increases, we then need greater and greater "compensating" variations of the other good y to stay on the same indifference curve, i.e., to have u(x, y) = u(x + Δx, y + Δy).
Here as well, it is important to note that via Proposition 1581 we can tell which properties of the utility function u induce these desirable properties, thus for instance making rigorous the common expression "convex indifference curves" (cf. Chapter 17). Indeed, they have a functional representation via convex implicit functions.

In the present case the absolute value |f'| of the derivative of the implicit function is called the marginal rate of substitution: it measures the (negative) variation in y that balances marginally an increase in x. Geometrically, it is the slope of the indifference curve at (x, y). Thanks to the Implicit Function Theorem, we have
$$MRS_{x,y} = -f'(x) = \frac{\dfrac{\partial u}{\partial x}(x, y)}{\dfrac{\partial u}{\partial y}(x, y)}$$
which is the classic form of the marginal rate of substitution.

Let h be a scalar function with a strictly positive derivative, so that it is strictly increasing and h ∘ u is then a utility function equivalent to u. By the chain rule,
$$\frac{\dfrac{\partial (h \circ u)}{\partial x}(x, y)}{\dfrac{\partial (h \circ u)}{\partial y}(x, y)} = \frac{h'(u(x, y))\dfrac{\partial u}{\partial x}(x, y)}{h'(u(x, y))\dfrac{\partial u}{\partial y}(x, y)} = \frac{\dfrac{\partial u}{\partial x}(x, y)}{\dfrac{\partial u}{\partial y}(x, y)} \qquad (34.23)$$

Since we can drop the derivative h'(u(x, y)), the marginal rate of substitution is the same for u and for all its increasing transformations h ∘ u. Thus, the marginal rate of substitution is an ordinal notion, invariant under strictly increasing (differentiable) transformations. It does not depend on which of the two equivalent utility functions, u or h ∘ u, is considered. This explains the centrality of this ordinal notion in consumer theory, where after Pareto's ordinalist revolution it has replaced the cardinal notion of marginal utility (cf. Section 38.5).

Example 1592 To illustrate (34.23), consider on R²₊₊ the equivalent Cobb-Douglas utility function u(x, y) = x^a y^{1−a} and log-linear utility function log u(x, y) = a log x + (1 − a) log y. We have
$$MRS_{x,y} = \frac{\dfrac{\partial u}{\partial x}(x, y)}{\dfrac{\partial u}{\partial y}(x, y)} = \frac{a x^{a-1} y^{1-a}}{(1 - a) x^{a} y^{-a}} = \frac{a}{1 - a}\frac{y}{x} = \frac{\dfrac{\partial \log u}{\partial x}(x, y)}{\dfrac{\partial \log u}{\partial y}(x, y)}$$
The two utility functions have the same marginal rate of substitution. N
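The ordinal invariance can be verified directly; an illustrative sketch (not part of the text):

```python
import sympy as sp

x, y, a = sp.symbols("x y a", positive=True)
u = x**a * y**(1 - a)

MRS_u = sp.simplify(sp.diff(u, x) / sp.diff(u, y))
MRS_log = sp.simplify(sp.diff(sp.log(u), x) / sp.diff(sp.log(u), y))
print(sp.simplify(MRS_u - MRS_log))   # 0: the two rates coincide
```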

Finally, let us consider a consumer who consumes in two periods, today and tomorrow, with intertemporal utility function U : R²₊ → R given by
$$U(c_1, c_2) = u(c_1) + u(c_2)$$
where we assume the same instantaneous utility function u in the two periods. Given a utility level k, let
$$U^{-1}(k) = \{(c_1, c_2) \in \mathbb{R}^2_+ : U(c_1, c_2) = k\}$$
be the intertemporal indifference curve and let (c₁, c₂) be a point on it. When the hypotheses of the Implicit Function Theorem, with the variables exchanged, are satisfied at (c₁, c₂), there exists an implicit function f : B(c₂) → V(c₁) such that
$$U(f(c_2), c_2) = k \qquad \forall c_2 \in B(c_2)$$
The scalar function c₁ = f(c₂) tells us how much consumption today, c₁, has to vary when consumption tomorrow, c₂, varies, so as to keep the overall utility U constant. We have:
$$f'(c_2) = -\frac{\dfrac{\partial U}{\partial c_2}(c_1, c_2)}{\dfrac{\partial U}{\partial c_1}(c_1, c_2)} = -\frac{u'(c_2)}{u'(c_1)}$$
When the number
$$IMRS_{c_1,c_2} = -f'(c_2) = \frac{u'(c_2)}{u'(c_1)} \qquad (34.24)$$
exists, it is called the intertemporal marginal rate of substitution: it measures the (negative) variation in c₁ that balances an increase in c₂.

Example 1593 For the power utility function u(c) = c^γ/γ, with γ > 0, we have
$$U(c_1, c_2) = \frac{c_1^\gamma}{\gamma} + \frac{c_2^\gamma}{\gamma}$$
The intertemporal marginal rate of substitution is (c₂/c₁)^{γ−1}. N

34.3.3 Quadratic expansions

The Implicit Function Theorem says, inter alia, that if the function g is continuously differentiable, then the implicit function f is also continuously differentiable. The next result shows that this important property holds much more generally.

Theorem 1594 In the Implicit Function Theorem, if the function g is n times continuously differentiable, so is the implicit function f.⁸ In particular, for n = 2 we have
$$f''(x) = -\frac{\dfrac{\partial^2 g}{\partial x^2}\left(\dfrac{\partial g}{\partial y}\right)^2 - 2\dfrac{\partial^2 g}{\partial x \partial y}\dfrac{\partial g}{\partial x}\dfrac{\partial g}{\partial y} + \dfrac{\partial^2 g}{\partial y^2}\left(\dfrac{\partial g}{\partial x}\right)^2}{\left(\dfrac{\partial g}{\partial y}\right)^3} \qquad (34.25)$$
for every x ∈ B(x₀), where the derivatives of g are evaluated at (x, f(x)).

This expression can be written in a compact way as
$$f''(x) = -\frac{g''_{xx}g'^2_y - 2g''_{xy}g'_xg'_y + g''_{yy}g'^2_x}{g'^3_y}$$
The numerator is reminiscent of the expansion of a square, which makes it easier to remember.
The numerator somehow reminds of a square formula, so it is easier to remember.

Proof We will omit the proof of the first part of the statement. Suppose f is twice differentiable and let us apply the chain rule to (34.11), that is, to
$$f'(x) = -\frac{\partial g(x, f(x))/\partial x}{\partial g(x, f(x))/\partial y} = -\frac{g'_x(x, f(x))}{g'_y(x, f(x))}$$
For the sake of brevity we do not make the dependence of the derivatives of g on (x, f(x)) explicit, so we can write
$$f''(x) = -\frac{\left(g''_{xx} + g''_{xy}f'(x)\right)g'_y - g'_x\left(g''_{yx} + g''_{yy}f'(x)\right)}{g'^2_y} = -\frac{\left(g''_{xx} - g''_{xy}\dfrac{g'_x}{g'_y}\right)g'_y - g'_x\left(g''_{yx} - g''_{yy}\dfrac{g'_x}{g'_y}\right)}{g'^2_y} = -\frac{g''_{xx}g'^2_y - 2g''_{xy}g'_xg'_y + g''_{yy}g'^2_x}{g'^3_y}$$
as desired.

The two previous theorems allow us to give local approximations for an implicitly defined function. As we know, one is rarely able to write the explicit formulation of a function which is implicitly defined by an equation: being able to give approximations is hence of great importance.
If g is of class C¹ on an open set U, the first-order approximation of the implicitly defined function at a point (x₀, y₀) ∈ U such that g(x₀, y₀) = 0 is
$$f(x) = y_0 - \frac{\dfrac{\partial g}{\partial x}(x_0, f(x_0))}{\dfrac{\partial g}{\partial y}(x_0, f(x_0))}(x - x_0) + o(x - x_0)$$
as x → x₀.
If g is of class C² on an open set U, the second-order (or quadratic) approximation of the implicit function at a point (x₀, y₀) ∈ U such that g(x₀, y₀) = 0 is, as x → x₀,
$$f(x) = y_0 - \frac{g'_x}{g'_y}(x - x_0) - \frac{g''_{xx}g'^2_y - 2g''_{xy}g'_xg'_y + g''_{yy}g'^2_x}{2g'^3_y}(x - x_0)^2 + o\left((x - x_0)^2\right)$$
where we omitted the dependence of the derivatives on the point (x₀, f(x₀)).

⁸ Also analyticity is preserved: if g is analytic, so is f.

Example 1595 Consider the function g(x, y) = x² + 3xy + y² − 1, for which ∂g/∂x = 2x + 3y and ∂g/∂y = 3x + 2y. We have
$$f''(x_0) = -\frac{2(3x_0 + 2y_0)^2 - 6(2x_0 + 3y_0)(3x_0 + 2y_0) + 2(2x_0 + 3y_0)^2}{(3x_0 + 2y_0)^3}$$
so that the quadratic approximation of f is, as x → x₀,
$$f(x) = y_0 - \frac{2x_0 + 3y_0}{3x_0 + 2y_0}(x - x_0) - \frac{2(3x_0 + 2y_0)^2 - 6(2x_0 + 3y_0)(3x_0 + 2y_0) + 2(2x_0 + 3y_0)^2}{2(3x_0 + 2y_0)^3}(x - x_0)^2 + o\left((x - x_0)^2\right)$$
at a generic point (x₀, y₀) ∈ g⁻¹(0). For example, at (x₀, y₀) = (0, 1) ∈ g⁻¹(0) we have f'(0) = −3/2 and f''(0) = 5/4, so that, as x → 0,
$$f(x) = 1 - \frac{3}{2}x + \frac{5}{8}x^2 + o(x^2)$$
Furthermore, knowing the second derivative allows us to complete the analysis of critical points of implicit functions. For instance, for the implicit function of Example 1587 defined at (4, 2), formula (34.25) gives f''(4) = −1/16 < 0, confirming that the stationary point x₀ = 4 is a local maximizer.
N
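These computations can be replicated symbolically; a minimal sketch (illustrative, not part of the text):

```python
import sympy as sp

x, y = sp.symbols("x y")
g = x**2 + 3*x*y + y**2 - 1

print(sp.idiff(g, y, x).subs({x: 0, y: 1}))      # -3/2
print(sp.idiff(g, y, x, 2).subs({x: 0, y: 1}))   # 5/4: quadratic term (5/8)x**2
```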

34.3.4 Implicit functions of several variables

From a formal standpoint, abstracting from any possible interpretation, the variables x and y are symmetric in the equation g(x, y) = 0: we can try to express y in terms of x, so as to have g(x, f(x)) = 0, or x in terms of y, so as to have g(f(y), y) = 0. Though we have concentrated on the first case for convenience, all notions and results are symmetric in the second case (as we often noted).
In this section we extend the analysis of implicit functions to the case
$$g(x_1, \dots, x_n, y) = 0$$
in which x = (x₁, ..., xₙ) is a vector, while y remains a scalar. In the n + 1 arguments of the function g : A ⊆ Rⁿ⁺¹ → R, we thus separate one of them, denoted by y, from the other ones. The choice of which argument to label y is, again from a formal standpoint, arbitrary. Yet, in applications an argument may stand out in terms of interpretation, thus becoming the one of substantive interest (e.g., y is an output and x is a vector of inputs).
In any case, here we regard x as a vector of independent variables and y as a dependent variable, so the function implicitly defined by the equation g(x, y) = 0 is a function f of n variables. Fortunately, the Implicit Function Theorem easily extends to this case, mutatis mutandis: since f is a function of several variables, the partial derivatives ∂f(x)/∂x_k now take the place of the derivative f'(x) that we had in the scalar case.

Theorem 1596 Let g : U → R be defined (at least) on an open set U of Rⁿ⁺¹ and let g(x₀, y₀) = 0. If g is continuously differentiable on a neighborhood of (x₀, y₀), with
$$\frac{\partial g}{\partial y}(x_0, y_0) \neq 0$$
then there exist neighborhoods B(x₀) ⊆ Rⁿ and V(y₀) ⊆ R and a unique function f : B(x₀) → V(y₀) such that
$$g(x, f(x)) = 0 \qquad \forall x \in B(x_0) \qquad (34.26)$$
The function f is continuously differentiable on B(x₀), with
$$\frac{\partial f}{\partial x_k}(x) = -\frac{\dfrac{\partial g}{\partial x_k}(x, y)}{\dfrac{\partial g}{\partial y}(x, y)} \qquad (34.27)$$
for every (x, y) ∈ g⁻¹(0) ∩ (B(x₀) × V(y₀)) and every k = 1, ..., n.

We omit the proof of this result. By using gradients, formula (34.27) can be written as
$$\nabla f(x) = -\frac{\nabla_x g(x, y)}{\dfrac{\partial g}{\partial y}(x, y)}$$
where ∇ₓg denotes the partial gradient of g with respect to x₁, x₂, ..., xₙ only. Moreover, f being unique, also in this more general case (34.26) is equivalent to (34.15) and (34.16).

Example 1597 Let g : R³ → R be defined by g(x₁, x₂, y) = x₁² − x₂² + y³ and let (x₁, x₂, y₀) = (6, 3, −3). We have g ∈ C¹(R³), with (∂g/∂y)(x, y) = 3y², and therefore
$$\frac{\partial g}{\partial y}(6, 3, -3) = 27 \neq 0$$
By the Implicit Function Theorem, there exists a unique y = f(x₁, x₂) defined in a neighborhood U(6, 3), which is differentiable there and takes values in a neighborhood V(−3). Since
$$\frac{\partial g}{\partial x_1}(x, y) = 2x_1 \quad \text{and} \quad \frac{\partial g}{\partial x_2}(x, y) = -2x_2$$
we have
$$\frac{\partial f}{\partial x_1}(x) = -\frac{2x_1}{3y^2} \quad \text{and} \quad \frac{\partial f}{\partial x_2}(x) = \frac{2x_2}{3y^2}$$
In particular,
$$\nabla f(6, 3) = \left(-\frac{12}{27}, \frac{6}{27}\right)$$
The reader can check that a global implicit function f : R² → R exists and, after having recovered its explicit expression (which is possible because of the simplicity of g), can verify that formula (34.27) correctly computes ∇f(x). N
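The suggested check can be carried out symbolically; an illustrative sketch (not part of the text):

```python
import sympy as sp

x1, x2, y = sp.symbols("x1 x2 y")
g = x1**2 - x2**2 + y**3

# formula (34.27): partial derivatives of the implicit function
gradf = [-sp.diff(g, v) / sp.diff(g, y) for v in (x1, x2)]
print([d.subs({x1: 6, x2: 3, y: -3}) for d in gradf])        # [-4/9, 2/9]

# the explicit solution, valid near (6, 3) where x1**2 > x2**2, gives the same values
f_explicit = -sp.cbrt(x1**2 - x2**2)
print([sp.diff(f_explicit, v).subs({x1: 6, x2: 3}) for v in (x1, x2)])
```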

If in the previous theorems we assume that g is of class Cⁿ instead of class C¹, the implicitly defined function f is also of class Cⁿ. This allows us to recover formulas analogous to (34.25) to compute higher-order partial derivatives, up to order n included, of the implicit function f. We omit the details for the sake of brevity.
Finally, the convexity and concavity properties of the implicit function f follow from points (ii) and (iii) of Proposition 1581.

N.B. Global versions, in the spirit of Proposition 1582, of Theorems 1594 and 1596 can be easily established, as readers can check. O

34.3.5 Implicit operators

A more general case is
$$g(x_1, \dots, x_n, y_1, \dots, y_m) = 0$$
in which both x = (x₁, ..., xₙ) and y = (y₁, ..., yₘ) are vectors. Here g : A ⊆ Rⁿ⁺ᵐ → R is a function and the equation implicitly defines an operator f = (f₁, ..., fₘ) between Rⁿ and Rᵐ such that
$$g(x_1, \dots, x_n, f_1(x_1, \dots, x_n), \dots, f_m(x_1, \dots, x_n)) = 0$$
Even more generally, we can consider the nonlinear system of equations:
$$\begin{cases} g_1(x_1, \dots, x_n, y_1, \dots, y_m) = 0 \\ g_2(x_1, \dots, x_n, y_1, \dots, y_m) = 0 \\ \quad\vdots \\ g_m(x_1, \dots, x_n, y_1, \dots, y_m) = 0 \end{cases}$$
Here also g = (g₁, ..., gₘ) : A ⊆ Rⁿ⁺ᵐ → Rᵐ is an operator and the system defines an operator f = (f₁, ..., fₘ) between Rⁿ and Rᵐ such that
$$\begin{cases} g_1(x_1, \dots, x_n, f_1(x_1, \dots, x_n), \dots, f_m(x_1, \dots, x_n)) = 0 \\ g_2(x_1, \dots, x_n, f_1(x_1, \dots, x_n), \dots, f_m(x_1, \dots, x_n)) = 0 \\ \quad\vdots \\ g_m(x_1, \dots, x_n, f_1(x_1, \dots, x_n), \dots, f_m(x_1, \dots, x_n)) = 0 \end{cases} \qquad (34.28)$$

Let us focus directly on this latter general case. Here the following square submatrix of the Jacobian matrix of the operator g plays a key role:
$$D_y g(x, y) = \begin{bmatrix} \dfrac{\partial g_1}{\partial y_1}(x, y) & \dfrac{\partial g_1}{\partial y_2}(x, y) & \cdots & \dfrac{\partial g_1}{\partial y_m}(x, y) \\ \dfrac{\partial g_2}{\partial y_1}(x, y) & \dfrac{\partial g_2}{\partial y_2}(x, y) & \cdots & \dfrac{\partial g_2}{\partial y_m}(x, y) \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial g_m}{\partial y_1}(x, y) & \dfrac{\partial g_m}{\partial y_2}(x, y) & \cdots & \dfrac{\partial g_m}{\partial y_m}(x, y) \end{bmatrix}$$

We can now state, without proof, the operator version of the Implicit Function Theorem, which is the most general form of this result that we consider.

Theorem 1598 Let g : U → Rᵐ be defined (at least) on an open set U of Rⁿ⁺ᵐ and let g(x₀, y₀) = 0. If g is continuously differentiable on a neighborhood of (x₀, y₀), with
$$\det D_y g(x_0, y_0) \neq 0 \qquad (34.29)$$
then there exist neighborhoods B(x₀) ⊆ Rⁿ and V(y₀) ⊆ Rᵐ and a unique operator f = (f₁, ..., fₘ) : B(x₀) → V(y₀) such that (34.28) holds for every x ∈ B(x₀). The operator f is continuously differentiable on B(x₀), with
$$Df(x) = -(D_y g(x, y))^{-1} D_x g(x, y) \qquad (34.30)$$
for every (x, y) ∈ g⁻¹(0) ∩ (B(x₀) × V(y₀)).

The Jacobian of the implicit operator is thus pinned down by formula (34.30). To better understand this formula, it is convenient to write it as an equality
$$\underbrace{D_y g(x, y)}_{m \times m}\,\underbrace{Df(x)}_{m \times n} = -\underbrace{D_x g(x, y)}_{m \times n}$$
of two m × n matrices. In terms of the (i, j) ∈ {1, ..., m} × {1, ..., n} entry of each such matrix, the equality is
$$\sum_{k=1}^m \frac{\partial g_i}{\partial y_k}(x, y)\frac{\partial f_k}{\partial x_j}(x) = -\frac{\partial g_i}{\partial x_j}(x, y)$$
where y = f(x). For each independent variable x_j, we can determine the sought-after m-dimensional vector
$$\left(\frac{\partial f_1}{\partial x_j}(x), \dots, \frac{\partial f_m}{\partial x_j}(x)\right)$$
by solving the following linear system with m equations:
$$\begin{cases} \displaystyle\sum_{k=1}^m \frac{\partial g_1}{\partial y_k}(x, y)\frac{\partial f_k}{\partial x_j}(x) = -\frac{\partial g_1}{\partial x_j}(x, y) \\ \displaystyle\sum_{k=1}^m \frac{\partial g_2}{\partial y_k}(x, y)\frac{\partial f_k}{\partial x_j}(x) = -\frac{\partial g_2}{\partial x_j}(x, y) \\ \quad\vdots \\ \displaystyle\sum_{k=1}^m \frac{\partial g_m}{\partial y_k}(x, y)\frac{\partial f_k}{\partial x_j}(x) = -\frac{\partial g_m}{\partial x_j}(x, y) \end{cases}$$
By doing this for each j, we can finally determine the Jacobian Df(x) of the implicit operator.
Example 1599 Define g = (g₁, g₂) : R⁴ → R² by
$$g_1(x_1, x_2, y_1, y_2) = 3x_1 - 4e^{x_2} + y_1^2 - 6y_2$$
$$g_2(x_1, x_2, y_1, y_2) = 2x_1y_2^2 - 4x_2e^{y_1} + y_1^2 - 1$$
and let (x₀, y₀) = (1, 0, 1, 0). The submatrix of the Jacobian matrix of the operator g containing the partial derivatives of g with respect to y₁ and y₂ is given by
$$D_y g(x, y) = \begin{bmatrix} 2y_1 & -6 \\ -4x_2e^{y_1} + 2y_1 & 4x_1y_2 \end{bmatrix}$$
while that reporting the partial derivatives with respect to x₁ and x₂ is
$$D_x g(x, y) = \begin{bmatrix} 3 & -4e^{x_2} \\ 2y_2^2 & -4e^{y_1} \end{bmatrix}$$
The determinant of D_y g(x, y) is |D_y g(x, y)| = 8x₁y₁y₂ − 24x₂e^{y₁} + 12y₁, so |D_y g(x₀, y₀)| = 12 ≠ 0. Condition (34.29) is thus satisfied. By the last theorem, there exists an implicit operator f = (f₁, f₂) : B(x₀) → V(y₀) which is continuously differentiable on B(x₀). The partial derivatives
$$\frac{\partial f_1}{\partial x_1}(x) \quad \text{and} \quad \frac{\partial f_2}{\partial x_1}(x)$$
satisfy the system
$$\begin{bmatrix} 2y_1 & -6 \\ -4x_2e^{y_1} + 2y_1 & 4x_1y_2 \end{bmatrix}\begin{bmatrix} \dfrac{\partial f_1}{\partial x_1}(x) \\ \dfrac{\partial f_2}{\partial x_1}(x) \end{bmatrix} = \begin{bmatrix} -3 \\ -2y_2^2 \end{bmatrix}$$
while the partial derivatives
$$\frac{\partial f_1}{\partial x_2}(x) \quad \text{and} \quad \frac{\partial f_2}{\partial x_2}(x)$$
satisfy the system
$$\begin{bmatrix} 2y_1 & -6 \\ -4x_2e^{y_1} + 2y_1 & 4x_1y_2 \end{bmatrix}\begin{bmatrix} \dfrac{\partial f_1}{\partial x_2}(x) \\ \dfrac{\partial f_2}{\partial x_2}(x) \end{bmatrix} = \begin{bmatrix} 4e^{x_2} \\ 4e^{y_1} \end{bmatrix}$$
Solving the two systems, we find:
$$\frac{\partial f_1}{\partial x_1}(x) = \frac{3x_1y_2 + 3y_2^2}{6x_2e^{y_1} - 2x_1y_1y_2 - 3y_1}$$
$$\frac{\partial f_2}{\partial x_1}(x) = \frac{6x_2e^{y_1} + 2y_1y_2^2 - 3y_1}{12x_2e^{y_1} - 4x_1y_1y_2 - 6y_1}$$
$$\frac{\partial f_1}{\partial x_2}(x) = \frac{4x_1y_2e^{x_2} + 6e^{y_1}}{2x_1y_1y_2 - 6x_2e^{y_1} + 3y_1}$$
$$\frac{\partial f_2}{\partial x_2}(x) = \frac{2y_1e^{y_1} + 4x_2e^{y_1 + x_2} - 2y_1e^{x_2}}{2x_1y_1y_2 - 6x_2e^{y_1} + 3y_1}$$
In this way we have computed the Jacobian matrix Df(x) of the operator f. N
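The same Jacobian can be obtained numerically at the point (x₀, y₀) = (1, 0, 1, 0); an illustrative sketch (not part of the text):

```python
import numpy as np

x1, x2, y1, y2 = 1.0, 0.0, 1.0, 0.0      # the point (x0, y0) = (1, 0, 1, 0)

Dyg = np.array([[2*y1, -6.0],
                [-4*x2*np.exp(y1) + 2*y1, 4*x1*y2]])
Dxg = np.array([[3.0, -4*np.exp(x2)],
                [2*y2**2, -4*np.exp(y1)]])

Df = -np.linalg.solve(Dyg, Dxg)          # formula (34.30), column by column
print(Df)   # [[0, 2e], [1/2, (2e-2)/3]], i.e. roughly [[0, 5.4366], [0.5, 1.1455]]
```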
Our previous discussion implies, inter alia, that in the special case m = 1 formula (34.30) reduces to
$$\frac{\partial g}{\partial y}(x, y)\frac{\partial f}{\partial x_j}(x) = -\frac{\partial g}{\partial x_j}(x, y)$$
which is formula (34.27) of the vector-function version of the Implicit Function Theorem. Since condition (34.29) reduces to (∂g/∂y)(x₀, y₀) ≠ 0, we conclude that the vector-function version is, indeed, the special case m = 1. Everything fits together.

34.4 A global perspective

We now return to the global perspective of Section 34.2 and take a deeper look at some of the motivating questions that we posed in the first section. For simplicity, we will focus on the basic equation g(x, y) = 0, where g : C ⊆ R² → R is a function of two variables x and y. But, before starting the analysis, we introduce projections, which will play a key role.

34.4.1 Preamble: projections and shadows

Let A be a subset of the plane R², with typical elements (x, y). Its projection
$$\pi_1(A) = \{x \in \mathbb{R} : \exists y \in \mathbb{R} \text{ such that } (x, y) \in A\}$$
is the set of points x on the horizontal axis for which there exists a point y on the vertical axis such that the pair (x, y) belongs to A.⁹
Likewise, define the projection
$$\pi_2(A) = \{y \in \mathbb{R} : \exists x \in \mathbb{R} \text{ such that } (x, y) \in A\}$$
on the vertical axis, that is, the set of points y on the vertical axis for which there exists (at least) one point x on the horizontal axis such that (x, y) belongs to A.

The projections π₁(A) and π₂(A) are nothing but the "shadows" of the set A ⊆ R² on the two axes, as the following figure illustrates:

⁹ This notion of projection is not to be confused with the altogether different one seen in Section 27.1.
[Figure: a set A in the plane, with its projection π₁(A) shown as a shadow on the horizontal axis and π₂(A) as a shadow on the vertical axis]

Example 1600 (i) Let A = [a, b] × [c, d]. In this case,
$$\pi_1(A) = [a, b] \quad \text{and} \quad \pi_2(A) = [c, d]$$
More generally, if A = A₁ × A₂, one has
$$\pi_1(A) = A_1 \quad \text{and} \quad \pi_2(A) = A_2$$
The projections of a product set are its factors.

(ii) Let A = {(x, y) ∈ R² : x² + y² = 1} and B = [−1, 1] × [−1, 1]. Even though A ⊊ B, we obtain
$$\pi_1(A) = \pi_2(A) = [-1, 1] = \pi_1(B) = \pi_2(B)$$
Different sets may share the same projections.

(iii) Let B_ε(x, y) = {(x', y') ∈ R² : √((x' − x)² + (y' − y)²) < ε} be a neighborhood of a point (x, y) ∈ R². One has
$$\pi_1(B_\varepsilon(x, y)) = B_\varepsilon(x) = (x - \varepsilon, x + \varepsilon)$$
and
$$\pi_2(B_\varepsilon(x, y)) = B_\varepsilon(y) = (y - \varepsilon, y + \varepsilon)$$
We conclude that the projections of a neighborhood of (x, y) in R² are neighborhoods of equal radius of x and y in R.

(iv) Given f(x) = 1/|x| defined on R∖{0}, one has
$$\pi_1(\operatorname{Gr} f) = \mathbb{R} \setminus \{0\} \quad \text{and} \quad \pi_2(\operatorname{Gr} f) = (0, \infty)$$
In particular, π₁(Gr f) is the domain of f and π₂(Gr f) is the image Im f. This holds in general: if f : A ⊆ R → R, one has π₁(Gr f) = A and π₂(Gr f) = Im f. N
34.4.2 Implicit functions and frames

Given a function g : C ⊆ R² → R of two variables, we have
$$g^{-1}(0) \subseteq \pi_1(g^{-1}(0)) \times \pi_2(g^{-1}(0)) \qquad (34.31)$$
So, for g(x, f(x)) = 0 to be well posed we need
$$x \in \pi_1(g^{-1}(0)) \quad \text{and} \quad f(x) \in \pi_2(g^{-1}(0))$$
If the implicit function f exists, its domain will be included in π₁(g⁻¹(0)) and its codomain will be included in π₂(g⁻¹(0)). This observation motivates the following definition.

Definition 1601 A rectangle A × B ⊆ R² is a frame of a function g : C ⊆ R² → R if it is included in C, with A ⊆ π₁(g⁻¹(0)) and B ⊆ π₂(g⁻¹(0)).

If we draw the graph of the level curve g⁻¹(0), a frame A × B isolates a portion of the graph. The next definition builds on such a portion.

Definition 1602 The equation g(x, y) = 0, with g : C ⊆ R² → R, implicitly defines on a frame A × B a function f : A → B if
$$g(x, f(x)) = 0 \qquad \forall x \in A$$
If such an f is unique, the equation g(x, y) = 0 is said to be explicitable on A × B.

The uniqueness of the implicit function f is crucial in applications as it guarantees a univocal relationship between the variables x and y. In light of Proposition 1576, in this case we have
$$g^{-1}(0) \cap (A \times B) = \operatorname{Gr} f \qquad (34.32)$$
that is,
$$g(x, y) = 0 \iff y = f(x) \qquad \forall (x, y) \in A \times B$$
A unique implicit function f allows us to represent the level curve g⁻¹(0) on the frame A × B by means of its graph Gr f. In other words, the level curve admits a functional representation.

The following example illustrates these ideas. In particular, it shows that in some frames the equation is explicitable, while in other, less fortunate ones, it is not. By changing the framing we can tell apart different parts of the graph according to their explicitability.

Example 1603 Let g : R² → R be given by g(x, y) = x² + y² − 1. The level curve
$$g^{-1}(0) = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 = 1\}$$
is the unit circle. Since π₁(g⁻¹(0)) = π₂(g⁻¹(0)) = [−1, 1], the possible implicit functions on a rectangle A × B take the form f : A → B with A ⊆ [−1, 1] and B ⊆ [−1, 1]. Let us fix x ∈ [−1, 1], so as to analyze the set
$$S(x) = \{y \in [-1, 1] : x^2 + y^2 = 1\}$$
of solutions y of the equation x² + y² = 1. We have
$$S(x) = \begin{cases} \{0\} & \text{if } x = -1 \\ \left\{-\sqrt{1 - x^2}, \sqrt{1 - x^2}\right\} & \text{if } -1 < x < 1 \\ \{0\} & \text{if } x = 1 \end{cases}$$
The set has two elements, except for x = ±1. In other words, for every −1 < x < 1 there are two values y for which g(x, y) = 0. Let us consider the frame given by the projections' rectangle
$$A \times B = [-1, 1] \times [-1, 1]$$
Any function f : [−1, 1] → [−1, 1] such that
$$f(x) \in S(x) \qquad \forall x \in [-1, 1]$$
entails that
$$g(x, f(x)) = 0 \qquad \forall x \in [-1, 1]$$
and is thus implicitly defined by g on the frame A × B. Such functions are infinitely many; for example, this is the case for the function
$$f(x) = \begin{cases} \sqrt{1 - x^2} & \text{if } x \in \mathbb{Q} \cap [-1, 1] \\ -\sqrt{1 - x^2} & \text{otherwise} \end{cases}$$
as well as for the functions
$$f(x) = \sqrt{1 - x^2} \quad \text{and} \quad f(x) = -\sqrt{1 - x^2} \qquad \forall x \in [-1, 1] \qquad (34.33)$$
Therefore, there are infinitely many functions implicitly defined by g on the rectangle A × B = [−1, 1] × [−1, 1].¹⁰ The equation g(x, y) = 0 is therefore not explicitable on this rectangle, which makes this case hardly interesting. Let us consider instead the less ambitious frame
$$\tilde{A} \times \tilde{B} = [-1, 1] \times [0, 1]$$
The function f : [−1, 1] → [0, 1] defined by f(x) = √(1 − x²) is the only function such that
$$g(x, f(x)) = g\left(x, \sqrt{1 - x^2}\right) = 0 \qquad \forall x \in [-1, 1]$$
that is, f is the only function implicitly defined by g on the rectangle Ã × B̃. The equation g(x, y) = 0 is then explicitable on the frame Ã × B̃, with
$$g^{-1}(0) \cap (\tilde{A} \times \tilde{B}) = \operatorname{Gr} f$$
The level curve g⁻¹(0) can be represented on Ã × B̃ by means of the graph of f.

[Figure: the upper half of the unit circle, the graph of f(x) = √(1 − x²) on the frame [−1, 1] × [0, 1]]

In a similar fashion, if we consider the frame Ā × B̄ = [−1, 1] × [−1, 0] and if we define h : [−1, 1] → [−1, 0] by h(x) = −√(1 − x²), we have
$$g(x, h(x)) = g\left(x, -\sqrt{1 - x^2}\right) = 0 \qquad \forall x \in [-1, 1]$$
and also that
$$g^{-1}(0) \cap (\bar{A} \times \bar{B}) = \operatorname{Gr} h$$
The function h is, thus, the only one implicitly defined by g on the frame Ā × B̄ and the level curve g⁻¹(0) can be represented by means of its graph. The equation g(x, y) = 0 is explicitable on Ā × B̄.

[Figure: the lower half of the unit circle, the graph of h(x) = −√(1 − x²) on the frame [−1, 1] × [−1, 0]]

To sum up, there are infinitely many implicit functions on the frame A × B, while uniqueness can be obtained when we restrict ourselves to the smaller frames Ã × B̃ and Ā × B̄. The study of implicit functions is of interest on these two rectangles because the unique implicit function f defined thereon describes a univocal relationship between the variables x and y which the equation g(x, y) = 0 implicitly determines. N

¹⁰ Note that most of them are somewhat irregular; the only continuous ones among them are the two in (34.33).
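The role of frames can also be seen computationally; a minimal sketch (illustrative only, not part of the text):

```python
import numpy as np

def S(x):
    # solution set of x**2 + y**2 = 1 for a given x in [-1, 1]
    r = np.sqrt(1.0 - x**2)
    return {r, -r} if r > 0 else {0.0}

for x in (0.0, 0.6, 1.0):
    sols = S(x)
    in_B_tilde = {y for y in sols if 0 <= y <= 1}   # frame [-1, 1] x [0, 1]
    print(sorted(sols), sorted(in_B_tilde))
# on the full rectangle there are two solutions for |x| < 1;
# intersecting with B~ = [0, 1] always leaves exactly one
```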

This example shows, inter alia, how important it is to study, for each x ∈ π₁(g⁻¹(0)), the solution set
$$S(x) = \{y \in \pi_2(g^{-1}(0)) : g(x, y) = 0\}$$
The scalar functions f : π₁(g⁻¹(0)) → π₂(g⁻¹(0)), with f(x) ∈ S(x) for every x in their domain, are the possible implicit functions. In particular, when the rectangle A × B is such that S(x) ∩ B is a singleton for each x ∈ A, we have a unique implicit function f : A → B. In this case, for each x ∈ A there is a unique solution y ∈ B of the equation g(x, y) = 0.

Let us see another simple example, warning the reader that, though useful to fix ideas, these are fortunate cases: usually constructing S(x) is far from easy (though a local result, the Implicit Function Theorem is key in this regard).
Example 1604 Let g : R²₊ → R be given by g(x, y) = xy − 1. We have
$$g^{-1}(0) = \{(x, y) \in \mathbb{R}^2_+ : xy = 1\}$$
Since π₁(g⁻¹(0)) = π₂(g⁻¹(0)) = (0, ∞), we have
$$A \times B \subseteq (0, \infty) \times (0, \infty) = \mathbb{R}^2_{++}$$
Let us fix x ∈ (0, ∞) and analyze the set
$$S(x) = \{y \in (0, \infty) : xy = 1\}$$
Since
$$S(x) = \left\{\frac{1}{x}\right\} \qquad \forall x \in (0, \infty)$$
we consider A × B = R²₊₊ and f : (0, ∞) → (0, ∞) given by f(x) = 1/x. We have
$$g(x, f(x)) = g\left(x, \frac{1}{x}\right) = 0 \qquad \forall x \in (0, \infty)$$
and f is the only function implicitly defined by g on R²₊₊. Moreover, we have
$$g^{-1}(0) \cap \mathbb{R}^2_{++} = \operatorname{Gr} f$$
The level curve g⁻¹(0) can be represented on R²₊₊ as the graph of f. N

A final remark. When writing g(x, y) = 0, the variables x and y play symmetric roles, so that we can think of a relationship of type y = f(x) or of type x = φ(y) indifferently. In what follows, we will always consider a function y = f(x), as the case x = φ(y) can be easily recovered via an analysis parallel to the one we conduct here.
34.4.3 Comparative statics I

Besides the marginal analysis conducted in Section 34.3.2,¹¹ the study of functions that are implicitly defined by equations
$$g(x, y) = 0 \qquad (34.34)$$
occurs in economics in at least two other settings:

(i) equilibrium analysis, where equation (34.34) derives from an equilibrium condition in which y is an equilibrium (endogenous) variable and x is an (exogenous) parameter;

(ii) optimization problems, where equation (34.34) comes from a first-order condition in which y is a choice variable and x is a parameter.

The analysis of the relationship between x and y, that is, between the values of the parameter and the resulting choice or equilibrium variable, is a comparative statics exercise. In view of what we learned in this chapter, it consists in studying the function f implicitly defined by the economic relation (34.34). The uniqueness of f, so the explicitability of equation (34.34), is essential to best conduct comparative statics exercises.
The following two subsections will present these two comparative statics problems.¹²

Equilibrium comparative statics Consider the market of a given good, as seen in Chapter 13. Let D : [0, b] → R and S : [0, b] → R be the demand and supply functions, respectively. A pair (p, q) ∈ [0, b] × R₊ of prices and quantities is said to be a market equilibrium if
$$q = D(p) = S(p) \qquad (34.35)$$
In particular, having found the equilibrium price p̂ by solving the equation D(p) = S(p), the equilibrium quantity is q̂ = D(p̂) = S(p̂).
Suppose that the demand for the good (also) depends on an exogenous variable τ ≥ 0. For example, τ may be the level of indirect taxation, which influences the demanded quantity. The demand thus takes the form D(p, τ) and is a function D : [0, b] × R₊ → R, that is, it depends on both the market price p and the value τ of the exogenous variable. The equilibrium condition (34.35) now becomes
$$q = D(p, \tau) = S(p) \qquad (34.36)$$
and the equilibrium price p̂ varies as τ changes. What is the relationship between taxation level and equilibrium prices? Which properties does such a relationship have?
Answering these simple, yet important, economic questions is equivalent to asking oneself: (i) whether a (unique) function p = f(τ) which connects taxation and equilibrium prices (i.e., the exogenous and endogenous variables of this simple market model) exists, and (ii) which properties such a function has.
To deal with this problem, we introduce the function g : [0, b] × R₊ → R given by g(p, τ) = S(p) − D(p, τ), so that the equilibrium condition (34.36) can be written as
$$g(p, \tau) = 0$$

¹¹ Though in that section we adopted a local angle, the marginal analysis can be carried out globally, as readers can check (cf. Example 1607 below).
¹² In Chapter 41 we will further study comparative statics exercises in optimization problems.
In particular,
$$g^{-1}(0) = \{(p, \tau) \in [0, b] \times \mathbb{R}_+ : g(p, \tau) = 0\}$$
is the set of all pairs of equilibrium prices/taxation levels (i.e., of endogenous/exogenous variables).
The two questions asked above are now equivalent to asking whether:

(i) a (unique) implicit function p = f(τ) such that g(f(τ), τ) = 0 for all τ ≥ 0 exists;

(ii) if so, which are the properties of such a function f: for example, whether it is decreasing, so that higher indirect taxes correspond to lower equilibrium prices.

Problems such as these, where the relationship between endogenous and exogenous variables is studied (in particular, how changes in the latter impact the former), are of central importance in economic theory and in its empirical tests.
To fix ideas, let us examine the simple linear case where everything is straightforward.

Example 1605 Consider the linear demand and supply functions:
$$D(p, \tau) = \alpha - \beta(p + \tau)$$
$$S(p) = a + bp$$
where β > 0 and b > 0. We have
$$g(p, \tau) = a + bp - \alpha + \beta(p + \tau)$$
so that the function f : R₊ → R given by
$$f(\tau) = \frac{\alpha - a}{b + \beta} - \frac{\beta}{\beta + b}\tau \qquad (34.37)$$
clearly satisfies (34.36). The equation g(p, τ) = 0 thus implicitly defines (and in this case also explicitly) the function f given by (34.37). Its properties are obvious: for example, it is strictly decreasing, so that changes in the taxation level bring about opposite changes in equilibrium prices.
Regarding the equilibrium quantity q̂, for every τ it is
$$\hat{q} = D(f(\tau), \tau) = S(f(\tau))$$
In other words, we have a function ψ : R₊ → R, equivalently defined by ψ(τ) = D(f(τ), τ) or by ψ(τ) = S(f(τ)), such that ψ(τ) is the equilibrium quantity corresponding to the taxation level τ. By using ψ(τ) = S(f(τ)) for the sake of convenience, from (34.37) we get that
$$\psi(\tau) = a + \frac{b(\alpha - a)}{b + \beta} - \frac{b\beta}{\beta + b}\tau$$
It is a strictly decreasing function, so that changes in the taxation level bring about opposite changes in the equilibrium quantities as well. N
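A symbolic sketch of Example 1605 (illustrative only; the symbol names follow the notation above):

```python
import sympy as sp

p, t, a, b, alpha, beta = sp.symbols("p tau a b alpha beta", positive=True)

g = (a + b*p) - (alpha - beta*(p + t))   # S(p) - D(p, tau)
f = sp.solve(sp.Eq(g, 0), p)[0]          # the equilibrium price p = f(tau)
print(sp.simplify(f))                    # (alpha - a - beta*tau)/(b + beta)
print(sp.diff(f, t))                     # -beta/(b + beta) < 0: strictly decreasing
```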

Optimum comparative statics Consider the optimization problem
$$\max_y \pi(p, y) \quad \text{sub} \quad y \geq 0 \qquad (34.38)$$
of a firm with profit function π : [0, ∞) × [0, ∞) → R given by π(p, y) = py − c(y), where c : [0, ∞) → R is a differentiable cost function (cf. Section 22.1.4). The choice variable is the production level y of some good, say potatoes.
If, as one would expect, there is at least one production level y > 0 such that π(p, y) > 0, the level y = 0 is not optimal. So, problem (34.38) becomes
$$\max_y \pi(p, y) \quad \text{sub} \quad y > 0 \qquad (34.39)$$
Since the interval (0, ∞) is open, by Fermat's Theorem a necessary condition for y > 0 to be optimal is that it satisfies the first-order condition
$$\frac{\partial \pi(p, y)}{\partial y} = p - c'(y) = 0 \qquad (34.40)$$
The key aspect of the producer's problem is to assess how the optimal production of potatoes varies as the market price of potatoes changes, i.e., how the production of potatoes is affected by their price. Such a relevant relationship between prices and quantities is expressed by the scalar function f such that
$$p - c'(f(p)) = 0 \qquad \forall p \geq 0$$
that is, by the function implicitly defined by the first-order condition (34.40). The function f is referred to as the producer's supply function (of potatoes). For each price level p, it gives the optimal quantity y = f(p). Its existence and properties (for example, whether it is increasing, so that higher prices lead to larger produced quantities of potatoes, hence larger supplied quantities in the market) are of central importance in studying a good's market. In particular, the sum of the supply functions of all producers who are present in the market constitutes the market supply function S(p) which we saw in Chapter 13.
To formalize the derivation of the supply function from the optimization problem (34.39), we define a function g : [0, ∞) × (0, ∞) → R by
$$g(p, y) = p - c'(y)$$
The first-order condition (34.40) can be rewritten as
$$g(p, y) = 0$$
If there exists an implicit function y = f(p) such that g(p, f(p)) = 0, it is nothing but the supply function itself. Let us see a simple example where the function f and its properties can be recovered with simple computations.

Example 1606 Consider the quadratic cost function c(y) = y² for y ≥ 0. Here g(p, y) = p − 2y, so the only function f : [0, ∞) → [0, ∞) implicitly defined by g on R²₊ is f(p) = p/2. In particular, f is strictly increasing, so that higher prices entail a higher production, and hence a larger supply. N
34.4.4 Properties

The first important problem one faces when analyzing implicit functions is that of determining which conditions on the function g guarantee that the equation g(x, y) = 0 is explicitable on a frame, that is, that it defines a unique implicit function there. Later in the book we will establish a Global Implicit Function Theorem (Section 35.4), a deep result. Here we can, however, establish a few simple, yet quite interesting, facts that follow from Propositions 1577 and 1578.
For simplicity, set A = π₁(g⁻¹(0)) and B = π₂(g⁻¹(0)) and let us focus on the frame given by the projections' rectangle A × B.¹³ For the problem to be well posed, it is necessary that
$$S(x) = \{y \in B : g(x, y) = 0\} \neq \emptyset \qquad \forall x \in A \qquad (34.41)$$
So, for every possible x at least one solution (x, y) of the equation g(x, y) = 0 exists. As previously noted, every scalar function f : A → B with f(x) ∈ S(x) for all x ∈ A is a possible implicit function.
In view of Proposition 1577, the non-emptiness condition (34.41) holds if
$$\inf_{y \in B} g(x, y) \leq 0 \leq \sup_{y \in B} g(x, y) \qquad \forall x \in A$$
Moreover, by Proposition 1578, if g is strictly monotone in y then the equation g(x, y) = 0 defines a unique implicit function f : A → B on the rectangle A × B.

The results of Section 34.2 permit us to ascribe some notable properties to the implicit function. Specifically, let f : A → B be the unique function such that g(x, f(x)) = 0 for all x ∈ A. By Propositions 1581 and 1582, if g is strictly increasing in y, then f is:¹⁴

(i) strictly decreasing if g is strictly increasing in x;

(ii) (strictly) convex if g is (strictly) quasi concave;

(iii) (strictly) concave if g is (strictly) quasi convex;

(iv) continuous if g is continuous;

(v) continuously differentiable, with
$$f'(x) = -\frac{\dfrac{\partial g}{\partial x}(x, y)}{\dfrac{\partial g}{\partial y}(x, y)} \qquad \forall (x, y) \in g^{-1}(0)$$
if g is continuously differentiable on A × B, with either ∂g(x, y)/∂y > 0 for all (x, y) ∈ A × B or ∂g(x, y)/∂y < 0 for all (x, y) ∈ A × B.

¹³ So, this rectangle is included in the domain of g. In any case, what we establish in what follows is easily seen to hold for any frame of g.
¹⁴ In points (ii) and (iii) we tacitly assume that the domain C is convex, while in points (iv) and (v) we assume that it is open.

Point (ii) makes rigorous in a global sense (in contrast to the local one already remarked upon in Section 34.3.2) the expression "convex indifference curves", by showing that such curves are, indeed, represented via convex implicit functions.

Example 1607 Consider the Cobb-Douglas production function g : R²₊₊ → R given by g(x, y) = x^α y^{1−α}, with 0 < α < 1, on R²₊₊. In Example 1590 we showed via the Implicit Function Theorem that, given any k > 0, the equation g(x, y) = k implicitly defines a unique f_k : B(x₀) → V(y₀) at the point (x₀, y₀) ∈ g⁻¹(k). But do we really need the Implicit Function Theorem? Using the results of Section 34.2 we can actually do much better: the equation g(x, y) = k implicitly defines a unique f_k : (0, ∞) → (0, ∞) on the entire R²₊₊, so globally and not just locally at a point (x₀, y₀) ∈ g⁻¹(k). Indeed, we can invoke Propositions 1577 and 1578 since g is continuous and strictly increasing in y, while condition (34.3) holds because
$$\inf_{y > 0} g(x, y) = 0 \quad \text{and} \quad \sup_{y > 0} g(x, y) = +\infty \qquad \forall x > 0$$
Thus, the results of Section 34.2 are all we need in this example: there is no need to invoke the Implicit Function Theorem. For instance, the continuous differentiability of f_k follows from Proposition 1582 since ∂g(x, y)/∂y > 0 for all (x, y) ∈ R²₊₊. In sum, here the Implicit Function Theorem actually delivers an inferior, local rather than global, result. N
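Here the global implicit function is even available in closed form, which makes a direct check possible; a sketch (illustrative, not part of the text):

```python
import sympy as sp

x, y, k, a = sp.symbols("x y k alpha", positive=True)

fk = (k * x**(-a))**(1/(1 - a))              # candidate global implicit function
print(sp.simplify(x**a * fk**(1 - a) - k))   # 0: g(x, fk(x)) = k on R++
print(sp.simplify(sp.diff(fk, x) / fk))      # -alpha/(x*(1 - alpha)) < 0: decreasing

vals = {a: sp.Rational(1, 2), k: 1}          # with alpha = 1/2, k = 1: fk = 1/x
print(sp.simplify(sp.diff(fk, x, 2).subs(vals)))   # 2/x**3 > 0: convexity
```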

34.4.5 Comparative statics II

Let us use the observations just made for the comparative statics problems of Section 34.4.3.

Equilibrium comparative statics: properties We begin with the equilibrium problem with indirect taxation τ. Suppose that:

(i) D : [0, b] × R₊ → R and S : [0, b] → R are continuous and such that D(0, τ) ≥ S(0) and D(b, τ) ≤ S(b) for every τ;

(ii) D is strictly decreasing in p and S is strictly increasing.

The function g : [0, b] × R₊ → R given by g(p, τ) = S(p) − D(p, τ) is therefore strictly increasing in p. Since condition (34.3) holds,¹⁵ by Propositions 1577 and 1578 the equation g(p, τ) = 0 defines a unique function p = f(τ) such that
$$g(f(\tau), \tau) = 0 \qquad \forall \tau \geq 0$$
By Proposition 1581, it is:

(i) continuous because D and S are continuous;

(ii) strictly decreasing if D is strictly decreasing in τ;

(iii) (strictly) convex if S is (strictly) quasi concave and D is (strictly) quasi convex.

¹⁵ Indeed, D and S are continuous and, furthermore, D(0, τ) ≥ S(0) and D(b, τ) ≤ S(b) for every τ.
Property (ii) is especially interesting. Under the natural hypothesis that D is strictly decreasing in τ, we have that f is strictly decreasing: changes in taxation bring about opposite changes in equilibrium prices (increases in τ entail decreases in p, and decreases in τ entail increases in p).
In the linear case of Example 1605, the existence and properties of f follow from simple computations. The results in this section allow us to extend the same conclusions to much more general demand and supply functions.

Optimum comparative statics: properties Consider the optimization problem
$$\max_c F(\theta, c) \quad \text{sub} \quad c \in (a, b)$$
where c is the choice variable and θ ≥ 0 parameterizes the objective function F : [0, ∞) × (a, b) → R. Assume that F is partially derivable. If the partial derivative ∂F(θ, c)/∂c is strictly increasing in c (for example, ∂²F(θ, c)/∂c² > 0 if F is twice differentiable) and if condition (34.3) holds, then by Propositions 1577 and 1578 the first-order condition
$$g(c, \theta) = \frac{\partial F(\theta, c)}{\partial c} = 0$$
implicitly defines a unique function f : [0, ∞) → (a, b) such that
$$\frac{\partial F(\theta, f(\theta))}{\partial c} = 0 \qquad \forall \theta \geq 0$$
By Proposition 1581, the function f is:

(i) continuous if ∂F/∂c is continuous;

(ii) strictly decreasing if ∂F/∂c is strictly increasing in θ;

(iii) (strictly) convex if ∂F/∂c is (strictly) quasi concave.

In the special case of the producer's problem, market prices p are the parameters and production levels y are the choice variables. So, F(p, y) = py − c(y) is the profit function and
$$g(p, y) = \frac{\partial F(p, y)}{\partial y} = p - c'(y)$$
The strict monotonicity of g in y is equivalent to the strict monotonicity of the derivative function c' (and the concavity or convexity of g to the convexity or concavity of c'). In particular, in the standard case when c' is strictly increasing (so, c is strictly convex), g is strictly decreasing in y and strictly increasing in p, and the supply function y = f(p) is strictly increasing in p. If, in addition, c' is concave, then g is convex and the supply function turns out to be convex.
Chapter 35

Equations and inverse functions

35.1 Equations

Let
$$f : A \to B$$
be an operator between two subsets A and B of Rⁿ. A general form of an equation is
$$f(x) = y_0 \qquad (35.1)$$
that is,
$$\begin{cases} f_1(x_1, \dots, x_n) = y_{01} \\ f_2(x_1, \dots, x_n) = y_{02} \\ \quad\vdots \\ f_n(x_1, \dots, x_n) = y_{0n} \end{cases}$$
where y₀ is a given element of B.¹ The variable x is the unknown of the equation and y₀ is the known term. The solutions of the equation are all x ∈ A such that f(x) = y₀.
A basic taxonomy: equation (35.1) is

(i) linear if the operator f is linear;

(ii) homogeneous if y₀ = 0;

(iii) polynomial if n = 1 and f is a polynomial.

Earlier in the book we studied homogeneous equations (Chapter 14) and linear equations (Section 15.7). First-degree and second-degree equations are polynomial equations familiar from, at least, high school.
Two basic existence and uniqueness questions can be asked about the solutions of equation (35.1), from global and local angles:

(Q.i) can the equation be solved globally: for every y₀ ∈ B, is there x ∈ A that satisfies (35.1)? If so, is the solution unique?

¹ We write y₀ to emphasize that it should be regarded as a fixed element of Rⁿ and not as a variable.
(Q.ii) can the equation be solved locally: given a y₀ ∈ B, is there x ∈ A that satisfies (35.1)? If so, is the solution unique?

Before further discussing these questions, let us formalize them. To this end, observe that the set of all solutions of equation (35.1) is given by the preimage
$$f^{-1}(y_0) = \{x \in A : f(x) = y_0\}$$
So, the previous questions can be addressed via the inverse correspondence f⁻¹ : B ⇉ A defined by
$$f^{-1}(y) = \{x \in A : f(x) = y\} \qquad \forall y \in B$$
with domain Im f ⊆ B (cf. Example 950). We say that f is weakly invertible at y ∈ B if f⁻¹(y) is non-empty, that is, if y ∈ Im f. When, in addition, f⁻¹(y) is a singleton, we say that f is invertible at y. In particular, f is invertible at all y ∈ B when it is a bijection, i.e., when it has an inverse function f⁻¹ : B → A (Section 6.4.1).
Using this terminology, the above questions can be rephrased in more precise terms as follows:

(Q.i) is f globally weakly invertible? If so, is it a bijection?

(Q.ii) is f weakly invertible at y₀ ∈ B, i.e., does y₀ belong to Im f? If so, is it invertible at y₀?

The global question (Q.i) is more demanding, but also more important, than the local one (Q.ii). The unique existence of solutions at each y₀ ∈ B amounts to the existence of the inverse function f⁻¹ : B → A, which describes how solutions vary as the known term varies.
Example 1608 Consider the second-order equation ax² + bx + c = 0, where a, b and c are scalar coefficients with a ≠ 0. As is well known, the solution formula is
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
We can write the equation in the format (35.1) as
$$ax^2 + bx = y$$
via the second-degree polynomial f : R → R given by f(x) = ax² + bx and the known term y = −c. Solutions are then described by the inverse correspondence f⁻¹ : R ⇉ R given by:²
$$f^{-1}(y) = \begin{cases} \left\{\dfrac{-b - \sqrt{b^2 + 4ay}}{2a}, \dfrac{-b + \sqrt{b^2 + 4ay}}{2a}\right\} & \text{if } y \geq -\dfrac{b^2}{4a} \\ \emptyset & \text{if } y < -\dfrac{b^2}{4a} \end{cases}$$
Thus, the knowledge of the solution formula amounts to the knowledge of the inverse correspondence. As the known term y varies, solutions may exist or not, and may be unique or not. For instance, for the quadratic f(x) = x² the inverse correspondence f⁻¹ : R ⇉ R is given by
$$f^{-1}(y) = \begin{cases} \{-\sqrt{y}, \sqrt{y}\} & \text{if } y \geq 0 \\ \emptyset & \text{if } y < 0 \end{cases}$$
So, a unique solution exists when y = 0, that is, when the equation is homogeneous. Two distinct solutions exist when y > 0, and no solution exists when y < 0. In this case, where we posited A = B = R, we can only answer the local question (Q.ii). But we may be less ambitious and restrict ourselves to A = B = (0, ∞). In this case, there exist unique solutions, described by the inverse function f⁻¹ : (0, ∞) → (0, ∞) given by f⁻¹(y) = √y. N

² The condition y ≥ −b²/(4a) amounts to the positivity of the discriminant b² + 4ay.
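The inverse correspondence of the quadratic can be implemented directly; a minimal sketch (illustrative only, not part of the text):

```python
import math

def inverse_correspondence(a, b, y):
    """Solution set of a*x**2 + b*x = y with a != 0, as in Example 1608."""
    disc = b**2 + 4*a*y
    if disc < 0:
        return set()            # no real solutions
    r = math.sqrt(disc)
    return {(-b - r) / (2*a), (-b + r) / (2*a)}

print(inverse_correspondence(1, 0, 4))    # {-2.0, 2.0}: two solutions
print(inverse_correspondence(1, 0, 0))    # {0.0}: the homogeneous case
print(inverse_correspondence(1, 0, -1))   # set(): no solution
```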

As this example clarifies, the choice of the solution domain A and of the set of known terms B determines the nature of the equation: the same function, in the example the quadratic, defines different equations under different sets A and B.
A couple of further remarks are in order. First, every equation (35.1) can be put in a homogeneous form f_{y₀}(x) = 0 via the auxiliary function f_{y₀}(x) = f(x) − y₀. If we are interested in addressing question (Q.ii), that is, in what happens at a given y₀, it is then without loss of generality to consider homogeneous equations. This is what we did, for example, in Chapter 14, where we presented a few non-trivial results about homogeneous equations. However, for the global question (Q.i) it is important to keep track of the known term by studying the general form f(x) = y₀.
Second, though in this chapter we focus on the basic case m = n, equations can be defined more generally through operators f : A ⊆ Rⁿ → B ⊆ Rᵐ between different Euclidean spaces. For linear equations, we first studied the case n = m (Section 15.7) and then generalized the analysis to any n and m (Section 15.8). In this chapter such a generalization is not pursued.

35.2 Well-posed equations

A satisfactory answer to the previous questions opens a robustness issue:

(Q.iii) if f is a bijection, is its inverse f⁻¹ continuous?

In words, if a unique solution exists for each value of the known term, does it change continuously as the known term changes?
This is a question about the "robustness" of unique solutions: whether they change abruptly, discontinuously, under small changes of the known term. If they did, the equation would feature an unpleasant instability, because small changes in the known term would result in significant changes in its solutions.
To address this question, we introduce a piece of terminology.

Definition 1609 Equation (35.1) is said to be well posed if f is a bijection with continuous inverse f⁻¹.

Next we present a basic example of a well-posed equation.

Example 1610 The linear equation
$$Ax = b$$
with A a square matrix of order n, is well posed if and only if det A ≠ 0. Indeed, the linear operator T : Rⁿ → Rⁿ defined by T(x) = Ax is bijective if and only if the matrix A is invertible, i.e., if and only if det A ≠ 0 (Cramer's Theorem). The condition det A ≠ 0 thus ensures that, for each b ∈ Rⁿ, there is a unique solution x ∈ Rⁿ given by T⁻¹(b) = A⁻¹b. Thus, the inverse T⁻¹ : Rⁿ → Rⁿ is well defined; as it is linear, by Proposition 671 it is continuous. The inverse T⁻¹ is, therefore, a continuous function that describes how solutions vary as b varies. N
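A minimal numerical illustration of this well-posedness (a sketch, not from the text):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])       # det A = 5 != 0
b = np.array([1.0, 2.0])

x = np.linalg.solve(A, b)                    # the unique solution T^{-1}(b)
x_perturbed = np.linalg.solve(A, b + 1e-6)   # a small change in the known term...
print(np.linalg.norm(x_perturbed - x))       # ...produces a small change in x
```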

The general study of well-posed equations is based on the following deep, innocuous-looking result proved by Luitzen Brouwer in 1911.³

Theorem 1611 (Domain Invariance) An injective continuous operator f : U → Rⁿ has open image Im f and continuous inverse f⁻¹ : Im f → Rⁿ.

Before turning to the proof, to warm up we report an interesting consequence of this result.

³ Throughout this chapter, U and V denote open subsets of Rⁿ.

Corollary 1612 An injective continuous operator f : A ⊆ Rⁿ → Rⁿ maps open sets into open sets.

In general, continuity does not preserve openness, as Example 596 shows. Yet, under a basic condition like injectivity it does: a remarkable fact.

Proof Let U be an open subset of the domain A. We want to show that its image f(U) is open. The restriction f_{|U} of f to U is injective and continuous. By the Domain Invariance Theorem, the set f(U) = Im f_{|U} is open.

Proof of the Domain Invariance Theorem We prove this result only in a special differential
case:4 we assume that the injective function f : U → Rn is continuously differentiable,
with det Df(x) ≠ 0 for all x ∈ U. We first show that its image Im f is open. To this end,
let x0 ∈ U and consider a neighborhood Bε(x0) ⊆ U. Define

    φ : ∂Bε(x0) = {x ∈ U : ‖x − x0‖ = ε} → R

by φ(x) = ‖f(x) − f(x0)‖. The function φ is easily seen to be continuous. It is also strictly
positive: since f is injective, we have f(x) ≠ f(x0) for all x ∈ ∂Bε(x0), and so φ(x) > 0
for all x ∈ ∂Bε(x0). Since ∂Bε(x0) is compact, by the Weierstrass Theorem φ has then a
minimum value m > 0.
Consider the neighborhood V_{m/2}(f(x0)). We want to show that V_{m/2}(f(x0)) ⊆ f(U),
thus proving that f(U) is open. Fix y ∈ V_{m/2}(f(x0)). We want to show that y ∈ f(U). To
this end, define

    h_y : B̄ε(x0) = {x ∈ U : ‖x − x0‖ ≤ ε} → R

by h_y(x) = ‖f(x) − y‖. Since B̄ε(x0) is compact, by the Weierstrass Theorem h_y has a
minimizer x_y ∈ B̄ε(x0). For each x ∈ ∂Bε(x0) we have

    h_y(x) = ‖f(x) − y‖ = ‖f(x) − f(x0) − (y − f(x0))‖
           ≥ ‖f(x) − f(x0)‖ − ‖y − f(x0)‖ ≥ φ(x) − m/2 ≥ m − m/2 = m/2
3 Throughout this chapter, U and V denote open subsets of Rn.
4 We follow Apostol (1974) p. 369.

where the first inequality follows from (4.11). Yet, h_y(x0) = ‖f(x0) − y‖ < m/2 because
y ∈ V_{m/2}(f(x0)). So, the minimizer x_y does not belong to the boundary ∂Bε(x0); that is,
it belongs to Bε(x0). Clearly, x_y minimizes also h_y², given by h_y²(x) = Σ_{i=1}^n (f_i(x) − y_i)².
By Fermat's Theorem, ∇h_y²(x_y) = 0, that is,

    Σ_{i=1}^n (f_i(x_y) − y_i) ∂f_i(x_y)/∂x_j = 0    ∀j = 1, ..., n

Since det Df(x_y) ≠ 0, this is a homogeneous linear system that, by Cramer's Theorem,
has 0 as its unique solution. Therefore, f_i(x_y) = y_i for each i = 1, ..., n. We conclude that
f(x_y) = y, so that y ∈ f(U), as desired.
It remains to prove that the inverse function f^{-1} : Im f → U is continuous. Clearly,
Im f^{-1} = U. So, let G be any open subset of U. By Corollary 601, it is enough to prove
that the set f(G) = (f^{-1})^{-1}(G) is open. Let f|G be the restriction of f to G. By what
has just been proved, Im f|G is an open set. Since Im f|G = f(G), we conclude that f^{-1} is
continuous.

Let us denote by V the image of f, so that we can write f as a continuous bijection

    f : U → V

By the Domain Invariance Theorem, the set V is open and the inverse f^{-1} : V → U is
continuous. To put this remarkable finding in perspective, recall that bijective functions play
a fundamental role as criteria of similarity between mathematical entities, as we remarked
in studying the cardinality of sets (Section 7.2). In this spirit, next we introduce a class of
bijective functions that provide a criterion of topological similarity across open sets of Rn.

Definition 1613 A bijective operator f : U → V between two open sets U and V of Rn is
said to be an homeomorphism if both f and its inverse f^{-1} are continuous.

Two open sets U and V that admit an homeomorphism f : U → V are called homeomorphic.5
This map has the nature of a topological isomorphism between the two sets. To
see why, denote by 𝒰 and 𝒱 the collections of all open subsets of U and V, respectively.
In view of the Domain Invariance Theorem, it is easy to see that f is a bijective correspondence
between the elements of 𝒰 and 𝒱: we have f(U′) ∈ 𝒱 for all U′ ∈ 𝒰 as well as
f^{-1}(V′) ∈ 𝒰 for all V′ ∈ 𝒱. We can thus move back and forth between the collections 𝒰
and 𝒱 through f. The homeomorphic sets U and V are thus isomorphic from a topological
standpoint.
By the Domain Invariance Theorem, it is enough to check that a bijective operator
f : U → f(U) is continuous to establish that its domain U and its image V = f(U) are two
homeomorphic open sets. There is no need to check the properties of the inverse, in particular
the openness of its domain and its continuity, a big relief.

Example 1614 (i) Bounded open intervals are homeomorphic: given any two of them, (a, b)
and (c, d), it is enough to consider the continuous bijection f : (a, b) → (c, d) given by

    f(x) = ((d − c)/(b − a))(x − a) + c    (35.2)

5 In more advanced courses, readers will learn that this notion extends well beyond open sets.

(ii) The open unit ball B1(0) = {x ∈ Rn : ‖x‖ < 1} is homeomorphic to Rn: the map
f : B1(0) → Rn given by

    f(x) = x / √(1 − ‖x‖²)    (35.3)

is a continuous bijection (cf. Example 213). N
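As a quick sanity check, the following Python sketch (our illustration; the intervals and test points are arbitrary choices) implements the maps (35.2) and (35.3) together with their inverses and verifies the round trips f^{-1}(f(x)) = x numerically.

```python
import numpy as np

# (i) Affine homeomorphism between bounded open intervals (a,b) and (c,d), formula (35.2).
a, b, c, d = 0.0, 1.0, 2.0, 5.0            # arbitrary choices
f = lambda x: (d - c) / (b - a) * (x - a) + c
f_inv = lambda y: (b - a) / (d - c) * (y - c) + a

# (ii) Homeomorphism between the open unit ball and R^n, formula (35.3).
g = lambda x: x / np.sqrt(1.0 - np.dot(x, x))
g_inv = lambda y: y / np.sqrt(1.0 + np.dot(y, y))

x = 0.3
print(np.isclose(f_inv(f(x)), x))          # True

v = np.array([0.6, -0.5])                  # a point of the open unit ball in R^2
print(np.allclose(g_inv(g(v)), v))         # True
```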

When two open sets U and V are homeomorphic we write U ≅₀ V.

Proposition 1615 The binary relation ≅₀ is an equivalence relation.

Proof As the other properties are trivial, we consider only transitivity. Let U, V and G be
three open sets of Rn with U ≅₀ V and V ≅₀ G. By definition, there exist homeomorphisms
f : U → V and g : V → G. Their composition g ∘ f : U → G is easily checked to be an
homeomorphism (cf. Proposition 566), i.e., homeomorphisms are closed under composition.
Thus, U ≅₀ G.

The transitivity of ≅₀ permits us to find new pairs of homeomorphic sets from old ones.
For instance, it is easy to check that the neighborhoods of points of Rn are homeomorphic
to the open unit ball (why?). The transitivity of ≅₀ then immediately implies that any two
neighborhoods in Rn are homeomorphic.
Not all open sets of Rn are homeomorphic, as the next basic example shows.

Example 1616 In the real line, the open sets U = (0, 1) and V = (2, 3) ∪ (5, 6) are not
homeomorphic. For, suppose per contra that there exists an homeomorphism f : U → V;
in particular, f is continuous and surjective. By Proposition 580, V = Im f is then an
interval, a contradiction. N

We close this little homeomorphic excursion with an interesting result, whose proof we
omit, that covers Example 1614.

Proposition 1617 The open convex sets of Rn are homeomorphic.

Back to equations, our main object of interest: the Domain Invariance Theorem shows
that, in the important case when the set B of known terms is open, an equation

    f(x) = y0

defined via a bijection f : A → B is well posed if and only if f is an homeomorphism. Indeed,
when f^{-1} : B → A is continuous, the Domain Invariance Theorem implies that A is open
and that f^{-1}, so f, is an homeomorphism.
This remark clarifies the nature of a well-posed equation, so the scope of the robustness
question (Q.iii) that opened this section. Yet, the Domain Invariance Theorem is also useful
to answer this question: when the solution domain A is open, to check whether f is an
homeomorphism it is enough to check that f is continuous. There is no need to worry about
the inverse, often a big relief as previously remarked. The last example on linear equations fits
perfectly in this schema: the linear operator T is easily checked to be a continuous bijection
if and only if det A ≠ 0. But, of course, the Domain Invariance Theorem provides a general
schema to verify well-posedness that goes well beyond linear equations.

Next we scale up the analysis by considering a differential notion of similarity.



Definition 1618 An homeomorphism f : U → V is said to be a C^k-diffeomorphism, with
1 ≤ k ≤ ∞, if f and its inverse f^{-1} are both k times continuously differentiable.

For instance, the exponential function f(x) = e^x is a C^∞-diffeomorphism f : R → (0, +∞)
with inverse f^{-1} : (0, +∞) → R given by the natural logarithm f^{-1}(x) = log x.
A C^k-diffeomorphism is, by definition, an homeomorphism. The converse is false: the cubic
function f : R → R given by f(x) = x³ is a simple example of an homeomorphism which
is not a C¹-diffeomorphism (at the origin its inverse is not differentiable). This differential
notion, indeed, substantially strengthens the earlier purely topological notion of homeomorphism.
In particular, the higher k, the better. The "ideal" case k = ∞ occurs when both f
and its inverse f^{-1} have continuous partial derivatives of all orders.

Definition 1619 An equation (35.1), with open solution domain A, is said to be smoothly
well posed if f : A → B is a C^∞-diffeomorphism.

This is the best possible kind of stability for an equation. By the Domain Invariance Theorem,
the set B of known terms is open.

Example 1620 Given an injective polynomial f : R → R with either f′ > 0 or f′ < 0,
the map f : R → Im f is a C^∞-diffeomorphism. It defines a smoothly well-posed polynomial
equation. N

We close by briefly discussing a classification of open sets according to their differential
similarity.

Definition 1621 Two open sets U and V of Rn are diffeomorphic, written U ≅∞ V, if
there exists a C^∞-diffeomorphism f : U → V.

By the chain rule, ≅∞ is easily checked to be an equivalence relation. Clearly, a diffeomorphic
pair of open sets is also homeomorphic, i.e.,

    U ≅∞ V ⟹ U ≅₀ V

A natural question is when the converse holds, that is, when two homeomorphic open sets
are actually diffeomorphic, so linked via a C^∞-diffeomorphism rather than "just" via an
homeomorphism. For instance, this is the case for the homeomorphic open sets in Example
1614 since the maps (35.2) and (35.3) are both easily checked to be C^∞-diffeomorphisms.
This is, indeed, a special case of the following result, which significantly improves Proposition
1617.6

Proposition 1622 The open convex sets of Rn are diffeomorphic to Rn.

Any two open convex sets U and V of Rn are thus linked through a C^∞-diffeomorphism
f : U → V, a remarkable property. Yet, this is just the beginning of a long journey into the
differential structure of sets of Rn, left to more advanced courses.7

6 See, e.g., Gonnord and Tosel (1998) p. 60.
7 It is the subject matter of differential topology (economic applications of this topic can be found in Mas-Colell, 1985).

35.3 Local analysis


In this section we prove the Inverse Function Theorem, an important result that provides
a simple differential condition ensuring the local invertibility of an operator, so the local
unique solvability of equations.

35.3.1 A closer look at diffeomorphisms

We begin the analysis with a basic regularity result for C¹-diffeomorphisms.

Proposition 1623 A C¹-diffeomorphism f : U → V has, at each x ∈ U, a non-singular
Jacobian matrix Df(x), with

    Df^{-1}(y) = (Df(x))^{-1}    ∀x ∈ U    (35.4)

where y = f(x).

Proof Let I : Rn → Rn be the identity function I(x) = x for all x ∈ Rn. Since I = f^{-1} ∘ f,
by the Chain rule (Theorem 1296) it holds that dI(x) = df^{-1}(f(x)) ∘ df(x) for each x ∈ U. As
dI(x) = I for each x ∈ Rn, we can thus write:

    I = df^{-1}(f(x)) ∘ df(x)    ∀x ∈ U

We conclude that the linear operator df(x) : Rn → Rn is invertible at all x ∈ U.
By Theorem 1291, the Jacobian matrix is the matrix associated to the differential operator
df(x). The invertibility of df(x) thus amounts to the non-singularity of Df(x),
i.e., det Df(x) ≠ 0 for all x ∈ U. Finally, by the chain rule formula (27.42) we have
Df^{-1}(f(x)) Df(x) = I. As det Df(x) ≠ 0, formula (35.4) holds.

Thus, a C¹-diffeomorphism f : U → V has a non-singular Jacobian at each point of its
domain, i.e.,

    det Df(x) ≠ 0    ∀x ∈ U    (35.5)

The inversion formula (35.4) has as a special case, for n = 1, the basic formula (26.20) on the
derivative of the inverse of a scalar function, i.e., (f^{-1})′(y0) = 1/f′(x0). It sheds further light
on the nature of a C¹-diffeomorphism. To see why, recall that, as just mentioned in the proof,
the Jacobian matrix is the matrix associated to the differential operator df(x) : Rn → Rn,
i.e.,

    df(x)(h) = Df(x)h    ∀h ∈ Rn

The Jacobian condition (35.5) thus amounts to requiring the differential operator to be
invertible at all x ∈ U. Its inverse operator d^{-1}f(x) : Rn → Rn is, at each x ∈ U, given
by

    d^{-1}f(x)(h) = (Df(x))^{-1}h    ∀h ∈ Rn

as, by definition, the inverse matrix is associated to the inverse operator. By the inversion
formula (35.4),

    df^{-1}(y)(h) = Df^{-1}(y)h = (Df(x))^{-1}h = d^{-1}f(x)(h)    ∀h ∈ Rn

where y = f(x). So, the "differential of the inverse" coincides with the "inverse of the
differential". Formula (35.4) thus ensures the mutual consistency of the linear approximations
at x of f and of its inverse f^{-1}. This consistency is a key regularity property of
C¹-diffeomorphisms.
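The consistency expressed by (35.4) is easy to probe numerically. The following Python sketch (ours; the map is an arbitrarily chosen diffeomorphism of R² onto (0, ∞) × R with an explicit inverse) compares a finite-difference Jacobian of f^{-1} at y = f(x) with the inverse of the Jacobian of f at x.

```python
import numpy as np

f     = lambda x: np.array([np.exp(x[0]), x[1] + x[0]])
f_inv = lambda y: np.array([np.log(y[0]), y[1] - np.log(y[0])])

def jacobian(h, p, eps=1e-6):
    """Central finite-difference Jacobian of h at p."""
    n = len(p)
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n); e[j] = eps
        J[:, j] = (h(p + e) - h(p - e)) / (2 * eps)
    return J

x = np.array([0.4, -1.2])       # arbitrary point
y = f(x)

# Formula (35.4): Df^{-1}(y) = (Df(x))^{-1}
lhs = jacobian(f_inv, y)
rhs = np.linalg.inv(jacobian(f, x))
print(np.allclose(lhs, rhs, atol=1e-5))    # True
```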
The converse of Proposition 1623 is false for n ≥ 2, even on a convex domain:8 later in
the chapter, in Example 1646, we will see a continuously differentiable operator f : R² → R²,
with det Df(x) ≠ 0 for each x ∈ R², which is not injective, let alone a C¹-diffeomorphism.
Next we give another, simpler, example of this kind, here on a non-convex domain.

Example 1624 Let U be the open (non-convex) set R² ∖ {0}. Define a continuously differentiable
operator f : U → R² by f(x1, x2) = (x1² − x2², 2x1x2). We have

    Df(x) = [ 2x1   −2x2 ]
            [ 2x2    2x1 ]

and so det Df(x) = 4(x1² + x2²) > 0 for all x = (x1, x2) ∈ U. Yet, f(x) = f(−x) for all x ∈ U
and so f is not injective. N

As to the scalar case n = 1, for a convex domain we can state the converse.

Proposition 1625 Let U be an open interval, bounded or not, of the real line. A continuously
differentiable function f : U → f(U) is a C¹-diffeomorphism if and only if f′(x) ≠ 0
for all x ∈ U.

Proof In view of the last result, we only need to prove the "if" part. Let f : U → R be a
continuously differentiable function with f′(x) ≠ 0 for all x ∈ U. Let x0 ∈ U be such that
f′(x0) > 0 (the case < 0 is similar). Then, f′ > 0. For, suppose that f′(x1) < 0 for some
x1 ∈ U. Since the derivative function f′ : U → R is continuous, by the Bolzano Theorem
there would exist x̄ ∈ U such that f′(x̄) = 0, a contradiction. We conclude that f′ > 0. Thus,
f is strictly increasing, so injective. Set V = Im f. By the Domain Invariance Theorem,
f : U → V is an homeomorphism. By Theorem 1234, f^{-1} : V → R is differentiable
with (f^{-1})′(y) = 1/f′(x) for y = f(x). Take any y ∈ V. Let {yn} ⊆ V converge to
y. There exists a sequence {xn} ⊆ U with f(xn) = yn for each n; since f^{-1} is continuous,
xn = f^{-1}(yn) → f^{-1}(y) = x. As f′ is continuous,
lim (f^{-1})′(yn) = lim 1/f′(xn) = 1/f′(x) = (f^{-1})′(y). We conclude that (f^{-1})′ is continuous
at y ∈ V. Thus, f : U → V is a C¹-diffeomorphism.

35.3.2 Inverse Function Theorem

While a full-fledged converse of Proposition 1623 fails, the upcoming Inverse Function Theorem
can be seen as a local converse: for a continuously differentiable function f, the non-singularity
of the Jacobian at a point x of its domain ensures the existence of a small enough
open set A containing x over which f becomes a C¹-diffeomorphism. More is actually
true: the result holds for each 1 ≤ k ≤ ∞, so it goes well beyond the basic k = 1 case.

8 On a non-convex domain it is hopeless: e.g., the absolute value function on the non-convex open domain
(−1, 0) ∪ (0, 1) is continuously differentiable with a non-zero derivative everywhere, yet it is definitely not
injective.

Theorem 1626 (Inverse Function Theorem) Let f : U → Rn be a k times continuously
differentiable operator, with 1 ≤ k ≤ ∞. If

    det Df(x0) ≠ 0    (35.6)

at x0 ∈ U, then there exists an open subset A of U containing x0 such that

    f : A → f(A)    (35.7)

is a C^k-diffeomorphism.

The local Jacobian condition (35.6) thus ensures that locally, over a small enough open
set A containing x0, the operator f is a C^k-diffeomorphism. As a result, by Proposition 1623
we then have

    det Df(x) ≠ 0    ∀x ∈ A

as well as

    Df^{-1}(y) = (Df(x))^{-1}    ∀x ∈ A    (35.8)

where y = f(x).
The role of the Jacobian condition (35.6) in this important theorem suggests the following
classification.

Definition 1627 Given a differentiable operator f : U → Rn, we say that x ∈ U is a:

(i) regular point of f if det Df(x) ≠ 0;

(ii) critical or stationary point of f if det Df(x) = 0.

Thus, a point of the domain of a differentiable f : U → Rn is either regular or critical. Under
this terminology, the Inverse Function Theorem says that, at a regular point, a continuously
differentiable operator is locally a C¹-diffeomorphism.
Let us now turn to the proof of this important theorem. Remarkably, it is based on the
Implicit Function Theorem.9 It also relies on some lemmas of independent interest that show
some noteworthy topological consequences of Jacobian conditions.

Lemma 1628 Let f : U → Rn be continuously differentiable. If x0 is a regular point of f,
then there is a neighborhood B(x0) on which f is injective.

Proof We consider only the easy scalar case n = 1, when condition det Df(x0) ≠ 0 takes
the basic form f′(x0) ≠ 0.10 Let f′(x0) > 0 (the case < 0 is similar). Being f′ continuous,
there is a neighborhood B(x0) of x0 on which f′ is strictly positive. Thus, f is strictly
increasing, so injective, on B(x0).

The next lemma follows from the previous one along with the Domain Invariance Theorem.

9 Also the converse is true, so one can first prove either theorem and get the other as a simple consequence (cf. Theorem 1648).
10 For the proof of the more involved general case, we refer readers to Apostol (1974) p. 370.

Lemma 1629 Let f : U → Rn be continuously differentiable. If each point x of U is regular,
then f maps open sets into open sets.

Proof Let G be an open subset of U. We want to show that f(G) is open. Let x ∈ G.
By Lemma 1628, there is a neighborhood B(x) on which f is injective; since G is open, we
can take B(x) ⊆ G. By the Domain Invariance Theorem, the set f(B(x)) is open. Since
G = ⋃_{x∈G} B(x), we have

    f(G) = f(⋃_{x∈G} B(x)) = ⋃_{x∈G} f(B(x))

We conclude that the set f(G) is open because it is the union of open sets.

Proof of the Inverse Function Theorem By Lemma 1628, there is a neighborhood B(x0)
on which f is injective. By setting V = f(B(x0)), the function f : B(x0) → V is bijective.
By the Domain Invariance Theorem, the set V is open. Since f is continuously differentiable,
the real-valued Jacobian function det Df(x) is continuous. We can thus assume, without
loss of generality, that det Df(x) ≠ 0 for all x ∈ B(x0). In the rest of the proof we consider
the restriction f : B(x0) → V of f, and we set y0 = f(x0).
The set B(x0) × V is open in R^{2n}. Define g : B(x0) × V → Rn by

    g(x, y) = f(x) − y    (35.9)

Since (x0, y0) ∈ g^{-1}(0), by condition (35.6) we have

    det Dx g(x0, y0) = det Df(x0) ≠ 0

The operator version of the Implicit Function Theorem (Theorem 1598), in "exchanged"
form, then ensures the existence of neighborhoods Ṽ(y0) ⊆ V and B̃(x0) ⊆ B(x0) and of a
unique function φ : Ṽ(y0) → B̃(x0) such that φ(y0) = x0 and

    g(φ(y), y) = 0    ∀y ∈ Ṽ(y0)

By (35.9),

    f(φ(y)) = y    ∀y ∈ Ṽ(y0)

This relation implies that φ is injective. Since f(f^{-1}(y)) = y for all y ∈ V, we conclude
that f^{-1} = φ on Ṽ(y0). Thus, f^{-1}(Ṽ(y0)) = Im φ, so that Im φ is an open set. Therefore,
f : Im φ → Ṽ(y0) is a continuous bijection defined on an open set that contains x0 (because
φ(y0) = x0). By the Domain Invariance Theorem, f is an homeomorphism. Since Im φ ⊆
B(x0), we have det Df(x) ≠ 0 for all x ∈ Im φ. Finally, that the inverse is k times continuously
differentiable also follows from the Implicit Function Theorem. By setting A = Im φ the
result is proved.

The Inverse Function Theorem relies on two hypotheses, the Jacobian condition (35.6)
and the continuous differentiability hypothesis (of some order k ≥ 1). In general, it fails
when we remove either hypothesis. A non-trivial, omitted, example can be given to show
that plain differentiability is not enough for the theorem. So, continuous differentiability is
needed. The next simple example shows that the Jacobian condition is needed.

Example 1630 The quadratic and cubic functions f(x) = x² and f(x) = x³ are continuously
differentiable functions on the real line that do not satisfy the Jacobian condition
(35.6) at the origin. The quadratic function is not locally invertible at the origin: there is
no neighborhood of the origin over which we can restrict the quadratic function and make
it injective. The cubic function is injective, but its inverse function f^{-1}(x) = x^{1/3} is not
differentiable at the origin. N
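A minimal numerical illustration of the two failures (ours, not from the text): near the origin the quadratic map loses injectivity, while the difference quotients of the cubic's inverse x^{1/3} blow up as the increment shrinks, reflecting the vanishing Jacobian at 0.

```python
import numpy as np

# Quadratic: f(x) = x^2 is not injective on any neighborhood of 0.
eps = 1e-3
print((-eps) ** 2 == eps ** 2)                # True: f(-eps) = f(eps)

# Cubic: f(x) = x^3 is injective, but the difference quotient of the
# inverse x -> x^(1/3) at 0 diverges as h -> 0 (no derivative at 0).
for h in [1e-2, 1e-4, 1e-6]:
    print((np.cbrt(h) - 0.0) / h)             # ~ h^(-2/3), blows up
```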

The Inverse Function Theorem permits us to solve equations locally, thus answering question
(Q.ii). To see why, it is convenient to denote by B the image f(A) and write the C^k-diffeomorphism
(35.7) as

    f : A → B

With this, suppose that, by skill or luck, we have been able to find a solution x0 of the equation
f(x) = y0. Based on this knowledge, when x0 is a regular point of f the Inverse Function
Theorem ensures that, first, x0 is locally the unique solution and, second, that there exists an open
set B, the image of a small enough open set A containing x0, such that for each known
term y belonging to B the equation f(x) = y has a unique solution in A as well, described by
f^{-1} : B → A.
In sum, if a regular point x0 happens to solve the equation, it is locally the unique solution
and, locally around x0, the equation is smoothly well posed. The next result, an immediate
consequence of the Inverse Function Theorem, formalizes this discussion (for simplicity, we
consider the case k = 1).

Corollary 1631 Let f : U → Rn be continuously differentiable. Given a known term y0 ∈
Rn, let x0 ∈ U be a solution of the equation

    f(x) = y0

If x0 is a regular point of f, there exist open sets A and B containing x0 and y0, respectively,
such that the equation

    f(x) = y

has a unique solution in A for each y ∈ B. The inverse function f^{-1} : B → A, which
associates a solution to each known term, is continuously differentiable.
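In computational terms, Corollary 1631 underlies the standard practice of tracking solutions with Newton's method as the known term moves. The following sketch (ours; the operator and the points are arbitrary choices satisfying the regularity hypothesis) starts from a known solution x0 of f(x) = y0 and computes the unique nearby solution for a nearby known term y.

```python
import numpy as np

def f(x):
    return np.array([x[0] ** 3 + x[1], x[0] - x[1] ** 3])

def Df(x):
    return np.array([[3 * x[0] ** 2, 1.0],
                     [1.0, -3 * x[1] ** 2]])

def local_solve(y, x_start, tol=1e-12):
    """Newton iteration for f(x) = y, started at a nearby solution."""
    x = x_start.copy()
    for _ in range(50):
        step = np.linalg.solve(Df(x), f(x) - y)
        x -= step
        if np.linalg.norm(step) < tol:
            break
    return x

x0 = np.array([1.0, 0.0])          # solves f(x0) = (1, 1) =: y0
y0 = f(x0)
print(np.linalg.det(Df(x0)))       # about -1: x0 is a regular point

y = y0 + np.array([0.01, -0.02])   # a nearby known term
x = local_solve(y, x0)             # the unique nearby solution f^{-1}(y)
print(np.allclose(f(x), y))        # True
```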

35.4 Global analysis

We can address the global questions (Q.i) and (Q.iii) via a global version of the Inverse
Function Theorem, which we study in this section. To this end, we first need to introduce
some important topological notions.

35.4.1 Topological prelude

We begin with a piece of terminology: an open cover of a set A in Rn is any collection of
open sets {Gi}_{i∈I} such that

    A ⊆ ⋃_{i∈I} Gi

In words, these open sets cover, through their union, the set A. A finite subcover of the
open cover {Gi}_{i∈I} is a finite collection of sets, say {G1, ..., Gn}, taken from the open cover
{Gi}_{i∈I} that is still able to cover the set A, that is, such that

    A ⊆ ⋃_{i=1}^n Gi

Theorem 1632 (Heine-Borel) A subset A of Rn is compact if and only if each open cover
of A has a finite subcover.

This characterization of compactness may appear prima facie not that thrilling, but it
becomes key when one goes beyond Rn, as more advanced courses will show. Despite its
unimposing appearance, we introduce it here because, momentarily, we will be able to show
that it is a powerful characterization already in Rn.

Proof \If". Let A be a set of Rn such each open cover of A admits a nite subcover. We
want to show that it is closed and bounded. We rst prove that Ac is open. Fix x 2 Ac .
For each y 2 A, let B" (y) and B"y (x) be, respectively, neighborhoods of y and x with radius
lower than kx yk =2. This is easily seen to imply that B" (y) \ B"y (x) = ;. Since the
collection fB" (y)gy2A is an open cover of A, by hypothesis there exists a nite subcover
\n [n
fB" (yi )gni=1 of A. Set V = B"yi (x) and B = B" (yi ). Clearly, A B. We also
i=1 i=1
have V \ B = ;. Indeed,
[n [n [n
V \B =V \ B" (yi ) = (V \ B" (yi )) (B"yi (x) \ B" (yi )) = ;
i=1 i=1 i=1

As it is the intersection neighborhoods of x, the set V is itself a neighborhood of x (cf.


Section 5.5). As V Ac , the point x is an interior point of Ac . Thus, Ac is open.
It remains to show that A is bounded. Given " > 0, for each x 2 A let B" (x) be a
neighborhood of radius ". The collection fB" (x)gx2A is an open cover of A. By hypothesis,
there exists a nite set E A such that fB" (x)gx2E is a nite subcover of A, that is,
[
A B" (xi ). Set M = maxx0 ;x00 2E kx0 x00 k. Let y 0 ; y 00 2 A. There exist x0 ; x00 2 E
x2E
such that y 0 2 B" (x0 ) and y 00 2 B" (x00 ). Therefore,

y0 y 00 y0 x0 + x0 x00 + x00 y 00 < 2" + M

Thus, by taking any y 2 A we have A B2"+M (y). This implies that the set A is bounded.

\Only if". We only consider the scalar case when A is a compact interval [a; b] of R. We
begin with a claim.

Claim Let f[an ; bn ]gn be a collection of compact intervals of R with [an+1 ; bn+1 ]
1 [an ; bn ]
\
for each n 1. We have [an ; bn ] 6= ;.
n 1

Proof of the Claim Given the collection f[an ; bn ]gn 1 , let A = fa1 ; a2 ; :::; an ; :::g. We have
A [a1 ; b1 ] and therefore A is a bounded set. By the Least Upper Bound Principle, we can
set x = sup A. Since each bn is an upper bound for A, we have bn x for each n 1. Hence,
1086 CHAPTER 35. EQUATIONS AND INVERSE FUNCTIONS
\ \
an x bn for each n 1 and, therefore, x 2 [an ; bn ]. This implies [an ; bn ] 6= ;,
n 1 n 1
as desired.

Back to the proof, suppose per contra that there exists an open cover {Gi}_{i∈I} of [a, b] that
does not contain any finite subcover of [a, b]. Let δ = b − a and c1 = (a + b)/2. The collection
{Gi}_{i∈I} is an open cover also of the intervals [a, c1] and [c1, b]. Therefore, at least one of
these two intervals has no finite subcover of {Gi}_{i∈I}. Otherwise, from [a, b] = [a, c1] ∪ [c1, b] it
would follow that [a, b] itself would have such a subcover. Without loss of generality, suppose
therefore that [a, c1] has no finite subcover of {Gi}_{i∈I}. Set c2 = (a + c1)/2. By repeating the
argument just seen, we can assume that also [a, c2] does not have such a finite subcover.
By proceeding in this way we can construct a collection of intervals {[a, cn]}_{n≥1} such that
[a, c_{n+1}] ⊆ [a, cn] and

    cn − a = δ/2^n

for each n ≥ 1. Moreover, none of these closed intervals has a finite subcover of {Gi}_{i∈I}. By
the Claim, ⋂_{n≥1} [a, cn] ≠ ∅. Let x̄ ∈ ⋂_{n≥1} [a, cn]. Since [a, b] ⊆ ⋃_{i∈I} Gi, there exists an
index i ∈ I such that x̄ ∈ Gi. As x̄ is an interior point of Gi, there exists a neighborhood (x̄ − ε, x̄ + ε)
such that (x̄ − ε, x̄ + ε) ⊆ Gi. For n sufficiently large, we have δ/2^n < ε and, therefore,

    [a, cn] ⊆ [x̄ − δ/2^n, x̄ + δ/2^n] ⊆ (x̄ − ε, x̄ + ε) ⊆ Gi

Consequently, the singleton {Gi} is a finite subcover of {Gi}_{i∈I} that covers [a, cn], which
contradicts the fact that none of the intervals [a, cn] has such a subcover. From this
contradiction it follows that [a, b] has a finite subcover of {Gi}_{i∈I}.

In this characterization of compactness it is important to pay attention to the universal
quantifier "each open cover" that it features: it is not enough to exhibit some open cover
that admits a finite subcover; all open covers of A must have this property. Conversely, to
check that a set is not compact it is enough to find an instance of an open cover that does not
admit a finite subcover. For example, the open interval A = (−1, 1) is not closed, let alone
compact. To see that, in accordance with the Heine-Borel Theorem, it admits an open cover
with no finite subcovers, take the family of open sets {Gn} with

    Gn = (−1 + 1/n, 1 − 1/n)

It is an open cover of A since ⋃_n Gn = (−1, 1). It does not admit any finite subcover: consider
any finite subcollection {G_{n1}, ..., G_{nk}} of {Gn}, where

    n1 < n2 < ··· < nk

is any finite collection of natural numbers. As

    G_{n1} ⊆ G_{n2} ⊆ ··· ⊆ G_{nk}

we have

    ⋃_{i=1}^k G_{ni} = G_{nk} = (−1 + 1/nk, 1 − 1/nk)

Thus, the finite subcollection {G_{n1}, ..., G_{nk}} does not cover A.

Next we introduce an important class of sets.

Definition 1633 A subset A of Rn is discrete if it consists of isolated points.

Intuitively, discrete sets have a granular structure. This intuition is confirmed by the
next result (which shows, inter alia, the usefulness of the open cover characterization of
compactness).

Proposition 1634 A discrete set is at most countable. It is finite if and only if it is compact.

Proof Let A be a discrete set in Rn. For each x ∈ A there exists a small enough neighborhood
Bε(x) such that Bε(x) ∩ A = {x}. By the density of the rationals (Proposition 42), it is easy
to see that there exists a point qx ∈ Bε(x) with rational components, i.e., qx ∈ Qn; taking
the radii small enough, to distinct elements x and x′ of A we can associate distinct points qx
and qx′. Thus, the map A ∋ x ↦ qx ∈ Qn is injective. Hence, |A| ≤ |Qn|. As the set Qn is
countable (why?), we conclude that the set A is at most countable.
If A is finite, it is compact (cf. Example 171). Conversely, let A be compact. For
each x ∈ A there exists a small enough neighborhood Bε(x) such that Bε(x) ∩ A = {x}. As
A ⊆ ⋃_{x∈A} Bε(x), by the Heine-Borel Theorem there exists a finite collection {Bε(xi)}_{i=1}^k
such that A ⊆ ⋃_{i=1}^k Bε(xi). Thus, A = {x1, ..., xk}.

We continue this topological analysis by introducing a new class of functions. In the rest
of the section, C denotes a closed subset of Rn.

Definition 1635 An operator f : C → Rm is said to be proper if, for every sequence
{xn} ⊆ C,

    ‖xn‖ → +∞ ⟹ ‖f(xn)‖ → +∞

Properness requires the norms of the images under f to diverge to +∞ along any possible
unbounded sequence {xn} ⊆ C, i.e., any sequence such that ‖xn‖ → +∞. In words, the
function cannot indefinitely take values of bounded norm along a sequence that "dashes off"
to infinity.

Example 1636 (i) If m = 1, supercoercive functions are proper. Indeed, for them we have

    ‖xn‖ → +∞ ⟹ f(xn) → −∞ ⟹ |f(xn)| → +∞

The converse is false: the cubic function f(x) = x³ is proper, but not supercoercive. In view
of Proposition 1016, supercoercive functions f : R → R are actually the proper functions that
feature bounded upper contour sets (momentarily, Proposition 1637 will put this remark in
perspective).
(ii) Define f : R² → R² by f(x) = (x1² − x2², 2x1x2). We have

    ‖f(x)‖² = (x1² − x2²)² + 4x1²x2² = (x1² + x2²)² = ‖x‖⁴

and so ‖f(x)‖ = ‖x‖² and f is easily checked to be proper.11
(iii) Let B be a symmetric square matrix of order n and c a vector in Rn. The multivariable
quadratic function f : Rn → R defined by

    f(x) = (1/2) x · Bx + c · x

is proper if B is positive definite. Since x · Bx > 0 for all 0 ≠ x ∈ Rn, by the Weierstrass
Theorem x · Bx has a minimum value m > 0 over the unit sphere {x ∈ Rn : ‖x‖ = 1}. Thus,

    x · Bx = ((x/‖x‖) · B(x/‖x‖)) ‖x‖² ≥ m‖x‖²    ∀0 ≠ x ∈ Rn

By the Cauchy-Schwarz inequality, |c · x| ≤ ‖c‖‖x‖ for all x ∈ Rn. Hence,

    c · x ≥ −‖c‖‖x‖    ∀x ∈ Rn

We can thus write

    f(x) = (1/2) x · Bx + c · x ≥ (m/2)‖x‖² − ‖c‖‖x‖ = (m/2)‖x‖ (‖x‖ − 2‖c‖/m)

We conclude that ‖xn‖ → +∞ implies |f(xn)| → +∞, that is, f is proper. As the reader
can check, the converse is also true: if f is proper, then B is positive definite. N
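The lower bound just derived can be probed numerically. In the sketch below (ours; the positive definite matrix B and the vector c are arbitrary choices), f(x) is checked against (m/2)‖x‖² − ‖c‖‖x‖ along random directions of growing norm, with m computed as the smallest eigenvalue of B, which for a symmetric matrix equals the minimum of x · Bx on the unit sphere.

```python
import numpy as np

rng = np.random.default_rng(0)

B = np.array([[4.0, 1.0],
              [1.0, 3.0]])               # symmetric positive definite
c = np.array([1.0, -2.0])
m = np.linalg.eigvalsh(B).min()          # min of x.Bx on the unit sphere
f = lambda x: 0.5 * x @ B @ x + c @ x

for r in [1.0, 10.0, 100.0, 1000.0]:
    v = rng.standard_normal(2)
    x = r * v / np.linalg.norm(v)        # a point of norm r
    lower = 0.5 * m * r ** 2 - np.linalg.norm(c) * r
    print(f(x) >= lower - 1e-9, f(x))    # bound holds; values diverge
```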

By now, the next characterization of proper functions should not be that surprising.

Proposition 1637 An operator f : C → Rm is proper if and only if the preimages of
bounded sets are, in turn, bounded sets.

Proof \If". Suppose that f is proper. Let B be a bounded set of Rm . Suppose, by


contradiction, that the preimage f 1 (B) is not bounded. Then, there is, a sequence fxn g
f 1 (B) such that kxn k ! +1. That is, fxn g Rn is such that kxn k ! +1 and f (xn ) 2 B
for each n. But, kxn k ! +1 implies kf (xn )k ! +1 because f is proper. This contradicts
the boundedness of B. We conclude that f 1 (B) is bounded. The second part of the
statement now follows from Proposition 1074.
\Only if". Suppose that f is such that the preimages of bounded sets of Rm are bounded
sets of Rn . Let fxn g C be such that kxn k ! +1. Suppose, by contradiction, that
there is K > 0 such that kf (xn )k K for all n. Then, the preimage of the bounded set
n
B = fx 2 R : kf (x)k Kg contains an unbounded sequence fxn g, a contradiction. We
conclude that f is proper.

In view of Proposition 600, we have the following simple, yet interesting, corollary that
neatly shows what properness adds to continuity.

11 In Example 1624 we considered the restriction of this function to R² ∖ {0}.

Corollary 1638 A continuous operator f : C → Rm is proper if and only if the preimages
of compact sets are, in turn, compact sets.

Thus, homeomorphisms f : Rn → Rn are proper: if K is any compact set in Rn, then
f^{-1}(K) is compact as f^{-1} is continuous (Proposition 597). The next result presents another
important class of proper operators.

Proposition 1639 Invertible linear operators f : Rn → Rn are proper.

Proof Invertible linear operators are bijective and their inverses are linear operators (see
Chapter 15). By Lemma 899, there exists a constant k > 0 such that ‖f^{-1}(x)‖ ≤ k‖x‖ for
every x ∈ Rn. Let {xn} ⊆ Rn be such that ‖xn‖ → +∞. Then, ‖xn‖ = ‖f^{-1}(f(xn))‖ ≤
k‖f(xn)‖, so ‖f(xn)‖ → +∞. We conclude that f is proper.

We close with another noteworthy property.

Proposition 1640 A continuous proper operator f : C → Rm maps closed sets into closed
sets.

Proof Let F ⊆ C be closed. We want to show that f(F) is closed in Rm. Let {yn} ⊆ f(F)
be a sequence that converges to y ∈ Rm. We want to show that y ∈ f(F). By definition,
there exists a sequence {xn} ⊆ F such that f(xn) = yn for each n. There exists k > 0 such
that ‖xn‖ ≤ k for all n ≥ 1: otherwise, by properness a subsequence with ‖x_{nk}‖ → +∞
would give ‖y_{nk}‖ → +∞, which contradicts yn → y. The set K = {x ∈ C : ‖x‖ ≤ k} is
compact. By the Bolzano-Weierstrass Theorem, there exists a subsequence {x_{nk}} that
converges to some x ∈ K; since F is closed, x ∈ F. By the continuity of f, we have
y_{nk} = f(x_{nk}) → f(x). Since y_{nk} → y, this implies y = f(x), so that y ∈ f(F). We
conclude that the set f(F) is closed.

35.4.2 Finitely many solutions

Armed with the previous topological notions, we go back to our analysis. We prove a first
interesting global result.

Proposition 1641 Let f : U → Rn be continuously differentiable. If each point x of U is
regular, then f^{-1}(y) is a discrete set for each y ∈ Im f.

Proof Let y ∈ Im f and x ∈ f^{-1}(y). By the Inverse Function Theorem, there exist open
sets Bx and Vy, containing x and y, such that the map f : Bx → Vy is bijective. Since
f : Bx → Vy is bijective, the set f^{-1}(y) ∩ Bx contains at most a single point: the one whose
image is y. Thus, f^{-1}(y) ∩ Bx = {x}. This proves that x is an isolated point of f^{-1}(y). We
conclude that the level set f^{-1}(y) is discrete.

Thus, two simple hypotheses, continuous differentiability and everywhere regularity, ensure
that the level sets are discrete sets, a significant finding that motivates the following
definition.

Definition 1642 An operator f : U → Rm is said to be level-proper if its level sets are
bounded.

This is a large class of functions: it is easy to come up with examples. For instance,
by Proposition 1637 proper functions are level-proper (this also clarifies the "level-proper"
terminology). Yet, constant functions on the real line are a basic example of functions that
are not level-proper.

Corollary 1643 Let f : U → Rn be continuously differentiable and level-proper. If each
point x of U is regular, then f^{-1}(y) is a finite set for each y ∈ Im f.

Proof Let y ∈ Im f. By Proposition 1641, f^{-1}(y) is a discrete set. As f is continuous and
level-proper, it is also a closed and bounded set, i.e., a compact set. By Proposition 1634,
f^{-1}(y) is then a finite set.

All this has interesting consequences for the study of equations. Take an equation, not
necessarily well posed, with an open solution domain A and defined via a continuously
differentiable and level-proper function f : A → Rn. If each point of A is regular under f,
the last corollary ensures that, for each known term y0 ∈ Rn, the solution set f^{-1}(y0) is
finite (possibly empty). That is, this equation has at most finitely many solutions.
Simple differential conditions like continuous differentiability and regularity, paired with
a mild requirement on level sets, thus deliver a remarkable finiteness property of solution
sets.

35.4.3 Unique solutions

We begin with a useful lemma, a simple dividend of the Inverse Function Theorem.

Lemma 1644 A k times continuously differentiable operator f : U → Rn, with 1 ≤ k ≤ ∞,
is a C^k-diffeomorphism f : U → Im f if and only if it is injective and each x ∈ U is a regular
point of f.

Proof We prove the "if", as the converse is trivially true. By the Domain Invariance Theorem,
Im f is an open set. Being injective, f : U → Im f is bijective, with inverse f^{-1} : Im f → U. By the
Inverse Function Theorem, each x ∈ U has a neighborhood B(x) such that f : B(x) →
f(B(x)) is a C^k-diffeomorphism. Thus, the inverse f^{-1} is k times continuously differentiable
on the open set f(B(x)). As Im f = ⋃_{x∈U} f(B(x)), we conclude that the inverse f^{-1} is k
times continuously differentiable, as desired.

The following beautiful result is a far-reaching generalization of Cramer's Theorem.12
Here proper functions play a key role.

Theorem 1645 (Caccioppoli-Hadamard) A continuously differentiable operator f : Rn →
Rn is a C¹-diffeomorphism if and only if it is proper and

    det Df(x) ≠ 0    ∀x ∈ Rn    (35.10)


12 A first version of this theorem was proved by Jacques Hadamard in 1906 and then substantially generalized by Renato Caccioppoli in 1932. In the proof we follow Palais (1959).

In view of Proposition 1639, Cramer's Theorem is a special case of this theorem because
for linear operators T(x) = Ax we have DT(x) = A, so the Jacobian condition (35.10) holds
when det A ≠ 0.

Proof \Only if". Let f : Rn ! Rn be a C 1 -di eomorphism. It is closed because homeo-


morphisms are proper, as previously remarked. Moreover, by Proposition 1623 the Jacobian
condition (35.10) holds.
\If". Suppose that f is proper and satis es condition (35.10). By Proposition 1640, the
image f (Rn ) is closed in Rn . On the other hand, by Lemma 1629 the image f (Rn ) is open
in Rn . We conclude that f (Rn ) is both open and closed in Rn , that is, f (Rn ) = Rn . This
proves that f is surjective, i.e., Im f = Rn .
Next we prove that f is injective, i.e., that the preimage f 1 (y) is a singleton for all
y 2 Rn . Let y 2 Rn . By Proposition 1641, the preimage f 1 (y) is a discrete set; it is also
compact since f is proper. By Proposition 1634, f 1 (y) is a nite discrete set.
For any k 1, let
Lk = y 2 Rn : f 1 (y) k
be the set of points of Rn whose preimage contains at least k distinct points. This set is open.
Let y0 2 Lk and write f 1 (y0 ) = fx1 ; :::; xr g, with r k. By the Inverse Function Theorem,
for each i = 1; :::; r there exist neighborhoods Bxi and V i of, respectively, xi and y0 , such
that the map f : Bxi ! V i is an homeomorphism. In particular, take the neighborhoods
Bxi small enough so that they are pairwise disjoint. Consider the neighborhood
k
\
V = Vi
i=1

of y0 . Let y 2 V . In each Bxi there exists a point x xi ) 2 V . Hence, f 1 (y)


~i such that f (~
contains at least k distinct elements, i.e., y 2 Lk . We conclude that V Lk and so y0 is an
interior point of Lk . The set Lk is thus open.
The set Lk is also closed. Let fym g Lk converge to some y 2 Rk . We want to show
that y 2 Lk . Let B" (y) be a neighborhood of y. Without loss of generality, we can assume
that fym g belongs to B" (y), so to its closure B" (y). As B" (y) is compact, its preimage
f 1 (B" (y)) is also compact since f is proper. In particular, f 1 (ym ) f 1 (B" (y)) for each
m. Since ym 2 Lk , for each m we can select k distinct elements fxm m
1 ; :::; xk g of f
1 (y ).
m
m 1
Consider the k sequences fxi gm=1 . By the Bolzano-Weierstrass Theorem, without loss
of generality (by passing to a subsequence if needed) we can assume that each sequence
1
fxmi gm=1 converges to some xi 2 f
1 (B (y)). By the continuity of f , we then have
"

f (xi ) = y 8i = 1; :::; k (35.11)

These limit points are distinct. For, suppose per contra that xi = xj for some i 6= j with
1 i; j k. Call x this common limit point. Then, there exist distinct xm m
i and xj , with
ym = f (xm m
i ) = f (xj ), that converge to x . But, this contradicts the fact that, by the Inverse
Function Theorem, there exists a neighborhood B" (x ) such that f : B" (x ) ! f (B" (x ))
is injective. We conclude that the limit points xi are distinct. In view of (35.11), we have
1
fx1 ; :::; xk g f (y)

and so y ∈ Lk, as desired.
The set Lk is thus both open and closed, so, Rn being connected, either empty or equal to
Rn, for each k. Let y ∈ Im f. The set f^{-1}(y) is finite with, say, k elements. Thus, Lk = Rn.
As y ∉ L_{k+1}, we also have L_{k+1} = ∅. Thus,

    {y ∈ Rn : |f^{-1}(y)| = k} = Lk ∖ L_{k+1} = Lk = Rn

We conclude that there exists a natural number k such that

    |f^{-1}(y)| = k    (35.12)

for all y ∈ Rn. It remains to prove that k = 1, a non-trivial task that we omit. Observe,
however, that at this point we can conclude the proof by just making a grain of an injectivity
assumption: there exists some ȳ ∈ Im f with singleton preimage f^{-1}(ȳ), i.e.,

    ∃ȳ ∈ Im f : |f^{-1}(ȳ)| = 1    (35.13)

In view of (35.12), this mild assumption implies right away that k = 1. In any event, having
proved that k = 1 one can conclude that f : Rn → Rn is a bijective function. By Lemma
1644, f is then a C¹-diffeomorphism, as desired.

Consider an equation f(x) = y0 with solution domain Rn and defined through a
continuously differentiable and proper function f. If each point of Rn is regular under f, the
equation is well posed: for every possible known term y0 ∈ Rn, there exists a unique solution,
given by x = f^{-1}(y0). Since f^{-1} is continuously differentiable, solutions change smoothly.
At a theoretical level, questions (Q.i) and (Q.iii) are best answered in this case.13

The \if" part of the last theorem fails, in general, without the hypothesis that f is proper,
as the next classic example shows.

Example 1646 Consider the continuously differentiable operator f : R² → R² defined by
f(x1, x2) = (e^{x1} cos x2, e^{x1} sin x2). Its Jacobian matrix is

    Df(x) = [ ∂f1/∂x1  ∂f1/∂x2 ] = [ e^{x1} cos x2   −e^{x1} sin x2 ]
            [ ∂f2/∂x1  ∂f2/∂x2 ]   [ e^{x1} sin x2    e^{x1} cos x2 ]

Thus,

    det Df(x) = e^{2x1} cos² x2 + e^{2x1} sin² x2 = e^{2x1} > 0    ∀x ∈ R²

and so condition (35.10) holds. However, this function is not proper. Indeed, if we take
xn = (0, n), then ‖xn‖ = n but ‖f(xn)‖ = ‖(cos n, sin n)‖ = √(cos² n + sin² n) = 1. So,
‖xn‖ → +∞ does not imply ‖f(xn)‖ → +∞.
This function is neither injective nor surjective. To see that it is not surjective, note
that there is no x ∈ R² such that f(x) = 0. Indeed, if f(x) = 0 then e^{x1} cos x2 = 0, so
cos x2 = 0. In turn, this implies sin x2 = ±1, which contradicts f(x) = 0. As to injectivity,
for example we have f(0, 0) = f(0, 2π) = (1, 0).
In sum, by the Inverse Function Theorem f is locally invertible at each x ∈ R², yet it is
not bijective. Thus, a function locally invertible at each point of its domain might well not
be bijective. N
13 In (43.4) we will present another global inversion result that makes equations smoothly well posed.

The next example complements the previous one by showing that properness alone is not
enough for the "if".

Example 1647 The function f : R² → R² defined by f(x) = (x1² − x2², 2x1x2) is proper
(Example 1636). Its Jacobian matrix is

    Df(x) = [ 2x1   −2x2 ]
            [ 2x2    2x1 ]

As det Df(x) = 4(x1² + x2²) is zero at the origin, condition (35.10) fails. As remarked in
Example 1624, this function is not injective: f(x) = f(−x). N

35.4.4 Global Implicit Function Theorem

The Global Inverse Function Theorem implies a global version of the Implicit Function
Theorem, which we state and prove next. Besides its own interest, it shows how an inverse
function theorem implies an implicit function one.

Theorem 1648 (Global Implicit Function Theorem) Let g : R^{n+m} → Rm be a proper
continuously differentiable operator, with

    det Dy g(x, y) ≠ 0    ∀(x, y) ∈ R^{n+m}    (35.14)

Then, there exists a unique operator f : Rn → Rm such that

    g(x, f(x)) = 0    ∀x ∈ Rn

The operator f is differentiable, with

    Df(x) = −(Dy g(x, y))^{-1} Dx g(x, y)    ∀x ∈ Rn    (35.15)

where y = f(x), i.e., g(x, y) = 0.

Proof Define the continuously differentiable operator F : R^{n+m} → R^{n+m} by F(x, y) =
(x, g(x, y)), i.e.,

    F(x1, ..., xn, y1, ..., ym) = (x1, ..., xn, g(x1, ..., xn, y1, ..., ym))

Since g is proper, so is F. Indeed, if ‖(x, y)‖ → +∞, then ‖g(x, y)‖ → +∞, so
‖F(x, y)‖ → +∞ because ‖g(x, y)‖ ≤ ‖F(x, y)‖.
Since

    Fi(x, y) = xi    ∀i = 1, ..., n

and

    F_{n+j}(x, y) = gj(x, y)    ∀j = 1, ..., m

we have
2 @F1 (x) @F1 (x) @F1 (x) @F1 (x) @F1 (x)
3
@x1 @x2 @xn @y1 @ym
6 7
6 7
6 7
6 @Fn (x) @Fn (x) @Fn (x) @Fn (x) @Fn (x) 7
6 7
DF (x) = 6
6
@x1 @x2 @xn @y1 @ym 7
7
6 @Fn+1 (x) @Fn+1 (x) @Fn+1 (x) @Fn+1 (x) @Fn+1 (x) 7
6 @x1 @x2 @xn @y1 @ym 7
6 7
4 5
@Fn+m (x) @Fn+m (x) @Fn+m (x) @Fn+m (x) @Fn+m (x)
@x1 @x2 @xn @y1 @ym
2 3
1 0 0 0 0
6 7
6 7
6 7
6 0 0 1 0 0 7
= 6
6 @g1 (x;y) @g1 (x;y) @g1 (x;y) @g1 (x;y) @g1 (x;y)
7
7
6 @x1 @x2 @xn @y1 @ym 7
6 7
4 5
@gm (x;y) @gm (x;y) @gm (x;y) @gm (x;y) @gm (x;y)
@x1 @x2 @xn @y1 @ym

So, 2 3
@g1 (x;y) @g1 (x;y)
@y1 @ym
6 7
det DF (x) = det 4 5 = det Dy g (x; y)
@gm (x;y) @gm (x;y)
@y1 @ym

By (35.14), we thus have det DF (x) 6= 0 for all x 2 Rn .


By Caccioppoli-Hadamard's Theorem, F is globally invertible with di erentiable F 1 :
Rn+m ! Rn+m . Fix x 2 Rn . Since there is y 2 Rn such that F 1 (x; 0) = (x; y), we have
g (x; y) = 0. We claim that such y 2 Rn is unique. Indeed, let y; y 0 2 Rn be such that
g (x; y) = 0. Then,

F (x; y) = (x; g (x; y)) = (x; 0) = x; g x; y 0 = F x; y 0

Since F is bijective, it then follows that y = y 0 , as desired. So, let f : Rn ! Rm be the


operator that associates to each x 2 Rn the unique y 2 Rm such that g (x; y) = 0. By
de nition, g (x; f (x)) = 0 for all x 2 Rn and f is the unique such operator. Moreover, from

F (x; f (x)) = (x; 0) 8x 2 Rn

it follows that
1
F (x; 0) = (x; f (x)) 8x 2 Rn
Since F 1 is di erentiable, it can be proved that this implies that f is di erentiable. Since

g (x; f (x)) = 0 8x 2 Rn

by the chain rule we have

Dx g (x; f (x)) = Dy g (x; f (x)) Dx f (x) 8x 2 Rn

So, formula (35.15) holds because condition (35.14) ensures that the matrix Dy g (x; f (x)) is
invertible at all x 2 Rn .
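The content of the theorem can be illustrated numerically. The sketch below (ours) uses g(x, y) = y³ + y − x with n = m = 1: note that this g is not proper as an operator on R², so the theorem's hypotheses do not literally apply, but D_y g = 3y² + 1 > 0 and the global implicit function y = f(x) exists and is unique by direct inspection (g is strictly increasing and onto in y). The code recovers f by bisection and checks the derivative formula (35.15) against a finite difference.

```python
import numpy as np

# g(x, y) = y^3 + y - x: for each x there is a unique y with g(x, y) = 0.
g   = lambda x, y: y ** 3 + y - x
g_x = lambda x, y: -1.0              # D_x g
g_y = lambda x, y: 3 * y ** 2 + 1    # D_y g > 0 everywhere

def f(x, lo=-10.0, hi=10.0):
    """Implicit function: the unique y with g(x, y) = 0, by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(x, mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x = 2.0
y = f(x)                                          # y = 1 solves y^3 + y = 2
print(abs(g(x, y)) < 1e-10)                       # True

# Formula (35.15): Df(x) = -(D_y g)^{-1} D_x g = 1 / (3 y^2 + 1)
h = 1e-6
fd = (f(x + h) - f(x - h)) / (2 * h)              # finite-difference derivative
print(np.isclose(fd, -g_x(x, y) / g_y(x, y), atol=1e-6))   # True
```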

35.5 Monotone equations

In the study of equations robustness is a key issue and the notion of well-posedness is all-important.
This is why we devoted most of the chapter to this topic. There is, however,
another relevant question: when the known term varies, do solutions vary in the same
direction? This is an order-theoretic question that can be formalized as follows:

(Q.iv) If f is a bijection, is its inverse f^{-1} monotone?

For instance, when f^{-1} is strictly increasing we have

    y″ > y′ ⟹ f^{-1}(y″) > f^{-1}(y′)

that is, a larger known term results in a larger solution. The opposite is true when f^{-1}
is strictly decreasing. The monotonicity properties of the inverse function thus determine
the concordance between the changes in the known term and the resulting changes in the
solutions.

Definition 1649 Equation (35.1) is said to be monotone if f is a bijection with strictly
monotone inverse f^{-1}.

In particular, we say that an equation is monotone increasing (decreasing) when f^{-1} is
strictly increasing (decreasing). For instance, consider a linear equation

    Ax = b    (35.16)

defined via a linear operator T : Rn → Rn given by T(x) = Ax. For this equation, monotonicity
amounts to the unique existence of positive solutions in correspondence of positive
known terms, i.e.,

    b > 0 ⟹ A^{-1}b > 0    ∀b ∈ Rn    (35.17)

This is a classic comparative statics exercise in economics.14
A straightforward application of Collatz's Theorem permits us to characterize monotone
equations.

Proposition 1650 Let A and B be two subsets of Rn. An equation defined via a surjective
map f : A → B is monotone increasing if and only if

    f(x) ≥ f(y) ⟹ x ≥ y    ∀x, y ∈ A    (35.18)

A dual result holds for the decreasing case. We can illustrate this result with the linear
equation (35.16). To this end, we need some matrix terminology. A square matrix A is:

(i) positive if all its entries are positive;

(ii) inverse-positive if it is non-singular and its inverse is positive;

(iii) Leontief if all its off-diagonal entries are negative.


14 See, e.g., McKenzie (1960).

Because of the linearity of T, here condition (35.18) amounts to

    Ax ≥ 0 ⟹ x ≥ 0    ∀x ∈ Rn    (35.19)

It is easy to check that a square matrix A is inverse-positive if and only if the linear operator
T is invertible with a strictly increasing inverse operator T^{-1}. By Collatz's Theorem, a
square matrix A thus satisfies condition (35.19) if and only if it is inverse-positive. In this
case, (35.17) holds and so to a positive known term b > 0 there corresponds a unique positive
solution A^{-1}b > 0.
To check the monotonicity of a linear equation (35.16) thus amounts to checking whether its
matrix A is inverse-positive. This can be done by checking directly that A is non-singular and
has a positive inverse. For some classes of matrices, however, there are computationally
convenient criteria that do not require the computation of the inverse matrix. For instance,
the next result of Hawkins and Simon (1949) characterizes, through the positivity of principal
minors, the inverse positivity of Leontief matrices, a class of matrices relevant, for instance,
in input-output and in general equilibrium analyses.15

Theorem 1651 (Hawkins-Simon) A Leontief matrix A is inverse-positive if and only if
all its principal minors are > 0.

We omit the proof of this result.16 The following simple example illustrates it.

Example 1652 The Leontief matrix

    A = [  2  −3 ]
        [ −4   7 ]

has all principal minors > 0. By the Hawkins-Simon Theorem, A is an inverse-positive
matrix. Indeed, its inverse matrix

    A^{-1} = [ 7/2  3/2 ]
             [  2    1  ]

is positive. N
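The check is immediate to automate. Here is a small Python sketch (ours) that tests the Hawkins-Simon condition on the matrix above by enumerating all principal minors and then confirms inverse positivity directly.

```python
import numpy as np
from itertools import combinations

A = np.array([[2.0, -3.0],
              [-4.0, 7.0]])

# All principal minors: determinants of submatrices A[S, S], S nonempty.
n = A.shape[0]
minors = [np.linalg.det(A[np.ix_(S, S)])
          for k in range(1, n + 1)
          for S in combinations(range(n), k)]
print(all(m > 0 for m in minors))          # True: Hawkins-Simon holds

print(np.linalg.inv(A))                    # [[3.5, 1.5], [2.0, 1.0]]
print((np.linalg.inv(A) > 0).all())        # True: A is inverse-positive
```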

Inverse-positivity becomes an old friend for symmetric Leontief matrices.

Proposition 1653 A symmetric Leontief matrix A is inverse-positive if and only if it is
positive definite.

Proof The \if" follows from Proposition 1204. The converse is proved in Fiedler (1986) p.
114.
15
See, e.g., the discussion on gross substitution in Mas-Colell et al. (1995) p. 612.
16
We refer to Fiedler (1986) p. 114.

35.6 Parametric equations and implicit functions

In applications, equations often have the parametric form

    f(x, θ) = y0    (35.20)

where f : A × Θ ⊆ Rn × Rm → Rn and y0 ∈ Rn. The variable θ ∈ Θ parametrizes the equation:
given a value of θ, we are interested in the variables x ∈ A that solve the equation under the
known term y0.
For each value of θ, we can ask the same three questions posed in Section 35.1. Yet, in
this parametric setting we can also take a different perspective: once a known term y0 is
posited, do solutions exist given some, or all, values of the parameter? Are they unique? How do
they vary when the value of the parameter varies?
To formalize these questions, for a given y0 ∈ Rn define the (equation) solution correspondence
S_{y0} : Θ ⇉ Rn by

    S_{y0}(θ) = {x ∈ A : f(x, θ) = y0}

It associates to each parameter value the corresponding solution set of equation (35.20). The
solution correspondence describes how solutions vary as the parameter varies. The previous
questions then become:

(QP.i) is the set S_{y0}(θ) non-empty for all θ ∈ Θ? If so, is it a singleton?

(QP.ii) is the set S_{y0}(θ) non-empty for some θ ∈ Θ? If so, is it a singleton?

(QP.iii) if S_{y0} is, locally or globally, a function, is it continuous (or even differentiable)?

If S_{y0}(θ) is a singleton for each θ, we have

    f(S_{y0}(θ), θ) = y0

So, to answer the unique existence questions (QP.i) and (QP.ii) amounts to checking whether
S_{y0} is a function implicitly defined, locally or globally, by equation (35.20). That is, S_{y0}
gives the functional representation of the level curve

    f^{-1}(y0) = {(x, θ) ∈ A × Θ : f(x, θ) = y0}

The study of the solutions of a parametric equation given a known term and the study of
the functional representation of a level curve are, mathematically, equivalent exercises.
the functional representation of a level curve are, mathematically, equivalent exercises.
In sum, to answer the questions (QP.i) and (QP.ii) we need suitable versions of the
Implicit Function Theorem: local versions of such a theorem give local answers, while
global versions give global answers. In any case, a déjà vu: in our discussions of
implicit functions we already (implicitly) took this angle, which in economics is at the heart
of comparative statics analysis (cf. Section 34.4.3). Indeed, conditions that ensure the
existence, at least locally, of a solution function S_{y0} : Θ → Rn permit us to effectively describe
how solutions (the endogenous variables) react to changes in the parameters (the exogenous
variables).

For brevity, we leave readers to revisit those discussions through the lenses of this section.
To help them, we close with a consequence of the operator version of the Implicit Function
Theorem (Theorem 1598) that deals with a parametric homogeneous equation

    f(x, θ) = 0    (35.21)

defined by a continuously differentiable operator f : A × Θ → Rn, where A and Θ are open
sets in Rn (so, here m = n).

Corollary 1654 Given θ0 ∈ Θ, let x0 ∈ A be a solution of the parametric homogeneous
equation (35.21) at θ0. If

    det Dx f(x0, θ0) ≠ 0

then there exist neighborhoods B(x0) and V(θ0) such that equation (35.21) has a unique
solution x = S0(θ) in B(x0) for each θ ∈ V(θ0). The solution function S0 : V(θ0) → B(x0),
which associates a solution to each parameter value, is continuously differentiable.
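As a closing numerical vignette (ours, with an arbitrary parametric equation whose solutions are regular at every point, so the corollary applies everywhere), the solution function can be traced by re-solving the equation as the parameter moves; the computed solution varies smoothly and monotonically with θ, a comparative statics pattern in the spirit of Section 35.5.

```python
import numpy as np

# Parametric homogeneous equation f(x, theta) = x^3 + x - theta = 0.
# D_x f = 3x^2 + 1 > 0, so every solution is regular and Corollary 1654 applies.
f_x  = lambda x, t: x ** 3 + x - t
df_x = lambda x, t: 3 * x ** 2 + 1

def S0(t, x=0.0):
    """Solution function: Newton iteration in x for the given parameter t."""
    for _ in range(60):
        x -= f_x(x, t) / df_x(x, t)
    return x

thetas = np.linspace(-2.0, 2.0, 9)
sols = [S0(t) for t in thetas]
print(np.allclose([f_x(s, t) for s, t in zip(sols, thetas)], 0.0, atol=1e-12))
print(sols)   # increases smoothly with theta (comparative statics)
```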

35.7 Coda: de consolatione topologiae

Earlier in the book we discussed how the disconcerting Theorem 278 of Cantor, so famously
commented upon by him ("I see it, but I do not believe it"), defies our finitary intuition by showing
that the real line and the plane, more generally any two Euclidean spaces Rn and Rm, have
the same cardinality. That is, there exists a bijective function f : Rn → Rm for all m, n ∈ N.
To further aggravate matters, Giuseppe Peano in 1890 came up with a further, equally
disconcerting, celebrated finding.

Theorem 1655 (Peano) There exists a continuous and surjective map f : Rn → Rm for
all m, n ∈ N.

The basic case of a Peano map f : R → R² is already striking: the plane is the continuous
image of the real line. By just following a scalar prescription, the Peano map enables you
to fill the plane without ever lifting the pencil. Formally, to every point y = (y1, y2) of the
plane there corresponds a scalar x such that (f1(x), f2(x)) = (y1, y2).

Proof The case m = n = 1 is trivial (just take the identity function).17 In his original
work Peano constructed a highly non-trivial, here omitted, example of a continuous and
surjective map, the so-called Peano curve, from the closed unit interval [0, 1] to the closed
unit square [0, 1] × [0, 1]. In a variation on Peano's theme, it is possible to show that there
exists a continuous and surjective map from the real line to the plane. By building on this,
next we show by induction that there exists a continuous and surjective map f_{1,m} : R → Rm
for all m ≥ 2. Initial step: as just remarked, there exists a continuous and surjective map
f_{1,2} : R → R². Induction step: suppose now that there exists a continuous and surjective
map f_{1,m−1} : R → R^{m−1}. Define f_{2,m} : R² → Rm by

    f_{2,m}(x1, x2) = (x1, f_{1,m−1}(x2))

17 This proof is based on Section 1.8 of Aron et al. (2016), to which we refer for missing details.

Clearly, this function is continuous. To show that it is also surjective, take any y =
(y1, y2, ..., ym) ∈ Rm. By the induction hypothesis, there exists a scalar a such that f_{1,m−1}(a) =
(y2, ..., ym). By setting x = (y1, a) ∈ R², we thus have f_{2,m}(x) = y. We conclude that
Im f_{2,m} = Rm. The composition f_{2,m} ∘ f_{1,2} : R → Rm is, therefore, a continuous and
surjective map. By setting f_{1,m} = f_{2,m} ∘ f_{1,2}, this completes the induction argument.
To conclude the proof it is enough to define f_{n,m} : Rn → Rm by f_{n,m}(x1, ..., xn) =
f_{1,m}(x1). This function is continuous and surjective.

In sum, any two Euclidean spaces Rn and Rm are the continuous image (Peano) as well
as the bijective image (Cantor) one of the other, regardless of their dimension. Peano's map is
not injective. Cantor's map is not continuous. Still, after all these surprises, one is left to
wonder whether it is possible to combine their striking findings by showing the existence of
an homeomorphism between Euclidean spaces of different dimension.
In 1911, fortunately, Brouwer showed that this is not possible. Indeed, as a direct consequence
of his Domain Invariance Theorem we have the following result.

Theorem 1656 (Brouwer) Two spaces Rn and Rm, with n ≠ m, are not homeomorphic.

At last, topology vindicates intuition: spaces with different dimensions are similar set-theoretically
but not topologically. In particular, when n ≠ m a continuous map f : Rn → Rm
may be surjective or injective, but it cannot enjoy both properties simultaneously.18 A
basic topological failure characterizes a change in dimension.

Proof Let n 6= m, say n < m. Suppose, per contra, that there exists an homeomorphism
f : Rn ! Rm . De ne : Rn ! Rm by (x1 ; :::; xn ) = (x1 ; :::; xn ; 0; :::; 0). As (Rn ) Rm ,
this continuous map embeds Rn into Rm . In particular,
(Rn ) = f(x1 ; :::; xn ; 0; :::; 0) : (x1 ; :::; xn ) 2 Rn g
is a closed subset of Rm (with empty interior). Thus, the set f 1 ( (Rn )) is closed in Rn
because f is continuous. But, the map f 1 is continuous, so by the Domain Invariance
Theorem the set f 1 ( (Rn )) is also open. As f 1 ( (Rn )) is neither empty nor equal to
Rn , we reached a contradiction. We conclude that there is no homeomorphism between Rn
and Rm .

Summing up, we conclude, standing on the shoulders of Cantor, Peano and Brouwer, that the
dimension of a Euclidean space is a topological, but not a set-theoretic, invariant.

35.8 Ultracoda: equations in science

In a scientific inquiry, be it in a natural or a social science, we posit a set X of possible
causes (or inputs), a set Y of possible effects (or outputs), and a set M of possible models
m : X → Y. A cause x determines an effect y = m(x) via model m; this schema can be
diagrammed as

    x → m → y

18 In contrast, recall that linear operators f : Rn → Rn are injective if and only if they are surjective (Corollary 685), so they enjoy either both properties or neither. Moving from n = m to n ≠ m thus dramatically changes matters.
1100 CHAPTER 35. EQUATIONS AND INVERSE FUNCTIONS

We can consider four main problems about a scientific inquiry described by a triple $(X,Y,M)$. We formalize them by means of the evaluation function $g:X\times M\to Y$, defined by $g(x,m)=m(x)$, which relates causes, effects and models through the expression
\[
y = g(x,m) \qquad (35.22)
\]
The four problems are:

(i) Direct problems: given a model $m$ and a cause $x$, what is the resulting effect $y$? Formally, which is the (unique) value $y=g(x,m)$ given $x\in X$ and $m\in M$?

(ii) Causation problems: given a model $m$ and an effect $y$, what is the underlying cause $x$? Formally, which are the (possibly multiple) values of $x$ that solve equation (35.22) given $y\in Y$ and $m\in M$?

(iii) Identification problems: given a cause $x$ and an effect $y$, what is the underlying model $m$? Formally, which are the (possibly multiple) values of $m\in M$ that solve equation (35.22) given $x\in X$ and $y\in Y$?

(iv) Induction problems: given an effect $y$, what are the underlying cause $x$ and model $m$? Formally, which are the (possibly multiple) pairs of $x\in X$ and $m\in M$ that solve equation (35.22) given $y\in Y$?

The latter three problems, causation, identification and induction, are formalized by regarding (35.22) as an equation. For this reason, we call them inverse problems.19 We can thus view the study of equations as a way to address such problems. In this regard, note that:

1. In causation and identification problems, equation (35.22) is parametric. In the former problem, $x$ is the unknown, $y$ is the known term and $m$ is a parameter; in the latter problem, $m$ is the unknown, $y$ is the known term and $x$ is a parameter.

2. In induction problems, $y$ is the known term of equation (35.22), while $x$ and $m$ are the unknowns.

Example 1657 Consider an orchard with several apple trees that produce a quantity of apples according to the summer weather conditions; in particular, the summer could be either cold, hot or mild. Here $m$ is an apple tree that belongs to the collection $M$ of the apple trees of the orchard, $y$ is the apple harvest with $Y=[0,\infty)$, and $x$ is the average summer temperature with $X=[0,\infty)$. We interpret $m(x)$ as the quantity of apples that the tree $m$ produces when the summer weather is $x$. The trees in the orchard thus differ in their performance in the different weather conditions.
In this example the previous four problems take the form:

(i) Given a tree $m$ and an average summer temperature $x$, what is the resulting apple harvest $y$?

(ii) Given a tree $m$ and an apple harvest $y$, what is the underlying average summer temperature $x$?

(iii) Given an average summer temperature $x$ and an apple harvest $y$, what is the underlying tree $m$?

(iv) Given an apple harvest $y$, what are the underlying average summer temperature $x$ and tree $m$? N

19 In this chapter we considered the case $X,Y\subseteq\mathbb{R}^n$, but the study of equations can be carried out more generally, as readers will learn in more advanced courses.
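To make the schema concrete, here is a minimal numerical sketch of the direct, causation and identification problems for the orchard example. The two "tree" models and all parameter values are hypothetical illustrations, not part of the text, and the inverse problems are solved by brute-force grid search.

```python
# Hypothetical tree models m: X -> Y (illustrative functional forms).
M = {
    "tree_A": lambda x: max(0.0, 10 * x - x**2),   # peaks at mild weather
    "tree_B": lambda x: 0.5 * x,                   # likes heat
}

def g(x, m):            # evaluation function g(x, m) = m(x)
    return M[m](x)

# (i) Direct problem: model and cause given, compute the effect.
y = g(4.0, "tree_A")    # y = 24.0

# (ii) Causation problem: model and effect given, scan a grid of X for
# causes x with g(x, m) ~ y (note the multiplicity: x near 4 and near 6).
causes = [x / 10 for x in range(0, 401)
          if abs(g(x / 10, "tree_A") - y) < 0.5]

# (iii) Identification problem: cause and effect given, scan the models.
models = [m for m in M if abs(g(4.0, m) - y) < 0.5]

print(y, causes[:3], models)
```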
Chapter 36

Study of functions

It is often useful to have, at least roughly, a sense of what a function looks like. In this chapter we outline a qualitative study of functions. To this end, we first introduce a couple of classes of points.

36.1 Inflection points

We begin with a local notion of concavity.

Definition 1658 Let $f:A\subseteq\mathbb{R}\to\mathbb{R}$ and $x_0$ an accumulation point of $A$. The function $f$ is said to be (strictly) concave at $x_0$ if there exists a neighborhood of $x_0$ on which it is (strictly) concave.

A dual definition holds for (strict) convexity at a point. The next result follows immediately from Corollary 1438.

Proposition 1659 Let $f:A\subseteq\mathbb{R}\to\mathbb{R}$ be twice differentiable at $x_0\in A$. If $f$ is concave at $x_0$, then $f''(x_0)\leq 0$ (with the derivative understood as one-sided when needed). If $f''(x_0)<0$, then $f$ is strictly concave at $x_0$.

A dual characterization holds for (strict) convexity.

Example 1660 (i) The function $f:\mathbb{R}\to\mathbb{R}$ given by $f(x)=2x^2-3$ is strictly convex at each point because $f''(x)=4>0$ at each $x$. (ii) The function $f:\mathbb{R}\to\mathbb{R}$ given by $f(x)=x^3$ is strictly convex at $x_0=5$ since $f''(5)=30>0$, and it is strictly concave at $x_0=-1$ since $f''(-1)=-6<0$. N

Geometrically, as we know well, for differentiable functions concavity (convexity) means that the tangent line always lies above (below) the graph of the function. Concavity (convexity) at a point means, therefore, that the straight line tangent at that point lies locally, that is, at least on a neighborhood of the point, above (below) the graph of the function.

[Figure: two panels showing the tangent line at $x_0$ lying locally above the graph (concavity at $x_0$) and locally below it (convexity at $x_0$).]

O.R. Just as the first derivative of a function at a point gives information on its increase or decrease, so the second derivative gives information on concavity or convexity at a point. The greater $|f''(x_0)|$, the more pronounced the curvature (the "belly") of $f$ at $x_0$; the "belly" is upward if $f''(x_0)<0$ and downward if $f''(x_0)>0$, as the previous figure shows. Economic applications often consider the ratio
\[
\frac{f''(x_0)}{f'(x_0)}
\]
which does not depend on the unit of measure of $f(x)$ (cf. Section 31.1.4). Indeed, let $T$ and $S$ be the units of measure of the dependent and independent variables, respectively. Then, the units of measure of $f'$ and of $f''$ are $T/S$ and $T/S^2$, so the unit of measure of $f''/f'$ is
\[
\frac{T/S^2}{T/S} = \frac{1}{S}
\]
Note that $f''(x_0)/f'(x_0)$ is the value at $x_0$ of the derivative of $\log f'$. H

Definition 1661 Let $f:A\subseteq\mathbb{R}\to\mathbb{R}$. An accumulation point $x_0$ of $A$ is said to be an inflection point for $f$ if there exists a neighborhood of $x_0$ on which $f$ is concave at the points to the right of $x_0$ and convex at the points to the left of $x_0$, or vice versa.

In short, at an inflection point the "sign" of the concavity of the function changes. By Proposition 1659, we have the following simple result.

Proposition 1662 Let $f:A\subseteq\mathbb{R}\to\mathbb{R}$ and $x_0$ an accumulation point of $A$.

(i) If $x_0$ is an inflection point for $f$, then $f''(x_0)=0$ (provided $f$ is twice differentiable at $x_0$).

(ii) If $f''(x_0)=0$ and $f'''(x_0)\neq 0$, then $x_0$ is an inflection point for $f$ (provided $f$ is three times continuously differentiable at $x_0$).
Example 1663 (i) The origin is an inflection point of the cubic function $f(x)=x^3$. (ii) Let $f:\mathbb{R}\to\mathbb{R}$ be the Gaussian function $f(x)=e^{-x^2}$. Then $f'(x)=-2xe^{-x^2}$ and $f''(x)=(4x^2-2)e^{-x^2}$, so the function is concave for
\[
-\frac{1}{\sqrt{2}} < x < \frac{1}{\sqrt{2}}
\]
and convex for $|x|>1/\sqrt{2}$. The two points $\pm 1/\sqrt{2}$ are therefore inflection points. Indeed, $f''(\pm 1/\sqrt{2})=0$. We will continue the study of this function later in the chapter, in Section 36.4. N

For differentiable functions, geometrically, at a point of inflection $x_0$ the tangent line cuts the graph: it cannot lie (locally) above or below it. In particular, if $f'(x_0)=f''(x_0)=0$ then the tangent line is horizontal and cuts the graph of the function: we talk of a point of inflection with horizontal tangent.

Example 1664 The origin is an inflection point with horizontal tangent of the cubic function, as well as of any function $f(x)=x^n$ with $n$ odd. N

36.2 Asymptotes

Intuitively, an asymptote is a straight line to which the graph of a function gets arbitrarily close. Such straight lines can be vertical, horizontal, or oblique.

(i) When at least one of the two following conditions is satisfied:
\[
\lim_{x\to x_0^+} f(x) = +\infty \text{ or } -\infty \qquad ; \qquad \lim_{x\to x_0^-} f(x) = +\infty \text{ or } -\infty
\]
the straight line of equation $x=x_0$ is called a vertical asymptote for $f$.

(ii) When
\[
\lim_{x\to+\infty} f(x) = L \quad \left(\text{or } \lim_{x\to-\infty} f(x) = L\right)
\]
with $L\in\mathbb{R}$, the straight line of equation $y=L$ is called a horizontal asymptote for $f$ at $+\infty$ (or at $-\infty$).

(iii) When
\[
\lim_{x\to+\infty} (f(x)-ax-b) = 0 \quad \left(\text{or } \lim_{x\to-\infty} (f(x)-ax-b) = 0\right)
\]
that is, when the distance between the function and the straight line $y=ax+b$ tends to $0$ as $x\to+\infty$ (or as $x\to-\infty$), the straight line of equation $y=ax+b$ is an oblique asymptote for $f$ at $+\infty$ (or at $-\infty$).

Horizontal asymptotes are actually the special case of oblique asymptotes with $a=0$. Moreover, it is evident that there can be at most one oblique asymptote as $x\to-\infty$ or as $x\to+\infty$. It is, instead, possible that $f$ has several vertical asymptotes.
Example 1665 Consider the function
\[
f(x) = 3-\frac{7}{x^2+1}
\]
[Figure: graph of $f$, approaching the horizontal line $y=3$ as $x\to\pm\infty$.]
Since $\lim_{x\to+\infty} f(x)=\lim_{x\to-\infty} f(x)=3$, the straight line $y=3$ is both a right and a left horizontal asymptote for $f(x)$. N

Example 1666 The function $f:\mathbb{R}\smallsetminus\{-1\}\to\mathbb{R}$ defined by
\[
f(x) = \frac{1}{x+1}+2
\]
[Figure: graph of $f$, with vertical asymptote $x=-1$ and horizontal asymptote $y=2$.]
has horizontal asymptote $y=2$ and vertical asymptote $x=-1$. N

Example 1667 Consider the function
\[
f(x) = \frac{1}{x^2+x-2}
\]
[Figure: graph of $f$, with vertical asymptotes at $x=-2$ and $x=1$.]
Since $\lim_{x\to 1^+} f(x)=+\infty$ and $\lim_{x\to 1^-} f(x)=-\infty$, the straight line $x=1$ is a vertical asymptote for $f(x)$. Moreover, since $\lim_{x\to -2^+} f(x)=-\infty$ and $\lim_{x\to -2^-} f(x)=+\infty$, the straight line $x=-2$ is also a vertical asymptote for $f(x)$. N

Example 1668 Consider the function
\[
f(x) = \frac{2x^2}{x+1}
\]
[Figure: graph of $f$, with vertical asymptote $x=-1$ and oblique asymptote $y=2x-2$.]
Since $\lim_{x\to+\infty}(f(x)-2x+2)=0$ and $\lim_{x\to-\infty}(f(x)-2x+2)=0$, the straight line $y=2x-2$ is both a right and a left oblique asymptote for $f(x)$. N

Vertical and horizontal asymptotes are easily identified. We thus shift our attention to oblique asymptotes. To this end, we provide two simple results.

Proposition 1669 The straight line $y=ax+b$ is an oblique asymptote of $f$ as $x\to\pm\infty$ if and only if $\lim_{x\to\pm\infty} f(x)/x=a$ and $\lim_{x\to\pm\infty}[f(x)-ax]=b$.

Proof "If". When $f(x)/x\to a$, consider the difference $f(x)-ax$. If it tends to a finite limit $b$, then (and only then) $f(x)-ax-b\to 0$. "Only if". From $f(x)-ax-b\to 0$ it follows that $f(x)-ax\to b$ and, by dividing by $x$, that $f(x)/x-a\to 0$.

The next result follows from de l'Hospital's rule.

Proposition 1670 Suppose that $f$ is differentiable and $f(x)\to\infty$ as $x\to\pm\infty$. Then $y=ax+b$ is an oblique asymptote of $f$ as $x\to\pm\infty$ if $\lim_{x\to\pm\infty} f'(x)=a$ and $\lim_{x\to\pm\infty}[f(x)-ax]=b$.

Proposition 1669 gives a necessary and sufficient condition for the search of oblique asymptotes, while Proposition 1670 only provides a sufficient condition. To use this latter condition, the limits involved must exist. In this regard, consider the following example.

Example 1671 For the function $f:\mathbb{R}\smallsetminus\{0\}\to\mathbb{R}$ given by
\[
f(x) = x+\frac{\cos x^2}{x}
\]
as $x\to\pm\infty$ we have
\[
\frac{f(x)}{x} = 1+\frac{\cos x^2}{x^2} \to 1
\qquad\text{and}\qquad
f(x)-x = \frac{\cos x^2}{x} \to 0
\]
Therefore, $y=x$ is an oblique asymptote of $f$ as $x\to\pm\infty$. Nevertheless, the first derivative of $f$ is
\[
f'(x) = 1+\frac{-2x^2\sin x^2-\cos x^2}{x^2} = 1-2\sin x^2-\frac{\cos x^2}{x^2}
\]
It is immediate to verify that the limit of $f'(x)$ as $x\to\pm\infty$ does not exist. N

In the following examples we determine the asymptotes of some functions.

Example 1672 For the function $f:\mathbb{R}\to\mathbb{R}$ given by $f(x)=5x+2e^{-x}$, as $x\to+\infty$ we have
\[
\frac{f(x)}{x} = 5+\frac{2}{xe^x} \to 5
\qquad\text{and}\qquad
f(x)-5x = 2e^{-x} \to 0
\]
Therefore, $y=5x$ is an oblique asymptote of $f$ as $x\to+\infty$. As $x\to-\infty$ the function has no oblique (so no horizontal) asymptotes. N

Example 1673 For the function $f:[1,\infty)\to\mathbb{R}$ given by $f(x)=\sqrt{x^2-x}$, as $x\to+\infty$ we have
\[
\frac{f(x)}{x} = \frac{\sqrt{x^2-x}}{x} = \sqrt{1-\frac{1}{x}} \to 1
\]
and, as $x\to+\infty$,
\[
f(x)-x = \sqrt{x^2-x}-x = x\left(\sqrt{1-\frac{1}{x}}-1\right)
= \frac{\left(1-\frac{1}{x}\right)^{\frac{1}{2}}-1}{\frac{1}{x}} \to -\frac{1}{2}
\]
Therefore,
\[
y = x-\frac{1}{2}
\]
is an oblique asymptote as $x\to+\infty$ for $f$. N
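Proposition 1669 also suggests a quick numerical sanity check: estimate the slope $a$ from a secant between two far points and then the intercept $b$ from $f(x)-ax$. A minimal Python sketch (the sample points and tolerance are arbitrary choices):

```python
def oblique_asymptote(f, x=1e6):
    """Estimate the slope a and intercept b of an oblique asymptote
    y = a x + b as x -> +infinity (cf. Proposition 1669)."""
    a = (f(2 * x) - f(x)) / x      # secant slope between two far points
    b = f(x) - a * x               # then b ~ lim [f(x) - a x]
    return a, b

# f(x) = 2x^2/(x+1) from Example 1668: the estimates approach (2, -2)
print(oblique_asymptote(lambda x: 2 * x**2 / (x + 1)))
# f(x) = sqrt(x^2 - x) from Example 1673: the estimates approach (1, -0.5)
print(oblique_asymptote(lambda x: (x**2 - x) ** 0.5))
```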

It is quite simple to realize that:

(i) If $f(x)=g(x)+h(x)$ and $h(x)\to 0$ as $x\to\pm\infty$, then $f$ and $g$ share the possible oblique asymptotes.

(ii) If $p_n(x)=a_0x^n+a_1x^{n-1}+\cdots+a_n$ is a polynomial of degree $n$ in $x$ with $a_0>0$ and $n$ odd, then the function defined by $f(x)=\sqrt[n]{p_n(x)}$ has, as $x\to\pm\infty$, the oblique asymptote
\[
y = \sqrt[n]{a_0}\left(x+\frac{1}{n}\frac{a_1}{a_0}\right)
\]
If $p_n(x)=a_0x^n+a_1x^{n-1}+\cdots+a_n$ is a polynomial of degree $n$ in $x$ with $a_0>0$ and $n$ even, then the function defined by $f(x)=\sqrt[n]{p_n(x)}$ has, as $x\to+\infty$, the oblique asymptote
\[
y = \sqrt[n]{a_0}\left(x+\frac{1}{n}\frac{a_1}{a_0}\right)
\]
and, as $x\to-\infty$, the oblique asymptote
\[
y = -\sqrt[n]{a_0}\left(x+\frac{1}{n}\frac{a_1}{a_0}\right)
\]

Let us verify (ii) only for $n$ odd (for $n$ even the calculations are analogous). If $n$ is odd, as $x\to\pm\infty$ we have
\[
\frac{f(x)}{x} = \frac{\sqrt[n]{a_0x^n}\,\sqrt[n]{1+\frac{a_1}{a_0x}+\cdots+\frac{a_n}{a_0x^n}}}{x} \to \sqrt[n]{a_0}
\]
hence the slope of the oblique asymptote is $\sqrt[n]{a_0}$. Moreover,
\[
f(x)-\sqrt[n]{a_0}\,x = \sqrt[n]{a_0}\,x\left[\left(1+\frac{a_1x^{n-1}+\cdots+a_n}{a_0x^n}\right)^{\frac{1}{n}}-1\right]
= \sqrt[n]{a_0}\,x\,\frac{a_1x^{n-1}+\cdots+a_n}{a_0x^n}\cdot
\frac{\left(1+\frac{a_1x^{n-1}+\cdots+a_n}{a_0x^n}\right)^{\frac{1}{n}}-1}{\frac{a_1x^{n-1}+\cdots+a_n}{a_0x^n}}
\]
Since, as $x\to\pm\infty$,
\[
\frac{\left(1+\frac{a_1x^{n-1}+\cdots+a_n}{a_0x^n}\right)^{\frac{1}{n}}-1}{\frac{a_1x^{n-1}+\cdots+a_n}{a_0x^n}} \to \frac{1}{n}
\qquad\text{and}\qquad
\sqrt[n]{a_0}\,x\,\frac{a_1x^{n-1}+\cdots+a_n}{a_0x^n} \to \sqrt[n]{a_0}\,\frac{a_1}{a_0}
\]
we have, as $x\to\pm\infty$,
\[
f(x)-\sqrt[n]{a_0}\,x \to \sqrt[n]{a_0}\,\frac{a_1}{a_0}\,\frac{1}{n}
\]
In the previous example we had $n=2$, $a_0=1$, and $a_1=-1$. Indeed, as $x\to+\infty$, the asymptote had the equation
\[
y = \sqrt{1}\left(x+\frac{1}{2}\cdot\frac{-1}{1}\right) = x-\frac{1}{2}
\]
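As a quick sanity check of remark (ii), a sketch comparing $\sqrt[3]{8x^3+12x^2+1}$ with its predicted asymptote; the test polynomial is an arbitrary illustration:

```python
# Remark (ii) on p3(x) = 8x^3 + 12x^2 + 1: here a0 = 8, a1 = 12, n = 3,
# so the predicted oblique asymptote is y = 2(x + 12/(3*8)) = 2x + 1.
f = lambda x: (8 * x**3 + 12 * x**2 + 1) ** (1 / 3)
for x in (10.0, 100.0, 1000.0):
    print(x, f(x) - (2 * x + 1))   # the gap shrinks to 0 as x grows
```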

36.3 Study of functions

The differential calculus results so far obtained allow for a qualitative study of functions. Such a study consists in finding the possible local maximizers and minimizers, the inflection points, and the asymptotic and boundary behavior of the function.
Let us consider a function $f:A\subseteq\mathbb{R}\to\mathbb{R}$ defined on a set $A$. To apply the results of the chapter, we assume that $f$ is twice differentiable at each interior point of $A$. The study of $f$ may be articulated in a few steps (a computational sketch follows the list).

(i) We first calculate the limits of $f$ at the boundary points of the domain, and also as $x\to\pm\infty$ when $A$ is unbounded.

(ii) We determine the sets on which the function is positive, $f(x)\geq 0$, increasing, $f'(x)\geq 0$, and concave/convex, $f''(x)\lessgtr 0$. Once we also determine the intersections of the graph with the axes, by finding the value $f(0)$ on the vertical axis and the set $f^{-1}(0)$ on the horizontal axis, we begin to have a first idea of its graph.

(iii) We look for candidate extremal points via first and second-order conditions (or, more generally, via the omnibus procedure of Section 29.2.2).

(iv) We look, via the condition $f''(x)=0$, for candidate inflection points; they are certainly such if $f'''\neq 0$ at them (provided $f$ is three times continuously differentiable at $x$).

(v) Finally, we look for possible oblique asymptotes of $f$.
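For concreteness, here is how steps (ii)-(iv) can be mechanized symbolically. This is a minimal sketch assuming the sympy library (any computer algebra system would do), applied to the polynomial of Example 1676 below:

```python
import sympy as sp

x = sp.symbols("x", real=True)
f = x**3 - 7 * x**2 + 12 * x          # the function of Example 1676

roots = sp.solve(sp.Eq(f, 0), x)      # intersections with the horizontal axis
crit = sp.solve(sp.Eq(sp.diff(f, x), 0), x)      # candidate extremal points
infl = sp.solve(sp.Eq(sp.diff(f, x, 2), 0), x)   # candidate inflection points

print(roots)   # [0, 3, 4]
print(crit)    # (7 - sqrt(13))/3 and (7 + sqrt(13))/3
print(infl)    # [7/3]
# second-derivative test at each critical point (negative: local maximizer)
print([sp.diff(f, x, 2).subs(x, c).evalf(3) for c in crit])
```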

Next we study a few functions.

Example 1674 Let $f:\mathbb{R}\to\mathbb{R}$ be given by $f(x)=x^6-3x^2+1$. We look for possible local extremal points. The first-order condition $f'(x)=0$ has the form
\[
6x^5-6x = 0
\]
therefore $x=0$ and $x=\pm 1$ are the unique critical points. We have $f''(0)=-6$, $f''(-1)=24$, and $f''(1)=24$. Hence, $x=0$ is a local maximizer, while $x=-1$ and $x=1$ are local minimizers. From $\lim_{x\to+\infty}f(x)=\lim_{x\to-\infty}f(x)=+\infty$ it follows that the graph of this function is:
[Figure: graph of $f(x)=x^6-3x^2+1$, with a local maximum at $0$ and minima at $\pm 1$.]
N
Example 1675 Define $f:[0,\infty)\to\mathbb{R}$ by
\[
f(x) = \begin{cases} -x\log x & x>0 \\ 0 & x=0 \end{cases}
\]
This function is continuous (why at the origin?) and is zero at the points $0$ and $1$. In view of Example 1685, it is strictly concave and its unique maximizer is the point $1/e$. Since $\lim_{x\to+\infty}f(x)=-\infty$, we conclude that the graph of $f$ is:
[Figure: graph of $f(x)=-x\log x$, rising from $0$ to a maximum at $1/e$ and then decreasing through $1$.] N
Example 1676 Let $f:\mathbb{R}\to\mathbb{R}$ be given by $f(x)=x^3-7x^2+12x$. We have
\[
\lim_{x\to-\infty} f(x) = -\infty \quad , \quad \lim_{x\to+\infty} f(x) = +\infty
\]
Therefore, there are no asymptotes. Then we have:

1. $f(0)=0$ and $f(x)=0$, that is, $x(x^2-7x+12)=0$, for $x=0$ and for $x=(7\pm\sqrt{49-48})/2$, i.e., $x=3$ and $x=4$. Given that it is possible to write $f(x)=x(x-3)(x-4)$, the function is $\geq 0$ when $x\in[0,3]\cup[4,\infty)$.

2. Since $f'(x)=3x^2-14x+12$, the derivative is zero for
\[
x = \frac{14\pm\sqrt{196-144}}{6} = \frac{14\pm\sqrt{52}}{6} = \frac{7\pm\sqrt{13}}{3}
\]
The derivative is $\geq 0$ when $x\in(-\infty,(7-\sqrt{13})/3]\cup[(7+\sqrt{13})/3,\infty)$.

3. Since $f''(x)=6x-14$, it is zero for $x=7/3$. The second derivative is $\geq 0$ when $x\geq 7/3$.

4. Since $f''((7-\sqrt{13})/3)<0$, the point is a local maximizer; since instead $f''((7+\sqrt{13})/3)>0$, the point is a local minimizer. Finally, the point $7/3$ is of inflection.

In sum, the graph of the function is:
[Figure: graph of $f(x)=x^3-7x^2+12x$.] N

Example 1677 Let $f:\mathbb{R}\to\mathbb{R}$ be given by $f(x)=xe^x$. Its limits are $\lim_{x\to-\infty}xe^x=0^-$ and $\lim_{x\to+\infty}xe^x=+\infty$. We then have:

1. $f(x)\geq 0 \iff x\geq 0$.

2. $f'(x)=(x+1)e^x\geq 0 \iff x\geq -1$.

3. $f''(x)=(x+2)e^x\geq 0 \iff x\geq -2$.

4. $f(0)=0$, so the origin is the unique point of intersection with the axes.

Since $f'(x)=0$ for $x=-1$ and $f''(-1)=e^{-1}>0$, the unique minimizer is $x=-1$. Given that $f''(x)=0$ for $x=-2$, it is a point of inflection. In sum, the graph of the function is:
[Figure: graph of $f(x)=xe^x$.] N
Example 1678 Let $f:\mathbb{R}\to\mathbb{R}$ be given by $f(x)=x^2e^x$. Its limits are
\[
\lim_{x\to-\infty} x^2e^x = 0^+ \quad , \quad \lim_{x\to+\infty} x^2e^x = +\infty
\]
We then have:

1. $f(x)$ is always $\geq 0$ and $f(0)=0$, hence $x=0$ is a minimizer.

2. $f'(x)=x(x+2)e^x\geq 0 \iff x\in(-\infty,-2]\cup[0,\infty)$.

3. $f''(x)=(x^2+4x+2)e^x\geq 0 \iff x\in(-\infty,-2-\sqrt{2}]\cup[-2+\sqrt{2},+\infty)$.

4. $x=-2$ and $x=0$ are the unique stationary points. Since $f''(-2)=-2e^{-2}<0$, $x=-2$ is a local maximizer. Given that $f''(0)=2e^0>0$, this confirms that $x=0$ is a minimizer.

5. The two points of abscissae $-2\pm\sqrt{2}$ are inflection points.

In sum, the graph of the function is:
[Figure: graph of $f(x)=x^2e^x$.] N

Example 1679 Let $f:\mathbb{R}\to\mathbb{R}$ be given by $f(x)=x^3e^x$. Its limits are
\[
\lim_{x\to-\infty} x^3e^x = 0^- \quad , \quad \lim_{x\to+\infty} x^3e^x = +\infty
\]
We then have that:

1. $f(0)=0$; $f(x)\geq 0 \iff x\geq 0$.

2. $f'(x)=x^2(x+3)e^x\geq 0 \iff x\geq -3$; note that $f'(0)=0$ as well as $f'>0$ close to $x=0$: the function is therefore increasing at the origin.

3. $f''(x)=(x^3+6x^2+6x)e^x\geq 0 \iff x\in[-3-\sqrt{3},-3+\sqrt{3}]\cup[0,\infty)$.

4. $x=-3$ and $x=0$ are the unique stationary points. Since $f''(-3)=9e^{-3}>0$, $x=-3$ is a local minimizer. One has $f''(0)=0$, and we already know that the function is increasing at $x=0$.

5. The three points of abscissae $-3\pm\sqrt{3}$ and $0$ are inflection points.

In sum, the graph of the function is:
[Figure: graph of $f(x)=x^3e^x$.] N

Example 1680 Let $f:\mathbb{R}\smallsetminus\{2\}\to\mathbb{R}$ be given by
\[
f(x) = 2x+3+\frac{1}{x-2}
\]
This function is not defined at $x=2$. We have
\[
\lim_{x\to-\infty} f(x) = \lim_{x\to 2^-} f(x) = -\infty \quad , \quad \lim_{x\to 2^+} f(x) = \lim_{x\to+\infty} f(x) = +\infty
\]

1. $f(0)=3-0.5=2.5$; we have $f(x)=0$ when $(2x+3)(x-2)=-1$, that is, when $2x^2-x-5=0$, i.e., for
\[
x = \frac{1\pm\sqrt{41}}{4} \approx 1.85 \text{ and } -1.35
\]

2. One has that
\[
f'(x) = 2-\frac{1}{(x-2)^2}
\]
which is zero if $(x-2)^2=1/2$, i.e., if $x=2\pm(1/\sqrt{2})$.

3. Since
\[
f''(x) = \frac{2}{(x-2)^3}
\]
is positive for every $x>2$ and negative for every $x<2$, the two stationary points $2+(1/\sqrt{2})$ and $2-(1/\sqrt{2})$ are, respectively, a local minimizer and a local maximizer.

4. Since $f'(x)\to 2$ as $x\to\pm\infty$, the function has an oblique asymptote. Further, since
\[
\lim_{x\to\pm\infty} [f(x)-2x] = \lim_{x\to\pm\infty}\left(3+\frac{1}{x-2}\right) = 3
\]
the oblique asymptote has equation $y=2x+3$. Clearly, there is also a vertical asymptote of equation $x=2$.
In sum, the graph of the function is:
[Figure: graph of $f$, with vertical asymptote $x=2$ and oblique asymptote $y=2x+3$.]
Note that
\[
f(x) \sim \frac{1}{x-2} \text{ as } x\to 2 \qquad\text{and}\qquad f(x) \sim 2x+3 \text{ as } x\to\pm\infty
\]
Thus, near the point $2$ the function $f(x)$ behaves like $1/(x-2)$, i.e., it diverges, while for $x$ sufficiently large it behaves like the straight line $2x+3$. N

36.4 Bells

Gaussian function Let $f:\mathbb{R}\to\mathbb{R}$ be the Gaussian function
\[
f(x) = e^{-x^2}
\]
Both limits, as $x\to\pm\infty$, are $0$. So, the horizontal axis is a horizontal asymptote. The function is always strictly positive and $f(0)=1$. Next, we look for possible local extremal points. The first-order condition $f'(x)=0$ has the form $-2xe^{-x^2}=0$, so the origin $x=0$ is the unique critical point. The second derivative is
\[
f''(x) = -2e^{-x^2}+(-2x)e^{-x^2}(-2x) = 2e^{-x^2}\left(2x^2-1\right)
\]
Being $f''(0)=-2$, the origin is a local maximizer. Since
\[
x<0<y \implies f'(x)>0>f'(y)
\]
by Proposition 1343 the origin is actually a strong global maximizer. Moreover, we have
\[
f''(x)<0 \iff 2x^2-1<0 \iff x\in\left(-\frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}}\right)
\]
\[
f''(x)=0 \iff 2x^2-1=0 \iff x=\pm\frac{1}{\sqrt{2}}
\]
\[
f''(x)>0 \iff 2x^2-1>0 \iff x\in\left(-\infty,-\frac{1}{\sqrt{2}}\right)\cup\left(\frac{1}{\sqrt{2}},+\infty\right)
\]
So, the points $x=\pm 1/\sqrt{2}$ are inflection points, with $f$ concave on the open interval $(-1/\sqrt{2},1/\sqrt{2})$ and convex on the open intervals $(-\infty,-1/\sqrt{2})$ and $(1/\sqrt{2},+\infty)$. The graph of the function is the famous Gaussian bell:
[Figure: the Gaussian bell $f(x)=e^{-x^2}$.]
which is probably the most classical among the graphs of functions.

Versiera of Agnesi The versiera of Agnesi is a curve that can be constructed as follows.1 Consider a circle of radius $a>0$ centered at the point $(0,a)$ and a straight line of equation $y=2a$. Take all straight lines that pass through the origin $0$. Each of them intersects the straight line $y=2a$ at a unique point $A=(A_1,A_2)$. In turn, the segment $0A$ intersects the circle twice, at the origin and at a point $B=(B_1,B_2)$. The versiera is the locus of points $(A_1,B_2)$ that have as a first coordinate that of $A$ and as a second coordinate that of $B$. Graphically:
[Figure: construction of the versiera, with points $A_i$ on the line $y=2a$ and the corresponding points $B_i$ on the circle.]
It is easy to check that the versiera has equation
\[
f(x) = \frac{8a^3}{x^2+4a^2} \qquad \forall x\in\mathbb{R}
\]
Remarkably, its graph is bell-shaped and reminds one of that of the Gaussian function:
[Figure: the bell-shaped graph of the versiera.]

1 It is named after Maria Agnesi, who studied it in her 1748 book Instituzioni analitiche.

Indeed, we have
\[
f'(x) = -2x\frac{8a^3}{(x^2+4a^2)^2}
\]
which is zero only at the origin $x=0$. Since $xf'(x)<0$ for all $x\neq 0$,2 the origin is the (global) maximizer. We also have
\[
f''(x) = -2\frac{8a^3}{(x^2+4a^2)^2}+(2x)^2\frac{2\cdot 8a^3}{(x^2+4a^2)^3} = 2\frac{8a^3}{(x^2+4a^2)^2}\left(-1+\frac{4x^2}{x^2+4a^2}\right)
\]
which is zero when
\[
\frac{4x^2}{x^2+4a^2} = 1 \iff 3x^2 = 4a^2 \iff x = \pm\frac{2a}{\sqrt{3}}
\]
The points $x=\pm 2a/\sqrt{3}$ are thus inflection points for $f$. In particular, when $a=1/2$ we have the most standard instance of the versiera
\[
f(x) = \frac{1}{x^2+1}
\]

2 That is, $f'(x)>0$ for all $x<0$ and $f'(x)<0$ for all $x>0$.
Part VII

Differential optimization
Chapter 37

Unconstrained optimization

37.1 Unconstrained problems

In the last part of the book we learned some remarkable tools that differential calculus provides for the study of local solutions of the optimization problems introduced in Chapter 22, problems that are at the heart of economics (and of our book). In the next few chapters on optimization theory we will show how these tools can be used to find global solutions of such problems, which are the real object of interest in applications, as we already stressed several times. In other words, we will learn how the study of local solutions can be instrumental for the study of global ones. To this end, we will study two main classes of problems: (i) problems with coercive objective functions, in which we can combine local differential results a la Fermat with global existence results a la Weierstrass and Tonelli; (ii) problems with concave objective functions, which can rely on the fundamental optimality properties of concave functions.
In this introductory chapter we illustrate a few classic differential optimization themes via an unconstrained differential optimization problem
\[
\max_x f(x) \quad \text{sub } x\in C \qquad (37.1)
\]
with objective function $f:A\subseteq\mathbb{R}^n\to\mathbb{R}$ which is differentiable on an open choice set $C\subseteq A$. As usual, a point $\hat{x}\in C$ is a (global) solution of this optimization problem if $f(\hat{x})\geq f(x)$ for each $x\in C$, while it is a local solution of such a problem if there exists a neighborhood $B_{\hat{x}}(\varepsilon)$ of $\hat{x}$ such that $f(\hat{x})\geq f(x)$ for each $x\in B_{\hat{x}}(\varepsilon)\cap C$.1

1 As in the rest of the book, solutions are understood to be global even when not stated explicitly.
37.2 Coercive problems

An unconstrained differential optimization problem is said to be coercive if the objective function $f$ is coercive on $C$. Since the continuity of $f$ on $C$ is guaranteed by differentiability, Tonelli's Theorem can be used for this class of problems. Along with Fermat's Theorem, it gives rise to the so-called elimination method for solving optimization problems, which in this chapter will be used to deal with unconstrained differential optimization problems.

The elimination method consists in the following two phases:

1. identify the set $S$ of critical points of $f$ on $C$, i.e.,
\[
S = \{x\in C : \nabla f(x) = 0\}
\]

2. construct the set $f(S)=\{f(x):x\in S\}$; if $\hat{x}\in S$ is such that
\[
f(\hat{x}) \geq f(x) \qquad \forall x\in S \qquad (37.2)
\]
then $\hat{x}$ is a solution for the optimization problem (37.1).

In other words, once the conditions for Tonelli's Theorem to be applied are verified, one constructs the set of critical points $S$. A point $\hat{x}$ in $S$ where $f$ attains its top value is a solution of the optimization problem, and $f(\hat{x})$ is the maximum value of $f$ on $C$.

N.B. If the function $f$ is twice continuously differentiable, in phase 1 instead of $S$ one can consider the subset $S_2\subseteq S$ of the critical points that satisfy the second-order necessary condition (Sections 28.5.3 and 29.3.3). O

The rationale of the elimination method is simple. By Fermat's Theorem, the set $S$ consists of all points in $C$ which are candidate local solutions for the optimization problem (37.1). On the other hand, if $f$ is continuous and coercive on $C$, by Tonelli's Theorem there exists at least one solution for this optimization problem. Such a solution must belong to the set $S$ (as long as it is non-empty) because a solution of the optimization problem is, a fortiori, a local solution. Hence, the solutions of the "restricted" optimization problem
\[
\max_x f(x) \quad \text{sub } x\in S \qquad (37.3)
\]
are also solutions of the optimization problem (37.1). But the solutions of the restricted problem (37.3) are the points $\hat{x}\in S$ for which condition (37.2) holds, which are then the solutions of optimization problem (37.1), as phase 2 of the elimination method states.

As the following examples show, the elimination method elegantly and effectively combines Tonelli's global result with Fermat's local one. Note how Tonelli's Theorem is crucial since in unconstrained differential optimization problems the choice set $C$ is open, so Weierstrass' Theorem is inapplicable (as it requires $C$ to be compact).
The smaller the set $S$ of critical points, the better the method works, in that phase 2 requires a direct comparison of $f$ at all points of $S$. For this reason, the method is particularly effective when we can consider, instead of $S$, its subset $S_2$ consisting of all critical points which satisfy the second-order necessary condition.
Example 1681 Let $f:\mathbb{R}^n\to\mathbb{R}$ be given by $f(x)=(1-\|x\|^2)e^{\|x\|^2}$ and let $C=\mathbb{R}^n$. The function $f$ is coercive on $\mathbb{R}^n$. Indeed, it is supercoercive: by taking $t_n=\|x_n\|$, it follows that
\[
f(x_n) = \left(1-\|x_n\|^2\right)e^{\|x_n\|^2} = \left(1-t_n^2\right)e^{t_n^2} \to -\infty
\]
for any sequence $\{x_n\}$ of vectors such that $t_n=\|x_n\|\to+\infty$. Since it is continuous, $f$ is coercive on $\mathbb{R}^n$ by Proposition 1019. The unconstrained differential optimization problem
\[
\max_x \left(1-\|x\|^2\right)e^{\|x\|^2} \quad \text{sub } x\in\mathbb{R}^n \qquad (37.4)
\]
is thus coercive. Let us solve it by using the elimination method.

Phase 1: It is easy to see that
\[
\nabla f(x) = 0 \iff x = 0
\]
so that $S=\{0\}$ and $x=0$ is the unique critical point.

Phase 2: Since $S$ is a singleton, this phase trivially implies that $\hat{x}=0$ is a solution of optimization problem (37.4). N

Example 1682 Let $f:\mathbb{R}\to\mathbb{R}$ be given by $f(x)=-x^6+3x^2-1$ and let $C=\mathbb{R}$. By Proposition 1019, $f$ is coercive on $\mathbb{R}$ because $\lim_{x\to\pm\infty}f(x)=\lim_{x\to\pm\infty}(-x^6+3x^2-1)=-\infty$. The unconstrained differential optimization problem
\[
\max_x -x^6+3x^2-1 \quad \text{sub } x\in\mathbb{R} \qquad (37.5)
\]
is thus coercive. Let us solve it with the elimination method.

Phase 1: The first-order condition $f'(x)=0$ takes the form $-6x^5+6x=0$, so $x=0$ and $x=\pm 1$ are the only critical points, that is, $S=\{-1,0,1\}$. We have $f''(0)=6>0$ and $f''(-1)=f''(1)=-24<0$, so $S_2=\{-1,1\}$.

Phase 2: Since $f(-1)=f(1)=1$, both points $\hat{x}=\pm 1$ are solutions of the optimization problem (37.5). N

Example 1683 Let us get back to the unconstrained optimization problem
\[
\max_x e^{-x^4+x^2} \quad \text{sub } x\in\mathbb{R}
\]
of Example 1341. Let us check that this differential problem is coercive. By setting $g(x)=e^x$ and $h(x)=-x^4+x^2$, it follows that $f=g\circ h$. We have $\lim_{x\to\pm\infty}h(x)=\lim_{x\to\pm\infty}(-x^4+x^2)=-\infty$. So, by Proposition 1019 the function $h$ is coercive on $\mathbb{R}$. Since $g$ is strictly increasing, the function $f$ is a strictly increasing transformation of a coercive function. By Proposition 1007, $f$ is coercive.
This unconstrained differential optimization problem is thus coercive and can be solved with the elimination method.

Phase 1: From Example 1341 we know that $S_2=\{-1/\sqrt{2},1/\sqrt{2}\}$.

Phase 2: We have $f(-1/\sqrt{2})=f(1/\sqrt{2})$, so both points $\hat{x}=\pm 1/\sqrt{2}$ are solutions of the unconstrained optimization problem. The elimination method allowed us to identify the nature of such points, something not possible by using solely differential methods as in Example 1341. N

Example 1684 Example 1385 dealt with the optimization problem
\[
\max_x f(x) \quad \text{sub } x\in\mathbb{R}^2_{++}
\]
where $f:\mathbb{R}^2\to\mathbb{R}$ is defined by $f(x_1,x_2)=-2x_1^2-x_2^2+3(x_1+x_2)-x_1x_2+3$. The function $f$ is supercoercive: indeed, it is easily seen that
\[
f(x_{1k},x_{2k}) = -2x_{1k}^2-x_{2k}^2+3(x_{1k}+x_{2k})-x_{1k}x_{2k}+3 \to -\infty
\]
for any "exploding" sequence $\{x_k=(x_{1k},x_{2k})\}\subseteq\mathbb{R}^2_{++}$, that is, such that $\|x_k\|=\sqrt{x_{1k}^2+x_{2k}^2}\to+\infty$. As $f$ is continuous, it is coercive on $\mathbb{R}^2_{++}$ by Proposition 1019.
This unconstrained differential optimization problem is coercive as well, so it can be solved with the elimination method.

Phase 1: By Example 1385, $S_2=\{(3/7,9/7)\}$.

Phase 2: As $S_2$ is a singleton, this phase trivially implies that $\hat{x}=(3/7,9/7)$ is a solution of the optimization problem. The elimination method has allowed us to identify the nature of such a point, thus making it possible to conclude the study of the optimization problem started in Example 1385. N
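The two phases of the elimination method translate directly into a short symbolic script. Here is a minimal sketch for Example 1682, again assuming the sympy library:

```python
import sympy as sp

x = sp.symbols("x", real=True)
f = -x**6 + 3 * x**2 - 1                     # coercive objective of Example 1682

S = sp.solve(sp.Eq(sp.diff(f, x), 0), x)     # phase 1: critical points
S2 = [c for c in S                           # keep only points satisfying the
      if sp.diff(f, x, 2).subs(x, c) <= 0]   # second-order necessary condition

best = max(S2, key=lambda c: f.subs(x, c))   # phase 2: compare values on S2
print(S, S2, [c for c in S2 if f.subs(x, c) == f.subs(x, best)])
```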

37.3 Concave problems

Optimization problems with concave objective functions are pervasive in economic applications because concave functions can often be given a plausible (at times, even compelling) economic meaning that makes it possible to take advantage of their remarkable optimality properties.2 In particular, the unconstrained differential optimization problem (37.1), i.e.,
\[
\max_x f(x) \quad \text{sub } x\in C \qquad (37.6)
\]
is said to be concave if the set $C\subseteq A$ is both open and convex and if the function $f:A\subseteq\mathbb{R}^n\to\mathbb{R}$ is both differentiable and concave on $C$.
As we learned earlier in the book (Section 31.5.1), in such a problem the first-order condition $\nabla f(\hat{x})=0$ becomes necessary and sufficient for a point $\hat{x}\in C$ to be a solution. This remarkable property explains the importance of concavity in optimization problems. But more is true: by Theorem 1032, such a solution is unique if $f$ is strictly quasi-concave. Besides existence, also the study of the uniqueness of solutions, key for comparative statics exercises, is best carried out under concavity.

The necessary and sufficient status of the first-order condition leads to the concave (elimination) method to solve the concave problem (37.6). It consists of a single phase:

1. Find the set $S=\{x\in C:\nabla f(x)=0\}$ of the stationary points of $f$ on $C$; all, and only, the points $\hat{x}\in S$ solve the optimization problem.

In particular, when $f$ is strictly quasi-concave, the set $S$ is a singleton that consists of the unique solution. This is the case when the concave method is most powerful. In general, this method is, at the same time, simpler and more powerful than the method of elimination. It requires the concavity of the objective function, a demanding condition that, however, is often assumed in economic applications, as remarked before.3

2 Recall the discussion on diversification in Section 17.4.

Example 1685 Let $f:(0,\infty)\to\mathbb{R}$ be given by $f(x)=-x\log x$ and let $C=(0,\infty)$. The function $f$ is strictly concave since $f''(x)=-1/x<0$ for all $x>0$ (Corollary 1438). Let us solve the concave problem
\[
\max_x -x\log x \quad \text{sub } x>0 \qquad (37.7)
\]
We have
\[
f'(x) = 0 \iff \log x = -1 \iff e^{\log x} = e^{-1} \iff x = \frac{1}{e}
\]
According to the concave method, $\hat{x}=1/e$ is the unique solution of problem (37.7). N

Example 1686 Let $f:\mathbb{R}^2\to\mathbb{R}$ be given by $f(x)=-2x^2-3xy-6y^2$ and let $C=\mathbb{R}^2$. The function $f$ is strictly concave since the Hessian
\[
\begin{pmatrix} -4 & -3 \\ -3 & -12 \end{pmatrix}
\]
is negative definite (Proposition 1474). Let us solve the concave problem
\[
\max_x -2x^2-3xy-6y^2 \quad \text{sub } x\in\mathbb{R}^2 \qquad (37.8)
\]
We have
\[
\nabla f(x) = 0 \iff \begin{cases} -4x-3y=0 \\ -3x-12y=0 \end{cases} \iff x=(0,0)
\]
By the concave method, the origin $\hat{x}=(0,0)$ is the unique solution of problem (37.8). N

Example 1687 For bundles with two goods, the Cobb-Douglas utility function $u:\mathbb{R}^2_+\to\mathbb{R}$ is $u(x_1,x_2)=x_1^ax_2^{1-a}$, with $a\in(0,1)$. Consider the consumer problem
\[
\max_x u(x) \quad \text{sub } x\in B(p,w) \qquad (37.9)
\]
where $B(p,w)=\{x=(x_1,x_2)\in\mathbb{R}^2_+ : p_1x_1+p_2x_2=w\}$ is the budget set, with $p_1,p_2>0$ (strictly positive prices). We can easily solve this problem by substitution. Indeed, from the budget constraint we have
\[
x_2 = \frac{w-p_1x_1}{p_2}
\]
In view of this expression, define $f:[0,w/p_1]\to\mathbb{R}$ by4
\[
f(x_1) = x_1^a\left(\frac{w-p_1x_1}{p_2}\right)^{1-a}
\]

3 Actually, in these applications strict concavity is often assumed in order to have unique solutions, so as to best carry out comparative statics exercises. For instance, in many works in economics, utility functions $u$ that are defined on monetary outcomes, i.e., on the real line, are assumed to be such that $u'>0$ and $u''<0$, so strictly increasing (Proposition 1324) and strictly concave (Corollary 1438).
4 The condition $x_1\leq w/p_1$ ensures that $x_2\geq 0$.
Problem (37.9) is equivalent to
\[
\max_{x_1} f(x_1) \quad \text{sub } x_1\in\left[0,\frac{w}{p_1}\right]
\]
Since $f(0)=f(w/p_1)=0$ and $f\geq 0$, the maximizers are easily seen to belong to the open interval $(0,w/p_1)$. Therefore, we can consider the nicer unconstrained problem
\[
\max_{x_1} f(x_1) \quad \text{sub } x_1\in\left(0,\frac{w}{p_1}\right)
\]
where $x_1$ is required to belong to an open interval. We can actually do even better by considering the logarithmic transformation $g=\log f$ of the objective function $f$, that is,
\[
g(x_1) = a\log x_1+(1-a)\log\frac{w-p_1x_1}{p_2}
\]
The problem
\[
\max_{x_1} g(x_1) \quad \text{sub } x_1\in\left(0,\frac{w}{p_1}\right)
\]
is equivalent to the last one (Proposition 978), but more tractable because of the log-linear form of the objective function. We have
\[
g'(x_1) = 0 \iff \frac{a}{x_1} = (1-a)\frac{p_1}{w-p_1x_1} \iff a\frac{w-p_1x_1}{x_1} = p_1(1-a)
\]
Since $g$ is easily checked to be strictly concave, by the concave method the unique maximizer is
\[
\hat{x}_1 = a\frac{w}{p_1}
\]
By replacing it in the budget constraint, we conclude that
\[
\hat{x} = \left(a\frac{w}{p_1},(1-a)\frac{w}{p_2}\right)
\]
is the unique solution of the Cobb-Douglas consumer problem (37.9). N
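A quick numerical cross-check of the closed form $\hat{x}_1=aw/p_1$, with arbitrary illustrative parameter values and a brute-force grid search over the budget line:

```python
a, p1, p2, w = 0.3, 2.0, 5.0, 100.0          # arbitrary illustrative parameters

u = lambda x1, x2: x1**a * x2**(1 - a)       # Cobb-Douglas utility

# brute-force search over the budget line p1*x1 + p2*x2 = w
grid = [w / p1 * k / 10_000 for k in range(1, 10_000)]
x1_star = max(grid, key=lambda x1: u(x1, (w - p1 * x1) / p2))

print(x1_star, a * w / p1)                   # both close to 15.0
```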

The last example shows that, with a little "cogito ergo solvo", it may be possible to reduce a constrained optimization problem to a simpler unconstrained one. Next we give another simple illustration of this important fact.

Example 1688 Define $f:[0,\infty)\to\mathbb{R}$ by
\[
f(x) = \begin{cases} -x\log x & x>0 \\ 0 & x=0 \end{cases}
\]
Consider the optimization problem
\[
\max_x f(x) \quad \text{sub } x\in[0,1] \qquad (37.10)
\]
The function $f$ is positive on $[0,1]$, strictly positive on $(0,1)$ and zero at the points $0$ and $1$. So, any solution of this problem must belong to $(0,1)$ and we can thus consider the simpler unconstrained problem
\[
\max_x f(x) \quad \text{sub } x\in(0,1)
\]
In view of Example 1685, $\hat{x}=1/e$ is the unique solution of problem (37.10). This finding is consistent with the graph of $f$ described in Example 1675. N

37.4 Relationship among problems


In this introductory chapter we introduced the two relevant classes of unconstrained di er-
ential optimization problems: coercive and concave ones. A few observations are in order:

1. The two classes are not exhaustive: there are unconstrained di erential optimization
problems which are neither coercive nor concave. For example, the unconstrained
di erential optimization problem

max cos x sub x 2 R


x

is neither coercive nor concave: the cosine function is neither coercive on the real line
(see Example 1006) nor concave. Nonetheless, the problem is trivial: as one can easily
infer from the graph of the cosine function, its solutions are the points x = 2k con
k 2 Z. As usual, common sense gives the best guidance in solving any problem (in
particular, optimization ones), more so than any classi cation.

2. The two classes are not disjoint: there are unconstrained di erential optimization prob-
lems which are both coercive and concave. For example, the unconstrained di erential
optimization problem
max 1 x2 sub x 2 R
x
1130 CHAPTER 37. UNCONSTRAINED OPTIMIZATION

is both coercive and concave: the function 1 x2 is indeed both coercive (see Example
1010) and strictly concave on the real line. In cases such as this one, we use the more
powerful concave method.5

3. The two classes are distinct: there are unconstrained differential optimization problems which are coercive but not concave, and vice versa.

(a) Let $f:\mathbb{R}\to\mathbb{R}$ be given by
\[
f(x) = \begin{cases} 1-x^2 & \text{if } x\leq 0 \\ 1 & \text{if } x>0 \end{cases}
\]
Since $f$ is differentiable (Example 1218), the problem
\[
\max_x f(x) \quad \text{sub } x\in\mathbb{R}
\]
is an unconstrained differential optimization problem. The graph of the function $f$
[Figure: graph of $f$, a downward parabola for $x\leq 0$ glued to the constant $1$ for $x>0$.]
shows that it is concave, but not coercive. The optimization problem is thus concave, but not coercive.

(b) The unconstrained differential optimization problem
\[
\max_x e^{-x^2} \quad \text{sub } x\in\mathbb{R}
\]
is coercive but not concave: the Gaussian function $e^{-x^2}$ is indeed coercive (Example 1008) but not concave, as its famous bell graph shows.
[Figure: the Gaussian bell $e^{-x^2}$, coercive but not concave.]

5 As coda readers may have noted, this objective function is strongly concave. Indeed, it is for such a class of concave functions that the overlap of the two classes of unconstrained differential optimization problems works at best.

37.5 Relaxation

An optimization problem
\[
\max_x f(x) \quad \text{sub } x\in C
\]
with objective function $f:A\subseteq\mathbb{R}^n\to\mathbb{R}$ may be solved by relaxation, that is, by considering an ancillary optimization problem
\[
\max_x f(x) \quad \text{sub } x\in B
\]
which is characterized by a larger choice set $C\subseteq B\subseteq A$ which is, however, analytically more convenient (for example, it may be convex or open), so that the relaxed problem becomes coercive or concave. If, crossing fingers, a solution of the relaxed problem happens to belong to the original choice set $C$, it automatically solves the original problem as well. The following examples should clarify this simple yet powerful idea, which may permit solving optimization problems that are neither coercive nor concave.

Example 1689 (i) Consider the optimization problem
\[
\max_x \left(1-\|x\|^2\right)e^{\|x\|^2} \quad \text{sub } x\in\mathbb{Q}^n_+ \qquad (37.11)
\]
where $\mathbb{Q}^n_+$ is the set of vectors in $\mathbb{R}^n$ whose coordinates are rational and positive. An obvious relaxation of the problem is
\[
\max_x \left(1-\|x\|^2\right)e^{\|x\|^2} \quad \text{sub } x\in\mathbb{R}^n
\]
whose choice set is larger yet analytically more convenient. Indeed, the relaxed problem is coercive and a simple application of the elimination method shows that its solution is the origin $\hat{x}=0$ (Example 1681). Since it belongs to $\mathbb{Q}^n_+$, we conclude that the origin is also the unique solution of problem (37.11). It would have been far more complex to reach such a conclusion by studying the original problem directly.

Example 1690 (ii) Consider the consumer problem with log-linear utility
\[
\max_x \sum_{i=1}^n a_i\log x_i \quad \text{sub } x\in C \qquad (37.12)
\]
where $C=B(p,w)\cap\mathbb{Q}^n$ is the set of bundles with rational components (a realistic assumption). Consider the relaxed version
\[
\max_x \sum_{i=1}^n a_i\log x_i \quad \text{sub } x\in B(p,w)
\]
with a larger yet convex, thus analytically more convenient, choice set. Indeed, convexity itself allowed us to conclude in Section 22.6 that the unique solution of the problem is the bundle $\hat{x}$ such that $\hat{x}_i=a_iw/p_i$ for every good $i=1,\dots,n$. If $a_i,p_i,w\in\mathbb{Q}$ for every $i$, the bundle $\hat{x}$ belongs to $C$, so it is the unique solution of problem (37.12). It would have been far more complex to reach such a conclusion by studying problem (37.12) directly. N

In sum, it is sometimes convenient to ignore some of the constraints that the choice set features when doing so makes the choice set larger yet more analytically tractable, in the hope that some solutions of the relaxed problem belong to the original choice set.
We close with an example, based on an example of Leonida Tonelli, that nicely illustrates some of the themes explored so far in this chapter.
some of the themes explored so far in this chapter.

Example 1691 We want to divide a natural number $n$ into three parts $n_1$, $n_2$, and $n_3$ such that the sum of their cubes is minimal. This amounts to solving the following optimization problem:
\[
\min_{n_1,n_2,n_3} n_1^3+n_2^3+n_3^3 \quad \text{sub } n_1+n_2+n_3=n \text{ and } n_1,n_2,n_3\in\mathbb{N} \qquad (37.13)
\]
Intuitively, the problem has a unique solution $n_1=n_2=n_3=n/3$ when $n$ is divisible by $3$. To verify this guess, we relax the problem by dropping the requirement that the three parts be natural numbers. The problem thus takes the following form:
\[
\min_{x_1,x_2,x_3} x_1^3+x_2^3+x_3^3 \quad \text{sub } x_1+x_2+x_3=n \text{ and } x_1,x_2,x_3\in\mathbb{R}_+
\]
By substitution, we can simplify the problem as follows:
\[
\min_{x_1,x_2} f(x_1,x_2) \quad \text{sub } (x_1,x_2)\in C
\]
where the objective function $f:\mathbb{R}^2\to\mathbb{R}$ is
\[
f(x_1,x_2) = x_1^3+x_2^3+(n-x_1-x_2)^3
\]
and the choice set $C$ is the isosceles right triangle
\[
C = \left\{x\in\mathbb{R}^2_+ : x_1+x_2\leq n\right\}
\]
Its interior is the convex set $\operatorname{int} C=\{x\in\mathbb{R}^2_{++} : x_1+x_2<n\}$. The unconstrained optimization problem
\[
\max_{x_1,x_2} -f(x_1,x_2) \quad \text{sub } (x_1,x_2)\in\operatorname{int} C \qquad (37.14)
\]
is concave. To check the convexity of $f$, so the concavity of $-f$, define $\varphi:\mathbb{R}\to\mathbb{R}$ by $\varphi(z)=(n-z)^3$. This function is convex on $(-\infty,n]$ because $\varphi''(z)=6(n-z)\geq 0$ for all $z\leq n$. Since cubic functions are convex on $\mathbb{R}_+$, we conclude that $f(x_1,x_2)=x_1^3+x_2^3+\varphi(x_1+x_2)$ is convex on $C$, so on $\operatorname{int} C$. Problem (37.14) is thus concave.
Let us find the set $S=\{x\in\operatorname{int} C:\nabla f(x)=0\}$ of the stationary points of $f$, so of $-f$, on $\operatorname{int} C$. It holds
\[
\frac{\partial f}{\partial x_1}(x) = 3x_1^2-3(n-x_1-x_2)^2 \quad , \quad \frac{\partial f}{\partial x_2}(x) = 3x_2^2-3(n-x_1-x_2)^2
\]
Thus,
\[
\frac{\partial f}{\partial x_1}(x) = 0 \iff x_1^2 = (n-x_1-x_2)^2 \iff x_1 = \pm(n-x_1-x_2)
\]
\[
\frac{\partial f}{\partial x_2}(x) = 0 \iff x_2^2 = (n-x_1-x_2)^2 \iff x_2 = \pm(n-x_1-x_2)
\]
To get the stationary points, we need to solve four linear systems:
\[
\begin{cases} x_1 = n-x_1-x_2 \\ x_2 = n-x_1-x_2 \end{cases}
\ ; \
\begin{cases} x_1 = n-x_1-x_2 \\ x_2 = -n+x_1+x_2 \end{cases}
\ ; \
\begin{cases} x_1 = -n+x_1+x_2 \\ x_2 = n-x_1-x_2 \end{cases}
\ ; \
\begin{cases} x_1 = -n+x_1+x_2 \\ x_2 = -n+x_1+x_2 \end{cases}
\]
The solutions of these systems are, respectively,
\[
\left(\frac{n}{3},\frac{n}{3}\right) \ ; \ (n,-n) \ ; \ (-n,n) \ ; \ (n,n)
\]
Only the first belongs to $\operatorname{int} C$. Thus,
\[
S = \left\{\left(\frac{n}{3},\frac{n}{3}\right)\right\}
\]
By the concave method, the point $\hat{x}=(n/3,n/3)$ solves problem (37.14). Since $-f$ is easily checked to be strictly concave on $\operatorname{int} C$, the point $\hat{x}$ is actually the unique solution. This point is also the unique solution of the problem
\[
\max_{x_1,x_2} -f(x_1,x_2) \quad \text{sub } (x_1,x_2)\in C \qquad (37.15)
\]
For, let $x\in\partial C$. For each $\lambda\in(0,1)$, define $z_\lambda=\lambda\hat{x}+(1-\lambda)x$. Each $z_\lambda$ belongs to $\operatorname{int} C$, so $-f(z_\lambda)\leq -f(\hat{x})$ for each $\lambda\in(0,1)$. By the continuity of $f$, letting $\lambda\to 0$ we then have $-f(x)\leq -f(\hat{x})$; strict concavity along the segment rules out equality, so $-f(x)<-f(\hat{x})$. We conclude that $\hat{x}$ is the unique solution of problem (37.15). In turn, this is easily seen to prove the initial guess that $n_1=n_2=n_3=n/3$ is, when $n$ is divisible by $3$, the unique solution of the division problem (37.13). N
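A brute-force check of the division problem for small $n$; this sketch simply enumerates all decompositions:

```python
from itertools import product

def best_split(n):
    """Minimize n1^3 + n2^3 + n3^3 over natural n1 + n2 + n3 = n."""
    return min(((a, b, n - a - b) for a, b in product(range(n + 1), repeat=2)
                if a + b <= n),
               key=lambda t: t[0]**3 + t[1]**3 + t[2]**3)

print(best_split(9))    # (3, 3, 3): the equal split when 3 divides n
print(best_split(10))   # a near-equal split such as (3, 3, 4)
```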

37.6 Optimization and equations: general least squares

Equations play a key role in unconstrained optimization problems via first-order conditions. Interestingly, the converse is also true: equations can be addressed via unconstrained optimization problems. Indeed, consider equation (35.1), i.e.,
\[
f(x) = y_0 \qquad (37.16)
\]
where $f$ is an operator $f:A\subseteq\mathbb{R}^n\to\mathbb{R}^n$ and $y_0$ is a given element of $\mathbb{R}^n$. Consider the unconstrained optimization problem
\[
\min_x \|f(x)-y_0\|^2 \quad \text{sub } x\in\mathbb{R}^n \qquad (37.17)
\]
If a vector $\hat{x}\in A$ solves equation (37.16), then it solves problem (37.17). Indeed, $\|f(\hat{x})-y_0\|^2=0$. The converse is false because the optimization problem might have solutions even though the equation has no solutions. Even in this case, however, the optimization connection is important because the solutions of the optimization problem are the best approximations, i.e., the best surrogates, of the missing solutions. A classic example is a system of linear equations $Ax=b$, which has the form (37.17) via the linear function $f(x)=Ax$ defined on $\mathbb{R}^n$ and the known term $b\in\mathbb{R}^m$, i.e.,
\[
\min_x \|Ax-b\|^2 \quad \text{sub } x\in\mathbb{R}^n \qquad (37.18)
\]
In this case (37.17) is a least squares problem and, when the system has no solutions, we have the least squares solutions studied in Section 22.10.
In sum, the solutions of the optimization problem (37.17) are candidate solutions of equation (37.16). If they turn out not to be solutions, they are nevertheless best approximations.
As to problem (37.17), assume that the image of $f$ is a closed convex set of $\mathbb{R}^n$. Consider the auxiliary problem
\[
\min_y \|y-y_0\|^2 \quad \text{sub } y\in\operatorname{Im} f
\]
By the general Projection Theorem (Section 31.6), there is a unique solution $\hat{y}\in\operatorname{Im} f$, which is characterized by the condition
\[
(y_0-\hat{y})\cdot(\hat{y}-y) \geq 0 \qquad \forall y\in\operatorname{Im} f
\]
All the vectors $x\in f^{-1}(\hat{y})$ that belong to the preimage of $\hat{y}$ are, then, the candidate solutions of equation (37.16). In the linear case $f(x)=Ax$ we get back to the least squares solutions (24.8).
This simple argument, which generalizes the spirit of the least squares method from linear to general equations, illustrates the possibility of solving equations via optimization problems. The problems of finding solutions of equations and of optimization problems are closely connected, more than it may appear prima facie. Each of the two problems can be addressed via the other one, which then plays an ancillary role that becomes relevant when it features significantly better computational properties than the original problem.
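For the linear case, a minimal numpy sketch of a least squares surrogate for an inconsistent system; the matrix and data are arbitrary illustrations:

```python
import numpy as np

# An inconsistent system Ax = b (3 equations, 2 unknowns, arbitrary data)
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

# Least squares solution: the minimizer of ||Ax - b||^2
x_hat, residual, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_hat)       # best surrogate of the missing solution: [1/3, 1/3]
print(A @ x_hat)   # the projection of b on Im f, with f(x) = Ax
```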
37.7 Coda: computational issues

Motivated by the last section, in this coda we discuss some computational issues for optimization problems.6 Throughout we consider an optimization problem
\[
\max_x f(x) \quad \text{sub } x\in C \qquad (37.19)
\]
that admits at least one solution, i.e., $\arg\max_{x\in C} f(x)\neq\emptyset$. To ease notation, we denote the maximum value by $\hat{f}=\max_{x\in C}f(x)$.

37.7.1 Decision procedures

Definition 1692 A sequence $\{x_n\}\subseteq C$ is relaxing for problem (37.19) if $f(x_n)\leq f(x_{n+1})$ for all $n$.

In words, a sequence $\{x_n\}$ in the choice set is relaxing if the objective function assumes larger and larger values, so it gets closer and closer to the maximum value $\hat{f}$ as $n$ increases. The following notion gives some computational content to problem (37.19).

Definition 1693 Let $f:A\subseteq\mathbb{R}^n\to\mathbb{R}$ be a real-valued function and $C$ a subset of $A$. A self-map $h:C\to C$ is a (homogeneous) optimal decision procedure with speed of order $k>0$ for problem (37.19) if, for each initial condition $x_0\in C$, the sequence of iterates
\[
x_{n+1} = h(x_n)
\]
is a relaxing sequence such that
\[
\hat{f}-f(x_n) = O\left(\frac{1}{n^k}\right)
\]

The sequence of iterates $\{x_n\}$ is defined recursively via $h$. Their images $f(x_n)$ converge to the maximum value $\hat{f}$ at a rate $k$, that is, there is a constant $c>0$ such that
\[
\hat{f}-f(x_n) \leq \frac{c}{n^k}
\]
We consider the convergence of the images $f(x_n)$ because one should be primarily interested in getting, as fast as possible, to values that are almost optimal. Indeed, solutions have per se only an instrumental role; ultimately, what matters is the value that they permit to attain. In particular, given a threshold $\varepsilon>0$, iterates $x_n$ are $\varepsilon$-optimal if
\[
\frac{c}{n^k} \leq \varepsilon
\]
So, if we are willing to accept an $\varepsilon$ deviation from the maximum value, it is enough to perform $(c/\varepsilon)^{\frac{1}{k}}$ iterates.

6 We refer interested readers to Nesterov (2004) for an authoritative presentation of this topic.
37.7.2 Gradient descent

We can establish the existence of optimal decision procedures for differentiable objective functions that have Lipschitz continuous derivative operators. Specifically, say that a function $f:U\to\mathbb{R}$ defined on an open set of $\mathbb{R}^n$ is $\beta$-smooth, for some constant $\beta>0$, if it is differentiable with
\[
\|\nabla f(x)-\nabla f(y)\| \leq \beta\|x-y\| \qquad \forall x,y\in U
\]
We consider the following unconstrained version of problem (37.19):
\[
\max_x f(x) \quad \text{sub } x\in\mathbb{R}^n \qquad (37.20)
\]

Theorem 1694 Let $f:\mathbb{R}^n\to\mathbb{R}$ be $\beta$-smooth. If $f$ is concave, then the map $h:\mathbb{R}^n\to\mathbb{R}^n$ defined by
\[
h(x) = x+\frac{1}{\beta}\nabla f(x) \qquad (37.21)
\]
is an optimal decision procedure for problem (37.20), with
\[
\hat{f}-f(x_n) \leq \frac{2\beta\|x_0-\hat{x}\|^2}{n} \qquad (37.22)
\]
for the sequence $\{x_n\}$ of its iterates.

Thus, objective functions that are $\beta$-smooth and concave have an optimal decision procedure (37.21), called gradient descent, with unitary speed, i.e., with $O(1/n)$ errors. The gradient descent procedure prescribes that, if at $x$ we have $\partial f(x)/\partial x_i>0$ (resp., $<0$), in the next iterate we increase (resp., decrease) the component $i$ of the vector $x$. If one draws the graph of a scalar concave function, the intuition behind this rule should be apparent.7 This rule reminds one of a basic rule of thumb when trying to reach the peak of a mountain: at a crossroad, always take the rising path.
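In code, the iteration (37.21) is only a few lines. Below is a minimal numerical sketch assuming the numpy library; the concave quadratic objective and all constants are illustrative choices, not part of the text (note that any upper bound on the Lipschitz constant of the gradient is a valid $\beta$):

```python
import numpy as np

# Concave objective f(x) = -||x - t||^2, with gradient -2(x - t);
# the gradient is 2-Lipschitz, so beta = 4 is a valid (loose) upper bound.
t = np.array([1.0, -3.0])
grad_f = lambda x: -2.0 * (x - t)
beta = 4.0

x = np.zeros(2)                  # arbitrary initial condition x0
for n in range(50):
    x = x + grad_f(x) / beta     # the map h of (37.21)

print(x)                         # approaches the maximizer t = (1, -3)
```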
The proof relies on the following lemma of independent interest (it is a first-order approximation with integral remainder).

Lemma 1695 Let $f:U\to\mathbb{R}$ be a differentiable function defined on an open set of $\mathbb{R}^n$. Then
\[
f(y)-f(x) = \int_0^1 \nabla f(x+t(y-x))\cdot(y-x)\,dt
\]
for all $x,y\in U$.

Proof Let $x,y\in U$. Define the auxiliary function $\psi:[0,1]\to\mathbb{R}$ by $\psi(t)=f((1-t)x+ty)$. Since $f$ is differentiable, the function $\psi$ is easily seen to be differentiable. By the chain rule, we then have
\[
\psi'(t) = \sum_{i=1}^n \frac{\partial f((1-t)x+ty)}{\partial x_i}(y_i-x_i) = \nabla f(x+t(y-x))\cdot(y-x)
\]
By (44.64), we have
\[
f(y)-f(x) = \psi(1)-\psi(0) = \int_0^1 \psi'(t)\,dt = \int_0^1 \nabla f(x+t(y-x))\cdot(y-x)\,dt
\]
as desired.

7 A dual version of this result holds for minimization problems with convex objective functions, with $h(x)=x-\frac{1}{\beta}\nabla f(x)$.

The next lemma reports some important inequalities for $\beta$-smooth functions.

Lemma 1696 Let $f:U\to\mathbb{R}$ be a $\beta$-smooth function defined on an open set of $\mathbb{R}^n$. Then
\[
f(y) \leq f(x)+\nabla f(x)\cdot(y-x)+\frac{\beta}{2}\|y-x\|^2 \qquad (37.23)
\]
for all $x,y\in U$. If, in addition, $f$ and $U$ are convex, then
\[
\frac{1}{\beta}\|\nabla f(x)-\nabla f(y)\|^2 \leq (\nabla f(x)-\nabla f(y))\cdot(x-y) \qquad (37.24)
\]
for all $x,y\in U$.

Proof By Lemma 1695, we can write
\[
f(y)-f(x) = \int_0^1 \nabla f(x+t(y-x))\cdot(y-x)\,dt
= \nabla f(x)\cdot(y-x)+\int_0^1 [\nabla f(x+t(y-x))-\nabla f(x)]\cdot(y-x)\,dt
\]
\[
\leq \nabla f(x)\cdot(y-x)+\int_0^1 \|\nabla f(x+t(y-x))-\nabla f(x)\|\,\|y-x\|\,dt
\leq \nabla f(x)\cdot(y-x)+\int_0^1 \beta t\|y-x\|^2\,dt
\]
\[
= \nabla f(x)\cdot(y-x)+\beta\|y-x\|^2\int_0^1 t\,dt
= \nabla f(x)\cdot(y-x)+\frac{\beta}{2}\|y-x\|^2
\]
where the first inequality follows from the Cauchy-Schwarz inequality. This proves (37.23).
Assume that $f$ and $U$ are convex. Then, (37.23) implies
\[
0 \leq f(y)-f(x)-\nabla f(x)\cdot(y-x) \leq \frac{\beta}{2}\|y-x\|^2
\]
Fix $x_0\in U$ and define the auxiliary function $\varphi:U\to\mathbb{R}$ by $\varphi(x)=f(x)-\nabla f(x_0)\cdot x$. Since $\nabla\varphi(x)=\nabla f(x)-\nabla f(x_0)$, we have $\|\nabla\varphi(x)-\nabla\varphi(y)\|=\|\nabla f(x)-\nabla f(y)\|$. So, this auxiliary function too has a $\beta$-Lipschitz continuous derivative operator. Moreover, $\nabla\varphi(x_0)=0$ and so $x_0$ is a minimizer of $\varphi$. Along with (37.23), this implies
\[
\varphi(x_0) \leq \varphi\left(x-\frac{1}{\beta}\nabla\varphi(x)\right)
\leq \varphi(x)+\nabla\varphi(x)\cdot\left(-\frac{1}{\beta}\nabla\varphi(x)\right)+\frac{\beta}{2}\left\|\frac{1}{\beta}\nabla\varphi(x)\right\|^2
= \varphi(x)-\frac{1}{2\beta}\|\nabla\varphi(x)\|^2
\]
for all $x\in U$. Thus,
\[
f(x_0)-\nabla f(x_0)\cdot x_0 \leq f(x)-\nabla f(x_0)\cdot x-\frac{1}{2\beta}\|\nabla f(x)-\nabla f(x_0)\|^2
\]
that is,
\[
f(x_0)+\nabla f(x_0)\cdot(x-x_0)+\frac{1}{2\beta}\|\nabla f(x)-\nabla f(x_0)\|^2 \leq f(x)
\]
Since $x_0$ was arbitrarily chosen, we conclude that
\[
f(x)+\nabla f(x)\cdot(y-x)+\frac{1}{2\beta}\|\nabla f(y)-\nabla f(x)\|^2 \leq f(y) \qquad (37.25)
\]
for all $x,y\in U$. Since $x$ and $y$ play a symmetric role, by interchanging them we have
\[
f(y)+\nabla f(y)\cdot(x-y)+\frac{1}{2\beta}\|\nabla f(y)-\nabla f(x)\|^2 \leq f(x) \qquad (37.26)
\]
By adding up (37.25) and (37.26), we get (37.24).

Proof of Theorem 1694 Set $g=-f$. Clearly, also the function $g$ is $\beta$-smooth. Since $g$ is convex, we then have
\[
0 \leq g(y)-g(x)-\nabla g(x)\cdot(y-x) \leq \frac{\beta}{2}\|y-x\|^2
\]
Moreover, $x_{n+1}=x_n+\frac{1}{\beta}\nabla f(x_n)=x_n-\frac{1}{\beta}\nabla g(x_n)$. Thus:
\[
g(x_{n+1}) \leq g(x_n)+\nabla g(x_n)\cdot(x_{n+1}-x_n)+\frac{\beta}{2}\|x_{n+1}-x_n\|^2
= g(x_n)-\frac{1}{\beta}\|\nabla g(x_n)\|^2+\frac{1}{2\beta}\|\nabla g(x_n)\|^2
= g(x_n)-\frac{1}{2\beta}\|\nabla g(x_n)\|^2
\]
Since $\|\nabla f(x)\|=\|\nabla g(x)\|$ for all $x\in\mathbb{R}^n$, we thus have
\[
f(x_{n+1}) \geq f(x_n)+\frac{1}{2\beta}\|\nabla f(x_n)\|^2
\]
for all $n$, so the sequence $\{x_n\}$ is relaxing. In particular, we have
\[
\hat{f}-f(x_{n+1}) \leq \hat{f}-f(x_n)-\frac{1}{2\beta}\|\nabla f(x_n)\|^2 \qquad (37.27)
\]
Next we show that
\[
\|x_{n+1}-\hat{x}\| \leq \|x_n-\hat{x}\| \qquad \forall n\geq 0 \qquad (37.28)
\]
Indeed, since $g$ is $\beta$-smooth and convex, by (37.24) with $y=\hat{x}$ (so that $\nabla g(\hat{x})=0$) we have
\[
\|x_{n+1}-\hat{x}\|^2 = \left\|x_n-\frac{1}{\beta}\nabla g(x_n)-\hat{x}\right\|^2
= \|x_n-\hat{x}\|^2+\frac{1}{\beta^2}\|\nabla g(x_n)\|^2-\frac{2}{\beta}\nabla g(x_n)\cdot(x_n-\hat{x})
\]
\[
\leq \|x_n-\hat{x}\|^2+\frac{1}{\beta^2}\|\nabla g(x_n)\|^2-\frac{2}{\beta^2}\|\nabla g(x_n)\|^2
= \|x_n-\hat{x}\|^2-\frac{1}{\beta^2}\|\nabla g(x_n)\|^2
\]
By concavity, we have $\hat{f}\leq f(x_n)+\nabla f(x_n)\cdot(\hat{x}-x_n)$, so
\[
\hat{f}-f(x_n) \leq \nabla f(x_n)\cdot(\hat{x}-x_n) \leq \|\nabla f(x_n)\|\,\|x_n-\hat{x}\| \leq \|x_0-\hat{x}\|\,\|\nabla f(x_n)\|
\]
where the last inequality follows from (37.28). Then
\[
\left[\hat{f}-f(x_n)\right]^2 \leq \|x_0-\hat{x}\|^2\,\|\nabla f(x_n)\|^2
\leq 2\beta\|x_0-\hat{x}\|^2\left[\left(\hat{f}-f(x_n)\right)-\left(\hat{f}-f(x_{n+1})\right)\right]
\]
where the last inequality follows from (37.27). Set $d_n=\hat{f}-f(x_n)$ for each $n$. We can write the last inequality as
\[
d_n^2 \leq 2\beta(d_n-d_{n+1})\|x_0-\hat{x}\|^2
\]
By (37.27), $0\leq d_{n+1}\leq d_n$. Assume $d_n>0$ for each $n$, otherwise $x_n$ is a maximizer. Dividing by $d_nd_{n+1}$, and using $d_n/d_{n+1}\geq 1$, we get
\[
1 \leq \frac{d_n}{d_{n+1}} \leq 2\beta\|x_0-\hat{x}\|^2\,\frac{d_n-d_{n+1}}{d_nd_{n+1}} = 2\beta\|x_0-\hat{x}\|^2\left(\frac{1}{d_{n+1}}-\frac{1}{d_n}\right)
\]
that is,
\[
\frac{1}{d_{n+1}} \geq \frac{1}{d_n}+\frac{1}{2\beta\|x_0-\hat{x}\|^2}
\]
By iterating we get
\[
\frac{1}{d_n} \geq \frac{n}{2\beta\|x_0-\hat{x}\|^2}+\frac{1}{d_0} \geq \frac{n}{2\beta\|x_0-\hat{x}\|^2}
\]
so that
\[
0 < d_n \leq \frac{2\beta\|x_0-\hat{x}\|^2}{n}
\]
This proves (37.22).

Example 1697 Given a matrix $A_{m\times n}$, with $n\leq m$, consider the least squares optimization problem (37.18), i.e.,
\[
\max_x g(x) \quad \text{sub } x\in\mathbb{R}^n
\]
with $g:\mathbb{R}^n\to\mathbb{R}$ defined by $g(x)=-\|Ax-b\|^2$. Then $\nabla g(x)=-2A^T(Ax-b)$, so for some $\beta>0$ we have
\[
\|\nabla g(x)-\nabla g(y)\| = \left\|-2A^T(Ax-b)+2A^T(Ay-b)\right\| = 2\left\|A^TA(x-y)\right\| \leq \beta\|x-y\|
\]
where the last inequality holds because the Gram matrix $A^TA$ induces a linear operator, defined by $x\mapsto A^TAx$, which is Lipschitz continuous by Theorem 898. We conclude that $g$ is $\beta$-smooth. Since it is also concave, by the last theorem the map $h:\mathbb{R}^n\to\mathbb{R}^n$ defined by
\[
h(x) = x+\frac{1}{\beta}\nabla g(x) = x-\frac{2}{\beta}A^T(Ax-b)
\]
is an optimal decision procedure for the least squares problem. In particular,
\[
\hat{g}-g(x_n) \leq \frac{2\beta\|x_0-\hat{x}\|^2}{n}
\]
for the sequence of iterates
\[
x_{n+1} = x_n-\frac{2}{\beta}A^T(Ax_n-b)
\]
generated by $h$. N
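A numpy sketch of these iterates; the data are arbitrary, and $\beta$ is taken as twice the largest eigenvalue of $A^TA$, a valid smoothness constant by the bound above. The limit matches the least squares solution computed directly:

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # arbitrary, rank 2
b = np.array([1.0, 1.0, 0.0])

beta = 2 * np.linalg.eigvalsh(A.T @ A).max()   # smoothness constant of g
x = np.zeros(2)
for n in range(200):
    x = x - (2 / beta) * A.T @ (A @ x - b)     # the iterates of Example 1697

print(x)                                       # ~ [1/3, 1/3]
print(np.linalg.lstsq(A, b, rcond=None)[0])    # the least squares solution
```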

37.7.3 Maximizing sequences

So far we considered convergence to maximum values. We now turn to convergence to solutions. To this end, we introduce the following notion.

Definition 1698 A sequence $\{x_n\}\subseteq C$ is maximizing for problem (37.19) if $\lim f(x_n)=\hat{f}$.

Next we show that under some standard conditions maximizing sequences converge to solutions.

Theorem 1699 Let $f:C\to\mathbb{R}$ be continuous and coercive. If problem (37.19) has a unique solution $\hat{x}\in C$, then a sequence $\{x_n\}\subseteq C$ is maximizing for this problem if and only if it converges to $\hat{x}$.

Proof We prove the "if" because the converse is trivial. Let $\hat{x}$ be the unique solution of problem (37.19). Let $\{x_n\}\subseteq C$ be maximizing, i.e., $\lim f(x_n)=\hat{f}$. We want to show that $x_n\to\hat{x}$. Suppose, by contradiction, that there exist $\varepsilon>0$ and a subsequence $\{x_{n_k}\}$ such that $\|x_{n_k}-\hat{x}\|\geq\varepsilon$ for all $k$ (cf. Proposition 2115). Since $f$ is coercive, there is a scalar $t<\hat{f}$ such that $(f\geq t)\cap C$ is compact. Since $\lim_{k\to+\infty}f(x_{n_k})=\hat{f}$, eventually all terms of the subsequence $\{x_{n_k}\}$ belong to the set $(f\geq t)\cap C$. By the Bolzano-Weierstrass Theorem, there exists a subsubsequence $\{x_{n_{k_s}}\}$ that converges to some $x^*\in(f\geq t)$. Since $f$ is continuous, we have $\lim_{s\to+\infty}f(x_{n_{k_s}})=f(x^*)=\hat{f}$, where the last equality follows from $\lim f(x_n)=\hat{f}$. So, $\hat{f}=f(x^*)$. In turn, this implies $\hat{x}=x^*$. We thus reach the contradiction:
\[
0 < \varepsilon \leq \|x_{n_{k_s}}-\hat{x}\| \leq \|x_{n_{k_s}}-x^*\|+\|x^*-\hat{x}\| = \|x_{n_{k_s}}-x^*\| \to 0
\]
We conclude that $x_n\to\hat{x}$.

The following simple example shows that, without continuity, this result is in general false.

Example 1700 Consider the discontinuous function $f:[0,1]\to\mathbb{R}$ defined by
\[
f(x) = \begin{cases} 1 & \text{if } x=0 \\ x & \text{if } x\in(0,1) \\ 0 & \text{if } x=1 \end{cases}
\]
The unique maximizer of $f$ is $\hat{x}=0$, so $\hat{f}=1$. The sequence $x_n=1-1/n$ is maximizing because
\[
f(x_n) = 1-\frac{1}{n} \to 1
\]
Yet, $x_n\to 1\neq\hat{x}$. N

The following consequence of the previous theorem is especially relevant for problem (37.20).

Proposition 1701 Let $f:\mathbb{R}^n\to\mathbb{R}$ be strictly concave and supercoercive. A sequence $\{x_n\}$ is maximizing for problem (37.20) if and only if it converges to the solution $\hat{x}$.

Proof We prove the "if" because the converse is trivial. Let $f:\mathbb{R}^n\to\mathbb{R}$ be strictly concave and supercoercive. The function $f$ is continuous because it is concave (Theorem 833). By Tonelli's Theorem, problem (37.20) then has a solution, which is unique because $f$ is strictly concave (Theorem 1032). The result now follows from Theorem 1699.

Example 1702 In the last example, assume that $\rho(A)=n$. By Theorem 1057, the function $g$ is strictly concave and supercoercive. So, the iterates
\[
x_{n+1} = x_n-\frac{2}{\beta}A^T(Ax_n-b) \qquad (37.29)
\]
converge to the least squares solution $\hat{x}=(A^TA)^{-1}A^Tb$. The iteration does not require any matrix inversion. N

Thus, for optimization problems featuring strictly concave and supercoercive objective functions, the sequence recursively defined via a decision procedure converges to the solution. If we make the stronger assumption that the objective function is strongly concave,8 then we can bound the rate of convergence to solutions of maximizing sequences.

Proposition 1703 If f : Rn ! R is strongly concave, then there exists a constant >0


such that p
kx x ^k f (x) f (^x)
for every x 2 Rn .

Thus, for a the sequence fxn g recursively de ned via a decision procedure with speed of
order k we have p
c
kxn x ^k k
n2
provided the objective function is strongly concave.
^{8} Recall that strongly concave functions are strictly concave and supercoercive (Section 31.6).

Example 1704 In the last example, we have $\nabla^2 f(x) = -A^{T}A$, so $f$ is strongly concave if
there exists $\varepsilon < 0$ such that the matrix $-A^{T}A - \varepsilon I$ is negative semidefinite, that is, if
$A^{T}A$ is positive definite. If this is the case, the iterates (37.29) converge to the least squares
solution with rate
$$\|x_n - \hat{x}\| \le \frac{\sqrt{2\kappa}\,\|x_0 - \hat{x}\|}{\alpha \sqrt{n}}$$
because it is easy to check that we can take $\alpha = \sqrt{k}$, where $k$ is the constant of strong
concavity of $f$. N

The proof of Proposition 1703 is an easy consequence of the following lemma, which sharpens
for strongly concave functions a classic inequality that holds for concave functions (cf.
Theorem 1471).

Lemma 1705 Let $f: U \to \mathbb{R}$ be a strongly concave and $\kappa$-smooth function defined on an
open and convex set $U$ of $\mathbb{R}^n$. Then there exists a constant $k > 0$ such that
$$f(y) \le f(x) + \nabla f(x) \cdot (y - x) - k \|x - y\|^2 \tag{37.30}$$
for all $x, y \in U$.

Proof By definition, there is $k > 0$ such that the function $g: U \to \mathbb{R}$ defined by $g(x) = f(x) + k\|x\|^2$ is concave. Then, for all $x, y \in U$ we have
$$g(y) \le g(x) + \nabla g(x) \cdot (y - x)$$
so that
$$f(y) + k\|y\|^2 \le f(x) + k\|x\|^2 + \nabla f(x) \cdot (y - x) + 2kx \cdot (y - x) \tag{37.31}$$
We have
$$k\|x\|^2 - k\|y\|^2 + 2kx \cdot (y - x) = k\left(\|x\|^2 - \|y\|^2 + 2x \cdot y - 2x \cdot x\right) = k\left(-\|x\|^2 - \|y\|^2 + 2x \cdot y\right) = -k\|x - y\|^2$$
So, by (37.31) we have $f(y) \le f(x) + \nabla f(x) \cdot (y - x) - k\|x - y\|^2$, as desired.

Proof of Proposition 1703 Assume that $f$ is strongly concave with constant $k > 0$. By
(37.30), and since $\nabla f(\hat{x}) = 0$ by Fermat's Theorem, we have $f(x) \le f(\hat{x}) + \nabla f(\hat{x}) \cdot (x - \hat{x}) - k\|x - \hat{x}\|^2 = f(\hat{x}) - k\|x - \hat{x}\|^2$ for all
$x \in \mathbb{R}^n$. So, $\sqrt{k}\,\|\hat{x} - x\| \le \sqrt{f(\hat{x}) - f(x)}$ for all $x \in \mathbb{R}^n$. In turn, by setting $\alpha = \sqrt{k}$ this
easily implies the desired result.

37.7.4 Final remarks

For the optimization problem (37.19) with the set $C$ closed and convex, the gradient descent
procedure becomes
$$h(x) = P_C\left(x + \frac{1}{\kappa}\nabla f(x)\right)$$
where $P_C: \mathbb{R}^n \to C$ is the projection operator (Section 31.6). Indeed, the projection ensures
that the next iterate remains an element of the choice set $C$.

Example 1706 (i) Let $C = \{x \in \mathbb{R}^n : Ax = b\}$ be the affine set determined by an $m \times n$
matrix $A$, with $m \le n$. Consider an optimization problem
$$\max_x f(x) \quad \text{sub} \quad x \in C$$
If $\rho(A) = m$, by (31.51) we have $P_C(x) = x + A^{T}(AA^{T})^{-1}(b - Ax)$ for all $x \in \mathbb{R}^n$. So
$$h(x) = P_C\left(x + \frac{1}{\kappa}\nabla f(x)\right) = x + \frac{1}{\kappa}\nabla f(x) + A^{T}(AA^{T})^{-1}\left(b - A\left(x + \frac{1}{\kappa}\nabla f(x)\right)\right)$$
$$= x + \frac{1}{\kappa}\nabla f(x) + A^{T}(AA^{T})^{-1}b - A^{T}(AA^{T})^{-1}A\left(x + \frac{1}{\kappa}\nabla f(x)\right)$$
provided $f$ is differentiable.
(ii) Let $C = \mathbb{R}^n_+$ be the positive orthant. Consider an optimization problem
$$\max_x f(x) \quad \text{sub} \quad x \ge 0$$
By (31.51), $P_C(x) = x^+$ for all $x \in \mathbb{R}^n$, so $h(x) = \left(x + \frac{1}{\kappa}\nabla f(x)\right)^+$ provided $f$ is differentiable. N
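
A minimal sketch of the projected procedure of case (ii) in Python; the concrete objective
$f(x) = -\frac{1}{2}\|x - z\|^2$ and the point z are illustrative assumptions chosen so that the
answer is known in closed form:

import numpy as np

def projected_gradient(grad_f, kappa, x0, n_iter=500):
    """Iterates h(x) = (x + (1/kappa) grad_f(x))^+ for C = R^n_+."""
    x = np.maximum(x0, 0.0)                  # start inside the choice set
    for _ in range(n_iter):
        y = x + (1.0 / kappa) * grad_f(x)    # unconstrained gradient step
        x = np.maximum(y, 0.0)               # projection x^+ onto R^n_+
    return x

# Illustrative objective: f(x) = -||x - z||^2 / 2, with gradient z - x,
# which is 1-Lipschitz, so we may take kappa = 1.
z = np.array([1.0, -2.0, 0.5])
x_hat = projected_gradient(lambda x: z - x, kappa=1.0, x0=np.zeros(3))
print(x_hat)   # [1.0, 0.0, 0.5] = z^+, the point of R^3_+ closest to z

The projection step is the only difference from the unconstrained procedure: for the positive
orthant it amounts to clipping negative coordinates to zero.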

Finally, there exist "accelerated" decision procedures that have speed of order 2, i.e.,
$\hat{f} - f(x_n) = O(1/n^2)$. Roughly speaking, they have a bivariate form
$$y_{n+1} = x_n + \frac{1}{\kappa}\nabla f(x_n)$$
$$x_{n+1} = \alpha_n y_{n+1} + \beta_n y_n$$
as readers will learn in more advanced courses.
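
For the curious reader, here is a hedged sketch of one popular scheme of this bivariate kind
(Nesterov's method, with one standard choice of the weights $\alpha_n$, $\beta_n$); it is only an
illustration under these assumptions, not the book's procedure, and the test problem is the
least squares example seen above:

import numpy as np

def accelerated_ascent(grad_f, kappa, x0, n_iter=500):
    """Nesterov-style acceleration: f_hat - f(x_n) = O(1/n^2) for concave f."""
    x, y_old, t = x0.astype(float), x0.astype(float), 1.0
    for _ in range(n_iter):
        y_new = x + (1.0 / kappa) * grad_f(x)            # plain gradient step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        m = (t - 1.0) / t_new                            # momentum weight
        x = (1.0 + m) * y_new - m * y_old                # alpha_n y_{n+1} + beta_n y_n
        y_old, t = y_new, t_new
    return y_old

# Concave objective f(x) = -||Ax - b||^2 / 2, as in the least squares example.
rng = np.random.default_rng(1)
A, b = rng.normal(size=(30, 4)), rng.normal(size=30)
kappa = np.linalg.eigvalsh(A.T @ A).max()
x_acc = accelerated_ascent(lambda x: -A.T @ (A @ x - b), kappa, np.zeros(4))
print(np.allclose(x_acc, np.linalg.lstsq(A, b, rcond=None)[0], atol=1e-6))  # True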


Chapter 38

Equality constraints

38.1 Introduction

The classic necessary condition for local extremal points given by Fermat's Theorem considers
interior points of the choice set $C$, something that greatly limits its use in finding candidate
solutions of optimization problems coming from economics. Indeed, in many of them the
hypotheses of monotonicity of Proposition 979 hold and, therefore, the possible solutions lie
on the boundary of the choice set, not in its interior. A classic example is the consumer
problem
$$\max_x u(x) \quad \text{sub} \quad x \in B(p, w) \tag{38.1}$$
Under a standard hypothesis of monotonicity, by Walras' law the problem can be rewritten
as
$$\max_x u(x) \quad \text{sub} \quad x \in \Gamma(p, w)$$
where the budget line $\Gamma(p, w) = \{x \in \mathbb{R}^n_+ : p \cdot x = w\} \subseteq \partial B(p, w)$ is determined by an
equality constraint: the consumer exhausts his budget in the purchase of the optimal bundle.
The set $\Gamma(p, w)$ has no interior points, that is,
$$\operatorname{int} \Gamma(p, w) = \emptyset$$
Fermat's Theorem is thus useless for finding the candidate solutions of the consumer problem.
The equality constraint, with its drastic topological consequences, deprives us of this
fundamental result in the study of the consumer problem. Fortunately, there is an equally
important result of Lagrange that rescues us, as this chapter will show.

38.2 The problem

The general form of an optimization problem with equality constraints is
$$\max_x f(x) \quad \text{sub} \quad g_1(x) = b_1, \; g_2(x) = b_2, \; \dots, \; g_m(x) = b_m \tag{38.2}$$
where $f: A \subseteq \mathbb{R}^n \to \mathbb{R}$ is the objective function, while the functions $g_i: A \subseteq \mathbb{R}^n \to \mathbb{R}$ and
the scalars $b_i$ represent $m$ equality constraints. Throughout the chapter we assume that all
the functions $f$ and $g_i$ are continuously differentiable on a non-empty and open subset $D$ of
their domain $A$; that is, $\emptyset \neq D \subseteq \operatorname{int} A$.
The set
$$C = \{x \in A : g_i(x) = b_i \;\; \forall i = 1, \dots, m\} \tag{38.3}$$
is the subset of $A$ identified by the constraints. Therefore, optimization problem (38.2) can
be equivalently formulated in canonical form as
$$\max_x f(x) \quad \text{sub} \quad x \in C$$
Nevertheless, for this special class of optimization problems we will often use the more
evocative formulation (38.2).
In what follows we first study in detail the important special case of a single constraint,
which we then generalize in Section 38.7 to the case of several constraints.

38.3 One constraint

38.3.1 A key lemma

With a single constraint, the optimization problem (38.2) becomes
$$\max_x f(x) \quad \text{sub} \quad g(x) = b \tag{38.4}$$
where $f: A \subseteq \mathbb{R}^n \to \mathbb{R}$ is the objective function, while the function $g: A \subseteq \mathbb{R}^n \to \mathbb{R}$ and the
scalar $b$ define the unique equality constraint.

The next fundamental lemma gives the key to finding the solutions of problem (38.4).
The hypothesis $\hat{x} \in C \cap D$ requires that $\hat{x}$ be a point of the choice set at which $f$ and $g$ are
both continuously differentiable. Moreover, we require that $\nabla g(\hat{x}) \neq 0$. In this regard, note
that a point $x \in D$ is said to be regular (with respect to the constraint) if $\nabla g(x) \neq 0$, and
singular otherwise. According to this terminology, the condition $\nabla g(\hat{x}) \neq 0$ requires the
point $\hat{x}$ to be regular.

Lemma 1707 Let $\hat{x} \in C \cap D$ be a local solution of the optimization problem (38.4). If
$\nabla g(\hat{x}) \neq 0$, then there exists a scalar $\hat{\lambda} \in \mathbb{R}$ such that
$$\nabla f(\hat{x}) = \hat{\lambda} \nabla g(\hat{x}) \tag{38.5}$$

By unzipping gradients, the condition can be equivalently written as
$$\frac{\partial f}{\partial x_k}(\hat{x}) = \hat{\lambda} \frac{\partial g}{\partial x_k}(\hat{x}) \qquad \forall k = 1, \dots, n$$
Thus, a necessary condition for $\hat{x}$ to be a local solution of the optimization problem (38.4)
is that the gradients of the functions $f$ and $g$ be proportional at $\hat{x}$. The "hat" over $\lambda$ reminds
us that this scalar depends on the point $\hat{x}$ considered.

Next we give a proof of this remarkable fact based on the Implicit Function Theorem.
Proof We prove the lemma for $n = 2$ (the extension to arbitrary $n$ is routine if one uses a
version of the Implicit Function Theorem for functions of $n$ variables). Since $\nabla g(\hat{x}) \neq 0$, at
least one of the two partial derivatives $\partial g/\partial x_1$ or $\partial g/\partial x_2$ is non-zero at $\hat{x}$. Let, for example,
$(\partial g/\partial x_2)(\hat{x}) \neq 0$ (in the case $(\partial g/\partial x_1)(\hat{x}) \neq 0$ the proof is symmetric). As seen in Section
34.3.2, the Implicit Function Theorem can be applied also to study locally points belonging
to the level curves $g^{-1}(b)$ with $b \in \mathbb{R}$. Since $\hat{x} = (\hat{x}_1, \hat{x}_2) \in g^{-1}(b)$, this theorem yields
neighborhoods $U(\hat{x}_1)$ and $V(\hat{x}_2)$ and a unique differentiable function $h: U(\hat{x}_1) \to V(\hat{x}_2)$
such that $\hat{x}_2 = h(\hat{x}_1)$ and $g(x_1, h(x_1)) = b$ for each $x_1 \in U(\hat{x}_1)$, with
$$h'(x_1) = -\frac{\frac{\partial g}{\partial x_1}(x_1, x_2)}{\frac{\partial g}{\partial x_2}(x_1, x_2)} \qquad \forall (x_1, x_2) \in g^{-1}(b) \cap (U(\hat{x}_1) \times V(\hat{x}_2))$$
Consider the auxiliary function $\phi: U(\hat{x}_1) \to \mathbb{R}$ defined by $\phi(x_1) = f(x_1, h(x_1))$. By the
chain rule, the derivative of $\phi$ is
$$\phi'(x_1) = \frac{\partial f}{\partial x_1}(x_1, h(x_1)) + \frac{\partial f}{\partial x_2}(x_1, h(x_1))\, h'(x_1)$$
Since $\hat{x}$ is a local solution of the optimization problem (38.4), there exists a neighborhood
$B_\varepsilon(\hat{x})$ of $\hat{x}$ such that
$$f(\hat{x}) \ge f(x) \qquad \forall x \in g^{-1}(b) \cap B_\varepsilon(\hat{x}) \tag{38.6}$$
Without loss of generality, suppose that $\varepsilon$ is sufficiently small so that
$$(\hat{x}_1 - \varepsilon, \hat{x}_1 + \varepsilon) \subseteq U(\hat{x}_1) \quad \text{and} \quad (\hat{x}_2 - \varepsilon, \hat{x}_2 + \varepsilon) \subseteq V(\hat{x}_2)$$
Hence, $B_\varepsilon(\hat{x}) \subseteq U(\hat{x}_1) \times V(\hat{x}_2)$. This permits us to rewrite (38.6) as
$$f(\hat{x}_1, h(\hat{x}_1)) \ge f(x_1, h(x_1)) \qquad \forall x_1 \in (\hat{x}_1 - \varepsilon, \hat{x}_1 + \varepsilon)$$
that is, $\phi(\hat{x}_1) \ge \phi(x_1)$ for every $x_1 \in (\hat{x}_1 - \varepsilon, \hat{x}_1 + \varepsilon)$. The point $\hat{x}_1$ is, therefore, a local
maximizer for $\phi$. The first-order condition reads
$$\phi'(\hat{x}_1) = \frac{\partial f}{\partial x_1}(\hat{x}_1, \hat{x}_2) - \frac{\partial f}{\partial x_2}(\hat{x}_1, \hat{x}_2)\, \frac{\frac{\partial g}{\partial x_1}(\hat{x}_1, \hat{x}_2)}{\frac{\partial g}{\partial x_2}(\hat{x}_1, \hat{x}_2)} = 0 \tag{38.7}$$

If $(\partial g/\partial x_1)(\hat{x}_1, \hat{x}_2) \neq 0$, we have
$$\frac{\frac{\partial f}{\partial x_1}(\hat{x}_1, \hat{x}_2)}{\frac{\partial g}{\partial x_1}(\hat{x}_1, \hat{x}_2)} = \frac{\frac{\partial f}{\partial x_2}(\hat{x}_1, \hat{x}_2)}{\frac{\partial g}{\partial x_2}(\hat{x}_1, \hat{x}_2)}$$
the common value of which we denote by $\hat{\lambda}$. Then we get
$$\begin{cases} \dfrac{\partial f}{\partial x_1}(\hat{x}_1, \hat{x}_2) = \hat{\lambda}\, \dfrac{\partial g}{\partial x_1}(\hat{x}_1, \hat{x}_2) \\[2mm] \dfrac{\partial f}{\partial x_2}(\hat{x}_1, \hat{x}_2) = \hat{\lambda}\, \dfrac{\partial g}{\partial x_2}(\hat{x}_1, \hat{x}_2) \end{cases}$$
or, equivalently, $\nabla f(\hat{x}_1, \hat{x}_2) = \hat{\lambda} \nabla g(\hat{x}_1, \hat{x}_2)$, that is, (38.5).

If $(\partial g/\partial x_1)(\hat{x}_1, \hat{x}_2) = 0$, then (38.7) yields
$$\frac{\partial f}{\partial x_1}(\hat{x}_1, \hat{x}_2) = 0$$
so that the equality
$$\frac{\partial f}{\partial x_1}(\hat{x}_1, \hat{x}_2) = \hat{\lambda}\, \frac{\partial g}{\partial x_1}(\hat{x}_1, \hat{x}_2)$$
is trivially verified for every scalar $\hat{\lambda}$. Setting
$$\hat{\lambda} = \frac{\frac{\partial f}{\partial x_2}(\hat{x}_1, \hat{x}_2)}{\frac{\partial g}{\partial x_2}(\hat{x}_1, \hat{x}_2)}$$
we therefore have again $\nabla f(\hat{x}_1, \hat{x}_2) = \hat{\lambda} \nabla g(\hat{x}_1, \hat{x}_2)$, that is, (38.5).

The next example shows that condition (38.5) is necessary, but not sufficient.

Example 1708 The optimization problem
$$\max_{x_1, x_2} \frac{x_1^3 + x_2^3}{2} \quad \text{sub} \quad x_1 - x_2 = 0 \tag{38.8}$$
is of the form (38.4), where $f, g: \mathbb{R}^2 \to \mathbb{R}$ are given by $f(x) = \frac{1}{2}(x_1^3 + x_2^3)$ and $g(x) = x_1 - x_2$, while $b = 0$. We have $\nabla f(0, 0) = (0, 0)$ and $\nabla g(0, 0) = (1, -1)$, so $\hat{\lambda} = 0$ is such
that $\nabla f(0, 0) = \hat{\lambda} \nabla g(0, 0)$. Hence, the origin $(0, 0)$ satisfies condition (38.5) with $\hat{\lambda} = 0$.
But the origin is not a solution of problem (38.8):
$$f(t, t) = t^3 > 0 = f(0, 0) \qquad \forall t > 0 \tag{38.9}$$
Note that the origin is not even a constrained (global) minimizer since $f(t, t) = t^3 < 0$ for
every $t < 0$. N

To understand condition (38.5) intuitively, assume that $f$ and $g$ are defined on $\mathbb{R}^2$, so
that (38.5) has the form
$$\left(\frac{\partial f}{\partial x_1}(\hat{x}), \frac{\partial f}{\partial x_2}(\hat{x})\right) = \hat{\lambda}\left(\frac{\partial g}{\partial x_1}(\hat{x}), \frac{\partial g}{\partial x_2}(\hat{x})\right)$$
that is,
$$\frac{\partial f}{\partial x_1}(\hat{x}) = \hat{\lambda}\, \frac{\partial g}{\partial x_1}(\hat{x}) \quad \text{and} \quad \frac{\partial f}{\partial x_2}(\hat{x}) = \hat{\lambda}\, \frac{\partial g}{\partial x_2}(\hat{x}) \tag{38.10}$$
The condition $\nabla g(\hat{x}) \neq 0$ means that at least one of the partial derivatives $(\partial g/\partial x_i)(\hat{x})$ is
different from zero. If, for convenience, we suppose that both are non-zero and that $\hat{\lambda} \neq 0$,
then (38.10) is equivalent to
$$\frac{\frac{\partial f}{\partial x_1}(\hat{x})}{\frac{\partial g}{\partial x_1}(\hat{x})} = \frac{\frac{\partial f}{\partial x_2}(\hat{x})}{\frac{\partial g}{\partial x_2}(\hat{x})} \tag{38.11}$$

Let us now try to understand intuitively why (38.11) is necessary for $\hat{x}$ to be a solution of
the optimization problem (38.4). The differentials of $f$ and $g$ at $\hat{x}$ are given by
$$df(\hat{x})(h) = \nabla f(\hat{x}) \cdot h = \frac{\partial f}{\partial x_1}(\hat{x})\, h_1 + \frac{\partial f}{\partial x_2}(\hat{x})\, h_2 \qquad \forall h \in \mathbb{R}^2$$
$$dg(\hat{x})(h) = \nabla g(\hat{x}) \cdot h = \frac{\partial g}{\partial x_1}(\hat{x})\, h_1 + \frac{\partial g}{\partial x_2}(\hat{x})\, h_2 \qquad \forall h \in \mathbb{R}^2$$
They linearly approximate the differences $f(\hat{x} + h) - f(\hat{x})$ and $g(\hat{x} + h) - g(\hat{x})$, that is, the
effect on $f$ and $g$ of moving from $\hat{x}$ to $\hat{x} + h$. As we know well by now, such an approximation is
the better the smaller $h$ is. Suppose, ideally, that $h$ is infinitesimal and that the approximation
is exact, so that $f(\hat{x} + h) - f(\hat{x}) = df(\hat{x})(h)$ and $g(\hat{x} + h) - g(\hat{x}) = dg(\hat{x})(h)$. This is
clearly incorrect formally, but here we are proceeding heuristically.
Continuing our heuristic reasoning, let us start from the point $\hat{x}$ and consider
variations $\hat{x} + h$ with $h$ infinitesimal. The first issue to worry about is whether they
are legitimate, i.e., whether they satisfy the equality constraint $g(\hat{x} + h) = b$. This means
that $g(\hat{x} + h) = g(\hat{x})$, so $h$ must be such that $dg(\hat{x})(h) = 0$. It follows that
$$\frac{\partial g}{\partial x_1}(\hat{x})\, h_1 + \frac{\partial g}{\partial x_2}(\hat{x})\, h_2 = 0$$
and so
$$h_1 = -\frac{\frac{\partial g}{\partial x_2}(\hat{x})}{\frac{\partial g}{\partial x_1}(\hat{x})}\, h_2 \tag{38.12}$$
The effect on the objective function $f$ of moving from $\hat{x}$ to $\hat{x} + h$ is given by $df(\hat{x})(h)$. When
$h$ is legitimate, by (38.12) this effect is given by
$$df(\hat{x})(h) = -\frac{\partial f}{\partial x_1}(\hat{x})\, \frac{\frac{\partial g}{\partial x_2}(\hat{x})}{\frac{\partial g}{\partial x_1}(\hat{x})}\, h_2 + \frac{\partial f}{\partial x_2}(\hat{x})\, h_2 \tag{38.13}$$

If $\hat{x}$ is a solution of the optimization problem, we must necessarily have $df(\hat{x})(h) = 0$ for
every legitimate variation $h$. Otherwise, if, say, $df(\hat{x})(h) > 0$, one would have a point $\hat{x} + h$
that satisfies the equality constraint, but such that $f(\hat{x} + h) > f(\hat{x})$. If instead
$df(\hat{x})(h) < 0$, the same observation could be made this time for $-h$, which is obviously
a legitimate variation, and that would lead to the point $\hat{x} - h$ with $f(\hat{x} - h) > f(\hat{x})$.
The necessary condition $df(\hat{x})(h) = 0$ together with (38.13) gives
$$-\frac{\partial f}{\partial x_1}(\hat{x})\, \frac{\frac{\partial g}{\partial x_2}(\hat{x})}{\frac{\partial g}{\partial x_1}(\hat{x})}\, h_2 + \frac{\partial f}{\partial x_2}(\hat{x})\, h_2 = 0$$
If, as is natural, we assume $h_2 \neq 0$, then
$$-\frac{\partial f}{\partial x_1}(\hat{x})\, \frac{\frac{\partial g}{\partial x_2}(\hat{x})}{\frac{\partial g}{\partial x_1}(\hat{x})} + \frac{\partial f}{\partial x_2}(\hat{x}) = 0$$
which is precisely expression (38.11). At an intuitive level, all this explains why (38.5) is
necessary for $\hat{x}$ to be a solution of the problem.
38.3.2 Lagrange's Theorem

Lemma 1707 gives a rather intuitive necessary condition for optimality. This condition can
be equivalently written as
$$\nabla f(\hat{x}) - \hat{\lambda} \nabla g(\hat{x}) = 0$$
By recalling the algebra of gradients, the expression $\nabla f(x) - \lambda \nabla g(x)$ makes it natural to
introduce the function $L: A \times \mathbb{R} \subseteq \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$ defined by
$$L(x, \lambda) = f(x) + \lambda(b - g(x)) \qquad \forall (x, \lambda) \in A \times \mathbb{R} \tag{38.14}$$
This function, called the Lagrangian, plays a key role in optimization problems. Its gradient
is
$$\nabla L(x, \lambda) = \left(\frac{\partial L}{\partial x_1}(x, \lambda), \dots, \frac{\partial L}{\partial x_n}(x, \lambda), \frac{\partial L}{\partial \lambda}(x, \lambda)\right) \in \mathbb{R}^{n+1}$$
It is important to distinguish in this gradient the two parts $\nabla_x L$ and $\nabla_\lambda L$ given by
$$\nabla_x L(x, \lambda) = \left(\frac{\partial L}{\partial x_1}(x, \lambda), \dots, \frac{\partial L}{\partial x_n}(x, \lambda)\right) \in \mathbb{R}^n$$
and
$$\nabla_\lambda L(x, \lambda) = \frac{\partial L}{\partial \lambda}(x, \lambda) \in \mathbb{R}$$
Using this notation, we have
$$\nabla_x L(x, \lambda) = \nabla f(x) - \lambda \nabla g(x) \tag{38.15}$$
and
$$\nabla_\lambda L(x, \lambda) = b - g(x) \tag{38.16}$$
which leads to the following fundamental formulation of the necessary optimality condition
of Lemma 1707 in terms of the Lagrangian function.

Theorem 1709 (Lagrange) Let $\hat{x} \in C \cap D$ be a local solution of the optimization problem
(38.4). If $\nabla g(\hat{x}) \neq 0$, then there exists a scalar $\hat{\lambda} \in \mathbb{R}$, called Lagrange multiplier, such that
the pair $(\hat{x}, \hat{\lambda}) \in \mathbb{R}^{n+1}$ is a stationary point of the Lagrangian function.

Proof Let $\hat{x}$ be a local solution of the optimization problem (38.4). By Lemma 1707, there
exists $\hat{\lambda} \in \mathbb{R}$ such that
$$\nabla f(\hat{x}) - \hat{\lambda} \nabla g(\hat{x}) = 0$$
By (38.15), this condition is equivalent to
$$\nabla_x L(\hat{x}, \hat{\lambda}) = 0$$
On the other hand, by (38.16) we have $\nabla_\lambda L(x, \lambda) = b - g(x)$, so $\nabla_\lambda L(\hat{x}, \hat{\lambda}) = 0$ as well,
since $b - g(\hat{x}) = 0$. It follows that $(\hat{x}, \hat{\lambda})$ is a stationary point of $L$.

Thanks to Lagrange's Theorem, the search for local solutions of the constrained optimization
problem (38.4) reduces to the search for the stationary points of a suitable function
of several variables, the Lagrangian function. It is a more complicated function than the
original function $f$ because of the new variable $\lambda$, but through it the search for the solutions
of the optimization problem can be done by solving a standard first-order condition, similar
to the ones seen for unconstrained optimization problems.
Needless to say, we are discussing a condition that is only necessary: there is no guarantee
that the stationary points are actually solutions of the problem. It is already a remarkable
achievement, however, to be able to use the simple (first-order) condition
$$\nabla L(x, \lambda) = 0 \tag{38.17}$$
to search for the possible candidate solutions of the constrained optimization problem (38.4).
In the next section we will see that this condition plays a fundamental role in the search for
the local solutions of problem (38.4) via Lagrange's method, which in turn may lead to
the global solutions through a version of the elimination method.

We close with two important remarks. First, observe that in general the pair $(\hat{x}, \hat{\lambda})$ is not
a maximizer of the Lagrangian function, even when $\hat{x}$ turns out to solve the optimization
problem. The pair $(\hat{x}, \hat{\lambda})$ is just a stationary point of the Lagrangian function, nothing
more. Therefore, it is erroneous to assert that the search for solutions of the constrained
optimization problem reduces to the search for maximizers of the Lagrangian function.
Second, note that problem (38.4) has a symmetric version
$$\min_x f(x) \quad \text{sub} \quad g(x) = b$$
in which, instead of looking for maximizers, we look for minimizers. Condition (38.5) is
necessary also for this version of problem (38.4) and, therefore, the stationary points of the
Lagrangian function could be minimizers instead of maximizers. However, it may well be the
case that they are neither maximizers nor minimizers. This is the usual ambiguity of first-
order conditions, encountered also in unconstrained optimization: it reflects the fact that
first-order conditions are only necessary conditions.

38.3.3 A heuristic interpretation of the multiplier

Lagrange multipliers have a nice interpretation in terms of marginal effects. To present
it properly we would need some notions that we will introduce only later in the book, in
Chapter 41. We refer, therefore, readers to that chapter for a more complete exposition
of the marginal interpretation of multipliers (Section 41.7). Here, however, we can sketch a
heuristic argument that gives a flavor of this interpretation.
If we change the scalar $b$ that defines the equality constraint, we have a new optimization
problem (38.4), with new solutions $\hat{x}$. Suppose, for simplicity, that for each possible value of
the scalar $b$ the resulting optimization problem has a unique solution, denoted $\hat{x}(b)$, with
multiplier $\hat{\lambda}(b)$. We have, therefore, informally defined two functions $\hat{x}(\cdot)$ and $\hat{\lambda}(\cdot)$ that
associate to each value of $b$ the solution $\hat{x}(b)$ and the multiplier $\hat{\lambda}(b)$ of the corresponding
optimization problem. The "optimal" objective function is then $f(\hat{x}(b))$. As $b$ varies, so
varies the maximum value that the objective function attains at the unique solution.
To ease matters, assume that $n = 1$, so that the choice variable $x$ is a scalar. The equality
constraint implies that $g(\hat{x}(b)) - b = 0$ for every scalar $b$. By a heuristic application of the
chain rule, we then have
$$\frac{\partial g(\hat{x}(b))}{\partial x}\, \frac{d\hat{x}(b)}{db} - 1 = 0 \tag{38.18}$$

On the other hand, again by a heuristic application of the chain rule we have
$$\frac{df(\hat{x}(b))}{db} = \frac{\partial f(\hat{x}(b))}{\partial x}\, \frac{d\hat{x}(b)}{db} = \left(\frac{\partial f(\hat{x}(b))}{\partial x} - \hat{\lambda}(b)\, \frac{\partial g(\hat{x}(b))}{\partial x}\right) \frac{d\hat{x}(b)}{db} + \hat{\lambda}(b)\, \frac{\partial g(\hat{x}(b))}{\partial x}\, \frac{d\hat{x}(b)}{db} = \hat{\lambda}(b)$$
where the term in parentheses is zero by (38.5) and the last equality follows from (38.18).
Summing up, for every scalar $b$ we have
$$\frac{df(\hat{x}(b))}{db} = \hat{\lambda}(b)$$
The multiplier is thus the "marginal maximum value": it quantifies the marginal
effect on the attained maximum value of (slightly) altering the constraint. For instance, in
the consumer problem the scalar $b$ is the income of the consumer, so the multiplier quantifies
the marginal effect on the attained maximum utility of a (small) variation in income.

N.B. We are using the word "altering" rather than "relaxing" because by changing $b$ the
choice set (38.3) does not get larger: it just becomes different. So, a priori, a change in $b$
might not be beneficial (indeed, the sign of the multiplier can be positive or negative). In
contrast, the word "relaxing" becomes appropriate in studying variations of the scalars that
define inequality constraints (cf. the discussion in Section 41.7). O

38.4 The method of elimination

Lagrange's Theorem suggests the following procedure, which we may call Lagrange's method,
for the search of local solutions of the optimization problem (38.4):

1. determine the set $D$ where the functions $f$ and $g$ are continuously differentiable;

2. determine the set $C \setminus D$ of the points of the constraint set where the functions $f$ and
$g$ are not continuously differentiable;

3. setting $D_0 = \{x \in D : \nabla g(x) = 0\}$, determine the set $C \cap D_0$ of the singular points
that satisfy the constraint;

4. determine the set $S$ of the regular points $x \in C \cap (D \setminus D_0)$ for which there exists a
Lagrange multiplier $\lambda \in \mathbb{R}$ such that the pair $(x, \lambda) \in \mathbb{R}^{n+1}$ is a stationary point of the
Lagrangian function, that is, satisfies the first-order condition (38.17);^1

^{1} Note that $S \subseteq C$ because the points that satisfy condition (38.17) also satisfy the constraint. It is
therefore not necessary to check whether a point $x \in S$ also belongs to $C$.
5. the local solutions of the optimization problem (38.4), if they exist, belong to the set
$$S \cup (C \cap D_0) \cup (C \setminus D) \tag{38.19}$$

Thus, according to Lagrange's method, the possible local solutions of the optimization
problem (38.4) must be searched for among the points of the subset (38.19) of $C$. Indeed, a
local solution that is a regular point will belong to the set $S$ thanks to Lagrange's Theorem.
However, this theorem says nothing about possible local solutions that are singular points
(which belong to the set $C \cap D_0$), or about possible local solutions at which the functions do
not have a continuous derivative (which belong to the set $C \setminus D$).
In conclusion, a necessary condition for a point $x \in C$ to be a local solution of the
optimization problem (38.4) is that it belong to the subset $S \cup (C \cap D_0) \cup (C \setminus D) \subseteq C$.
This is what this procedure, a key dividend of Lagrange's Theorem, establishes. Clearly, the
smaller such a set is, the more effective the application of the theorem: the search for local
solutions can then be restricted to a significantly smaller set than the original set $C$.

That said, what about global solutions? If the objective function $f$ is coercive and
continuous on $C$, the five phases of Lagrange's method plus the following extra sixth
phase provide a version of the elimination method to find global solutions.

6. Compute the set $\{f(x) : x \in S \cup (C \cap D_0) \cup (C \setminus D)\}$; if a point $\hat{x} \in S \cup (C \cap D_0) \cup (C \setminus D)$ is such that
$$f(\hat{x}) \ge f(x) \qquad \forall x \in S \cup (C \cap D_0) \cup (C \setminus D) \tag{38.20}$$
then $\hat{x}$ is a (global) solution of the optimization problem (38.4).

In other words, the points of the set (38.19) at which $f$ attains its maximum value are the
solutions of the optimization problem. Indeed, by Lagrange's method this is the set of the
possible local solutions; global solutions, whose existence is ensured by Tonelli's Theorem,
must then belong to such a set. Hence, the solutions of the "restricted" optimization problem
$$\max_x f(x) \quad \text{sub} \quad x \in S \cup (C \cap D_0) \cup (C \setminus D) \tag{38.21}$$
are also the solutions of the optimization problem (38.4). Phase 6 is based on this remarkable
fact. As with Lagrange's method, the smaller the set (38.19) is, the more effective the
application of the elimination method. In particular, in the lucky case when it is a singleton,
the elimination method determines the unique solution of the optimization problem, a
remarkable achievement.

In sum, the elimination method is an elegant combination of a global existence result,
Tonelli's Theorem, and a local differential result, Lagrange's Theorem. In the rest of the
section we illustrate the procedure with some analytical examples. In the next section we
will consider the classic consumer problem.

Example 1710 The optimization problem
$$\max_x e^{-\|x\|^2} \quad \text{sub} \quad \sum_{i=1}^n x_i = 1 \tag{38.22}$$
is of the form (38.4), where $f, g: \mathbb{R}^n \to \mathbb{R}$ are given by $f(x) = e^{-\|x\|^2}$ and $g(x) = \sum_{i=1}^n x_i$,
and $b = 1$. The functions are both continuously differentiable on the whole of $\mathbb{R}^n$, so $D = \mathbb{R}^n$.
We then trivially have $C \setminus D = \emptyset$: at all the points of the constraint set, the functions $f$
and $g$ are both continuously differentiable. We have therefore completed phases 1 and 2
of Lagrange's method.
Since $\nabla g(x) = (1, 1, \dots, 1)$, there are no singular points, that is, $D_0 = \emptyset$. This completes
phase 3 of Lagrange's method.
The Lagrangian function $L: \mathbb{R}^{n+1} \to \mathbb{R}$ is given by
$$L(x, \lambda) = e^{-\|x\|^2} + \lambda\left(1 - \sum_{i=1}^n x_i\right)$$
To find the set of its stationary points, it is necessary to solve the first-order condition (38.17),
given here by the following (nonlinear) system of $n + 1$ equations:
$$\begin{cases} \dfrac{\partial L}{\partial x_i} = -2x_i e^{-\|x\|^2} - \lambda = 0 & \forall i = 1, \dots, n \\[2mm] \dfrac{\partial L}{\partial \lambda} = 1 - \sum_{i=1}^n x_i = 0 \end{cases}$$
We observe that no solution can have $\lambda = 0$. Indeed, otherwise the first $n$ equations
would imply $x_i = 0$ for every $i$, which contradicts the last equation. It follows that every
solution has $\lambda \neq 0$. The first $n$ equations yield
$$x_i = -\frac{\lambda}{2}\, e^{\|x\|^2}$$
and, upon substituting these values into the last equation, we get
$$1 + \frac{n\lambda}{2}\, e^{\|x\|^2} = 0$$
that is,
$$\lambda = -\frac{2}{n}\, e^{-\|x\|^2}$$
Substituting this value of $\lambda$ into any of the first $n$ equations we find $x_i = 1/n$, so the only
point $(x, \lambda) \in \mathbb{R}^{n+1}$ that satisfies the first-order condition (38.17) is
$$\left(\frac{1}{n}, \frac{1}{n}, \dots, \frac{1}{n}, -\frac{2}{n}\, e^{-\frac{1}{n}}\right)$$
That is, $S$ is the singleton
$$S = \left\{\left(\frac{1}{n}, \frac{1}{n}, \dots, \frac{1}{n}\right)\right\}$$
This completes phase 4 of Lagrange's method. Since $C \setminus D = \emptyset$ and $D_0 = \emptyset$, we have
$$S \cup (C \cap D_0) \cup (C \setminus D) = S \tag{38.23}$$
Thus, in this example the first-order condition (38.17) turns out to be necessary for any local
solution of the optimization problem (38.22). The unique element of $S$ is, therefore, the only
candidate to be a local solution of the problem. This completes Lagrange's method.
Turn now to the elimination method, which we can use since the continuous function $f$
is coercive on the (non-compact, being closed but unbounded) set
$$C = \left\{x = (x_1, \dots, x_n) \in \mathbb{R}^n : \sum_{i=1}^n x_i = 1\right\}$$
Indeed,
$$(f \ge t) = \begin{cases} \mathbb{R}^n & \text{if } t \le 0 \\ \left\{x \in \mathbb{R}^n : \|x\| \le \sqrt{-\lg t}\right\} & \text{if } t \in (0, 1] \\ \emptyset & \text{if } t > 1 \end{cases}$$
so the set $(f \ge t)$ is compact and non-empty for each $t \in (0, 1]$. Since the set in (38.23) is
a singleton, the elimination method allows us to conclude that $(1/n, \dots, 1/n)$ is the unique
solution of the optimization problem (38.22). N
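
The first-order condition just solved by hand can also be checked symbolically. The following
minimal sketch, using the sympy library with the illustrative choice n = 3 (the library and
the dimension are our assumptions, not part of the text), verifies that the candidate above is
indeed a stationary point of the Lagrangian:

import sympy as sp

n = 3  # illustrative dimension
x = sp.symbols(f"x1:{n+1}", real=True)
lam = sp.Symbol("lam", real=True)

L = sp.exp(-sum(xi**2 for xi in x)) + lam * (1 - sum(x))  # Lagrangian of (38.22)
eqs = [sp.diff(L, v) for v in (*x, lam)]   # first-order condition (38.17)

# Candidate: x_i = 1/n for every i, lam = -(2/n) e^{-1/n}
cand = {xi: sp.Rational(1, n) for xi in x}
cand[lam] = -sp.Rational(2, n) * sp.exp(-sp.Rational(1, n))
print([sp.simplify(eq.subs(cand)) for eq in eqs])  # [0, 0, 0, 0]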

Example 1711 Given $p = (p_1, \dots, p_n) \in \mathbb{R}^n_{++} \cup \mathbb{R}^n_{--}$,^2 the optimization problem
$$\max_{x_1, \dots, x_n} \sum_{i=1}^n p_i \log x_i \quad \text{sub} \quad \sum_{i=1}^n x_i = 1 \tag{38.24}$$
is of the form (38.4), with $f, g: \mathbb{R}^n_{++} \to \mathbb{R}$ given by $f(x) = \sum_{i=1}^n p_i \log x_i$ and $g(x) = \sum_{i=1}^n x_i$, and $b = 1$. The functions $f$ and $g$ are continuously differentiable at all points of the
constraint set, i.e., $C \setminus D = \emptyset$, and there are no singular points, i.e., $D_0 = \emptyset$. This completes
the first three phases of Lagrange's method.
The Lagrangian function $L: \mathbb{R}^n_{++} \times \mathbb{R} \to \mathbb{R}$ is given by
$$L(x, \lambda) = \sum_{i=1}^n p_i \log x_i + \lambda\left(1 - \sum_{i=1}^n x_i\right)$$
To find the set of its stationary points we need to solve the first-order condition (38.17),
given here by the following (nonlinear) system of $n + 1$ equations
$$\begin{cases} \dfrac{\partial L}{\partial x_i} = \dfrac{p_i}{x_i} - \lambda = 0 & \forall i = 1, \dots, n \\[2mm] \dfrac{\partial L}{\partial \lambda} = 1 - \sum_{i=1}^n x_i = 0 \end{cases}$$
Because the coordinates of the vector $p$ are all different from zero, no solution can have
$\lambda = 0$. It follows that every solution has $\lambda \neq 0$. Because $x \in \mathbb{R}^n_{++}$, the first $n$
equations imply $p_i = \lambda x_i$, and by substituting these values into the last equation we
find $\lambda = \sum_{i=1}^n p_i$. Then, by substituting this value of $\lambda$ into each of the first $n$ equations we
find $x_i = p_i / \sum_{i=1}^n p_i$. Thus, the unique point $(x, \lambda) \in \mathbb{R}^{n+1}$ that satisfies the first-order
condition (38.17) is
$$\left(\frac{p_1}{\sum_{i=1}^n p_i}, \frac{p_2}{\sum_{i=1}^n p_i}, \dots, \frac{p_n}{\sum_{i=1}^n p_i}, \sum_{i=1}^n p_i\right)$$
so that $S$ is the singleton
$$S = \left\{\left(\frac{p_1}{\sum_{i=1}^n p_i}, \frac{p_2}{\sum_{i=1}^n p_i}, \dots, \frac{p_n}{\sum_{i=1}^n p_i}\right)\right\}$$
^{2} That is, all coordinates of $p$ are either strictly positive or strictly negative.
This completes phase 4 of Lagrange's method. Since $C \setminus D = \emptyset$ and $D_0 = \emptyset$, we have
$$S \cup (C \cap D_0) \cup (C \setminus D) = S \tag{38.25}$$
Thus, also in this example the first-order condition (38.17) is necessary for each local solution
of the optimization problem (38.24). Again, the unique element of $S$ is the only candidate to
be a local solution of the optimization problem (38.24). This completes Lagrange's method.
We can apply the elimination method because the continuous function $f$ is, by Lemma
1050, also coercive on the set $C = \left\{x \in \mathbb{R}^n_{++} : \sum_{i=1}^n x_i = 1\right\}$, which is not compact because
it is not closed. In view of (38.25), the elimination method implies that
$$\left(\frac{p_1}{\sum_{i=1}^n p_i}, \dots, \frac{p_n}{\sum_{i=1}^n p_i}\right)$$
is the unique solution of the optimization problem (38.24). N

When the elimination method is based on Weierstrass' Theorem, rather than on the
weaker (but more widely applicable) Tonelli's Theorem, as a "by-product" we can also find
the global minimizers, that is, the points $x \in C$ that solve the problem $\min_x f(x)$ sub $x \in C$.
Indeed, it is easy to see that these are the points that minimize $f$ over $S \cup (C \cap D_0) \cup (C \setminus D)$.
Clearly, this is no longer true with Tonelli's Theorem because it only ensures the existence
of maximizers and remains silent on possible minimizers.

Example 1712 The optimization problem
$$\max_{x_1, x_2} 2x_1^2 + 5x_2^2 \quad \text{sub} \quad x_1^2 + x_2^2 = 1 \tag{38.26}$$
is of the form (38.4), with $f, g: \mathbb{R}^2 \to \mathbb{R}$ given by $f(x_1, x_2) = 2x_1^2 + 5x_2^2$ and $g(x_1, x_2) = x_1^2 + x_2^2$, while $b = 1$. Both $f$ and $g$ are continuously differentiable on the entire plane, so
$D = \mathbb{R}^2$. Hence, $C \setminus D = \emptyset$: at all the points of the constraint set the functions $f$ and $g$ are
continuously differentiable. This completes phases 1 and 2 of Lagrange's method.
We have $\nabla g(x) = (2x_1, 2x_2)$, so the origin $(0, 0)$ is the unique singular point, that is,
$D_0 = \{(0, 0)\}$. This singular point does not satisfy the constraint, so $C \cap D_0 = \emptyset$. This
completes phase 3 of Lagrange's method.
The Lagrangian function $L: \mathbb{R}^3 \to \mathbb{R}$ is given by
$$L(x_1, x_2, \lambda) = 2x_1^2 + 5x_2^2 + \lambda\left(1 - x_1^2 - x_2^2\right)$$
To find the set of its stationary points we must solve the first-order condition (38.17):
$$\begin{cases} \dfrac{\partial L}{\partial x_1} = 0 \\[1mm] \dfrac{\partial L}{\partial x_2} = 0 \\[1mm] \dfrac{\partial L}{\partial \lambda} = 0 \end{cases}$$
that is, the following (nonlinear) system of three equations
$$\begin{cases} 4x_1 - 2\lambda x_1 = 0 \\ 10x_2 - 2\lambda x_2 = 0 \\ 1 - x_1^2 - x_2^2 = 0 \end{cases}$$
in the three unknowns $x_1$, $x_2$, and $\lambda$. We verify immediately that $x_1 = x_2 = 0$ satisfies the
first two equations for every value of $\lambda$, but does not satisfy the third equation. Further,
$x_1 = 0$ and $\lambda = 5$ imply $x_2 = \pm 1$, while $x_2 = 0$ and $\lambda = 2$ imply $x_1 = \pm 1$. In conclusion,
the triples $(x_1, x_2, \lambda)$ that satisfy the first-order condition (38.17) are
$$\{(0, 1, 5), (0, -1, 5), (1, 0, 2), (-1, 0, 2)\}$$
so that
$$S = \{(0, 1), (0, -1), (1, 0), (-1, 0)\}$$
This completes phase 4 of Lagrange's method.^3 Since $C \setminus D = \emptyset$ and $C \cap D_0 = \emptyset$, we
conclude that
$$S = S \cup (C \cap D_0) \cup (C \setminus D) \tag{38.27}$$
As in the last two examples, the first-order condition is necessary for any local solution of
the optimization problem (38.26).
Having completed Lagrange's method, let us turn to the elimination method to find the
global solutions. Since the set $C = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 = 1\}$ is compact and the function
$f$ is continuous, we can use this method through Weierstrass' Theorem. In view of (38.27),
in phase 6 we have
$$f(0, 1) = f(0, -1) = 5 > f(1, 0) = f(-1, 0) = 2$$
The points $(0, 1)$ and $(0, -1)$ are thus the (global) solutions of the optimization problem
(38.26), while the reliance here of the elimination method on Weierstrass' Theorem makes it
possible to say that the points $(1, 0)$ and $(-1, 0)$ are global minimizers. N
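
Because the constraint set here is the unit circle, the conclusion can be double-checked
numerically by parametrizing $C$ as $(\cos\theta, \sin\theta)$. A small sketch (the grid size is an
arbitrary choice):

import numpy as np

theta = np.linspace(0.0, 2.0 * np.pi, 100001)
x1, x2 = np.cos(theta), np.sin(theta)             # the constraint set, up to grid error
f = 2.0 * x1**2 + 5.0 * x2**2
print(f.max(), (x1[f.argmax()], x2[f.argmax()]))  # about 5, attained near (0, +-1)
print(f.min(), (x1[f.argmin()], x2[f.argmin()]))  # about 2, attained near (+-1, 0)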

The next example illustrates the importance of singular points.

Example 1713 The optimization problem
$$\max_{x_1, x_2} e^{-x_1} \quad \text{sub} \quad x_1^3 - x_2^2 = 0 \tag{38.28}$$
is of the form (38.4), with $f, g: \mathbb{R}^2 \to \mathbb{R}$ given by $f(x) = e^{-x_1}$ and $g(x) = x_1^3 - x_2^2$, and
$b = 0$. We have $D = \mathbb{R}^2$, hence $C \setminus D = \emptyset$. Phases 1 and 2 of Lagrange's method are thus
completed.
Moreover, we have
$$\nabla g(x) = \left(3x_1^2, -2x_2\right)$$
so the origin $(0, 0)$ is the unique singular point, and it also satisfies the constraint, i.e.,
$D_0 = C \cap D_0 = \{(0, 0)\}$. This completes phase 3 of Lagrange's method.
The Lagrangian function $L: \mathbb{R}^3 \to \mathbb{R}$ is given by
$$L(x_1, x_2, \lambda) = e^{-x_1} + \lambda\left(x_2^2 - x_1^3\right)$$
^{3} Note that there are no other points that satisfy $\nabla L = 0$. Indeed, suppose that $\nabla L(\hat{x}_1, \hat{x}_2, \hat{\lambda}) = 0$, with
$\hat{x}_1 \neq 0$ and $\hat{x}_2 \neq 0$. Then from $\partial L/\partial x_1 = 0$ we deduce $\lambda = 2$, whereas from $\partial L/\partial x_2 = 0$ we deduce $\lambda = 5$.

To find the set of its stationary points, we need to solve the first-order condition (38.17),
given here by the following (nonlinear) system of three equations:
$$\begin{cases} \dfrac{\partial L}{\partial x_1} = -e^{-x_1} - 3\lambda x_1^2 = 0 \\[1mm] \dfrac{\partial L}{\partial x_2} = 2\lambda x_2 = 0 \\[1mm] \dfrac{\partial L}{\partial \lambda} = x_2^2 - x_1^3 = 0 \end{cases}$$
Note that no solution can have $\lambda = 0$. Indeed, for $\lambda = 0$ the first equation becomes
$-e^{-x_1} = 0$, which has no solution. Let us suppose, therefore, $\lambda \neq 0$. The second equation
implies $x_2 = 0$, hence from the third one it follows that $x_1 = 0$. The first equation then
becomes $-1 = 0$, and this contradiction shows that the system has no solutions. Therefore,
there are no points that satisfy the first-order condition (38.17), so $S = \emptyset$. Phase 4 of
Lagrange's method shows that
$$S \cup (C \cap D_0) \cup (C \setminus D) = C \cap D_0 = \{(0, 0)\} \tag{38.29}$$
By Lagrange's method, the unique possible local solution of the optimization problem (38.28)
is the origin $(0, 0)$.
Turn now to the elimination method. To use it we need to show that the continuous $f$ is
coercive on the (non-compact, being closed but unbounded) set $C = \left\{(x_1, x_2) \in \mathbb{R}^2 : x_1^3 = x_2^2\right\}$.
Note that
$$(f \ge t) = \begin{cases} \mathbb{R}^2 & \text{if } t \le 0 \\ (-\infty, -\lg t] \times \mathbb{R} & \text{if } t \in (0, 1] \\ \emptyset & \text{if } t > 1 \end{cases}$$
Thus, $f$ is not coercive on the entire plane but it is coercive on $C$, which is all that matters
here. Indeed, note that $x_1$ can satisfy the constraint $x_1^3 = x_2^2$ only if $x_1 \ge 0$, so that
$C \subseteq \mathbb{R}_+ \times \mathbb{R}$ and
$$(f \ge t) \cap C \subseteq \left((-\infty, -\lg t] \times \mathbb{R}\right) \cap \left(\mathbb{R}_+ \times \mathbb{R}\right) = [0, -\lg t] \times \mathbb{R} \qquad \forall t \in (0, 1]$$
If $x_1 \in [0, -\lg t]$, the constraint implies $x_2^2 \in \left[0, (-\lg t)^3\right]$, i.e., $x_2 \in \left[-\sqrt{(-\lg t)^3}, \sqrt{(-\lg t)^3}\right]$. It
follows that
$$(f \ge t) \cap C \subseteq [0, -\lg t] \times \left[-\sqrt{(-\lg t)^3}, \sqrt{(-\lg t)^3}\right] \qquad \forall t \in (0, 1]$$
and so $(f \ge t) \cap C$ is compact because it is a closed subset of a compact set. We conclude
that $f$ is both continuous and coercive on $C$. We can thus use the elimination method.
In view of (38.29), it implies that the origin, a singular point, is the only solution of the
optimization problem (38.28). N

38.5 The consumer problem

Assume that a consumer problem satisfies Walras' law, so that we can write it as^4
$$\max_x u(x) \quad \text{sub} \quad x \in \Gamma(p, w) \cap A$$
where $\Gamma(p, w) = \{x \in \mathbb{R}^n_+ : p \cdot x = w\}$, with strictly positive prices $p \gg 0$. To best solve this
problem with the differential methods of this chapter, assume also that the utility function
$u: A \subseteq \mathbb{R}^n_+ \to \mathbb{R}$ is continuously differentiable on $\operatorname{int} A$.^5
For instance, consumer problems that satisfy such assumptions are those featuring
a log-linear utility function $u: \mathbb{R}^n_{++} \to \mathbb{R}$ defined by $u(x) = \sum_{i=1}^n a_i \log x_i$, with $A = \operatorname{int} A = \mathbb{R}^n_{++}$, or a separable utility function $u: \mathbb{R}^n_+ \to \mathbb{R}$ defined by $u(x) = \sum_{i=1}^n x_i$, with
$\operatorname{int} A = \mathbb{R}^n_{++} \subseteq A = \mathbb{R}^n_+$ (cf. Proposition 996).
^{4} [Authors' note] This section is to be revised following the change in the definition of the budget set.

Let us first find the local solutions of the consumer problem through Lagrange's method.
The function $g(x) = p \cdot x$ expresses the constraint, so
$$D = \mathbb{R}^n_+ \cap \operatorname{int} A \quad \text{and} \quad \Gamma(p, w) \setminus D = \partial A \cap \Gamma(p, w)$$
Hence, the set $\Gamma(p, w) \setminus D$ consists of the boundary points of $A$ that satisfy the constraint.^6
Note that when $A = \operatorname{int} A$, as in the log-linear case, we have $\Gamma(p, w) \setminus D = \emptyset$.
From
$$\nabla g(x) = p \qquad \forall x \in \mathbb{R}^n$$
it follows that there are no singular points, that is, $D_0 = \emptyset$. Hence,
$$\Gamma(p, w) \cap D_0 = \emptyset$$
All this completes phases 1 to 3 of Lagrange's method.


The Lagrangian function $L: A \times \mathbb{R} \to \mathbb{R}$ is given by
$$L(x, \lambda) = u(x) + \lambda(w - p \cdot x)$$
so to find the set of its stationary points, it is necessary to solve the first-order condition:
$$\begin{cases} \dfrac{\partial L}{\partial x_1}(x, \lambda) = \dfrac{\partial u(x)}{\partial x_1} - \lambda p_1 = 0 \\ \quad\vdots \\ \dfrac{\partial L}{\partial x_n}(x, \lambda) = \dfrac{\partial u(x)}{\partial x_n} - \lambda p_n = 0 \\[1mm] \dfrac{\partial L}{\partial \lambda}(x, \lambda) = w - p \cdot x = 0 \end{cases}$$
More compactly, we write
$$\frac{\partial u(x)}{\partial x_i} = \lambda p_i \qquad \forall i = 1, \dots, n \tag{38.30}$$
$$p \cdot x = w \tag{38.31}$$
The fundamental condition (38.30) is read in a different way according to the interpretation,
cardinalist or ordinalist, of the utility function. Let us suppose, for simplicity, that $\lambda \neq 0$.
In the cardinalist interpretation, the condition is recast in the equivalent form
$$\frac{\frac{\partial u(x)}{\partial x_1}}{p_1} = \dots = \frac{\frac{\partial u(x)}{\partial x_n}}{p_n}$$
which emphasizes that, at a bundle $x$ which is a (local) solution of the consumer problem,
the marginal utilities of the income spent on the various goods, measured by the ratios
$$\frac{\frac{\partial u(x)}{\partial x_i}}{p_i}$$
are all equal. Note that $1/p_i$ is the quantity of good $i$ that can be purchased with one unit
of income.
^{5} Note that $A \subseteq \mathbb{R}^n_+$ implies $\operatorname{int} A \subseteq \mathbb{R}^n_{++}$, i.e., the interior points of $A$ have strictly positive coordinates.
^{6} Here the choice set, $\Gamma(p, w) \cap A$, is by definition included in the domain $A$, so $\partial A \cap A \cap \Gamma(p, w) = \partial A \cap \Gamma(p, w)$.
In the ordinalist interpretation, where the notion of marginal utility becomes meaningless,
condition (38.30) is rewritten as
$$\frac{\frac{\partial u(x)}{\partial x_i}}{\frac{\partial u(x)}{\partial x_j}} = \frac{p_i}{p_j}$$
for every pair of goods $i$ and $j$ of the solution bundle $x$. At such a bundle, therefore,
the marginal rate of substitution between each pair of goods must be equal to the ratio
of their prices, that is, $MRS_{x_i, x_j} = p_i/p_j$. For $n = 2$ we have the classic geometric
interpretation of the optimality condition for a bundle $(x_1, x_2)$ as equality between the slope
of the indifference curve (in the sense of Section 34.3.2) and the slope of the budget line.

[Figure: an indifference curve tangent to the budget line at the optimal bundle, in the $(x_1, x_2)$ plane]

The ordinalist interpretation does not require the cardinalist notion of marginal utility, a
notion that, by Occam's razor, becomes thus superfluous for the study of the consumer
problem. The observation dates back to a classic 1900 work of Vilfredo Pareto and represented
a turning point in the history of utility theory, so much so that one talks of an "ordinalist
revolution".

In any case, relations (38.30) and (38.31) are first-order conditions for the consumer
problem and their resolution determines the set $S$ of the stationary points. In conclusion,
Lagrange's method implies that the local solutions of the consumer problem must be looked
for among the points of the set
$$S \cup (\partial A \cap \Gamma(p, w)) \tag{38.32}$$
Besides points that satisfy the first-order conditions (38.30) and (38.31), local solutions can
therefore be boundary points $\partial A \cap \Gamma(p, w)$ of the set $A$ that satisfy the constraint.^7

When $u$ is coercive and continuous on $\Gamma(p, w)$, we can apply the elimination method
to find the (global) solutions of the consumer problem, that is, the optimal bundles (which
are the economically meaningful notion: consumers do not care about bundles that are just
locally optimal). In view of (38.32), the solutions are the bundles $\hat{x} \in S \cup (\partial A \cap \Gamma(p, w))$
such that
$$u(\hat{x}) \ge u(x) \qquad \forall x \in S \cup (\partial A \cap \Gamma(p, w))$$
In other words, we have to compare the utility levels attained at the stationary points in $S$
and at the boundary points in $\partial A \cap \Gamma(p, w)$ that satisfy the constraint. As the comparison
requires the computation of all these utility levels, the smaller the set $S \cup (\partial A \cap \Gamma(p, w))$,
the more effective the elimination method.
Example 1714 Consider the log-linear utility function in the case $n = 2$, i.e.,
$$u(x_1, x_2) = a \log x_1 + (1 - a) \log x_2$$
with $a \in (0, 1)$. The first-order condition at each $(x_1, x_2) \in \mathbb{R}^2_{++}$ takes the form
$$\frac{a}{x_1} = \lambda p_1, \quad \frac{1 - a}{x_2} = \lambda p_2 \tag{38.33}$$
$$p_1 x_1 + p_2 x_2 = w \tag{38.34}$$
Relation (38.33) implies
$$\frac{a}{p_1 x_1} = \frac{1 - a}{p_2 x_2}$$
Substituting this into (38.34), we have
$$p_1 x_1 + \frac{1 - a}{a}\, p_1 x_1 = w$$
and hence
$$x_1 = a\, \frac{w}{p_1}, \quad x_2 = (1 - a)\, \frac{w}{p_2}$$
In conclusion,
$$S = \left\{\left(a\, \frac{w}{p_1}, (1 - a)\, \frac{w}{p_2}\right)\right\} \tag{38.35}$$
Since $\partial A = \emptyset$, we have $\partial A \cap \Gamma(p, w) = \emptyset$. By Lagrange's method, the unique possible local
solution of the consumer problem is the bundle
$$x = \left(a\, \frac{w}{p_1}, (1 - a)\, \frac{w}{p_2}\right) \tag{38.36}$$
We turn now to the elimination method, which we can use because the continuous function $u$
is, by Lemma 1050, coercive on the set $\Gamma(p, w) \cap A = \left\{x \in \mathbb{R}^2_{++} : p_1 x_1 + p_2 x_2 = w\right\}$, which
is not compact since it is not closed. In view of (38.35), the elimination method implies
that the bundle (38.36) is the unique solution of the log-linear consumer problem, that is,
the unique optimal bundle. Note that this finding confirms what we already proved and
discussed in Section 22.8, in a more general and elegant way through Jensen's inequality. N
^{7} When $A = \mathbb{R}^n_+$, they lie on the axes and are called corner solutions in the economics jargon (as remarked
earlier in the book). In the case $n = 2$ and $A = \mathbb{R}^2_+$, the corner solutions can be $(0, w/p_2)$ and $(w/p_1, 0)$.
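
The closed-form demand (38.36) just derived is easy to check numerically. A minimal sketch
(the values of a, the prices, and the income are illustrative choices) compares it with a
brute-force search along the budget line:

import numpy as np

a, p1, p2, w = 0.3, 2.0, 5.0, 100.0

# Closed-form optimal bundle (38.36)
x_star = np.array([a * w / p1, (1 - a) * w / p2])

# Brute force: evaluate utility along the budget line x2 = (w - p1 x1) / p2
x1 = np.linspace(1e-6, w / p1 - 1e-6, 200000)
x2 = (w - p1 * x1) / p2
u = a * np.log(x1) + (1 - a) * np.log(x2)
x_brute = np.array([x1[u.argmax()], x2[u.argmax()]])

print(x_star)    # [15. 14.]
print(x_brute)   # approximately the same bundle, up to grid error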

38.6 Cogito ergo solvo

The previous section shows the power of the elimination method: Lagrange's method
allowed us to find the unique candidate in $\mathbb{R}^2_{++}$ for a local solution of the consumer problem,
but it could tell us nothing either about its nature (whether a maximizer, a minimizer,
or something else) or about its uniqueness, a fundamental feature for an optimal bundle in
that it permits comparative statics exercises. The elimination method answers all these key
questions by showing that the unique local candidate is, indeed, the unique solution.
That said, the last example also shows the limitations of differential methods. Indeed,
as we remarked at the end of the example, in Section 22.8 we reached a more general
result without using such methods, via Jensen's inequality. The next example shows that
differential methods can actually turn out to be silly. They are not a deus ex machina that
one should always invoke automatically, without first thinking about the specific optimization
problem at hand, whose peculiar features may make it possible to address it with a
direct argument.

Example 1715 Consider the separable utility function $u: \mathbb{R}^2_+ \to \mathbb{R}$ given by $u(x) = x_1 + x_2$.
Suppose $p_1 \neq p_2$ (as is usually the case). First, observe that $C \setminus D = \{(0, w/p_2), (w/p_1, 0)\}$.
The first-order condition at each point $(x_1, x_2) \in \mathbb{R}^2_{++}$ becomes
$$1 = \lambda p_1, \quad 1 = \lambda p_2$$
$$p_1 x_1 + p_2 x_2 = w$$
which has no solutions since $p_1 \neq p_2$. Hence, $S = \emptyset$ and so
$$S \cup (C \setminus D) = C \setminus D = \left\{\left(0, \frac{w}{p_2}\right), \left(\frac{w}{p_1}, 0\right)\right\}$$
The unique possible local solutions of the consumer problem are, therefore, the boundary
bundles $(0, w/p_2)$ and $(w/p_1, 0)$. Since $u$ is continuous on the compact set $\Gamma(p, w) = \{x \in \mathbb{R}^2_+ : p_1 x_1 + p_2 x_2 = w\}$, we can apply the elimination method through Weierstrass' Theorem
and conclude that $(0, w/p_2)$ is the optimal bundle when $p_2 < p_1$ and $(w/p_1, 0)$ is the optimal
bundle when $p_2 > p_1$.
The same result can be achieved, however, in a straightforward manner without any
differential machinery. Indeed, if we substitute the constraint into the objective function, the
optimal $x_1$ (and so the optimal $x_2$, via the budget constraint) can be found by solving the
elementary optimization problem
$$\max_{x_1} (p_2 - p_1)\, x_1 \quad \text{sub} \quad x_1 \in \left[0, \frac{w}{p_1}\right]$$
It is immediate to check that there are two boundary solutions, $\hat{x}_1 = 0$ and $\hat{x}_1 = w/p_1$, when,
respectively, $p_1 > p_2$ and $p_1 < p_2$. This shows how silly a mechanical use of differential
arguments can be. N

38.7 Several constraints

Consider now the general optimization problem (38.2), in which there may be multiple
equality constraints. In this section we show that Lemma 1707 and Lagrange's Theorem can
be easily generalized to this case.
Let us write problem (38.2) as
$$\max_x f(x) \quad \text{sub} \quad g(x) = b \tag{38.37}$$
where $g = (g_1, \dots, g_m): A \subseteq \mathbb{R}^n \to \mathbb{R}^m$ and $b = (b_1, \dots, b_m) \in \mathbb{R}^m$. All the functions $f$ and
$g_i$ are assumed to be continuously differentiable on a non-empty open subset $D \subseteq A$. Thus,
at all points $x \in D$ we can define the Jacobian matrix $Dg(x)$ by
$$Dg(x) = \begin{bmatrix} \nabla g_1(x) \\ \nabla g_2(x) \\ \vdots \\ \nabla g_m(x) \end{bmatrix}$$
A point $x \in D$ is called regular (with respect to the constraints) if $Dg(x)$ has full rank,
and singular otherwise. For instance, the Jacobian $Dg(\hat{x})$ has full rank if the gradients
$\nabla g_1(\hat{x}), \dots, \nabla g_m(\hat{x})$ are linearly independent vectors of $\mathbb{R}^n$. In such a case, the full rank
condition requires $m \le n$, that is, that the number $m$ of constraints be no larger than the
dimension $n$ of the space.

Two observations about regularity: (i) when $m = n$, the Jacobian has full rank if and
only if it is a non-singular square matrix, that is, $\det Dg(x) \neq 0$;^8 (ii) when $m = 1$, we have
$Dg(x) = \nabla g(x)$ and so the full rank condition amounts to requiring $\nabla g(x) \neq 0$, which brings
us back to the notions of regular and singular points seen in the case $m = 1$ of a single
constraint.

The following result extends Lemma 1707 to the case of multiple constraints and shows
that the regularity condition $\nabla g(\hat{x}) \neq 0$ of that lemma generalizes to the requirement
that the Jacobian $Dg(\hat{x})$ have full rank. In other words, $\hat{x}$ must not be a singular point here
either.^9

Lemma 1716 Let $\hat{x} \in C \cap D$ be a local solution of the optimization problem (38.37). If
$Dg(\hat{x})$ has full rank, then there is a vector $\hat{\lambda} \in \mathbb{R}^m$ such that
$$\nabla f(\hat{x}) = \sum_{i=1}^m \hat{\lambda}_i \nabla g_i(\hat{x}) \tag{38.38}$$

The Lagrangian is now the function $L: A \times \mathbb{R}^m \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ defined by
$$L(x, \lambda) = f(x) + \sum_{i=1}^m \lambda_i (b_i - g_i(x)) = f(x) + \lambda \cdot (b - g(x)) \tag{38.39}$$
for every $(x, \lambda) \in A \times \mathbb{R}^m$, and Lagrange's Theorem takes the following general form.
^{8} So, in this case a point $x$ is singular if its Jacobian matrix $Dg(x)$ is a singular matrix. The notion of
singular point is thus consistent with the notion of singular matrix (Section 15.6.6).
^{9} We omit the proof, which generalizes that of Lemma 1707 by means of a suitable version of the Implicit
Function Theorem. We then also omit the simple proof of Theorem 1717, which is similar to that of the
special case of a single constraint.

Theorem 1717 (Lagrange) Let x ^ 2 C \ D be a solution of the optimization problem


(38.37). If Dg (^x) has full rank, there is a vector ^ 2 Rm such that the pair (^
x; ^ ) 2 Rn+m
is a stationary point for the Lagrangian.

The components ^ i of vector ^ 2 Rm are called Lagrange multipliers. Vector ^ is unique


x)gm
whenever the vectors frgi (^ i=1Pare linearly independent because, in such a case, there is
x) = m
a unique representation rf (^ ^
i=1 i rgi (^x).

The comments that we made for Lagrange's Theorem also hold in this more general case.
In particular, the search for local candidate solutions for the constrained problem must still
be conducted following Lagrange's method, while the elimination method can be still used
to check whether such local candidates actually solve the optimum problem. The examples
will momentarily illustrate all this.
From an operational standpoint note that, however, the rst-order condition

rL (x; ) = 0

is now based on a Lagrangian L that has the more complex form (38.39). Also the form of
the set of singular points D0 is more complex: the study of the Jacobian's determinant may
be complex, thus making the search for singular points quite hard. The best thing is often
to directly look for the singular points which satisfy the constraints { i.e., for the set C \ D0
{ instead of trying to determine the set D0 rst and the intersection C \ D0 afterwards (as
we did in the case with one constraint). The points x 2 C \ D0 are such that gi (x) = bi and
the gradients rgi (x) are linearly dependent. So, we must verify whether the system
8 Pm
>
> i=1 i rgi (x) = 0
>
>
>
> g1 (x) = b1
>
<
>
>
>
>
>
>
>
:
gm (x) = bm

admits solutions (x; ) 2 Rn Rm with = ( 1 ; :::; m ) 6= 0, that is, with i that are not
all null. Such possible solutions identify the singular points that satisfy the constraints. To
ease calculations, it is useful to note that the system can be written as
8 Pm @gi (x)
>
> i=1 i @x1 = 0
>
>
>
>
>
>
>
>
>
>
>
>
>
< Pm
> @gi (x)
i=1 i @xn = 0
(38.40)
>
> g (x) = b
>
> 1 1
>
>
>
>
>
>
>
>
>
>
>
>
:
gm (x) = bm
Example 1718 The optimization problem
$$\max_{x_1, x_2, x_3} 7x_1 - 3x_3 \quad \text{sub} \quad x_1^2 + x_2^2 = 1 \text{ and } x_1 + x_2 - x_3 = 1 \tag{38.41}$$
has the form (38.37), where $f: \mathbb{R}^3 \to \mathbb{R}$ and $g = (g_1, g_2): \mathbb{R}^3 \to \mathbb{R}^2$ are given by $f(x) = 7x_1 - 3x_3$, $g_1(x_1, x_2, x_3) = x_1^2 + x_2^2$ and $g_2(x_1, x_2, x_3) = x_1 + x_2 - x_3$, while $b = (1, 1) \in \mathbb{R}^2$.
These functions are all continuously differentiable on $\mathbb{R}^3$, so $D = \mathbb{R}^3$. Hence, $C \setminus D = \emptyset$:
at all points of the constraint set, the functions $f$, $g_1$ and $g_2$ are all continuously differentiable.
This completes phases 1 and 2 of Lagrange's method.
Let us find the singular points satisfying the constraints, that is, the set $C \cap D_0$. The
system (38.40) becomes
$$\begin{cases} 2\lambda_1 x_1 + \lambda_2 = 0 \\ 2\lambda_1 x_2 + \lambda_2 = 0 \\ -\lambda_2 = 0 \\ x_1^2 + x_2^2 = 1 \\ x_1 + x_2 - x_3 = 1 \end{cases}$$
Since $\lambda_2 = 0$, $\lambda_1$ must be different from $0$. This implies $x_1 = x_2 = 0$, thus contradicting the
fourth equation. Therefore, there are no singular points satisfying the constraints, that is,
$C \cap D_0 = \emptyset$. Phase 3 of Lagrange's method is thus completed.
The Lagrangian $L: \mathbb{R}^5 \to \mathbb{R}$ is
$$L(x_1, x_2, x_3, \lambda_1, \lambda_2) = 7x_1 - 3x_3 + \lambda_1\left(1 - x_1^2 - x_2^2\right) + \lambda_2\left(1 - x_1 - x_2 + x_3\right)$$
To find the set of its stationary points we must solve the first-order condition (38.17), which
is given by the following (nonlinear) system of five equations
$$\begin{cases} \dfrac{\partial L}{\partial x_1} = 7 - 2\lambda_1 x_1 - \lambda_2 = 0 \\[1mm] \dfrac{\partial L}{\partial x_2} = -2\lambda_1 x_2 - \lambda_2 = 0 \\[1mm] \dfrac{\partial L}{\partial x_3} = -3 + \lambda_2 = 0 \\[1mm] \dfrac{\partial L}{\partial \lambda_1} = 1 - x_1^2 - x_2^2 = 0 \\[1mm] \dfrac{\partial L}{\partial \lambda_2} = 1 - x_1 - x_2 + x_3 = 0 \end{cases}$$
in the five unknowns $x_1$, $x_2$, $x_3$, $\lambda_1$ and $\lambda_2$. The third equation implies $\lambda_2 = 3$, so the first
equation implies that $\lambda_1 \neq 0$. Therefore, from the first two equations it follows that $x_1 = 2/\lambda_1$
and $x_2 = -3/(2\lambda_1)$. By substituting into the fourth equation we get $\lambda_1 = \pm 5/2$. If
$\lambda_1 = 5/2$, we have $x_1 = 4/5$, $x_2 = -3/5$, $x_3 = -4/5$. If $\lambda_1 = -5/2$, we have $x_1 = -4/5$,
$x_2 = 3/5$, and $x_3 = -6/5$. We have thus found the two stationary points of the Lagrangian
$$\left(\frac{4}{5}, -\frac{3}{5}, -\frac{4}{5}, \frac{5}{2}, 3\right), \quad \left(-\frac{4}{5}, \frac{3}{5}, -\frac{6}{5}, -\frac{5}{2}, 3\right)$$
so that
$$S = \left\{\left(\frac{4}{5}, -\frac{3}{5}, -\frac{4}{5}\right), \left(-\frac{4}{5}, \frac{3}{5}, -\frac{6}{5}\right)\right\}$$
thus completing all phases of Lagrange's method. Since $C \setminus D = \emptyset$ and $C \cap D_0 = \emptyset$, we
conclude that
$$S \cup (C \cap D_0) \cup (C \setminus D) = S \tag{38.42}$$
thus proving that in this example the first-order condition (38.17) is necessary for any local
solution of the optimization problem (38.41).
We now turn to the elimination method. Clearly, the set
$$C = \left\{x = (x_1, x_2, x_3) \in \mathbb{R}^3 : x_1^2 + x_2^2 = 1 \text{ and } x_1 + x_2 - x_3 = 1\right\}$$
is closed. It is also bounded (and so compact). Indeed, for the $x_1$ and $x_2$ such that $x_1^2 + x_2^2 = 1$
we have $x_1, x_2 \in [-1, 1]$, while for the $x_3$ such that $x_3 = x_1 + x_2 - 1$ and $x_1, x_2 \in [-1, 1]$ we
have $x_3 \in [-3, 1]$. It follows that $C \subseteq [-1, 1] \times [-1, 1] \times [-3, 1]$, and so $C$ is bounded. Since
$f$ is continuous, we can thus use the elimination method through Weierstrass' Theorem. In
view of (38.42), in the last phase of the elimination method we have
$$f\left(\frac{4}{5}, -\frac{3}{5}, -\frac{4}{5}\right) = 8 \quad \text{and} \quad f\left(-\frac{4}{5}, \frac{3}{5}, -\frac{6}{5}\right) = -2$$
Hence, $(4/5, -3/5, -4/5)$ solves the optimization problem (38.41), while $(-4/5, 3/5, -6/5)$ is
a minimizer. N
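
Since the first-order system here is polynomial, its stationary points can also be found
symbolically. A small sketch using the sympy library (an illustrative check under that
assumption, not part of the book's method):

import sympy as sp

x1, x2, x3, l1, l2 = sp.symbols("x1 x2 x3 l1 l2", real=True)
L = 7*x1 - 3*x3 + l1*(1 - x1**2 - x2**2) + l2*(1 - x1 - x2 + x3)

eqs = [sp.diff(L, v) for v in (x1, x2, x3, l1, l2)]   # condition (38.17)
sols = sp.solve(eqs, [x1, x2, x3, l1, l2], dict=True)
for s in sols:
    print(s, "f =", 7*s[x1] - 3*s[x3])
# Two stationary points: (4/5, -3/5, -4/5) with f = 8 (the maximizer)
# and (-4/5, 3/5, -6/5) with f = -2 (the minimizer)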

Example 1719 The optimization problem
$$\max_{x_1, x_2, x_3} x_1 \quad \text{sub} \quad x_1^2 - x_2^3 = 0 \text{ and } x_3^2 + x_2^2 - 2x_2 = 0 \tag{38.43}$$
also has the form (38.37), where $f: \mathbb{R}^3 \to \mathbb{R}$ and $g = (g_1, g_2): \mathbb{R}^3 \to \mathbb{R}^2$ are given by
$f(x) = x_1$, $g_1(x_1, x_2, x_3) = x_1^2 - x_2^3$, $g_2(x_1, x_2, x_3) = x_3^2 + x_2^2 - 2x_2$, while $b = (0, 0) \in \mathbb{R}^2$.
As before, these functions are all continuously differentiable on $\mathbb{R}^3$, so $D = \mathbb{R}^3$. Therefore,
$C \setminus D = \emptyset$: at all points of the constraint set, the functions $f$, $g_1$ and $g_2$ are all continuously
differentiable. This completes phases 1 and 2 of Lagrange's method.
Let us find the set $C \cap D_0$ of the singular points satisfying the constraints. The system
(38.40) becomes
$$\begin{cases} 2\lambda_1 x_1 = 0 \\ -3\lambda_1 x_2^2 + \lambda_2 (2x_2 - 2) = 0 \\ 2\lambda_2 x_3 = 0 \\ x_1^2 - x_2^3 = 0 \\ x_3^2 + x_2^2 - 2x_2 = 0 \end{cases}$$
In light of the first and the third equations, we must consider three cases:

(i) $\lambda_1 = 0$, $x_3 = 0$ and $\lambda_2 \neq 0$: in this case the second equation implies $x_2 = 1$, which
contradicts the last equation.

(ii) $\lambda_2 = 0$, $x_1 = 0$ and $\lambda_1 \neq 0$: in this case we obtain the solution $x_1 = x_2 = x_3 = 0$.

(iii) $x_1 = x_3 = 0$: here as well we obtain the solution $x_1 = x_2 = x_3 = 0$.

In conclusion, the origin $(0, 0, 0)$ is the unique singular point that satisfies the constraints,
so $C \cap D_0 = \{(0, 0, 0)\}$. This completes phase 3 of Lagrange's method.
The Lagrangian $L: \mathbb{R}^5 \to \mathbb{R}$ is given by
$$L(x_1, x_2, x_3, \lambda_1, \lambda_2) = x_1 + \lambda_1\left(x_2^3 - x_1^2\right) + \lambda_2\left(2x_2 - x_2^2 - x_3^2\right)$$
The first-order condition (38.17) is given by the following (nonlinear) system of five equations
$$\begin{cases} \dfrac{\partial L}{\partial x_1} = 1 - 2\lambda_1 x_1 = 0 \\[1mm] \dfrac{\partial L}{\partial x_2} = 3\lambda_1 x_2^2 - 2\lambda_2 (x_2 - 1) = 0 \\[1mm] \dfrac{\partial L}{\partial x_3} = -2\lambda_2 x_3 = 0 \\[1mm] \dfrac{\partial L}{\partial \lambda_1} = x_2^3 - x_1^2 = 0 \\[1mm] \dfrac{\partial L}{\partial \lambda_2} = 2x_2 - x_2^2 - x_3^2 = 0 \end{cases}$$
in the five unknowns $x_1$, $x_2$, $x_3$, $\lambda_1$ and $\lambda_2$. The first equation implies that $\lambda_1 \neq 0$ and
$x_1 \neq 0$. From the fourth equation it follows that $x_2 \neq 0$ and so, from the second equation,
we have $\lambda_2 \neq 0$.
Since $\lambda_2 \neq 0$, from the third equation we have $x_3 = 0$, so that the fifth equation implies
that $x_2 = 0$ or $x_2 = 2$. Since $x_2 = 0$ contradicts what we have just established, let us take
$x_2 = 2$. The fourth equation then implies $x_1 = \pm\sqrt{8}$. For $x_1 = \sqrt{8}$, the first equation
implies $\lambda_1 = 1/(4\sqrt{2})$, so that from the second equation we get $\lambda_2 = 3/(2\sqrt{2})$; for
$x_1 = -\sqrt{8}$ we get the opposite signs. In conclusion, the stationary points of the Lagrangian
are
$$\left(\sqrt{8}, 2, 0, \frac{1}{4\sqrt{2}}, \frac{3}{2\sqrt{2}}\right), \quad \left(-\sqrt{8}, 2, 0, -\frac{1}{4\sqrt{2}}, -\frac{3}{2\sqrt{2}}\right)$$
and so
$$S = \left\{\left(\sqrt{8}, 2, 0\right), \left(-\sqrt{8}, 2, 0\right)\right\}$$
which completes all phases of Lagrange's method. In conclusion, since $C \setminus D = \emptyset$ we have
$$S \cup (C \cap D_0) \cup (C \setminus D) = S \cup (C \cap D_0) = \left\{\left(\sqrt{8}, 2, 0\right), \left(-\sqrt{8}, 2, 0\right), (0, 0, 0)\right\} \tag{38.44}$$
Among these three points one must search for the possible local solutions of the optimization
problem (38.43).
As to the elimination method, here too the set
$$C = \left\{x = (x_1, x_2, x_3) \in \mathbb{R}^3 : x_2^3 = x_1^2 \text{ and } x_3^2 + x_2^2 = 2x_2\right\}$$
is clearly closed. It is also bounded (and so compact). In fact, the second constraint can be
written as $x_3^2 + (x_2 - 1)^2 = 1$, and so the $x_2$ and $x_3$ that satisfy it are such that $x_2 \in [0, 2]$
and $x_3 \in [-1, 1]$. Now, the constraint $x_2^3 = x_1^2$ implies $x_1^2 \in [0, 8]$, and so $x_1 \in [-\sqrt{8}, \sqrt{8}]$.
We conclude that $C \subseteq [-\sqrt{8}, \sqrt{8}] \times [0, 2] \times [-1, 1]$ and so $C$ is bounded. As in the previous
example, we can thus use the elimination method through Weierstrass' Theorem. In view of
(38.44), in the last phase of the elimination method we have
$$f\left(\sqrt{8}, 2, 0\right) = \sqrt{8}, \quad f(0, 0, 0) = 0, \quad f\left(-\sqrt{8}, 2, 0\right) = -\sqrt{8}$$
Hence, $(\sqrt{8}, 2, 0)$ solves the optimization problem (38.43), while $(-\sqrt{8}, 2, 0)$ is a minimizer;
the singular point $(0, 0, 0)$, though a necessary candidate, turns out to be neither.
N

Example 1720 The optimization problem
$$\max_{x_1, x_2, x_3} -\left(x_1^2 + x_2^2 + x_3^2\right) \quad \text{sub} \quad x_1^2 - x_2 = 1 \text{ and } x_1 + x_3 = 1 \tag{38.45}$$
has the form (38.37), where $f: \mathbb{R}^3 \to \mathbb{R}$ and $g = (g_1, g_2): \mathbb{R}^3 \to \mathbb{R}^2$ are given by
$f(x_1, x_2, x_3) = -(x_1^2 + x_2^2 + x_3^2)$, $g_1(x_1, x_2, x_3) = x_1^2 - x_2$ and $g_2(x_1, x_2, x_3) = x_1 + x_3$,
while $b = (1, 1) \in \mathbb{R}^2$.
As in the previous examples, all these functions are continuously differentiable on $\mathbb{R}^3$, so
$D = \mathbb{R}^3$. Therefore $C \setminus D = \emptyset$, which completes phases 1 and 2 of Lagrange's method.
In this case we will directly study the rank of the Jacobian
$$Dg(x) = \begin{bmatrix} 2x_1 & -1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$$
It is easy to see that for no value of $x_1$ are the two row vectors, that is, the two gradients
$\nabla g_1(x)$ and $\nabla g_2(x)$, linearly dependent.^{10} Therefore, there are no singular points, that
is, $D_0 = \emptyset$. It follows that $C \cap D_0 = \emptyset$, and so we have concluded phase 3 of Lagrange's
method.
Let us now move to the search for the stationary points of the Lagrangian $L: \mathbb{R}^5 \to \mathbb{R}$,
which is given by
$$L(x_1, x_2, x_3, \lambda_1, \lambda_2) = -\left(x_1^2 + x_2^2 + x_3^2\right) + \lambda_1\left(1 - x_1^2 + x_2\right) + \lambda_2\left(1 - x_1 - x_3\right)$$
To find such points we must solve the following (nonlinear) system of five equations
$$\begin{cases} \dfrac{\partial L}{\partial x_1} = -2x_1 - 2\lambda_1 x_1 - \lambda_2 = 0 \\[1mm] \dfrac{\partial L}{\partial x_2} = -2x_2 + \lambda_1 = 0 \\[1mm] \dfrac{\partial L}{\partial x_3} = -2x_3 - \lambda_2 = 0 \\[1mm] \dfrac{\partial L}{\partial \lambda_1} = 1 - x_1^2 + x_2 = 0 \\[1mm] \dfrac{\partial L}{\partial \lambda_2} = 1 - x_1 - x_3 = 0 \end{cases}$$
We have $\lambda_1 = 2x_2$ and $\lambda_2 = -2x_3$ which, substituted into the first equation, lead to
the following (nonlinear) system of three equations:
$$\begin{cases} x_1 + 2x_1 x_2 - x_3 = 0 \\ 1 - x_1^2 + x_2 = 0 \\ 1 - x_1 - x_3 = 0 \end{cases}$$
From the last two equations it follows that $x_2 = x_1^2 - 1$ and $x_3 = 1 - x_1$ which, substituted
into the first equation, imply $2x_1^3 - 1 = 0$, from which $x_1 = 1/\sqrt[3]{2}$ follows and so
$$x_2 = \frac{1}{\sqrt[3]{4}} - 1 \quad \text{and} \quad x_3 = 1 - \frac{1}{\sqrt[3]{2}}$$
Therefore, the Lagrangian has a unique stationary point
$$\left(\frac{1}{\sqrt[3]{2}}, \frac{1}{\sqrt[3]{4}} - 1, 1 - \frac{1}{\sqrt[3]{2}}, \frac{2}{\sqrt[3]{4}} - 2, \frac{2}{\sqrt[3]{2}} - 2\right)$$
so that
$$S = \left\{\left(\frac{1}{\sqrt[3]{2}}, \frac{1}{\sqrt[3]{4}} - 1, 1 - \frac{1}{\sqrt[3]{2}}\right)\right\}$$
This completes all phases of Lagrange's method. In conclusion, since $C \setminus D = \emptyset$ and
$D_0 = \emptyset$ we have
$$S \cup (C \cap D_0) \cup (C \setminus D) = S = \left\{\left(\frac{1}{\sqrt[3]{2}}, \frac{1}{\sqrt[3]{4}} - 1, 1 - \frac{1}{\sqrt[3]{2}}\right)\right\} \tag{38.46}$$
There is thus a unique candidate local solution of the optimization problem (38.45).
Let us consider the elimination method. The set
$$C = \left\{x = (x_1, x_2, x_3) \in \mathbb{R}^3 : x_1^2 - x_2 = 1 \text{ and } x_1 + x_3 = 1\right\}$$
is closed but not bounded (so it is not compact). In fact, consider the sequence $\{x_n\}$ given
by $x_n = \left(\sqrt{1 + n}, n, 1 - \sqrt{1 + n}\right)$. The sequence belongs to $C$, but $\|x_n\| \to +\infty$ and so no
bounded subset of $\mathbb{R}^3$ contains it. On the other hand, by Proposition 1019 the
function $f$ is coercive and continuous on $C$. As in the last two examples, we can thus use the
elimination method, but this time via Tonelli's Theorem. In view of (38.46), the elimination
method implies that the point
$$\left(\frac{1}{\sqrt[3]{2}}, \frac{1}{\sqrt[3]{4}} - 1, 1 - \frac{1}{\sqrt[3]{2}}\right)$$
is the solution of the optimization problem (38.45). In this case the elimination method
is silent about possible minimizers because it relies on Tonelli's Theorem rather than on
Weierstrass' Theorem. N
^{10} At a "mechanical" level, one can easily verify that no value of $x_1$ makes the matrix $Dg(x)$ fail to have
full rank.
Chapter 39

Inequality constraints

39.1 Introduction

Let us go back to the consumer problem seen at the beginning of the previous chapter, in
which we considered a consumer with utility function $u: A \subseteq \mathbb{R}^n \to \mathbb{R}$ and income $w \ge 0$.
Given the vector $p \in \mathbb{R}^n_+$ of the prices of the goods, because of Walras' law we wrote his
budget constraint as
$$C(p, w) = \{x \in A : p \cdot x = w\}$$
and his optimization problem as:
$$\max_x u(x) \quad \text{sub} \quad x \in C(p, w) \tag{39.1}$$
In this formulation we assumed that the consumer exhausts his budget (hence the equality in
the budget constraint) and we did not impose other constraints on the bundle $x$ except that
of satisfying the budget constraint. However, the hypothesis that income is entirely spent
may be too strong, so one may wonder what happens to the consumer optimization problem
if we weaken the constraint to $p \cdot x \le w$, that is, if the constraint is given by an inequality
and no longer by an equality.
As to the bundles of goods $x$, in many cases it is meaningless to talk of negative quantities.
Think, for example, of the purchase of physical goods, say fruit or vegetables in an open air
market, in which the quantity purchased has to be positive. This suggests imposing the
positivity constraint $x \ge 0$ in the optimization problem.
Keeping these observations in mind, the consumer problem becomes:
$$\max_x u(x) \quad \text{sub} \quad p \cdot x \le w \text{ and } x \ge 0 \tag{39.2}$$
with constraints now given by inequalities. If we write the budget constraint as
$$C(p, w) = \{x \in A : x \ge 0 \text{ and } p \cdot x \le w\} \tag{39.3}$$
the optimization problem still takes the form (39.1), but the budget constraint $C(p, w)$ is
now different.

The general form of an optimization problem with both equality and inequality constraints
is:
$$\max_x f(x) \quad \text{sub} \quad g_i(x) = b_i \;\; \forall i \in I, \quad h_j(x) \le c_j \;\; \forall j \in J \tag{39.4}$$
where $I$ and $J$ are finite sets of indexes (possibly empty), $f: A \subseteq \mathbb{R}^n \to \mathbb{R}$ is the objective
function, the functions $g_i: A \subseteq \mathbb{R}^n \to \mathbb{R}$ and the associated scalars $b_i$ characterize $|I|$
equality constraints, while the functions $h_j: A \subseteq \mathbb{R}^n \to \mathbb{R}$ with the associated scalars $c_j$
induce $|J|$ inequality constraints. We continue to assume, as in the previous chapter, that
the functions $f$, $g_i$ and $h_j$ are continuously differentiable on a non-empty and open subset
$D$ of their domain $A$.
The optimization problem (39.4) can be equivalently formulated in canonical form as
$$\max_x f(x) \quad \text{sub} \quad x \in C$$
where the choice set is
$$C = \{x \in A : g_i(x) = b_i \text{ and } h_j(x) \le c_j \quad \forall i \in I, \forall j \in J\} \tag{39.5}$$
The formulation (39.4) is extremely flexible. It encompasses the optimization problem
with only equality constraints, which is the special case $I \neq \emptyset$ and $J = \emptyset$. It reduces to
an unconstrained optimization problem when $I = J = \emptyset$ and $A$ is open. Moreover, observe
that:

(i) a constraint of the form $h(x) \ge c$ can be included in the formulation (39.4) by
considering $-h(x) \le -c$. In particular, the constraint $x \ge 0$ can be included by considering
$-x \le 0$;

(ii) a constrained minimization problem for $f$ can be written in the formulation (39.4) by
considering $-f$.

These two observations show the scope and flexibility of formulation (39.4). In particular,
in light of (ii) it should be clear that the choice of the $\le$ sign in expressing the inequality
constraints is also just a convention. That said, we next give some discipline to this formulation.

Definition 1721 The problem (39.4) is said to be well posed if, for each $j \in J$, there exists
$x \in C$ such that $h_j(x) < c_j$.

To understand this definition, observe that an equality constraint $g(x) = b$ can be written
in the form of inequality constraints as $g(x) \le b$ and $-g(x) \le -b$. This blurs the
distinction between equality and inequality constraints in (39.4). To avoid this, and so
to have a clear distinction between the two types of constraints, in what follows we will
always consider optimization problems (39.4) that are well posed, so that it is not possible
to express equality constraints in the form of inequality constraints. Naturally, Definition
1721 is automatically satisfied when $J = \emptyset$, so that there are no inequality constraints to
worry about.

Example 1722 (i) The optimization problem
$$\max_{x_1, x_2, x_3} x_1^2 + x_2^2 + x_3^3 \quad \text{sub} \quad x_1 + x_2 - x_3 = 1 \text{ and } x_1^2 + x_2^2 \le 1$$
is of the form (39.4) with $|I| = |J| = 1$, $f(x) = x_1^2 + x_2^2 + x_3^3$, $g(x) = x_1 + x_2 - x_3$,
h (x) = x21 + x22 and b = c = 1.1 These functions are continuously di erentiable, so D = R3 .
Moreover, C = x 2 R3 : x1 + x2 x3 = 1 and x21 + x22 1
(ii) The optimization problem:

max x1
x1 ;x2 ;x3

sub x21 + x32 = 0 and x23 + x22 2x2 = 0

is of the form (39.4) with I = f1; 2g, J = ;, f (x) = x1 , g1 (x) = x21 + x32 , g2 (x) =
x23 + x22 2x2 and b1 = b2 = 0. These functions are continuously di erentiable, so D = R3 .
Moreover, C = x 2 R3 : x32 = x21 and x23 + x22 = 2x2
(iii) The optimization problem:

max ex1 +x2 +x3


x1 ;x2 ;x3
1 1
sub x1 + x2 + x3 = 1, x21 + x22 + x23 = , x1 0 and x2
2 10
is of the form (39.4) with I = J = f1; 2g ; f (x) = ex1 +x2 +x3 , g1 (x) = x1 + x2 + x3 ,
g2 (x) = x21 + x22 + x23 ; h1 (x) = x1 ; h2 (x) = x2 , b1 = 1, b2 = 1=2, c1 = 0 and c2 = 1=10.
These functions are continuously di erentiable, so D = R3 . Moreover,

1 1
C= x 2 R3 : x1 + x2 + x3 = 1, x21 + x22 + x23 = , x1 0 and x2
2 10

(iv) The optimization problem:

max x31 x32


x1 ;x2

sub x1 + x2 1 and x1 + x2 1

is of the form (39.4) with I = ;; J = f1; 2g ; f (x) = x31 x32 , h1 (x) = x1 + x2 , h2 (x) =
x2 + x1 and c1 = c2 = 1. These functions are continuously di erentiable, so D = R2 .
Moreover, C = x 2 R2 : x1 + x2 1 and x2 1 + x1
(v) The minimum problem:

min x1 + x2 + x3
x1 ;x2 ;x3
1
sub x1 + x2 = 1 and x22 + x23
2
1
To be pedantic, here we should have set I = J = f1g ; g1 (x) = x1 + x2 x3 , h1 (x) = x21 + x22 and
b1 = c1 = 1. But, in this case of a single equality constraint and of a single inequality constraint, the
subscripts just make the notation heavy.
1174 CHAPTER 39. INEQUALITY CONSTRAINTS

can be written in the form (39.4) as

max (x1 + x2 + x3 )
x1 ;x2 ;x3
1
sub x1 + x2 = 1 and x22 x23
2
N

O.R. An optimization problem with inequality constraints is often written as

max f (x) (39.6)


x
sub g1 (x) b1 ; g2 (x) b2 ; :::; gm (x) bm

where f : A Rn ! R is our objective function, while the functions gi : A Rn ! R and


the scalars bi 2 R induce m inequality constraints. As we already noted, this formulation
may include equality constraints g (x) = b via two inequality constraints g (x) b and
g (x) b. Note, however, that this formulation requires the presence of at least one
constraint (it is the case m = 1) and hence it is less general than (39.4). Moreover, the
indirect way in which (39.6) encompasses the equality constraints may make less transparent
the formulation of the results. This is a further reason why we chose the formulation (39.4)
in which the equality constraints are fully speci ed. H

39.2 Resolution of the problem


In this section we extend to the optimization problem (39.4) the solution methods studied
in the previous chapter for the special case with only equality constraints (38.2). To do this,
we rst need to nd the general version of Lemma 1716 that also holds for problem (39.4).
To this end, for a given point x 2 A, set

A (x) = I [ fj 2 J : hj (x) = cj g (39.7)

In words, A (x) is the set of the indices of the so-called binding constraints at x, that is, of
the constraints that hold as equalities at the given point x. For example, in the problem

max f (x1 ; x2 ; x3 )
x1 ;x2 ;x3

sub x1 + x2 x3 = 1 and x21 + x22 1

the rst constraint is binding at all the points


p ofpthep
choice set C, while the second constraint
is, for instance, binding at the point (1= 2; 1= 2; 2 1) and is not binding at the point
(1=2; 1=2; 0).2

De nition 1723 A point x 2 D is said to be regular (with respect to the constraints) if


the gradients rgi (x) and the gradients rhj (x), with j 2 A (x), are linearly independent.
Otherwise, it is singular.
2
p p p
So, A(1= 2; 1= 2; 2 1) = f1; 2g and A (1=2; 1=2; 0) = f1g.
39.2. RESOLUTION OF THE PROBLEM 1175

In other words, a point x 2 D is regular if the gradients of the functions that induce
constraints binding at such point are linearly independent. This condition generalizes the
notion of regularity upon which Lemma 1716 was based. Indeed, if we form the matrix whose
rows consist of the gradients of the functions that induce binding constraints at the point
considered, the regularity of the point amounts to require that such a matrix has full rank.
Note that in view of Corollary 94-(ii) a point is regular only if jA (x)j n, that is, only
if the number of the binding constraints at x does not exceed the dimension of the space on
which the optimization problem is de ned.

We can now state the generalization of Lemma 1716 for problem (39.4). In reading it,
note how the vector ^ associated to the inequality constraints has positive sign, while there
is no restriction on the sign of the vector ^ associated to the equality constraints.

Lemma 1724 Let x ^ 2 C \ D be a local solution of the optimization problem (39.4). If x


^ is
jJj
regular, then there exist a vector ^ 2 R and a vector ^ 2 R+ such that
jIj

X X
rf (^
x) = ^ i rgi (^
x) + ^ j rhj (^
x) (39.8)
i2I j2J

^ j (c hj (^
x)) = 0 8j 2 J (39.9)

By unzipping gradients, condition (39.8) can be equivalently written as

@f X @gi X @hj
(^
x) = ^i (^
x) + ^j (^
x) 8k = 1; :::; n
@xk @xk @xk
i2I j2J

This lemma generalizes both Fermat's Theorem and Lemma 1716. Indeed:

(i) if I = J = ;, condition (39.8) reduces to the condition rf (^


x) = 0 of Fermat's Theorem;
P
(ii) if I 6= ; and J = ;, condition (39.8) reduces to the condition rf (^x) = i2I ^ i rgi (^
x)
of Lemma 1716.

The novelty of Lemma 1724 relative to these previous results is, besides the positivity of
the vector ^ associated to the inequality constraints, the condition (39.9). To understand
the role of this condition, it is useful the following characterization.

Lemma 1725 Condition (39.9) holds if and only if ^ j = 0 for each j such that hj (^
x) < cj ,
that is, for each j 2
= A (^
x).

Proof Assume (39.9). Since for each j 2 J we have hj (^ x) cj , from the positive sign of ^
it follows that (39.9) implies cj hj (^ x) = 0 for each j 2 J, and therefore ^ j = 0 for each j
such that hj (^x) < cj . Conversely, if this last property holds we have

^ j (cj hj (^
x)) = 0 8j 2 J (39.10)

because, being hj (^
x) cj for each j 2 J, we have hj (^
x) < cj or hj (^
x) = cj . Condition
(39.10) immediately implies (39.9).
1176 CHAPTER 39. INEQUALITY CONSTRAINTS

In words, (39.9) is equivalent to require the nullity of each ^ j associated to a not binding
constraint. Hence, we can have ^ j > 0 only if the constraint j is binding in correspondence
of the solution x^.
For example, if x^ is such that hj (^
x) < cj for each j 2 J, i.e., if in correspondence of x ^
all the inequality constraints are not binding, then we have ^ j = 0 for each j 2 J and the
vector ^ does not play any role in the determination of x ^. Naturally, this re ects the fact
that for this solution x
^ the inequality constraints themselves do not play any role.

The next example shows that conditions (39.8) and (39.9) are necessary, but not su cient
(something not surprising since the same is true for Fermat's Theorem and for Lemma 1716).

Example 1726 Consider the optimization problem:

x31 + x32
max (39.11)
x1 ;x2 2
sub x1 x2 0

It is a simple modi cation of Example 1708, and has the form (39.4) with f; h : R2 ! R
given by f (x) = 2 1 (x31 + x32 ) and h (x) = x1 x2 , while c = 0. We have:

rf (0; 0) = (0; 0) and rg (0; 0) = (1; 1)

and

rf (0; 0) = rg (0; 0)
(0 0) = 0

The origin (0; 0) satis es with = 0 the conditions (39.8) and (39.9), but it is not solution
of the optimization problem (39.11), as (38.9) shows. N

We defer the proof of Lemma 1724 to the appendix.3 It is possible, however, to give a
heuristic proof of this lemma by reducing problem (39.4) to a problem with only equality
constraints, and then by exploiting the results seen in the previous chapter. For simplicity,
we give this argument for the special case

max f (x) (39.12)


x
sub g (x) = b and h (x) c

where f : A Rn ! R is the objective function, and g; h : A Rn ! R induce one equality


and one inequality constraint.
De ne H : A R Rn+1 ! R as H (x; z) = h (x) + z 2 for each x 2 A and each z 2 R.
Given x 2 A, we have h (x) c if and only if there exists z 2 R such that h (x) + z 2 = c,
i.e., if and only if H (x; z) = c.4
3
A noteworthy feature of this proof is that, for a change, it does not rely on the Implicit Function Theorem,
unlike the proof that we gave for Lemma 1707 (the special case of Lemma Lemma 1716 that we proved).
4
The positivity of the square z 2 preserves the inequality g (x) b. The auxiliary variable z is often called
slack variable.
39.2. RESOLUTION OF THE PROBLEM 1177

De ne F : A R Rn+1 ! R and G : A R Rn+1 ! R by F (x; z) = f (x) and


G (x; z) = g (x) for each x 2 A and each z 2 R. The dependence of F and G on z is only
ctitious, but it allows to formulate the following optimization problem:
max F (x; z) (39.13)
x;z

sub G (x; z) = b and H (x; z) = c


Problems (39.12) and (39.13) are equivalent: x ^ is solution of problem (39.12) if and only if
there exists z^ 2 R such that (^ x; z^) is solution of problem (39.13).
We have, therefore, reduced problem (39.12) to a problem with only equality constraints.
By Lemma 1716, (^ x; z^) is solution of such problem only if there exists a vector ( ^ ; ^ ) 2 R2
such that:
rF (^ x; z^) = ^ rG (^
x; z^) + ^ rH (^
x; z^)
that is, only if
@F ^ @G (^ @H
(^
x; z^) = x; z^) + ^ (^
x; z^) 8i = 1; :::; n
@xi @xi @xi
@F ^ @G (^ @H
(^
x; z^) = x; z^) + ^ (^
x; z^)
@z @z @z
which is equivalent to:
x) = ^ rg (^
rf (^ x) + ^ rh (^
x)
2^ z = 0
On the other hand, we have 2^ z = 0 if and only if ^ z 2 = 0. In view of the equivalence
between problems (39.12) and (39.13), we conclude that if x
^ is a solution of problem (39.12),
then there exists a vector ( ; ) 2 R2 such that:
x) = ^ rg (^
rf (^ x) + ^ rh (^
x)
^ (c h (x)) = 0
We therefore have conditions (39.8) and (39.9) of Lemma 1724. What we have not been
able to prove is the positivity of the multiplier , and for this reason the proof just seen is
incomplete.5

39.2.1 Kuhn-Tucker's Theorem


In view of Lemma 1724, the Lagrangian function associated to the optimization problem
(39.4) is the function
jJj
L : A RjIj R+ Rn+jIj+jJj ! R
de ned by:6
X X
L (x; ; ) = f (x) + i (bi gi (x)) + j (cj hj (x)) (39.14)
i2I j2J

= f (x) + (b g (x)) + (c h (x)) ;


5
Since it is, in any case, an heuristic argument, for simplicity we did not check the rank condition required
by Lemma 1716.
6
The notation (x; ; ) underlines the di erent status of x with respect to and .
1178 CHAPTER 39. INEQUALITY CONSTRAINTS

jJj
for each (x; ; ) 2 A RjIj R+ . Note that the vector is required to be positive.

The next famous result, proved in 1951 by Harold Kuhn and Albert Tucker, generalizes
Lagrange's Theorem to the optimization problem (39.4). We omit the proof because it is
analogous to that of Lagrange's Theorem.

Theorem 1727 (Kuhn-Tucker) Let x ^ 2 C \ D be a local solution of the optimization


jJj
problem (39.4). If x ^ is regular, then there exists a pair of vectors ( ^ ; ^ ) 2 RjIj R+ such
x; ^ ; ^ ) satis es the conditions:
that the triple (^

^; ^ ; ^ = 0
rLx x (39.15)

^ j rL j
^; ^ ; ^ = 0
x 8j 2 J (39.16)

rL ^; ^ ; ^ = 0
x (39.17)

rL ^; ^ ; ^
x 0 (39.18)

The components ^ i and ^ j of the vectors ^ and ^ are called Lagrange (or Kuhn-Tucker )
multipliers, while (39.15)-(39.18) are called Kuhn-Tucker conditions. The points x 2 A
jJj
for which there exists a pair ( ; ) 2 RjIj R+ such that the triple (x; ; ) satis es the
conditions (39.15)-(39.18) are called points of Kuhn-Tucker.
Kuhn-Tucker points are, therefore, the solutions of the { typically nonlinear { system of
equations and inequalities given by the Kuhn-Tucker conditions. By Kuhn-Tucker's Theo-
rem, a necessary condition for a regular point x to be solution of the optimization problem
(39.4) is that it is a point of Kuhn-Tucker.7 Observe, however, that a Kuhn-Tucker point
(x; ; ) is not necessarily a stationary point for the Lagrangian function: the condition
(39.18) only requires rL (x; ; ) 0, not the stronger property rL (x; ; ) = 0.

Let (x; ; ) be a Kuhn-Tucker point. By Lemma 1725, expression (39.16) is equivalent to


require j = 0 for each j such that hj (x) < cj . Hence, j > 0 implies that the correspondent
constraint is binding at the point x, that is, hj (x) = cj . Because of its importance, we state
formally this observation.

Proposition 1728 At a Kuhn-Tucker point (x; ; ), we have j > 0 only if hj (x) = cj .

Later in the book, in Section 41.7, we will present a marginal interpretation of the
multipliers ( ^ ; ^ ), along the lines sketched in the case of equality constraints (Section 38.3.3).

39.2.2 The method of elimination


Kuhn-Tucker's Theorem suggests a procedure to nd local solutions of the optimization prob-
lem (39.4) that generalizes Lagrange's method, as well as a generalization of the method of
elimination to nd its global solutions. For brevity, we directly consider this latter general-
ization.
7
Note the adjective \regular". Indeed, a point of Kuhn-Tucker which is not regular is outside the scope of
Kuhn-Tucker's Theorem.
39.2. RESOLUTION OF THE PROBLEM 1179

Let D0 be the set of the singular points x 2 D where the regularity condition of the
constraints does not hold, and let D1 be, instead, the set of the points x 2 A where this
condition holds. The method of elimination consists of four phases:

1. Verify if Tonelli's Theorem can be applied, that is, if f is continuous and coercive on
C;

2. determine the set D where the functions f and gi are continuously di erentiable;

3. determine the set C D of the points of the constraint set where the functions f and
g are not continuously di erentiable;

4. determine the set C \ D0 of the singular points that satisfy the constraints;

5. determine the set S of the regular Kuhn-Tucker points, i.e., the points x 2 C \(D D0 )
jJj
for which there exists a pair ( ; ) 2 RjIj R+ of Lagrange multipliers such that the
triple (x; ; ) satis es the Kuhn-Tucker conditions (39.15)-(39.18);8

6. determines the set ff (x) : x 2 S [ (C \ D0 )g; if x


^ 2 S [ (C \ D0 ) is such that

f (^
x) f (x) 8x 2 S [ (C \ D0 ) [ (C D)

then such x
^ is solution of the optimization problem (39.4).

The rst phase of the method of elimination is the same of the previous chapter, while
the other phases are the obvious extension of the method to the case of the problem (39.4).

Example 1729 The optimization problem:

max x1 2x22 (39.19)


x1 ;x2

sub x21 + x22 1

has the form (39.4), where f; h : R2 ! R are given by f (x1 ; x2 ) = x1 2x22 and h (x1 ; x2 ) =
x21 + x22 , while b = 1. Since C is compact, the rst phase is completed through Weierstrass'
Theorem.
The functions f and h are continuously di erentiable, so D = R2 and C D = ;. We
have rh (x) = (2x1 ; 2x2 ), so the constraint is regular at each point x 2 C, that is, C \D0 = ;.
This completes the rst four phases of the elimination method.
The Lagrangian function L : R3 ! R is given by

L (x1 ; x2 ; ) = x1 2x22 + 1 x21 x22


8
Note that S C because the Kuhn-Tucker conditions ensure, inter alia, that the Kuhn-Tucker points
satisfy the constraints is therefore not necessary to check if for a point x 2 S we have also x 2 C. A similar
observation was made in the previous chapter.
1180 CHAPTER 39. INEQUALITY CONSTRAINTS

and to nd the set S of its Kuhn-Tucker points it is necessary to solve the system
8
> @L
>
> @x1 = 1 2 x1 = 0
>
>
>
> @L
= 4x2 2 x2 = 0
>
>
< @x2
@L
@ = 1 x21 x22 = 0
>
>
>
> @L
>
> = 1 x21 x22 0
> @
>
>
: 0

We start by observing that 6= 0, that is, > 0. Indeed, if = 0 the rst equation
becomes 1 = 0, a contradiction. We therefore assume that > 0. The second equation
implies x2 = 0, and in turn the third equation implies x1 = 1. From the rst equation it
follows = (1=2), and hence the only solution of the system is ( 1; 0; (1=2)). The only
Kuhn-Tucker point is therefore ( 1; 0) , i.e., S = f( 1; 0)g.
In sum, since the sets C \ D0 and C D are both empty, we have

S [ (C \ D0 ) [ (C D) = S = f( 1; 0)g

The method of elimination allows us to conclude that ( 1; 0) is the only solution of the
optimization problem 39.19. Note that in this solution the constraint is binding (i.e., it is
satis ed with equality); indeed = (1=2) > 0, as required by Proposition 1728. N

Example 1730 The optimization problem


n
X
max x2i (39.20)
x1 ;:::;xn
i=1
n
X
sub xi = 1, x1 0, :::, xn 0
i=1
Pn
Pnis of the form (39.4), where f; g : Rn ! R are given by f (x) = 2
i=1 xi and g (x) =
n
i=1 xi , hj (x) : R ! R are given by h Pj (x) = xj for j = 1; :::; n; while b = 1 and cj = 0
for j = 1; :::; n. The set C = x 2 Rn+ : ni=1 xi = 1 is compact and so also in this case the
rst phase is completed thanks to Weierstrass' Theorem.
The functions f , g and hj are continuously di erentiable, so D = R2 and C D = ;. For
each x 2 Rn we have rg (x) = (1; :::; 1) and rhj (x) = ej . Therefore, the value of these
gradients does not depend on the point x considered. To verify regularity, we consider the
collection (1; :::; 1) ; e1 ; :::; en of these gradients. This collection has n + 1 elements and it
is obviously linearly dependent (the versors e1 ,..., en form the standard basis of Rn ).
On the other hand, it is immediate to see that any subcollection with at most n elements
is, instead, linearly independent. Hence, the only way to violate regularity is that they are
all binding, so that all collections of n + 1 elements have to be considered. Fortunately, there
are no points x 2 Rn where all constraints are binding. Indeed, the only point that satis es
with equality all thePconstraints xj 0 is the origin 0, which however does not satisfy the
equality constraint ni=1 xi = 1.
We conclude that all the points x 2 Rn are regular, i.e., D0 = ;. Hence, C \ D0 = ;.
This completes the rst four phases of the elimination method.
39.2. RESOLUTION OF THE PROBLEM 1181

The Lagrangian function L : R2n+1 ! R is given by


n n
! n
X X X
2
L (x1 ; x2 ; ) = xi + 1 xi + i xi 8 (x; ; ) 2 R2n+1
i=1 i=1 i=1

To nd the set S of its Kuhn-Tucker points, it is necessary to solve the system


8
@L
>
> @xi = 2xi + i=0 8i = 1; :::; n
>
>
>
> @L P n
>
> @ = (1 i=1 xi ) = 0
>
>
>
> P n
< @L
@ =1 i=1 xi = 0
>
>
@L
= i xi =0 8i = 1; :::; n
>
> i@ i
>
>
>
> @L
= xi 0 8i = 1; :::; n
>
> @ i
>
>
:
i 0 8i = 1; :::; n
If we multiply by xi the rst n equations, we get
2x2i xi + i xi =0 8i = 1; :::; n
Adding up these new equations, we have
n
X n
X n
X
2 x2i xi + i xi =0
i=1 i=1 i=1

Therefore,
n
X
2 x2i =0
i=1
Pn 2
that is, = 2 i=1 xi . We conclude that 0.
If xi = 0, from the condition @L=@xi = 0 it follows that = i . Since i 0 and 0,
it follows that i = 0. In turn, this implies = 0 and hence using again the condition
@L=@xi = 0 we P conclude that xi = = 0 for each i = 1; :::; n. But this contradicts the
n
condition (1 i=1 xi ) = 0, and we therefore conclude that xi 6= 0, that is, xi > 0.
Since this holds for each i = 1; :::; n, it follows that xi > 0 for each i = 1; :::; n. From
the condition i xi = 0 it follows that i = 0 for each i = 1; :::; n, and the rst n equations
become:
2xi =0 8i = 1; :::; n
P
that is, xi = =2 for each i = 1; :::; n. The xi are therefore all equal; from ni=1 xi = 1 it
follows that
1
xi = 8i = 1; :::; n
n
In conclusion,
1 1
S= ; :::;
n n
Since C D = ; and D0 = ;, we have
1 1
S [ (C \ D0 ) = ; :::;
n n
1182 CHAPTER 39. INEQUALITY CONSTRAINTS

The method of elimination allows us to conclude that the point (1=n; :::; 1=n) is the solution
of the optimization problem (39.20). N

39.3 Cogito et solvo


The result of the last example { i.e., that (1=n; :::; 1=n) is the optimal point { can be proved
in a much more general form through a simple application of Jensen's inequality, without
any use of di erentiable methods. Yet another proof that di erential methods might not be
\optimal" (cf. the discussion after Example 1714 in the previous chapter).

Proposition 1731 Let h : [0; 1] ! R be concave. The optimization problem


n
X
max h (xi ) (39.21)
x1 ;:::;xn
i=1
n
X
sub xi = 1, x1 0, :::, xn 0
i=1

has solution (1=n; :::; 1=n). It is the unique solution if h is strictly concave.
Pn
If h (xi ) = xi log xi , the function i=1 h (xi ) is called entropy (Examples 239 and 1685).
P
Proof Let x1 ; x2 ; :::; xn 2 [0; 1] with the constraint ni=1 xi = 1. Since h is concave, by the
Jensen's inequality we have
n n
!
X 1 1X 1
h (xi ) h xi = h
n n n
i=1 i=1

Namely,
n
X 1 1 1
h (xi ) nh =h + +h
n n n
i=1
P
This shows that (1=n; :::; 1=n) is a solution. Clearly, ni=1 h (xi ) is strictly concave if h is.
Hence, the uniqueness of the solution is ensured by Theorem 1032.

In this chapter we presented a structured approach to optimization problems with in-


equality constraints that extends those studied for equality constraints. But, sometimes
with some little thinking we can go beyond any \method," however powerful it might be.
The last proposition is an illustration of this remark. But, recall how we solved the con-
strained optimization problem with inequality constraints (37.14) in Example 1691 with a
basic application of the concave method without any Kuhn-Tucker reasoning.

39.4 Concave optimization


39.4.1 The problem
The remarkable optimality properties of concave functions make them of particular interest
when dealing with the optimization problem (39.4). We start with a simple, but important,
result.
39.4. CONCAVE OPTIMIZATION 1183

Proposition 1732 Let A be convex. If the functions gi are a ne for each i 2 I and the
functions hj are convex for each j 2 J, then the choice set C de ned in (39.5) is convex.

Proof Set Ci = fx 2 A : gi (x) = bi g for each i 2 I and Cj = fx 2 A : hj (x) cj g for each


j 2 J. Clearly, Cj is convex as the sublevel of a convex function (Proposition 839). A
similar argument shows that also each
T Ci is convex,
T and this implies the convexity of the set
C de ned in (39.5) because C = i2I Ci \ ( j2J Cj ).

It is easy to give examples where C is no longer convex when the conditions of convexity
and a nity used in this result are not satis ed. Note that the convexity condition of the
hj is much weaker than that of a nity on the gi . This shows that the convexity of the
choice set is more natural for inequality constraints than for equality ones. This is a crucial
\structural" di erence between the two types of constraints { which are more di erent than
it may appear prima facie.

Motivated by the last result, we give the following de nition.

De nition 1733 The optimization problem (39.4) is said to be concave if the objective
function f is concave, the functions gi are a ne and the functions hj are convex on the open
and convex set A.

A concave optimization problem has therefore the form

max f (x) (39.22)


x
sub gi (x) = bi 8i 2 I
hj (x) cj 8j 2 J

where I and J are nite sets of indexes (possibly empty), f : A Rn ! R is a concave


objective function, the a ne functions gi : A Rn ! R and the associated scalars bi
characterize jIj equality constraints, while the convex functions hj : A Rn ! R with the
associated scalars cj induce jJj inequality constraints. The convex domain A is assumed to
be open to best exploit the properties of concave functions.

We can represent the a ne functions gi as gi (x) = i x + qi (Proposition 820). Hence,


if is the jIj n matrix that has the vectors i 2 Rn as its rows, we can write the equality
constraints in the matrix form x + q = b, where b 2 RjIj . Often q = 0, so the equality
constraints take the simple matrix form

x=b (39.23)

In a similar vein, when also the functions hj happen to be a ne, say hj (x) = j x + qi ,
we can write also the inequality constraints in the matrix form Hx c, where H is the
jJj n matrix with rows j and c 2 RjJj . Thus, when all constraints are identi ed by a ne
functions, the choice set is a polyhedron C = fx 2 Rn : x = b and Hx cg. This case often
arises in applications. Indeed, if also the objective function is a ne, we are back to linear
programming, an important class of concave problem that we already studied via convexity
arguments (Section 22.7.2).
1184 CHAPTER 39. INEQUALITY CONSTRAINTS

39.4.2 Kuhn-Tucker points


Recall from Section 37.3 that the search for the solutions of an unconstrained optimization
problem for concave functions was based on a remarkable property: the rst-order necessary
condition for the existence of a local maximizer becomes su cient for the existence of a
global maximizer in the case of concave functions.
The next fundamental result is the \constrained" version of this property. Note that
regularity does not play any role in this result.

Theorem 1734 The Kuhn-Tucker points solve a concave optimization problem in which the
functions f; fgi gi2I and fhj gj2J are di erentiable.

Proof Let (x ; ; ) be a Kuhn-Tucker point for the optimization problem (39.4), that is,
(x; ; ) satis es the conditions (39.15)-(39.18). In particular, this means that
X X
rf (x ) = i rgi (x )+ j rhj (x ) (39.24)
i2I j2A(x )\J

Since each gi is a ne and each hj is convex, by (31.35) it follows that:

hj (x) hj (x ) + rhj (x ) (x x ) 8j 2 J; 8x 2 A (39.25)


gi (x) = gi (x ) + rgi (x ) (x x ) 8i 2 I; 8x 2 A (39.26)

For each j 2 A (x ) we have hj (x ) = cj , and hence hj (x) hj (x ) for each x 2 C and


each j 2 A (x ) \ J. Moreover, gi (x ) = gi (x) for each i 2 I and each x 2 C. By (39.25)
and (39.26) it follows

rhj (x ) (x x ) 0 8j 2 A (x ) ; 8x 2 C
rgi (x ) (x x )=0 8i 2 I; 8x 2 C

Together with (39.24), we therefore have:


X X
rf (x ) (x x )= i rgi (x ) (x x )+ j rhj (x ) (x x ) 0
i2I j2A(x )\J

for each x 2 C. On the other hand, by (31.35) we have:

f (x) f (x ) + rf (x ) (x x ) 8x 2 A

and we conclude that f (x) f (x ) for each x 2 C, as desired.

This theorem provides a su cient condition for optimality: if a point is Kuhn-Tucker,


then it solves the optimization problem. The condition is, however, not necessary: there can
be solutions of a concave optimization problem that are not Kuhn-Tucker points. In view
of Kuhn-Tucker's Theorem this can happen only if the solution is not a regular point. The
next example illustrates this situation.
39.4. CONCAVE OPTIMIZATION 1185

Example 1735 The optimization problem

max x1 x2 x23 (39.27)


x1 ;x2 ;x3

sub x21 + x22 2x1 0 and x21 + x22 + 2x1 0

has the form (39.4), where f; h1 ; h2 : R3 ! R are continuously di erentiable functions given
by f (x1 ; x2 ; x3 ) = x1 x2 x23 , h1 (x1 ; x2 ; x3 ) = x21 +x22 2x1 , h2 (x1 ; x2 ; x3 ) = x21 +x22 +2x1 ,
while c1 = c2 = 0.
Clearly, f is concave and h1 and h2 are convex, so (39.27) is a concave optimization
problem. The system of inequalities

x21 + x22 2x1 0


x21 + x22 + 2x1 0

has the point (0; 0) as its unique solution. Hence, C = x 2 R3 : x1 = x2 = 0 is a straight


line in R3 and the unique solution of the problem (39.27) is the origin (0; 0; 0). On the other
hand,
rh1 (0; 0; 0) = ( 2; 0; 0) and rh2 (0; 0; 0) = (2; 0; 0)
and so the origin is a singular point. Since

rf (0; 0; 0) = ( 1; 1; 0)

there are no pairs ( 1; 2) 2 R2+ such that:

rf (0; 0; 0) = 1 rh1 (0; 0; 0) + 2 rh2 (0; 0; 0)

Therefore, the solution (0; 0; 0) is not a Kuhn-Tucker point. N

By combining Kuhn-Tucker's Theorem and Theorem 1734 we get the following necessary
and su cient optimality condition.

Theorem 1736 Consider a concave optimization problem in which the functions f; fgi gi2I
and fhj gj2J are continuously di erentiable. A regular point x 2 A is a solution of the
problem if and only if it is a Kuhn-Tucker point.

Theorem 1736 is a re nement of the Kuhn-Tucker's Theorem and, as such, it allows us


to re ne the method of elimination, which we will call convex method (of elimination). Such
method is based on the following phases:

1. Verify if the problem is concave;

2. verify if the functions f , gi and hj are continuously di erentiable, i.e., A = D;

3. determine the set C \ D0 of the singular points that satisfy the constraints;

4. determine the set S of the regular Kuhn-Tucker points;


1186 CHAPTER 39. INEQUALITY CONSTRAINTS

5. if S 6= ;, then all the points of S are solutions of the problem,9 while also a singular
point x 2 C \ D0 is a solution if and only if f (x) = f (^
x) for some x
^ 2 S;

6. if S = ;, check if Tonelli's Theorem can be applied { i.e., if f is continuous and coercive


on C; if this is the case, the maximizers of f on C \D0 are solutions of the optimization
problem (39.4).

Since either phase 5 or 6 applies, depending on whether or not S is empty, the actual
phases of the convex method are ve.

The convex method works thanks to Theorems 1734 and 1736. Indeed, if S 6= ; then
by Theorem 1734 all points of S are solutions of the problem. In this case, a singular point
x 2 C \ D0 can in turn be a solution when its value f (x) is equal to that of any point in S.
When, instead, we have S = ;, then Theorem 1736 guarantees that no regular point in A
is solution of the problem. At this stage, if Tonelli's Theorem is able to ensure the existence
of at least a solution, we can restrict the search to the set C \ D0 of the singular points that
satisfy the constraints. In other words, it is su cient to nd the maximizers of f on C \ D0 :
they are also solutions of problem (39.4), and vice versa.

Clearly, the convex method becomes especially powerful when S 6= ; because in such a
case there is no need to verify the validity of global existence theorems a la Weierstrass or
Tonelli, but it is su cient to nd the Kuhn-Tucker points.
If we content ourselves with solutions that are regular points, without worrying about
the possible existence of singular solutions, we can give a short version of the convex method
that is based only on Theorem 1734. We can call it the short convex method. It is based
only on three phases:

1. Verify if the problem is concave;

2. verify if the functions f and gi are continuously di erentiable, i.e., A = D;

3. determine the set S of the regular Kuhn-Tucker points;

4. if S 6= ;, then all the points of S are solutions of the problem.

Indeed, by Theorem 1734 all regular Kuhn-Tucker points are solutions of the problem.
The short convex method is simpler than the convex method, and it does not require the use
of global existence theorems. The price of this simpli cation is in the possible inaccuracy of
this method: being based on su cient conditions, it is not able to nd the solutions where
these conditions are not satis ed (by Theorem 1736, such solutions would be singular points).
Furthermore, the short method cannot be applied when S = ;; in such a case, it is necessary
to apply the complete convex method.

The short convex method is especially powerful when the objective function f is strictly
concave, as often assumed in applications. Indeed, in such a case a solution found with the
short method is necessarily also the unique solution of the concave optimization problem.
The next example illustrates.
9
The set S is at most a singleton when f is strictly concave because in such a case there is at most a
solution of the problem (Theorem 1032).
39.4. CONCAVE OPTIMIZATION 1187

Example 1737 Consider the optimization problem:

max x21 + x22 + x23 (39.28)


x1 ;x2 ;x3

sub 3x1 + x2 + 2x3 1 and x1 0

This problem is of the form (39.4), where f; h1 ; h2 : R3 ! R are given by f (x) = x21 + x22 + x23 ,
h1 (x) = (3x1 + x2 + 2x3 ) and h2 (x) = x1 , while c1 = 1 and c2 = 0.
Using Theorem 1474 it is easy to verify that f is strictly concave, while it is immediate
to verify that h1 and h2 are convex. Therefore, (39.28) is a concave optimization problem.
Moreover, the functions f , h1 and h2 are continuously di erentiable. This completes the
rst two phases of the short convex method, which we apply here since f is strictly concave.
Let us nd the Kuhn-Tucker points. The Lagrangian function L : R5 ! R is given by

L (x1 ; x2 ; x3 ; 1; 2) = x21 + x22 + x23 + 1( 1 + 3x1 + x2 + 2x3 ) + 2 x1 ;

To nd the set S of its Kuhn-Tucker points it is necessary to solve the system of equalities
and inequalities:
8 @L
>
> @x1 = 2x1 + 3 1 + 2 = 0
>
>
> @L = 2x +
>
>
> @x2 2 1 =0
>
>
>
> @L
>
> @x3 = 2x3 + 2 1 = 0
>
>
>
>
< 1 @@L = 1 ( 1 + 3x1 + x2 + 2x3 ) = 0
1
(39.29)
>
> @L
= x = 0
>
> 2 @ 2 2 1
>
>
> @L
> @ = 1 + 3x1 + x2 + 2x3 0
>
>
>
>
1
>
> @L
>
> @ 2 = x1 0
>
>
:
1 0; 2 0
We consider four cases, depending on the fact that the multipliers 1 and 2 are zero or
not.
Case 1: 1 > 0 and 2 > 0. The conditions 2 @L=@ 2 = @L=@x1 = 0 imply x1 = 0 and
3 1 + 2 = 0. This last equation does not have strictly positive solutions 1 and 2 , and
hence we conclude that we cannot have 1 > 0 and 2 > 0.
Case 2: 1 = 0 and 2 > 0. The conditions 2 @L=@ 2 = @L=@x1 = 0 imply x1 = 0 and
3 1 = 0, that is 1 = 0. This contradiction shows that we cannot have 1 = 0 and 2 > 0.
Case 3: 1 > 0 and 2 = 0. The conditions 1 @L=@ 1 = @L=@x1 = @L=@x2 = @L=@x3 =
0 imply: 8
> 2x1 + 3 1 = 0
>
>
< 2x + =0 2 1
>
> 2x3 + 2 =0
>
:
1
3x1 + x2 + 2x3 = 1
Solving for 1 , we get 1 = 1=7, and hence x1 = 3=14, x2 = 1=14 and x3 = 1=7. The
quintuple (3=14; 1=14; 1=7; 1=7; 0) solves the system (39.29), and hence (3=14; 1=14; 1=7) is a
Kuhn-Tucker point.
1188 CHAPTER 39. INEQUALITY CONSTRAINTS

Case 4: 1 = 2 = 0. The condition @L=@x1 = 0 implies x1 = 0, while the conditions


@L=@x2 = @L=@x3 = 0 imply x2 = x3 = 0. It follows that the condition @L=@ 1 0 implies
1 0, and this contradiction shows that we cannot have 1 = 2 = 0.
In conclusion, S = f((3=14; 1=14; 1=7))g. Since f is strictly concave, the short convex
method allows us to conclude that
3 1 1
; ;
14 14 7
is the unique solution of the optimization problem (39.28).10 N

We close with an important observation. The solution methods seen in this chapter are
based on the search of the Kuhn-Tucker points, and therefore they require the resolution of
systems of nonlinear equations. In general, these systems are not easy to solve and this limits
the computational usefulness of these methods, whose importance is mostly theoretical. At
a numerical level, other methods are used (which the interested reader can nd in books of
numerical analysis).

39.5 Appendix: proof of a key lemma


We begin with a calculus delight.

Lemma 1738 (i) The function y = x jxj is continuously di erentiable in R and Dx jxj =
2
2 jxj. (ii) The square (x+ ) of the function x+ = max fx; 0g is continuously di erentiable on
2
R, and D (x+ ) = 2x+ .

Proof (i) Observe that x jxj is in nitely di erentiable for x 6= 0 and its rst derivative is,
by the product rule for di erentiation,

jxj
Dx jxj = xD jxj + jxj Dx = x + jxj = 2 jxj
x
This is true for x 6= 0. Now it su ces to invoke a basic calculus result that asserts: let f : I !
R be continuous on a real interval, and f be di erentiable at I fx0 g; if limx!x0 Df (x) = ,
then f is di erentiable at x0 and Df (x0 ) = . As an immediate consequence, Dx jxj = 2 jxj
also at x = 0. (ii) We have x+ = 2 1 (x + jxj). Therefore

2 1 1 1
x+ = (x + jxj)2 = x2 + x jxj
4 2 2
2 2
It follows that (x+ ) is continuously di erentiable and D (x+ ) = x + jxj = 2x+ .

Proof of Lemma 1724 Let k k be the Euclidean norm. We have hj (^ x) < cj for each
j 2
= A (^ x). Since A is an open, there exists ~" > 0 su ciently small such that B~" (^ x) =
fx 2 A : kx x ^k ~"g A. Moreover, since each hj is continuous, for each j 2 = A (^
x) there
exists "j su ciently small such that hj (x) < cj for each x 2 B"j (^ x) = fx 2 A : kx x ^k "j g.
Let "0 = minj 2A(^
= x) "j and ^ " = min f~"; "0 g; in other words, ^" is the minimum between ~" and
10
The objective function is easily see to be strongly concave. So, coda readers may note that the existence
and uniqueness of the solution would also follow from Theorem 1501.
39.5. APPENDIX: PROOF OF A KEY LEMMA 1189

the "j . In this way we have B^" (^ x) = fx 2 A : kx x ^k ^"g A and hj (x) < cj for each
x 2 B^" (^
x) and each j 2 = A (^x).
Given " 2 (0; ^"], the set S" (^
x) = fx 2 A : kx x ^k = "g is compact. Moreover, by what
just seen hj (x) < cj for each x 2 S" (^ x) and each j 2= A (^
x), that is, in S" (^
x) all the non
binding constraints are always satis ed.
For each j 2 J, let h~ j : A Rn ! R be de ned by

~ j (x) = max fhj (x)


h cj ; 0g = (hj (x) cj )+

~ 2 2 C 1 (A) and
for each x 2 A. By Lemma 1738, h j

~ 2 (x)
@h + @hj (x)
j ~ j (x)
=2 h cj ; 8p = 1; :::; n (39.30)
@xp @xp

We rst prove a property that we will use after.

Fact 1. For each " 2 (0; ^"], there exists N > 0 such that

f (x) f (^
x) kx x ^ k2 (39.31)
0 1
X X 2
N@ x))2 +
(gi (x) gi (^ ~ j (x)
h ~ j (^
h x) A<0
i2I i2J\A(^
x)

for each x 2 S" (^


x).

Proof of Fact 1 We proceed by contradiction, and we assume therefore that there exists
" 2 (0; ^"] for which there is no N > 0 such that (39.31) holds. Take an increasing sequence
fNn gn with Nn " +1, and for each of these Nn take xn 2 S" (^ x) for which (39.31) does not
hold, that is, xn such that:

f (xn ) f (^
x) kxn x ^k2
0 1
X X 2
Nn @ x))2 +
(gi (xn ) gi (^ ~ j (xn )
h ~ j (^
h x) A 0
i2I i2J\A(^
x)

Hence, for each n 1 we have:

f (xn ) f (^
x) kxn ^k2
x X
(gi (xn ) x))2
gi (^ (39.32)
Nn
i2I
X 2
+ ~ j (xn )
h ~ j (^
h x)
j2J\A(^
x)

Since the sequence fxn g just constructed is contained in the compact set S" (^ x), by the
Bolzano-Weierstrass' Theorem there exists a subsequence fxnk gk convergent in S" (^ x), i.e.,
there exists x 2 S" (^
x) such that xnk ! x . Inequality (39.32) implies that, for each k 1,
1190 CHAPTER 39. INEQUALITY CONSTRAINTS

we have:
f (xnk ) f (^
x) kxnk ^ k2
x X
(gi (xnk ) x))2
gi (^ (39.33)
Nnk
i2I
X 2
+ ~ j (xn )
h ~ j (^
h x)
k
j2J\A(^
x)

Since f is continuous, we have limk f (xnk ) = f (x ). Moreover, limk kxnk x


^k = kx x
^k.
Since limk Nnk = +1, we have

f (xnk ) f (^
x) kxnk ^k2
x
lim =0
k Nnk
~j ,
and hence (39.33) implies, thanks to the continuity of the functions gi and h
X X 2
(gi (x ) x))2 +
gi (^ ~ j (x )
h ~ j (^
h x)
i2I i2J\A(^
x)
0 1
X X 2
= lim @ (gi (xnk ) x))2 +
gi (^ ~ j (xn )
h k
~ j (^
h x) A=0
k
i2I j2J\A(^
x)

2
It follows that (gi (x ) x))2 =
gi (^ ~ j (x )
h ~ j (^
h x)
= 0 for each i 2 I and for each
j 2 J \ A (^x), from which gi (x ) = gi (^
x) = bi for each i 2 I and h~ j (x ) = h
~ j (^
x) = cj for
each j 2 J \ A (^x).
Since in S" (^x) the non binding constraints are always satis ed, i.e., hj (x) < cj for each
x 2 S" (^x) and each j 2 = A (^
x), we can conclude that x satis es all the constraints. We
therefore have f (^x) f (x ) given that x ^ solves the optimization problem.
On the other hand, since xnk 2 S" (^ x) for each k 1, (39.33) implies

f (xnk ) f (^
x)
0 1
X X 2
kxnk ^k2 + Nnk @
x (gi (xnk ) x))2 +
gi (^ ~ j (xn )
h k
~ j (^
h x) A "2
i2I j2J\A(^
x)

for each k 1, and hence f (xnk ) f (^x) + "2 for each k 1. Thanks to the continuity of
f , this leads to
f (x ) = lim f (xnk ) f (^x) + "2 > f (^
x)
k

which contradicts f (^
x) f (x ). This contradiction proves Fact 1. 4

Using Fact 1, we prove now a second property that we will need. Here we set S =
SRjIj+jJj+1 = x 2 RjIj+jJj+1 : kxk = 1 .

Fact 2. For each " 2 (0; ^"], there exist x" 2 B" (^
x) and a vector

" " " " "


0; 1 ; :::; jIj ; 1 ; :::; jJj 2S
39.5. APPENDIX: PROOF OF A KEY LEMMA 1191

with " 0 for each j 2 J, such that


j

X X
" @f @gi " " @hj
0 (x" ) 2 x"j x
^j "
i (x ) j (x" ) = 0 (39.34)
@xz @xz @xz
i2I j2J\A(^
x)

for each z = 1; :::; n.

Proof of Fact 2 Given " 2 (0; ^"], let N" > 0 be the positive constant whose existence is
guaranteed by Fact 1. De ne the function " : A Rn ! R as:
0 1
X X 2
" (x) = f (x) f (^
x) kx x ^k2 N" @ x))2 +
(gi (x) gi (^ ~ j (x) h
h x) A
~ j (^
i2I j2J\A(^
x)

for each x 2 A. We have " (^


x) = 0 and, given how N" has been chosen,

" (x) > 0; 8x 2 S" (^


x) (39.35)

The function " is continuous on the compact set B" (^ x) = fx 2 A : kx x ^k "g and, by
Weierstrass' Theorem, there exists x" 2 B" (^ x) such that " (x" ) " (x) for each x 2 B" (^
x).
"
In particular, " (x ) " "
" (^
x) = 0, and hence (39.35) implies that kx k < ", that is, x 2
x). Point x" is therefore a maximizer on the open set B" (^
B" (^ x) and by Fermat's Theorem
we have r " (x" ) = 0. Therefore, by (39.30), we have:
0 1
Xm X
@f
(x" ) 2 (x"z x ^z ) 2N" @ gi (x" )
@gi "
(x ) + ~ j (x" ) @hj (x" )A = 0 (39.36)
h
@xz @xz @xz
i=1 j2J\A(^
x)

for each z = 1; :::; n. Set:


m
X X 2 1
c" = 1 + (2N" gi (x" ))2 + ~ j (x" )
2N" h ; "
0 =
c"
i=1 j2J\A(^
x)

2N" gi (x" ) ~ j (x" )


2N" h
" "
i = 8i 2 I ; j = 8j 2 J \ A (^
x)
c" c"
"
j =0 8j 2
= A (^
x)

so that (39.34) is obtained by dividing (39.36) by c" . Observe that "i 0 for each j 2 J
P " 2 P 2
" " "
and that i2I ( i ) + j2J "j = 1, i.e., " "
0 ; 1 ; :::; jIj ; 1 ; :::; jJj 2 S. 4

Using Fact 2, we can now complete the proof.nTake a decreasing sequence o f"n gn (0; ^"]
n n n n n
with "n # 0, and consider the associated sequence 0 ; 1 ; :::; jIj ; 1 ; :::; jJj S whose
n
existence is guaranteednby Fact 2. o
n n n n n
Since the sequence 0 ; 1 ; :::; jIj ; 1 ; :::; jJj is contained in the compact set S, by
n
the Bolzano-Weierstrass' Theorem there exists a subsequence
n o
nk nk nk nk nk
0 ; 1 ; :::; jIj ; 1 ; :::; jJj k
1192 CHAPTER 39. INEQUALITY CONSTRAINTS

convergent in S, that is, there exists 0; 1 ; :::; jIj ; 1 ; :::; jJj 2 S such that

nk nk nk nk nk
0 ; 1 ; :::; jIj ; 1 ; :::; jJj ! 0; 1 ; :::; jIj ; 1 ; :::; jJj

By Fact 2, for each "nk there exists xnk 2 B"nk (^


x) for which (39.34) holds, i.e.,

nk @f X nk @gi nk X nk @hj nk
0 (xnk ) 2 (xnk x
^z ) i (x ) j (x ) = 0
@xz @xz @xz
i2I j2J\A(^
x)

for each z = 1; :::; n. Consider the sequence fxnk gk so constructed. From xnk 2 B"nk (^
x) it
follows that kxn k x^k < "nk ! 0 and hence, for each z = 1; :::; n,

@f X @gi X @hj
0 (^
x) i (^
x) j (x) (39.37)
@xz @xz @xz
i2I j2J\A(^
x)
0 1
nk @f X nk @gi nk X nk @hj nk A
= lim @ 0 xk 2 (xnk x
^z ) i (x ) j (x )
k @xz @xz @xz
i2I j2J\A(^
x)

= 0:

On the other hand, 0 6= 0. Indeed, if it were 0 = 0, then by (39.37) it follows that


X @gi X @hj
i (^
x) + j (^
x) = 0 8z = 1; :::; n
@xz @xz
i2I j2J\A(^
x)

The linear independence of the gradients associated to the constraints that holds for the
hypothesis of regularity of the constraints implies i = 0 for each i 2 I, which contradicts
0 ; 1 ; :::; jIj ; 1 ; :::; jJj 2 S.

In conclusion, if we set ^ i = i = 0 for each i 2 I and ^ = = 0 for each j 2 J, (39.37) j j


implies (39.8).
Chapter 40

General constraints

40.1 A general concave problem


The choice set of the optimization problem (39.4) of the previous chapter is identi ed by a
nite number of equality and inequality constraints expressed through suitable functions g
and h. In general, however, we may also require solutions to belong to a set X that is not
necessarily identi ed through a nite number of functional constraints.1 We thus have the
following optimization problem:

max f (x) (40.1)


x
8
>
> g (x) = bi 8i 2 I
< i
sub hj (x) cj 8j 2 J
>
>
:
x2X

where X is a subset of A and the other elements are as in the optimization problem (39.4).
This problem includes as special cases the optimization problems that we have seen so far:
we get back to the optimization problem (39.4) when X = A and to an unconstrained
optimization problem when I = J = ; and C = X is open.
Besides its own interest, formulation (40.1) may be also useful when there are conditions
on the sign or on the value of the choice variables xi . The classic example is the non-negativity
condition of the xi , which are best expressed as a constraint x 2 Rn+ rather than through n
inequalities xi 0. Here a constraint of the form x 2 X simpli es the exposition.

In this chapter we address the general optimization problem (40.1). If X is open, the
solution techniques of Section 39.2 can be easily adapted by restricting the analysis on X
itself, which then becomes the ad hoc domain of the objective function f . Matters are more
interesting when X is not open, in particular when it is a closed and convex set. This is
the case that we will consider. We will thus focus on the concave optimization problems
studied in Section 39.4, widely used in applications. Consequently, throughout the chapter
we assume that:
1
Sometimes this distinction is made by talking of implicit and explicit constraints. Di erent authors,
however, may give an opposite meaning to this terminology (that, in any case, we do not adopt).

1193
1194 CHAPTER 40. GENERAL CONSTRAINTS

(i) X is a closed and convex subset of an open convex set A;


(ii) f : A Rn ! R is a concave di erentiable objective function;
(iii) gi : Rn ! R is an a ne function for each i 2 I;
(iv) hj : Rn ! R is a convex di erentiable function for each j 2 J.

To ease matters, we thus de ne the functions gi and hj on the entire space Rn .

40.2 Black box optimization


In canonical form, the optimization problem (40.1) has the form

max f (x) sub x 2 C


x

where the choice set is

C = fx 2 X : gi (x) = bi and hj (x) cj 8i 2 I; 8j 2 J g (40.2)

The set C is closed and convex. As it is often the case, the best way to proceed is to abstract
from the speci c problem at hand, with its potentially distracting details. For this reason,
we will consider the following general optimization problem:

max f (x) sub x 2 C (40.3)


x
where C is a generic closed and convex choice set { not necessarily of the form (40.2) {
that, for the moment, we treat as a black box. Throughout this section we assume that f is
continuously di erentiable on an open convex set that contains C. The simplest case when
this assumption holds is when f is continuously di erentiable on its entire domain A.

40.2.1 Variational inequalities


We begin the analysis of the black box problem (40.3) with the simple scalar case

max f (x) sub x 2 [a; b] (40.4)


x

where a; b 2 R. Suppose that x


^ 2 [a; b] is a solution. It is easy to see that we can have two
cases:

(i) x ^ is an interior point; in this case, f 0 (^


^ 2 (a; b), i.e., x x) = 0.
(ii) x ^ is a boundary point; in this case, f 0 (^
^ 2 fa; bg, i.e., x x) ^ = a and f 0 (^
0 if x x) 0 if
x
^ = b.

The next lemma gives a simple and elegant way to unify these two cases.

Proposition 1739 If x
^ 2 [a; b] is solution of the optimization problem (40.4), then

f 0 (^
x) (x x
^) 0 8x 2 [a; b] (40.5)

The converse holds if f is concave.


40.2. BLACK BOX OPTIMIZATION 1195

The proof of this result rests on the following lemma.

Lemma 1740 Condition (40.5) is equivalent to f 0 (^ ^ 2 (a; b), to f 0 (^


x) = 0 if x x) 0 if
x 0
^ = a, and to f (^
x) 0 if x
^ = b.

Proof We divide the proof in three parts, one for each of the equivalences to prove.
(i) Let x^ 2 (a; b). We prove that (40.5) is equivalent to f 0 (^
x) = 0. If f 0 (^
x) = 0 holds,
0
then f (^ x) (x x ^) = 0 for each x 2 [a; b], and hence (40.5) holds. Vice versa, suppose that
(40.5) holds. Setting x = a, we have (a x ^) < 0 and so (40.5) implies f 0 (^ x) 0. On
the other hand, setting x = b, we have (b x ^) > 0 and so (40.5) implies f (^ 0 x) 0. In
conclusion, x ^ 2 (a; b) implies f 0 (^
x) = 0.

^ = a. We prove that (40.5) is equivalent to f 0 (a)


(ii) Let x 0. Let f 0 (a) 0. Since
0
(x a) 0 for each x 2 [a; b], it follows that f (a) (x a) 0 for each x 2 [a; b], and hence
(40.5) holds. Vice versa, suppose that (40.5) holds. By taking x 2 (a; b], we have (x a) > 0
and so (40.5) implies f 0 (a) 0.

(iii) Let x ^ = b. We prove that (40.5) is equivalent to f 0 (b) 0. Let f 0 (b) 0. Since
0
(x b) 0 for each x 2 [a; b], we have f (b) (x b) 0 for each x 2 [a; b] and (40.5) holds.
Vice versa, suppose that (40.5) holds. By taking x 2 [a; b), we have (x b) < 0 and so (40.5)
implies f 0 (b) 0.

Proof of Proposition 1739 In view of Lemma 1740, it only remains to prove that (40.5)
becomes a su cient condition when f is concave. Suppose, therefore, that f is concave and
that x
^ 2 [a; b] is such that (40.5) holds. We prove that this implies that x ^ is solution of
problem (40.4). Indeed, by (31.10) we have f (x) f (^ x) + f 0 (^
x) (x x ^) for each x 2 [a; b],
which implies f (x) f (^ x) f 0 (^
x) (x x ^) for each x 2 [a; b]. Thus, (40.5) implies that
f (x) f (^x) 0, that is, f (x) f (^ x) for each x 2 [a; b]. Hence, x^ solves the optimization
problem (40.4).

The inequality (40.5) that x


^ satis es is an example of a variational inequality. Besides
unifying the two cases, this variational inequality is interesting because when f is concave
it provides a necessary and su cient condition for a point to be solution of the optimiza-
tion problem. Even more interesting is the fact that this characterization can be naturally
extended to the multivariable case.

Theorem 1741 (Stampacchia) If x ^ 2 C is solution of the optimization problem (40.3),


then it satis es the variational inequality

rf (^
x) (x x
^) 0 8x 2 C (40.6)

The converse holds if f is concave.

As in the scalar case, the variational inequality uni es the optimality necessary conditions
for interior and boundary points. Indeed, it is easy to check that, when x ^ is an interior point
of C, (40.6) reduces to the classic rst-order condition rf (^ x) = 0 of Fermat's Theorem.
1196 CHAPTER 40. GENERAL CONSTRAINTS

Proof Let x ^ 2 C be solution of the optimization problem (40.3), i.e., f (^


x) f (x) for each
x 2 C. Given x 2 C, set zt = x ^ + t (x x^) for t 2 [0; 1]. Since C is convex, zt 2 C for each
t 2 [0; 1]. De ne : [0; 1] ! R by (t) = f (zt ). Since f is di erentiable at x ^, we have

0 (t) (0) f (^
x + t (x x ^)) f (^
x)
+ (0) = lim = lim
t!0+ t t!0 + t
df (^
x) (t (x x ^)) + o (kt (x x ^)k)
= lim
t!0 + t
o (t kx x ^k)
= df (^
x) (x x ^) + lim = df (^
x) (x x^) = rf (^
x) (x x
^)
t!0 + t

For each t 2 [0; 1] we have (0) = f (^x) f (zt ) = (t), and so : [0; 1] ! R has a (global)
maximizer at t = 0. It follows that 0+ (0) 0, which implies rf (^ x) (x x ^) 0, as desired.
As to the converse, assume that f is concave. By (31.35), f (x) f (^x) + rf (^
x) (x x ^)
for each x 2 C, and therefore (40.6) implies f (x) f (^x) for each x 2 C.

For the dual minimum problems, the variational inequality is easily seen to take the dual
form rf (^
x) (x x ^) 0 for each x 2 C. For interior solutions, instead, the condition
x) = 0 is the same in both maximization and minimization problems.2
rf (^

Example 1742 The unique solution of problem

min x2 sub x 0
x

is, clearly, the origin. Indeed, here the variational inequality

rf (^
x) (x x
^) = 2^
x (x x
^) 0 8x 0

is satis ed if and only if x


^ = 0. N

40.2.2 A general rst-order condition


The normal cone NC (x) of a convex set C with respect to a point x 2 C is given by

NC (x) = fy 2 Rn : y (x x) 0 8x 2 Cg

Next we provide a couple of important properties of NC (x). In particular, (i) ensures that
normal cones are, indeed, cones and (ii) shows that they are non-trivial only for boundary
points.

Lemma 1743 (i) NC (x) is a closed and convex cone;

(ii) NC (x) = f0g if and only if x is an interior point of C.


2
The unifying power of variational inequalities in optimization is the outcome of a few works of Guido
Stampacchia in the early 1960s. For an overview, see Kinderlehrer and Stampacchia (1980).
40.2. BLACK BOX OPTIMIZATION 1197

Proof (i) The set NC (x) is clearly closed. Moreover, given y; z 2 NC (x) and ; 0, we
have
( y + z) (x x) = y (x x) + z (x x) 0 8x 2 C
and so y + z 2 NC (x). By Proposition 859, NC (x) is a convex cone. (ii) We only prove
the \if" part. Let x be an interior point of C. Suppose, by contradiction that there is a
vector y 6= 0 in NC (x). As x is interior, we have that x + ty 2 C for t > 0 su ciently
small. Hence we would have y (x + ty x) = ty y = t kyk2 0. This implies y = 0, a
contradiction. Hence NC (x) = f0g.

To see the importance of normal cones, note that condition (40.6) can be written as:

rf (^
x) 2 NC (^
x) (40.7)

Therefore, x
^ solves the optimization problem (40.3) only if the gradient rf (^
x) belongs to the
normal cone of C with respect to x ^. This way of writing condition (40.6) is useful because,
given a set C, if we can describe the form of the normal cone { something that does not
require any knowledge of the objective function f { we can then have a sense of which form
takes the \ rst-order condition" for the optimization problems that have C as a choice set.
In other words, (40.7) can be seen as a general rst-order condition in which we can dis-
tinguish the part, NC (^ x), determined by the constraint C, and the part, rf (^
x), determined
by the objective function. This distinction between the roles of the objective function and
of the constraint is illuminating.3 For this reason, we report it formally.

Corollary 1744 If x ^ 2 C is solution of the optimization problem (40.3), then it satis es


the rst-order condition
rf (^x) 2 NC (^
x)
The converse holds if f is concave.

The next result characterizes the normal cone for convex cones.

Proposition 1745 If C is a convex cone and x 2 C, then

NC (x) = fy 2 Rn : y x = 0 and y x 0 8x 2 Cg

If, in addition, C is a vector subspace, then NC (x) = C ? for every x 2 C.

Proof Let y 2 NC (x) : Then y (x x) 0 for all x 2 C: As 0 2 C, we have y (0 x) 0.


Hence y x 0. On the other hand, we can write y x = y (2x x) 0. It follows that
y x = 0. In turn, y x = y (x x) 0 for each x 2 C. Conversely, if y satis es the
two conditions y x = 0 and y x 0 for each x 2 C, then y (x x) = y x y x 0,
and so y 2 NC (x). Suppose now, in addition, that C is a vector subspace. A subspace
C is a cone such that x 2 C implies x 2 C. Hence, the rst part of the proof yields
NC (x) = fy 2 Rn : y x = 0 and y x = 0 8x 2 Cg. Since x 2 C, we then have NC (x) =
fy 2 Rn : y x = 0 8x 2 Cg = C ? .
3
For a thorough account of this important viewpoint, we refer readers to Rockafellar (1993).
Example 1746 If $C = \mathbb{R}^n_+$, then

$$N_C(\bar{x}) = \{y \in \mathbb{R}^n : y_i \bar{x}_i = 0 \text{ and } y_i \le 0 \ \forall i = 1, \dots, n\} \tag{40.8}$$

Indeed, we have $y_i \le 0$ for each $i$ since $y_i = y \cdot e^i \le 0$. Hence, $y_i \bar{x}_i \le 0$ for each $i$, which in turn implies $y_i \bar{x}_i = 0$ for each $i$ because $y \cdot \bar{x} = 0$. N

Given a closed and convex cone $C$, a point $\hat{x}$ thus satisfies the first-order condition (40.7) when

$$\begin{cases} \nabla f(\hat{x}) \cdot \hat{x} = 0 \\ \nabla f(\hat{x}) \cdot x \le 0 \quad \forall x \in C \end{cases} \tag{40.9}$$

The first-order condition is thus easier to check on cones. Even more so in the important special case $C = \mathbb{R}^n_+$, when from (40.8) it follows that condition (40.9) reduces to the following $n$ equalities and $n$ inequalities:

$$\hat{x}_i \frac{\partial f(\hat{x})}{\partial x_i} = 0 \quad \forall i = 1, \dots, n \qquad ; \qquad \frac{\partial f(\hat{x})}{\partial x_i} \le 0 \quad \forall i = 1, \dots, n$$
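To see the complementarity structure of these $n$ equalities and $n$ inequalities at work, here is a minimal numerical sketch (ours, not the book's, with made-up data): for the concave objective $f(x) = -\|x - a\|^2$, whose maximizer on $\mathbb{R}^n_+$ is the coordinate-wise positive part of $a$, both blocks of (40.9) can be checked directly.

```python
import numpy as np

# A minimal sketch (not from the book): maximize the concave function
# f(x) = -||x - a||^2 over C = R^n_+; the solution is the positive part
# of a. We then check the two blocks of condition (40.9) via (40.8).
a = np.array([1.5, -2.0, 0.7])            # hypothetical data
x_hat = np.maximum(a, 0.0)                # candidate solution on R^n_+
grad = -2.0 * (x_hat - a)                 # gradient of f at x_hat

assert np.allclose(grad * x_hat, 0.0)     # x_i * df/dx_i = 0 for each i
assert np.all(grad <= 1e-12)              # df/dx_i <= 0 for each i
print("first-order condition (40.9) verified at", x_hat)
```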

We can also characterize the normal cones of the simplex

$$\Delta_{n-1} = \left\{ x \in \mathbb{R}^n_+ : \sum_{i=1}^n x_i = 1 \right\}$$

another all-important closed and convex set. To this end, given $\bar{x} \in \Delta_{n-1}$ set

$$I(\bar{x}) = \{y \in \mathbb{R}^n : y_i = 1 \text{ if } i \in P(\bar{x}) \text{ and } y_i \le 1 \text{ if } i \notin P(\bar{x})\}$$

where $P(\bar{x}) = \{i : \bar{x}_i > 0\}$.

Proposition 1747 We have $N_{\Delta_{n-1}}(\bar{x}) = \{\lambda y \in \mathbb{R}^n : y \in I(\bar{x}) \text{ and } \lambda \ge 0\}$.

The set $\{\lambda y \in \mathbb{R}^n : y \in I(\bar{x}) \text{ and } \lambda \ge 0\}$ is easily seen to be the smallest convex cone that contains $I(\bar{x})$. The normal cone is thus such a set.

Example 1748 If $\bar{x} = (1/3, 0, 2/3) \in \Delta_2$, we have $I(\bar{x}) = \{(1, y_2, 1) : y_2 \le 1\}$ and $N_{\Delta_2}(\bar{x}) = \{(\lambda, y_2, \lambda) : y_2 \le \lambda \text{ and } \lambda \ge 0\}$. N

In view of this characterization, a point $\hat{x} \in \Delta_{n-1}$ satisfies the first-order condition (40.7) if and only if there is a scalar $\hat{\lambda} \ge 0$ such that

$$\frac{\partial f(\hat{x})}{\partial x_i} = \hat{\lambda} \ \text{ if } \hat{x}_i > 0 \qquad ; \qquad \frac{\partial f(\hat{x})}{\partial x_i} \le \hat{\lambda} \ \text{ if } \hat{x}_i = 0$$

that is, when

$$\begin{cases} \dfrac{\partial f(\hat{x})}{\partial x_i} \le \hat{\lambda} & \forall i = 1, \dots, n \\[1ex] \left( \dfrac{\partial f(\hat{x})}{\partial x_i} - \hat{\lambda} \right) \hat{x}_i = 0 & \forall i = 1, \dots, n \end{cases} \tag{40.10}$$
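Condition (40.10) can likewise be checked numerically. In the following sketch (ours, with hypothetical weights) we maximize $f(x) = \sum_i w_i \log x_i$ on the simplex, whose solution is $\hat{x}_i = w_i / \sum_j w_j$; at this solution the partial derivatives are constant on the support, and that common value is $\hat{\lambda}$.

```python
import numpy as np

# A small sketch (ours): maximize f(x) = sum_i w_i * log(x_i) over the
# simplex. The solution is x_i = w_i / sum(w), and there the partial
# derivatives w_i / x_i are all equal, as condition (40.10) requires.
w = np.array([2.0, 1.0, 3.0])             # hypothetical positive weights
x_hat = w / w.sum()                       # candidate solution in the simplex
grad = w / x_hat                          # df/dx_i = w_i / x_i at x_hat

lam = grad.max()                          # the common value lambda-hat
assert np.allclose(grad[x_hat > 0], lam)  # equality on the support
assert np.all(grad <= lam + 1e-12)        # inequality elsewhere
print("lambda-hat =", lam, "at x-hat =", x_hat)
```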

Proof of Proposition 1747 Suppose that $P(\bar{x})$ is not a singleton and let $i, j \in P(\bar{x})$. Clearly, $0 < \bar{x}_i, \bar{x}_j < 1$. Consider the points $x^\varepsilon \in \mathbb{R}^n$ having coordinates $x^\varepsilon_i = \bar{x}_i + \varepsilon$, $x^\varepsilon_j = \bar{x}_j - \varepsilon$, and $x^\varepsilon_k = \bar{x}_k$ for all $k \ne i$ and $k \ne j$, while the parameter $\varepsilon$ runs over $[-\varepsilon_0, \varepsilon_0]$, with $\varepsilon_0 > 0$ sufficiently small in order that $x^\varepsilon \ge 0$ for $\varepsilon \in [-\varepsilon_0, \varepsilon_0]$. Note that $\sum_{k=1}^n x^\varepsilon_k = 1$ and so $x^\varepsilon \in \Delta_{n-1}$. Let $y \in N_{\Delta_{n-1}}(\bar{x})$. By definition, $y \cdot (x^\varepsilon - \bar{x}) \le 0$ for every $\varepsilon \in [-\varepsilon_0, \varepsilon_0]$. Namely, $\varepsilon y_i - \varepsilon y_j = \varepsilon (y_i - y_j) \le 0$, which implies $y_i = y_j$. Hence, it must hold $y_i = \lambda$ for all $i \in P(\bar{x})$. That is, the values of $y$ must be constant on $P(\bar{x})$. This is trivially true when $P(\bar{x})$ is a singleton. Let now $j \notin P(\bar{x})$. Consider the vector $x^j \in \mathbb{R}^n$, where $x^j_j = 1$ and $x^j_k = 0$ for each $k \ne j$. If $y \in N_{\Delta_{n-1}}(\bar{x})$, then $y \cdot (x^j - \bar{x}) \le 0$. That is,

$$y_j - \sum_{k \ne j} y_k \bar{x}_k = y_j - \sum_{k \in P(\bar{x})} y_k \bar{x}_k = y_j - \lambda \sum_{k \in P(\bar{x})} \bar{x}_k = y_j - \lambda \le 0$$

Therefore, $N_{\Delta_{n-1}}(\bar{x}) \subseteq \{\lambda y \in \mathbb{R}^n : y \in I(\bar{x}) \text{ and } \lambda \ge 0\}$. We now show the converse inclusion. Let $y \in \mathbb{R}^n$ be such that, for some $\lambda \ge 0$, we have $y_i = \lambda$ for all $i \in P(\bar{x})$ and $y_k \le \lambda$ for each $k \notin P(\bar{x})$. If $x \in \Delta_{n-1}$, then

$$y \cdot (x - \bar{x}) = \sum_{i=1}^n y_i (x_i - \bar{x}_i) = \sum_{i \in P(\bar{x})} y_i (x_i - \bar{x}_i) + \sum_{i \notin P(\bar{x})} y_i (x_i - \bar{x}_i)$$
$$= \lambda \sum_{i \in P(\bar{x})} (x_i - \bar{x}_i) + \sum_{i \notin P(\bar{x})} y_i x_i = -\lambda \sum_{i \notin P(\bar{x})} x_i + \sum_{i \notin P(\bar{x})} y_i x_i$$
$$\le -\lambda \sum_{i \notin P(\bar{x})} x_i + \lambda \sum_{i \notin P(\bar{x})} x_i = 0$$

Hence $y \in N_{\Delta_{n-1}}(\bar{x})$.

40.2.3 Divide et impera

Often the choice set $C$ may be written as an intersection

$$C = C_1 \cap \cdots \cap C_n$$

A natural question is whether the $n$ relaxed optimization problems that correspond to the larger choice sets $C_i$ can be combined to inform on the original optimization problem. The next result is key, as it provides a condition under which an "intersection rule" for normal cones holds. It involves the sum

$$\sum_{i=1}^n N_{C_i}(x) = \left\{ \sum_{i=1}^n y_i : y_i \in N_{C_i}(x) \ \forall i = 1, \dots, n \right\}$$

of the normal cones.⁴

⁴ Cf. Section 21.4.
Proposition 1749 Let $C = C_1 \cap \cdots \cap C_n$, with each $C_i$ closed and convex. Then,

$$\sum_{i=1}^n N_{C_i}(x) \subseteq N_C(x) \qquad \forall x \in C$$

Equality holds if $C$ satisfies Slater's condition

$$\operatorname{int} C_1 \cap \cdots \cap \operatorname{int} C_n \ne \emptyset$$

where the set $C_i$ itself can replace its interior $\operatorname{int} C_i$ if it is affine.

Proof Let $x \in C$. Suppose $y = \sum_{i=1}^n y_i$, with $y_i \in N_{C_i}(x)$ for every $i = 1, \dots, n$. Then, $y \cdot (x' - x) = \sum_{i=1}^n y_i \cdot (x' - x) \le 0$ for every $x' \in C$, and so $y \in N_C(x)$. This proves the inclusion. We omit the proof that Slater's condition implies the equality.

Example 1750 Let $A$ be an $m \times n$ matrix and $b \in \mathbb{R}^m$.

(i) Let $C_1 = \{x \in \mathbb{R}^n : Ax \le b\}$ and $C_2 = \mathbb{R}^n_+$. We have $\operatorname{int} C_1 = \{x \in \mathbb{R}^n : Ax \ll b\}$ and $\operatorname{int} C_2 = \mathbb{R}^n_{++}$. The set $C = C_1 \cap C_2$ satisfies Slater's condition when $\operatorname{int} C_1 \cap \operatorname{int} C_2 \ne \emptyset$, that is, when there exists $x \in \mathbb{R}^n_{++}$ such that $Ax \ll b$. In this case, by the last proposition $N_{C_1}(x) + N_{C_2}(x) = N_C(x)$.

(ii) Let $C_1 = \{x \in \mathbb{R}^n : Ax = b\}$ and $C_2 = \mathbb{R}^n_+$. Since $C_1$ is affine, the set $C = C_1 \cap C_2$ satisfies Slater's condition when $C_1 \cap \operatorname{int} C_2 \ne \emptyset$, that is, when there exists $x \in \mathbb{R}^n_{++}$ such that $Ax = b$. Again, in this case by the last proposition we have $N_{C_1}(x) + N_{C_2}(x) = N_C(x)$. N

In words, under Slater's condition the normal cone of an intersection of sets is the sum of their normal cones. Hence, a point $\hat{x}$ satisfies the first-order condition (40.7) if and only if there is a vector $\hat{y} = (\hat{y}_1, \dots, \hat{y}_n)$ such that

$$\begin{cases} \nabla f(\hat{x}) = \sum_{i=1}^n \hat{y}_i \\ \hat{y}_i \in N_{C_i}(\hat{x}) \quad \forall i = 1, \dots, n \end{cases}$$

A familiar "multipliers" format emerges. The next section will show how the Kuhn-Tucker Theorem fits in this general framework.

40.3 Opening the black box

We can now get out of the black box and extend the Kuhn-Tucker Theorem to the general concave optimization problem (40.1). Its choice set (40.2) is

$$C = X \cap \bigcap_{i \in I} C_i \cap \bigcap_{j \in J} C_j$$

where $C_i = (g_i = b_i)$ and $C_j = (h_j \le c_j)$.

Lemma 1751 The set $C$ satisfies Slater's condition if there is $\bar{x} \in \operatorname{int} X$ such that $g_i(\bar{x}) = b_i$ for all $i \in I$ and $h_j(\bar{x}) < c_j$ for all $j \in J$.

Proof The level sets $C_i$ are affine (Proposition 828). Since $\bar{x} \in \operatorname{int} X \cap \bigcap_{i \in I} C_i \cap \bigcap_{j \in J} \operatorname{int} C_j$, this intersection is non-empty and so $C$ satisfies Slater's condition.

In what follows we thus assume the existence of such $\bar{x}$.⁵ In view of Proposition 1749, it now becomes key to characterize the normal cones of the sets $C_i$ and $C_j$.

Lemma 1752 (i) For each $x \in C_i$,

$$N_{C_i}(x) = \{\lambda \nabla g_i(x) : \lambda \in \mathbb{R}\}$$

(ii) For each $x \in \mathbb{R}^n$,

$$N_{C_j}(x) = \begin{cases} \{\lambda \nabla h_j(x) : \lambda \ge 0\} & \text{if } h_j(x) = c_j \\ \{0\} & \text{if } h_j(x) < c_j \\ \emptyset & \text{if } h_j(x) > c_j \end{cases}$$

Proof We only prove (ii) when $h_j(x) = c_j$. Assume $c_j = 0$ (otherwise, it is enough to consider the convex function $h_j - c_j$). Let $h_j(\bar{x}) = 0$. We want to show that $\{\lambda \nabla h_j(\bar{x}) : \lambda \ge 0\} = N_{C_j}(\bar{x})$. Let $y \in N_{C_j}(\bar{x})$. Since $h_j(\bar{x}) = 0$, we have $h_j(x) \ge h_j(\bar{x}) + y \cdot (x - \bar{x})$ for all $x \in C_j$, and so $y = \lambda \nabla h_j(\bar{x})$ for some $\lambda \ge 0$ since $h_j$ is differentiable at $\bar{x}$ (cf. Theorem 1513). Conversely, if $y = \lambda \nabla h_j(\bar{x})$ for some $\lambda \ge 0$, then the convexity of $h_j$ yields $0 \ge \lambda h_j(x) \ge y \cdot (x - \bar{x})$ for all $x \in C_j$, since $h_j(\bar{x}) = 0$. Hence, $\lambda \nabla h_j(\bar{x}) \in N_{C_j}(\bar{x})$. We omit the cases $h_j(\bar{x}) < 0$ and $h_j(\bar{x}) > 0$.

Along with Proposition 1749, this lemma implies that $N_C(x)$ is

$$\left\{ \nu + \sum_{i \in I} \lambda_i \nabla g_i(x) + \sum_{j \in A(x)} \mu_j \nabla h_j(x) : \nu \in N_X(x), \ \lambda_i \in \mathbb{R} \ \forall i \in I, \ \mu_j \ge 0 \ \forall j \in A(x) \right\}$$

where $A(x)$ is the collection of the binding inequality constraints defined in (39.7). Since in this concave problem the first-order condition (40.7) is a necessary and sufficient optimality condition, we can say that $\hat{x} \in C$ solves the optimization problem (40.1) if and only if there exists a triple of vectors $(\hat{\lambda}, \hat{\mu}, \hat{\nu}) \in \mathbb{R}^{|I|} \times \mathbb{R}^{|J|}_+ \times \mathbb{R}^n$, with $\hat{\nu} \in N_X(\hat{x})$, such that

$$\begin{cases} \nabla f(\hat{x}) = \hat{\nu} + \sum_{i \in I} \hat{\lambda}_i \nabla g_i(\hat{x}) + \sum_{j \in J} \hat{\mu}_j \nabla h_j(\hat{x}) \\ \hat{\mu}_j (c_j - h_j(\hat{x})) = 0 \qquad \forall j \in J \end{cases} \tag{40.11}$$

Indeed, as we noted in Lemma 1725, the second condition amounts to requiring $\hat{\mu}_j = 0$ for each $j \notin A(\hat{x})$.

To sum up, under a Slater's condition we get back the Kuhn-Tucker conditions (39.8) and (39.9), suitably modified to cope with the new constraint $x \in X$. We leave to the reader the formulation of these conditions via a Lagrangian function.

⁵ This also ensures that the problem is well posed in the sense of Definition 1721.
Example 1753 Let $X = \mathbb{R}^n_+$. By (40.8), $\hat{\nu}_k \hat{x}_k = 0$ and $\hat{\nu}_k \le 0$ for each $k = 1, \dots, n$. By (40.11), we have

$$\hat{\nu} = \nabla f(\hat{x}) - \sum_{i \in I} \hat{\lambda}_i \nabla g_i(\hat{x}) - \sum_{j \in J} \hat{\mu}_j \nabla h_j(\hat{x}) \tag{40.12}$$

So, condition (40.11) can be equivalently written (with unzipped gradients) as:

$$\begin{cases} \dfrac{\partial f(\hat{x})}{\partial x_k} \le \sum_{i \in I} \hat{\lambda}_i \dfrac{\partial g_i(\hat{x})}{\partial x_k} + \sum_{j \in J} \hat{\mu}_j \dfrac{\partial h_j(\hat{x})}{\partial x_k} & \forall k = 1, \dots, n \\[1ex] \hat{\mu}_j (c_j - h_j(\hat{x})) = 0 & \forall j \in J \\[1ex] \left( \dfrac{\partial f(\hat{x})}{\partial x_k} - \sum_{i \in I} \hat{\lambda}_i \dfrac{\partial g_i(\hat{x})}{\partial x_k} - \sum_{j \in J} \hat{\mu}_j \dfrac{\partial h_j(\hat{x})}{\partial x_k} \right) \hat{x}_k = 0 & \forall k = 1, \dots, n \end{cases}$$

In this formulation, we can omit $\hat{\nu}$. N
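As a hedged illustration of these unzipped conditions (a sketch of ours, not the book's, on a made-up instance), one can solve a small concave problem numerically and verify the three blocks at the computed solution.

```python
import numpy as np
from scipy.optimize import minimize

# A hypothetical instance (ours) with X = R^2_+ and one inequality
# constraint h(x) = x_1 + x_2 <= c: maximize f(x) = log(1+x_1) + log(1+x_2).
c = 3.0
f = lambda x: np.sum(np.log1p(x))
res = minimize(lambda x: -f(x), x0=np.array([0.1, 0.1]), method="SLSQP",
               bounds=[(0, None)] * 2,
               constraints=[{"type": "ineq", "fun": lambda x: c - x.sum()}])
x_hat = res.x
grad_f = 1.0 / (1.0 + x_hat)              # gradient of f at x_hat
mu_hat = grad_f.max()                      # multiplier read off the binding side

# The unzipped conditions of Example 1753 (I empty, J a singleton, grad h = 1):
assert np.all(grad_f - mu_hat <= 1e-6)                      # first block
assert abs(mu_hat * (c - x_hat.sum())) < 1e-6               # complementarity
assert np.allclose((grad_f - mu_hat) * x_hat, 0, atol=1e-5) # third block
print("x-hat =", x_hat, "mu-hat =", mu_hat)
```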

Example 1754 Let $X = \Delta_{n-1}$. By (40.10), $\hat{\nu} \in N_X(\hat{x})$ if and only if there is some $\hat{\theta} \ge 0$ such that $\hat{\nu}_k \le \hat{\theta}$ and $(\hat{\nu}_k - \hat{\theta}) \hat{x}_k = 0$ for every $k = 1, \dots, n$. In view of (40.12), we can say that $\hat{x} \in C$ solves the optimization problem (40.1) if and only if there exists a triple $(\hat{\lambda}, \hat{\mu}, \hat{\theta}) \in \mathbb{R}^{|I|} \times \mathbb{R}^{|J|}_+ \times \mathbb{R}_+$ such that

$$\begin{cases} \dfrac{\partial f(\hat{x})}{\partial x_k} - \sum_{i \in I} \hat{\lambda}_i \dfrac{\partial g_i(\hat{x})}{\partial x_k} - \sum_{j \in J} \hat{\mu}_j \dfrac{\partial h_j(\hat{x})}{\partial x_k} \le \hat{\theta} & \forall k = 1, \dots, n \\[1ex] \hat{\mu}_j (c_j - h_j(\hat{x})) = 0 & \forall j \in J \\[1ex] \left( \dfrac{\partial f(\hat{x})}{\partial x_k} - \sum_{i \in I} \hat{\lambda}_i \dfrac{\partial g_i(\hat{x})}{\partial x_k} - \sum_{j \in J} \hat{\mu}_j \dfrac{\partial h_j(\hat{x})}{\partial x_k} - \hat{\theta} \right) \hat{x}_k = 0 & \forall k = 1, \dots, n \end{cases}$$

In this formulation, we replace the vector $\hat{\nu}$ with the scalar $\hat{\theta}$. N

Variational inequalities provided a third approach to theorems à la Lagrange/Kuhn-Tucker. Indeed, Lagrange's Theorem was proved using the Implicit Function Theorem (Lemma 1707) and Kuhn-Tucker's Theorem using a penalization technique (Lemma 1724). Different techniques may require different regularity conditions. For instance, Slater's condition comes up in using variational inequalities, while a linear independence condition was used in the previous chapter (Definition 1723). In general, they provide different angles on the multipliers format. A final, deep and surprising, game-theoretic angle will be discussed later in the book (Section 42.6.2).

40.4 Dulcis in fundo: Multivariable Bolzano Theorem

Normal cones permit us to formulate an interesting multivariable version of the Bolzano Theorem that complements the Poincaré-Miranda Theorem.

Theorem 1755 (Multivariable Bolzano) Let $f : K \to \mathbb{R}^n$ be a continuous operator defined on a compact and convex subset $K$ of $\mathbb{R}^n$. If, for each $x \in \partial K$,

$$y \cdot f(x) \le 0 \qquad \forall y \in N_K(x) \tag{40.13}$$

then there exists $c \in K$ such that $f(c) = 0$.

By considering $-f$, it is easy to see that the result continues to hold when in (40.13) we replace $\le$ with $\ge$. The proof is an application of the Brouwer Theorem. On the other hand, it can be shown⁶ that the Multivariable Bolzano Theorem implies the Poincaré-Miranda Theorem and so, through it, the Brouwer Theorem. These three theorems are thus equivalent.

Proof Let $P_K$ be the projection onto $K$ (Section 31.6). Let $x \in \mathbb{R}^n$. By (31.48), for each $y \in K$ it holds

$$(x - P_K(x)) \cdot (y - P_K(x)) \le 0$$

and so $x - P_K(x) \in N_K(P_K(x))$. In view of (40.13), we then have

$$(x - P_K(x)) \cdot f(P_K(x)) \le 0 \qquad \forall x \in \mathbb{R}^n \tag{40.14}$$

With this, define $\varphi : \mathbb{R}^n \to \mathbb{R}^n$ by

$$\varphi = P_K + f \circ P_K$$

This function is bounded. Indeed, since the projection operator is continuous (Corollary 1512), by the Weierstrass Theorem we can set $\alpha = \max_{x \in K} \|P_K(x)\|$ and $\beta = \max_{x \in K} \|f(x)\|$ and so, for each $x \in \mathbb{R}^n$, it holds

$$\|\varphi(x)\| = \|P_K(x) + f(P_K(x))\| \le \|P_K(x)\| + \|f(P_K(x))\| \le \alpha + \beta$$

Thus, $\varphi$ is a self-map on $\bar{B}_{\alpha+\beta}(0)$. We can thus write $\varphi : \bar{B}_{\alpha+\beta}(0) \to \bar{B}_{\alpha+\beta}(0)$. By the Brouwer Theorem, there exists a fixed point $c \in \bar{B}_{\alpha+\beta}(0)$ such that

$$c = \varphi(c) = P_K(c) + f(P_K(c))$$

If $c \in K$, then $P_K(c) = c$ and so $f(c) = 0$. To complete the proof, it thus remains to show that $c \in K$. Suppose, per contra, that $c \notin K$. Thus, $c - P_K(c) \ne 0$. Since $c - P_K(c) \in N_K(P_K(c))$, we then have $N_K(P_K(c)) \ne \{0\}$. By Lemma 1743, $P_K(c) \in \partial K$. We thus reach the contradiction

$$0 < (c - P_K(c)) \cdot (c - P_K(c)) = (c - P_K(c)) \cdot f(P_K(c)) \le 0$$

where the last inequality follows from (40.14). We conclude that $c \in K$, as desired.

When $K$ is the closed ball $\bar{B}_\varepsilon(0)$ we get a direct generalization of the scalar Bolzano Theorem, in which condition (40.13) takes a sharp form.

Corollary 1756 Let $f : \bar{B}_\varepsilon(0) \to \mathbb{R}^n$ be a continuous operator. If

$$\forall x \in \partial B_\varepsilon(0), \ x \cdot f(x) \le 0 \qquad \text{or} \qquad \forall x \in \partial B_\varepsilon(0), \ x \cdot f(x) \ge 0 \tag{40.15}$$

then there exists $c \in \bar{B}_\varepsilon(0)$ such that $f(c) = 0$.

In the scalar case $[-\varepsilon, \varepsilon]$, condition (40.15) amounts to requiring $f(-\varepsilon) \ge 0 \ge f(\varepsilon)$ or $f(-\varepsilon) \le 0 \le f(\varepsilon)$, that is, $f(-\varepsilon) f(\varepsilon) \le 0$. In turn, this easily implies the scalar version of Bolzano's Theorem (Theorem 568).

This corollary is an immediate consequence of the Multivariable Bolzano Theorem thanks to the following characterization of the normal cone of $\bar{B}_\varepsilon(0)$.

⁶ See, e.g., Mawhin (2020), from which we also take the next proof.

Proposition 1757 We have

$$N_{\bar{B}_\varepsilon(0)}(\bar{x}) = \{\lambda \bar{x} \in \mathbb{R}^n : \lambda \ge 0\}$$

for each $\bar{x} \in \partial B_\varepsilon(0)$.

Proof Let $\bar{x} \in \partial B_\varepsilon(0)$. By the Cauchy-Schwarz inequality,

$$\bar{x} \cdot x \le |\bar{x} \cdot x| \le \|\bar{x}\| \|x\| \le \|\bar{x}\| \|\bar{x}\| = \bar{x} \cdot \bar{x} \qquad \forall x \in \bar{B}_\varepsilon(0)$$

Thus, for each $\lambda \ge 0$ we have

$$\lambda \bar{x} \cdot (x - \bar{x}) = \lambda [\bar{x} \cdot x - \bar{x} \cdot \bar{x}] \le 0 \qquad \forall x \in \bar{B}_\varepsilon(0)$$

and so $\lambda \bar{x} \in N_{\bar{B}_\varepsilon(0)}(\bar{x})$. This proves that $\{\lambda \bar{x} \in \mathbb{R}^n : \lambda \ge 0\} \subseteq N_{\bar{B}_\varepsilon(0)}(\bar{x})$. As to the converse inclusion, let $0 \ne y \in N_{\bar{B}_\varepsilon(0)}(\bar{x})$. Since $0 \in \bar{B}_\varepsilon(0)$, we have $y \cdot \bar{x} \ge 0$. On the other hand, since $\varepsilon y / \|y\| \in \bar{B}_\varepsilon(0)$ we have

$$y \cdot \left( \varepsilon \frac{y}{\|y\|} - \bar{x} \right) \le 0 \iff \varepsilon \frac{y \cdot y}{\|y\|} \le y \cdot \bar{x} \iff \varepsilon \|y\| \le y \cdot \bar{x} \iff \varepsilon \le \frac{y \cdot \bar{x}}{\|y\|}$$

We conclude that

$$\frac{y \cdot \bar{x}}{\|y\|} \ge \varepsilon \tag{40.16}$$

By the Cauchy-Schwarz inequality,

$$0 \le y \cdot \bar{x} \le \|y\| \|\bar{x}\| = \|y\| \varepsilon$$

that is,

$$\frac{y \cdot \bar{x}}{\|y\|} \le \varepsilon$$

Along with (40.16), this implies that

$$y \cdot \bar{x} = \|y\| \|\bar{x}\|$$

The vectors $y$ and $\bar{x}$ are thus collinear (cf. Theorem 109). As both vectors are different from $0$, this means that there exists $0 \ne \lambda \in \mathbb{R}$ such that $y = \lambda \bar{x}$. As $0 \le y \cdot \bar{x} = \lambda (\bar{x} \cdot \bar{x})$ and $\bar{x} \cdot \bar{x} > 0$, we conclude that $\lambda > 0$. Hence, $y \in \{\lambda \bar{x} \in \mathbb{R}^n : \lambda \ge 0\}$, as desired.
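The ball case lends itself to a quick numerical experiment. The following sketch (ours, with a made-up operator) checks condition (40.15) on sampled boundary points of the unit ball and then locates the zero whose existence Corollary 1756 guarantees.

```python
import numpy as np
from scipy.optimize import fsolve

# A toy sketch (ours): on the closed unit ball, the operator
# f(x) = (-x2 - x1^3, x1 - x2^3) satisfies x . f(x) = -(x1^4 + x2^4) <= 0
# on the boundary, so a zero must exist inside (here the origin).
f = lambda x: np.array([-x[1] - x[0]**3, x[0] - x[1]**3])

# check condition (40.15) on a sample of boundary points of B_1(0)
for t in np.linspace(0, 2 * np.pi, 100):
    x = np.array([np.cos(t), np.sin(t)])
    assert x @ f(x) <= 1e-12

c = fsolve(f, x0=np.array([0.4, -0.3]))   # numerical zero inside the ball
print("zero found at", c)                 # close to the origin
```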
Chapter 42 is preceded by the study of parametric problems.

Chapter 41

Parametric optimization problems

41.1 Definition

Given a set $\Theta \subseteq \mathbb{R}^m$ of parameters and an all-inclusive choice space $A \subseteq \mathbb{R}^n$, suppose that each value of the parameter vector $\theta$ determines a choice (or feasible) set $\varphi(\theta) \subseteq A$. Choice sets are thus identified, as the parameter varies, by a feasibility correspondence $\varphi : \Theta \rightrightarrows A$. An objective function $f : A \times \Theta \to \mathbb{R}$, defined over pairs $(a, \theta)$ of choices $a$ and parameters $\theta$, has to be optimized over the feasible sets determined by the correspondence $\varphi : \Theta \rightrightarrows A$. Jointly, $\varphi$ and $f$ thus determine an optimization problem in parametric form:

$$\max_x f(x, \theta) \quad \text{sub } x \in \varphi(\theta) \tag{41.1}$$

When $f(\cdot, \theta)$ is, for every $\theta \in \Theta$, concave (quasi-concave) on the convex set $A$ and $\varphi$ is convex-valued, this problem is called concave (quasi-concave).

A vector $\hat{x} \in \varphi(\theta)$ is a solution for $\theta \in \Theta$ if it is an optimal choice given $\theta$, that is,

$$f(\hat{x}, \theta) \ge f(x, \theta) \qquad \forall x \in \varphi(\theta)$$

The solution correspondence $\sigma : S \rightrightarrows A$ of the parametric optimization problem (41.1) is defined by

$$\sigma(\theta) = \arg\max_{x \in \varphi(\theta)} f(x, \theta)$$

It associates to each $\theta$ the corresponding solution set, i.e., the set of optimal choices. Its domain $S \subseteq \Theta$ is the solution domain, that is, the collection of all $\theta$s for which problem (41.1) admits a solution. If such a solution is unique at all $\theta \in S$, then $\sigma$ is single-valued, that is, it is a function. In this case $\sigma : S \to A$ is a solution function.

The (optimal) value function $v : S \to \mathbb{R}$ of the parametric optimization problem is defined by

$$v(\theta) = \max \{f(x, \theta) : x \in \varphi(\theta)\} \tag{41.2}$$

that is, $v(\theta) = f(\hat{x}, \theta)$ for every $\hat{x} \in \sigma(\theta)$. The value function gives, for each $\theta \in S$, the maximum value of the objective function on the set $\varphi(\theta)$. Since this value is attained at the solutions $\hat{x}$, the value function is well-defined only on the solution domain $S$.
Example 1758 The parametric optimization problem with equality and inequality constraints has the form

$$\max_x f(x, \theta) \quad \text{sub } \psi_i(x, \theta) = 0 \ \forall i \in I, \quad \psi_j(x, \theta) \le 0 \ \forall j \in J \tag{41.3}$$

where $\psi_i : A \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ for every $i \in I$, $\psi_j : A \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ for every $j \in J$, and $\theta = (\theta_1, \dots, \theta_m) \in \mathbb{R}^m$. Here

$$\varphi(\theta) = \{x \in A : \psi_i(x, \theta) = 0 \ \forall i \in I, \ \psi_j(x, \theta) \le 0 \ \forall j \in J\}$$

If $f$ does not depend on the parameter, and if $\psi_i(x, \theta) = g_i(x) - b_i$ for every $i \in I$ and $\psi_j(x, \theta) = h_j(x) - c_j$ for every $j \in J$ (so that $m = |I| + |J|$), we get back to the familiar problem (39.4) studied in Chapter 39, that is,

$$\max_x f(x) \quad \text{sub } g_i(x) = b_i \ \forall i \in I, \quad h_j(x) \le c_j \ \forall j \in J$$

In this case, if we set $b = (b_1, \dots, b_{|I|}) \in \mathbb{R}^{|I|}$ and $c = (c_1, \dots, c_{|J|}) \in \mathbb{R}^{|J|}$, the parameter set $\Theta$ consists of all $\theta = (b, c) \in \mathbb{R}^{|I|} \times \mathbb{R}^{|J|}$. N

Example 1759 The consumer problem

$$\max_x u(x) \quad \text{sub } x \in B(p, w)$$

is a parametric optimization problem (Section 22.1.4). The set $A$ is the consumption set. The space $\mathbb{R}^{n+1}_+$ of all price and income pairs is the parameter set $\Theta$, with generic element $\theta = (p, w)$. The budget correspondence $B : \mathbb{R}^{n+1}_+ \rightrightarrows \mathbb{R}^n_+$ is the feasibility correspondence and the utility function $u : A \to \mathbb{R}$ is the objective function (interestingly, in this important example the objective function does not depend on the parameter).

Let $S \subseteq \Theta$ be the set of all parameters $(p, w)$ for which the consumer problem has a solution (i.e., an optimal bundle). The demand correspondence $D : S \rightrightarrows \mathbb{R}^n_+$ is the solution correspondence, which becomes a demand function $D : S \to \mathbb{R}^n_+$ when optimal bundles are unique. Finally, the indirect utility function $v : S \to \mathbb{R}$ is the value function. N

Parametric optimization problems are pervasive in economics because they permit us to carry out the all-important comparative statics exercises that study how, within a given optimization problem, changes in the parameters affect optimal choices and their values. The solution correspondence and the value function are key for these exercises because they describe how optimal choices and their values vary as parameters vary. For instance, in the consumer problem the demand correspondence and the indirect utility function describe, respectively, how the optimal bundles and their values are affected by changes in prices and income.

Before starting the analysis of parametric optimization problems we establish a basic stability result that addresses a key "continuity" question: do changes in the objective function smoothly translate into changes in the value functions? To address this question, in the parametric optimization problem (41.1) we consider two different objective functions $f, g : A \times \Theta \to \mathbb{R}$ and, to ease notation, we denote by $v_f : S_f \to \mathbb{R}$ and $v_g : S_g \to \mathbb{R}$ their value functions.

Proposition 1760 Let $\theta \in S_f \cap S_g$. For each $\varepsilon > 0$, if $|f(a, \theta) - g(a, \theta)| \le \varepsilon$ for all $a \in A$, then $|v_f(\theta) - v_g(\theta)| \le \varepsilon$.

Fortunately, the translation is thus smooth: objective functions that, action by action, are close induce value functions that, in turn, are close (e.g., close utility functions induce close indirect utility functions). In terms of value attainment, nothing dramatic happens, regardless of what happens to the solutions (about them, this result is silent).

Proof Fix $\theta \in S_f \cap S_g$. Let $\hat{a}_f \in \arg\max_{x \in \varphi(\theta)} f(x, \theta)$ and $\hat{a}_g \in \arg\max_{x \in \varphi(\theta)} g(x, \theta)$, so that $v_f(\theta) = f(\hat{a}_f, \theta)$ and $v_g(\theta) = g(\hat{a}_g, \theta)$. Then,

$$f(\hat{a}_f, \theta) + \varepsilon \ge f(\hat{a}_g, \theta) + \varepsilon \ge g(\hat{a}_g, \theta) \ge g(\hat{a}_f, \theta) \ge f(\hat{a}_f, \theta) - \varepsilon$$

So, $|v_f(\theta) - v_g(\theta)| = |f(\hat{a}_f, \theta) - g(\hat{a}_g, \theta)| \le \varepsilon$, as desired.

Later in the chapter a much deeper stability result, the Maximum Theorem, will address
other fundamental continuity questions. Unlike this preliminary result, it will be able to say
something about solutions.

41.2 An illustration

Given an element $\mu \gg 0$ of the simplex $\Delta_{n-1}$, define the parametric objective function $f : \Delta_{n-1} \times \mathbb{R}^n \to \mathbb{R}$ by

$$f(x, \theta) = \sum_{i=1}^n \theta_i x_i + \sum_{i=1}^n x_i \log \frac{x_i}{\mu_i}$$

with the convention $0 \log 0 = 0$.¹ We consider the parametric optimization problem

$$\min_x f(x, \theta) \quad \text{sub } x \in \Delta_{n-1} \tag{41.4}$$

The objective function is continuous and strictly convex in $x$ since it can be written as

$$f(x, \theta) = \sum_{i=1}^n \theta_i x_i + \sum_{i=1}^n x_i \log x_i - \sum_{i=1}^n x_i \log \mu_i$$

and the entropy $\sum_{i=1}^n x_i \log x_i$ is continuous and strictly convex (Example 1495). So, problem (41.4) is concave and, for each $\theta \in \mathbb{R}^n$, has a unique solution. It thus features a solution function, which the next result identifies.

¹ Recall from Example 1495 the meaning of this convention.
Proposition 1761 The solution function $\sigma : \mathbb{R}^n \to \Delta_{n-1}$ of problem (41.4) is given by

$$\sigma_i(\theta) = \frac{\mu_i e^{-\theta_i}}{\sum_{j=1}^n \mu_j e^{-\theta_j}} \qquad \forall i = 1, \dots, n \tag{41.5}$$

and its value function $v : \mathbb{R}^n \to \mathbb{R}$ is given by

$$v(\theta) = -\log \sum_{i=1}^n \mu_i e^{-\theta_i} \tag{41.6}$$

In particular,

$$\nabla v(\theta) = \sigma(\theta) \qquad \forall \theta \in \mathbb{R}^n \tag{41.7}$$

Later in the chapter, we will study these solution and value functions in some detail (Section 41.9).
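Before proving the proposition, it may help to see formulas (41.5) and (41.6) confirmed numerically. The sketch below (ours, with hypothetical data) solves problem (41.4) with an off-the-shelf solver and compares the result with the closed forms.

```python
import numpy as np
from scipy.optimize import minimize

# A numerical sanity check (ours) of (41.5)-(41.6): solve problem (41.4)
# on the simplex with scipy and compare with the closed-form solution.
rng = np.random.default_rng(0)
mu = np.array([0.2, 0.5, 0.3])            # a point of the simplex, mu >> 0
theta = rng.normal(size=3)                # a hypothetical parameter vector

f = lambda x: theta @ x + np.sum(x * np.log(x / mu))   # objective of (41.4)
res = minimize(f, x0=np.ones(3) / 3, method="SLSQP",
               bounds=[(1e-9, 1)] * 3,
               constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1}])

sigma = mu * np.exp(-theta) / np.sum(mu * np.exp(-theta))   # formula (41.5)
v = -np.log(np.sum(mu * np.exp(-theta)))                    # formula (41.6)
print("max|x - sigma| =", np.abs(res.x - sigma).max())
print("value gap      =", abs(res.fun - v))
```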

Proof Fix $y \gg 0$ in $\Delta_{n-1}$ and define $h_y : \Delta_{n-1} \to \mathbb{R}$ by $h_y(x) = \sum_{i=1}^n x_i \log(x_i / y_i)$, with the convention $0 \log 0 = 0$. By the log-sum inequality, we have

$$h_y(x) = \sum_{i=1}^n x_i \log \frac{x_i}{y_i} > \left( \sum_{i=1}^n x_i \right) \log \frac{\sum_{i=1}^n x_i}{\sum_{i=1}^n y_i} = 0 = h_y(y) \qquad \forall x \ne y \in \Delta_{n-1} \tag{41.8}$$

Let $\hat{x}$ be given by (41.5) and set $Z = \sum_{j=1}^n \mu_j e^{-\theta_j}$, so that $\hat{x}_i = \mu_i e^{-\theta_i} / Z$ and $\log(\hat{x}_i / \mu_i) = -\theta_i - \log Z$. We have

$$f(\hat{x}, \theta) = \sum_{i=1}^n \theta_i \hat{x}_i + \sum_{i=1}^n \hat{x}_i \log \frac{\hat{x}_i}{\mu_i} = \sum_{i=1}^n \theta_i \hat{x}_i + \sum_{i=1}^n \hat{x}_i (-\theta_i - \log Z) = -\log Z = -\log \sum_{i=1}^n \mu_i e^{-\theta_i}$$

For each $x \in \Delta_{n-1}$, we have

$$f(x, \theta) = \sum_{i=1}^n \theta_i x_i + \sum_{i=1}^n x_i \log \frac{x_i}{\mu_i} = \sum_{i=1}^n \theta_i x_i + \sum_{i=1}^n x_i \log \frac{x_i}{\hat{x}_i} + \sum_{i=1}^n x_i \log \frac{\hat{x}_i}{\mu_i}$$
$$= h_{\hat{x}}(x) + \sum_{i=1}^n \theta_i x_i + \sum_{i=1}^n x_i (-\theta_i - \log Z) = -\log \sum_{i=1}^n \mu_i e^{-\theta_i} + h_{\hat{x}}(x)$$

By (41.8), $h_{\hat{x}}(x) > h_{\hat{x}}(\hat{x}) = 0$ for all $x \ne \hat{x}$ in $\Delta_{n-1}$. Thus,

$$f(x, \theta) = -\log \sum_{i=1}^n \mu_i e^{-\theta_i} + h_{\hat{x}}(x) > -\log \sum_{i=1}^n \mu_i e^{-\theta_i} = f(\hat{x}, \theta) \qquad \forall x \ne \hat{x} \in \Delta_{n-1}$$

We conclude that (41.5) is the solution function of problem (41.4) and that (41.6) is its value function. Finally, (41.7) is readily checked.

41.3 Basic properties

The existence theorems of Weierstrass and Tonelli ensure the existence of solutions. For instance, a straightforward consequence of Weierstrass' Theorem is that $\theta_0 \in S$ if $\varphi(\theta_0)$ is compact and $f(\cdot, \theta_0) : A \to \mathbb{R}$ is continuous. This leads to the following important result.

Proposition 1762 If $f$ is continuous in $x$ and $\varphi$ is compact-valued, then $\sigma$ is viable (i.e., $S = \Theta$) and compact-valued.

Proof By Weierstrass' Theorem, we have $S = \Theta$ and so $\sigma$ is viable. The set $\sigma(\theta)$ is compact for every $\theta \in \Theta$. Indeed, let $\{\hat{x}_n\} \subseteq \sigma(\theta)$ be such that $\hat{x}_n \to x \in \mathbb{R}^n$. Since $\sigma(\theta) \subseteq \varphi(\theta)$ and the latter set is compact (hence closed), we have that $x \in \varphi(\theta)$. Since $f(\hat{x}_n, \theta) = \max_{x' \in \varphi(\theta)} f(x', \theta)$ for every $n$, the continuity of $f$ in $x$ implies $f(x, \theta) = \lim_{n \to \infty} f(\hat{x}_n, \theta) = \max_{x' \in \varphi(\theta)} f(x', \theta)$, so $x \in \sigma(\theta)$. This proves that $\sigma(\theta)$ is closed. We conclude that $\sigma(\theta)$ is compact because it is a closed subset of the compact set $\varphi(\theta)$.

We now turn to convexity properties. In the next three results we assume that the set $A$ is convex and, to ease matters, that $\sigma$ is viable.
Proposition 1763 The solution correspondence is convex-valued if $f$ is quasi-concave in $x$ and $\varphi$ is convex-valued.

Proof Given any $\theta \in \Theta$, let us show that $\sigma(\theta)$ is convex. Let $\hat{x}_1, \hat{x}_2 \in \sigma(\theta)$ and $\alpha \in [0, 1]$. Since $f$ is quasi-concave in $x$,

$$f(\hat{x}_1, \theta) \ge f(\alpha \hat{x}_1 + (1 - \alpha) \hat{x}_2, \theta) \ge \min\{f(\hat{x}_1, \theta), f(\hat{x}_2, \theta)\} = f(\hat{x}_1, \theta) = f(\hat{x}_2, \theta) = v(\theta)$$

and so $f(\alpha \hat{x}_1 + (1 - \alpha) \hat{x}_2, \theta) = v(\theta)$, i.e., $\alpha \hat{x}_1 + (1 - \alpha) \hat{x}_2 \in \sigma(\theta)$.

The convexity of the solution set means inter alia that, when non-empty, such a set is either a singleton or an infinite set. That is, either the solution is unique or there is an infinite number of them. Next we give the most important sufficient condition that ensures uniqueness.

Proposition 1764 The solution correspondence is single-valued if $f$ is strictly quasi-concave in $x$ and $\varphi$ is convex-valued.

Proof Let us prove that $\sigma$ is single-valued. Let $\theta \in \Theta$ and $\hat{x}_1, \hat{x}_2 \in \sigma(\theta)$. We want to show that $\hat{x}_1 = \hat{x}_2$. Suppose, by contradiction, that $\hat{x}_1 \ne \hat{x}_2$. By the strict quasi-concavity of $f$ in $x$,

$$f\left( \frac{1}{2} \hat{x}_1 + \frac{1}{2} \hat{x}_2, \theta \right) > \min\{f(\hat{x}_1, \theta), f(\hat{x}_2, \theta)\} = f(\hat{x}_1, \theta) = f(\hat{x}_2, \theta) = v(\theta)$$

a contradiction. Hence, $\hat{x}_1 = \hat{x}_2$, as desired.

By strengthening the hypothesis of Proposition 1763 from quasi-concavity to strict quasi-concavity, the solution set becomes a singleton. In this case we have a solution function and not just a solution correspondence. This greatly simplifies comparative statics exercises that study how solutions change as the values of parameters vary. For this reason, in applications strict concavity (and so strict quasi-concavity) is often assumed, typically by requiring that the second derivative be decreasing (Corollary 1436). By now, we have remarked several times this key fact: hopefully, repetita iuvant (sed nauseant).

Turn now to value functions. In the following result we assume the convexity of the graph of $\varphi$. As we already remarked, this is a substantially stronger assumption than the convexity of the images $\varphi(\theta)$.

Proposition 1765 The value function $v$ is quasi-concave (resp., concave) if $f$ is quasi-concave (resp., concave) and the graph of $\varphi$ is convex.

Proof Let $\theta_1, \theta_2 \in \Theta$ and $\alpha \in [0, 1]$. Let $\hat{x}_1 \in \sigma(\theta_1)$ and $\hat{x}_2 \in \sigma(\theta_2)$. Since $\varphi$ has convex graph, $\alpha \hat{x}_1 + (1 - \alpha) \hat{x}_2 \in \varphi(\alpha \theta_1 + (1 - \alpha) \theta_2)$. Hence, the quasi-concavity of $f$ implies:

$$v(\alpha \theta_1 + (1 - \alpha) \theta_2) \ge f(\alpha \hat{x}_1 + (1 - \alpha) \hat{x}_2, \alpha \theta_1 + (1 - \alpha) \theta_2) = f(\alpha (\hat{x}_1, \theta_1) + (1 - \alpha)(\hat{x}_2, \theta_2))$$
$$\ge \min\{f(\hat{x}_1, \theta_1), f(\hat{x}_2, \theta_2)\} = \min\{v(\theta_1), v(\theta_2)\}$$

So, $v$ is quasi-concave. If $f$ is concave, we have:

$$v(\alpha \theta_1 + (1 - \alpha) \theta_2) \ge f(\alpha (\hat{x}_1, \theta_1) + (1 - \alpha)(\hat{x}_2, \theta_2))$$
$$\ge \alpha f(\hat{x}_1, \theta_1) + (1 - \alpha) f(\hat{x}_2, \theta_2) = \alpha v(\theta_1) + (1 - \alpha) v(\theta_2)$$

So, $v$ is concave.

A similar argument shows that $v$ is strictly quasi-concave (resp., strictly concave) if $f$ is strictly quasi-concave (resp., strictly concave).

Example 1766 In the consumer problem, the graph of the budget correspondence is convex if the consumption set is convex. Indeed, let $((p, w), x), ((p', w'), x') \in \operatorname{Gr} B$ and let $\alpha \in [0, 1]$. Then, $p \cdot (\alpha x + (1 - \alpha) x') \le \alpha w + (1 - \alpha) w'$, so the set $\operatorname{Gr} B$ is convex. By Proposition 1763, the demand correspondence is convex-valued if the utility function is quasi-concave, while by Proposition 1765 the indirect utility function is quasi-concave (concave) if the utility function is quasi-concave (concave). N

41.4 Maximum Theorem

How do solutions and maximum values vary as parameters change? Are such changes abrupt or gentle? The stability of an optimization problem under parameter changes is a key issue in applications, where it is typically desirable that changes in parameters affect solutions and maximum values nicely, in a "continuous" manner. Formally, this amounts to the upper hemicontinuity of the solution correspondence and the continuity of the value function. In this section we address this fundamental stability question of parametric optimization problems through the celebrated Berge's Maximum Theorem.²

Theorem 1767 (Berge) Consider a parametric optimization problem

$$\max_x f(x, \theta) \quad \text{sub } x \in \varphi(\theta)$$

If $\varphi$ is bounded and continuous and $f$ is continuous, then $\sigma$ is viable, bounded, compact-valued and upper hemicontinuous, and $v$ is continuous.

Under the continuity of both the objective function and the feasibility correspondence, the optimization problem is thus stable under changes in parameters: both the value function and the solution correspondence are continuous. The Maximum Theorem is an important result in applications because, as remarked before, the stability that it ensures is often a desirable property of the optimization problems that they feature. Natura non facit saltus as long as the hypotheses of the Maximum Theorem are satisfied.

The proof of the Maximum Theorem relies on a lemma of independent interest.

Lemma 1768 Given any bounded sequence of scalars $\{a_n\}$, if $\limsup_{n \to \infty} a_n = a$ then there exists a subsequence $\{a_{n_k}\}$ such that $\lim_{k \to \infty} a_{n_k} = a$.

² It is named after Claude Berge, who proved it in 1959.
A similar property holds for the lim inf.

Proof For $k = 1$ define

$$n_1 = \min\{n \ge 1 : |a_n - a| < 1\}$$

and, recursively, for each $k \ge 2$,

$$n_k = \min\left\{ n \ge 1 : n > n_{k-1} \text{ and } |a_n - a| < \frac{1}{k} \right\}$$

In this way, $\{a_{n_k}\}$ is a subsequence of $\{a_n\}$ because by construction $n_k > n_{k-1}$ for every $k \ge 2$. At the same time, $\{a_{n_k}\}$ converges to $a$ as, again by construction, it holds $|a_{n_k} - a| < 1/k$ for every $k \ge 1$. Thus, $\{a_{n_k}\}$ is the subsequence we are looking for, provided we show that it is well defined. This requires showing that the sets whose minima we are taking are not empty, so that these minima are well defined. The rest of the proof is devoted to showing exactly this.

For each $n \ge 1$, set $b_n = \sup_{m \ge n} a_m \in \mathbb{R}$. Recall that $a = \lim_{n \to \infty} b_n = \inf_n b_n$. Fix any $\varepsilon > 0$. Since $b_n$ converges to $a$, there exists some $n_\varepsilon \ge 1$ such that $b_{n_\varepsilon} - a < \varepsilon/2$. Since $b_{n_\varepsilon} = \sup_{m \ge n_\varepsilon} a_m$, there is some $m \ge n_\varepsilon$ such that $b_{n_\varepsilon} - \varepsilon/2 \le a_m \le b_{n_\varepsilon}$. In turn, this easily implies that

$$|a_m - a| = |a_m - b_{n_\varepsilon} + b_{n_\varepsilon} - a| \le \underbrace{|a_m - b_{n_\varepsilon}|}_{\le \varepsilon/2} + \underbrace{|b_{n_\varepsilon} - a|}_{< \varepsilon/2} < \varepsilon$$

Summing up, for each $\varepsilon > 0$ the set $\{m \ge 1 : |a_m - a| < \varepsilon\}$ is not empty. In turn, this is easily seen to imply that the sets used to define the $n_k$ are not empty, as desired.

Proof of the Maximum Theorem Since $\varphi$ is bounded, recall that there exists a compact set $K$ such that $\varphi(\theta) \subseteq K \subseteq A$ for all $\theta \in \Theta$. Suppose that $\varphi$ and $f$ are continuous. By Proposition 958, the set $\varphi(\theta)$ is closed for each $\theta \in \Theta$. Since $\varphi$ is bounded, $\varphi(\theta)$ turns out to be compact as well. By Proposition 1762, $S = \Theta$ and $\sigma$ is compact-valued. Fix any point $\bar{\theta} \in \Theta$ and consider a sequence $\{\theta_n\} \subseteq \Theta$ such that $\lim_{n \to \infty} \theta_n = \bar{\theta}$.

We first prove that $\{v(\theta_n)\}$ is bounded. By contradiction, assume that $\sup_n |v(\theta_n)| = +\infty$. It follows that there exists a subsequence $\{\theta_{n_k}\}$ such that $|v(\theta_{n_k})| \ge k$ for every $k \ge 1$. For each $k \ge 1$, let $\hat{x}_{n_k} \in \varphi(\theta_{n_k})$ be such that $v(\theta_{n_k}) = f(\hat{x}_{n_k}, \theta_{n_k})$. By the Bolzano-Weierstrass Theorem and since $\varphi$ is bounded, there exists a subsequence $\{\hat{x}_{n_{k_s}}\}$ that converges to some $\bar{x} \in K$. Since $\varphi$ is continuous and $\lim_{s \to \infty} \theta_{n_{k_s}} = \bar{\theta}$, we can conclude that $\bar{x} \in \varphi(\bar{\theta})$. Since $f$ is continuous, this implies that

$$+\infty = \lim_{s \to \infty} |v(\theta_{n_{k_s}})| = \lim_{s \to \infty} |f(\hat{x}_{n_{k_s}}, \theta_{n_{k_s}})| = |f(\bar{x}, \bar{\theta})| < +\infty$$

a contradiction.

We proceed by showing that $\limsup_{n \to \infty} v(\theta_n) \le v(\bar{\theta})$. In light of the previous part of the proof, we can use Lemma 1768. Set $\limsup_{n \to \infty} v(\theta_n) = \beta \in \mathbb{R}$. By Lemma 1768, there exists a subsequence $\{\theta_{n_k}\}$ such that $\lim_{k \to \infty} v(\theta_{n_k}) = \beta$. Let $\hat{x}_k \in \sigma(\theta_{n_k})$ for each $k \ge 1$, so that $f(\hat{x}_k, \theta_{n_k}) = v(\theta_{n_k})$ for each $k \ge 1$. Since $\varphi$ is bounded, the sequence of vectors $\{\hat{x}_k\}$ is bounded. By the Bolzano-Weierstrass Theorem, there is a subsequence $\{\hat{x}_{k_s}\}$ that converges to some $\bar{x} \in \mathbb{R}^n$. Since $\lim_{s \to \infty} \theta_{n_{k_s}} = \bar{\theta}$, it follows that $\bar{x} \in \varphi(\bar{\theta})$ because $\varphi$ is upper hemicontinuous. Since $f$ is continuous, this implies

$$\beta = \lim_{s \to \infty} v(\theta_{n_{k_s}}) = \lim_{s \to \infty} f(\hat{x}_{k_s}, \theta_{n_{k_s}}) = f(\bar{x}, \bar{\theta}) \le v(\bar{\theta})$$

We conclude that $\beta \le v(\bar{\theta})$, as desired.

Next, we show that $\liminf_{n \to \infty} v(\theta_n) \ge v(\bar{\theta})$. Set $\liminf_{n \to \infty} v(\theta_n) = \gamma \in \mathbb{R}$. By Lemma 1768, there exists a subsequence $\{\theta_{n_k}\}$ such that $\lim_{k \to \infty} v(\theta_{n_k}) = \gamma$. Since $S = \Theta$, there is $\bar{x} \in \varphi(\bar{\theta})$ such that $v(\bar{\theta}) = f(\bar{x}, \bar{\theta})$. Since $\varphi$ is lower hemicontinuous, there exist elements $x_k \in \varphi(\theta_{n_k})$ such that $\lim_{k \to \infty} x_k = \bar{x}$. Since $v(\theta_{n_k}) \ge f(x_k, \theta_{n_k})$ for each $k \ge 1$, the continuity of $f$ implies

$$\gamma = \lim_{k \to \infty} v(\theta_{n_k}) \ge \lim_{k \to \infty} f(x_k, \theta_{n_k}) = f(\bar{x}, \bar{\theta}) = v(\bar{\theta})$$

Hence, $\liminf_{n \to \infty} v(\theta_n) \ge v(\bar{\theta})$, as desired. We conclude that

$$v(\bar{\theta}) \le \liminf_{n \to \infty} v(\theta_n) \le \limsup_{n \to \infty} v(\theta_n) \le v(\bar{\theta})$$

so $\lim_{n \to \infty} v(\theta_n) = v(\bar{\theta})$.

It remains to show that $\sigma$ is upper hemicontinuous at $\bar{\theta}$. Let $\theta_n \to \bar{\theta}$ and $x_n \to \bar{x}$ with $x_n \in \sigma(\theta_n)$. We want to show that $\bar{x} \in \sigma(\bar{\theta})$. Since $\sigma(\theta_n) \subseteq \varphi(\theta_n)$ and $\varphi$ is upper hemicontinuous, clearly $\bar{x} \in \varphi(\bar{\theta})$. By the continuity of both $f$ and $v$, we then have

$$f(\bar{x}, \bar{\theta}) = \lim_{n \to \infty} f(x_n, \theta_n) = \lim_{n \to \infty} v(\theta_n) = v(\bar{\theta})$$

and so $\bar{x} \in \sigma(\bar{\theta})$, as desired.

The next example shows that the joint continuity of the objective function in its two arguments is needed in the Maximum Theorem.

Example 1769 Define $f : \mathbb{R}^2 \to \mathbb{R}$ by

$$f(x, \theta) = \begin{cases} 0 & \text{if } (x, \theta) = (0, 0) \\ \dfrac{2x\theta}{x^2 + \theta^2} & \text{else} \end{cases}$$

As noted by Schwarz (1872) p. 220, this function is separately continuous (why?), but discontinuous at the origin (cf. Example 559). Indeed, let us approach the origin along the 45 degree line $(x, \theta) = (t, t)$, with $t \in \mathbb{R}$. We have

$$\lim_{t \to 0} f(t, t) = \lim_{t \to 0} \frac{2t^2}{t^2 + t^2} = 1 \ne 0 = f(0, 0)$$

If we take a sequence $x_n = (1/n, 1/n)$, we have $x_n \to 0$ but not $f(x_n) \to f(0)$. By Proposition 552, the function is discontinuous at the origin.

Consider the parametric optimization problem

$$\max_x f(x, \theta) \quad \text{sub } x \in \varphi(\theta)$$

where $\varphi(\theta) = [-1, 1]$ for each $\theta \in [-1, 1]$ (so, $\varphi$ is both bounded and continuous). We have

$$v(\theta) = \begin{cases} 0 & \text{if } \theta = 0 \\ 1 & \text{else} \end{cases}$$

The value function is thus discontinuous at $0$ (it is actually lower semicontinuous there, as remarked by Baire, 1927). N
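The failure can also be seen numerically. The following brute-force sketch (ours) tabulates $v(\theta)$ on a grid and exhibits the jump at $\theta = 0$.

```python
import numpy as np

# A quick numerical look (ours) at Example 1769: the value function of
# max_x f(x, theta) on [-1, 1] jumps at theta = 0.
def f(x, theta):
    return 0.0 if (x == 0 and theta == 0) else 2 * x * theta / (x**2 + theta**2)

grid = np.linspace(-1, 1, 2001)            # fine grid over the choice set
for theta in [-0.5, -0.01, 0.0, 0.01, 0.5]:
    v = max(f(x, theta) for x in grid)     # brute-force value function
    print(f"theta = {theta:+.2f}  v(theta) = {v:.4f}")
# v equals 1 for every theta != 0 but drops to 0 at theta = 0: the Maximum
# Theorem fails because f is not jointly continuous at the origin.
```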

The continuity properties of demand correspondences and indirect utility functions follow from the Maximum Theorem, a remarkable first dividend of this result. To this end, we need the following continuity property of the budget correspondence.

Proposition 1770 The budget correspondence is continuous at all $(p, w)$ such that $w > 0$.

In other words, the budget correspondence $B$ is continuous on $\mathbb{R}^n_+ \times \mathbb{R}_{++}$.

Proof Let $(p, w) \in \mathbb{R}^n_+ \times \mathbb{R}_{++}$. We first show that $B$ is upper hemicontinuous at $(p, w)$. Let $(p_n, w_n) \to (p, w)$, $x_n \to x$ and $x_n \in B(p_n, w_n)$. We want to show that $x \in B(p, w)$. Since $p_n \cdot x_n \le w_n$ for each $n$, it holds $p \cdot x = \lim_{n \to \infty} p_n \cdot x_n \le \lim_{n \to \infty} w_n = w$, that is, $x \in B(p, w)$. We conclude that $B$ is upper hemicontinuous at $(p, w)$.

The correspondence $B$ is also lower hemicontinuous at $(p, w)$. Let $(p_n, w_n) \to (p, w)$ and $x \in B(p, w)$. We want to show that there is a sequence $\{x_n\}$ such that $x_n \in B(p_n, w_n)$ and $x_n \to x$. We consider two cases.

(i) Suppose $p \cdot x < w$. Since $(p_n, w_n) \to (p, w)$, there is $\bar{n}$ large enough so that $p_n \cdot x < w_n$ for all $n \ge \bar{n}$. Hence, the sequence that is constant at $x$, i.e., $x_n = x$, is such that $x_n \in B(p_n, w_n)$ for all $n \ge \bar{n}$ and $x_n \to x$.

(ii) Suppose $p \cdot x = w$. Since $w > 0$, there is $\bar{x} \in \mathbb{R}^n_+$ such that $p \cdot \bar{x} < w$. Since $(p_n, w_n) \to (p, w)$, there is $\bar{n}$ large enough so that $p_n \cdot \bar{x} < w_n$ for all $n \ge \bar{n}$. Set

$$x_n = \left(1 - \frac{1}{n}\right) x + \frac{1}{n} \bar{x}$$

We have $x_n \in B(p_n, w_n)$ for all $n \ge \bar{n}$ and $x_n \to x$.

In both cases, the existence of a sequence $\{x_n\}$ with $x_n \in B(p_n, w_n)$ and $x_n \to x$ easily follows. We conclude that $B$ is lower hemicontinuous at $(p, w)$.

We can now apply the Maximum Theorem to the consumer problem, which, under a mild continuity hypothesis on the utility function, turns out to be stable with respect to changes in prices and wealth.

Proposition 1771 Suppose that $u : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is a continuous utility function defined on a compact consumption set. Let $(p, w) \in \mathbb{R}^n_+ \times \mathbb{R}_{++}$. Then:

(i) the demand correspondence is compact-valued and upper hemicontinuous at $(p, w)$;

(ii) the indirect utility function is continuous at $(p, w)$.

Proof Since the consumption set is compact, the budget correspondence is bounded and continuous on $\mathbb{R}^n_+ \times \mathbb{R}_{++}$. Since the utility function is continuous, the result then follows from the Maximum Theorem.

Observe that (i) implies that demand functions are continuous at $(p, w)$ since upper hemicontinuity and continuity coincide for bounded functions (Proposition 960).
The Maximum Theorem has some remarkable consequences for the study of equations. Indeed, consider the parametric equation

$$f(x, \theta) = y_0$$

where $f : A \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$ is an operator and $y_0$ is a given element of $\mathbb{R}^n$ (Section 35.6). The equation solution correspondence $S_{y_0} : \Theta \rightrightarrows \mathbb{R}^n$ defined by $S_{y_0}(\theta) = \{x \in A : f(x, \theta) = y_0\}$ describes how solutions vary as the parameter $\theta$ varies. The next result, a consequence of the Maximum Theorem, shows that it has a remarkable continuity property. So, changes in parameters do not translate into abrupt changes in the solution sets, an important finding that complements what was discussed in Section 35.6.

Corollary 1772 Assume that $S_{y_0}(\theta) \ne \emptyset$ for all $\theta \in \Theta$. If $f$ is continuous and $A$ is bounded, then $S_{y_0} : \Theta \rightrightarrows \mathbb{R}^n$ is viable, compact-valued and upper hemicontinuous.

Proof Consider the parametric version of the optimization problem (37.17), i.e.,

$$\min_x \|f(x, \theta) - y_0\|^2 \quad \text{sub } x \in A \tag{41.9}$$

Define $g : A \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ by $g(x, \theta) = -\|f(x, \theta) - y_0\|^2$ and $\varphi : \Theta \rightrightarrows A$ by $\varphi(\theta) = A$ for all $\theta \in \Theta$. The continuity of $f$ implies that of $g$, so by the Maximum Theorem the solution correspondence $\sigma : \Theta \rightrightarrows A$ defined by $\sigma(\theta) = \arg\max_{x \in \varphi(\theta)} g(x, \theta)$ is compact-valued and upper hemicontinuous. Since $\sigma(\theta) = S_{y_0}(\theta)$ for all $\theta \in \Theta$, the result follows.

A similar result can be proved for fixed points. Given an operator $f : A \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to A$, define the fixed-point correspondence $\Phi : \Theta \rightrightarrows \mathbb{R}^n$ by $\Phi(\theta) = \{x \in A : f(x, \theta) = x\}$. In words, $\Phi(\theta)$ is the collection of all fixed points of the self-map $f(\cdot, \theta) : A \to A$, given a parameter value $\theta \in \Theta$. Assume that such a set is non-empty for all $\theta \in \Theta$. If $f$ is continuous and $A$ is bounded, then the last corollary implies that $\Phi$ is compact-valued and upper hemicontinuous. Indeed, if we consider the auxiliary operator $g : A \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$ defined by $g(x, \theta) = f(x, \theta) - x$, we have that a vector $x \in A$ is a fixed point of $f(\cdot, \theta)$ if and only if it solves the equation $g(x, \theta) = 0$.

41.5 Envelope theorems I: fixed constraint

How do value functions react to changes in parameters? In other words, how do the objective functions' optimal levels change when parameters change? The answer to this basic comparative statics exercise depends, clearly, on how solutions react to such changes, as optimal levels are attained at the solutions. Mathematically, under differentiability it amounts to studying the gradient $\nabla v(\theta)$ of the value function. This is the subject matter of the envelope theorems.

We begin by considering in this section the special case

$$\max_x f(x, \theta) \quad \text{sub } x \in C \tag{41.10}$$

where the feasibility correspondence is constant, with $\varphi(\theta) = C \subseteq A$ for all $\theta \in \Theta$. The parameter only affects the objective function. To ease matters, throughout the section we also assume that $S = \Theta$.
We first approach the issue heuristically. To this end, suppose that $n = k = 1$ so that both the parameter $\theta$ and the choice variable $x$ are scalars. Moreover, assume that there is a unique solution for each $\theta$, so that $\sigma : \Theta \to \mathbb{R}$ is the solution function. Then $v(\theta) = f(\sigma(\theta), \theta)$ for every $\theta \in \Theta$. A heuristic application of the chain rule, a "back of the envelope calculation", then suggests that, if it exists, the derivative of $v$ at $\theta_0$ is:

$$v'(\theta_0) = \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial x} \sigma'(\theta_0) + \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial \theta}$$

Remarkably, the first term is null because by Fermat's Theorem $(\partial f / \partial x)(\sigma(\theta_0), \theta_0) = 0$ (provided the solution is interior). Thus,

$$v'(\theta_0) = \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial \theta} \tag{41.11}$$

Next we make this important finding general and rigorous.

Theorem 1773 Suppose $f(x, \cdot)$ is, for every $x \in C$, differentiable at $\theta_0 \in \operatorname{int} \Theta$. If $v$ is differentiable at $\theta_0$, then for every $\hat{x} \in \sigma(\theta_0)$ we have $\nabla v(\theta_0) = \nabla_\theta f(\hat{x}, \theta_0)$, that is,

$$\frac{\partial v(\theta_0)}{\partial \theta_i} = \frac{\partial f(\hat{x}, \theta_0)}{\partial \theta_i} \qquad \forall i = 1, \dots, k \tag{41.12}$$

If $f$ is strictly quasi-concave in $x$ and $\varphi$ is convex-valued, then $\sigma$ is a function (Proposition 1764). So, (41.12) can be written as

$$\frac{\partial v(\theta_0)}{\partial \theta_i} = \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial \theta_i} \qquad \forall i = 1, \dots, k$$

which is the general form of the heuristic formula (41.11).
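A quick finite-difference experiment (ours, on a made-up instance) illustrates formula (41.12): the derivative of the value function coincides with the partial derivative of the objective at the optimum, with no extra term coming from the moving solution.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# A sketch (ours) of the envelope formula (41.12) for
# f(x, theta) = theta * x - x^2 on C = [0, 10]: sigma(theta) = theta/2,
# v(theta) = theta^2/4, so v'(theta) = theta/2 = df/dtheta at x-hat.
def v(theta):
    res = minimize_scalar(lambda x: -(theta * x - x**2),
                          bounds=(0, 10), method="bounded")
    return -res.fun, res.x

theta0, h = 3.0, 1e-5
(v_plus, _), (v_minus, _) = v(theta0 + h), v(theta0 - h)
dv = (v_plus - v_minus) / (2 * h)          # numerical derivative of v
_, x_hat = v(theta0)
print(dv, x_hat)                           # both approximately theta0/2 = 1.5
```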

Proof Let $\theta_0 \in \operatorname{int} \Theta$. Let $\hat{x} \in \sigma(\theta_0)$ be an optimal solution at $\theta_0$, so that $v(\theta_0) = f(\hat{x}, \theta_0)$. Define $w : \Theta \to \mathbb{R}$ by $w(\theta) = f(\hat{x}, \theta)$. We have $v(\theta_0) = w(\theta_0)$ and, for all $\theta \in \Theta$,

$$w(\theta) = f(\hat{x}, \theta) \le \max_{x \in C} f(x, \theta) = v(\theta) \tag{41.13}$$

We thus have

$$\frac{w(\theta_0 + tu) - w(\theta_0)}{t} \le \frac{v(\theta_0 + tu) - v(\theta_0)}{t}$$

for all $u \in \mathbb{R}^k$ and $t > 0$ sufficiently small. Hence,

$$\frac{\partial f(\hat{x}, \theta_0)}{\partial \theta_i} = \lim_{h \to 0^+} \frac{f(\hat{x}, \theta_0 + h e^i) - f(\hat{x}, \theta_0)}{h} = \lim_{h \to 0^+} \frac{w(\theta_0 + h e^i) - w(\theta_0)}{h} \le \lim_{h \to 0^+} \frac{v(\theta_0 + h e^i) - v(\theta_0)}{h} = \frac{\partial v(\theta_0)}{\partial \theta_i}$$

On the other hand,

$$\frac{w(\theta_0 + tu) - w(\theta_0)}{t} \ge \frac{v(\theta_0 + tu) - v(\theta_0)}{t}$$

for all $u \in \mathbb{R}^k$ and $t < 0$ sufficiently small. By proceeding as before, we then have

$$\frac{\partial f(\hat{x}, \theta_0)}{\partial \theta_i} \ge \frac{\partial v(\theta_0)}{\partial \theta_i}$$

This proves (41.12).

The hypothesis that $v$ is differentiable is not that appealing because it is not in terms of the primitive elements $f$ and $C$ of problem (41.10). Indeed, to check it we need to know the value function. Remarkably, in concave problems this differentiability hypothesis follows from hypotheses that are directly on the objective function.

Theorem 1774 Let $C$ and $\Theta$ be convex. Suppose $f(x, \cdot)$ is, for every $x \in C$, differentiable at $\theta_0 \in \operatorname{int} \Theta$. If $f$ is concave on $C \times \Theta$, then $v$ is differentiable at $\theta_0$.

Thus, if $f$ is differentiable in the parameter and is concave, then $\nabla v(\theta_0) = \nabla_\theta f(\hat{x}, \theta_0)$ for all $\hat{x} \in \sigma(\theta_0)$. If, in addition, $f$ is strictly concave in $x$, then we can directly write $\nabla v(\theta_0) = \nabla_\theta f(\sigma(\theta_0), \theta_0)$ because $\sigma$ is a function and $\sigma(\theta_0)$ is the unique solution at $\theta_0$.

Proof By Proposition 1765, $v$ is concave. We begin by proving that $\partial v(\theta_0) \subseteq \bigcap_{x \in \sigma(\theta_0)} \partial_\theta f(x, \theta_0)$. Let $\xi \in \partial v(\theta_0)$, so that $v(\theta) \le v(\theta_0) + \xi \cdot (\theta - \theta_0)$ for all $\theta \in \Theta$. Being $v(\theta_0) = w(\theta_0)$, by (41.13) we have, for all $\theta \in \Theta$,

$$w(\theta) \le v(\theta) \le v(\theta_0) + \xi \cdot (\theta - \theta_0) = w(\theta_0) + \xi \cdot (\theta - \theta_0)$$

Hence, $\xi \in \partial w(\theta_0) = \partial_\theta f(x, \theta_0)$ for all $x \in \sigma(\theta_0)$. Since $v$ is concave and $\theta_0 \in \operatorname{int} \Theta$, by Proposition 1524 we have $\partial v(\theta_0) \ne \emptyset$. Since $f(x, \cdot)$ is, for every $x \in \sigma(\theta_0)$, differentiable at $\theta_0$, we have $\partial_\theta f(x, \theta_0) = \{\nabla_\theta f(x, \theta_0)\}$ by Proposition 1516. We conclude that $\partial v(\theta_0) = \{\nabla_\theta f(x, \theta_0)\}$. By Proposition 1516, $v$ is differentiable at $\theta_0$.

41.6 Envelope theorems II: variable constraint

Matters are less clean when the feasibility correspondence is not constant. We consider a parametric optimization problem with equality constraints

$$\max_x f(x, \theta) \quad \text{sub } \psi_i(x, \theta) = 0 \quad \forall i = 1, \dots, m \tag{41.14}$$

where $\psi = (\psi_1, \dots, \psi_m) : A \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^k \to \mathbb{R}^m$ and $\theta = (\theta_1, \dots, \theta_k) \in \mathbb{R}^k$.

Here $\varphi(\theta) = \{x \in A : \psi_i(x, \theta) = 0 \ \forall i = 1, \dots, m\}$, so the constraint varies with the parameter $\theta$. For instance, if $f$ does not depend on $\theta$ and $\psi_i(x, \theta) = g_i(x) - \theta_i$ for $i = 1, \dots, m$ (so that $k = m$), we get back to the familiar problem (38.37) of Chapter 38, that is,

$$\max_x f(x) \quad \text{sub } g_i(x) = b_i \quad \forall i = 1, \dots, m$$

Again, we begin with a heuristic argument. Assume that $n = k = m = 1$, so that there is a single constraint and both the parameter $\theta$ and the choice variable $x$ are scalars. Moreover, assume that there is a unique solution for each $\theta$, so that $\sigma : \Theta \to \mathbb{R}$ is the solution function and $\sigma(\theta)$ is the unique solution that corresponds to $\theta$. A heuristic application of the chain rule suggests that, if it exists, the derivative of $v$ at $\theta_0$ is

$$v'(\theta_0) = \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial \theta} - \hat{\lambda}(\theta_0) \frac{\partial \psi(\sigma(\theta_0), \theta_0)}{\partial \theta}$$

where $\hat{\lambda}(\theta_0)$ is the Lagrange multiplier that corresponds to the unique solution $\sigma(\theta_0)$. Indeed, being $\psi(\sigma(\theta), \theta) = 0$ for every $\theta \in \Theta$, by a heuristic application of the chain rule we have

$$\frac{\partial \psi(\sigma(\theta_0), \theta_0)}{\partial x} \sigma'(\theta_0) + \frac{\partial \psi(\sigma(\theta_0), \theta_0)}{\partial \theta} = 0$$

On the other hand, being $v(\theta) = f(\sigma(\theta), \theta)$ for every $\theta \in \Theta$, again by a heuristic application of the chain rule and by the Lagrange condition $\partial f / \partial x = \hat{\lambda}(\theta_0) \, \partial \psi / \partial x$ at $(\sigma(\theta_0), \theta_0)$, we have

$$v'(\theta_0) = \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial x} \sigma'(\theta_0) + \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial \theta} = \hat{\lambda}(\theta_0) \frac{\partial \psi(\sigma(\theta_0), \theta_0)}{\partial x} \sigma'(\theta_0) + \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial \theta}$$
$$= -\hat{\lambda}(\theta_0) \frac{\partial \psi(\sigma(\theta_0), \theta_0)}{\partial \theta} + \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial \theta}$$

as desired. Next we make the result more rigorous and general. We study the case of unique solutions, common in applications.

Theorem 1775 Suppose that problem (41.14) has a unique solution $\sigma(\theta)$ at all $\theta \in \Theta$.³ Suppose that the sets $A$ and $\Theta$ are open and that $f$ and $\psi$ are continuously differentiable on $A \times \Theta$. If the determinant of the Jacobian of the operator $(\nabla_x L, \psi)$ is non-zero on $\Theta$, then

$$\nabla v(\theta) = \nabla_\theta f(\sigma(\theta), \theta) - \hat{\lambda}(\theta) \cdot \nabla_\theta \psi(\sigma(\theta), \theta) \qquad \forall \theta \in \Theta$$

where $\hat{\lambda}(\theta)$ is the Lagrange multiplier that corresponds to the unique solution $\sigma(\theta)$.

That is, by unzipping gradients we have

$$\frac{\partial v(\theta)}{\partial \theta_s} = \frac{\partial f(\sigma(\theta), \theta)}{\partial \theta_s} - \sum_{i=1}^m \hat{\lambda}_i(\theta) \frac{\partial \psi_i(\sigma(\theta), \theta)}{\partial \theta_s} \qquad \forall s = 1, \dots, k \tag{41.15}$$

for all $\theta \in \Theta$.

Proof As in the heuristic argument, we consider the case $n = k = m = 1$ (the general case being just notationally messier). By hypothesis, there is a solution function $\sigma : \Theta \to A$. By Lagrange's Theorem, $\sigma$ is then the unique function that, along with a "multiplier" function $\hat{\lambda} : \Theta \to \mathbb{R}$, satisfies for all $\theta \in \Theta$ the equations

$$\nabla_x L(\sigma(\theta), \hat{\lambda}(\theta)) = \frac{\partial f(\sigma(\theta), \theta)}{\partial x} - \hat{\lambda}(\theta) \frac{\partial \psi(\sigma(\theta), \theta)}{\partial x} = 0$$
$$\nabla_\lambda L(\sigma(\theta), \hat{\lambda}(\theta)) = \psi(\sigma(\theta), \theta) = 0$$

So, the operator $(\sigma, \hat{\lambda}) : \Theta \to A \times \mathbb{R}$ is defined implicitly at each $\theta \in \Theta$ by these equations. Since the Jacobian of the operator $(\nabla_x L, \psi)$ is non-zero on $\Theta$, the operator version of Proposition 1582 ensures that the operator $(\sigma, \hat{\lambda})$ is continuously differentiable, with

$$\frac{\partial \psi(\sigma(\theta), \theta)}{\partial \theta} + \frac{\partial \psi(\sigma(\theta), \theta)}{\partial x} \sigma'(\theta) = 0 \qquad \forall \theta \in \Theta \tag{41.16}$$

We also have $v(\theta) = f(\sigma(\theta), \theta)$ for all $\theta \in \Theta$. By Theorem 1274, $v$ is differentiable and, by the chain rule, we have

$$v'(\theta) = \frac{\partial f(\sigma(\theta), \theta)}{\partial x} \sigma'(\theta) + \frac{\partial f(\sigma(\theta), \theta)}{\partial \theta} \qquad \forall \theta \in \Theta \tag{41.17}$$

Putting together (41.16) and (41.17) via the simple algebra seen in the heuristic derivation, we get

$$v'(\theta) = \frac{\partial f(\sigma(\theta), \theta)}{\partial x} \sigma'(\theta) + \frac{\partial f(\sigma(\theta), \theta)}{\partial \theta} = \frac{\partial f(\sigma(\theta), \theta)}{\partial \theta} - \hat{\lambda}(\theta) \frac{\partial \psi(\sigma(\theta), \theta)}{\partial \theta} \qquad \forall \theta \in \Theta$$

as desired.

³ Earlier in the chapter we saw which conditions ensure the existence and uniqueness of solutions.

41.7 Marginal interpretation of multipliers

Formula (41.15) continues to hold for the parametric optimization problem with both equality and inequality constraints (41.3), where it takes the form

$$\frac{\partial v(\theta_0)}{\partial \theta_s} = \frac{\partial f(\hat{x}, \theta_0)}{\partial \theta_s} - \sum_{i \in I} \hat{\lambda}_i(\theta_0) \frac{\partial \psi_i(\sigma(\theta_0), \theta_0)}{\partial \theta_s} - \sum_{j \in J} \hat{\mu}_j(\theta_0) \frac{\partial \psi_j(\sigma(\theta_0), \theta_0)}{\partial \theta_s} \tag{41.18}$$

for every $s = 1, \dots, k$. Here $(\hat{\lambda}(\theta_0), \hat{\mu}(\theta_0)) \in \mathbb{R}^{|I|} \times \mathbb{R}^{|J|}_+$ are the Lagrange multipliers associated with the solution $\sigma(\theta_0)$, assumed to be unique (for simplicity).

We can derive this formula heuristically with the argument that we just used for the equality case. Indeed, if we denote by $A(\sigma(\theta_0))$ the set of the binding constraints at $\theta_0$, by Lemma 1725 we have $\hat{\mu}_j = 0$ for each $j \notin A(\sigma(\theta_0))$. So, the non-binding constraints at $\theta_0$ do not affect the derivation because their multipliers are null.

That said, let us consider the standard problem (39.4) in which the objective function does not depend on the parameter, $\psi_i(x, \theta) = g_i(x) - b_i$ for every $i \in I$, and $\psi_j(x, \theta) = h_j(x) - c_j$ for every $j \in J$ (Example 1758). Formula (41.18) then implies

$$\frac{\partial v(b, c)}{\partial b_i} = \hat{\lambda}_i(b, c) \quad \forall i \in I \qquad ; \qquad \frac{\partial v(b, c)}{\partial c_j} = \hat{\mu}_j(b, c) \quad \forall j \in J$$

Interestingly, the multipliers describe the marginal effect on the value function of relaxing the constraints, that is, how valuable it is to relax them. In particular, we have $\partial v(b, c) / \partial c_j = \hat{\mu}_j(b, c) \ge 0$ because it is always beneficial to relax an inequality constraint: more alternatives become available. In contrast, this might not be the case for an equality constraint, so the sign of $\partial v(b, c) / \partial b_i = \hat{\lambda}_i(b, c)$ is ambiguous.
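These marginal interpretations are easy to verify numerically. In the sketch below (ours, reusing the hypothetical instance sketched after Example 1753), the multiplier of the budget-type inequality constraint matches a finite-difference estimate of $\partial v / \partial c$.

```python
import numpy as np
from scipy.optimize import minimize

# A finite-difference sketch (ours): for max log(1+x1) + log(1+x2)
# sub x1 + x2 <= c, x >= 0, the multiplier of the constraint should equal
# dv/dc. Here v(c) = 2*log(1 + c/2), so dv/dc = 1/(1 + c/2).
def value(c):
    res = minimize(lambda x: -np.sum(np.log1p(x)), x0=[0.1, 0.1],
                   method="SLSQP", bounds=[(0, None)] * 2,
                   constraints=[{"type": "ineq", "fun": lambda x: c - sum(x)}])
    return -res.fun

c, h = 3.0, 1e-4
dv_dc = (value(c + h) - value(c - h)) / (2 * h)
mu_hat = 1.0 / (1.0 + c / 2)               # multiplier from the earlier sketch
print(dv_dc, mu_hat)                        # both approximately 0.4
```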

41.8 Monotone solutions

Given an objective function $f : I \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$, consider a parametric optimization problem

$$\max_x f(x, \theta) \quad \text{sub } x \in \varphi(\theta) \tag{41.19}$$

in which the feasibility correspondence $\varphi : \Theta \rightrightarrows I$ is assumed to be ascending: when $\theta \le \theta'$, if $x \in \varphi(\theta)$ and $y \in \varphi(\theta')$, then $x \wedge y \in \varphi(\theta)$ and $x \vee y \in \varphi(\theta')$. Note that when $\varphi$ is single-valued, this amounts to requiring $\varphi(\theta) \le \varphi(\theta')$ whenever $\theta \le \theta'$, i.e., $\varphi$ is an increasing function.

The question that we address in this section is whether the solution correspondence of this class of parametric optimization problems is itself ascending (so increasing when single-valued): do higher values of the parameters translate into higher values of the solutions? That is, does $\theta \le \theta'$ imply $\sigma(\theta) \le \sigma(\theta')$? It is a monotonicity property of solutions that may be relevant in applications.⁴

The next class of functions will play a key role in our analysis.

Definition 1776 A function $f : I \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ is parametrically supermodular if, given any $\theta \le \theta'$, we have

$$f(x, \theta) + f(y, \theta') \le f(x \vee y, \theta') + f(x \wedge y, \theta)$$

for all $x, y \in I$.

Given any $\theta \in \Theta$, the section $f(\cdot, \theta) : I \to \mathbb{R}$ is supermodular. Indeed, it is enough to set $\theta' = \theta$ in the previous definition. So, parametric supermodularity extends standard supermodularity to a parametric setting.

Example 1777 Given a function $\pi : I \to \mathbb{R}$, define $f : I \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ by $f(x, \theta) = \pi(x) + \theta \cdot x$. For each $x \in I$ and $h \in \mathbb{R}^n$, with $x + h \in I$, we have

$$f(x, \theta) - f(x + h, \theta) = \pi(x) - \pi(x + h) - \theta \cdot h \qquad \forall \theta \in \Theta$$

Assume that $\pi$ is supermodular. Let $x, y \in I$ and set $h = y - x \vee y = x \wedge y - x \le 0$. If $\theta \le \theta'$, we have $\theta' \cdot h \le \theta \cdot h$ and so

$$f(x, \theta) - f(x \wedge y, \theta) = f(x, \theta) - f(x + h, \theta) = \pi(x) - \pi(x \wedge y) - \theta \cdot h$$
$$\le \pi(x \vee y) - \pi(y) - \theta' \cdot h = \pi(x \vee y) - \pi(x \vee y + h) - \theta' \cdot h$$
$$= f(x \vee y, \theta') - f(x \vee y + h, \theta') = f(x \vee y, \theta') - f(y, \theta')$$

We conclude that $f$ is parametrically supermodular. N

⁴ We refer to Topkis (2011) for a detailed analysis of this topic. Throughout the section $I = I_1 \times \cdots \times I_n$ denotes a rectangle in $\mathbb{R}^n$, with each interval $I_i$ bounded or not.
Example 1778 Assume that $\Theta$ is a lattice. If $f : I \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ is jointly supermodular on $I \times \Theta$, then it is easily seen to be parametrically supermodular. So, any condition that ensures such joint supermodularity of $f$, for instance a differential condition like (20.7), implies the parametric supermodularity of $f$. For instance, in the previous example assume that the supermodular function $\pi$ is twice differentiable and that $I$ and $\Theta$ are open intervals in $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. Then, $\partial^2 f(x, \theta) / \partial x_i \partial x_j = \partial^2 \pi(x) / \partial x_i \partial x_j$,

$$\frac{\partial^2 f(x, \theta)}{\partial \theta_i \partial \theta_j} = 0 \quad \forall 1 \le i \ne j \le m \qquad \text{and} \qquad \frac{\partial^2 f(x, \theta)}{\partial x_i \partial \theta_j} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}$$

Condition (20.7) is satisfied, so $f$ is jointly supermodular. We conclude that $f$ is parametrically supermodular, thus confirming what was established in the previous example. N

Since we deal with optimization problems, it is natural to turn to ordinal properties.

Definition 1779 A function $f : I \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ is parametrically semi-supermodular if, given any $\theta \le \theta'$, for each $x, y \in I$ we have

$$f(x \wedge y, \theta) < f(x, \theta) \implies f(y, \theta') \le f(x \vee y, \theta') \tag{41.20}$$

This is an ordinal property much weaker than parametric supermodularity.

Example 1780 Functions $f : I \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ that are increasing in $x$ for every $\theta$ are easily seen to be parametrically semi-supermodular. In particular, the function $f : \mathbb{R}^2_{++} \times (0, 1) \to \mathbb{R}$ defined by $f(x, \theta) = \log(x_1 + x_2)$ is parametrically semi-supermodular; it is not, however, parametrically supermodular. N

We can now address the question that we posed at the beginning of this section. To ease matters, from now on we assume that problem (41.19) has a solution for every $\theta \in \Theta$ (e.g., $I$ is compact and $f$ is continuous in $x$), so that we can write the solution correspondence as $\sigma : \Theta \rightrightarrows \mathbb{R}^n$. In most applications, comparative statics exercises actually feature solution functions $\sigma : \Theta \to \mathbb{R}^n$ rather than correspondences (as we already argued several times). This motivates the next result.

Proposition 1781 Let $f : I \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ be parametrically semi-supermodular. If the solution correspondence of the parametric optimization problem (41.19) is single-valued, then it is increasing.

Proof Suppose that $f$ is parametrically semi-supermodular and $\sigma$ is single-valued. By definition, $\sigma(\theta) = \arg\max_{x \in \varphi(\theta)} f(x, \theta)$ for all $\theta \in \Theta$. Let $\theta \le \theta'$. Since $\varphi$ is ascending, we have $\sigma(\theta) \wedge \sigma(\theta') \in \varphi(\theta)$ and $\sigma(\theta) \vee \sigma(\theta') \in \varphi(\theta')$. So, by the definition of $\sigma(\theta')$ we have $f(\sigma(\theta) \vee \sigma(\theta'), \theta') \le f(\sigma(\theta'), \theta')$, while by the definition of $\sigma(\theta)$ we have $f(\sigma(\theta) \wedge \sigma(\theta'), \theta) \le f(\sigma(\theta), \theta)$. Suppose $f(\sigma(\theta), \theta) = f(\sigma(\theta) \wedge \sigma(\theta'), \theta)$. By the uniqueness of the solution, we have $\sigma(\theta) = \sigma(\theta) \wedge \sigma(\theta') \le \sigma(\theta')$. Suppose, instead, that $f(\sigma(\theta), \theta) > f(\sigma(\theta) \wedge \sigma(\theta'), \theta)$. By (41.20), we have $f(\sigma(\theta'), \theta') \le f(\sigma(\theta) \vee \sigma(\theta'), \theta')$, so $f(\sigma(\theta) \vee \sigma(\theta'), \theta') = f(\sigma(\theta'), \theta')$. By the uniqueness of the solution, we now have $\sigma(\theta') = \sigma(\theta) \vee \sigma(\theta') \ge \sigma(\theta)$. In both cases, we conclude that $\sigma(\theta) \le \sigma(\theta')$, as desired.

Example 1782 From Example 1777 we know that, given a supermodular function $\pi : I \to \mathbb{R}$, the function $f : I \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ defined by $f(x, \theta) = \pi(x) + \theta \cdot x$ is parametrically supermodular. Consider the parametric problem

$$\max_x \pi(x) + \theta \cdot x \quad \text{sub } x \in \varphi(\theta)$$

where the feasibility correspondence $\varphi$ is ascending. By the last proposition, the solution correspondence of this problem is ascending. For instance, consider a Cobb-Douglas production function $\pi(x_1, x_2) = x_1^{\alpha_1} x_2^{\alpha_2}$, with $\alpha_1, \alpha_2 > 0$. If $q \ge 0$ is the output price and $p = (p_1, p_2) \gg 0$ are the input prices, the profit function $\pi(x, q) = q x_1^{\alpha_1} x_2^{\alpha_2} - p_1 x_1 - p_2 x_2$ is parametrically supermodular because $x_1^{\alpha_1} x_2^{\alpha_2}$ is supermodular (see Example 937). The producer problem is

$$\max_{x_1, x_2} \pi(x, q) \quad \text{sub } x_1, x_2 \ge 0$$

where the output price $q$ plays the role of the parameter $\theta$. Since the profit function is strictly concave, solutions are unique (if they exist). In particular, a solution of the producer problem is an optimal amount of inputs that the producer will demand. By Proposition 1781,⁵ the solution function is increasing: if the output price increases, the input demand of the producer increases. N
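A numerical sketch (ours, with hypothetical parameter values) illustrates this monotone comparative statics: solving the producer problem on a grid of output prices produces input demands that rise with $q$.

```python
import numpy as np
from scipy.optimize import minimize

# A numerical illustration (ours) of Example 1782: input demand in the
# producer problem is increasing in the output price q. We use the
# Cobb-Douglas technology x1^0.3 * x2^0.4 and input prices p = (1, 2).
a1, a2, p = 0.3, 0.4, np.array([1.0, 2.0])

def demand(q):
    profit = lambda x: -(q * x[0]**a1 * x[1]**a2 - p @ x)
    return minimize(profit, x0=[1.0, 1.0], bounds=[(1e-8, None)] * 2).x

for q in [1.0, 2.0, 4.0, 8.0]:
    print(f"q = {q}:  x(q) = {np.round(demand(q), 3)}")
# each coordinate of the optimal input vector increases with q, as
# Proposition 1781 predicts.
```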

Next we turn to the ordinal version of parametric supermodularity.

Definition 1783 A function $f : I \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ is parametrically quasi-supermodular if, given any $\theta \le \theta'$, for each $x, y \in I$ we have both

$$f(x \wedge y, \theta) < f(x, \theta) \implies f(y, \theta') < f(x \vee y, \theta') \tag{41.21}$$

and

$$f(y, \theta') > f(x \vee y, \theta') \implies f(x \wedge y, \theta) > f(x, \theta) \tag{41.22}$$

The next result motivates the "quasi" terminology.

Proposition 1784 Let $f : I \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ be parametrically supermodular. If $\phi : \operatorname{Im} f \to \mathbb{R}$ is strictly increasing, then $\phi \circ f$ is parametrically quasi-supermodular.

Clearly, parametric quasi-supermodularity implies parametric semi-supermodularity.

Example 1785 If $f : I \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ is strictly increasing in $x$ for every $\theta$, then it is parametrically quasi-supermodular (as the reader can check). N

Parametric quasi-supermodularity permits us to extend Proposition 1781 to the multi-valued case.

Proposition 1786 If $f : I \times \Theta \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ is parametrically quasi-supermodular, the solution correspondence of the parametric optimization problem (41.19) is ascending.

Proof Suppose that $f$ is parametrically quasi-supermodular. Let $\theta \le \theta'$. Let $x(\theta) \in \sigma(\theta) = \arg\max_{x \in \varphi(\theta)} f(x, \theta)$ and $x(\theta') \in \sigma(\theta')$. Since $\varphi$ is ascending, we have $x(\theta) \wedge x(\theta') \in \varphi(\theta)$ and $x(\theta) \vee x(\theta') \in \varphi(\theta')$. So, by the definition of $x(\theta')$ we have $f(x(\theta) \vee x(\theta'), \theta') \le f(x(\theta'), \theta')$, while by the definition of $x(\theta)$ we have $f(x(\theta) \wedge x(\theta'), \theta) \le f(x(\theta), \theta)$. Suppose $f(x(\theta) \vee x(\theta'), \theta') < f(x(\theta'), \theta')$. By (41.22), $f(x(\theta) \wedge x(\theta'), \theta) > f(x(\theta), \theta)$, a contradiction. We conclude that $f(x(\theta) \vee x(\theta'), \theta') = f(x(\theta'), \theta')$, so $x(\theta) \vee x(\theta') \in \sigma(\theta')$. Suppose $f(x(\theta) \wedge x(\theta'), \theta) < f(x(\theta), \theta)$. By (41.21), $f(x(\theta'), \theta') < f(x(\theta) \vee x(\theta'), \theta')$, a contradiction. We conclude that $f(x(\theta) \wedge x(\theta'), \theta) = f(x(\theta), \theta)$, so $x(\theta) \wedge x(\theta') \in \sigma(\theta)$. This proves that $\sigma$ is ascending.

⁵ The feasibility correspondence $\varphi : [0, +\infty) \rightrightarrows \mathbb{R}^2_+$ is given by $\varphi(q) = \mathbb{R}^2_+$, so it is trivially ascending.

41.9 Approximations: the Laplace method

Back to the parametric optimization problem (41.4): its solution and value functions are instances of two important classes of functions, the log-exponential and the softmax functions. In this section we focus on these two important functions and show that they may provide interesting approximations of solution and value functions.

41.9.1 Log-exponential and softmax functions

A (negative) log-exponential function $g_\lambda : \mathbb{R}^n \to \mathbb{R}$ indexed by a scalar $\lambda > 0$ is defined by:⁶

$$g_\lambda(x) = -\frac{1}{\lambda} \log \sum_{i=1}^n \mu_i e^{-\lambda x_i}$$

The log-exponential function has some remarkable properties.

Proposition 1787 The log-exponential $g_\lambda : \mathbb{R}^n \to \mathbb{R}$ is Lipschitz, strongly increasing, concave, translation invariant and normalized. Moreover, there exists $\beta \in (0, 1)$ such that

$$\min\{x_1, \dots, x_n\} \le g_\lambda(x) \le \min\{x_1, \dots, x_n\} - \frac{\log \beta}{\lambda} \qquad \forall x \in \mathbb{R}^n \tag{41.23}$$

Proof In view of Example 912, $g_\lambda$ is Lipschitz, concave and translation invariant. It is easy to check that $g_\lambda$ is also strongly increasing and normalized: $g_\lambda(k) = k$ for all $k \in \mathbb{R}$.

To prove the sandwich (41.23), set $\underline{x} = \min\{x_1, \dots, x_n\}$ and $\beta = \min\{\mu_1, \dots, \mu_n\} \in (0, 1)$. Since $g_\lambda$ is strongly increasing and normalized, we have $g_\lambda(x) \ge g_\lambda(\underline{x}) = \underline{x}$ for all $x \in \mathbb{R}^n$. On the other hand, we have $\sum_{i=1}^n \mu_i e^{-\lambda x_i} \ge \beta e^{-\lambda \underline{x}}$ and so

$$g_\lambda(x) = -\frac{1}{\lambda} \log \sum_{i=1}^n \mu_i e^{-\lambda x_i} \le \underline{x} - \frac{\log \beta}{\lambda}$$

as desired.

⁶ Recall that in Example 912 we briefly studied a more general version of this function, with two possibly different indices; here we require them to coincide.
The sandwich (41.23) shows that, as $\lambda$ diverges to $+\infty$, the log-exponential function better and better approximates the Leontief function:

$$\lim_{\lambda \to +\infty} -\frac{1}{\lambda} \log \sum_{i=1}^n \mu_i e^{-\lambda x_i} = \min\{x_1, \dots, x_n\} \qquad \forall x \in \mathbb{R}^n \tag{41.24}$$

This min-approximation property is a remarkable feature of the log-exponential function. It has a dual version for the max:

$$\max\{x_1, \dots, x_n\} + \frac{\log \beta}{\lambda} \le \tilde{g}_\lambda(x) \le \max\{x_1, \dots, x_n\} \qquad \forall x \in \mathbb{R}^n \tag{41.25}$$

that involves the dual log-exponential function $\tilde{g}_\lambda : \mathbb{R}^n \to \mathbb{R}$ of $g_\lambda$ given by

$$\tilde{g}_\lambda(x) = -g_\lambda(-x) = \frac{1}{\lambda} \log \sum_{i=1}^n \mu_i e^{\lambda x_i}$$

So,

$$\lim_{\lambda \to +\infty} \frac{1}{\lambda} \log \sum_{i=1}^n \mu_i e^{\lambda x_i} = \max\{x_1, \dots, x_n\} \qquad \forall x \in \mathbb{R}^n \tag{41.26}$$
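The approximations (41.23) through (41.26) are easy to visualize numerically; the sketch below (ours) uses scipy's logsumexp for a numerically stable evaluation.

```python
import numpy as np
from scipy.special import logsumexp

# A quick numerical look (ours) at the sandwich (41.23) and the limits
# (41.24) and (41.26), computed stably via logsumexp.
x = np.array([0.3, 1.7, -0.4, 0.9])
mu = np.full(4, 0.25)                      # uniform weights on the simplex

for lam in [1, 10, 100, 1000]:
    g = -logsumexp(-lam * x, b=mu) / lam   # log-exponential g_lambda(x)
    g_dual = logsumexp(lam * x, b=mu) / lam
    print(f"lambda = {lam:5d}   g = {g:+.4f}   g-dual = {g_dual:+.4f}")
# g decreases to min(x) = -0.4 and g-dual increases to max(x) = 1.7,
# at the O(1/lambda) rate given by (41.23) and (41.25).
```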
The softmax operator $f^\lambda = (f_1^\lambda, \dots, f_n^\lambda) : \mathbb{R}^n \to \Delta_{n-1}$ indexed by $\lambda > 0$ is defined by

$$f_i^\lambda(x) = \frac{\mu_i e^{-\lambda x_i}}{\sum_{j=1}^n \mu_j e^{-\lambda x_j}} \qquad \forall i = 1, \dots, n$$

The softmax operator is the gradient operator of the log-exponential function,⁷ i.e.,

$$f^\lambda(x) = \nabla g_\lambda(x) \qquad \forall x \in \mathbb{R}^n$$

Being the gradient of a Lipschitz, strongly increasing and concave function, the softmax operator is, ipso facto, cyclically monotone (Theorem 1529): for any finite sequence $x_0, x_1, \dots, x_m$ of vectors in $\mathbb{R}^n$, it holds

$$f^\lambda(x_0) \cdot (x_1 - x_0) + \cdots + f^\lambda(x_{m-1}) \cdot (x_m - x_{m-1}) + f^\lambda(x_m) \cdot (x_0 - x_m) \ge 0$$

To see another remarkable property of the softmax operator, observe that the vector $p$ of $\mathbb{R}^n$ defined by

$$p_i = \frac{\mu_i e^{-\lambda x_i}}{\sum_{j=1}^n \mu_j e^{-\lambda x_j}} \qquad \forall i = 1, \dots, n$$

is an element of the simplex, i.e., $p \in \Delta_{n-1}$. In particular, we can interpret $p_i$ as the probability that the component $x_i$ of vector $x \in \mathbb{R}^n$ is selected, say by a suitable random device calibrated with these probabilities.

⁷ Sometimes we talk of softmax functions, a legitimate (as operators are functions) but less precise terminology.

Proposition 1788 We have, for each non-constant $x \in \mathbb{R}^n$,

$$x_i > \min\{x_1, \dots, x_n\} \implies \lim_{\lambda \to +\infty} p_i = 0$$

Thus, the probability that the random device selects a non-minimum component of $x$ goes to $0$ as $\lambda$ diverges to $+\infty$. So, with a higher and higher probability the random device selects a minimum component of the vector $x$. In particular, when the minimum component is unique, the random device eventually selects it, i.e.,

$$x_i < x_j \ \forall j \ne i \implies \lim_{\lambda \to +\infty} p_i = 1$$

This finding nicely complements (41.24).

Proof Since $x$ is non-constant, assume without loss of generality that $x_1 > x_n = \min\{x_1, \dots, x_n\}$. We want to show that $\lim_{\lambda \to +\infty} p_1 = 0$. Then

$$p_1 = \frac{\mu_1 e^{-\lambda x_1}}{\sum_{i=1}^n \mu_i e^{-\lambda x_i}} = \frac{1}{\sum_{i=1}^n \frac{\mu_i}{\mu_1} e^{\lambda (x_1 - x_i)}} \le \frac{1}{\frac{\mu_n}{\mu_1} e^{\lambda (x_1 - x_n)}} \to 0$$

because $x_1 - x_n > 0$ implies that $\lim_{\lambda \to +\infty} e^{\lambda (x_1 - x_n)} = +\infty$.

As to the max, we can consider the dual softmax operator $\tilde{f}^\lambda = (\tilde{f}_1^\lambda, \dots, \tilde{f}_n^\lambda) : \mathbb{R}^n \to \Delta_{n-1}$ defined by

$$\tilde{f}_i^\lambda(x) = \frac{\mu_i e^{\lambda x_i}}{\sum_{j=1}^n \mu_j e^{\lambda x_j}} \qquad \forall i = 1, \dots, n$$

as well as the probability vector $q$ in $\Delta_{n-1}$ defined by

$$q_i = \frac{\mu_i e^{\lambda x_i}}{\sum_{j=1}^n \mu_j e^{\lambda x_j}} \qquad \forall i = 1, \dots, n$$

It is easy to see that, for each non-constant $x \in \mathbb{R}^n$, we have

$$x_i < \max\{x_1, \dots, x_n\} \implies \lim_{\lambda \to +\infty} q_i = 0 \tag{41.27}$$

and, when the maximum component of vector $x$ is unique,

$$x_i > x_j \ \forall j \ne i \implies \lim_{\lambda \to +\infty} q_i = 1 \tag{41.28}$$

41.9.2 The Laplace method


In applications, one often needs to nd, within a nite set of alternatives Z, which alternative
z^ 2 Z is best according to an objective function f : Z ! R. We thus need to solve the nite
optimization problem
max f (z) sub z 2 Z
z

Suppose, without loss of generality, that the set Z = fz1 ; :::; zn g consists of n alternatives. Set
xi = u (zi ) for each i = 1; :::; n and consider a uniform , i.e., i = 1=n for each i = 1; :::; n.
By (41.26) and (41.28), it is easy to see that
n
X
1 u(zi )
lim log e = max u (z)
!+1 z2Z
i=1

and, if z^ is unique,
e u(^
z)
lim Pn u(z)
=1
i=1 e
!+1

Thus, as diverges to +1, the log-exponential function better and better approximates the
maximum value maxz2Z u (z), while via the softmax operator we can construct a random
device that, with a higher and higher probability, selects the maximizer z^.
1226 CHAPTER 41. PARAMETRIC OPTIMIZATION PROBLEMS

Back to the subject matter of this chapter, consider a parametric optimization problem

max f (x; ) sub x 2 ' ( )


x

where ' ( ) is a nite set for each 2 . Assume that there is a unique solution for each
2 . In view of what we just proved, for each 2 we have
1 X
u(z)
v ( ) = lim log e
!+1
z2'( )

and
e u( ( ))
lim Pn u(z)
=1
i=1 e
!+1

We thus have approximations, deterministic and probabilistic, of the solution and value
functions : ! Z and v : ! R. Jointly, they form the Laplace (approximation)
method.
Chapter 42

Interdependent optimization

So far we have considered individual optimization problems. Many economic and social phe-
nomena, however, are characterized by the interplay of several such problems, in which the
outcomes of agents' decisions depend on their decisions as well as on the decisions of other
agents. Market interactions are an obvious example of interdependence among agents' deci-
sions: for instance, in an oligopoly problem the pro ts that each producer can earn depends
both on his production decision and on the production decisions of the other oligopolists.
Interdependent decisions must coexist: the mutual compatibility of agents' decisions is
the novel conceptual issue that emerges in the study of interdependent optimization. Equi-
librium notions address this issue. In this chapter we present an introductory mathematical
analysis of this most important topic, which is the subject matter of game theory and is at
the heart of economic analysis. In particular, the theorems of von Neumann and Nash that
we will present in this chapter are wonderful examples of deep mathematical results that
have been motivated by economic applications.

42.1 Minimax Theorem


De nition 1789 Let f : A1 A2 ! R be a real-valued function and C1 and C2 subsets
of A1 and A2 , respectively. A pair (^ x1 ; x
^2 ) 2 C1 C2 is said to be a saddle point of f on
C1 C2 if
f (^
x1 ; x2 ) f (^
x1 ; x
^2 ) f (x1 ; x ^2 ) 8x1 2 C1 ; 8x2 2 C2 (42.1)
The value f (^
x1 ; x
^2 ) of the function at x
^ is called saddle value of f on C1 C2 .

In other words, (^
x1 ; x
^2 ) is a saddle point if the function f (^
x1 ; ) : C2 ! R has a minimum
at x
^2 and the function f ( ; x ^2 ) : C1 ! R has a maximum at x ^1 . To visualize these points,
think of centers of horse saddles: these points at the same time maximize f along one
dimension and minimize it along the other, perpendicular, one. This motivates their name.
Their nature is clari ed by the next characterization.

Proposition 1790 Let f : A1 A2 ! R be a real-valued function and C1 and C2 subsets of


A1 and A2 , respectively. A pair (^
x1 ; x
^2 ) 2 C1 C2 is a saddle point of f on C1 C2 if and
only if1

1
Since we have inf and sup, we must allow the values 1 and +1, respectively.

1227
1228 CHAPTER 42. INTERDEPENDENT OPTIMIZATION

(i) the function inf x2 2C2 f ( ; x2 ) : C1 ! [ 1; +1) attains its maximum value at x
^1 ,

(ii) the function supx1 2C1 f (x1 ; ) : C2 ! ( 1; +1] attains its minimum value at x
^2 ,

(iii) the two values are equal, i.e.,

max inf f (x1 ; x2 ) = f (^ ^2 ) = min sup f (x1 ; x2 )


x1 ; x (42.2)
x1 2C1 x2 2C2 x2 2C2 x1 2C1

This characterization consists of two optimization conditions, (i) and (ii), and a nal
condition, (iii), that requires their mutual consistency. Let us consider these conditions one
by one.
By condition (i), the component x ^1 of a saddle point, called maximinimizer, solves the
following optimization problem, called maximinimization (or primal ) problem,

max inf f (x1 ; x2 ) sub x1 2 C1 (42.3)


x1 x2 2C2

where inf x2 2C2 f ( ; x2 ) : C1 ! [ 1; +1) is the objective function. If f does not depend on
x2 , this problem reduces to the standard maximization problem

max f (x1 ) sub x1 2 C1 (42.4)


x1

where the maximinimizer x ^1 becomes a standard maximizer.


By condition (ii), the component x^2 of a saddle point, called minimaximizer, solves the
following optimization problem, called minimaximization (or dual ) problem,

min sup f (x1 ; x2 ) sub x2 2 C2 (42.5)


x2 x1 2C1

where supx1 2C1 f (x1 ; ) : C2 ! ( 1; +1] is the objective function. If f does not depend
on x1 , this problem reduces to the standard minimization problem

min f (x2 ) sub x2 2 C2


x2

where the minimaximizer x ^2 becomes a standard minimizer.


The optimization problems (42.3) and (42.5) that underlie conditions (i) and (ii) are
dual: in one we rst minimize over x2 and then maximize over x1 , in the other we do the
opposite. The consistency condition (iii) makes interchangeable in terms of value attained
these dual optimization problems by requiring their values to be equal.

The optimization conditions (i) and (ii) have standard optimization (maximization or
minimization) problems as special cases, so conceptually they are generalizations of famil-
iar notions. In contrast, the consistency condition (iii) is the actual novel feature of the
characterization in that it introduces a notion of mutual consistency between optimization
problems, which are no longer studied in isolation, as we did so far. The scope of this
condition will become more clear with the notion of Nash equilibrium.

The proof of Proposition 1790 relies on the following simple but important lemma (inter
alia, it shows that the more interesting part in an equality sup inf = inf sup is the inequality
sup inf inf sup).
42.1. MINIMAX THEOREM 1229

Lemma 1791 For any function f : A1 A2 ! R, we have

sup inf f (x1 ; x2 ) inf sup f (x1 ; x2 )


x1 2A1 x2 2A2 x2 2A2 x1 2A1

Proof Clearly, f (x1 ; x2 ) inf x2 2A2 f (x1 ; x2 ) for all (x1 ; x2 ) 2 A1 A2 , so

sup f (x1 ; x2 ) sup inf f (x1 ; x2 ) 8x2 2 A2


x1 2A1 x1 2A1 x2 2A2

Then, inf x2 2A2 supx1 2A1 f (x1 ; x2 ) supx1 2A1 inf x2 2A2 f (x1 ; x2 ).

Proof of Proposition 1790 \Only if". Let (^


x1 ; x
^2 ) 2 C1 C2 be a saddle point of f on
C1 C2 . By (42.1),

inf f (^
x1 ; x2 ) = f (^ ^2 ) = sup f (x1 ; x
x1 ; x ^2 ) (42.6)
x2 2C2 x1 2C1

So,
sup inf f (x1 ; x2 ) f (^
x1 ; x
^2 ) inf sup f (x1 ; x2 )
x1 2C1 x2 2C2 x2 2C2 x1 2C1

By the previous lemma, the inequalities are actually equalities, that is,

sup inf f (x1 ; x2 ) = f (^


x1 ; x
^2 ) = inf sup f (x1 ; x2 )
x1 2C1 x2 2C2 x2 2C2 x1 2C1

From (42.6) it follows that

inf f (^
x1 ; x2 ) = sup inf f (x1 ; x2 ) and sup f (x1 ; x
^2 ) = inf sup f (x1 ; x2 )
x2 2C2 x1 2C1 x2 2C2 x1 2C1 x2 2C2 x1 2C1

which, in turn, implies (42.2). This proves the \only if".


\If". By (i) and (iii) we have f (^x1 ; x
^2 ) = maxx1 2C1 inf x2 2C2 f (x1 ; x2 ) = inf x2 2C2 f (^
x1 ; x2 ).
By (ii) and (iii), f (^
x1 ; x
^2 ) = minx2 2C2 supx1 2C1 f (x1 ; x2 ) = supx1 2C1 f (x1 ; x
^2 ). Hence,

inf f (^
x1 ; x2 ) = f (^
x1 ; x
^2 ) = sup f (x1 ; x
^2 )
x2 2C2 x1 2C1

which, in turn, implies that (^


x1 ; x
^2 ) 2 C1 C2 is a saddle point of f .

The last proposition implies the next remarkable interchangeability property of saddle
points.

Corollary 1792 Let f : A1 A2 ! R be a real-valued function and C1 and C2 subsets of


A1 and A2 , respectively. If the pairs (^ x1 ; x x01 ; x
^2 ) ; (^ ^02 ) 2 C1 C2 are saddle points of f on
C1 C2 , so are the pairs (^x1 ; x0
^2 ) ; (^ 0
x1 ; x
^2 ) 2 C1 C2 .

In words, if we interchange the two components of a saddle point, we get a new saddle
point.

Proof It is enough to consider (^ ^02 ). Since (^


x1 ; x x1 ; x
^2 ) is a saddle point of f on C1 C2 ,
by Proposition 1790 the function inf x2 2C2 f ( ; x2 ) : C1 ! [ 1; +1) attains its maximum
value at x x01 ; x
^1 . Since (^ ^02 ) is a saddle point of f on C1 C2 , by Proposition 1790 the function
1230 CHAPTER 42. INTERDEPENDENT OPTIMIZATION

supx1 2C1 f (x1 ; ) : C2 ! ( 1; +1] attains its minimum value at x ^02 . In turn, by the \if"
part of Proposition 1790 this implies that (^
x1 ; x0
^2 ) is a saddle point of f on C1 C2 .

A function f : A1 A2 ! R de ned on a Cartesian product A1 A2 induces the


functions f x1 : A2 ! R de ned by f x1 (x2 ) = f (x1 ; x2 ) for each x1 2 A1 as well as the
functions f x2 : A1 ! R de ned by f x2 (x1 ) = f (x1 ; x2 ) for each x2 2 A2 . These functions
are called the sections of f (see Section 20.4.1). Using this terminology, we can say that
(^ ^2 ) is a saddle point of f if and only if the section f x^1 : C2 ! R attains minimum value
x1 ; x
^2 and the section f x^2 : C1 ! R attains maximum value at x
at x ^1 .
This remark easily leads, via Stampacchia's Theorem, to a di erential characterization
of saddle points. To this end, as we did earlier in the book, in the gradient2

@f (x1 ; x2 ) @f (x1 ; x2 ) @f (x1 ; x2 ) @f (x1 ; x2 )


rf (x1 ; x2 ) = ; ::::; ; ; ::::;
@x11 @x1m @x21 @x2n

of a function f : A1 A2 Rm Rn ! R we distinguish the two parts rx1 f (x1 ; x2 ) and


rx2 f (x1 ; x2 ) de ned by:

@f (x1 ; x2 ) @f (x1 ; x2 )
rx1 f (x1 ; x2 ) = ; ::::;
@x11 @x1m
@f (x1 ; x2 ) @f (x1 ; x2 )
rx2 f (x1 ; x2 ) = ; ::::;
@x21 @x2n

This distinction is key for the next di erential characterization of saddle points.

Proposition 1793 Let f : A1 A2 Rm Rn ! R be a real-valued function and C1 and


C2 subsets of A1 and A2 , respectively. Suppose that

(i) Ci is a closed and convex subset of the open and convex set Ai for i = 1; 2;

(ii) f is continuously di erentiable in both x1 and x2 .3

If (^
x1 ; x
^2 ) 2 C1 C2 is a saddle point of f on C1 C2 , then

rx1 f (^
x1 ; x
^2 ) (x1 x
^1 ) 0 8x1 2 C1 (42.7)
rx2 f (^
x1 ; x
^2 ) (x2 x
^2 ) 0 8x2 2 C2 (42.8)

The converse is true if f is concave in x1 2 C1 and convex in x2 2 C2 .4

Proof It is enough to note that x ^2 ) : A1 Rm ! R on


^1 is a maximizer of the function f ( ; x
C1 , while x x1 ; ) : A2 R2 ! R on C2 . By Stampacchia's
^1 is a minimizer of the function f (^
Theorem, the result holds.
2
Here x1 = (x11 ; :::; x1m ) 2 Rm and x2 = (x21 ; :::; x2n ) 2 Rn denote generic vectors in A1 and A2 ,
respectively.
3
That is, given any x2 2 A2 the section f x2 : A1 ! R is continuously di erentiable, while given any
x1 2 A1 the section f x1 : A2 ! R is continuously di erentiable.
4
That is, given any x2 2 C2 the section f x2 : C1 ! R is concave, while given any x1 2 C1 the section
x1
f : C2 ! R is convex.
42.1. MINIMAX THEOREM 1231

When x
^1 is an interior point, condition (42.7) takes the simpler Fermat's form
rx1 f (^
x1 ; x
^2 ) = 0
and the same is true for condition (42.8) if x
^2 is an interior point. Remarkably, conditions
(42.7) and (42.8) become necessary and su cient when f is a saddle function on C1 C2 ,
i.e., when f is concave in x1 2 C1 and convex in x2 2 C2 . Saddle functions have therefore
for saddle points the remarkable status that concave and convex functions have in standard
optimization problems for maximizers and minimizers, respectively.
Example 1794 Consider the saddle function f : R2 ! R de ned by f (x1 ; x2 ) = x21 x22 .
Since
@f (x1 ; x2 ) @f (x1 ; x2 )
= = 0 () x1 = x2 = 0
@x1 @x2
from the last theorem it follows that the origin (0; 0) is the only saddle point of f on R2 (cf.
Example 1304). Graphically:

0
x3

-2

-4
2
1 2
0 1
0
-1
-1
x2 -2 -2
x1

The previous result establishes, inter alia, the existence of saddle points under di eren-
tiability and concavity assumptions on the function f . Next we give a fundamental existence
result, the Minimax Theorem, that relaxes these requirements on f , in particular it drops
any di erentiability assumption. It requires, however, the sets C1 and C2 to be compact (as
usual, there are no free meals).
Theorem 1795 (Minimax) Let f : A1 A2 Rn Rm ! R be a real-valued function and
C1 and C2 subsets of A1 and A2 , respectively. Suppose that:
(i) C1 and C2 are convex and compact subsets of A1 and A2 , respectively;
(ii) f ( ; x2 ) : A1 ! R is continuous and quasi-concave on C1 ;
(iii) f (x1 ; ) : A2 ! R is continuous and quasi-convex on C2 .
Then, f has a saddle point on C1 C2 , with
max min f (x1 ; x
^2 ) = f (^
x1 ; x
^2 ) = min max f (x1 ; x2 ) (42.9)
x1 2C1 x2 2C2 x2 2C2 x1 2C1
1232 CHAPTER 42. INTERDEPENDENT OPTIMIZATION

Proof The existence of the saddle point follows from Nash's Theorem, which will be proved
below. Since the sets C1 and C2 are compact and the function f is continuous in x1 and
in x2 , by Weierstrass' Theorem we can de ne the functions minx2 2C2 f ( ; x2 ) : C1 ! R and
maxx1 2C1 f (x1 ; ) : C2 ! R. So, (42.2) implies (42.9).

The Minimax Theorem was proved in 1928 by John von Neumann in his seminal paper
on game theory. Interestingly, the choice sets C1 and C2 are required to be convex, so they
have to be in nite (unless they are singletons, a trivial case).
A simple, yet useful, corollary of the Minimax Theorem is that continuous saddle func-
tions on a compact convex set C1 C2 have a saddle point on C1 C2 . If, in addition, they
are di erentiable, conditions (42.7) and (42.8) then characterize any such point.

42.2 Nash equilibria


Consider a group of n agents.5 Each agent i has a choice sets Ci and an objective function
fi . Because of the interdependence of agents' decisions, the domain of fi is the Cartesian
product C1 Cn , that is,

fi : C1 Cn ! R

For instance, the objective function f1 of agent 1 depends on the agent decision x1 , as well
on the decisions x2 , ...., xn of the other agents. In the oligopoly example below, x1 is the
production decision of agent 1, while x2 , ...., xn are the production decisions of the other
agents.
Decisions are simultaneous, described by a vector (x1 ; :::; xn ). The operator f = (f1 ; :::; fn ) :
C1 Cn ! Rn , with

f (x1 ; :::; xn ) = (f1 (x1 ; :::; xn ) ; :::; fn (x1 ; :::; xn )) 2 Rn

describes the value fi (x1 ; :::; xn ) that each agent attains at (x1 ; :::; xn ). The operator f is an
interdependent objective function called game.

Example 1796 Consider n rms that produce the same output, say potatoes, that they
sell in the same market. The market price of the output depends on the total output
that together all rms o er. Assume that the output has a strictly decreasing demand
function 1
Pn D : [0; 1) ! [0; 1) in the market. So, D (q) is the market price of the output if
q = i=1 qi is the sum of the individual quantities qi 0 of the output produced by each
n
rm i = 1; :::; n. The pro t function i : R+ ! R of rm i is
1
i (q1 ; :::; qn ) =D (q) qi ci (qi )

where ci : [0; 1) ! R is its cost function. Thus, the pro t of rm i depends via q on
the production decisions of all rms, not just on their own decisions qi . We thus have an
interdependent optimization problem, called Cournot oligopoly. Here the choice sets Ci are
the positive half-line [0; 1) and the game f is given by = ( 1 ; :::; n ) : Rn+ ! Rn . N
5
In game theory agents are often called players (or co-players or opponents).
42.2. NASH EQUILIBRIA 1233

To introduce the next equilibrium notion, to x ideas we rst consider the case n = 2
of two agents. Here f : C1 C2 ! R2 with f (x1 ; x2 ) = (f1 (x1 ; x2 ) ; f2 (x1 ; x2 )). Suppose a
decision pro le (^
x1 ; x
^2 ) 2 C1 C2 is such that

f1 (^
x1 ; x
^2 ) f1 (x1 ; x
^2 ) 8x1 2 C1 (42.10)
f2 (^
x1 ; x
^2 ) f2 (^
x1 ; x2 ) 8x2 2 C2

In this case, each agent is doing his best given what the other agent does. Agent i has no
incentive to deviate from x ^i { that is, to select a di erent decision { as long as he knows
that the other agent (his \opponent"), denoted i, is playing x ^ i .6 In this sense, decisions
(^
x1 ; x
^2 ) are mutually compatible.
All this motivates the following classic de nition proposed in 1950 by John Nash, which
is the most important equilibrium notion in economics. Here for each agent i we denote by
x i 2 C i = j6=i Cj the decision pro le of his opponents.

De nition 1797 Let f = (f1 ; :::; fn ) : A = A1 An ! Rn be an operator and


C = C1 Cn a subset of A. An element x
^ = (^
x1 ; :::; x
^n ) 2 C is a Nash equilibrium of
f on C if, for each i = 1; :::; n,

fi (^
x) fi (xi ; x
^ i) 8xi 2 Ci (42.11)

The vector x^ is thus a Nash equilibrium of the game f : C ! Rn . In the case n = 2, the
equilibrium conditions becomes (42.10). The interpretation is similar: each agent i has no
incentive to deviate from x ^i as long as he knows that his opponents are playing x
^ i . Note
that the de nition of Nash equilibrium does not require any structure on the choice sets Ci .
The scope of this de nition is, therefore, huge. Indeed, it has been widely applied in many
disciplines, within and outside the social sciences.

N.B. Nash equilibrium is de ned purely in terms of agents' individual decisions xi , unlike the
notion of Arrow-Debreu equilibrium (Section 22.9) that involves a variable, the price vector,
which is not under the control of agents. In this sense, the Arrow-Debreu equilibrium is a
spurious equilibrium notion from a methodological individualism standpoint, though most
useful in understanding markets' behavior.7 O

Nash equilibrium is based on the n interdependent parametric optimization problems,


one per agent,

max fi (xi ; x i ) sub xi 2 Ci


xi

where the opponents' decisions x i play the role of the parameter. The solution corre-
spondence i : C i Ci de ned by i (x i ) = arg maxxi fi (xi ; x i ) is called best reply
correspondence. We can reformulate the equilibrium condition (42.11) as

x
^i 2 i (^
x i) 8i = 1; :::; n (42.12)
6
How such mutual understanding among agents emerges is a non-trivial conceptual issue from which we
abstract away, leaving it to game theory courses.
7
Methodological principles are important but a pragmatic attitude should be kept not to transform them
in dogmas.
1234 CHAPTER 42. INTERDEPENDENT OPTIMIZATION

In words, in equilibrium all agents are best replying in that each x


^i solves the optimization
problem

max fi (xi ; x
^ i) sub xi 2 Ci (42.13)
xi

In turn, this easily leads to a di erential characterization of Nash equilibria via Stam-
pacchia's Theorem. To ease matters, we assume that each Ai is a subset of the same space
Rm , so that both A and C are subsets of (Rm )n .

Theorem 1798 Let f = (f1 ; :::; fn ) : A = A1 An (Rm )n ! Rn be an operator and


C = C1 Cn a subset of A. Suppose that, for each i = 1; :::; n,

(i) Ci is a closed and convex subset of the open and convex set Ai ;

(ii) fi is continuously di erentiable in xi .

If x
^ = (^
x1 ; :::; x
^n ) 2 C is a Nash equilibrium of f on C, then, for each i = 1; :::; n,

rxi fi (^
x) (xi x
^i ) 0 8xi 2 Ci (42.14)

The converse is true if each fi is concave in xi .

Proof It is enough to note that x


^i is a maximizer of the function fi ( ; x
^ i ) : Ai Rm ! R
on Ci . By Stampacchia's Theorem, the result holds.

When m = 1, so that each Ai is a subset of the real line, the condition takes the simpler
form:
@fi (^
x)
(xi x^i ) 0 8xi 2 Ci
@xi
Moreover, when x
^i is an interior point of Ci , the condition takes the Fermat's form

rxi fi (^
x) = 0 8xi 2 Ci (42.15)

Example 1799 In the Cournot oligopoly, assume that both the demand and cost functions
are linear, where D 1 (q) = a bq and ci (qi ) = cqi with a > c and b > 0. Then, the pro t
function of rm i is i (q1 ; :::; qn ) = (a bq) qi cqi , which is strictly concave in qi . The
choice set of rm i is the set Ci = [0; +1). By the last proposition, the rst-order condition
(42.14) is necessary and su cient for a Nash equilibrium (^ q1 ; :::; q^n ). This condition is, for
every i,
@ i (^
q1 ; :::; q^n )
(qi q^i ) = (a b^
q b^
qi c) (qi q^i ) 0 8qi 0
@qi
So, for every i we have a b^ q b^ qi = c if q^i > 0, and (a b^
q b^
qi ) c if q^i = 0.
We have q^i > 0 for every i. Indeed, assume by contradiction that q^i = 0 for some i. The
rst-order condition then implies a b^ q c, which in turn implies a c, thus contradicting
a > c. We conclude that q^i > 0 for every i. Then, the rst-order condition implies
a c b^
q
q^i = 8i = 1; :::; n
b
42.3. NASH EQUILIBRIA AND SADDLE POINTS 1235

By adding up, one gets


n a c
q^ =
1+n b
So, the unique Nash equilibrium is

1 a c
q^i = 8i = 1; :::; n
1+n b

As n increases, the (per rm) equilibrium quantity decreases. N

The best reply formulation (42.12) permits to establish the existence of Nash equilibria
via a xed point argument based on Kakutani's Theorem.

Theorem 1800 (Nash) Let f = (f1 ; :::; fn ) : A = A1 An (Rm )n ! Rn be an


operator and C = C1 Cn a subset of A. Suppose that, for each i = 1; :::; n, we have

(i) Ci is a convex and compact subset of Ai ;

(ii) fi is continuous and quasi-concave in xi 2 Ci .

Then, f has a Nash equilibrium on C.

Proof Given any x i , the function fi ( ; x i ) : Ai ! R is by hypothesis continuous on the


compact set Ci . By the Maximum Theorem, the best reply correspondence i : C i Ci
is compact-valued and upper hemicontinuous because fi ( ; x i ) : Ai ! R is continuous on
the compact set Ci . Moreover, it is convex-valued because fi ( ; x i ) : Ai ! R is, again by
hypothesis, quasi-concave on the convex set Ci (Proposition 1763). Consider the product
correspondence ' : C C de ned by ' (x1 ; :::; xn ) = 1 (x 1 ) n (x n ). The
correspondence ' is easily seen to be upper hemicontinuous and convex-valued (as readers
can check) on the compact and convex set C. By Kakutani's Theorem, there exists a xed
point (^
x1 ; :::; x
^n ) 2 C such that

(^
x1 ; :::; x
^n ) 2 ' (^
x1 ; :::; x
^n ) = 1 (^
x 1) n (^
x n)

So, x
^i 2 i (^
x i) for each i = 1; :::; n, as desired.

42.3 Nash equilibria and saddle points


Consider the two-agent case. The game f = (f1 ; f2 ) is strictly competitive if there is a strictly
decreasing function ' such that f2 = ' f1 .

Example 1801 When ' (x) = x, we have f2 = f1 . This strictly competitive game f is
called zero-sum. It is the polar case that may arise, for example, in military interactions.
This is the case originally studied by von Neumann and Morgenstern in their celebrated
(wartime) 1944 opus. N
1236 CHAPTER 42. INTERDEPENDENT OPTIMIZATION

We have (cf. Proposition 221):

(' f1 ) (^
x1 ; x
^2 ) (' f1 ) (^
x1 ; x2 ) () f1 (^
x1 ; x
^2 ) f1 (^
x1 ; x2 )

So, when f is strictly competitive the equilibrium conditions (42.10) reduce to

f1 (^
x1 ; x
^2 ) f1 (x1 ; x
^2 ) 8x1 2 C1
f1 (^
x1 ; x
^2 ) f1 (^
x1 ; x2 ) 8x2 2 C2

that is,
f1 (^
x1 ; x2 ) f1 (^
x1 ; x
^2 ) f1 (x1 ; x
^2 )
In this case, a pair (^
x1 ; x
^2 ) is a Nash equilibrium if and only if it is a saddle point of f
on C1 C2 . We have thus proved the following mathematically simple, yet conceptually
important, result.

Theorem 1802 Let f = (f1 ; f2 ) : A1 A2 ! R be a strictly competitive operator and


C1 and C2 subsets of A1 and A2 , respectively. Then, a pair (^
x1 ; x
^2 ) 2 C1 C2 is a Nash
equilibrium if and only if it is a saddle point.

Saddle points are thus Nash equilibria of strictly competitive games. In particular, the
Minimax Theorem is the special case of Nash's Theorem for strictly competitive games. This
further clari es the nature of saddle points as a way to model individual optimization prob-
lems that are \negatively" interdependent, so agents expect the worst from their opponent
and best reply by maxminimizing.

42.4 Nash equilibria on a simplex


As in the Minimax Theorem, in Nash's Theorem the choice sets Ci are required to be convex,
so they have to be in nite (unless they are singletons). This raised the question of how to
\convexify" the nite choice sets that economic applications often feature. Mixing through
randomization is, typically, the way to answer this important question. In Section 42.6.1
we will elaborate. In any case,Pm formally this means that the choice set Ci is the simplex
m m
m 1 = (x1 ; :::; xm ) 2 R+ : i=1 xi = 1 of R . In this case, the following di erential
characterization holds.

Proposition 1803 Let f = (f1 ; :::; fn ) : m 1 n


m 1 ! R . If (^ x1 ; :::; x
^n ) 2
n
m 1 m 1 is a Nash equilibrium of f , then there exists ^ 2 R+ such that for each
i = 1; :::; n we have
@fi @fi
(^
x) = ^ i if x
^ik > 0 ; (^
x) ^i if x
^ik = 0
@xik @xik
for all k = 1; :::; m. The converse holds if each fi is concave in xi .

Proof Here condition (42.14) takes the normal cone form8

rxi fi (^
x) 2 N m 1 (^
x) 8xi 2 Ci
8
Recall Section 40.2.2.
42.5. PARAMETRIC INTERDEPENDENT OPTIMIZATION 1237

So, the result follows from Proposition 1747 and from Stampacchia's Theorem.

The objective function fi of agent i is often assumed to be a ne in xi because of the ex-


pected utility hypothesis (Section 42.6.1). Interestingly, next we show that in this important
case by Bauer's Theorem equilibrium decisions are convex combinations of extreme points
of the simplex.

Proposition 1804 Let f = (f1 ; :::; fn ) : m 1 ! Rn , with each fi a ne in


m 1
xi . Then, (^
x1 ; :::; x
^n ) 2 m 1 m 1 is a Nash equilibrium of f if and only if

max fi (xi ; x
^ i) = max fi (xi ; x
^ i) (42.16)
xi 2 m 1 xi 2fe1 ;:::;em g

and
;=
6 arg max fi (xi ; x
^ i ) = co arg max fi (xi ; x
^ i) (42.17)
xi 2 m 1 xi 2fe1 ;:::;em g

Proof By Bauer's Theorem { via Corollary 1038 { we have

arg max fi (xi ; x


^ i ) = co arg max fi (xi ; x
^ i) = co arg max fi (xi ; x
^ i)
xi 2 m 1 xi 2ext m 1 xi 2fe1 ;:::;em g

because ext m 1 = e1 ; :::; em .

By (42.17), the set of Nash equilibria is a non-empty set that consists of the n-tuples
(^
x1 ; :::; x
^n ) 2 m 1 m 1 such that

x
^i 2 co arg max fi (xi ; x
^ i)
xi 2fe1 ;:::;em g

for each i = 1; :::; n. Thus, x


^i is either a versor that best replies to the opponent's decisions
x
^ i or a convex combination of such versors. In particular, we have

^ik > 0 =) ek 2 arg max fi (xi ; x


x ^ i) 8k = 1; :::; m (42.18)
xi 2 m 1

^ik correspond to best replying versors ek .


Thus, in equilibrium strictly positive weights x
Moreover, by (42.16) in terms of value attainment agent i can solve the optimum problem

max Vi (xi ; x
^ i) sub xi 2 e1 ; :::; em
xi

that only involves the versors. In the next section we will discuss the signi cance of all this
for games and decisions under randomization.

42.5 Parametric interdependent optimization


Interdependent optimization problems often take a parametric form

f = (f1 ; :::; fn ) : A = A1 An ! Rn (42.19)

where is a parameter space. For each agent i we have fi : A ! Rn .


1238 CHAPTER 42. INTERDEPENDENT OPTIMIZATION

De nition 1805 Let f = (f1 ; :::; fn ) : A ! Rn be a parametrized operator and C =


C1 9
Cn a subset of A. An element x ^ = (^
x1 ; :::; x
^n ) 2 C is a Nash equilibrium of f on
C for 2 if, for each i = 1; :::; n,

fi (^
x; ) fi (xi ; x
^ i; ) 8xi 2 Ci (42.20)

Denote by N E ( ) the collection of all Nash equilibria x


^ 2 C of f on C for 2 . The
Nash equilibrium correspondence : S C is de ned by

( ) = NE ( )

So, the correspondence associates to each parameter the corresponding set of Nash
equilibria. Its domain S is the collection of all parameters for which Nash equilibria
exist. If such equilibria are unique for all 2 S, then is a Nash equilibrium function.

Example 1806 In the last Cournot oligopoly example, with D 1 (q) = a bq and ci (qi ) =
cqi , it is natural to regard the coe cients as parameters. Thus, i (q1 ; :::; qn ; ) = (a bq) qi
cqi with = (a; b; c). In particular, the Nash equilibrium function is given by

1 a c 1 a c
( )= ; :::; 2 Rn
1+n b 1+n b
N

The parameter space may have a product form = 1 n , so = ( 1 ; :::; n ).


Intuitively, i is the individual parameter of agent i. It may be all that matters for agent i,
while the opponents' parameters i do not matter; formally, for each a = (a1 ; :::; an ) and
each i 2 i , it holds
0 00 0 00
fi a; i ; i = fi a; i ; i 8 i; i 2 i

Without loss, one can then directly write fi : A i ! Rn .

Example 1807 In the Cournot oligopoly example, assume that D 1 (q) = 10 q and
ci (qi ) = ci qi . In this case, i (q1 ; :::; qn ; i ) = (10 q) qi ci qi with i = ci . Thus, =
(c1 ; :::; cn ). N

Often we are in an intermediate case where the parameter space has the form =
~ 1 n , where ~ is a common parameter space across all agents and, instead, each
i is an individual parameter space that is relevant only for player i. So, = (~; 1 ; :::; n )
and we can write fi : A ~ n
i !R .

Example 1808 In the Cournot oligopoly example, now assume that D 1 (q) = a bq and
ci (qi ) = ci qi . Then i (q1 ; :::; qn ; ~; i ) = (a bq) qi ci qi with ~ = (a; b) and i = ci . So,
= (a; b; c1 ; :::; cn ). N
9
For simplicity, we abstract from any dependence of the choice sets on parameters (otherwise we should
consider a feasibility correspondence ' : C).
42.6. APPLICATIONS 1239

We close with an important continuity property of the Nash equilibrium correspondence.


As we did before, to ease matters we assume that each Ai is a subset of the same space Rm ;
we also assume that is a subset of Rk .

Proposition 1809 If f : A (Rm )n Rk ! Rn is continuous on the closed set C ,


then the Nash equilibrium correspondence :S C has closed graph.

By Proposition 958, is then upper hemicontinuous (so, continuous if a function) if its


domain S is closed.

Proof Let xn ; n Gr be a sequence that converges to some x; 2 (Rm )n Rk .


Since the set C is closed, we have x; 2 C . We want to show that x; 2 Gr .
Suppose, by contradiction, that x; 2 = Gr . So, there is an agent i and a deviation x
~i 2 Ci
such that fi x ~i ; x i ; > fi x; . Since xn ; n 2 Gr for all n, we have fi xn ; n
fi x
~i ; (xn ) i ; n for all n. Since f is continuous on C , we then reach the contradiction

fi x
~i ; x i ; > fi x; = lim fi xn ; n
n!1
lim fi x
~i ; (xn ) i; n = fi x
~i ; x i ;
n!1

We conclude that x; 2 Gr , as desired.

42.6 Applications
42.6.1 Randomization in games and decisions
Suppose that an agent has a set S = fs1 ; s2 ; :::; sm g of m pure actions (or strategies),
evaluated with a utility function u : S ! R. Since the set S is nite, it is not convex (unless
it is a singleton), so we cannot use the powerful results { such as Nash's Theorem { that
throughout the book we saw to hold for concave (or convex) functions de ned on convex
sets. A standard way to embed S in a convex set is via randomization, as readers will learn
in game theory courses. Here we just outline the argument to illustrate the results of the
chapter.
Speci cally, by randomizing via some random device { coin tossing, roulette wheels,
and the like { agents can select a mixed (or randomized ) action in which (sk ) is the
probability that the random device assigns to the pure action sk . Denote by (S) the set of
all randomized actions. According to the expected utility criterion, an agent evaluates the
randomized action via the function U : (S) ! R de ned by
m
X
U( )= u (sk ) (sk )
k=1

In words, the randomized action is evaluated by taking the average of the utilities of
the pure actions weighted by their probabilities under .10 Note that each pure action sk
corresponds to the \degenerated" randomized action that assigns it probability 1, i.e.,
10
Weighted averages are discussed in Section 15.10.
1240 CHAPTER 42. INTERDEPENDENT OPTIMIZATION

(sk ) = 1. Via this identi cation, we can regard S as a subset of (S) and thus write, with
an abuse of notation, S (S).
Under randomization, agents aim to select the best randomized action by solving the
optimization problem
max U ( ) sub 2 (S) (42.21)

where (S) is the choice set and U is the objective function.

We can extract the mathematical essence of this optimization problem by identifying a


randomized action with an element x of the simplex m 1 via the relation

(sk ) ! xk

In particular, a degenerate , with (sk ) = 1, is identi ed with the versor ek . That is, pure
actions can be identi ed with the versors of the simplex, i.e., with its extreme points. For
instance, if is such that (s2 ) = 1, then it corresponds to the versor e2 .
Summing up, we have the following identi cations and inclusions:

S ! ext m 1

(S) ! m 1

In this way, we have \convexi ed" S by identifying it with a subset of the simplex, which is
a convex set in Rm . In this sense, we have convexi ed S.

Example 1810 Let S = fs1 ; s2 ; s3 g. Then

(s1 ) ! x1 ; (s2 ) ! x2 ; (s3 ) ! x3

Here we have:

S = fs1 ; s2 ; s3 g ! ext 2 = e1 ; e2 ; e3

(s1 ; s2 ; s3 ) ! 2 = (x1 ; x2 ; x3 ) 2 R3+ : x1 + x2 + x3 = 1

For instance, if 2 (S) is such that (s1 ) = (s2 ) = 1=4, and (s3 ) = 1=2, then it
corresponds to x = (1=4; 1=4; 1=2). N

By setting uk = u (sk ) for each k, the expected utility function U can be identi ed with
the a ne function V : m 1 ! R de ned by
m
X
V (x) = uk xk = u x
k=1

where u = (u1 ; u2 ; :::; um ) 2 Rm . The optimization problem (42.21) of the agent becomes

max V (x) sub x 2 m 1 (42.22)


x
42.6. APPLICATIONS 1241

It is a nice concave optimization problem in which the objective function V is a ne and the
choice set m 1 is a convex and compact set of Rm . In particular, by Proposition 1804 we
have
max V (x) = max V (x) (42.23)
x2 m 1 x2fe1 ;:::;em g

and
;=
6 arg max V (x) = co arg max V (x) (42.24)
x2 m 1 x2fe1 ;:::;em g

By (42.23), agents' optimal mixed actions are convex combinations of pure actions that,
in turn, are optimal. So, the optimal x
^ is such that

^k > 0 =) ek 2 arg max V (x)


x 8k = 1; :::; m
x2 m 1

That is, the pure actions that are assigned a strictly positive weight by an optimal mixed
action are, in turn, optimal. By (42.24), in terms of value attainment problem (42.22) is
equivalent to the much simpler problem

max V (x) sub x 2 e1 ; :::; em


x

that only involves pure actions.


Similar identi cations can be done in a game with n agents. To keep notation simple,
we consider two agents that have a set Si = fsi1 ; :::; sim g of m pure actions, evaluated
with a utility function ui : S1 S2 ! R. By randomizing, they can consider mixed actions
i 2 (Si ). Because of interdependence, agent i evaluates a pro le f 1 ; 2 g of mixed actions,
one per agent, via an expected utility function Ui : (S1 ) (S2 ) ! R de ned by
m
X
Ui ( 1; 2) = (s1k ) (s2k0 ) ui (s1k ; s2k0 )
k;k0 =1

Under randomization, agents choose a mixed actions. In particular, a pair (^ 1 ; ^ 2 ) 2 (S1 )


(S2 ) is a Nash equilibrium if

Ui (^ i ; ^ i ) Ui ( i ; ^ i ) 8 i 2 (Si )

for each i = 1; 2.
The mixed actions (Si ) can be identi ed with the simplex m 1 , with its extreme points
ei representing the pure actions si . De ne ui : f1; :::; mg f1; :::; mg ! R by ui (k 0 ; k 00 ) =
ui (s1k0 ; s2k00 ). We can then identify Ui with the function Vi : m 1 m 1 ! R de ned by
X
Vi (x1 ; x2 ) = x1k0 x2k00 ui k 0 ; k 00 = x1 Ui x2
(k0 ;k00 )2f1;:::;mg f1;:::;mg

where Ui is the square matrix of order m that has the values ui (k 0 ; k 00 ) as entries.
The function Vi is a ne in xi . A pair (^
x1 ; x
^2 ) 2 m 1 m 1 is a Nash equilibrium if

Vi (^
xi ; x
^ i) Vi (xi ; x
^ i) 8xi 2 m 1
1242 CHAPTER 42. INTERDEPENDENT OPTIMIZATION

for each i = 1; 2. By Proposition 1804,

max Vi (xi ; x
^ i) = max Vi (xi ; x
^ i) (42.25)
xi 2 m 1 xi 2fe1 ;:::;em g

and
;=
6 arg max Vi (xi ; x
^ i ) = co arg max Vi (xi ; x
^ i) (42.26)
xi 2 m 1 xi 2fe1 ;:::;em g

By (42.26), equilibrium mixed actions are convex combinations of pure actions that, in turn,
best reply to the opponent's mixed action. So, the equilibrium x
^i is such that (42.18) holds,
i.e.,
x^ik > 0 =) ek 2 arg max Vi (xi ; x ^ i)
xi 2 m 1

for each i = 1; 2. That is, the pure actions ek that are assigned a strictly positive weight x
^ik
by an equilibrium mixed action x ^i of an agent are, in turn, best replies to the opponent's
equilibrium mixed action x ^ i . Moreover, by (42.25) in terms of value attainment agent i can
solve the optimum problem

max Vi (xi ; x
^ i) sub xi 2 e1 ; :::; em
xi

that only involves pure actions.

42.6.2 Kuhn-Tucker's saddles


Saddle points provide an interesting angle on Lagrange multipliers. For simplicity, consider
an optimization problem with inequality constraints

max f (x) (42.27)


x
sub g1 (x) b1 ; g2 (x) b2 ; :::; gm (x) bm

where f : A Rn ! R is the objective function, while the functions gi : A Rn ! R and


the scalars bi 2 R induce m inequality constraints.11
For this problem the Lagrangian function L : A Rm + ! R is de ned by

L (x; ) = f (x) + (b g (x)) 8 (x; ) 2 A Rm


+

x; ^ ) 2 A
A pair (^ Rm
+ is a saddle point of L on A Rm
+ if

L (^
x; ) x; ^ )
L(^ L(x; ^ ) 8x 2 A; 8 0

Lemma 1811 A pair (^x; ^ ) 2 A Rm


+ is a saddle point of the Lagrangian function L :
m
A R+ ! R if and only if

(i) f (^
x) f (x) + ^ (b g (x)) for every x 2 A;

(ii) g (^
x) b and ^ i (bi gi (^
x)) = 0 for all i = 1; :::; m.
11
Later we will invoke Slater's condition: till then, this setup actually includes also equality constraints (cf.
the discussion at the end of Section 39.1). For this reason we use the letters g and (rather than h and ).
42.6. APPLICATIONS 1243

x; ^ ) 2 A Rm
Proof \Only if". Let (^ + be a saddle point of the Lagrangian function L :
m
A R+ ! R. Since L (^ x; ) L(^ x; ^ ) for all 0, it follows that

( ^ ) (b g (^
x)) 0 8 0 (42.28)

Putting = ^ + ei , then (42.28) implies bi gi (^


x) 0. Since this holds for every i = 1; :::; m,
we have g (^x) b. Moreover, by taking = 0 from (42.28) it follows ^ (b g (^ x)) 0, while
by taking = 2 ^ from (42.28) it follows ^ (b g (^ x)) 0. So, ^ (b g (^ x)) = 0. Then,
x; ^ ) = f (^
L(^ x) and

f (^ x; ^ )
x) = L(^ L(x; ^ ) = f (x) + ^ (b g (x)) 8x 2 A (42.29)

Since the positivity of ^ implies that, provided g (^ x) b, condition ^ (b g (^ x)) = 0 is


^
equivalent to i (bi gi (^ x)) = 0 for all i = 1; :::; m, we conclude that (i) and (ii) hold.
\If". Assume that conditions (i) and (ii) hold. By taking x = x ^, from (i) it follows that
f (^
x) f (^ x) + ^ (b g (^ x)). By (ii) b g (^ x) 0, so f (^ x) + ^ (b g (^ x)) f (^ x) since
^ 0. We conclude that f (^ x) + ^ (b g (^ x)) = f (^ x), so that

^ (b g (^
x)) = 0 (42.30)

Thus, for every 0 we have:

x; ^ )
L(^ x; ) = ( ^
L (^ ) (b g (^
x)) = (b g (^
x)) 0

x; ^ )
which implies L(^ L (^
x; ) for all 0. On the other hand, (i) and (42.30) imply

x; ^ ) = f (^
L(^ x) f (x) + ^ (b g (x)) = L(x; ^ ) 8x 2 A

x; ^ )
so that L(^ L(x; ^ ) for all x 2 A. We conclude that (^
x; ^ ) is a saddle point of L on
A R+ .m

The next result is a rst dividend of this lemma.

Proposition 1812 A vector x ^ 2 A solves problem (42.27) if there exists ^ 0 such that
x; ^ ) is a saddle point of the Lagrangian function L on A Rm
(^ +.

So, the existence of a saddle point for the Lagrangian function implies the existence of
a solution for the underlying optimization problem with inequality constraints. No assump-
tions are made on the functions f and gi . If we make some standard assumptions on them,
the converse becomes true, thus establishing the following remarkable \saddle" version of
Kuhn-Tucker's Theorem.

Theorem 1813 Let f : A Rn ! R and gi : A Rn ! R be continuously di erentiable on


an open and convex set A, with f concave and each gi convex. Assume Slater's condition,
i.e., there exists x 2 A such gi (x) < bi for all i = 1; :::; m. Then, the following conditions
are equivalent:

(i) x
^ 2 A solves problem (42.27);
1244 CHAPTER 42. INTERDEPENDENT OPTIMIZATION

(ii) there exists a vector ^ x; ^ ) is a saddle point of the Lagrangian function


0 such that (^
L on A Rm +;

(iii) there exists a vector ^ 0 such that the Kuhn-Tucker conditions hold

x; ^ ) = 0
rx L(^ (42.31)
^ i r L(^x; ^ ) = 0 8i = 1; :::; m (42.32)
i

r L(^x; ^ ) 0 (42.33)

Proof (ii) implies (i) by the last proposition. (i) implies (iii) by what we learned in Section
40.3. (iii) implies (ii) by Theorem 1793. Indeed the Kuhn-Tucker conditions are nothing but
conditions (42.7) and (42.8) for the Lagrangian function (cf. Example 1746). First, note that
condition (42.7) takes the form rx L(^ x; ^ ) = 0 because the set A is open. As to condition
(42.8), here it becomes
r L(^x; ^ ) ( ^) 0 8 0 (42.34)
This condition is equivalent to (42.32) and (42.33). From (42.32) it follows r L(^ x; ^ )
^ = 0, while from (42.33) it follows that r L(^ x; ^ ) 0 for all 0. So, (42.34) holds.
Conversely, by taking = 0 in (42.34), we have r L(^ ^
x; ) ^ 0 and by taking = 2 ^ we
have r L(^ x; ^ ) ^ x; ^ ) ^ = 0. Finally, by taking = ^ + ei in (42.34), we
0, so r L(^
easily get r L(^ x; ^ ) 0. Since r L(^ x; ^ ) = b g (x), from b g (x) and the positivity of
^ it follows that r L(^ x; ) = 0 is equivalent to ^ i r L(^
^ ^ x; ^ ) = 0 for all i = 1; :::; m. In
i
sum, the Kuhn-Tucker conditions are the form that conditions (42.7) and (42.8) take here.
Since the Lagrangian function is easily seen to be a saddle function when f concave and each
gi convex, this prove that properties (ii) and (iii) are equivalent, thus completing the proof.

By Proposition 1790, (^ x; ^ ) is a saddle point of the Lagrangian function L on A Rm


+ if
and only if there exists a vector ^ 0 such that:

(i) x
^ solves the primal problem

max inf L (x; ) sub x 2 A


x 0

(ii) ^ solves the dual problem

min sup L (x; ) sub 0 (42.35)


x2A

(iii) the two values are equal, i.e.,

x; ^ ) = min sup L (x; )


max inf L (x; ) = L(^
x2A 0 0 x2A

The primal problem is actually equivalent to the original problem (42.27). Indeed, let us
write problem (42.27) in canonical form as

max f (x) sub x 2 C


x
42.6. APPLICATIONS 1245

where the choice set is C = fx 2 A : g (x) bg. Since


inf L (x; ) = f (x) + inf (b g (x))
0 0

we have (
1 if x 2
=C
inf L (x; ) =
0 f (x) if x 2 C
because inf 0 (b g (x)) = 1 if x 2
= C and inf 0 (b g (x)) = 0 if x 2 C.
We conclude that
max inf L (x; ) = max f (x)
x2A 0 x2C
and
arg max inf L (x; ) = arg max f (x)
x2A 0 x2C
so the primal and the original problem are equivalent in terms of both solutions and value
attainment. We thus have the following corollary of the last theorem, which relates the
original and dual problems.
Corollary 1814 Let f : A Rn ! R and gi : A Rn ! R be continuously di erentiable
on an open and convex set A, with f concave and each gi convex. If x^ 2 A solves problem
(42.27) and Slater's condition holds, then there exists ^ 0 that solves the dual problem
(42.35), with maxx2C f (x) = min 0 supx2A L (x; ).
Summing up, in concave optimization problems with inequality constraints the solution
^ and the multiplier ^ solve dual optimization problems that are mutually consistent. In
x
particular, multipliers admit a dual optimization interpretation in which they can be viewed
as (optimally) chosen by some ctitious, yet malevolent, opponent (say, nature). An individ-
ual optimization problem is thus solved by embedding it in a ctitious game against nature,
a surprising paranoid twist on multipliers.
Under such game-theoretic interpretation, the Kuhn-Tucker conditions characterize a
saddle point of the Lagrangian function in that they are the form that conditions (42.7) and
(42.8) take for the Lagrangian function. We can write them explicitly as:
x; ^ )
@L(^ x; ^ )
@f (^
= i (bi gi (^
x)) = 0 8i = 1; :::; n
@xi @xi
x; ^ ) ^
@L(^
i =0 8i = 1; :::; m
@ i
x; ^ )
@L(^
= bi gi (x) 0 8i = 1; :::; m
@ i
This is our last angle on Kuhn-Tucker's Theorem, the deepest one.

42.6.3 Linear programming: duality


An elegant application of the game theoretic angle on Kuhn-Tucker's Theorem is a duality
result for linear programming (Section 22.6). Given a m n matrix A = (aij ) and vectors
b 2 Rm and c 2 Rn , consider the linear programming problem
max c x sub x 2 P = x 2 Rn+ : Ax b (42.36)
x
1246 CHAPTER 42. INTERDEPENDENT OPTIMIZATION

as well as the minimization problem


min b sub 2 = 2 Rm
+ :A
T
c (42.37)

The last corollary implies the following classic duality result.

Theorem 1815 (Duality Theorem of Linear Programming) Suppose Slater's condi-


tion holds for both problems (42.36) and (42.37). Then, there exists x ^ 0 that solves
problem (42.36) if and only if there exists ^ 0 that solves problem (42.37). In this case,
their optimal values are equal:
max c x = min b
x 0 0

As the proof clari es, the two problems (42.36) and (42.37) are one the dual of the other,
either providing the multipliers to the other. In particular, solutions exists if either of the
two polyhedra P and is bounded (Corollary 1038).

Proof The Lagrangian function L : Rn+ Rm


+ ! R of problem (42.36) is

L (x; ) = c x + (b Ax)
Its dual problem is
min sup L (x; ) sub 0 (42.38)
x 0
We have
sup L (x; ) = sup c x + (b Ax) = b + sup c x Ax
x 0 x 0 x 0
n m
!
X X
= b + sup cj aij i xj = b + sup c AT x
x 0 j=1 i=1 x 0

Consider the polyhedron = 0 : AT c in Rm . Then


(
+1 if 2
=
sup L (x; ) =
x2Rn b if 2
because supx2Rn c AT x = 0 if 2 = and supx2Rn c AT x = +1 if 2 . We
conclude that the dual problem (42.38) reduces to problem (42.37), which can be written in
linear programming form as
max b sub 2 = 0: AT c (42.39)

~ : Rm
In turn, the Lagrangian function L Rn+ ! R of this problem is
+
n m
!
X X
~ ( ; x) =
L b+x c + AT = b+ cj + aij i xj
j=1 i=1
= c x (b Ax) = L (x; )
x; ^ ) is a saddle point of L if and only if ( ^ ; x
So, (^ ~ We conclude
^) is a saddle point of L.
that the linear programs (42.36) and (42.39) are one dual to the other, each providing the
multipliers to the other. By Corollary 1814 the result then follows.
42.6. APPLICATIONS 1247

Example 1816 Let 2 3


1 2 2 1
A=4 0 2 1 2 5
0 1 1 3
and b = (1; 3; 2) and c = ( 1; 2; 4; 2). Consider the linear programming problem

max x1 + 2 (x2 x4 ) + 4x3


x1 ;x2 ;x3 ;x4
sub x1 2x2 + 2x3 + x4 1; 2 (x2 + x4 ) x3 3; x2 x3 + 3x4 2
x1 0; x2 0, x3 0, x4 0

Since 2 3
1 0 0
6 2 2 1 7
AT = 6
4 2
7
1 1 5
1 2 3
the dual problem is

min 1 +3 2 +2 3
1; 2; 3

sub 1 1; 2 ( 2 1) + 3 2; 2 1 2 3 4, 1 +2 2 +3 3 2
1 0; 2 0, 3 0

In view of the Duality Theorem of Linear Programming, if the two problems satisfy Slater's
condition (do they?) then either problem has a solution if the other does, with

max x1 + 2 (x2 x4 ) + 4x3 = min 1 +3 2 +2 3


x 0 0
1248 CHAPTER 42. INTERDEPENDENT OPTIMIZATION
Chapter 43

Variational inequality problems

In this nal chapter of this part we study variational inequality problems, a topic started
in the early 1960s with the seminal works of Gaetano Fichera and Guido Stampacchia that
elegantly uni es the analysis of concave optimization problems and of operator equations.

43.1 De nition
We begin with a key notion.1

De nition 1817 Let f : C ! R be an operator and y 2 C. A point x


^ 2 C is an equalizer
of f and y if
f (^
x) y 2 NC (^x) (43.1)
that is,
(f (^
x) y) (x x
^) 0 8x 2 C (43.2)

An equalizer thus solves the following (Stampacchia-type) variational inequality problem:

Find x
^ 2 C such that (43.1) holds

We write formally this problem as:

var fy (x) sub x 2 C (43.3)


x

where fy = f y.

Example 1818 When C is a convex cone, by Proposition 1745 a point x


^ 2 C is an equalizer
if and only if
(f (^
x) y) x ^ = 0 and 8x 2 C; (f (^ x) y) x 0
^ 2 Rn+ is an equalizer if and only if
In particular, by (40.8) a point x

fi (^
x) x
^i = 0 8i = 1; :::; n and f (^
x) y

N
1 n
Throughout this section, C denotes a closed and convex set of R . When compact, it is denoted by K.
The term \equalizer" is not standard.

1249
1250 CHAPTER 43. VARIATIONAL INEQUALITY PROBLEMS

Variational inequality problems encompass two fundamental problems:

(i) When C = Rn , a point x


^ 2 C is an equalizer of f and y if and only if it solves the
operator equation
f (x) = y
de ned by the operator f and with known term y (see Lemma 1819).

(ii) When f is the gradient operator of a concave function F : C ! R, i.e., f = rF , by


Stampacchia's Theorem a point x ^ 2 C is an equalizer of f and y = 0 if and only if it
solves the concave optimization problem

max F (x) sub x 2 C


x

with objective function F and choice set C.

Remarkably, two apparently di erent classes of problems are uni ed by variational in-
equality problems. Results for equalizers thus deliver, as corollaries, results for solutions of
operator equations and of concave optimization problems.

43.2 Properties
We denote by
arg var fy
C

the collection of equalizers, i.e., the solution set of the variational inequality problem (43.3).
We begin with an interesting property of interior solutions.

Lemma 1819 If x
^ 2 arg varC fy is an interior point of C, then f (^
x) = y.

Thus, interior solutions of the variational inequality problem (43.3) are solutions of the
equation f (x) = y. Hence,
arg var fy \ @C
C

consists of the solutions of this variational inequality problem that do not solve equation
f (x) = y.

Proof As x ^ 2 int C, there exists " > 0 such that B" (^ x) C. There exists " > 0 small
enough so that (1 ") x
^ 2 B" (^
x). By taking x = (1 ") x
^ in (43.2) we get (f (^
x) y) x ^=
0. Hence, (43.2) becomes (f (^x) y) x = 0 for all x 2 C. There exists > 0 small enough
^ + ei 2 B" (^
so that x x) for each i = 1; :::; n. By taking x = x ^ + ei for each i = 1; :::; n we
then have f (^
x) y = 0, as desired.

Next we establish an inner monotonicity property.

Lemma 1820 If x
^1 ; x
^2 2 arg varC fy , then

(f (^
x1 ) f (^
x2 )) (^
x1 x
^2 ) 0 (43.4)
43.2. PROPERTIES 1251

Proof By (43.2), we have


(f (^
x1 ) y) (^
x2 x
^1 ) 0 and (f (^
x2 ) y) (^
x1 x
^2 ) 0
By adding up, we get (43.4).

The operator f is thus inner increasing on this solution set. An immediate consequence
is that arg varC fy is at most a singleton { i.e., solutions are unique if they exist { when f is
strictly inner decreasing. Next we state a deeper result.

Proposition 1821 If f : C ! R is continuous and inner decreasing, then arg varC fy is


closed and convex (if non-empty).

This result has, as special cases, the convexity of the solution sets in concave optimization
problems and in operator equations de ned via inner decreasing functions (cf. Proposition
1461). The proof relies on an ingenious lemma proved in Minty (1962).

Lemma 1822 (Minty) Let f : C ! R be continuous and inner decreasing. A point x


^2C
belongs to arg varC fy if and only if
(f (x) y) (x x
^) 0 8x 2 C (43.5)

Proof \Only if". Let x


^ 2 arg varC fy . It holds:
(f (x) y) (x x
^) = (f (x) f (^
x) + f (^
x) y) (x x
^)
= (f (x) f (^
x)) (x x
^) + (f (^
x) y) (x x
^)
| {z } | {z }
a b

We have a 0 because f is inner decreasing and b 0 because x ^ 2 arg varC fy . We conclude


that (43.5) holds.
\If". Suppose that (43.5) holds for some x^ 2 C. We want to show that x ^ 2 arg varC fy .
Let x 2 C. For each t 2 (0; 1), set xt = (1 t) x
^ + tx. Clearly, xt 2 C for each t 2 (0; 1). By
(43.5),
(f (xt ) y) (xt x ^) 0
As xt x
^ = t (x x
^), for each t 2 (0; 1) we then have
(f (xt ) y) (x x
^) 0
By the continuity of f , as t goes to 0 we then have (f (^x) y) (x x
^) 0. Since x was
arbitrarily chosen in C, we conclude that x
^ 2 arg varC fy .

Proof of Proposition 1821 Suppose that arg varC fy 6= ;. For each x 2 C, set
Ex = f~
x 2 C : (f (x) y) (x x
^) 0g
This set is closed and convex. By Minty's Lemma,
\
arg var fy = Ex
C
x2C

This implies that arg varC fy is both closed and convex.

To sharpen the last proposition, we need to introduce a new notion.


1252 CHAPTER 43. VARIATIONAL INEQUALITY PROBLEMS

De nition 1823 An operator f : C ! Rn is said to be inner coercive if there exists x0 2 C


such that, for all sequences fxn g C,

f (xn ) (xn x0 )
kxn k ! +1 =) ! 1 (43.6)
kxn k

An operator is trivially inner coercive when its domain C is bounded. So, this notion
has a bite on unbounded domains, otherwise it automatically holds.

Proposition 1824 An inner coercive operator f : C ! Rn is proper.

Proof Let fxn g C be such that kxn k ! +1. We want to show that kf (xn )k ! +1. As
f is inner coercive,
jf (xn ) (xn x0 )j
! +1
kxn k
Let " > 0. As kxn k ! +1, there is n large enough so that kx0 k = kxn k < ". By the
Cauchy-Schwarz inequality, for n large enough we then have

jf (xn ) (xn x0 )j kf (xn )k kxn x0 k kf (xn )k (kxn k + kx0 k)


kf (xn )k (1 + ")
kxn k kxn k kxn k

Thus, kf (xn )k ! +1, as desired.

Being proper operators, the preimages of inner coercive operators are bounded sets (cf.
Proposition 1637). It is then not surprising that inner coercivity implies the boundedness of
solution sets.

Proposition 1825 If the operator f : C ! Rn is inner coercive, then arg varC fy is a


bounded set.

Proof Suppose, per contra, that there exists a sequence f^


xn g arg varC fy such that k^
xn k !
+1. As f is inner coercive, we then have

f (^
xn ) (^
xn x0 )
! 1 (43.7)
k^
xn k

But, for each n it holds f (^


xn ) (^
xn x0 ) 0, which contradicts (43.7). We conclude that
the set arg varC fy is bounded.

The following immediate consequence of Propositions 1821 and 1825 completes our anal-
ysis of solution sets.

Corollary 1826 If f : C ! R is continuous, inner decreasing and inner coercive, then


arg varC fy is compact and convex (if non-empty).
43.3. EXISTENCE 1253

43.3 Existence
We now turn to the all-important problem of the existence of equalizers. This problem is
addressed by the following classic existence result proved in the mid-1960s by Felix Browder
and by Philip Hartman and Guido Stampacchia.2

Theorem 1827 (Browder-Hartman-Stampacchia) If f : C ! Rn is continuous and


inner coercive, then for each y 2 Rn it holds

arg var fy 6= ;
C

If, in addition, f is inner decreasing, then arg varC fy is closed and convex.

The proof relies on two interesting lemmas. The rst one considers the important special
case when C is compact (here continuity is enough as inner coercivity is automatically
satis ed, as previously remarked). It is the variational inequality counterpart of Weierstrass'
Theorem, while the theorem is that of Tonelli's Theorem.

Lemma 1828 Let f : K ! Rn be a continuous operator de ned on a compact and convex


set of Rn . For each y 2 Rn there exists x
^ 2 K such that

(f (^
x) y) (x x
^) 0 8x 2 K (43.8)

Proof Suppose, rst, that y = 0. De ne g : K ! Rn by g (x) = f (x) + x. As the projection


PK : Rn ! K is continuous, the composition PK g : K ! K is also continuous. By the
Brouwer Fixed Point Theorem, there exists x
^ 2 K such that

PK (f (^
x) + x
^) = (PK g) (^
x) = x
^

By the Projection Theorem,

(f (^
x) + x
^ PK (f (^
x) x
^)) (PK (f (^
x) + x
^) x) 0 8x 2 K

that is,
(f (^
x) + x
^ x
^) (^
x x) 0 8x 2 K
Thus, f (^
x) (x x ^) 0 for all x 2 K, as desired.
Finally, when y 6= 0 it is enough to consider the continuous function fy : K ! Rn de ned
by fy (x) = f (x) y. By what has been just proved, there exists x ^ 2 K such that

(f (^
x) y) (x x
^) = fy (^
x) (x x
^) 0

for all x 2 K, as desired.

The next lemma is the variational inequality counterpart of Fenchel's Theorem.

Lemma 1829 Given f : C ! Rn and y 2 Rn , let x


^ 2 C. If there exists " > 0 such that

(f (^
x) y) (x x
^) 0 8x 2 B" (^
x) \ C (43.9)

then x
^ 2 arg varC fy .
2
We refer to Kinderlehrer and Stampacchia (1980) for references. In the proof we follow them.
1254 CHAPTER 43. VARIATIONAL INEQUALITY PROBLEMS

Proof Fix x 2 C. For each n 1 set


x x
^
x
~n = x
^+
n
Clearly, x
~n 2 C for each n. It holds,
1
k~
xn x
^k = kx x
^k
n
Hence, for n large enough we have x
~n 2 B" (^
x) \ C. As x x
^ = n (~
xn x
^), by (43.9) we
then have, for n large enough,

(f (^
x) y) (x x
^) = n (f (^
x) y) (~
xn x
^) 0

As x was arbitrarily chosen in C, we conclude that

(f (^
x) y) (x x
^) 0 8x 2 C

that is, x
^ 2 arg varC fy .

Proof of the Browder-Hartman-Stampacchia Theorem Suppose, first, that $y = 0$. Let us assume, to begin with, that $0 \in C$ and that $x_0 = 0$. As $f$ is inner coercive, we have
$$\|x_n\| \to +\infty \implies \frac{f(x_n)\cdot x_n}{\|x_n\|} \to -\infty \tag{43.10}$$
Take $c < 0$. By (43.10), there exists $k_c > 0$ large enough so that
$$\forall x \in C, \quad \|x\| \ge k_c \implies f(x)\cdot x \le c\|x\| < 0 \tag{43.11}$$
The set $K = C \cap \{x \in \mathbb{R}^n : \|x\| \le k_c\}$ is convex and compact. Hence, by Lemma 1828 there exists $\hat{x} \in K$ such that
$$f(\hat{x})\cdot(x - \hat{x}) \le 0 \qquad \forall x \in K \tag{43.12}$$
As $0 \in K$, this implies $f(\hat{x})\cdot\hat{x} \ge 0$. By (43.11), we then have
$$\|\hat{x}\| < k_c \tag{43.13}$$
Thus, there is a small enough $B_\varepsilon(\hat{x})$ such that $B_\varepsilon(\hat{x}) \cap C \subseteq K$. Hence, by (43.12) we have
$$f(\hat{x})\cdot(x - \hat{x}) \le 0 \qquad \forall x \in B_\varepsilon(\hat{x}) \cap C$$
In view of Lemma 1829, we conclude that
$$f(\hat{x})\cdot(x - \hat{x}) \le 0 \qquad \forall x \in C \tag{43.14}$$
Now, let $C$ be any convex and closed set, not necessarily with $0 \in C$. Let $C_0 = C - x_0$. Clearly, $0 \in C_0$. Define $f_0 : C_0 \to \mathbb{R}^n$ by
$$f_0(x) = f(x + x_0)$$
Hence, $f_0(x - x_0) = f(x)$. We have $z = x - x_0 \in C_0$ if and only if $x \in C$. Thus,
$$\frac{f(x)\cdot(x - x_0)}{\|x\|} = \frac{f_0(z)\cdot z}{\|z + x_0\|}$$
Let $\|x_n\| \to +\infty$. As $f$ is inner coercive,
$$\frac{f(x_n)\cdot(x_n - x_0)}{\|x_n\|} \to -\infty$$
Fix $\varepsilon \in (0,1)$. There is $n$ large enough so that $\|z_n\|/\|z_n + x_0\| \ge 1 - \varepsilon$. Hence,
$$(1-\varepsilon)\,\frac{f_0(z_n)\cdot z_n}{\|z_n\|} \ge \frac{f_0(z_n)\cdot z_n}{\|z_n\|}\cdot\frac{\|z_n\|}{\|z_n + x_0\|} = \frac{f_0(z_n)\cdot z_n}{\|z_n + x_0\|} \to -\infty$$
and so
$$\frac{f_0(z_n)\cdot z_n}{\|z_n\|} \to -\infty$$
Thus, $f_0$ is inner coercive, with $0 \in C_0$. By (43.14), there exists $\hat{z} \in C_0$ such that $f_0(\hat{z})\cdot(z - \hat{z}) \le 0$ for all $z \in C_0$. Let $\hat{x} = \hat{z} + x_0$, so that $f(\hat{x}) = f_0(\hat{z})$. For each $x \in C$, we then have
$$f(\hat{x})\cdot(x - \hat{x}) = f(\hat{x})\cdot((x - x_0) - (\hat{x} - x_0)) = f_0(\hat{z})\cdot(z - \hat{z}) \le 0$$
as desired. This completes the proof when $y = 0$. For the case $y \neq 0$, it is enough to observe that the function $f_y : C \to \mathbb{R}^n$ given by $f_y = f - y$ is easily seen to inherit both continuity and inner coercivity from $f$. This completes the proof that $\arg\operatorname{var}_C f_y \neq \emptyset$. The inner decreasing part follows from Proposition 1821.

We continue the analysis by introducing a strong notion of inner monotonicity.

Definition 1830 An operator $f : C \to \mathbb{R}^n$ is said to be strongly inner decreasing if there exists $\beta < 0$ such that
$$(f(x_1) - f(x_2))\cdot(x_1 - x_2) \le \beta\|x_1 - x_2\|^2 \qquad \forall x_1, x_2 \in C$$

Clearly, a strongly inner decreasing operator is strictly inner decreasing. The converse is false: the quadratic function $x \mapsto x^2$ on $[0, +\infty)$ is strictly, but not strongly, inner increasing (so its negative is strictly, but not strongly, inner decreasing).
The next result shows the importance of this strong notion of inner monotonicity by proving that it provides the differential characterization of strong concavity. In so doing, it completes the differential characterizations of concavity established in Theorem 1473.

Theorem 1831 A differentiable $f : C \to \mathbb{R}$ is strongly concave if and only if $\nabla f : C \to \mathbb{R}^n$ is strongly inner decreasing.

Proof We prove the "only if" and leave the converse to the reader. Let $f$ be strongly concave. So, there exists $k > 0$ such that $g = f + k\|\cdot\|^2$ is concave. Hence,
$$(\nabla f(x_1) - \nabla f(x_2))\cdot(x_1 - x_2) = \underbrace{(\nabla g(x_1) - \nabla g(x_2))\cdot(x_1 - x_2)}_{a} - 2k\|x_1 - x_2\|^2$$
It holds $a \le 0$ because $g$ is concave and so, by Theorem 1473, its gradient operator is inner decreasing. Thus,
$$(\nabla f(x_1) - \nabla f(x_2))\cdot(x_1 - x_2) \le (-2k)\|x_1 - x_2\|^2$$
and, by setting $\beta = -2k < 0$, this proves that the gradient operator $\nabla f : C \to \mathbb{R}^n$ is strongly inner decreasing.
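For a concrete check of the "only if" direction (an illustration of ours), take $f(x) = -\|x\|^2$ on $\mathbb{R}^n$. It is strongly concave with $k = 1$, since $g = f + \|\cdot\|^2 \equiv 0$ is concave, and its gradient $\nabla f(x) = -2x$ satisfies
$$(\nabla f(x_1) - \nabla f(x_2))\cdot(x_1 - x_2) = -2\|x_1 - x_2\|^2 \qquad \forall x_1, x_2 \in \mathbb{R}^n$$
so $\nabla f$ is strongly inner decreasing with $\beta = -2k = -2$, exactly as the proof predicts.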

Strongly inner decreasing operators are easily seen to be inner coercive. They thus feature
both inner monotonicity and coercivity. For them we can then establish an elegant unique
existence result that comes with a key regularity property of the solution map.

Theorem 1832 If $f : C \to \mathbb{R}^n$ is continuous and strongly inner decreasing, then for each $y \in \mathbb{R}^n$ there exists a unique $\hat{x}_y \in C$ such that
$$f(\hat{x}_y) - y \in N_C(\hat{x}_y)$$
Moreover, the solution function $\sigma : \mathbb{R}^n \to C$ defined by $\sigma(y) = \hat{x}_y$ is Lipschitz continuous.

This result can be seen as the counterpart here of Theorem 1501, which established the
remarkable optimality properties of strongly concave functions.

Proof The unique existence of $\hat{x}_y$ is an immediate consequence of the Browder-Hartman-Stampacchia Theorem. It remains to prove that the solution map is Lipschitz continuous. Let $y_n \to y$. Let $\hat{x} = \sigma(y)$ and $\hat{x}_n = \sigma(y_n)$ for each $n$. We want to show that $\hat{x}_n \to \hat{x}$. It holds, for each $n$,
$$(f(\hat{x}) - y)\cdot(\hat{x}_n - \hat{x}) \le 0 \quad \text{and} \quad (f(\hat{x}_n) - y_n)\cdot(\hat{x} - \hat{x}_n) \le 0$$
By adding up,
$$(f(\hat{x}) - f(\hat{x}_n) + y_n - y)\cdot(\hat{x}_n - \hat{x}) \le 0$$
By the Cauchy-Schwarz inequality,
$$(f(\hat{x}) - f(\hat{x}_n))\cdot(\hat{x}_n - \hat{x}) \le \|y_n - y\|\,\|\hat{x}_n - \hat{x}\| \tag{43.15}$$
As $f$ is strongly inner decreasing, there is $\beta < 0$ such that
$$(f(\hat{x}_n) - f(\hat{x}))\cdot(\hat{x}_n - \hat{x}) \le \beta\|\hat{x}_n - \hat{x}\|^2$$
Hence,
$$(f(\hat{x}) - f(\hat{x}_n))\cdot(\hat{x}_n - \hat{x}) \ge (-\beta)\|\hat{x}_n - \hat{x}\|^2$$
By (43.15), we then have
$$\|y_n - y\|\,\|\hat{x}_n - \hat{x}\| \ge (-\beta)\|\hat{x}_n - \hat{x}\|^2$$
that is,
$$\|y_n - y\| \ge (-\beta)\|\hat{x}_n - \hat{x}\|$$
We conclude that
$$\|\sigma(y_n) - \sigma(y)\| \le -\frac{1}{\beta}\|y_n - y\|$$
that is, the solution function $\sigma$ is Lipschitz continuous with coefficient $-1/\beta$.
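A quick numerical illustration of this Lipschitz bound (our sketch, with an illustrative operator and box): perturbing $y$ moves the solution by at most $\|y_1 - y_2\|/(-\beta)$.

import numpy as np

# For f(x) = -Ax on C = [0,1]^2 we may take beta = -lambda_min(A); the
# solution map y |-> x^_y (computed by projected iteration on f_y = f - y)
# should then be Lipschitz with coefficient -1/beta.
A = np.array([[3.0, 1.0], [1.0, 2.0]])

def solve_vi(y, tau=0.1, iters=2000):
    x = np.full(2, 0.5)
    for _ in range(iters):
        x = np.clip(x + tau * (-A @ x - y), 0.0, 1.0)
    return x

y1, y2 = np.array([0.2, -0.7]), np.array([0.3, -0.6])
x1, x2 = solve_vi(y1), solve_vi(y2)
beta = -float(np.linalg.eigvalsh(A).min())
print(np.linalg.norm(x1 - x2))              # ~ 0.05
print(np.linalg.norm(y1 - y2) / (-beta))    # ~ 0.10, an upper bound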

43.4 Equations
An important dividend of the previous analysis is the following existence result for solutions
of equations.

Theorem 1833 (Browder-Minty) A continuous and inner coercive operator $f : \mathbb{R}^n \to \mathbb{R}^n$ is surjective, i.e., for each $y \in \mathbb{R}^n$ the equation
$$f(x) = y \tag{43.16}$$
has a solution. If, in addition, $f$ is inner decreasing, the solution set $f^{-1}(y)$ is closed and convex.

Proof By taking $C = \mathbb{R}^n$, the result is a straightforward consequence of the Browder-Hartman-Stampacchia Theorem.

Thus, when a continuous and inner coercive operator $f : \mathbb{R}^n \to \mathbb{R}^n$ is injective, it is a homeomorphism by the Domain Invariance Theorem. Equation (43.16) is then well posed. As we learned earlier in the book (Chapter 35), injectivity is ensured by Jacobian differential conditions.
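As a numerical illustration (ours, with names of our choosing), the scalar operator $f(x) = -x^3 - x$ is continuous, inner coercive and strongly inner decreasing, so by the theorem $f(x) = y$ is solvable for every $y$; a small damped iteration finds the solution.

# Solve f(x) = y for f(x) = -x**3 - x, which satisfies the Browder-Minty
# hypotheses; x <- x + tau*(f(x) - y) converges for a small step tau.
def f(x):
    return -x**3 - x

y, x, tau = 2.0, 0.0, 0.05
for _ in range(300):
    x = x + tau * (f(x) - y)

print(x)           # ~ -1.0, and indeed f(-1) = 1 + 1 = 2 = y
print(f(x) - y)    # residual ~ 0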

Lemma 1834 Let $f : U \to \mathbb{R}^n$ be an inner decreasing and $k$-times continuously differentiable operator defined on an open convex set $U$. Then, $f : U \to \operatorname{Im} f$ is a $C^k$-diffeomorphism if and only if
$$\det Df(x) \neq 0 \qquad \forall x \in U$$

Proof As $f$ is inner decreasing, by Proposition 1461 its level sets are convex. As $f$ is continuously differentiable (with $\det Df(x) \neq 0$ on $U$), by Proposition 1641 its level sets are discrete sets. Hence, they are singletons, i.e., $f : U \to \operatorname{Im} f$ is a bijective function. As $f$ is $k$-times continuously differentiable, by Proposition 1644 it is a $C^k$-diffeomorphism.

This lemma, along with the Browder-Minty Theorem, has as an immediate consequence
the following global inversion result, a monotone version of the classic Caccioppoli-Hadamard
Theorem.

Proposition 1835 An inner decreasing, inner coercive and $k$-times differentiable operator $f : \mathbb{R}^n \to \mathbb{R}^n$, with $1 \le k \le \infty$, is a $C^k$-diffeomorphism if and only if
$$\det Df(x) \neq 0 \qquad \forall x \in \mathbb{R}^n$$

With this interesting result, we complete our study of variational inequalities.


Part VIII

Integration

Chapter 44

The Riemann integral (sdoganato)

44.1 The method of exhaustion


Let us consider a positive function $f$ (i.e., taking values $\ge 0$) defined on a closed interval $[a, b]$. Intuitively, the integral of $f$ on $[a, b]$ is the measure, called area, of the plane region
$$A(f_{[a,b]}) = \{(x, y) \in [a, b] \times \mathbb{R}_+ : 0 \le y \le f(x)\} \tag{44.1}$$
under the graph of the function $f$ on the interval. Graphically:

[Figure: the region $A(f_{[a,b]})$ under the graph of $f$ between $a$ and $b$]

The problem is how to make this natural intuition rigorous. As the figure shows, the plane region $A(f_{[a,b]})$ is a "curved" trapezoid with three straight sides and a curved one. So, it is not an elementary geometric figure whose area we know how to compute. To our rescue comes a classic procedure known as the method of exhaustion. It consists in approximating from above and below the area of a non-trivial geometric figure (such as our trapezoid) through the areas of simple circumscribed and inscribed elementary geometric figures, typically polygons (in our case, the so-called "plurirectangles"), whose measure can be calculated in an elementary way. If the resulting upper and lower approximations can be made more and more precise via polygons having more and more sides, until in the limit of "infinitely many sides" they reach a common limit value, then we take such a common value as the sought-after area of the non-trivial geometric figure (in our case, the area of the trapezoid, and so the integral of $f$ on $[a, b]$).
In the next sections we will make the heuristic procedure just outlined rigorous. The method of exhaustion originates in Greek mathematics, where it found wonderful applications in the works of Eudoxus of Cnidus and Archimedes of Syracuse, who with this method were able to compute or approximate the areas of some highly non-trivial geometric figures.¹

44.2 Plurirectangles

We know how to calculate the areas of elementary geometric figures. Among them, the simplest ones are rectangles, whose area is given by the product of the side lengths. A simple, but key for our purposes, generalization of a rectangle is the plurirectangle, that is, a polygon formed by contiguous rectangles. Graphically:

[Figure: a plurirectangle formed by contiguous rectangles]

Clearly, the area of a plurirectangle is just the sum of the areas of the individual rectangles that compose it.
Let us now go back to the plane region $A(f_{[a,b]})$ under the graph of a positive function $f$ on $[a, b]$. It is easy to see how this region can be sandwiched between inscribed plurirectangles and circumscribed plurirectangles. For example, the following plurirectangle

¹ For instance, Example 2095 of Appendix C reports the famous Archimedes approximation of $\pi$, the area of the closed unit ball, via the method of exhaustion based on circumscribed and inscribed regular polygons.

[Figure: a plurirectangle inscribed in $A(f_{[a,b]})$]

is inscribed in $A(f_{[a,b]})$, while the following plurirectangle circumscribes it:

[Figure: a plurirectangle circumscribing $A(f_{[a,b]})$]

Naturally, the area of $A(f_{[a,b]})$ is larger than the area of any inscribed plurirectangle and smaller than the area of any circumscribed plurirectangle. The area of $A(f_{[a,b]})$ is, therefore, in between the areas of the inscribed and circumscribed plurirectangles.
We thus have a first key observation: the area of $A(f_{[a,b]})$ can always be sandwiched between areas of plurirectangles. This yields simple lower approximations (the areas of the inscribed plurirectangles) and upper approximations (the areas of the circumscribed plurirectangles) of the area of $A(f_{[a,b]})$.
A second key observation is that such a sandwich, and consequently the corresponding approximations, can be made better and better by considering finer and finer plurirectangles, obtained by subdividing their bases further and further:
[Figure: finer subdivisions of the bases yield finer inscribed and circumscribed plurirectangles]

Indeed, by subdividing the bases further and further, the area of the inscribed plurirectangles becomes larger and larger, though it always remains smaller than the area of $A(f_{[a,b]})$. On the other hand, the area of the circumscribed plurirectangles becomes smaller and smaller, though it always remains larger than the area of $A(f_{[a,b]})$. In other words, the two slices of the sandwich that enclose the region $A(f_{[a,b]})$, i.e., the lower and the upper approximations, take values that become closer and closer to each other.
If, by considering finer and finer plurirectangles, corresponding to finer and finer subdivisions of the bases, in the limit the lower and upper approximations coincide (so the two slices of the sandwich merge), such a common limit value can rightfully be taken to be the area of $A(f_{[a,b]})$. In this way, starting with objects that are simple to measure, the plurirectangles, we are able to measure, via better and better approximations, a much more complicated object such as the area of the plane region $A(f_{[a,b]})$ under $f$. The method of exhaustion is one of the most powerful ideas in mathematics.

44.3 Definition

We now formalize the method of exhaustion. We first consider positive and bounded functions $f : [a, b] \to \mathbb{R}_+$. In the next section, we will then consider general bounded functions, not necessarily positive.

44.3.1 Positive functions


Definition 1836 A set $\sigma = \{x_i\}_{i=0}^n$ of points is a subdivision (or partition) of an interval $[a, b]$ if
$$a = x_0 < x_1 < \cdots < x_n = b$$
The set of all possible subdivisions of an interval $[a, b]$ is denoted by $\Sigma$.

Given a bounded function $f : [a, b] \to \mathbb{R}_+$, consider the contiguous bases generated by the points of the subdivision $\sigma$:
$$[x_0, x_1],\ [x_1, x_2],\ \ldots,\ [x_{n-1}, x_n] \tag{44.2}$$



Let us construct on them the largest plurirectangle inscribed in the plane region under $f$. In particular, for the $i$-th base, the maximum height $m_i$ of an inscribed rectangle with base $[x_{i-1}, x_i]$ is
$$m_i = \inf_{x \in [x_{i-1}, x_i]} f(x)$$
Since $f$ is bounded, by the Least Upper Bound Principle this infimum exists and is finite, that is, $m_i \in \mathbb{R}$. Since the length $\Delta x_i$ of each base $[x_{i-1}, x_i]$ is
$$\Delta x_i = x_i - x_{i-1}$$
the area $I(f, \sigma)$ of such maximal inscribed plurirectangle is
$$I(f, \sigma) = \sum_{i=1}^n m_i\,\Delta x_i \tag{44.3}$$
In a similar way, let us construct on the contiguous bases (44.2) determined by the subdivision $\sigma$ the smallest plurirectangle that circumscribes the plane region under $f$. For the $i$-th base, the minimum height $M_i$ of a circumscribed rectangle with base $[x_{i-1}, x_i]$ is
$$M_i = \sup_{x \in [x_{i-1}, x_i]} f(x)$$
Graphically:
[Figure: on the base $[x_{i-1}, x_i]$, the inscribed rectangle has height $m_i$ and the circumscribed one height $M_i$]

As before, since $f$ is bounded, by the Least Upper Bound Principle the supremum exists and is finite, that is, $M_i \in \mathbb{R}$. Therefore, the area $S(f, \sigma)$ of the minimal circumscribed plurirectangle is
$$S(f, \sigma) = \sum_{i=1}^n M_i\,\Delta x_i \tag{44.4}$$
Since $m_i \le M_i$ for every $i$, we have
$$I(f, \sigma) \le S(f, \sigma) \qquad \forall \sigma \in \Sigma \tag{44.5}$$



In particular, the area of the plane region under $f$ lies between these two values. Hence, $I(f, \sigma)$ gives a lower approximation of this area, while $S(f, \sigma)$ gives an upper approximation of it. They are called the lower and upper integral sums of $f$ with respect to $\sigma$, respectively.
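As a small computational aside (ours, not the book's), the sums (44.3) and (44.4) are easy to evaluate when the infimum and supremum on each base are known; for an increasing $f$ they are attained at the endpoints of the base.

import numpy as np

# Lower and upper integral sums for the increasing f(x) = x**2 on [0, 1],
# where m_i = f(x_{i-1}) and M_i = f(x_i) on each base.
def lower_upper_sums(f, pts):
    I = sum(f(a) * (b - a) for a, b in zip(pts, pts[1:]))
    S = sum(f(b) * (b - a) for a, b in zip(pts, pts[1:]))
    return I, S

f = lambda x: x**2
for n in (4, 16, 64, 256):
    I, S = lower_upper_sums(f, np.linspace(0.0, 1.0, n + 1))
    print(n, I, S)        # I <= 1/3 <= S, both approaching 1/3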

Definition 1837 Given two subdivisions $\sigma$ and $\sigma'$ of $[a, b]$, we say that $\sigma'$ refines $\sigma$ if $\sigma \subseteq \sigma'$, that is, if all the points of $\sigma$ are also points of $\sigma'$.

In other words, the finer subdivision $\sigma'$ is obtained by adding further points to $\sigma$. For example, the subdivision
$$\sigma' = \left\{0, \frac{1}{4}, \frac{1}{2}, \frac{3}{4}, 1\right\}$$
of the unit interval $[0, 1]$ refines the subdivision $\sigma = \{0, 1/2, 1\}$.
It is easy to see that if $\sigma'$ refines $\sigma$, then
$$I(f, \sigma) \le I(f, \sigma') \le S(f, \sigma') \le S(f, \sigma) \tag{44.6}$$
In other words, a finer subdivision $\sigma'$ yields a better approximation, both lower and upper, of the area under $f$.² By starting from any subdivision, we can always refine it, thus improving (or, at least, not worsening) the approximations given by the corresponding plurirectangles.
The same can be done by starting from any two subdivisions $\sigma$ and $\sigma'$, not necessarily nested. Indeed, the subdivision $\sigma'' = \sigma \cup \sigma'$ formed by all the points that belong to the two subdivisions $\sigma$ and $\sigma'$ refines both of them. In other words, $\sigma''$ is a common refinement of $\sigma$ and $\sigma'$.

Example 1838 Consider the two subdivisions
$$\sigma = \left\{0, \frac{1}{3}, \frac{1}{2}, \frac{2}{3}, 1\right\} \quad \text{and} \quad \sigma' = \left\{0, \frac{1}{4}, \frac{1}{2}, \frac{3}{4}, 1\right\}$$
of $[0, 1]$. They are not nested: neither $\sigma$ refines $\sigma'$ nor $\sigma'$ refines $\sigma$. However, the subdivision
$$\sigma'' = \sigma \cup \sigma' = \left\{0, \frac{1}{4}, \frac{1}{3}, \frac{1}{2}, \frac{2}{3}, \frac{3}{4}, 1\right\}$$
refines both $\sigma$ and $\sigma'$. N
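In computational terms (a sketch of ours), the common refinement of Example 1838 is just the sorted union of the two point sets, and the improvement (44.7) can be checked directly, again with the increasing $f(x) = x^2$.

# Common refinement and inequality (44.7) for Example 1838.
def lower_upper_sums(f, pts):          # valid for increasing f
    I = sum(f(a) * (b - a) for a, b in zip(pts, pts[1:]))
    S = sum(f(b) * (b - a) for a, b in zip(pts, pts[1:]))
    return I, S

sigma1 = [0, 1/3, 1/2, 2/3, 1]
sigma2 = [0, 1/4, 1/2, 3/4, 1]
common = sorted(set(sigma1) | set(sigma2))   # {0, 1/4, 1/3, 1/2, 2/3, 3/4, 1}

f = lambda x: x**2
I1, S1 = lower_upper_sums(f, sigma1)
Ic, Sc = lower_upper_sums(f, common)
print(I1 <= Ic <= Sc <= S1)                  # True, as (44.7) predicts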

Thanks to inequality (44.6), we have
$$I(f, \sigma) \le I(f, \sigma'') \le S(f, \sigma'') \le S(f, \sigma) \tag{44.7}$$
and
$$I(f, \sigma') \le I(f, \sigma'') \le S(f, \sigma'') \le S(f, \sigma') \tag{44.8}$$
The common refinement $\sigma''$ gives a better approximation, both lower and upper, of the area under $f$ than the original subdivisions $\sigma$ and $\sigma'$.
All this motivates the next definition.
² For the sake of brevity, we write "area under $f$" instead of the more precise expression "area of the plane region that lies under the graph of $f$ and above the horizontal axis".

Definition 1839 Let $f : [a, b] \to \mathbb{R}_+$ be a bounded function. The value
$$\underline{\int_a^b} f(x)\,dx = \sup_{\sigma \in \Sigma} I(f, \sigma) \tag{44.9}$$
is said to be the lower integral of $f$ on $[a, b]$, while the value
$$\overline{\int_a^b} f(x)\,dx = \inf_{\sigma \in \Sigma} S(f, \sigma) \tag{44.10}$$
is said to be the upper integral of $f$ on $[a, b]$.


Therefore, $\underline{\int_a^b} f(x)\,dx$ is the supremum of the areas $I(f, \sigma)$ of the inscribed plurirectangles obtained by considering all the possible subdivisions of $[a, b]$. Starting from the inscribed plurirectangles, this is the best possible lower approximation of the area under $f$ on $[a, b]$.
Similarly, $\overline{\int_a^b} f(x)\,dx$ is the infimum of the areas $S(f, \sigma)$ of the circumscribed plurirectangles obtained by considering all the possible subdivisions of $[a, b]$. Starting from the circumscribed plurirectangles, this is the best possible upper approximation of the area under $f$.

A first important question is whether the lower and upper integrals of a bounded function exist. Fortunately, this is the case, as we show next.

Lemma 1840 If $f : [a, b] \to \mathbb{R}_+$ is a bounded function, then both the lower integral and the upper integral exist and are finite, with
$$\underline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f(x)\,dx \tag{44.11}$$

Proof Since $f$ is positive and bounded, there exists $M \ge 0$ such that $0 \le f(x) \le M$ for every $x \in [a, b]$. Therefore, for every subdivision $\sigma = \{x_i\}_{i=0}^n$ we have
$$0 \le \inf_{x \in [x_{i-1}, x_i]} f(x) \le \sup_{x \in [x_{i-1}, x_i]} f(x) \le M \qquad \forall i = 1, 2, \ldots, n$$
and so
$$0 \le I(f, \sigma) \le S(f, \sigma) \le M(b - a) \qquad \forall \sigma \in \Sigma$$
By the Least Upper Bound Principle, the supremum in (44.9) and the infimum in (44.10) exist and are finite and positive, that is, $\underline{\int_a^b} f(x)\,dx \in \mathbb{R}_+$ and $\overline{\int_a^b} f(x)\,dx \in \mathbb{R}_+$.
We still need to prove inequality (44.11). Let us suppose, by contradiction, that
$$\underline{\int_a^b} f(x)\,dx - \overline{\int_a^b} f(x)\,dx = \varepsilon > 0$$
By Proposition 127, there exist a subdivision $\sigma'$ such that
$$I(f, \sigma') > \underline{\int_a^b} f(x)\,dx - \frac{\varepsilon}{2}$$
and a subdivision $\sigma''$ such that
$$S(f, \sigma'') < \overline{\int_a^b} f(x)\,dx + \frac{\varepsilon}{2}$$
These two inequalities yield
$$I(f, \sigma') - S(f, \sigma'') > \underline{\int_a^b} f(x)\,dx - \left(\overline{\int_a^b} f(x)\,dx\right) - \varepsilon = \varepsilon - \varepsilon = 0$$
If we take the subdivision $\sigma = \sigma' \cup \sigma''$, then $I(f, \sigma) \ge I(f, \sigma')$ and $S(f, \sigma) \le S(f, \sigma'')$. We conclude that
$$I(f, \sigma) - S(f, \sigma) \ge I(f, \sigma') - S(f, \sigma'') > 0$$
that is, $I(f, \sigma) > S(f, \sigma)$, which contradicts (44.5).

By the previous lemma, every bounded function $f : [a, b] \to \mathbb{R}_+$ has both a lower integral and an upper integral, with
$$\underline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f(x)\,dx$$
The area under $f$ lies between these two values. The last inequality is the most refined version of (44.6). The lower and upper integrals are, respectively, the best lower and upper approximations of the area under $f$ that can be obtained through plurirectangles. In particular, when $\underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx$, the area under $f$ will be taken to be this common value. This motivates the next fundamental definition.

Definition 1841 A bounded function $f : [a, b] \to \mathbb{R}_+$ is said to be integrable in the sense of Riemann (or Riemann integrable) if
$$\underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx$$
This common value, denoted by $\int_a^b f(x)\,dx$, is called the integral in the sense of Riemann (or Riemann integral) of $f$ on $[a, b]$.

For brevity, in the rest of the chapter we will often talk about integrals and integrable functions, omitting the clause "in the sense of Riemann". Since there are other notions of integral, it is however important to always keep this qualification in mind. In addition, note that the definition applies only to bounded functions. When in the sequel we consider integrable functions, they will be assumed to be bounded (even if not stated explicitly).
Rb
O.R. The notation $\int_a^b f(x)\,dx$ reminds us that the integral is obtained as a limit of sums of the type $\sum_{i=1}^n \alpha_i\,\Delta x_i$, in which the symbol $\sum$ is replaced by the integral sign $\int$ (a "long letter s"), the length $\Delta x_i$ by $dx$, and the values $\alpha_i$ of the function by $f(x)$. H

Let us illustrate the definition of the integral with, first, an example of an integrable function and, then, an example of a non-integrable one.

Example 1842 Let $a \ge 0$ and $f : [a, b] \to \mathbb{R}$ be defined by $f(x) = x$. For any subdivision $\sigma = \{x_i\}_{i=0}^n$ we have
$$I(f, \sigma) = x_0\,\Delta x_1 + x_1\,\Delta x_2 + \cdots + x_{n-1}\,\Delta x_n = \sum_{i=1}^n x_{i-1}\,\Delta x_i$$
$$S(f, \sigma) = x_1\,\Delta x_1 + x_2\,\Delta x_2 + \cdots + x_n\,\Delta x_n = \sum_{i=1}^n x_i\,\Delta x_i$$
Therefore,
$$S(f, \sigma) - I(f, \sigma) = (x_1 - x_0)\,\Delta x_1 + (x_2 - x_1)\,\Delta x_2 + \cdots + (x_n - x_{n-1})\,\Delta x_n = \sum_{i=1}^n (\Delta x_i)^2$$
Thus, for any subdivision $\sigma = \{x_i\}_{i=0}^n$ we obtain
$$0 \le \overline{\int_a^b} f(x)\,dx - \underline{\int_a^b} f(x)\,dx \le \sum_{i=1}^n (\Delta x_i)^2 \le \left(\max_{j \in \{1, \ldots, n\}} \Delta x_j\right)\sum_{i=1}^n \Delta x_i = (b - a)\max_{j \in \{1, \ldots, n\}} \Delta x_j$$
By choosing the subdivision such that $x_0 = a$ and $x_i = x_{i-1} + (b - a)/n$ for all $i \in \{1, \ldots, n\}$,
$$0 \le \overline{\int_a^b} f(x)\,dx - \underline{\int_a^b} f(x)\,dx \le (b - a)\max_{j \in \{1, \ldots, n\}} \Delta x_j = \frac{(b - a)^2}{n} \to 0$$
as $n \to \infty$. Thus, $\underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx$ and we conclude that $f(x) = x$ is integrable. N
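A numerical companion to Example 1842 (our sketch): with a uniform subdivision, $S - I = (b - a)^2/n$ exactly, and both sums approach $(b^2 - a^2)/2$, the area of the trapezoid under $f(x) = x$.

import numpy as np

a, b = 1.0, 3.0
for n in (10, 100, 1000):
    x = np.linspace(a, b, n + 1)
    dx = np.diff(x)
    I = float(np.sum(x[:-1] * dx))    # lower sum: f is increasing
    S = float(np.sum(x[1:] * dx))     # upper sum
    print(n, I, S, S - I)             # S - I = (b - a)**2 / n

print((b**2 - a**2) / 2)              # exact integral: 4.0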

Example 1843 Let $f : [a, b] \to \mathbb{R}$ be the Dirichlet function
$$f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \cap [a, b] \\ 0 & \text{if } x \in (\mathbb{R} \setminus \mathbb{Q}) \cap [a, b] \end{cases} \tag{44.12}$$
restricted to $[a, b]$. For every $a \le x < y \le b$ there exists a rational number $q$ such that $x < q < y$, as well as an irrational number $r$ such that $x < r < y$ (Propositions 19 and 42). Given any subdivision $\sigma = \{x_i\}_{i=0}^n$ of $[a, b]$, we thus have
$$m_i = 0 \quad \text{and} \quad M_i = 1 \qquad \forall i = 1, 2, \ldots, n$$
Therefore,
$$I(f, \sigma) = 0\cdot\Delta x_1 + 0\cdot\Delta x_2 + \cdots + 0\cdot\Delta x_n = 0$$
and
$$S(f, \sigma) = 1\cdot\Delta x_1 + 1\cdot\Delta x_2 + \cdots + 1\cdot\Delta x_n = \sum_{i=1}^n \Delta x_i = b - a$$
which implies $\underline{\int_a^b} f(x)\,dx = 0 < b - a = \overline{\int_a^b} f(x)\,dx$. We conclude that the Dirichlet function is not integrable in the sense of Riemann.³ N

³ Therefore, it is meaningless (at least in the sense of Riemann) to talk about the "area" of the plane region under such a function.

Finally, let us introduce a useful quantity that characterizes the "fineness" of a subdivision of $[a, b]$.

Definition 1844 Given a subdivision $\sigma$ of $[a, b]$, we define the mesh of $\sigma$, denoted by $|\sigma|$, as the positive quantity
$$|\sigma| = \max_{i = 1, 2, \ldots, n} \Delta x_i$$
Finer subdivisions have a smaller mesh.
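In code (a throwaway helper of ours), the mesh is simply the largest gap between consecutive points of the subdivision:

def mesh(pts):
    return max(b - a for a, b in zip(pts, pts[1:]))

print(mesh([0, 0.25, 0.5, 1.0]))   # 0.5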

44.3.2 General functions


We now extend the notion of integral to any bounded function f : [a; b] ! R, not necessarily
positive. For a function f : [a; b] ! R that assumes both negative and positive values, the
plane region bounded by f on [a; b] has in general a positive part and a negative part:

[Figure: a function taking both signs; the region above the axis is marked $+$ and the one below is marked $-$]

Intuitively, the integral is now the difference between the area of the positive part and the area of the negative part. If they have equal value, the integral is zero: this is the case, for example, of the function $f(x) = \sin x$ on the interval $[0, 2\pi]$.
To make this idea rigorous, it is useful to decompose a function into its positive and
negative parts.

Definition 1845 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$. The function $f^+ : A \subseteq \mathbb{R} \to \mathbb{R}_+$ is defined by
$$f^+(x) = \max\{f(x), 0\} \qquad \forall x \in A$$
while the function $f^- : A \subseteq \mathbb{R} \to \mathbb{R}_+$ is defined by
$$f^-(x) = -\min\{f(x), 0\} \qquad \forall x \in A$$
The function $f^+$ is called the positive part of $f$, while $f^-$ is called the negative part.

Both functions $f^+$ and $f^-$ are positive.



Example 1846 (i) Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x$. We have
$$f^+(x) = \begin{cases} 0 & x < 0 \\ x & x \ge 0 \end{cases} \qquad \text{and} \qquad f^-(x) = \begin{cases} -x & x < 0 \\ 0 & x \ge 0 \end{cases}$$
[Figure: the graphs of $f^+$ and $f^-$ for $f(x) = x$]

(ii) Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = \sin x$. We have
$$f^+(x) = \begin{cases} \sin x & x \in \bigcup_{n \in \mathbb{Z}} [2n\pi, (2n+1)\pi] \\ 0 & \text{otherwise} \end{cases}$$
and
$$f^-(x) = \begin{cases} 0 & x \in \bigcup_{n \in \mathbb{Z}} [2n\pi, (2n+1)\pi] \\ -\sin x & \text{otherwise} \end{cases}$$
[Figure: the graphs of $f^+$ and $f^-$ for $f(x) = \sin x$]
N

Since for every real number $a \in \mathbb{R}$ we trivially have
$$a = \max\{a, 0\} + \min\{a, 0\} \tag{44.13}$$
it follows that, for every $x \in A$,
$$f(x) = \max\{f(x), 0\} + \min\{f(x), 0\} = \max\{f(x), 0\} - (-\min\{f(x), 0\}) = f^+(x) - f^-(x)$$
Every function $f : A \subseteq \mathbb{R} \to \mathbb{R}$ can therefore be decomposed as the difference
$$f = f^+ - f^- \tag{44.14}$$
of its positive and negative parts.⁴ Such a decomposition permits us to extend in a natural way the notion of integral to any function, not necessarily positive. Indeed, since both functions $f^+$ and $f^-$ are positive, the definition of the Riemann integral for positive functions applies to the areas under each of them. The difference between their integrals
$$\int_a^b f^+(x)\,dx - \int_a^b f^-(x)\,dx$$
is the difference between the areas under $f^+$ and $f^-$. So, it is the integral we were looking for.
All of this motivates the following definition of the Riemann integral for general bounded functions, not necessarily positive.

Definition 1847 A bounded function $f : [a, b] \to \mathbb{R}$ is said to be integrable in the sense of Riemann if the functions $f^+$ and $f^-$ are integrable. In this case, the Riemann integral of $f$ on $[a, b]$ is defined by
$$\int_a^b f(x)\,dx = \int_a^b f^+(x)\,dx - \int_a^b f^-(x)\,dx$$

This definition makes rigorous and transparent the idea of counting with different signs the areas of the plane regions bounded by $f$ that lie, respectively, above and below the horizontal axis.
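The decomposition is straightforward to compute; the sketch below (ours, with fine Riemann sums standing in for the exact integrals) integrates $f(x) = x$ on $[-1, 2]$ through Definition 1847.

import numpy as np

a, b, n = -1.0, 2.0, 100_000
x = np.linspace(a, b, n + 1)
mid = (x[:-1] + x[1:]) / 2             # midpoint heights on each base
dx = np.diff(x)

fvals = mid                            # f(x) = x
f_plus = np.maximum(fvals, 0.0)        # f+ = max{f, 0}
f_minus = -np.minimum(fvals, 0.0)      # f- = -min{f, 0}

int_plus = float(np.sum(f_plus * dx))    # ~ 2.0
int_minus = float(np.sum(f_minus * dx))  # ~ 0.5
print(int_plus - int_minus)              # ~ 1.5 = (b**2 - a**2) / 2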

44.3.3 Everything holds together


Is it possible to express the notion of Riemann integral for general bounded functions in terms of the lower and upper approximations $I(f, \sigma)$ and $S(f, \sigma)$ upon which the notion of integral of positive functions so much relied, formally and conceptually? In this section we show that, remarkably, this is indeed the case, thus showing that the method of exhaustion is at the heart of the notion of Riemann integral for any bounded function, positive or not.

⁴ The analogy between this decomposition for functions and the earlier one (20.2) for vectors, as well as between their notions of positive and negative parts, can be formalized in general Riesz spaces, as readers will learn in more advanced courses.

To this end, we first note that, given a subdivision $\sigma = \{x_i\}_{i=0}^n$, we can still define for any bounded function $f : [a, b] \to \mathbb{R}$ the sums $I(f, \sigma)$ and $S(f, \sigma)$ as in (44.3) and (44.4), that is,
$$I(f, \sigma) = \sum_{i=1}^n m_i\,\Delta x_i \quad \text{and} \quad S(f, \sigma) = \sum_{i=1}^n M_i\,\Delta x_i$$
For general functions, too, the sums $I(f, \sigma)$ and $S(f, \sigma)$ are called the lower and upper integral sums of $f$ with respect to $\sigma$, respectively. The reader can easily verify that properties (44.5), (44.6), (44.7) and (44.8) continue to hold for these sums. In particular,
$$\sup_{\sigma \in \Sigma} I(f, \sigma) \le \inf_{\sigma \in \Sigma} S(f, \sigma)$$
Moreover, for any bounded function $f : [a, b] \to \mathbb{R}$, positive or not, we can still define the lower and upper integrals
$$\underline{\int_a^b} f(x)\,dx = \sup_{\sigma \in \Sigma} I(f, \sigma) \quad \text{and} \quad \overline{\int_a^b} f(x)\,dx = \inf_{\sigma \in \Sigma} S(f, \sigma) \tag{44.15}$$
in perfect analogy with what we did for positive functions. The next result shows that everything fits together: the notion of Riemann integral obtained through the decomposition (44.14) into positive and negative parts coincides with the equality between upper and lower integrals of (44.15).

Proposition 1848 A bounded function $f : [a, b] \to \mathbb{R}$ is integrable if and only if $\underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx$. In this case,
$$\int_a^b f(x)\,dx = \underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx$$

The proof is based on the next three lemmas. The first one establishes a general property of the suprema and infima of sums of functions, the second one also has a theoretical interest for the theory of integration (as we will explain at the end of the section), while the last one is of a more technical nature.

Lemma 1849 For any two bounded functions $g, h : A \to \mathbb{R}$, we have $\sup_{x \in A}(g + h)(x) \le \sup_{x \in A} g(x) + \sup_{x \in A} h(x)$ and $\inf_{x \in A}(g + h)(x) \ge \inf_{x \in A} g(x) + \inf_{x \in A} h(x)$.

Proof By definition of sup,
$$(g + h)(y) = g(y) + h(y) \le \sup_{x \in A} g(x) + \sup_{x \in A} h(x) \qquad \forall y \in A$$
Thus, $\sup_{x \in A} g(x) + \sup_{x \in A} h(x)$ is an upper bound for the collection $\{(g + h)(y)\}_{y \in A}$. Since the sup is the least upper bound,
$$\sup_{x \in A}(g + h)(x) \le \sup_{x \in A} g(x) + \sup_{x \in A} h(x)$$
The reader can prove, in a similar way, that $\inf_{x \in A}(g + h)(x) \ge \inf_{x \in A} g(x) + \inf_{x \in A} h(x)$.

Lemma 1850 Let $f : [a, b] \to \mathbb{R}$ be a bounded function. Then, for every subdivision $\sigma = \{x_i\}_{i=0}^n$ of $[a, b]$, we have
$$S(f, \sigma) = S(f^+, \sigma) - I(f^-, \sigma) \tag{44.16}$$
and
$$I(f, \sigma) = I(f^+, \sigma) - S(f^-, \sigma) \tag{44.17}$$

Proof Let $f : [a, b] \to \mathbb{R}$ be a bounded function and let $\sigma = \{x_i\}_{i=0}^n$ be a subdivision of $[a, b]$. For a generic interval $[x_{i-1}, x_i]$, set $\alpha = \sup_{x \in [x_{i-1}, x_i]} f(x)$. Since $f$ is bounded, $\alpha$ exists by the Least Upper Bound Principle. We have
$$\alpha \ge 0 \implies \alpha = \sup_{x \in [x_{i-1}, x_i]} f^+(x)$$
and
$$\alpha < 0 \implies \sup_{x \in [x_{i-1}, x_i]} f^+(x) = 0 \ \text{ and } \ -\alpha = \inf_{x \in [x_{i-1}, x_i]} f^-(x)$$
So,
$$\alpha \ge \sup_{x \in [x_{i-1}, x_i]} f^+(x) - \inf_{x \in [x_{i-1}, x_i]} f^-(x)$$
On the other hand, by Lemma 1849, for any pair of functions $g, h : A \to \mathbb{R}$ we have
$$\sup_{x \in A}(g + h)(x) \le \sup_{x \in A} g(x) + \sup_{x \in A} h(x) \tag{44.18}$$
and so
$$\alpha = \sup_{x \in [x_{i-1}, x_i]}\left(f^+(x) - f^-(x)\right) \le \sup_{x \in [x_{i-1}, x_i]} f^+(x) + \sup_{x \in [x_{i-1}, x_i]}\left(-f^-(x)\right) = \sup_{x \in [x_{i-1}, x_i]} f^+(x) - \inf_{x \in [x_{i-1}, x_i]} f^-(x)$$
In sum,
$$\alpha = \sup_{x \in [x_{i-1}, x_i]} f^+(x) - \inf_{x \in [x_{i-1}, x_i]} f^-(x)$$
Multiplying by $\Delta x_i$ and summing over $i$ yields (44.16). A similar argument proves (44.17).

Lemma 1851 Let $f : [a, b] \to \mathbb{R}$ be a bounded function. Then
$$\sup_{\sigma \in \Sigma} I(f, \sigma) \le \sup_{\sigma \in \Sigma} I(f^+, \sigma) - \inf_{\sigma \in \Sigma} S(f^-, \sigma) \le \inf_{\sigma \in \Sigma} S(f^+, \sigma) - \sup_{\sigma \in \Sigma} I(f^-, \sigma) \le \inf_{\sigma \in \Sigma} S(f, \sigma) \tag{44.19}$$

Proof By (44.16) and by the "inf" part of Lemma 1849, we have
$$\inf_{\sigma \in \Sigma} S(f, \sigma) = \inf_{\sigma \in \Sigma}\left(S(f^+, \sigma) - I(f^-, \sigma)\right) \ge \inf_{\sigma \in \Sigma} S(f^+, \sigma) + \inf_{\sigma \in \Sigma}\left(-I(f^-, \sigma)\right) = \inf_{\sigma \in \Sigma} S(f^+, \sigma) - \sup_{\sigma \in \Sigma} I(f^-, \sigma) \tag{44.20}$$
Moreover, by (44.17) and by the "sup" part of Lemma 1849, we have
$$\sup_{\sigma \in \Sigma} I(f, \sigma) = \sup_{\sigma \in \Sigma}\left(I(f^+, \sigma) - S(f^-, \sigma)\right) \le \sup_{\sigma \in \Sigma} I(f^+, \sigma) + \sup_{\sigma \in \Sigma}\left(-S(f^-, \sigma)\right) = \sup_{\sigma \in \Sigma} I(f^+, \sigma) - \inf_{\sigma \in \Sigma} S(f^-, \sigma) \tag{44.21}$$
Putting together (44.20), (44.21) and (44.5) applied to both $f^+$ and $f^-$, we get the chain (44.19).
Proof of Proposition 1848 We begin with the "if": suppose $\underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx$. We show that $f^+$ and $f^-$ are integrable. From (44.19) it follows
$$\sup_{\sigma \in \Sigma} I(f, \sigma) = \sup_{\sigma \in \Sigma} I(f^+, \sigma) - \inf_{\sigma \in \Sigma} S(f^-, \sigma) = \inf_{\sigma \in \Sigma} S(f^+, \sigma) - \sup_{\sigma \in \Sigma} I(f^-, \sigma) = \inf_{\sigma \in \Sigma} S(f, \sigma) \tag{44.22}$$
So
$$\sup_{\sigma \in \Sigma} I(f^+, \sigma) - \inf_{\sigma \in \Sigma} S(f^-, \sigma) = \inf_{\sigma \in \Sigma} S(f^+, \sigma) - \sup_{\sigma \in \Sigma} I(f^-, \sigma)$$
which implies
$$\sup_{\sigma \in \Sigma} I(f^+, \sigma) - \inf_{\sigma \in \Sigma} S(f^+, \sigma) = \inf_{\sigma \in \Sigma} S(f^-, \sigma) - \sup_{\sigma \in \Sigma} I(f^-, \sigma)$$
Using again (44.5) applied to both $f^+$ and $f^-$, we have
$$0 \ge \sup_{\sigma \in \Sigma} I(f^+, \sigma) - \inf_{\sigma \in \Sigma} S(f^+, \sigma) = \inf_{\sigma \in \Sigma} S(f^-, \sigma) - \sup_{\sigma \in \Sigma} I(f^-, \sigma) \ge 0$$
which implies
$$\sup_{\sigma \in \Sigma} I(f^+, \sigma) - \inf_{\sigma \in \Sigma} S(f^+, \sigma) = \inf_{\sigma \in \Sigma} S(f^-, \sigma) - \sup_{\sigma \in \Sigma} I(f^-, \sigma) = 0$$
We conclude that $\inf_{\sigma \in \Sigma} S(f^+, \sigma) = \sup_{\sigma \in \Sigma} I(f^+, \sigma)$ and $\inf_{\sigma \in \Sigma} S(f^-, \sigma) = \sup_{\sigma \in \Sigma} I(f^-, \sigma)$, so the functions $f^+$ and $f^-$ are both integrable. Moreover, from (44.22) it follows that
$$\inf_{\sigma \in \Sigma} S(f, \sigma) = \sup_{\sigma \in \Sigma} I(f, \sigma) = \int_a^b f^+(x)\,dx - \int_a^b f^-(x)\,dx = \int_a^b f(x)\,dx$$
It remains to prove the "only if". Suppose that $f$ is integrable, that is, that $f^+$ and $f^-$ are both integrable. We will show that
$$\sup_{\sigma \in \Sigma} I(f, \sigma) = \inf_{\sigma \in \Sigma} S(f, \sigma) \tag{44.23}$$
By (44.19), we have
$$\sup_{\sigma \in \Sigma} I(f, \sigma) \le \int_a^b f^+(x)\,dx - \int_a^b f^-(x)\,dx \le \inf_{\sigma \in \Sigma} S(f, \sigma) \tag{44.24}$$
Since $f^+$ and $f^-$ are both integrable, by the integrability criterion of Proposition 1852, for every $\varepsilon > 0$ there exist subdivisions $\sigma$ and $\sigma'$ such that⁵
$$S(f^+, \sigma) - I(f^+, \sigma) < \varepsilon \quad \text{and} \quad S(f^-, \sigma') - I(f^-, \sigma') < \varepsilon$$
If $\sigma''$ is a common refinement of $\sigma$ and $\sigma'$, a fortiori we have
$$S(f^+, \sigma'') - I(f^+, \sigma'') < \varepsilon \quad \text{and} \quad S(f^-, \sigma'') - I(f^-, \sigma'') < \varepsilon$$
So, by (44.16) and (44.17) we have
$$0 \le \inf_{\sigma \in \Sigma} S(f, \sigma) - \sup_{\sigma \in \Sigma} I(f, \sigma) \le S(f, \sigma'') - I(f, \sigma'') = S(f^+, \sigma'') - I(f^+, \sigma'') + S(f^-, \sigma'') - I(f^-, \sigma'') < 2\varepsilon$$
which implies (44.23). Together with (44.24), this proves that
$$\sup_{\sigma \in \Sigma} I(f, \sigma) = \int_a^b f(x)\,dx = \inf_{\sigma \in \Sigma} S(f, \sigma)$$
as desired.

⁵ The integrability criterion of Proposition 1852 for positive functions (all we need here) can be proved directly via Definition 1841. Thus, there is no circularity in using this criterion in the current proof.
N.B. The Riemann integral is often defined directly for general functions, not necessarily positive, through the lower and upper sums. What is lost in defining these sums for not necessarily positive functions is the geometric intuition. While for positive functions $I(f, \sigma)$ is the area of an inscribed plurirectangle and $S(f, \sigma)$ the area of a circumscribed plurirectangle, this is no longer true for a generic function that takes positive and negative values, as (44.16) and (44.17) show. The formulation we adopt with Definition 1847 is suggested by pedagogical motivations and is equivalent to the usual formulation, as Proposition 1848 shows. O

44.4 Integrability criteria


In the next section we will study some important classes of integrable functions. To this end, we establish here some integrability criteria.
We begin with a simple, yet useful, one.

Proposition 1852 A bounded function $f : [a, b] \to \mathbb{R}$ is Riemann integrable if and only if for every $\varepsilon > 0$ there exists a subdivision $\sigma$ such that $S(f, \sigma) - I(f, \sigma) < \varepsilon$.

Proof "If". Suppose that, for every $\varepsilon > 0$, there exists a subdivision $\sigma$ such that $S(f, \sigma) - I(f, \sigma) < \varepsilon$. Then
$$0 \le \overline{\int_a^b} f(x)\,dx - \underline{\int_a^b} f(x)\,dx \le S(f, \sigma) - I(f, \sigma) < \varepsilon$$
and therefore, since $\varepsilon > 0$ is arbitrary, we have $\overline{\int_a^b} f(x)\,dx = \underline{\int_a^b} f(x)\,dx$.
"Only if". Suppose that $\overline{\int_a^b} f(x)\,dx = \underline{\int_a^b} f(x)\,dx$. By Proposition 127, for every $\varepsilon > 0$ there exist a subdivision $\sigma'$ such that $S(f, \sigma') - \overline{\int_a^b} f(x)\,dx < \varepsilon$ and a subdivision $\sigma''$ such that $\underline{\int_a^b} f(x)\,dx - I(f, \sigma'') < \varepsilon$. Let $\sigma$ be a subdivision that refines both $\sigma'$ and $\sigma''$. Thanks to (44.6), we have $I(f, \sigma'') \le I(f, \sigma) \le S(f, \sigma) \le S(f, \sigma')$, so
$$S(f, \sigma) - I(f, \sigma) \le S(f, \sigma') - I(f, \sigma'') < \overline{\int_a^b} f(x)\,dx + \varepsilon - \underline{\int_a^b} f(x)\,dx + \varepsilon = 2\varepsilon$$
as desired.

The next result shows that if two functions are equal except at a finite number of points, then their integrals (if they exist) are equal. It is an important stability property of the integral, whose value does not change if we modify a function $f : [a, b] \to \mathbb{R}$ at a finite number of points.

Proposition 1853 Let $f : [a, b] \to \mathbb{R}$ be an integrable function. If $g : [a, b] \to \mathbb{R}$ is equal to $f$ except at most at a finite number of points, then $g$ is also integrable and $\int_a^b f(x)\,dx = \int_a^b g(x)\,dx$.

Proof It is sufficient to prove the statement for the case in which $g$ differs from $f$ at only one point $\hat{x} \in [a, b]$. The case of $n$ points is then proved by (finite) induction, adding one point at a time.
Suppose, therefore, that $f(\hat{x}) \neq g(\hat{x})$ with $\hat{x} \in [a, b]$. Without loss of generality, suppose that $f(\hat{x}) > g(\hat{x})$. Setting $k = f(\hat{x}) - g(\hat{x}) > 0$, let $h : [a, b] \to \mathbb{R}$ be the function $h = f - g$. Then
$$h(x) = \begin{cases} 0 & x \neq \hat{x} \\ k & x = \hat{x} \end{cases}$$
Let us prove that $h$ is integrable and that $\int_a^b h(x)\,dx = 0$. Let $\varepsilon > 0$. Consider an arbitrary subdivision $\sigma = \{x_0, x_1, \ldots, x_n\}$ of $[a, b]$ such that $|\sigma| < \varepsilon/2k$. Since $h(x) = 0$ for every $x \neq \hat{x}$, we have
$$I(h, \sigma) = 0$$
Next, we turn to $S(h, \sigma)$. Since $\hat{x} \in [a, b]$, there are two possibilities: (i) $\hat{x}$ is not an intermediate point of the subdivision, that is, either $\hat{x} \in \{x_0, x_n\}$ or $\hat{x} \in (x_{i-1}, x_i)$ for some $i = 1, \ldots, n$; (ii) $\hat{x}$ is a point of the subdivision, with the exclusion of the extremes, that is, $\hat{x} = x_i$ for some $i = 1, \ldots, n-1$. In case (i), with either $\hat{x} \in \{x_0, x_n\}$ or $\hat{x} \in (x_{i-1}, x_i)$ for some $i = 1, \ldots, n$, we have⁶
$$S(h, \sigma) = k\,\Delta x_i < k\,\frac{\varepsilon}{2k} = \frac{\varepsilon}{2} < \varepsilon$$
In case (ii), with $\hat{x} = x_i$ for some $i = 1, \ldots, n-1$, we have
$$S(h, \sigma) = k\,(\Delta x_i + \Delta x_{i+1}) < 2k\,\frac{\varepsilon}{2k} = \varepsilon$$
Therefore, in both cases (i) and (ii) we have $S(h, \sigma) - I(h, \sigma) < \varepsilon$. Since $\varepsilon > 0$ is arbitrary, by Proposition 1852 $h$ is integrable on $[a, b]$. Hence,
$$\int_a^b h(x)\,dx = \sup_{\sigma \in \Sigma} I(h, \sigma) = \inf_{\sigma \in \Sigma} S(h, \sigma) \tag{44.25}$$
But, since $h(x) = 0$ for every $x \neq \hat{x}$, one has $I(h, \sigma) = 0$ for every subdivision $\sigma \in \Sigma$, and so
$$\sup_{\sigma \in \Sigma} I(h, \sigma) = 0$$
Thanks to (44.25), we conclude that
$$\int_a^b h(x)\,dx = \sup_{\sigma \in \Sigma} I(h, \sigma) = 0$$
By the linearity of the integral (Theorem 1862), the difference $g = f - h$ is integrable because $f$ and $h$ are both integrable. In particular, again by the linearity of the integral, we have
$$\int_a^b g(x)\,dx = \int_a^b f(x)\,dx - \int_a^b h(x)\,dx = \int_a^b f(x)\,dx$$
as desired.

⁶ If $\hat{x} = x_0$, we have $S(h, \sigma) = k\,\Delta x_1$, while if $\hat{x} = x_n$, we have $S(h, \sigma) = k\,\Delta x_n$. In both cases, $S(h, \sigma) < \varepsilon$.

O.R. In view of Proposition 1853, even if a function $f$ is not defined at a finite number of points of the interval $[a, b]$, we can still talk about its integral: it can be regarded as the integral of any function $g : [a, b] \to \mathbb{R}$ that coincides with $f$ on the domain of $f$. With this, the integrals of $f$ on the intervals $[a, b]$, $(a, b]$, $[a, b)$ and $(a, b)$ always coincide, thus making the notation $\int_a^b f(x)\,dx$ unambiguous. H

Finally, let us show that integrability is preserved by continuous transformations.

Proposition 1854 Let $f : [a, b] \to \mathbb{R}$ be an integrable function with $m \le f \le M$. If $g : [m, M] \to \mathbb{R}$ is continuous, then the composite function $g \circ f : [a, b] \to \mathbb{R}$ is integrable.

Proof Let $\varepsilon > 0$. Since $g$ is continuous on $[m, M]$, by Theorem 603 the function $g$ is uniformly continuous on $[m, M]$, that is, there exists $\delta_\varepsilon > 0$ such that
$$|x - y| < \delta_\varepsilon \implies |g(x) - g(y)| < \varepsilon \qquad \forall x, y \in [m, M] \tag{44.26}$$
Without loss of generality, we can assume that $\delta_\varepsilon < \varepsilon$.
Since $f$ is integrable, Proposition 1852 provides a subdivision $\sigma = \{x_i\}_{i=0}^n$ of $[a, b]$ such that $S(f, \sigma) - I(f, \sigma) < \delta_\varepsilon^2$. Let $I \subseteq \{1, 2, \ldots, n\}$ be the set of the indexes $i$ of the subdivision such that
$$\sup_{x \in [x_{i-1}, x_i]} f(x) - \inf_{x \in [x_{i-1}, x_i]} f(x) < \delta_\varepsilon$$
so that, for $i \in I$, we have
$$|f(x) - f(x')| < \delta_\varepsilon \qquad \forall x, x' \in [x_{i-1}, x_i]$$
From (44.26) it follows that, for every $i \in I$,
$$|(g \circ f)(x) - (g \circ f)(x')| < \varepsilon \qquad \forall x, x' \in [x_{i-1}, x_i]$$
and therefore
$$\sup_{x \in [x_{i-1}, x_i]}(g \circ f)(x) - \inf_{x \in [x_{i-1}, x_i]}(g \circ f)(x) \le \varepsilon \qquad \forall i \in I$$
On the other hand,⁷
$$\delta_\varepsilon \sum_{i \notin I} \Delta x_i \le \sum_{i \notin I}\left[\sup_{x \in [x_{i-1}, x_i]} f(x) - \inf_{x \in [x_{i-1}, x_i]} f(x)\right]\Delta x_i < \delta_\varepsilon^2$$
and therefore $\sum_{i \notin I} \Delta x_i < \delta_\varepsilon < \varepsilon$. Hence,
$$S(g \circ f, \sigma) - I(g \circ f, \sigma) = \sum_{i=1}^n\left[\sup_{x \in [x_{i-1}, x_i]}(g \circ f)(x) - \inf_{x \in [x_{i-1}, x_i]}(g \circ f)(x)\right]\Delta x_i$$
$$= \sum_{i \in I}\left[\sup_{x \in [x_{i-1}, x_i]}(g \circ f)(x) - \inf_{x \in [x_{i-1}, x_i]}(g \circ f)(x)\right]\Delta x_i + \sum_{i \notin I}\left[\sup_{x \in [x_{i-1}, x_i]}(g \circ f)(x) - \inf_{x \in [x_{i-1}, x_i]}(g \circ f)(x)\right]\Delta x_i$$
$$\le \varepsilon \sum_{i \in I} \Delta x_i + 2\max_{y \in [m, M]}|g(y)| \sum_{i \notin I} \Delta x_i < \varepsilon(b - a) + 2\max_{y \in [m, M]}|g(y)|\,\varepsilon = \left(b - a + 2\max_{y \in [m, M]}|g(y)|\right)\varepsilon$$
By Proposition 1852, $g \circ f$ is integrable.

⁷ Here $i \notin I$ stands for $i \in \{1, 2, \ldots, n\} \setminus I$.

Since the function $g(x) = |x|$ is continuous, a simple but important consequence of Proposition 1854 is that the integrability of a bounded function $f : [a, b] \to \mathbb{R}$ implies the integrability of its absolute value $|f| : [a, b] \to \mathbb{R}$. Note that the converse is false: the function
$$f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \cap [0, 1] \\ -1 & \text{if } x \in (\mathbb{R} \setminus \mathbb{Q}) \cap [0, 1] \end{cases} \tag{44.27}$$
is a simple modification of the Dirichlet function and hence is not integrable, contrary to its absolute value $|f|$, which is the constant function equal to $1$ on the interval $[0, 1]$.

Finally, observe that the first integrability criterion of this section, Proposition 1852, opens an interesting perspective on the Riemann integral. Given any subdivision $\sigma = \{x_i\}_{i=0}^n$, by definition we have $m_i \le f(x_i') \le M_i$ for every $x_i' \in [x_{i-1}, x_i]$, so that
$$I(f, \sigma) \le \sum_{i=1}^n f(x_i')\,\Delta x_i \le S(f, \sigma)$$
Hence, since
$$I(f, \sigma) \le \int_a^b f(x)\,dx \le S(f, \sigma)$$
we have
$$I(f, \sigma) - S(f, \sigma) \le \sum_{i=1}^n f(x_i')\,\Delta x_i - \int_a^b f(x)\,dx \le S(f, \sigma) - I(f, \sigma)$$
which is equivalent to
$$\left|\sum_{i=1}^n f(x_i')\,\Delta x_i - \int_a^b f(x)\,dx\right| \le S(f, \sigma) - I(f, \sigma)$$
By Proposition 1852, for every $\varepsilon > 0$ there exists a sufficiently fine subdivision for which
$$\left|\sum_{i=1}^n f(x_i')\,\Delta x_i - \int_a^b f(x)\,dx\right| \le S(f, \sigma) - I(f, \sigma) < \varepsilon$$
In a suggestive way we can, therefore, write
$$\lim_{|\sigma| \to 0} \sum_{i=1}^n f(x_i')\,\Delta x_i = \int_a^b f(x)\,dx \tag{44.28}$$
That is, the Riemann integral $\int_a^b f(x)\,dx$ can be seen as a limit, for smaller and smaller meshes $|\sigma|$ of the subdivisions $\sigma$, of the sums $\sum_{i=1}^n f(x_i')\,\Delta x_i$.⁸ This is an equivalent way to see the Riemann integral, which is indeed sometimes defined directly in these terms through (44.28). Even if evocative, the limit $\lim_{|\sigma| \to 0}$ is not among the notions of limit, for sequences or functions, discussed in the book (indeed, it requires a more subtle definition, which coda readers can see in Section 47.9.3). Moreover, the definition we have adopted is particularly well suited for generalizations of the Riemann integral, as the reader will see in more advanced courses on integration.

⁸ Often called Riemann sums (or, sometimes, Cauchy sums). This perspective on the Riemann integral is more in line with Riemann's original definition of 1854 (published in 1868). The use of lower and upper sums that we adopted is due to Jean Darboux and Vito Volterra (between 1875 and 1881).
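The limit (44.28) is easy to witness numerically (our sketch): whatever tags $x_i'$ are picked in the bases, the Riemann sums converge as the mesh shrinks; here for $\sin$ on $[0, \pi]$, whose integral is $2$.

import numpy as np

rng = np.random.default_rng(0)

def riemann_sum(f, pts):
    a, b = pts[:-1], pts[1:]
    tags = rng.uniform(a, b)          # arbitrary x'_i in [x_{i-1}, x_i]
    return float(np.sum(f(tags) * (b - a)))

for n in (10, 100, 1000, 10000):
    sigma = np.linspace(0.0, np.pi, n + 1)
    print(n, np.max(np.diff(sigma)), riemann_sum(np.sin, sigma))  # -> 2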

44.5 Classes of integrable functions


Armed with the integrability criteria of the previous section, we now study some important
classes of integrable functions.

44.5.1 Step functions


There is a class of functions closely related to plurirectangles that plays a central role in the
theory of integration.

Definition 1855 A function $f : [a, b] \to \mathbb{R}$ is called a step function if there exist a subdivision $\sigma = \{x_i\}_{i=0}^n$ and a set $\{c_i\}_{i=1}^n$ of constants such that
$$f(x) = c_i \qquad \forall x \in (x_{i-1}, x_i) \tag{44.29}$$

For example, the functions $f, g : [a, b] \to \mathbb{R}$ given by
$$f(x) = \sum_{i=1}^{n-1} c_i\,1_{[x_{i-1}, x_i)}(x) + c_n\,1_{[x_{n-1}, x_n]}(x) \tag{44.30}$$
and
$$g(x) = c_1\,1_{[x_0, x_1]}(x) + \sum_{i=2}^n c_i\,1_{(x_{i-1}, x_i]}(x) \tag{44.31}$$
are step functions where, for every set $A$ in $\mathbb{R}$, we denote by $1_A : \mathbb{R} \to \mathbb{R}$ the indicator function
$$1_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases} \tag{44.32}$$

The two following figures give, for $n = 4$, examples of functions $f$ and $g$ described by (44.30) and (44.31). Note that $f$ and $g$ are, respectively, continuous from the right and from the left, that is, $\lim_{x \to \hat{x}^+} f(x) = f(\hat{x})$ and $\lim_{x \to \hat{x}^-} g(x) = g(\hat{x})$ for all $\hat{x} \in [a, b]$.

[Figure: the step functions $f$ (right-continuous) and $g$ (left-continuous) with values $c_1, c_2, c_3, c_4$ on the bases determined by $x_0 < x_1 < x_2 < x_3 < x_4$]

On the intervals
$$[x_0, x_1) \cup (x_1, x_2) \cup (x_2, x_3) \cup (x_3, x_4]$$
the two step functions generate the same plurirectangle



[Figure: the common plurirectangle generated by $f$ and $g$]

determined by the subdivision $\{x_i\}_{i=0}^4$ and by the constants $\{c_i\}_{i=1}^4$. Nevertheless, at the points $x_1 < x_2 < x_3$ the functions $f$ and $g$ differ, and it is easy to verify that on the entire interval $[x_0, x_4]$ they do not generate this plurirectangle, as the next figure shows. Indeed, the dashed segment at $x_2$ is not under $f$ and the dashed segments at $x_1$ and $x_3$ are not under $g$.

[Figure: at the points $x_1$, $x_2$, $x_3$ the plurirectangles generated by $f$ and $g$ differ]

But, thanks to Proposition 1853, such a discrepancy at a finite number of points is irrelevant for the integral. The next result shows that the area under the step functions $f$ and $g$ is, actually, equal to that of the corresponding plurirectangle (independently of the values of the function at the points $x_1 < x_2 < x_3$).

Proposition 1856 A step function $f : [a, b] \to \mathbb{R}$, determined by the subdivision $\{x_i\}_{i=0}^n$ and by the constants $\{c_i\}_{i=1}^n$ according to (44.29), is integrable, with
$$\int_a^b f(x)\,dx = \sum_{i=1}^n c_i\,\Delta x_i \tag{44.33}$$

All the step functions determined by a subdivision $\{x_i\}_{i=0}^n$ and a set of constants $\{c_i\}_{i=1}^n$ according to (44.29) therefore share the same integral (44.33). In particular, this holds for the step functions (44.30) and (44.31).
In the special case of a constant function, say $f(x) = c$ for all $x \in [a, b]$, formula (44.33) reduces to
$$\int_a^b c\,dx = c(b - a)$$
where on the left-hand side $c$ denotes the constant function $f$.⁹
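Formula (44.33) translates into a one-line computation (a sketch of ours); the constant function is the special case of a single base.

def step_integral(pts, consts):
    # pts: x_0 < ... < x_n ; consts: c_1, ..., c_n on the open bases
    assert len(consts) == len(pts) - 1
    return sum(c * (b - a) for c, a, b in zip(consts, pts, pts[1:]))

print(step_integral([0, 2, 4, 6, 8], [1, 3, 2, 4]))   # 1*2 + 3*2 + 2*2 + 4*2 = 20
print(step_integral([0, 8], [5]))                     # 5 * (8 - 0) = 40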
Rb Rb
Proof Since $f$ is bounded, an immediate extension of Lemma 1840 shows that $\underline{\int_a^b} f(x)\,dx, \overline{\int_a^b} f(x)\,dx \in \mathbb{R}$. Let $m = \inf_{x \in [a, b]} f(x)$ and $M = \sup_{x \in [a, b]} f(x)$. Fix $\varepsilon > 0$ sufficiently small, and consider the subdivision $\sigma_\varepsilon$ given by
$$x_0 < x_0 + \varepsilon < x_1 - \varepsilon < x_1 + \varepsilon < x_2 - \varepsilon < x_2 + \varepsilon < \cdots < x_{n-1} - \varepsilon < x_{n-1} + \varepsilon < x_n - \varepsilon < x_n$$
We have
$$I(f, \sigma_\varepsilon) = \varepsilon\inf_{x \in [x_0, x_0 + \varepsilon]} f(x) + c_1(x_1 - \varepsilon - x_0 - \varepsilon) + 2\varepsilon\inf_{x \in [x_1 - \varepsilon, x_1 + \varepsilon]} f(x) + c_2(x_2 - \varepsilon - x_1 - \varepsilon) + \cdots + c_n(x_n - \varepsilon - x_{n-1} - \varepsilon) + \varepsilon\inf_{x \in [x_n - \varepsilon, x_n]} f(x)$$
$$= \varepsilon\left(\inf_{x \in [x_0, x_0 + \varepsilon]} f(x) + \inf_{x \in [x_n - \varepsilon, x_n]} f(x)\right) + \sum_{i=1}^n c_i(\Delta x_i - 2\varepsilon) + 2\varepsilon\sum_{i=1}^{n-1}\inf_{x \in [x_i - \varepsilon, x_i + \varepsilon]} f(x)$$
$$\ge 2\varepsilon m + \sum_{i=1}^n c_i\,\Delta x_i - 2\varepsilon M n + 2\varepsilon m(n - 1) = \sum_{i=1}^n c_i\,\Delta x_i - 2\varepsilon n(M - m)$$
In a similar way we show that
$$S(f, \sigma_\varepsilon) \le \sum_{i=1}^n c_i\,\Delta x_i + 2\varepsilon n(M - m)$$
Therefore, setting $K = 2n(M - m) + 1 > 0$, we have
$$S(f, \sigma_\varepsilon) - I(f, \sigma_\varepsilon) \le 2K\varepsilon < 4K\varepsilon$$
Since $\varepsilon > 0$ is arbitrary, Proposition 1852 shows that $f$ is integrable. Moreover, since
$$I(f, \sigma_\varepsilon) \le \int_a^b f(x)\,dx \le S(f, \sigma_\varepsilon)$$
we have
$$\sum_{i=1}^n c_i\,\Delta x_i - (K - 1)\varepsilon \le \int_a^b f(x)\,dx \le \sum_{i=1}^n c_i\,\Delta x_i + (K - 1)\varepsilon$$
which, given the arbitrariness of $\varepsilon > 0$, guarantees that $\int_a^b f(x)\,dx = \sum_{i=1}^n c_i\,\Delta x_i$.

⁹ So, in this formula $c$ denotes two altogether different notions: a constant function on the left-hand side and a real number on the right-hand side. It is a standard abuse of notation that makes the formula easier to understand.

44.5.2 Analytic and geometric approaches


Step functions can be seen as the functional version of plurirectangles. They are, therefore, the simplest functions that one can integrate. In particular, thanks to formula (44.33), the lower and upper integrals can be expressed in terms of integrals of step functions. Let $S([a, b])$ be the set of all step functions defined on $[a, b]$.

Proposition 1857 Given a bounded function $f : [a, b] \to \mathbb{R}$ we have
$$\underline{\int_a^b} f(x)\,dx = \sup\left\{\int_a^b h(x)\,dx : h \le f \text{ and } h \in S([a, b])\right\} \tag{44.34}$$
and
$$\overline{\int_a^b} f(x)\,dx = \inf\left\{\int_a^b h(x)\,dx : h \ge f \text{ and } h \in S([a, b])\right\} \tag{44.35}$$

Thus, a bounded function $f : [a, b] \to \mathbb{R}$ is Riemann integrable if and only if
$$\sup\left\{\int_a^b h(x)\,dx : h \le f \text{ and } h \in S([a, b])\right\} = \inf\left\{\int_a^b h(x)\,dx : f \le h \text{ and } h \in S([a, b])\right\}$$

That is, if and only if the lower approximation given by the integrals of step functions smaller than $f$ coincides, in the limit, with the upper approximation given by the integrals of step functions larger than $f$. In this case the method of exhaustion assumes a more analytic and less geometric aspect, with the approximation by elementary polygons (the plurirectangles) replaced by the one given by elementary functions (the step functions).¹⁰
This suggests a different approach to the Riemann integral, more analytic and less geometric. In such an approach, we first define the integrals of step functions (that is, the areas under them), which can be determined on the basis of elementary geometric considerations based on plurirectangles. We then use these "elementary" integrals to suitably approximate the areas under more complicated functions. In particular, we define the lower integral of a bounded function $f : [a, b] \to \mathbb{R}$ as the best approximation "from below" obtained by means of step functions $h \le f$ and, analogously, the upper integral as the best approximation "from above" obtained by means of step functions $h \ge f$.
Thanks to (44.34) and (44.35), this more analytic interpretation of the method of exhaustion is equivalent to the geometric one previously adopted. The analytic approach is quite fruitful, as readers will learn in more advanced courses.

44.5.3 Continuous functions and monotone functions


We now consider two important classes of integrable functions, the continuous and the monotone ones.

Proposition 1858 Every continuous function $f : [a, b] \to \mathbb{R}$ is integrable.


¹⁰ That is, based also on the use of notions of analysis, such as functions, and not only on that of geometric figures, such as plurirectangles.

Proof Since $f$ is continuous on $[a, b]$, by Weierstrass' Theorem $f$ is bounded. Let $\varepsilon > 0$. By Theorem 603, $f$ is uniformly continuous, that is, there exists $\delta_\varepsilon > 0$ such that
$$|x - y| < \delta_\varepsilon \implies |f(x) - f(y)| < \varepsilon \qquad \forall x, y \in [a, b] \tag{44.36}$$
Let $\sigma = \{x_i\}_{i=0}^n$ be a subdivision of $[a, b]$ such that $|\sigma| < \delta_\varepsilon$. By (44.36), for every $i = 1, 2, \ldots, n$ we therefore have
$$\max_{x \in [x_{i-1}, x_i]} f(x) - \min_{x \in [x_{i-1}, x_i]} f(x) < \varepsilon$$
where max and min exist thanks to Weierstrass' Theorem. It follows that
$$S(f, \sigma) - I(f, \sigma) = \sum_{i=1}^n \max_{x \in [x_{i-1}, x_i]} f(x)\,\Delta x_i - \sum_{i=1}^n \min_{x \in [x_{i-1}, x_i]} f(x)\,\Delta x_i = \sum_{i=1}^n\left(\max_{x \in [x_{i-1}, x_i]} f(x) - \min_{x \in [x_{i-1}, x_i]} f(x)\right)\Delta x_i < \varepsilon\sum_{i=1}^n \Delta x_i = \varepsilon(b - a)$$
By Proposition 1852, $f$ is integrable.

Because of the stability of the integral seen in Proposition 1853, we have the following immediate generalization of the last result: every function $f : [a, b] \to \mathbb{R}$ that has at most a finite number of removable discontinuities is integrable. Indeed, recalling (13.9) of Chapter 13, if $S = \{x_i\}_{i=1}^n$ is the set of points where $f$ has removable discontinuities, the function
$$\tilde{f}(x) = \begin{cases} f(x) & \text{if } x \notin S \\ \lim_{y \to x} f(y) & \text{if } x \in S \end{cases}$$
is continuous (so, integrable) and is equal to $f$ except at the points of $S$.
More is true: the hypothesis that the discontinuities are removable is actually superfluous, and we can allow for countably many points of discontinuity.

Theorem 1859 Every bounded function $f : [a, b] \to \mathbb{R}$ with at most countably many discontinuities is integrable.

Therefore, a function is integrable if its points of discontinuity form a finite or countable set. We omit the proof, which is less easy than that of the special case just seen with only removable discontinuities.
This important integrability result generalizes Proposition 1858 as well as Proposition 1856 on the integrability of step functions (which obviously are continuous, except at the points of the subdivisions that define them). Let us see a couple of examples.

Example 1860 (i) The function $f : [0, 1] \to \mathbb{R}$ given by
$$f(x) = \begin{cases} x & \text{if } x \in (0, 1) \\ \frac{1}{2} & \text{if } x \in \{0, 1\} \end{cases}$$
is continuous at all the points of $[0, 1]$, except at the two extreme points $0$ and $1$. By Theorem 1859, the function $f$ is integrable.
(ii) Consider the countable set
$$E = \left\{\frac{1}{n} : n \ge 1\right\} \subseteq [0, 1]$$
The function $f : [0, 1] \to \mathbb{R}$ defined by
$$f(x) = \begin{cases} x^2 & \text{if } x \notin E \\ 0 & \text{if } x \in E \end{cases}$$
is continuous at all the points of $[0, 1]$, except at the points of $E$.¹¹ Since $E$ is a countable set, by Theorem 1859 the function $f$ is integrable. N

Note that the Dirichlet function $f : [0, 1] \to \mathbb{R}$
$$f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \cap [0, 1] \\ 0 & \text{if } x \in (\mathbb{R} \setminus \mathbb{Q}) \cap [0, 1] \end{cases}$$
which we know to be non-integrable, does not satisfy the hypotheses of Theorem 1859. Indeed, even if it is bounded, $f$ is discontinuous at each point of $[0, 1]$, not only at the points $x \in \mathbb{Q} \cap [0, 1]$, which form a countable set.

Let us now consider monotone functions.

Proposition 1861 Every monotone function $f : [a, b] \to \mathbb{R}$ is integrable.

The result follows immediately from Theorem 1859 because monotone functions have at most countably many points of discontinuity (Proposition 564). We give, however, a simple direct proof of the result.

Proof Let us suppose that $f$ is increasing (the argument for $f$ decreasing is analogous). First, observe that $f$ is obviously bounded. Now, let $\varepsilon > 0$ and let $\sigma = \{x_i\}_{i=0}^n$ be a subdivision of $[a, b]$ such that $|\sigma| < \varepsilon$. We have
$$\inf_{x \in [x_{i-1}, x_i]} f(x) = f(x_{i-1}) \ge f(a) \quad \text{and} \quad \sup_{x \in [x_{i-1}, x_i]} f(x) = f(x_i) \le f(b)$$
and therefore
$$S(f, \sigma) - I(f, \sigma) = \sum_{i=1}^n \sup_{x \in [x_{i-1}, x_i]} f(x)\,\Delta x_i - \sum_{i=1}^n \inf_{x \in [x_{i-1}, x_i]} f(x)\,\Delta x_i = \sum_{i=1}^n (f(x_i) - f(x_{i-1}))\,\Delta x_i \le |\sigma|\sum_{i=1}^n (f(x_i) - f(x_{i-1})) < \varepsilon(f(b) - f(a))$$
By Proposition 1852, the function $f$ is integrable.


¹¹ Note that $f$ is continuous at the origin, as the reader can verify.

44.6 Properties of the integral


44.6.1 Linearity and monotonicity
The first important property of the integral is its linearity: the integral of a linear combination of functions is equal to the linear combination of their integrals.

Theorem 1862 Let $f, g : [a, b] \to \mathbb{R}$ be two integrable functions. Then, for every $\alpha, \beta \in \mathbb{R}$ the function $\alpha f + \beta g : [a, b] \to \mathbb{R}$ is integrable, with
$$\int_a^b (\alpha f + \beta g)(x)\,dx = \alpha\int_a^b f(x)\,dx + \beta\int_a^b g(x)\,dx \tag{44.37}$$

Proof The proof is divided into two parts. First we prove homogeneity, that is,
$$\int_a^b \alpha f(x)\,dx = \alpha\int_a^b f(x)\,dx \qquad \forall \alpha \in \mathbb{R} \tag{44.38}$$
Then we prove additivity, that is,
$$\int_a^b (f + g)(x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx \tag{44.39}$$
whenever $f$ and $g$ are integrable. Together, relations (44.38) and (44.39) are equivalent to (44.37).
(i) Homogeneity. First, recall that an integrable function is, by definition, bounded. Thus, $\alpha f$ is bounded for all $\alpha \in \mathbb{R}$. Let $\sigma = \{x_i\}_{i=0}^n$ be a subdivision of $[a, b]$. If $\alpha \ge 0$ we have $I(\alpha f, \sigma) = \alpha I(f, \sigma)$ and $S(\alpha f, \sigma) = \alpha S(f, \sigma)$. Therefore, $\alpha f$ is integrable, with
$$\int_a^b \alpha f(x)\,dx = \alpha\int_a^b f(x)\,dx \tag{44.40}$$
Let now $\alpha < 0$. Let us start by considering the case $\alpha = -1$. We have
$$I(-f, \sigma) = \sum_{i=1}^n \inf_{x \in [x_{i-1}, x_i]} (-f)(x)\,\Delta x_i = \sum_{i=1}^n\left(-\sup_{x \in [x_{i-1}, x_i]} f(x)\right)\Delta x_i = -\sum_{i=1}^n \sup_{x \in [x_{i-1}, x_i]} f(x)\,\Delta x_i = -S(f, \sigma)$$
In a similar way, we have $S(-f, \sigma) = -I(f, \sigma)$. Let $\varepsilon > 0$. Since $f$ is integrable, by Proposition 1852 there exists $\sigma$ such that $S(f, \sigma) - I(f, \sigma) < \varepsilon$. Therefore, $S(-f, \sigma) - I(-f, \sigma) = S(f, \sigma) - I(f, \sigma) < \varepsilon$, which implies, by Proposition 1852, that $-f$ is integrable. Moreover,
$$\int_a^b (-f)(x)\,dx = \sup_{\sigma \in \Sigma} I(-f, \sigma) = \sup_{\sigma \in \Sigma}(-S(f, \sigma)) = -\inf_{\sigma \in \Sigma} S(f, \sigma) = -\int_a^b f(x)\,dx$$
Now let $\alpha < 0$. We have $\alpha f = (-\alpha)(-f)$ with $-\alpha > 0$. Then, by applying (44.40) we obtain
$$\int_a^b \alpha f(x)\,dx = \int_a^b (-\alpha)(-f)(x)\,dx = (-\alpha)\int_a^b (-f)(x)\,dx = (-\alpha)\left(-\int_a^b f(x)\,dx\right) = \alpha\int_a^b f(x)\,dx$$
Therefore,
$$\int_a^b \alpha f(x)\,dx = \alpha\int_a^b f(x)\,dx \qquad \forall \alpha \in \mathbb{R} \tag{44.41}$$
that is, (44.38).
(ii) Additivity. First, observe that the sum $f + g$ is bounded since integrable functions are, by definition, bounded. Let us prove (44.39). Let $\varepsilon > 0$. Since $f$ and $g$ are integrable, by Proposition 1852 there exist a subdivision $\sigma$ of $[a, b]$ such that $S(f, \sigma) - I(f, \sigma) < \varepsilon$ and a subdivision $\sigma'$ such that $S(g, \sigma') - I(g, \sigma') < \varepsilon$. Let $\sigma''$ be a subdivision of $[a, b]$ that refines both $\sigma$ and $\sigma'$. Thanks to (44.6), we have $S(f, \sigma'') - I(f, \sigma'') < \varepsilon$ and $S(g, \sigma'') - I(g, \sigma'') < \varepsilon$. Moreover, by applying the inequalities of Lemma 1849,
$$I(f, \sigma'') + I(g, \sigma'') \le I(f + g, \sigma'') \le S(f + g, \sigma'') \le S(f, \sigma'') + S(g, \sigma'') \tag{44.42}$$
and therefore
$$S(f + g, \sigma'') - I(f + g, \sigma'') \le S(f, \sigma'') - I(f, \sigma'') + S(g, \sigma'') - I(g, \sigma'') < 2\varepsilon$$
By Proposition 1852, $f + g$ is integrable. Hence, (44.42) becomes
$$I(f, \sigma) + I(g, \sigma) \le \int_a^b (f + g)(x)\,dx \le S(f, \sigma) + S(g, \sigma)$$
for every subdivision $\sigma \in \Sigma$. By subtracting $\int_a^b f(x)\,dx + \int_a^b g(x)\,dx$ from all three members of the inequality, we obtain
$$I(f, \sigma) + I(g, \sigma) - \left(\int_a^b f(x)\,dx + \int_a^b g(x)\,dx\right) \le \int_a^b (f + g)(x)\,dx - \left(\int_a^b f(x)\,dx + \int_a^b g(x)\,dx\right) \le S(f, \sigma) + S(g, \sigma) - \left(\int_a^b f(x)\,dx + \int_a^b g(x)\,dx\right)$$
that is,
$$\left(I(f, \sigma) - \int_a^b f(x)\,dx\right) + \left(I(g, \sigma) - \int_a^b g(x)\,dx\right) \le \int_a^b (f + g)(x)\,dx - \left(\int_a^b f(x)\,dx + \int_a^b g(x)\,dx\right) \le \left(S(f, \sigma) - \int_a^b f(x)\,dx\right) + \left(S(g, \sigma) - \int_a^b g(x)\,dx\right)$$
Since $f$ and $g$ are integrable, given any $\varepsilon > 0$ we can find a subdivision $\sigma_\varepsilon$ such that, for $h = f, g$, we have
$$I(h, \sigma_\varepsilon) - \int_a^b h(x)\,dx > -\frac{\varepsilon}{2} \quad \text{and} \quad S(h, \sigma_\varepsilon) - \int_a^b h(x)\,dx < \frac{\varepsilon}{2}$$
Therefore,
$$-\varepsilon < \int_a^b (f + g)(x)\,dx - \left(\int_a^b f(x)\,dx + \int_a^b g(x)\,dx\right) < \varepsilon$$
and, given the arbitrariness of $\varepsilon > 0$, one necessarily has
$$\int_a^b (f + g)(x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx \tag{44.43}$$
that is, (44.39).
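A quick numerical check of (44.37) (ours, with fine Riemann sums standing in for exact integrals), for $f(x) = x^2$ and $g(x) = x^3$ on $[0, 1]$:

import numpy as np

x = np.linspace(0.0, 1.0, 200_001)
mid = (x[:-1] + x[1:]) / 2
dx = np.diff(x)

def integral(vals):
    return float(np.sum(vals * dx))

alpha, beta = 2.0, -3.0
lhs = integral(alpha * mid**2 + beta * mid**3)
rhs = alpha * integral(mid**2) + beta * integral(mid**3)
print(lhs, rhs)    # both ~ 2/3 - 3/4 = -1/12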

An important consequence of the linearity of the integral is that the product of two integrable functions is integrable.

Corollary 1863 If $f, g : [a, b] \to \mathbb{R}$ are integrable functions, then their product $fg : [a, b] \to \mathbb{R}$ is integrable.

Proof First observe that, by Proposition 1854, if $h : [a, b] \to \mathbb{R}$ is integrable, so is $h^2$ (it is enough to consider the continuous function $g(x) = x^2$). The product $fg$ can be rewritten as
$$fg = \frac{1}{4}\left[(f + g)^2 - (f - g)^2\right]$$
By Theorem 1862 and since $f$ and $g$ are integrable, so are $f + g$ and $f - g$. By the initial part of the proof, $(f + g)^2$ and $(f - g)^2$ are also integrable. By applying Theorem 1862 again, we conclude that $fg$ is integrable.

O.R. Thanks to the linearity of the integral, knowing the integrals of $f$ and $g$ allows one to calculate the integral of $f + g$. It is not so for the product or for the composition of integrable functions: the integrability of $f$ guarantees the integrability of $f^2$, but knowing the value of the integral of $f$ does not help in the calculation of the integral of $f^2$; indeed, in general $\int_a^b f^2(x)\,dx \neq (\int_a^b f(x)\,dx)^2$. More generally, knowing that $g \circ f$ is integrable does not give any useful indication for the computation of the integral of the composite function. H

Finally, the linearity of the integral implies that it is possible to freely subdivide the domain of integration $[a, b]$ into subintervals.

Corollary 1864 Let $f : [a, b] \to \mathbb{R}$ be a bounded and integrable function. If $a < c < b$, then
$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \tag{44.44}$$

Vice versa, if $f_1 : [a, c] \to \mathbb{R}$ and $f_2 : [c, b] \to \mathbb{R}$ are bounded and integrable, then the function $f : [a, b] \to \mathbb{R}$ defined by
$$f(x) = \begin{cases} f_1(x) & \text{if } x \in [a, c] \\ f_2(x) & \text{if } x \in (c, b] \end{cases}$$
is also bounded and integrable, with
$$\int_a^b f(x)\,dx = \int_a^c f_1(x)\,dx + \int_c^b f_2(x)\,dx$$

Proof Before proving the statement, we make an observation. We rst consider a generic
bounded and integrable function f : [a; b] ! R. We show that:12
Z b Z c
1[a;c] f (x) dx = f (x) dx (44.45)
a a

and Z Z
b b
1(c;b] f (x) dx = f (x) dx (44.46)
a c

Note that the left-hand side of (44.45) and (44.46) is well-de ned. Indeed, since indicator
functions are step functions, they are bounded and integrable. By Corollary 1863 and since
f is bounded and integrable, 1[a;c] f and 1(c;b] f are bounded and integrable.
We proceed by proving (44.45). By Proposition 1853, we have that
Z b Z b
1[a;c] f (x) dx = 1[a;c) f (x) dx
a a

for 1[a;c] f is equal to 1[a;c) f except at most at c. Next, consider f~ : [a; c] ! R de ned by

f (x) if x 2 [a; c)
f~ (x) =
0 if x = c

By Proposition 1853, if f~ and fj[a;c] are bounded and integrable, we have that
Z c Z c
fj[a;c] (x) dx = f~ (x) dx
a a

for fj[a;c] is equal to f~ except at most at c. Thus, in order to prove (44.45), it is enough to
show that f~ is integrable (it is clearly bounded) and
Z b Z c
1[a;c) f (x) dx = f~ (x) dx (44.47)
a a
12
Rc
The careful reader might have noticed that the symbol a f (x) dx implicitly suggests that the domain
of f is [a; c]. Nonetheless, in our
R c current case, f has been de ned over the larger interval [a; b]. Thus, as
it
Rc should appear natural, with a
f (x) dx, we refer to the integral Rof the restriction of f to [a; c], that is,
c
f
a Rj[a;c]
(x)dx. For notational convenience, it is usual to just write a f (x) dx. A similar observation holds
b
for c f (x) dx.
44.6. PROPERTIES OF THE INTEGRAL 1291

Let ε > 0. Since $1_{[a,c)}f$ is bounded and integrable, by Proposition 1852 there exists a subdivision π of [a, b] such that
$$S(1_{[a,c)}f, \pi) - I(1_{[a,c)}f, \pi) < \varepsilon$$
Let π′ = {xᵢ}ⁿᵢ₌₀ be a refinement of π that has c as a point of subdivision, say c = xⱼ. Then we have
$$S(1_{[a,c)}f, \pi') - I(1_{[a,c)}f, \pi') < \varepsilon$$
Let π″ = π′ ∩ [a, c]. In other words, π″ = {x₀, x₁, ..., xⱼ} is the restriction of the subdivision π′ to the interval [a, c]. Since $1_{[a,c)}f(x) = \tilde f(x)$ for all x ∈ [a, c], this implies that for each i ∈ {1, ..., j}
$$m_i = \inf_{x\in[x_{i-1},x_i]} 1_{[a,c)}f(x) = \inf_{x\in[x_{i-1},x_i]} \tilde f(x)
\qquad\text{and}\qquad
M_i = \sup_{x\in[x_{i-1},x_i]} 1_{[a,c)}f(x) = \sup_{x\in[x_{i-1},x_i]} \tilde f(x)$$
At the same time, observe that for each i > j
$$m_i = \inf_{x\in[x_{i-1},x_i]} 1_{[a,c)}f(x) = 0 = \sup_{x\in[x_{i-1},x_i]} 1_{[a,c)}f(x) = M_i$$
We can conclude that
$$I(1_{[a,c)}f, \pi') = \sum_{i=1}^n m_i\,\Delta x_i = \sum_{i=1}^j m_i\,\Delta x_i = I(\tilde f, \pi'') \tag{44.48}$$
and
$$S(1_{[a,c)}f, \pi') = \sum_{i=1}^n M_i\,\Delta x_i = \sum_{i=1}^j M_i\,\Delta x_i = S(\tilde f, \pi'') \tag{44.49}$$
Therefore,
$$S(\tilde f, \pi'') - I(\tilde f, \pi'') < \varepsilon$$
By Proposition 1852, it follows that f̃ : [a, c] → R is integrable (it is clearly bounded). Moreover, from (44.48) and (44.49) we deduce that
$$\int_a^b 1_{[a,c)}f(x)\,dx = \int_a^c \tilde f(x)\,dx$$
proving (44.47), which in turn implies (44.45). A similar argument yields (44.46).
We can now prove the first part of the statement. Observe that
$$f = 1_{[a,c]}f + 1_{(c,b]}f$$
Thus, by (44.45) and (44.46) and since $1_{[a,c]}f$ and $1_{(c,b]}f$ are bounded and integrable, the linearity of the integral implies that
$$\int_a^b f(x)\,dx = \int_a^b \left(1_{[a,c]}f + 1_{(c,b]}f\right)(x)\,dx = \int_a^b 1_{[a,c]}f(x)\,dx + \int_a^b 1_{(c,b]}f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx$$
proving (44.44).
Let us prove the second part. First, define f̃₁, f̃₂ : [a, b] → R by
$$\tilde f_1(x) = \begin{cases} f_1(x) & \text{if } x \in [a,c] \\ 0 & \text{if } x \in (c,b] \end{cases}
\qquad\text{and}\qquad
\tilde f_2(x) = \begin{cases} 0 & \text{if } x \in [a,c) \\ f_2(x) & \text{if } x \in [c,b] \end{cases}$$
Clearly, f̃₁ and f̃₂ are bounded. We next prove that f̃₁ is integrable; a similar argument yields the integrability of f̃₂. Consider f̂₁ : [a, b] → R defined by
$$\hat f_1(x) = \begin{cases} \tilde f_1(x) & \text{if } x \neq c \\ 0 & \text{if } x = c \end{cases}$$
By Proposition 1853, it is enough to show that f̂₁ is integrable. Let ε > 0. Since f₁ is integrable over [a, c], there exists a subdivision π′ = {xᵢ}ⁿᵢ₌₀ of [a, c] such that
$$S(f_1, \pi') - I(f_1, \pi') < \varepsilon$$
Consider the subdivision π″ = {yᵢ}ⁿ⁺¹ᵢ₌₀ of [a, b] where yᵢ = xᵢ for all i ∈ {0, ..., n} and yₙ₊₁ = b. Since f̂₁(x) = 0 for all x ∈ [c, b], note that
$$S(\hat f_1, \pi'') = S(f_1, \pi') \qquad\text{and}\qquad I(\hat f_1, \pi'') = I(f_1, \pi')$$
yielding that $S(\hat f_1, \pi'') - I(\hat f_1, \pi'') < \varepsilon$. By Proposition 1852, f̂₁ is integrable and so is f̃₁.
Observe that $f = 1_{[a,c]}\tilde f_1 + 1_{(c,b]}\tilde f_2$. Since f̃₁, f̃₂, and the indicator functions are bounded and integrable, so is f. By the linearity of the integral and (44.45) as well as (44.46), we have
$$\int_a^b f(x)\,dx = \int_a^b 1_{[a,c]}\tilde f_1(x)\,dx + \int_a^b 1_{(c,b]}\tilde f_2(x)\,dx = \int_a^c \tilde f_1(x)\,dx + \int_c^b \tilde f_2(x)\,dx = \int_a^c f_1(x)\,dx + \int_c^b f_2(x)\,dx$$
as desired.
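The pasting property (44.44) is easy to watch at work numerically: a fine Riemann sum over [a, b] should match the sum of the Riemann sums over [a, c] and [c, b]. A minimal sketch in Python; the piecewise function and the point c = 1 below are our own illustrative choices, not taken from the text:

```python
import numpy as np

def riemann_sum(f, a, b, n=100_000):
    # Midpoint Riemann sum: a simple numerical proxy for the Riemann integral.
    x = np.linspace(a, b, n + 1)
    mid = (x[:-1] + x[1:]) / 2
    return np.sum(f(mid) * np.diff(x))

# Illustrative function pasted at c = 1, as in the second part of Corollary 1864.
f1 = lambda x: x**2          # on [0, 1]
f2 = lambda x: 2 - x         # on (1, 2]
f  = lambda x: np.where(x <= 1, f1(x), f2(x))

a, c, b = 0.0, 1.0, 2.0
whole = riemann_sum(f, a, b)
parts = riemann_sum(f1, a, c) + riemann_sum(f2, c, b)
print(whole, parts)          # both ≈ 1/3 + 1/2 = 0.8333...
```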

The next property, monotonicity of the integral, shows that to larger functions there correspond larger integrals. The writing f ≤ g means f(x) ≤ g(x) for every x ∈ [a, b], i.e., the function f is pointwise smaller than the function g.

Theorem 1865 Let f, g : [a, b] → R be two integrable functions. If f ≤ g, then
$$\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$$

Proof Since f ≤ g, it follows that I(f, π) ≤ I(g, π) for all π ∈ Π. In turn, as f and g are integrable, this implies $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.

From the monotonicity of the integral we obtain an important inequality between "absolute values of integrals" and "integrals of absolute values", the latter being larger. In reading the result keep in mind that, as observed after Proposition 1854, the integrability of |f| follows from that of f.

Corollary 1866 Let f : [a, b] → R be an integrable function. We have
$$\left|\int_a^b f(x)\,dx\right| \le \int_a^b |f(x)|\,dx \tag{44.50}$$

Proof Since f ≤ |f| and −f ≤ |f|, by the monotonicity of the integral (Theorem 1865) it follows that $\int_a^b f(x)\,dx \le \int_a^b |f(x)|\,dx$. Moreover, by the monotonicity and linearity of the integral (Theorem 1862), we also have $-\int_a^b f(x)\,dx = \int_a^b -f(x)\,dx \le \int_a^b |f(x)|\,dx$. So, $\left|\int_a^b f(x)\,dx\right| \le \int_a^b |f(x)|\,dx$.

44.6.2 Panini
The monotonicity of the integral allows us to establish an interesting sandwich property for integrals.

Proposition 1867 Let f : [a, b] → R be an integrable function. Then, setting m = inf₍[a,b]₎ f(x) and M = sup₍[a,b]₎ f(x), we have
$$m(b-a) \le \int_a^b f(x)\,dx \le M(b-a) \tag{44.51}$$

Proof We have
$$m \le f(x) \le M \qquad \forall x \in [a,b]$$
Hence, by the monotonicity of the integral,
$$\int_a^b m\,dx \le \int_a^b f(x)\,dx \le \int_a^b M\,dx$$
Clearly, $\int_a^b m\,dx = m(b-a)$ and $\int_a^b M\,dx = M(b-a)$. This shows that (44.51) holds.

In turn, the previous sandwich property leads to the classic Integral Mean Value Theorem.

Theorem 1868 (Integral Mean Value) Let f : [a, b] → R be a bounded and integrable function. Then, setting m = inf₍[a,b]₎ f(x) and M = sup₍[a,b]₎ f(x), there exists a scalar δ ∈ [m, M] such that
$$\int_a^b f(x)\,dx = \delta(b-a) \tag{44.52}$$
In particular, if f is continuous, there exists c ∈ [a, b] such that f(c) = δ, that is,
$$\int_a^b f(x)\,dx = f(c)(b-a)$$

Expression (44.52) can be rewritten as
$$\delta = \frac{1}{b-a}\int_a^b f(x)\,dx$$
For this reason, δ is called the mean value (of the images) of f: the value of the integral does not change if we replace each value f(x) by the constant value δ.

Proof By (44.51), we have
$$m \le \frac{\int_a^b f(x)\,dx}{b-a} \le M$$
By setting
$$\delta = \frac{\int_a^b f(x)\,dx}{b-a}$$
we obtain the first part of the statement. To prove the second part, assume that f is continuous. By the Intermediate Value Theorem, f assumes all the values included between its minimum m and its maximum M. Therefore, there exists c ∈ [a, b] such that f(c) = δ, which completes the proof.

O.R. The Integral Mean Value Theorem is quite intuitive: there exists a rectangle with base [a, b] and height δ, with area equal to the one under f on [a, b]:

[Figure: the region under f on [a, b] and the rectangle of base [a, b] and height δ with the same area.]

If, moreover, the function f is continuous, the height of such a rectangle coincides with the image of some point c in [a, b], that is, f(c) = δ. H
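For a continuous f, both the mean value δ and a point c with f(c) = δ are easy to locate numerically. A minimal sketch; the integrand and the interval are our own illustrative choices:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

f, a, b = np.cos, 0.0, 1.0           # illustrative choice
integral, _ = quad(f, a, b)           # ∫_a^b f(x) dx
delta = integral / (b - a)            # mean value of f on [a, b]

# Since f is continuous, some c in [a, b] satisfies f(c) = delta
# (Integral Mean Value Theorem). brentq needs a sign change, which
# f(x) - delta has on [a, b] because delta lies between min f and max f.
c = brentq(lambda x: f(x) - delta, a, b)
print(delta, c, f(c))                 # f(c) ≈ delta ≈ sin(1) ≈ 0.8415
```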

One may expect that, among the positive integrable functions, only the zero function can have zero integral. In general, however, this is not the case: the positive function f : [0, 1] → R given by
$$f(x) = \begin{cases} 0 & \text{if } x \in (0,1) \\ \dfrac{1}{2} & \text{if } x \in \{0,1\} \end{cases}$$
is integrable by Theorem 1859 (cf. also Proposition 1853), yet it is easy to see that $\int_0^1 f(x)\,dx = 0$. This function is not continuous. As a consequence of the Integral Mean Value Theorem, we next show that under continuity only the zero function can, indeed, have zero integral.

Corollary 1869 Let f : [a, b] → R be a continuous and positive function. If $\int_a^b f(x)\,dx = 0$, then f = 0.
In this case, it is under continuity that the behavior of the Riemann integral best accords with intuition.

Proof Let $\int_a^b f(x)\,dx = 0$. We want to show that f = 0. Suppose, by contradiction, that there exists x₀ ∈ [a, b] such that f(x₀) ≠ 0. Since f ≥ 0, we have f(x₀) > 0. Assume first that x₀ ∈ (a, b). Since f is continuous, there exists ε > 0 small enough so that f(x) > 0 for all x ∈ [x₀ − ε, x₀ + ε] ⊆ [a, b] (cf. the theorem on the permanence of sign). The function f is continuous on [x₀ − ε, x₀ + ε]. By (44.44), we have
$$\int_a^{x_0-\varepsilon} f(x)\,dx + \int_{x_0-\varepsilon}^{x_0+\varepsilon} f(x)\,dx + \int_{x_0+\varepsilon}^b f(x)\,dx = \int_a^b f(x)\,dx = 0$$
So, $\int_{x_0-\varepsilon}^{x_0+\varepsilon} f(x)\,dx = 0$ because the three addends are positive, since f ≥ 0. By the Integral Mean Value Theorem, there exists c ∈ [x₀ − ε, x₀ + ε] such that $0 = \int_{x_0-\varepsilon}^{x_0+\varepsilon} f(x)\,dx = f(c)\,2\varepsilon$. Thus, f(c) = 0 and this contradicts the strict positivity of f on [x₀ − ε, x₀ + ε]. A similar contradiction is easily obtained if x₀ ∈ {a, b}. We conclude that f = 0.

We close with a nice dividend of the properties of integrals proved in this section, an integral version of the Stone-Weierstrass Theorem (with the sandwich flavor of Corollary 608) that shows how the integral of any function can be approximated, arbitrarily well, by the integral of a polynomial.

Proposition 1870 Let f : [a, b] → R be a bounded function. If f is integrable, then for each ε > 0 there exist two polynomials p, P : [a, b] → R such that p ≤ f ≤ P and
$$\int_a^b \left(P(x) - p(x)\right)dx \le \varepsilon$$

In turn, this is easily seen to imply that
$$\int_a^b P(x)\,dx - \varepsilon \le \int_a^b p(x)\,dx \le \int_a^b f(x)\,dx \le \int_a^b P(x)\,dx \le \int_a^b p(x)\,dx + \varepsilon$$

Proof We prove the statement when f is continuous, a stronger assumption compared to mere integrability that, however, makes the proof much shorter and less tedious. Fix ε > 0. Since f is continuous, by the Stone-Weierstrass Theorem (cf. Proposition 608) there exist two polynomials p, P : [a, b] → R such that p ≤ f ≤ P and
$$P(x) - p(x) \le \frac{\varepsilon}{b-a} \qquad \forall x \in [a,b]$$
Since 0 ≤ P − p ≤ ε/(b − a), by Theorem 1865 we have that
$$0 \le \int_a^b \left(P(x) - p(x)\right)dx \le \int_a^b \frac{\varepsilon}{b-a}\,dx = \varepsilon$$
proving the statement.

N.B. Given a function f : [a, b] → R, until now we have considered the Riemann integral of f from a to b, that is, $\int_a^b f(x)\,dx$. Sometimes it is useful to consider the integral of f from b to a, that is, $\int_b^a f(x)\,dx$,¹³ as well as the integral of f from a to a, that is, $\int_a^a f(x)\,dx$. What do we mean by such expressions? By convention we define, for a < b,
$$\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx \tag{44.53}$$
and
$$\int_a^a f(x)\,dx = 0 \tag{44.54}$$
Thanks to these conventions, it is no longer essential that in $\int_a^b$ we have a < b: in the case in which a ≥ b the integral assumes the meaning given to it by (44.53) and (44.54). Moreover, it is possible to prove that the properties established for the integral $\int_a^b f(x)\,dx$ extend, mutatis mutandis, also to the case a ≥ b. O

¹³ This happens, for example, if f is integrable on an interval [a, b] and we take two generic points x, y ∈ [a, b], without specifying whether x < y or x ≥ y, and then consider the integral of f between x and y.

44.7 Integral calculus

After having introduced the Riemann integral and studied its main properties, we turn our attention to its actual calculation, for which the definition is of little help (even if it is, obviously, essential to understand its nature).
In this section we study the central results of integral calculus, termed "fundamental" to emphasize their importance. Inter alia, we will show how integration can be seen as the inverse of the operation of differentiation, something that greatly simplifies the computation of integrals.

In the study of differentiability, we have typically considered functions differentiable on an open interval (a, b), or at least at the interior points of their domain. In this section we will consider functions f : [a, b] → R that are differentiable on [a, b], where the derivatives at the endpoints a and b are taken as one-sided. In a similar way we talk of differentiability on the half-open intervals (a, b] and [a, b).

44.7.1 Primitive functions

Even if we will be mainly interested in functions defined on closed and bounded intervals [a, b], in this section we will consider more generally any interval I of the real line, be it open, closed, or half-open, bounded or unbounded (for example, I can be the entire real line R).

Definition 1871 A function P : I → R is called a primitive or indefinite integral of f : I → R if it is differentiable on I and
$$P'(x) = f(x) \qquad \forall x \in I$$
We denote a primitive of f by
$$\int f(x)\,dx$$

In other words, moving from the function f to its primitive P can be seen as the inverse procedure with respect to moving from P to f through differentiation. In this sense, the primitive function is the inverse of the derivative function (indeed, it is sometimes called the antiderivative).
Let us provide a couple of examples. Here it is important to keep in mind that, as Example 1876 will show, a function might not have a primitive, so the search for a primitive might be in vain. In any case, by Corollary 1317 a necessary condition for a function f to have a primitive is that it has no removable or jump discontinuities.

Example 1872 Let f : [0, 1] → R be given by f(x) = x. The function P : [0, 1] → R given by P(x) = x²/2 is a primitive of f. Indeed, P′(x) = 2x/2 = x. N

Example 1873 Let f : R → R be given by f(x) = x/(1 + x²). The function P : R → R given by
$$P(x) = \frac{1}{2}\log\left(1+x^2\right)$$
is a primitive of f. Indeed,
$$P'(x) = \frac{1}{2}\,\frac{1}{1+x^2}\,2x = f(x)$$
for every x ∈ R. N

N.B. If I₁ and I₂ are two nested intervals, with I₁ ⊆ I₂, then a primitive of f on I₂ is also a primitive on I₁. For example, if we consider the restriction of f(x) = x/(1 + x²) to [0, 1], that is, the function f̃ : [0, 1] → R given by f̃(x) = x/(1 + x²), then a primitive on [0, 1] remains P(x) = (1/2) log(1 + x²). O

If P is a primitive of f, then the function P + k obtained by adding a constant to P is also a primitive of f. For, (P + k)′(x) = P′(x) = f(x) for every x ∈ I. The next result shows that, up to such translations, the primitive function is unique.

Proposition 1874 Let f : I → R and let P₁ : I → R be a primitive function of f. A function P₂ : I → R is a primitive of f on I if and only if there exists a constant k ∈ R such that
$$P_2 = P_1 + k$$

Proof The "if" is obvious, given our previous discussion. Let us prove the "only if". Let I = [a, b] and let P₁, P₂ : [a, b] → R be two primitive functions of f on [a, b]. Since P₁′(x) = f(x) = P₂′(x) for every x ∈ [a, b], by Corollary 1313 we have P₂ = P₁ + k.
Let now I be an open and bounded interval (a, b). Let ε > 0 be sufficiently small so that a + ε < b − ε. We have
$$(a,b) = \bigcup_{n=1}^{\infty}\left[a + \frac{\varepsilon}{n},\, b - \frac{\varepsilon}{n}\right]$$
By what has just been proved, for every n ≥ 1 there exists a constant kₙ ∈ R such that
$$P_2(x) = P_1(x) + k_n \qquad \forall x \in \left[a + \frac{\varepsilon}{n},\, b - \frac{\varepsilon}{n}\right] \tag{44.55}$$
Let x₀ ∈ (a, b) be such that a + ε < x₀ < b − ε, so that x₀ ∈ [a + ε/n, b − ε/n] for every n ≥ 1. From (44.55) it follows that P₂(x₀) = P₁(x₀) + kₙ for every n ≥ 1. Therefore, kₙ = P₂(x₀) − P₁(x₀) for every n ≥ 1, that is, k₁ = k₂ = ⋯ = kₙ = ⋯. There exists, therefore, k ∈ R such that P₂(x) = P₁(x) + k for every x ∈ (a, b).
In a similar way one can show the result when I is a half-open and bounded interval (a, b] or [a, b). If I = R, we proceed as in the case (a, b), observing that $R = \bigcup_{n=1}^{\infty}[-n, n]$. A similar argument, which we leave to the reader, holds also for unbounded intervals.

This proposition is another important application of the Mean Value Theorem (of differential calculus). Thanks to it, once a primitive P of a function f is identified, we can write the family of all the primitives as {P + k}ₖ∈R.

Example 1875 Let us go back to Examples 1872 and 1873. For the function f : [0, 1] → R given by f(x) = x, we have
$$\int f(x)\,dx = \frac{x^2}{2} + k$$
For the function f : R → R given by f(x) = x/(1 + x²) we have
$$\int f(x)\,dx = \frac{1}{2}\log\left(1+x^2\right) + k$$
N
We close the section by showing that not all functions admit a primitive, and so an indefinite integral.

Example 1876 The signum function sgn : R → R given by (13.17), i.e.,
$$\operatorname{sgn}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0 \end{cases}$$
does not admit a primitive. Let us suppose, by contradiction, that there exists a primitive P : R → R, i.e., a differentiable function such that P′(x) = sgn x. By Proposition 1874 and by focusing separately on the intervals (−∞, 0) and (0, ∞), there exist k₁, k₂ ∈ R such that
$$P(x) = -x + k_2 \ \text{ if } x < 0 \qquad\text{and}\qquad P(x) = x + k_1 \ \text{ if } x > 0$$
Since P is differentiable, P is continuous, yielding that P(0) = k₁ = k₂. Therefore, P(x) = |x| + k₁ for every x ∈ R. But this function is not differentiable at the origin, which contradicts what has been assumed on P.
We could have reached the same conclusion by noting that the signum function has a jump discontinuity at 0. By Corollary 1317, this prevents the signum function from being the derivative of any other function. Finally, observe that, despite not admitting a primitive, by Proposition 1856 the signum function is integrable on any closed and bounded interval. In fact, it is a step function. N
The Riemann integral $\int_a^b f(x)\,dx$ is often called a definite integral to distinguish it from the indefinite integral introduced above. Note that the indefinite integral is a differential calculus notion. The Riemann integral, with its connection with the method of exhaustion, is a conceptually much deeper notion.

44.7.2 Formulary
The next table, obtained by "reversing" the corresponding table of the basic derivatives, records some fundamental indefinite integrals.

  f                  ∫ f(x) dx
  x^a                x^{a+1}/(a+1) + k       (−1 ≠ a ∈ R and x > 0)
  x^n                x^{n+1}/(n+1) + k       (n ∈ N and x ∈ R)
  1/x                log x + k               (x > 0)
  1/x                log(−x) + k             (x < 0)
  cos x              sin x + k               (x ∈ R)
  sin x              −cos x + k              (x ∈ R)
  e^x                e^x + k                 (x ∈ R)
  α^x                α^x / log α + k         (0 < α ≠ 1 and x ∈ R)
  1/√(1 − x²)        arcsin x + k            (x ∈ (−1, 1))
  1/(1 + x²)         arctan x + k            (x ∈ R)
  1/(cos x)²         tan x + k               (cos x ≠ 0)

We make three observations:

(i) For powers, we have
$$\int x^a\,dx = \frac{x^{a+1}}{a+1} + k \qquad \forall a \neq -1$$
on the entire real line R when a is such that the power function x^a has R as domain: for example, if a ∈ N. In general, if a ∈ R we might need to require x > 0 (e.g., if a = 1/2).

(ii) The case a = −1 for powers is covered by f(x) = 1/x.

(iii) Note that
$$\int f(x)\,dx = \begin{cases} \log x + k & \text{if } x > 0 \\ \log(-x) + k' & \text{if } x < 0 \end{cases}$$
summarizes the cases x < 0 and x > 0 for f(x) = 1/x. In this regard, observe that for x < 0 and g(x) = log(−x) one has
$$g'(x) = \frac{1}{-x}\,(-1) = \frac{1}{x}$$
It is tempting, but incorrect (why?), to write $\int f(x)\,dx = \log|x| + k$ for x ≠ 0.
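Entries of this kind are easy to double-check by machine: differentiating a candidate primitive must return the integrand. A small sympy sketch; the selection of entries from the table is our own:

```python
import sympy as sp

x = sp.symbols('x', positive=True)   # x > 0 where the entry requires it
a = sp.symbols('a', real=True)

# (integrand, candidate primitive) pairs taken from the table
entries = [
    (x**a,            x**(a + 1) / (a + 1)),   # a != -1
    (1 / x,           sp.log(x)),
    (sp.cos(x),       sp.sin(x)),
    (sp.sin(x),       -sp.cos(x)),
    (sp.exp(x),       sp.exp(x)),
    (1 / (1 + x**2),  sp.atan(x)),
]

for f, P in entries:
    # simplify(P' - f) == 0 certifies that P is a primitive of f
    assert sp.simplify(sp.diff(P, x) - f) == 0
print("all entries verified")
```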

44.7.3 The First Fundamental Theorem of Calculus

The next theorem, called the First Fundamental Theorem of Calculus, is a central result in the theory of integration. Conceptually, it shows that integration can be seen as the inverse operation of differentiation. This, in turn, offers a powerful method of computation of integrals based on the use of primitive functions.

Theorem 1877 (First Fundamental Theorem of Calculus) Let P : [a, b] → R be a primitive function of f : [a, b] → R. If f is integrable, then
$$\int_a^b f(x)\,dx = P(b) - P(a) \tag{44.56}$$

Formula (44.56) reduces the computation of the Riemann integral $\int_a^b f(x)\,dx$ to the computation of the primitive P of f, that is, to that of the indefinite integral. As we saw in the last section, this can be carried out by using the differentiation rules studied in Chapter 26. In a sense, formula (44.56) reduces integral calculus to differential calculus.

Proof Let π = {xᵢ}ⁿᵢ₌₀ be a subdivision of [a, b]. If we add and subtract P(xᵢ) for every i = 1, 2, ..., n − 1, we have
$$P(b) - P(a) = P(x_n) - P(x_{n-1}) + P(x_{n-1}) - \cdots - P(x_1) + P(x_1) - P(x_0) = \sum_{i=1}^n \left(P(x_i) - P(x_{i-1})\right)$$
Let us consider P on [xᵢ₋₁, xᵢ]. Since P is a primitive of f, P is differentiable on (xᵢ₋₁, xᵢ) and continuous on [xᵢ₋₁, xᵢ] for all i ∈ {1, ..., n}. By the Mean Value Theorem, for each i ∈ {1, ..., n} there exists x̂ᵢ ∈ (xᵢ₋₁, xᵢ) such that
$$P'(\hat x_i) = \frac{P(x_i) - P(x_{i-1})}{x_i - x_{i-1}}$$
Since P is a primitive, we have
$$f(\hat x_i) = P'(\hat x_i) = \frac{P(x_i) - P(x_{i-1})}{x_i - x_{i-1}}$$
and hence
$$P(b) - P(a) = \sum_{i=1}^n \left(P(x_i) - P(x_{i-1})\right) = \sum_{i=1}^n f(\hat x_i)(x_i - x_{i-1}) = \sum_{i=1}^n f(\hat x_i)\,\Delta x_i$$
which implies
$$I(f,\pi) \le P(b) - P(a) \le S(f,\pi) \tag{44.57}$$
Since π is an arbitrary subdivision, (44.57) holds for every π ∈ Π and therefore
$$\sup_{\pi\in\Pi} I(f,\pi) \le P(b) - P(a) \le \inf_{\pi\in\Pi} S(f,\pi)$$
from which, since f is integrable, we obtain (44.56).

Let us illustrate the theorem with some examples, which use the primitives computed in Examples 1872 and 1873.

Example 1878 Let f : R → R be given by f(x) = x. A primitive of f is P(x) = x²/2 and therefore, thanks to (44.56),
$$\int_a^b x\,dx = \frac{b^2}{2} - \frac{a^2}{2}$$
For example, $\int_0^1 x\,dx = 1/2$. More generally, let f : R → R be given by a power f(x) = xⁿ. Clearly, we have P(x) = x^{n+1}/(n + 1). So, by (44.56),
$$\int_a^b x^n\,dx = \frac{b^{n+1}}{n+1} - \frac{a^{n+1}}{n+1}$$
Now, $\int_0^1 x^n\,dx = 1/(n+1)$. As the reader can easily verify, since any primitive of f(x) = xⁿ differs from P(x) = x^{n+1}/(n + 1) only by a constant (cf. Proposition 1874), the result of the above computations is not affected by which specific primitive is chosen. N

Example 1879 Let f : R → R be given by f(x) = x/(1 + x²). As we saw in Example 1873, a primitive function P : R → R is given by P(x) = (1/2) log(1 + x²). Therefore, thanks to (44.56),
$$\int_a^b \frac{x}{1+x^2}\,dx = \frac{1}{2}\log\left(1+b^2\right) - \frac{1}{2}\log\left(1+a^2\right)$$
For example,
$$\int_0^1 \frac{x}{1+x^2}\,dx = \frac{1}{2}\log 2 - 0 = \frac{\log 2}{2}$$
N
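These closed forms are easy to test against the definition: a fine Riemann sum should approach P(b) − P(a). A minimal numerical check, our own illustration based on Example 1879:

```python
import numpy as np

f = lambda x: x / (1 + x**2)
P = lambda x: 0.5 * np.log(1 + x**2)    # primitive from Example 1873

a, b = 0.0, 1.0
for n in (10, 100, 10_000):
    x = np.linspace(a, b, n + 1)
    mid = (x[:-1] + x[1:]) / 2
    riemann = np.sum(f(mid) * np.diff(x))   # midpoint Riemann sum
    print(n, riemann, P(b) - P(a))          # converges to (log 2)/2 ≈ 0.3466
```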

For integrable functions without primitives, such as the signum function, the last theorem cannot be applied and the calculation of integrals cannot be done through formula (44.56). In some simple cases it is, however, possible to calculate the integral using the definition directly. For example, the signum function is a step function and therefore we can apply Proposition 1856 in which, using the definition of the integral, we determined the value of the integral for this class of functions. In particular, we have
$$\int_a^b \operatorname{sgn} x\,dx = \begin{cases} b-a & \text{if } a \ge 0 \\ a+b & \text{if } a < 0 < b \\ a-b & \text{if } b \le 0 \end{cases}$$
The cases a ≥ 0 and b ≤ 0 are obvious using (44.33). Let us consider the case a < 0 < b. Using (44.33) and (44.44), we have
$$\int_a^b \operatorname{sgn} x\,dx = \int_a^0 \operatorname{sgn} x\,dx + \int_0^b \operatorname{sgn} x\,dx = \int_a^0 (-1)\,dx + \int_0^b 1\,dx = (-1)(0-a) + (1)(b-0) = a+b$$

A final remark is due. Recall that Darboux's Theorem shows that derivative functions, though in general discontinuous, enjoy a remarkable property of continuity, namely, they satisfy a version of the Intermediate Value Theorem. One might then wonder whether this grain of continuity is enough to make derivative functions integrable. If so, the hypothesis of integrability in the First Fundamental Theorem of Calculus would be redundant. Unfortunately, this is not true: Vito Volterra published in 1881 a highly non-trivial example of a derivative function which is not integrable.¹⁴

44.7.4 The Second Fundamental Theorem of Calculus

In light of the First Fundamental Theorem of Calculus, it is natural to look for conditions that guarantee that an integrable function f : [a, b] → R has, indeed, a primitive. To this end, we introduce an important notion.

Definition 1880 Let f : [a, b] → R be an integrable function. The function F : [a, b] → R given by
$$F(x) = \int_a^x f(t)\,dt \qquad \forall x \in [a,b]$$
is called the integral function of f.

In other words, the value F(x) of the integral function is the (signed) area under f on the interval [a, x], as x varies.¹⁵

N.B. The integral function $F(x) = \int_a^x f(t)\,dt$ is a function F : [a, b] → R whose variable is the upper limit of integration x which, as it varies, determines a different Riemann integral $\int_a^x f(t)\,dt$. The value of this integral (which is a scalar) is the image F(x) of the integral function. In this regard, note that F is defined on [a, b] since, f being integrable on this interval, it is integrable on all the subintervals [a, x] ⊆ [a, b]. O

Let us establish a first property of integral functions.

Proposition 1881 The integral function F : [a, b] → R of an integrable function f : [a, b] → R is (Lipschitz) continuous.

Proof Since f is bounded, there exists M > 0 such that |f(x)| ≤ M for all x ∈ [a, b]. We next show that |F(x) − F(y)| ≤ M|x − y| for all x, y ∈ [a, b]. Consider x, y ∈ [a, b]. To avoid the trivial case x = y, assume that x ≠ y and, without loss of generality, assume also that x > y. By the definition of the integral function, we have $F(x) - F(y) = \int_y^x f(t)\,dt$. Thanks to (44.50), we have
$$|F(x) - F(y)| = \left|\int_y^x f(t)\,dt\right| \le \int_y^x |f(t)|\,dt \le \int_y^x M\,dt = M|x-y|$$
proving Lipschitz continuity (Chapter 19).

¹⁴ At the time, Volterra was still an undergraduate student in Pisa.
¹⁵ In the definition of the integral function, the (mute) variable of integration is no longer x but t (or any other letter different from x). Such a choice is dictated by the necessity of avoiding any confusion about the use of the variable x, which here becomes the independent variable of the integral function.

Armed with the notion of integral function, we can address the problem that opened the section: the next important result, the Second Fundamental Theorem of Calculus, shows that the integral function is a primitive of a continuous f. Continuity is, therefore, a simple condition that guarantees the existence of a primitive.

Theorem 1882 (Second Fundamental Theorem of Calculus) Let f : [a, b] → R be a continuous (so, integrable) function. Its integral function F : [a, b] → R is a primitive of f, that is, it is differentiable at each x ∈ [a, b], with
$$F'(x) = f(x) \qquad \forall x \in [a,b] \tag{44.58}$$

By the "pasting" property (44.44), for all a ≤ y ≤ x ≤ b we have
$$F(x) - F(y) = \int_y^x f(t)\,dt \tag{44.59}$$
In view of (44.56), the fact that the integral function may be a primitive is then not that surprising. The proof of this fundamental result gives a rigorous argument. It relies on an interesting lemma.

Lemma 1883 Let f : [a, b] → R be a continuous function. Then, for each x ∈ [a, b),
$$\lim_{h\to 0^+} \frac{\int_x^{x+h} f(t)\,dt}{h} = f(x) \tag{44.60}$$
and, for each x ∈ (a, b],
$$\lim_{h\to 0^+} \frac{\int_{x-h}^{x} f(t)\,dt}{h} = f(x) \tag{44.61}$$

Proof Let x₀ ∈ [a, b). Since f is continuous, for each ε > 0 there exists δ_ε > 0 such that, for each x ∈ [a, b],
$$|x - x_0| < \delta_\varepsilon \implies |f(x) - f(x_0)| < \varepsilon$$
Fix ε > 0 and take h ∈ (0, δ_ε) with x₀ + h < b. By the Integral Mean Value Theorem, there exists c_h ∈ [x₀, x₀ + h] such that
$$\int_{x_0}^{x_0+h} f(t)\,dt = f(c_h)\,h$$
Since |c_h − x₀| ≤ h < δ_ε, this implies that
$$\left|\frac{\int_{x_0}^{x_0+h} f(t)\,dt}{h} - f(x_0)\right| = |f(c_h) - f(x_0)| < \varepsilon$$
Since ε was arbitrarily chosen, we conclude that (44.60) holds. A similar argument proves (44.61).

Note that (44.60) and (44.61) imply that, for each x ∈ (a, b),
$$\lim_{h\to 0^+} \frac{\int_{x-h}^{x+h} f(t)\,dt}{2h} = f(x)$$
as the reader can easily check.

Proof of the Second Fundamental Theorem of Calculus Let x₀ ∈ (a, b). By (44.59), it holds that
$$F(x_0+h) - F(x_0) = \int_{x_0}^{x_0+h} f(t)\,dt \qquad\text{and}\qquad F(x_0) - F(x_0-h) = \int_{x_0-h}^{x_0} f(t)\,dt$$
for each h > 0 with x₀ ± h ∈ (a, b). By (44.60),
$$\lim_{h\to 0^+}\left(\frac{F(x_0+h) - F(x_0)}{h} - f(x_0)\right) = \lim_{h\to 0^+}\left(\frac{\int_{x_0}^{x_0+h} f(t)\,dt}{h} - f(x_0)\right) = 0$$
By (44.61),
$$\lim_{h\to 0^-}\left(\frac{F(x_0+h) - F(x_0)}{h} - f(x_0)\right) = \lim_{h\to 0^-}\left(\frac{F(x_0) - F(x_0-(-h))}{-h} - f(x_0)\right) = \lim_{k\to 0^+}\left(\frac{F(x_0) - F(x_0-k)}{k} - f(x_0)\right) = \lim_{k\to 0^+}\left(\frac{\int_{x_0-k}^{x_0} f(t)\,dt}{k} - f(x_0)\right) = 0$$
Therefore,
$$\lim_{h\to 0^+} \frac{F(x_0+h) - F(x_0)}{h} = \lim_{h\to 0^-} \frac{F(x_0+h) - F(x_0)}{h} = f(x_0)$$
and so
$$F'(x_0) = \lim_{h\to 0} \frac{F(x_0+h) - F(x_0)}{h} = f(x_0)$$
This proves the case x₀ ∈ (a, b). The cases x₀ = a and x₀ = b are proved in a similar way, as the reader can easily verify. We conclude that F′(x₀) exists and equals f(x₀).

The Second Fundamental Theorem of Calculus gives a sufficient condition, continuity, for an integrable function to have a primitive (so, an indefinite integral). More importantly, however, in so doing it shows that differentiation can be seen as the inverse operation of integration: condition (44.58) can, indeed, be written as
$$\frac{d}{dx}\int_a^x f(t)\,dt = f(x) \tag{44.62}$$
On the other hand, a differentiable function f : [a, b] → R is, obviously, a primitive of its derivative function f′ : [a, b] → R. By the First Fundamental Theorem of Calculus, if f′ is integrable (e.g., if f is continuously differentiable, cf. Proposition 1858) then formula (44.56) takes the form
$$\int_a^x f'(t)\,dt = f(x) - f(a) \tag{44.63}$$
for all a ≤ x ≤ b, that is,
$$\int_a^x \frac{df}{dt}(t)\,dt = f(x) - f(a) \tag{44.64}$$
Integration can thus be seen as the inverse operation of differentiation, as (44.62) and (44.64) together show. The two fundamental theorems form the backbone of integral calculus by clarifying its dual relation with differential calculus and, in this way, by making it operational. The importance of all this in both mathematics and applications is just enormous.¹⁶
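The duality (44.62) can also be watched numerically: build the integral function F by quadrature and difference it. A small sketch; the continuous integrand below is our own illustrative choice:

```python
import numpy as np
from scipy.integrate import quad

f = lambda t: np.exp(-t**2)              # illustrative continuous integrand
a = 0.0
F = lambda x: quad(f, a, x)[0]           # integral function F(x) = ∫_a^x f(t) dt

h = 1e-5
for x in (0.3, 0.7, 1.2):
    dF = (F(x + h) - F(x - h)) / (2 * h)   # central difference ≈ F'(x)
    print(x, dF, f(x))                      # dF matches f(x), as (44.62) predicts
```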

The next example shows that continuity is only a sufficient, but not necessary, condition for an integrable function to admit a primitive.

Example 1884 The function f : R → R given by
$$f(x) = \begin{cases} 2x\sin\dfrac{1}{x} - \cos\dfrac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}$$
is discontinuous at 0. Nevertheless, a primitive P : R → R of this function is
$$P(x) = \begin{cases} x^2\sin\dfrac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}$$
Indeed, for x ≠ 0 this can be verified by differentiating x² sin(1/x), while for x = 0 one observes that
$$P'(0) = \lim_{h\to 0}\frac{P(h) - P(0)}{h} = \lim_{h\to 0}\frac{h^2\sin\frac{1}{h}}{h} = \lim_{h\to 0} h\sin\frac{1}{h} = 0 = f(0)$$
So, there exist discontinuous functions that have primitives. Moreover, on the interval [0, 1] the function f is integrable by Theorem 1859, so the First Fundamental Theorem of Calculus can be applied. N

The signum function, which has no primitive (Example 1876), is an example of a discontinuous function for which the last theorem altogether fails. The next example shows that, in the Second Fundamental Theorem of Calculus, even if F is assumed to be differentiable, the continuity of f cannot be weakened to integrability of f. In other words, if f is integrable and F is differentiable, it might be the case that F′ ≠ f.

Example 1885 Define f : [0, 1] → R by
$$f(x) = \begin{cases} \dfrac{1}{n} & \text{if } x = \dfrac{m}{n} > 0 \text{ (in its lowest terms)} \\ 0 & \text{otherwise} \end{cases}$$
The function f is a modification of the Dirichlet function, known as Thomae's function (after its 1875 discoverer). It is continuous at each irrational point and discontinuous at each non-zero rational point of the unit interval. So, unlike the Dirichlet function, it is continuous somewhere and, by Theorem 1859, this property makes it integrable. In particular, its integral is zero, i.e., $\int_0^1 f(t)\,dt = 0$. It is a useful and non-trivial exercise to check all this. Thomae's function is thus an instance of an integrable function which is discontinuous at infinitely many points.
For the integral function F : [0, 1] → R, given by $F(x) = \int_0^x f(t)\,dt$, we then have F(x) = 0 for every x ∈ [0, 1] since $0 \le F(x) \le \int_0^1 f(t)\,dt = 0$. Hence, F is trivially differentiable, with F′(x) = 0 for every x ∈ [0, 1]. But F′ ≠ f because F′(x) = f(x) if and only if x is either irrational or 0. We conclude that (44.58) does not hold, and so the Second Fundamental Theorem of Calculus fails because F is not a primitive of f. Nevertheless, we have $F(x) = \int_0^x F'(t)\,dt$ for every x ∈ [0, 1]. N

¹⁶ Coda readers will find a neat version of this duality in the Barrow-Torricelli Theorem (Section 48.7).

O.R. The operation of integration makes a function more regular: the integral function F of f is always continuous and, if f is continuous, it is differentiable. In contrast, the operation of differentiation makes a function more irregular. Specifically, integration scales the regularity up by one degree: F is always continuous; if f is continuous, F is differentiable and, continuing in this way, if f is differentiable, F is twice differentiable, and so on and so forth. Differentiation, instead, scales the regularity of a function down. H

44.8 Properties of the indefinite integral

The First Fundamental Theorem of Calculus gives, through formula (44.56), a powerful method to compute Riemann integrals. It relies on the calculation of primitives, that is, of the indefinite integral. Indeed, to calculate the Riemann integral $\int_a^b f(x)\,dx$ of a function f : [a, b] → R that has a primitive, we proceed in two steps:

(i) we calculate a primitive P : [a, b] → R of f, that is, the indefinite integral $\int f(x)\,dx$;

(ii) we calculate the difference P(b) − P(a): this difference is often denoted by $P(x)\big|_a^b$ or $[P(x)]_a^b$.

Next we present some properties of the indefinite integral that simplify its calculation. A first observation is that the linearity of derivatives, established in (26.12), implies the linearity of the indefinite integral.¹⁷

Proposition 1886 Let f, g : I → R be two functions that admit a primitive. Then, for every α, β ∈ R the function αf + βg : I → R admits a primitive and
$$\int (\alpha f + \beta g)(x)\,dx = \alpha\int f(x)\,dx + \beta\int g(x)\,dx + k \tag{44.65}$$
for some k ∈ R.

Proof For ease of notation, denote $P_f = \int f(x)\,dx$ and $P_g = \int g(x)\,dx$. Since f, g : I → R admit a primitive, both objects are well defined. By (26.12), we have
$$(\alpha P_f + \beta P_g)'(x) = \alpha P_f'(x) + \beta P_g'(x) = \alpha f(x) + \beta g(x) \qquad \forall x \in I$$
So, $\alpha P_f + \beta P_g = \alpha\int f(x)\,dx + \beta\int g(x)\,dx$ is a primitive of αf + βg. By Proposition 1874, (44.65) follows.

A simple application of this result is the calculation of the indefinite integral of a polynomial. Namely, given a polynomial f(x) = α₀ + α₁x + ⋯ + αₙxⁿ, it follows from (44.65) that
$$\int f(x)\,dx = \int\left(\sum_{i=0}^n \alpha_i x^i\right)dx = \sum_{i=0}^n \alpha_i\,\frac{x^{i+1}}{i+1} + k$$
Using partial fraction expansions, we can then also calculate the indefinite integral of rational functions, as the next example illustrates.

¹⁷ As in Section 44.7.1, in this section we denote by I a generic interval, bounded or unbounded, of the real line.

Example 1887 Let us calculate the indefinite integral
$$\int \frac{x-1}{x^2+3x+2}\,dx$$
In view of Example 249, the partial fraction expansion of the integrand is
$$f(x) = -\frac{2}{x+1} + \frac{3}{x+2}$$
By (44.65),
$$\int \frac{x-1}{x^2+3x+2}\,dx = \int\left(-\frac{2}{x+1} + \frac{3}{x+2}\right)dx = -2\int\frac{1}{x+1}\,dx + 3\int\frac{1}{x+2}\,dx$$
It is thus enough to compute two elementary indefinite integrals (a task left to the reader). N

The product rule for differentiation leads to an important formula for the calculation of the indefinite integral, called integration by parts.

Proposition 1888 (Integration by parts) Let f, g : I → R be two differentiable functions. Then, for some k ∈ R we have
$$\int f'(x)g(x)\,dx + \int f(x)g'(x)\,dx = f(x)g(x) + k \tag{44.66}$$

Proof By the product rule (26.13), (fg)′ = f′g + fg′. Hence, fg = P₍f′g+fg′₎, and thanks to (44.65) we have
$$f(x)g(x) = \int (f'g + fg')(x)\,dx = \int f'(x)g(x)\,dx + \int f(x)g'(x)\,dx + \hat k$$
for some k̂ ∈ R. By rearranging terms and defining k = −k̂, the statement follows.

Formula (44.66) is useful because sometimes there is a strong asymmetry in the computability of the indefinite integrals $\int f'(x)g(x)\,dx$ and $\int f(x)g'(x)\,dx$: one of them may be much simpler to calculate than the other. By exploiting this asymmetry, thanks to (44.66) we may be able to calculate the more complicated integral as the difference between f(x)g(x) and the simpler integral.
Example 1889 Let us calculate the indefinite integral $\int \log x\,dx$. Let f, g : (0, ∞) → R be defined by f(x) = log x and g(x) = x, so that $\int \log x\,dx$ can be rewritten as $\int f(x)g'(x)\,dx$. By formula (44.66), we have
$$\int x f'(x)\,dx + \int \log x\,dx = x\log x + k$$
that is,
$$\int x\,\frac{1}{x}\,dx + \int \log x\,dx = x\log x + k$$
So,
$$\int \log x\,dx = x(\log x - 1) + k$$
N

Example 1890 Let us calculate the indefinite integral $\int x\sin x\,dx$. Let f, g : R → R be given by f(x) = x and g(x) = −cos x, so that $\int x\sin x\,dx$ can be rewritten as $\int f(x)g'(x)\,dx$. By formula (44.66),
$$\int f'(x)g(x)\,dx + \int x\sin x\,dx = -x\cos x + k$$
that is,
$$\int x\sin x\,dx = \int \cos x\,dx - x\cos x + k = \sin x - x\cos x + k$$
N

Note that in the last example, if instead we set f(x) = sin x and g(x) = x²/2, formula (44.66) becomes useless. Also with such a choice of f and g it is still possible to rewrite $\int x\sin x\,dx$ as $\int f(x)g'(x)\,dx$. Yet, here (44.66) implies
$$\int f'(x)g(x)\,dx + \int x\sin x\,dx = \frac{x^2}{2}\sin x + k$$
that is,
$$\int x\sin x\,dx = \frac{x^2}{2}\sin x - \frac{1}{2}\int x^2\cos x\,dx + k$$
which actually complicates things because the integral $\int x^2\cos x\,dx$ is more difficult to compute than the original integral $\int x\sin x\,dx$. This shows that integration by parts cannot proceed in a mechanical way: it requires a bit of imagination and experience.

O.R. Example 1890 shows that to calculate an integral $\int x^n h(x)\,dx$, where h is a function whose primitive has a similar "complexity" (e.g., h is sin x, cos x or eˣ), a good choice is to set f(x) = xⁿ and g′(x) = h(x). Indeed, after having differentiated f(x) n times, the polynomial factor disappears and one is left with g(x) or g′(x), which is immediately integrable. Such a choice has been used in Example 1890. H
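The primitives found by parts above are easy to verify symbolically: differentiating each candidate must return the original integrand. A brief sympy check, our own illustration:

```python
import sympy as sp

x = sp.symbols('x')

# Candidate primitives obtained by parts in Examples 1889 and 1890
P_log  = x * (sp.log(x) - 1)          # for ∫ log x dx
P_xsin = sp.sin(x) - x * sp.cos(x)    # for ∫ x sin x dx

assert sp.simplify(sp.diff(P_log, x) - sp.log(x)) == 0
assert sp.simplify(sp.diff(P_xsin, x) - x * sp.sin(x)) == 0
print("both primitives check out")
```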

The formula of integration by parts is usually written as
$$\int f(x)g'(x)\,dx = f(x)g(x) - \int f'(x)g(x)\,dx + k$$
The two factors of the product f(x)g′(x)dx are called, respectively, the finite factor, f(x), and the differential factor, g′(x)dx. So, the formula says that "the integral of the product between the finite factor and a differential factor is equal to the product between the finite factor and the integral of the differential factor, minus the integral of the product between the derivative of the finite factor and the integral just found". We stress that it is important to carefully choose which of the two factors to take as the finite factor and which as the differential factor.

Finally, in terms of Riemann integrals the formula obviously becomes
$$\int_a^b f(x)g'(x)\,dx = f(x)g(x)\Big|_a^b - \int_a^b f'(x)g(x)\,dx = f(b)g(b) - f(a)g(a) - \int_a^b f'(x)g(x)\,dx \tag{44.67}$$

44.9 Change of variable

The next result shows how the integral of a function f changes when we compose it with another function φ.

Theorem 1891 Let φ : [c, d] → [a, b] be a differentiable and strictly increasing function such that φ′ : [c, d] → R is integrable. If f : [a, b] → R is continuous, then the function (f ∘ φ)φ′ : [c, d] → R is integrable and
$$\int_c^d f(\varphi(t))\varphi'(t)\,dt = \int_{\varphi(c)}^{\varphi(d)} f(x)\,dx \tag{44.68}$$

If φ is surjective, we have a = φ(c) and b = φ(d). Formula (44.68) can therefore be rewritten as
$$\int_c^d f(\varphi(t))\varphi'(t)\,dt = \int_a^b f(x)\,dx \tag{44.69}$$
Heuristically, (44.68) can be seen as the result of the change of variable x = φ(t) and of the corresponding change
$$dx = \varphi'(t)\,dt = d\varphi(t) \tag{44.70}$$
in dx. At a mnemonic and calculation level, this observation can be useful, even if the writing (44.70) is per se meaningless.

Proof Let F be the integral function of f. Since f is continuous, (44.59) yields
$$\int_{\varphi(c)}^{\varphi(d)} f(x)\,dx = F(\varphi(d)) - F(\varphi(c)) \tag{44.71}$$
Moreover, by the Second Fundamental Theorem of Calculus and since f is continuous, the chain rule implies
$$(F\circ\varphi)'(t) = F'(\varphi(t))\,\varphi'(t) = (f\circ\varphi)(t)\,\varphi'(t)$$
that is, F ∘ φ is a primitive of (f ∘ φ)φ′ : [c, d] → R. By Proposition 1854, the composite function f ∘ φ : [c, d] → R is integrable. By Corollary 1863 and since φ′ : [c, d] → R is integrable, so is the product function (f ∘ φ)φ′ : [c, d] → R. By the First Fundamental Theorem of Calculus, we have
$$\int_c^d (f\circ\varphi)(t)\,\varphi'(t)\,dt = (F\circ\varphi)(d) - (F\circ\varphi)(c) \tag{44.72}$$
By (44.72) and (44.71), we have
$$\int_c^d f(\varphi(t))\varphi'(t)\,dt = F(\varphi(d)) - F(\varphi(c)) = \int_{\varphi(c)}^{\varphi(d)} f(x)\,dx$$
as desired.

Theorem 1891, besides having a theoretical interest, can be useful in the calculation of integrals. Formula (44.68), and its rewriting (44.69), can be used both from "right to left" and from "left to right". In the first case, from right to left in (44.69), the objective is to calculate the integral $\int_a^b f(x)\,dx$ by finding a suitable change of variable x = φ(t) that leads to an integral $\int_{\varphi^{-1}(a)}^{\varphi^{-1}(b)} f(\varphi(t))\varphi'(t)\,dt$ that is easier to calculate. The difficulty is in finding a suitable change of variable x = φ(t): indeed, nothing guarantees that there exists a "simplifying" change and, even if it existed, it might not be obvious how to find it.

On the other hand, the application from left to right of formula (44.68) is useful to calculate an integral that can be written as $\int_c^d f(\varphi(t))\varphi'(t)\,dt$ for some function f for which we know the primitive F. In such a case, the corresponding integral $\int_{\varphi(c)}^{\varphi(d)} f(x)\,dx$, obtained by setting x = φ(t), is easier to calculate since
$$\int f(\varphi(x))\varphi'(x)\,dx = F(\varphi(x)) \tag{44.73}$$
In such a case the difficulty is in recognizing the composite form $f(\varphi(t))\varphi'(t)$ in the integral that we want to calculate. Here too, nothing guarantees that the integral can be rewritten in this form, nor that, even when possible, it is easy to recognize. Only experience (and exercise) can be of help. The next example presents some classic integrals that can be calculated with this technique.

Example 1892 (i) If a ≠ −1, we have
$$\int \varphi(x)^a\,\varphi'(x)\,dx = \frac{\varphi(x)^{a+1}}{a+1} + k$$
For example,
$$\int \sin^4 x\cos x\,dx = \frac{1}{5}\sin^5 x + k$$
(ii) If either φ > 0 or φ < 0, then we have
$$\int \frac{\varphi'(x)}{\varphi(x)}\,dx = \log|\varphi(x)| + k$$
For example,
$$\int \tan x\,dx = \int \frac{\sin x}{\cos x}\,dx = -\int \frac{-\sin x}{\cos x}\,dx = -\log|\cos x| + k$$
(iii) We have
$$\int \sin(\varphi(x))\varphi'(x)\,dx = -\cos\varphi(x) + k \qquad\text{and}\qquad \int \cos(\varphi(x))\varphi'(x)\,dx = \sin\varphi(x) + k$$
For example,
$$\int \sin(3x^3 - 2x^2)\,(9x^2 - 4x)\,dx = -\cos(3x^3 - 2x^2) + k$$
(iv) We have
$$\int e^{\varphi(x)}\varphi'(x)\,dx = e^{\varphi(x)} + k$$
For example,
$$\int xe^{x^2}\,dx = \frac{1}{2}\int 2xe^{x^2}\,dx = \frac{1}{2}e^{x^2} + k$$
N
We now present three examples that illustrate the two possible applications of formula (44.68). The first example considers the right-to-left case, the second and third examples consider the left-to-right case. For simplicity we use the variables x and t as they appear in (44.68), even if this is obviously a mere convenience, without substantial value.
Example 1893 Consider the integral
$$\int_a^b \sin\sqrt{x}\,dx$$
with [a, b] ⊆ [0, ∞). Set t = √x, so that x = t². Here we have φ(t) = t² and, thanks to (44.69),
$$\int_a^b \sin\sqrt{x}\,dx = \int_{\sqrt a}^{\sqrt b} 2t\sin t\,dt = 2\int_{\sqrt a}^{\sqrt b} t\sin t\,dt$$
In Example 1890, by integrating by parts, we computed the indefinite integral $\int t\sin t\,dt$. In light of that example, we have
$$\int_{\sqrt a}^{\sqrt b} t\sin t\,dt = \left(\sin t - t\cos t\right)\Big|_{\sqrt a}^{\sqrt b} = \sin\sqrt b - \sin\sqrt a + \sqrt a\cos\sqrt a - \sqrt b\cos\sqrt b$$
and so
$$\int_a^b \sin\sqrt x\,dx = 2\left(\sin\sqrt b - \sin\sqrt a + \sqrt a\cos\sqrt a - \sqrt b\cos\sqrt b\right)$$
Note how the starting point has been to set t = √x, that is, to specify the inverse function t = φ⁻¹(x) = √x. This is often the case because it is simpler to think of which transformation of x may simplify the integration. N

Example 1894 Consider the integral
$$\int_0^{\frac{\pi}{2}} \frac{\cos t}{(1+\sin t)^3}\,dt$$
In the integral we recognize a form of type (i) of Example 1892, that is, an integral of the type
$$\int \varphi(t)^a\,\varphi'(t)\,dt$$
with φ(t) = 1 + sin t and a = −3. Since $\int \varphi(t)^a\varphi'(t)\,dt = \frac{\varphi(t)^{a+1}}{a+1} + k$, we have
$$\int_0^{\frac{\pi}{2}} \frac{\cos t}{(1+\sin t)^3}\,dt = -\frac{1}{2}\,\frac{1}{(1+\sin t)^2}\Bigg|_0^{\frac{\pi}{2}} = -\frac{1}{8} + \frac{1}{2} = \frac{3}{8}$$
N
Example 1895 Consider the integral
$$\int_c^d \frac{\log t}{t}\,dt \tag{44.74}$$
with [c, d] ⊆ (0, ∞). Here we recognize again a form of type (i) of Example 1892, an integral of the type
$$\int \varphi(t)^a\,\varphi'(t)\,dt$$
with φ(t) = log t and a = 1. Since again $\int \varphi(t)^a\varphi'(t)\,dt = \frac{\varphi(t)^{a+1}}{a+1} + k$, we have
$$\int_c^d \frac{\log t}{t}\,dt = \frac{\log^2 t}{2}\Bigg|_c^d = \frac{1}{2}\left(\log^2 d - \log^2 c\right)$$
N
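Substitutions of this kind are easy to sanity-check numerically: both sides of (44.68) can be approximated by quadrature. A sketch for Example 1894, our own check:

```python
from math import pi, sin, cos
from scipy.integrate import quad

# Left side of (44.68): ∫_0^{π/2} cos t / (1 + sin t)^3 dt
lhs, _ = quad(lambda t: cos(t) / (1 + sin(t))**3, 0, pi / 2)

# Right side after x = φ(t) = 1 + sin t, so φ(0) = 1 and φ(π/2) = 2:
rhs, _ = quad(lambda x: x**-3, 1, 2)

print(lhs, rhs, 3 / 8)   # all three ≈ 0.375
```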
Chapter 45

Improper Riemann integrals (sdoganato)

So far we have considered the integrals of bounded functions on bounded intervals [a, b]. In this chapter we extend Riemann integration beyond this case, as applications often require. We consider two cases: unbounded intervals of integration, with integrands that are bounded or not, and bounded intervals of integration with integrands that, however, are unbounded near some point of the interval.
Most of the chapter will focus on the first case, except for the last section. In any event, in both cases we talk of improper integrals to distinguish them from the "canonical" integrals of the previous chapter.

45.1 Integration on the positive half-line

In applications, integrals on unbounded intervals are important. A famous example is the Gaussian bell centered at the origin

[Figure: the Gaussian bell curve y = e^{−x²} over the real line.]

seen in Section 36.4 and whose area is given by the Gauss integral
$$\int_{-\infty}^{+\infty} e^{-x^2}\,dx \tag{45.1}$$
In this case the domain of integration is the whole real line (−∞, +∞).

Let us begin with domains of integration of the form [a, +∞). Given a function f : [a, +∞) → R, consider the integral function F : [a, +∞) → R given by
$$F(x) = \int_a^x f(t)\,dt$$
The definition of the improper integral $\int_a^{+\infty} f(x)\,dx$ is based on the limit limₓ→+∞ F(x), that is, on the asymptotic behavior of the integral function. For such behavior, we can have three cases:

(i) limₓ→+∞ F(x) = L ∈ R;

(ii) limₓ→+∞ F(x) = ±∞;

(iii) limₓ→+∞ F(x) does not exist.

Cases (i) and (ii) are considered by the next definition.

Definition 1896 Let f : [a, +∞) → R be a function integrable on every interval [a, b] ⊆ [a, +∞), with integral function F. If limₓ→+∞ F(x) exists in $\overline{R}$, we set
$$\int_a^{+\infty} f(x)\,dx = \lim_{x\to+\infty} F(x)$$
and the function f is said to be integrable in the improper sense on [a, +∞). The value $\int_a^{+\infty} f(x)\,dx$ is called the improper (or generalized) Riemann integral.

For brevity, in the sequel we will say that a function f is integrable on [a, +∞), omitting "in the improper sense". We have the following terminology:

(i) the integral $\int_a^{+\infty} f(x)\,dx$ converges if limₓ→+∞ F(x) ∈ R;

(ii) the integral $\int_a^{+\infty} f(x)\,dx$ diverges positively (resp., negatively) if limₓ→+∞ F(x) = +∞ (resp., −∞);

(iii) finally, if limₓ→+∞ F(x) does not exist, we say that the integral $\int_a^{+\infty} f(x)\,dx$ does not exist (or that it is oscillating).
Example 1897 Fix α > 0 and let f : [1, ∞) → R be given by f(x) = x^{−α}. The integral function F : [1, ∞) → R is
$$F(x) = \int_1^x \frac{1}{t^\alpha}\,dt = \begin{cases} \dfrac{1}{1-\alpha}\left(x^{1-\alpha} - 1\right) & \text{if } \alpha \neq 1 \\ \log x & \text{if } \alpha = 1 \end{cases}$$
So,
$$\lim_{x\to+\infty} F(x) = \begin{cases} +\infty & \text{if } \alpha \le 1 \\ \dfrac{1}{\alpha-1} & \text{if } \alpha > 1 \end{cases}$$
It follows that the improper integral
$$\int_1^{+\infty} \frac{1}{x^\alpha}\,dx$$
exists for every α > 0: it converges if α > 1 and diverges positively if α ≤ 1. N
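The threshold α = 1 is easy to see numerically: the integral function F(x) levels off for α > 1 and keeps growing for α ≤ 1. A quick sketch using the closed form of F computed above:

```python
import numpy as np

def F(x, alpha):
    # Integral function of t**(-alpha) on [1, x], closed form from Example 1897.
    return np.log(x) if alpha == 1 else (x**(1 - alpha) - 1) / (1 - alpha)

for alpha in (0.5, 1.0, 1.5):
    print(alpha, [F(x, alpha) for x in (1e2, 1e4, 1e6)])
# alpha = 1.5 stabilizes near 1/(alpha - 1) = 2; alpha <= 1 grows without bound
```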

Example 1898 Define f : [0, ∞) → R by f(x) = (−1)^{[x]}, where [x] is the integer part of x ≥ 0 (Section 1.4.3). So, f(x) = (−1)ⁿ if n ≤ x < n + 1, that is,
$$f(x) = \begin{cases} 1 & \text{if } 0 \le x < 1 \\ -1 & \text{if } 1 \le x < 2 \\ 1 & \text{if } 2 \le x < 3 \\ -1 & \text{if } 3 \le x < 4 \\ \ \vdots \end{cases}$$
This implies
$$F(n) = \int_0^n f(x)\,dx = \begin{cases} 0 & \text{if } n \text{ is even} \\ 1 & \text{if } n \text{ is odd} \end{cases}$$
We conclude that limₓ→+∞ F(x) does not exist, so the integral $\int_0^{\infty} f(x)\,dx$ is oscillating. N

Example 1899 A continuous time version of the discrete time intertemporal problem of Section 9.1.2 features an infinitely lived consumer who chooses over consumption streams f : [0, ∞) → [0, ∞) of a single good. Such streams are evaluated by a continuous time intertemporal utility function U : A ⊆ R^{[0,∞)} → R, often defined by the improper integral
$$U(f) = \int_0^{\infty} u(f(t))\,e^{-\sigma t}\,dt$$
with instantaneous utility function u : [0, ∞) → R and exponential discounting e^{−σt}, with subjective discount rate σ ∈ (0, ∞). The domain A is formed by the streams f for which this improper integral converges. N

Unbounded functions can be improperly integrable, as the next example shows.

Example 1900 Let h : [0; 1) ! N be the function that associates to each positive scalar
its nearest integer, with the convention h(n + 1=2) = n. For instance, h (2:7) = h (3:2) =
h (3:5) = 3 and h (2:5) = 2. De ne f : [0; 1) ! R by
8
<h (x) h2 (x) 2h(x) jx h (x)j if h (x) 6= 0 and jx h (x)j 1
h(x)2h(x)
f (x) =
:
0 otherwise
1316 CHAPTER 45. IMPROPER RIEMANN INTEGRALS (SDOGANATO)

At each natural number n 1, we then have

1 1
f (n + k) = n n2 2n jkj 8k 2 ;
n2n n2n

In words, there is an isosceles triangle centered at each n, with base 1=n2n 1 and height n.
The triangles have thus area 1=2n , so they shrink as n grows. In particular, the graph of f
is:

The function f is continuous { so, integrable on each interval [0; a] [0; 1) { as well as
unbounded because f (n) = n for all n 1. Moreover, since f is positive its integral function
F : [0; 1) ! R is positive and increasing, with
Z n+ n21n n
X
1 1 1
F n+ n = f (t) dt = =1 8n 1
n2 0 2k 2n
k=1

This implies that


Z 1
1 1
f (t) dt = lim F (x) = lim F n+ = lim 1 =1
0 x!1 n!1 n2n n!1 2n

We conclude that the continuous and unbounded P1 function f is integrable


R 1 on the un-
n
bounded domain [0; 1). In particular, since n=1 2 = 1, the integral 0 f (t) dt is the
sum of all areas of the isosceles triangles centered at each n 1. N
The integral $\int_{-\infty}^a f(x)\,dx$ on the domain of integration (−∞, a] is defined in a similar way to $\int_a^{\infty} f(x)\,dx$, by considering the limit $\lim_{x\to-\infty}\int_x^a f(t)\,dt$.

Example 1901 Let f : (−∞, 0] → R be given by f(x) = xe^{−x²}. We have
$$\int_{-\infty}^0 f(x)\,dx = \lim_{x\to-\infty}\int_x^0 te^{-t^2}\,dt = \lim_{x\to-\infty}\left(-\frac{1}{2}\right)\left(1 - e^{-x^2}\right) = -\frac{1}{2}$$
Therefore, the improper integral
$$\int_{-\infty}^0 xe^{-x^2}\,dx$$
exists and converges. N


Improper integrals and series have a clear analogy. Intuitively, integral functions are to improper integrals as partial sums are to series. The next result clarifies this important analogy.

Proposition 1902 Let f : [1, ∞) → R and set $a_n = \int_n^{n+1} f(x)\,dx$ for every n ≥ 1. If the integral $\int_1^{+\infty} f(x)\,dx$ converges, then the series $\sum_{n=1}^{\infty} a_n$ converges, with
$$\sum_{n=1}^{\infty} a_n = \int_1^{+\infty} f(x)\,dx$$
The converse is true if limₓ→+∞ f(x) = 0.

Proof "If". Assume that $\int_1^{+\infty} f(x)\,dx$ converges. Then,
$$\lim_{n\to+\infty} s_n = \lim_{n\to+\infty}\sum_{k=1}^n a_k = \lim_{n\to+\infty}\sum_{k=1}^n \int_k^{k+1} f(t)\,dt = \lim_{n\to+\infty}\int_1^{n+1} f(t)\,dt = \lim_{n\to+\infty} F(n+1) = \lim_{x\to+\infty} F(x) = \int_1^{+\infty} f(t)\,dt$$
where the penultimate equality follows from the sequential characterization of limits (cf. Proposition 528). We conclude that the series $\sum_{n=1}^{\infty} a_n$ converges to $\int_1^{+\infty} f(x)\,dx$, as desired.
"Only if". Assume that the series $\sum_{n=1}^{\infty} a_n$ converges to L and that limₓ→+∞ f(x) = 0. We want to show that limₓ→+∞ F(x) = L. Fix ε > 0. Let n_ε ≥ 1 be such that both |sₙ − L| < ε/2 and |f(x)| < ε/2 for all n ≥ n_ε and all x ≥ n_ε. Let [x] be the integer part of x ≥ 1 (Section 1.4.3). We have
$$|F(x) - L| = \left|\int_1^x f(t)\,dt - L\right| = \left|\int_1^{[x]} f(t)\,dt + \int_{[x]}^x f(t)\,dt - L\right| \le \left|\int_1^{[x]} f(t)\,dt - L\right| + \left|\int_{[x]}^x f(t)\,dt\right| \le \left|s_{[x-1]} - L\right| + \int_{[x]}^x |f(t)|\,dt < \frac{\varepsilon}{2} + \frac{\varepsilon}{2}\left(x - [x]\right) \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$$
for all x ≥ n_ε + 1 (so that [x − 1] = [x] − 1 ≥ n_ε). By setting M_ε = n_ε + 1 in (12.10), we conclude that limₓ→+∞ F(x) = L, as desired.
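The analogy can be checked on a concrete f, comparing partial sums of the aₙ with the integral function. A brief sketch; the choice f(x) = 1/x² is our own illustration:

```python
from scipy.integrate import quad

f = lambda x: 1 / x**2                   # converges: ∫_1^∞ dx/x² = 1
a = [quad(f, n, n + 1)[0] for n in range(1, 201)]
partial_sum = sum(a)                      # s_200, which equals F(201)
F_201, _ = quad(f, 1, 201)
print(partial_sum, F_201, 1 - 1/201)      # all agree: ≈ 0.995
```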

45.2 Integration on the real line

Let us now consider improper integration on the domain of integration (−∞, ∞).

Definition 1903 Let f : R → R be a function integrable on every compact interval. If the integrals $\int_a^{+\infty} f(x)\,dx$ and $\int_{-\infty}^a f(x)\,dx$ both exist, the function f is said to be integrable (in the improper sense) on R and we set
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_a^{+\infty} f(x)\,dx + \int_{-\infty}^a f(x)\,dx \tag{45.2}$$
provided we do not have an indeterminate form ∞ − ∞. The value $\int_{-\infty}^{+\infty} f(x)\,dx$ is called the improper (or generalized) Riemann integral of f on R.

It is easy to see that this definition does not depend on the choice of the point a ∈ R. Often, for convenience, we take a = 0.
The improper integral $\int_{-\infty}^{+\infty} f(x)\,dx$ is also called convergent or divergent according to whether its value is finite or equal to ±∞.
Next we illustrate this notion with a couple of examples. Note that it is necessary to compute separately the two integrals $\int_a^{+\infty} f(x)\,dx$ and $\int_{-\infty}^a f(x)\,dx$, whose values must then be summed (unless the indeterminate form ∞ − ∞ arises).

Example 1904 Let f : R → R be the constant function f(x) = k. We have
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_0^{+\infty} f(x)\,dx + \int_{-\infty}^0 f(x)\,dx = \lim_{x\to+\infty} kx + \lim_{x\to-\infty} (-kx) = \begin{cases} +\infty & \text{if } k > 0 \\ 0 & \text{if } k = 0 \\ -\infty & \text{if } k < 0 \end{cases}$$
In other words, $\int_{-\infty}^{+\infty} k\,dx = k\cdot\infty$ unless k = 0. N

The value of the integral in the previous example is consistent with the geometric interpretation of the integral as the area (with sign) of the region under f. Indeed, such a figure is a big rectangle with infinite base and height k. Its area is +∞ if k > 0, zero if k = 0, and −∞ if k < 0.
Example 1905 Let f : R → R be given by f(x) = xe^{−x²}. We have
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_0^{+\infty} f(x)\,dx + \int_{-\infty}^0 f(x)\,dx = \lim_{x\to+\infty}\int_0^x te^{-t^2}\,dt + \lim_{x\to-\infty}\int_x^0 te^{-t^2}\,dt = \lim_{x\to+\infty}\frac{1}{2}\left(1 - e^{-x^2}\right) + \lim_{x\to-\infty}\left(-\frac{1}{2}\right)\left(1 - e^{-x^2}\right) = \frac{1}{2} - \frac{1}{2} = 0$$
Therefore, the improper integral
$$\int_{-\infty}^{+\infty} xe^{-x^2}\,dx$$
exists and is equal to 0. N

Example 1906 Let f : R → R be given by f(x) = x. We have
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_0^{+\infty} f(x)\,dx + \int_{-\infty}^0 f(x)\,dx = \lim_{x\to+\infty}\int_0^x t\,dt + \lim_{x\to-\infty}\int_x^0 t\,dt = \lim_{x\to+\infty}\frac{x^2}{2} + \lim_{x\to-\infty}\left(-\frac{x^2}{2}\right) = \infty - \infty$$
So, the improper integral
$$\int_{-\infty}^{+\infty} x\,dx$$
does not exist because we have the indeterminate form ∞ − ∞. N
45.3 Principal values

Differently from Example 1904, the value of the integral in this last example is not consistent with the geometric interpretation of the integral. Indeed, look at the following picture:

[Figure: the graph of f(x) = x, with the region for x > 0 marked (+) and the region for x < 0 marked (−).]

The areas of the two regions under f for x < 0 and x > 0 are two "big triangles" that are, intuitively, equal because they are perfectly symmetric with respect to the vertical axis, but of opposite sign, as indicated by the signs (+) and (−) in the figure. It is then natural to think that they compensate each other, resulting in an integral equal to 0. Nevertheless, the last definition requires the separate calculation of the two integrals as x → +∞ and as x → −∞, which in this case generates the indeterminate form ∞ − ∞.

To try to reconcile improper integration on (−∞, +∞) with the geometric intuition, we can follow an alternative route by considering the single limit
$$\lim_{k\to+\infty}\int_{-k}^k f(x)\,dx$$
instead of the two separate limits in (45.2). This motivates the following definition.

Definition 1907 Let f : R → R be a function integrable on each interval [a, b]. The Cauchy principal value, denoted by $\mathrm{PV}\int_{-\infty}^{\infty} f(x)\,dx$, of the integral $\int_{-\infty}^{\infty} f(x)\,dx$ is given by
$$\mathrm{PV}\int_{-\infty}^{+\infty} f(x)\,dx = \lim_{k\to+\infty}\int_{-k}^k f(x)\,dx$$
whenever the limit exists in $\overline{R}$.

In place of the two limits upon which the definition of the improper integral is based, the principal value considers only the limit of $\int_{-k}^k f(x)\,dx$. We will see in the examples below that, with this definition, the geometric intuition of the integral as the area (with sign) of the region under f is preserved. It is, however, a weaker notion than the improper integral. Indeed:
1320 CHAPTER 45. IMPROPER RIEMANN INTEGRALS (SDOGANATO)

(i) when the improper integral exists, also the principal value exists and one has
Z +1 Z +1
PV f (x) dx = f (x) dx
1 1

because by Proposition 536-(i) we have


Z +1 Z k Z k Z 0
PV f (x) dx = lim f (x) dx = lim f (x) dx + f (x) dx
1 k!+1 k k!+1 0 k
Z k Z 0
= lim f (x) dx + lim f (x) dx
k!+1 0 k!+1 k
Z k Z 0 Z 1
= lim f (x) dx + lim f (x) dx = f (x) dx
k!+1 0 k! 1 k 1

(ii) the principal value may exist also Rwhen the improper integral does not exist: in Ex-
+1
ample 1906 the improper integral 1 xdx does not exist, yet
Z +1 Z k
PV xdx = lim xdx = 0
1 k!+1 k
R +1
and therefore PV 1 xdx exists and is nite.

In sum, the principal value may exist even when the improper integral does not. To better illustrate this key relation between the two notions of integral on (−∞, ∞), let us consider a more general version of Example 1906.

Example 1908 Let f : R → R be given by f(x) = x + β, with β ∈ R. We have
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_0^{+\infty} f(x)\,dx + \int_{-\infty}^0 f(x)\,dx = \lim_{x\to+\infty}\int_0^x (t+\beta)\,dt + \lim_{x\to-\infty}\int_x^0 (t+\beta)\,dt = \lim_{x\to+\infty}\left(\frac{x^2}{2} + \beta x\right) + \lim_{x\to-\infty}\left(-\frac{x^2}{2} - \beta x\right) = \infty - \infty$$
So the improper integral
$$\int_{-\infty}^{\infty} (x+\beta)\,dx$$
does not exist because we have the indeterminate form ∞ − ∞. By taking the principal value, we have
$$\mathrm{PV}\int_{-\infty}^{+\infty} f(x)\,dx = \lim_{k\to+\infty}\int_{-k}^k (x+\beta)\,dx = \lim_{k\to+\infty}\left(\int_{-k}^k x\,dx + 2\beta k\right) = 2\beta\lim_{k\to+\infty} k = \begin{cases} +\infty & \text{if } \beta > 0 \\ 0 & \text{if } \beta = 0 \\ -\infty & \text{if } \beta < 0 \end{cases}$$
The principal value thus exists: $\mathrm{PV}\int_{-\infty}^{+\infty}(x+\beta)\,dx = \pm\infty$, with the sign of β, unless β is zero. N
In the last example the principal value agrees with the geometric intuition of the integral as area with sign. Indeed, when β = 0 the intuition is obvious (see the figure and the comment after Example 1906). In the case β > 0, look at the figure

[Figure: the graph of f(x) = x + β with β > 0; the symmetric big triangles are marked (+) and (−), and the remaining region above the horizontal axis is dotted.]

The negative area of the "big triangle" indicated by (−) in the negative part of the horizontal axis is equal and opposite to the positive area of the big triangle indicated by (+) in the positive part of the horizontal axis. If we imagine that such areas cancel each other, what "is left" is the area of the dotted figure, which is clearly infinite and with + sign (lying above the horizontal axis). For β < 0 similar considerations hold:

[Figure: the graph of f(x) = x + β with β < 0; the symmetric big triangles are marked (+) and (−), and the remaining region below the horizontal axis is dotted.]

The negative area of the "big triangle" indicated by (−) in the negative part of the horizontal axis is equal and opposite to the positive area of the big triangle indicated by (+) in the positive part of the horizontal axis. If we imagine that such areas cancel each other out, what "is left" is here again the area of the dotted figure, which is clearly infinite and with negative sign (lying below the horizontal axis).
Example 1909 Let f : R → R be given by f(x) = x/(1 + x²). We have
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_0^{+\infty} f(x)\,dx + \int_{-\infty}^0 f(x)\,dx = \lim_{x\to+\infty}\int_0^x \frac{t}{1+t^2}\,dt + \lim_{x\to-\infty}\int_x^0 \frac{t}{1+t^2}\,dt = \lim_{x\to+\infty}\frac{1}{2}\log\left(1+x^2\right) + \lim_{x\to-\infty}\left(-\frac{1}{2}\log\left(1+x^2\right)\right) = \infty - \infty$$
Therefore, the improper integral does not exist because we have the indeterminate form ∞ − ∞. By calculating the principal value, we have instead
$$\mathrm{PV}\int_{-\infty}^{+\infty} f(x)\,dx = \lim_{k\to+\infty}\int_{-k}^k \frac{x}{1+x^2}\,dx = \lim_{k\to+\infty}\left(\frac{1}{2}\log\left(1+k^2\right) - \frac{1}{2}\log\left(1+k^2\right)\right) = 0$$
and so
$$\mathrm{PV}\int_{-\infty}^{+\infty} \frac{x}{1+x^2}\,dx = 0$$
N
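The symmetric-truncation idea behind the principal value is immediate to code. A sketch for f(x) = x/(1 + x²), our own illustration:

```python
from scipy.integrate import quad

f = lambda x: x / (1 + x**2)

for k in (10.0, 100.0, 1000.0):
    sym, _ = quad(f, -k, k, limit=200)   # symmetric truncation: → PV = 0
    print(k, sym)

# The one-sided integrals diverge, so the improper integral itself does
# not exist: quad(f, 0, k)[0] grows like (1/2) log(1 + k²).
print(quad(f, 0, 1000.0, limit=200)[0])
```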

45.4 Properties and criteria

We now give some properties of improper integrals, as well as some criteria of improper integrability, i.e., sufficient conditions for a function f defined on an unbounded domain to have an improper integral. For simplicity, we limit ourselves to the domain [a, +∞), leaving to the reader the analogous versions of these criteria for (−∞, a] and (−∞, +∞).

45.4.1 Properties
Being defined as limits, the properties of improper integrals follow from the properties of limits of functions (Section 12.4). In particular, the improper integral retains the linearity and monotonicity properties of the Riemann integral.
Let us begin with linearity, which follows from the algebra of limits established in Proposition 536.

Proposition 1910 Let f, g : [a, +∞) → R be two functions integrable on [a, +∞). Then, for every α, β ∈ R, the function αf + βg : [a, +∞) → R is integrable on [a, +∞) and
$$\int_a^{+\infty} (\alpha f + \beta g)(x)\,dx = \alpha\int_a^{+\infty} f(x)\,dx + \beta\int_a^{+\infty} g(x)\,dx \tag{45.3}$$
provided the right-hand side is not an indeterminate form ∞ − ∞.


Proof By the linearity of the Riemann integral, and by points (i) and (ii) of Proposition 536, we have
$$\lim_{x\to+\infty}\int_a^x (\alpha f + \beta g)(t)\,dt = \lim_{x\to+\infty}\left(\alpha F(x) + \beta G(x)\right) = \alpha\lim_{x\to+\infty} F(x) + \beta\lim_{x\to+\infty} G(x) = \alpha\int_a^{+\infty} f(x)\,dx + \beta\int_a^{+\infty} g(x)\,dx$$
which implies the improper integrability of the function αf + βg and (45.3).

The property of monotonicity of limits of functions (see Proposition 535 and its scalar variant) yields the monotonicity property of the improper integral.

Proposition 1911 Let f, g : [a, +∞) → R be two functions integrable on [a, +∞). If f ≤ g, then $\int_a^{+\infty} f(x)\,dx \le \int_a^{+\infty} g(x)\,dx$.

Proof Thanks to the monotonicity of the Riemann integral, F(x) ≤ G(x) for every x ∈ [a, +∞). By the monotonicity of limits of functions, we therefore have limₓ→+∞ F(x) ≤ limₓ→+∞ G(x).

As we saw in Example 1904, $\int_a^{+\infty} 0\,dx = 0$. So, a simple consequence of Proposition 1911 is that $\int_a^{+\infty} f(x)\,dx \ge 0$ whenever f is positive and integrable on [a, +∞).

45.4.2 Integrability criteria


We give now some integrability criteria. The best behaved series are the ones with posi-
tive terms. In view of the analogy between improper integrals and series (cf. Proposition
1902), it is then not surprising that the best behaved improper integrals are the ones with
positive integrands. We thus begin by studying integrability criteria for positive functions
f : [a; +1) ! R.
In this case, the integral function F : [a; +1) ! R is increasing. Indeed, for every
x2 x1 a,
Z x2 Z x1 Z x2 Z x1
F (x2 ) = f (t) dt = f (t) dt + f (t) dt f (t) dt = F (x1 )
a a x1 a
Rx
since x12 f (t) dt 0. Thanks to the monotonicity of the integral function, we have the
following characterization of improper integrals of positive functions.

Proposition 1912 Let f : [a; +1) ! R be a function positive and integrable on every
interval [a; b] [a; +1). Then, f is integrable on [a; +1) and
Z +1
f (t) dt = sup F (x) (45.4)
a x2[a;+1)

R1
In particular, a f (t) dt converges only if limx!+1 f (x) = 0 (provided this limit exists).
1324 CHAPTER 45. IMPROPER RIEMANN INTEGRALS (SDOGANATO)

RPositive
+1
functions f : [a; +1) ! R are thereforeRintegrable in an improper sense, that
1
is, a f (t) dt 2 [0; 1]. In particular, their integral a f (t) dt either converges or diverges
positively: tertium non datur. We have convergence if and only if supx2[a;+1) F (x) < +1,
and
R +1 only if f is in nitesimal as x ! +1 (provided limx!+1 f (x) exists). Otherwise,
a f (t) dt diverges positively.

The condition limx!+1 f (x) = 0 is only necessary for convergence, as Example 1897
with 0 < 1 shows. For instance, if = 1 we have limx!+1 1=x = 0, but for every a > 0
we have Z +1 Z x
1 1 x
dt = lim dt = lim log = +1
a t x!+1 a t x!+1 a
R +1
and therefore a (1=t) dt diverges positively.

In stating the necessary condition limx!+1 f (x) = 0 we put the clause \provided this
limit exists". The next simple example
R 1 shows that the clause is important because the limit
may not exist even if the integral a f (t) dt converges.

Example 1913 Let f : [0; 1) ! R be given by


(
1 if x 2 N
f (x) =
0 otherwise
Rx
By Proposition 1853, it is easy to see that 0 f (t) dt = 0 for every x > 0 and, therefore,
R1
0 f (x) dx = 0. Nevertheless, limx!+1 f (x) does not exist. N

The proof of Proposition 1912 rests on the following simple property of limits of monotone
functions, which is the version for functions of Theorem 323 for monotone sequences.

Lemma 1914 Let ' : [a; +1) ! R be an increasing function. Then, limx!+1 ' (x) =
supx2[a;+1) ' (x).

Proof Let us consider rst the case supx2[a;+1) ' (x) 2 R. Let " > 0. Since supx2[a;+1) ' (x) =
sup ' ([a; +1)), thanks to Proposition 127 there exists x" 2 [a; +1) such that ' (x" ) >
supx2[a;+1) ' (x) ". Since ' is increasing, we have

sup ' (x) " < ' (x" ) ' (x) sup ' (x) 8x x"
x2[a;+1) x2[a;+1)

So, limx!+1 ' (x) = supx2[a;+1) ' (x).


Suppose now that supx2[a;+1) ' (x) = +1. For every M > 0 there exists xM 2 [a; +1)
such that ' (xM ) M . The increasing monotonicity implies ' (x) ' (xM ) M for every
x xM , and therefore limx!+1 ' (x) = +1.

Proof of Proposition 1912 Since f is positive, its integral function F : [a; +1) ! R is
increasing and therefore, by Lemma 1914,

lim F (x) = sup F (x)


x!+1 x2[a;+1)
45.4. PROPERTIES AND CRITERIA 1325

Suppose that limx!+1 f (x) exists. Let us show that the integral converges only if limx!+1 f (x) =
0. Suppose, by contradiction, that limx!+1 f (x) = L 2 (0; +1]. Given 0 < " < L, there
exists x" > a such that f (x) L " > 0 for every x x" . Therefore
Z +1 Z x" Z +1 Z +1 Z x
f (t) dt = f (t) dt + f (t) dt f (t) dt = lim f (t) dt
a a x" x" x!+1 x
"
Z x
lim (L ") dt = (L ") lim (x x" ) = +1
x!+1 x x!+1
"
R +1
i.e., a f (t) dt diverges positively.

The next result is a simple comparison criterion to determine if the improper integral of
a positive function is convergent or divergent.
Corollary 1915 Let f; g : [a; +1) ! R be two positive functions integrable on every [a; b]
[a; +1), with f g. Then
Z +1 Z +1
g (x) dx 2 [0; 1) =) f (x) dx 2 [0; 1) (45.5)
a a
and Z Z
+1 +1
f (x) dx = +1 =) g (x) dx = +1 (45.6)
a a
2
The study of integral (45.1) of the Gaussian function f (x) = e x , to which we will
devote the next section, is a remarkable application of this corollary.
R +1 R +1
Proof By Proposition 1911, a f (x) dx g (x) dx, while thanks to Proposition
R +1 R +1 a R +1
1912 we have a f (x) dx 2 [0; 1] and a g (x) dx 2 [0; 1]. Therefore, a f (x) dx
R +1 R +1 R +1
converges if a g (x) dx converges, while a g (x) dx diverges positively if a f (x) dx
diverges positively.

Finally, we report an important asymptotic criterion of integrability based on the asymp-


totic nature of the improper integral. We omit the proof.
Proposition 1916 Let f; g : [a; 1) ! R be positive functions integrable on every interval
[a; b] [a; 1).
R +1
(i) If f g as x ! +1, then a g (x) dx converges (diverges positively) if and only if
R +1
a f (x) dx converges (diverges positively).
R +1 R +1
(ii) If f = o (g) as x ! +1 and a g (x) dx converges, then so does a f (x) dx.
R +1 R +1
(iii) If f = o (g) as x ! +1 and a f (x) dx diverges positively, then so does a g (x) dx.
R +1
In light of Example 1897, this proposition implies that a f (x) dx converges if there
exists > 1 such that
1 1
f or f = o as x ! +1
x x
The comparison with powers x is an important convergence criterion for improper inte-
grals, as the next two examples show.
1326 CHAPTER 45. IMPROPER RIEMANN INTEGRALS (SDOGANATO)

Example 1917 Let a > 0 and f : [a; 1) ! R be the positive function given by
1 1
sin3 x + x2
f (x) = 1 1
x + x3

As x ! +1, we have
1
f
x
R +1
By Proposition 1916, 0 f (x) dx = +1, i.e., the integral diverges positively. N

Example 1918 Let a > 0 and f : [a; 1) ! R be a positive function given by


1
f (x) = x sin
x
with < 0. As x ! +1, we have
1
f
x1
R +1
By Proposition 1916, 0 f (x) dx 2 [0; 1), i.e., the integral converges. N

N.B. As the reader can check, what has been proved for positive functions extends easily to
functions f : [a; +1) ! R that are eventually positive, that is, such that there exists c > a
for which f (x) 0 for every x c. O

45.4.3 Absolute convergence


We now discuss integrability criteria for general integrands.

De nition 1919 Let f : [a; +1) ! R be a function integrable on every interval [a; b]
R +1
[a; +1). The improper integral a f (x) dx is said to be absolutely convergent if the im-
R +1
proper integral a jf (x)j dx is convergent.

Like for series, absolute convergence plays a key role also for improper integration, as the
next result clari es.

Theorem 1920 Let f : [a; +1)R ! R be a function integrable on every interval [a; b]
+1
[a; +1). The improper integral a f (x) dx converges if it converges absolutely. In this
case, Z Z
+1 +1
f (x) dx jf (x)j dx
a a

This result is the analog of Theorem 402 and is an easy consequence { by taking g = jf j
{ of the following lemma, which is in turn the analog of Lemma 406.

Lemma 1921 Let f : [a; +1) ! R be a function integrable on every interval [a; b]
R[a;+1
+1). Suppose that g : [a; +1) ! R is positive and integrable on [a; +1) and such that
a g (x) dx converges, with

(i) f + g 0,
45.5. GAUSS INTEGRAL 1327

(ii) f kg for some k > 0.


R +1 R +1
Then, both integrals a (f + g) (x) dx and a f (x) dx converge, with
Z +1 Z +1 Z +1
f (x) dx = (f + g) (x) dx g (x) dx
a a a

Proof Set h = f +Rg. Since 0 h (1 + k) g, by the comparison criterion (Corollary 1915)


R +1
+1
the convergence of a g (x) dx implies that of a h (x) dx. By (45.3), the result follows.

Theorem 1920 permits to study the integrability of general integrands through the inte-
grability criteria previously established for positive integrands. For instance, the comparison
criterion for positive integrands (Corollary
R +1 1915) implies the following general comparison
criterion: the improper integral a f (x) dx converges if there exists a positive function
R +1
g : [a; +1) ! R with jf j g for which a g (x) dx converges. In symbols,
Z +1 Z +1
jf j g and g (x) dx 2 [0; 1) =) f (x) dx 2 R (45.7)
a a
R +1
Indeed, by Corollary 1915 the condition jf j g ensures the convergence of jf j (x) dx,
R +1 a
and so that of a f (x) dx by Theorem 1920.

Example 1922 De ne f : [1; 1) ! R by


cos x
f (x) =
x2
We have
jcos xj 1
8x 1
x2 x2
R1
Since 1 1=x2 dx converges (Example 1897), by (45.7) we conclude that the improper
integral Z 1
cos x
dx
1 x2
converges. N

We close by observing that, as for series, also for improper integrals the converse of
Theorem 1920 fails: there exist improper integrals that converge but not absolutely, as
readers can check.

45.5 Gauss integral


2
Consider the Gaussian function f : R ! R given by Rf (x) = e x . Since it is positive,
+1
Proposition 1912 guarantees that the improper integral a f (x) dx exists for every a 2 R.
Let us show that it converges. De ne g : R ! R by
x
g (x) = e
1328 CHAPTER 45. IMPROPER RIEMANN INTEGRALS (SDOGANATO)

If x > 0, we have
2
g (x) f (x) () e x e x () x x2 () x 1
R +1 R +1
By (45.5) of Corollary 1915, if 1 g (x) dx converges, then also 1 f (x) dx converges. In
R +1
turn, this implies that a f (x) dx converges for every a 2 R. This is obvious if a 1. If
a < 1, we have Z Z Z
+1 1 +1
f (x) dx = f (x) dx + f (x) dx (45.8)
a a 1
R1 R1
Since a f (x) dx exists
R1 because of the continuity of f on [a; 1], the convergence of 1 f (x) dx
then implies that of a f (x) dx.
R +1
Thus, it remains to show that 1 g (x) dx converges. We have
Z x
G (x) = g (t) dt = e 1 e x
1

Hence, (45.4) implies


Z 1
1
g (x) dx = sup G (x) = e < +1
1 x2[1;1)
R +1
It follows that 1 f (x) dx converges, as desired.
In conclusion, the integral Z +1
x2
e dx
a
is convergent for every a 2 R. The computation of this integral is not simple at all. Although
we omit the proof, we report a beautiful result for a = 0 due to Gauss (here as never princeps
mathematicorum).

Theorem 1923 (Gauss) It holds


Z +1 p
x2
e dx = (45.9)
0 2
It is possible to prove in a similar way that
Z 0 p
x2
e dx = (45.10)
1 2
The equality between integrals (45.9) and (45.10) is quite intuitive in light of the symmetry
of the Gaussian bell with respect to the vertical axis.
Thanks to De nition 1903, the Gauss integral { i.e., the integral of the Gaussian function
{ has therefore value
Z +1 Z +1 Z 0
x2 x2 2 p
e dx = e dx + e x dx = (45.11)
1 0 1

The Gauss integral is central in probability theory, where it is usually presented in the form:
Z +1
1 x2
p e 2 dx
1 2
45.6. UNBOUNDED FUNCTIONS 1329

By proceeding by substitution, it is easy to verify that, for every pair of scalars a 2 R and
b > 0, one has Z +1
(x+a)2 p
e b2 dx = b (45.12)
1
p
By setting b = 2 and a = 0, we then have
Z +1
1 x2
p e 2 dx = 1 (45.13)
1 2

The improper integral on R of the function


1 x2
f (x) = p e 2
2

has therefore unit value and, thus, it is a density function (as it will be seen in Section 48.8).
This explains the importance of this particular form of the Gaussian function.

45.6 Unbounded functions


The second case of improper integration involves a function continuous on a bounded interval
[a; b] except at some points in a neighborhood of which it is unbounded and the limit of the
function at such points is 1.
It is enough to consider the case of only one such point { when there are a few of them,
it is enough to examine them one by one. Next we consider the case in which this point is
the supremum b of the interval.

De nition 1924 Let f : [a; b) ! R be a continuous function such that limx!b f (x) = 1.
If Z z
lim f (x) dx = lim [F (z) F (a)]
z!b a z!b

exists ( nite or in nite), the function f is said to be integrable in an improper sense on


Rb Rb
[a; b] and this limit is denoted by a f (x) dx. The value a f (x) dx is called improper (or
generalized) Riemann integral.

If the unboundedness of the function concerns the other endpoint a, or both endpoints,
we can give a similar de nition based on limz!a+ .

Example 1925 Let f : [a; b) ! R be given by

f (x) = (b x) with >0

The integral function of f is such that for each x 2 [a; b)


8
>
> (b x) +1
< for 0 < 6= 1
F (x) = 1
>
>
: log (b x) for = 1
1330 CHAPTER 45. IMPROPER RIEMANN INTEGRALS (SDOGANATO)

So,
0 if 0 < < 1
lim F (x) =
x!b +1 if 1
It follows that the improper integral
Z b
1
dx
a (b x)

exists for every > 0: it converges if 0 < < 1 and diverges positively if 1. N

R b Proposition 1916 holds also for these improper integrals and allows us to state that
a f (x) dx converges if there exists 2 (0; 1) such that

1 1
f or f = o as x ! b
(b x) (b x)

The comparison with (b x) is an important convergence criterion for these improper


integrals.

The next example requires the version of last de nition involving the limit limz!a+ .

Example 1926 Consider the improper integral


Z 1
1
p dx
0 x

We have
Z 1 Z 1
1 1 p 1 p
p dx = lim p dx = lim 2 x z
= lim 2 2 z =2
0 x z!0+ z x z!0+ z!0+

O.R. Intuitively, when the interval is unbounded, for the improper integral to converge the
function f must converge to zero quite rapidly{ e.g., as x with > 1. When the function
is unbounded, instead, f must converge to in nity fairly slowly{ e.g., as x with 2 (0; 1).
Both things are quite natural: for the area of an unbounded surface to be nite, its portion
\that escapes to in nity" must be very narrow.
To see what may go wrong,Rconsider for instance the function f : (0; 1) ! (0; 1) de ned
1
by f (x) = 1=x. Observe that 1 f (x) dx = limx!+1 log x = +1, so the integral diverges.
R1
Similarly, 0 f (x) = limz!0+ log x = +1 and again the integral diverges. H
Chapter 46

Parametric Riemann integrals


(sdoganato)

Consider a function of two variables


f : [a; b] [c; d] ! R
de ned on a rectangle [a; b] [c; d] in R2 . If for every 2 [c; d] the scalar function f ( ; ) :
[a; b] ! R is integrable on [a; b], then to every such we can associate the scalar
Z b
f (x; )dx (46.1)
a

Unlike the integrals seen so far, the value of the de nite integral (46.1) depends on the value
of the variable , which is usually interpreted as a parameter (so the choice of the symbol
). Such an integral, referred to as parametric integral, therefore de nes a scalar function
F : [c; d] ! R in the following way:
Z b
F( ) = f (x; )dx (46.2)
a
Note that, although function f is of two variables, the function F is scalar. Indeed, it does
not depend in any way on the variable x, which here plays the role of a mute variable of
integration.
Functions of type (46.2) appear in applications more frequently than one may initially
think. Therefore, having the appropriate instruments to study them is important.

46.1 Properties
We will study two properties of the function F , namely continuity and di erentiability. Let
us start with continuity.

Proposition 1927 If f : [a; b] [c; d] ! R is continuous, then the function F : [c; d] ! R


is continuous, that is,
Z b Z b
lim f (x; )dx = lim f (x; )dx 8 0 2 [c; d] (46.3)
! 0 a a ! 0

1331
1332 CHAPTER 46. PARAMETRIC RIEMANN INTEGRALS (SDOGANATO)

Formula (46.3) is referred to as \passage of the limit under the integral sign".

Proof Take " > 0. We must show that there exists a > 0 such that

2 [c; d] \ ( 0 ; 0 + ) =) jF ( ) F ( 0 )j < "

By the linearity and monotonicity of the integral, we have

Zb Zb
jF ( ) F ( 0 )j = (f (x; ) f (x; 0 )) dx jf (x; ) f (x; 0 )j dx
a a

By hypothesis, f is continuous on the compact set [a; b] [c; d]. By Theorem 603, it is
therefore uniformly continuous on [a; b] [c; d] , so there is a > 0 such that
"
k(x; ) (x0 ; 0 )k < =) jf (x; ) f (x0 ; 0 )j < (46.4)
b a
for every (x; ) 2 [a; b] [c; d]. Therefore, for every 2 [c; d] \ ( 0 ; 0 + ) we have

k(x; ) (x; 0 )k =j 0j <

which, thanks to (46.4), implies that

Zb
"
jF ( ) F ( 0 )j jf (x; ) f (x; 0 )j dx < (b a) = "
b a
a

as desired.

The second result analyzes the di erentiability of the function F .

Proposition 1928 Suppose that f : [a; b] [c; d] ! R and its partial derivative @f =@ :
[a; b] [c; d] ! R are both continuous.1 Then, the function F : [c; d] ! R is di erentiable on
(c; d), with
Z b
0 @
F ( )= f (x; )dx (46.5)
a @

Formula (46.5) is referred to as \di erentiation under the integral sign". Since

Zb
0 F ( + h) F ( 0) f (x; + h) f (x; )
F ( ) = lim = lim dx
h!0 h h!0 h
a

and Z Z
b b
@ f (x; + h) f (x; )
f (x; )dx = lim dx
a @ a h!0 h
1
That is, the section f (x; ) : [c; d] ! R is di erentiable, i.e., the partial derivative @f (x; ) =@ exists for
each (x; ) 2 [a; b] [c; d].
46.1. PROPERTIES 1333

formula (46.5) is then equivalent to


Zb Z b
f (x; + h) f (x; ) f (x; + h) f (x; )
lim dx = lim dx
h!0 h a h!0 h
a

that is, to exchange the order of limits and integrals.

Proof Let 0 2 (c; d). For every x 2 [a; b] the function f (x; ) : [c; d] ! R is by hypothesis
di erentiable. Take h small enough so that 0 + h 2 [c; d]. By the Mean Value Theorem,
there exists x 2 [0; 1] such that
f (x;
+ h) f (x; 0 ) 0 @f
= (x; 0 + x h)
h @
Being f continuous, the di erence quotient above is continuous in x, thus integrable (note
that x depends on x). Let us write the di erence quotient of function F at 0 :
Zb
F( 0 + h) F ( 0) @f
(x; 0 ) dx (46.6)
h @
a
Zb Zb
f (x; 0 + h) f (x; 0) @f
= dx (x; 0 ) dx
h @
a a
Zb
@f @f
= (x; 0 + x h) (x; 0) dx
@ @
a
Zb
@f @f
(x; 0 + x h) (x; 0) dx
@ @
a

The partial derivative @f =@ is continuous on the compact set [a; b] [c; d], so it is also
uniformly continuous. Thus, given any " > 0, there exists a > 0 such that
@f @f "
k(x; ) (x; 0 )k < =) (x; ) (x; 0) < (46.7)
@ @ b a
for every 2 [c; d]. Therefore, for jhj < we have that
k(x; 0 + x h) (x; 0 )k = x jhj jhj < 8x 2 [a; b]
Thanks to conditions (46.6) and (46.7), this implies that
Zb
F( 0 + h) F ( 0) @f
(x; 0 ) dx <" 8jhj <
h @
a

proving that
Zb
F( 0 + h) F ( 0) @f
lim = (x; 0 ) dx
h!0 h @
a
as desired.
1334 CHAPTER 46. PARAMETRIC RIEMANN INTEGRALS (SDOGANATO)

2
Example 1929 Set f (x; ) = x2 + x and
Z b
F( )= x2 + 2
x dx
a

As the hypotheses of Proposition 1928 are satis ed, we di erentiate under the integral sign:
Z b
F0 ( ) = 2 xdx = b2 a2
a

46.2 Variability: Leibniz's rule


Consider the general case in which also the limits of the integral are functions of the variable
. Speci cally, let
; : [c; d] ! [a; b]
be two functions de ned on [c; d] taking values on [a; b]. Given f : [a; b] [c; d] R2 ! R,
de ne G : [c; d] ! R by
Z( )
G( ) = f (x; )dx (46.8)
( )

The following result extends Proposition 1928 to the case of variable limits of integration.

Proposition 1930 Suppose that f : [a; b] [c; d] R2 ! R and its partial derivative @f =@
are both continuous. If ; : [c; d] ! (a; b) are continuously di erentiable, then G : [c; d] ! R
is di erentiable on (c; d), with
Z ( )
@f
G0 ( ) = (x; )dx + 0
( )f ( ( ); ) 0
( )f ( ( ); ) (46.9)
( ) @

Formula (46.9) is referred to as Leibniz's rule. Heuristically, we can derive this rule when
via the auxiliary function H : [a; b] [c; d] ! R de ned by

Zx
H (x; ) = f (t; ) dt (46.10)
a

Indeed, we can then write G( ) = H ( ( ) ; ) H ( ( ) ; ) and so, by the Chain rule,

@H @H @H @H
G0 ( ) = ( ( ); ) ( ( ); ) + ( ( ); ) 0( ) ( ( ); ) 0
( )
@ @ @x @x
Z ()
@f
= (x; ) dx + 0 ( )f ( ( ); ) 0
( )f ( ( ); )
( ) @

The proof makes rigorous and generalizes this argument.


46.2. VARIABILITY: LEIBNIZ'S RULE 1335

Proof The auxiliary function (46.10) has two sections H : [a; b] ! R and H x : [c; d] ! R
de ned by H (x) = H (x; ) and H x ( ) = H (x; ).2 Fix 2 (c; d). Since f is continuous,
by the Second Fundamental Theorem of Calculus the section H is di erentiable, with

@H dH
(x; ) = (x) = f (x; ) 8x 2 [a; b]
@x dx
Since f is continuous, this implies that @H=@x is a continuous function on (a; b) (c; d). By
Proposition 1928, the section H is, for each x, di erentiable on (c; d), with
Z x
@H dH x @
(x; ) = (x) = f (t; )dt 8 2 (c; d)
@ d a @

Since @f =@ is continuous, by Propositions 1881 and 1881 the function @H=@ is, on the
open rectangle (a; b) (c; d), Lipschitz continuous in x and continuous in . In particular,
the Lipschitz constant can be chosen to be independent of . This implies that @H=@ is
continuous on (a; b) (c; d). Indeed, if take a sequence f(xn ; n )g (a; b) (c; d) that
converges to a vector (x; ) 2 (a; b) (c; d), we have

@H @H @H @H @H @H
(x; ) (xn ; n) (x; ) (x; n ) + (x; n ) (xn ; n)
@ @ @ @ @ @
jH (x; ) H (x; n )j + M jx xn j ! 0

Since also @H=@x is continuous, by Theorem 1271 we conclude that H is di erentiable on


(a; b) (c; d).
Assume that ( ) ( ). By Corollary 1864,

Z( )

G( ) = f (x; ) dx = H ( ( ) ; ) H ( ( ); ) (46.11)
( )

Since H is di erentiable and and are continuously di erentiable, by the Chain rule
(Theorem 1296) and Corollary 1864 we have

@H @H @H @H
G0 ( ) = ( ( ); ) 0( ) + ( ( ); ) ( ( ); ) 0
( ) ( ( ); )
@x @ @x @
@H @H @H @H
= ( ( ); ) ( ( ); ) + ( ( ); ) 0( ) ( ( ); ) 0
( )
@ @ @x @x
Z ()
@f 0 0
= (x; ) dx + ( )f ( ( ); ) ( )f ( ( ); )
( ) @

We are left with the case ( )> ( ). Recall that in this case
Z ( ) Z ( )
f (x; ) dx = f (x; ) dx = G( ) 8 2 (c; d)
( ) ( )

2
See Section 20.4.1.
1336 CHAPTER 46. PARAMETRIC RIEMANN INTEGRALS (SDOGANATO)

From the previous part of the proof,


Z ()
@f
G0 ( ) = (x; ) dx + 0
( )f ( ( ) ; ) 0
( )f ( ( ); )
( ) @

proving the statement.


2
Example 1931 (i) Let f (x; ) = x2 + , ( ) = sin and ( ) = cos . Set

Z
cos

G( ) = x2 + 2
dx
sin

The hypotheses of Proposition 1930 are satis ed at any compact interval, so by Leibniz's
rule we have:
Z
cos

G0 ( ) = 2 dx sin cos2 + 2
cos sin2 + 2

sin
Z
cos

= 2 dx sin cos2 + 2
+ cos sin2 + 2

sin
= 2 (cos sin ) sin cos2 + 2 + cos sin2 + 2

p
(ii) Let f (x; ) = sin x, (x) = and (x) = 3 . Set

Z3
G( ) = sin x dx
p

The hypotheses of Proposition 1930 are satis ed at any compact interval of (0; 1), so by
Leibniz's rule we have:
Z3
1 3
G0 ( ) = x cos x dx + 3 2
sin 4
p sin 2

p 2

46.3 Improper integrals


In applications the parametric integral (46.1) is often improper. Let f : I R2 ! R be
a function de ned on the rectangle I in R2 whose \sides" I and are any two closed,
boundedRor unbounded, intervals of the real line. For example, if I = R and if the improper
+1
integral 1 f (x; )dx converges for every 2 , then the function F : ! R is de ned by
Z +1
F( )= f (x; )dx (46.12)
1
46.3. IMPROPER INTEGRALS 1337

The extension of Proposition 1928 to the improper case is a delicate issue that requires a
dominance condition. For simplicity, in the statement we make the assumption that I is the
real line and a compact interval. An analogous result, which we omit for brevity, holds
when I is a half-line and an unbounded interval.

Proposition 1932 Let f : R [c; d] ! R be continuous and, for each x 2 R, di erentiable


R +1
in 2 (c; d). If there exists a positive function g : R ! R such that 1 g(x)dx converges,
with

@
jf (x; )j g (x) and f (x; ) g (x) 8 (x; ) 2 R (c; d) (46.13)
@

then F : [c; d] ! R is di erentiable on (c; d), with


Z +1
0 @
F ( )= f (x; )dx (46.14)
1 @

The proof of this result is not simple, so we omit it. Note that the dominance condition
(46.13), which is based on the auxiliary function g, guarantees inter alia that the integrals
Z +1 Z +1
@
f (x; )dx and f (x; )dx
1 1 @

converge thanks to the comparison convergence criterion stated in Corollary 1915.

Example 1933 Let F : [ 1; 1] ! R be given by


Z +1
2
x2
F( )= sin x e dx
1

We have
@ 2
x2
f (x; ) = 2 sin x e
@
and so @f (x; ) =@ is continuous
R +1 on R ( 1; 1). Let g be the Gaussian-type function
2 2
g (x) = 2e x . We have 1 2e x dx < +1 and, for each 2 ( 1; 1),

2 2 2 2
x2 x2 x2 x2 x2
sin x e = jsin xj e e =e e e g (x)

as well as
2 2
x2 x2
2 sin x e = 2 j j sin x e g (x)

The hypotheses of Proposition 1932 are satis ed, so formula (46.14) takes the form
Z +1 Z +1
0 @ 2
x2 2
x2
F ( )= sin x e dx = 2 sin x e dx = 2 F( )
1 @ 1
1338 CHAPTER 46. PARAMETRIC RIEMANN INTEGRALS (SDOGANATO)

46.4 Dirichlet integral: the tree and the forest


In this section we study the Dirichlet integral
Z +1
sin x
dx
0 x

Besides its intrinsic interest, it will help us to illustrate a key methodological principle:
sometimes the best way to deal with a tree is to address rst the entire forest, and then to
go back to the tree itself.
We rst establish the convergence of the Dirichlet integral, and then we solve it.

Proposition 1934 The Dirichlet integral converges.

Proof De ne f : [0; 1) ! R by
sin x
x if x > 0
f (x) =
1 if x = 0
R +1
We can write the Dirichlet integral as 0 f (x) dx. Since limx!0 (sin x) =x = 1, the inte-
Rb
grand is continuous. So, the integral 0 f (x) dx exists for all b > 0. By integrating by parts,
for all 0 < a < b, we have3
Z b Z b
sin x cos b cos a cos x
dx = + dx
a x b a a x2

We now split the domain of integration in the same way as we did to solve the Gauss integral
{ cf. (45.8). We thus have
Z b Z a Z b Z a Z b
sin x sin x sin x sin x cos b cos a cos x
dx = dx + dx = dx + dx
0 x 0 x a x 0 x b a a x2

Set a = 1 and let b go to +1. We have


Z b Z 1 Z b
sin x sin x cos b cos x
lim dx = lim dx + cos 1 dx
b!1 0 x b!1 0 x b 1 x2
Z 1 Z b
sin x cos b cos x
= dx + cos 1 lim + dx
0 x b!1 b 1 x2
Z 1 Z b
sin x cos x
= dx + cos 1 lim dx
0 x b!1 1 x2
Rb R1
We conclude that limb!1 0 (sin x) =xdx 2 R because the improper integral 1 (cos x) =x2 dx
converges (Example 1922).

Once established its convergence, next we solve the Dirichlet integral.


3
In formula (44.67) we take g (x) = cos x and f (x) = 1=x.
46.4. DIRICHLET INTEGRAL: THE TREE AND THE FOREST 1339

Proposition 1935 We have


Z +1
sin x
dx =
0 x 2

The value of the Dirichlet integral is quite remarkable. The method used to solve this
integral is, however, even more remarkable. Indeed, since to solve it directly is di cult, we
embed it in a larger class of integrals by introducing an ad hoc positive parameter via the
function F : [0; 1) ! R given by
Z +1
sin x x
F( )= e dx
0 x

This function is easily seen to be well de ned (why?). The integral of interest corresponds
to = 0. To nd it, in the proof we will compute, all at once, the integrals corresponding
to all the values of the parameter .
This proof strategy illustrates the methodological principle previously mentioned: some-
times the best way to solve a speci c problem (a tree) is to embed it, via a suitable pa-
rameterization, in a general class of problems (the forest) and to solve directly the general
problem.

Proof It is possible to prove that the function F is continuous at 0 and that it is possible
to di erentiate under the integral sign.4 So, for every > 0 we have
Z +1 Z +1 Z +1
0 d sin x x sin x x x
F ( )= e dx = ( x) e dx = sin x e dx
0 d x 0 x 0

As the antiderivative of sin x e x is

e x( cos x sin x)
1+ 2

by the First Fundamental Theorem of Calculus we get


Z +1 x( t
x e cos x sin x) 1
sin x e dx = lim =
0 t!+1 1+ 2 0 1+ 2

2
Hence, F 0 ( ) = 1= 1 + for every > 0, that is,

F( )= arctan + k 8 >0

for some k 2 R (cf. Example 1238). Recall that arctan x : R ! ( =2; =2) is the inverse of
the function tan x (Section 6.5.3), with graph
4
Suitable versions of Propositions 1927 and 1932 are needed, as readers can check (see, e.g., Roussos, 2014,
pp. 85-90).
1340 CHAPTER 46. PARAMETRIC RIEMANN INTEGRALS (SDOGANATO)

3 y

O x
-1

-2

-3
-4 -3 -2 -1 0 1 2 3 4

Since, for each x 6= 0,


sin x
1
x
we have
sin x x sin x x x
e = e e (46.15)
x x
Hence,
Z +1 Z +1 Z t t
sin x x x x 1 e 1
jF ( )j e dx e dx = lim e dx = lim =
0 x 0 t!+1 0 t!+1

which implies lim !+1 F ( ) = 0. But,

lim ( arctan + k) = +k
!+1 2

and so k = =2. We conclude that

F( )= arctan + 8 >0
2
By the continuity of F at 0, we thus have F (0) = =2.
Chapter 47

Stieltjes' integral

Stieltjes' integral is an important generalization of Riemann's integral often used in applica-


tions. It can be thought of in the following way: while Riemann's integral is based on sums
such as
n
X n
X
mk (xk xk 1) and Mk (xk xk 1) (47.1)
k=1 k=1

the Stieltjes' integral is based on sums such as

n
X n
X
mk (g(xk ) g(xk 1 )) and Mk (g (xk ) g (xk 1 )) (47.2)
k=1 k=1

where g is a scalar function. Clearly, (47.1) is the special case of (47.2) that corresponds to
the identity function g(x) = x.
But, why are the more general sums (47.2) relevant? Recall that the sums (47.1) arise
in Riemann integration because every interval [xi 1 ; xi ], obtained by subdividing [a; b], is
measured according to its length xi = xi xi 1 . Clearly, the length is the most natural
way to measure an interval. However, it is not the only way: in some problems it might be
more suitable to measure an interval in a di erent way. For example, if [xi 1 ; xi ] represents
levels of production between xi 1 and xi , the most appropriate economic measure for such
an interval may be the additional cost that a higher production level entails: if C (x) is
the total cost for producing x, the measure that must be assigned to [xi 1 ; xi ] is then the
di erence C (xi ) C (xi 1 ). If [xi 1 ; xi ] represents, instead, an interval in which a random
variable may assume values and F (x) is the probability that such value is x, then the most
natural way to measure [xi 1 ; xi ] is the di erence F (xi ) F (xi 1 ). In such cases, which are
quite common in economic applications (see, e.g., Section 47.8), the Stieltjes' integral is the
natural notion of integral to use.
Besides its interest for applications, however, Stieltjes integration also sheds further light
on Riemann integration. Indeed, we will see in this chapter that some results that we
established for Riemann's integrals are actually best understood in terms of the more general
Stieltjes' integral.

1341
1342 CHAPTER 47. STIELTJES' INTEGRAL

47.1 De nition
Consider two functions f; g : [a; b] R ! R with f bounded and g increasing.1 For every
subdivision = fxi gni=0 2 of [a; b], with a = x0 < x1 < < xn = b, and for every interval
[xi 1 ; xi ] we can de ne the following quantities

mi = inf f (x) and Mi = sup f (x)


x2[xi 1 ;xi ] x2[xi 1 ;xi ]

Since f is bounded, such quantities are nite. The sum


n
X
I(f; g; ) = mi (g(xi ) g(xi 1 ))
i=1

is referred to as lower Stieltjes sum, while


n
X
S(f; g; ) = Mi (g(xi ) g(xi 1 ))
i=1

is referred to as upper Stieltjes sum. It can be easily shown that, for every subdivision of
[a; b], we have
I(f; g; ) S(f; g; )

Moreover, similarly to (44.6), it is easy to see that if 0 re nes , then

0 0
I (f; g; ) I f; g; S f; g; S (f; g; ) (47.3)

Using the lower and upper Stieltjes sums, we de ne the Stieltjes' integral.

De nition 1936 A bounded function f : [a; b] ! R is said to be integrable in the sense of


Stieltjes (or Stieltjes integrable) with respect to an increasing function g : [a; b] ! R if

sup I(f; g; ) = inf S(f; g; )


2 2

Rb
The common value, denoted by a f (x)dg(x), is called integral in the sense of Stieltjes (or
Stieltjes' integral) of f with respect to g on [a; b].

When g (x) = x, we get back to Riemann's integral (cf. Proposition 1848). The functions
f and g are called
R b integrand function and integrator function, respectively. For brevity, we
will often write a f dg, thus omitting the arguments of such functions.

N.B. In the rest of the chapter (except in the coda) we will tacitly assume g is an increasing
scalar function, but not constant (that is g (b) > g (a)). If g is constant, then the Stieltjes'
Rb
integral is always de ned for any bounded function f and a f dg is trivially equal to 0. O
1
If g were decreasing, we could consider h = g instead, which is clearly increasing.
47.2. INTEGRABILITY CRITERIA 1343

47.2 Integrability criteria


There exists a few integrability criteria that ensure that a function f is Stieltjes integrable
with respect to a function g. Needless to say, when g is the identity function we get back to
integrability criteria for the Riemann's integral.
We begin with a criterion that extends the one established in Proposition 1852 for Rie-
mann's integral (the proof is analogous, so it is omitted).

Proposition 1937 A bounded function f : [a; b] ! R is Stieltjes integrable with respect to


g : [a; b] ! R if, for every " > 0, there exists a subdivision 2 such that S(f; g; )
I(f; g; ) < ".

As for Riemann's integral, it is important to know which are the classes of integrable
functions. As one may expect, the answer depends on the regularity of both functions f and
g (recall that we assumed g to be increasing).
Rb
Proposition 1938 The integral a f dg exists if at least one of the following two conditions
is satis ed:

(i) f : [a; b] ! R is continuous;

(ii) f : [a; b] ! R is monotone and g : [a; b] ! R is continuous.

Note that (i) and (ii) generalize, respectively, Propositions 1858 and 1861 for Riemann's
integral.

Proof (i) The proof relies on the same steps as that of Proposition 1858. Since f is continu-
ous on [a; b], it is also bounded (Weierstrass' Theorem) and uniformly continuous (Theorem
603). Take " > 0. There exists " > 0 such that

jx yj < " =) jf (x) f (y)j < " 8x; y 2 [a; b] (47.4)

Let 2 be a subdivision of [a; b] such that j j < " . By condition (47.4), for every
i = 1; 2; : : : ; n we have
max f (x) min f (x) < "
x2[xi 1 ;xi ] x2[xi 1 ;xi ]

where max and min exist by Weierstrass' Theorem. It follows that


n
X n
X
S (f; g; ) I (f; g; ) = max f (x) (g(xi ) g(xi 1 )) min f (x) (g(xi ) g(xi 1 ))
x2[xi 1 ;xi ] x2[xi 1 ;xi ]
i=1 i=1
Xn
= max f (x) min f (x) (g(xi ) g(xi 1 ))
x2[xi 1 ;xi ] x2[xi 1 ;xi ]
i=1
Xn
< " (g(xi ) g(xi 1 )) = "(g(b) g(a))
i=1

By Proposition 1937, the function f is integrable.


1344 CHAPTER 47. STIELTJES' INTEGRAL

(ii) Since f is monotone, f is bounded. Since g is continuous on [a; b], it is also bounded
and uniformly continuous. Let " > 0. There exists " > 0 such that

jx yj < " =) jg (x) g (y)j < " 8x; y 2 [a; b]

Let 2 be a subdivision of [a; b] such that j j < " . For every pair of consecutive points
of such a subdivision, we have that g(xi ) g(xi 1 ) = jg(xi ) g(xi 1 )j < ". The proof now
follows the same steps as that of Proposition 1861. Suppose that f is increasing (if f is
decreasing the argument is analogous). We have

inf f (x) = f (xi 1) f (a) and sup f (x) = f (xi ) f (b)


x2[xi 1 ;xi ] x2[xi 1 ;xi ]

so that
n
X
S (f; g; ) I (f; g; ) = sup f (x) (g(xi ) g(xi 1 ))
i=1 x2[xi 1 ;xi ]

Xn
inf f (x) (g(xi ) g(xi 1 ))
x2[xi 1 ;xi ]
i=1
n
X n
X
= f (xi ) (g(xi ) g(xi 1 )) f (xi 1 ) (g(xi ) g(xi 1 ))
i=1 i=1
n
X
= (f (xi ) f (xi 1 )) (g(xi ) g(xi 1 ))
i=1
Xn
< " (f (xi ) f (xi 1 )) < " (f (b) f (a))
i=1

By Proposition 1937, the function f is integrable.

Lastly, we partially extend Theorem 1859 to Stieltjes' integral by requiring that g does
not share any point of discontinuity with f .2

Proposition 1939 Every bounded function f : [a; b] ! R with nitely many discontinuities
is Stieltjes integrable with respect to g : [a; b] ! R, provided g is continuous at such points.

We omit the proof of this remarkable result which, inter alia, generalizes Proposition
1938-(i). However, while Theorem 1859 allowed for in nitely many discontinuities, in this
more general setting we restrict ourselves to consider nitely many ones.

47.3 Calculus
When g is di erentiable, the Stieltjes' integral can be written as a Riemann's integral.
2
In other words, we require the two functions f and g not to be discontinuous at the same points.
47.3. CALCULUS 1345

Proposition 1940 Let f : [a; b] ! R be a bounded function, g : [a; b] ! R di erentiable,


and g 0 Riemann integrable. Then f is Stieltjes integrable with respect to g if and only if f g 0
is Riemann integrable. In such a case, we have
Z b Z b
f (x)dg (x) = f (x)g 0 (x)dx (47.5)
a a

Proof Since g 0 is Riemann integrable, for any given " > 0 there exists a subdivision =
fxi gni=0 2 such that
n
!
X
sup g 0 (x) inf g 0 (x) xi = S(g 0 ; ) I(g 0 ; ) < " (47.6)
x2[xi 1 ;xi ]
x2[xi 1 ;xi ]
i=1

From (47.6) we also deduce that, for any pair of points si ; ti 2 [xi 1 ; xi ] and for any i =
1; :::; n, we have
Xn
g 0 (si ) g 0 (ti ) xi < " (47.7)
i=1
By the Mean Value Theorem and since g is di erentiable, for each i = 1; :::; n there exists a
point ti 2 [xi 1 ; xi ] such that

gi = g(xi ) g(xi 1) = g 0 (ti ) xi


n
X n
X
Thus, f (si ) gi = f (si )g 0 (ti ) xi for all si 2 [xi 1 ; xi ] (with i ranging from 1 to n).
i=1 i=1
Setting M = sup[a;b] jf (x)j and using inequality (47.7), we have that for each si 2 [xi 1 ; xi ]

n
X n
X n
X
0
f (si ) gi f (si )g (si ) xi = f (si ) g 0 (ti ) g 0 (si ) xi
i=1 i=1 i=1
n
X
M g 0 (si ) g 0 (ti ) xi M"
i=1

So, we can conclude that for each si 2 [xi 1 ; xi ]


n
X n
X
M" f (si ) gi f (si )g 0 (si ) xi M"
i=1 i=1
n
X n
X
Note that S(f g 0 ; ) f (si )g 0 (si ) xi for each si 2 [xi 1 ; xi ], from which f (si ) gi
i=1 i=1
S(f g 0 ; ) + M " for each si 2 [xi 1 ; xi ]. Observe that
n
X n
X
S (f; g; ) = sup f (si ) gi = sup f (si ) gi
i=1 si 2[xi 1 ;xi ] s1 2[x0 ;x1 ];:::;sn 2[xn 1 ;xn ] i=1
0
S(f g ; ) + M "

yielding that
S(f; g; ) S(f g 0 ; ) + M " (47.8)
1346 CHAPTER 47. STIELTJES' INTEGRAL

One can symmetrically prove that

S(f g 0 ; ) S(f; g; ) + M " (47.9)

So, by combining (47.8) and (47.9), we get that

S(f; g; ) S(f g 0 ; ) M" (47.10)

Thus, for every " > 0 there exists 2 such that inequality (47.10) holds. This implies
that
Z b Z b
f (x) dg (x) = f (x) g 0 (x)dx (47.11)
a a

One can analogously show that


Z b Z b
f (x) dg (x) = f (x) g 0 (x)dx (47.12)
a a

From (47.11) and (47.12) one can see that f g 0 is Riemann integrable if and only if f is
Stieltjes integrable with respect to g, in which case we get (47.5).

When f is continuous and g is di erentiable, thanks to equation (47.5) a Stieltjes' integral


can be transformed in a Riemann's integral with integrand function

h(x) = f (x)g 0 (x)

This greatly simpli es computations because the techniques developed to solve Riemann's
integrals can be then used for Stieltjes' integrals.3
From a theoretical standpoint, Stieltjes' integral substantially extends the scope of Rie-
mann's integral, while keeping { also thanks to (47.5) { its remarkable analytical properties.
Such a remarkable balance between generality and tractability explains the importance of
Stieltjes' integral.

Next we give a useful variation on this theme.

Proposition 1941 Let g beR the integral function of a Riemann integrable function :
x
[a; b] ! R, that is, g (x) = a (t) dt for every x 2 [a; b]. If f : [a; b] ! R is continu-
ous, we have
Z b Z b
f (x)dg (x) = f (x) (x)dx
a a

We omit the proof of this result. However, when is continuous (so, Riemann integrable)
it follows from the previous result because, by the Second Fundamental Theorem of Calculus,
the function g is di erentiable with g 0 = .

We close with an interesting formula.


3
Riemann's integral is the simplest example of (47.5) where g 0 (x) = 1.
47.3. CALCULUS 1347

Proposition 1942 Let f : [c; d] ! R be continuously di erentiable and g : [a; b] ! R


continuous, with Im g [c; d]. Then
Z x
f (g (x)) f (g (a)) = f 0 (g (t)) dg (t) (47.13)
a

We call (47.13) Ito's formula because it is a precursor of this celebrated formula (a much
deeper result that readers will study in probability courses).4 This formula is easily checked
when g : [a; b] ! R is continuously di erentiable. Indeed, by the chain rule f g : [a; b] ! R is
di erentiable at each x 2 (a; b), with (f g)0 (x) = f 0 (g (x)) g 0 (x). So, (f g)0 is continuous.
We then have:
Z x Z x Z x
f (g (x)) f (g (a)) = (f g)0 (t) dt = f 0 (g (t)) g 0 (t) dt = f 0 (g (t)) dg (t)
a a a

where the rst equality follows from the First Fundamental Theorem of Calculus and the
last one from Proposition 1940. That said, next we prove Ito's formula in full generality.

Proof Let = fxi gni=0 be a subdivision of [a; x]. So, in particular, xn = x. Consider
the function h = f g : [a; b] ! R. If we add and subtract h (xi ) = f (g (xi )) for each
i = 1; 2; : : : ; n 1, we have

h (x) h (a) = h (xn ) h (xn 1 ) + h (xn 1) h (x1 ) + h (x1 ) h (x0 )


Xn
= (h (xi ) h (xi 1 ))
i=1

Let i 2 f1; :::; ng. Consider the interval [g (xi 1 ) ; g (xi )]. We have two cases:

(i) Let g (xi 1 ) < g (xi ). Consider f on [g (xi 1 ) ; g (xi )]. By the Mean Value Theorem,
there exists y^i 2 (g (xi 1 ) ; g (xi )) such that

f (g (xi )) f (g (xi 1 ))
f 0 (^
yi ) =
g (xi ) g (xi 1 )

Since g is continuous, by the Intermediate Value Theorem there exists x


^i 2 [xi 1 ; xi ]
such that g (^
xi ) = y^i . So,

f (g (xi )) f (g (xi 1 ))
f 0 (g (^
xi )) =
g (xi ) g (xi 1 )

which yields

h (xi ) h (xi 1) = f (g (xi )) f (g (xi 1 )) = f 0 (g (^


xi )) (g (xi ) g (xi 1 )) (47.14)
4
To name signi cant formulas or results after people (eponymy), typically their discoverers, is convenient:
for instance, it is much easier to refer to \Ito's formula" than to \formula (47.13)". For this reason we greatly
relied on eponymy, even when it is a bit far-fetched (as in this Ito case). Interestingly, eponymy became
important in the scienti c revolution of the sixteen and seventeen centuries to properly credit discoverers in
an increasing competitive scienti c world (cf. Wootton, 2015).
1348 CHAPTER 47. STIELTJES' INTEGRAL

(ii) Let g (xi 1) = g (xi ). De ne x


^i = xi . We trivially have

h (xi ) h (xi 1) = f (g (xi )) f (g (xi 1 )) = 0 = f 0 (g (^


xi )) (g (xi ) g (xi 1 )) (47.15)

By (47.14) and (47.15), we conclude that


n
X n
X
h (x) h (a) = (h (xi ) h (xi 1 )) = f 0 (g (^
xi )) (g (xi ) g (xi 1 ))
i=1 i=1

This implies that


I f 0 g; g; h (x) h (a) S f 0 g; g; (47.16)
Since is a generic subdivision of [a; x], (47.16) holds for all 2 and therefore

sup I f 0 g; g; h (x) h (a) inf S f 0 g; g; (47.17)


2 2

Since f 0 and g are Rcontinuous by hypothesis, their composition f 0 g is also continuous. By


b 0
R x a0 f g dg then exists. By (47.17), it follows that f (g (x)) f (g (a)) =
Proposition 1938,
h (x) h (a) = a f (g (t)) dg (t).

Example 1943 (i) For f (x) = x with 1 and g continuous, Ito's formula becomes
Z x
g (x) g (a) = g 1 (t) dg (t)
a

For instance, if = 2, a = 0, and g (0) = 0 we get the simple formula


Z x
1
g (t) dg (t) = g 2 (x)
0 2

which allows us to compute a Stieltjes integral that has the same integrand and integrator.
(ii) Let f (x) = log x and Im g (0; 1). Then, Ito's formula becomes
Z x
1
log g (x) log g (a) = dg (t)
a g (t)

For example, if a = 0 and g (0) = 1 we have a simple formula


Z x
1
dg (t) = log g (x)
0 g (t)

which allows us to compute a Stieltjes integral where the integrand is the reciprocal of the
integrator. N

Ito's formula generalizes the First Fundamental Theorem of Calculus for continuously
di erentiable f . Indeed, if g (x) = x then formula (47.13) reduces to the version (44.63) of
the First Fundamental Theorem of Calculus, that is,
Z x
f (x) f (a) = f 0 (t) dt
a
47.4. PROPERTIES 1349

Ito's formula permits to compute Stieltjes integrals featuring integrands that one is able to
recognize to have the form f 0 g, where g is the integrator. In this regard, note that if
H : [c; d] ! R is a primitive of a continuous function h : [c; d] ! R, we can rewrite Ito's
formula as Z x
H (g (x)) H (g (a)) = h (g (t)) dg (t) (47.18)
a
If we compare this version of Ito's formula with the change of variable formula
Z g(x) Z x
h (t) dt = h (g (t)) g 0 (t) dt (47.19)
g(a) a

we see that Ito's formula can be actually viewed as a Stieltjes elaboration of the change of
variable formula of Riemann integration.5

47.4 Properties
Properties similar to those of Riemann's integral hold for Stieltjes' integral. The only sub-
stantial novelty lies in a linearity property that now holds with respect to both the integrand
function f and the integrator function g. Next we list the properties without proving them
(the proofs being similar to those of Section 44.6).

(i) Linearity with respect to the integrand function:


Z b Z b Z b
( f1 + f2 )dg = f1 dg + f2 dg 8 ; 2R (47.20)
a a a

(ii) Positive linearity with respect to the integrator function:6


Z b Z b Z b
f d( g1 + g2 ) = f dg1 + f dg2 8 ; 0 (47.21)
a a a

(iii) Additivity with respect to the integration interval:


Z b Z c Z b
f dg = f dg + f dg (47.22)
a a c

where c 2 (a; b).

(iv) Monotonicity:
Z b Z b
f1 f2 =) f1 dg f2 dg
a a

(v) Absolute value:


Z b Z b
f dg jf j dg
a a
5
Formula (47.19) is the version relevant here of the change of variable formula (44.68). In this regard,
compare also (47.18) with (44.73).
6
The positivity of and ensures that the integrator function g1 + g2 is increasing (cf. the end of the
coda).
1350 CHAPTER 47. STIELTJES' INTEGRAL

47.5 Step integrators


Riemann's integral is the special case of Stieltjes' integral in which the integrator function
is the identity g (x) = x. The scope of Stieltjes' integral becomes clear when we consider
integrator functions that are substantially di erent from the identity, like for example step
functions.
For simplicity, in the next statement we denote the unilateral, right and left, limits of
the integrator g : [a; b] ! R at a point x0 by g x0 and g x+ 7
0 . The di erence

g x+
0 g x0

is therefore the potential positive jump of g at x0 (recall that g was assumed to be increasing).

Proposition 1944 Let f : [a; b] ! R be continuous and g : [a; b] ! R be a step function,


with discontinuities at the points fd1 ; :::; dn g of the interval [a; b]. We have
Z b n
X h i
f dg = f (dj ) g(d+
j ) g(dj ) (47.23)
a j=1

In other words, Stieltjes' integral is the sum of all the jumps of the integrator at the
points of discontinuity, multiplied by the value of the integrand in such points. A similar
argument yields that the same equality holds if g is decreasing (cf. Section 47.9).

Proof Since f is continuous on [a; b], it is also bounded (Weierstrass' Theorem) and uni-
formly continuous (Theorem 603). In particular, for each " > 0 there exists " > 0 such
that
jx yj < " =) jf (x) f (y)j < " 8x; y 2 [a; b] (47.24)
Fix " > 0. Let ^ = fxi gki=1 be a subdivision of [a; b] such that: 1) j^ j < " and 2) each dj
belongs to at most one of the intervals f[xi 1 ; xi ]gki=1 . Given the second property, we denote
by ij the index in f1; :::; kg such that the corresponding interval contains dj . Without loss of
generality, we assume that the discontinuity points have been listed so that d1 d2 :::
dn . By condition (47.24), for each i = 1; 2; : : : ; k we have

Mi mi = max f (x) min f (x) < "


x2[xi 1 ;xi ] x2[xi 1 ;xi ]

where max and min exist by Weierstrass' Theorem. This implies that
k
X
S (f; g; ^ ) I (f; g; ^ ) = (Mi mi ) (g (xi ) g (xi 1 )) < " (g (b) g (a)) (47.25)
i=1

Next, we compute g (xi ) g (xi 1 ) for all i = 1; 2:::; k. If [xi 1 ; xi ] does not contain any
discontinuity point of g, then g (xi ) g (xi 1 ) = 0. If instead [xi 1 ; xi ] contains a discontinuity
point dj of g, then we have three cases:

1. dj = a. In this case, we have that dj = d1 2 [x0 ; x1 ] and g (x1 ) g (x0 ) = g(d+


j ) g(dj ).
7
That is, g x+
0 = limx!x+ g (x) and g x0 = limx!x g (x). We also set g a = g (a) and g b+ =
0 0
g (b).
47.5. STEP INTEGRATORS 1351

2. dj 2 (a; b). In this case, we have that dj 2 [xi 1 ; xi ] for some i 2 f1; :::; kg. Moreover,
since each dj belongs to at most one of the intervals f[xi 1 ; xi ]gki=1 , it follows that
dj 2 (xi 1 ; xi ). We can conclude that g (xi ) g (xi 1 ) = g(d+ j ) g(dj ).

3. dj = b. In this case, we have that dj = dn 2 [xk 1 ; xk ] and g (xk ) g (xk 1) =


g d+
j g dj .

By the previous three cases, we have that


k
X n
X
I (f; g; ^ ) = mi (g (xi ) g (xi 1 )) = mij g xij g xij 1
i=1 j=1
Xn n
X
f (dj ) g xij g xij 1 Mij g xij g xij 1
j=1 j=1
k
X
= Mi (g (xi ) g (xi 1 )) = S (f; g; ^ )
i=1
Pn
De ne K = j=1 f (dj ) g xij g xij 1 . By (47.25), we can conclude that

S (f; g; ^ ) " (g (b) g (a)) K I (f; g; ^ ) + " (g (b) g (a)) (47.26)


Rb
By Proposition 1938 and since f is continuous, the integral a f dg exists. It follows that
there exists a subdivision of [a; b] such that
Z b
S (f; g; ) " (g (b) g (a)) f dg I (f; g; ) + " (g (b) g (a)) (47.27)
a

De ne = ^ [ . By (47.3), it follows that (47.26) and (47.27) hold with in place of ^


and , respectively. In turn, this implies that
Z b
f dg K 2" (g (b) g (a))
a
Rb
Since " > 0 was arbitrarily chosen, we have that a f dg = K, proving the statement.

Example 1945 Let f; g : [0; 1] ! R be given by f (x) = x2 and


8
>
> 0 if 0 x < 21
<
3
g (x) = 4 if 12 x < 23
>
>
:
1 if 23 x 1

The discontinuities are at 1=2 and 2=3, where we have

1+ 3 1 2+ 2 3
g = ; g =0 ; g =1 ; g =
2 4 2 3 3 4
1352 CHAPTER 47. STIELTJES' INTEGRAL

Equality (47.23) thus becomes


Z 1
1 1+ 1 2 2+ 2
f dg = f g g +f g g
0 2 2 2 3 3 3
1 3 4 3 3 1 43
= 0 + 1 = + =
4 4 9 4 16 9 144
N

Consider an integrator step function with unitary jumps, that is, for every i we have
g d+
i g di =1
Equation (47.23) then becomes
Z b n
X
f dg = f (di )
a i=1
In particular, if f is the identity we get
Z b n
X
f dg = di
a i=1

Stieltjes' integral thus includes addition as a particular case. More generally, we will soon
see that expected values are represented by Stieltjes' integral (Section 48.9).

47.6 Integration by parts


For Stieltjes' integral, the integration by parts formula takes the elegant form of a role
reversal between f and g.

Proposition 1946 Given any two increasing and continuous functions f; g : [a; b] ! R, it
holds Z b Z b
f dg + g df = f (b) g (b) f (a) g (a) (47.28)
a a
Rb Rb
Proof By Proposition 1938, both integrals a f dg and a g df exist. So, for every " > 0
0
there are two subdivisions, = fxi gni=0 and 0 = fyi gni=0 , of [a; b] such that
Z b n
X "
f dg f (xi 1 ) (g (xi ) g (xi 1 )) <
a 2
i=1

and
Z b n
X
0
"
g df g (yi ) (f (yi ) f (yi 1 )) <
a 2
i=1
00 n00
Let = fzi gi=0 be the subdivision 00 = [ 0. By (47.3), the two inequalities still hold
for subdivision 00 . Moreover, note that
n 00 n 00
X X
f (zi 1 ) (g (zi ) g (zi 1 )) + g (zi ) (f (zi ) f (zi 1 )) = f (b) g (b) f (a) g (a)
i=1 i=1
47.7. CHANGE OF VARIABLE 1353

which implies
Z b Z b
f dg + g df f (b) g (b) + f (a) g (a) < "
a a
Since " was arbitrarily chosen, we reach the desired conclusion.

Thanks to Proposition 1940, whenever f and g are also di erentiable we get


Z b Z b
f g 0 dx + gf 0 dx = f (b) g (b) f (a) g (a)
a a
thus obtaining the integration by parts formula (44.67) for Riemann's integral.

47.7 Change of variable


The next theorem, whose simple yet tedious proof we omit, establishes the change of variable
formula for Stieltjes' integral.

Theorem 1947 Let f : [a; b] ! R be continuous and g : [a; b] ! R increasing. If ' :


[c; d] ! [a; b] is a continuous strictly increasing function, then (f ') is Stieltjes integrable
with respect to g ', with
Z d Z '(d)
f (' (t)) d (g ') (t) = f (x) dg (x) (47.29)
c '(c)

If ' is surjective, we can just write


Z d Z b
f (' (t)) d (g ') (t) = f (x) dg (x)
c a

If on top both ' and g are di erentiable, by Proposition 1940 we then have
Z d Z b
f (' (t)) g 0 (' (t)) '0 (t) dt = f (x) dg (x)
c a

In particular, if g (x) = x we get back to the Riemann formula (44.69), that is,
Z d Z b
f (' (t)) '0 (t) dt = f (x) dx
c a

The more general Stieltjes formula thus clari es the nature of this earlier formula, besides
extending its scope. After integration by parts, the change of variable formula is thus another
result that is best understood in terms of the Stieltjes' integral.

If g is continuous and strictly increasing, then g 1 is well de ned, strictly increasing,


and continuous (cf. Proposition 578). By setting g 1 = ' in (47.29) we get the noteworthy
formula Z Z
g(b) b
1
f g (t) dt = f (x) dg (x)
g(a) a
When g is continuous and strictly increasing, the Stieltjes integral can be computed via a
Riemann integral. This result complements Proposition 1940, which showed that the same
is true, but with a di erent formula, when g is di erentiable.
1354 CHAPTER 47. STIELTJES' INTEGRAL

47.8 Modelling assets' gains


In this section we show that Stieltjes' integration naturally arises in modelling the perfor-
mance of a portfolio over a time interval [0; T ]. Speci cally, suppose for simplicity that there
is a single nancial asset that can be traded at price p (t) in a frictionless nancial market
which opens at each point in time t 2 [0; T ]. The function p : [0; T ] ! R+ thus represents
the asset's price at di erent points in time.
In this temporal setting, a portfolio is described by a function x : [0; T ] ! R, where
x (t) is the number of units of the asset held at t. If x (t) is positive the portfolio's position
at t is long, viceversa if x (t) is negative, then the position is short (cf. Section 33.3).
These positions are the outcome of some trading performed on the open market. Suppose
that for some reason we change the portfolio only a nite number of times. Although the
market is open at each t 2 [0; T ], we trade the asset only nitely many times. Thus, de ne
x : [0; T ] ! R by the step function
n
X1
x (t) = ci 1[ti 1 ;ti )
(t) + cn 1[tn 1 ;tn ]
(t)
i=1

where = fti gni=0 is a subdivision of [0; T ], that is, 0 = t0 < t1 < < tn 1 < tn = T . At
each time t 2 [ti 1 ; ti ) the portfolio x (t) thus features ci units of the asset, the outcome of
trading at time ti 1 . Till time ti the portfolio does not change, so no trading is made. The
last trading occurs at tn 1 , so at T the position does not change.8
How do portfolio's gains/losses cumulate over time? This is a most basic bookkeeping
question that we need to answer to assess a portfolio's performance. To this end, de ne the
integral function Gx : [0; T ] ! R, called gains' process, by the Stieltjes' integral
Z t
Gx (t) = x (s) dp (s) (47.30)
0

where x is the integrand and p is the integrator. Since x is a step function, it is easy to see
that
8
>
> c1 (p (t) p (t0 )) if t 2 [t0 ; t1 )
< P
k 1
Gx (t) = i=1 ci (p (ti ) p (ti 1 )) + ck (p (t) p (tk 1 )) if t 2 [tk 1 ; tk ) ; k = 2; :::; n
>
> Pn
:
i=1 ci (p (ti ) p (ti 1 )) if t = T

The gains' process describes how portfolio's gains/losses cumulate over time, thus answering
the previous question. To x ideas, suppose that each ci is positive { i.e., x 0 { and
consider t 2 [t0 ; t1 ). Throughout all the time interval [t0 ; t1 ), the portfolio x features c1 units
of the asset. These units were traded at time 0 at a price p (0) and at time t their price is
p (t). The change in price is p (t) p (t0 ), so the portfolio's gains/losses up to time t are

Gx (t) = c1 (p (t) p (t0 )) (47.31)

At time t1 , our position changed from c1 to c2 and then remained constant throughout the
time interval [t1 ; t2 ). To obtain this new position, we could have for example sold c1 at time
8
For simplicity, we do not consider any dividend, so the cumulated gains/losses only come from trading
(\capital gains" in the nance jargon).
47.9. CODA: BEYOND MONOTONICITY 1355

t1 and bought simultaneously c2 or just directly acquired the di erence c2 c1 . If markets are
frictionless, these possible trading strategies are equivalent. So, let us focus on the former.
It yields that, up to time t 2 [t1 ; t2 ), the portfolio's cumulated gain is

Gx (t) = c1 (p (t1 ) p (t0 )) + c2 (p (t) p (t1 )) (47.32)

Indeed, c1 (p (t1 ) p (t0 )) are the gains/losses matured in the period [0; t1 ] coming from
buying c1 units at 0 and selling them at time t1 , while c2 (p (t) p (t1 )) are the gains/losses
occurred between [t1 ; t), given by the new position c2 . By iterating this reasoning, the
Stieltjes' integral (47.30) follows immediately { indeed, (47.31) and (47.32) correspond to
t 2 [t0 ; t1 ) and t 2 [t1 ; t2 ) in such integral. In particular, if one operates in the market
throughout, from time 0 through time T , so to keep the long and short positions of portfolio
x, then one ends up with the gains/losses Gx (T ).
Finally, we can relax the assumption that portfolios are adjusted only nitely many times:
as long as functions x and p satisfy, for example, the hypotheses of Proposition 1939, the
gains' process de ned via the Stieltjes' integral (47.30) is well-de ned and can be interpreted
in terms of gains/losses. Also, as the next section will show, we do not need to assume that
p is increasing, which clearly is not a realistic assumption for prices.

47.9 Coda: beyond monotonicity


So far we considered increasing integrators g. We can actually do much better. To this end,
we will rst introduce a new class of functions, which will be then used as integrators.

47.9.1 Functions of bounded variation


De nition 1948 The ( rst) total variation of a function g : [a; b] ! R is the quantity
n
X
tg = sup jg (xi ) g (xi 1 )j
2 i=1

where the supremum is taken over all subdivisions = fxi gni=0 2 of [a; b].

Intuitively, the total variation describes the variability of a function. Indeed, because of the absolute value, the ups and downs of the function add up. So, the lower the total variation, the lower the cumulative magnitude of the variations that the function features (for instance, the reader can check that a function has zero total variation if and only if it is constant, so it has no variability).
The next definition singles out the class of functions that have finite variability.

Definition 1949 A function $g : [a,b] \to \mathbb{R}$ is said to be of bounded (total) variation if its total variation is finite, that is, $t_g < +\infty$.

A function of bounded variation has finite variability, however large it may be. Otherwise, we say that it is of unbounded variation.
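As a quick numerical companion (an illustrative sketch, not part of the text), the sum defining $t_g$ can be evaluated on finer and finer uniform grids; every grid sum is a lower bound for $t_g$. For instance, $\sin x$ on $[0, 2\pi]$ has total variation 4, and the grid sums approach it from below.

```python
import math

def variation_on_grid(g, a, b, n):
    # Sum |g(x_i) - g(x_{i-1})| over a uniform subdivision with n intervals;
    # every such sum is a lower bound for the total variation t_g.
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(abs(g(xs[i]) - g(xs[i - 1])) for i in range(1, n + 1))

for n in (10, 100, 1000):
    print(n, variation_on_grid(math.sin, 0.0, 2 * math.pi, n))
```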
A first simple property:

Proposition 1950 Functions of bounded variation are bounded.

Proof Let $g : [a,b] \to \mathbb{R}$ be of bounded variation. Given any $x \in (a,b)$, if we take the subdivision $\{a, x, b\}$ we have
$$\max\{|g(b) - g(x)|, |g(x) - g(a)|\} \leq |g(b) - g(x)| + |g(x) - g(a)| \leq t_g$$
yielding that $|g(b) - g(x)| \leq t_g$ and $|g(x) - g(a)| \leq t_g$. So, for all $x \in (a,b)$ we have $\min\{g(b), g(a)\} - t_g \leq g(x) \leq \max\{g(b), g(a)\} + t_g$, as desired. It is immediate to see that this inequality holds for all $x \in [a,b]$, proving boundedness.

Bounded variation is preserved by linear combinations.^9

Proposition 1951 If $f, g : [a,b] \to \mathbb{R}$ are two functions of bounded variation, then the function $\alpha f + \beta g$ is also of bounded variation for all $\alpha, \beta \in \mathbb{R}$.

Proof Let $f, g : [a,b] \to \mathbb{R}$ be of bounded variation and let $\alpha, \beta \in \mathbb{R}$. For each subdivision $\pi \in \Pi$ of $[a,b]$, we have
$$\sum_{i=1}^{n} |(\alpha f + \beta g)(x_i) - (\alpha f + \beta g)(x_{i-1})| = \sum_{i=1}^{n} |\alpha (f(x_i) - f(x_{i-1})) + \beta (g(x_i) - g(x_{i-1}))| \leq |\alpha| \sum_{i=1}^{n} |f(x_i) - f(x_{i-1})| + |\beta| \sum_{i=1}^{n} |g(x_i) - g(x_{i-1})|$$
So,
$$\sup_{\pi \in \Pi} \sum_{i=1}^{n} |(\alpha f + \beta g)(x_i) - (\alpha f + \beta g)(x_{i-1})| \leq |\alpha| \sup_{\pi \in \Pi} \sum_{i=1}^{n} |f(x_i) - f(x_{i-1})| + |\beta| \sup_{\pi \in \Pi} \sum_{i=1}^{n} |g(x_i) - g(x_{i-1})| < +\infty$$
We conclude that the function $\alpha f + \beta g$ is of bounded variation.

Monotone functions $g : [a,b] \to \mathbb{R}$ are easily seen to be of bounded variation. Indeed, assume $g$ is increasing (for decreasing functions a specular argument holds). For each subdivision $\pi \in \Pi$ of $[a,b]$ we have
$$\sum_{i=1}^{n} |g(x_i) - g(x_{i-1})| = \sum_{i=1}^{n} [g(x_i) - g(x_{i-1})] = g(b) - g(a)$$
where the first equality follows from the monotonicity of $g$, because $g(x_i) \geq g(x_{i-1})$. The next key result shows that monotonicity and bounded variation are closely connected, thus clarifying the nature of functions of bounded variation.

^9 So the space of functions of bounded variation is an example of a vector space, as readers will study in more advanced courses.

Theorem 1952 (Jordan) A function $g : [a,b] \to \mathbb{R}$ is of bounded variation if and only if there exist two increasing functions $g_1, g_2 : [a,b] \to \mathbb{R}$ such that
$$g = g_1 - g_2 \tag{47.33}$$
If, in addition, $g$ is continuous, then $g_1$ and $g_2$ can be chosen to be continuous.

In words, a function is of bounded variation if and only if it can be written as the difference of two increasing functions. In particular, (47.33) is called the Jordan decomposition of $g$. Such a decomposition is not unique: given any increasing function $h : [a,b] \to \mathbb{R}$, we also have $g = (g_1 + h) - (g_2 + h)$ (observe that $g_1 + h$ and $g_2 + h$ are increasing).

Proof "If". Let $g = g_1 - g_2$ with $g_1, g_2 : [a,b] \to \mathbb{R}$ increasing. As argued above, $g_1$ and $g_2$ are of bounded variation. By setting $\alpha = 1$ and $\beta = -1$, Proposition 1951 yields that $g$ is of bounded variation.

"Only if". Let $g : [a,b] \to \mathbb{R}$ be of bounded variation. Define $p, n, t : [a,b] \to \mathbb{R}$ by
$$p(x) = \sup_{\pi \in \Pi_x} \sum_{i=1}^{n} [g(x_i) - g(x_{i-1})]^{+} = \sup_{\pi \in \Pi_x} \sum_{i=1}^{n} \max\{g(x_i) - g(x_{i-1}), 0\}$$
$$n(x) = \sup_{\pi \in \Pi_x} \sum_{i=1}^{n} [g(x_i) - g(x_{i-1})]^{-} = \sup_{\pi \in \Pi_x} \sum_{i=1}^{n} \left(-\min\{g(x_i) - g(x_{i-1}), 0\}\right)$$
$$t(x) = \sup_{\pi \in \Pi_x} \sum_{i=1}^{n} |g(x_i) - g(x_{i-1})|$$
where the supremum is taken over all subdivisions $\pi = \{x_i\}_{i=0}^{n} \in \Pi_x$ of $[a,x]$, that is, $a = x_0 < x_1 < \cdots < x_n = x$. Since $g$ is of bounded variation, we have $t(x) \in [0,\infty)$ for all $x \in [a,b]$. From $0 \leq p(x) \leq t(x)$ and $0 \leq n(x) \leq t(x)$, it then follows that $p(x), n(x) \in [0,\infty)$ for all $x \in [a,b]$. It is easy to see that $p$, $n$ and $t$ are increasing functions.

In view of (44.13), for each $x \in [a,b]$ and each $\pi \in \Pi_x$ we have
$$\sum_{i=1}^{n} [g(x_i) - g(x_{i-1})]^{+} - \sum_{i=1}^{n} [g(x_i) - g(x_{i-1})]^{-} = \sum_{i=1}^{n} \max\{g(x_i) - g(x_{i-1}), 0\} - \sum_{i=1}^{n} \left(-\min\{g(x_i) - g(x_{i-1}), 0\}\right) = \sum_{i=1}^{n} \left[\max\{g(x_i) - g(x_{i-1}), 0\} + \min\{g(x_i) - g(x_{i-1}), 0\}\right] = \sum_{i=1}^{n} [g(x_i) - g(x_{i-1})] = g(x) - g(a)$$
So, for all $x \in [a,b]$ we have:
$$p(x) = \sup_{\pi \in \Pi_x} \sum_{i=1}^{n} [g(x_i) - g(x_{i-1})]^{+} = \sup_{\pi \in \Pi_x} \left\{\sum_{i=1}^{n} [g(x_i) - g(x_{i-1})]^{-} + g(x) - g(a)\right\} = g(x) - g(a) + \sup_{\pi \in \Pi_x} \sum_{i=1}^{n} [g(x_i) - g(x_{i-1})]^{-} = g(x) - g(a) + n(x)$$
That is,
$$g(x) = p(x) - [n(x) - g(a)] \qquad \forall x \in [a,b] \tag{47.34}$$
Since the functions $p : [a,b] \to \mathbb{R}$ and $n - g(a) : [a,b] \to \mathbb{R}$ are both increasing, we conclude that $g$ is the difference of two increasing functions defined on $[a,b]$. So, (47.34) is the sought-after decomposition.

Assume that $g$ is continuous. We show that $t : [a,b] \to \mathbb{R}$ is continuous. Let $\bar{x} \in (a,b]$. We first show that $t$ is continuous at $\bar{x}$ from the left, i.e., $\lim_{x \to \bar{x}^{-}} t(x) = t(\bar{x})$. Let $x_k \uparrow \bar{x}$. Fix $\varepsilon > 0$. By Theorem 603, $g$ is uniformly continuous. So, there exists $\delta > 0$ such that
$$|x - x'| < \delta \implies |g(x) - g(x')| < \frac{\varepsilon}{2} \qquad \forall x, x' \in [a,b] \tag{47.35}$$
By the definition of $t$, there exists a subdivision $\pi \in \Pi_{\bar{x}}$ of $[a,\bar{x}]$ such that
$$t(\bar{x}) - \sum_{i=1}^{n} |g(x_i) - g(x_{i-1})| \leq \frac{\varepsilon}{2}$$
and $\bar{x} - x_{n-1} < \delta$. Otherwise, if $\bar{x} - x_{n-1} \geq \delta$ one can always add points to the subdivision $\pi$, something that in any case preserves the last inequality because it increases the term $\sum_{i=1}^{n} |g(x_i) - g(x_{i-1})|$. So, by (47.35) we have $|g(\bar{x}) - g(x_{n-1})| < \varepsilon/2$. In turn, this implies
$$t(\bar{x}) - t(x_{n-1}) \leq t(\bar{x}) - \sum_{i=1}^{n-1} |g(x_i) - g(x_{i-1})| \leq \varepsilon \tag{47.36}$$
Since $x_k \uparrow \bar{x}$, there exists $k_{\varepsilon}$ such that $x_{n-1} \leq x_k \leq \bar{x}$ for all $k \geq k_{\varepsilon}$. Since $t$ is increasing, from (47.36) we have
$$0 \leq t(\bar{x}) - t(x_k) \leq \varepsilon \qquad \forall k \geq k_{\varepsilon}$$
This implies $\lim_{x \to \bar{x}^{-}} t(x) = t(\bar{x})$, as desired. A similar argument shows that $\lim_{x \to \bar{x}^{+}} t(x) = t(\bar{x})$ for $\bar{x} \in [a,b)$, that is, $t$ is right continuous. So, $\lim_{x \to \bar{x}} t(x) = t(\bar{x})$ and we conclude that $t$ is continuous at $\bar{x}$ for all $\bar{x} \in [a,b]$. In turn, this implies that the functions $p, n : [a,b] \to \mathbb{R}$ are both continuous. Indeed, from $t = p + n$ and (47.34) it follows that
$$n(x) = \frac{t(x) - g(x) + g(a)}{2} \quad \text{and} \quad p(x) = \frac{t(x) + g(x) - g(a)}{2}$$
If $g$ is continuous, then the increasing functions $p : [a,b] \to \mathbb{R}$ and $n - g(a) : [a,b] \to \mathbb{R}$ are both continuous.
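The positive- and negative-variation functions $p$ and $n$ of the proof can be approximated numerically. The sketch below (illustrative, not from the text; it uses running grid sums rather than a true supremum) builds an approximate Jordan decomposition on a uniform grid.

```python
import math

def jordan_decomposition(g, a, b, n_grid=1000):
    """Approximate g = g1 - g2 with g1, g2 increasing, by accumulating
    positive and negative increments of g on a uniform grid."""
    xs = [a + (b - a) * i / n_grid for i in range(n_grid + 1)]
    g1, g2 = [g(a)], [0.0]   # g1 tracks g(a) + p(x), g2 tracks n(x)
    for i in range(1, len(xs)):
        d = g(xs[i]) - g(xs[i - 1])
        g1.append(g1[-1] + max(d, 0.0))    # accumulate upward moves
        g2.append(g2[-1] + max(-d, 0.0))   # accumulate downward moves
    return xs, g1, g2

# Sanity check: g1 - g2 reproduces g exactly at grid points (telescoping),
# while g1 and g2 only approximate g(a) + p and n from below.
f = lambda x: math.sin(3 * x) + 0.5 * x
xs, g1, g2 = jordan_decomposition(f, 0.0, 2.0)
print(max(abs((g1[i] - g2[i]) - f(xs[i])) for i in range(len(xs))))  # ~ 0
```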

In view of Jordan's Theorem, functions of bounded variation inherit the following remarkable continuity property of monotone functions (cf. Proposition 564).

Corollary 1953 A function of bounded variation can have at most countably many jump discontinuities.

As readers will learn in more advanced courses, monotone functions defined on an interval are differentiable at "almost" every point. Jordan's Theorem then allows us to conclude that functions of bounded variation are also differentiable at almost every point. Thus, we can say that nowhere differentiable functions, in particular Weierstrass' monsters (Section 26.15), are examples of functions of unbounded variation: their graphs feature frantic ups and downs that add up to $+\infty$. Moreover, this observation shows that continuity is not sufficient to guarantee bounded variation (Weierstrass' monsters are continuous functions). The result below, though, shows that a stronger form of continuity, namely Lipschitz continuity, is enough.

Proposition 1954 A Lipschitz continuous function $g : [a,b] \to \mathbb{R}$ is of bounded variation.

Proof By hypothesis, there exists a scalar $k > 0$ such that
$$|g(x_1) - g(x_2)| \leq k |x_1 - x_2| \qquad \forall x_1, x_2 \in [a,b]$$
So,
$$\sum_{i=1}^{n} |g(x_i) - g(x_{i-1})| \leq k \sum_{i=1}^{n} (x_i - x_{i-1}) = k(b - a)$$
for each subdivision $\pi = \{x_i\}_{i=0}^{n}$ in $\Pi$ of $[a,b]$. We conclude that
$$\sup_{\pi \in \Pi} \sum_{i=1}^{n} |g(x_i) - g(x_{i-1})| \leq k(b - a) < +\infty$$
as desired.

We close with a "differential" criterion for bounded variation (cf. Example 896), which leads to an interesting integral characterization of total variation that sheds further light on its nature as a variability measure.

Proposition 1955 Let $g : [a,b] \to \mathbb{R}$ be differentiable. If the derivative function $g' : [a,b] \to \mathbb{R}$ is bounded, then the function $g$ is of bounded variation. If, in addition, $g'$ is integrable, then
$$t_g = \int_a^b |g'(x)|\,dx \tag{47.37}$$

Proof Since $g'$ is bounded, there exists a constant $k > 0$ such that $|g'(x)| \leq k$ for all $x \in (a,b)$. Let $x_1, x_2 \in [a,b]$. Without loss of generality, assume that $x_1 \leq x_2$. If $x_1 = x_2$, then trivially $|g(x_2) - g(x_1)| \leq k |x_2 - x_1|$. By the Mean Value Theorem, if $x_1 < x_2$, then there exists $\hat{x} \in (x_1, x_2)$ such that
$$\frac{|g(x_2) - g(x_1)|}{|x_2 - x_1|} = |g'(\hat{x})| \leq k$$
So, $g$ is Lipschitz continuous. By the last proposition, $g$ is of bounded variation.

Assume that $g'$ is integrable. So, $|g'|$ is also integrable (see Section 44.4). Consider a subdivision $\pi = \{x_i\}_{i=0}^{n} \in \Pi$ of $[a,b]$. For each pair $x_{i-1} < x_i$, by the Mean Value Theorem, there exists $\hat{x}_i \in (x_{i-1}, x_i)$ such that $|g(x_i) - g(x_{i-1})| = |g'(\hat{x}_i)| (x_i - x_{i-1})$. So,
$$\inf_{x \in [x_{i-1}, x_i]} |g'(x)| (x_i - x_{i-1}) \leq |g'(\hat{x}_i)| (x_i - x_{i-1}) \leq \sup_{x \in [x_{i-1}, x_i]} |g'(x)| (x_i - x_{i-1})$$
which in turn implies
$$I(|g'|, \pi) = \sum_{i=1}^{n} \inf_{x \in [x_{i-1}, x_i]} |g'(x)| (x_i - x_{i-1}) \leq \sum_{i=1}^{n} |g(x_i) - g(x_{i-1})| \leq \sum_{i=1}^{n} \sup_{x \in [x_{i-1}, x_i]} |g'(x)| (x_i - x_{i-1}) = S(|g'|, \pi)$$
Since $|g'|$ is integrable, this implies that
$$\int_a^b |g'(x)|\,dx = \sup_{\pi \in \Pi} I(|g'|, \pi) \leq \sup_{\pi \in \Pi} \sum_{i=1}^{n} |g(x_i) - g(x_{i-1})|$$
By contradiction, assume that
$$\int_a^b |g'(x)|\,dx < \sup_{\pi \in \Pi} \sum_{i=1}^{n} |g(x_i) - g(x_{i-1})|$$
Since $|g'|$ is integrable, it follows that $\int_a^b |g'(x)|\,dx = \inf_{\pi \in \Pi} S(|g'|, \pi)$, yielding that
$$\inf_{\pi \in \Pi} S(|g'|, \pi) < \sup_{\pi \in \Pi} \sum_{i=1}^{n} |g(x_i) - g(x_{i-1})|$$
It follows that there exist two subdivisions $\pi' = \{x_i'\}_{i=0}^{n'} \in \Pi$ and $\pi'' = \{x_i''\}_{i=0}^{n''} \in \Pi$ such that
$$\sum_{i=1}^{n'} \sup_{x \in [x_{i-1}', x_i']} |g'(x)| (x_i' - x_{i-1}') < \sum_{i=1}^{n''} |g(x_i'') - g(x_{i-1}'')|$$
Define the common refinement $\pi = \pi' \cup \pi'' = \{x_i\}_{i=0}^{n}$. It follows that
$$\sum_{i=1}^{n} \sup_{x \in [x_{i-1}, x_i]} |g'(x)| (x_i - x_{i-1}) \leq \sum_{i=1}^{n'} \sup_{x \in [x_{i-1}', x_i']} |g'(x)| (x_i' - x_{i-1}') < \sum_{i=1}^{n''} |g(x_i'') - g(x_{i-1}'')| \leq \sum_{i=1}^{n} |g(x_i) - g(x_{i-1})|$$
a contradiction with $\sum_{i=1}^{n} |g(x_i) - g(x_{i-1})| \leq S(|g'|, \pi)$ for all $\pi \in \Pi$ (for the last inequality above cf. Section 47.9.3). It follows that $\int_a^b |g'(x)|\,dx = \sup_{\pi \in \Pi} \sum_{i=1}^{n} |g(x_i) - g(x_{i-1})|$.
Example 1956 Since $\int_{-1}^{1} 2|x|\,dx = 2$, by formula (47.37) the quadratic function $f(x) = x^2$ has total variation $2$ on the interval $[-1,1]$. N

Example 1957 In Example 1372, we saw that the oscillating (so highly non-monotone) function $g : \mathbb{R} \to \mathbb{R}$ defined by
$$g(x) = \begin{cases} x^2 \sin \dfrac{1}{x} & x \neq 0 \\ 0 & x = 0 \end{cases}$$
is differentiable, with
$$g'(x) = \begin{cases} 2x \sin \dfrac{1}{x} - \cos \dfrac{1}{x} & x \neq 0 \\ 0 & x = 0 \end{cases}$$
By the last proposition, $g$ is of bounded variation on $[-1,1]$. In contrast, the reader can check that the function $g : \mathbb{R} \to \mathbb{R}$ defined by
$$g(x) = \begin{cases} x \sin \dfrac{1}{x} & x \neq 0 \\ 0 & x = 0 \end{cases}$$
is not of bounded variation on $[-1,1]$. N
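A quick numerical experiment (an illustrative sketch, not from the text) makes the contrast visible: grid variation sums settle down for $x^2 \sin(1/x)$, while those for $x \sin(1/x)$ keep creeping upward as the grid is refined (the divergence is logarithmic, so slow).

```python
import math

def variation_on_grid(g, a, b, n):
    # Sum |g(x_i) - g(x_{i-1})| over a uniform subdivision with n intervals.
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(abs(g(xs[i]) - g(xs[i - 1])) for i in range(1, n + 1))

bv  = lambda x: x * x * math.sin(1 / x) if x else 0.0  # bounded variation
ubv = lambda x: x * math.sin(1 / x) if x else 0.0      # unbounded variation

for n in (10**3, 10**4, 10**5):
    print(n, round(variation_on_grid(bv, -1, 1, n), 4),
             round(variation_on_grid(ubv, -1, 1, n), 4))
```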

47.9.2 A general Stieltjes integral


Via the Jordan decomposition, we can extend the definition of the Stieltjes integral to integrators of bounded variation.

Definition 1958 A bounded function $f : [a,b] \to \mathbb{R}$ is said to be integrable in the sense of Stieltjes (or Stieltjes integrable) with respect to a function $g$ of bounded variation if there exists a Jordan decomposition $g = g_1 - g_2$ such that $f$ is integrable in the sense of Stieltjes with respect to both $g_1$ and $g_2$. In this case, the difference
$$\int_a^b f\,dg_1 - \int_a^b f\,dg_2 \tag{47.38}$$
is denoted by $\int_a^b f(x)\,dg(x)$ and is called the integral in the sense of Stieltjes (or Stieltjes integral) of $f$ with respect to $g$ on $[a,b]$.

In the special case when $g$ is increasing, we get back the earlier definition of the Stieltjes integral. More importantly, this definition is well posed. Indeed, consider two Jordan decompositions $g = g_1 - g_2 = g_1' - g_2'$. Then $g_1 + g_2' = g_1' + g_2$ and so by (47.21) we have
$$\int_a^b f\,dg_1 + \int_a^b f\,dg_2' = \int_a^b f\,d(g_1 + g_2') = \int_a^b f\,d(g_1' + g_2) = \int_a^b f\,dg_1' + \int_a^b f\,dg_2$$
In turn, this implies
$$\int_a^b f\,dg_1 - \int_a^b f\,dg_2 = \int_a^b f\,dg_1' - \int_a^b f\,dg_2'$$
The value of the integral is thus independent of the specific Jordan decomposition considered, so the definition is well posed.

Integrators of bounded variation substantially extend the scope of Stieltjes integrals. For example, in the gains' process (47.30) we can consider any price function $p$ of bounded variation, not necessarily increasing (a demanding assumption).

Next we significantly extend the scope of Proposition 1938.

Proposition 1959 Let $g : [a,b] \to \mathbb{R}$ be of bounded variation. The integral $\int_a^b f\,dg$ exists if at least one of the following two conditions is satisfied:

(i) $f$ is continuous;

(ii) $f$ is of bounded variation and $g$ is continuous.

Proof (i) Since $g$ is of bounded variation, there exist two increasing functions $g_1, g_2 : [a,b] \to \mathbb{R}$ such that $g = g_1 - g_2$. By Proposition 1938, the integrals $\int_a^b f\,dg_1$ and $\int_a^b f\,dg_2$ exist. So, the integral $\int_a^b f\,dg$ exists and is given by the difference (47.38).

(ii) Since $f$ and $g$ are of bounded variation, there exist increasing functions $f_1, f_2, g_1, g_2 : [a,b] \to \mathbb{R}$ such that $f = f_1 - f_2$ and $g = g_1 - g_2$. In particular, since $g$ is continuous, we can assume that both $g_1$ and $g_2$ are continuous. By Proposition 1938, the integrals $\int_a^b f_1\,dg_1$, $\int_a^b f_1\,dg_2$, $\int_a^b f_2\,dg_1$ and $\int_a^b f_2\,dg_2$ exist. So, by (47.20) the integrals $\int_a^b f\,dg_1 = \int_a^b (f_1 - f_2)\,dg_1 = \int_a^b f_1\,dg_1 - \int_a^b f_2\,dg_1$ and $\int_a^b f\,dg_2 = \int_a^b (f_1 - f_2)\,dg_2 = \int_a^b f_1\,dg_2 - \int_a^b f_2\,dg_2$ exist. In turn, this implies the existence of the integral $\int_a^b f\,dg = \int_a^b f\,dg_1 - \int_a^b f\,dg_2$.

A consequence of the last proposition is the following general integration by parts formula, which greatly extends the earlier formula (47.28).

Proposition 1960 Given any two continuous functions of bounded variation $f, g : [a,b] \to \mathbb{R}$, it holds that
$$\int_a^b f\,dg + \int_a^b g\,df = f(b)g(b) - f(a)g(a)$$

Proof The integrals $\int_a^b f\,dg$ and $\int_a^b g\,df$ exist by the last proposition. Let $f_1, f_2, g_1, g_2 : [a,b] \to \mathbb{R}$ be as in the last proof. By Proposition 1938, the integrals $\int_a^b f_1\,dg_1$, $\int_a^b f_1\,dg_2$, $\int_a^b f_2\,dg_1$ and $\int_a^b f_2\,dg_2$ exist. Then,
$$\begin{aligned} \int_a^b f\,dg + \int_a^b g\,df &= \int_a^b f_1\,dg_1 - \int_a^b f_2\,dg_1 - \int_a^b f_1\,dg_2 + \int_a^b f_2\,dg_2 \\ &\quad + \int_a^b g_1\,df_1 - \int_a^b g_2\,df_1 - \int_a^b g_1\,df_2 + \int_a^b g_2\,df_2 \\ &= \left(\int_a^b f_1\,dg_1 + \int_a^b g_1\,df_1\right) - \left(\int_a^b f_2\,dg_1 + \int_a^b g_1\,df_2\right) \\ &\quad - \left(\int_a^b f_1\,dg_2 + \int_a^b g_2\,df_1\right) + \left(\int_a^b f_2\,dg_2 + \int_a^b g_2\,df_2\right) \\ &= f_1(b)g_1(b) - f_1(a)g_1(a) - (f_2(b)g_1(b) - f_2(a)g_1(a)) \\ &\quad - (f_1(b)g_2(b) - f_1(a)g_2(a)) + f_2(b)g_2(b) - f_2(a)g_2(a) \\ &= f_1(b)g(b) - f_1(a)g(a) - f_2(b)g(b) + f_2(a)g(a) \\ &= f(b)g(b) - f(a)g(a) \end{aligned}$$
as desired.

Other results established for increasing integrators extend to bounded variation ones, as readers can check. Here we close by noting that full-fledged linearity holds for the general Stieltjes integral with integrators of bounded variation: if $g_1, g_2 : [a,b] \to \mathbb{R}$ are functions of bounded variation, then
$$\int_a^b f\,d(\alpha g_1 + \beta g_2) = \alpha \int_a^b f\,dg_1 + \beta \int_a^b f\,dg_2 \qquad \forall \alpha, \beta \in \mathbb{R}$$
In contrast, in property (ii) of Section 47.4 we remarked that for increasing integrators only positive coefficients $\alpha$ and $\beta$ are permitted.

47.9.3 Variability and volatility


Continuous nowhere differentiable functions are important in applications, as remarked in Section 26.15, yet they have unbounded variation, so infinite variability. Is this the end of the story or, instead, can functions of unbounded variation differ according to a notion of "second-order variability" that we may call "volatility"? This is a relevant question: for instance, if different functions of unbounded variation describe the time evolution of some economic variable of interest, e.g., a stock price, it can be important to rank them according to this "volatility" even though their (first-order) variability is infinite for all of them.

To address this issue, given a function $g : [a,b] \to \mathbb{R}$ and a subdivision $\pi = \{x_i\}_{i=0}^{n}$ of $[a,b]$, let
$$t_g^{\pi} = \sum_{i=1}^{n} |g(x_i) - g(x_{i-1})|$$
be the variation associated with this subdivision. We have the following simple monotonicity property:
$$\pi \subseteq \pi' \implies t_g^{\pi} \leq t_g^{\pi'} \tag{47.39}$$
In words, finer subdivisions feature higher variations. To see why this is the case, take the unit interval and let
$$\pi = \left\{0, \frac{1}{2}, 1\right\} \quad \text{and} \quad \pi' = \left\{0, \frac{1}{4}, \frac{1}{2}, \frac{3}{4}, 1\right\} \tag{47.40}$$
Then
$$t_g^{\pi} = \left|g\left(\frac{1}{2}\right) - g(0)\right| + \left|g(1) - g\left(\frac{1}{2}\right)\right| \leq \left|g\left(\frac{1}{2}\right) - g\left(\frac{1}{4}\right)\right| + \left|g\left(\frac{1}{4}\right) - g(0)\right| + \left|g(1) - g\left(\frac{3}{4}\right)\right| + \left|g\left(\frac{3}{4}\right) - g\left(\frac{1}{2}\right)\right| = t_g^{\pi'}$$
A similar argument, just notationally messier, proves (47.39).


So, the total variation of a function can be seen as the limit, over finer and finer subdivisions, of the variations $t_g^{\pi}$. As we did in (44.28) for the Riemann integral, we can then write, in a suggestive way,
$$t_g = \lim_{|\pi| \to 0} t_g^{\pi} \tag{47.41}$$
where finer subdivisions are identified via smaller meshes $|\pi|$. Next we make this notion of limit along subdivisions rigorous.

Proposition 1961 Given a continuous function $g : [a,b] \to \mathbb{R}$, a quantity $L \in [0,\infty)$ is the total variation $t_g$ if and only if for every $\varepsilon > 0$ there exists $\delta_{\varepsilon} > 0$ such that
$$|\pi| < \delta_{\varepsilon} \implies |L - t_g^{\pi}| < \varepsilon \qquad \forall \pi \in \Pi \tag{47.42}$$
In this case, we write (47.41).

When the total variation is infinite, i.e., $t_g = +\infty$, this limit characterization has a natural version. In particular, we write $\lim_{|\pi| \to 0} t_g^{\pi} = +\infty$ when, for every $M > 0$, there exists $\delta_M > 0$ such that
$$|\pi| < \delta_M \implies t_g^{\pi} > M \qquad \forall \pi \in \Pi$$
The analogy with (12.9) is obvious.

Proof "Only if". We assume that $L = t_g$ and show that (47.42) holds.^10 Fix $\varepsilon > 0$. Since $\sup_{\pi \in \Pi} t_g^{\pi} = t_g = L \in [0,\infty)$, there exists a subdivision $\tilde{\pi} = \{\tilde{x}_i\}_{i=0}^{\tilde{n}}$ such that $t_g - t_g^{\tilde{\pi}} < \varepsilon/2$. By (47.39), we have
$$t_g - t_g^{\pi} < \frac{\varepsilon}{2} \qquad \forall \pi \supseteq \tilde{\pi} \tag{47.43}$$
Since $g$ is continuous on $[a,b]$, by Theorem 603 it is also uniformly continuous on $[a,b]$. So, there exists $\tilde{\delta} > 0$ such that $|x - y| < \tilde{\delta}$ implies
$$|g(x) - g(y)| < \frac{\varepsilon}{4\tilde{n}}$$
for all $x, y \in [a,b]$. Let $\delta_{\varepsilon} = \min\{\tilde{\delta}, m\}$, where $m = \min_{i} |\tilde{x}_i - \tilde{x}_{i-1}| > 0$.

Let $\pi = \{x_i\}_{i=0}^{n} \in \Pi$ be any subdivision such that $|\pi| < \delta_{\varepsilon}$. Since $\delta_{\varepsilon} \leq m$, each interval $(x_{i-1}, x_i)$ can contain at most one element of the subdivision $\tilde{\pi}$. Denote by $I$ the collection of $i \in \{1, \dots, n\}$ such that $(x_{i-1}, x_i)$ contains one element of $\tilde{\pi}$. Clearly, $I$ has at most $\tilde{n}$ elements. This implies
$$t_g^{\pi} = \sum_{i=1}^{n} |g(x_i) - g(x_{i-1})| = \sum_{i \in I} |g(x_i) - g(x_{i-1})| + \sum_{i \notin I} |g(x_i) - g(x_{i-1})| \leq \sum_{i \in I} \left(|g(x_i) - g(\tilde{x}_i)| + |g(\tilde{x}_i) - g(x_{i-1})|\right) + \sum_{i \notin I} |g(x_i) - g(x_{i-1})| = t_g^{\pi \cup \tilde{\pi}}$$
where $\tilde{x}_i$ is the unique element of the subdivision $\tilde{\pi}$ that belongs to $(x_{i-1}, x_i)$. We have
$$\sum_{i \in I} \left(|g(x_i) - g(\tilde{x}_i)| + |g(\tilde{x}_i) - g(x_{i-1})|\right) \leq 2\tilde{n}\,\frac{\varepsilon}{4\tilde{n}} = \frac{\varepsilon}{2}$$
because for each $i \in I$ both $|g(\tilde{x}_i) - g(x_{i-1})|$ and $|g(x_i) - g(\tilde{x}_i)|$ are $< \varepsilon/4\tilde{n}$, since both $|\tilde{x}_i - x_{i-1}|$ and $|x_i - \tilde{x}_i|$ are $< \tilde{\delta}$. So,
$$t_g^{\pi} \geq \sum_{i \notin I} |g(x_i) - g(x_{i-1})| = t_g^{\pi \cup \tilde{\pi}} - \sum_{i \in I} \left(|g(x_i) - g(\tilde{x}_i)| + |g(\tilde{x}_i) - g(x_{i-1})|\right) \geq t_g^{\pi \cup \tilde{\pi}} - \frac{\varepsilon}{2}$$
In turn, this implies $\varepsilon/2 \geq t_g^{\pi \cup \tilde{\pi}} - t_g^{\pi} \geq 0$. By (47.43), $t_g - t_g^{\pi \cup \tilde{\pi}} < \varepsilon/2$. We thus conclude that
$$0 \leq t_g - t_g^{\pi} = t_g - t_g^{\pi \cup \tilde{\pi}} + t_g^{\pi \cup \tilde{\pi}} - t_g^{\pi} < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$$
as desired.

"If". We assume that (47.42) holds and we show that $L = t_g$. First, we observe that $t_g$ is finite. Indeed, if $t_g$ were not finite, then there would exist $\pi^* \in \Pi$ such that $t_g^{\pi^*} > L + 1$. Consider $\varepsilon = 1$. By (47.42), there exists $\delta_{\varepsilon} > 0$ such that
$$|\pi| < \delta_{\varepsilon} \implies |L - t_g^{\pi}| < 1 \qquad \forall \pi \in \Pi$$
Let $\hat{\pi}$ be any subdivision such that $|\hat{\pi}| < \delta_{\varepsilon}$. Consider $\pi' = \hat{\pi} \cup \pi^*$. It is immediate to see that $|\pi'| < \delta_{\varepsilon}$ and, by (47.39), that $t_g^{\pi'} \geq t_g^{\pi^*} > L + 1 > L$. This would imply that
$$1 < t_g^{\pi'} - L = |L - t_g^{\pi'}| < 1$$
a contradiction, proving that $t_g$ is finite. We are left to show that indeed $t_g = L$. Let $\varepsilon > 0$. Since $t_g$ is finite, there exists a subdivision $\tilde{\pi}$ such that $t_g - t_g^{\tilde{\pi}} < \varepsilon/2$. Moreover, recall that
$$t_g - t_g^{\pi} < \frac{\varepsilon}{2} \qquad \forall \pi \supseteq \tilde{\pi}$$
Since (47.42) holds, let $\pi$ be any subdivision such that $|\pi| < \delta_{\varepsilon/2}$, so that $|L - t_g^{\pi}| < \varepsilon/2$. If we consider $\pi'' = \pi \cup \tilde{\pi}$, it follows that $\pi'' \supseteq \tilde{\pi}$ and $|\pi''| < \delta_{\varepsilon/2}$, yielding that
$$|t_g - L| \leq |t_g - t_g^{\pi''}| + |t_g^{\pi''} - L| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$$
Since $\varepsilon > 0$ was arbitrarily chosen, this implies that $|t_g - L| = 0$, that is, $t_g = L$.

^10 The proof is based on Wheeden and Zygmund (2015), p. 22.

N.B. In a similar spirit, we can formalize the suggestive limit (44.28) as follows: a function $f : [a,b] \to \mathbb{R}$ is Riemann integrable, with $\int_a^b f(x)\,dx = I \in \mathbb{R}$, if and only if for every $\varepsilon > 0$ there exists $\delta_{\varepsilon} > 0$ such that
$$|\pi| < \delta_{\varepsilon} \implies \left|\sum_{i=1}^{n} f(x_i')\,\Delta x_i - I\right| < \varepsilon$$
for any chosen $\{x_i'\}$, with $x_i' \in [x_{i-1}, x_i]$, and any subdivision $\pi \in \Pi$. In this case, we write
$$\lim_{|\pi| \to 0} \sum_{i=1}^{n} f(x_i')\,\Delta x_i = \int_a^b f(x)\,dx$$

The limit (47.41) clarifies that only arbitrarily fine subdivisions matter for total variation. For all scalars $a$ we trivially have $a^2 \leq |a|$ if $|a| \leq 1$. If $g$ is continuous on $[a,b]$, hence uniformly continuous, over such arbitrarily fine subdivisions we have $(g(x_i) - g(x_{i-1}))^2 \leq |g(x_i) - g(x_{i-1})|$. One can then conjecture that, by assessing variations over subdivisions via squares rather than via absolute values, one may get a smaller notion of variation. All this motivates the following definition.

Definition 1962 The second (total) variation of a function $g : [a,b] \to \mathbb{R}$ is the quantity
$$t_g^2 = \sup_{\pi \in \Pi} \sum_{i=1}^{n} (g(x_i) - g(x_{i-1}))^2$$
where the supremum is taken over all subdivisions $\pi = \{x_i\}_{i=0}^{n} \in \Pi$ of $[a,b]$.

Variations are now described through squares rather than absolute values. Remarkably, a continuous function of bounded variation has zero second variation.

Proposition 1963 If a continuous function $g : [a,b] \to \mathbb{R}$ is of bounded variation, then it has zero second variation.

Proof Assume that $t_g < +\infty$. By Theorem 603, $g$ is uniformly continuous on $[a,b]$. Fix $\varepsilon > 0$. There exists $\delta_{\varepsilon} > 0$ such that $|x_i - x_{i-1}| < \delta_{\varepsilon}$ implies $|g(x_i) - g(x_{i-1})| < \varepsilon$. Now, fix a subdivision $\pi = \{x_i\}_{i=0}^{n}$ and take a finer subdivision $\pi' = \{x_i'\}_{i=0}^{n'}$ such that $|\pi'| < \delta_{\varepsilon}$. Then
$$\sum_{i=1}^{n} (g(x_i) - g(x_{i-1}))^2 = \sum_{i=1}^{n} |g(x_i) - g(x_{i-1})|\,|g(x_i) - g(x_{i-1})| \leq \sum_{i=1}^{n'} |g(x_i') - g(x_{i-1}')|\,|g(x_i') - g(x_{i-1}')| \leq \varepsilon \sum_{i=1}^{n'} |g(x_i') - g(x_{i-1}')| \leq \varepsilon t_g$$
Since the subdivision $\pi$ was arbitrarily chosen, it follows that
$$t_g^2 = \sup_{\pi \in \Pi} \sum_{i=1}^{n} (g(x_i) - g(x_{i-1}))^2 \leq \varepsilon t_g$$
Since $\varepsilon$ was arbitrarily chosen, we conclude that $t_g^2 = 0$, as desired.

In view of this result, a function can have either finite variability, or infinite variability and finite volatility, or both infinite variability and infinite volatility. For two functions $f, g : [a,b] \to \mathbb{R}$, we thus have the following mutually exclusive comparisons:

(i) $f$ exhibits less variability than $g$ if $t_f \leq t_g < +\infty$;

(ii) $f$ exhibits less volatility than $g$ if $t_f = t_g = +\infty$ and $t_f^2 \leq t_g^2 < +\infty$.

That said, let
$$t_g^{\pi,2} = \sum_{i=1}^{n} (g(x_i) - g(x_{i-1}))^2$$
so that $t_g^2 = \sup_{\pi \in \Pi} t_g^{\pi,2}$. Remarkably, the monotonicity property (47.39) no longer holds for the second variation. Indeed, take for instance the subdivisions $\pi$ and $\pi'$ in (47.40) and consider the identity function $g(x) = x$; we have $t_g^{\pi,2} = 1/2 > t_g^{\pi',2} = 1/4$.
The failure of the monotonicity property is a major difference between total variation and second variation. In particular, it means that the limit $\lim_{|\pi| \to 0} t_g^{\pi,2}$ is trickier to handle and its relations with $t_g^2$ are less clear. That said, we close with a notion of second-order variation defined directly through limits. To this end, say that a sequence of subdivisions $\{\pi_k\}$ is tightly nested if $\pi_k \subseteq \pi_{k+1}$ for all $k \geq 1$ and $\lim_{k \to +\infty} |\pi_k| = 0$. So, subdivisions in a tightly nested sequence are nested one into another, i.e., subdivision $\pi_{k+1}$ is obtained from subdivision $\pi_k$ by adding one or more points, and their meshes vanish.

Definition 1964 The quadratic variation of a function $g : [a,b] \to \mathbb{R}$ is the quantity
$$s_g^2 = \lim_{k \to +\infty} \sum_{i=1}^{n_k} (g(x_i^k) - g(x_{i-1}^k))^2$$
where the limit is taken along a tightly nested sequence of subdivisions $\pi_k = \{x_i^k\}_{i=0}^{n_k} \in \Pi$ of $[a,b]$.
For the quadratic variation to be well defined, the limit has to be independent of the specific tightly nested sequence considered. In this case, we clearly have $s_g^2 \leq t_g^2$. A function might thus have finite quadratic variation and yet infinite second variation. Indeed, in probability theory this notion is used, for instance, in dealing with Brownian phenomena, as readers will learn in more advanced courses.

For two functions $f, g : [a,b] \to \mathbb{R}$, we now have three mutually exclusive comparisons:

(i) $f$ exhibits less variability than $g$ if $t_f \leq t_g < +\infty$;

(ii) $f$ exhibits less volatility than $g$ if $t_f = t_g = +\infty$ and $t_f^2 \leq t_g^2 < +\infty$;

(iii) $f$ exhibits less "quadratic" volatility than $g$ if $t_f^2 = t_g^2 = +\infty$ and $s_f^2 \leq s_g^2 < +\infty$.
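The contrast can be probed numerically. The sketch below (illustrative, not from the text; the random-walk path and the dyadic levels are assumptions of the sketch) computes squared-increment sums along a tightly nested sequence of dyadic subdivisions: they vanish for a smooth function, but stabilize near a positive level for a rough random-walk-style path, the Brownian-type behavior alluded to above.

```python
import random

random.seed(0)
# A rough path: scaled random-walk values on a fine dyadic grid of [0, 1].
N = 2**16
walk = [0.0]
for _ in range(N):
    walk.append(walk[-1] + random.choice((-1.0, 1.0)) / N**0.5)

def squared_sum_dyadic(values, level):
    # Squared increments over the nested dyadic subdivision with 2**level intervals.
    step = (len(values) - 1) // 2**level
    pts = values[::step]
    return sum((pts[i] - pts[i - 1])**2 for i in range(1, len(pts)))

for level in (4, 8, 12, 16):
    smooth = squared_sum_dyadic([(i / N)**2 for i in range(N + 1)], level)
    rough = squared_sum_dyadic(walk, level)
    print(level, round(smooth, 6), round(rough, 4))  # smooth -> 0, rough ~ 1
```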
Chapter 48

Introductory probability theory

48.1 Measures
Let $2^{\Omega}$ be the power set of a set $\Omega$, that is, the collection
$$2^{\Omega} = \{A : A \subseteq \Omega\}$$
of all its subsets, typically denoted by $A$ and $B$. When $\Omega$ is finite, with cardinality $|\Omega|$, the cardinality of the power set $2^{\Omega}$ is $2^{|\Omega|}$.^1

Definition 1965 A set function $\mu : 2^{\Omega} \to \mathbb{R}$ is a rule that associates to each subset $A$ of $\Omega$ one, and only one, scalar $\mu(A)$.

Therefore, set functions are functions with domain the power set $2^{\Omega}$ and codomain the real line.^2

Example 1966 (i) Let $\Omega = \{\omega_1, \omega_2, \omega_3\}$ be a set with three elements. Its power set
$$2^{\Omega} = \{\emptyset, \{\omega_1\}, \{\omega_2\}, \{\omega_3\}, \{\omega_1, \omega_2\}, \{\omega_1, \omega_3\}, \{\omega_2, \omega_3\}, \Omega\}$$
has cardinality $2^3 = 8$ (cf. Example 281). A set function $\mu : 2^{\Omega} \to \mathbb{R}$ is given by
$$\mu(\emptyset) = 0, \quad \mu(\{\omega_1\}) = 4, \quad \mu(\{\omega_2\}) = 5, \quad \mu(\{\omega_3\}) = 3$$
$$\mu(\{\omega_1, \omega_2\}) = 8, \quad \mu(\{\omega_1, \omega_3\}) = 6, \quad \mu(\{\omega_2, \omega_3\}) = 4, \quad \mu(\Omega) = 12$$
(ii) Let $\Omega$ be the set of all citizens of a country. A subset $A$ of $\Omega$ represents a group of citizens. A basic set function is the counting measure $\mu : 2^{\Omega} \to \mathbb{R}$ that associates to each group of citizens the number of its members, i.e.,
$$\mu(A) = |A| \qquad \forall A \subseteq \Omega$$
Assume that, say using data from a population census, we can construct the function $a : \Omega \to \mathbb{R}$ that associates to each citizen $\omega$ his or her age $a(\omega)$. The set function $\mu : 2^{\Omega} \to \mathbb{R}$ defined by
$$\mu(A) = \begin{cases} \dfrac{1}{|A|} \displaystyle\sum_{\omega \in A} a(\omega) & \text{if } A \neq \emptyset \\ 0 & \text{if } A = \emptyset \end{cases}$$
indicates the average age of each group $A$ of citizens.^3 For example, if $A$ is the subset of the female citizens, $\mu(A)$ is their average age. On the other hand, $\mu(\Omega)$ is the average age within the country. N

^1 See Proposition 280. If we write in extenso $\Omega = \{\omega_1, \dots, \omega_n\}$, the cardinality of $2^{\Omega}$ is $2^n$.
^2 In the notation of Definition 177, we have $A = 2^{\Omega}$ and $B = \mathbb{R}$.
^3 We considered the empty set separately because the denominator vanishes in the average formula.

Next we introduce a few basic properties of set functions.

Definition 1967 A set function $\mu : 2^{\Omega} \to \mathbb{R}$ is:

(i) grounded if $\mu(\emptyset) = 0$;

(ii) positive if $\mu(A) \geq 0$ for all $A$;

(iii) monotone if $\mu(A) \leq \mu(B)$ whenever $A \subseteq B$;

(iv) additive if $\mu(A \cup B) = \mu(A) + \mu(B)$ whenever $A \cap B = \emptyset$;

(v) normalized if $\mu(\Omega) = 1$.

The meaning of these properties is quite natural. Next we illustrate them.

Example 1968 (i) The counting measure is readily seen to satisfy all these properties except the last one; that is, it is grounded, positive, monotone and additive, but not normalized.

(ii) The average age set function $\mu$ is grounded and positive, but does not satisfy the other properties. Intuitively, $\mu$ is not monotone because, by enlarging a group, the average age can either increase or decrease: for instance, the average age of a group of undergraduate students increases (decreases) if seniors (toddlers) join them. As to additivity, for each pair of nonempty disjoint subsets $A$ and $B$ we have only the subadditivity property $\mu(A \cup B) < \mu(A) + \mu(B)$. Indeed,
$$\mu(A \cup B) = \frac{\sum_{\omega \in A \cup B} a(\omega)}{|A \cup B|} = \frac{\sum_{\omega \in A} a(\omega) + \sum_{\omega \in B} a(\omega)}{|A \cup B|} = \frac{\sum_{\omega \in A} a(\omega)}{|A \cup B|} + \frac{\sum_{\omega \in B} a(\omega)}{|A \cup B|} = \frac{|A|}{|A \cup B|} \frac{\sum_{\omega \in A} a(\omega)}{|A|} + \frac{|B|}{|A \cup B|} \frac{\sum_{\omega \in B} a(\omega)}{|B|} < \frac{\sum_{\omega \in A} a(\omega)}{|A|} + \frac{\sum_{\omega \in B} a(\omega)}{|B|} = \mu(A) + \mu(B)$$
Finally, it is obvious that, in general, $\mu$ is not normalized.

(iii) Let $\Omega$ be the set of all taxpayers of a country. Now, let $\tau : \Omega \to [0,\infty)$ be the function that indicates, for each taxpayer $\omega$, the amount of taxes paid $\tau(\omega)$. The set function $\mu : 2^{\Omega} \to [0,\infty)$ defined by
$$\mu(A) = \sum_{\omega \in A} \tau(\omega)$$
records the total amount of taxes paid by a group $A$ of taxpayers. It is easy to see that the taxpayer set function is, like the counting measure, grounded, positive, monotone and additive, but not normalized. Its normalized version is given by the set function $\bar{\mu} : 2^{\Omega} \to [0,\infty)$ defined by $\bar{\mu} = \mu / \mu(\Omega)$, i.e., by
$$\bar{\mu}(A) = \frac{1}{\mu(\Omega)} \sum_{\omega \in A} \tau(\omega) \qquad \forall A \subseteq \Omega$$
Since $\mu(\Omega)$ is the total amount of taxes collected in the country, $\bar{\mu}(A)$ is the proportion of taxes paid by a group $A$ of taxpayers. For example, if $A$ consists of the taxpayers working in the industrial sector and $\bar{\mu}(A) = 1/4$, it means that these taxpayers bear 25% of the overall tax burden. The set function $\bar{\mu}$ satisfies all the properties (i)-(v). N
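To see these properties in action, here is a small illustrative sketch (not from the text) implementing the counting measure and the average age set function on a toy population, and checking their behavior on disjoint groups; the names and ages are made-up data.

```python
# Toy population: citizen -> age (illustrative data).
age = {"ann": 25, "bob": 70, "carla": 8, "dan": 40}

def counting(A):
    # The counting measure: mu(A) = |A|.
    return len(A)

def average_age(A):
    # The average age set function, with the empty set handled separately.
    return sum(age[w] for w in A) / len(A) if A else 0.0

A, B = {"ann", "bob"}, {"carla"}   # disjoint groups
print(counting(A | B) == counting(A) + counting(B))           # True: additive
print(average_age(A | B) < average_age(A) + average_age(B))   # True: only subadditive
```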

The previous properties permit us to introduce a fundamental class of set functions.

Definition 1969 A grounded, positive and additive set function is called a (positive) measure.

In this case, we can write
$$\mu : 2^{\Omega} \to [0,\infty)$$
The counting measure is easily seen to be a measure, thus explaining its "measure" name. The taxpayer set function is also a measure, while the average age set function is not a measure (as it is not additive).

Interestingly, measures are monotone: larger sets get a higher measure.

Proposition 1970 Each measure is monotone.

Proof Consider two subsets $A$ and $B$ of $\Omega$ such that $A \subseteq B$. Define $C = B \setminus A$. It is immediate to see that $B = C \cup A$ and $C \cap A = \emptyset$. Since $\mu$ is additive and positive, we have
$$\mu(B) = \mu(C \cup A) = \mu(C) + \mu(A) \geq \mu(A)$$
proving monotonicity.

For measures, additivity can be extended to finite collections.

Proposition 1971 Let $\mu : 2^{\Omega} \to [0,\infty)$ be a measure. For every finite collection $\{A_1, \dots, A_n\}$ of pairwise disjoint subsets of $\Omega$,^4 it holds that
$$\mu\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} \mu(A_i) \tag{48.1}$$
This property is called finite additivity. It generalizes additivity, which is the special case $n = 2$.

Proof We proceed by induction. Initial step: for $n = 2$, equality (48.1) holds by the additivity of the measure $\mu$. Induction step: suppose that this equality holds for $n - 1$ (induction hypothesis). We want to show that it holds for $n$. Consider a collection of pairwise disjoint events $\{A_1, \dots, A_n\}$. Set $A = \bigcup_{i=1}^{n-1} A_i$. It holds that $A_i \cap A_n = \emptyset$ for all $i = 1, \dots, n-1$. Thus,
$$A \cap A_n = \left(\bigcup_{i=1}^{n-1} A_i\right) \cap A_n = \bigcup_{i=1}^{n-1} (A_i \cap A_n) = \emptyset$$
By the induction hypothesis, $\mu\left(\bigcup_{i=1}^{n-1} A_i\right) = \sum_{i=1}^{n-1} \mu(A_i)$. Since $\mu$ is additive, we conclude that
$$\mu\left(\bigcup_{i=1}^{n} A_i\right) = \mu(A \cup A_n) = \mu(A) + \mu(A_n) = \sum_{i=1}^{n-1} \mu(A_i) + \mu(A_n) = \sum_{i=1}^{n} \mu(A_i)$$
as desired.

^4 That is, $A_i \cap A_j = \emptyset$ for all distinct indices $i, j \in \{1, \dots, n\}$.

When the set $A$ is finite, say $A = \{\omega_1, \dots, \omega_n\}$, by finite additivity we have:^5
$$\mu(A) = \sum_{i=1}^{n} \mu(\omega_i) \tag{48.2}$$
Thus, the measure of a finite set is uniquely pinned down by the measures of its elements. Indeed, the singletons $\{\omega_i\}$ are pairwise disjoint and so, by (48.1),
$$\mu(A) = \mu(\{\omega_1, \dots, \omega_n\}) = \mu\left(\bigcup_{i=1}^{n} \{\omega_i\}\right) = \sum_{i=1}^{n} \mu(\omega_i)$$
It is often convenient to write equality (48.2) more compactly as
$$\mu(A) = \sum_{\omega \in A} \mu(\omega) \tag{48.3}$$
without using indices. In particular, when the space $\Omega$ itself is finite we can write
$$\mu(\Omega) = \sum_{\omega \in \Omega} \mu(\omega)$$
Additivity has an interesting generalization.

^5 To ease notation, we write $\mu(\omega)$ in place of $\mu(\{\omega\})$.

Proposition 1972 For a measure $\mu : 2^{\Omega} \to [0,\infty)$ it holds that
$$\mu(A \cup B) + \mu(A \cap B) = \mu(A) + \mu(B) \tag{48.4}$$
for all subsets $A$ and $B$ of $\Omega$.

Formula (48.4) reduces to additivity when $A$ and $B$ are disjoint, so that $\mu(A \cap B) = \mu(\emptyset) = 0$. In the case of the counting measure, this formula becomes
$$|A \cup B| + |A \cap B| = |A| + |B|$$
which is an interesting cardinality relation.

Proof Let $A, B \subseteq \Omega$. We have
$$A = (A \setminus B) \cup (A \cap B) \quad \text{and} \quad (A \setminus B) \cap (A \cap B) = \emptyset$$
Since $\mu$ is additive, this implies that $\mu(A) = \mu(A \setminus B) + \mu(A \cap B)$, that is,
$$\mu(A \setminus B) = \mu(A) - \mu(A \cap B)$$
Similarly, by interchanging the roles of $A$ and $B$ we get
$$\mu(B \setminus A) = \mu(B) - \mu(A \cap B)$$
Let us write $A \cup B$ as the union
$$A \cup B = (A \setminus B) \cup (A \cap B) \cup (B \setminus A)$$
of three pairwise disjoint subsets. By finite additivity, we then have
$$\mu(A \cup B) = \mu(A \setminus B) + \mu(A \cap B) + \mu(B \setminus A) = \mu(A) - \mu(A \cap B) + \mu(A \cap B) + \mu(B) - \mu(A \cap B) = \mu(A) + \mu(B) - \mu(A \cap B)$$
as desired.

48.2 Probabilities
48.2.1 Generalities
The most important class of measures $\mu : 2^{\Omega} \to [0,\infty)$ is that of the normalized ones, i.e., those for which $\mu(\Omega) = 1$. They play a fundamental role in the study of uncertainty, as their name suggests.

Definition 1973 A normalized measure is called a probability measure.

In this case, we can write
$$P : 2^{\Omega} \to [0,1]$$
with $P$ in place of $\mu$. Thus, a probability measure (a probability, for short) $P$ is a set function satisfying the following properties:

(i) $P(A) \geq 0$ for all $A$;

(ii) $P(A \cup B) = P(A) + P(B)$ whenever $A \cap B = \emptyset$;

(iii) $P(\emptyset) = 0$ and $P(\Omega) = 1$.

We can write $P : 2^{\Omega} \to [0,1]$ because a probability takes values only in $[0,1]$. Indeed, by monotonicity (Proposition 1970) we have, for each $A \subseteq \Omega$,
$$0 = P(\emptyset) \leq P(A) \leq P(\Omega) = 1$$

Probabilities are, as previously mentioned, the protagonists of the study of uncertainty. In this context, the elements $\omega$ of the space $\Omega$ are the possible future contingencies that affect the outcome of the (economic) agents' actions.^6 For instance, when the agent is a producer that has to decide, today, how much output to produce and bring tomorrow to the market, the relevant contingency is tomorrow's market price. Thus, $\Omega$ is the set of all possible values that this market price may take, say $\Omega = [0,\infty)$ to allow for any positive price. Now, $P(\omega)$ becomes the probability that the producer assigns to the value $\omega \geq 0$ of the price.

In the study of uncertainty the elements $\omega$ of $\Omega$ are called states (of the world or of nature), the set $\Omega$ is called the state space and its subsets $A$ are called events. With this, $P(A)$ is the probability that event $A$ obtains. In particular, when $A$ is a singleton $\{\omega\}$, $P(\omega)$ becomes the probability that state $\omega$ obtains. For our producer, the state space is the collection $[0,\infty)$ of all possible tomorrow's market prices $\omega \geq 0$ for the output, and a generic set $A \subseteq [0,\infty)$ of possible prices is an event. So, $P(A)$ is the probability, according to the producer, that tomorrow's market price belongs to $A$, i.e., that event $A$ obtains.

The study of probability emerged in the sixteenth and seventeenth centuries in the analysis of games of chance. It is therefore natural to present some basic examples of state spaces that arise with them.

^6 In this chapter we use the terms "agent" and "decision maker" interchangeably.

Example 1974 (i) An agent bets on the outcome of the toss of a single coin, winning (losing) if the coin lands heads (tails) up. There are two states, head and tail, so the state space is
$$\Omega = \{H, T\}$$
Its power set
$$2^{\Omega} = \{\emptyset, \{H\}, \{T\}, \Omega\}$$
consists of $2^2 = 4$ events. When, instead, the bet depends on the toss of two coins, the state space becomes the Cartesian product:^7
$$\Omega = \{H, T\} \times \{H, T\} = \{TH, TT, HT, HH\}$$
Its power set now consists of $2^4 = 16$ events. For instance, the event $A = \{HH, HT\}$ obtains when "the first toss is heads", while the event $B = \{HT, TH\}$ obtains when "the two tosses have different outcomes".

(ii) Now, our agent bets on the outcome of the roll of a die. There are six states, one per die's face, numbered 1 to 6. The state space is
$$\Omega = \{1, \dots, 6\}$$
Its power set consists of $2^6 = 64$ events. For example, the event $A = \{2, 4, 6\}$ obtains when "an even face comes out".

(iii) Finally, our agent bets on the drawing of a ball from an urn containing 100 balls, numbered from 1 to 100. In this case, the state space is
$$\Omega = \{1, \dots, 100\}$$
with a power set of $2^{100}$ events (sic!). For instance, the event $A = \{1, 2, 3, 4, 5\}$ obtains when "a ball with a number $\leq 5$ is drawn". N

^7 To ease notation, we write $HH$ instead of $(H, H)$, and so on.
Next we present a few classic probabilities.

Example 1975 The simplest example of a probability on a finite state space $\Omega$, like the ones just seen in the last example, is the uniform probability $P$ that assigns the same probability to all states, i.e.,
$$P(\omega) = \frac{1}{|\Omega|} \qquad \forall \omega \in \Omega$$
In the single coin state space $\Omega = \{H, T\}$, the uniform $P$ assigns equal probability to heads and tails, i.e.,
$$P(H) = P(T) = \frac{1}{2}$$
It models the toss of a fair coin. In the two-coin state space $\Omega = \{TH, TT, HT, HH\}$, fair coins deliver the uniform probability $P$ defined by
$$P(TH) = P(TT) = P(HT) = P(HH) = \frac{1}{4}$$
Similarly, the uniform probability $P$ on the roll-of-a-die state space $\Omega = \{1, \dots, 6\}$ is defined by
$$P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = \frac{1}{6}$$
It models an unbiased die. Finally, the uniform probability $P$ on the urn state space $\Omega = \{1, \dots, 100\}$ is defined by
$$P(n) = \frac{1}{100} \qquad \forall n \in \Omega$$
It models a blind drawing. N
Example 1976 Fix a state $\omega_0$ in any state space $\Omega$, finite or infinite. The set function $P : 2^{\Omega} \to \mathbb{R}$ defined by
$$P(A) = \begin{cases} 1 & \text{if } \omega_0 \in A \\ 0 & \text{if } \omega_0 \notin A \end{cases}$$
is easily checked to be a probability. It assigns probability 1 to any event containing state $\omega_0$ and probability 0 otherwise. It is denoted by
$$\delta_{\omega_0}$$
and called the Dirac probability measure, after Paul Dirac. N



Next we introduce a widely used probability on the naturals, named after Simeon-Denis Poisson.

Example 1977 Take $\Omega = \mathbb{N} = \{0, 1, \dots, n, \dots\}$, i.e., the states are the natural numbers. Fix a scalar $\lambda > 0$ and define the scalar sequence:^8
$$p_n = e^{-\lambda} \frac{\lambda^n}{n!} \qquad \forall n \in \mathbb{N} \tag{48.5}$$
For each event $A \subseteq \mathbb{N}$ define the sequence $\{a_n\}$ by
$$a_n = \begin{cases} 1 & n \in A \\ 0 & n \notin A \end{cases} \tag{48.6}$$
Armed with this two-valued sequence, define the Poisson probability $P : 2^{\Omega} \to [0,1]$ by
$$P(A) = \sum_{n=0}^{\infty} a_n p_n$$
or, equivalently, by
$$P(A) = \sum_{n \in A} p_n$$
for all $A \subseteq \mathbb{N}$. For a singleton $A = \{n\}$, we get
$$P(n) = p_n$$
Thus, $p_n$ is the probability of state $n$. The Poisson probability is well defined because the sandwich
$$0 \leq a_n p_n \leq p_n \qquad \forall n \in \mathbb{N}$$
implies that the positive series $\sum_{n=0}^{\infty} a_n p_n$ converges to a number in $[0,1]$ (why?). When $A = \emptyset$ we trivially have $a_n = 0$ for all $n \in \mathbb{N}$ and so
$$P(\emptyset) = \sum_{n=0}^{\infty} a_n p_n = 0$$
When $A = \mathbb{N}$ we, instead, have $a_n = 1$ for all $n \in \mathbb{N}$ and so, by Theorem 399,
$$P(\Omega) = \sum_{n=0}^{\infty} a_n p_n = \sum_{n=0}^{\infty} p_n = e^{-\lambda} \sum_{n=0}^{\infty} \frac{\lambda^n}{n!} = 1$$
To prove that $P$ is a probability it remains to check additivity. Take two disjoint events $A$ and $B$ in $\mathbb{N}$. As in (48.6), define the sequence $\{a_n\}$ for the event $A$. In a similar way, define the sequence $\{b_n\}$ for the event $B$ and the sequence $\{c_n\}$ for the event $A \cup B$. Since $A$ and $B$ are disjoint, it is easy to see that $c_n = a_n + b_n$ for all $n \in \mathbb{N}$. In turn, this implies that
$$P(A \cup B) = \sum_{n=0}^{\infty} c_n p_n = \sum_{n=0}^{\infty} a_n p_n + \sum_{n=0}^{\infty} b_n p_n = P(A) + P(B)$$
proving additivity. When the Poisson probability is used, $\Omega = \mathbb{N}$ is often interpreted as time. For example, state $n$ may describe the state "a light bulb breaks after $n$ periods". N

^8 These are the coefficients of the Poisson power series (Example 474).

Probabilities over the naturals thus involve series. The Poisson probability suggests a general rule to construct such probabilities.

Example 1978 Consider again $\Omega = \mathbb{N}$. Let $\{r_n\}$ be a sequence of positive numbers such that the series $\sum_{n=0}^{\infty} r_n$ converges, with sum $R$. In analogy with (48.5), define the scalar sequence
$$p_n = \frac{r_n}{R}$$
In the Poisson case we have $r_n = \lambda^n / n!$, with $R = e^{\lambda}$ (Theorem 399). Relatedly, the geometric probability is defined by taking $r_n = q^n$ with $q \in (0,1)$. As $\sum_{n=0}^{\infty} r_n$ is then the geometric series, in this case
$$p_n = (1 - q) q^n$$
Define the set function $P : 2^{\Omega} \to [0,1]$ by
$$P(A) = \sum_{n=0}^{\infty} a_n p_n \tag{48.7}$$
The arguments used in the Poisson case of the last example yield that $P$ is a probability. N
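The construction lends itself to a direct implementation. Below is an illustrative sketch (not from the text) of the Poisson probability on finite events, together with a truncated check that the weights $p_n$ of (48.5) sum to 1; the truncation level is an assumption of the sketch.

```python
import math

def poisson_p(n, lam):
    # p_n = e^{-lam} * lam**n / n!, the weight of state n in (48.5).
    return math.exp(-lam) * lam**n / math.factorial(n)

def poisson_prob(A, lam):
    # P(A) = sum of p_n over the (finite) event A, as in (48.7).
    return sum(poisson_p(n, lam) for n in A)

lam = 2.0
print(poisson_prob({0, 1, 2}, lam))                  # P("at most 2 periods")
print(sum(poisson_p(n, lam) for n in range(100)))    # ~ 1.0 (series truncated at n < 100)
```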

Example 1979 The normalized taxpayer set function $\bar{\mu} : 2^{\Omega} \to \mathbb{R}$ of Example 1968 is, formally, a probability measure. Of course, in this case the uncertainty interpretation is meaningless. As always, it is important to distinguish between interpretation and formal analysis (which might well admit alternative interpretations). N

Being measures, probabilities are monotone (Proposition 1970). Thus,
$$A \subseteq B \implies P(A) \leq P(B)$$
As is natural, larger events are more likely to obtain. Additivity readily implies that, for each event $A$,
$$P(A^c) = 1 - P(A) \tag{48.8}$$
Thus, either an event or its complement obtains, tertium non datur.

Finite additivity also holds for probabilities, i.e.,
$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i) \tag{48.9}$$
for any collection $\{A_1, \dots, A_n\}$ of pairwise disjoint events (Proposition 1971). In particular, for finite events we have, by (48.3),
$$P(A) = \sum_{\omega \in A} P(\omega) \tag{48.10}$$
That is, the probability of a finite event is just the sum of the probabilities of its states. For instance, in the last example for the two-coin event $A = \{HH, HT\}$ we have
$$P(A) = P(HH) + P(HT)$$
That is, the probability that the first toss is heads is the sum of the probabilities of the states $HH$, two consecutive heads, and $HT$, first heads and then tails. When the coins are fair this event has probability $1/2$ since
$$P(A) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$$
Clearly, in this two-coin example we have
$$1 = P(\Omega) = P(TH) + P(TT) + P(HT) + P(HH)$$
that is, at least one of the four states obtains. In general, when the state space $\Omega$ is finite we have
$$1 = P(\Omega) = \sum_{\omega \in \Omega} P(\omega) \tag{48.11}$$
that is, at least one of the states obtains.

48.2.2 Simple probabilities


We now introduce a class of probabilities with an attractively simple form (hence their name) that will permit us to introduce some important notions with minimal technical complications.

Definition 1980 A probability measure $P : 2^{\Omega} \to [0,1]$ is simple if there exists a finite event $E$ such that $P(E) = 1$.

Probabilities defined on a finite state space are trivially simple. This notion gets traction when the state space is infinite. In this case, it requires that all the mass be concentrated on a finite set of states $E$. Indeed, by (48.8) we have
$$P(E^c) = 1 - P(E) = 1 - 1 = 0 \tag{48.12}$$
So, event $E$ gets all the mass; nothing is left in its complement $E^c$.

Example 1981 (i) Dirac probabilities are simple: the set $E$ can be chosen to be the singleton $\{\omega_0\}$. (ii) Take $\Omega = \mathbb{R}$ and let $P : 2^{\mathbb{R}} \to \mathbb{R}$ be the probability with $P(-2) = 1/3$ and $P(\pi) = 2/3$, that is, for each event $A$,
$$P(A) = \begin{cases} 1 & -2, \pi \in A \\[2pt] \dfrac{1}{3} & -2 \in A \text{ and } \pi \notin A \\[2pt] \dfrac{2}{3} & \pi \in A \text{ and } -2 \notin A \\[2pt] 0 & -2, \pi \notin A \end{cases} \tag{48.13}$$
This probability is simple: just take $E = \{-2, \pi\}$. N

States outside $E$ have a tough life.

Lemma 1982 Let $P : 2^{\Omega} \to [0,1]$ be a simple probability measure. If $\omega \notin E$, then $P(\omega) = 0$.

Proof Let $\omega \notin E$. Then $\{\omega\} \subseteq E^c$ and so, by (48.12) and the monotonicity of $P$,
$$0 \leq P(\omega) \leq P(E^c) = 0$$
Thus, $P(\omega) = 0$.

The finite event $E$ is not unique. Indeed, once such an event is found, any larger finite event can play the same role: if $F \supseteq E$ then $1 \geq P(F) \geq P(E) = 1$ and so $P(F) = 1$. Yet, there is a smallest one. To this end, consider the set
$$\{\omega \in \Omega : P(\omega) > 0\}$$
called the support of $P$ and denoted by $\operatorname{supp} P$. It consists of all states that the probability $P$ actually deems possible. For instance, for the simple probability (48.13) we have $\operatorname{supp} P = \{-2, \pi\}$.

Lemma 1983 The support of a simple probability $P : 2^{\Omega} \to [0,1]$ is a finite event with probability 1, that is,
$$P(\operatorname{supp} P) = 1$$
Moreover, $P(A) = 1$ implies $\operatorname{supp} P \subseteq A$ for all events $A$.

Proof Since $P$ is simple, by definition there exists a finite event $E$ with $P(E) = 1$. By Lemma 1982, $\operatorname{supp} P \subseteq E$. Thus, $\operatorname{supp} P$ is a finite event. Moreover, $E \setminus \operatorname{supp} P$ is a finite set whose states all have zero probability, so $P(E \setminus \operatorname{supp} P) = 0$ by finite additivity. Since $P$ is additive, we have
$$1 = P(E) = P(E \setminus \operatorname{supp} P) + P(\operatorname{supp} P) = P(\operatorname{supp} P)$$
This proves that $P(\operatorname{supp} P) = 1$. To conclude, let $A$ be any event with $P(A) = 1$. We want to show that $\operatorname{supp} P \subseteq A$. Suppose, per contra, that there exists $\omega \in \operatorname{supp} P$ such that $\omega \notin A$. As $P(\omega) > 0$, by the additivity of $P$ we have
$$1 = P(A) < P(A) + P(\omega) = P(A \cup \{\omega\}) \leq P(\Omega) = 1$$
a contradiction. We conclude that $\operatorname{supp} P \subseteq A$.

As previously observed, in a finite state space knowing the probability of singletons is enough to retrieve the probability of any event. This key property continues to hold for simple probabilities defined on any state space, finite or not.

Proposition 1984 Let $P : 2^{\Omega} \to [0,1]$ be a simple probability measure. For each event $A$,
$$P(A) = \sum_{\omega \in A \cap \operatorname{supp} P} P(\omega) \tag{48.14}$$
In words, the probability of an event is the sum of the probabilities of its states that belong to the probability's support.

Proof Let $A$ be an event. We first prove that
$$P(A) = P(A \cap \operatorname{supp} P) \tag{48.15}$$
We have $P(A \cap (\operatorname{supp} P)^c) = 0$ because
$$0 = P((\operatorname{supp} P)^c) \geq P(A \cap (\operatorname{supp} P)^c) \geq 0$$
Since $P$ is additive, we then have
$$P(A) = P(A \cap \operatorname{supp} P) + P(A \cap (\operatorname{supp} P)^c) = P(A \cap \operatorname{supp} P)$$
This proves (48.15). Clearly,
$$A \cap \operatorname{supp} P = \bigcup_{\omega \in A \cap \operatorname{supp} P} \{\omega\}$$
By (48.15) and by the finite additivity of $P$, we then have
$$P(A) = P(A \cap \operatorname{supp} P) = \sum_{\omega \in A \cap \operatorname{supp} P} P(\omega)$$
as desired.

An important consequence of this result is that simple probabilities can be written as sums of Dirac probabilities.

Corollary 1985 Let $P : 2^{\Omega} \to [0,1]$ be a simple probability measure. For each event $A$,
$$P(A) = \sum_{\omega \in \operatorname{supp} P} P(\omega)\,\delta_{\omega}(A) \tag{48.16}$$

Proof A moment's reflection shows that
$$\sum_{\omega \in A \cap \operatorname{supp} P} P(\omega) = \sum_{\omega \in \operatorname{supp} P} P(\omega)\,\delta_{\omega}(A)$$
Thus, (48.14) implies (48.16).

We can write (48.16) more compactly as
$$P = \sum_{\omega \in \operatorname{supp} P} P(\omega)\,\delta_{\omega}$$
In words, $P$ is the weighted sum of the Dirac probabilities centered at the points of its support, with weights given by the probabilities of these points. For instance, the simple probability (48.13) can be written as the sum
$$P = \frac{1}{3}\,\delta_{-2} + \frac{2}{3}\,\delta_{\pi}$$
of the two Dirac probabilities centered at the two points $-2$ and $\pi$ of its support.
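Simple probabilities are easy to represent on a computer: a finite dictionary of support points and weights pins down $P$ on every event via (48.14). An illustrative sketch (not from the text), with events encoded as predicates so that infinite state spaces pose no difficulty:

```python
import math

class SimpleProbability:
    def __init__(self, weights):
        # weights: dict omega -> P(omega) over the support; must sum to 1.
        assert abs(sum(weights.values()) - 1.0) < 1e-12
        self.weights = weights   # the support is weights.keys()

    def prob(self, event):
        # P(A) = sum of P(omega) over omega in A intersected with the support,
        # as in (48.14); event is a predicate omega -> bool.
        return sum(p for w, p in self.weights.items() if event(w))

# The simple probability (48.13), i.e., the mixture (1/3) d_{-2} + (2/3) d_{pi}.
P = SimpleProbability({-2.0: 1 / 3, math.pi: 2 / 3})
print(P.prob(lambda w: w > 0))   # 2/3: only pi is positive
print(P.prob(lambda w: True))    # 1.0: the whole space
```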

48.2.3 A continuity property


In applications it is common to assume that a probability $P$ is $\sigma$-additive, a continuity requirement. Recall that a probability $P$ is finitely additive, that is,
$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i) \tag{48.17}$$
for any finite collection $\{A_i\}_{i=1}^{n}$ of pairwise disjoint events. The property of $\sigma$-additivity extends this property to countable collections of this kind.

Definition 1986 A probability $P : 2^{\Omega} \to [0,1]$ is countably additive if
$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$$
for any countable collection $\{A_i\}_{i=1}^{\infty}$ of pairwise disjoint events.

This definition is well posed because the positive series $\sum_{i=1}^{\infty} P(A_i)$ is easily seen to converge. To continue the analysis we need a piece of notation. Given any countable collection of events $\{A_n\}_{n=1}^{\infty}$, we write:

(i) $A_n \uparrow A$ if $A_1 \subseteq A_2 \subseteq \cdots \subseteq A_n \subseteq \cdots$ and $A = \bigcup_{n=1}^{\infty} A_n$;

(ii) $A_n \downarrow A$ if $A_1 \supseteq A_2 \supseteq \cdots \supseteq A_n \supseteq \cdots$ and $A = \bigcap_{n=1}^{\infty} A_n$.

Example 1987 Let $\Omega = \mathbb{R}$. If
$$A_n = \left[0, 1 + \frac{1}{n}\right]$$
and $A = [0,1]$, we have $A_n \downarrow A$. Instead, if
$$A_n = \left[0, 1 - \frac{1}{n}\right)$$
and $A = [0,1)$, we have $A_n \uparrow A$. Finally, if $A_n = (-n, n)$ we have $A_n \uparrow \mathbb{R}$, while if $A_n = [n, +\infty)$ we have $A_n \downarrow \emptyset$. N

The next proposition shows that countable additivity is, as previously mentioned, a continuity property of probabilities. To this end, observe that the monotonicity of probabilities implies
$$A_n \uparrow A \implies P(A_1) \leq \cdots \leq P(A_n) \leq \cdots \leq 1$$
as well as
$$A_n \downarrow A \implies P(A_1) \geq \cdots \geq P(A_n) \geq \cdots \geq 0$$
In both cases, $\lim_n P(A_n)$ exists because it is the limit of a bounded monotone sequence of scalars $\{P(A_n)\}$. What characterizes countable additivity is that the value of these limits is indeed $P(A)$, whence its continuity nature.

Proposition 1988 Let $P : 2^{\Omega} \to [0,1]$ be a probability. The following statements are equivalent:

(i) $P$ is countably additive;

(ii) if $A_n \uparrow A$, then $P(A_n) \uparrow P(A)$;

(iii) if $A_n \downarrow A$, then $P(A_n) \downarrow P(A)$.

Proof (i) implies (ii). Consider a countable collection of events $\{A_n\}$ with $A_n \uparrow A$. Define the collection of events $\{E_n\}$ by setting
$$E_1 = A_1, \quad E_2 = A_2 \setminus A_1, \quad \dots, \quad E_n = A_n \setminus \bigcup_{i=1}^{n-1} A_i, \quad \dots$$
By construction, the events $\{E_n\}$ are pairwise disjoint. Next, note that $A_n = \bigcup_{i=1}^{n} E_i$ for all $n \geq 1$ and, in particular, $A = \bigcup_{i=1}^{\infty} E_i$. Since $P$ is countably additive, we conclude that
$$P(A) = P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i) = \lim_n \sum_{i=1}^{n} P(E_i) = \lim_n P(A_n)$$
as desired.

(ii) implies (iii). Consider a countable collection of events $\{A_n\}$ with $A_n \downarrow A$. Clearly, $A_n^c \uparrow A^c$. By hypothesis,
$$P(A) = 1 - P(A^c) = 1 - \lim_n P(A_n^c) = \lim_n [1 - P(A_n^c)] = \lim_n P(A_n)$$
as desired.

(iii) implies (i). Consider a countable collection $\{A_i\}$ of pairwise disjoint events. Define
$$E_n = \bigcup_{i=1}^{n} A_i$$
for all $n \geq 1$. By construction, $E_n \uparrow \bigcup_{i=1}^{\infty} A_i$ and so $E_n^c \downarrow \left(\bigcup_{i=1}^{\infty} A_i\right)^c$. By hypothesis,
$$P(E_n^c) \downarrow P\left(\left(\bigcup_{i=1}^{\infty} A_i\right)^c\right)$$
Thus,
$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = 1 - P\left(\left(\bigcup_{i=1}^{\infty} A_i\right)^c\right) = 1 - \lim_n P(E_n^c) = \lim_n [1 - P(E_n^c)] = \lim_n P(E_n) = \lim_n \sum_{i=1}^{n} P(A_i) = \sum_{i=1}^{\infty} P(A_i)$$
as desired.

Our simple friends are countably additive.

Proposition 1989 Simple probabilities are countably additive.

Proof We begin by proving that a Dirac probability $\delta_{\omega_0}$, with $\omega_0 \in \Omega$, is countably additive. Consider a collection of events $\{A_n\}$ with $A_n \downarrow A$. We have two cases: either $\omega_0 \in A$ or $\omega_0 \notin A$. In the first case, we have $\omega_0 \in A_n$ for all $n$ and, in particular, $\delta_{\omega_0}(A_n) = 1$ for all $n \geq 1$, proving that $\lim_n \delta_{\omega_0}(A_n) = 1 = \delta_{\omega_0}(A)$. In the second case, there exists $\bar{n} \geq 1$ such that $\omega_0 \notin A_n$ for all $n \geq \bar{n}$. In particular, $\delta_{\omega_0}(A_n) = 0$ for all $n \geq \bar{n}$, proving that $\lim_n \delta_{\omega_0}(A_n) = 0 = \delta_{\omega_0}(A)$. By Proposition 1988, we conclude that $\delta_{\omega_0}$ is countably additive.

Now, let $P : 2^{\Omega} \to [0,1]$ be a simple probability. By (48.14),
$$P(A) = \sum_{\omega \in A \cap \operatorname{supp} P} P(\omega) \qquad \forall A \subseteq \Omega$$
By (48.16),
$$P(A) = \sum_{\omega \in \operatorname{supp} P} P(\omega)\,\delta_{\omega}(A) \qquad \forall A \subseteq \Omega$$
Take a countable collection of events $\{A_n\}$ with $A_n \downarrow A$. As each Dirac probability $\delta_{\omega}$ is countably additive, we have
$$\lim_n P(A_n) = \lim_n \sum_{\omega \in \operatorname{supp} P} P(\omega)\,\delta_{\omega}(A_n) = \sum_{\omega \in \operatorname{supp} P} P(\omega) \lim_n \delta_{\omega}(A_n) = \sum_{\omega \in \operatorname{supp} P} P(\omega)\,\delta_{\omega}(A) = P(A)$$
By Proposition 1988, we conclude that $P$ is countably additive.

We next show that the Poisson probability is also countably additive. Similar arguments will then yield that, more generally, all the probabilities on the naturals defined in Example 1978 are countably additive. To prove this property, we first elaborate further on the continuity conditions that we showed to characterize countable additivity. Interestingly, it is enough to check continuity at either the empty set or the entire space.

Lemma 1990 Let $P : 2^{\Omega} \to [0,1]$ be a probability. The following statements are equivalent:

(i) if $A_n \uparrow \Omega$, then $P(A_n) \uparrow 1$;

(ii) if $A_n \uparrow A$, then $P(A_n) \uparrow P(A)$;

(iii) if $A_n \downarrow A$, then $P(A_n) \downarrow P(A)$;

(iv) if $A_n \downarrow \emptyset$, then $P(A_n) \downarrow 0$.

Proof By Proposition 1988, (ii) implies (iii). As (iii) trivially implies its special case (iv), it remains to prove that (i) implies (ii) and that (iv) implies (i).

(i) implies (ii). Consider a countable collection of events $\{A_n\}$ with $A_n \uparrow A$. For each $n \geq 1$, define
$$B_n = A_n \cup A^c$$
Since $A_n \subseteq A$ for all $n \geq 1$, we have $A_n \cap A^c = \emptyset$ for all $n \geq 1$. As $A_n \uparrow A$, we have $B_n \uparrow \Omega$. By hypothesis, $P(B_n) \uparrow P(\Omega) = 1$. Since $A_n \cap A^c = \emptyset$ for all $n \geq 1$, we then have
$$\lim_n P(A_n) + P(A^c) = \lim_n [P(A_n) + P(A^c)] = \lim_n P(B_n) = 1$$
Thus, $\lim_n P(A_n) = 1 - P(A^c) = P(A)$, proving the implication.

(iv) implies (i). Consider a countable collection of events $\{A_n\}$ with $A_n \uparrow \Omega$. Clearly, $A_n^c \downarrow \emptyset$. By hypothesis, $P(A_n^c) \downarrow 0$. This implies that
$$\lim_n P(A_n) = \lim_n [1 - P(A_n^c)] = 1 - \lim_n P(A_n^c) = 1 - 0 = 1$$
as desired.

We can now prove the announced result.

Proposition 1991 The Poisson probability is countably additive.

Proof Take a countable collection of events $\{A_n\} \subseteq \mathbb{N}$ with $A_n \downarrow \emptyset$. As previously remarked, the monotonicity of the probability $P$ ensures the existence of a scalar $\alpha \geq 0$ such that $P(A_n) \downarrow \alpha$. In view of Proposition 1988 and Lemma 1990, to prove the result we need to show that $\alpha = 0$. To this end, fix $\varepsilon > 0$. Since the series $\sum_{k=0}^{\infty} p_k$ is convergent, there exists $k_{\varepsilon} \in \mathbb{N}$ large enough so that $\sum_{k=k_{\varepsilon}}^{\infty} p_k \leq \varepsilon$. Since $\bigcap_{n=1}^{\infty} A_n = \emptyset$, for each $l \in \{0, 1, \dots, k_{\varepsilon}\}$ there exists some event, denoted by $A_{n_l}$, in the collection $\{A_n\}$ that does not contain it, i.e., $l \notin A_{n_l}$. Since $A_n \supseteq A_{n+1}$, we actually have $l \notin A_n$ for all $n \geq n_l$. Define
$$\bar{n} = \max_{l \in \{0, 1, \dots, k_{\varepsilon}\}} n_l$$
It follows that $\{0, 1, \dots, k_{\varepsilon}\} \subseteq A_n^c$ for all $n \geq \bar{n}$. Thus,
$$P(\{0, 1, \dots, k_{\varepsilon}\}) \leq P(A_n^c)$$
for all $n \geq \bar{n}$. By the definition of the Poisson probability, we conclude that, for each $n \geq \bar{n}$,
$$P(A_n) = 1 - P(A_n^c) \leq 1 - P(\{0, 1, \dots, k_{\varepsilon}\}) = 1 - \sum_{k=0}^{k_{\varepsilon}} p_k = \sum_{k=k_{\varepsilon}+1}^{\infty} p_k \leq \sum_{k=k_{\varepsilon}}^{\infty} p_k \leq \varepsilon$$
Hence, $0 \leq \alpha \leq P(A_n) \leq \varepsilon$ for all $n \geq \bar{n}$. As $\varepsilon$ was arbitrarily chosen, this proves that $\alpha = 0$, as desired.

Countable additivity is a most convenient continuity property that, as previously mentioned, is commonly adopted in applications. Yet, as usual, there are no free meals: next we show that a probability as basic as the uniform one coexists uneasily with countable additivity. To this end, we consider a probability $P : 2^{\mathbb{N}} \to [0,1]$ that, like the Poisson one, is defined on the naturals. A uniform $P$ would assign the same probability to all natural numbers, i.e.,
$$P(n) = P(m) \qquad \forall n, m \in \mathbb{N} \tag{48.18}$$
a symmetry property that we will momentarily see to be incompatible with countable additivity.

Proposition 1992 There is no countably additive uniform probability $P : 2^{\mathbb{N}} \to [0,1]$.

Proof Suppose, per contra, that there exists a countably additive probability $P : 2^{\mathbb{N}} \to [0,1]$ satisfying (48.18). Set $k = P(n)$ for all $n \in \mathbb{N}$. Clearly, $k \geq 0$. As
$$\mathbb{N} = \bigcup_{n \in \mathbb{N}} \{n\}$$
by countable additivity we reach the contradiction
$$1 = P(\mathbb{N}) = P\left(\bigcup_{n \in \mathbb{N}} \{n\}\right) = \sum_{n \in \mathbb{N}} P(n) = k + \cdots + k + \cdots = \begin{cases} 0 & \text{if } k = 0 \\ +\infty & \text{if } k > 0 \end{cases}$$
This contradiction proves the result.

48.3 Random variables


Agents often have to choose among actions with uncertain monetary outcomes that depend on contingencies outside the agents' control. Betting on a sports event, say a football match, is an action that yields a different monetary payoff according to the final result of the game. Investing in the shares of a company is an action that has a different monetary outcome depending, for instance, on the future state of the economy (which may affect the company's performance).

In these examples, actions determine a map that associates a monetary outcome to each payoff-relevant contingency. This map is all the agents need to know about the actions, at least as long as they ultimately care, in a consequentialist perspective, only about their monetary outcomes. This motivates the next definition.

Definition 1993 A real-valued function $f : \Omega \to \mathbb{R}$ defined on a state space $\Omega$ is called a random variable (or act).

A few examples are in order.

Example 1994 (i) Back to Example 1974, consider an agent who bets on the outcome of the toss of a single coin, winning (losing) 50 euros if the coin lands heads (tails) up. The function $f : \{H, T\} \to \mathbb{R}$ defined by
$$f(H) = 50, \qquad f(T) = -50 \tag{48.19}$$
is a random variable that represents this bet.

(ii) In a financial market it is possible to buy at a price $p$ a European call option on a traded asset, with a strike price of 50 euros, that is, an option to buy, one year from now, 1 unit of the asset at the strike price of 50 euros.

An investor is considering the following financial operation: buy the option, exercise it if in a year the asset price will be $\geq 50$ and, in this case, sell the asset immediately. The outcome of this operation depends on the price, currently unknown to our investor, that the asset will have in a year. Denote by
$$\omega \in \Omega = [0,\infty)$$
this future price, which is the payoff-relevant contingency. The financial operation is represented by the random variable $f : [0,\infty) \to \mathbb{R}$ defined by
$$f(\omega) = \max\{\omega - 50, 0\} - p \qquad \forall \omega \in [0,\infty)$$
Indeed, if $\omega < 50$ the investor will not exercise the option and will just bear its cost $p$. If $\omega \geq 50$, the investor will instead exercise the option, with a gain of $\omega - 50 - p$ euros because, one year from now, the unit of the asset is paid at the agreed strike price of 50 euros and sold at its market price $\omega$. N

We introduced random variables to represent uncertain courses of action. More generally, they may just record the scalar outcomes of uncertain contingencies, independently of a specific decision problem.

Example 1995 The plant of a manufacturing company is subject to failures that can be either small or large. Repairing a small (large) failure costs 100 (1000) euros and takes one (five) days of production interruption, with a profit loss of 300 euros per day. The state space is
$$\Omega = \{s, l, n\}$$
where $n$ covers the happy case of no failure. The random variable $f : \Omega \to \mathbb{R}$ defined by
$$f(\omega) = \begin{cases} 100 + 300 & \text{if } \omega = s \\ 1000 + 5 \cdot 300 & \text{if } \omega = l \\ 0 & \text{if } \omega = n \end{cases} = \begin{cases} 400 & \text{if } \omega = s \\ 2500 & \text{if } \omega = l \\ 0 & \text{if } \omega = n \end{cases}$$
represents the uncertain loss of the company. This random variable may then be used in a decision problem, for instance in the choice of either an insurance or a maintenance contract. Yet, when the random variable is constructed, this possible decision problem is not yet specified. N

Let us call the pair $(\Omega, P)$ a probability space, i.e., a space $\Omega$ endowed with a probability measure $P$. In this probabilistic context, when can we declare two random variables to be indistinguishable? Of course, if they are equal at all states, they are trivially indistinguishable. But this trivial case neglects the probabilistic information that $P$ embodies. To use it, let us consider the following notion.

Definition 1996 Two random variables $f, g : \Omega \to \mathbb{R}$ are equal $P$-almost everywhere (for short, $P$-a.e.) when
$$P(\{\omega \in \Omega : f(\omega) = g(\omega)\}) = 1$$
or, more compactly, when $P(f = g) = 1$.

In words, two random variables are equal $P$-a.e. when the set of all states where they are equal has probability 1. Equivalently, when
$$P(\{\omega \in \Omega : f(\omega) \neq g(\omega)\}) = 0$$
that is, when the set of states where they differ has probability zero. In this case, we regard them as probabilistically indistinguishable, thus answering the previous question. As usual, simple probabilities clarify matters.

Proposition 1997 Let $P : 2^{\Omega} \to [0,1]$ be a simple probability. Two random variables $f, g : \Omega \to \mathbb{R}$ are equal $P$-a.e. if and only if, for each $\omega \in \operatorname{supp} P$, we have $f(\omega) = g(\omega)$.

Proof "If". Let $f(\omega) = g(\omega)$ for all $\omega \in \operatorname{supp} P$. Then, $\operatorname{supp} P \subseteq (f = g)$ and so, by Lemma 1983 and the monotonicity of $P$, we have $P(f = g) = 1$. "Only if". Suppose $P(f = g) = 1$. By Lemma 1983, $\operatorname{supp} P \subseteq (f = g)$. Hence, $f(\omega) = g(\omega)$ for all $\omega \in \operatorname{supp} P$.

In the simple case, two random variables are thus equal $P$-a.e. when they agree on the support of the probability $P$, so on the states that $P$ deems possible. What happens at the zero-probability states is not relevant: according to $P$ they will not occur, and so the behavior of the random variables at these states is of no probabilistic concern.

48.4 Expected values I

Random variables and probabilities are combined in the next key notion.

Definition 1998 The expected (or mean) value of a random variable f : Ω → R with respect to a simple probability P is the quantity

    E_P(f) = ∑_{ω∈supp P} f(ω) P(ω)

The expected value considers the images f(ω) of the states in the support of P and adds them up, weighted according to their probability P(ω). It is thus a general notion of weighted average (cf. Section 15.10).

Example 1999 (i) Back to Example 1981, let P : 2^R → [0,1] be the simple probability with P(−2) = 1/3 and P(π) = 2/3. The expected value of a random variable f : R → R is

    E_P(f) = f(−2)P(−2) + f(π)P(π) = (f(−2) + 2f(π)) / 3

(ii) In the single coin toss example, assume that P deems heads and tails equally likely. The expected value of bet (48.19) is

    E_P(f) = f(H)P(H) + f(T)P(T) = (1/2)·50 + (1/2)·(−50) = 0

Bets with a zero expected value are called fair.
(iii) In the plant example, take P with P(s) = P(l) = 1/4 and P(n) = 1/2. Then,

    E_P(f) = f(s)P(s) + f(l)P(l) + f(n)P(n) = (1/4)·400 + (1/4)·2500 = 725

is the average loss of the company. N
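In the simple case the expected value is a finite weighted sum, so it is straightforward to compute. The following Python sketch evaluates it for the plant example of point (iii):

    def expected_value(f, P):
        """Expected value of f under a simple probability P,
        both given as dictionaries keyed by the states in supp P."""
        return sum(f[w] * P[w] for w in P)

    f = {"s": 400, "l": 2500, "n": 0}      # uncertain loss
    P = {"s": 0.25, "l": 0.25, "n": 0.5}   # simple probability
    print(expected_value(f, P))            # 725.0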

We begin with a key invariance property of expected values, an immediate but important consequence of Proposition 1997.

Proposition 2000 Let P : 2^Ω → [0,1] be a simple probability. If two random variables f, g : Ω → R are equal P-a.e., then E_P(f) = E_P(g).

Thus, random variables that are probabilistically indistinguishable have the same ex-
pected value. Next we discuss a few other basic properties of expected values. In particular,
they are monotone and linear.

Proposition 2001 Let P : 2^Ω → [0,1] be a simple probability. For all random variables f, g : Ω → R,

(i) E_P(αf + βg) = αE_P(f) + βE_P(g) for all α, β ∈ R;

(ii) E_P(f) ≤ E_P(g) if f ≤ g;

(iii) E_P(f) = ∑_{ω∈A} f(ω)P(ω) if event A is finite and contains supp P.

Proof (i) For each α, β ∈ R we have

    E_P(αf + βg) = ∑_{ω∈supp P} (αf(ω) + βg(ω)) P(ω)
                 = α ∑_{ω∈supp P} f(ω)P(ω) + β ∑_{ω∈supp P} g(ω)P(ω) = αE_P(f) + βE_P(g)

(ii) Let f ≤ g, i.e., f(ω) ≤ g(ω) for all ω ∈ Ω. As P ≥ 0, we then have f(ω)P(ω) ≤ g(ω)P(ω) for all ω ∈ Ω. In turn, this implies

    ∑_{ω∈supp P} f(ω)P(ω) ≤ ∑_{ω∈supp P} g(ω)P(ω)

proving that E_P(f) ≤ E_P(g).

(iii) Let A be a finite event with A ⊇ supp P. By Lemma 1982, P(ω) = 0 for all ω ∈ A with ω ∉ supp P. Thus,

    ∑_{ω∈supp P} f(ω)P(ω) = ∑_{ω∈A} f(ω)P(ω)

as desired.

Point (iii) shows that

    E_P(f) = ∑_{ω∈supp P} f(ω)P(ω) = ∑_{ω∈A} f(ω)P(ω)

The states ω that belong to A but not to supp P have zero probability, so they are superfluous but do not create problems either. This simple remark is important when dealing with a finite state space Ω. In this case one often writes

    E_P(f) = ∑_{ω∈Ω} f(ω)P(ω)

without bothering to specify the support of P. We can afford this neglect because when Ω is finite the sum is always well defined.

48.5 Euclidean twist

When the state space is finite, say Ω = {ω₁, ..., ωₙ}, we can study probabilities and random variables within the familiar Euclidean space Rⁿ. To see how, observe that by formula (48.10) the probability of each event is determined by the probabilities of the states that it contains. As a result, the probability P : 2^Ω → [0,1] is uniquely pinned down by the states' probabilities P(ω).
With this, for each index i = 1, ..., n set

    Pᵢ = P(ωᵢ)

We have Pᵢ ≥ 0 and, by (48.11),

    ∑_{i=1}^n Pᵢ = 1

Thus, to define a probability P becomes equivalent to define a probability vector⁹

    P = (P₁, ..., Pₙ) ∈ Δ_{n−1}    (48.20)

Thus, in a finite state space probabilities can be identified with elements of the standard simplex.

Example 2002 Let Ω = {ω₁, ω₂, ω₃, ω₄}. The vector

    P = (1/4, 1/4, 1/6, 1/3) ∈ Δ₃

identifies the probability P : 2^Ω → [0,1] given by

    P(ω₁) = P(ω₂) = 1/4 ,  P(ω₃) = 1/6 ,  P(ω₄) = 1/3

The probability of the other twelve events can then be found via formula (48.10). N

Example 2003 In the single coin toss example, assume that the coin is fair. The two states H and T are then equally likely, that is, P(H) = P(T) = 1/2. The probability P is identified by the vector

    (1/2, 1/2) ∈ Δ₁

In a similar vein, in the example of a roll of a die assume that each face is equally likely, that is, P(ω) = 1/6 for all ω ∈ Ω = {1, ..., 6}. In this case, the probability P is identified by the vector

    (1/6, 1/6, 1/6, 1/6, 1/6, 1/6) ∈ Δ₅    N
⁹ Recall that Δ_{n−1} = {x ∈ Rⁿ₊ : ∑_{i=1}^n xᵢ = 1} is the standard simplex of Rⁿ (see Example 774).

In sum, the standard simplex Δ_{n−1} can be seen as the collection of all possible probabilities defined on a finite state space with n elements. Similarly, random variables f : Ω → R defined on this state space can be identified with vectors of Rⁿ. Indeed, for each index i = 1, ..., n now set

    fᵢ = f(ωᵢ)    (48.21)

We can then identify f with the vector

    f = (f₁, ..., fₙ) ∈ Rⁿ

Thus, the space Rⁿ itself can be seen as the collection of all random variables that can be defined on a finite state space with n elements. We can then write the expected value as the inner product

    E_P(f) = f · P = ∑_{i=1}^n fᵢ Pᵢ    (48.22)

of vectors P ∈ Δ_{n−1} and f ∈ Rⁿ.

Example 2004 (i) Let Ω = {ω₁, ω₂, ω₃, ω₄, ω₅}. The vector

    P = (2/10, 1/10, 3/10, 15/100, 25/100) ∈ Δ₄

identifies a probability P on Ω, while the vector

    f = (5, 10, 8, 6, 4) ∈ R⁵

identifies a random variable on Ω, for instance representing a financial asset that pays 5 euros if ω₁ occurs, 10 euros if ω₂ occurs, 8 euros if ω₃ occurs, 6 euros if ω₄ occurs and 4 euros if ω₅ occurs. The expected value

    E_P(f) = (5, 10, 8, 6, 4) · (2/10, 1/10, 3/10, 15/100, 25/100) = 63/10

is the average payment of asset f.

(ii) Let Ω = {ω₁, ..., ωₙ}. The vector

    P = (1/n, ..., 1/n) ∈ Δ_{n−1}

identifies the uniform probability P on Ω. The expected value of any random variable

    f = (f₁, ..., fₙ) ∈ Rⁿ

has then the simple average form

    E_P(f) = f · P = (1/n) ∑_{i=1}^n fᵢ    N
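Under this Euclidean identification, computing an expected value is just an inner product. A minimal numpy sketch for point (i) of the example:

    import numpy as np

    P = np.array([0.2, 0.1, 0.3, 0.15, 0.25])  # probability vector in the simplex
    f = np.array([5.0, 10.0, 8.0, 6.0, 4.0])   # random variable as a vector of R^5

    assert np.isclose(P.sum(), 1.0) and (P >= 0).all()
    print(f @ P)  # 6.3, i.e., 63/10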

As previously mentioned (cf. Example 1994), financial assets are a notable example of random variables f : Ω → R, where f(ω) is the payment of the asset when state ω occurs. With a finite state space Ω = {ω₁, ..., ωₙ}, an asset can be indicated with the vector

    y = (y₁, ..., yₙ) ∈ Rⁿ

where

    yᵢ = f(ωᵢ)

is the payment of the asset when state ωᵢ ∈ Ω occurs. Thus, y is the vector of the possible payments of the asset in the different states. This notation, used for example in Section 24.6, is completely consistent with considering assets as random variables.

48.6 Measures of variability

The expected value of a random variable f with respect to a simple probability P is a typical, representative value of f, as discussed at length in Section 15.10, so it gives an important piece of information about this random variable. Yet, two random variables with the same expected value may be very different.

Example 2005 In the single coin toss example, consider the original bet f given by (48.19) as well as two other bets g and h with

    g(H) = 25 ,  g(T) = −25    and    h(H) = h(T) = 0

When P(H) = P(T) = 1/2, the three bets have a zero expected value. But, of course, the first random variable seems more variable than the second one, which in turn is, obviously, more variable than the third, constant, one. N

The variance helps to quantify the variability that distinguishes the three random variables of this example. In particular, at a state ω the quantity

    (f(ω) − E_P(f))²    (48.23)

measures the deviation, in that state, of f from its expected value E_P(f). Since a priori we do not mind about the specific direction of the deviation, we square the quantity to remove the sign of the deviation. Of course, at different states ω we might well have different deviations (48.23). Thus, to get a representative measure of variability we have to average out these deviations through the probability P.

Definition 2006 The variance of a random variable f : Ω → R with respect to a simple probability P is given by the quantity

    V_P(f) = ∑_{ω∈supp P} (f(ω) − E_P(f))² P(ω)

More compactly, we can write

    V_P(f) = E_P[(f − E_P(f))²]    (48.24)

It is easy to see that random variables that are equal P-a.e. also have the same variance. We next report a couple of other basic properties of the variance.

Proposition 2007 Let P : 2^Ω → [0,1] be a simple probability. For each random variable f : Ω → R,

(i) V_P(f) = E_P(f²) − E_P(f)²;

(ii) V_P(αf + β) = α²V_P(f) for all α, β ∈ R.

The characterization of the variance in (i) is often useful. Point (ii) shows that all translates f + β of a random variable share the same variance, while its multiples αf get the coefficient squared.

Proof (i) For each ω ∈ Ω,

    (f(ω) − E_P(f))² = f(ω)² − 2f(ω)E_P(f) + E_P(f)²

By the definition of variance and by the linearity of the expected value,

    V_P(f) = E_P[(f − E_P(f))²] = E_P[f² − 2E_P(f)f + E_P(f)²]
           = E_P(f²) − 2E_P(f)E_P(f) + E_P(f)² = E_P(f²) − E_P(f)²

as desired.

(ii) By the linearity of the expected value, we have, for each ω ∈ Ω,

    (αf(ω) + β − E_P(αf + β))² = α²(f(ω) − E_P(f))²

By (48.24), we have

    V_P(αf + β) = E_P[(αf + β − E_P(αf + β))²] = E_P[α²(f − E_P(f))²]
                = α² E_P[(f − E_P(f))²] = α² V_P(f)

as desired.

Another measure of variability, strictly related to the variance, is the standard deviation

    σ_P(f) = √V_P(f)

It is nothing but the square root of the variance. It has the advantage over the variance of being expressed in the same units as the expected value: for instance, if the outcomes f(ω) of a random variable in the different states ω are expressed in euros, this is the case also for both the expected value and the standard deviation. Note that

    σ_P(αf + β) = √(α² V_P(f)) = |α| σ_P(f)

for all α, β ∈ R. In particular, the standard deviation is positively homogeneous as σ_P(αf) = ασ_P(f) for all α ≥ 0.
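A small Python sketch, under the same dictionary representation used earlier, that computes the variance both from the definition and from the characterization V_P(f) = E_P(f²) − E_P(f)²:

    import math

    def expected_value(f, P):
        return sum(f[w] * P[w] for w in P)

    def variance(f, P):
        m = expected_value(f, P)
        return sum((f[w] - m) ** 2 * P[w] for w in P)

    f = {"H": 50, "T": -50}
    P = {"H": 0.5, "T": 0.5}

    v_def = variance(f, P)
    v_alt = expected_value({w: f[w] ** 2 for w in f}, P) - expected_value(f, P) ** 2
    print(v_def, v_alt, math.sqrt(v_def))  # 2500.0 2500.0 50.0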

Moving from a single random variable to a pair of them f, g : Ω → R, we would like to measure their co-movements, to understand whether they are moving in the same direction, in the opposite direction or independently from one another. To this end, observe that at a state ω the inequality

    (f(ω) − E_P(f))(g(ω) − E_P(g)) > 0

reveals that

    f(ω) − E_P(f) > 0 ⟺ g(ω) − E_P(g) > 0

as well as

    f(ω) − E_P(f) < 0 ⟺ g(ω) − E_P(g) < 0

So f and g are, in state ω, moving in the same direction. As we did for the variance, to get a representative measure of co-variability we need to average the values (f(ω) − E_P(f))(g(ω) − E_P(g)) through the probability P.

Definition 2008 The covariance of two random variables f, g : Ω → R with respect to a simple probability P is given by the quantity

    Cov_P(f, g) = ∑_{ω∈supp P} (f(ω) − E_P(f))(g(ω) − E_P(g)) P(ω)

More compactly, we can write

    Cov_P(f, g) = E_P[(f − E_P(f))(g − E_P(g))]    (48.25)

We can interpret the inequality Cov_P(f, g) > 0 as saying that on average f and g move together, and the opposite inequality Cov_P(f, g) < 0 as saying that, instead, on average f and g move in opposite directions. Accordingly, we can interpret Cov_P(f, g) = 0 as saying that on average f and g move independently.
The variance can be seen to be the special case of a "solo" covariance:

    Cov_P(f, f) = E_P[(f − E_P(f))(f − E_P(f))] = E_P[(f − E_P(f))²] = V_P(f)

The next result extends to the covariance the properties established for the variance (cf. Proposition 2007).

Proposition 2009 Let P : 2^Ω → [0,1] be a simple probability. For all random variables f, g : Ω → R,

(i) Cov_P(f, g) = E_P(fg) − E_P(f)E_P(g);

(ii) Cov_P(αf + β, γg + δ) = αγ Cov_P(f, g) for all α, β, γ, δ ∈ R.

Proof (i) For each ω ∈ Ω,

    (f(ω) − E_P(f))(g(ω) − E_P(g)) = f(ω)g(ω) − f(ω)E_P(g) − E_P(f)g(ω) + E_P(f)E_P(g)

By the definition of covariance and by the linearity of the expected value,

    Cov_P(f, g) = E_P(fg − E_P(g)f − E_P(f)g + E_P(f)E_P(g))
                = E_P(fg) − 2E_P(f)E_P(g) + E_P(f)E_P(g) = E_P(fg) − E_P(f)E_P(g)

as desired.

(ii) By the linearity of the expected value, we have, for each ω ∈ Ω,

    (αf(ω) + β − E_P(αf + β))(γg(ω) + δ − E_P(γg + δ)) = αγ(f(ω) − E_P(f))(g(ω) − E_P(g))

By the definition of covariance,

    Cov_P(αf + β, γg + δ) = E_P[(αf + β − E_P(αf + β))(γg + δ − E_P(γg + δ))]
                          = E_P[αγ(f − E_P(f))(g − E_P(g))] = αγ Cov_P(f, g)

as desired.

The covariance allows us to express the variance of a sum of random variables. At first sight, we might be tempted to say that the variability of f + g should be the variability of f plus the variability of g. Yet, even at an intuitive level, we may immediately realize that the volatility of the sum f + g may actually be reduced when f and g move in opposite directions, that is, when Cov_P(f, g) < 0. The next formula formalizes this intuition.

Proposition 2010 Let P : 2^Ω → [0,1] be a simple probability. For all random variables f, g : Ω → R,

    V_P(f + g) = V_P(f) + V_P(g) + 2Cov_P(f, g)

Proof By the linearity of the expected value and by (48.24), we have

    V_P(f + g) = E_P[(f + g − E_P(f + g))²] = E_P[((f − E_P(f)) + (g − E_P(g)))²]
               = E_P[(f − E_P(f))²] + E_P[(g − E_P(g))²] + 2E_P[(f − E_P(f))(g − E_P(g))]
               = V_P(f) + V_P(g) + 2Cov_P(f, g)

as desired.
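A quick numerical check of this decomposition, again a sketch with dictionary-encoded simple probabilities:

    def E(f, P):
        return sum(f[w] * P[w] for w in P)

    def cov(f, g, P):
        mf, mg = E(f, P), E(g, P)
        return sum((f[w] - mf) * (g[w] - mg) * P[w] for w in P)

    P = {"H": 0.5, "T": 0.5}
    f = {"H": 50, "T": -50}
    g = {"H": -25, "T": 25}  # moves against f, so Cov < 0

    var = lambda h: cov(h, h, P)
    s = {w: f[w] + g[w] for w in P}
    print(var(s), var(f) + var(g) + 2 * cov(f, g, P))  # both 625.0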

Next we establish, as a consequence of the Cauchy-Schwarz inequality, a classic bound on the covariance of two random variables in terms of their standard deviations.

Proposition 2011 Let P : 2^Ω → [0,1] be a simple probability. For all random variables f, g : Ω → R,

    |Cov_P(f, g)| ≤ σ_P(f) σ_P(g)    (48.26)

Proof Let supp P = {ω₁, ..., ωₙ} be the support of the simple probability P. First, assume that E_P(f) = E_P(g) = 0. For each i = 1, ..., n, set

    xᵢ = f(ωᵢ)√P(ωᵢ)   and   yᵢ = g(ωᵢ)√P(ωᵢ)

By the Cauchy-Schwarz inequality,

    |Cov_P(f, g)| = |∑_{ω∈supp P} f(ω)g(ω)P(ω)| = |∑_{i=1}^n xᵢyᵢ| = |x · y| ≤ ‖x‖‖y‖
                 = √(∑_{i=1}^n xᵢ²) √(∑_{i=1}^n yᵢ²)
                 = √(∑_{ω∈supp P} f²(ω)P(ω)) √(∑_{ω∈supp P} g²(ω)P(ω)) = σ_P(f) σ_P(g)

This proves (48.26) when E_P(f) = E_P(g) = 0. In the general case, set

    f̃ = f − E_P(f)   and   g̃ = g − E_P(g)

As E_P(f̃) = E_P(g̃) = 0, by what has just been proved we have

    |Cov_P(f̃, g̃)| ≤ σ_P(f̃) σ_P(g̃)

By (ii) of Propositions 2007 and 2009, σ_P(f̃) = σ_P(f), σ_P(g̃) = σ_P(g) and Cov_P(f̃, g̃) = Cov_P(f, g). We conclude that (48.26) holds for all random variables f, g : Ω → R.

The correlation coefficient

    ρ_P(f, g) = Cov_P(f, g) / (σ_P(f) σ_P(g))

is an alternative measure of the co-movements of two random variables f and g. By (ii) of Propositions 2007 and 2009,

    ρ_P(αf + β, γg + δ) = (αγ / |α||γ|) ρ_P(f, g)    (48.27)

for all β, δ ∈ R and all α, γ ≠ 0. Thus, when α, γ > 0 we have

    ρ_P(αf + β, γg + δ) = ρ_P(f, g)

In words, the correlation coefficient is invariant under positive affine transformations.¹⁰ This implies, inter alia, that it does not depend on the units in which the random variables are expressed: for instance, if f and g are financial assets, their correlation coefficient is the same regardless of the currency in which they are expressed, say dollars or euros.
By the inequality (48.26), it holds

    −1 ≤ ρ_P(f, g) ≤ 1

i.e., the correlation coefficient takes values between −1 and 1. The extreme values ±1 correspond to a perfect correlation between the random variables that, intuitively, occurs when there is a linear relationship between them. The next result confirms this intuition.

¹⁰ A transformation φ : R → R is affine if it is an affine function φ(x) = αx + β, with α, β ∈ R. Clearly, φ(f) = αf + β. It is a positive affine transformation when α > 0 (in this case φ is strictly increasing).

Proposition 2012 Let P : 2^Ω → [0,1] be a simple probability. For random variables f, g : Ω → R, non-constant P-a.e.,¹¹ it holds

    |ρ_P(f, g)| = 1

if and only if there exist scalars α ≠ 0 and β such that, P-a.e.,

    f = αg + β    (48.28)

In particular,

    ρ_P(f, g) = 1 ⟺ α > 0   and   ρ_P(f, g) = −1 ⟺ α < 0    (48.29)

Thus, perfect correlation corresponds to a linear relationship, P-a.e., between the random variables f and g, positive when ρ_P(f, g) = 1 and negative when ρ_P(f, g) = −1.

Proof "If". Suppose that, P-a.e., f = αg + β. By (48.27),

    |ρ_P(f, g)| = |ρ_P(αg + β, g)| = |ρ_P(g, g)| = 1

as desired.

"Only if". Let |ρ_P(f, g)| = 1. By proceeding as in the last proof, it is easy to see that by the Cauchy-Schwarz equality (Theorem 109) there exist λ₁, λ₂ ∈ R, not both zero, such that

    λ₁ f̃(ω) = λ₂ g̃(ω)    ∀ω ∈ supp P

Hence, by setting β = λ₁E_P(f) − λ₂E_P(g), we get

    λ₁ f(ω) = λ₂ g(ω) + β    ∀ω ∈ supp P    (48.30)

It holds λ₂ ≠ 0. For, suppose per contra that λ₂ = 0. Then, λ₁ ≠ 0 and so f(ω) = β/λ₁ for all ω ∈ supp P, a contradiction because, by hypothesis, f is not constant on supp P. We conclude that λ₂ ≠ 0. A similar argument shows that λ₁ ≠ 0. By (48.30),

    f(ω) = (λ₂/λ₁) g(ω) + β/λ₁    ∀ω ∈ supp P

By setting α = λ₂/λ₁ we conclude that (48.28) holds. Finally, we leave (48.29) to the reader.
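As a numerical illustration of this result, a short sketch: when g is an exact positive affine transformation of f, the computed correlation is 1 up to floating-point error.

    import math

    def mean(f, P):
        return sum(f[w] * P[w] for w in P)

    def covariance(f, g, P):
        mf, mg = mean(f, P), mean(g, P)
        return sum((f[w] - mf) * (g[w] - mg) * P[w] for w in P)

    P = {"a": 0.2, "b": 0.5, "c": 0.3}
    f = {"a": 1.0, "b": 4.0, "c": 9.0}
    g = {w: 3 * f[w] + 7 for w in f}   # g = 3f + 7, so rho should be 1

    sd = lambda h: math.sqrt(covariance(h, h, P))
    rho = covariance(f, g, P) / (sd(f) * sd(g))
    print(rho)  # 1.0 (up to rounding)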

48.7 Intermezzo

To continue the analysis we state and prove a neat version of the fundamental duality between differentiation and integration, which we discussed earlier in the book.¹² A piece of notation: we denote by C₀¹([a,b]) the class of the continuously differentiable functions g : [a,b] → R such that g(a) = 0.¹³

¹¹ That is, neither f nor g is constant on the support of P.
¹² Recall the discussion around equations (44.62) and (44.64).
¹³ The condition g(a) = 0 is a normalization needed to make the duality sharp: indeed, primitives are unique up to a constant (cf. Proposition 1874), and this condition pins down one of them.

Theorem 2013 (Barrow-Torricelli) A function g : [a,b] → R, with g(a) = 0, is continuously differentiable if and only if there is a unique continuous function φ : [a,b] → R such that

    g(x) = ∫_a^x φ(t) dt    ∀x ∈ [a,b]    (48.31)

The bijective function T : C₀¹([a,b]) → C([a,b]) that to each g ∈ C₀¹([a,b]) associates φ ∈ C([a,b]) is the differential operator

    T(g) = g′    (48.32)

Its inverse function T⁻¹ : C([a,b]) → C₀¹([a,b]), which to each φ ∈ C([a,b]) associates T⁻¹(φ) ∈ C₀¹([a,b]), is the integral operator

    T⁻¹(φ)(x) = ∫_a^x φ(t) dt    ∀x ∈ [a,b]

The function T describes the duality between differentiation and integration, which thus works at its best for continuously differentiable functions, a further sign of the special role that this class of functions plays in calculus (cf. Section 27.4).¹⁴

Proof "If". Assume that there is a φ ∈ C([a,b]) such that g(x) = ∫_a^x φ(t) dt for all x ∈ [a,b]. By the Second Fundamental Theorem of Calculus, we have g′ = φ and, trivially, g(a) = 0. So, g ∈ C₀¹([a,b]).
"Only if". Assume that g ∈ C₀¹([a,b]). By the First Fundamental Theorem of Calculus and since g(a) = 0, g(x) = ∫_a^x g′(t) dt for all x ∈ [a,b]. So, φ can be chosen to be g′ ∈ C([a,b]). It remains to prove that φ is unique. If ψ ∈ C([a,b]) is such that g(x) = ∫_a^x ψ(t) dt for all x ∈ [a,b], by the Second Fundamental Theorem of Calculus we then have ψ = g′, proving uniqueness.
Consider now T. Since to each function g ∈ C₀¹([a,b]), T associates the function φ ∈ C([a,b]) that satisfies (48.31), we have already seen that φ = g′, proving (48.32). We next show that T is bijective. We start by showing it is injective. Let g₁, g₂ ∈ C₀¹([a,b]) be such that T(g₁) = T(g₂). Set φ = T(g₁) = T(g₂). Thus, φ satisfies (48.31) for both g₁ and g₂, that is, g₁(x) = ∫_a^x φ(t) dt = g₂(x) for all x ∈ [a,b], and so g₁ = g₂. It remains to show that T is surjective. Let φ ∈ C([a,b]). Define g as in (48.31). The previous part of the proof and the Second Fundamental Theorem of Calculus imply that g ∈ C₀¹([a,b]) and T(g) = g′ = φ, as desired.
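The duality can be illustrated numerically: integrate a continuous φ to build g, then differentiate g and recover φ. A rough Python sketch on a grid, where a cumulative trapezoidal sum stands in for T⁻¹ and a central difference quotient stands in for T:

    import math

    a, b, n = 0.0, 1.0, 1000
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]

    phi = [math.cos(x) for x in xs]          # a continuous function on [a, b]

    # T^{-1}(phi): cumulative trapezoidal integration from a
    g, acc = [0.0], 0.0
    for i in range(n):
        acc += 0.5 * (phi[i] + phi[i + 1]) * h
        g.append(acc)

    # T(g): central difference quotient, recovering phi in the interior
    phi_back = [(g[i + 1] - g[i - 1]) / (2 * h) for i in range(1, n)]

    err_g = max(abs(g[i] - math.sin(xs[i])) for i in range(n + 1))
    err_phi = max(abs(phi_back[i - 1] - phi[i]) for i in range(1, n))
    print(err_g, err_phi)  # both small: T and T^{-1} undo each other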

48.8 Distribution functions

48.8.1 Generalities

Let us fix a probability space (Ω, P), where P : 2^Ω → [0,1] is a probability measure on the state space Ω. We begin by introducing distribution functions, the protagonist of this section.¹⁵

¹⁴ The result is named after two precursors of Newton, who had already envisioned the fundamental duality.
¹⁵ To ease notation we write (f ≤ x) in place of {ω ∈ Ω : f(ω) ≤ x} as well as P(f ≤ x) in place of P((f ≤ x)).

Definition 2014 The (cumulative) distribution function of a random variable f : Ω → R is the function Φ : R → [0,1] defined by

    Φ(x) = P(f ≤ x)

for all x ∈ R.

The distribution function represents the probability of the lower contour sets (f ≤ x) of the random variable. In particular, Φ(x) is the probability that f takes a value ≤ x.
We begin the study of distribution functions with a basic monotonicity property.

Proposition 2015 The distribution function Φ : R → [0,1] is increasing.

Proof Let x, y ∈ R with y ≥ x. Clearly, (f ≤ x) ⊆ (f ≤ y). Since P is monotone, this implies

    Φ(x) = P(f ≤ x) ≤ P(f ≤ y) = Φ(y)

showing that Φ is increasing.

The next continuity result is a dividend of countable additivity, a further sign of the continuity nature of this property.

Proposition 2016 If P is countably additive, then Φ is right continuous with

    lim_{x→−∞} Φ(x) = 0   and   lim_{x→+∞} Φ(x) = 1

Proof Take x ∈ R and a scalar sequence {xₙ} with xₙ ↓ x. Consider the sets Aₙ = (f ≤ xₙ) for all n ≥ 1. Since {xₙ} is a decreasing sequence, we have Aₙ₊₁ ⊆ Aₙ for all n ≥ 1. It holds

    ∩ₙ Aₙ = (f ≤ x)

For, if f(ω) ≤ x then f(ω) ≤ x ≤ xₙ for all n ≥ 1. Vice versa, if f(ω) ≤ xₙ for all n ≥ 1, by passing to the limit we get f(ω) ≤ x. By Proposition 1988, we then have

    limₙ Φ(xₙ) = limₙ P(Aₙ) = P(f ≤ x) = Φ(x)

proving the right continuity. Take now a sequence {xₙ} with xₙ ↓ −∞. Again, consider the sets Aₙ = (f ≤ xₙ) for all n ≥ 1. Since {xₙ} is a decreasing sequence, we have that Aₙ₊₁ ⊆ Aₙ for all n ≥ 1 as well as ∩ₙ Aₙ = ∅ (why?). By Proposition 1988, this implies that

    limₙ Φ(xₙ) = limₙ P(Aₙ) = P(∅) = 0

Since Φ is increasing, we have lim_{x→−∞} Φ(x) = 0. We leave it to the reader to show that lim_{x→+∞} Φ(x) = 1.

Under countable additivity, the possible discontinuities of Φ at a point x₀ ∈ R may only occur from the left. Being increasing and right continuous, Φ is discontinuous at x₀ ∈ R if and only if

    Φ(x₀) > lim_{x→x₀⁻} Φ(x)

Next we show that, remarkably, the size of this jump is the probability P(f = x₀) that f takes on the value x₀. In so doing, we also characterize the continuity of Φ.

Proposition 2017 If P is countably additive, then, for each x₀ ∈ R,

    Φ(x₀) − lim_{x→x₀⁻} Φ(x) = P(f = x₀)    (48.33)

Moreover, Φ is continuous at x₀ ∈ R if and only if P(f = x₀) = 0.

Thus, the distribution function is continuous (discontinuous) at a point x₀ if and only if the random variable assumes the value x₀ with zero (non-zero) probability. In the latter case, the non-zero probability is the size of the jump.
As Φ is monotone, there exist at most countably many points x₀ ∈ R at which it is discontinuous (cf. Proposition 564), so where P(f = x₀) > 0. Interestingly, a random variable can thus take at most countably many distinct values with a strictly positive probability.¹⁶

Proof Consider x₀ ∈ R. It is immediate to see that

    P(f < x₀) + P(f = x₀) = P(f ≤ x₀) = Φ(x₀)    (48.34)

Next, for each ε > 0, observe that

    (f ≤ x₀ − ε) ⊆ (f < x₀) ⊆ (f ≤ x₀)

By the monotonicity of P, this implies that

    Φ(x₀ − ε) = P(f ≤ x₀ − ε) ≤ P(f < x₀) ≤ P(f ≤ x₀) = Φ(x₀)

Consider a strictly positive sequence {εₙ} ⊆ (0, +∞) with εₙ ↓ 0. Define Aₙ = (f ≤ x₀ − εₙ) for all n ≥ 1. Since {εₙ} is a decreasing sequence, we have that Aₙ ⊆ Aₙ₊₁ for all n ≥ 1. Note also that

    ∪ₙ Aₙ = (f < x₀)

Indeed, if f(ω) < x₀, then f(ω) ≤ x₀ − εₙ for some n ≥ 1. Vice versa, if f(ω) ≤ x₀ − εₙ < x₀ for some n ≥ 1, then f(ω) < x₀. By Proposition 1988, this implies that

    limₙ Φ(x₀ − εₙ) = limₙ P(Aₙ) = P(f < x₀)    (48.35)

By pairing (48.35) and (48.34), the equality (48.33) follows.
Finally, recall that a scalar function is continuous at a point x₀ if and only if it is left and right continuous at x₀. Since Φ is right continuous by Proposition 2016, it is continuous at x₀ if and only if it is left continuous at x₀. By (48.33), this is the case if and only if P(ω : f(ω) = x₀) = 0. This completes the proof.

Example 2018 Let δ_{ω₀} be the Dirac probability centered at some state ω₀ ∈ Ω. The distribution function of a random variable f : Ω → R is given by

    Φ(x) =  0   if x < x₀
            1   if x ≥ x₀

where x₀ = f(ω₀). Here Φ is discontinuous only at x₀, with a jump of size P(f = x₀) = 1. N

¹⁶ The coda Proposition 2053 further clarifies.

Suppose that our state space is Ω = R, with P such that

    P([−5, 5]) = 1

That is, P concentrates all its mass on the interval [−5, 5]. For the quadratic function f(x) = x² it holds

    P(−25 ≤ f ≤ 25) = 1

Thus, f is unbounded, yet probabilistically bounded under our P: with probability 1 it assumes values between −25 and 25. This motivates the following definition.

Definition 2019 A random variable f : Ω → R is essentially bounded if there exist scalars m and M ∈ R such that

    P(ω : m ≤ f(ω) ≤ M) = 1

or, more compactly, P(m ≤ f ≤ M) = 1.

All bounded random variables are, trivially, essentially bounded. When P is simple, all random variables are, again trivially, essentially bounded. Next we show that, remarkably, the distribution functions of essentially bounded random variables become eventually constant.

Proposition 2020 If f : Ω → R is essentially bounded, there exist scalars a and b such that Φ(x) = 0 for all x ≤ a and Φ(x) = 1 for all x ≥ b.

Proof Let f be essentially bounded. By definition, there exist m, M ∈ R with P(m ≤ f ≤ M) = 1. Let x < m. We have

    (f ≤ x) ∩ (m ≤ f ≤ M) = ∅

and so Φ(x) = P(f ≤ x) = 0. Let x ≥ M. We have

    (m ≤ f ≤ M) ⊆ (f ≤ x)

and so, by the monotonicity of P, we have

    1 ≥ Φ(x) = P(f ≤ x) ≥ P(m ≤ f ≤ M) = 1

Thus, Φ(x) = 1. Now, if we set b = M and take any a < m, it holds Φ(x) = 0 for all x ≤ a and Φ(x) = 1 for all x ≥ b, as desired.

An interval [a,b] is a carrier for a distribution Φ if Φ(x) = 0 when x ≤ a and Φ(x) = 1 when x ≥ b. By the last result, the distribution of an essentially bounded random variable has a carrier. Observe that carriers are not unique: by the monotonicity of Φ, any interval [c,d] that contains a carrier is also a carrier.

Example 2021 (i) We say that a random variable f : Ω → R is simple when it takes only finitely many distinct values, that is, Im f = {x₁, ..., xₙ}. Without loss of generality, we assume that

    x₁ < ⋯ < xₙ

The probability that f assumes the value xᵢ is given by

    P(f⁻¹(xᵢ)) = Φ(xᵢ) − Φ(xᵢ⁻)

for each i = 1, ..., n. With this, the distribution function Φ is easily seen to be the step function:

    Φ(x) =  0                           if x < x₁
            P(f⁻¹(x₁))                  if x₁ ≤ x < x₂
            P(f⁻¹(x₁)) + P(f⁻¹(x₂))     if x₂ ≤ x < x₃        (48.36)
            ⋯
            ∑_{i=1}^n P(f⁻¹(xᵢ)) = 1    if x ≥ xₙ

Any interval [a, xₙ] with a < x₁ is a carrier of Φ. Observe that Φ is right continuous, so Φ(xᵢ⁺) = Φ(xᵢ), even though P is not required to be countably additive.
(ii) Random variables defined on a finite set Ω are automatically simple, so their distribution function is a step function. For instance, in the production plant example, we have the simple random variable

    f(ω) =  400    if ω = s
            2500   if ω = l
            0      if ω = n

Here Im f = {0, 400, 2500} and

    P(n) = P(f⁻¹(0)) ,  P(l) = P(f⁻¹(2500)) ,  P(s) = P(f⁻¹(400))

The distribution function of f is the right-continuous step function

    Φ(x) =  0                       if x < 0
            P(n)                    if 0 ≤ x < 400
            P(n) + P(s)             if 400 ≤ x < 2500
            P(n) + P(s) + P(l) = 1  if x ≥ 2500

Any interval [a, 2500] with a < 0 is a carrier of Φ. If, for concreteness, we take P with P(s) = P(l) = 1/4 and P(n) = 1/2, we have

    Φ(x) =  0     if x < 0
            1/2   if 0 ≤ x < 400
            3/4   if 400 ≤ x < 2500
            1     if x ≥ 2500

Observe that there is no smallest carrier: the interval [0, 2500] is not a carrier since Φ(0) = 1/2.
(iii) When the probability P itself is simple, the distribution function of any random variable f is a right-continuous step function, as the reader can check. N
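For step distribution functions like these, a direct Python sketch of Φ for the plant example (with the concrete P of point (ii)) is immediate:

    def Phi(x: float) -> float:
        """Right-continuous step cdf of the plant loss under
        P(s) = P(l) = 1/4, P(n) = 1/2."""
        jumps = [(0, 0.5), (400, 0.25), (2500, 0.25)]  # (value, P(f = value))
        return sum(p for v, p in jumps if v <= x)

    for x in (-1, 0, 399, 400, 2500, 3000):
        print(x, Phi(x))  # 0.0, 0.5, 0.5, 0.75, 1.0, 1.0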

The plant example shows that smallest carriers may not exist. Example 2023-(i) will
show that, on the other hand, they may exist. More importantly, Example 2023-(ii) will
show that carriers might well not exist at all. As it will become clear as the analysis unfolds,
for our purposes what matters is the existence of carriers, not their possible minimality.

48.8.2 Density functions

As in Example 2021-(i), take a distribution function Φ which is a right-continuous step function, discontinuous at the points

    x₁ < ⋯ < xₙ

For each i = 1, ..., n, now let

    pᵢ = Φ(xᵢ) − Φ(xᵢ⁻)

be the probability that the random variable f assumes the value xᵢ. We can then rewrite (48.36) as follows:

    Φ(x) = ∑_{i : xᵢ ≤ x} pᵢ =  0                  if x < x₁
                                p₁                 if x₁ ≤ x < x₂
                                p₁ + p₂            if x₂ ≤ x < x₃        (48.37)
                                ⋯
                                ∑_{i=1}^n pᵢ = 1   if x ≥ xₙ

This suggests to write Φ as the sum

    Φ(x) = ∑_{i : xᵢ ≤ x} φ(xᵢ)

where φ : R → [0,1] is defined by

    φ(x) =  pᵢ   if x = xᵢ
            0    else

The function φ is called the (simple) density function of Φ.


This sum representation becomes an integral representation when continuous, rather than step, distribution functions are considered.

Definition 2022 A positive function φ : R → [0, +∞) is an (integrable) density function of Φ if

    Φ(x) = ∫_{−∞}^x φ(t) dt

for all x ∈ R.

Thus, a distribution function Φ has density φ when it is the integral function of φ. Since lim_{x→+∞} Φ(x) = 1, we have ∫_{−∞}^{+∞} φ(x) dx = 1. This equality reduces to

    ∫_{−∞}^{+∞} φ(x) dx = ∫_a^b φ(x) dx = 1

when Φ has a carrier [a,b]. In this case, a necessary condition for a distribution function to be an integral function is to be continuous (see Proposition 1881). So, integrable density functions require continuous distribution functions.

Example 2023 (i) Given any two scalars a < b, consider the uniform distribution function

    Φ(x) =  0                  if x < a
            (x − a)/(b − a)    if a ≤ x ≤ b
            1                  if x > b

The interval [a,b] is the smallest carrier of Φ. Its density function, called uniform, is

    φ(x) =  1/(b − a)   if a ≤ x ≤ b
            0           else

because

    ∫_{−∞}^x φ(t) dt = ∫_a^x 1/(b − a) dt = Φ(x)    ∀x ∈ [a,b]

and ∫_{−∞}^{+∞} φ(x) dx = 1.
(ii) The Gaussian distribution function is

    Φ(x) = ∫_{−∞}^x (1/√(2π)) e^{−t²/2} dt

This distribution has no carriers. Its density function, called Gaussian, is

    φ(x) = (1/√(2π)) e^{−x²/2}

because ∫_{−∞}^{+∞} φ(t) dt = 1 (see Section 45.5). N
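A numerical sanity check of the relation Φ(x) = ∫_{−∞}^x φ(t) dt for these two densities; a rough Riemann-sum sketch (for the Gaussian case, math.erf gives the exact distribution value for comparison):

    import math

    def integral(phi, lo, x, n=20_000):
        """Midpoint Riemann-sum approximation of the integral of phi on [lo, x]."""
        h = (x - lo) / n
        return sum(phi(lo + (i + 0.5) * h) for i in range(n)) * h

    # Uniform on [a, b] = [0, 2]
    a, b = 0.0, 2.0
    phi_u = lambda t: 1 / (b - a) if a <= t <= b else 0.0
    print(integral(phi_u, a, 1.5), (1.5 - a) / (b - a))      # ~0.75 vs 0.75

    # Standard Gaussian; its cdf via the error function
    phi_g = lambda t: math.exp(-t * t / 2) / math.sqrt(2 * math.pi)
    Phi_g = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
    print(integral(phi_g, -10.0, 1.0), Phi_g(1.0))           # ~0.8413 vs 0.8413...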

Continuous densities are especially important. Indeed, they are well behaved: they are zero where they should be.

Proposition 2024 Let Φ be a distribution function with a carrier [a,b]. If Φ has a continuous density function φ, then

    φ(x) = 0

for all x ∉ [a,b].

Thus, a continuous density is zero outside any carrier of its distribution function.

Proof By definition, Φ(x) = ∫_{−∞}^x φ(t) dt for all x ∈ R. Fix z ≥ b. Since ∫_{−∞}^{+∞} φ(x) dx = ∫_a^b φ(x) dx = 1, we have ∫_b^z φ(x) dx = 0 for all z ≥ b. By Corollary 1869, φ(t) = 0 for all b ≤ t ≤ z. Since z was chosen arbitrarily, we conclude that φ(z) = 0 for all z ≥ b. A similar argument shows that φ(z) = 0 for all z ≤ a.

This result requires continuity: we can always make φ discontinuous, so non-zero, at any point x̃ ∉ [a,b] as follows:

    φ̃(x) =  φ(x)   if x ≠ x̃
            ỹ      if x = x̃

with ỹ ≠ φ(x̃) = 0. By Theorem 1859, it still holds Φ(x) = ∫_{−∞}^x φ̃(t) dt for all x ∈ R, so φ̃ is still a density of Φ. Yet, it is not 0 outside [a,b].

When Φ has a carrier, the Barrow-Torricelli Theorem has the following immediate, yet remarkable, consequence.

Proposition 2025 A distribution function Φ with a carrier [a,b] has a unique continuous density function φ if and only if it is continuously differentiable. In this case, φ = Φ′.

So, continuous density functions are associated to continuously differentiable distribution functions. In this case, φ and Φ are dual notions: one moves from Φ to φ via differentiation and from φ to Φ via integration, according to the fundamental duality of calculus embodied in the Barrow-Torricelli Theorem.

48.9 Expected values II

Earlier in the chapter we defined the expected value of a random variable with respect to a simple probability P. We can express this expected value as a Stieltjes integral with respect to the distribution function Φ, an insightful perspective.

Proposition 2026 Let P : 2^Ω → [0,1] be a simple probability. For each random variable f : Ω → R,

    E_P(f) = ∫_a^b x dΦ(x)

where [a,b] is a carrier of Φ.¹⁷

Proof Let supp P = {ω₁, ..., ωₙ} and set

    xᵢ = f(ωᵢ)    ∀i = 1, ..., n

To ease matters, assume that these values are distinct. It is then without loss of generality to let

    x₁ < ⋯ < xₙ    (48.38)

Hence, a < x₁ and b ≥ xₙ (why?). With this, for each i = 1, ..., n we have

    P(ωᵢ) = P(f = xᵢ)

and so

    E_P(f) = ∑_{i=1}^n f(ωᵢ)P(ωᵢ) = ∑_{i=1}^n xᵢ P(f = xᵢ)

As P is countably additive (Proposition 1989), by (48.33) we have

    P(f = xᵢ) = Φ(xᵢ) − lim_{x→xᵢ⁻} Φ(x)    ∀i = 1, ..., n

¹⁷ For convenience we picked the compact carrier but the choice of a specific carrier is actually immaterial (in any case, Φ(x) = 0 for all x ≤ a and Φ(x) = 1 for all x ≥ b).

As a result, we can rewrite the expected value as

    E_P(f) = ∑_{i=1}^n xᵢ [Φ(xᵢ) − lim_{x→xᵢ⁻} Φ(x)]

Since Φ is increasing and right continuous, by Proposition 1944 we have

    ∑_{i=1}^n xᵢ [Φ(xᵢ) − lim_{x→xᵢ⁻} Φ(x)] = ∫_a^b x dΦ(x)

where a < x₁ < xₙ ≤ b.¹⁸ We conclude that E_P(f) = ∫_a^b x dΦ, as desired. We leave to the reader the general case when the inequalities in (48.38) are weak, with some of the xᵢ possibly equal.

In this result the choice of the carrier of Φ is irrelevant. Indeed, take any scalars c and d with c < a ≤ b < d. By (47.22),

    ∫_c^d x dΦ = ∫_c^a x dΦ + ∫_a^b x dΦ + ∫_b^d x dΦ

As Φ(x) = 0 on [c,a] and Φ(x) = 1 on [b,d], it holds ∫_c^a x dΦ = ∫_b^d x dΦ = 0. We conclude that

    ∫_a^b x dΦ = ∫_c^d x dΦ
In the rest of the analysis it is convenient to use improper Stieltjes integrals. They are defined in a similar way to the improper Riemann integral. For them, the properties (i)-(v) of Section 47.4 continue to hold. Improperly armed, we continue the analysis by observing that the last result suggests a general, Stieltjes-based, notion of expected value.

Definition 2027 The expected (or mean) value of a random variable f : Ω → R with distribution function Φ under a probability P : 2^Ω → [0,1] is the improper Stieltjes integral

    E_P(f) = ∫_{−∞}^{+∞} x dΦ

when it exists.

The choice of the carrier is irrelevant (in analogy with what was remarked after the last result). When Φ has a carrier [a,b] (as is always the case when P is simple), the expected value reduces to a standard Stieltjes integral

    E_P(f) = ∫_{−∞}^{+∞} x dΦ = ∫_a^b x dΦ    (48.39)

In view of Proposition 2026, this shows that this notion of expected value indeed subsumes the earlier one of Section 48.4. At the same time, it considerably enlarges its scope because it applies to any distribution function, regardless of whether the underlying probability P is simple.

¹⁸ The choice of a and b is irrelevant because, in any case, Φ(x) = 0 for all x ≤ a and Φ(x) = 1 for all x ≥ b.
The definition of expected value requires the improper Stieltjes integral to exist but not to be finite. Yet, we are mostly interested in the finite case. For this reason, next we exhibit a large class of random variables with finite expected value.

Proposition 2028 All essentially bounded random variables have finite expected value.

Proof Let f : Ω → R be essentially bounded under some probability P : 2^Ω → [0,1]. It is enough to observe that, by Proposition 2020, the distribution function Φ of f has a carrier [a,b]. As Φ is increasing, the Stieltjes integral ∫_a^b x dΦ exists finite by Proposition 1938 and so we have E_P(f) = ∫_a^b x dΦ.

When Φ has a density φ, by Proposition 1940 (which continues to hold in the improper case) it holds

    E_P(f) = ∫_{−∞}^{+∞} x dΦ(x) = ∫_{−∞}^{+∞} x φ(x) dx

This permits to reduce an expected value to an improper Riemann integral.

Example 2029 (i) For the uniform density it holds

    E_P(f) = ∫_{−∞}^{+∞} x φ(x) dx = ∫_a^b x · 1/(b−a) dx = (1/(b−a)) ∫_a^b x dx = (1/(b−a)) · (b² − a²)/2 = (a + b)/2

(ii) For the Gaussian density it holds

    E_P(f) = ∫_{−∞}^{+∞} x φ(x) dx = ∫_{−∞}^{+∞} x (1/√(2π)) e^{−x²/2} dx
           = ∫_0^{+∞} x (1/√(2π)) e^{−x²/2} dx + ∫_{−∞}^0 x (1/√(2π)) e^{−x²/2} dx
           = ∫_0^{+∞} x (1/√(2π)) e^{−x²/2} dx − ∫_{−∞}^0 (−x) (1/√(2π)) e^{−x²/2} dx
           = ∫_0^{+∞} x (1/√(2π)) e^{−x²/2} dx − ∫_0^{+∞} x (1/√(2π)) e^{−x²/2} dx = 0    N
Riemann integration comes up also without appealing to densities. Indeed, integration by parts, which takes an elegant form in the Stieltjes integral (Proposition 1946), permits to express expected values as Riemann integrals with distribution functions as integrands.

Theorem 2030 (Cavalieri) If the random variable f : Ω → R is essentially bounded, it holds

    E_P(f) = ∫_0^{+∞} (1 − Φ(x)) dx − ∫_{−∞}^0 Φ(x) dx    (48.40)

If f is positive, we have Φ(x) = 0 for all x < 0 and so the Cavalieri formula (48.40) takes the elegant form

    E_P(f) = ∫_0^{+∞} (1 − Φ(x)) dx

Proof We prove the result for a bounded random variable f : Ω → R, leaving the "essential" case to the reader. By hypothesis, there exist scalars m and M with m ≤ f(ω) ≤ M for all ω ∈ Ω. In formula (47.28) let f = Φ and g = Id_R, i.e., g(x) = x. We consider two cases.

(i) Let m ≥ 0. It holds

    E_P(f) = ∫_{−∞}^{+∞} x dΦ(x) = ∫_0^M x dΦ(x) = M − ∫_0^M Φ(x) dx
           = ∫_0^M (1 − Φ(x)) dx = ∫_0^{+∞} (1 − Φ(x)) dx

(ii) Let m < 0. It holds

    E_P(f) = ∫_{−∞}^{+∞} x dΦ(x) = ∫_m^M x dΦ(x) = ∫_0^M x dΦ(x) + ∫_m^0 x dΦ(x)
           = M − ∫_0^M Φ(x) dx − ∫_m^0 Φ(x) dx = ∫_0^M (1 − Φ(x)) dx − ∫_m^0 Φ(x) dx
           = ∫_0^{+∞} (1 − Φ(x)) dx − ∫_{−∞}^0 Φ(x) dx

This completes the proof.
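A rough numerical check of Cavalieri's formula for the uniform distribution on [a, b] = [−1, 3], whose mean is (a + b)/2 = 1; plain Riemann sums approximate the two integrals:

    a, b = -1.0, 3.0

    def Phi(x: float) -> float:
        if x < a:
            return 0.0
        if x > b:
            return 1.0
        return (x - a) / (b - a)

    def riemann(h_fun, lo, hi, n=100_000):
        step = (hi - lo) / n
        return sum(h_fun(lo + (i + 0.5) * step) for i in range(n)) * step

    mean = riemann(lambda x: 1 - Phi(x), 0.0, b) - riemann(Phi, a, 0.0)
    print(mean)  # ~1.0 = (a + b) / 2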

Cavalieri's formula is used in the proof of the next important result, showing that linearity and monotonicity continue to hold for the general notion of expected value.

Proposition 2031 Let f, g : Ω → R be random variables with finite expected values. Then

(i) E_P(αf + βg) = αE_P(f) + βE_P(g) for all α, β ∈ R;

(ii) E_P(f) ≤ E_P(g) if f ≤ g;

(iii) E_P(k) = k for all k ∈ R.¹⁹

Note that (i) and (iii) together imply that, for each k ∈ R,

    E_P(f + k) = E_P(f) + k

for all f : Ω → R with finite expected value.

Proof As to (iii), just observe that

    ∫_a^b k dΦ = k(Φ(b) − Φ(a)) = k

because Φ(b) = 1 and Φ(a) = 0. We prove (ii) when f is bounded, leaving the rest of the proof to more advanced courses. We begin by considering two positive and bounded random variables f and g. If f ≤ g, then (g ≤ x) ⊆ (f ≤ x) and so we have, for each x ∈ R,

    Φ_f(x) = P(f ≤ x) ≥ P(g ≤ x) = Φ_g(x)

Along with the Cavalieri Theorem, the monotonicity of the Riemann integral then implies

    E_P(f) = ∫_0^{+∞} (1 − Φ_f(x)) dx ≤ ∫_0^{+∞} (1 − Φ_g(x)) dx = E_P(g)

Now, let f and g be any two bounded random variables, not necessarily positive. As they are bounded, there exists a scalar k ≥ 0 large enough so that 0 ≤ f + k ≤ g + k. By point (i) and by what has just been proved in the positive case,

    E_P(f) + k = E_P(f + k) ≤ E_P(g + k) = E_P(g) + k

as desired.

¹⁹ With a standard abuse of notation (cf. Section 44.5.1), k denotes both a scalar (on the left-hand side) and the function constant to k (under the integral sign).

48.10 Moments and all that

48.10.1 Transformations of random variables

It is often important to consider the expected value of a transformation φ ∘ f of a random variable f, a task greatly simplified by the following result.

Theorem 2032 Let f : Ω → R and φ : Im f → R be such that the Stieltjes integral ∫_{−∞}^{+∞} φ(x) dΦ(x) exists finite. Then,

    E_P(φ ∘ f) = ∫_{−∞}^{+∞} φ(x) dΦ(x)    (48.41)

Proof Let g = φ ∘ f and denote by Ψ the distribution function of g. We prove the result only when φ is strictly increasing and surjective, leaving a complete proof to more advanced courses. We have:

    Ψ(x) = P(g ≤ x) = P(φ ∘ f ≤ x) = P(f ≤ φ⁻¹(x)) = Φ(φ⁻¹(x))

Thus,

    E_P(φ ∘ f) = ∫_{−∞}^{+∞} x dΨ(x) = ∫_{−∞}^{+∞} x dΦ(φ⁻¹(x)) = ∫_{−∞}^{+∞} φ(z) dΦ(z)

under the change of variable z = φ⁻¹(x), as the reader can check.

Example 2033 Let u : R → R be a utility function. As discussed in the coda, the expected value E_P(u ∘ f) is called the expected utility of f. By the last result, we can write

    E_P(u ∘ f) = ∫_{−∞}^{+∞} u(x) dΦ(x)    N
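In the simple case, formula (48.41) just says that E_P(φ ∘ f) can be computed either by transforming outcomes state by state or through the distribution of f. A small Python sketch with a hypothetical transformation u(x) = √x, chosen only for illustration, checks that the two routes agree:

    import math
    from collections import defaultdict

    P = {"s": 0.25, "l": 0.25, "n": 0.5}
    f = {"s": 400, "l": 2500, "n": 0}
    u = math.sqrt

    # Route 1: transform state by state.
    lhs = sum(u(f[w]) * P[w] for w in P)

    # Route 2: first build the distribution of f (probabilities of its values),
    # then average u over those values.
    dist = defaultdict(float)
    for w in P:
        dist[f[w]] += P[w]
    rhs = sum(u(x) * p for x, p in dist.items())

    print(lhs, rhs)  # both 17.5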

Concave transformations are especially important. For them we can establish a Stieltjes integral version of the all-important Jensen inequality.

Proposition 2034 Let Φ be a distribution function with a carrier [a,b]. Given a concave function φ : I → R defined on an open interval I of the real line, we have

    φ(∫_a^b ψ dΦ) ≥ ∫_a^b (φ ∘ ψ) dΦ

for all continuous functions ψ : [a,b] → R such that Im ψ ⊆ I.

By Theorem 2032, when ψ(x) = x we then have

    φ(E_P(f)) = φ(∫_a^b x dΦ(x)) ≥ ∫_a^b φ(x) dΦ(x) = E_P(φ ∘ f)

This is the version of the Jensen inequality for expected values.

Proof The Stieltjes integrals on both sides exist because both ψ and φ ∘ ψ are continuous (cf. Theorem 833). Set y₀ = ∫_a^b ψ dΦ. The superdifferential ∂φ(y₀) of φ at y₀ is not empty (Theorem 1521). Let λ ∈ ∂φ(y₀), so that φ(y) ≤ φ(y₀) + λ(y − y₀) for all y ∈ I. In particular,

    φ(ψ(x)) ≤ φ(y₀) + λ(ψ(x) − y₀)    ∀x ∈ [a,b]

By the monotonicity of the Stieltjes integral and by the last lemma, we then have

    ∫_a^b φ(ψ(x)) dΦ(x) ≤ ∫_a^b [φ(y₀) + λ(ψ(x) − y₀)] dΦ(x) = φ(y₀) + λ(∫_a^b ψ(x) dΦ(x) − y₀)
                        = φ(y₀) + λ(y₀ − y₀) = φ(y₀) = φ(∫_a^b ψ dΦ)

as desired.
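For a quick numerical illustration of Jensen's inequality with the concave u(x) = √x and the plant loss used above: the utility of the expected loss dominates the expected utility.

    import math

    P = {"s": 0.25, "l": 0.25, "n": 0.5}
    f = {"s": 400, "l": 2500, "n": 0}
    u = math.sqrt

    Ef = sum(f[w] * P[w] for w in P)       # 725.0
    Eu = sum(u(f[w]) * P[w] for w in P)    # 17.5
    print(u(Ef), ">=", Eu)                 # 26.92... >= 17.5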

48.10.2 Moments

When φ(x) = xⁿ we get expected values of powers, an important class of expected values.

Definition 2035 The n-th moment of a distribution function Φ is given by the Stieltjes integral

    μₙ = ∫_{−∞}^{+∞} xⁿ dΦ(x)    (48.42)

For instance, μ₁ is the first moment of Φ, μ₂ is its second moment, μ₃ is its third moment, and so on. In particular, by formula (48.41) it holds, for each n ≥ 1,

    E_P(fⁿ) = ∫_{−∞}^{+∞} xⁿ dΦ(x) = μₙ

So, the moments of Φ correspond to the moments of the underlying random variable f. In particular, the first moment corresponds to the expected value of f and, as easily checked,

    V_P(f) = μ₂ − μ₁²

That said, it is convenient to carry out the analysis of moments directly in terms of distribution functions.

Proposition 2036 If the moment μₙ exists, then all lower moments μₖ, with k ≤ n, exist.

To assume the existence of higher and higher moments is, therefore, a more and more demanding requirement. For instance, to assume the existence of the second moment is a stronger hypothesis than to assume the existence of the first moment.

Proof To ease matters, assume that there is a scalar a such that Φ(a) = 0, so that μₙ = ∫_a^{+∞} xⁿ dΦ(x). Since xᵏ = o(xⁿ) as x → +∞ if k < n, the version for improper Stieltjes integrals of Proposition 1916-(ii) ensures the convergence of ∫_a^{+∞} xᵏ dΦ(x), that is, the existence of μₖ.

If Φ has a density φ, by Proposition 1941 we have

    ∫_{−∞}^{+∞} xⁿ dΦ = ∫_{−∞}^{+∞} xⁿ φ(x) dx    (48.43)

In this case, we are back to Riemann integration and we directly say that μₙ is the n-th moment of the density φ.

Example 2037 (i) For the uniform density we have

    μ₁ = ∫_{−∞}^{+∞} x φ(x) dx = ∫_a^b x · 1/(b−a) dx = (1/(b−a)) · (b² − a²)/2 = (a + b)/2

    μ₂ = ∫_{−∞}^{+∞} x² φ(x) dx = ∫_a^b x² · 1/(b−a) dx = (1/(b−a)) · (b³ − a³)/3 = (a² + ab + b²)/3

    ⋯

    μₙ = ∫_{−∞}^{+∞} xⁿ φ(x) dx = ∫_a^b xⁿ · 1/(b−a) dx = (1/(n+1)) · (b^{n+1} − a^{n+1})/(b−a)

(ii) For the Gaussian density we have:

    μ₁ = ∫_{−∞}^{+∞} x φ(x) dx = ∫_{−∞}^{+∞} x (1/√(2π)) e^{−x²/2} dx
       = ∫_0^{+∞} x (1/√(2π)) e^{−x²/2} dx − ∫_0^{+∞} x (1/√(2π)) e^{−x²/2} dx = 0

By integrating by parts,

    μ₂ = ∫_{−∞}^{+∞} x² φ(x) dx = ∫_{−∞}^{+∞} x² (1/√(2π)) e^{−x²/2} dx = ∫_{−∞}^{+∞} (x/√(2π)) · x e^{−x²/2} dx
       = [−(x/√(2π)) e^{−x²/2}]_{−∞}^{+∞} + ∫_{−∞}^{+∞} (1/√(2π)) e^{−x²/2} dx = 0 + 1 = 1

where we adapted (44.67) to the improper case, with g(x) = x/√(2π) and f′(x) = x e^{−x²/2}, so that g′(x) = 1/√(2π) and f(x) = −e^{−x²/2}.
(iii) The Cauchy density, a version of Agnesi's versiera, is given by

    φ(x) = (1/π) · 1/(x² + 1)

A primitive of φ is (1/π) arctan x, so its distribution function is

    Φ(x) = ∫_{−∞}^x φ(t) dt = ∫_{−∞}^0 φ(t) dt + ∫_0^x φ(t) dt
         = lim_{c→−∞} (1/π) [arctan t]_c^0 + (1/π) [arctan t]_0^x = 1/2 + (1/π) arctan x

In view of Example 1909, the mean of the Cauchy density does not exist. N
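These closed forms are easy to test numerically. A sketch that approximates the uniform moments by a Riemann sum and compares them with μₙ = (b^{n+1} − a^{n+1}) / ((n+1)(b−a)):

    a, b = 1.0, 3.0

    def uniform_moment(n: int, steps: int = 200_000) -> float:
        """Midpoint Riemann sum for the n-th moment of the uniform density."""
        h = (b - a) / steps
        return sum(((a + (i + 0.5) * h) ** n) / (b - a) for i in range(steps)) * h

    for n in (1, 2, 3):
        exact = (b ** (n + 1) - a ** (n + 1)) / ((n + 1) * (b - a))
        print(n, uniform_moment(n), exact)
    # n = 1: 2.0, n = 2: 13/3, n = 3: 10.0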

The next result, a consequence of the Stone-Weierstrass Theorem, shows under which conditions moments uniquely pin down probability densities.

Proposition 2038 Let f₁, f₂ : [a,b] → R be any two continuous functions. If

    ∫_a^b xⁿ f₁(x) dx = ∫_a^b xⁿ f₂(x) dx    ∀n ≥ 0    (48.44)

then f₁ = f₂.

Moments thus uniquely pin down the probability densities of continuously differentiable distribution functions with a carrier [a,b]. Indeed, assume that φ₁ and φ₂ are any two continuous probability densities that are 0 outside [a,b]. If

    ∫_a^b xⁿ φ₁(x) dx = ∫_a^b xⁿ φ₂(x) dx    ∀n ≥ 1

then φ₁ = φ₂.²⁰ For any such density, to know the moments amounts to knowing the density itself. For instance, the uniform density on [a,b] is the only density that has the moments

    μₙ = (1/(n+1)) · (b^{n+1} − a^{n+1})/(b−a)    ∀n ≥ 1

To specify these moments amounts to specifying the uniform density.

Proof Suppose that (48.44) holds. By setting h = f₁ − f₂, we then have

    ∫_a^b xⁿ h(x) dx = 0    ∀n ≥ 0

By the linearity of the integral, this easily implies that

    ∫_a^b p(x) h(x) dx = 0    (48.45)

for any polynomial p : [a,b] → R. Let ε > 0. By the Stone-Weierstrass Theorem, there exists a polynomial p_ε : [a,b] → R such that

    |h(x) − p_ε(x)| ≤ ε    ∀x ∈ [a,b]

Since h is continuous on the compact interval [a,b], by the Weierstrass Theorem the maximum value M = max_{x∈[a,b]} |h(x)| exists finite. So,

    h²(x) − p_ε(x)h(x) ≤ |h(x)| |h(x) − p_ε(x)| ≤ εM    ∀x ∈ [a,b]

Then,

    p_ε(x)h(x) − εM ≤ h²(x) ≤ p_ε(x)h(x) + εM    ∀x ∈ [a,b]

By the monotonicity of the integral, we then have:

    ∫_a^b p_ε(x)h(x) dx − ε(b−a)M ≤ ∫_a^b h²(x) dx ≤ ∫_a^b p_ε(x)h(x) dx + ε(b−a)M

By (48.45),

    ∫_a^b h²(x) dx ≤ ε(b−a)M

Since ε was arbitrarily chosen, we conclude that ∫_a^b h²(x) dx = 0. By Corollary 1869, this implies h = 0, as desired.

²⁰ By the definition of a probability density we have ∫_a^b φ₁(x) dx = ∫_a^b φ₂(x) dx = 1, so it suffices to consider n ≥ 1.

48.10.3 The problem of moments

Consider a distribution function Φ with carrier the unit interval [0,1]. Its n-th moment is then

    μₙ = ∫_0^1 xⁿ dΦ    (48.46)

If all moments exist, they form a sequence {μₙ} of scalars in [0,1]. For instance, if Φ(x) = x we have μₙ = 1/(n+1).
In this unit interval setting, the problem of moments takes the following form:

    Given a sequence {aₙ} of scalars in [0,1], is there a distribution function Φ such that, for each n, the term aₙ is exactly its n-th moment μₙ?

The question amounts to asking whether sequences of moments have a characterizing property, which {aₙ} should then satisfy in order to have the desired property. This question was first posed by Thomas Stieltjes in the same 1894-95 articles where he developed his notion of integral. Indeed, to provide a setting where to address properly the problem of moments was a main motivation for his integral (which, as we just remarked, is indeed the natural setting where to define moments).

Definition 2039 A sequence {xₙ}_{n=0}^∞ is completely monotone if, for every n ≥ 0, we have (−1)ᵏ Δᵏxₙ ≥ 0 for every k ≥ 0.

In words, a sequence is completely monotone if its finite differences keep alternating sign across their orders. A completely monotone sequence is positive because Δ⁰xₙ = xₙ, as well as decreasing because Δxₙ ≤ 0 (Lemma 419). It is the discrete analog for sequences of the differential notion of complete monotonicity for functions on open intervals (Section 30.3).
We can now answer the question we posed.


R1
Theorem 2040 (Hausdor ) A sequence f n g [0; 1] is such that n = 0 xn d for a
distribution function if and only if it is completely monotone.

Proof We prove the \only if" part, the converse being signi cantly more complicated.
RSo, let f n g be a sequence of moments (48.46). It su ces to show that ( 1)k k xn =
1 n
0 t (1 t)k dg (t) 0. We proceed by induction on k. For k = 0 we trivially have
k k R1 n k 1 k 1 R1
( 1) n = n = 0 t dg (t) for all n. Assume ( 1) xn = 0 tn (1 t)k 1 dg (t)
for all n (induction hypothesis). Then,

k k 1
xn = xn = k 1 xn+1 k 1
xn
Z 1 Z 1
k 1
= ( 1) tn+1 (1 t)k 1
dg (t) tn (1 t)k 1
dg (t)
0 0
Z 1 Z 1
= ( 1)k 1
tn (1 t)k 1
(1 t) dg (t) = ( 1)k tn (1 t)k dg (t)
0 0

as desired.

The characterizing property of moment sequences is, thus, complete monotonicity. It is truly remarkable that a property of finite differences is able to pin down moment sequences. Note that for this result the Stieltjes integral is required: in the "if" part the integrator, whose moments turn out to be the terms of the given completely monotone sequence, might well be non-differentiable (so the Riemann version (48.43) might not hold).
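The moments μₙ = 1/(n+1) of Φ(x) = x must therefore be completely monotone. A short Python sketch that computes the finite differences Δᵏμₙ recursively and checks the alternating signs for small n and k:

    def delta(seq, k):
        """k-th finite difference of a sequence given as a function of n."""
        if k == 0:
            return seq
        prev = delta(seq, k - 1)
        return lambda n: prev(n + 1) - prev(n)

    mu = lambda n: 1 / (n + 1)  # moments of the uniform distribution on [0, 1]

    ok = all((-1) ** k * delta(mu, k)(n) >= 0 for n in range(10) for k in range(10))
    print(ok)  # True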

48.10.4 Moment generating function

Given a distribution function Φ, consider the improper Stieltjes integral

    ∫_{−∞}^{+∞} e^{yx} dΦ(x)    (48.47)

For each y ∈ R, it has a positive integrand e^{yx} and so it is either finite or equal to +∞. Let

    D_Φ = { y ∈ R : ∫_{−∞}^{+∞} e^{yx} dΦ(x) < +∞ }

be the collection of values of y for which this integral is finite.

Lemma 2041 The set D_Φ is a non-empty interval that contains the origin.

Proof We have 0 ∈ D_Φ because ∫_{−∞}^{+∞} dΦ(x) = 1. Let y, y′ ∈ D_Φ and λ ∈ [0,1]. By the convexity of the exponential function and the monotonicity of the integral, we have

    ∫_{−∞}^{+∞} e^{[λy+(1−λ)y′]x} dΦ(x) = ∫_{−∞}^{+∞} e^{λyx+(1−λ)y′x} dΦ(x)
        ≤ λ ∫_{−∞}^{+∞} e^{yx} dΦ(x) + (1−λ) ∫_{−∞}^{+∞} e^{y′x} dΦ(x) < +∞

as desired.

The set D_Φ can be the entire real line, but it can also be a much smaller set. The following examples illustrate. In reading them, observe that the integral (48.47) becomes

    ∫_{−∞}^{+∞} e^{yx} dΦ(x) = ∫_{−∞}^{+∞} e^{yx} φ(x) dx

when Φ has a continuous density function φ.

Example 2042 (i) Suppose that Φ has a carrier [a,b]. Then, for each y ∈ R,

    ∫_{−∞}^{+∞} e^{yx} dΦ(x) = ∫_a^b e^{yx} dΦ(x) ≤ ∫_a^b e^{|y||x|} dΦ(x) ≤ ∫_a^b e^{|y| max{|a|,|b|}} dΦ(x)
        = e^{|y| max{|a|,|b|}} (Φ(b) − Φ(a)) = e^{|y| max{|a|,|b|}} < +∞

We conclude that D_Φ = R.
(ii) Let

    φ(x) =  1/x²   if x > 1
            0      else

be the so-called Pareto density (recall from Example 1897 that ∫_1^{+∞} x^{−2} dx = 1). For every y > 0,

    ∫_{−∞}^{+∞} e^{yx} dΦ(x) = ∫_{−∞}^{+∞} e^{yx} φ(x) dx = ∫_1^{+∞} e^{yx}/x² dx = +∞

On the other hand, e^{yx} ≤ 1 on (1, +∞) for every y ≤ 0, so the integral is then finite. Therefore, D_Φ = (−∞, 0].
(iii) Let

    φ(x) =  λe^{−λx}   if x ≥ 0
            0          else

be the so-called exponential density with parameter λ > 0 (it holds ∫_0^{+∞} e^{−λx} dx = 1/λ). We have:

    ∫_{−∞}^{+∞} e^{yx} dΦ(x) = ∫_{−∞}^{+∞} e^{yx} φ(x) dx = ∫_0^{+∞} e^{yx} λe^{−λx} dx
        = λ ∫_0^{+∞} e^{(y−λ)x} dx =  λ/(λ − y)   if y < λ
                                      +∞          if y ≥ λ

Thus, D_Φ = (−∞, λ). N

Denote by I_Φ the interior of the interval D_Φ, i.e., I_Φ = int D_Φ. The Pareto density shows that the origin may fail to belong to I_Φ: there D_Φ = (−∞, 0], so I_Φ = (−∞, 0) does not contain 0. When 0 ∈ I_Φ, the open interval I_Φ permits to define the following important function.

Definition 2043 Let 0 ∈ I_Φ. The function F : I_Φ → R defined by

    F(y) = ∫_{−∞}^{+∞} e^{yx} dΦ(x)

is the moment generating function of Φ.

It is easy to check that F is a convex function. The next result shows its importance.

Theorem 2044 The moment generating function is analytic on a neighborhood B(0) of the origin, with

    F(y) = ∑_{n=0}^∞ (μₙ/n!) yⁿ    ∀y ∈ B(0)    (48.48)

In particular,

    μₙ = F⁽ⁿ⁾(0)    ∀n ≥ 1    (48.49)

Formula (48.48) is thus the exact Maclaurin expansion of the moment generating function.

Proof Since 0 ∈ I_Φ, there is a small enough neighborhood B(0) = (−ε, ε) included in I_Φ. Let y ∈ B(0). By Theorem 399,

    e^{yx} = 1 + yx + y²x²/2 + y³x³/3! + ⋯ + yⁿxⁿ/n! + ⋯ = ∑_{n=0}^∞ yⁿxⁿ/n!

For each n ≥ 1,

    ∑_{k=0}^n |yx|ᵏ/k! ≤ e^{|yx|} ≤ e^{ε|x|} ≤ e^{−εx} + e^{εx}

and so, by the additivity of the integral,

    ∑_{k=0}^n ∫_{−∞}^{+∞} (|yx|ᵏ/k!) dΦ(x) = ∫_{−∞}^{+∞} ∑_{k=0}^n (|yx|ᵏ/k!) dΦ(x)
        ≤ ∫_{−∞}^{+∞} e^{−εx} dΦ(x) + ∫_{−∞}^{+∞} e^{εx} dΦ(x) = F(−ε) + F(ε)

As this holds for all n ≥ 1, we thus have

    ∑_{k=0}^∞ ∫_{−∞}^{+∞} (|yx|ᵏ/k!) dΦ(x) ≤ F(−ε) + F(ε) < +∞

This is easily seen to imply that Φ has finite moments μₙ of all orders. Moreover, it can be proved that:²¹

    F(y) = ∫_{−∞}^{+∞} e^{yx} dΦ(x) = ∫_{−∞}^{+∞} ∑_{n=0}^∞ (yⁿxⁿ/n!) dΦ(x) = ∑_{n=0}^∞ (yⁿ/n!) ∫_{−∞}^{+∞} xⁿ dΦ(x) = ∑_{n=0}^∞ (μₙ/n!) yⁿ

As this holds for all y ∈ B(0), by Proposition 1400 the restriction F : (−ε, ε) → R is analytic. Hence, it is infinitely differentiable on (−ε, ε). Formula (48.49) follows from Proposition 1398, but it is also easily checked by direct computation.

²¹ The third equality follows from a dominated convergence theorem for the Riemann integral, proved by Cesare Arzelà in 1885, that readers will learn in more advanced courses.

The derivative of order n at 0 of the moment generating function F is, therefore, the n-th moment of Φ. That is,

    F′(0) = μ₁
    F″(0) = μ₂
    ⋯
    F⁽ⁿ⁾(0) = μₙ

This property of moment generating functions, which justifies their name, is important because it is often convenient to compute moments through them.

Example 2045 For the Gaussian density φ(x) = e^{−x²/2}/√(2π) we have

    F(y) = ∫_{−∞}^{+∞} e^{yx} φ(x) dx = (1/√(2π)) ∫_{−∞}^{+∞} e^{yx} e^{−x²/2} dx = (1/√(2π)) ∫_{−∞}^{+∞} e^{−(x²−2yx)/2} dx
         = (1/√(2π)) ∫_{−∞}^{+∞} e^{−(x²−2yx+y²−y²)/2} dx = (1/√(2π)) ∫_{−∞}^{+∞} e^{−(x²−2yx+y²)/2 + y²/2} dx
         = e^{y²/2} (1/√(2π)) ∫_{−∞}^{+∞} e^{−(x−y)²/2} dx

where in the fourth equality we have added and subtracted y². By (45.12),

    (1/√(2π)) ∫_{−∞}^{+∞} e^{−(x−y)²/2} dx = 1

and so the moment generating function F : R → R is given by

    F(y) = e^{y²/2}

We have F′(y) = y e^{y²/2} and F″(y) = e^{y²/2}(1 + y²), so μ₁ = F′(0) = 0 and μ₂ = F″(0) = 1. N
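Since F is smooth near 0, the first two moments can be recovered numerically from F(y) = e^{y²/2} with central finite differences, a quick sketch:

    import math

    F = lambda y: math.exp(y * y / 2)  # Gaussian moment generating function
    h = 1e-5

    mu1 = (F(h) - F(-h)) / (2 * h)            # ~ F'(0) = 0
    mu2 = (F(h) - 2 * F(0) + F(-h)) / h**2    # ~ F''(0) = 1
    print(mu1, mu2)  # ~0.0 and ~1.0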

The exact Maclaurin expansion (48.48) may hold on the entire real line, as we next show.

Proposition 2046 If I_Φ = R, it holds

    F(y) = ∑_{n=0}^∞ (μₙ/n!) yⁿ    ∀y ∈ R    (48.50)

Thus, here the moment generating function F is the generating function of the sequence {μₙ/n!}.

Proof Let I_Φ = R. By proceeding as in the proof of Theorem 2044, for each neighborhood (−n, n) of the origin it holds

    F(y) = ∑_{k=0}^∞ (μₖ/k!) yᵏ    ∀y ∈ (−n, n)

As the union of all the intervals (−n, n), with n ≥ 1, is the real line, we conclude that (48.50) holds.

Example 2047 The uniform distribution function on the unit interval,

    Φ(x) =  0   if x < 0
            x   if 0 ≤ x ≤ 1
            1   if x > 1

has a carrier, so D_Φ = R (Example 2042) and thus I_Φ = R. Hence (48.50) holds. Let us compute the moment generating function F. For each y ≠ 0, we have

    F(y) = ∫_{−∞}^{+∞} e^{yx} dΦ(x) = ∫_0^1 e^{yx} dx = (e^y − 1)/y

As F(0) = ∫_0^1 dx = 1, we conclude that the moment generating function F : R → R is given by

    F(y) =  (e^y − 1)/y   if y ≠ 0
            1             if y = 0

It holds, for each y ≠ 0,

    (e^y − 1)/y = (1/y)(∑_{n=0}^∞ yⁿ/n! − 1) = (1/y) ∑_{n=1}^∞ yⁿ/n! = ∑_{n=0}^∞ yⁿ/(n!(n+1))

Thus, for each y ∈ R,

    F(y) = ∑_{n=0}^∞ yⁿ/(n!(n+1))

By (48.50), we conclude that

    μₙ = 1/(1+n)

for each n ≥ 1. N

48.11 Coda oscura

48.11.1 Zero mysteries

Simple probabilities accord well with our finitist intuition. In particular, the states in their supports are naturally regarded as the only ones that may obtain. Indeed, the other states have probability zero (Lemma 1982), a probability naturally interpreted as indicating their impossibility to obtain.
This ease of interpretation motivates our detailed study of simple probabilities. Yet, in applications probabilities are often used that, though formally perfectly legitimate, accord less well with our intuition. To introduce them, let Ω = [0,1], i.e., take as the state space the closed unit interval. With some imagination, suppose that blindfolded you pick a point ω of [0,1]. It seems natural to assume that, being blindfolded, the probability that you pick a point in the interval

    [1/4, 3/4]

is equal to the length of the interval, that is, 1/2. Similarly, the probability that you pick a point in any interval

    [a,b] ⊆ [0,1]

is equal to its length, that is, b − a. It can be proved that there exists a probability P : 2^{[0,1]} → [0,1] such that

    P([a,b]) = b − a    (48.51)

for all 0 ≤ a ≤ b ≤ 1. Take the middle point ω = 1/2 of the closed unit interval: what is the probability that you pick this specific point? For all n ≥ 2,

    [ω − 1/n, ω + 1/n] ⊆ [0,1]

and so, by the monotonicity of P,

    0 ≤ P(ω) ≤ P([ω − 1/n, ω + 1/n]) = 2/n

As this inequality holds for all n ≥ 2, we conclude that

    P(1/2) = 0

A similar argument readily shows that, for each ω ∈ [0,1],

    P(ω) = 0

Our probability P ends up assigning a zero probability to all states. But, of course, at least one of them obtains.
This simple example shows that a zero probability event is not, in general, an event that will not happen for sure. This puzzling fact is yet another surprising feature of actual infinities, a Pandora's box (cf. Section 7.3). Indeed, observe that for each probability value 0 ≤ x ≤ 1 there is some event A ⊆ [0,1] with P(A) = x (just take an interval in [0,1] of length x). So, the probability P assumes uncountably many values. It is thus drastically different from the simple, finitely valued, probabilities for which our finitist intuition works so well.
This discussion motivates the next definition. Here Ω is any state space whatsoever, finite or infinite.

De nition 2048 A probability P : 2 ! [0; 1] is di use if P (!) = 0 for all ! 2 .

The previous probability $P$ on $[0,1]$ is a first instance of a diffuse probability. Observe that, by additivity, a diffuse probability assigns probability zero to all finite events. This means that these probabilities cannot live on finite state spaces. A diffuse probability that is countably additive, as is typically the case in applications, actually requires an uncountable state space, since all events that are at most countable have probability zero. Yet, for these probabilities our Pandora box has another disconcerting surprise, a highly non-trivial result proved by Stanislaw Ulam in 1930. It presupposes the continuum hypothesis.^{22}

Theorem 2049 (Ulam) Let $I$ be any interval, bounded or not, of the real line. There is no countably additive probability measure $P : 2^I \to [0,1]$ which is diffuse.

The power set turns out to be too large a domain for a probability on the interval $I$ that, at the same time, is diffuse and countably additive: there are too many sets to take care of. Thus, either we give up a most convenient property like countable additivity, or we have to look for a smaller family of subsets of $I$ over which it is possible to define a diffuse and countably additive probability. This latter possibility is explored next.

Definition 2050 The family of Borel sets of $\mathbb{R}$, denoted by $\mathcal{B}$, is the smallest collection of subsets of $\mathbb{R}$ that contains:

(i) the empty set as well as all intervals, bounded or not, of $\mathbb{R}$ (including $\mathbb{R}$ itself);

(ii) all finite and countable unions and intersections of its elements;^{23}

(iii) all complements of its elements.^{24}

Thus, $\mathcal{B}$ is the smallest family of sets of the real line that contains all intervals, a most important class of sets, and is closed under finite and countable unions and intersections as well as under complementation. These closure properties render $\mathcal{B}$ a suitable domain for countably additive probability measures. More importantly, $\mathcal{B}$ is adequate for applications: most sets of the real line of interest are Borel. Indeed, by definition this is the case for all intervals. This implies, inter alia, that all singletons (being one-element closed intervals) are Borel sets. By property (ii), all finite and countable sets are then Borel as well. Next we show that the topologically significant sets of the real line are Borel too.

Proposition 2051 Open and closed sets of the real line are Borel.

This result is an immediate consequence of the next lemma.

Lemma 2052 Each open set of the real line is the union, finite or countable, of disjoint open intervals.
^{22} That is, that $\mathfrak{c} = \aleph_1$, as discussed in Section 7.3.
^{23} That is, if $\{B_i\}_{i \in I}$ is any finite or countable collection of Borel sets, their union $\bigcup_{i \in I} B_i$ and intersection $\bigcap_{i \in I} B_i$ are still Borel sets.
^{24} That is, if $B$ is a Borel set, then $B^c$ is also a Borel set.

Proof Let $G$ be an open set in $\mathbb{R}$ and take $x \in G$. By definition, there exists a small enough $\varepsilon > 0$ such that $(x - \varepsilon, x + \varepsilon) \subseteq G$. Let $\overline{\varepsilon}_x = \sup\{\varepsilon > 0 : (x, x + \varepsilon) \subseteq G\}$ and $\underline{\varepsilon}_x = \sup\{\varepsilon > 0 : (x - \varepsilon, x) \subseteq G\}$. We have $\overline{\varepsilon}_x \in (0, +\infty]$, with $\overline{\varepsilon}_x$ finite if $G$ is bounded above, and $\underline{\varepsilon}_x \in (0, +\infty]$, with $\underline{\varepsilon}_x$ finite if $G$ is bounded below. Set $I_x = (x - \underline{\varepsilon}_x,\ x + \overline{\varepsilon}_x)$. We have $I_x \subseteq G$ (why?). Let $(a, b) \subseteq G$ be an open interval that contains $x$. Clearly, $(a, b) \subseteq I_x$. Thus, $I_x$ is the largest open interval with $x \in I_x$ and $I_x \subseteq G$. As a result,
\[
G = \bigcup_{x \in G} I_x
\]
Let $x, y \in G$ with $x \neq y$. The intervals $I_x$ and $I_y$ are either equal or disjoint. For, suppose that $I_x \cap I_y \neq \emptyset$. Then the union $I_x \cup I_y$ is an open interval (why?) containing both $x$ and $y$. Hence, by the maximality of $I_x$ and $I_y$, it holds
\[
I_x \subseteq I_x \cup I_y \subseteq I_x \qquad ; \qquad I_y \subseteq I_x \cup I_y \subseteq I_y
\]
Thus, $I_x = I_y$. Let $\mathcal{I}$ be the collection of distinct, so disjoint, intervals $I_x$. Each of them contains a distinct rational number $q(x)$. The rule
\[
\mathcal{I} \ni I_x \longmapsto q(x) \in \mathbb{Q}
\]
thus defines an injective function $q : \mathcal{I} \to \mathbb{Q}$: for any two distinct intervals in $\mathcal{I}$, there exist two distinct rationals. Thus, $|\mathbb{Q}| \ge |\mathcal{I}|$ (cf. Section 7.3), that is, the collection $\mathcal{I}$ is at most countable, as desired.

The family $\mathcal{B}$ is significantly smaller than the entire power set $2^{\mathbb{R}}$: a non-trivial result of set theory shows that $\mathcal{B}$ has the cardinality of the real line, i.e., $|\mathcal{B}| = \mathfrak{c}$, and so, by Cantor's Theorem, $|\mathcal{B}| < |2^{\mathbb{R}}|$. The real line thus features plenty of non-Borel sets. Yet, they are not easy to construct, a further sign that $\mathcal{B}$ contains the sets of interest.

With this, let us denote by $\mathcal{B}_I$ the family of all Borel sets that belong to an interval $I$. It can be proved that it is possible to define diffuse and countably additive probabilities
\[
P : \mathcal{B}_I \to [0, 1]
\]
Here $P(B)$ is the probability of a Borel set $B \subseteq I$. So, the family $\mathcal{B}_I$ is large enough to contain the sets of the real line of interest, but small enough not to run into an Ulam-type impossibility result. In particular, by taking $I = [0,1]$ there exists a countably additive probability $P : \mathcal{B}_{[0,1]} \to [0,1]$ satisfying (48.51). What we discussed in this section can be made fully rigorous, despite its puzzling interpretative aspects, even under countable additivity.

We close with a simple but interesting result about the cardinality of supports, i.e., about the number of states with non-zero probability.

Proposition 2053 Let $P : 2^{\Omega} \to [0,1]$ be a probability. Its support
\[
\{\omega \in \Omega : P(\omega) > 0\} \tag{48.52}
\]
is at most countable.

In words, there exist at most countably many states with a non-zero probability. Of course, diffuse probabilities have no states of this kind: their support is empty and has therefore no relevance. In contrast, for a simple probability the support is a key notion, a further indication of the dramatic difference between diffuse and simple probabilities.

Proof Set, for each $n \ge 1$,
\[
E_n = \left\{\omega \in \Omega : P(\omega) \ge \frac{1}{n}\right\}
\]
Fix $n \ge 1$. The event $E_n$ is finite. For, suppose per contra that it is infinite. Then it contains countably many elements $\{\omega_k\}$. Set $D_k = \{\omega_1, \dots, \omega_k\}$ for each $k \ge 1$. By the monotonicity and additivity of $P$, we have, for each $k \ge 1$,
\[
P(E_n) \ge P(D_k) = P(\omega_1) + \cdots + P(\omega_k) \ge \frac{k}{n}
\]
We thus reach the contradiction
\[
1 = P(\Omega) \ge P(E_n) \ge \lim_{k \to \infty} \frac{k}{n} = +\infty
\]
We conclude that $E_n$ is finite. It is easy to see that
\[
\{\omega \in \Omega : P(\omega) > 0\} = \bigcup_n E_n
\]
Thus, the support (48.52) is a countable union of finite sets and, therefore, is at most countable (cf. Theorem 273).

48.11.2 Probability of outcomes


So far we have discussed probabilities on state spaces, that is, on the contingencies that determine the outcomes of the random variables. Yet, with the study of distribution functions we started moving the analysis one level up by considering the probabilities of the random variables' outcomes. Indeed, distribution functions inform us about the probability that these outcomes meet some threshold levels. In this coda we fully develop this outcome perspective by introducing full-fledged probability measures on the real line, viewed as the outcome space of our (real-valued) random variables.

Let us fix a probability space $(\Omega, P)$, where $P : 2^{\Omega} \to [0,1]$ is a probability on a state space $\Omega$. A random variable $f : \Omega \to \mathbb{R}$ delivers a, say monetary, outcome $x \in \mathbb{R}$ with probability
\[
P(f = x)
\]
Indeed, the event
\[
(f = x) = \{\omega \in \Omega : f(\omega) = x\}
\]
consists of all states at which $f$ assumes value $x$: when any of them obtains, the random variable delivers outcome $x$. Note that
\[
(f = x) = f^{-1}(x)
\]

that is, $(f = x)$ is the preimage of $x$ through $f$. More generally, we can take the preimage
\[
f^{-1}(A) = \{\omega \in \Omega : f(\omega) \in A\}
\]
of any Borel set $A$ of the real line. This preimage consists of all states at which $f$ assumes a value $x$ in $A$: if any of them obtains, $f$ delivers an outcome $x$ in $A$. The quantity
\[
P\left(f^{-1}(A)\right)
\]
is then the probability that an outcome in $A$ obtains under $f$. These remarks motivate the following result.

Proposition 2054 Given a random variable $f : \Omega \to \mathbb{R}$, the set function $P_f : \mathcal{B} \to [0,1]$ defined, for each $A \in \mathcal{B}$, by
\[
P_f(A) = P\left(f^{-1}(A)\right)
\]
is a probability measure. Moreover,

(i) if $P$ is simple, so is $P_f$;

(ii) if $P$ is countably additive, so is $P_f$.

The proof of this proposition relies on the nice behavior of preimages under unions and intersections.

Lemma 2055 Let $f : X \to Y$ be a function between any two sets $X$ and $Y$. We have
\[
f^{-1}\left(\bigcup_{i \in I} A_i\right) = \bigcup_{i \in I} f^{-1}(A_i) \qquad \text{and} \qquad f^{-1}\left(\bigcap_{i \in I} A_i\right) = \bigcap_{i \in I} f^{-1}(A_i)
\]
for any collection $\{A_i\}_{i \in I}$ of subsets of $Y$.

Thus, union (intersection) followed by inversion is the same as inversion followed by union (intersection). A similar interchange was previously established for complementation in Lemma 593. We leave the easy proof of this lemma to the reader.^{25}
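The interchange is easy to check on finite sets; here is a quick Python illustration (ours), with a function represented as a dictionary:

    # A finite-set sanity check of Lemma 2055 (our illustration): preimages
    # commute with unions and intersections.
    X = {1, 2, 3, 4}
    f = {1: "a", 2: "a", 3: "b", 4: "c"}      # a function f : X -> Y

    def preimage(A):
        """f^{-1}(A) = {x in X : f(x) in A}."""
        return {x for x in X if f[x] in A}

    A1, A2 = {"a", "b"}, {"b", "c"}
    assert preimage(A1 | A2) == preimage(A1) | preimage(A2)
    assert preimage(A1 & A2) == preimage(A1) & preimage(A2)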

Proof of Proposition 2054 It is easy to see that $P_f(\emptyset) = 0$, $P_f(\mathbb{R}) = 1$ and $P_f(A) \ge 0$ for all $A \in \mathcal{B}$. It remains to check additivity. So, let $A$ and $B$ be two disjoint Borel subsets of $\mathbb{R}$. Their preimages $f^{-1}(A)$ and $f^{-1}(B)$ are disjoint: by Lemma 2055, $\emptyset = f^{-1}(\emptyset) = f^{-1}(A \cap B) = f^{-1}(A) \cap f^{-1}(B)$. Thus, again by Lemma 2055,
\[
P_f(A \cup B) = P\left(f^{-1}(A \cup B)\right) = P\left(f^{-1}(A) \cup f^{-1}(B)\right) = P\left(f^{-1}(A)\right) + P\left(f^{-1}(B)\right) = P_f(A) + P_f(B)
\]
as desired. We conclude that $P_f$ is a probability measure.


^{25} See Section 10 of Halmos (1960).

(ii) Let $P$ be countably additive. Let $\{A_i\}_{i=1}^{\infty}$ be any countable collection of pairwise disjoint Borel subsets of $\mathbb{R}$. Their preimages $f^{-1}(A_i)$ are easily seen to be pairwise disjoint. By Lemma 2055,
\[
P_f\left(\bigcup_{i=1}^{\infty} A_i\right) = P\left(f^{-1}\left(\bigcup_{i=1}^{\infty} A_i\right)\right) = P\left(\bigcup_{i=1}^{\infty} f^{-1}(A_i)\right) = \sum_{i=1}^{\infty} P\left(f^{-1}(A_i)\right) = \sum_{i=1}^{\infty} P_f(A_i)
\]
We conclude that $P_f$ is countably additive as well.

(i) Let $P$ be simple. By definition, there is a finite event $E$ with $P(E) = 1$. By Lemma 593-(iii), $E \subseteq f^{-1}(f(E))$. By the monotonicity of $P$, we then have
\[
1 \ge P_f(f(E)) = P\left(f^{-1}(f(E))\right) \ge P(E) = 1
\]
As the set $f(E) \subseteq \mathbb{R}$ is finite (so Borel), this proves that $P_f$ is simple.

With this result, we can introduce our protagonist.

Definition 2056 The probability $P_f : \mathcal{B} \to [0,1]$ induced by the random variable $f : \Omega \to \mathbb{R}$ is called the (probability) law of $f$.

The law $P_f$ accounts for the outcomes' probabilities under $f$. For instance, when $f$ describes a financial asset, the preimage
\[
f^{-1}([0, \infty)) = \{\omega \in \Omega : f(\omega) \in [0, \infty)\} = \{\omega \in \Omega : f(\omega) \ge 0\} = (f \ge 0)
\]
of the set $A = [0, \infty)$ is the collection of the states where this asset pays a non-negative amount of money. Thus,
\[
P_f([0, \infty)) = P(f \ge 0)
\]
represents the probability of a gain with this asset. Similarly,
\[
P_f((-\infty, 0]) = P(f \le 0)
\]
represents the probability of a loss with this asset.


The law $P_f$ can be written as
\[
P_f = P \circ f^{-1}
\]
This compact writing reveals the law's composite nature. Fortunately, $P_f$ inherits important properties of the underlying probability $P$, as proved in points (i) and (ii) of the last result. Through it, we can express the distribution function $\Phi$ of $f$ as
\[
\Phi(x) = P_f((-\infty, x])
\]
It is easy to check that the properties of $\Phi$ seen in Propositions 2015-2017 can be proved using those of $P_f$. To illustrate, let us prove in this alternative way the right continuity of $\Phi$ when $P$ is countably additive. By the last result, $P_f$ as well is countably additive. Take $x \in \mathbb{R}$ and a scalar sequence $\{x_n\}$ with $x_n \downarrow x$. As $\bigcap_n (-\infty, x_n] = (-\infty, x]$, by the continuity of $P_f$ (see Proposition 1988) we have
\[
\lim_n \Phi(x_n) = \lim_n P_f((-\infty, x_n]) = P_f((-\infty, x]) = \Phi(x)
\]

This proves the right continuity of $\Phi$.

In sum, $\Phi$ is a partial description of $P_f$. Yet, it is sufficient for some purposes like, for instance, the computation of expected values. It is, however, $P_f$ that provides a full probabilistic description of the random variables' outcomes.

In the simple case, using laws we can express expected values as weighted averages of outcomes.

Proposition 2057 Let $P$ be simple. For each random variable $f : \Omega \to \mathbb{R}$, it holds
\[
\mathbb{E}_P(f) = \sum_{x \in \operatorname{supp} P_f} x\, P_f(x)
\]
or, equivalently, $\mathbb{E}_P(f) = \mathbb{E}_{P_f}(x)$.

To extend this outcome perspective beyond the simple case, however, we have to wait for a more general notion of integral.

Proof By setting $x = f(\omega)$, so that $\omega \in f^{-1}(x)$, we get
\[
\mathbb{E}_P(f) = \sum_{\omega \in \operatorname{supp} P} f(\omega) P(\omega) = \sum_{x \in \operatorname{supp} P_f} x\, P\left(f^{-1}(x)\right) = \sum_{x \in \operatorname{supp} P_f} x\, P_f(x)
\]
as desired.
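On a finite state space this identity is immediate to verify computationally; here is a small Python illustration (ours), with the probability and the random variable stored as dictionaries:

    # A small check of Proposition 2057 (our illustration): the expectation
    # computed state by state equals the outcome average under the law P_f.
    from collections import defaultdict

    P = {"s1": 0.2, "s2": 0.3, "s3": 0.5}        # a simple probability
    f = {"s1": 10.0, "s2": 10.0, "s3": -5.0}     # a random variable

    E_states = sum(f[w] * P[w] for w in P)       # sum_w f(w) P(w)

    Pf = defaultdict(float)                      # the induced law P_f
    for w in P:
        Pf[f[w]] += P[w]
    E_outcomes = sum(x * p for x, p in Pf.items())   # sum_x x P_f(x)

    assert abs(E_states - E_outcomes) < 1e-12
    print(E_states, dict(Pf))                    # 2.5 {10.0: 0.5, -5.0: 0.5}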

We can use $\Phi$ to retrieve the values of $P_f$ on intervals. Clearly, for all $x, y \in \mathbb{R}$ with $y < x$ it holds
\[
P_f((y, x]) = \Phi(x) - \Phi(y)
\]
When $P$ is countably additive, by Proposition 2017 we have
\[
P_f(x) = \Phi(x) - \lim_{y \uparrow x} \Phi(y) \tag{48.53}
\]
This allows us to retrieve the values of $P_f$ on all intervals, bounded or not. For example, $P_f((y, x)) = P_f((y, x]) - P_f(x)$. More interestingly, (48.53) has the following immediate, stark implication.

Proposition 2058 The law of a random variable with continuous distribution function is diffuse.

Thus, a continuous $\Phi$ corresponds to a diffuse law $P_f$. As continuous distribution functions are widely used, this shows the importance of diffuse laws. It also shows the importance of defining $P_f$ only over the Borel sets: by Ulam's Theorem, there is no countably additive and diffuse probability measure on $2^{\mathbb{R}}$. In any case, this is not a big deal, as most relevant sets of the real line are Borel.

Proof Let $\Phi$ be continuous. By (48.53),
\[
P_f(x) = \Phi(x) - \lim_{y \uparrow x} \Phi(y) = \Phi(x) - \Phi(x) = 0
\]
as desired.

48.12 Ultracoda: expected utility


48.12.1 Expected utility criterion
In a deterministic context, the decision maker has a preference over certain alternatives, like bundles of goods or amounts of money. Uncertainty, however, makes alternatives random. To fix ideas, assume that the consequences of these random alternatives are amounts of money, be they gains or losses. In this case, the decision maker has a preference over a set $A$ consisting of random variables $f : \Omega \to \mathbb{R}$ that deliver a monetary amount $f(\omega)$ when state $\omega$ obtains, a loss when $f(\omega) \le 0$ and a gain otherwise. For instance, $f$ can be a financial asset that pays different amounts of money as different states obtain. Certainty becomes the special case when the state space is a singleton $\{\omega\}$ consisting of a single state that, obviously, obtains for sure.

Let us denote by $\mathbb{R}^{\Omega}$ the collection of all random variables $f : \Omega \to \mathbb{R}$. Let $\succsim$ be a preference relation over $\mathbb{R}^{\Omega}$, where
\[
f \succsim g
\]
indicates that the decision maker prefers act $f$ over act $g$.^{26} With no uncertainty, random variables deliver sure amounts of money and, therefore, can be identified with them. Formally, with a singleton state space $\{\omega\}$ we can identify $\mathbb{R}^{\{\omega\}}$ with the real line $\mathbb{R}$. In this case, the preference $\succsim$ is represented by a utility function $u : \mathbb{R} \to \mathbb{R}$. A natural question comes up: under uncertainty, so when $\Omega$ is not a singleton, how do we represent $\succsim$?

Fortunately, when the probability $P$ is simple there is an easy answer to this fundamental question: given a utility function $u : \mathbb{R} \to \mathbb{R}$, random variables $f, g \in \mathbb{R}^{\Omega}$ are ranked according to the so-called expected utility criterion
\[
f \succsim g \iff \mathbb{E}_P(u(f)) \ge \mathbb{E}_P(u(g))
\]
which uses the expected utility
\[
\mathbb{E}_P(u(f)) = \sum_{\omega \in \operatorname{supp} P} u(f(\omega)) P(\omega) \tag{48.54}
\]

Thus, the expected utility criterion ranks a random variable $f$ via the expected value of its utility profile
\[
u \circ f : \Omega \to \mathbb{R}
\]
which to each state $\omega$ associates the utility $u(f(\omega))$ of the consequence $f(\omega)$. Random variables are ranked higher when, on average, they provide a greater utility. In particular,
\[
f \sim g \iff \mathbb{E}_P(u(f)) = \mathbb{E}_P(u(g)) \tag{48.55}
\]
that is, random variables are indifferent when, on average, they yield the same utility.

The expected utility criterion was introduced in 1738 by Daniel Bernoulli in a beautiful work that was well ahead of its time. In this work Bernoulli used the logarithmic utility function $u(c) = \ln c$, for which
\[
\mathbb{E}_P(u(f)) = \sum_{\omega \in \operatorname{supp} P} \ln(f(\omega))\, P(\omega)
\]
^{26} Formally, $\succsim$ is a binary relation (see Appendix A).

Other basic examples of utility functions are the linear $u(c) = c$, the power $u(c) = c^{\alpha}$ with $0 < \alpha < 1$, and the negative exponential $u(c) = -e^{-c}$. Accordingly, $\mathbb{E}_P(u(f))$ is
\[
\sum_{\omega \in \operatorname{supp} P} f(\omega) P(\omega) \quad ; \quad \sum_{\omega \in \operatorname{supp} P} f(\omega)^{\alpha} P(\omega) \quad ; \quad -\sum_{\omega \in \operatorname{supp} P} e^{-f(\omega)} P(\omega)
\]
As their arguments are monetary amounts, it is natural to assume that $u$ is strictly increasing, that is,
\[
x > y \implies u(x) > u(y) \qquad \forall x, y \in \mathbb{R}
\]
In this way decision makers (strictly) prefer greater amounts of money, a property satisfied by all the aforementioned specifications of $u$.

Different specifications of $u$ result in different rankings of the random variables. For instance, in the single coin toss example with $\Omega = \{H, T\}$, consider again the bet $f : \Omega \to \mathbb{R}$ defined by
\[
f(H) = 50 \qquad ; \qquad f(T) = -50
\]
as well as the certain random variable $g : \Omega \to \mathbb{R}$ that pays $0$ for sure (i.e., in each state). For a decision maker with linear $u$ we have
\[
\mathbb{E}_P(u(g)) = 0 = \mathbb{E}_P(u(f))
\]
and so, by (48.55), this decision maker is indifferent between $f$ and $g$. In contrast, for a decision maker with negative exponential $u$ we have
\[
\mathbb{E}_P(u(f)) = -\frac{1}{2}\left(e^{-50} + e^{50}\right) < -1 = \mathbb{E}_P(u(g))
\]
and so this decision maker (strictly) prefers $g$ to $f$. As the reader will learn in more advanced courses, it is the concavity of the negative exponential utility that underlies this intuitively more prudential behavior.
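The comparison is easy to reproduce numerically; here is a short Python sketch (ours) for the coin-toss example above:

    # Numerical companion to the coin-toss comparison (our sketch): the
    # linear decision maker is indifferent between the bet and 0, while the
    # negative-exponential one strictly prefers the sure 0.
    import math

    P = {"H": 0.5, "T": 0.5}
    f = {"H": 50.0, "T": -50.0}     # the bet
    g = {"H": 0.0, "T": 0.0}        # 0 for sure

    def expected_utility(h, u):
        return sum(u(h[w]) * P[w] for w in P)

    linear = lambda c: c
    neg_exp = lambda c: -math.exp(-c)

    print(expected_utility(f, linear) == expected_utility(g, linear))   # True
    print(expected_utility(f, neg_exp) < expected_utility(g, neg_exp))  # True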

48.12.2 Lotteries
The law $P_f : \mathcal{B} \to [0,1]$ of a random variable $f : \Omega \to \mathbb{R}$ is given by
\[
P_f(B) = P\left(f^{-1}(B)\right) \qquad \forall B \in \mathcal{B}
\]
It assigns a probability to each (Borel) set of consequences $B$. For instance, $P_f([0, \infty))$ is the probability of a gain.

The probability measure $P_f$ is called the lottery induced by $f$. This terminology recalls the gambling origins of the calculus of probability in the sixteenth and seventeenth centuries. Lotteries permit us to talk of probabilities of consequences, not just of states as we did so far.

Example 2059 In the plant example, $P_f : \mathcal{B} \to [0,1]$ is given by
\[
P_f(0) = P(n) \quad ; \quad P_f(-400) = P(s) \quad ; \quad P_f(-2500) = P(l)
\]
It describes the probability of the company's losses. N



The lottery $P_f$ thus informs us about the probabilities of the various consequences of the random variable $f$. Yet, different random variables, say $f$ and $g$, may happen to induce the same lottery, that is, $P_f = P_g$.

Example 2060 In the single coin toss example, with $\Omega = \{H, T\}$ and $P(H) = P(T) = 1/2$, consider the two bets
\[
f(\omega) =
\begin{cases}
10 & \text{if } \omega = T \\
0 & \text{if } \omega = H
\end{cases}
\qquad ; \qquad
g(\omega) =
\begin{cases}
0 & \text{if } \omega = T \\
10 & \text{if } \omega = H
\end{cases}
\]
In this case,
\[
P_f(10) = P_f(0) = \frac{1}{2} = P_g(10) = P_g(0)
\]
and so $P_f = P_g$. N

Although the lottery $P_f$ contains essential information about the random variable $f$, this example shows that some important information is lost in translating a random variable into a lottery: it is no longer known which state/event generated the consequence, only its probability.

48.12.3 Expected utility of lotteries


What we saw so far in terms of lotteries holds for any probability $P$, simple or not. However, the time has come to consider expected values, so we consider a simple probability $P$. In this case, Proposition 2054 ensures that the lottery $P_f$ is simple as well. For each random variable $f : \Omega \to \mathbb{R}$ we can then define the expected value
\[
\mathbb{E}_{P_f}(u) = \sum_{c \in \operatorname{supp} P_f} u(c) P_f(c)
\]
This is the expected utility of the lottery $P_f$.

Theorem 2061 Let $P : 2^{\Omega} \to [0,1]$ be a simple probability. For each random variable $f : \Omega \to \mathbb{R}$,
\[
\mathbb{E}_P(u \circ f) = \mathbb{E}_{P_f}(u)
\]
That is,
\[
\sum_{\omega \in \operatorname{supp} P} u(f(\omega)) P(\omega) = \sum_{c \in \operatorname{supp} P_f} u(c) P_f(c)
\]

The expected utility $\mathbb{E}_P(u \circ f)$ of $f$ is thus equal to the expected utility $\mathbb{E}_{P_f}(u)$ of the induced lottery. As long as we adopt the expected utility criterion, it is equivalent to rank random variables or the lotteries that they induce. What was lost in translation is, therefore, immaterial for this criterion. This parsimony is one of the reasons why this criterion is widely used.

Proof Let $f : \Omega \to \mathbb{R}$. Since $P$ is simple, we have
\[
\mathbb{E}_P(u \circ f) = \sum_{\omega \in \operatorname{supp} P} u(f(\omega)) P(\omega)
\]

Define the distinct scalars $\{x_1, \dots, x_n\}$ as in (??). Set $D = \{x_1, \dots, x_n\}$. By construction, $D = f(\operatorname{supp} P)$. Define
\[
E_i = f^{-1}(x_i) \cap \operatorname{supp} P \qquad \forall i = 1, \dots, n
\]
Since the scalars $x_i$ are distinct, the sets $\{E_1, \dots, E_n\}$ are pairwise disjoint. Moreover,
\[
\bigcup_{i=1}^{n} E_i = \operatorname{supp} P
\]
and
\[
P_f(x_i) = P\left(f^{-1}(x_i)\right) = P\left(f^{-1}(x_i) \cap \operatorname{supp} P\right) = P(E_i) \qquad \forall i = 1, \dots, n
\]
Since $P$ is additive,
\[
P_f(D) = \sum_{i=1}^{n} P_f(x_i) = \sum_{i=1}^{n} P(E_i) = P\left(\bigcup_{i=1}^{n} E_i\right) = P(\operatorname{supp} P) = 1
\]
Moreover,
\[
\mathbb{E}_P(u \circ f) = \sum_{\omega \in \operatorname{supp} P} u(f(\omega)) P(\omega) = \sum_{i=1}^{n} \sum_{\omega \in E_i} u(f(\omega)) P(\omega) = \sum_{i=1}^{n} u(x_i) P(E_i) = \sum_{i=1}^{n} u(x_i) P_f(x_i) = \sum_{x \in D} u(x) P_f(x)
\]
By Proposition 2001-(iii), since $D$ is finite and $P_f(D) = 1$ we conclude that
\[
\mathbb{E}_P(u \circ f) = \sum_{\omega \in \operatorname{supp} P} u(f(\omega)) P(\omega) = \sum_{c \in D} u(c) P_f(c) = \sum_{c \in \operatorname{supp} P_f} u(c) P_f(c) = \mathbb{E}_{P_f}(u)
\]
proving the statement.
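Theorem 2061 too is easy to verify on finite examples; the following Python sketch (ours, with made-up numbers loosely inspired by the plant example) computes both sides:

    # A finite check of Theorem 2061 (our illustration): expected utility on
    # states equals expected utility on the induced lottery.
    import math
    from collections import defaultdict

    P = {"n": 0.6, "s": 0.3, "l": 0.1}            # a simple probability
    f = {"n": 0.0, "s": -400.0, "l": -2500.0}     # outcomes in each state
    u = lambda c: -math.exp(-c / 1000)            # an assumed utility

    lhs = sum(u(f[w]) * P[w] for w in P)          # E_P(u o f)

    Pf = defaultdict(float)                       # the induced lottery
    for w in P:
        Pf[f[w]] += P[w]
    rhs = sum(u(c) * p for c, p in Pf.items())    # E_{P_f}(u)

    assert abs(lhs - rhs) < 1e-12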


Part IX

Appendices

Appendix A

Binary relations: modelling connections (sdoganato)

A.1 Definition

Throughout the book we have already encountered binary relations a few times, to model connections between elements, but we never formally introduced them. In a nutshell, the notion of binary relation formalizes the idea that an element $x$ is in a relation with an element $y$. It is an abstract notion that is best understood after having seen a few concrete examples, which make it possible to appreciate its unifying power. We discuss it in an appendix, so that readers can decide if and when to go through it.

A first example of a binary relation is the relation "being greater than or equal to" among natural numbers: given any two natural numbers $x$ and $y$, we can always say whether $x$ is greater than or equal to $y$. For instance, 6 is greater than or equal to 4. In this example, $x$ and $y$ are natural numbers and "being in relation with" amounts to "being greater than or equal to".

Imagination is the only limit to the number of binary relations one can think of. Set theory is the language that we can use to formalize the idea that two objects are related to each other, that is, connected. For example, given the set of citizens $C$ of a country, we could say that citizen $x$ is in relation with citizen $y$ if $x$ is the mother of $y$. In this case, "being in relation with" amounts to "being the mother of".

Economics is a source of examples of binary relations. For instance, consider an agent and a set of alternatives $X$. The preference relation $\succsim$ is a binary relation. In this case, "$x$ is in relation with $y$" is equivalent to saying that the agent regards $x$ as at least as good as $y$.

What do all these examples have in common? First, in all of them we considered two elements $x$ and $y$ of a set $X$. Second, these elements were in a specific order: indeed, one thing is to say that $x$ is in relation with $y$, another is to say that $y$ is in relation with $x$. So, the pair formed by $x$ and $y$ is an ordered pair $(x, y)$ that belongs to the Cartesian product $X \times X$. Finally, in all three examples it might well happen that a generic pair of elements $x$ and $y$ is actually unrelated. For instance, if in our second example $x$ and $y$ are siblings, obviously neither is the mother of the other. In other words, a given notion of "being in relation with" might not include all pairs of elements of $X$.

We are now ready to give a (set-theoretic) definition of binary relations.


Definition 2062 Given a non-empty set $X$, a binary relation is a subset $R$ of $X \times X$.

In terms of notation, we write $xRy$ in place of $(x, y) \in R$. Indeed, the notation $xRy$, which reads "$x$ is in the relation $R$ with $y$", is more evocative of what the concept of binary relation is trying to capture. So, in what follows we will adopt it.

To get acquainted with this new mathematical notion, let us now formalize our first three examples.

Example 2063 (i) Let $X$ be the set of natural numbers $\mathbb{N}$. The binary relation $\ge$ can be viewed as the subset of $\mathbb{N} \times \mathbb{N}$ given by
\[
R = \{(x, y) \in \mathbb{N} \times \mathbb{N} : x \text{ is greater than or equal to } y\}
\]
Indeed, it contains all pairs in which the first element $x$ is greater than or equal to the second element $y$.

(ii) Let $X$ be the set of all citizens $C$ of a country. The binary relation "being the mother of" can be viewed as the subset of $C \times C$ given by
\[
R = \{(x, y) \in C \times C : x \text{ is the mother of } y\}
\]
Indeed, it contains all pairs in which the first element is the mother of the second element.

(iii) Let $X$ be the set of all consumption bundles $\mathbb{R}^n_+$. The binary relation $\succsim$ can be seen as the subset of $\mathbb{R}^n_+ \times \mathbb{R}^n_+$ given by
\[
R = \{(x, y) \in \mathbb{R}^n_+ \times \mathbb{R}^n_+ : x \succsim y\}
\]
Indeed, it contains all pairs of bundles in which the first bundle is at least as good as the second one. N

A binary relation associates to each element $x$ of $X$ some elements $y$ of the same set (possibly $x$ itself, i.e., $x = y$). We denote by $R(x) = \{y \in X : xRy\}$ the image of $x$ through $R$, i.e., the collection of all $y$ that stand in the relation $R$ with a given $x$.

Example 2064 (i) For the binary relation $\ge$ on $\mathbb{N}$, the image $R(x) = \{y \in \mathbb{N} : x \ge y\}$ of $x \in \mathbb{N}$ consists of all natural numbers that are smaller than or equal to $x$. (ii) For the binary relation "being the mother of" on $C$, the image $R(x)$ consists of all the children of $x$.^{1} (iii) For the binary relation $\succsim$ on $\mathbb{R}^n_+$, the image $R(x) = \{y \in \mathbb{R}^n_+ : x \succsim y\}$ of $x \in \mathbb{R}^n_+$ consists of all bundles that are at most as good as $x$. N

Any binary relation $R$ induces a self-correspondence $\varphi : X \rightrightarrows X$ defined by $\varphi(x) = R(x)$. Vice versa, any self-correspondence $\varphi : X \rightrightarrows X$ induces a binary relation $R$ on $X$ defined by $xRy$ if $y \in \varphi(x)$. So, binary relations and self-correspondences are two sides of the same coin. Depending on the application, one side may turn out to be more interesting than the other.
^{1} In the Council of Toledo of 675, a relation of this kind had a remarkable theological application to the interpretation of the Holy Trinity as a single God: "... He is Father, not to Himself, but to the Son; and as He is Son not to Himself but to the Father, similarly also the Holy Spirit refers in a relative sense not to Himself, but to the Father and to the Son..." (trans. Deferrari from the 13th ed. of Denzinger's Enchiridion Symbolorum).

Example 2065 A self-map $f : X \to X$ can be viewed as a binary relation
\[
R_f = \{(x, f(x)) \in X \times X : x \in X\}
\]
on $X$ consisting of all pairs $(x, f(x))$. The image $R_f(x) = \{f(x)\}$ is a singleton consisting of the image $f(x)$. Indeed, self-maps can be regarded as the binary relations on $X$ that have singleton images, i.e., that associate to each element of $X$ a unique element of $X$. N

This last example is the occasion to remark that, though to fix ideas we focus on binary relations $R \subseteq X \times X$, the analysis easily extends to general binary relations $R \subseteq X \times Y$ with $X$ and $Y$ possibly distinct. For instance, a function $f : X \to Y$ can be viewed as a binary relation
\[
R_f = \{(x, f(x)) \in X \times Y : x \in X\}
\]
It is easy to see that binary relations $R \subseteq X \times Y$ correspond to the correspondences $\varphi : X \rightrightarrows Y$ defined by $\varphi(x) = R(x)$. We close by referring readers to Section D.7 for an important logic perspective on binary relations.

A.2 Properties
A binary relation $R$ can satisfy several properties. In particular, a binary relation $R$ on a set $X$ is:

(i) reflexive if $xRx$ for all $x \in X$;

(ii) transitive if $xRy$ and $yRz$ imply $xRz$ for all $x, y, z \in X$;

(iii) complete if, for every $x, y \in X$, either $xRy$ or $yRx$ or both;

(iv) symmetric if $xRy$ implies $yRx$ for all $x, y \in X$;

(v) asymmetric if $xRy$ implies not $yRx$ for all $x, y \in X$;

(vi) antisymmetric if $xRy$ and $yRx$ imply $x = y$ for all $x, y \in X$.

Often we will consider binary relations that satisfy more than one of these properties. However, some of them are incompatible, for example asymmetry and symmetry, while others are related, for example completeness implies reflexivity.^{2}

Example 2066 (i) Consider the binary relation $\ge$ on $\mathbb{N}$. Clearly, $\ge$ is complete (so, it is reflexive). Indeed, given any two natural numbers $x$ and $y$, either one is greater than or equal to the other. Moreover, if both $x \ge y$ and $y \ge x$, then $x = y$. Thus, $\ge$ is antisymmetric. Finally, $\ge$ is transitive, but it is neither symmetric nor asymmetric.

(ii) Let $R$ be the binary relation "being the mother of" on $C$. Individuals cannot be their own mothers, so $R$ is not reflexive (thus, it is not complete either). Similarly, $R$ is not symmetric since if $x$ is the mother of $y$, then $y$ cannot be the mother of $x$. We leave it to the reader to verify that $R$ is not transitive. N

^{2} Indeed, a complete binary relation $R$ on $X$ is, in particular, able to compare an element of $X$ with itself.

Example 2067 Let $R$ be the binary relation "being married to" on $C$. This relation consists of all pairs of citizens $(x, y) \in C \times C$ such that $x$ is the spouse of $y$. That is, $xRy$ means that $x$ is married to $y$. The image $R(x)$ is a singleton consisting of the spouse of $x$. The "married to" relation is neither reflexive (individuals cannot be married to themselves) nor antisymmetric (married couples do not become single individuals). It is symmetric, since individuals are each other's spouses, while transitivity does not hold because $xRy$ and $yRz$ would imply $x = z$, and no individual is married to himself or herself. Finally, this relation is not complete since it is not reflexive. N
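Properties (i)-(vi) are mechanical to check on small finite relations; the following Python sketch (ours) encodes a relation as a set of ordered pairs and tests each property:

    # A property checker for binary relations on a finite set (our sketch),
    # following definitions (i)-(vi) above.
    from itertools import product

    def properties(X, R):
        return {
            "reflexive":     all((x, x) in R for x in X),
            "transitive":    all((x, z) in R
                                 for (x, y) in R for (w, z) in R if y == w),
            "complete":      all((x, y) in R or (y, x) in R
                                 for x, y in product(X, X)),
            "symmetric":     all((y, x) in R for (x, y) in R),
            "asymmetric":    all((y, x) not in R for (x, y) in R),
            "antisymmetric": all(x == y for (x, y) in R if (y, x) in R),
        }

    X = {1, 2, 3}
    geq = {(x, y) for x, y in product(X, X) if x >= y}   # the relation >=
    print(properties(X, geq))
    # reflexive, transitive, complete, antisymmetric: True; the rest: False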

The relation $\ge$ on $\mathbb{N}$ is the prototype of the following important class of binary relations.

Definition 2068 A binary relation $R$ on a set $X$ is said to be a partial order if it satisfies reflexivity, antisymmetry and transitivity. If reflexivity is replaced by completeness, $R$ is a complete order.

For example, the binary relation $\ge$ on $\mathbb{R}^n$ satisfies reflexivity, transitivity and antisymmetry, so it is a partial order (cf. Section 2.3). If $n = 1$, this binary relation is complete, thus $\ge$ is a complete order. If $n > 1$, this is no longer the case, as we emphasized several times in the text: for instance, the vectors $(1, 2)$ and $(2, 1)$ cannot be ordered by the relation $\ge$.

Example 2069 (i) Consider the space $\mathbb{R}^{\infty}$ of all sequences of real numbers (Section 8.2). The componentwise order on $\mathbb{R}^{\infty}$, defined by $x \ge y$ if $x_n \ge y_n$ for each $n \ge 1$, is easily seen to be a partial order. (ii) Given any set $A$, consider the space $\mathbb{R}^A$ of real-valued functions $f : A \to \mathbb{R}$ (Section 6.3.2). The pointwise order on $\mathbb{R}^A$, defined by $f \ge g$ if $f(x) \ge g(x)$ for all $x \in A$, is also easily seen to be a partial order (the componentwise order on $\mathbb{R}^{\infty}$ is the special case $A = \mathbb{N}$). (iii) Consider the power set $2^X = \{A : A \subseteq X\}$ of a set $X$, i.e., the collection of all its subsets (cf. Section 7.3). The inclusion relation $\subseteq$ on $2^X$ is a partial order that is not complete: e.g., if $X = \{a, b, c\}$ the sets $\{a, b\}$ and $\{b, c\}$ cannot be ordered by the inclusion relation. N

The preference relation $\succsim$ is typically assumed to be reflexive and transitive (Section 6.8). It is also often assumed to be complete. In contrast, antisymmetry is too strong a property for a preference relation, in that it rules out the possibility that two different alternatives be indifferent. For example, if $X$ is a set of sports cars, an agent could rightfully declare a Ferrari to be as good as a Lamborghini, and obviously these two objects are quite different cars. This important example motivates the next definition.

Definition 2070 A binary relation $R$ on a set $X$ is said to be a preorder if it satisfies reflexivity and transitivity. If reflexivity is replaced by completeness, $R$ is a complete preorder (or a weak order).

So, the preference relations that one usually encounters in economics are an important example of complete preorders. Interestingly, we also encountered a preorder when we discussed the notion of "having cardinality less than or equal to" (Section 7.3).

Example 2071 Let $2^{\mathbb{R}}$ be the collection of all subsets of the real line. Define the binary relation $\trianglerighteq$ on $2^{\mathbb{R}}$ by $A \trianglerighteq B$ if $|A| \ge |B|$, i.e., if $A$ has cardinality greater than or equal to that of $B$ (Section 7.3). By Proposition 283, $\trianglerighteq$ is reflexive and transitive, so it is a preorder. It is not, however, a partial order because antisymmetry is clearly violated: for example, the sets $A = \{1, \pi\}$ and $B = \{2, 5\}$ have the same cardinality (i.e., both $A \trianglerighteq B$ and $B \trianglerighteq A$), yet they are different, i.e., $A \neq B$. N

Clearly, a partial order is a preorder, while this example shows that the converse is false.

A.3 Equivalence relations


In analogy with how a preference relation induces an indifference relation (Section 6.8), any binary relation $R$ on $X$ induces a binary relation $I$ on $X$ by saying that $xIy$ if both $xRy$ and $yRx$. This induced relation is especially well behaved when $R$ is a preorder, as we show next.

Proposition 2072 Let $R$ be a preorder on a set $X$. The induced binary relation $I$ is reflexive, symmetric and transitive.

This result is the general abstract version of what Lemma 261 established for a preference relation.

Proof Consider $x \in X$ and $y = x$. Since $R$ is reflexive and $x = y$, we have both $xRy$ and $yRx$. So, by definition $xIx$, proving the reflexivity of $I$. Next, assume that $xIy$. By definition, we have that $xRy$ and $yRx$, which means that $yRx$ and $xRy$, yielding $yIx$ and proving symmetry. Finally, assume that $xIy$ and $yIz$. It follows that $xRy$ and $yRx$ as well as $yRz$ and $zRy$. By $xRy$ and $yRz$ and the transitivity of $R$, we conclude that $xRz$. By $yRx$ and $zRy$ and the transitivity of $R$, we conclude that $zRx$. So, we have both $xRz$ and $zRx$, yielding $xIz$ and proving the transitivity of $I$.

This result motivates the following definition.

Definition 2073 A binary relation $R$ on a set $X$ is an equivalence relation if it satisfies reflexivity, symmetry and transitivity.

The indifference relation $\sim$ is, of course, an important economic example of an equivalence relation. More generally, the induced relation $I$ is an equivalence relation by the last proposition. Equivalence relations play an important role in both mathematics and applications because they formalize a notion of similarity. Reflexivity captures the idea that an object must be similar to itself, while symmetry amounts to saying that if $x$ is similar to $y$, then $y$ is similar to $x$. As for transitivity, an analogous argument holds.

Let $R$ be an equivalence relation. Given any element $x \in X$, we write
\[
[x] = \{y \in X : xRy\}
\]
The collection $[x]$, which is nothing but the image $R(x)$ of $x$, is called the equivalence class of $x$.

Lemma 2074 If $y \in [x]$, then $[y] = [x]$.



Thus, the choice of the representative $x$ in defining the equivalence class is immaterial: any element of the equivalence class can play that role.

Proof Let $y \in [x]$. Then $[y] \subseteq [x]$. Indeed, if $y' \in [y]$, then $yRy'$ and so, being $xRy$, by transitivity $xRy'$, i.e., $y' \in [x]$. On the other hand, $y \in [x]$ implies $x \in [y]$ by symmetry. So, $[x] \subseteq [y]$. We conclude that $[y] = [x]$.

For a preference relation, the equivalence classes are the indifference classes, i.e., $[x]$ is the collection of all alternatives indifferent to $x$. Let us see another classic example.

Example 2075 The preorder $\trianglerighteq$ on $2^{\mathbb{R}}$ of Example 2071 induces the equivalence relation $\equiv$ on $2^{\mathbb{R}}$ defined by $A \equiv B$ if and only if $|A| = |B|$, i.e., if $A$ has the same cardinality as $B$. If we consider the set $\mathbb{Q}$, the equivalence class $[\mathbb{Q}]$ is the class of all sets that are countable, for example $\mathbb{N}$ and $\mathbb{Z}$. Intuitively, this binary relation declares two sets similar if they share the same number of elements. N

At this point the reader might think that all equivalence relations of interest are derived from an underlying preorder, so that they have the form $I$ and are a derived notion. This is not the case: the following classic equivalence relation has an independent interest and is not obtained via a meaningful preorder.

Example 2076 Let $n \in \mathbb{Z}$ be such that $n \ge 2$. Consider the binary relation $R$ on the set of integers $\mathbb{Z}$ such that $xRy$ if and only if $n$ divides $x - y$, that is, there exists $k \in \mathbb{Z}$ such that $x - y = kn$. Clearly, for any $x \in \mathbb{Z}$, we have $xRx$ since $x - x = kn$ with $k = 0$, so $R$ is reflexive. At the same time, if $x$ and $y$ in $\mathbb{Z}$ are such that $xRy$, then $x - y = kn$ for some $k \in \mathbb{Z}$, yielding that $y - x = (-k)n$. It follows that $yRx$, proving that $R$ is symmetric. Finally, if $x$, $y$ and $z$ in $\mathbb{Z}$ are such that $xRy$ and $yRz$, then $x - y = kn$ and $y - z = k'n$ for some $k, k' \in \mathbb{Z}$, yielding that $x - z = (k + k')n$. It follows that $xRz$, proving that $R$ is transitive. We conclude that $R$ is an equivalence relation. It is often denoted by $x \equiv y \pmod{n}$. N
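The equivalence classes of this relation are easy to display in code; here is a short Python sketch (ours) that partitions a slice of $\mathbb{Z}$ into its $n$ residue classes:

    # Equivalence classes of x ≡ y (mod n) on a slice of Z (our sketch):
    # the classes partition the slice, one class per remainder.
    n = 3
    classes = {}
    for x in range(-9, 10):
        classes.setdefault(x % n, []).append(x)

    for r in sorted(classes):
        print(r, classes[r])
    # 0 [-9, -6, -3, 0, 3, 6, 9]
    # 1 [-8, -5, -2, 1, 4, 7]
    # 2 [-7, -4, -1, 2, 5, 8]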
The next result shows that equivalence relations are closely connected to partitions of $X$, that is, to subdivisions of the set of interest $X$ into mutually exclusive classes. It generalizes the basic property that indifference curves are disjoint (Lemma 262).

Proposition 2077 If $R$ is an equivalence relation on a set $X$, the collection of its equivalence classes $\{[x] : x \in X\}$ is a partition of $X$. Vice versa, any partition $\Pi = \{A_i\}_{i \in I}$ of $X$ is the collection of equivalence classes of the equivalence relation $R$ defined by $xRy$ if there exists $A \in \Pi$ such that $x, y \in A$.

Proof Let us prove that the equivalence classes $\{[x] : x \in X\}$ are pairwise disjoint. Suppose that $[x] \cap [y] \neq \emptyset$ for some $x, y \in X$. We want to show that $[x] = [y]$. Since we can interchange the roles of $x$ and $y$, it is enough to prove that $[y] \subseteq [x]$. So, let $y' \in [y]$, that is, $yRy'$. Since $[x] \cap [y] \neq \emptyset$, there exists $z \in [x] \cap [y]$, that is, $xRz$ and $yRz$. By symmetry, $zRy$ and so, by transitivity, $zRy'$. Again by transitivity, along with $xRz$ this implies $xRy'$, that is, $y' \in [x]$. This proves the inclusion $[y] \subseteq [x]$. We leave the rest of the statement to the reader.

The collection $\{[x] : x \in X\}$ of all equivalence classes determined by an equivalence relation $R$ is called the quotient space and is denoted by $X / R$. In other words, the points of the quotient space are the equivalence classes.

Example 2078 (i) The relation "having the same age" is an equivalence relation on $C$, whose equivalence classes consist of all citizens that have the same age, that is, who belong to the same age cohort. The quotient space has, as points, the age cohorts. (ii) For the indifference relation $\sim$ on $\mathbb{R}^n_+$, the quotient space has, as points, the indifference curves. N
Appendix B

Permutations (sdoganato)

B.1 Generalities
Combinatorics is an important area of discrete mathematics, useful in many applications. Here we focus on a few combinatorial notions that are important for understanding some of the topics of the book.

We start with a simple problem. We have at our disposal three pairs of pants and five T-shirts. If there are no chromatic pairings that hurt our aesthetic sense, in how many ways can we possibly dress? The answer is simple: in $3 \cdot 5 = 15$ ways. Indeed, let us call the pairs of pants $a$, $b$, $c$ and the T-shirts 1, 2, 3, 4, 5: since the choice of a certain T-shirt does not impose any (aesthetic) restriction on the choice of the pants, the possible pairings are

a1 a2 a3 a4 a5
b1 b2 b3 b4 b5
c1 c2 c3 c4 c5

We can therefore conclude that if we have to make two independent choices, one among $n$ different alternatives and the other among $m$ different alternatives, the total number of possible choices is $n \cdot m$. In particular, suppose that $A$ and $B$ are any two sets with $n$ and $m$ elements, respectively. Their Cartesian product $A \times B$, which is the set of ordered pairs $(a, b)$ with $a \in A$ and $b \in B$, has $n \cdot m$ elements. That is:

Proposition 2079 $|A \times B| = |A| \cdot |B|$.

What has been said can easily be extended to the case of more than two choices: if we have to make multiple choices, none of which imposes restrictions on the others, the total number of possible choices is the product of the numbers of alternatives for each choice. Formally:

Proposition 2080 $|A_1 \times A_2 \times \cdots \times A_n| = |A_1| \cdot |A_2| \cdots |A_n|$.

Example 2081 (i) How many Italian licence plates are possible? They have the form AA 000 AA, with two letters, three digits and again two letters. There are 22 letters that can be used and, obviously, 10 digits. The number of (different) plates is, therefore, $22 \cdot 22 \cdot 10 \cdot 10 \cdot 10 \cdot 22 \cdot 22 = 234{,}256{,}000$. (ii) In a multiple choice test, in each question students have to select one of three possible answers. If there are 13 questions, then the overall number of possible selections is $3^{13} = 1{,}594{,}323$. (iii) A three-course meal in an American restaurant consists of an appetizer, a main course and a dessert. If a menu lists 3 appetizers, 4 main courses and 2 desserts, we can then have $3 \cdot 4 \cdot 2 = 24$ different three-course meals. N

B.2 Permutations
Intuitively, a permutation of $n$ distinct objects is a possible arrangement of these objects. For instance, with three objects $a$, $b$, $c$ there are 6 permutations:
\[
abc,\ acb,\ bac,\ bca,\ cab,\ cba \tag{B.1}
\]
We can formalize this notion through bijective functions.

Definition 2082 Let $X$ be any collection. A permutation on $X$ is a bijective function $f : X \to X$.

Permutations are thus nothing but the bijective functions $f : X \to X$. Though combinatorics typically considers finite sets $X$, the definition is fully general.

For instance, if $X = \{a, b, c\}$ the permutations $f : \{a, b, c\} \to \{a, b, c\}$ that correspond to the arrangements (B.1) are:

(i) $abc$ corresponds to the permutation $f(x) = x$ for all $x \in X$;

(ii) $acb$ corresponds to the permutation $f(a) = a$, $f(b) = c$ and $f(c) = b$;

(iii) $bac$ corresponds to the permutation $f(a) = b$, $f(b) = a$ and $f(c) = c$;

(iv) $bca$ corresponds to the permutation $f(a) = b$, $f(b) = c$ and $f(c) = a$;

(v) $cab$ corresponds to the permutation $f(a) = c$, $f(b) = a$ and $f(c) = b$;

(vi) $cba$ corresponds to the permutation $f(a) = c$, $f(b) = b$ and $f(c) = a$.
We have a rst important result.

Proposition 2083 The number of permutations on a set with n elements is n! = 1 2


n.

The number n! is called factorial of n. We set conventionally 0! = 1.


To understand, heuristically, the result consider any arrangement of the n elements.
In the rst place we can put any element, so it can occupied in n di erent ways. In the
second place we can place any of the remaining elements, so it can be occupied in n 1
di erent ways. By proceeding in this way, we see that the third position can be occupied
in n 2 di erent ways, and so on so forth, till 1 since at the end of the process we have
no choice because only one element is left. The number of the permutations is, therefore,
n (n 1) (n 2) 2 1 = n!. We let readers make this argument a rigorous proof.

Example 2084 (i) A deck of 52 cards can be reshu ed in 52! di erent ways. (ii) Six
passengers can occupy in 6! = 720 di erent ways a six-passenger car. N
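Proposition 2083 can be checked by brute force for small $n$; here is a Python sketch (ours) that enumerates the arrangements of a finite set and compares their count with $n!$:

    # A brute-force check of Proposition 2083 (our illustration).
    import math
    from itertools import permutations

    for n in range(1, 7):
        count = sum(1 for _ in permutations(range(n)))
        assert count == math.factorial(n)

    print(["".join(p) for p in permutations("abc")])
    # ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']: the 6 arrangements of (B.1)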

The recursive formula
\[
n! = n \cdot (n-1)!
\]
permits us to define the sequence of factorials $x_n = n!$ also by recurrence as $x_n = n x_{n-1}$, with first term $x_1 = 1$. The rate of growth of this sequence is impressive, as the following table shows:

n:  0  1  2  3  4   5    6    7      8       9        10
n!: 1  1  2  6  24  120  720  5,040  40,320  362,880  3,628,800

Indeed, Lemma 361 showed that $\alpha^n = o(n!)$. The already very fast exponentials are actually slower than factorials, which definitely deserve their exclamation mark.

B.3 Anagrams
We now drop the requirement that the objects be distinct and allow for repetitions. Specifically, in this section we consider $n$ objects of $h \le n$ different types, each type $i$ with multiplicity $k_i$, with $i = 1, \dots, h$ and $\sum_{i=1}^{h} k_i = n$.^{1} For instance, consider the 6 objects
\[
a, a, b, b, b, c
\]
There are 3 types $a$, $b$ and $c$, with multiplicities 2, 3 and 1, respectively. Indeed, $2 + 3 + 1 = 6$.

How many distinguishable arrangements are there? If in this example we distinguished all the objects by using a different index for the identical objects, $a_1, a_2, b_1, b_2, b_3, c$, there would be $6! = 720$ permutations. If we now remove the distinguishing index from the three letters $b$, they can be permuted in $3!$ different ways within the triple of places that they occupy. Such $3!$ different permutations (distinguishable when we write $b_1, b_2, b_3$) are no longer distinguishable (when we write $b, b, b$). Therefore, the distinguishable permutations of $a_1, a_2, b, b, b, c$ are $6!/3!$. A similar argument shows that, by removing the distinguishing index from the two letters $a$, the distinguishable permutations reduce to $6!/(3!\,2!) = 60$.

In general, one can prove the following result.

Proposition 2085 The number of distinct arrangements, called permutations with repetitions (or anagrams), is
\[
\frac{n!}{k_1!\, k_2! \cdots k_h!} \tag{B.2}
\]
The integers (B.2) are called multinomial coefficients, sometimes denoted by
\[
\binom{n}{k_1\, k_2 \dots k_h}
\]

Example 2086 (i) The possible anagrams of the word ABA are $3!/(2!\,1!) = 3$. They are ABA, AAB, BAA. (ii) The possible anagrams of the word MAMMA are $5!/(3!\,2!) = 120/(6 \cdot 2) = 10$. They are MAMMA, MAMAM, MMAMA, MMAAM, MAAMM, AMMMA, AMMAM, AAMMM, MMMAA, AMAMM. N
^{1} Note that, because of repetitions, these $n$ objects do not form a set $X$. The notion of "multiset" is sometimes used for collections in which repetitions are permitted (cf. Section 3.7).
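Counting anagrams is again easy to verify by brute force; the following Python sketch (ours) recounts the anagrams of MAMMA:

    # A brute-force check of Proposition 2085 (our illustration): the
    # distinct anagrams of MAMMA number 5!/(3! 2!) = 10.
    import math
    from itertools import permutations

    distinct = {"".join(p) for p in permutations("MAMMA")}
    print(len(distinct))   # 10
    print(math.factorial(5) // (math.factorial(3) * math.factorial(2)))   # 10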

In the important two-type case, $h = 2$, we have $k$ objects of one type and $n - k$ of the other type. By (B.2), the number of distinct arrangements is
\[
\frac{n!}{k!\,(n-k)!} \tag{B.3}
\]
This number is usually denoted by
\[
\binom{n}{k}
\]
and is called a binomial coefficient. In particular,
\[
\binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \frac{n (n-1) \cdots (n-k+1)}{k!}
\]
with
\[
\binom{n}{0} = \frac{n!}{0!\,n!} = 1
\]
The following identity is easily proved, for $0 \le k \le n$:
\[
\binom{n}{k} = \binom{n}{n-k} \tag{B.4}
\]
It captures a natural symmetry: the number of distinct arrangements remains the same, regardless of which of the two types we focus on.

Example 2087 (i) In a parking lot, spots can be either free or occupied. Suppose that 15 out of the 20 available spots are occupied. The possible arrangements of the 5 free spots (or, symmetrically, of the 15 occupied spots) are:
\[
\binom{20}{5} = \binom{20}{15} = 15{,}504
\]
(ii) We repeat an experiment 100 times: each time we record either a "success" or a "failure", so a string of 100 outcomes like $FSFF \dots S$ results. Suppose that we have recorded 92 "successes" and 8 "failures". The number of different strings that may have resulted is:
\[
\binom{100}{92} = \binom{100}{8} = 186{,}087{,}894{,}300
\]

We close with a nice and easily proved formula: for $1 \le k \le n$, we have
\[
\binom{n}{k} = \frac{n}{k} \binom{n-1}{k-1} \tag{B.5}
\]
This formula expresses a binomial coefficient via a smaller one and thus establishes a recurrence for binomial coefficients.

B.4 A set-theoretic angle


Let $A$ be a set with $n$ elements. Suppose that each element can be of two types, "in" and "out", with $k$ objects of type "in". A moment's reflection shows that a distinct arrangement corresponds to a subset of $A$ with $k$ elements. For instance, suppose that $A = \{a_1, a_2, a_3, a_4, a_5\}$. The arrangements $a_1^{\mathrm{in}} a_2^{\mathrm{in}} a_3^{\mathrm{out}} a_4^{\mathrm{out}} a_5^{\mathrm{out}}$ and $a_1^{\mathrm{in}} a_2^{\mathrm{out}} a_3^{\mathrm{in}} a_4^{\mathrm{out}} a_5^{\mathrm{out}}$ correspond to the subsets $\{a_1, a_2\}$ and $\{a_1, a_3\}$, respectively.

We thus have the following set-theoretic characterization of binomial coefficients.

Proposition 2088 Let $A$ be a set with $n$ elements. The number of subsets of $A$ that have $k$ elements is $\binom{n}{k}$.

So, the binomial coefficient $\binom{n}{k}$ gives the number of ways in which we can select $k$ different elements from a set that has $n$ elements.

Example 2089 For the set $A = \{a_1, a_2, a_3, a_4, a_5\}$, the number of subsets of $A$ that have 2 elements is
\[
\binom{5}{2} = \frac{5!}{2!\,3!} = 10
\]
Indeed, these sets are $\{a_1, a_2\}$, $\{a_1, a_3\}$, $\{a_1, a_4\}$, $\{a_1, a_5\}$, $\{a_2, a_3\}$, $\{a_2, a_4\}$, $\{a_2, a_5\}$, $\{a_3, a_4\}$, $\{a_3, a_5\}$, $\{a_4, a_5\}$. N

Example 2090 Consider two urns I and II, and 10 balls numbered from 1 to 10. If urn I can contain 3 balls, there are $\binom{10}{3}$ different ways in which the balls can fill the two urns. N

In view of the last example, we can say that the binomial coefficient $\binom{n}{k}$ is the number of ways in which $n$ numbered (so, distinguishable) balls can fill 2 urns that can contain $k$ and $n - k$ balls, respectively.

In a similar vein, rather than 2, consider $h$ different urns that can contain $k_1, k_2, \dots, k_h$ balls. If we set $n = \sum_{i=1}^{h} k_i$, the multinomial coefficient
\[
\binom{n}{k_1\, k_2 \dots k_h}
\]
is the number of ways in which $n$ numbered balls can fill these $h$ urns.

Example 2091 Let $B = \{b_1, b_2, b_3, b_4, b_5\}$ be a set of 5 numbered balls. Assume there are 3 different urns I, II and III that can contain 1 ball, 2 balls and again 2 balls, respectively. The number of ways in which these 5 numbered balls can fill urns I, II and III is
\[
\frac{5!}{1!\,2!\,2!} = 30 \tag{B.6}
\]
If we put ball $b_1$ in urn I, we have the following 6 ways to fill the urns:
\[
\underbrace{\{b_1\}}_{\mathrm{I}}\ \underbrace{\{b_2, b_3\}}_{\mathrm{II}}\ \underbrace{\{b_4, b_5\}}_{\mathrm{III}}
\qquad
\underbrace{\{b_1\}}_{\mathrm{I}}\ \underbrace{\{b_2, b_4\}}_{\mathrm{II}}\ \underbrace{\{b_3, b_5\}}_{\mathrm{III}}
\qquad
\underbrace{\{b_1\}}_{\mathrm{I}}\ \underbrace{\{b_2, b_5\}}_{\mathrm{II}}\ \underbrace{\{b_3, b_4\}}_{\mathrm{III}}
\]
\[
\underbrace{\{b_1\}}_{\mathrm{I}}\ \underbrace{\{b_4, b_5\}}_{\mathrm{II}}\ \underbrace{\{b_2, b_3\}}_{\mathrm{III}}
\qquad
\underbrace{\{b_1\}}_{\mathrm{I}}\ \underbrace{\{b_3, b_5\}}_{\mathrm{II}}\ \underbrace{\{b_2, b_4\}}_{\mathrm{III}}
\qquad
\underbrace{\{b_1\}}_{\mathrm{I}}\ \underbrace{\{b_3, b_4\}}_{\mathrm{II}}\ \underbrace{\{b_2, b_5\}}_{\mathrm{III}}
\]
For each of the 5 balls that we can select to put in urn I, we have 6 similar ways to fill the remaining urns. So, we have 30 ways to fill them in total, in accordance with the multinomial coefficient (B.6). N

B.5 Newton's binomial formula


From high school we know that
\[
(a+b)^1 = a + b \quad ; \quad (a+b)^2 = a^2 + 2ab + b^2 \quad ; \quad (a+b)^3 = a^3 + 3a^2 b + 3ab^2 + b^3
\]
More generally, one has the following result.

Theorem 2092 (Tartaglia-Newton) It holds that
\[
(a+b)^n = a^n + \binom{n}{1} a^{n-1} b + \binom{n}{2} a^{n-2} b^2 + \cdots + \binom{n}{n-1} a b^{n-1} + b^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k \tag{B.7}
\]

Proof We proceed by induction. The initial step, that is, the truth of the statement for $n = 1$, is trivially verified. Indeed:
\[
(a+b)^1 = a + b = a^1 b^0 + a^0 b^1 = \binom{1}{0} a^1 b^0 + \binom{1}{1} a^0 b^1 = \sum_{k=0}^{1} \binom{1}{k} a^{1-k} b^k
\]
We next prove the induction step. We assume the statement holds for $n$, that is,
\[
(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k
\]

and we show that it holds for $n + 1$ as well. In doing so, we will use the combinatorial identity (10.6), that is,
\[
\binom{n+1}{i} = \binom{n}{i-1} + \binom{n}{i} \qquad \forall i = 1, \dots, n
\]
Note that
\[
\begin{aligned}
(a+b)^{n+1} &= (a+b)(a+b)^n = (a+b) \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k \\
&= \sum_{k=0}^{n} \binom{n}{k} a^{n+1-k} b^k + \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^{k+1} \\
&= \sum_{i=0}^{n} \binom{n}{i} a^{n+1-i} b^i + \sum_{i=1}^{n+1} \binom{n}{i-1} a^{n+1-i} b^i \\
&= a^{n+1} + \sum_{i=1}^{n} \binom{n}{i} a^{n+1-i} b^i + \sum_{i=1}^{n} \binom{n}{i-1} a^{n+1-i} b^i + b^{n+1} \\
&= a^{n+1} + \sum_{i=1}^{n} \left[\binom{n}{i-1} + \binom{n}{i}\right] a^{n+1-i} b^i + b^{n+1} \\
&= a^{n+1} + \sum_{i=1}^{n} \binom{n+1}{i} a^{n+1-i} b^i + b^{n+1} = \sum_{i=0}^{n+1} \binom{n+1}{i} a^{n+1-i} b^i
\end{aligned}
\]
So, the statement holds for $n + 1$, thus proving the induction step and the main statement.

Formula (B.7) is called Newton's binomial formula. It motivates the name of binomial coefficients for the integers $\binom{n}{k}$. In particular,
\[
(1+x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k \tag{B.8}
\]
If we take $x = 1$ we obtain the remarkable relation
\[
\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n
\]
which can be used to prove that if a finite set has cardinality $n$, then its power set has cardinality $2^n$ (cf. Proposition 280). Indeed, by Proposition 2088 there is only one, $1 = \binom{n}{0}$, subset with 0 elements (the empty set), $n = \binom{n}{1}$ subsets with only one element, $\binom{n}{2}$ subsets with two elements, ..., and finally only one, $1 = \binom{n}{n}$, subset (the set itself) with all the $n$ elements.

More generally, one can prove the multinomial formula:
\[
(a_1 + a_2 + \cdots + a_h)^n = \sum \binom{n}{k_1\, k_2 \dots k_h} a_1^{k_1} a_2^{k_2} \cdots a_h^{k_h} = \sum \frac{n!}{k_1!\,k_2! \cdots k_h!}\, a_1^{k_1} a_2^{k_2} \cdots a_h^{k_h}
\]

where the sum is over all the choices of natural numbers

k1 ; k2 ; :::; kh
P
such that hi=1 ki = n. This formula motivates the name of multinomial coe cients for the
integers (B.2). For instance, the classic formula

(a1 + a2 + a3 )3 = a31 + a32 + a33 + 3a1 a22 + 3a1 a23 + 3a21 a2 + 3a21 a3 + 3a2 a23 + 3a22 a3 + 6a1 a2 a3

can be derived via the multinomial formula as follows:


X 3
(a1 + a2 + a3 )3 = ak1 ak2 ak3
k1 k2 k3 1 2 3
fk1 ;k2 ;k3 2N:k1 +k2 +k3 =3g
3 3 3 3
= a1 a2 a3 + a2 a23 + a22 a3 + a1 a23
111 012 021 102
3 3 3
+ a1 a22 + a2 a3 + a2 a2
120 201 1 210 1
3 3 3
+ a33 + a32 + a31
003 030 300
= 6a1 a2 a3 + 3a2 a23 + 3a22 a3 + 3a1 a23 + 3a1 a22 + 3a21 a3 + 3a21 a2
+a33 + a32 + a31
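The multinomial expansion is easy to check numerically; here is a Python sketch (ours) that compares both sides of the cubic case above for arbitrary values:

    # A numerical check of the multinomial formula for h = 3, n = 3
    # (our illustration).
    import math

    def multinomial(n, ks):
        out = math.factorial(n)
        for k in ks:
            out //= math.factorial(k)
        return out

    a1, a2, a3 = 2.0, -1.0, 0.5
    lhs = (a1 + a2 + a3) ** 3
    rhs = sum(
        multinomial(3, (k1, k2, k3)) * a1**k1 * a2**k2 * a3**k3
        for k1 in range(4)
        for k2 in range(4 - k1)
        for k3 in [3 - k1 - k2]
    )
    assert abs(lhs - rhs) < 1e-12
    print(lhs)   # 3.375 = 1.5^3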
Appendix C

Notions of trigonometry (sdoganato)

C.1 Generalities

We call the trigonometric circle the unit circle with center at the origin and radius 1, oriented counterclockwise, and on which one moves starting from the point of coordinates $(1, 0)$.^{1}

[Figure: the trigonometric circle, with center $O$ and starting point $(1, 0)$.]

Clearly, each point on the circle determines an angle between the positive horizontal axis and the straight line joining the point with the origin; vice versa, each angle determines a point on the circle. This correspondence between points and angles can, equivalently, be viewed as a correspondence between points and arcs of the circle. In the following figure the point $P$ determines the angle $\alpha$, as well as the arc $\alpha'$.

^{1} For an introduction to trigonometry we refer readers to Gelfand and Saul (2001).

[Figure: a point $P = (P_1, P_2)$ on the trigonometric circle, the angle $\alpha$ it determines and the corresponding arc $\alpha'$.]

Angles are usually measured in either degrees or radians. A degree is the 360th part of a round angle (corresponding to a complete round of the circle); a radian is an, apparently strange, unit of measure that assigns measure $2\pi$ to a round angle, of which it is therefore the $2\pi$-th part. We will use the radian as the unit of measure of angles because it presents some advantages over the degree. In any case, the next table lists some equivalent values of degrees and radians.

degrees: 0  30   45   60   90   180  270   360
radians: 0  π/6  π/4  π/3  π/2  π    3π/2  2π

Angles that differ by one or more complete rounds of the circle are identical: to write $\alpha$ or $\alpha + 2k\pi$, with $k \in \mathbb{Z}$, is the same. We will therefore always take $0 \le \alpha < 2\pi$.

Fix a point $P = (P_1, P_2)$ on the trigonometric circle, as in the previous figure. The sine of the angle $\alpha$ determined by the point $P$ is the ordinate $P_2$ of that point, while the cosine of $\alpha$ is the abscissa $P_1$.

The sine and the cosine of the angle $\alpha$ are denoted, respectively, by $\sin \alpha$ and $\cos \alpha$. The sine is positive in quadrants I and II, and negative in quadrants III and IV. The cosine is positive in quadrants I and IV, and negative in quadrants II and III. For example,

α:      0  π/4   π/2  π   3π/2  2π
sin α:  0  √2/2  1    0   -1    0
cos α:  1  √2/2  0    -1  0     1

In view of the previous discussion, for every $k \in \mathbb{Z}$ we have
\[
\sin(\alpha + 2k\pi) = \sin \alpha \qquad \text{and} \qquad \cos(\alpha + 2k\pi) = \cos \alpha \tag{C.1}
\]
Note that Pythagoras' Theorem guarantees that, for every $\alpha \in \mathbb{R}$,
\[
\sin^2 \alpha + \cos^2 \alpha = 1 \tag{C.2}
\]



This classic identity is sometimes called the Pythagorean trigonometric identity.

Having fixed again a point $P$ on the circle, we call the tangent of the angle $\alpha$ determined by $P$, written $\tan \alpha$, the ratio between its ordinate and its abscissa, i.e.,
\[
\tan \alpha = \frac{\sin \alpha}{\cos \alpha}
\]
The tangent is positive in quadrants I and III, and negative in quadrants II and IV. For example,

α:      0  π/4  π/2  π  3π/2  2π
tan α:  0  1    →∞   0  →∞    0

Again, for every $k \in \mathbb{Z}$,
\[
\tan(\alpha + k\pi) = \tan \alpha \tag{C.3}
\]
Since $\tan \alpha = \sin \alpha / \cos \alpha$, from the Pythagorean trigonometric identity it follows that
\[
\sin^2 \alpha = \frac{\tan^2 \alpha}{1 + \tan^2 \alpha}
\]
Finally, the reciprocals of cosine, sine and tangent are called secant, cosecant and cotangent, respectively.

C.2 Concerto d'archi (string concert)


We list, just for sine and cosine, some simple relations between angles (arcs).

(i) Angles $\alpha$ and $-\alpha$:
\[
\sin(-\alpha) = -\sin \alpha \quad ; \quad \cos(-\alpha) = \cos \alpha
\]

(ii) Angles $\alpha$ and $\dfrac{\pi}{2} - \alpha$:
\[
\sin\left(\frac{\pi}{2} - \alpha\right) = \cos \alpha \quad ; \quad \cos\left(\frac{\pi}{2} - \alpha\right) = \sin \alpha
\]

(iii) Angles $\alpha$ and $\dfrac{\pi}{2} + \alpha$:
\[
\sin\left(\frac{\pi}{2} + \alpha\right) = \cos \alpha \quad ; \quad \cos\left(\frac{\pi}{2} + \alpha\right) = -\sin \alpha
\]

(iv) Angles $\alpha$ and $\pi - \alpha$:
\[
\sin(\pi - \alpha) = \sin \alpha \quad ; \quad \cos(\pi - \alpha) = -\cos \alpha
\]

(v) Angles $\alpha$ and $\pi + \alpha$:
\[
\sin(\pi + \alpha) = -\sin \alpha \quad ; \quad \cos(\pi + \alpha) = -\cos \alpha
\]

Next we list some formulas that we do not prove (in any case, it would be enough to prove the first two because the others are simple consequences).

Addition and subtraction formulas:
\[
\sin(\alpha + \beta) = \sin \alpha \cos \beta + \sin \beta \cos \alpha \quad ; \quad \cos(\alpha + \beta) = \cos \alpha \cos \beta - \sin \alpha \sin \beta
\]
and
\[
\sin(\alpha - \beta) = \sin \alpha \cos \beta - \sin \beta \cos \alpha \quad ; \quad \cos(\alpha - \beta) = \cos \alpha \cos \beta + \sin \alpha \sin \beta \tag{C.4}
\]
Doubling and bisection formulas:
\[
\sin 2\alpha = 2 \sin \alpha \cos \alpha \quad ; \quad \cos 2\alpha = \cos^2 \alpha - \sin^2 \alpha
\]
and
\[
\sin \alpha = \sqrt{\frac{1 - \cos 2\alpha}{2}} \quad ; \quad \cos \alpha = \sqrt{\frac{1 + \cos 2\alpha}{2}}
\]
Prostaphaeresis formulas (addition and subtraction):
\[
\sin(\alpha + \beta) + \sin(\alpha - \beta) = 2 \sin \alpha \cos \beta \quad ; \quad \sin(\alpha + \beta) - \sin(\alpha - \beta) = 2 \cos \alpha \sin \beta
\]
and
\[
\cos(\alpha + \beta) + \cos(\alpha - \beta) = 2 \cos \alpha \cos \beta \quad ; \quad \cos(\alpha + \beta) - \cos(\alpha - \beta) = -2 \sin \alpha \sin \beta
\]

We close with a few classic theorems that show how trigonometry is intimately linked to the study of triangles. In these theorems $a$, $b$, $c$ denote the lengths of the three sides of a triangle and $\alpha$, $\beta$, $\gamma$ the angles opposite to them.

Theorem 2093 (Law of Sines) Sides are proportional to the sines of their opposite angles, that is,
\[
\frac{a}{\sin \alpha} = \frac{b}{\sin \beta} = \frac{c}{\sin \gamma}
\]

An interesting consequence of the law of sines is that the area of a triangle can be expressed in trigonometric form via the lengths of two sides and the angle opposite to the third side. Specifically, if the two sides are $b$ and $c$, the area is
\[
\frac{1}{2}\, bc \sin \alpha \tag{C.5}
\]
Indeed, draw in the last figure a perpendicular from the top vertex to the side of length $c$, and denote its length by $h$. From, at least, high school we know that the area of the triangle is $ch/2$ (it is the classic "half the base times the height" formula). Consider the right triangle that has the side of length $b$ as hypotenuse and the perpendicular of length $h$ as a cathetus. By the law of sines,
\[
\frac{h}{\sin \alpha} = \frac{b}{\sin \frac{\pi}{2}}
\]
So, $h = b \sin \alpha$. From the high school formula $ch/2$ the trigonometric formula (C.5) then follows.

Example 2094 Some important geometric figures in the plane can be subdivided in triangles, so their area can be recovered by adding up the areas of such triangles. For instance, consider a regular polygon with n sides of equal length and n central angles of equal measure 2π/n. For example, in the following figure we have a hexagon with six sides of equal length and six central angles of equal measure π/3 (i.e., 60 degrees):

Denote by r the radius of this regular polygon. The regular polygon is partitioned in n identical isosceles triangles with two sides of equal length r. For instance, in the hexagon there are six such triangles. By formula (C.5), the area of each of these identical isosceles triangles is (1/2) r² sin(2π/n), so the area of the polygon is

(n/2) r² sin(2π/n)     (C.6)

For example, the area of the hexagon is 3r²√3/2 since sin(π/3) = √3/2.
The subdivision of geometric figures of the plane in triangles is called triangulation, an important technique that may permit to reduce the study of geometric figures to that of triangles (by taking limits via arbitrarily small triangles, the technique becomes especially powerful). N
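A quick numeric check of formula (C.6), sketched by us in Python (the function name is ours):

    import math

    def regular_polygon_area(n, r):
        # formula (C.6): n identical isosceles triangles, each of area (1/2) r^2 sin(2*pi/n)
        return (n / 2) * r ** 2 * math.sin(2 * math.pi / n)

    print(regular_polygon_area(6, 1.0))  # hexagon of radius 1
    print(3 * math.sqrt(3) / 2)          # 3 r^2 sqrt(3)/2 with r = 1: same value, 2.598...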

Example 2095 The famous number π can be defined as the area of the closed unit ball:

[Figure: the closed unit ball of the plane, centered at the origin O of the (x1, x2) axes]

To compute π amounts to compute this area, a problem that Archimedes famously approached via the method of exhaustion.² This method considers the areas of inscribed and circumscribed polygons, which provide lower and upper approximations for π, respectively. Indeed, the area of any inscribed polygon is always ≤ π, while the area of any circumscribed polygon is always ≥ π. For instance, consider a regular polygon inscribed in the closed unit ball, like the hexagon:

By increasing the number of sides, we get larger and larger inscribed regular polygons that provide better and better lower approximations of π. The area of each such polygon is given by formula (C.6). Since their radius r is 1, we thus have the lower approximations

(n/2) sin(2π/n) ≤ π     ∀n ≥ 1

² A remarkable early estimate of π can be found in the Bible. In the first Book of Kings (written around the VI or V century B.C.), one reads that Solomon made a wash basin for ablution with a diameter of 10 cubits and a circumference of 30 cubits. As a result, here π = 3.

that are better and better as n increases. At the limit, we have:

lim_{n→∞} (n/2) sin(2π/n) = π

Indeed, by setting x = 2π/n we have

lim_{n→∞} (n/2) sin(2π/n) = lim_{n→∞} sin(2π/n)/(2/n) = lim_{x→0} π (sin x)/x = π

Similarly, by increasing the number of sides we get smaller and smaller circumscribed regular polygons that provide better and better upper approximations of π. The radius r of the circumscribed regular polygon with n sides is the length of the equal sides of the isosceles triangles in which it can be partitioned. So, r = 1/cos(π/n) > 1, as the reader can check with the help of the next figure:

By formula (C.6), we thus have the upper approximations

(n/2) (1/cos²(π/n)) sin(2π/n) ≥ π     ∀n ≥ 1

that are better and better as n increases. At the limit, by setting again x = 2π/n we have:

lim_{n→∞} (n/2) (1/cos²(π/n)) sin(2π/n) = lim_{x→0} (1/cos²(x/2)) π (sin x)/x = π

Summing up,

(n/2) sin(2π/n) ≤ π ≤ (n/2) (1/cos²(π/n)) sin(2π/n)     ∀n ≥ 1     (C.7)

Via a trigonometric argument, we thus showed that the areas of the inscribed and circumscribed regular polygons provide lower and upper approximations of π that, as the number of sides increases, better and better sandwich π till, in the limit of "infinitely many sides", they reach π as their common limit value.³

³ The role of π in the approximations is to identify radians, so the actual knowledge of π is not needed (thus, there is no circularity in using these approximations for π).

The trigonometric approximations (C.7) thus justify the use of the method of exhaustion to compute π. Archimedes was able to compute the areas of the inscribed and circumscribed regular polygons till n = 96, getting the remarkable approximation

3.1408 ≈ 3 + 10/71 ≤ π ≤ 3 + 1/7 ≈ 3.1429

By computing the areas of the inscribed and circumscribed regular polygons for larger and larger n, we get better and better approximations of π. N
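The sandwich (C.7) can be tabulated with a few lines of code; here is a minimal Python sketch of ours (the use of math.pi only serves to measure angles in radians, in line with the previous footnote):

    import math

    def inscribed_area(n):
        # lower bound in (C.7): area of the inscribed regular n-gon, formula (C.6) with r = 1
        return (n / 2) * math.sin(2 * math.pi / n)

    def circumscribed_area(n):
        # upper bound in (C.7): the circumscribed n-gon has radius 1 / cos(pi / n)
        return (n / 2) * math.sin(2 * math.pi / n) / math.cos(math.pi / n) ** 2

    for n in (6, 12, 24, 48, 96):
        print(n, inscribed_area(n), circumscribed_area(n))
    # for n = 96 the area bounds are roughly 3.1394 <= pi <= 3.1427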

We close with a result that generalizes Pythagoras' Theorem, which is the special case when the triangle is right and side a is the hypotenuse (indeed, cos α = cos(π/2) = 0).

Theorem 2096 (Carnot) We have a² = b² + c² − 2bc cos α.

C.3 Perpendicularity

The trigonometric circle consists of the points x ∈ R² of unit norm, that is, ‖x‖ = 1. Hence, any point x = (x1, x2) ∈ R² can be moved back on the unit circle by dividing it by its norm ‖x‖ since

‖x/‖x‖‖ = 1

The following picture illustrates:

It follows that

sin θ = x2/‖x‖  and  cos θ = x1/‖x‖     (C.8)

that is,

x = (‖x‖ cos θ, ‖x‖ sin θ)

This trigonometric representation of the vector x is called polar. The components ‖x‖ cos θ and ‖x‖ sin θ are called polar coordinates.

The angle θ can be expressed through the inverse trigonometric functions arcsin x, arccos x and arctan x. To this end, observe that

tan θ = sin θ / cos θ = (x2/‖x‖) / (x1/‖x‖) = x2/x1

Together with (C.8), this implies that

θ = arctan(x2/x1) = arccos(x1/‖x‖) = arcsin(x2/‖x‖)

The equality θ = arctan(x2/x1) is especially important because it permits to express the angle θ as a function of the coordinates of the point x = (x1, x2).

Let x and y be two vectors in the plane R² that determine the angles θ and φ:

By (C.4), we have

x · y = (‖x‖ cos θ, ‖x‖ sin θ) · (‖y‖ cos φ, ‖y‖ sin φ)
      = ‖x‖‖y‖ (cos θ cos φ + sin θ sin φ) = ‖x‖‖y‖ cos(θ − φ)

that is,

(x · y)/(‖x‖‖y‖) = cos(θ − φ)

where θ − φ is the angle that is the difference of the angles determined by the two points.

This angle is a right one, i.e., the vectors x and y are "perpendicular", when

(x · y)/(‖x‖‖y‖) = cos(π/2) = 0

that is, if and only if x · y = 0. In other words, two vectors in the plane R² are perpendicular when their inner product is zero.
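As a small computational illustration (ours, not in the original text), polar coordinates and the perpendicularity test read as follows; math.atan2 is a two-argument variant of arctan(x2/x1) that also handles the case x1 = 0:

    import math

    def polar(x1, x2):
        # polar representation (C.8): returns the norm and the angle theta of (x1, x2)
        return math.hypot(x1, x2), math.atan2(x2, x1)

    def perpendicular(x, y, tol=1e-12):
        # two plane vectors are perpendicular iff their inner product is zero
        return abs(x[0] * y[0] + x[1] * y[1]) < tol

    print(polar(1.0, 1.0))                 # (1.414..., 0.785...), i.e., (sqrt(2), pi/4)
    print(perpendicular((1, 2), (-2, 1)))  # True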
Appendix D

Elements of intuitive logic

In this chapter we will introduce some basic notions of logic. Though, "logically", these notions should actually be introduced at the beginning of a textbook, they can be best appreciated after having learned some mathematics (even if in a logically disordered way). This is why this chapter is an Appendix, leaving to the reader to judge when it is best to read it.

D.1 Propositions

We call proposition a statement that can be either true or false. For example, "ravens are black" and "in the year 1965 it rained in Milan" are propositions. On the contrary, the statement "in the year 1965 it has been cold in Milan" is not a proposition, unless we specify the meaning of cold, for example with the proposition "in the year 1965 the temperature went below zero in Milan".
We will denote propositions by letters such as p, q, .... Moreover, we will denote with 1 and 0, respectively, the truth or the falsity of a proposition: these are called truth values. Thus, a true proposition has truth value 1, while a false proposition has truth value 0.

D.2 Operations

Let us introduce some operations on propositions.

(i) Negation. Let p be a proposition; the negation, denoted by ¬p, is the proposition that is true when p is false and that is false when p is true. We can summarize the definition with the following truth table

p  ¬p
1  0
0  1

which reports the truth values of p and ¬p. For instance, if p is "in the year 1965 it rained in Milan", then ¬p is "in the year 1965 it did not rain in Milan".

(ii) Conjunction. Let p and q be two propositions; the conjunction of p and q, denoted by p ∧ q, is the proposition that is true when p and q are both true and is false when at least one of the two is false. The truth table is:

p  q  p ∧ q
1  1  1
1  0  0
0  1  0
0  0  0

For instance, if p is "in the year 1965 it rained in Milan" and q is "in the year 1965 the temperature went below zero in Milan", then p ∧ q is "in the year 1965 it rained in Milan and the temperature went below zero".

(iii) Disjunction. Let p and q be two propositions; the disjunction of p and q, denoted by p ∨ q, is the proposition that is true when at least one between p and q is true and is false when both of them are false.¹ The truth table is:

p  q  p ∨ q
1  1  1
1  0  1
0  1  1
0  0  0

For instance, with the previous examples of p and q, then p ∨ q is "in the year 1965 it rained in Milan or the temperature went below zero".

(iv) Conditional. Let p and q be two propositions; the conditional, denoted by p ⇒ q, is the proposition with truth table:

p  q  p ⇒ q
1  1  1
1  0  0
0  1  1
0  0  1     (D.1)

The conditional is therefore true if, when p is true, also q is true, or if p is false (in which case the truth value of q is irrelevant). The proposition p is called the antecedent and q is the consequent. For instance, suppose the antecedent p is "I go on vacation" and the consequent q is "I go to the sea"; the conditional p ⇒ q is "If I go on vacation, then I go to the sea".

(v) Biconditional. Let p and q be two propositions; the biconditional, denoted by p ⇔ q, is the proposition (p ⇒ q) ∧ (q ⇒ p) that involves the implication p ⇒ q and its converse q ⇒ p, with truth table:

p  q  p ⇒ q  q ⇒ p  p ⇔ q
1  1  1      1      1
1  0  0      1      0
0  1  1      0      0
0  0  1      1      1

The biconditional is, therefore, true when the two involved implications are either both true or both false. With the last example of p and q, the biconditional p ⇔ q is "I go on vacation if and only if I go to the sea".

¹ Like the union symbol ∪, also the disjunction symbol ∨ reminds of the Latin "vel", an inclusive "or", as opposed to the exclusive "aut".

These five logical operations allow us to build new propositions from old ones. Starting from the three propositions p, q and r, through negation, disjunction and conditional we can build, for example, the proposition

¬((p ∨ ¬q) ⇒ r)     (D.2)

Its truth table is:

p  q  r  ¬q  p ∨ ¬q  (p ∨ ¬q) ⇒ r  ¬((p ∨ ¬q) ⇒ r)
1  1  1  0   1       1             0
0  1  1  0   0       1             0
1  0  1  1   1       1             0
0  0  1  1   1       1             0
1  1  0  0   1       0             1
0  1  0  0   0       1             0
1  0  0  1   1       0             1
0  0  0  1   1       0             1
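Truth tables like the one above can be generated mechanically. A minimal Python sketch of ours, encoding truth values as 1 and 0 as in the text (the use of max and 1 − p anticipates Lemma 2107 below):

    from itertools import product

    NOT = lambda p: 1 - p
    OR = lambda p, q: max(p, q)
    IMP = lambda p, q: 1 if p <= q else 0  # the conditional, truth table (D.1)

    def truth_table(f, n):
        # print the value of the n-ary compound proposition f on every assignment
        for row in product([1, 0], repeat=n):
            print(*row, f(*row))

    # the proposition (D.2): not((p or not q) => r)
    truth_table(lambda p, q, r: NOT(IMP(OR(p, NOT(q)), r)), 3)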

For example, in your local newspaper you may read that "if this winter will be colder or more rainy than last winter, more people will get a flu". Let us rewrite this sentence in a less catchy but more accurate form, amenable to a logical analysis: "if (in our city) in this winter the daily average temperature will be lower than last winter or the daily average rainfall will be higher, then doctors will diagnose a higher number of flu cases". This pedantic rewriting shows that the newspaper claim corresponds to the following proposition

(p ∨ ¬q) ⇒ r

where p is the proposition "in this winter the daily average temperature will be lower than last year", q is the proposition "in this winter the daily average rainfall will not be higher than last year" and r is the proposition "in this winter doctors will diagnose a higher number of flu cases".
What about the negation (D.2)? It corresponds to a rival local newspaper that tells its readers "Forget about the other guys, just think of the opposite of what they say".

O.R. The true-false dichotomy originates in the Eleatic school, which based its dialectics upon it (Section 1.8). Apparently, it first appears as "[a thing] is or it is not" in the poem of Parmenides (trans. Raven). A serious challenge to the universal validity of the true-false dichotomy has been posed by some, old and new, paradoxes. We already encountered the set-theoretic paradox of Russell (Section 1.1.4). A simpler, much older, paradox is that of the liar: consider the self-referential proposition "this proposition is false". Is it true or false? Maybe it is both.² Be that as it may, in many matters – in mathematics, let alone in the empirical sciences – the dichotomy can be safely assumed.

D.3 Logical equivalence

Two classes of propositions are central, contradictions and tautologies. A proposition is called contradiction if it is always false, while it is called tautology if it is always true. Obviously, contradictions and tautologies have, respectively, truth tables with only values 0 and only values 1. For this reason, we write p ≡ 0 if p is a contradiction and p ≡ 1 if p is a tautology. In other words, the symbol 0 denotes a generic contradiction and the symbol 1 a generic tautology.
For instance, the proposition p ⇒ (q ⇒ p) is a tautology, as its truth table shows:

p  q  q ⇒ p  p ⇒ (q ⇒ p)
1  1  1      1
1  0  1      1
0  1  0      1
0  0  1      1

In symbols, p ⇒ (q ⇒ p) ≡ 1.

Two propositions p and q are said to be (logically) equivalent, written p ≡ q, when they have the same truth values, i.e., they are always both true or both false. In other words, two propositions p and q are equivalent when the co-implication p ⇔ q is a tautology, i.e., it is always true. The relation ≡ is called logical equivalence.
The following properties are evident:

(i) p ∧ p ≡ p and p ∨ p ≡ p (idempotence);

(ii) ¬(¬p) ≡ p (double negation);

(iii) p ∧ q ≡ q ∧ p and p ∨ q ≡ q ∨ p (commutativity);

(iv) (p ∧ q) ∧ r ≡ p ∧ (q ∧ r) and (p ∨ q) ∨ r ≡ p ∨ (q ∨ r) (associativity).

Moreover, one has that:

(v) p ∧ ¬p ≡ 0 (law of non-contradiction);

(vi) p ∨ ¬p ≡ 1 (law of excluded middle).

² A proposition such that both it and its negation are true has been called dialetheia.

In words, proposition p ∧ ¬p is a contradiction: a proposition and its negation cannot be both true. In contrast, proposition p ∨ ¬p is a tautology: a proposition is either true or false, tertium non datur. Indeed:

p  ¬p  p ∧ ¬p  p ∨ ¬p
1  0   0       1
0  1   0       1

If p is the proposition "all ravens are black", the contradiction p ∧ ¬p is "all ravens are both black and non-black" and the tautology p ∨ ¬p is "all ravens are either black or non-black".

The de Morgan's laws are:

¬(p ∧ q) ≡ ¬p ∨ ¬q  and  ¬(p ∨ q) ≡ ¬p ∧ ¬q

They can be proved through the truth tables; we confine ourselves to the first law:

p  q  p ∧ q  ¬(p ∧ q)  ¬p  ¬q  ¬p ∨ ¬q
1  1  1      0         0   0   0
1  0  0      1         0   1   1
0  1  0      1         1   0   1
0  0  0      1         1   1   1

The table shows that the truth values of ¬(p ∧ q) and of ¬p ∨ ¬q are identical, as claimed. Note an interesting duality: the laws of non-contradiction and of the excluded middle can be derived one from the other via de Morgan's laws.
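Such equivalences can also be checked exhaustively by a machine; a small Python sketch of ours in the same spirit:

    from itertools import product

    AND = lambda p, q: min(p, q)
    OR = lambda p, q: max(p, q)
    NOT = lambda p: 1 - p

    def equivalent(f, g, n):
        # two propositions are equivalent when they take the same truth value on every assignment
        return all(f(*row) == g(*row) for row in product([1, 0], repeat=n))

    # first de Morgan law: not(p and q) is equivalent to (not p) or (not q)
    print(equivalent(lambda p, q: NOT(AND(p, q)),
                     lambda p, q: OR(NOT(p), NOT(q)), 2))  # True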

It is easily seen that p ⇒ q is equivalent to ¬q ⇒ ¬p, that is,

(p ⇒ q) ≡ (¬q ⇒ ¬p)     (D.3)

Indeed:

p  q  p ⇒ q  ¬p  ¬q  ¬q ⇒ ¬p
1  1  1      0   0   1
1  0  0      0   1   0
0  1  1      1   0   1
0  0  1      1   1   1

The proposition ¬q ⇒ ¬p is called the contrapositive of p ⇒ q. Each conditional is, therefore, equivalent to its contrapositive. For instance, the conditional "If I go on vacation, then I go to the sea" is equivalent to its contrapositive "If I do not go to the sea, then I do not go on vacation".

A remarkable equivalence for the conditional is

(p ⇒ q) ≡ ¬p ∨ q     (D.4)

That is, the conditional p ⇒ q is equivalent to the disjunction of q and the negation of p. Indeed:

p  q  p ⇒ q  ¬p  ¬p ∨ q
1  1  1      0   1
1  0  0      0   0
0  1  1      1   1
0  0  1      1   1

For instance, the proposition "If I go on vacation, then I go to the sea" is equivalent to the proposition "I do not go on vacation or I go to the sea".
Similarly, for biconditionals we have the equivalence

(p ⇔ q) ≡ (¬p ∨ q) ∧ (p ∨ ¬q)

We conclude that conditionals and biconditionals can be expressed in terms of disjunctions, conjunctions and negations.

Finally, note that by the de Morgan laws (D.4) implies the equivalence

¬(p ⇒ q) ≡ p ∧ ¬q     (D.5)

That is, the negation of a conditional p ⇒ q is equivalent to the conjunction between p and the negation of q, something that can be also easily checked directly:

p  q  p ⇒ q  ¬(p ⇒ q)  p ∧ ¬q
1  1  1      0         0
1  0  0      1         1
0  1  1      0         0
0  0  1      0         0

For instance, the proposition "If I go on vacation, then I go to the sea" is false (true) if and only if the proposition "I go on vacation and I do not go to the sea" is true (false). Indeed, what about a mountain vacation?

N.B. Given two equivalent propositions, one of them is a tautology if and only if the other
one is so. O

D.4 Deduction

D.4.1 Logical consequences

An equivalence is a biconditional which is a tautology, i.e., which is always true. In a similar vein, we call implication a conditional which is a tautology, that is, (p ⇒ q) ≡ 1. In this case, if p is true then also q is true.³ We say that q is a logical consequence of p, written

p ⊨ q

The antecedent p is now called hypothesis and the consequent q thesis. Naturally, we have p ≡ q when simultaneously p ⊨ q and q ⊨ p.

Two classic implications, easily checked via truth tables, are:

(i) (p ⇒ q) ∧ p ⊨ q (modus ponens);

(ii) (p ⇒ q) ∧ ¬q ⊨ ¬p (modus tollens).

³ When p is false the implication is automatically true, as the truth table (D.1) shows. Ex falso sequitur quodlibet.

In words, modus ponens says that if the conditional and the antecedent are both true, then the consequent is true. Modus tollens, instead, says that if the conditional is true and the consequent is false, then the antecedent is false. Thus, modus ponens is about the status of the consequent, while modus tollens is about the status of the antecedent.
We check only modus ponens:

p  q  p ⇒ q  (p ⇒ q) ∧ p  ((p ⇒ q) ∧ p) ⇒ q
1  1  1      1            1
1  0  0      0            1
0  1  1      0            1
0  0  1      0            1
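In code, an implication is a conditional that evaluates to 1 on every assignment; a sketch of ours checking modus ponens and modus tollens:

    from itertools import product

    AND = lambda p, q: min(p, q)
    NOT = lambda p: 1 - p
    IMP = lambda p, q: 1 if p <= q else 0

    def tautology(f, n):
        # f is a tautology when its truth value is 1 on every assignment
        return all(f(*row) == 1 for row in product([1, 0], repeat=n))

    print(tautology(lambda p, q: IMP(AND(IMP(p, q), p), q), 2))            # modus ponens: True
    print(tautology(lambda p, q: IMP(AND(IMP(p, q), NOT(q)), NOT(p)), 2))  # modus tollens: True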

For instance, as before suppose p is "I go on vacation" and q is "I go to the sea", so that p ⇒ q is "If I go on vacation, then I go to the sea". Modus ponens ensures that if it is true both that "if I go on vacation, then I go to the sea" and that "I go on vacation", then it is also true that "I go to the sea". Modus tollens, on the other hand, guarantees that if it is true that "if I go on vacation, then I go to the sea" and false that "I go to the sea", then it is false that "I go on vacation".

A final, easily checked, classic implication is:

(iii) (p ⇒ q) ∧ (q ⇒ r) ⊨ (p ⇒ r) (hypothetical syllogism).

The transitive essence of this implication will be soon clarified by Proposition 2098. Let p and q be as before and r be the proposition "I swim". The hypothetical syllogism ensures that if it is true that "if I go on vacation, then I go to the sea" and that "if I go to the sea, then I swim", then it is also true that "if I go on vacation, then I swim".

D.4.2 Theorems and proofs

In our naive setup, a theorem is a proposition of the form p ⊨ q, that is, an implication. The proof (or demonstration) is a logical argument that proves that the conditional p ⇒ q is actually an implication.⁴ To do this it is necessary to establish that, if the hypothesis p is true, then also the thesis q is true. Usually we choose one among the following three different types of proof:

(a) direct proof: p ⊨ q, i.e., we establish directly that, if p is true, also q is so;

(b) proof by contraposition: ¬q ⊨ ¬p, i.e., we establish that the contrapositive ¬q ⇒ ¬p is a tautology (i.e., that if q is false, so is p);

(c) proof by contradiction (reductio ad absurdum): p ∧ ¬q ⊨ r ∧ ¬r, i.e., we establish that the conditional p ∧ ¬q ⇒ r ∧ ¬r is a tautology (i.e., that, if p is true and q is false, we reach a contradiction r ∧ ¬r).

The proof by contraposition relies on the equivalence (D.3) and is, basically, an upside down direct proof (for instance, Theorem 2103 will be proved by contraposition). For this reason in what follows we will focus on the two main types of proofs, direct and by contradiction.

⁴ In these introductory notes we remain vague about what a "logical argument" is, leaving a more detailed analysis to more advanced courses. We expect, however, that readers can (intuitively) recognize, and elaborate, such arguments.

N.B. (i) When both p ⊨ q and q ⊨ p hold, the theorem takes the equivalence form p ≡ q. The implications p ⊨ q and q ⊨ p are independent and each of them requires its own proof (this is why in the book we studied separately the "if" and the "only if").
(ii) When, as it is often the case, the hypothesis is the conjunction of several propositions, we write

p1 ∧ ⋯ ∧ pn ⊨ q     (D.6)

So, the scope of the implication p ⊨ q is broader than it may appear prima facie. O

D.4.3 Direct proofs

Sometimes p ⊨ q can be proved with a direct argument.

Theorem 2097 If n is odd, then n² is odd.

Proof Since n is odd, there is a natural number k such that n = 2k + 1. Then, n² = (2k + 1)² = 2(2k² + 2k) + 1, so n² is odd.

Direct proofs are, however, often articulated in several steps, in a divide et impera spirit. In this regard, the next result is key.

Proposition 2098 ⊨ is transitive.

Proof Assume p ⊨ r and r ⊨ q. We have to show that p ⇒ q is a tautology, that is, that if p is true, then q is true. Assume that p is true. Then, r is true because p ⊨ r. In turn, this implies that q is true because r ⊨ q.

By iterating transitivity, we then get the following deduction scheme: p ⊨ q if

p ⊨ r1
r1 ⊨ r2
⋮
rn ⊨ q     (D.7)

The n auxiliary propositions ri break up the direct argument in n steps, thus forming a chain of reasoning. We can write the scheme horizontally as:

p ⊨ r1 ⊨ r2 ⊨ ⋯ ⊨ rn ⊨ q

Example 2099 (i) Assume that p is "n² + 1 is odd" and q is "n is even". To prove p ⊨ q, let us consider the auxiliary proposition r given by "n² is even". The implication p ⊨ r is obvious, while the implication r ⊨ q will be proved momentarily (Theorem 2102). Jointly, these two implications provide a direct proof p ⊨ r ⊨ q of p ⊨ q, that is, of the proposition "if n² + 1 is odd, then n is even".
(ii) Assume that p is "the scalar function f is differentiable" and q is "the scalar function f is integrable". To prove p ⊨ q it is natural to consider the auxiliary proposition r given by "the scalar function f is continuous". The implications p ⊨ r and r ⊨ q are basic calculus results that, jointly, provide a direct proof p ⊨ r ⊨ q of p ⊨ q, that is, of the proposition "if the scalar function f is differentiable, then it is integrable". N

When p ≡ p1 ∨ ⋯ ∨ pn, we have the (easily checked) equivalence

(p1 ∨ ⋯ ∨ pn) ⇒ q ≡ (p1 ⇒ q) ∧ ⋯ ∧ (pn ⇒ q)

Consequently, to establish pi ⊨ q for each i = 1, ..., n amounts to establish p ⊨ q. This is the so-called proof by cases, where each pi ⊨ q is a case. Needless to say, the proof of each case may require its own deduction scheme (D.7).

Theorem 2100 If n is any natural number, then n² + n is even.

Proof Assume that p is "n is any natural number", p1 is "n is an odd number", p2 is "n is an even number" and q is "n² + n is even". Since p ≡ p1 ∨ p2, we prove the two cases p1 ⊨ q and p2 ⊨ q.

Case 1: p1 ⊨ q. We have n = 2k + 1 for some natural number k, so n² + n = (2k + 1)² + 2k + 1 = 2(2k² + 3k + 1), which is even.

Case 2: p2 ⊨ q. We have n = 2k for some natural number k, so n² + n = (2k)² + 2k = 2(2k² + k), which is even.

D.4.4 Reductio ad absurdum

To understand the rationale of the proof by contradiction, note that the truth table

p  q  p ∧ ¬q  r ∧ ¬r  p ⇒ q  p ∧ ¬q ⇒ r ∧ ¬r
1  1  0       0       1      1
1  0  1       0       0      0
0  1  0       0       1      1
0  0  0       0       1      1

proves the logical equivalence

(p ⇒ q) ≡ (p ∧ ¬q ⇒ r ∧ ¬r)     (D.8)

Hence, p ⇒ q is true if and only if p ∧ ¬q ⇒ r ∧ ¬r is true. Consequently, to establish p ∧ ¬q ⊨ r ∧ ¬r amounts to establish p ⊨ q.

It does not matter what the proposition r is because, in any case, r ∧ ¬r is a contradiction. In a more compact way, we can indeed rewrite the last equivalence as

(p ⇒ q) ≡ (p ∧ ¬q ⇒ 0)

The proof by contradiction is, intellectually, the most intriguing – recall Section 1.8 on the birth of the deductive method. We illustrate it with one of the gems of Greek mathematics that we saw in the first chapter. For brevity, we do not repeat the proof of the first chapter and just present its logical analysis.

Theorem 2101 √2 ∉ Q.

Logical analysis In this, as in other theorems, it might seem that there is no hypothesis, but it is not so: simply, the hypothesis is concealed. For example, here the concealed hypothesis is "the axioms of arithmetic, in particular those about arithmetical operations, hold". Let a be this concealed hypothesis,⁵ let q be the thesis "√2 ∉ Q" and let r be the proposition "m/n is reduced to its lowest terms". The scheme of the proof is a ∧ ¬q ⊨ r ∧ ¬r, i.e., if arithmetical operations apply, the negation of the thesis leads to a contradiction.

An important special case of the equivalence (D.8) is when the role of r is played by the hypothesis p itself. In this case, (D.8) becomes

(p ⇒ q) ≡ (p ∧ ¬q ⇒ p ∧ ¬p)

The following truth table

p  q  p ⇒ q  p ∧ ¬q  ¬p  p ∧ ¬q ⇒ ¬p  p ∧ ¬q ⇒ p ∧ ¬p
1  1  1      0       0   1            1
1  0  0      1       0   0            0
0  1  1      0       1   1            1
0  0  1      0       1   1            1

proves the equivalence (p ∧ ¬q ⇒ p ∧ ¬p) ≡ (p ∧ ¬q ⇒ ¬p). Because of the transitivity of ≡ (Proposition 2098), in the special case r = p the reductio ad absurdum is, therefore, based on the equivalence

(p ⇒ q) ≡ (p ∧ ¬q ⇒ ¬p)

In words, one needs to show that the hypothesis and the negation of the thesis imply, jointly, the negation of the hypothesis. Let us see an example.

Theorem 2102 If n² is even, then n is even.

Proof Let us assume, by contradiction, that n is odd. Then n² is odd (Theorem 2097), which contradicts the hypothesis.

Logical analysis Let p be the hypothesis "n² is even" and q the thesis "n is even". The scheme of the proof is p ∧ ¬q ⊨ ¬p.

⁵ This discussion will become clearer after the next section on the deductive method. In any case, we can think of a = a1 ∧ ⋯ ∧ an as the conjunction of a collection A = {a1, ..., an} of axioms of arithmetic (in our naive setup, we do not worry whether all such axioms can be expressed via propositional calculus, an issue that readers will study in more advanced courses).

D.4.5 Summing up

Proofs require, in general, some inspiration: there are no recipes or mechanical rules that can help us in finding, in a proof by contradiction, an auxiliary proposition r that determines the contradiction and, in a direct proof, the auxiliary propositions ri that permit to articulate a direct argument.
That said, as to terminology the implication p ⊨ q can be read in different, but equivalent, ways:

(i) p implies q;

(ii) if p, then q;

(iii) p only if q;

(iv) q if p;

(v) p is a sufficient (condition) for q;

(vi) q is a necessary (condition) for p.

The choice among these versions is a matter of expositional convenience. Similarly, the equivalence p ≡ q can be read as:

(i) p if and only if q;

(ii) p is a necessary and sufficient (condition) for q.

For example, the next simple result shows that the implication "a > 1 ⊨ a² > 1" is true, i.e., that "a > 1 is a sufficient condition for a² > 1", i.e., that "a² > 1 is a necessary condition for a > 1".

Theorem 2103 If a > 1, then a² > 1.

Proof Let us proceed by contraposition. Let a² ≤ 1. We want to show that a ≤ 1. This follows by observing that a ≤ |a| = √(a²) ≤ 1.

Proofs are at the heart of all mathematical investigations, pure and applied; they are their holy precincts. As such, their style has to be clear yet concise, with no frills: every word or symbol should be there for a reason. The main purpose of a proof is to prove that a theorem is correct and, sometimes, in doing so it may also shed light on the result itself (a major example is the proof of the irrationality of √2 via the odd-even dichotomy for natural numbers). But, unfortunately, proofs might well be not that illuminating – indeed, they might be the outcome of as much perspiration as inspiration.⁶

⁶ Le sudate carte of Leopardi's poem A Silvia.

D.5 Deductive method in mathematics

D.5.1 Collections

Let P be a collection of propositions that is closed under the logical operations ∨, ∧ and ¬. For instance, if the propositions a, b and c belong to P then also the proposition ¬((a ∨ ¬b) ⇒ c) belongs to P since conditional propositions can be expressed in terms of ∨ and ¬.
If Γ = {p1, ..., pn} is a collection of propositions in P, we say that q is a logical consequence of Γ when the implication (D.6) holds, i.e., p1 ∧ ⋯ ∧ pn ⊨ q. In this case we write

Γ ⊨ q

Logical consequences are established via deductive reasoning. Such reasoning might well be sequential, according for example to the deduction scheme (D.7).
If all propositions in Γ are true, so are their logical consequences. We say that Γ is (logically):

(i) consistent if there is no q ∈ P such that both Γ ⊨ q and Γ ⊨ ¬q;

(ii) independent if there is no p ∈ Γ such that Γ ∖ {p} ⊨ p;

(iii) complete if, for all q ∈ P, either Γ ⊨ q or Γ ⊨ ¬q.

In words, consistency requires that the conjunction p = p1 ∧ ⋯ ∧ pn of the propositions in Γ be not a contradiction, while independence requires that no proposition in Γ be a logical consequence of the other ones in Γ (so, that no proposition is superfluous). Finally, completeness requires that each proposition in P, or its negation, be a logical consequence of the propositions in Γ.
Among these properties, consistency is especially important, as it will soon become clear.

D.5.2 Deductive method

Using the few notions of propositional logic that we learned, we can now outline a (highly stylized) description of the deductive (or axiomatic) method in mathematics, a central canon of Western thought after Greek geometry (cf. Section 1.8).
In a mathematical theory, the propositions in P are written through primitive terms, whose meaning is regarded as self-evident – so, not explained, famous examples being "points" and "lines" in Euclidean geometry and "sets" in set theory – and through defined terms, whose meaning is expressed via either primitive terms or previously defined terms.
The theory then posits a set of propositions A = {a1, ..., an} in P, called axioms, that it presupposes to be true "without establishing them in any way" (e.g., the parallel axiom in Euclidean geometry).⁷ The set A, called axiomatic system, has to be consistent, so that the conjunction a = a1 ∧ ⋯ ∧ an of the axioms is not a contradiction (which would be a no-start). Though consistency is the key property of an axiomatic system, ideally it should be also independent, so that there are no redundant axioms.

⁷ As Tarski (1994) writes on p. 110. Alfred Tarski has been, along with David Hilbert and Giuseppe Peano, a central figure in the modern analysis of the deductive method in mathematics. We refer readers to his book for a masterly introduction to the subject.

In a mathematical theory, theorems take the form

A ⊨ p     (D.9)

that stands for a ⊨ p where the hypothesis a = a1 ∧ ⋯ ∧ an is the conjunction of the axioms A = {a1, ..., an}. The thesis p can, of course, be a proposition defined in terms of simpler propositions via some logical operations. For instance, theorems in a mathematical theory often have the "if..., then..." form

A ⊨ p → q

where the thesis is a conditional p → q. In this case, (D.9) takes the special form

A ∪ {p} ⊨ q     (D.10)

thanks to the following simple result.

Proposition 2104 r ⊨ p → q if and only if r ∧ p ⊨ q, for all propositions p, q and r.

Proof By the transitivity of ≡, we have

a → (p → q) ≡ ¬a ∨ (p → q) ≡ ¬a ∨ (¬p ∨ q) ≡ (¬a ∨ ¬p) ∨ q ≡ ¬(a ∧ p) ∨ q ≡ (a ∧ p) → q

where the first, second and fifth equivalences follow from (D.4), the third from the associativity of ∨ and the fourth from the de Morgan laws.

The scope of a mathematical theory is given by the propositions that, via theorems (D.9), can be derived from the axioms. Yet, to ease exposition axioms are typically omitted in theorems' statements because they are taken for granted within the mathematical theory at hand. So, for theorems of the form (D.10) one just writes p ⊨ q in place of A ∪ {p} ⊨ q. A classic instance is Pythagoras' Theorem: "if a triangle is right, then the area of the square whose side is the hypotenuse is equal to the sum of the areas of the squares whose sides are the two catheti". In the mathematics jargon, p is called hypothesis of the theorem and q thesis. In view of the last proposition, it is a correct terminology, with the caveat of the omitted axioms – which are theorems' veritable convitati di pietra (stone guests).
In a similar vein, some statements of theorems of the form A ⊨ q may appear to have no hypothesis, an optical illusion that we already noted for Theorem 2101. Many theorems of Euclidean geometry actually have this form. For instance, the important theorem "the sum of the three interior angles of a triangle equals two right angles" tacitly assumes Euclid's axioms, in particular the parallel postulate. This famous axiom is peculiar to Euclid's geometry and, indeed, this theorem is no longer true in non-Euclidean geometries. Even if not explicitly mentioned in the theorem, the parallel postulate thus looms in the background.

D.5.3 A miniature mathematical theory

Following Tarski (1994), consider a miniature mathematical theory that has two primitive terms I and ≅. The symbol I indicates the set of all segments (denoted by the letters x, y, z, ...) of the real line. The symbol ≅ indicates the congruence relation between segments, so that x ≅ y reads as "the segment x is congruent with the segment y". Two axioms are considered.

A.1 The proposition a1 = "x ≅ x for all x ∈ I" is true (i.e., ≅ is reflexive).

A.2 The proposition a2 = "x ≅ z and y ≅ z imply x ≅ y for all x, y, z ∈ I" is true.

Let q = "x ≅ y if and only if y ≅ x for all x, y ∈ I" (i.e., ≅ is symmetric).

Theorem 2105 We have A = {a1, a2} ⊨ q.

Proof We have a2 ⊨ r, where r = "z ≅ z and y ≅ z imply z ≅ y for all y, z ∈ I". So, the proof relies on the deduction scheme a1 ∧ a2 ⊨ a1 ∧ r ⊨ q.⁸

Thus, under axioms A.1 and A.2 the binary relation ≅ is symmetric. It is easily checked to be also transitive.

D.5.4 Interpretations and models

The specific meaning attached to the primitive terms is irrelevant for the formal deductions carried out via (D.9). For instance, following again Tarski (1994), consider an alternative interpretation of the primitive terms of the previous theory in which I now indicates a set of numbers and the symbol ≅ indicates a congruence relation in which x ≅ y reads as "there is an integer z such that x − y = z". Axioms A.1 and A.2 and the resulting Theorem 2105 still apply.
So, the same mathematical theory may admit different interpretations, whose meaning is understood outside the theory – which thus takes it for granted. The expression "self-evident" is now replaced by this more general principle. For this reason, in modern mathematics the emphasis is on the consistency of the axioms rather than on their self-evidence (as it was in Greek geometry), a notion that implicitly refers to a specific interpretation.
As readers will learn in logic courses, axioms have their own syntactic life that abstracts from any specific interpretation (semantics). For instance, in Tarski's miniature example the underlying general abstract structure consists of a set X and a binary relation R on it. Any interpretation of X and R provides a model for such an abstract structure. The abstract axioms are:

A.1 the proposition a1 = "R is reflexive" is true;

A.2 the proposition a2 = "xRz and yRz imply xRy for all x, y, z ∈ X" is true.

If we set q = "R is symmetric", we have the abstract version of Theorem 2105.

All this is a bit pedantic, however. In a more imprecise, yet much more suggestive, way these two abstract axioms can be stated as:

A.1 R is reflexive;

A.2 if xRz and yRz, then xRy, for all x, y, z ∈ X.

If we call Tarskian the property in A.2, we can state the abstract version of Theorem 2105 in a legible way.

⁸ It is easy to check using truth tables that from q ⊨ r it follows p ∧ q ⊨ p ∧ r, for all propositions p, q and r.

Theorem 2106 If a binary relation is reflexive and Tarskian, then it is symmetric.

In all models of the abstract structure (X, R) that assume axioms A.1 and A.2, this theorem holds and will be suitably interpreted. The relations between the abstract structure (X, R) and the models that we discussed can be diagrammed as follows:

                     (X, R)
                    ↙      ↘
(segments, congruence)    (numbers, congruence)
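Theorem 2106 can also be probed on small finite models of the abstract structure (X, R); the following Python sketch (entirely ours) searches random relations for a counterexample and, as the theorem predicts, finds none:

    from itertools import product
    import random

    def reflexive(X, R):
        return all((x, x) in R for x in X)

    def tarskian(X, R):
        # the property of axiom A.2: xRz and yRz imply xRy
        return all((x, y) in R for x, y, z in product(X, repeat=3)
                   if (x, z) in R and (y, z) in R)

    def symmetric(R):
        return all((y, x) in R for (x, y) in R)

    X = range(4)
    for _ in range(1000):
        R = {pair for pair in product(X, repeat=2) if random.random() < 0.5}
        if reflexive(X, R) and tarskian(X, R):
            assert symmetric(R)  # no counterexample, in line with Theorem 2106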

D.6 Intermezzo: the logic of empirical scientific theories

D.6.1 Empirical models

Inspired by the deductive method outlined before, we can sketch a description of a deductive and realist scientific theory about a physical or social empirical reality.⁹
Let P be a collection of propositions which is closed with respect to the logical operations ∨, ∧ and ¬. Propositions are written through primitive terms, whose empirical meaning is taken for granted by the theory, as well as through defined terms. So written, the propositions in P are either true or false in the empirical reality under investigation and the collection P provides a description of such a reality.¹⁰
A function v : P → {0, 1} assigns a truth value to all propositions in P. Each truth assignment v corresponds to a possible configuration of the empirical reality in which the propositions in P are either true or false. Each truth assignment is, thus, a possible interpretation – or empirical model – that reality may give P. There is a unique true v because there is a unique true empirical reality (a natural principle to assume in a realist approach).
Let V be the collection of all truth assignments. A proposition p ∈ P is a tautology if v(p) = 1 for all v ∈ V and is a contradiction if v(p) = 0 for all v ∈ V. In words, a tautology is a proposition which is true under all interpretations, while a contradiction is a proposition which is false under all of them. The truth values of tautologies and contradictions thus only depend on their own form, regardless of any interpretation that they can take.¹¹ In particular, a logical implication holds across all interpretations and is thus a statement about all of them.¹²
For later use, next we report a few relations between logical operations and truth assignments.

Lemma 2107 Given any two propositions p and q in P, we have:

(i) v(p ∨ q) = max{v(p), v(q)} and v(p ∧ q) = min{v(p), v(q)} for all v ∈ V;

(ii) v(¬p) = 1 − v(p) for all v ∈ V;

(iii) p ⊨ q if and only if v(p) ≤ v(q) for all v ∈ V.

A consequence of (iii) is that p ≡ q if and only if v(p) = v(q) for all v ∈ V.

Proof As all other points are easily checked, we just prove the "only if" of (iii). If p ⊨ q then p ∧ q ≡ p and so, by (i), v(p) = v(p ∧ q) = min{v(p), v(q)} ≤ v(q) for all v ∈ V.

⁹ Realism is a methodological position, widely held in the practice of natural and social science, that asserts the existence of an external, objective, reality that it is the purpose of scientific inquiries to investigate.
¹⁰ Of course, behind this sentence there are a number of highly non-trivial conceptual issues about meaning, truth, reality, etc. (an early classical analysis of these issues can be found in Carnap, 1936).
¹¹ The importance of propositions whose truth value is independent of any interpretation was pointed out by Ludwig Wittgenstein in his famous, yet often elusive (if not evanescent), Tractatus (the use of the term tautology in logic is due to him; he also popularized the use of truth tables to handle truth assignments).
¹² Debreu (1959) is a classic axiomatic work in economics. In the preface of his book, Debreu writes that "Allegiance to rigor dictates the axiomatic form of the analysis where the theory, in the strict sense, is logically entirely disconnected from its interpretations."

Denote by v* the true configuration of the empirical reality under investigation. A scientific theory takes a stance about the empirical reality that it is studying by positing a consistent collection A = {a1, ..., an} of propositions, called axioms, that are assumed to be true under the (unknown) true configuration v*, i.e., it is assumed that

v*(ai) = 1     ∀i = 1, ..., n

All propositions that are logical consequences of the axioms are then assumed to be true under v*.¹³ In particular, if A is complete the truth value of all propositions in P can be, in principle, decided. So, the function v* is identified.

Example 2108 (i) In economics, a choice theory studies the behavior of a consumer who faces different bundles of goods. Consider a choice theory that has two primitive terms I and ∼ (cf. Section D.5.3). The symbol I indicates the set of all bundles of goods available to the consumer. The symbol ∼ indicates the consumer's indifference relation between the bundles, so that x ∼ y reads as "for the consumer, bundle x is indifferent to bundle y".¹⁴ If the theory assumes axioms A.1 and A.2, so the truth of propositions a1 and a2, then ∼ is symmetric (Theorem 2105) and transitive. By assuming these two axioms, the theory takes a stance about the consumer's behavior, which is the empirical reality that it is studying. The theory is empirically correct as long as these axioms are empirically true, i.e., v*(a1) = v*(a2) = 1. Unlike a mathematical theory, which is concerned only about the logical consistency of its axioms, an empirical theory is also concerned about their empirical status.
(ii) In physics, special relativity is based on two axioms: a1 = "invariance of the laws of physics in all inertial frames of reference", a2 = "the velocity of light in vacuum is the same in all inertial frames of reference". If v* is the true physical configuration, the theory is true if v*(a1) = v*(a2) = 1. Special relativity is a most brilliant example of the ability to pursue relentlessly all logical implications of the posited axioms, even if this means challenging fundamental ideas, for example on time, firmly held till then. N

To decide whether a scientific theory is empirically relevant, we thus have to check whether v*(ai) = 1 for each i = 1, ..., n. That is, we have to check whether the proposition a1 ∧ ⋯ ∧ an is true empirically – i.e., in the empirical reality under investigation. If n is large, operationally this might be complicated – just infeasible if A is infinite. In contrast, to falsify the theory it is enough to exhibit, directly, a single axiom of A which is empirically false or, indirectly, a single logical consequence of the axioms which is empirically false.

¹³ In the words of Wittgenstein: "If a god creates a world in which certain propositions are true, he creates thereby also a world in which all propositions consequent on them are true." (Tractatus, proposition 5.123)
¹⁴ Needless to say, after the congruence relations on segments and integers, the indifference relation on bundles of goods is yet another model of the abstract structure (X, R) of Section D.5.4.

This operational asymmetry between verification and falsification – emphasized by Karl Popper in the 1930s – is an important methodological aspect. Indirect falsification is, in general, the kind of falsification that one might hope for. It is the so-called testing of the implications of a scientific theory. In this indirect case, however, it is unclear which one of the posited axioms actually fails: in fact, ¬(p1 ∧ ⋯ ∧ pn) ≡ ¬p1 ∨ ⋯ ∨ ¬pn. If not all the posited axioms have the same status, only some of them being "core" axioms (as opposed to auxiliary ones), it is then unclear how serious the falsification is. Indeed, falsification is often a chimera (especially in the social sciences), as even the highly stylized setup of this section should suggest.
That said, with all its limitations, logical argumentation is the basic method of rational investigation of an empirical science, with theoretical reasoning at its core as a way to understand and organize empirical data, in a tradition started by the Ionians and revived in modern times by Galileo (recall the celebrated Saggiatore passage¹⁵ about the book of nature written in a mathematical language).

D.6.2 Logical atomism

Atomism, broadly defined, may refer to the idea that empirical phenomena are ultimately formed by some irreducible elements, the atoms, in whose terms all phenomena can be expressed (and, hopefully, understood).
To discuss this view in our logical setup, we need a few further notions. We say that two propositions p and q are disjoint (or exclusive, or incompatible) if they cannot be true at the same time, that is, if their conjunction p ∧ q is a contradiction. Their truth table is:

p  q  p ∧ q
1  0  0
0  1  0
0  0  0

In symbols, two propositions p and q are disjoint when

p ∧ q ≡ 0

The most basic instance of two disjoint propositions is given by a proposition and its negation, i.e., p and ¬p. Indeed, according to the law of non-contradiction we have p ∧ ¬p ≡ 0. Of course, two propositions can be disjoint without being one the negation of the other: the two disjoint propositions "in the year 1965 the average daily temperature in Milan was 15 degrees" and "in the year 1965 the average daily temperature in Milan was 16 degrees" are such an example.
The next result captures what is peculiar to the proposition/negation case among pairs of disjoint propositions.

Proposition 2109 Two disjoint propositions are one the negation of the other if and only if their disjunction is a tautology.

¹⁵ "... questo grandissimo libro ... è scritto in lingua matematica, e i caratteri son triangoli, cerchi, ed altre figure geometriche, senza i quali mezi è impossibile a intenderne umanamente parola." (trans. "... this grand book ... is written in the language of mathematics, and its characters are triangles, circles, and other geometric figures, without which it is impossible for man to understand its words").

Proof The "only if" is the law of excluded middle. As to the converse, assume that p and q are two disjoint propositions such that p ∨ q ≡ 1. This implies that p and q cannot be false at the same time, so either of them has to be true. By adding a disjunction column to the last truth table we have

p  q  p ∧ q  p ∨ q
1  0  0      1
0  1  0      1

We conclude that p and q are one the negation of the other.

We say that two propositions p and q are exhaustive if they cannot be false at the same time, that is, if their disjunction p ∨ q is a tautology. Their truth table is:

p  q  p ∨ q
1  1  1
0  1  1
1  0  1

In symbols, two propositions p and q are exhaustive when

p ∨ q ≡ 1

Interestingly, also the most basic instance of two exhaustive propositions is given by a proposition p and its negation ¬p. Indeed, by the law of excluded middle we have p ∨ ¬p ≡ 1. Yet, we might well have two propositions p and q that are exhaustive without being one the negation of the other, i.e., without having p ∧ q ≡ 0. For example, if in our city the oldest person is 100 years old, the two propositions p and q given by "our fellow citizen Mario is < 50 years old" and "our fellow citizen Mario is ≥ 30 years old" are exhaustive but not disjoint.

Proposition 2110 Two propositions are both disjoint and exhaustive if and only if one is
the negation of the other.

What characterizes, among all binary collections, the ones of the form fp; :pg is thus
that their elements are disjoint as well as exhaustive.
When two propositions are disjoint, let us denote their disjunction p _ q by p + q. For
instance, we write the law of excluded middle as

p + :p 1

In general, if the elements a nite collection of propositions are pairwise disjoint, we denote
their disjunction by X
p
p2

Truth assignments turn out to be additive over collections of pairwise disjoint proposi-
tions.

Lemma 2111 Let Γ be a finite collection of pairwise disjoint propositions. For each truth assignment v ∈ V, we have

v(Σ_{p∈Γ} p) = Σ_{p∈Γ} v(p)

Proof Let v ∈ V. Since the elements of Γ are pairwise disjoint propositions, at most one of them is true. That is, either v(p) = 0 for all p ∈ Γ (all propositions in Γ are false) or there exists p ∈ Γ such that v(p) = 1 and v(q) = 0 for all q ∈ Γ with q ≠ p. In the former case, proposition Σ_{p∈Γ} p is also false, so v(Σ_{p∈Γ} p) = 0 = Σ_{p∈Γ} v(p). In the latter case, proposition Σ_{p∈Γ} p is true, so v(Σ_{p∈Γ} p) = 1 = Σ_{p∈Γ} v(p).

Inspired by Proposition 2110, we single out a key class of collections of pairwise disjoint propositions. Specifically, we say that a finite collection Γ of propositions is a partition if its elements are pairwise disjoint and if their disjunction is a tautology. In symbols,

p ∧ q ≡ 0

for all distinct p and q in Γ, and

Σ_{p∈Γ} p ≡ 1

Among collections of pairwise disjoint propositions, partitions thus have the extra property that their elements cannot all be false at the same time. So, a partition is an exhaustive collection.
Clearly, a binary collection {p, ¬p} is the most basic partition. Actually, by Proposition 2110 a binary collection is a partition if and only if it has this form, i.e., each element is the negation of the other. For an example of a non-binary partition, consider again our city whose oldest person is 100 years old. The propositions pn given by "our fellow citizen Mario is n years old" form a partition, with n = 0, ..., 100.¹⁶
The elements of a partition Γ have two key features: they are mutually exclusive – at most one of them is true under any truth assignment v – and exhaustive – at least one of them is true under v. Indeed, this is what characterizes partitions, as we show next.

Proposition 2112 A finite collection Γ of propositions is a partition if and only if one and only one proposition in Γ is true, that is, for each truth assignment v ∈ V, there exists p ∈ Γ such that v(p) = 1 and v(q) = 0 for all q ∈ Γ with q ≠ p.

In view of this result, the elements of a partition are called atoms. To know the truth values of a partition under a truth assignment v ∈ V amounts to knowing which one of its atoms is true under v (the others being then false automatically).

Proof \If". Assume that, for each v 2 V , there exists a p 2 such that v (p) = 1 and
v (q) = 0 for all p 6= q 2 . Let p0 ; p00 2 . In view of Lemma 2107, we have v (p0 ^ p00 ) =
min fv (p0 ) ; v (p00 )g = 0 for all v 2 V because at most one proposition between p0 and p00
is true under each v. So, v (p0 ^ p00 ) = 0 for all v 2 V , which implies that p0 ^ p00 is a
16
Here 0 is the age of a baby who is not yet 12 months old.
1476 APPENDIX D. ELEMENTS OF INTUITIVE LOGIC

contradiction, i.e., p0 ^ p00 0. We conclude that is a collection of pairwise disjoint


propositions.
P
On the other hand, again in view of Lemma 2107 we have v p2 p = maxp2 v (p) = 1
P
for all v 2 V because for each v there is a true proposition in . So, v p2 p = 1 for all
P P
v 2 V , which implies that p2 p is a tautology, i.e., p2 p 1. This completes the proof
that is a partition.
\Only if". Assume that is a partition. By the last lemma,
0 1
X X
0 v (p) = v @ pA = v (1) = 1 8v 2 V
p2 p2
P
So, p2 v (p) = 1 for all v 2 V . Since v (p) is either 0 or 1 for each p 2 , in this implies
that there exists one and only one p 2 such that v (p) = 1, as desired.

To make further progress in our atomic quest, observe that partitions can be refined. Specifically, say that a partition Γ′ is finer than a partition Γ (or that Γ is coarser than Γ′) if, for each element p of Γ, there exists an element p′ of Γ′ that logically implies it, that is, p′ ⊨ p.

Proposition 2113 Atoms of a coarser partition are equivalent to a disjunction of atoms of a finer partition.

Proof Let Γ and Γ′ be two partitions, with Γ′ finer than Γ. Let p ∈ Γ. Consider the collection Γ′_p of all atoms p′ of Γ′ that logically imply p, i.e., Γ′_p = {p′ ∈ Γ′ : p′ ⊨ p}. It is easy to check that Σ_{p′∈Γ′_p} p′ ≡ p.

So, if we know the truth values of the finer partition Γ′ under a truth assignment v ∈ V – that is, which atom of Γ′ is true – then we also know the truth values of the partition Γ under v. Atoms of a finer partition can thus be regarded as more "fundamental". This naturally raises a question: does there exist a finest partition? Indeed, its atoms could then be regarded as genuine, irreducible, logical atoms.
The next result provides an answer to this important question when P is finite. We leave the easy proof to the reader.

Proposition 2114 A finite collection P = {p1, ..., pn} of propositions, closed with respect to the logical operations ∨, ∧ and ¬, admits a finest partition. Its atoms have the form

p1^(i1) ∧ p2^(i2) ∧ ⋯ ∧ pn^(in)     (D.11)

where, for each k = 1, ..., n, we have

pk^(ik) = ¬pk if ik = 0,  pk^(ik) = pk if ik = 1

For instance, for the binary case P = {p1, p2} we have the 4 atoms

p1 ∧ p2,  p1 ∧ ¬p2,  ¬p1 ∧ p2,  ¬p1 ∧ ¬p2



Atoms (D.11) that are different from 0 (i.e., non-contradictory) are called constituents of P. Denote by 𝒫 their collection. Its cardinality is at most 2ⁿ (which is attained when all atoms are different from 0).
The constituents are the ultimate, irreducible, logical components of the collection P, its "logical atoms". In view of Proposition 2113, each proposition p in P is equivalent to a disjunction of constituents, so it can be expressed in their terms. Specifically, we have

p ≡ Σ_{c∈𝒫 : c⊨p} c     ∀p ∈ P     (D.12)

which is called the canonical form of p. So, each proposition p ∈ P can be retrieved from the constituents via its canonical form. Moreover, (D.12) implies that, for each truth assignment v ∈ V, we have

v(p) = Σ_{c∈𝒫 : c⊨p} v(c)     ∀p ∈ P     (D.13)

Once we know the truth values of the constituents – i.e., which one of the constituents is true – via this formula we can recover the truth values of all propositions in P under any truth assignment v ∈ V. So, if a truth assignment v is a possible configuration of the empirical reality described by P, each such configuration is uniquely pinned down by the constituents' values

v(𝒫) = {v(c) : c ∈ 𝒫}

Different configurations correspond to different constituents' values, which are all one needs to know to retrieve the values that a configuration assigns to all propositions in P.
Summing up, both syntactically – via the disjunctions in formula (D.12) – and semantically – via the sums in formula (D.13) – the constituents are the logical elementary particles of P.
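Computationally, a constituent can be identified with a truth assignment to the letters p1, ..., pn, and the canonical form (D.12) lists the constituents on which a proposition is true. A Python sketch of ours:

    from itertools import product

    def constituents_of(f, n):
        # the constituents in the canonical form (D.12) of f: the atom
        # p1^i1 and ... and pn^in is true exactly on the assignment (i1, ..., in),
        # so f is equivalent to the disjunction of the atoms listed here
        return [row for row in product([1, 0], repeat=n) if f(*row) == 1]

    # p1 => p2: its canonical form is (p1 and p2) + (not p1 and p2) + (not p1 and not p2)
    print(constituents_of(lambda p1, p2: 1 if p1 <= p2 else 0, 2))
    # [(1, 1), (0, 1), (0, 0)]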

D.6.3 Logic of certainty

Propositions can be regarded as statements that, if turned into questions, admit only two answers: "yes" or "no". For example, the question "did it rain in Milan in the year 1965?" admits only two possible answers: "yes, it rained" and "no, it did not rain".
Suppose that an agent knows a few answers, that is, the truth values of a few propositions. Which other questions can the agent answer, that is, the truth value of which other propositions can he infer? This is the important question that we want to address here, with our logical tools.
Specifically, suppose that E = {e1, ..., em} is the collection of propositions in P whose truth values

v(E) = {v(e1), ..., v(em)}

the agent knows, whatever is the truth assignment v representing a possible configuration of the empirical reality described by P. For instance, these propositions may describe a part of the entire empirical reality whose configuration – i.e., the truth values of its elements – is relevant for the outcome of a decision that the agent has to make.
For this family E = {e1, ..., em}, we define a partition whose atoms have the form

e1^(i1) ∧ e2^(i2) ∧ ⋯ ∧ em^(im)     (D.14)

where, for each k = 1, ..., m, we have

ek^(ik) = ¬ek if ik = 0,  ek^(ik) = ek if ik = 1

Atoms (D.14) different from 0 are called constituents of E. Denote by ℰ their collection. Clearly, if E is the entire collection P we are back to the constituents (D.11) of P. In general, ℰ is a partition coarser than 𝒫, with cardinality at most 2^m.
The truth values of the elements of E determine the truth values of the constituents via the formula

v(e1^(i1) ∧ e2^(i2) ∧ ⋯ ∧ em^(im)) = min_{k=1,...,m} v(ek^(ik))     ∀v ∈ V

The converse is also true because, in analogy with (D.12), for each v ∈ V we have

v(e) = Σ_{c∈ℰ : c⊨e} v(c)     ∀e ∈ E

So, the knowledge of the truth values of the elements of E amounts to that of its constituents.
That said, for each proposition p ∈ P define the set ℰ_p = {c ∈ ℰ : c ⊨ p} of constituents that logically imply p. We say that p is certain (under E) if ℰ_p ≠ ∅ and

p ≡ Σ_{c∈ℰ_p} c

Because of the formula

v(p) = Σ_{c∈ℰ_p} v(c)     ∀v ∈ V

there is no uncertainty about the truth value of p once one knows the truth values of the constituents, so of the elements of E.
Define by A the collection of all propositions that are certain, that is,

A = {p ∈ P : ∃ ∅ ≠ B ⊆ ℰ such that p ≡ Σ_{c∈B} c}

In view of what we just observed, our agent knows all the answers for the propositions in A; there is no uncertainty about them being true or false. So, A is the collection of propositions whose truth value the agent can infer from the knowledge of the propositions in E.
In contrast, this is no longer the case for propositions that do not belong to A: their truth values are unknown, so uncertain, to him. To talk about them we need probability theory, as readers will learn in other courses.
A moment's reflection shows, however, that A should consist of all propositions that either can be constructed from the elements of E via the logical operations ∨, ∧ and ¬ or that are equivalent to propositions that are constructed in this way (e.g., recall (D.4) for conditionals). Indeed, the truth value of any such proposition, say ¬e1 ∨ (e2 ∧ e3), is automatically known via truth tables once the truth values of the elements of E are known. Though we do not pursue it analytically, this heuristic remark should nevertheless shed further light on the nature of A, which can be constructed by carrying out all possible logical operations on the elements of E.
Be that as it may, now we know what our agent can infer with certainty from his knowledge and what remains, instead, uncertain for him.
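The set A can also be computed mechanically when the propositions are built from finitely many elementary letters. In the Python sketch below (entirely ours), a proposition over the letters e1, e2, e3 is certain given E = {e1, e2} exactly when its truth value is constant on each constituent of E, i.e., once the known letters are fixed:

    from itertools import product

    def certain(f, known, n):
        # f: an n-ary proposition; known: indices of the letters whose truth values the
        # agent knows. f is certain when fixing the known letters (i.e., a constituent
        # of E) already pins down the truth value of f.
        for fixed in product([1, 0], repeat=len(known)):
            values = {f(*row) for row in product([1, 0], repeat=n)
                      if all(row[i] == b for i, b in zip(known, fixed))}
            if len(values) > 1:
                return False
        return True

    print(certain(lambda e1, e2, e3: max(e1, 1 - e2), [0, 1], 3))  # e1 or not e2: True
    print(certain(lambda e1, e2, e3: min(e1, e3), [0, 1], 3))      # e1 and e3: False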

D.7 Predicates and quantifiers

D.7.1 Generalities

The symbols ∀ and ∃ mean, respectively, "for every" and "there exists (at least one)" and are called the universal quantifier and the existential quantifier. Their role is fundamental in mathematics. For example, the statement x² = 1 is, per se, meaningless. By completing it by writing

∀x ∈ R, x² = 1     (D.15)

we would make a big mistake; by writing, instead,

∃x ∈ R, x² = 1     (D.16)

we would assert a (simple) truth: there is some real number (there are actually two of them: x = ±1) whose square is 1.

To understand the role of quantifiers, we consider expressions, called (logical) predicates and denoted by p(x), that contain an argument x that varies in a given set X, the domain (or universe of discourse). For example, the predicate p(x) can be "x² = 1" or "in the year x it rained in Milan". Once a specific value x of the domain is considered, we have a proposition p(x) that may be either true or false. For instance, if X is the real line and x = 3, the proposition "x² = 1" is false; it becomes true if and only if x = ±1.
The propositions

    ∃x ∈ X, p(x)                                                             (D.17)

and

    ∀x ∈ X, p(x)                                                             (D.18)

mean that p(x) is true for at least some x in the domain and that p(x) is true for every x in the domain, respectively. For example, when p(x) is "x² = 1", propositions (D.17) and (D.18) reduce, respectively, to propositions (D.16) and (D.15), while for the weather predicate they become the propositions "there exists a year in the last century in which it rained in Milan" and "every year in the last century it rained in Milan" (here X is the set of years of the last century).

Note that when the domain is finite, say X = {x_1, ..., x_k}, the propositions (D.17) and (D.18) can be written as p(x_1) ∨ ... ∨ p(x_k) and p(x_1) ∧ ... ∧ p(x_k), respectively.
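In fact, on a finite domain the two quantified propositions can be evaluated mechanically. As a small illustration of ours (the domain and predicate are chosen arbitrarily), Python's built-ins any and all play exactly the roles of ∃ and ∀:

    X = range(-10, 11)           # a finite domain
    p = lambda x: x**2 == 1      # the predicate "x^2 = 1"
    print(any(p(x) for x in X))  # ∃x ∈ X, p(x): True  (x = ±1 belong to X)
    print(all(p(x) for x in X))  # ∀x ∈ X, p(x): False (e.g., p(0) is false)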
Quantifiers transform, therefore, predicates into propositions, that is, into statements that are either true or false. If X is infinite, however, verifying whether proposition (D.18) is true requires an infinite number of checks: for each x ∈ X we have to verify whether p(x) is true. Operationally, such a truth value cannot be determined, and so universal propositions are typically not verifiable.

In contrast, to verify that (D.18) is false it is enough to exhibit a single x ∈ X such that p(x) is false. Though coming up with such an element might not be obvious at all, there is still a clear asymmetry between the operational content of the two truth values of (D.18). One actually often confronts propositions like "∀x ∈ X, p_1(x) ∧ ... ∧ p_n(x)", the universal version of the propositions p_1 ∧ ... ∧ p_n discussed in the Intermezzo (when talking about verifiability and falsifiability). In this case, a large n further magnifies the asymmetry.

A dual asymmetry holds for the existential quantifier. The existential proposition (D.17) can be verified via an element x ∈ X such that p(x) is true. Of course, if X is large (let alone if it is infinite), it may be operationally non-obvious how to find such an element. Be that as it may, falsification is in much bigger trouble: to verify that proposition (D.17) is false we should check that, for all x ∈ X, the proposition p(x) is false. Operationally, existential propositions are typically not falsifiable.

The following table summarizes our operational discussion:

                                Falsifiable    Verifiable
    Universal propositions      Yes            No
    Existential propositions    No             Yes
N.B. (i) In the book we often write "p(x) for every x ∈ X" in the form

    p(x)    ∀x ∈ X

instead of

    ∀x ∈ X, p(x)

It is a common way to handle universal quantifiers. (ii) If X = X_1 × ... × X_n is a Cartesian product, the predicate takes the form p(x_1, ..., x_n) because x = (x_1, ..., x_n). O

D.7.2 Algebra

In a sense, ∀ and ∃ are each the negation of the other. So,¹⁷

    ¬(∃x, p(x)) ⟺ ∀x, ¬p(x)

and, symmetrically,

    ¬(∀x, p(x)) ⟺ ∃x, ¬p(x)

In the example where p(x) is "x² = 1", we can equally well write

    ¬(∀x, x² = 1)    or    ∃x, x² ≠ 1

(respectively: it is not true that x² = 1 for every x, and it is true that for some x one has x² ≠ 1).
More generally,

    ¬(∀x, ∃y, p(x, y)) ⟺ ∃x, ∀y, ¬p(x, y)

For example, let p(x, y) be the proposition "x + y² = 0". We can equally assert that

    ¬(∀x, ∃y, x + y² = 0)

(it is not true that, for every x ∈ R, we can find a value of y ∈ R such that the sum x + y² is zero: it is sufficient to take x = 5) or that

    ∃x, ∀y, x + y² ≠ 0

(it is true that there exists some value of x ∈ R such that x + y² ≠ 0 for every choice of y ∈ R: it is sufficient to take x = 5, since then x + y² ≥ 5 > 0 for every y).

¹⁷ To ease notation, in the quantifiers we omit the clause "∈ X".
Note that the last few lines show that quantifiers permit us to reduce binary predicates to unary predicates or even to propositions. For instance, if the domain of p(x, y) consists of a group of people X and p(x, y) is interpreted as "x is a friend of y", then ∃y, p(x, y) is the predicate "x has a friend", while ∀x, ∃y, p(x, y) is the proposition "each x has a friend".

D.7.3 Example: linear dependence


In Chapter 3 a finite set of vectors {x^i}_{i=1}^m of R^n has been called linearly independent if, for every set {α_i}_{i=1}^m of real numbers,

    α_1 x^1 + α_2 x^2 + ... + α_m x^m = 0  ⟹  α_1 = α_2 = ... = α_m = 0

The set {x^i}_{i=1}^m has been, instead, called linearly dependent if it is not linearly independent, i.e., if there exists a set {α_i}_{i=1}^m of real numbers, not all equal to zero, such that α_1 x^1 + α_2 x^2 + ... + α_m x^m = 0.

We can write these notions by making the role of predicates explicit. Let p(α_1, ..., α_m) and q(α_1, ..., α_m) be the predicates "α_1 x^1 + α_2 x^2 + ... + α_m x^m = 0" and "α_1 = α_2 = ... = α_m = 0", respectively. The set {x^i}_{i=1}^m is linearly independent when

    ∀{α_i}_{i=1}^m,  p(α_1, ..., α_m) ⟹ q(α_1, ..., α_m)

In words, for every set {α_i}_{i=1}^m of real numbers, if α_1 x^1 + α_2 x^2 + ... + α_m x^m = 0, then α_1 = α_2 = ... = α_m = 0.

The negation is

    ∃{α_i}_{i=1}^m,  ¬(p(α_1, ..., α_m) ⟹ q(α_1, ..., α_m))

that is, thanks to the equivalence (D.5),

    ∃{α_i}_{i=1}^m,  p(α_1, ..., α_m) ∧ ¬q(α_1, ..., α_m)

In words, there exists a set {α_i}_{i=1}^m of real numbers, not all equal to zero, such that α_1 x^1 + α_2 x^2 + ... + α_m x^m = 0.
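Numerically, the existential clause in the definition of linear dependence is typically tested through the rank of the matrix whose columns are the given vectors: the set is independent exactly when that rank equals m. A minimal sketch of ours using NumPy:

    import numpy as np

    def linearly_independent(vectors):
        # vectors: a list of m one-dimensional arrays of equal length (the x^i)
        A = np.column_stack(vectors)
        # Some nonzero (a_1, ..., a_m) solves A a = 0 iff rank(A) < m
        return np.linalg.matrix_rank(A) == len(vectors)

    print(linearly_independent([np.array([1, 0]), np.array([0, 1])]))  # True
    print(linearly_independent([np.array([1, 2]), np.array([2, 4])]))  # False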

D.7.4 Example: negation of convergence


What is the correct negation of the definition of convergence? Recall that a sequence {x_n} converges to a point L ∈ R if for every ε > 0 there exists n_ε ≥ 1 such that

    n ≥ n_ε  ⟹  |x_n − L| < ε                                               (D.19)

By making all the quantifiers explicit, we can succinctly write

    ∀ε > 0, ∃n_ε ≥ 1, ∀n ≥ n_ε, |x_n − L| < ε

The negation is then

    ∃ε > 0, ∀k ≥ 1, ∃n ≥ k, |x_n − L| ≥ ε

In other words, a sequence {x_n} does not converge to a point L ∈ R if there exists ε > 0 such that for each k ≥ 1 there is n ≥ k such that

    |x_n − L| ≥ ε

By denoting by n_k any such n ≥ k,¹⁸ we define a subsequence {x_{n_k}} such that |x_{n_k} − L| ≥ ε for all k ≥ 1. So, we have the following useful characterization of non-convergence to a given point.

Proposition 2115 A sequence {x_n} does not converge to a point L ∈ R if and only if there exist ε > 0 and a subsequence {x_{n_k}} such that |x_{n_k} − L| ≥ ε for all k ≥ 1.
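As a quick illustration of ours: the sequence x_n = (−1)^n does not converge to L = 1. Take ε = 1 and the subsequence of odd indices n_k = 2k + 1; then |x_{n_k} − 1| = |−1 − 1| = 2 ≥ ε for all k ≥ 1, exactly the pattern described in the proposition.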

D.7.5 Binary predicates


Predicates have a single argument, but it is natural to think of expressions that may depend on multiple arguments. To illustrate, we consider the binary case, but everything extends to any finite number of arguments.

A binary (logical) predicate p(x, y) is an expression that contains two arguments x and y that vary in two domains X and Y. For instance, a binary predicate p(x, y) can be "x ≥ y" or "in the year x it rained in city y" or "x is the mother of y" or, in a consumer theory context, "bundle x is at least as good as y".

For specific values x and y, we have a proposition p(x, y). For instance, the proposition "3 ≥ 4" is false, while the proposition "4 ≥ 3" is true. Now the quantifiers lead to two main types of propositions:

    ∃x ∈ X, ∀y ∈ Y, p(x, y)

which is true if there exists x ∈ X such that p(x, y) is true for all y ∈ Y, and

    ∀x ∈ X, ∃y ∈ Y, p(x, y)

which is true if for all x ∈ X there exists y ∈ Y such that p(x, y) is true.

For instance, with X = Y = R, the proposition

    ∃x ∈ R, ∀y ∈ R, x ≥ y

is false because it states that "there is a greatest scalar", while the proposition

    ∀x ∈ R, ∃y ∈ R, x ≥ y

is true because it states that "each scalar has a (weakly) smaller one".
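On a finite domain the two orders of quantification can again be evaluated by nesting any and all; a small sketch of ours (note that, unlike R, a finite set of scalars does have a greatest element, so here both propositions are true):

    X = Y = range(1, 6)
    p = lambda x, y: x >= y
    print(any(all(p(x, y) for y in Y) for x in X))  # ∃x, ∀y, x ≥ y: True (x = 5)
    print(all(any(p(x, y) for y in Y) for x in X))  # ∀x, ∃y, x ≥ y: True (take y = x)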


¹⁸ The construction of this subsequence is, actually, a bit delicate. Indeed, for {x_{n_k}} to be a subsequence we need to construct the n_k so that n_1 < n_2 < ... < n_k < n_{k+1} < ... . To start with, note that if {x_n} does not converge to L, then for each m ≥ 1 the set N(m) = {n ≥ 1 : n ≥ m and |x_n − L| ≥ ε} is non-empty. Define then n_1 = min N(1) and, recursively, n_{k+1} = min N(n_k + 1) for every k. Since each N(m) is non-empty, n_k is well defined.

D.7.6 A set-theoretic twist


There is a close connection between predicates and sets. Indeed, any predicate p(x) can be identified with the set A of all elements x of X such that the proposition p(x) is true, i.e., A = {x ∈ X : p(x) is true}. Indeed, we clearly have

    p(x) is true ⟺ x ∈ A

So, predicates and sets are two sides of the same coin. Indeed, predicates formalize the specification of sets via a property that their elements have in common, as we mentioned at the very beginning of the book. The set {x ∈ X : p(x) is true} is called the extension of p.

In a similar vein, a binary predicate p(x, y), with arguments that belong to sets X and Y, can be identified with the binary relation R ⊆ X × Y consisting of all pairs (x, y) such that the proposition p(x, y) is true, i.e., R = {(x, y) ∈ X × Y : p(x, y) is true}. Indeed,

    p(x, y) is true ⟺ xRy

We conclude that binary relations are the extensions, so the set-theoretic counterparts, of binary predicates.

Example 2116 (i) If X is a set of years and Y a set of cities, the binary predicate p(x, y) given by "in the year x it rained in city y" corresponds to the binary relation

    R = {(x, y) ∈ X × Y : in the year x it rained in city y}

(ii) Let X = Y = N. The binary predicate p(x, y) given by "x ≥ y" corresponds to the binary relation

    ≥ = {(x, y) ∈ N × N : x is greater than or equal to y}

(iii) Let C be the set of all citizens of a country. If X = Y = C, the binary predicate p(x, y) given by "x is the mother of y" corresponds to the binary relation

    R = {(x, y) ∈ C × C : x is the mother of y}

It contains all pairs in which the first element is the mother of the second element.

(iv) Let R^n_+ be the set of all consumption bundles. If X = Y = R^n_+, the binary predicate p(x, y) given by "bundle x is at least as good as y" corresponds to the binary relation

    ≿ = {(x, y) ∈ R^n_+ × R^n_+ : x ≿ y}

It contains all pairs of bundles in which the first bundle is at least as good as the second one. N

In general, predicates with n arguments can be identified with n-ary relations, as readers will learn in more advanced courses. In any case, the set-theoretic translation of these key logical notions is a further wonder of Cantor's paradise.
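The passage from a predicate to its extension is immediate to express in code; a minimal sketch of ours on a small finite domain:

    X = Y = range(1, 5)
    p = lambda x, y: x >= y
    R = {(x, y) for x in X for y in Y if p(x, y)}   # the extension of p
    print((3, 2) in R, (2, 3) in R)                 # True False: 3 ≥ 2 but not 2 ≥ 3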
Appendix E

Mathematical induction
(sdoganato)

E.1 Generalities
Suppose that we want to prove that a proposition p(n), formulated for every natural number n, is true for every such number n. Intuitively, it is sufficient to show that the "initial" proposition p(1) is true and that the truth of each proposition p(n) implies that of the "subsequent" one p(n + 1). Next we formalize this domino argument:¹

Theorem 2117 (Induction principle) Let p(n) be a proposition stated in terms of each natural number n. Suppose that:

(i) p(1) is true;

(ii) for each n, if p(n) is true, then p(n + 1) is true.

Then, proposition p(n) is true for each n.

Proof Suppose, by contradiction, that proposition p(n) is false for some n. Denote by n_0 the smallest such n, which exists since every non-empty collection of natural numbers has a smallest element.² By (i), n_0 > 1. Moreover, by the definition of n_0, the proposition p(n_0 − 1) is true. By (ii), p(n_0) is true, a contradiction.

A proof by induction thus consists of two steps:

(i) Initial step: prove that the proposition p(1) is true.

(ii) Induction step: prove that, for each n, if p(n) is true (induction hypothesis), then p(n + 1) is true.

We illustrate this important type of proof by determining some important finite sums.

¹ Think of many soldiers standing in a row. The first has the "right scarlet fever", a rare form of scarlet fever that instantaneously infects whoever stands to the right of a sick person. All the soldiers catch it because the first one infects the second one, the second one infects the third one, and so on.

² In the set-theoretic jargon, we say that N is a well-ordered set.


(i) We have

    1 + 2 + ... + n = Σ_{s=1}^n s = n(n + 1)/2

Initial step. For n = 1 the property is trivially true:

    1 = 1(1 + 1)/2

Induction step. Assume it is true for n = k (induction hypothesis), that is,

    Σ_{s=1}^k s = k(k + 1)/2

We must prove that it is true also for n = k + 1, i.e., that

    Σ_{s=1}^{k+1} s = (k + 1)(k + 2)/2

Indeed,³

    Σ_{s=1}^{k+1} s = Σ_{s=1}^k s + (k + 1) = k(k + 1)/2 + k + 1 = (k + 1)(k + 2)/2

In particular, the sum of the first n odd numbers is n²:

    Σ_{s=1}^n (2s − 1) = 2 Σ_{s=1}^n s − Σ_{s=1}^n 1 = 2 · n(n + 1)/2 − n = n²

(ii) We have

    1² + 2² + ... + n² = Σ_{s=1}^n s² = n(n + 1)(2n + 1)/6

Initial step. For n = 1 the property is trivially true:

    1² = 1(1 + 1)(2 + 1)/6

Induction step. By proceeding as above, we get:

    Σ_{s=1}^{k+1} s² = Σ_{s=1}^k s² + (k + 1)² = k(k + 1)(2k + 1)/6 + (k + 1)²
                     = (k + 1)[k(2k + 1) + 6(k + 1)]/6 = (k + 1)(2k² + 7k + 6)/6
                     = (k + 1)(k + 2)(2k + 3)/6

as claimed.
³ Alternatively, this sum can be derived by observing that the sum of the first and the last addends is n + 1, the sum of the second and the second-to-last is still n + 1, and so on. There are n/2 such pairs and therefore the sum is (n + 1)n/2.

(iii) We have

    1³ + 2³ + ... + n³ = Σ_{s=1}^n s³ = (Σ_{s=1}^n s)² = n²(n + 1)²/4

Initial step. For n = 1 the property is trivially true:

    1³ = 1²(1 + 1)²/4

Induction step. By proceeding as above, we get:

    Σ_{s=1}^{k+1} s³ = Σ_{s=1}^k s³ + (k + 1)³ = k²(k + 1)²/4 + (k + 1)³
                     = (k + 1)²[k² + 4(k + 1)]/4 = (k + 1)²(k + 2)²/4
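Induction establishes these identities for every n; as a complementary sanity check (not a proof), one can verify them numerically for many values of n. A minimal sketch of ours in plain Python:

    for n in range(1, 101):
        s = range(1, n + 1)
        assert sum(s) == n * (n + 1) // 2                            # example (i)
        assert sum(x**2 for x in s) == n * (n + 1) * (2*n + 1) // 6  # example (ii)
        assert sum(x**3 for x in s) == n**2 * (n + 1)**2 // 4        # example (iii)
    print("formulas (i)-(iii) check for n = 1, ..., 100")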

(iv) Consider the sum

    a + aq + aq² + ... + aq^{n−1} = Σ_{s=1}^n a q^{s−1} = a (1 − q^n)/(1 − q)

of n terms of the geometric progression with first term a and common ratio q ≠ 1.

Initial step. For n = 1 the formula is trivially true:

    a = a (1 − q)/(1 − q)

Induction step. By proceeding as above, we get

    Σ_{s=1}^{k+1} a q^{s−1} = Σ_{s=1}^k a q^{s−1} + a q^k = a (1 − q^k)/(1 − q) + a q^k
                            = a [1 − q^k + (1 − q) q^k]/(1 − q) = a (1 − q^{k+1})/(1 − q)

as claimed.
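The same kind of numerical sanity check applies to the geometric sum, for illustrative values of a and q ≠ 1 chosen by us:

    a, q = 3.0, 0.5
    for n in range(1, 30):
        lhs = sum(a * q**(s - 1) for s in range(1, n + 1))
        assert abs(lhs - a * (1 - q**n) / (1 - q)) < 1e-9
    print("geometric sum formula checks")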

E.2 The harmonic Mengoli


As a last illustration of the induction principle, we report a modern version of the classic proof by Pietro Mengoli of the divergence of the harmonic series, presented in his 1650 essay Novae quadraturae arithmeticae seu de additione fractionum.

Theorem 2118 The harmonic series is divergent.

The proof is based on a couple of lemmas, the second of which is proven by induction.

Lemma 2119 We have, for every k ≥ 2,

    1/(k − 1) + 1/k + 1/(k + 1) ≥ 3/k

Proof Consider the convex function f : (0, ∞) → (0, ∞) defined by f(x) = 1/x. Since

    k = (1/3)(k − 1) + (1/3) k + (1/3)(k + 1)

Jensen's inequality implies

    1/k = f(k) = f((1/3)(k − 1) + (1/3) k + (1/3)(k + 1)) ≤ (1/3)(f(k − 1) + f(k) + f(k + 1))
        = (1/3)(1/(k − 1) + 1/k + 1/(k + 1))

as claimed.
Let s_n = Σ_{k=1}^n x_k be the n-th partial sum of the harmonic series, where x_k = 1/k.

Lemma 2120 s_{3n+1} ≥ s_n + 1 for every n ≥ 1.

Proof We proceed by induction. Initial step: n = 1. We apply the previous lemma with k = 3:

    s_{3·1+1} = s_4 = 1 + 1/2 + 1/3 + 1/4 ≥ 1 + 3/3 = 1 + s_1

Induction step: let us assume that the statement holds for n; we prove that it holds for n + 1. We apply the previous lemma with k = 3n + 3:

    s_{3(n+1)+1} = s_{3n+4} = s_{3n+1} + 1/(3n + 2) + 1/(3n + 3) + 1/(3n + 4)
                 ≥ s_n + 1 + 1/(3n + 2) + 1/(3n + 3) + 1/(3n + 4)
                 ≥ s_n + 1 + 3/(3n + 3) = s_n + 1 + 1/(n + 1) = s_{n+1} + 1

which completes the induction step. In conclusion, the result holds thanks to the induction principle.

Proof of the theorem Since the harmonic series has positive terms, the sequence {s_n} of its partial sums is increasing. Therefore, it either converges or diverges. By contradiction, let us assume that it converges, i.e., s_n ↑ L < +∞. From the last lemma it follows that

    L = lim_n s_{3n+1} ≥ lim_n (1 + s_n) = 1 + lim_n s_n = 1 + L

which is a contradiction.
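A quick numerical illustration of ours of Lemma 2120, which also conveys how slowly the partial sums grow:

    def s(n):
        return sum(1.0 / k for k in range(1, n + 1))

    for n in (1, 2, 5, 10, 50):
        assert s(3 * n + 1) >= s(n) + 1       # Lemma 2120
    print(s(10), s(100), s(1000))             # about 2.93, 5.19, 7.49: unbounded, but slowly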
Appendix F

Cast of characters

Niels Abel (Finnøy 1802 – Froland 1829), mathematician.
Maria Agnesi (Milan 1718 – 1799), mathematician.
Archimedes (Syracuse 287 BC ca. – 212 BC), mathematician.
Aristotle (Stagira 384 BC – Euboea 322 BC), philosopher and physicist.
Kenneth Arrow (New York 1921 – Palo Alto 2017), economist.
Emil Artin (Vienna 1898 – Hamburg 1962), mathematician.
Cesare Arzelà (Santo Stefano di Magra 1847 – 1912), mathematician.
René Baire (Paris 1874 – Chambéry 1932), mathematician.
Stefan Banach (Kraków 1892 – Lviv 1945), mathematician.
Isaac Barrow (London 1630 – 1677), mathematician.
Heinz Bauer (Nuremberg 1928 – Erlangen 2002), mathematician.
Jeremy Bentham (London 1748 – 1832), philosopher.
Daniel Bernoulli (Groningen 1700 – Basel 1782), mathematician.
Jakob Bernoulli (Basel 1654 – 1705), mathematician.
Johann Bernoulli (Basel 1667 – 1748), mathematician.
Sergei Bernstein (Odessa 1880 – Moscow 1968), mathematician.
Jacques Binet (Rennes 1786 – Paris 1856), mathematician.
David Blackwell (Centralia 1919 – Berkeley 2010), mathematician and statistician.
Danilo Blanuša (Osijek 1903 – Zagreb 1987), mathematician.
Bernard Bolzano (Prague 1781 – 1848), mathematician and philosopher.
Émile Borel (Saint-Affrique 1871 – Paris 1956), mathematician.
Francesco Brioschi (Milan 1824 – 1897), mathematician.
Luitzen Brouwer (Overschie 1881 – Blaricum 1966), mathematician and philosopher.
Felix E. Browder (Moscow 1927 – Princeton 2016), mathematician.
Cesare Burali-Forti (Arezzo 1861 – Turin 1931), mathematician.
Renato Caccioppoli (Naples 1904 – 1959), mathematician.
Georg Cantor (Saint Petersburg 1845 – Halle 1918), mathematician.
Alfredo Capelli (Milan 1855 – Naples 1910), mathematician.
Gerolamo Cardano (Pavia 1501 – Rome 1576), mathematician.
Augustin-Louis Cauchy (Paris 1789 – Sceaux 1857), mathematician.
Bonaventura Cavalieri (Milan 1598 – Bologna 1647), mathematician.
Ernesto Cesàro (Naples 1859 – Torre Annunziata 1906), mathematician.
Oscar Chisini (Bergamo 1889 – Milan 1967), mathematician.
Gustave Choquet (Solesmes 1915 – Lyon 2006), mathematician.
Lothar Collatz (Arnsberg 1910 – Varna 1990), mathematician.
Gabriel Cramer (Geneva 1704 – Bagnols-sur-Cèze 1752), mathematician.
Jean Darboux (Nîmes 1842 – Paris 1917), mathematician.
Gérard Debreu (Calais 1921 – Paris 2004), economist and mathematician.
Richard Dedekind (Braunschweig 1831 – 1916), mathematician.
Democritus (Abdera 460 BC ca. – 370 BC ca.), philosopher.
René Descartes (Cartesius) (La Haye 1596 – Stockholm 1650), mathematician and philosopher.
Diophantus (Alexandria, II–III century AD), mathematician.
Ulisse Dini (Pisa 1845 – 1918), mathematician.
Paul Dirac (Bristol 1902 – Tallahassee 1984), physicist.
Peter Lejeune Dirichlet (Düren 1805 – Göttingen 1859), mathematician.
Paul Du Bois-Reymond (Berlin 1831 – Freiburg 1889), mathematician.
Francis Edgeworth (Edgeworthstown 1845 – Oxford 1926), economist.
Epicurus (Samos 341 BC – Athens 270 BC), philosopher.
Euclid (Alexandria, IV–III century BC), mathematician.
Eudoxus (Cnidus, IV century BC), mathematician.
Paul Erdős (Budapest 1913 – Warsaw 1996), mathematician.
Leonhard Euler (Basel 1707 – Saint Petersburg 1783), mathematician.
Leonardo da Pisa (Fibonacci) (Pisa ca. 1170 – ca. 1240), mathematician.
Francesco Faà di Bruno (Alessandria 1825 – Turin 1888), mathematician.
Werner Fenchel (Berlin 1905 – Copenhagen 1988), mathematician.
Pierre de Fermat (Beaumont-de-Lomagne 1601 – Castres 1665), lawyer and mathematician.
Gaetano Fichera (Acireale 1922 – Rome 1996), mathematician.
Bruno de Finetti (Innsbruck 1906 – Rome 1985), mathematician and statistician.
Niccolò Fontana (Tartaglia) (Brescia 1499 – Venice 1557), mathematician.
Maurice Fréchet (Maligny 1878 – Paris 1973), mathematician.
Ferdinand Frobenius (Charlottenburg 1849 – Berlin 1917), mathematician.
Galileo Galilei (Pisa 1564 – Arcetri 1642), astronomer and physicist.
Carl Gauss (Brunswick 1777 – Göttingen 1855), mathematician.
Jørgen Gram (Nustrup 1850 – Copenhagen 1916), mathematician.
Guido Grandi (Cremona 1671 – Pisa 1742), mathematician.
Jacques Hadamard (Versailles 1865 – Paris 1963), mathematician.
Philip Hartman (Baltimore 1915 – 2015), mathematician.
Felix Hausdorff (Breslau 1868 – Bonn 1942), mathematician.
David Hawkins (El Paso 1913 – Boulder 2002), philosopher.
Heinrich Heine (Berlin 1821 – Halle 1881), mathematician.
Heron (Alexandria, I century AD), mathematician.
John Hicks (Warwick 1904 – Blockley 1989), economist.
David Hilbert (Königsberg 1862 – Göttingen 1943), mathematician.
Einar Hille (New York 1894 – La Jolla 1980), mathematician.
Guillaume de l'Hôpital (Paris 1661 – 1704), mathematician.
Hippocrates (Chios, V century BC), mathematician.
Kiyoshi Itô (Inabe 1915 – Kyoto 2008), mathematician.
Carl Jacobi (Potsdam 1804 – Berlin 1851), mathematician.
Johan Jensen (Nakskov 1859 – Copenhagen 1925), mathematician.
Børge Jessen (Copenhagen 1907 – 1993), mathematician.
William Jevons (Liverpool 1835 – Bexhill 1882), economist.
Camille Jordan (Lyon 1838 – Paris 1922), mathematician.
Shizuo Kakutani (Osaka 1911 – New Haven 2004), mathematician.
Jovan Karamata (Zagreb 1902 – Geneva 1967), mathematician.
Tjalling Koopmans ('s-Graveland 1910 – New Haven 1985), economist.
Leopold Kronecker (Liegnitz 1823 – Berlin 1891), mathematician.
Harold Kuhn (Santa Monica 1925 – New York 2014), mathematician.
Muhammad al-Khwarizmi (ca. 750 – Baghdad ca. 850), astronomer and mathematician.
Giuseppe (Joseph) Lagrange (Turin 1736 – Paris 1813), mathematician.
Gabriel Lamé (Tours 1795 – Paris 1870), mathematician.
Edmund Landau (Berlin 1877 – 1938), mathematician.
Pierre-Simon de Laplace (Beaumont-en-Auge 1749 – Paris 1827), mathematician and physicist.
Adrien-Marie Legendre (Paris 1752 – 1833), mathematician.
Gottfried Leibniz (Leipzig 1646 – Hannover 1716), mathematician and philosopher.
Wassily Leontief (Saint Petersburg 1905 – New York 1999), economist.
Rudolph Lipschitz (Königsberg 1832 – Bonn 1903), mathematician.
John Littlewood (Rochester 1885 – Cambridge 1977), mathematician.
Colin Maclaurin (Kilmodan 1698 – Edinburgh 1746), mathematician.
Andrej Markov Jr. (Saint Petersburg 1903 – Moscow 1979), mathematician.
Lorenzo Mascheroni (Bergamo 1750 – Paris 1800), mathematician.
Melissus (Samos, V century BC), philosopher.
Pietro Mengoli (Bologna 1626 – 1686), mathematician.
Nikolaus Mercator (Eutin 1620 – Paris 1687), mathematician.
Marin Mersenne (Oizé 1588 – Paris 1648), mathematician and physicist.
Franz Mertens (Środa Wielkopolska 1840 – Vienna 1927), mathematician.
Paul-André Meyer (Boulogne-Billancourt 1934 – 2003), mathematician.
Hermann Minkowski (Aleksotas 1864 – Göttingen 1909), mathematician.
George Minty (Detroit 1929 – Bloomington 1986), mathematician.
Carlo Miranda (Naples 1912 – 1982), mathematician.
Abraham de Moivre (Vitry-le-François 1667 – London 1754), mathematician.
Oskar Morgenstern (Görlitz 1902 – Princeton 1977), economist.
John Napier (Edinburgh 1550 – 1617), mathematician.
John Nash (Bluefield 1928 – Monroe 2015), mathematician.
Isaac Newton (Woolsthorpe 1642 – London 1727), mathematician and physicist.
Vilfredo Pareto (Paris 1848 – Céligny 1923), economist and sociologist.
Parmenides (Elea, VI century BC), philosopher.
Giuseppe Peano (Spinetta di Cuneo 1858 – Turin 1932), mathematician.
Plato (Athens 428 BC ca. – 348 BC ca.), philosopher.
John Pratt (Boston 1931), economist.
Alfred Pringsheim (Oława 1850 – Zurich 1941), mathematician.
Pythagoras (Samos 570 BC ca. – Metapontum 495 BC ca.), mathematician and philosopher.
Henri Poincaré (Nancy 1854 – Paris 1912), mathematician.
Siméon-Denis Poisson (Pithiviers 1781 – Paris 1840), mathematician.
Hans Rademacher (Wandsbeck 1892 – Haverford 1969), mathematician.
Hudalricus Regius (Ulrich Rieger) (XVI century), mathematician.
Giovanni Ricci (Florence 1904 – Milan 1973), mathematician.
Bernhard Riemann (Breselenz 1826 – Selasca 1866), mathematician.
Frigyes Riesz (Győr 1880 – Budapest 1956), mathematician.
Michel Rolle (Ambert 1652 – Paris 1719), mathematician.
Bertrand Russell (Trellech 1872 – Penrhyndeudraeth 1970), philosopher.
Hermann Schwarz (Hermsdorf 1843 – Berlin 1921), mathematician.
Herbert Simon (Milwaukee 1916 – Pittsburgh 2001), social scientist.
Eugen Slutsky (Yaroslavl 1880 – Moscow 1948), economist and mathematician.
Baruch Spinoza (Amsterdam 1632 – The Hague 1677), philosopher.
Guido Stampacchia (Naples 1922 – Paris 1978), mathematician.
James Stirling (Garden 1692 – Edinburgh 1770), mathematician.
Thomas Stieltjes (Zwolle 1856 – Toulouse 1894), mathematician.
Otto Stolz (Hall in Tirol 1842 – Innsbruck 1905), mathematician.
Marshall Stone (New York 1903 – Madras 1989), mathematician.
James Sylvester (London 1814 – 1897), mathematician.
Alfred Tarski (Warsaw 1902 – Berkeley 1983), mathematician.
Brook Taylor (Edmonton 1685 – London 1731), mathematician.
Carl Thomae (Laucha an der Unstrut 1840 – Jena 1921), mathematician.
Leonida Tonelli (Gallipoli 1885 – Pisa 1946), mathematician.
Evangelista Torricelli (Rome 1608 – Florence 1647), mathematician and physicist.
Albert Tucker (Oshawa 1905 – Hightstown 1995), mathematician.
Stanislaw Ulam (Lwów 1909 – Santa Fe 1984), mathematician.
Charles-Jean de la Vallée Poussin (Leuven 1866 – 1962), mathematician.
Alexandre-Théophile Vandermonde (Paris 1735 – 1796), mathematician.
Vito Volterra (Ancona 1860 – Rome 1940), mathematician.
János (John) von Neumann (Budapest 1903 – Washington 1957), mathematician.
Léon Walras (Évreux 1834 – Clarens-Montreux 1910), economist.
Karl Weierstrass (Ostenfelde 1815 – Berlin 1897), mathematician.
Ludwig Wittgenstein (Vienna 1889 – Cambridge 1951), philosopher.
Zeno (Elea, V century BC), philosopher.
Yitang Zhang (Shanghai 1955), mathematician.
Index

Absolute value, 77 Border, 95


Algebraic complement, 507 Budget
Algorithm line, 671
notion, 18 set, 668
of Euclid, 18
of Gauss, 491 C(E), 397
of Hero, 456 C^1(E), 818, 853
of Kronecker, 515 C^n(E), 821, 853
Approximation Cardinality
linear, 814, 899 of a set, 174
polinomial, 899 of the continuum, 179
quadratic, 900 Cauchy
Arbitrage, 597, 753 condition, 236
Archimedean property, 28 product, 310
Argmax, 653 Change of variable
Asset, 747 Riemann, 1311
Asymptote, 1107 Stieltjes, 1355
horizontal, 1107 Cholesky factorization, 780
oblique, 1107 Codomain, 111
vertical, 1107 Coe cient
Average binomial, 1442
arithmetic , 529 constant, 152
function, 534 Fourier, 85
geometric, 530 generalized binomial, 332
harmonic, 531 leading, 152
power, 532 multinomial, 1441
quasi-arithmetic, 532 Cofactor, 507
Axis Combination
horizontal/abscissae, 44 a ne, 557
vertical/ordinates, 44 convex, 545, 547
linear, 66
Barycentric coordinates, 559 Comparative statics, 654, 1069, 1128, 1208,
Basis, 70, 74 1212, 1217
a ne, 558 Components
orthonormal, 85 of a matrix, 467
Biconditional, 1458 of a vector, 46
Big-O of, 257 Compound factor, 597
Bits, 35 Condition

1502
INDEX 1503

rst-order, 870, 875 Cramer's rule, 517


second-order, 888 Criterion
Conditional, 1458 comparison, 230
Cone, 599 di erential of concavity, 958, 959, 981,
pointed, 729 983
recession, 726 di erential of monotonicity, 883, 972
Constant di erential of strict monotonicity, 884
Euler-Mascheroni, 270 of comparison for series, 267
Napier, 238 of the ratio for sequences, 232
Constituents, 1477 of the root for sequences, 234
Constraints of the root for series, 304
equality, 1147 ratio, 274, 303
inequality, 1174 Sylvester-Jacobi, 784
Contingent claim, 748 Cryptography, 136
Continuity, 395 Curve, 112
uniform, 428 indi erence, 126, 170
Contrapositive, 1461 level, 122
Convergence Cusp, 798
absolute (for series), 280
discounted (Abel), 339 De Morgan's laws, 10, 1461
in mean (Cesaro), 299 Decomposition of Jordan, 1359
negation, see Principle by induction Density, 29
of improper integrals, 1316 Cauchy, 1413
of sequences, 209, 217, 230 exponential, 1416
of series, 262, 274 Pareto, 1416
radius, 324 Density function
rate, 259 continuous, 1404
Converse, 1459 Gaussian, 1405
Convolution, 308 simple, 1404
Correspondence, 637 uniform, 1405
ascending, 1222 Derivative, 791
budget, 637 higher order, 819
cyclically monotone, 1009 left, 797
demand, 1208 of compounded function, 808, 823
feasibility, 1207 of the inverse function, 810
hemicontinuous, 641 of the product, 805
inverse, 637 of the quotient, 806
Nash equilibrium, 1240 of the sum, 804
solution, 1207 partial, 832, 835
superdi erential, 1008 right, 797
viable, 637 second, 819
Cosecant, 1449 third, 819
Cosine, 1448 unilateral, 797
Cotangent, 1449 Determinant, 498
Countable, 174 Diagonal of a matrix, 470
Covariance, 1395 Di eomorphism, 1081
1504 INDEX

Di erence, 7 Equation, 439, 1075


Di erence quotient, 789, 791 characteristic, 337, 766
Di erentiability with continuity, 818 inclusion, 1022
Di erential, 815 monotone, 1097
total, 845 parametric, 1099
Di erentiation under the integral sign, 1334 polynomial, 410, 439
Dimension, 75, 562, 564 smoothly well posed, 1081
Direct sum, 741 well posed, 1077
Directions Equilibrium
of constancy, 728 Arrow-Debreu, 704
of recession, 726 market, 200, 412, 441, 444, 451, 704, 1023,
Discontinuity 1069
essential, 405 Nash, 1235
jump, 405 Nash parametric, 1240
non-removable, 405 Equivalence, 1460
removable, 405 Events, 1376
Distance (Euclidean), 90 Expansion
Distribution function, 1400 asymptotic, 931
Gaussian, 1405 partial fraction, 160
moments, 1411 polinomial, 899
uniform, 1405 polynomial of Maclaurin, 902
Divergence polynomial of Taylor, 902
of improper integrals, 1316 Expectations
of sequences, 212 adaptive, 204
of series, 262 classic, 203
Domain, 111 extrapolative, 204
natural, 164 rational, 454
of derivability, 796, 837 Expected utility, 1427
Dual space, 464 Expected value, 1389, 1407
Exponential decay, 250
Edgeworth box, see Pareto optimum Extended real line, 37, 213
Eigenbasis, 767 Extension
Eigenspace, 765 of a function, 166
Eigenvalue, 763 of a predicate, 1482
Eigenvector, 763
Element Factorial, 1440
of a sequence, see Term of a sequence FOC, 870
of a vector, see Component of a vector Form, 775
Envelope linear, 775
a ne, 558 quadratic, 775
concave of a function, 989 Forms of indetermination, 38, 226
convex, 551 Formula
convex of a function, 991 binomial of Newton, 1445
Epigraph, 576 compound interest, 197
Eponymy, 1349 multinomial, 1445
Equalizer, 1251 of Cavalieri, 1408
INDEX 1505

of De Moivre-Stirling, 252 convex, 149, 567


of Euler, 867 convex at a point, 1105
of Faa di Bruno, 823 cosine, 156
of Hille-Taylor, 945 CRRA, 387
of Ito, 1349 cubic, 112
of Lagrange-Taylor, 879 cuneiform, 714
of Leibnitz, 822 decreasing, 139, 142
of Maclaurin, 902 demand, 701
of Taylor, 902 derivable, 791, 854, 974
Frontier, 95 derivative, 796
Function, 109 di erentiable, 815, 816, 838
absolute value, 114 discontinuous, 404
absolutely monotone, 938 elementary, 152
additive, 598 exponential, 153
a ne, 571 gamma, 636, 930
Agnesi's versiera, 1119 Gaussian, 680, 1118, 1315
alpha-smooth, 1138 generating, 328
analytic, 933 homothetic, 610
antiderivative, 1298 identity, 138
arccosin, 158 implicit, 1042, 1048, 1065
arcsin, 157 increasing, 139, 142
arctan, 159 indicator, 1283
asymptotic to another, 388 in mum of, 139
bijective, 131 in nite, 392
Blackwell, 618 in nitesimal, 392
bounded, 138 inframodular, 984
bounded from above, 138 injective, 130
bounded from below, 138 instantaneous utility, 121, 200
bounded variation, 1357 integrable in an improper sense, 1316
CARA, 387 integral, 1304
CES, 601 integrand, 1344
Cobb-Douglas, 116 integrator, 1344
coercive, 677 intertemporal utility, 121
comparable with another, 388 inverse, 132, 810
completely monotone, 941 invertible, 133
composite, 808 Lagrangian, 1152
composite (compoud), 129 level-proper, 1091
concave, 149, 567 linear, 461, 537
concave at a point, 1105 linear-quadratic, 984
concavi able, 988 locally decreasing, 881
constant, 139, 142 locally increasing, 881
continuous, 397 locally strictly decreasing, 881
continuous at a point, 395 locally strictly increasing, 881
continuous at a point from the left, 720 log-concave, 634
continuous at a point from the right, 719 log-convex, 634
continuously di erentiable, 818 log-exponential, 620, 1225
1506 INDEX

logarithmic, 114, 153 signum, 417, 1300


mantissa, 160 sine, 156
modular, 627 smooth, 821
moment generating, 1417 softmax, 1227
monotone, 140, 144 solution, 1207
multivariable, 112 square root, 113
n-times continuously di erentiable, 821, step, 1283
853 strictly concave, 569
negative exponential, 154 strictly concave at a point, 1105
negative part, 1272 strictly convex, 569
negligible with respect to another, 388 strictly convex at a point, 1105
normalized, 618 strictly decreasing, 139
nowhere analytic, 937 strictly increasing, 139, 143
null-superadditive, 606 strictly monotone, 140, 144
objective, 652 strongly concave, 993
of a single variable, 112 strongly convex, 995
of Dirichlet, 362 strongly increasing, 143
of Kronecker, 511 strongly monotone, 144
of Leontief, 143 submodular, 627
of n variables, 115 superlinear, 605
of several variables, 112 supermodular, 627
of Thomae, 1307 supremum of, 138
of vector, see Function of n variables surjective, 131
of Weierstrass, 826 tangent, 157
one-to-one, see Function injective translation invariant, 618
one-way, 136 ultramodular, 984
partially derivable, 834 uniformly continuous, 428
periodic, 159 utility, 119, 147, 169
perspective, 581, 605 value, 1207
polynomial, 152 vector, 112
positive homogeneous, 600 with increasing (cross) di erences, 629
positive part, 1272 zero, 464
primitive, 1298 Functional equation
production, 120 Cauchy, 593
proper, 1089 for the exponential, 595
quadratic, 113, 984 for the logarithm, 595
quasi-a ne, 583 for the power, 596
quasi-concave, 583
quasi-continuous, 724 Game, 1234
quasi-convex, 583 Gaussian elimination procedure, 491
quasi-linear, 535 Goods
rational, 160 complements, 630
Riemann integrable, 1270, 1274 perfect complements, 148
scalar, 112 perfect substitutes, 151
semicontinuous, 717 substitutes, 630
separable, 150 Gradient, 835
INDEX 1507

Gradient descent, 1138 generalized, see Improper integral, see Im-


Gram-Schmidt orthonormalization, 768 proper integral
Graph improper, 1316, 1319, 1331
of a correspondence, 638 inde nite, 1298
of a function, 117 lower, 1269
of Dirichlet, 1340
Half-spaces, 685, 686 of Gauss, 1316, 1330
Homeomorphism, 1079 of Stieltjes, 1344, 1364
Hyperplane, 564, 685 Riemann, 1270
Hypograph, 575 upper, 1269
Integral sum
Ideal, 755 lower, 1268, 1275
Image, 111 upper, 1268, 1275
counter, 121 Integration
inverse, 121 by change of variable, 1311
of a sequence, 206 by parts (Riemann), 1309
of function, 111 by parts (Stieltjes), 1354
of operator, 482 Intersection, 5, 9, 64, 1457
Implication, 1462 Interval, 23
Indeterminacies, 383 bounded, 23, 52
Index closed, 23, 51
of Arrow-Pratt, 961 convergence, 324
of relative concavity, 961 endpoints, 23
Indi erence half-closed, 23, 52
class, 167 half-open, see Interval half-closed
curve, 126, 170 open, 23, 52
map, 168 unbounded, 24, 52
relation, 166 Isocosts, 127
Induction, see Principle by induction Isoquants, 127
Inequality
Cauchy-Schwarz, 80 Join, 623
Jensen, 579, 1410 Jump, 405
log-sum, 580 Kernel
logarithmic, 991 of an operator, 482
power mean, 966 pricing, 752
triangle, 78, 81, 91
In mum, 27, 93, 96 L(R^n), 477
In nite, 392 L(R^n,R^m), 477
actual, 173 Law
potential, 173, 265 of demand, 973
In nitesimal, 392 of excluded middle, 1460
Integrability, 1270, 1275 of non-contradiction, 1460
of continuous functions, 1286 of one price, 750
of monotonic functions, 1288 of sines, 1450
Integral of Walras, 671
de nite, 1300 Law of one price, 1034
1508 INDEX

Least Upper Bound Principle, 27 cofactor, see Matrix of algebraic compo-


Lemma nents
of Karamata, 343 complete, 522
of Minty, 1253 diagonal, 470
Limit, 362 diagonalization, 772
from above, 211 echelon, 491
from below, 211 elementary, 492
inferior, 286 elementary operations, 492
left, 369 full rank, 488
of function, 357 Gram, 489
of operators, 423 Hessian, 849, 914
of scalar function, 362, 364 identity, 468
of sequence, 208 inverse, 496
one-sided, 368 invertible, 496, 511
right, 369 Jacobian, 860, 1165
superior, 286 lower triangular, 470
unilateral, 368 maximum rank, 488
vector function, 373 negative de nite, 777, 969
Linear combination negative semi-de nite, 777, 969
convex, 545 non-singular, 511
Linear system null, 469
determined, 522 of algebraic complements, 507
homogeneous, 518 orthogonal, 769
solvability, 524 positive de nite, 777, 969
solvable, 522 positive semi-de nite, 777, 969
square, 516 rectangular, 468
undetermined, 522 scalar multiplication, 469
unsolvable, 522 singular, 511
Little-o of, 243, 388 square, 468
Lottery, 1428 symmetric, 470
Lower bound, 24 symmetric part, 969
trace, 764
M(m,n), 469 transpose, 471
M(n), 498 upper triangular, 470
Marginal Vandermonde, 519
cost, 791 Maximal of a set, see Pareto optimum
product, 838 Maximizer
utility, 838 global, 162, 652
Marginal rate local, 688
of intertemporal substitution, 1056 strong, 654
of substitution, 1055 strong local, 688
of transformation, 1054 Maximum of a set
Matrix in R, 25
addition, 469 in R^n, 53
adjoint, 507 Maximum principle, 692
augmented, 522 Maxminimizer, 1230
INDEX 1509

Measure, 1373 Norm, 79


counting, 1371 Nullity, 483
nite additivity, 1374 Number
probability, 1375 cardinal, 181, 185
Meet, 623 e, 15, 238, 276
Mesh of a subdivision, 1272 pi, 15, 1452
Method Numbers
elimination, 1125 algebraic, 241
Gaussian elimination, 491 Fibonacci, 335
least squares, 707 integer, 11
of Lagrange, 1154 irrational, 15
of Laplace, 1228 natural, 11, 186
Methodology prime, 19
cardinal properties, 588 prime of Mersenne, 198
ceteris paribus, 837 rational, 12
diversi cation principle, 590 real, 15
homo oeconomicus, 651 transcendental, 241
methodological individualism, 651 Numeraire, 701
minimum action principle, 674
ordinal properties, 588, 673 Open cover, 1086
rationality, 651, 674 Operator, 112, 116, 475
Minimal of a set, see Pareto optimum continuous, 423
Minimaximizer, 1230 contraction, 613
Minimizer derivative, 837
global, 654 game, 1234
local, 688 identity, 476
Minor, 514 inner coercive, 1254
bordered, 515 inner monotone, 970
leading principal, 781 invertible, 494
principal, 781 linear, 475
Modulus, 624 Lipschitz, 613
Modus nonexpansive, 1000
ponens, 1463 null, 476
tollens, 1463 projection, 741, 996
Moments, 1411 softmax, 1226
Multiplier strictly competitive, 1237
marginal interpretation, 1221 strongly inner decreasing, 1257
of Kuhn-Tucker, 1180 zero-sum, 1237
of Lagrange, 1152, 1166, 1180 Optimizer, 711
Order
Napier's constant, 276 complete, 23, 1436
Negation, 1457 partial, 49, 1436
Neighbourhood, 92 weak, 1436
left, 93 Ordered pairs, 43
of in nite, 213 Orthogonal
right, 93 subspace, 741
1510 INDEX

vectors, 83 Portfolio, 748


Positive orthant, 44
Parabola, 117 Postulate of continuity of the real line, 14
Paradox Power set, 182
of Burali Forti, 11 Predicate, 1479
of Russell, 11 binary, 1482
of the liar, 1460 Preference
Pareto optimum, 54 complete, 168
Part de nition, 119
integer, 29 lexicographic, 172
negative, 624, 1272 monotonic, 169
positive, 624, 1272 re exive, 167
Partial sums, 262 strict, 166
Partition, 10, 1475 strictly monotonic, 169
atoms, 1475 strongly monotonic, 169
Peano curve, 1100 transitive, 167
Permutation Preimage, 121
simple, 1440 Preorder, 1436
with repetitions, 1441 Price
Plurirectangle, 1264 ask, 1032
Point bid, 1032
accumulation, 97 Prime
boundary, 94 factorization, 19
corner, 798 gap, 318
critical, 874 number, 19
cuspidal, 798 twin, 318
exterior, 94 Primitive, 1298
extremal, 655 Probability
extreme, 552 countable additivity, 1383
in ection, 1106 di use, 1421
interior, 94 Dirac, 1377
isolated, 96 geometric, 1379
limit, 97, 287 law, 1425
of Kuhn-Tucker, 1180 measure, 1375
regular, 1148, 1165, 1176 Poisson, 1378
saddle, 874, 1229 uniform, 1377
singular, 1148, 1165, 1176 Problem
stationary, 874 constrained optimization, 655
Polyhedron, 696 consumer, 669
Polynomial, 152 maximization, 655
characteristic, 766 minimization, 655
interpolating, 521 operator optimization, 711
of Bernstein, 431 optimization, 655
of Maclaurin, 902 parametric optimization, 1207
of Taylor, 902 quadratic optimization, 732, 988
Polytope, 547 unconstrained di erential optimization, 1125
INDEX 1511

    unconstrained optimization, 655
    variational inequality, 1251
    vector maximization, 711
    with equality constraints, 1147
    with inequality constraints, 1174, 1244
Product
    Cartesian, 43, 46
    Cauchy, 310
    inner, 48, 77
    of matrices, 472
Projection, 741, 996
Projections, 832
Proof
    by contradiction, 1464
    by contraposition, 1464
    direct, 1464
Property
    Archimedean, 28
    associative, 8, 48, 128
    commutative, 8, 48, 49, 128
    distributive, 9, 48, 49
    satisfied eventually, 207
    satisfied infinitely often, 314
Proposition, 1457
    canonical form, 1477
Propositions
    disjoint, 1473
    exhaustive, 1474
Pythagorean trigonometric identity, 1449

Quadratic form, 775
    indefinite, 777
    negative definite, 777
    negative semi-definite, 777
    positive definite, 777
    positive semi-definite, 777
Quantifier
    existential, 1479
    universal, 1479

Random variable, 1387
Rank, 483, 485
    full, 488
    maximum, 488
Recurrence, 192
    linear of order k, 195
    of order k, 447
    orbit, 449
    phase portrait, 449
    random walk, 193
Recursion, 192
Relation
    binary, 1434
    equivalence, 1437
Remainder
    Lagrange's, 908
    Peano's, 908
Representation
    of linear function, 464
    of linear operator, 478
Restriction of a function, 165
Riesz subspace, 755
Root
    algebraic, 30
    arithmetical, 30, 78
Rule
    chain, 808, 846, 863
    of Cramer, 517
    of de l'Hospital, 893
    of Leibniz, 1336
    pricing, 751

Scalar, 47
Secant, 1449
Self-map, 442
Semicone, 609
Separating element, 23
Separation of sets, 687
Sequence, 191
    arithmetic, 193
    asymptotic to another, 243
    bounded, 206
    bounded from above, 206
    bounded from below, 206
    bounded one-sided, 299
    Cauchy, 236
    comparable with another, 243, 258
    completely monotone, 1415
    constant, 207
    convergent, 209
    convolution, 308
    decreasing, 207
    divergent, 212
    Fibonacci, 192
    geometric, 192
    harmonic, 192
    increasing, 207
    infinitesimal, 210
    irregular, 208
    maximizing, 1142
    monotone, 207
    negligible with respect to another, 243
    null, see Infinitesimal sequence
    of differences, 288
    of second differences, 290
    of the partial sums of a series, 262
    of the same order of another, 243, 258
    oscillating, see Irregular sequence
    regular, 208
    relaxing, 1137
    unbounded, 206
Series, 262
    absolutely convergent, 280, 1328
    alternating harmonic series, 282
    binomial, 333
    convergent, 262
    difference form, 288
    generalized harmonic, 268
    geometric, 264
    geometric power, 325
    Grandi, 301
    harmonic, 263, 1487
    irregular, 262
    Maclaurin, 933
    Mengoli, 263
    Mercator, 282
    negatively divergent, 262
    normalized geometric power, 326
    oscillating, see Irregular series
    Poisson power, 326
    positively divergent, 262
    power, 323
    sum, 262
    Taylor, 933
    with positive terms, 267
Set, 3
    affine, 555
    affinely independent, 557
    Borel, 1421
    bounded, 25, 106
    bounded from above, 25
    bounded from below, 25
    budget, 668
    cardinality of, 174
    choice, 652
    closed, 100
    closure, 100
    compact, 107
    complement, 8
    consumption, 166, 668
    convex, 545
    countable, 174
    derived, 97
    directed, 591
    discrete, 1089
    empty, 5
    finite, 174
    image, 111
    interior, 94
    lattice, 626
    linearly dependent, 64
    linearly independent, 64
    maximum, 25, 53
    minimum, 25, 53
    open, 99
    orthogonal, 84
    orthonormal, 84
    power of, 174
    unbounded, 25
    universal, 8
Set function, 1371
    additive, 1372
    grounded, 1372
    monotone, 1372
    normalized, 1372
    positive, 1372
Sets
    disjoint, 5
    homeomorphic, 1079
    lower contour, 576
    upper contour, 576
Simplex, 559
    standard, 550
Sine, 1448
Singleton, 4
Solution
    corner, 666, 1162
    of an optimization problem, 655
    set, 653
Space, 8
    column, 487
    complete, 237
    dual, 464
    Euclidean, 46
    incomplete, 237
    lineality, 728
    R^n, 46
    row, 487
    vector, 61
Span, 68
Standard deviation, 1394
State space, 1376
States of the world, 1376
Subdivision, 1266
Submatrix, 497
Subsequence, 218
Subset, 3
    proper, 4
Sum
    of a series, 262
    of Abel, 347
    of Cesàro, 301
    of sets, 644
Superdifferential, 1002
    normalized ordinal, 1019
    ordinal, 1013
Supremum, 27, 93, 96

Tangent
    (trigonometric), 1449
Tangent line, 793
Tangent plane, 841
Tauberian theorem
    of Hardy-Littlewood, 343
    of Landau, 299
Term of a sequence, 191
Theorem
    Berge's Maximum, 1213
    domain invariance, 1078
    duality of linear programming, 1248
    extreme value, 413, 674
    first welfare, 706
    fundamental of arithmetic, 20
    fundamental of finance, 754
    fundamental of integral calculus (first), 1302
    fundamental of integral calculus (second), 1305
    fundamental of linear programming, 699
    integral mean value, 1295
    intermediate value, 416
    mean value, 877
    minimax, 1233
    of Abel, 330
    of Arrow-Debreu, 447, 1024
    of Arrow-Pratt, 962
    of Artin, 635
    of Bauer, 692
    of Bernstein, 433, 938
    of Binet, 505
    of Blanuša-Karamata, 993
    of Bolzano, 409
    of Bolzano-Weierstrass, 220
    of Borel-Peano, 945
    of Brioschi, 782
    of Brouwer (dimension), 1101
    of Brouwer (fixed point), 443
    of Browder-Hartman-Stampacchia, 1255
    of Browder-Minty, 1259
    of Caccioppoli-Hadamard, 1092
    of Cantor, 180
    of Carnot, 1454
    of Cauchy, 236, 593
    of Cauchy-Hadamard, 324
    of Cesàro-Stolz, 295
    of Choquet, 633
    of Choquet-Meyer, 560
    of Collatz, 145
    of Darboux, 880
    of de Finetti-Jessen, 964
    of de l'Hospital, 893
    of De Moivre-Stirling, 252
    of Erdős, 320
    of Euclid, 17, 22
    of Faà di Bruno, 823
    of Fenchel, 690
    of Fermat, 870
    of Frobenius, 339
    of Hahn-Banach, 538, 1027
    of Hardy-Littlewood, 343
    of Hausdorff, 1415
    of Hawkins-Simon, 1098
    of Heine-Borel, 1087
    of Hille, 944
    of Jessen-Riesz, 573
    of Jordan, 1359
    of Kakutani, 1023
    of Koopmans, 351
    of Kronecker-Capelli, 522
    of Kuhn-Tucker, 1180
    of Lagrange (mean value), 877
    of Lagrange (optimization), 1152
    of Landau, 299
    of Laplace, 510
    of Minkowski, 555
    of Nash, 1237
    of Peano, 1100
    of permanence of sign, 216, 379
    of Poincaré-Miranda, 440
    of Pringsheim, 938
    of Pythagoras, 84, 1454
    of Ricci, 321
    of Riemann, 284
    of Riesz, 464, 539, 744
    of Riesz-Markov, 465, 540, 760
    of Rolle, 876
    of Schwarz, 851
    of Stampacchia, 1197
    of Stone-Weierstrass, 430
    of strong hyperplane separation, 687
    of Tartaglia-Newton, 1444
    of Taylor-Lagrange, 908
    of Taylor-Peano, 902
    of the comparison, 230, 380
    of the envelope, 1218, 1220
    of the implicit function, 1047, 1059, 1061, 1095
    of the inverse function, 1084
    of Tonelli, 682, 722, 725
    of Ulam, 1421
    of uniqueness of the limit, 215, 378
    of Weierstrass, 413, 674
    of Zhang, 320
    prime number, 254
    Projection, 740, 995
Triangulation, 1451
Truth
    table of, 1457
    value, 1457

Union, 6, 9, 64, 1458
Unit
    ball, 44, 107
    circle, 45
    sphere, 107
Unit fraction, 12
Upper bound, 24

Value
    absolute, 77
    Cauchy principal, 1321
    expected, 1389, 1407
    local maximum, 688
    local minimum, 688
    maximum, 162, 652
    minimum, 655
    optimum, 655
    Pareto, 711
    saddle, 1229
Variability, 1369
Variable
    choice, 655
    dependent, 111
    independent, 111
Variance, 1393
Variation
    quadratic, 1369
    second, 1368
    total, 1357
Vector, 44, 46
    modulus, 624
    negative part, 624
    positive part, 624
    unit, 82
    zero, 47
Vector subspace, 62, 68
Vectors
    addition, 47
    collinear, 66
    column, 467
    disjoint, 757
    linearly dependent, 65
    linearly independent, 65
    orthogonal, 83
    product, 47
    row, 468
    scalar multiplication, 47
    sum, 47
Venn diagrams, 4
Versors, 64, 83
Volatility, 1369

Weights, 547
